HQP and NOS DACs

Thanks @jussi_laako for your explanations. This thread has been one of the most informative threads that I have read and has really clarified a lot about upsampling for me. That being said, I do have a couple questions:

  1. I am a bit confused about your math, and your reference to the 384kHz Nyquist frequency. When upsampling to 384kHz, wouldn’t the Nyquist frequency be 192kHz? It seems like it should either be 384 - 44.1 or 192 - 22.05?
  2. You said in an earlier post that at 44.1kHz with a full frequency sweep up to the Nyquist frequency of 22.05kHz, you would get images at every multiple of 44.1kHz (which makes sense to me) but that they would be adjacent due to being a full frequency sweep. But since the sweep only spans half the sampling rate by definition, why are the images adjacent?
  3. If I'm following, this seems to suggest using gentler filters with lower-rate (e.g. Redbook) material than with higher-rate material, since more of the frequency range in upsampled lower-rate material is "blank" compared to what could be present in, say, 192kHz source material. Or is this line of thinking incorrect based on the fact that any energy in the 22.05-96kHz range of a 192kHz track would be inaudible anyway (or simply non-existent in the first place)? I've seen frequency plots of records that suggest they may contain frequencies up to over 100kHz. Would roll-off of this higher-frequency information be a concern if using a gentler filter with higher-rate files?
  4. I'm guessing this is the reason for having filters in HQPlayer that are designed for upsampling to very high rates, e.g. DSD512: they can have a gentler roll-off because you can assume there was no material in the upper range of the original file?

Ahh, yes sorry, mistakes happen when I just write in a bit of a hurry and don't proof-read. The mirroring happens around the sampling rate as you can see from the plots… The mirrored bandwidth is the original (unfiltered) bandwidth.

Because of negative frequencies. So with 44.1 kHz sampling rate you get images from 44.1 + 0 to 44.1 + 22.05 (non-inverse spectrum) and below sampling rate you get images from 44.1 - 0 to 44.1 - 22.05 (inverse spectrum). And then the same repeats at every multiple of 44.1 kHz. With 384 kHz sampling rate the images of properly filtered 44.1 kHz source content are 384 + 0 to 384 + 22.05, and 384 - 0 to 384 - 22.05.

So for example a 1 kHz tone appears at 44.1 + 1 = 45.1 kHz and at 44.1 - 1 = 43.1 kHz. Or at 384 kHz sampling rate at 384 + 1 = 385 and 384 - 1 = 383. This also means that at each multiple of the output sampling rate, for a 1 kHz tone, there is potential for audio band intermodulation due to the 2 kHz difference tone of these two mirrored frequencies (plus its harmonic distortion products).
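The image arithmetic above can be sketched in a few lines (a hypothetical helper, not anything from HQPlayer):

```python
def image_frequencies(tone_hz, fs_hz, n_images=3):
    """Mirror-image pairs of a tone around multiples of the sampling
    rate: (k*fs - f) is the inverse-spectrum image, (k*fs + f) the
    non-inverse one."""
    return [(k * fs_hz - tone_hz, k * fs_hz + tone_hz)
            for k in range(1, n_images + 1)]

# 1 kHz tone at 44.1 kHz: first pair at 43.1 kHz and 45.1 kHz
print(image_frequencies(1_000, 44_100, 2))   # [(43100, 45100), (87200, 89200)]
# After proper upsampling to 384 kHz, the first pair moves to 383/385 kHz
print(image_frequencies(1_000, 384_000, 1))  # [(383000, 385000)]
```

The 2 kHz spacing inside each pair is exactly the difference tone mentioned above.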

It's less of a concern, but you'd still want to retain the high frequency content in order to keep the "hires" portion of the content, such as transient properties that are precisely described by those higher frequency components. With a typical good quality recording, for 88.2/96 sampling rates you would still want a fairly steep filter because recordings can fairly easily utilize the entire bandwidth. For 176.4/192 sampling rates you can typically already begin to use a gentler roll-off, unless the microphone used is really high bandwidth (such as the Sanken one) and the instruments are such that they actually produce high frequencies. Sampling rates like 352.8/384 generally allow gentle roll-off.

Note that here “gentle” is in terms of digital filters, in analog filter terms, those are still extremely steep ones, like 96 dB/octave or such. :slight_smile:

Filters need to be defined by source content rate, not by the target rate. So when you start from RedBook content, the filter really cannot be gentle. (I give such options too, but don’t recommend use of such)

Am I right to say that, in a DAC chip, there are different types of digital filters (steep to gentle roll-off) catered for different sample frequencies? For example, a steep digital filter for Redbook and a gentle roll-off digital filter for hi-res?

Thanks for the responses. This helps a lot. I'm still confused about your last point though — I thought that was the whole premise behind upsampling; that once you upsampled Redbook to e.g. 384kHz, you would be able to have a much gentler filter that would still remove all of the high frequency noise and therefore cause less ringing than at 44.1kHz, where the filter would have to be extremely steep in order to not excessively roll off around 20kHz and, due to the steepness, cause a lot of ringing?

Yes, that is quite typical, but it depends on the DAC chip (it is not the case, for example, with ESS Sabre). Although mostly just because they don't even have the DSP resources to run steeper filters at higher rates. When they cut the number of taps in half at every doubling of the sampling rate, they keep the processing load constant.
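The constant-load trade-off is easy to verify with quick arithmetic (hypothetical tap counts, just to illustrate):

```python
# Hypothetical starting point: a 128-tap filter at 44.1 kHz.
taps, rate = 128, 44_100
for _ in range(4):
    # Multiply-accumulates per second stay constant down the chain:
    print(f"{rate:>8} Hz x {taps:>4} taps = {rate * taps:,} MAC/s")
    rate *= 2    # double the sampling rate...
    taps //= 2   # ...while halving the tap count
```

Every line prints the same 5,644,800 MAC/s, which is why the filters get shorter (and gentler) at higher rates on such chips.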


No, the premise is that it becomes much more feasible to design the analog reconstruction filter that needs to follow the D/A conversion process in a way that it can actually remove the images and thus reconstruct the original wave. The digital filter moves the images higher up in frequency, so the distance between the audio band and the first image band grows. Then it is easier to design an analog filter that does the rest of the image removal job.

The filter needs to be steep so that the image frequencies don't leak at full power right above 22.05 kHz. A filter always needs to be there, but upsampling (the digital filter) is there to make the life of the analog filter easier, or even practically realizable in the first place.

So here are some examples of the same 0 - 22.05 kHz sweep, but in digital domain with 4x rate conversion to 176.4 kHz.

First, no digital filter, just zero-order hold, which is the same as what a NOS DAC does:

Then with a very slow roll-off digital filter:

Then with a fast roll-off digital filter:

You can see the last one removes those image frequencies between 22.05 kHz and 88.2 kHz completely. When played through a DAC, you would still have images similar to what was shown before around every multiple of the 176.4 kHz sampling rate.
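The difference can be sketched with a quick spectral check (a rough NumPy/SciPy illustration; a single 10 kHz tone stands in for the sweep, and `resample_poly` stands in for a fast roll-off digital filter):

```python
import numpy as np
from scipy.signal import resample_poly

fs = 44_100
f0 = 10_000                        # a tone inside the audio band
t = np.arange(fs) / fs             # one second of signal
x = np.sin(2 * np.pi * f0 * t)

# Zero-order hold (NOS-style 4x): repeat each sample four times
zoh = np.repeat(x, 4)

# Proper 4x upsampling with a steep digital (polyphase FIR) filter
filt = resample_poly(x, 4, 1)

def level_db(sig, freq, rate):
    """Magnitude (dB) of the spectrum bin at `freq`."""
    spec = np.abs(np.fft.rfft(sig)) / len(sig)
    bin_hz = rate / len(sig)
    return 20 * np.log10(spec[int(round(freq / bin_hz))] + 1e-12)

# First image of the 10 kHz tone sits at 44.1 - 10 = 34.1 kHz
print("ZOH image level:      %.1f dB" % level_db(zoh, 34_100, 4 * fs))
print("Filtered image level: %.1f dB" % level_db(filt, 34_100, 4 * fs))
```

The zero-order hold leaves the 34.1 kHz image only partially attenuated (the mild sinc roll-off of the hold), while the digital filter pushes it down by tens of dB more, matching the plots.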


Thanks! I think I understand now. I was conflating two separate steps: you need a steep digital filter for Redbook to keep an image from appearing right above the original signal, and then you separately remove the remaining images using an analog reconstruction filter after the signal is converted to analog? The key part I was missing was your first paragraph.

If I was reading correctly, what you were saying earlier about (I think) the Metrum Musette was that its analog reconstruction filter was designed so that it would be able to completely remove the images if you upsampled Redbook to 3.2MHz. However, as the Musette doesn't upsample to that rate itself, it leaves behind some of the images rather than removing them entirely as in your latest example. I'm assuming there is some technical limitation for why the reconstruction filter was designed that way?

I think, based on my (hopefully corrected) understanding, that the question I really wanted answered was the one that Guy asked and you answered. It is interesting that DAC designers (or at least ESS) don't do this. So you can address half of the problem by upsampling in software before sending the signal to the DAC. But if I understand correctly, you are at the mercy of the DAC designer for the reconstruction filter; so I guess that is an important aspect in selecting a DAC? I'm not sure how to evaluate that though; I assume you determined the characteristics of the Musette via measurement?

Yes, exactly…

I can only guess the reasons for the design decisions. But I think they wanted to keep the analog filter a simple passive low-order filter. Due to challenges with settling time, you really cannot run a high-bit R-2R DAC at rates like 3.2 MHz in a sensible way; it would lose a lot of precision because for each sample value it wouldn't be able to accurately reach the correct level fast enough. The more bits there are, the more precisely the output voltage needs to settle, because the voltage steps are smaller.
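Rough back-of-the-envelope numbers for the settling problem (an assumed 2 V full-scale output and hypothetical bit depths, just to show the scale):

```python
rate = 3_200_000                 # hypothetical 3.2 MHz conversion rate
period_ns = 1e9 / rate
print(period_ns)                 # 312.5 ns to slew and settle per sample

# One LSB step on an assumed 2 V full-scale R-2R output:
for bits in (16, 24):
    lsb_uv = 2.0 / 2 ** bits * 1e6
    print(f"{bits}-bit LSB: {lsb_uv:.2f} uV")   # 30.52 uV vs 0.12 uV
```

Settling to within a fraction of a microvolt in ~300 ns is what makes high-bit ladders at MHz rates impractical.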

That is one reason these days we have a lot of delta-sigma DACs: they have only a few bits, or just one (DSD), and thus can run at very high rates.

But most DAC chips still have internal digital filters only up to the 352.8/384 kHz rate, and as a result many have some amount of images remaining in the output at multiples of that rate (and various other restrictions too). With computers you can work around those restrictions…

For the analog filter, whatever is there is given and cannot be changed without modifying the electronics. So software can only try to help it. Somewhat simplified: using the highest available rate leaves the least amount of work for the analog filter.

Yes, I always measure all the DACs I have or otherwise can access. Plus there are other sources of information (datasheets, etc).


Wow! Jussi, that's very in-depth experience you have in the field of digital audio processing! Thanks for the detailed explanations and the graphs! Looks like the 'image' problem is likely to be there if the oversampling from Redbook is not high enough (less than or equal to 384k).

What happens if the input sampling rate is high enough to begin with? Say 192k or 384k: will there be images after oversampling and digital filtering?

How about converting 44.1k to DSD? Will this solve the image problem? Thanks again for the valuable inputs!

Yes, images always happen around multiples of final (output) sampling rate. Only difference with hires is that the images are already further away from audio band at the beginning.

If you properly upsample for example to DSD256, then images theoretically repeat at every multiple of the 11.3 MHz sampling rate. But since DSD has high frequency noise increasing as a function of frequency, the analog reconstruction filter needs to start rolling off earlier, for example at 150 kHz, and because there's a huge amount of space between 150 kHz and 11.3 MHz, even a relatively gentle analog filter is able to cut out everything well before the first image frequencies around 11.3 MHz. So with proper upsampling to DSD the image problem doesn't exist.
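The rate arithmetic, for reference (assuming DSD256 means 256 × 44.1 kHz, which is what the 11.3 MHz figure implies):

```python
base = 44_100                        # Redbook base rate
dsd256 = 256 * base
print(dsd256)                        # 11289600 Hz, i.e. ~11.3 MHz
# Gap between an analog roll-off starting at 150 kHz and the first
# image band around the DSD sampling rate:
print(round(dsd256 / 150_000, 1))    # ~75x of frequency headroom
```

Roughly 75 octave-fractions of room is why even a gentle analog slope gets the job done here.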

Note however that, since upsampling to such rates requires a fairly large amount of processing power, many delta-sigma DACs don't manage to properly upsample that high, only to the 352.8/384k (8x) or 705.6/768k (16x) rates, and then go on by just copying the same sample multiple times. Meaning that images still exist at multiples of those rates. Doing things properly is not a problem for modern computers though.


Thanks Jussi, converting PCM to DSD seemed to sound a lot better (more relaxed and less fatigue sounding) when listening to raw PCM by itself.

One last question: whenever I switch from PCM to DSD, there's always a level mismatch; the DSD always sounds softer than the PCM. What is the underlying mechanism that causes this phenomenon? Whenever I switch from PCM to DSD, I instinctively adjust the volume control to match the level. Is that the right thing to do?

I guess that the only way to determine if this is happening is to look at measurements?

There are various reasons for this and it is completely normal and expected… Primary reason is that the reference levels are different. For PCM, 0 dBFS is defined to be maximum representable sample value, thus it cannot be exceeded. For DSD, 0 dBFS is defined to be 50% modulation depth, and maximum allowed short-term level is +3.15 dBFS. That’s why DSD-to-PCM converters tend to have an option to apply 6 dB gain, to make 0 dBFS levels match. However, if the DSD content in such case exceeds the 0 dBFS DSD level, the PCM output will clip.
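The level mapping can be sketched as a small helper (a hypothetical function, just restating the reference levels given above):

```python
def dsd_to_pcm_dbfs(dsd_db, apply_6db_gain=False):
    """Map a DSD level (0 dBFS DSD = 50 % modulation depth, i.e. 6 dB
    below PCM full scale) onto the PCM dBFS scale."""
    pcm_db = dsd_db - 6.0 + (6.0 if apply_6db_gain else 0.0)
    return round(pcm_db, 2), pcm_db > 0.0   # (PCM level, clips in PCM?)

# Without the gain, DSD plays 6 dB low but never clips:
print(dsd_to_pcm_dbfs(3.15))                       # (-2.85, False)
# With the 6 dB gain the 0 dBFS points match, but DSD peaks clip:
print(dsd_to_pcm_dbfs(3.15, apply_6db_gain=True))  # (3.15, True)
```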

Some DAC chips like ESS Sabre try to match the levels, meaning that if the DAC has for example a nominal 2.5 V RMS output level at 0 dBFS PCM/DSD, it would need to go to 3.6 V RMS for the maximum allowed DSD peaks, which may cause problems with the DAC chip and analog stages hitting voltage rails and causing clipping.

Some other DAC chips just play the DSD content at about -3 dB relative to maximum PCM level to avoid clipping. For example, for the latest AKM chips in Direct DSD mode the output level is 3.5 dB lower. With TI/BB DAC chips the output level difference depends on the selected conversion-section analog filter (one of four), each having a different output level due to a different conversion stage configuration.

Yes, in case you are using an analog volume control following the DAC. If you are using a digital volume control before or inside the DAC, turn down the PCM volume instead.

Many times it is apparent from the DAC chip data sheets, but measurements will also tell in the end…

Thanks Jussi, this is really good information. It looks like DSD behaves more like an 'analogue recording/playback' system, where one may be able to push the recording level beyond 0 dBFS (PCM) without suffering from the clipping effect of PCM.

So in other words, DSD has more headroom to play with, which translates to improved dynamic range compared to PCM!

Not per my math.

If 0 dBFS for DSD equals -6 dBFS for PCM, then +3.15 dBFS for DSD equals -2.85 dBFS for PCM.

Regardless, DSD dynamic range, unlike that of non-noise-shaped PCM, is frequency dependent. So DSD dynamic range is not a constant.

AJ

I'm not sure I'm following you. If 0 dB is the maximum limit before PCM starts to clip, then DSD can go further, by +3.15 dB. DSD is not measured in dBFS like PCM but by percentage of modulation; the higher the modulation, the higher the output voltage. The reason DSD needs to be attenuated by a few dB is to ensure that when DSD is converted to PCM it does not pass the 0 dBFS defined by PCM, otherwise clipping will occur.

DSD to PCM conversion can happen in a DAC chip when the internal volume control is used, or in software when the user doesn't have a compatible DSD DAC.

All of that depends upon the voltage level(s) set as 0 dBFS – or 50 percent modulation depth, if you prefer, for DSD. If 0 dBFS for both is pegged to the de facto CD standard of 2 V, for example, then DSD has 3.15 dB greater momentary headroom. But if 0 dBFS for DSD is set to 1 V because it is 50 percent modulation depth and 1 V is 50 percent of 2 V, then DSD has less headroom than PCM.
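The two scenarios work out as follows (a quick sketch of the arithmetic, using the 2 V reference assumed above):

```python
def dsd_peak_volts(dsd_0dbfs_volts):
    """Peak voltage of the +3.15 dBFS momentary DSD maximum."""
    return dsd_0dbfs_volts * 10 ** (3.15 / 20)

pcm_full_scale = 2.0                  # de facto CD full-scale: 2 V
# 0 dBFS DSD pegged to the same 2 V: momentary peaks exceed PCM
print(round(dsd_peak_volts(2.0), 2))  # 2.87 V, 3.15 dB above PCM
# 0 dBFS DSD set to 1 V (50 % of 2 V): peaks stay below PCM full scale
print(round(dsd_peak_volts(1.0), 2))  # 1.44 V
```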

Regardless, both of us really are coming at this from the wrong angle. Digital dynamic range is built from the top down, not the bottom up.

However, to continue the discussion, try this exercise. An SDM can be fed a continuous (analog) or discrete (PCM) signal. Per the previously mentioned 0 dBFS disparity between PCM and DSD, as well as the +3.15 dBFS brief transient headroom for DSD, does a PCM signal potentially need to be attenuated before entering the SDM?

AJ

Usually DACs match it in such a way that roughly 0 dBFS PCM and +3 dBFS DSD have equal voltage level, meaning that generally DSD sounds a bit quieter. Except some, which seem to have 0 dBFS PCM matching 0 dBFS DSD, and as a result both sound roughly equally loud, but will either clip or put out extra voltage when the DSD source exceeds the 0 dBFS DSD level. In any case there's no point in driving DSD over the 0 dBFS level.

Well, generally if you put 0 dBFS PCM into an SDM you get 0 dBFS DSD out. So that alone is not a problem. But the answer is still yes…

Because the higher the oversampling ratio, the more inter-sample overs you get from the digital filter in case the data has been normalized to reach 0 dBFS at the source sampling rate, which is typical for modern PCM material. There are even more overs from the digital filters when the source data contains digital clipping, which is also very typical for modern PCM material.
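A classic way to see inter-sample overs is a full-scale fs/4 tone sampled at a 45 degree phase offset: every stored sample sits exactly at 0 dBFS, yet the waveform between samples peaks about 3 dB higher. A rough sketch, with `resample_poly` standing in for a DAC's oversampling filter:

```python
import numpy as np
from scipy.signal import resample_poly

fs = 44_100
n = np.arange(fs)
# fs/4 tone with 45 degree phase: every sample lands on +-sin(pi/4)
x = np.sin(np.pi * n / 2 + np.pi / 4)
x /= np.max(np.abs(x))            # normalize samples to exactly 0 dBFS

y = resample_poly(x, 8, 1)        # 8x oversampling digital filter
over_db = 20 * np.log10(np.max(np.abs(y)))
print("inter-sample peak: +%.1f dB over 0 dBFS" % over_db)  # about +3.0
```

This is the kind of peak a -3 dBFS volume setting (or built-in headroom like the DAC3's 3.5 dB) absorbs without limiting.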

For this reason I recommend using a -3 dBFS volume setting. Thanks to floating point, the DSP pipeline in HQPlayer has practically no maximum sample value, so internal clipping is not an issue because it cannot happen. But for the output it would need to trigger the built-in soft-knee safety limiter (indicated by the "Limited" counter in the main window), which is something you don't want to happen…

Some DACs handle this gracefully by doing the same thing internally. Wolfson chips have a configuration option for -2 dB input volume attenuation to avoid clipping with inter-sample overs. And for example the newest Benchmark DAC3 also touts the same (-3.5 dB attenuation for headroom):
“Internal digital processing and conversion is 32-bits, and this processing includes 3.5 dB of headroom above 0 dBFS. This headroom prevents the DSP overloads that commonly occur in other D/A converters.”
This is just not visible outside.

So overall, this applies to any oversampled DAC, PCM or SDM. Because the problem is actually in the PCM source material…

I came across cases where, when PCM goes through some DSP functions such as EQ, the PCM is attenuated by -3 dB or more before going into the DSP, so the end result output to the DAC never exceeds the PCM 0 dBFS.

@jussi_laako, I saw a couple of DAC chips, like the CS4398 and AK4490, where when a DSD signal is input there is a path called 'Direct DSD' that goes directly to the switched-capacitor DAC output. There's also another path that goes to the SDM and then to the switched-capacitor DAC output. Both carry the DSD signal. What is the difference?