Yggdrasil v DSD

And there’s the answer, up-sampling creating artefacts. :relaxed:

Up-sampling is not lossless; it’s not bit-for-bit in relation to NOS. :relaxed:

Ehh, like what? I would say it is by definition lossless if you can upsample and then downsample and get back the exact same values you started with. Can you explain where the loss is in this?

And yes, I have a number of upsampling algorithms that can produce bit-for-bit output for all the original samples; for example, when going 8x up, every 8th sample is bit-for-bit the same as the original, just with 7 new samples inserted in between that are carefully calculated to match the waveform being reproduced. You get back to the original data by dropping the 7 new samples out.
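
A quick numpy sketch of the principle (my own illustration with a generic windowed-sinc kernel, not the actual production filter): the kernel is forced to exactly 1 at its center and exactly 0 at every other multiple of 8, so the original samples pass through untouched.

```python
import numpy as np

L = 8                                   # upsampling factor
taps = 2 * 64 * L + 1                   # odd-length kernel, centered
n = np.arange(taps) - taps // 2
h = np.sinc(n / L) * np.blackman(taps)  # windowed sinc interpolator
h[n % L == 0] = 0.0                     # exact zeros at original sample points...
h[n == 0] = 1.0                         # ...and exact unity at the center

x = np.random.randn(1000)               # stand-in for 44.1k samples
up = np.zeros(len(x) * L)
up[::L] = x                             # insert 7 zeros between samples
y = np.convolve(up, h)[taps // 2 : taps // 2 + len(up)]

assert np.array_equal(y[::L], x)        # every 8th sample is bit-for-bit original
```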

But one aspect you are completely ignoring is that ringing primarily originates from the source material… Because all even remotely recent RedBook content is down-sampled from a higher rate, either due to an oversampled ADC (all converters in the past 20 years or so) or due to down-sampling during the mastering stage (mixed, for example, at 96/24). And to fix that I use apodizing upsampling filters that by definition are not bit-perfect, because if they were, they would reproduce the source signal’s ringing as-is.

1 Like

Ringing is a function of band-limiting, and band-limiting is a fundamental requirement of any discrete sampling system. However, if your input signal doesn’t end up hitting the band limit, you don’t get any ringing either. So ringing is primarily a problem of RedBook content.

With RedBook, if you make the time domain perfect, the frequency domain is 100% crap. If you make the frequency domain perfect, the time domain is 100% crap. This is because time and frequency are mathematically related by a 1/x relationship (where x can be either time or frequency). Pick your poison… I personally prefer to be somewhere close to 50/50, but provide means for either alternative for those who prefer either extreme.

If samples are added/subtracted and ‘carefully calculated to match the waveform’ of the original data, the process is no longer direct, meaning that, in comparison to NOS, it is no longer lossless; the digital data has been manipulated.

Added data is not the same as manipulated data. Say you have the numbers 1 and 2. You can create a new sequence of 1, 1.5 and 2 (arithmetic mean) or 1, 1.4142 and 2 (geometric mean) and you still have the same values 1 and 2, but you also have the middle value. You get back to the original by dropping that 1.5 out.
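
A two-line version in numpy, just to make the point concrete:

```python
import numpy as np

x = np.array([1.0, 2.0])
mid = np.array([x[0], (x[0] + x[1]) / 2, x[1]])  # insert the arithmetic mean
assert np.array_equal(mid[::2], x)               # drop it and the originals remain
```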

So from the above, you think the NOS representation of a 19 kHz sine wave is more correct than the upsampled one? Even though the upsampled one has several orders of magnitude lower distortion?

You are playing with words here. The fact is that a filter creates post- and/or pre-ringing and a phase shift, depending on the filter. It has been measured. That is why NOS exists: to address these specific problems.

And the lack of proper reconstruction filtering just completely violates the Nyquist-Shannon sampling theorem.

(see the paragraph on aliasing)

I understand what you are saying, but the fact is it’s no longer the direct 44.1 wav, it has been manipulated: extra samples added, filtering, etc. I see no point in going further in this conversation here. We’ve both expressed our views on this and are now just butting heads.

An analog waveform has an infinite number of “samples” between the ones you have in the file, and you need to have those one way or the other. The NOS way is to draw straight horizontal lines between the samples. Which is extremely crude. The real world is not built of Lego blocks. The real world is not Minecraft. NOS is trying to depict a Minecraft world.
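
A small numpy/scipy illustration of the difference (scipy’s generic resampler standing in for a proper interpolation filter):

```python
import numpy as np
from scipy import signal

fs, f, N = 44100, 19000, 4096
x = np.sin(2 * np.pi * f * np.arange(N) / fs)

zoh = np.repeat(x, 8)                    # NOS at 8x: hold each sample ("Lego blocks")
flt = signal.resample_poly(x, 8, 1)      # filtered 8x interpolation

ideal = np.sin(2 * np.pi * f * np.arange(8 * N) / (8 * fs))
mid = slice(1024, -1024)                 # skip filter edge transients
print("ZOH RMS error:     ", np.sqrt(np.mean((zoh[mid] - ideal[mid]) ** 2)))
print("filtered RMS error:", np.sqrt(np.mean((flt[mid] - ideal[mid]) ** 2)))
```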

The math says that, for correct reconstruction, 44.1k 16-bit PCM (defined to have 20 kHz usable bandwidth) needs to be filtered in such a way that at 20 kHz you have 0 dB attenuation and by 24.1 kHz the attenuation has reached 96 dB. This is such a steep filter that it can reasonably be realized only in the digital domain. The 1st-order filter used in the Musette is about as far from this as it can be. As a result, the 19 kHz sine waveform doesn’t resemble a 19 kHz waveform at all (see above). The extra problem that grows from this, visible in the DPO scope output, is that the stability of the NOS signal is especially compromised. In music you never have a transient that aligns precisely with a single sample period. At this point the whole NOS idea becomes moot.
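
For scale, here is a rough scipy sketch of that spec (a Kaiser-window estimate, not any particular product’s filter): running at an assumed 8x output rate, passing 20 kHz and being ~96 dB down by 24.1 kHz takes a FIR of roughly five hundred taps, which is trivial digitally and absurd as an analog circuit.

```python
import numpy as np
from scipy import signal

fs_out = 8 * 44100                       # filter runs at the 8x oversampled rate
numtaps, beta = signal.kaiserord(96, (24100 - 20000) / (fs_out / 2))
taps = signal.firwin(numtaps, 22050, window=("kaiser", beta), fs=fs_out)

w, h = signal.freqz(taps, worN=1 << 15, fs=fs_out)
stop = 20 * np.log10(np.abs(h[w >= 24100]).max())
print(numtaps, "taps;", round(-stop, 1), "dB at/above 24.1 kHz")
```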

Now, the above is still the easy part. Sure, you can make such a filter in the digital domain. The challenge comes when you also want to have no pre-ringing and no phase shift at 20 kHz simultaneously with all the above. Linear phase gives you no phase shift at all, but that means you have pre-ringing. This is where the real fun begins.

There is one easy solution that solves all the problems at once, for any recording microphone you can buy. That is called DSD256. If you do it from ADC to DAC, you get something truly pristine. Have you tried it?

Hi Jussi,

I don’t understand what you are saying; what is your definition of time/frequency errors (i.e. crap :wink: )?

What I mean is: yes, there is a 1/x relation between the system bandwidth and the time resolution. This comes from the transform between the frequency- and time-domain representations. But this does not mean that there is a fundamental “trade-off” between the time and frequency domains in the reconstruction. As we know, using a sinc function we can perfectly reconstruct the original signal if it was bandwidth-limited with respect to the sampling rate. Hence 0 errors! This does not change just because you transform from time to frequency (or back)?
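
To make this concrete, here is a small numpy sketch of Whittaker–Shannon reconstruction (a finite sum, so the error is only ~0 away from the window edges; the infinite sum is exact):

```python
import numpy as np

fs, f = 100.0, 17.3                      # tone well below Nyquist (50)
n = np.arange(200)
x = np.sin(2 * np.pi * f * n / fs)       # the samples

t = np.linspace(0.8, 1.2, 500)           # evaluate mid-window (seconds)
recon = np.array([np.sum(x * np.sinc(fs * ti - n)) for ti in t])
print(np.max(np.abs(recon - np.sin(2 * np.pi * f * t))))  # tiny; -> 0 as N grows
```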

Now there seems to be a common misconception that the “shape” of the impulse response gives some understanding of the “time errors”. For example, this is why some think NOS has small time-domain errors, as the “test impulse response” is aligned with the sample rate, as you point out. But the only bandwidth-limited signal that can give the output sequence …00000100000… is in fact the sinc function, hence if the output is not a sinc you have errors! So if there is no pre-ringing there is an “error” prior to the “1”.

Edit: Sorry Andybob for this “off-topic” discussion … hope you don’t mind :slight_smile:

2 Likes

No, that is precisely why there are errors. The error is the bandwidth limitation itself, and the error behavior depends on the properties of the pass-band, transition-band and stop-band. Since time and frequency are related by the 1/x rule, the bandwidth limitation produces a corresponding time-domain error. The only 0-error case is to retain the full bandwidth of the source signal. This error primarily comes from the decimation filters in the production chain. To put it another way, the transfer function of the analog source is modified.

So NOS and “bit-perfect filters” don’t help fix it, because the error is baked into the PCM data already when the music was produced. And NOS combined with the lack of a proper analog filter means that there’s no reconstruction at all…

Apodizing upsampling filters can be used to replace the impulse response of the source decimation filter with another one. This allows reducing the disturbing aspects of the source data.
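
A rough scipy illustration of the mechanism (generic textbook filters, not the production ones): a near-Nyquist brickwall stands in for the source decimator, and an apodizing filter whose stopband starts below 22.05 kHz takes out the spectral region where the source filter rings.

```python
import numpy as np
from scipy import signal

fs = 44100
source = signal.firwin(511, 0.999)       # stand-in for a ringing source decimator
apod = signal.firwin(255, 20750, window=("kaiser", 9.6), fs=fs)  # stops < 22.05 kHz

w, h_src = signal.freqz(source, worN=4096, fs=fs)
_, h_cas = signal.freqz(np.convolve(source, apod), worN=4096, fs=fs)
i = np.argmin(np.abs(w - 21800))         # inside the source filter's ringing region
print("at 21.8 kHz: source %.1f dB, apodized chain %.1f dB"
      % (20 * np.log10(abs(h_src[i])), 20 * np.log10(abs(h_cas[i]))))
```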

PCM at 192 kHz sampling rate already covers many cases without having to band-limit the source signal, and the 352.8/384k rates cover the bandwidth of even the best microphones without having to band-limit the source.

Since the Nyquist frequency of even DSD64 is 1.4 MHz and audio-band aliasing would begin at 2,800,350 Hz, there’s no need for steep filters to limit signal bandwidth. A 1st-order low-pass at the ADC input is enough, and keeps the time/frequency error at a minimum.
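
Back-of-the-envelope numbers for an assumed 1st-order pole (the 200 kHz corner is just an illustrative choice):

```python
import numpy as np

def att_db(f, fc=200e3):                 # 1st-order low-pass magnitude in dB
    return -10 * np.log10(1 + (f / fc) ** 2)

print(att_db(20e3))      # ~ -0.04 dB: audio band essentially untouched
print(att_db(2800350))   # ~ -23 dB where folding into the audio band starts
```

And there is essentially no microphone or analog-stage energy left near 2.8 MHz anyway.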

Be it PCM or DSD, a proper analog reconstruction filter is always needed to turn discrete samples into the continuous function that is the analog signal, regardless of the sampling technique used.

From the DAC perspective, the biggest difference between PCM and DSD has to do with the practical implementation challenges of the DAC electronics.

1 Like

There are two things here that we should not mix together. First we have a bandwidth limitation that removes high frequencies from the source, needed for example when going from 384 kHz to 48 kHz. This limits the content of the source file, and that information we cannot get back. But we can perfectly reconstruct the band-limited part encoded into the 48 kHz file, i.e. 0 errors with respect to what is encoded (but not the part that has been irreversibly removed with a “brick-wall filter”).

Well, that is exactly my point. All filters that replace the correct source impulse response (e.g. given by the sinc function in the case that the source file is perfect) introduce errors to make the sound more pleasing. One way of doing it (or not doing it, depending on whom you ask) is by using a NOS DAC, i.e. the simplest reconstruction filter possible. Now, NOS introduces large errors that from a technical point of view are very bad.

But on the other hand, a minimum-phase filter (i.e. another type of filter without pre-ringing) introduces large time errors or “time-smear” through frequency-dependent group delay. Hence an instrument with a large bandwidth is smeared in time, because different frequency components have different delays once the correct filter has been replaced by a minimum-phase one.

Wrong… Given an otherwise identical filter, both cause exactly the same amount of time smear; the only difference is whether it lands equally on both sides of the event, or only after it. Time smear is a function of the filter length (plus some other properties).

With a linear-phase filter you have a problem with transients, because as a result of band-limiting the step response begins to ring already before the event has taken place in time, which would be impossible without introducing a constant time delay. With RedBook this pre-event ringing typically begins about 500 µs before the event (and thus a 500 µs delay is involved). Such a thing generally doesn’t happen with analog electronics or direct acoustic sound. I have demonstrated this problem in practice by recording transient sounds with different types of decimation filters.
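
The difference is easy to see directly in scipy (a generic lowpass, purely illustrative):

```python
import numpy as np
from scipy import signal

# Equiripple linear-phase lowpass and a minimum-phase version with
# (approximately) the same magnitude response.
lin = signal.remez(101, [0, 0.2, 0.25, 0.5], [1, 0])
minph = signal.minimum_phase(lin, method="hilbert")

print("linear-phase peak at tap", np.argmax(np.abs(lin)), "of", len(lin),
      "-> rings both before and after")
print("minimum-phase peak at tap", np.argmax(np.abs(minph)), "of", len(minph),
      "-> all ringing after the event")
```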

Analog and acoustic roll-off are typically minimum-phase.

Musical tones have three states: attack, steady state and decay. Hearing separates these three things and each plays its own vital role. Time smear caused by pre-ringing is much more audible, because there is no forward-masking from the signal itself before the attack. Ringing happening after the attack is masked by the steady state and the natural decay of the transient. This is an especially big problem for digital crossover filters and digital room correction filters that have transitions in the middle of the audio band. Thus numerous algorithms have been developed to reduce pre-ringing especially in these use cases. Some applications always create minimum-phase correction filters (like RoomEq Wizard).

Due to the different properties of linear- and minimum-phase filters, I use (and recommend) linear-phase filters for recordings that don’t contain strong transients and have been recorded in natural acoustics, and minimum-phase filters for recordings that contain strong transients and have been multi-track recorded and mixed in a studio. So in short: linear-phase filters for classical and minimum-phase filters for pop/rock.

My apodizing filters are available in both linear- and minimum-phase flavors, plus also one asymmetric variant for PCM output only where there is about 25% pre-ringing and 75% post-ringing.

If you want to compare filters clearly, and in a similar way to what I do as part of my development process, just listen to the pure filter impulse responses. First create ones for a 22.05 kHz sampling rate, thus transitioning at 11.025 kHz, and then another set for RedBook. The first set is more obvious, but the second set gives the real view. When you loop it, it is just a series of snaps. A linear-phase filter sounds like shooting with a blow gun. So it’s like “hhhkhhh” vs “khhhhhh”.
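
If someone wants to recreate that kind of test, a minimal sketch (a generic windowed-sinc as a stand-in for the real filters; the file name is just an example):

```python
import numpy as np
from scipy import signal
from scipy.io import wavfile

fs = 44100
h = signal.firwin(2047, 0.45, window=("kaiser", 9.6))   # stand-in RedBook-ish filter

snap = np.zeros(fs // 2)                  # half a second per snap
snap[len(snap) // 2] = 0.5                # impulse, scaled for headroom
snap = np.convolve(snap, h, mode="same")  # the "pure filter impulse response"
wavfile.write("filter_snap.wav", fs, np.tile(snap, 8).astype(np.float32))
```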

Analog reconstruction filters in DACs are usually minimum-phase. I did some comparisons of the transient optimized analog filter design on my DSC1 vs two other typical DACs:
http://www.computeraudiophile.com/blogs/miska/squarewaves-dacs-645/

3 Likes

No, the bandwidth limitation gives the same amount of “time-smear” that is in the source file. The filter length impacts the filter performance: short filter, bad performance; long filter, good performance; infinite filter, perfect performance (i.e. a sinc). What you can do is add/trade errors in the reconstruction filter that maybe can transform or move the audible impact of the bandwidth limitation. Your recommendation to use a different filter if the music is heavy on transients is probably good advice. But in many cases (both minimum-phase filters and NOS) this process introduces errors in the pass-band that are probably a bigger source of audible differences: in the case of minimum-phase, group-delay differences, and in the case of NOS, a lot of “crap”, as your picture clearly shows. :wink:

Sorry, I don’t follow what point you are trying to make here? I agree that this test shows that bandwidth-limiting a Dirac with infinite frequency content is audible, but when you filter away 100% of the original signal that is hardly surprising? I see no reason why a filter should sound like “k” (NOS) or “khhhhhh” (minimum-phase) instead of “hhhkhhh” (linear-phase), unless you listen to Diracs, in which case NOS would be the clear winner?

To me we could summarize the whole discussion as follows:
- Bandwidth limiting is bad, hence we should if possible use high sample rates, i.e. >120 kHz, 18 bits or more (or the DSD equivalent).
- There is only one “correct” reconstruction filter (which one depends on the ADC process; a sinc in the case of “perfect sampling”), but many that sound good to at least some people.
- Everyone should pick whatever filter they like; personally I am 100% linear phase, but I can also hear why people like minimum phase and NOS.

Except with apodizing filters (linear- or minimum-phase), which allow reducing this time-smear property. And in this case the choice between linear- and minimum-phase makes a difference. For the non-apodizing filter versions, I don’t offer a minimum-phase variant because it wouldn’t make sense.

So we have some possibilities to fix the brokenness in the source file.

If only it were so simple… :smiley:

You would probably like my closed-form filter.

That plot I posted looks exactly the same regardless of whether the filter is linear- or minimum-phase. In fact I don’t even remember whether I used a linear- or minimum-phase filter for upsampling that one.

It demonstrates the problem of brickwall band-limiting. When used on the decimation side, your perfect “NOS” would have full aliasing. The problem with NOS is that the spectrum is not limited to the Nyquist band, unlike in both the linear- and minimum-phase filter cases. So you need to perfect the time- and frequency-domain behavior simultaneously. That is the hard part. Making either the time or the frequency domain perfect alone is too easy.
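
The spectrum side of this is easy to verify numerically (a quick sketch; scipy’s generic resampler as the stand-in filter):

```python
import numpy as np
from scipy import signal

fs, f, N = 44100, 19000, 1 << 16
x = np.sin(2 * np.pi * f * np.arange(N) / fs) * np.hanning(N)

for name, y in [("NOS", np.repeat(x, 8)),                 # hold: images remain
                ("filtered", signal.resample_poly(x, 8, 1))]:
    spec = np.abs(np.fft.rfft(y))
    freqs = np.fft.rfftfreq(len(y), 1 / (8 * fs))
    image = spec[freqs > 22050].max() / spec.max()
    print(name, "strongest image above Nyquist: %.1f dB" % (20 * np.log10(image)))
```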

So if your original content is “hhhkhhh”, a NOS DAC is not going to make it “k”. But an apodizing linear-phase filter may make it “hhkhh”, or a minimum-phase one “khhhh”, instead.

This problem is the same as with the Fourier transform: the longer you make the transform, the more frequency resolution you have, but the less time resolution. And vice versa. But from an audibility point of view we need both to be perfect, simultaneously.
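
Same trade-off in a few lines of numpy: choose the DFT length and you have chosen your poison.

```python
import numpy as np

fs = 44100
for N in (256, 4096, 65536):             # analysis window length in samples
    print("%6d samples: %6.2f Hz per bin, %7.1f ms window"
          % (N, fs / N, 1000 * N / fs))
```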

For most new RedBook material, it is not about the ADC process alone, because new productions are commonly recorded and mixed at 96/24 or higher, and then converted to 44.1/16 for CD/MP3/FLAC (Tidal). In either case the problem is in the decimation filtering, regardless of whether it happens in the ADC (worse quality) or in the mastering stage (better quality). Luckily we can partially fix it later with apodizing upsampling filters. By the way, many recording ADCs have minimum-phase digital decimation filters.

Another problem is poor digital filtering inside DAC chips. It is not uncommon to find an 8x digital filter having only -52 dB stop-band attenuation or ±0.02 dB pass-band ripple, followed by S/H. (I rather use a 512x one with -192 dB stop-band attenuation and ±0.000000001 dB pass-band ripple.)
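
Both specs are easy to measure for any FIR; below a hypothetical example filter, not actual DAC-chip coefficients:

```python
import numpy as np
from scipy import signal

taps = signal.firwin(129, 0.45)          # hypothetical 8x-class filter
w, h = signal.freqz(taps, worN=1 << 16)
mag = 20 * np.log10(np.abs(h) + 1e-300)
pb = mag[w < 0.40 * np.pi]               # pass-band region
ripple = (pb.max() - pb.min()) / 2
atten = -mag[w > 0.50 * np.pi].max()     # worst stop-band point
print("pass-band ripple ±%.4f dB, stop-band attenuation %.1f dB" % (ripple, atten))
```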

When it comes to the analog reconstruction filter after the D/A process, you won’t find “sinc” filters there…

I agree, and that’s why I also offer stuff that I don’t like myself… People are sensitive to different aspects of the sound.

But NOS doesn’t fix the ringing in the source content at the DAC side, which apodizing filters can do.

1 Like

Actually the poly-sinc is my favorite. :relaxed:

I was referring to the NOS plot: clearly, linear vs minimum phase does not make any difference if you view a single sinusoid without any absolute phase reference.

I was assuming that the digital data was …000010000…; of course, if you do not align the input sinc with the sampling rate you are correct, but then your test would be strange, as you would not be listening to the “clean” filter anymore, unless the filter you are trying to listen to is an “aliasing-free filter” to begin with.

That plot is of course without upsampling, so naturally it doesn’t have either filter involved.

The one with upsampling is the same Metrum Musette DAC, but with upsampling. And as you say, of course the result wouldn’t change between minimum- and linear-phase filters.

No, I’m talking about something real that you record with a real-world ADC.

What are you talking about? Real-world signals are never aligned to any particular sample period, which is one thing many NOS DAC advocates tend to forget. Of course the filter is aliasing-free, with >192 dB stop-band attenuation (any aliasing is smaller than 32-bit PCM can represent).

For that alignment reason, I use a 7 kHz square wave in the test I linked to, because it is not in sync with any of the standard sampling rates. Same reason for the 19 kHz sine wave used in the plot above.
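
Generating such a test signal correctly is itself a small exercise: a band-limited 7 kHz square at 44.1k keeps only the odd harmonics below Nyquist (7 and 21 kHz), and since 44100/7000 = 6.3, the waveform never lines up with the sample grid. A sketch:

```python
import numpy as np

fs, f = 44100, 7000
t = np.arange(fs) / fs                        # one second
square = (4 / np.pi) * sum(np.sin(2 * np.pi * k * f * t) / k
                           for k in (1, 3))   # k = 5 would be 35 kHz: gone
```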

Sorry, then I misunderstood what you meant here; so what sounds have you recorded when you listen to the “pure filter impulse response”?

Yes, that is why they make good examples, especially for NOS, which clearly is not aliasing-free?

That was the part about learning how different filter properties sound. The next learning step is to move to recorded sounds and recognize the same properties there. My “transient toolkit” consists of, for example, sounds of wood and metal claves, wood block, soprano glockenspiel, maracas, castanets, etc. Originally recorded at 192/24, 96/24 and 44.1/24, and then also various conversions of the 192/24 to 44.1k. Recorded at the same microphone distance as my hand-to-ear distance, so I can easily compare how those sound recorded vs as direct acoustic sound, at matching levels (close to 100 dB peak max).

Then, as a separate thing: when you were talking about the time smear of the original data, I commented that it can be modified using apodizing filters. That is also easily verified using the above material.

Yes…