Linear Phase v Minimum Phase?

Seems like Charley Hansen thought linear phase filters were old technology.

Unfortunately I can’t find a link to the white paper mentioned in the article but this gives some idea of its contents.

Clearly linear phase filters are not the only option.

He also thought that digital filters were lower down the list of the most important SQ / performance features of a DAC… below the power supply section and the analogue section design & performance…

A post worth reading by all & bookmarking:

Sean, no, digital filters are not the most important aspect of a DAC. But, we are talking filters here so saying they aren’t as important as the DAC power supply is neither here nor there.

I agree. Not sure why I posted that now. That post by Charles Hansen is still a good read, at the very worst.

The discussion here has been fascinating, so carry on :grin:

Yet many software and hardware vendors take the trouble to provide a choice of linear phase and minimum phase filters. If the position were clear, I would expect fewer resources to be spent accommodating such a choice.

I find myself preferring the mp filter with HQ Player xtr-2s (upsampling to 512 DSD) as I perceive it to maintain attack, which I think is vulnerable to being eroded when upsampling to DSD. There is no science to this, just what I hear and I fully accept that it could be expectation bias.

I’m still trying to understand the frequency distribution of pre-ringing. This paper sets out an experiment with an exaggerated crossover filter and is concluded by the following note:

“Rough safety limits according to both test methods would be to keep the order of a linear phase FIR crossover filter under 600 at higher frequencies (1 and 3 kHz) to prevent from the ringing phenomenon producing audible errors. At low frequencies, such as 100 Hz and 300 Hz, the order may be up to thousands, and still no audible errors will occur…”

High tap FPGA filters (and the xtr-2s filter I am using) would seem to be well outside the 600 order limit suggested here. Why wouldn’t that have audible consequences ? Are DAC filters different from crossover filters in some relevant way ?

Ringing occurs at the frequency of the transition band of the filter. This is the steep slope part. It occurs if there is energy across the transition band. If there is no audio energy in the frequencies across the transition or close to it then no ringing occurs (it needs to be excited). If the transition band is entirely above audible frequencies then ringing may occur but it won’t be audible. If the transition band is very gentle (like a treble adjustment on your amplifier) then no discernible ringing occurs.

A simple way to think of it is to recognize that all audio can be described by a summation of sine waves (Fourier series). When you filter out certain frequencies with a low pass filter then ringing occurs (Gibbs phenomenon) at the transition band frequency because the higher frequency sine waves that made up the original Fourier series that described the waveform are gone.

Take a 1 KHz square wave recorded digitally for distribution on a CD (a very non musical horrible signal). This can be mathematically represented perfectly by a 1 KHz sinusoid combined with odd harmonics at 3, 5, 7, 9, 11, 13, 15, 17, 19, 21, 23, 25KHz etc. …and so on infinitely. The anti-alias filter prior to A to D will remove everything above 22KHz. So the CD contains all harmonics correctly up to the 21KHz sinusoid. Now the anti-alias filter implies ringing at 21KHz because everything above that, which is needed to make a perfect square wave, has been removed. If we apply a poorly designed reconstruction filter with too low a transition at 20 KHz then during the D to A the ringing will occur at 19KHz and the ringing at the 21KHz harmonic will be removed. In this example it is easy to see that the ringing of a transient is always close to the transition band.
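To see this numerically, here is a minimal sketch in plain Python (not taken from any post in this thread). It sums the odd harmonics of a 1 kHz square wave up to an assumed ideal cutoff of 22.05 kHz and measures the ripple next to the rising edge. The function name `band_limited_square` and the 0.1 microsecond time grid are illustrative choices only.

```python
import math

def band_limited_square(t, f0, f_max):
    """Sum the odd harmonics of an ideal square wave at f0 that lie at or below f_max."""
    total, k = 0.0, 1
    while k * f0 <= f_max:
        total += math.sin(2 * math.pi * k * f0 * t) / k
        k += 2
    return 4 / math.pi * total

f0, f_max = 1000.0, 22050.0   # 1 kHz square wave, assumed ideal cutoff at 22.05 kHz
dt = 1e-7                     # 0.1 microsecond grid, only for plotting-style resolution
ts = [i * dt for i in range(5001)]                    # first half period (0 to 0.5 ms)
ys = [band_limited_square(t, f0, f_max) for t in ts]

# Local maxima of the ripple that follows the rising edge at t = 0.
peaks = [i for i in range(1, len(ys) - 1) if ys[i - 1] < ys[i] >= ys[i + 1]]
overshoot = ys[peaks[0]]                              # an ideal flat top would be 1.0
ripple_hz = 1 / (ts[peaks[1]] - ts[peaks[0]])         # spacing of the first two ripples

print(f"peak value {overshoot:.3f}, ripple near {ripple_hz / 1000:.1f} kHz")
```

With harmonics kept through 21 kHz, the ripple spacing comes out close to 22 kHz, i.e. right at the cutoff rather than somewhere lower in the audible band.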

Now let’s examine a 4 KHz square wave (very non musical) to show that ringing doesn’t happen lower in the audible band. This is made up of 4, 12, 20, 28, 36KHz etc…infinitely. The anti-alias filter in the A to D will remove the 28 KHz sinusoid component and everything above it. At this point it doesn’t much look like a square wave! However the reconstruction filter in the D to A will PERFECTLY reconstruct what was left of this “square wave” after A to D anti-alias filtering, as the signal is made up of only three components (4, 12 and 20 KHz).
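A quick way to list which square-wave harmonics survive the anti-alias filter, as a sketch that assumes an ideal cutoff at 22.05 kHz (real filters have a finite transition band, and the helper name is hypothetical):

```python
def surviving_harmonics(fundamental_hz, cutoff_hz=22050):
    """Odd harmonics of an ideal square wave that fall below cutoff_hz."""
    harmonics = []
    k = 1
    while k * fundamental_hz < cutoff_hz:
        harmonics.append(k * fundamental_hz)
        k += 2
    return harmonics

print(surviving_harmonics(1000))  # 1 kHz square wave: 1000, 3000, ... up to 21000
print(surviving_harmonics(4000))  # → [4000, 12000, 20000]
```

The odd multiples of 4 kHz are 4, 12, 20, 28 kHz and so on, so three components sit below the cutoff and the first one removed is at 28 kHz.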

https://en.wikipedia.org/wiki/Square_wave

The Roon DSP is very powerful and allows for active filtering:
channel mapping and IIR filters (minimum phase) with the “Procedural EQ” module, channel gain and delay setup with the “Speaker Setup” module.

Alternatively speaker filters can also be designed with Rephase (Linear Phase or Minimum Phase) and implemented with the “Convolution” module. Example of .cfg setup for a 4-way active speaker I am building:

96000 2 8 0
0 0
0 0 0 0 0 0 0 0

filterBASS.wav
0
0.0
0.0

filterBASS.wav
0
1.0
1.0

filterMEDIUM.wav
0
0.0
2.0

filterTWEETER.wav
0
0.0
3.0

filterMEDIUM.wav
0
1.0
4.0

filterTWEETER.wav
0
1.0
5.0

filterSUB.wav
0
0.0
6.0

filterSUB.wav
0
1.0
7.0
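For anyone who would rather generate such a file than type it by hand, here is a small sketch. It assumes, from the layout above, a three-line header followed by four lines per filter (impulse file, channel within that file, input channel with weight, output channel with weight). That reading of the format and the helper name `build_cfg` are assumptions, not something confirmed here by Roon documentation.

```python
def build_cfg(rate, n_in, n_out, routes):
    """Build .cfg text from (wav_file, input_channel, output_channel) routes."""
    lines = [
        f"{rate} {n_in} {n_out} 0",   # assumed: rate, inputs, outputs, channel mask
        " ".join(["0"] * n_in),        # assumed: one delay entry per input
        " ".join(["0"] * n_out),       # assumed: one delay entry per output
    ]
    for wav_file, in_ch, out_ch in routes:
        # Blank separator, file name, channel in file, input.weight, output.weight
        lines += ["", wav_file, "0", f"{in_ch}.0", f"{out_ch}.0"]
    return "\n".join(lines) + "\n"

# Same routing as the 4-way example above: stereo in, 8 channels out.
routes = [
    ("filterBASS.wav", 0, 0), ("filterBASS.wav", 1, 1),
    ("filterMEDIUM.wav", 0, 2), ("filterTWEETER.wav", 0, 3),
    ("filterMEDIUM.wav", 1, 4), ("filterTWEETER.wav", 1, 5),
    ("filterSUB.wav", 0, 6), ("filterSUB.wav", 1, 7),
]
cfg_text = build_cfg(96000, 2, 8, routes)
print(cfg_text)
```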

The Roon endpoint needs to be USB or HDMI, as other protocols do not allow for multichannel.

In case anybody has not come across this before:

I’ve also conducted my own small-scale non-scientific blind test with Lumin WM8741 users. The preference between linear and minimum phase among users is roughly split in half.

While many experts believe linear phase is the correct filter for audio and believe ringing to be a non-issue, I’m interested in understanding why so many prefer the “incorrect” filter. Based on what I read, I think this is a subject that is less well explained than things like tube vs SS, LP vs digital, etc.

I think it’s also worthwhile to read what our resident filter expert suggested:

Dear Rhitmalyst,
At 44.1 kHz, the sampling interval is 22.68 microseconds, and half of that is 11.34 microseconds.
So with “ordinary CD” you have no difficulty defining an event that is 11 times smaller than half the distance between two samples, whatever the frequency?

1 microsecond, not sure, but the known interaural delay detection limit is about 2 microseconds. Still a factor 6 missing in the accuracy between two samples. The relative phase needs to be correct. This is not easy even with resampling and long-window interpolation functions… but it can be done cleverly, which is by the way one reason why Roon or Roon/HQplayer, with the power of modern DSPs and fast computer buses, extracts more information from the CD format than conventional drives.
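As a quick arithmetic check on the figures in this exchange (a sketch, with an illustrative function name): the spacing between samples is 1/fs, which at 44.1 kHz is about 22.68 microseconds, so 11.34 microseconds corresponds to half a sample period.

```python
def sample_period_us(rate_hz):
    """Time between consecutive samples, in microseconds."""
    return 1e6 / rate_hz

for rate in (44100, 96000, 192000, 384000):
    period = sample_period_us(rate)
    print(f"{rate} Hz: {period:.2f} us per sample, {period / 2:.2f} us per half sample")

# Rate at which the sample spacing itself would equal a 2 us threshold.
rate_for_2us_hz = 1e6 / 2.0
print(f"2 us sample spacing needs {rate_for_2us_hz:.0f} Hz")
```

Whether the sample spacing is the right yardstick for timing resolution at all is exactly what the following posts argue about.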

Yet with higher sampling rates in the recording, progress can be made on that, assuming jitter is kept vanishingly low and every other aspect of audio reproduction is good enough, down to amplification and transducers.

Now, and this is our daily bread, introduce a masking limitation in the chain from the digital recording and its mere format, down to the acoustic signal in our ears: and the difference dims, or vanishes. This can result in some inconclusive experiences and inaccurate claims.

As counter-intuitive as it may seem to you : yes, that statement is indeed correct. There is no difficulty. The math behind sampling is very old, and still very valid…

You can timeshift your waveforms with any time you desire. That includes 1 microsecond, or shorter if you wish so.
The timing of this waveform will still be reproduced with absolute accuracy, when converted back to analog.
As long as the frequency of said waveforms, is kept under half the sample rate. (And thát’s the important part).
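A minimal sketch of that claim, under stated assumptions: a 1 kHz sine is sampled at 44.1 kHz on two channels, with the right channel delayed by 1 microsecond (far less than one sample), and the delay is then recovered from the samples alone by comparing phases. The tone frequency, block length and function names are arbitrary illustrative choices, and a single steady tone is of course much simpler than music.

```python
import math

RATE_HZ = 44100.0
TONE_HZ = 1000.0
N = 441  # 10 ms of samples, exactly 10 periods of the 1 kHz tone

def sampled_tone(delay_s):
    """Samples of a 1 kHz sine delayed by delay_s seconds."""
    return [math.sin(2 * math.pi * TONE_HZ * (n / RATE_HZ - delay_s)) for n in range(N)]

def estimated_delay(samples):
    """Recover the delay from the tone's phase by correlating with sin and cos."""
    w = 2 * math.pi * TONE_HZ / RATE_HZ
    in_phase = sum(s * math.sin(w * n) for n, s in enumerate(samples))
    quadrature = sum(s * math.cos(w * n) for n, s in enumerate(samples))
    phase_lag = math.atan2(-quadrature, in_phase)   # radians
    return phase_lag / (2 * math.pi * TONE_HZ)

left = sampled_tone(0.0)
right = sampled_tone(1e-6)   # right channel 1 microsecond late
measured = estimated_delay(right) - estimated_delay(left)
print(f"measured inter-channel delay: {measured * 1e6:.4f} us")
```

The sub-sample delay survives sampling because it is encoded in the sample values, not in the sample positions.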

What you can NOT do, is sample a very sudden change in amplitude, in a very short time.
Such a change MUST imply the presence of very high frequency content. Which cannot be sampled, with a sampling rate that’s too low.

Following section is not entirely correct, but included to make it easier to understand. In your case:
A linear change between 0 microseconds, and 11.34microseconds can be sampled and reproduced.
This change would equate to frequency content, of exactly 20.5kHz.
Do note : the actual start time of this waveform, relative to the timing of another wave, does not matter at all. It can be anything you wish.

Any change with a faster rise/fall time than this, MUST be a component with frequency content above this 20.5kHz.(I hope you can visualize, how a high frequency rises faster, than a low one). It is :

  1. Not possible to sample this, at all.
  2. Not related to the timing or phase of your waveform. But instead, related to the high frequency content of it.

Dear Marco,
It is almost funny what happens in these forums. No need to lecture me on Nyquist and other sampling issues! I happen to be a professional in non-audio digital signal processing on top of being an audio hobbyist.

So let me bring in some clarifications :

  • I was not describing that, but when you mention that "Such a (one-sample) change MUST imply the presence of very high frequency content." this is only part of the reality: a spike would contain not only very high but also very low frequencies. The FT of a (sampled) unit impulse or Dirac is a unit function across the bandwidth allowed by the sampling. But let's focus on this timing accuracy issue.
  • We are not talking about ABSOLUTE timing accuracy but INTERAURAL timing accuracy - differential timing. On stereo PCM signal typically. The key fact is that the ear is sensitive to delays of around 2 microseconds. If for instance cross-channel jitter is present it will affect imaging and musicality altogether.
  • Being able to timeshift a waveform way below the sampling interval - or to measure time shifts (not PHASE shifts) - in order to accurately correct it for instance, requires downsampling. Of course one must dealias the result implicitly or explicitly.
    In other domains of digital signal processing one finds that solving for a variable tiny timeshift, between two very similar but not identical long time series - a similar situation to stereo signals - can be done by computing the shift to zero of the peak of the cross-correlation of these signals. But not only this cross-correlation but also the signals themselves need to be finely resampled, in order to solve for a time shift that can be 1/1000 of the carrier period of the modulation.
    2 microseconds was by the way detected by humans in frequencies far below 22 kHz. Rather like 2000 Hz. So we do have a factor 1000 between the period of the carrier signal and the value of the timeshift. That tells how good our "internal signal processing" is.

But this accuracy is very difficult to reach with classical interpolation (especially real-time). At least with higher resampling (up to 250 kHz for 2 microseconds), you are safe - no problem to realign or no risk to lose alignment in the digital flow.

I use Roon/HQplayer on a setup that outputs on 30 cm USB to a 384 kHz - 32 bits float converter. As a result, I get truly accurate imaging (considering only excellent recordings) with 44.1 kHz PCM, but even more accurate with 96 or 192 kHz PCM, then another improvement from DSD recordings, that also have a finer residual “micrograin”.

There can be other factors, but it is the first time I can state that so clearly, and incidentally the first time I am using a sufficiently powerful software and hardware combination to do this heavy resampling in real time on the fly and without difficulty, interruptions or induced jitter.

Let's say that the idea of seriously oversampling the digital audio stream, way beyond our maximum perceived frequency, because the ear can detect 2 microseconds, is compatible with my recent Roon/HQ experience, especially when using the highest-rate digital audio format as input, combined with clean computations for resampling at 384 kHz, well over 250 kHz.

Hope that sums it up, already quite a digression from linear vs min phase. But timing is even more ear-sensitive than phase on wideband signals. On sine waves, they are proportional…

Agreed, way off topic. This’ll be my last post on this subject…and I’ll skip the irrelevant parts. Thanks for your extensive explanation.

I did not mean to ‘lecture’ you, sorry if it comes across as being a wise-ass… :).
I already understood that you were talking about relative/differential timing. That’s why I explicitly mentioned multiple (=simultaneous) waveforms, and the relative time between them…
I do agree : I should have talked about phase, instead of time or delay. Would have done that, if I had known your background. You can indeed not ‘delay’ for infinitely small times, and preserve that delay after sampling. (Whether that is relevant for audio reproduction, or not). You can phase shift as slightly as you want, which is what I wanted to make clear for a layman.

Anyhow, you are now making it even more specific : it is now even interAURAL time or phase issues which are bothersome.
That makes me wonder even more, how this can be relevant. A delay of 11uS, or equivalent phase shift, equals the situation where your ear is 4mm closer to the left speaker, than to the right speaker. I don’t know about you, but I myself cannot sit THAT still, for the duration of a track…Besides, the speakers themselves also travel over such distances.
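The delay-to-distance conversion is easy to check. A sketch, assuming a speed of sound of 343 m/s (dry air at roughly 20 °C), with an illustrative function name:

```python
SPEED_OF_SOUND_M_S = 343.0  # assumed: dry air at about 20 degrees C

def delay_to_path_mm(delay_s, speed_m_s=SPEED_OF_SOUND_M_S):
    """Extra path length (in mm) that sound covers in delay_s seconds."""
    return delay_s * speed_m_s * 1000.0

for delay_us in (2.0, 11.34, 22.68):
    print(f"{delay_us} us -> {delay_to_path_mm(delay_us * 1e-6):.2f} mm")
```

Under that assumption 11.34 microseconds is a little under 4 mm of path difference, and 2 microseconds is well under 1 mm.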

(Also equates to the left sound path being 0,5 Kelvin warmer, than the right sound path (cold breeze coming in from the right window?). Or <insert another absurd, but realistic situation>.)

I’m really not trying to make fun of you. Just trying to express my skepticism about your statement…
If you say that we are able to discern 2uS differences between our ears, in specific situations, then I’m willing to believe that.
I do not accept however, that this has any value in any practical audio application. Except, maybe, with headphones. Doubtful.

Most important : I am happy, that you are happy with your results. One cannot ask for more :).

This is also an interesting read.

https://www.stereophile.com/content/zen-art-ad-conversion

Especially relevant to this topic is this footnote.

Footnote3: Although Charley Hansen and Bob Stuart disagreed on almost everything audio-related, they shared an antipathy for linear-phase filters.

Good point. However 50/50 split is kind of a non result. It means the sound is slightly different and that is all - making it hard to make a choice unless you are desirous of highest possible fidelity and are armed with the knowledge that linear phase preserves the original analog waveform integrity that was recorded digitally whilst minimum phase alters it. Of course, some folks prefer to trust authority figures (those who are trying to sell a product and have an established reputation of making/selling audio products) who have blatantly and falsely claimed that minimum phase is more accurate to the original waveform; it isn’t. The deception is very clever. The proponents of minimum phase use truthful arguments about ringing and the problems it brings when pre-ringing is at audible frequencies. The proponents neglect to mention the phase distortion that is introduced by minimum phase and the fact that any pre-ringing on a well designed linear phase reconstruction D to A filter will be at inaudible frequencies.
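For readers unsure what "linear phase" implies in the time domain, a minimal sketch: a windowed-sinc low-pass filter (a textbook linear phase FIR design, used here only for illustration and not the filter of any product discussed) has taps that are symmetric about the centre, so whatever ringing follows the main peak is mirrored before it. The tap count, cutoff and Hann window are arbitrary assumptions.

```python
import math

def windowed_sinc_lowpass(num_taps, cutoff_hz, rate_hz):
    """Hann-windowed sinc low-pass FIR; symmetric taps give linear phase."""
    mid = (num_taps - 1) / 2
    fc = cutoff_hz / rate_hz  # cutoff in cycles per sample
    taps = []
    for n in range(num_taps):
        x = n - mid
        ideal = 2 * fc if x == 0 else math.sin(2 * math.pi * fc * x) / (math.pi * x)
        window = 0.5 - 0.5 * math.cos(2 * math.pi * n / (num_taps - 1))
        taps.append(ideal * window)
    return taps

taps = windowed_sinc_lowpass(101, 20000.0, 44100.0)
peak = max(range(len(taps)), key=lambda i: abs(taps[i]))   # index of the main peak
pre_energy = sum(t * t for t in taps[:peak])               # response before the peak
post_energy = sum(t * t for t in taps[peak + 1:])          # response after the peak

print(f"peak at tap {peak}, pre/post energy ratio {pre_energy / post_energy:.6f}")
```

A minimum phase design with the same magnitude response would move that energy to after the peak, which is where the phase change comes from. The ripple in these taps oscillates at roughly the cutoff frequency, so for a cutoff near 20 kHz it sits at the top of the band.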

Good point. However 50/50 split is kind of a non result. It means the sound is slightly different and that is all - making it hard to make a choice unless… [etc.]

Just to be on topic, for once :slight_smile: .
I think that is the best possible answer : there is no best solution.
It’s a trade-off situation. What property are you willing to deteriorate, in order to improve another property ?
Choosing the best-of-all-worlds, is not an option here…every improvement, comes with a deterioration elsewhere.

You cannot choose to have a coffee with maximum bitterness, and maximum sweetness. You have to choose, what balance suits you best. (At that moment…).
Yes, I do realize that this is an utterly bad analogy :). And yes, while I’m very critical with audio reproduction, I do not care to take a stance here.

There is a best solution for highest fidelity. Somehow the term “Hi-Fi” has morphed into meaning pretty much any audio equipment for listening to music. However, long ago the term had an actual meaning or connotation - it meant “High-Fidelity” or high faithfulness to the recording. Linear Phase reconstruction filters are most faithful to the recorded waveform in the time domain, so they are better than minimum phase if “high-fidelity” is the metric.

I fully agree that there is no single best solution for preference as to what sounds nicest and best suits every listener. Like fashion clothing - in this case there is no metric other than whims or tastes and there is no accounting for taste.

The fact that Hi-Fi has nothing to do with its original connotation suggests we have reached diminishing returns with DACs and manufacturers are resorting to marketing and selling fashionable flavoring rather than sticking to pure high-fidelity.

Really hate to be splitting hairs :(, but I don’t think that’s entirely correct…

In your case, you are balancing between presumably audible effects of ‘(pre-)ringing’, versus ‘phase distortion’.
Unless I misunderstood; please tell me if so.

You state that (pre-)ringing is inaudible in practice, and phase distortion might not be.
If I may take that freedom : both statements are not universally accepted as being ‘correct’. And neither are its opposites…

Of course I agree: there is a pretty huge gap, between ‘truthful reproduction’ and ‘individual preferences’. And yes, we are approaching perfection on the digital and amplification side. Good observation.
The major reason why I hate to talk about this subject, but love to know more about it :slight_smile: .

Pre-ringing is all at inaudible frequencies for a properly designed linear phase D to A reconstruction filter. Also audio is supposed to be anti-alias filtered prior to digitization. If properly anti-aliased then there won’t even be inaudible pre-ringing from reconstruction filters because the analog audio was filtered to remove transients above 20KHz before A to D. Pre-ringing is a giant misconception for converters. Pre-ringing is only a real issue for DSP or high Q EQ filtering within the audible band (20 Hz to 20KHz).

Hope the above is clear.

The “pre-ringing” concerns promoted by some folks in regard to D to A filtering are rather a bogus conflation of real effects of within audible band filtering (EQ) as well as the impulse response or square wave response of a DAC. The square wave or impulse response of a DAC is created by injecting an artificial digital test signal with frequency content at 22.05 KHz - this is NOT a frequency that should exist in proper anti-alias filtered CD audio. It is merely a test signal that exceeds what a DAC should ever be expected to decode when listening to music. It is interesting as it tells us about the filter in the DAC (Linear vs minimum phase) but it does not represent a properly produced CD or digital music file - the pre-ringing is only a manifestation of the outside of normal signal being fed to the DAC. The conflation is when manufacturers claim that it represents real studio produced music - it doesn’t - it is a test signal outside of what a DAC should expect to handle.

To be clear : I do not disagree. In fact, I agree with most of what you wrote here. As long as we keep observing what’s happening in the electronic domain.

But let’s make a step further, and move into the acoustical domain. The thingy that reaches your ears, and that should matter the most :).

Assuming that pre-ringing, at inaudible frequencies, occurs. What do you think will happen, when you feed that signal into your loudspeaker ?
(Knowing that this loudspeaker has an insane amount of nonlinearities, as even the highest-end speaker does). Do you think this hi-freq content will shift down into the audible spectrum, or not ?

This is what I meant. I already know the answer… But, this is what I meant by ‘there is no best answer’.

I honestly don’t expect any problems with a good Hi-Fi. Even if you want to feed your DAC a test signal like a square wave that would never be on any CD.

  1. The pre-ringing signal is very small compared to the rest of the audio (1/10 and lower)
  2. A good tweeter will be rated to 25KHz or higher.

https://www.madisoundspeakerstore.com/seas-soft-dome-tweeters/seas-excel-t25cf-002-e0011-millennium-tweeter/