Hi @jussi_laako,
One of the things that intrigues me about our hobby is the unexpected sensitivity of human hearing. You have extensive experience in both audio signal processing and the measurement of audio equipment. Are you still surprised at the differences we can hear from barely measurable changes? Are there areas where you think human hearing is more sensitive than the test instruments?
The problem, as ever, is whether people are hearing differences, or are simply imagining differences based on expectation bias.
Even in blind listening, the human brain generally wants to believe that two things sound different, and so, to the listener, they will.
That's not to underestimate the effect this can have: the perceived sound differences can feel very real to the listener, but in reality, they're rarely there. Human hearing isn't that sensitive in the bigger scheme of things.
A great example is DAC filters. People go on and on about how one filter sounds better than another, when the reality is that most of them have no performance implications below 16 kHz at the very lowest. I'd be amazed if most people here had any functional hearing in that range, and if they claim to, I'd like to see some proof from a qualified audiologist.
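To make that concrete, here is a minimal sketch, assuming numpy/scipy, comparing two hypothetical reconstruction filters; the tap counts and the 20 kHz cutoff are illustrative, not taken from any real DAC. The responses only part company near the top of the band:

```python
# Sketch: where do two hypothetical DAC reconstruction filters actually
# differ? Assumes numpy/scipy; tap counts and cutoff are illustrative.
import numpy as np
from scipy.signal import firwin, freqz

fs = 44100
sharp = firwin(255, 20000, fs=fs)  # long filter: steep "sharp roll-off"
slow = firwin(31, 20000, fs=fs)    # short filter: gentle "slow roll-off"

for name, taps in (("sharp", sharp), ("slow", slow)):
    w, h = freqz(taps, worN=8192, fs=fs)
    db = 20 * np.log10(np.abs(h) + 1e-12)
    for f in (1000, 10000, 16000, 20000):
        i = int(np.argmin(np.abs(w - f)))
        print(f"{name:5s} @ {f:>5d} Hz: {db[i]:7.2f} dB")
# both filters sit near 0 dB through the midrange; they diverge only in
# the top octave, which is the point about audibility made above
```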
Differences between loudspeakers are broad and easily heard. But after that, within acceptable performance envelopes, it all gets a bit complicated.
Areas I am particularly interested in are spatial cues and transients. These are important to us as listeners, but I've never seen test equipment map them.
The stereo image arises from small timing differences between channels which are heavily processed by our brains. Evolution has honed such processing to be extremely sensitive. Test equipment does not have the benefit of such processing because its existence doesn't depend on generations of ancestors accurately determining where that particular rustle in the underbrush came from. We see figures showing separation in dB but no maps of the size or focus of a stereo image.
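For reference, the raw timing cue itself is easy to quantify, even if the perceptual "map" isn't; a minimal sketch, numpy only, with an invented 0.3 ms inter-channel offset recovered by cross-correlation:

```python
# Sketch: recover an inter-channel time difference by cross-correlation.
# numpy only; the 0.3 ms offset below is an invented example value.
import numpy as np

fs = 48000
rng = np.random.default_rng(0)
src = rng.standard_normal(4800)          # 100 ms noise burst as the "source"

delay = int(0.0003 * fs)                 # simulate ~0.3 ms between channels
left = src
right = np.roll(src, delay)

corr = np.correlate(left, right, mode="full")
lag = int(np.argmax(corr)) - (len(right) - 1)
print(f"estimated delay: {1000 * -lag / fs:.3f} ms")  # recovers ~0.292 ms
```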
Similarly with transients. Our perceptual processing heavily amplifies edge effects. We hear differences in attack that are hard to measure.
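One could at least put a crude number on "attack"; the sketch below, assuming numpy/scipy, measures a 10-90% envelope rise time, an ad hoc metric rather than any standard:

```python
# Sketch: an illustrative way to quantify "attack" -- the 10-90% rise
# time of a signal's amplitude envelope (via the Hilbert transform).
# Assumes numpy/scipy; the metric itself is ad hoc, not a standard.
import numpy as np
from scipy.signal import hilbert

def rise_time_ms(x, fs):
    env = np.abs(hilbert(x))            # amplitude envelope
    peak = env.max()
    t10 = np.argmax(env >= 0.1 * peak)  # first crossing of 10% of peak
    t90 = np.argmax(env >= 0.9 * peak)  # first crossing of 90% of peak
    return 1000 * (t90 - t10) / fs

fs = 48000
t = np.arange(0, 0.05, 1 / fs)
fast = np.sin(2 * np.pi * 1000 * t) * (1 - np.exp(-t / 0.0005))
slow = np.sin(2 * np.pi * 1000 * t) * (1 - np.exp(-t / 0.005))
print(rise_time_ms(fast, fs), rise_time_ms(slow, fs))  # fast << slow
```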
The graphs we see from test instruments measure the things that test instruments are good at measuring. That is unsurprising and human hearing is not particularly sensitive in those areas (although our logarithmic frequency response is difficult to replicate in a microphone).
I'm interested in the experience of engineers regarding areas where human hearing may outperform test instruments.
There aren't any, really. We can observe planets transiting distant stars and measure the noise of Brownian motion.
Do people always measure the right thing correctly? No. Can we measure anything sound-related to orders of magnitude higher precision than anything the human auditory apparatus can discern? Absolutely.
If you're talking about acoustics, yes, everything can be measured. But maybe this is more of a psychoacoustics matter, and in that domain, maybe we can't measure everything?
This paper sets out some of the findings about human hearing in the context of treating hearing loss. It is interesting to see how the studies produce counterexamples to various theories of perceptual mechanisms.
Edit: The study findings about improvements following training are consistent with Jussi's experience below and make it difficult to fit perceptual mechanisms into theory.
Evolution selected "whatever worked", which may have multiple contributing factors.
Yes, and what people can hear depends heavily on practice and learning. Once you know what to look for in the sound, it is much easier to spot. I used to train passive sonar operators, and it was always interesting to see how their skills developed over time.
Humans have very good skills in detecting patterns and disturbances out of a lot of background noise. Just try using Siri or similar in a very noisy environment, at a similar distance from you as your discussion counterpart. I'm pretty certain the human can understand your speech much better under such conditions.
This is particularly emphasized for very short-duration events (transients).
Yes, for example in detecting differences in very short events (transients), especially when such are buried in "noise" (mixed with a lot of other signals).
For example, a paper from 12 years back:
Over the past several decades I have given a lot of demonstrations of these.
You're talking about ITD (interaural time difference). There are other methods our hearing uses to localize sound, like IID (interaural intensity difference) and HRTF (head-related transfer function, caused by pinna filtering). Humans can also use small head movements to disambiguate certain source positions. Sound localization - Wikipedia
Which means it's measurable. But we can do more than measure. Using the combination of ITD, IID, HRTF and motion detection, we can simulate immersive sound pretty well.
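As a rough sketch of that idea, a mono source can be "placed" with just ITD and IID; the Woodworth spherical-head formula and the broadband level-difference law below are textbook toys, not measured HRTF data:

```python
# Sketch: crude binaural placement from ITD + IID alone (no real HRTF).
# Woodworth spherical-head ITD approximation plus a toy level law. numpy only.
import numpy as np

def place_source(mono, fs, azimuth_deg, head_radius=0.0875, c=343.0):
    az = np.radians(abs(azimuth_deg))
    itd = head_radius / c * (az + np.sin(az))   # Woodworth ITD (seconds)
    shift = int(round(itd * fs))
    gain = 10 ** (-6.0 * np.sin(az) / 20)       # toy far-ear attenuation
    near = mono
    far = gain * np.concatenate([np.zeros(shift), mono])[: mono.size]
    # positive azimuth = source to the right: right ear is the near ear
    return (far, near) if azimuth_deg >= 0 else (near, far)

fs = 48000
tone = np.sin(2 * np.pi * 500 * np.arange(0, 0.2, 1 / fs))
left, right = place_source(tone, fs, azimuth_deg=45)  # headphone demo
```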
I think distinguishing competing sources is the "cocktail party" problem referred to in the paper I linked above. I'm not sure of the methodology adopted, but it should be described in the literature cited.
This passage in particular seems relevant:
In addition to using differences in acoustic properties, the auditory system is able to make use of the regularities and repetitive natures of many natural sounds to help in the task of segregating competing sources. McDermott et al. (2011) found that listeners were able to segregate a repeating target sound from a background of varying sounds even when there were no acoustic cues with which to segregate the target sound. It seems that the repetitions themselves, against a varying background, allow the auditory system to extract the stable aspects of the sound. The authors proposed that this may be one way in which we are able to learn new sounds, even when they are never presented to us in complete isolation (McDermott et al. 2011).
Regarding the uncertainty principle, that applies to LTI (linear time-invariant) systems, since those can be analyzed using the Fourier transform. It's been demonstrated that human hearing is non-linear in certain situations. The question is whether these non-linearities, which can be detected in very special cases, play any role in music and speech reproduction. (I personally haven't seen anything that suggests they do.)
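The Fourier tradeoff in question is easy to demonstrate; a minimal sketch using scipy's STFT (the signal and window lengths are illustrative):

```python
# Sketch: the time-frequency uncertainty tradeoff. Two tones 100 Hz apart
# are resolved by a long STFT window but smeared together by a short one,
# while the short window gives the finer time grid. numpy/scipy assumed.
import numpy as np
from scipy.signal import stft

fs = 8000
t = np.arange(0, 1.0, 1 / fs)
x = np.sin(2 * np.pi * 1000 * t) + np.sin(2 * np.pi * 1100 * t)

for nperseg in (64, 1024):
    f, tt, Z = stft(x, fs=fs, nperseg=nperseg)
    df = f[1] - f[0]                    # frequency-bin spacing
    print(f"window {nperseg:4d}: bin width {df:6.1f} Hz, "
          f"hop {1000 * (tt[1] - tt[0]):5.1f} ms")
# 64-sample window: 125 Hz bins (cannot separate the tones), 4 ms hop
# 1024-sample window: 7.8 Hz bins (tones resolved), 64 ms hop
```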
Perceived differences are by definition a psychological phenomenon and can't be measured by lab equipment. They still can be (and have been, many times) quantified, for audio as well as other things. There have been many experiments where people, even trained sommeliers, are quite sure that wine poured for them from a fancy bottle they know is expensive tastes so much better than (the very same) wine poured from a "three buck chuck" bottle. Same with audio, where people swear by $10K/foot cables and whatnot.
The fact that whatever algorithm the brain uses to process auditory stimuli lets it extract some information that, e.g., Fourier analysis can't is interesting, but not very relevant to whether we can measure if two signals are different or not, or to whether we can determine if a certain difference is, in fact, audible.
Music is just full of such occasions where that case applies…
Yes we can: MOS and MUSHRA, for example, which have been used throughout the development of the lossy codecs that exist now. They were also the basis for the improvements made in AAC to overcome some of the earlier transient-reproduction problems of MP3.
P.S. You also need to account for the listener profile in the above cases, as it affects the results. Not everyone has the same preferences/sensitivities/focus areas, so one size doesn't fit all perfectly in the end.
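For the curious, a minimal sketch of how MOS-style ratings are typically aggregated; the ratings below are invented, and real tests follow ITU recommendations such as P.800 and BS.1534:

```python
# Sketch: MOS-style aggregation -- a mean rating plus a 95% confidence
# interval per condition. Data are invented for illustration. numpy only.
import numpy as np

# hypothetical 1-5 ratings from ten listeners for two codec conditions
ratings = {
    "codec_A": np.array([4, 5, 4, 4, 3, 5, 4, 4, 5, 4]),
    "codec_B": np.array([3, 3, 4, 2, 3, 3, 4, 3, 2, 3]),
}

for name, r in ratings.items():
    mos = r.mean()
    ci95 = 1.96 * r.std(ddof=1) / np.sqrt(r.size)  # normal approximation
    print(f"{name}: MOS = {mos:.2f} +/- {ci95:.2f}")
```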
An important point about all the senses we have is that every one of them can be fooled, and is fooled, each and every day. We usually just don't notice.
We hear things that are not real, and we don't hear things that are real. The same goes for the visual sense. That's the consequence of our brain's processing, which always involves filtering and creating artefacts. And you can't turn this off.
Some simply don't know about this, or try hard to ignore or deny it when it comes to audiophile sound, believing in particular that the hearing sense is a perfect measurement instrument. It certainly is not; no sense actually is.
And certain businesses in the audiophile field build upon this situation, knowing their products only feed the psychological aspect of listening/hearing without any technical foundation.
You can't just say people shouldn't hear something when there is nothing there. Hearing always includes the processing in the brain, which involves artefacts. So hearing something when there's nothing to hear is normal, and vice versa.
And the assumption that everything we hear/experience must have a source outside of the brain is simply false.
And that paper as well, since it puts specific constraints on the test signal and analysis parameters, instead of the free, random mixture of multiple intermixed and overlapping signals that we call music.
Yes, there are better analysis methods than Fourier:
But these are not used in your normal measurement gear, and they need quite a bit of care to interpret the results correctly.
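One example of such a method, as a hedged sketch, is a continuous wavelet transform via the PyWavelets package; the Morlet wavelet and scale range here are illustrative choices, not necessarily the methods linked above:

```python
# Sketch: a continuous wavelet transform as an alternative to fixed-window
# Fourier analysis -- fine time resolution at high frequency, fine frequency
# resolution at low frequency. Requires PyWavelets (pip install pywavelets);
# the Morlet wavelet and scale range are illustrative choices.
import numpy as np
import pywt

fs = 8000
t = np.arange(0, 0.5, 1 / fs)
x = np.sin(2 * np.pi * 440 * t)
x[2000] += 5.0                       # a single-sample click buried in the tone

scales = np.geomspace(2, 256, num=64)
coefs, freqs = pywt.cwt(x, scales, "morl", sampling_period=1 / fs)
# small scales (high frequencies) pinpoint the click in time; large scales
# (low frequencies) resolve the steady 440 Hz tone
i = int(np.argmax(np.abs(coefs[0])))  # strongest fine-scale response
print(f"click located near t = {t[i] * 1000:.1f} ms, "
      f"finest-scale centre freq = {freqs[0]:.0f} Hz")
```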
I think I encountered that once in NL optics, but it's been decades and my head still hurts a bit. Still, I think epistemic humility rules out a strong claim that audio system measurements, done carefully and using FFT methods that look at the key factors of distortion and noise (which are also clearly central to ideas like spatialization), are insufficient for effectively characterizing the performance of audio systems. To say otherwise, we need more results.
But, hey, I agree with everyone that preferences vary and transparency and Harman curves are not for all, just for those of us who want them!
A good starting point is that those distortion and noise measurements on lossy codecs don't tell much, if anything, about how humans perceive the sound quality of the codec. The only good way to test this is to perform listening tests. There are some models that try to simulate this, but those are nowhere near the fidelity needed for codecs used for music. For voice codecs there are certain standardized phrases that are used to test the intelligibility of speech subject to compression and artifacts. Meanwhile, audiophiles have their test tracks, many times carefully chosen, which is equally good.
You can also search the internet for blind listening-test comparisons of different AAC encoders, which show that people are able to tell the implementations apart. IIRC, the one from Nero has been rated among the best.
Distortion and noise tests work on constant tones. They don't tell you about transient performance.
Just go back in history and look at what happened in the mid-'70s. In those days people were claiming exactly the same thing: that because THD+N measurements were so great, the amplifiers must sound perfect. But they didn't. The reason was TIM (transient intermodulation distortion), for which Otala & co. then invented a measurement method, and they designed an amplifier that didn't suffer from it. Otala later went to work for Harman, for example, to design the Citation XX amplifier. Electrocompaniet also took note and started designing amplifiers based on Otala's concept; that's how that company got started…
Now the history is repeating itself regarding these measurements.
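For reference, the classic DIM/TIM stimulus is simple to generate; a simplified sketch (numpy/scipy assumed; the bandlimiting and analysis details are abbreviated relative to the real DIM-30 specification):

```python
# Sketch: generating the classic DIM/TIM test stimulus -- a 3.18 kHz square
# wave mixed 4:1 with a 15 kHz sine probe, per Otala's DIM method. The
# bandlimiting here is simplified; real DIM-30/DIM-100 specs fix the
# lowpass corner at 30 or 100 kHz. numpy/scipy assumed.
import numpy as np
from scipy.signal import square, butter, sosfilt

fs = 192000                                  # high rate to keep headroom
t = np.arange(0, 0.05, 1 / fs)
sq = square(2 * np.pi * 3180 * t)            # 3.18 kHz square wave
probe = np.sin(2 * np.pi * 15000 * t)        # 15 kHz sine probe
stimulus = sq + probe / 4                    # 4:1 peak ratio

# bandlimit roughly like DIM-30 (first-order lowpass at 30 kHz)
sos = butter(1, 30000, fs=fs, output="sos")
stimulus = sosfilt(sos, stimulus)
# feed `stimulus` to the amplifier under test; intermodulation products of
# the square-wave edges and the 15 kHz probe in the output spectrum reveal TIM
```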
Sure, but I don't think anyone here is much interested in the brilliant but ultimately surpassed work on lossy codecs. I have to admit to being irritated AF about streaming artifacts in video, however! More bandwidth will ultimately fix that.
But I am curious now about the TIM stuff, which I'm not familiar with. We certainly have plenty of IM measurements. Since audio is not my primary field despite my dusty MSEE in information theory, I hoover this stuff up but am still learning how to shelve it.
I ask you, though, since you state we are repeating history: why are Otala's insights not implemented in class-leading amps today? And if they were implemented, how might that shift the needle in terms of measurable/audible properties of amps that would be beneficial to the audio aficionado?