I think part of the issue here is that the typical explanation of digital signal processing is vastly over-simplified, and therefore completely misunderstood by a majority of the population. It’s easy to take the sample rate, translate that into some frequency, and then relate that back to human hearing, and while there’s some validity to that, it’s a minor part of the whole story.
In general, bit depth defines dynamic range and sampling rate defines frequency response. These are two fairly simple concepts to understand, but they only tell part of the story.
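To put rough numbers on that rule of thumb, here’s a tiny sketch using the standard textbook approximations (the 6.02 dB-per-bit figure is the ideal quantizer SNR, not what any particular converter actually achieves):

```python
# Rough numbers behind the rule of thumb: ideal dynamic range of an N-bit
# quantizer and the Nyquist limit of a given sample rate.
def dynamic_range_db(bits):
    return 6.02 * bits + 1.76        # ideal quantizer SNR, in dB

def nyquist_hz(sample_rate_hz):
    return sample_rate_hz / 2.0      # highest frequency the format can represent

print(dynamic_range_db(16), nyquist_hz(44_100))   # ~98 dB, 22,050 Hz
print(dynamic_range_db(24), nyquist_hz(96_000))   # ~146 dB, 48,000 Hz
```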
In reality, sample rate and dynamic range (bit depth) are interrelated. In fact, one commonly used method of increasing dynamic range is to increase the sampling frequency before feeding the bitstream to the DAC. In many cases this is combined with a reduction in bit depth to sidestep the difficulty of actually resolving 24 bits in the conversion stage.
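To make that concrete, here’s a minimal sketch of the idea: a first-order error-feedback noise shaper that requantizes an already-oversampled signal to fewer bits while carrying the quantization error forward, which is the basic trick delta-sigma style converters rely on. It’s illustrative only, not how any specific DAC implements it:

```python
import numpy as np

def noise_shape_requantize(x, bits_out):
    """Requantize x (floats in [-1, 1]) to bits_out bits using first-order
    error feedback. Run at a heavily oversampled rate, this pushes the
    quantization noise toward the top of the spectrum, away from the audio
    band, so fewer bits still yield high in-band dynamic range."""
    x = np.asarray(x, dtype=float)
    step = 2.0 ** (bits_out - 1)
    y = np.empty_like(x)
    err = 0.0
    for i, sample in enumerate(x):
        v = sample + err                  # add back the previous sample's error
        y[i] = np.round(v * step) / step  # coarse quantization
        err = v - y[i]                    # error carried into the next sample
    return y
```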
So extended sampling rates can help improve the overall dynamic range of the conversion, and that’s a good thing.
The larger issue, which is typically ignored, is what happens at the Nyquist frequency (22.05 kHz for Red Book CD). It’s simple to say that everything above that frequency is cut off by the filter and only the audible goodness remains. If only reality were that simple!
There are two ways to apply a filter, analog and digital, and given that we theoretically have only 2.05 kHz in which to operate, analog is pretty much out. The filter would have to be extremely steep to avoid dipping into the audible band, and that’s nearly impossible to build in the analog domain due to the difficulty of matching components and dealing with thermal drift.
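For a feel of just how steep “steep” is, here’s a quick SciPy calculation; the 0.5 dB ripple and 96 dB stopband figures are my own illustrative choices, 96 dB being roughly 16-bit dynamic range:

```python
import numpy as np
from scipy.signal import buttord

# How high an order would an analog Butterworth need to pass 20 kHz with
# less than 0.5 dB of ripple yet be 96 dB down by 22.05 kHz?
wp = 2 * np.pi * 20_000      # passband edge, rad/s
ws = 2 * np.pi * 22_050      # stopband edge (Nyquist), rad/s
order, _ = buttord(wp, ws, gpass=0.5, gstop=96, analog=True)
print(order)                 # an order over a hundred, far beyond anything buildable in analog
```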
This leaves filtering in the digital domain, and while it’s easier to accomplish a steep filter there and massage the hell out of the signal, you do so at the expense of phase anomalies and other nasties (like ringing). These errors (noise) are pushed well into the audible band by the filtering process and are one of the reasons early digital had a reputation for being harsh. Oversampling in the DAC addresses this by moving the ugly digital filter well outside the audio band, allowing for a very gentle analog reconstruction filter at the DAC’s output.
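As one illustration of the ringing side of this, a steep linear-phase FIR like the one below trades the phase problems for symmetric pre- and post-ringing; the tap count and cutoff are arbitrary choices of mine, not any particular DAC’s filter:

```python
import numpy as np
from scipy.signal import firwin

# A steep linear-phase FIR "brick wall" near 20 kHz for 44.1 kHz material.
# Its impulse response is a long, symmetric sinc-like burst: noticeable
# energy appears well before and after the centre tap, i.e. the filter
# rings ahead of and behind every transient it passes.
fs = 44_100
taps = firwin(1023, cutoff=20_000, fs=fs)          # 1023-tap low-pass
centre = int(np.argmax(np.abs(taps)))
significant = np.flatnonzero(np.abs(taps) > 1e-3)  # taps that aren't negligible
print(centre, significant[0], significant[-1])     # ringing spans many taps either side
```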
So this is all well and good, but what does it have to do with high-resolution audio? Studios (usually / hopefully) process at very high resolutions so as to have the headroom needed in the digital domain to mix and EQ. This allows some information to be destroyed without having an impact on the critical data needed for analog reproduction. When they downsample to 16/44.1 to cut the CD, one of the last stages is a digital filter to remove any information in the digital domain which can fold down into the audible range. This filter is subject to the same laws as any other and can have a definite impact on the phase coherency of the information in the audio band.
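Here’s a sketch of that last stage, using SciPy’s polyphase resampler purely as a stand-in for whatever a real mastering chain uses; its built-in low-pass plays the role of the anti-alias filter described above:

```python
import numpy as np
from scipy.signal import resample_poly

# One second of a hypothetical 176.4 kHz master being cut to Red Book 44.1 kHz.
# The resampler's internal low-pass must remove everything above 22.05 kHz
# before decimating by 4, otherwise that content folds back into the audio band.
fs_master = 176_400
t = np.arange(fs_master) / fs_master
master = np.sin(2 * np.pi * 1_000 * t)      # stand-in for the mixed master
cd = resample_poly(master, up=1, down=4)    # 176.4 kHz -> 44.1 kHz
print(len(master), len(cd))                 # 176400 -> 44100 samples
```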
If the studio instead downsamples to 88.2 kHz or 176.4 kHz, that filter is very far away from the audio band and has much less of an impact on the signal that we’re actually trying to hear.
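Putting numbers on “very far away,” assuming we want to leave everything up to 20 kHz untouched:

```python
# Transition band available to the anti-alias filter at each target rate,
# measured from a 20 kHz audio band up to that rate's Nyquist frequency.
for fs in (44_100, 88_200, 176_400):
    print(fs, fs // 2 - 20_000)   # 2,050 Hz vs 24,100 Hz vs 68,200 Hz
```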
So, whether or not signals over 20 kHz are interpreted by our brains is largely irrelevant. The bigger issue is that while digital is “just ones and zeros,” there’s some pretty involved mathematics at work behind the scenes. While it’s simple to say that the filter cuts off above X frequency, the harsh reality is that the filter is also (and often significantly) mucking with the signal at lower frequencies.
But then nobody likes math and it’s so much easier to speak in broad generalizations…