Roon ARC 32-bit Floating Point WAV Playback / Dithering

I agree. Down-conversion to 24 bits from 32 or 64 bits, whether the source is floating point or fixed point, doesn’t need dithering.

My understanding is that it is the float-to-fixed aspect which allows 32-bit float to be down-converted to 24-bit fixed without dithering, and that this is a special case. Otherwise, dithering should be used when converting downwards between any pair of fixed-point depths, e.g. 32-bit fixed down to 24-bit fixed.

The amount of quantization noise depends solely on the target bit depth. You can think of bit-depth conversion as a conversion to floating point first, followed by a conversion to integer.
Also, “floating-point” and “fixed-point” are two ways to represent real numbers, so I think we should use “integer” instead of “fixed” when referring to 16- and 24-bit samples.
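
A rough sketch of that two-step mental model, in Python (the helper names and the ±1.0 full-scale convention are my own assumptions, not any particular player’s implementation):

```python
import numpy as np

def int_to_float(samples: np.ndarray, bits: int) -> np.ndarray:
    """Scale signed integers at the given bit depth to floats in [-1.0, 1.0)."""
    return samples.astype(np.float64) / float(1 << (bits - 1))

def float_to_int(samples: np.ndarray, bits: int) -> np.ndarray:
    """Quantize floats to signed integers at the given bit depth (no dither)."""
    full_scale = 1 << (bits - 1)
    # Anything above 0 dBFS gets clipped; this is where float headroom goes.
    clipped = np.clip(samples, -1.0, (full_scale - 1) / full_scale)
    return np.round(clipped * full_scale).astype(np.int64)

# 32-bit integer -> 24-bit integer, expressed as int -> float -> int:
src = np.array([2**31 - 1, -(2**31), 12_345_678], dtype=np.int64)
print(float_to_int(int_to_float(src, 32), 24))  # -> [ 8388607 -8388608    48225]
```

Only the final float_to_int step quantizes, which is why the target bit depth is all that matters.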

Sure, I can switch my nomenclature for this discussion. This gets into how floating-point PCM frames store data vs. how integer PCM frames store data, and I’m not sure it’s appropriate to dive that far down. I suppose this is why this topic is classically challenging to discuss.

Although I haven’t implemented this myself to see all of the dirty details, my high-level understanding is that 32-bit floating-point frames can be converted to 24-bit integer frames effectively losslessly, and therefore dithering is not required.

Maybe it’s the same with 64-bit floating-point to 48-bit integer? I shouldn’t muddy the waters.

Again, as far as quantization noise is concerned, only the target bit depth matters.

I believe there is no quantization noise introduced when 32-bit float is converted losslessly to 24-bit integer. My understanding is that it’s a “same data stored differently” scenario, just stored less efficiently on disk in the floating-point format.
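
A quick way to convince yourself of the “same data stored differently” point (a sketch in Python; the ±1.0 scaling convention is my assumption): a 32-bit float has a 24-bit significand, so every 24-bit integer sample value is exactly representable, and the round trip is bit-exact.

```python
import numpy as np

# All 2**24 possible 24-bit sample values.
ints24 = np.arange(-2**23, 2**23, dtype=np.int32)

# Store them as 32-bit float, scaled so full scale sits at +/-1.0 ...
as_f32 = ints24.astype(np.float32) / np.float32(2**23)

# ... then convert back down to 24-bit integer.
back = np.round(as_f32 * np.float32(2**23)).astype(np.int32)

assert np.array_equal(ints24, back)  # exact round trip, zero error
```

This only holds when the floats began life as 24-bit integers; once any processing has happened, the floats carry more precision and the down-conversion does quantize.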

Whenever there is a reduction in bit depth, there is new quantization noise added, since some bits - and consequently some resolution - are going to get lost. In the absence of dither, that noise is at most -6N dBFS, where ‘N’ is the target number of bits. For 24 bits, it’s -144 dBFS, which is totally negligible and can thus be considered ‘lossless’ for all intents and purposes.
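
To put numbers on that rule of thumb (a back-of-the-envelope sketch assuming full scale at ±1.0 and rounding, so the worst-case error is half an LSB):

```python
import math

def worst_case_error_dbfs(bits: int) -> float:
    # The LSB with +/-1.0 full scale is 2**(1 - bits); rounding keeps the
    # error within half an LSB, i.e. 2**-bits of full scale.
    return 20 * math.log10(2 ** -bits)

print(worst_case_error_dbfs(16))  # ~ -96.3 dBFS
print(worst_case_error_dbfs(24))  # ~-144.5 dBFS
```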

And I should add: yes, the 32-bit floating-point format can store a higher dynamic range. But something about how 0 dB is aligned to the integer formats allows that extra range to just be cut away (effectively) losslessly, at least in terms of needing to add dither. We’re getting into terrain I don’t understand here, however. It’s developer-level stuff, and as I said, I haven’t implemented this personally.

I hear you on the added noise when reducing the amount of data, Marian, and understand that does occur.

I probably should have added the following to the sentence above: “…when using files generated by commonly used DAWs”. This can occur when an audio driver provides a 32-bit float stream from capture hardware that is only 24-bit-integer capable.

Sooo… after all of that hullaballoo, is it too muddy? We’re going to end up getting dithering, aren’t we? :wink:

But I would still really like to see that 32-bit stream go to 24-bit, however it’s done.

Considering that Audacity, a free DAW, supports 32-bit float internally, all DAWs should be able to. If you save the final tracks as 32-bit float WAVs, then you should have true 32-bit resolution.

That may be unnecessary, but hardly a bad thing.

Depends on your capture hardware there. The vast majority used today are actually 24-bit fixed at their cores; true 32-bit converters are oddballs. I edited above and you may have missed my last sentence, but audio drivers often provide streams in that 32-bit format even though only 24 bits’ worth of data is going through.

True, but capturing is just one aspect. Once you do any post-processing, you’re squarely in the floating point domain, and the original resolution matters less or not at all.

Yeah the integer vs floating point formats are fascinating. Human vs machine.

It’s actually all machine.

For me it’s mostly human. Analog stuff coming in.

Floating point is all digital, nothing to do with analog. And the analog chain is still made of machines, analog ones, right?

I was making a bad joke about our different approaches. The music I’m trying to play in Roon ARC (the music this topic is about) is mostly processed in the analog domain (human) and then just captured. Then I play it. I can see you are using machines with your own music more than I do :-).

It was admittedly a stupid joke but I’m trying to lighten the mood.

Now I’m curious about this. Can you describe what you’re doing? We can split into a different thread if necessary…

I record real analog sounds, and then I play them back. Ideally the only digital step is the capture. I like the 3D soundstage of fully analog sound, so I try to preserve it as closely as possible.