Reported Song Quality

Hi Roonsters

Quick question…I’m sure most of us are aware that songs purchased/downloaded from the internet may report to be one quality (e.g. FLAC) but when analysed in something like Spek or Fakin’ The Funk turn out to be something less (e.g. 320kbps MP3). When Roon illustrates the signal path, does it use what the song reports as its quality, or does Roon actually analyse the file like the two spectrum analysers mentioned above? I’ve done a few tests where the signal path shows a lossless FLAC but the spectrum analysis actually indicates a lossy MP3. Any help/advice/hints much appreciated.

DDJ

Roon reports the container format of the file itself.

So if I understand correctly, Carl: Roon reports what the file reports itself as, which may be different from what it actually is?

DDJ

Well, no, not really. The file format is the file format… full stop… and that’s what Roon is reporting in the signal path; there is no ambiguity here.

Now, if a low bit rate lossy recording is transcoded to, say, a 192kHz / 24-bit FLAC file, Roon would correctly report the format as the latter.

This should not be confused with the provenance of the recording itself… Roon makes no statement on that at all.

8 posts were split to a new topic: Provenance and MQA

Hi Carl

Surely a lossy file cannot actually be transcoded (“upgraded”) to a lossless file, but can only be (incorrectly) reported as such - once audio bits are lost, they’re lost. So Roon doesn’t actually “open” the audio container to inspect what’s inside, but only reports the container type? In summary, is it fair to say the signal path cannot really be trusted for downloaded files, but only for files ripped by the user (who obviously controls the entire process and output)?

DDJ

As @Carl posted, the signal path shows exactly that, i.e. what Roon is doing with the audio file. It’s not an audio analysis tool; there are external programs for that task.

Thanks Mike…I realise (now) that the start of the signal path reported by Roon may not be accurate and is wholly dependent on the audio container reported by the file, as Carl says.

DDJ


It’s accurate in the sense that it describes the box correctly (sometimes Tidal and Qobuz get that wrong), but it says nothing about what is inside the box.
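To illustrate the “box” distinction: a player can identify the container cheaply from the file’s first few bytes, without ever decoding the audio stream inside. A minimal, hypothetical sketch (this is not Roon’s actual code; the function name is mine):

```python
def sniff_container(first_bytes: bytes) -> str:
    """Guess a file's *container* format from its leading magic bytes.

    This is all a cheap header check can tell you; it says nothing
    about whether the audio stream inside is genuinely lossless.
    """
    if first_bytes.startswith(b"fLaC"):
        return "FLAC"                      # FLAC stream marker
    if first_bytes.startswith(b"ID3") or first_bytes[:2] == b"\xff\xfb":
        return "MP3"                       # ID3v2 tag or an MPEG frame sync
    if first_bytes.startswith(b"RIFF"):
        return "WAV"                       # RIFF container (usually WAV)
    if first_bytes.startswith(b"FORM"):
        return "AIFF"                      # IFF container (usually AIFF)
    return "unknown"
```

An MP3 transcoded into a FLAC container would pass this check as FLAC, which is exactly the point being made above.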

I think this can be made clearer. File formats aside, it doesn’t matter what container is used - WV, WAV, AIFF, APE, FLAC, MQA, etc.

Inside the container is an audio stream that may or may not be lossless (except in the case of MQA, which is by definition lossy). An upsampled audio stream, whether from MP3 or lossless audio, will be reported as the stream it was upsampled to, whether or not there’s audio content present beyond a given frequency. Roon’s job is not to ascertain whether the file has been upsampled; its job is to play back the audio stream at the optimal resolution your audio gear will allow. So when Roon reports the audio stream as lossless, it’s only saying nothing has been discarded in the playback chain. Determining whether something has been upsampled from a lossy format, or to a “higher resolution” format, is the job of spectrum analysers and an educated eye, as these transcodes and upsamples generally leave artefacts behind.
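The kind of artefact a spectrum analyser looks for can be sketched in a few lines: lossy encoders discard content above a cutoff (roughly 16–20 kHz for MP3), so a “lossless” file whose spectrum stops well short of Nyquist is suspect. An illustrative sketch assuming NumPy; the threshold value is arbitrary and real tools inspect many windows, not one:

```python
import numpy as np

def estimate_cutoff_hz(samples: np.ndarray, sample_rate: int,
                       floor_db: float = -90.0) -> float:
    """Estimate the highest frequency with energy above `floor_db`
    (relative to the spectral peak) in one block of mono samples.

    A lossy-to-lossless transcode often shows a hard cutoff well
    below Nyquist (e.g. ~16-20 kHz for MP3-sourced audio).
    """
    windowed = samples * np.hanning(len(samples))     # reduce leakage
    spectrum = np.abs(np.fft.rfft(windowed))
    db = 20 * np.log10(spectrum / (spectrum.max() + 1e-12) + 1e-12)
    freqs = np.fft.rfftfreq(len(samples), d=1.0 / sample_rate)
    above = np.nonzero(db > floor_db)[0]
    return float(freqs[above[-1]]) if above.size else 0.0
```

A genuine 44.1 kHz recording typically has energy close to 22 kHz; an MP3 upsample would show a cutoff far lower regardless of what the container claims.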

Thanks Evand…now what about this “volume attenuation” malarkey in the playback chain?:grinning:

DDJ

Roon’s volume attenuation comes into play if you use its internal volume control or its internal volume normalisation. Both use the least destructive means of digital attenuation, albeit beyond a certain threshold it becomes lossy… you’d have to be dialling it back quite significantly, though. Unless you don’t have a preamp, or you’re mixing tracks from different albums, there’s no need to use either volume control or normalisation within Roon, or any software player for that matter.
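For the curious, digital attenuation is just a per-sample multiply; in fixed-point audio, the rounding after the multiply is where low-order bits are discarded, which is why heavy attenuation eventually becomes lossy. A toy sketch (function names are mine, not any player’s API):

```python
def db_to_gain(db: float) -> float:
    """Convert a dB attenuation (negative value) to a linear multiplier."""
    return 10 ** (db / 20.0)

def attenuate_16bit(sample: int, db: float) -> int:
    """Apply digital attenuation to one 16-bit PCM sample value.

    The rounding step is where information is discarded: small
    sample values are pushed below the quantisation floor.
    """
    return round(sample * db_to_gain(db))
```

For example, at -20 dB of attenuation a sample value of 1 rounds to 0 and is gone for good; in a 64-bit float pipeline the same operation would be effectively lossless until much deeper attenuation.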

Thanks again Evand…as far as I’m aware I’m streaming directly from my PC to my Chromecast attached to my Bose ST20, with no volume normalisation/attenuation set anywhere in the Roon options, but I’ll have another look. Do you know if Chromecast Audio has attenuation built in?

DDJ

It does, use the Google Home app to change the settings - see screenshot re full dynamic range.


Excellent…thanks for that Evand…I’ll have a tinker around when I get home from work…

DDJ

I’ve expanded upon @Carl’s comment above here:

@danny and @brian, as a related-ish question… just curious if, e.g., a “Discover great recordings” feature is doable. As an audiophile, I’ve always wanted to find well-mastered/well-recorded tracks in the genres I’m interested in. Relying on other people’s posts online doesn’t really scale (nor will those suggestions cover all genres or stay up to date).

For starters, the “state of the art” here (https://www.microsoft.com/en-us/research/uploads/prod/2018/10/Deep-Neural-Network-Models-for-Audio-Quality-Assessment-SLIDES.pdf) relies on a convolutional neural network fed derived metrics about the audio, including SNR estimates and a Constant-Q spectral transform, with mean-opinion scores (i.e. human-labelled audio) as the ground truth. But the target there is speech, not music.

As audiophiles, we can already call out a few additional useful features that denote a good recording:

  1. A high enough dynamic range (DR). DR is trivial to calculate for local music and can be backfilled using existing databases. For streaming music, this can be done during playback.
  2. A low noise floor. Really, this matters most for live music; electronic music is rarely going to have a high noise floor. Also, tracks on the same album are typically mastered/recorded similarly (except on compilations), so the noise floor can be estimated from the quietest passage across all songs in an album.
  3. Some stereo imaging. A feature for this can be the amplitude of the side channel from mid-side analysis. Again, this could be done asynchronously in the Roon core or on your end.
  4. The lack of obvious compression artefacts. Some, like pre-echo, should be fairly straightforward to detect, though I can’t find much literature on it.
  5. CNN embeddings of the audio and an FFT of the audio. These can be costly to compute but can be computed “lazily” during playback. Typical embedding inference time is on the order of a few seconds per track on CPU; a GPU drops this far lower.
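Features 1 and 3 above are cheap enough to sketch. Assuming NumPy, and with the caveat that the “official” DR meter uses a windowed percentile method rather than the simple crest factor shown here:

```python
import numpy as np

def crest_factor_db(samples: np.ndarray) -> float:
    """A simple dynamic-range proxy: peak-to-RMS ratio in dB.

    Heavily compressed ("loudness war") masters have a low crest
    factor; a pure sine wave measures ~3 dB.
    """
    peak = np.max(np.abs(samples))
    rms = np.sqrt(np.mean(samples ** 2))
    return 20 * np.log10(peak / (rms + 1e-12) + 1e-12)

def side_energy_ratio(left: np.ndarray, right: np.ndarray) -> float:
    """Mid-side decomposition: fraction of total energy in the side
    channel, a crude stereo-width feature (0.0 = pure mono).
    """
    mid = (left + right) / 2
    side = (left - right) / 2
    total = np.sum(mid ** 2) + np.sum(side ** 2)
    return float(np.sum(side ** 2) / (total + 1e-12))
```

Both run in linear time over the samples, so they could plausibly be computed during playback even on modest hardware, per the ARM/embedded concern below.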

Anyways, I’d have loved to take a stab at this but I don’t have a huge dataset nor the ability to gather labels at any reasonable scale. But you do :wink: I reckon the main challenges surround generating the features at scale on your end as much as possible ahead of time OR doing the computation client-side over the course of several months (though we have to be cognizant that we may be running on tiny ARM CPUs or embedded platforms).

Also happy to chat about solving some of the scalability issues of computing some of these features at a large scale. I’m the original author of Airbnb’s (soon to be open sourced) Bighead ML Platform.