Quick question… I’m sure most of us are aware that songs purchased/downloaded from the internet may report to be one quality (e.g. FLAC) but, when analysed in something like Spek or Fakin’ The Funk, turn out to be something less (e.g. 320 kbps MP3). When Roon illustrates the signal path, does it use what the song reports as its quality, or does Roon actually analyse the file like the two spectrum analysers mentioned above? I’ve done a few tests where the signal path shows a lossless FLAC but the spectrum analysis actually indicates a lossy MP3. Any help/advice/hints much appreciated.
Surely a lossy file cannot actually be transcoded (“upgraded”) to a lossless file but can only be (incorrectly) reported as such: once audio bits are lost, they’re lost. Does Roon actually “open” the audio container to inspect what’s inside, or does it only report the container type? In summary, is it fair to say the signal path cannot really be trusted for any downloaded files, but only for files ripped by the user (who obviously controls the entire process and output)?
I think this can be made clearer. File formats aside, the same applies regardless of which container is used: WV, WAV, AIFF, APE, FLAC, MQA, etc.
Inside the container is an audio stream that may or may not be lossless (except in the case of MQA, which is by definition lossy). An upsampled audio stream, whether from MP3 or lossless audio, will be represented as the stream it was upsampled to, whether or not there’s audio content present beyond a specific frequency. Roon’s job is not to ascertain whether the file has been upsampled; its job is to play back the audio stream at the optimal resolution your audio gear will allow. So when Roon reports the audio stream as lossless, it’s only saying that nothing has been discarded in the playback chain. Ascertaining whether something was upsampled from a lossy format or transcoded to a “higher-resolution” format is the job of spectrum analysers and an educated eye, as these transcodes and upsamples generally leave artefacts behind.
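For anyone curious, here’s a minimal sketch of the kind of check those spectrum analysers perform. A lossy encoder such as 320 kbps MP3 typically low-passes the signal around 20 kHz, so a “CD-quality lossless” file with essentially no energy above that frequency is suspect. This is illustrative only (the cutoff and threshold values are assumptions, not what Spek or Fakin’ The Funk actually use), and it assumes NumPy with decoded PCM samples in hand:

```python
import numpy as np

def high_band_ratio(samples, rate, cutoff_hz=20000):
    """Fraction of spectral energy above cutoff_hz.

    A genuine CD-quality source usually retains some energy up to
    the Nyquist frequency (rate / 2); a file transcoded from a
    lossy codec is typically brick-walled below ~20 kHz.
    """
    spectrum = np.abs(np.fft.rfft(samples.astype(np.float64)))
    freqs = np.fft.rfftfreq(len(samples), d=1.0 / rate)
    total = np.sum(spectrum ** 2)
    if total == 0:
        return 0.0
    high = np.sum(spectrum[freqs >= cutoff_hz] ** 2)
    return high / total

# Illustration: full-band white noise keeps energy above 20 kHz,
# while the same noise low-passed (crudely, via FFT zeroing, standing
# in for a lossy encoder's filter) does not.
rate = 44100
rng = np.random.default_rng(0)
noise = rng.standard_normal(rate)  # 1 second of full-band noise

spectrum = np.fft.rfft(noise)
freqs = np.fft.rfftfreq(len(noise), d=1.0 / rate)
spectrum[freqs >= 20000] = 0  # simulate the lossy low-pass
filtered = np.fft.irfft(spectrum, n=len(noise))

print(high_band_ratio(noise, rate) > 0.05)      # True: energy up to Nyquist
print(high_band_ratio(filtered, rate) < 0.001)  # True: brick-walled
```

Real upsampled transcodes are messier than this (the cutoff shelf leaves artefacts rather than a perfect wall), which is why the “educated eye” on a spectrogram still matters.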
Roon’s volume attenuation comes into play if you use its internal volume control or its internal volume normalisation. Both use the least destructive means of digital attenuation, albeit beyond a certain threshold it becomes lossy … you’d have to be dialling it back quite significantly, though. Unless you don’t have a preamp, or you’re mixing tracks from different albums, there’s no need to use either volume control or normalisation within Roon, or any software player for that matter.
Thanks again Evand… as far as I’m aware, I’m streaming direct from my PC to my Chromecast attached to my Bose ST20 with no volume normalisation/attenuation set anywhere in the Roon options, but I’ll have another look. Do you know if Chromecast Audio has attenuation built in?
@danny and @brian, as a related-ish question… just curious if something like “Discover great recordings” is doable. As an audiophile, I’ve always wanted to find well-mastered/well-recorded tracks in the genres I’m interested in. Relying on other people’s posts online doesn’t really scale (nor will those suggestions cover all genres or stay up to date).
As audiophiles, we can already call out a few useful features that denote a good recording:
A high enough dynamic range (DR). DR is trivial to calculate for local music and can be backfilled using existing databases. For streaming music, this can be done during playback.
A low noise floor. Really, this matters most for live music; electronic music is rarely going to have a high noise floor. Also, we know that tracks on the same album are typically mastered/recorded similarly (except for compilations), so the noise floor can be estimated from the quietest passage across all songs on an album.
Some stereo imaging. A feature for this could be the amplitude of the side channel from a mid-side analysis. Again, this could be done asynchronously in the Roon core or on your end.
The lack of obvious compression artifacts. Some, like pre-echo, should be fairly straightforward to detect. I can’t find a lot of literature on it, though.
CNN embeddings of the audio, plus an FFT of the audio. These can be costly to compute but can be computed “lazily” during playback. Typical embedding inference time is on the order of a few seconds per track on CPU; a GPU drops this far lower.
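Two of the features above are cheap enough to sketch directly. Here’s an illustrative take on the stereo-imaging feature (side-channel RMS from a mid-side decomposition) and a crude crest-factor stand-in for dynamic range; note the real TT DR meter uses a more involved windowed measurement, so treat these as feature sketches rather than the canonical algorithms:

```python
import numpy as np

def mid_side(left, right):
    """Mid-side decomposition: mid = (L + R) / 2, side = (L - R) / 2."""
    return (left + right) / 2.0, (left - right) / 2.0

def stereo_width(left, right):
    """RMS of the side channel relative to the mid channel.
    0.0 for dual-mono; grows with stereo separation."""
    mid, side = mid_side(left, right)
    mid_rms = np.sqrt(np.mean(mid ** 2))
    if mid_rms == 0:
        return 0.0
    return np.sqrt(np.mean(side ** 2)) / mid_rms

def crest_factor_db(samples):
    """Peak-to-RMS ratio in dB: a crude stand-in for dynamic range."""
    rms = np.sqrt(np.mean(samples ** 2))
    peak = np.max(np.abs(samples))
    return 20 * np.log10(peak / rms)

# Dual-mono: identical channels have zero stereo width.
t = np.linspace(0, 1, 44100, endpoint=False)
tone = np.sin(2 * np.pi * 440 * t)
print(stereo_width(tone, tone))  # 0.0

# Decorrelated channels show measurable width.
rng = np.random.default_rng(0)
print(stereo_width(tone, rng.standard_normal(t.size)) > 0.5)  # True

# A pure sine has a crest factor of ~3 dB; heavily limited masters
# of real music sit not far above that, while dynamic masters are much higher.
print(round(crest_factor_db(tone), 1))  # ~3.0
```

Both run in milliseconds per track, so they’re plausible candidates for the “compute lazily during playback” approach.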
Anyways, I’d have loved to take a stab at this, but I don’t have a huge dataset nor the ability to gather labels at any reasonable scale. You do, though. I reckon the main challenge is either generating the features at scale on your end ahead of time, or doing the computation client-side over the course of several months (though we have to be cognizant that we may be running on tiny ARM CPUs or embedded platforms).
Also happy to chat about solving some of the scalability issues of computing these features at large scale. I’m the original author of Airbnb’s (soon to be open-sourced) Bighead ML platform.