Huge number of lost PCM samples in network endpoint output [ALSA software device]

brian · August 8, 2016, 6:24pm

RAAT and MPD are really very different systems. One is a media player. The other is audio distribution infrastructure designed to produce interoperability across dozens of manufacturers + DIYers with the absolute minimum of required firmware updates in the wild.

RAAT uses a dynamically defined network protocol. This means that we can fix bugs and make improvements to code running on RAAT devices without waiting for each company to release firmware updates, and without forcing users to go through manual update processes. Dynamically defined network protocols present a different workload than inflexible “baked-in” protocols, and also a different workload than decoding media (which often benefits greatly from CPU features like MMX/SSE).

Exactly how each kind of workload plays out on a given device depends on too many factors to generalize.

The BDP-1, in particular, is unusual among Roon Ready devices: it uses a rather primitive embedded x86 CPU. It is cache poor, and fairly archaic in its approaches to branch prediction, out-of-order execution, etc. It does have MMX, though. This combination of tradeoffs would make a workload like RAAT (primarily protocol unpacking, interpreting a dynamic network protocol, packet reassembly) more challenging for that CPU than a workload like FLAC->ALSA (MMX and memcpy).
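
To make the workload distinction concrete, here is a rough Python sketch. It is not RAAT's actual wire format or code; the field names, the layout, and the schema representation are all made up for illustration. It contrasts decoding a message whose layout is fixed at build time with decoding the same bytes by walking a schema that only arrives at runtime. The second path does more per-field branching and lookups, which is the kind of work a cache-poor CPU with weak branch prediction tends to handle less gracefully.

```python
import struct

# Hypothetical "baked-in" layout: sequence number (u32), sample count (u16),
# flags (u8), little-endian with no padding. Not a real RAAT message.
FIXED_LAYOUT = struct.Struct("<IHB")


def decode_fixed(payload: bytes) -> dict:
    """Decode a message whose layout is known ahead of time: one unpack call."""
    seq, count, flags = FIXED_LAYOUT.unpack_from(payload)
    return {"seq": seq, "count": count, "flags": flags}


# Hypothetical schema that a dynamic protocol might deliver at runtime:
# a list of (field name, struct format code) pairs.
RUNTIME_SCHEMA = [("seq", "I"), ("count", "H"), ("flags", "B")]


def decode_dynamic(payload: bytes, schema) -> dict:
    """Decode a message by interpreting a schema field by field."""
    out = {}
    offset = 0
    for name, code in schema:
        fmt = "<" + code  # little-endian, no padding
        (out[name],) = struct.unpack_from(fmt, payload, offset)
        offset += struct.calcsize(fmt)
    return out


# Both paths recover the same values from the same bytes.
payload = FIXED_LAYOUT.pack(7, 441, 1)
fixed_result = decode_fixed(payload)
dynamic_result = decode_dynamic(payload, RUNTIME_SCHEMA)
```

Both functions produce the same dictionary; the difference is only in how much interpretation happens per message, and how that shakes out on real hardware still depends on the device.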

Overall, I don’t put huge stock in this sort of comparison. Assuming Bryston has done their job well, and I believe that they have, modest differences in CPU usage should not be allowed to impact the sound quality. There is a lot of “it depends” in constructing a test and measuring this sort of thing. I brought up “differences” in CPU usage patterns to draw attention to the fact that RAAT and MPD take extremely different approaches to audio playback, not to suggest a mechanism of causation for a particular anecdotal report.

This is a good overview of the design goals of RAAT. It might be a little bit surprising how much non-audio-related stuff is required to make a protocol like this successful. Getting audio playback right is just table stakes.