Huge number of lost PCM samples in network endpoint output [ALSA software device]

Hi, I’m an owner of a Bryston BDP-2 and am evaluating Roon with the BDP-2 as a network endpoint. After hearing differences between standard MPD playback and Roon, I did some investigating. It’s easy to SSH into the BDP-2 and change the Roon endpoint config file to use the ALSA tee plugin to capture the output to a WAV file. After doing that I discovered that the Roon endpoint output is not bit-perfect. It loses PCM samples - about of second of 4 minute track is lost. The dropouts are very short and not audible as clicks, but the sound simply degrades a bit. Here is a picture of the captured waveform compared with the original track:


As I understand it, the root cause of this issue is the UDP protocol used by RAAT. UDP packets have no guaranteed delivery. And my network setup is pretty basic: an Apple Airport Extreme gigabit router, a Mac Mini server and CAT5 ethernet wiring. So my questions are the following:

  1. Can Roon confirm that there is no guaranteed delivery of all PCM frames to network endpoint?
  2. Can buying managed switch improve the situation? Will it give 100% delivery of UDP packets?
  3. Why Roon does not have any indicator of stream quality? Perhaps you may use TCP for passing checksums.

Great post!!!

Curious, if you are doing any upsampling of your music before sending to the Bryston? Also, are you able to test DSD as well?

No upsampling, just 44/16 for my test case. I have no DSD DAC. I don’t think it matters whether it is PCM or DSD; the question is about UDP packet delivery.
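For anyone wanting to put a number on a comparison like this, here is a minimal sketch that compares the frame counts of the original WAV and the capture and reports the difference in seconds. The function names are hypothetical, and it assumes both files are plain PCM WAV at the same sample rate. It only measures missing frames, it says nothing about where or why they went missing.

```python
import io
import wave


def frame_count(wav_file):
    """Return (frames, sample_rate) for a WAV path or file-like object."""
    with wave.open(wav_file, "rb") as w:
        return w.getnframes(), w.getframerate()


def missing_seconds(original, capture):
    """Seconds of audio present in `original` but absent from `capture`."""
    orig_frames, rate = frame_count(original)
    cap_frames, cap_rate = frame_count(capture)
    if rate != cap_rate:
        raise ValueError("sample rates differ; compare like with like")
    return (orig_frames - cap_frames) / rate
```

Usage would be something like `missing_seconds("original.wav", "capture.wav")`, where both file names are placeholders.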

I doubt that RAAT is losing samples, but let’s ask @brian if he can shed a little light on this…


It’s easy to SSH to BDP-2 and change Roon endpoint config file to use ALSA tee plugin for capturing output to wav file

This is your problem. RAAT does not play nicely with software-based ALSA “devices” that don’t exhibit clocking behavior sufficiently close to that of a real “sound card”. This relates to the technical details of how we recover the clock from the device and expose it back to Roon.

We do not generally certify Roon Ready devices that use ALSA devices other than hw:X,X because we are unable to warrant the accuracy of Signal Path without direct access to the hardware, and because of problems like the one you are running into. We also do not certify Roon Ready devices without confirming that they are capable of bit-perfect playback.

In order to run a meaningful test, you must collect the samples via the USB or S/PDIF/AES ports without tampering with the software.

Can Roon confirm that there is no guaranteed delivery of all PCM frames to network endpoint?

We would have to be complete and utter fools to design a streaming protocol for music playback that dropped audio each time a single UDP packet was lost.

RAAT has a reliability mechanism that detects dropped UDP packets and re-transmits them. There is a substantial buffer in the endpoint, which allows us to potentially retry many times before it becomes “too late”. RAAT is a reliable protocol, just like TCP.

Any networked audio playback system, regardless of whether it is based on TCP or UDP, has to make a decision about what to do if the data does not “get there” in time. In RAAT’s case, this means that an audio packet was dropped, and then 20 or more re-transmit attempts for that packet failed (in other words, the network is totally broken).

Were that to happen, RAAT would replace the missing audio with 0-samples (PCM) or 0x69 samples (DSD).
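To make the behaviour described above concrete, here is a rough sketch. It is not RAAT's actual code or wire format: the `fetch` callback, the sequence numbering, and the fixed retry count of 20 are all assumptions for illustration. It only shows the general idea of retrying a lost packet several times and substituting silence (0 for PCM, 0x69 for DSD) when every attempt fails.

```python
PCM_SILENCE = 0x00  # per the post: missing PCM is replaced with 0-samples
DSD_SILENCE = 0x69  # per the post: missing DSD is replaced with 0x69
MAX_RETRIES = 20    # "20 or more" in the post; the exact value is an assumption


def receive_stream(total_packets, packet_size, fetch, is_dsd=False):
    """Assemble a stream of numbered packets, retrying lost ones.

    `fetch(seq)` is a hypothetical callback that returns the payload
    (bytes) for packet `seq`, or None if that attempt was lost.
    Returns (audio_bytes, list_of_dropped_sequence_numbers).
    """
    fill = bytes([DSD_SILENCE if is_dsd else PCM_SILENCE]) * packet_size
    out = bytearray()
    dropouts = []
    for seq in range(total_packets):
        payload = None
        # One initial attempt plus up to MAX_RETRIES re-transmit requests.
        for _attempt in range(1 + MAX_RETRIES):
            payload = fetch(seq)
            if payload is not None:
                break
        if payload is None:
            # Every attempt failed: record the dropout and play silence.
            dropouts.append(seq)
            payload = fill
        out += payload
    return bytes(out), dropouts
```

In this sketch a packet that arrives on any retry is used unchanged, so occasional loss leaves the output identical to the source. Only a packet that fails every attempt shows up in the returned dropout list.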

A whole packet of missing audio would produce an audible click/pop that would be jarring even to a layman who’s not paying close attention, and it would show up as a string of hundreds of 0-samples on a capture like the one you posted.
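A capture can be checked for this signature with a short scan for long runs of zero samples. The sketch below assumes raw 16-bit PCM data (for example the frames read from a WAV file) and treats interleaved channels as one flat sequence of samples. The threshold of 100 samples is an arbitrary assumption, and genuinely silent passages in the music will also be reported.

```python
import array


def zero_runs(pcm_bytes, min_length=100):
    """Find runs of zero-valued 16-bit samples at least `min_length` long.

    Returns a list of (start_sample_index, run_length) tuples. Indices
    count interleaved samples, not stereo frames.
    """
    samples = array.array("h")  # signed 16-bit, native byte order
    samples.frombytes(pcm_bytes)
    runs = []
    start = None
    for i, s in enumerate(samples):
        if s == 0:
            if start is None:
                start = i  # a new run of zeros begins here
        elif start is not None:
            if i - start >= min_length:
                runs.append((start, i - start))
            start = None
    # Handle a run that extends to the end of the data.
    if start is not None and len(samples) - start >= min_length:
        runs.append((start, len(samples) - start))
    return runs
```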

Can buying managed switch improve the situation? Will it give 100% delivery of UDP packets?

Packet loss is very rare on wired home ethernet networks. Rare enough that in order to test RAAT’s robustness on poor networks, we must use simulation tools to test these situations.

Could a higher quality switch with larger internal buffers theoretically reduce packet loss in extreme situations? Sure, but that has nothing to do with whether it is managed or not. Since packet loss isn’t the problem here, I don’t think this is the road to go down.

Why Roon does not have any indicator of stream quality? Perhaps you may use TCP for passing checksums.

UDP has a checksum mechanism built in to prevent the delivery of corrupt packets (corrupt packets are dropped before they reach our code), so additional checksums on our end are not needed; detecting drops is sufficient.
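For reference, the UDP checksum is the standard 16-bit ones' complement "Internet checksum". The sketch below computes it over an arbitrary byte string. A real UDP stack computes it over a pseudo-header, the UDP header and the payload, and does so in the kernel or network hardware, so this is only an illustration of the arithmetic, not something an endpoint application would normally implement.

```python
def internet_checksum(data: bytes) -> int:
    """16-bit ones' complement checksum as described in RFC 1071."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]  # big-endian 16-bit words
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)  # fold carries back in
    return ~total & 0xFFFF
```

A receiver verifies a packet by running the same sum over the data with the transmitted checksum included. If the result is 0 the data is accepted, otherwise the packet is discarded, which the application then sees as an ordinary dropped packet.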

We do use TCP to report dropouts back to Roon when they occur. Dropouts are tracked and logged. If a concerning quantity is detected, Roon ends playback, displays a message to the user, and moves onto the next track.


@brian, thank you very much for the comprehensive answer! Could you also give us your opinion on why Roon sounds very different from MPD on the same device? Perhaps Roon clocking is the main cause of the different sound? I would characterize the Roon sound as more tight - perhaps a bit too tight for some poor recordings.

When we set up MPD and RAAT as closely as possible (i.e. all DSP disabled, no software volume controls, media in the same place, same DAC connected the same way, …), we don’t hear a difference.

Under those conditions, both are bit-perfect and the same clock is running the show, so any differences are likely to be induced via an indirect mechanism.

I would be interested to see some measurements (or a blind ABX trial with a reasonable population) that quantified the differences more concretely. Since the bits + the clock are the same, any work done to change the sound is likely going to be outside of Roon’s domain. Remember: our primary purpose is to deliver the audio data to the hardware, leaving the hardware manufacturer in control of how that data is turned into sound.

One notable difference: with Roon+RAAT, the media file is decoded in the Roon Core. With MPD, the decoding work takes place on the BDP. That could change the pattern of resource usage on the CPU inside of the BDP.


Thank you very much!

When you say the media file is being decoded, are you referring to a FLAC file or similar being converted back to a WAV format? Just wondering if there is a file format I could use that would eliminate this as a difference?

After decoding in Core, Roon sends off the audio as a regular (L)PCM stream, ready to be accepted by your DAC.

Since the Core is doing the heavy lifting (=decoding), the endpoint only has to receive the PCM stream and pass it on to the DAC for D/A conversion.

ANY format (apart from DSD) is decoded to PCM in the end. MPD does the decoding in the endpoint itself. That’s an architectural difference.


With WAV there is no decompression. That does not completely remove the differences, but it removes part of one difference.

What’s interesting about this, however, is that MPD should be using more CPU than RAAT, since it is not only locating / loading the audio file (e.g. from a USB drive or NAS share) but it’s also doing a FLAC to PCM conversion, for example, before submitting to ALSA for playback.

However, watching the CPU on a Bryston BDP-1 it appears, in general, that RAAT is using more CPU than MPD during playback. I’m speaking anecdotally, comparing FLAC or AIFF file playback. It’s not a lot more, but it’s more, unless you compare ALAC which is costly to transcode. Obviously, YMMV depending on the machine, network, audio container being used, etc.

Did you look at the BDP-1 in such a way that you could isolate RAAT as a process? Or is that a general CPU level for the BDP-1?

Yes. There is a specific process called raatapp that runs the Roon endpoint code.

RAAT and MPD are really very different systems. One is a media player. The other is audio distribution infrastructure designed to produce interoperability across dozens of manufacturers + DIYers with the absolute minimum of required firmware updates in the wild.

RAAT uses a dynamically defined network protocol. This means that we can fix bugs and make improvements to code running on RAAT devices without waiting for each company to release firmware updates, and without forcing users to go through manual update processes. Dynamically defined network protocols present a different workload than inflexible “baked-in” protocols, and also a different workload than decoding media (which often benefits greatly from CPU features like MMX/SSE).

Exactly how each kind of thing plays out on each device varies based on too many factors.

The BDP-1, in particular, is somewhat unique among Roon Ready devices: it uses a rather primitive embedded x86 CPU. It is cache poor, and fairly archaic in its approaches to branch prediction, out-of-order execution, etc. It does have MMX, though. This combination of tradeoffs would make a workload like RAAT (primarily protocol unpacking, interpreting dynamic network protocol, packet reassembly) more challenging for that CPU than a workload like FLAC->ALSA (MMX and memcpy).

Overall, I don’t put huge stock in this sort of comparison. Assuming Bryston has done their job well, and I believe that they have, modest differences in CPU usage should not be allowed to impact the sound quality. There is a lot of “it depends” in constructing a test and measuring this sort of thing. I brought up “differences” in CPU usage patterns to draw attention to the fact that RAAT and MPD take extremely different approaches to audio playback, not to suggest a mechanism of causation for a particular anecdotal report.

This is a good overview of the design goals of RAAT. It might be a little bit surprising how much non-audio-related stuff is required to make a protocol like this successful. Getting audio playback right is just table stakes.
