RAAT and clock ownership

From the RAAT vs AirPlay thread @Brian stated -

I’ve got a few questions. But they don’t relate to AirPlay specifically, hence the new thread.

Can I get confirmation that the advantages listed above happen when a Roon device is communicating directly (network connected?) with a receiving RoonReady DAC?

And if so, does this same advantage apply if there is a non-DAC RoonReady endpoint (ex. a Sonicore SonicOrbiter SE) between Roon and a DAC? In that case - if the endpoint is USB connected to the DAC - the DAC will own the clock for data transfer between the endpoint and the DAC. So what benefit does RAAT confer to communications with the non-DAC endpoint, if any?



Yes, the advantages happen in cases with an embedded DAC, a USB bridge, or an S/PDIF bridge.

Most discussions about audio clocking focus on clock quality, not clock coherence. This discussion is about the latter. The word “jitter” doesn’t belong anywhere near this discussion. RAAT has no impact in that domain. It moves buffers of audio asynchronously, just like USB. It is not involved in generating clock signals to drive DACs.

In the best possible system architecture, you’d have one clock: a high-quality clock near the DAC. This clock is responsible for both clocking out data accurately (low jitter, among other things), and for setting the pace that buffers of data flow through the system.

With AirPlay and systems like it, there are two clocks: one running on the computer, which decides how quickly to send buffers out over the network, and another running near the DAC, which actually drives the digital-to-analog conversion process.

So let's say the DAC's clock is running at a perfect 44100.000 Hz, while the computer's clock is at 44100.005 Hz. This seems like a small difference, but over even relatively short periods of time they will drift relative to each other. In this case, since the computer is faster, data will tend to "pile up" in the AirPlay device.

Obviously, AirPlay devices don’t have unlimited RAM to let the data pile up (nor do they have time-travel chips to address the case where the computer is sending buffers too slowly). So they have to somehow resolve this mismatch in the rate of data flowing in and out.

These are the typical solutions:

  • Measure the clock discrepancy and use that information to instruct the clock next to the DAC to speed up or slow down to match the rate that data is arriving–best approach, but expensive to implement. This degrades the signal because there is distortion when the clock speeds up/slows down, and because the clock is no longer running at exactly its intended rate.
  • “Stuff” or “Drop” samples from the audio stream in the digital domain. This can be done well or poorly, but clearly degrades the signal during corrections since sample data is being dropped or synthesized.
  • Perform an asynchronous sample rate conversion from, in our example, 44100.005 Hz to 44100 Hz. This degrades the signal the whole time that the conversion is running, to a degree that depends on the quality of the sample rate conversion algorithm.
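The pile-up itself is easy to quantify. A minimal Python sketch of the example above (the two rates are the ones from the post; the function name is mine, purely illustrative):

```python
# Toy model of two free-running clocks: the computer pushes at 44100.005 Hz
# while the DAC consumes at exactly 44100 Hz. Track how many extra samples
# accumulate in the receiving device's buffer over time.
SENDER_RATE = 44100.005    # samples/s produced by the computer's clock
RECEIVER_RATE = 44100.0    # samples/s consumed by the DAC's clock

def surplus_samples(seconds):
    """Extra samples piled up in the receiver's buffer after `seconds`."""
    return (SENDER_RATE - RECEIVER_RATE) * seconds

# A 5 mHz mismatch is tiny, but it never stops accumulating:
print(surplus_samples(200))     # ~1 whole extra sample after 200 s
print(surplus_samples(3600))    # ~18 extra samples after an hour
```

Since the surplus grows without bound, any fixed-size buffer eventually overflows, which is why one of the corrective measures above is unavoidable in a sender-clocked design.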

RAAT works differently from AirPlay: the clock near the DAC requests data from RAAT at the rate that it requires it. Whether this is happening over USB, or internally to a DAC with directly implemented RAAT support is irrelevant, since both mechanisms allow the clock near the DAC to control the flow of incoming data.

RAAT includes a mechanism that allows Roon to model the device’s clock. This is done by exchanging a few network packets every few seconds. Roon internally models the device’s clock based on synchronization data from these exchanges, the system clock on the Roon machine, and a model of the drift between them. This is just an estimation, but that’s OK, since RAAT has an internal buffer. The point is making sure that that buffer doesn’t under-run or over-run. As long as the computer’s concept of the clock that RAAT is dealing with is within ~1s or so, everything is OK (in reality it’s usually within low hundreds of microseconds because this clock synchronization mechanism is also used for zone synchronization).
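A minimal sketch of this kind of clock estimation (this is the classic NTP-style offset calculation from a request/response exchange, offered as an analogy; it is not RAAT's actual wire protocol):

```python
# One synchronization exchange yields four timestamps:
#   t0 = local send time, t1 = remote receive time,
#   t2 = remote reply time, t3 = local receive time.
def estimate_offset(t0, t1, t2, t3):
    """Remote-minus-local clock offset, assuming symmetric network delay."""
    return ((t1 - t0) + (t2 - t3)) / 2.0

def round_trip_delay(t0, t1, t2, t3):
    """Total network transit time, excluding the remote's processing time."""
    return (t3 - t0) - (t2 - t1)

# Example: local sends at 100.000; the remote clock runs 0.250 s ahead and
# stamps receive at 100.260 and reply at 100.261; local receives at 100.022.
t0, t1, t2, t3 = 100.000, 100.260, 100.261, 100.022
print(estimate_offset(t0, t1, t2, t3))    # ~0.2495 s (true offset: 0.250 s)
print(round_trip_delay(t0, t1, t2, t3))   # ~0.021 s
```

Repeating this every few seconds and fitting a line through the offsets gives both the current offset and the drift rate, which is all that's needed to keep the buffer estimate within the generous tolerance Brian describes.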

Because Roon is sending data at the right rate, none of those potentially degrading solutions are required when sending audio to a single device.

Astute readers will note that S/PDIF has the same issue as AirPlay, since it is clocked at the sender. This is why DACs often employ asynchronous sample rate converters or buffers + bendable clocks in their S/PDIF input stages. In particular, asynchronous sample rate converters in DACs can be accomplished with a lot less harm if they are done as part of an existing oversampling process.


The benefits are consistent regardless of whether the receiving device is a DAC or a USB bridge (network streamer to USB device).

I’m going to define STREAMER as “a device that receives audio over the network”.
I’m going to define PLAYER as “Roon, or any other transmitter of audio over the network”.


Without RAAT (sender-paced push):

[PLAYER] ---- [STREAMER with internal clocking crystal]
PLAYER pushes audio using its own clock, STREAMER pulls audio using its own clock. BAD: 2 CLOCKS FIGHTING!

[PLAYER] ---- [STREAMER connected to USB device that has its own clocking crystal]
PLAYER pushes audio using its own clock, USB device pulls from STREAMER using its own clock, STREAMER feeds its buffer of audio from the PLAYER to USB as the USB device requests it. BAD: 2 CLOCKS FIGHTING!

The problem here is that there is no "flow control" – I describe this more here: What's wrong with UPnP? (or read Brian's post above).


With RAAT (receiver-paced pull):

[PLAYER] ---- [STREAMER with internal clocking crystal]
STREAMER pulls audio using its own clock, PLAYER pushes it out on-demand. GOOD: 1 CLOCK IN CHARGE!

[PLAYER] ---- [STREAMER connected to USB device that has its own clocking crystal]
USB device pulls audio from STREAMER using its own clock, STREAMER requests it from PLAYER on-demand, PLAYER pushes it out on-demand. GOOD: 1 CLOCK IN CHARGE!
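The pull chain above can be sketched in a few lines. This is a toy model with hypothetical Player/Streamer classes (real buffering is far more involved); the point is that refills happen only when the downstream side asks, so only the final consumer's clock sets the pace:

```python
from collections import deque

class Player:                        # e.g. Roon: serves audio on demand only
    def __init__(self, total_samples):
        self.remaining = total_samples
    def request(self, n):
        n = min(n, self.remaining)
        self.remaining -= n
        return [0.0] * n             # silence stands in for real samples

class Streamer:                      # network endpoint with a small buffer
    def __init__(self, player, low_water=64, chunk=256):
        self.player, self.buf = player, deque()
        self.low_water, self.chunk = low_water, chunk
    def pull(self, n):               # called by the USB/DAC side at ITS rate
        if len(self.buf) < max(n, self.low_water):
            self.buf.extend(self.player.request(self.chunk))
        return [self.buf.popleft() for _ in range(min(n, len(self.buf)))]

# The DAC side pulls 48-sample bursts; everything upstream just keeps up.
streamer = Streamer(Player(total_samples=1000))
got = sum(len(streamer.pull(48)) for _ in range(25))
print(got)   # all 1000 samples delivered, paced entirely by the puller
```

Note that neither Player nor Streamer contains a timer of its own: remove the puller and nothing moves, which is exactly the "1 clock in charge" property.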


Signal chain display:

[PLAYER] ---- [STREAMER with internal clocking crystal]
Signal Chain shows everything going on inside STREAMER

[PLAYER] ---- [STREAMER connected to USB device that has its own clocking crystal]
Signal Chain shows everything going on inside STREAMER, AND exposes information about the USB endpoint if it is a Roon Certified USB Device.

Now, you may have noticed that in the above I never mentioned a DAC chip or S/PDIF – that's because both use a clocking crystal, and the problem here is the rate mismatch between the two clocks, not what is done with the data those clocks drive. This dual-clock issue is the core problem: two independent clocks will never align, whereas letting a single clock pace the entire system never requires hacks to work around the mismatch (see above link).


Wow, gents! Knocked out by the detail of the reply. Thank you. :smile:

Going to take me a bit to digest this though, and may have follow-up Q’s (or maybe other members?). More soon. Thanks!

OK, hopefully I’ve digested this. So I’ll reply within the context of the OP.

Yes, but that’s not the only way you can get the advantages…

That second case can work very well also - the demand pull from the DAC's clock is passed to the streamer, which in turn pulls from the player (Roon) at the rate requested by the DAC.

The only problem with this second scenario is that no two clocks can ever synchronize perfectly, much less three. So adding the streamer in the middle adds one more layer of possible timing discrepancy that would be avoided if the DAC's clock were communicating directly with the player (Roon) instead. But - as I understand it - this only applies to asynchronous USB connections between the streamer and the DAC. The use of S/PDIF from streamer to DAC takes the DAC's demand pull out of the scenario, and the driving clock is instead the streamer's. That is its own problem, but not unique to the OP RAAT question.

Please correct me if I’m wrong. And thanks again for the detailed reply.

A RAAT streamer never adds a source of clocking discrepancy.

Instead of thinking of the streamer as an extension of the device, think of RAAT as an extension of Roon that gives Roon a presence on the streamer.

The reason for this: Stacking multiple layers that pull data from each other does not create new clock sources so long as the data flow is controlled by the last layer in the stack.

RAAT does insert an extra layer (or 5, see below) of “pulling”, but there are many other layers of “pulling” that are also just left out of the discussion. That’s because adding layers of “pull” to a system is fundamentally benign. Extra clock sources arise when someone in the middle of that data flow decides to start pushing data at their own speed.

In RAAT’s architecture, there is no pushing, ever, going on in RAAT or Roon during single-zone playback.

(During multi-zone playback, one of the zones “pulls”, and Roon “pushes” to the remaining zones using the clock recovered from the first zone to control the rate. This is unavoidable, since there is no way to force multiple independent clock sources to agree, so in that case we need to use drift-compensation techniques like those I described above.)
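To get a feel for the correction rates involved in that pushed-zone case, here is a hypothetical back-of-envelope helper (not Roon code) that turns a clock mismatch into a stuff/drop cadence:

```python
# Given the rate the master clock pushes audio at, and a slave zone's own
# local rate, compute how often the slave must drop (or stuff) one sample
# to stay aligned with the master.
def correction_interval_s(pushed_rate_hz, local_rate_hz):
    """Seconds between single-sample corrections.

    Positive drift (pushed faster than local) means dropping samples;
    negative drift means stuffing them."""
    drift = pushed_rate_hz - local_rate_hz   # surplus samples per second
    return 1.0 / drift if drift else float("inf")

# A 5 mHz mismatch at 44.1 kHz means one single-sample correction
# roughly every 200 seconds:
print(correction_interval_s(44100.005, 44100.0))   # ~200 s
```

At these drift magnitudes the corrections are rare and tiny, which is why a well-implemented drift-compensation scheme in the slave zones is audibly benign.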

Just to give you an idea of just how many layers actually exist (100% chance this is incomplete and reductive):

  • DAC clock
  • USB interface on the device
  • USB driver subsystem on the streamer
  • Audio driver subsystem on the streamer
  • RAAT application on the streamer
  • Internal buffer within RAAT
  • Network subsystem within the streamer
  • Ethernet cable
  • Network subsystem within the player device
  • Roon application within the player device
  • Internal RAM-buffer within Roon application
  • Network subsystem on player device
  • Network router in your house
  • N more routers on the internet
  • TIDAL’s load balancer for distributing content
  • Network subsystem on TIDAL’s content server
  • Application on TIDAL’s content server
  • Disk subsystem on TIDAL’s content server
  • SATA interface hardware on TIDAL’s content server.
  • SATA interface on an individual hard drive
  • Cache/Buffering system within the hard drive
  • I/O scheduler within the hard drive
  • Magnetic read head within the hard drive

If we aren’t worried about the fact that data is being pulled through all of those layers, we shouldn’t be worried about the fact that it’s being pulled through RAAT, either.


Thank you for the correction Brian. So what does this imply for the quality of clocks (all of them) in a RAAT implementation, other than the quality of the DAC’s clock?

For it’s starting to sound like we don’t need to be overly concerned about the quality of any clock but the DAC’s, because RAAT is sorting that out for us. Is that a true statement? Would be awesome news, if so.


Yup, that is correct.

Awesome! That’s a MAJOR selling point for RAAT (and Roon) IMO. Thank you sir. :slight_smile:


I’m glad someone else had the questions I had and that they were professionally answered. Good job Team Roon!

I think that this “answers the mail” for the most tragically technical. Now, I am looking for exploitation. :smiley:

@brian and @danny One question that has always troubled me about this (since CD players arrived in the 80s): the original design, even part of the S/PDIF spec, is that the source controls the clock. I have always thought this was supremely stupid, as it led to those super-expensive CD players with heavy platter mechanisms. You want picosecond stability for a digital system, and you try to achieve that with a mechanical device? I always felt that this was influenced by vinyl turntables.

And Meridian indeed used a high-speed DVD mechanism and read it asynchronously.

Why didn’t they do that? Why was S/PDIF so backward?

Specifically, in your case, why not use a pull model, where the remote requests data as it needs it?

I realize this wouldn’t support multi-zone sync. Is that the only reason? Or are there other reasons a pull model wouldn’t work?

Agreed about S/PDIF. I have guesses as to why they “got it wrong”, but I don’t know for sure.

Roon/RAAT are based on a pull model. I’m not sure where my explanation went wrong to cause confusion.

(Meridian’s streaming protocol is also a pull model).

For multi-zone, we run in pull mode with the zone that has been elected as clock master and push to the other zones, which are forced to compensate for drift internally.

The actual implementation is slightly indirect. The core knows how quickly to send packets to an endpoint not because of an explicit request for data, but because the core knows what time it is at the high-resolution clock that’s driving the audio stream, and it understands the intended relationship between wall time and stream time. It behaves like a soft real time system based on those time relationships + periodic synchronization with the endpoint clock.

This turns out to be much more elegant than explicit pull requests, and lets single-zone and multi-zone cases share virtually the whole implementation. There is one extra API for slave zones, which basically tells them to go sync with a clock source of the server’s choice and adjust accordingly.
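A sketch of what that soft real-time scheduling might look like (hypothetical structure and names, not RAAT source): rather than waiting for pull requests, the sender computes the wall-clock time at which each packet is due from the stream position and the modeled endpoint clock.

```python
# The core knows the intended relationship between wall time and stream
# time, plus the endpoint clock's measured drift, so it can compute when
# every packet is due without any explicit request from the endpoint.
SAMPLE_RATE = 44100
PACKET_SAMPLES = 441          # 10 ms of audio per packet

def packet_due_time(packet_index, stream_start_wall_time, rate_ratio=1.0):
    """Wall-clock time at which packet `packet_index` should be sent.

    `rate_ratio` models the endpoint clock's rate relative to the local
    clock (1.0 = perfectly matched, >1.0 = endpoint runs fast)."""
    stream_time = packet_index * PACKET_SAMPLES / SAMPLE_RATE
    return stream_start_wall_time + stream_time / rate_ratio

# With matched clocks, packet 100 is due exactly 1 s after the start;
# if the endpoint runs 0.01% fast, it is due slightly earlier.
print(packet_due_time(100, 0.0))            # 1.0
print(packet_due_time(100, 0.0, 1.0001))    # ~0.9999
```

The send loop then just sleeps until each due time, and periodic clock synchronization keeps `rate_ratio` (an assumed parameter here) up to date. This is why the single-zone and multi-zone paths can share the machinery: slaves only need a different clock source to sync against.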

(The Meridian protocol actually uses explicit “pull” requests and has explicitly different flows for single and multi-zone cases. That works too, but that protocol has about 2.5x as much surface area, a lot more chatter, and the lesser-used multi-zone paths never got enough testing and were/are a constant source of trouble.)


That’s very elegant. Thanks.

(Your explanation was quite clear, I skimmed it and asked too quickly.)

Very interesting insights and discussion ! Thank you very much @Brian and @danny.

I was also interested in What’s Wrong With UPnP

It would be very useful to write a short synthesis paper with simple diagrams (for dummies like me :smiley: ) explaining the Roon / RAAT architecture and protocol, its differences from other solutions (mainly UPnP / AirPlay and perhaps LMS), and how these technical points lead to a better overall user experience.


@Brian - thanks for confirming that. But now I’ve had a chance to digest it a bit more, I wonder… might that be incorrect in any situation where you have a RoonReady endpoint (aka streamer) with a S/PDIF connection to the DAC?

I would assume that - in that case - all RAAT benefits would stop at the endpoint, and the quality of the S/PDIF communications with the DAC would be governed by the normal set of S/PDIF concerns. Is that true?

So, to make sure I’ve got you correctly. In this chain, Roon Computer > ethernet> Squeezebox > coax S/PDIF > DAC, the S/PDIF out from the Squeezebox is being clocked by the Squeezebox and then asynchronously sampled at the DAC?

Yes, the S/PDIF signal delivers samples at its own pace. Asynchronous resampling is one technique used by DACs to cope with that, but there are others, too.

It’s possible to simply ignore the problem. This is a consumer-grade solution, but you can totally just use the incoming S/PDIF signal to drive the whole process and ignore (or omit) the internal clock.

Some DACs slowly adjust their internal clock to adapt to the incoming rate, using a small buffer to prevent overruns/underruns, and then re-clock the data out. I know that Meridian products work this way but they are not the only ones. I took a guess that MSB used a similar approach (knowing that they make a ladder DAC) and their marketing materials suggest that I’m correct.

DACs that already have a big over-sampling stage built in can reconcile the clock discrepancy in their existing resampling process. The technical documentation for the ESS Sabre, a very common chip in USB DSD-capable DACs, discusses this in section III-B of this document. My understanding is that this is the most common approach for sigma-delta based DACs.

Why is this OK? Because this sort of conversion does not materially harm the signal quality, since the oversampling ratio is very high. The ESS Sabre is re-sampling your signal asynchronously to something like 40 MHz. That is much less significant to quality than going from 44100 to 44100.005.
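As a quick sanity check on those numbers (the 40 MHz figure is from the post; the arithmetic is just illustrative):

```python
# How big is a 44100 -> 44100.005 Hz mismatch in parts per million,
# and how large is the oversampling ratio of a ~40 MHz modulator
# relative to a 44.1 kHz input stream?
drift_ppm = (44100.005 - 44100.0) / 44100.0 * 1e6
oversampling_ratio = 40_000_000 / 44100

print(round(drift_ppm, 3))          # ~0.113 ppm
print(round(oversampling_ratio))    # ~907x
```

A sub-ppm rate correction absorbed inside a ~900x oversampling stage is a very different proposition from resampling a 1x-rate stream in isolation, which is the intuition behind Brian's point.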

I would assume that - in that case - all RAAT benefits would stop at the endpoint, and the quality of the S/PDIF communications with the DAC would be governed by the normal set of S/PDIF concerns. Is that true?

Of course. S/PDIF is still S/PDIF. There is no way to do anything about that from where we sit.

I keep ignoring this case because S/PDIF is legacy technology. There is no saving it from the limitations that were baked into its original design: it doesn’t support DSD without encapsulation, it has rate/bit-depth limitations that don’t admit all modern formats, and it’s source-clocked.


Thanks Brian. Agreed too. Was not remotely trying to imply that Roon should fix something out of its control. Just trying to make sure that my understanding was complete. Thanks.

improved my understanding too.
Thanks Brian/Danny and all.