Yes, the advantages apply whether the device has an embedded DAC, a USB bridge, or an S/PDIF bridge.
Most discussions about audio clocking focus on clock quality, not clock coherence. This discussion is about the latter. The word “jitter” doesn’t belong anywhere near this discussion. RAAT has no impact in that domain. It moves buffers of audio asynchronously, just like USB. It is not involved in generating clock signals to drive DACs.
In the best possible system architecture, you’d have one clock: a high-quality clock near the DAC. That clock is responsible both for clocking out data accurately (low jitter, among other things) and for setting the pace at which buffers of data flow through the system.
With AirPlay and systems like it, there are two clocks: one running on the computer, which decides how quickly to send buffers out over the network, and another running near the DAC, which actually drives the digital-to-analog conversion process.
So let’s say the DAC’s clock is running at a perfect 44100.000 Hz. The computer’s clock might be at 44100.005 Hz. This seems like a small difference, but over even relatively short periods of time they will drift relative to each other. In this case, since the computer is faster, data will tend to “pile up” in the AirPlay device.
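To put numbers on that, here is a back-of-the-envelope calculation using the purely illustrative rates above:

```python
# Back-of-the-envelope drift calculation for the illustrative rates above.
dac_rate = 44100.000     # samples per second consumed by the DAC's clock
sender_rate = 44100.005  # samples per second produced by the computer's clock

surplus_per_second = sender_rate - dac_rate          # 0.005 extra samples/s
print(f"{surplus_per_second * 3600:.0f} extra samples pile up per hour")
print(f"one whole extra sample every {1.0 / surplus_per_second:.0f} seconds")
```

Real-world clock offsets are often much larger than this example (tens of parts per million is common for inexpensive oscillators), so a finite buffer fills or empties correspondingly faster.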
Obviously, AirPlay devices don’t have unlimited RAM to let the data pile up (nor do they have time-travel chips to address the case where the computer is sending buffers too slowly). So they have to somehow resolve this mismatch in the rate of data flowing in and out.
These are the typical solutions:
- Measure the clock discrepancy and use that information to instruct the clock next to the DAC to speed up or slow down to match the rate at which data is arriving. This is the best approach, but it is expensive to implement, and it still degrades the signal, because there is distortion when the clock speeds up or slows down, and because the clock is no longer running at exactly its intended rate.
- “Stuff” or “Drop” samples from the audio stream in the digital domain. This can be done well or poorly, but clearly degrades the signal during corrections since sample data is being dropped or synthesized.
- Perform an asynchronous sample rate conversion from, in our example, 44100.005 Hz to 44100 Hz (a minimal sketch of this appears just after this list). This degrades the signal the whole time the conversion is running, to a degree that depends on the quality of the sample rate conversion algorithm.
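To make that last option concrete, here is a minimal sketch of an asynchronous sample rate conversion using linear interpolation. Real ASRCs use far higher-quality filtering; this only shows the mechanics of reading input at one rate and writing output at another, and the function name and structure are illustrative rather than anyone’s actual implementation.

```python
# Minimal linear-interpolation resampler (toy ASRC). Real ASRCs use much
# better filters; this only demonstrates consuming input at one rate and
# producing output at another.

def asrc_linear(samples, in_rate, out_rate):
    """Resample a list of floats from in_rate to out_rate."""
    step = in_rate / out_rate   # input samples consumed per output sample
    out = []
    pos = 0.0                   # fractional read position in the input
    while pos < len(samples) - 1:
        i = int(pos)
        frac = pos - i
        # Every output sample is synthesized from its two input neighbours,
        # which is why the signal is altered the whole time this runs.
        out.append(samples[i] * (1.0 - frac) + samples[i + 1] * frac)
        pos += step
    return out

# e.g. correcting the mismatch from the example above:
# corrected = asrc_linear(incoming_block, 44100.005, 44100.0)
```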
RAAT works differently from AirPlay: the clock near the DAC requests data from RAAT at the rate it requires. Whether this happens over USB or internally in a DAC that implements RAAT support directly is irrelevant, since both mechanisms allow the clock near the DAC to control the flow of incoming data.
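As a rough sketch of that pull model (the names and structure here are hypothetical, not RAAT’s actual API): the endpoint’s audio callback, driven by the clock next to the DAC, pulls buffers, and the sender only produces data when asked.

```python
import queue

# Hypothetical pull-model sketch; none of these names come from RAAT.
buffer = queue.Queue(maxsize=16)   # e.g. a few hundred milliseconds of audio


def sender_loop(source):
    # Runs on the sender side. put() blocks whenever the endpoint's buffer
    # is full, so the pace of data flow is set by the device's clock, not
    # by a timer on the computer.
    for block in source:
        buffer.put(block)


def audio_callback():
    # Called by the audio hardware each time the DAC's clock needs more data.
    return buffer.get()
```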
RAAT includes a mechanism that allows Roon to model the device’s clock. This is done by exchanging a few network packets every few seconds. Roon internally models the device’s clock based on synchronization data from these exchanges, the system clock on the Roon machine, and a model of the drift between them. This is just an estimate, but that’s OK, since RAAT has an internal buffer; the point is to make sure that buffer never under-runs or over-runs. As long as the computer’s model of the device’s clock is accurate to within roughly a second, everything is OK (in reality it’s usually within the low hundreds of microseconds, because this clock synchronization mechanism is also used for zone synchronization).
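For illustration, that kind of exchange can be sketched with generic NTP-style math. This is not RAAT’s actual wire format or algorithm, just the general idea of estimating a remote clock’s offset and drift from a few timestamped packets.

```python
# Generic NTP-style estimation; not RAAT's actual protocol.
# One exchange yields four timestamps:
#   t_send       local clock when the request left the sender
#   t_device_rx  device clock when the request arrived
#   t_device_tx  device clock when the reply left
#   t_recv       local clock when the reply arrived

def estimate_offset(t_send, t_device_rx, t_device_tx, t_recv):
    """Estimate (device clock - local clock) and round-trip time."""
    rtt = (t_recv - t_send) - (t_device_tx - t_device_rx)
    offset = ((t_device_rx - t_send) + (t_device_tx - t_recv)) / 2.0
    return offset, rtt


def estimate_drift(history):
    """Least-squares slope through (local_time, offset) pairs.

    The slope is the drift: seconds of offset gained per second of local
    time, which lets the sender predict the device clock between exchanges.
    """
    n = len(history)
    mean_t = sum(t for t, _ in history) / n
    mean_o = sum(o for _, o in history) / n
    num = sum((t - mean_t) * (o - mean_o) for t, o in history)
    den = sum((t - mean_t) ** 2 for t, _ in history)
    return num / den
```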
Because Roon is sending data at the right rate, none of those potentially degrading solutions are required when sending audio to a single device.
Astute readers will note that S/PDIF has the same issue as AirPlay, since it is clocked at the sender. This is why DACs often employ asynchronous sample rate converters, or buffers plus bendable clocks, in their S/PDIF input stages. In particular, asynchronous sample rate conversion in a DAC can do a lot less harm when it is performed as part of an existing oversampling process.
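The “buffer + bendable clock” approach (which is also the first option in the list above) boils down to a feedback loop: watch the buffer’s fill level and nudge the DAC clock so it stays near the middle. A toy proportional controller with made-up constants; a real input stage uses a PLL/VCXO and a carefully tuned loop filter.

```python
# Toy control loop for a "buffer + bendable clock" input stage.
# Constants are invented for illustration only.

NOMINAL_RATE = 44100.0
TARGET_FILL = 0.5        # aim to keep the buffer half full
MAX_PULL_PPM = 100.0     # how far the clock is allowed to bend

def bend_clock(buffer_fill):
    """Return the DAC clock rate to use, given buffer fill in [0.0, 1.0]."""
    error = buffer_fill - TARGET_FILL            # positive: buffer filling up
    correction_ppm = MAX_PULL_PPM * error * 2.0  # proportional term only
    return NOMINAL_RATE * (1.0 + correction_ppm * 1e-6)
```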