RAAT and clock ownership

6 posts were split to a new topic: Grouped Playback exhibiting Clock Drift

This discussion raises some questions for me. I apologize if I have failed to understand the above-posted information.

I am currently using a Musical Fidelity V90 DAC being fed via USB by an i3 NUC running Roon Remote. I will be upgrading the DAC in the next year or two, and some of the ones I’m looking at have USB inputs and others don’t (SPDIF, AES/EBU). From the above discussion, it seems that converting the USB signal to SPDIF would add a clock to the chain, potentially robbing RAAT/Roon of full clock ownership. Should I stay away from non-USB DACs?

The conundrum for me is exacerbated by the fact that the Musical Fidelity DAC will only convert up to 96kHz streams when fed via USB, but will do up to 192 if fed SPDIF. I was considering adding a USB-to-SPDIF converter as an interim step to a DAC upgrade (and to see what the Musical Fidelity can do with that level of hi-rez - curiosity).

For example, if I was to eventually upgrade to a Berkeley Audio DAC, I would probably also use their Alpha USB to convert to SPDIF or AES/EBU, as their DACs don’t have USB inputs. Would I be compromising sound quality by converting the signal to SPDIF before it gets to the DAC that way?

Thanks for any guidance you can provide.

A lot depends on how the DAC handles clock. If the DAC has its own USB interface, it could be operating purely asynchronously and using a free-running clock for the DAC chip. Audio quality will not be dependent on clock jitter, because there will be little, if any.

If the DAC uses an S/PDIF input (AES/EBU, TOSlink, and coax S/PDIF are all the same – use whichever you like best – I prefer TOSlink for isolation) then the clock must be derived from the incoming stream. A good DAC, like the BADA Alpha DAC, will use a PLL to lock its own internal low-phase-noise clock to the incoming data stream. No loss of quality. And adding an external USB-to-S/PDIF converter won’t make it any better – or worse.

If the DAC extracts the clock from the data stream without regenerating it (i.e., without a PLL locking a local oscillator), then it is possible that the quality of the clock in the S/PDIF signal could have an effect on the sound. I just don’t know how to find out how a given DAC processes the clock. If the DAC has two separate clock oscillators, then it probably phase-locks a local clock to reduce jitter.

I am currently playing with the Breeze Audio DACs available on eBay for $60. They look like they have promise. At $60, what have you got to lose? They go up to 192/24.

At the moment it is somewhat arbitrary. There was a discussion that it should be user selectable, as only the user can say what the “best room” is for them.

I’ll see if I can find it.

Perhaps the simplest way to force the “master” would be to use the device that “owns” the zone as the master. The devices that are added to the zone to create a group are the slaves.

Hi Brian,

I found the discussion and have split it out into its own topic to improve focus and make it easier for others to find:
User Selection of the Master Clock in Grouped RAAT Zones.
In that topic Brian discusses “first zone as the master”. Have a read, and if you have any comments or questions, please post them there.


Brian, just so I understand better, each stream output device (Roon Bridge) is self clocked and just requests more data (Double buffered? Ring buffer?) when it needs it. The “master clock” concept is only for groups, right? And I presume that each group elects one device in the group to be the master so each group has its own master rather than the Roon server doing sample rate conversion for n-1 devices.

So, as for clock sync between stream output devices in a group, one device is elected as master and does not need to do sample rate conversion. Since all the stream output devices have their own free-running clocks (unsynchronized) the only way I can think of getting things matched up is to measure the buffer-fetch request interval, use that to determine clock error, and then do non-integer-ratio sample rate conversion. This implies interpolation. Just wondering what kind of interpolator you are using? That’ll have an effect on sound quality.

Just wondering and wanting to understand better.

Thanks for all the information.

In thinking about this today I thought of the multiroom capabilities provided by the Naim (UPnP) devices. Their implementation works extremely well in terms of syncing, but I’m wondering if they’re doing some heavy DSP to make that happen. There is a maximum sample rate that can be streamed in multi-room and I’ve often wondered if that’s a limitation in the network throughput of the devices or their DSP capabilities.

Anyone know specifically what they are doing?

There is actually no request for data–the server is modeling the master clock based on its own periodic synchronizations and using that to drive the outgoing data rate. This technique simplifies the protocol-level differences between master and slave zones–since they can all use the same primitives for managing streaming. There’s only one extra command (“synchronize against remote clock”) used for the slaves.

There’s a few seconds of buffer at each endpoint, and the buffer is kept around half full–so if data momentarily comes too fast or too slow, there is time to bring the clocks in line without overrunning or underrunning.

The slaves are synchronizing against (“recovering”) the master’s clock using the same mechanism that the server uses to guide the transmission rate. Clock error measurements go through a slow-to-respond low-pass filter since systems like this are prone to oscillation or over-correction when measurements are noisy. Each slave knows how “ahead” or “behind” it is, and it can adjust accordingly.
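To make the “slow-to-respond low-pass filter” idea concrete, here is a minimal sketch in Python. All names are illustrative assumptions, not Roon’s actual code: it models noisy per-measurement drift estimates being smoothed with a simple exponential moving average so the slave does not over-correct on any single noisy measurement.

```python
import random

class DriftEstimator:
    """Hypothetical sketch (not Roon's implementation) of the
    slow low-pass filter described above: noisy clock-drift
    measurements are smoothed with an exponential moving average,
    so a single noisy sample can't cause over-correction."""

    def __init__(self, alpha=0.05):
        self.alpha = alpha          # small alpha = slow-to-respond filter
        self.smoothed_ppm = 0.0     # current drift estimate, parts per million

    def update(self, measured_ppm):
        # Move a small fraction of the way toward each new measurement.
        self.smoothed_ppm += self.alpha * (measured_ppm - self.smoothed_ppm)
        return self.smoothed_ppm

random.seed(1)
est = DriftEstimator()
# True drift is +20 ppm, but each measurement is noisy by up to ±15 ppm.
for _ in range(500):
    est.update(20.0 + random.uniform(-15.0, 15.0))
# est.smoothed_ppm is now close to the true +20 ppm drift.
```

The trade-off is responsiveness: the smaller the alpha, the more noise is rejected, but the longer it takes to track a genuine change in drift.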

Async SRC is one technique that can be used, but not the only option. Our default implementation uses a technique called “stuffing” and “dropping” samples–basically, inserting or removing individual samples. Our implementation is somewhat improved compared to the typical one since it tries to locate positions in the stream where insertions/deletions will be less audible, and it uses an RNG to position the corrections, since periodic sounds are easier to pick out.

I prefer that approach, since corrections aren’t impacting the audio except when they’re happening (with async SRC, there is a constant effect, and doing async SRC at very high quality levels is very expensive). It’s also more practical to use this approach on low-powered endpoints that don’t have much CPU headroom.
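For illustration only, a toy version of the stuff/drop idea might look like the following Python sketch (function name and details are my own assumptions, not Roon’s code). The key point it demonstrates is that correction positions are randomized, so the corrections never form a periodic, easier-to-hear pattern.

```python
import random

def correct_drift(samples, sample_error):
    """Illustrative sketch (not Roon's implementation) of drift
    correction by stuffing or dropping individual samples.
    sample_error > 0: the endpoint is short on samples, so duplicate
    (stuff) a few; sample_error < 0: it has too many, so drop a few.
    Positions are chosen at random to avoid periodic artifacts."""
    out = list(samples)
    for _ in range(abs(sample_error)):
        pos = random.randrange(len(out))
        if sample_error > 0:
            out.insert(pos, out[pos])   # stuff: repeat one sample in place
        else:
            del out[pos]                # drop: remove one sample
    return out

random.seed(0)
buffer = list(range(1000))
assert len(correct_drift(buffer, 3)) == 1003   # three samples stuffed
assert len(correct_drift(buffer, -3)) == 997   # three samples dropped
```

A real implementation would additionally search for low-amplitude or low-slope regions so each correction is as inaudible as possible, as described above.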

We’ve been toying with the idea of moving the drift adjustments to the server, which could allow for more intensive/expensive techniques, since we have more CPU resources available. There are also some aspirations of maybe supporting grouped playback across different technologies, which would almost certainly lead us in this direction, since everyone’s system works a bit differently.


Brian, thank you for that information. I understand a lot better now. It is interesting to see how you think in your architecture design process.

I was going to bring up the problems with interpolation associated with sample-rate conversion (SRC). Stuffing/deleting samples is pretty simple and doing it at random times seems prudent. I haven’t heard any artifacts from the process, other than the gross sync failure I have been experiencing. (I do suspect a bug. I haven’t partitioned my network to test the simple case yet.)

I am interested in your actual synchronization protocol. I would normally assume that the receiver would signal it needs another buffer full of data and the server would send it (pull protocol). Since the server knows how often the master fetches data, it knows the data rate. If I now understand properly, the server “pushes” the data at a synchronized rate to the slaves. Ah! The slave then just needs to notice over time if the buffer is growing or shrinking, and randomly duplicate/discard samples to keep the buffer the same size over time. How elegantly simple. I’m impressed.

This is a hard problem. Thank you for sharing your thoughts. I am learning.

I once worked on the design for a last-mile point-to-multipoint wireless system that bounded latency variation so it would support telephony. I suspect the problems are similar.

FWIW, I was going to put together a team to build the music server I wanted. I had started on the design when Demian Martin suggested I look at Roon. You are the only company with a product that does what I was going to do. And $500 is a lot less than the $500k I expected I would have to spend to get the job done properly to get an initial release out. :slight_smile: It also means I will be doing something else and letting you do what you clearly can do better than I. (You have done so much with metadata! I am trying to figure out if you have cross-referenced all the artists/performers where you can.)

So if the way RAAT is designed means that the only ‘relevant’ clock is the one implemented in the DAC itself (assuming a Roon endpoint feeding a USB DAC), then all those esoteric USB re-clockers and high-end clocks on dedicated PCIe USB boards that one can find are, technically, performing no useful function.

Is that correct?

I have been wondering about that myself, particularly after hearing Hans Beekhuyzen gush (very unusual for him) over how good the new SOtM sMS-200 Ultra with the sCLK-EX2425 clock sounds while listening to Roon. As I understand it, the only real difference between the normal sMS-200 and the new Ultra version is the reclocking that it does. And while he liked the original, he was effusive about the reclocking Ultra version. So by all appearances, that reclocking made a significant audible difference, even while using it as a Roon-Ready network endpoint.

I understand a “better” clock is also a feature of the UltraRendu (cf. the original microRendu, which I have), and again there are plenty of people out there who feel there is an improvement.

Looking forward to the Roon Technical Team’s take on this!

RAAT is a network protocol. It delivers data from a server (the Roon Core) to a device–for example–in the purest case–a networked DAC. When we compare clock relevancy here, we are comparing apples to apples against other network protocols–many of which make the computer’s clock an inherent part of the audio chain in a way that RAAT avoids.

When you start to insert additional elements–USB, or an S/PDIF generator, or a “USB re-clocker” or whatever–it’s important to think critically, thoroughly, and specifically about how each of those elements works. These considerations are totally independent of RAAT, and are characteristics of those other systems. RAAT does not extend its fingers into your USB DAC and fundamentally change how USB works.

One of the most frustrating things about discussing clocking and related concepts in this setting is that it’s quite complicated, and there is a tendency to hand-wave, or to misuse or conflate terminology. A lot of people get their information from marketing sources that sometimes play fast and loose with the technical details.

For example, your question mixes up totally different kinds of clocks, which impact the system in totally different ways, and assumes that our discussion of one kind of clock applies to the others just as well. This confusion is partly on you, and partly on us, but it is a good representation of the general state of affairs.

Let’s remove RAAT from the equation for a second and talk about a typical USB Audio 2.0 playback case:

  • Computer connected to USB Audio 2.0 device
  • Asynchronous clock mode / Isochronous data transfer mode
  • USB interface inside of the device communicates to a DAC chip using I2S with an MCK wire

There are many clocks in this system:

  • A clock in the USB interface that pushes USB packets onto the wire one bit at a time.
  • The system clock in the computer, which governs the operating system scheduler, which decides whose code gets to run on the computer.
  • The CPU’s cycle clock, which determines when individual CPU instructions run.
  • A clock in the computer which determines when isochronous USB packets are generated (each contains 125µs of audio–so we are not talking about sample resolution here).
  • A clock in the DAC that drives the USB interface via the MCK line and helps form the actual edge transitions on the I2S wires that feed the DAC.
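As a side note on the 125µs figure: that is the USB high-speed microframe interval (1/8000 of a second), which makes for a quick sanity check on samples per packet. At 44.1kHz the count is non-integer, so packet sizes must vary from microframe to microframe to average out correctly:

```python
# Samples per 125 µs USB high-speed microframe (1/8000 s) at common
# sample rates. Non-integer counts (e.g. 44.1 kHz) mean packet sizes
# must vary over time to keep the long-run average correct.
for rate in (44_100, 48_000, 96_000, 192_000):
    print(f"{rate} Hz -> {rate / 8000} samples per microframe")
```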

When audiophiles talk about clocks, they are usually talking about the last one. The most common explanations for “why jitter matters” only apply to the last one. When we talk about RAAT driving audio playback with the appropriate clock, we’re talking about the last one too.

USB enhancements, on the other hand, have no bearing on that clock. They’re concerned with other aspects.

The other ones are there, doing clock-y things too, of course…

  • If the first one is out of spec, your USB device will fail to communicate with the computer
  • If the second one stops counting time, the music stops
  • If the third one is running too slow, the CPU might get to 100% and fail
  • If the fourth one isn’t firing at the right rate, you might get dropouts

So they’re not totally uninvolved. All must be working properly for you to hear the music.

There is a relatively well-understood mechanism by which jitter in the DAC clock causes distortion, and it applies only during digital-to-analog conversion. I’ve seen this explanation mis-applied to USB re-clockers and other enhancement products often. That’s not to say that the other clocks don’t have measurable jitter, or that those measurements can’t be improved–it’s just that their jitter numbers don’t relate to sound quality via the same mechanism.
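For reference, that well-understood mechanism has a standard textbook formula: the worst-case SNR imposed by sampling-clock jitter on a full-scale sine is SNR_dB = -20·log10(2π·f·t_rms). A small sketch (the function name is mine, the formula is standard) shows how it is distinctly a property of the conversion clock, not of data-transmission clocks elsewhere in the chain:

```python
import math

def jitter_limited_snr_db(freq_hz, jitter_rms_s):
    """Textbook worst-case SNR limit from sampling-clock jitter on a
    full-scale sine: SNR_dB = -20 * log10(2 * pi * f * t_rms).
    This is the mechanism by which jitter in the conversion clock
    (the "last one" in the list above) turns directly into analog
    error; it does not apply to USB data-transmission clocks."""
    return -20.0 * math.log10(2.0 * math.pi * freq_hz * jitter_rms_s)

# Example: a 10 kHz tone converted with 100 ps rms clock jitter is
# limited to roughly 104 dB SNR by jitter alone.
snr = jitter_limited_snr_db(10_000, 100e-12)
```

Note how the limit scales with signal frequency: the same jitter that is harmless at 100 Hz costs 40 dB more headroom at 10 kHz.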

According to the USB specifications, receiving devices are not supposed to care about these differences so long as they are within spec. If a USB device requires a computer to generate a USB signal that goes way beyond the spec requirements of USB in order to achieve its full performance, has the device designer really finished the job?

The point of these standards is to support free interoperation without these sorts of concerns–so I am always a little bit disappointed in the DAC when I hear that an “USB enhancement” product has made a big difference.

Finally…I’m a natural skeptic of claims that don’t come with a clear explanation of the mechanism of improvement, and that aren’t backed by either measurements or rigorous subjective testing. A great many products are made solely on the basis of someone’s theory + informal listening tests by a few people performed to “validate” it. I am not a huge fan of this method.

I prefer to talk about concrete engineering choices. For example–the one you refer to. AirPlay forces audio devices to conform to the computer’s clock rate, whereas RAAT drives transmission rates based on the DAC’s clock (the “last one” above). There’s no claim about differences in sound quality in that statement–if you understand the technical concepts it will be clear why ours is the better engineering approach–and this is enough of a reason to do it.

One last thing, since this topic made me think of it–Bearing in mind that John has a personal interest in USB enhancement products, he does a good job of exposing some of the complexity inherent in reasoning about USB in audio systems here.


Interesting subject and one of the things I’m confused about… For example, if we take the SOTM SMS-200 endpoint and its significantly more expensive brother the SMS-200 Ultra. The main difference seems to be a more advanced clock in the Ultra. But if the DAC is connected via asynchronous USB and therefore the DAC clock is in control … then how can the clock in the endpoint affect things at all?


From re-reading this thread for, like, the third time, it seems that @brian’s statement is only true if the DAC is connected via USB to the network audio adapter (microrendu, squeezebox, raspberry pi, PC, etc.), or if the DAC is married to the network audio adapter in the same device, such as the network DAC he uses as an example. That is because, as I understand it, the Roon endpoint has access to the DAC’s clock info.

It seems like @gasman’s question remains unanswered, unless the answer already given amounts to: “maybe, but only if those auxiliary devices give Roon access to the modified/new DAC clock signal.” Is that a fair summary, @brian?

I think that’s what @gasman’s question was getting at: will Roon’s use of the DAC’s clock signal to pull the audio stream to the network audio adapter ignore the separate clocks or other USB enhancements? Since the answer seems like it might be “maybe,” perhaps you guys could point to some such “enhancement” products that are able to deliver the enhanced clock signal for Roon to use as the reference clock? Perhaps specific features/technology to look for in such products that would help us determine whether they are being ignored by Roon?

I don’t think the interest in this issue seeks to verify whether Roon Labs claims that RAAT makes a difference in sound quality. I think the interest comes from the fact that, in a sufficiently resolving system, nearly everything seems to make a difference in sound quality. If RAAT does indeed tend to make the sound “better,” then it makes sense for @gasman and others to ask whether reclockers or other USB enhancers get in the way of or are ignored by Roon.

I might rant about USB here, so feel free to stop reading. The “purest case.” That phrase was a painful reminder to me that USB DACs are the norm and network DACs are the exception. It is frustrating that so many USB bandaids seem to actually work. The simple fact is, though, that, in a sufficiently resolving system, everything makes a difference, even if all the equipment is up to specs. Just because a USB device is comfortably within specs doesn’t mean that its sound quality cannot be improved. It just means that USB (surprise!) is not a perfect audio transmission technology.

Since there is no such thing as a perfect audio transmission technology, it is perfectly valid for @gasman and us other sound quality fools to wonder whether we can make RAAT’s work sound better by playing with USB enhancers and other such crap.

So, what’s the scoop?


RAAT never runs in a source-clocked mode–it is always locked against something, which in practice is the furthest clock in the chain that we can get access to. This is most often the DAC’s clock. Next most often, it’s the clock in an S/PDIF, AES, or UAC1.0 interface.

This summary is an example of the exact kind of confusion that I was warning against above.

If you think through the implementation details of these devices (if you haven’t, read and fully digest John Swenson’s link that I posted above, which explains one such product and provides a good framework for further understanding), it’s clear that these devices are proxying USB packets at a low level and not the USB Audio Protocol itself.

This means there is no “giving access” or “interfering” to consider. Roon is seeing the DAC’s clock through these devices, just as it would through any other USB hub.

To be 100% clear: the reclocking done in these kinds of devices has nothing to do with the audio clock. They are reclocking USB data-transmission bits in a layer that is not synchronously coupled to audio playback or the audio sample clock.

The reason why @gasman’s question is not getting a clear answer is because it used the word “clock” in a more imprecise way than in my original statement which he was referencing, and this created an ambiguity. I was talking about sample clocks, which are not the domain of these USB enhancer products.

This is why I went about explaining some of the different clocks in the system, instead of directly answering a confusing question.

The idea of “delivering the enhanced clock signal for Roon to use as the reference clock” flat out does not make sense. It’s just not how this stuff works. It’s not really possible to have a clear discussion about this if the concepts of a USB/data transmission clock and the audio clock are blurred together.

Except–this isn’t actually a question about RAAT. The answer is self-evident just looking at how USB Audio works in isolation. USB enhancers are about compensating for some shortcomings of USB, and don’t really interact with RAAT at all. If they have an effect at all, that effect will probably be the same with RAAT as with other transmission mechanisms.


A few interesting posts below by John S touching on system clocking, before the DAC.

The most interesting comments to me:

“As a side point to all this I am just starting some fundamental research into how phase noise from clocks in different parts of a system interact and move around a system. The ultimate goal is to figure out how to prevent any of this from getting into a DAC, which make all of this irrelevant. At this point I don’t know how long any of this will take, but that is where I am headed.”


“The clocking part of this is a bit unknown at this point. I am currently working on the analysis of this, which means I have to build some of my own test equipment. At this point I’m not ready to give full blown details on this, but preliminary results are indicating that clocking of any data stream coming into a digital device brings along the phase noise of the clock used to produce that stream. Packet systems such as Ethernet and USB complicate this dramatically because the data comes in packets, in between the packets there is no data to pass clock phase noise. Note that phase noise is another way to look at jitter, I’m NOT talking about amplitude noise here. The implication of this is that putting a very low phase noise clock on the last switch is not sufficient. The effect of the phase noise of whatever else is going into the switch also makes its way into the clocking of the switch. So just improving the local clock helps, but is not all there is to this. Ideally you should get rid of the clock influence from other sources as well. Please don’t ask for details about this, I have a LOT more research to do before I’m willing get into more detail…”

Links to the different posts:


By the way, I don’t post the above as some sort of rebuttal to Brian’s posts (just in case it appears that way).

Brian obviously smashed the ball out of the park (I thought) regarding Roon and RAAT and its role(s) in clocking - who would know better than the RAAT Guru himself.

I only shared the above posts from John S, because it touches on system wide clocking, before the DAC.

We’ll have to wait to see/hear more from John S about whether he is able to ‘clock block’ (LOL) and what the effects on SQ are, if any. As per John’s own words, he’s still researching, after building his own custom testing gear.

Thanks @brian.

I’m going to reread this all carefully until I can pretend to understand it.

If I were listening to “Kind of Blue” while falling into a black hole and wanted to preserve low jitter SQ right up until I died of spaghettification, should the Roon Core, DAC, microRendu or my listening chair be closest to the front of the spaceship ?