So, to make sure I’ve understood you correctly: in this chain, Roon Computer > ethernet > Squeezebox > coax S/PDIF > DAC, the S/PDIF out from the Squeezebox is being clocked by the Squeezebox and then asynchronously sampled at the DAC?
Yes, the S/PDIF signal delivers samples at its own pace. Asynchronous resampling is one technique used by DACs to cope with that, but there are others, too.
It’s possible to simply ignore the problem. This is a consumer-grade solution, but you can totally just use the incoming S/PDIF signal to drive the whole process and ignore (or omit) the internal clock.
Some DACs slowly adjust their internal clock to adapt to the incoming rate, using a small buffer to prevent overruns/underruns, and then re-clock the data out. I know that Meridian products work this way but they are not the only ones. I took a guess that MSB used a similar approach (knowing that they make a ladder DAC) and their marketing materials suggest that I’m correct.
DACs that already have a big over-sampling stage built in can reconcile the clock discrepancy in their existing resampling process. The technical documentation for the ESS Sabre, a very common chip in USB DSD-capable DACs, discusses this in section III-B of this document. My understanding is that this is the most common approach for sigma-delta based DACs.
Why is this OK? Because this sort of conversion does not materially harm the signal quality when the oversampling ratio is very high. The ESS Sabre is already re-sampling your signal asynchronously to something like 40MHz; absorbing a tiny clock discrepancy like 44100->44100.005 within that process is insignificant by comparison.
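To put rough numbers on that (my own back-of-envelope using the figures above, not from the original post):

```python
# Back-of-envelope scale of the clock discrepancy discussed above
# (the 0.005 Hz offset and ~40 MHz modulator rate are just the example figures).

nominal_rate = 44100.0        # Hz, nominal sample rate
actual_rate  = 44100.005      # Hz, what the source clock actually delivers
modulator    = 40_000_000.0   # Hz, rough internal rate of a sigma-delta DAC

drift_ppm = (actual_rate - nominal_rate) / nominal_rate * 1e6
extra_samples_per_hour = (actual_rate - nominal_rate) * 3600

print(f"clock error: {drift_ppm:.3f} ppm")                              # ~0.113 ppm
print(f"extra samples per hour: {extra_samples_per_hour:.0f}")          # ~18
print(f"existing oversampling ratio: {modulator / nominal_rate:.0f}x")  # ~907x
```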
I would assume that - in that case - all RAAT benefits would stop at the endpoint, and the quality of the S/PDIF communications with the DAC would be governed by the normal set of S/PDIF concerns. Is that true?
Of course. S/PDIF is still S/PDIF. There is no way to do anything about that from where we sit.
I keep ignoring this case because S/PDIF is legacy technology. There is no saving it from the limitations that were baked into its original design. It doesn’t support DSD without encapsulation, it has rate/bitdepth limitations that don’t admit all modern formats. It’s source clocked.
Thanks Brian. Agreed too. Was not remotely trying to imply that Roon should fix something out of its control. Just trying to make sure that my understanding was complete. Thanks.
improved my understanding too.
Thanks Brian/Danny and all.
This discussion raises some questions for me. I apologize if I have failed to understand the above-posted information.
I am currently using a Musical Fidelity V90 DAC being fed via USB by an i3 NUC running Roon Remote. I will be upgrading the DAC in the next year or two, and some of the ones I’m looking at have USB inputs and others don’t (SPDIF, AES/EBU). From the above discussion, it seems that converting the USB signal to SPDIF would add a clock to the chain, potentially robbing RAAT/ROON of full clock ownership. Should I stay away from non-USB DACs?
The conundrum for me is exacerbated by the fact that the Musical Fidelity DAC will only convert up to 96kHz streams when fed via USB, but will do up to 192 if fed SPDIF. I was considering adding a USB-to-SPDIF converter as an interim step to a DAC upgrade (and to see what the Musical Fidelity can do with that level of hi-rez - curiosity).
For example, if I was to eventually upgrade to a Berkeley Audio DAC, I would probably also use their Alpha USB to convert to SPDIF or AES/EBU, as their DACs don’t have USB inputs. Would I be compromising sound quality by converting the signal to SPDIF before it gets to the DAC that way?
Thanks for any guidance you can provide.
A lot depends on how the DAC handles clock. If the DAC has its own USB interface, it could be operating purely asynchronously and using a free-running clock for the DAC chip. Audio quality will not be dependent on clock jitter, because there will be little, if any.
If the DAC uses an S/PDIF input (AES/EBU, TOSlink, and coax S/PDIF are all the same – use whichever you like best – I prefer TOSlink for isolation) then the clock must be derived from the incoming stream. A good DAC, like the BADA Alpha DAC, will use a PLL to lock its own internal low-phase-noise clock to the incoming data stream. No loss of quality. And adding an external USB-to-S/PDIF converter won’t make it any better – or worse.
If the DAC extracts the clock from the data stream without regenerating it (PLL + local oscillator) then it is possible that the quality of the clock in the S/PDIF could have an effect on the sound. I just don’t know how to find out how the DAC processes the clock. If the DAC has two separate clock oscillators then it probably phase locks a local clock to reduce jitter.
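To illustrate what “phase-locking a local clock to the incoming stream” means in practice, here is a minimal, purely illustrative sketch (not any particular DAC’s design; the loop gains and update rate are made-up values):

```python
# Minimal sketch of PLL-style clock recovery (illustrative only): a local
# oscillator is slowly steered so that its phase tracks an incoming reference
# running at a slightly different rate.

incoming_rate = 44100.005   # Hz, the rate embedded in the incoming S/PDIF stream
local_rate    = 44100.000   # Hz, the free-running local oscillator
kp, ki = 1.0, 2.5e-4        # proportional/integral loop gains (made-up values)
dt = 1e-3                   # loop update interval, seconds

ref_phase = local_phase = integrator = 0.0

for _ in range(20000):                       # 20 seconds of loop updates
    ref_phase   += incoming_rate * dt        # phase of the incoming stream, in cycles
    local_phase += local_rate * dt           # phase of the local clock, in cycles
    error = ref_phase - local_phase          # phase error the loop tries to null out
    integrator += ki * error
    local_rate = 44100.0 + kp * error + integrator   # steer the local oscillator

print(f"recovered local rate: {local_rate:.6f} Hz")  # ends up very close to 44100.005
```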
I am currently playing with the Breeze Audio DACs available on eBay for $60. They look like they have promise. At $60, what have you got to lose? They go up to 192/24.
At the moment it is somewhat arbitrary. There was a discussion that it should be user-selectable, as only the user can say which is the “best room” for them.
I’ll see if I can find it.
Perhaps the simplest way to force the “master” would be to use the device that “owns” the zone as the master. The devices that are added to the zone to create a group are the slaves.
Hi Brian,
I found the discussion and have split it out to its own topic to improve focus and make it easier for others to find.
User Selection of the Master Clock in Grouped RAAT Zones.
In which Brian discusses “first zone as the master”. Have a read, and if you have any comments or questions, please post them in that topic.
Brian, just so I understand better, each stream output device (Roon Bridge) is self clocked and just requests more data (Double buffered? Ring buffer?) when it needs it. The “master clock” concept is only for groups, right? And I presume that each group elects one device in the group to be the master so each group has its own master rather than the Roon server doing sample rate conversion for n-1 devices.
So, as for clock sync between stream output devices in a group, one device is elected as master and does not need to do sample rate conversion. Since all the stream output devices have their own free-running clocks (unsynchronized) the only way I can think of getting things matched up is to measure the buffer-fetch request interval, use that to determine clock error, and then do non-integer-ratio sample rate conversion. This implies interpolation. Just wondering what kind of interpolator you are using? That’ll have an effect on sound quality.
Just wondering and wanting to understand better.
Thanks for all the information.
In thinking about this today I thought of the multiroom capabilities provided by the Naim (UPnP) devices. Their implementation works extremely well in terms of syncing, but I’m wondering if they’re doing some heavy DSP to make that happen. There is a maximum sample rate that can be streamed in multi-room and I’ve often wondered if that’s a limitation in the network throughput of the devices or their DSP capabilities.
Anyone know specifically what they are doing?
There is actually no request for data–the server is modeling the master clock based on its own periodic synchronizations and using that to drive the outgoing data rate. This technique simplifies the protocol-level differences between master and slave zones–since they can all use the same primitives for managing streaming. There’s only one extra command (“synchronize against remote clock”) used for the slaves.
There’s a few seconds of buffer at each endpoint, and the buffer is kept around half full–so if data momentarily comes too fast or too slow, there is time to bring the clocks in line without overrunning or underrunning.
The slaves are synchronizing against (“recovering”) the master’s clock using the same mechanism that the server uses to guide the transmission rate. Clock error measurements go through a slow-to-respond low-pass filter since systems like this are prone to oscillation or over-correction when measurements are noisy. Each slave knows how “ahead” or “behind” it is, and it can adjust accordingly.
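A minimal sketch of the kind of slow-to-respond smoothing described above (not Roon’s actual code; the class name and filter coefficient are illustrative assumptions):

```python
class ClockErrorEstimator:
    """Smooths noisy clock-offset measurements with a one-pole low-pass filter.

    Illustrative only: the real protocol, filter order, and coefficient are
    not published; alpha here is just a plausible "slow-to-respond" value.
    """

    def __init__(self, alpha: float = 0.01):
        self.alpha = alpha          # small alpha => slow, heavily smoothed response
        self.smoothed_error = 0.0   # seconds the slave is ahead (+) or behind (-)

    def update(self, measured_offset: float) -> float:
        # Exponential moving average: mostly keep the old estimate,
        # nudge it slightly toward the new (noisy) measurement.
        self.smoothed_error += self.alpha * (measured_offset - self.smoothed_error)
        return self.smoothed_error


est = ClockErrorEstimator()
for raw in (0.0021, 0.0018, 0.0025, 0.0019):   # noisy offset measurements, seconds
    print(f"smoothed offset: {est.update(raw):+.6f} s")
```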
Async SRC is one technique that can be used, but not the only option. Our default implementation uses a technique called “stuffing” and “dropping” samples–basically, inserting or removing individual samples. Our implementation is somewhat improved compared to the typical one since it tries to locate positions in the stream to perform insertions/deletions that will be less audible, and it uses an RNG to position the corrections, since periodic sounds are easier to pick out.
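As a rough illustration of the stuffing/dropping idea (again a sketch of my own, not Roon’s implementation; a real version would also search for low-level passages where a correction is less audible):

```python
import random

def apply_drift_correction(samples: list[float], net_correction: int) -> list[float]:
    """Absorb clock drift by inserting (stuffing) or removing (dropping) samples.

    Illustrative sketch only: net_correction > 0 means the endpoint is running
    short of samples, so stuff some in; < 0 means it has too many, so drop some.
    Correction points are placed randomly so they never form a periodic pattern,
    which would be easier to hear.
    """
    out = list(samples)
    for _ in range(abs(net_correction)):
        i = random.randrange(1, len(out) - 1)
        if net_correction > 0:
            out.insert(i, (out[i - 1] + out[i]) / 2)   # stuff an interpolated sample
        else:
            del out[i]                                  # drop one sample
    return out

block = [0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7]
print(len(apply_drift_correction(block, +2)))   # 10 samples: two stuffed in
print(len(apply_drift_correction(block, -1)))   # 7 samples: one dropped
```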
I prefer that approach, since corrections aren’t impacting the audio except when they’re happening (with async SRC, there is a constant effect, and doing async SRC at very high quality levels is very expensive). It’s also more practical to use this approach on low-powered endpoints that don’t have much CPU headroom.
We’ve been toying with the idea of moving the drift adjustments to the server, which could allow for more intensive/expensive techniques, since we have more CPU resources available. There are also some aspirations of maybe supporting grouped playback across different technologies, which would almost certainly lead us in this direction, since everyone’s system works a bit differently.
Brian, thank you for that information. I understand a lot better now. It is interesting to see how you think in your architecture design process.
I was going to bring up the problems with interpolation associated with sample-rate conversion (SRC). Stuffing/deleting samples is pretty simple and doing it at random times seems prudent. I haven’t heard any artifacts from the process, other than the gross sync failure I have been experiencing. (I do suspect a bug. I haven’t partitioned my network to test the simple case yet.)
I am interested in your actual synchronization protocol. I would normally assume that the receiver would signal it needs another buffer full of data and the server would send it (pull protocol). Since the server knows how often the master fetches data, it knows the data rate. If I now understand properly, the server “pushes” the data at a synchronized rate to the slaves. Ah! The slave then just needs to notice over time if the buffer is growing or shrinking, and randomly duplicate/discard samples to keep the buffer the same size over time. How elegantly simple. I’m impressed.
This is a hard problem. Thank you for sharing your thoughts. I am learning.
I once worked on the design for a last-mile point-to-multipoint wireless system that bounded latency variation so it would support telephony. I suspect the problems are similar.
FWIW, I was going to put together a team to build the music server I wanted. I had started on the design when Demian Martin suggested I look at Roon. You are the only company with a product that does what I was going to do. And $500 is a lot less than the $500k I expected I would have to spend to get the job done properly to get an initial release out. It also means I will be doing something else and letting you do what you clearly can do better than I. (You have done so much with metadata! I am trying to figure out if you have cross-referenced all the artists/performers where you can.)
So if the way RAAT is designed means that the only ‘relevant’ clock is the one implemented in the DAC itself (assuming Roon endpoint to USB DAC), then all those esoteric USB re-clockers and high-end clocks on dedicated PCIe USB boards that one can find are, technically, performing no useful function.
Is that correct?
I have been wondering about that myself, particularly after hearing Hans Beekhuyzen gush (very unusual for him) over how good the new SOtM sMS-200 Ultra with the sCLK-EX2425 clock sounds while listening to Roon. As I understand it, the only real difference between the normal sMS-200 and the new Ultra version is the reclocking that it does. And while he liked the original, he was effusive about the reclocking Ultra version. So by all appearances, that reclocking made a significant audible difference, even while using it as a Roon-Ready network endpoint.
I understand a “better” clock is also a feature of the UltraRendu (cf. the original micro, which I have), and again there are plenty of people out there who feel there is an improvement.
Looking forward to the Roon Technical Team’s take on this!
RAAT is a network protocol. It delivers data from a server (the Roon Core) to a device–for example–in the purest case–a networked DAC. When we compare clock relevancy here, we are comparing apples to apples against other network protocols–many of which make the computer’s clock an inherent part of the audio chain in a way that RAAT avoids.
When you start to insert additional elements–USB, or an S/PDIF generator, or a “USB re-clocker” or whatever–it’s important to think critically, thoroughly, and specifically about how each of those aspects works. These considerations are totally independent of RAAT, and are characteristics of those other systems. RAAT does not extend its fingers into your USB DAC and fundamentally change how USB works.
One of the most frustrating things about discussing clocking and related concepts in this setting is that it’s quite complicated, and there is a tendency to hand-wave, or to misuse or conflate terminology. A lot of people get their information from marketing sources that sometimes play fast+loose with the technical details.
For example, in your question, you are talking about totally different kinds of clocks which impact the system in totally different ways, and thinking that maybe our discussion of one kind of clock applies to the others just as well. This confusion is partly on you, and partly on us, but it is a good representation of the general state of affairs.
Let’s remove RAAT from the equation for a second and talk about a typical USB Audio 2.0 playback case:
- Computer connected to USB Audio 2.0 device
- Asynchronous clock mode / Isochronous data transfer mode
- USB interface inside of the device communicates to a DAC chip using I2S with an MCK wire
There are many clocks in this system:
- A clock in the USB interface that pushes USB packets onto the wire one bit at a time.
- The system clock in the computer, which governs the operating system scheduler, which decides whose code gets to run on the computer, and when.
- The CPU’s cycle clock, which determines when individual CPU instructions run
- A clock in the computer which determines when isochronous USB packets are generated (each contains 125us of audio–so we are not talking about sample resolution here; see the quick arithmetic after this list)
- A clock in the DAC that drives the USB interface via the MCK line and helps form the actual edge transitions on the I2S wires that feed the DAC.
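For a sense of scale on the fourth one (my own arithmetic, assuming USB 2.0 high-speed 125 µs microframes): each isochronous packet carries only a handful of samples, and the packet clock ticks at 8 kHz rather than per-sample.

```python
# How much audio one 125 microsecond isochronous packet carries
# (assumes USB 2.0 high-speed microframe timing; purely illustrative).

packet_interval = 125e-6                  # seconds per isochronous packet
packets_per_second = 1 / packet_interval  # 8000 packets per second

for rate in (44_100, 96_000, 192_000):
    samples_per_packet = rate * packet_interval
    print(f"{rate:>6} Hz: {packets_per_second:.0f} packets/s, "
          f"{samples_per_packet:.4f} samples per packet")
#  44100 Hz: 8000 packets/s, 5.5125 samples per packet
#  96000 Hz: 8000 packets/s, 12.0000 samples per packet
# 192000 Hz: 8000 packets/s, 24.0000 samples per packet
```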
When audiophiles talk about clocks, they are usually talking about the last one. The most common explanations for “why jitter matters” only apply to the last one. When we talk about RAAT driving audio playback with the appropriate clock, we’re talking about the last one too.
USB enhancements, on the other hand, have no bearing on that clock. They’re concerned with other aspects.
The other ones are there, doing clock-y things too, of course…
- If the first one is out of spec, your USB device will fail to communicate with the computer
- If the second one stops counting time, the music stops
- If the third one is running too slow, the CPU might get to 100% and fail
- If the fourth one isn’t firing at the right rate, you might get dropouts
So they’re not totally uninvolved. All must be working properly for you to hear the music.
There is a relatively well-understood concept about how jitter in the DAC clock only causes distortion during digital-to-analog conversion. I’ve seen this explanation mis-applied to USB re-clockers and other enhancement products often. That’s not to say that all clocks don’t have measurable jitter or that those measurements can’t be improved–it’s just that their jitter numbers don’t relate to sound quality via the same mechanism.
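As a rough illustration of that conversion-time mechanism (a textbook back-of-envelope, not from the post): the worst-case amplitude error caused by a sampling-instant timing error dt on a sine of frequency f and amplitude A is about 2*pi*f*A*dt.

```python
import math

def jitter_error_dbfs(freq_hz: float, jitter_s: float) -> float:
    """Worst-case error from timing jitter during conversion of a full-scale sine.

    Textbook back-of-envelope: error ~= slew rate * timing error = 2*pi*f*A*dt.
    """
    error = 2 * math.pi * freq_hz * jitter_s   # relative to amplitude A = 1.0
    return 20 * math.log10(error)

for jitter_ps in (1000, 100, 10):              # 1 ns, 100 ps, 10 ps of jitter
    db = jitter_error_dbfs(20_000, jitter_ps * 1e-12)
    print(f"{jitter_ps:>5} ps jitter on a 20 kHz full-scale tone: ~{db:.0f} dBFS")
# 1000 ps -> about  -78 dBFS
#  100 ps -> about  -98 dBFS
#   10 ps -> about -118 dBFS
```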
According to the USB specifications, receiving devices are not supposed to care about these differences so long as they are within spec. If a USB device requires a computer to generate a USB signal that goes way beyond the spec requirements of USB in order to achieve its full performance, has the device designer really finished the job?
The point of these standards is to support free interoperation without these sorts of concerns–so I am always a little bit disappointed in the DAC when I hear that a “USB enhancement” product has made a big difference.
Finally…I’m a natural skeptic of claims that don’t come with a clear explanation of the mechanism of improvement, and that aren’t backed by either measurements or rigorous subjective testing. A great many products are made solely on the basis of someone’s theory + informal listening tests by a few people performed to “validate” it. I am not a huge fan of this method.
I prefer to talk about concrete engineering choices. For example–the one you refer to. AirPlay forces audio devices to conform to the computer’s clock rate, whereas RAAT drives transmission rates based on the DAC’s clock (the “last one” above). There’s no claim about differences in sound quality in that statement–if you understand the technical concepts it will be clear why ours is the better engineering approach–and this is enough of a reason to do it.
One last thing, since this topic made me think of it–Bearing in mind that John has a personal interest in USB enhancement products, he does a good job of exposing some of the complexity inherent in reasoning about USB in audio systems here.
Interesting subject and one of the things I’m confused about… For example, if we take the SOTM SMS-200 endpoint and its significantly more expensive brother the SMS-200 Ultra. The main difference seems to be a more advanced clock in the Ultra. But if the DAC is connected via asynchronous USB and therefore the DAC clock is in control … then how can the clock in the endpoint affect things at all?