RAAT and clock ownership

@brian’s post is exactly why I don’t understand all of the hullabaloo on CA surrounding external word clocks for streamers and the like. Especially when DACs don’t use those clocks.

1 Like

I found @brian’s initial response to my query to be very helpful in understanding this clearly complex topic, and am grateful for the level of customer involvement Roon Labs consistently demonstrates.

It seems that even in the simplest PC audio chain there are a number of “clocks” running at any given moment, not just the one in the DAC.

One can reasonably hypothesise that the “better” (more accurate, less noisy) any of these clocks are, the “better” the decoding and playback of digital music will be: theoretically, measurably, and maybe even audibly.

Which is, presumably, where any SQ improvement comes from with “better” clocks in things like the SotM Ultra and with various in-chain USB enhancement devices.

Better clocks? Sure. But a separate master word clock for DACs, streamers, and DDCs to use? Why? What devices other than the DAC use the clock rate the DAC uses?

These are good reads:

https://www.soundonsound.com/techniques/does-your-studio-need-digital-master-clock

http://pinknoisemag.com/pink-papers/pink-paper-002

I’m always nervous to wade into these topics…

This is true and, in fact, in many DACs there are multiple clocks running depending on the design of the product and its feature set. For instance, a DAC with an integrated streamer card will have a clock (or clocks) managing the DAC itself along with a separate clock (or clocks) running the processor on the streaming card.

This, however, is not entirely true. RAAT operates at a higher level than the network hardware itself. To greatly simplify things, it allocates a buffer on the core and another one on the endpoint. The endpoint buffer can be in a connected computer, streaming card, etc. The core puts data into its local buffer and the DAC pulls data out of its local buffer using whatever interface is appropriate. The core only sees its local buffer and the DAC only sees its local buffer. It’s up to the RAAT protocol to manage the transfer of data between the two buffers and to ensure that neither one is too empty or too full.

Why does this matter?

Well, the DAC can only pull data from its local buffer, so it doesn’t care about the upstream plumbing. The core only puts data into its local buffer, so it doesn’t care about the downstream plumbing.

RAAT manages the data transfer between the two and in order to ensure that all of the bits get from the core to the DAC intact it uses the lower level facilities provided by the network itself to ensure integrity. If a packet gets lost a new one is sent and dropped into the buffer in the right order. Corrupt? Same thing. In most cases the buffers are large enough so as to allow this error correction process to take place without the DAC running out of data or the core filling up its buffer.
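The retransmit-and-reorder idea described above can be sketched in a few lines of Python. This is a toy illustration of the concept, not Roon’s actual code: packets carry sequence numbers, a lossy link randomly drops some of them, and the sender simply resends until the endpoint buffer holds everything in order.

```python
import random
from collections import deque

def transfer(data, loss_rate=0.2, seed=42):
    """Toy model: move `data` from a core-side buffer to an endpoint-side
    buffer over a link that randomly drops packets. Lost packets are simply
    resent, so the endpoint buffer always ends up complete and in order."""
    rng = random.Random(seed)
    core_buffer = deque(enumerate(data))   # (sequence number, payload) pairs
    endpoint_buffer = []
    while core_buffer:
        seq, payload = core_buffer[0]
        if rng.random() < loss_rate:       # packet lost in transit...
            continue                       # ...so it gets sent again
        assert seq == len(endpoint_buffer)  # delivery stays in sequence
        endpoint_buffer.append(payload)
        core_buffer.popleft()
    return endpoint_buffer
```

The DAC side only ever sees `endpoint_buffer`; as long as it never runs dry, the losses and retransmits upstream are invisible to playback.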

Please keep in mind that I’m very carefully using the word data here because that’s what the music is at this point. It’s a stream of bits that are a copy of the file being played. This is an asynchronous process and has no impact on the analog output of the DAC. RAAT (and the network stack) are facilitating a very chatty conversation between the core and the endpoint. They’re making sure that the file (up to the size of the buffers) is being copied from core to endpoint.

The clocks in use over the transfer media have zero impact on the sound coming out of the DAC as all the DAC is doing is playing from a local memory buffer. It doesn’t know, nor does it care, what happened to those bits prior to landing in the buffer. If the clocks in the switch or network interface meet the specification of the transfer mechanism then the data transfer process takes place with little or no issue. If they don’t then one has larger problems.

This is an asynchronous process. By definition the timing (clocking) of the transfer process is completely separate from the decoding process. The buffers will fill and empty at a variable rate, but as long as there’s enough in the DAC’s buffer playback will continue reliably.

The key thing to differentiate here is the difference between data and signal. The file being played doesn’t turn into a signal until its bits are actively making their way through the DAC in real time. Up until the point that they are pulled from the local buffer in the DAC (or endpoint) those bits are no different than those used for a document, image, or even this post.

I understand the audiophile desire to constantly try to “improve” things and that behavior really is the foundation of this hobby. That was easier to do in the past when most of the concepts related to something physical and changes were not only easily demonstrated, but could be backed up with some logical explanation. Digital isn’t like that at all as many of the concepts are counter-intuitive and the explanations are far more complex. Throw streaming into that mix and the problem gets worse. Most manufacturers of audio equipment have little to no understanding of how networks function and how the network may or may not have an impact on actual playback. Sadly, this has a negative impact on the consumers who are forking over their hard-earned cash for solutions to problems which don’t actually exist.

18 Likes

Yup. I am in WAY over my head, and I am disinclined to take on that learning curve. It seems ignorant for me to conclude this, but it’s the only conclusion this consumer can make without having cured the ignorance (and a purchasing decision in the thousands or hundreds of dollars demands an answer): avoid USB and SPDIF DACs and go for a network DAC. The old KISS principle at work again in audio!

Maybe the right USB or SPDIF implementation can easily be an exception to such an overgeneralized rule, but it seems that the only ways to know are either to audition at home (difficult these days) or to understand digital playback devices intimately (see above to gauge that difficulty). Also, maybe replacing USB with whatever network hardware is at play in a network DAC is just swapping one problem for another, but a buying decision still has to be made by us ignoramuses with limited access to in-home auditions. So we’re left with blunt decision-making tools like KISS and magazine reviews.

With any luck, more network DACs will become available soon, and I won’t have to delay the money burning a hole in my pocket.

1 Like

Masterful explanation, thank you :hugs:

A follow on if I may?

Given your description, does removing RAAT from the equation by having core and player in the same device (files on a NAS) make any SQ difference? Or in a functioning network environment is RAAT sufficiently efficient to keep all the buffers topped off?

Just my opinion (from highly subjective personal experiences only - no expertise) but just seeing an ethernet input on a DAC (even if it’s Roon Ready) doesn’t automatically mean this will be the best sounding or best optimised input for that particular DAC.

There’s quite a bit that goes into the implementation and optimisation of a network card (a little computer) inside a DAC. One example: there aren’t many networked DACs that support DSD512 over ethernet yet; I think you can count them on one hand at the moment. Even for DSD256 over ethernet, the number of supporting DACs may still be countable on one hand.

We WILL get there eventually as manufacturers move away from legacy inputs and even away from USB, but for right now you should still listen to the ethernet input and compare it with the other inputs of the same DAC - unless of course it only has an ethernet input and nothing else.

That’s if you can be bothered comparing inputs for SQ. The convenience of having an ethernet input, particularly Roon Ready, is a key factor sometimes for some people - nothing at all wrong with that too.

SPDIF embeds the clock of the source. If the DAC hardware is not designed to reclock the SPDIF signal, the DAC’s clock is slaved to it.
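That clock embedding can be illustrated. S/PDIF uses biphase-mark coding: every bit cell begins with a level transition, so the receiver can recover the sender’s clock from the transitions themselves. A minimal sketch of the encoding (illustrative only, two half-cells per bit):

```python
def biphase_mark(bits, last_level=0):
    """Biphase-mark coding as used by S/PDIF: every bit cell starts with a
    level transition (this is the embedded clock); a '1' bit adds a second
    transition mid-cell. Returns two half-cell levels per input bit."""
    out = []
    level = last_level
    for b in bits:
        level ^= 1            # cell-boundary transition carries the clock
        out.append(level)
        if b:
            level ^= 1        # mid-cell transition encodes a one
        out.append(level)
    return out
```

Because there is a guaranteed transition at every cell boundary, the receiver’s timing is driven by the sender’s transmit clock, which is exactly why a non-reclocking DAC ends up slaved to the source.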

USB brings another class of noise problems that I won’t go into. Connecting a USB DAC directly to a Roon Core is not the most preferred way:
https://kb.roonlabs.com/Sound_Quality_in_One_Computer

A lot of Roon Ready devices already exist, covering a whole range of budgets:
https://kb.roonlabs.com/Partner_Devices_Matrix

In theory, as long as the DAC has a sufficient amount of data on hand to perform decoding at the proper rate, the way that data buffer is maintained shouldn’t make a difference. The problem is that keeping a buffer full at the proper level is actually very difficult when you don’t control all of the variables.

Here’s a thought exercise… I’m going somewhere with this, I promise…

  • You have a tub that can hold a certain volume of water (that’s your buffer). Unfortunately, you don’t have a lot of room so your tub has to be relatively small.

  • The tub has a drain that removes water at an average rate (that’s your DAC).

  • The tub has a spigot that can be used to add water and you can use that spigot to adjust the rate at which water is added.

  • A certain volume of water has to go down the drain, but the tub can’t hold enough water to satisfy that overall need.

Now, what rate do you need to add water to the tub in order to make sure that it never overfills, but never dries out?

  • If the fill rate is higher than the drain rate then the tub overflows.

  • If the fill rate is lower than the drain rate then the tub dries out.

(both of these are bad with water and audio)

If the rate is exactly the same then everything is happy. Assuming you pre-fill the tub with some water to get the process started and then maintain the water at that level everything is fine.

Easy. Problem solved… nope.

Unfortunately (in this analogy) “exactly” means down to the molecular level (or every last bit). The problem is that while the drain may average 44,100 units of water per second, the actual rate at any given moment may be a little faster or a little slower (that’s jitter).

To make matters worse, the spigot is on a shared plumbing system (the network) and other water users can have an impact on the volume of water that can be delivered to the spigot at any given time. You can set the valve on the spigot perfectly, but the minute someone flushes a toilet you’re screwed.

In order to maintain the system you need to actively monitor and adjust the valve on the spigot to maintain the fill rate, ensuring that you never allow the water level to get below your established minimum. You also need to make adjustments to ensure you don’t get above a safe maximum. You need to be both predictive in managing the high and low water marks and reactive to changes in the water volume available via the spigot and to minute changes in the actual drain rate.

That is what RAAT is doing. It’s modeling the clock rate of the DAC (the drain) to understand how the tub is being emptied. It’s also responding to network performance in order to ensure that the fill rate doesn’t allow a buffer over or under-run.

It’s even more complicated than that as there’s also a tub on the core side that “drains” through the network to the one on the DAC side and the size of the tubs is different depending on the sample rate involved.

Now imagine how complicated it is to group zones (especially if there are different sample rate requirements for each zone). The zone tubs need to drain in absolute lock step in order for this to work but you have no control over the pipes connecting the tubs!
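The tub can be turned into a toy simulation. This is my own sketch with made-up numbers, not RAAT’s actual algorithm: the drain rate jitters around its average, and a simple proportional controller on the spigot keeps the level near a target between the high and low water marks.

```python
import random

def simulate(ticks=1000, capacity=4096, target=2048, seed=1):
    """Toy tub: a jittery drain (the DAC clock) empties the buffer while a
    feedback-controlled spigot (the transfer protocol) refills it."""
    rng = random.Random(seed)
    level = target                         # pre-fill before playback starts
    underruns = overruns = 0
    for _ in range(ticks):
        drain = 100 + rng.randint(-3, 3)   # nominal rate plus jitter
        level = max(level - drain, 0)
        if level == 0:
            underruns += 1                 # tub ran dry: audible dropout
        fill = 100 + (target - level) // 64  # fill faster when running low
        level += fill
        if level > capacity:
            overruns += 1                  # tub overflowed: data discarded
            level = capacity
    return underruns, overruns
```

With the feedback active this run reports no underruns and no overruns; fix `fill` at a constant instead and the level slowly random-walks away from the target, which is the failure mode the monitoring exists to prevent.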

Now, back to your specific question.

In reality, whether you are moving data over the network or within a device you need some way to manage the buffers. There are generic ways to do this and they do work, but RAAT was developed specifically for the needs of audio (everything that’s needed and nothing that’s not). It provides a standard interface to both sides of the equation and is easy to implement. Given that RAAT is very good at what it does, and in the absence of a better way of doing it in a single device, why not just use RAAT?

That is, in fact, what Roon does. Whether the data is traversing a network or playback is local to the core RAAT is still employed to manage the send and receive side of the chain. It may be internal to one piece of hardware or separated by some distance, but the protocol is the same.

Bingo, and the key really is a functioning network environment. That doesn’t mean one that is datacenter quality or with infinite bandwidth, but good enough to satisfy the needs of the audio data being transmitted along with any other uses at any given time. It also doesn’t mean that you need a bunch of tweaky devices and dongles along the way in order to make it sound better. It just needs to meet the standards and that’s not hard to accomplish (although a lot of audio-related network stuff [cables, filters, clockers, etc] pretty much ignores the standards).

Keep in mind that audio bitrates are nothing in comparison to the bandwidth available on a typical gigabit link (even a really crappy one). DXD is about 20Mbit/sec. DSD512 is around 45Mbit/sec. Gigabit ethernet is 1000Mbit/sec!
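Those numbers are easy to sanity-check. The sketch below computes raw stereo payload rates, ignoring framing and protocol overhead (which is presumably why the DXD figure quoted above is “about 20” rather than the bare ~17 Mbit/sec):

```python
def pcm_mbits(sample_rate_hz, bit_depth, channels=2):
    """Raw PCM payload rate in Mbit/s (no framing or protocol overhead)."""
    return sample_rate_hz * bit_depth * channels / 1e6

def dsd_mbits(multiple, channels=2):
    """DSD-N is a 1-bit stream at N x 44.1 kHz per channel."""
    return multiple * 44_100 * channels / 1e6

dxd = pcm_mbits(352_800, 24)   # DXD: 352.8 kHz / 24-bit stereo, ~16.9 Mbit/s
dsd512 = dsd_mbits(512)        # DSD512, ~45.2 Mbit/s
headroom = 1000 / dsd512       # gigabit link leaves 20x+ headroom
```

Even the most demanding consumer format uses under 5% of a gigabit link, before any real-world derating.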

As long as the network is reliable and can handle the data rate then RAAT just works (and it works really well). Take away the reliability and it starts getting ugly.

As an aside, although WiFi has data rates far in excess of the audio stream requirements the nature of the way that WiFi works means that those speeds are only possible when you try to push a lot of data through at one time in big chunks. The way that the bitstream is metered out for audio (in near real-time) is the worst-case scenario and results in horrible efficiency… but that’s a story for another day :wink:

5 Likes

The tub analogy limps a bit because the network is not very much like a pipe connected to the spigot. At the bottom of the network stack you have Ethernet and IP, and those are totally asynchronous; they may have to retransmit, and they do not even guarantee that packets arrive in the right order.

Higher layers in the stack, like TCP and UDP, take care of those things from the perspective of normal software. But that’s why neither the clock nor the momentary data transfer rate has any bearing on this.

My old analogy: I order 24 CDs from Amazon every day, which matches the average data rate since most CDs are about an hour long. But they arrive in one FedEx package per day, so I have jitter of about 86,400 trillion picoseconds (a full day), except on the weekend when it hits twice that. Why is this nonsense? Because the FedEx data stream is asynchronous to the audio system, as is the internet from Tidal, as is the Ethernet on my wires.

3 Likes

Yes but things are more complicated with classical.

6 Likes

UDP does not retransmit lost packets nor does it reorder packets that arrive out of order.

When it comes to RAAT, the tub analogy is just fine.

Not wishing to interfere in other people’s bathing habits, but RAAT switched from UDP to TCP as of build 234.

3 Likes

Yes, my mistake.
And as @rbm points out, Roon switched from TCP, which takes care of those problems, to UDP, which leaves responsibility for retransmits and reordering to the RAAT layer.
Neither TCP nor UDP makes any promises about timeliness.
So rather than a water pipe where we can tune the flow, in the analogy the network is more like somebody walking over with buckets of water and dumping them in the tub when you call him; sometimes he trips and spills the water and you have to call again, and sometimes the buckets arrive in the wrong order (wait, what?).

Aside from the frivolity, there is a serious point here. The network has a very different architecture at the lowest implementation level: it is asynchronous and chaotic, it thrives on chaos, and it derives its resilience and scalability from the lack of central control. Some people talk of low jitter as a benefit of audiophile network cables, which is nonsense; you can’t have jitter in an asynchronous stream.

1 Like

Other way around!

1 Like

Ok. Either way, my argument holds.

Brian - thank you for your explanation of “clocks” here. I was not clear on this concept, caught between the IT jargon and the audio jargon (as explained several times by so-called experts at my local hi-fi dealer). Your explanation has connected the dots in my mind: I now get, in layman’s terms, the so-called “clocking” nexus between a typical PC and a networked audio device like a Devialet. I agree the information out there can be confusing; a lot of salespeople and some journalists don’t quite get this, and prospective buyers like me who research the area before making a purchase leave the store with more confusion. That really makes it difficult for a novice to understand what my requirements should be in my home environment. So thanks!

1 Like

Hey @brian

How/where does Roon’s new Chromecast support fit in with the above?

Just for those of us that are (sadly) interested in this stuff :confounded: :sob:

Better than all Airplay, Songcast & UPnP?

Chromecast owns the playback clock as in RAAT/OpenHome/UPnP.

1 Like

Nice, thanks mate