Diretta Measurements and Listening Tests

I am starting a new thread to discuss the measurements and [single-]blind A/B listening tests of the Diretta protocol, as a follow-up to the original thread:

I used to think the DAC was 90% of the sound. This sub-$300 DIY project proved me wrong - Tinkering - Roon Labs Community

I divided it into sections, so that they can be easily referenced in the discussions. There will be two main sections: measurements and listening tests. This post presents the former, with the latter to come in a later post when completed.

I would like this to be a fact-based, technical discussion of Diretta. If you are interested in a purely subjective take on the subject, please see the original thread.

0. Setup

This is the diagram of the setup used. It is based on @David_Snyder’s instructions, which you will find in the original thread.

The host box on the left is connected to LAN through a USB Ethernet dongle and is visible to Roon server through RAAT protocol. The dongle is necessary to create an additional Ethernet port on the host, as the onboard port is used to communicate with the target box on the right through Diretta protocol.

In a normal, Diretta-only setup, the red path is missing and the DAC connects to one of the USB 2 ports on the target box (the “Diretta ON” connection in the diagram). The signal path in this case is:

Roon → [RAAT] → host → [Diretta] → target → USB DAC.

The red path was used to be able to make measurements with Diretta protocol out of the picture. In that case, the DAC connects to one of the USB 2 ports on the host box (the “Diretta OFF” connection in the diagram), and the signal path is the usual Roon playback path:

Roon → [RAAT] → host → USB DAC.

(There was no USB switch used in the setup; switching was done by simply unplugging the DAC from one of the USB ports and plugging it into the other.)

The DAC used is a Topping D70.

1. Measurements

This section is further divided into two: power rail measurements for both 5V and 3.3V rails and DAC output measurements.

In all images, the diagrams on the left show the target box - i.e. with Diretta - and the ones on the right the host box - i.e. with RAAT only.

1.1 Power Rails

1.1.1 Power Rails Time Domain

These diagrams show the waveforms of the power rails (yellow for 5V, blue for 3.3V) over 170 or so seconds at a time. I used AC coupling here so that I could amplify both traces and put them on the same screen. One time division is 10s, one voltage division is 50mV.

The diagram above was captured when nothing was playing. Diretta is definitely quieter here, with a noise amplitude roughly half that of the host on both rails. This is not exactly a fair comparison though, as the host box’s USB3 port connected to the Ethernet dongle was working all the time, while there was no USB (and barely any network) activity on the target box. Even so, spikes of activity can still be seen on the target. The 5V spikes are as high as the ones on the host (roughly 150mV pp), and the ones on the 3.3V rail, although smaller, appear quite random.

Same as the previous one, but when silence (more exactly, 32-bit dither) was played. The red horizontal bar at the top of the diagrams shows the playback interval, roughly one minute. We can see the noise on the target box rails “come alive” during streaming, while there’s not much change in the noise pattern on the host. The noise amplitudes on the target are now almost the same as on the host, although this is still not a fair comparison, since there’s one USB port working on the target and two USB ports on the host. Most interestingly, the spikes on both 5V and 3.3V rails on the target seem random to me.

(You can see that Roon is streaming silence for about 5 seconds after playback is stopped. That’s the time after which the signal indicator light “turns off”.)

Same as the previous one, but when death metal is played instead of silence. (I used the title track from Fractal Generator’s Macrocosmos album. That is loud!) I was curious if the streamed data had any impact on the noise. It doesn’t. I think it’s plenty apparent here that the bits that are streamed cannot be called “music” in any shape or form; they may sometimes be more ones or more zeroes, but the physical layers involved in transmission work just the same.

In the capture above, I zoomed 10x on the time axis to get a more detailed noise waveform. The division is now 1s and the span is about 17s. The noise on the target still looks quite random to me. The big spikes appear to be more than one minute apart.

Apart from the slightly smaller amplitude of the noise on the target relative to the host - which is most probably due to one less USB port being active - I don’t really see anything that sets Diretta apart from RAAT. Let me know if I am missing something.

(I will perform another measurement of a RAAT-only box using only the onboard Ethernet port and with only one active USB, to see if it brings the noise to a level comparable to the one on the target box.)

1.1.2 Power Rails Frequency Domain

The following images show the spectrum of the power rail noise. I used DC coupling for these, to make sure I didn’t miss any low-frequency noise.

The image above shows the spectrum of the 5V rail noise between zero and about 106Hz. One horizontal division is 6.25Hz. As usual, target box is on the left; top graphs are captured during silence playback and the bottom ones during death metal. I am not seeing any significant upward spikes here, or any difference between the two boxes.

This image shows the 5V noise spectrum between about 15.5kHz and 20kHz. One horizontal division is 250Hz and the center of the frequency axis is at 17.75kHz. I’m not seeing any relevant spikes or differences here either.

This one captures the full spectrum between zero and 22.5kHz, so it fully contains the audible spectrum. One horizontal division is 1.25kHz and the center of the frequency axis is at 11.25kHz. Looks the same to me.

The following 3 images show the same thing as the previous 3, but for the 3.3V rail. Let me know if you can spot any differences between the rails or between target and host.

This concludes the power rails measurements.

1.2 DAC Output

Now comes the interesting part: measuring the actual output of the DAC! At the end of the day, this is what we are actually listening to - if of course we ignore all the artifacts of the analog chain. Again, the left side in each image is through Diretta and the right side through RAAT only.

The image above shows the output of the DAC when silence (32-bit dither) is played. This shows the DAC at its most silent; I think the zero-detect feature of the chip is muting the output.

I think this is the best proof that Diretta’s rationale for a two-box solution falls flat on its face. Every single component of the Topping D70 DAC is inside one single box, including the power supply. This means that there’s a 110V/60Hz “signal” entering the box at all times. It’s a very high voltage at a very audible frequency. Still, the only indications there’s a PSU inside are the tiny peaks at 60Hz, 120Hz and 180Hz. The highest is somewhere around -150dB! If the PSU noise is suppressed to this degree, what do you think happens with EMI/RFI noise measured in millivolts and extending into higher frequencies? The fact that everything is in one single box doesn’t seem to be of any practical consequence.
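
To put -150dB in perspective, here is a minimal sketch that converts a level in dB to a linear amplitude ratio. The 2 V RMS full-scale figure is an assumption picked for illustration only; it is not taken from the measurements above or from the D70's specifications.

```python
def db_to_amplitude_ratio(db: float) -> float:
    """Convert a level in dB to a linear amplitude (voltage) ratio."""
    return 10 ** (db / 20)


# ASSUMPTION: 2 V RMS full-scale output, a common line-level figure,
# used here only to turn the ratio into a voltage.
FULL_SCALE_VRMS = 2.0

ratio = db_to_amplitude_ratio(-150.0)
residual_vrms = FULL_SCALE_VRMS * ratio

print(f"amplitude ratio at -150 dB: {ratio:.3e}")
print(f"residual at the assumed full scale: {residual_vrms * 1e9:.1f} nV RMS")
```

Under that assumption, a -150dB peak works out to a few tens of nanovolts.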

These are the output spectra for a 1kHz sine. They look the same to me, and the THD+N values are the same for Diretta and RAAT.

This is the 2-sine intermodulation test signal. Again, I see no notable differences.

Same for the multitone test signal.

Same for the jitter test signal. All the artifacts are tucked well below -120dB in both cases.

1.3. Conclusion

According to measurements, the Diretta myth seems BUSTED.

2. Listening Tests

For these tests, I asked my two sons to listen to the two boxes - red and black - using their choice of tracks, and tell me which one they preferred, if any, and why. I told them to do rapid switches first, with the same songs and without touching the volume, to see if there were any sound quality differences, then to listen for 20 minutes or so to one album on one box and then to the same album on the other, to see if there were any “long term” differences - whatever that meant. Besides those, I let them experiment in other ways, should they choose to.

We did this 3 times. I told them I did some small tweaks and/or upgrades between rounds, so that they treat each round as a new experiment. That wasn’t a trick, as I did make changes, but to add some suspense, I’ll explain later on what those were.

I’ll refer to my 26-year-old son as “Kid A” - yes, he’s a Radiohead fan - and to my 20-year-old one as “Kid B”.

Kid A is non-conformist, to say the least, so he probably didn’t follow my instructions about how to listen. He seemed however more involved in the experiment and kept asking if he got it right and trying to read my mind, so I had to convince him there was no right or wrong answer. Not sure if that worked…

Kid B is an even bigger Radiohead fan, but he got to be player 2 because of age, and also because “A” and “B” happen to be their first name initials - more or less. I think he followed my advice about how to listen and did not come with variants.

Both of them felt some frustration with not being able to do rapid switching, and neither of them felt that the “long term” listening was relevant. Kid B did mention listening fatigue, not in the sense that one box was more fatiguing than the other, but because listening to one box for longer skewed his evaluation of the other box.

Finally, I’ll throw them under the bus and say that both of them treated this as some kind of chore, which is the main reason it took this long to finish. The actual rounds didn’t take more than one hour each.

These are the results of each round of listening.

2.1 Round 1

2.1.1 Kid A

Initially, after listening for a bit with his usual cans - a midrange Grado - he preferred the red box because it allowed him to turn up the volume more than with the black box, and he likes his music on the loud side - too loud if you ask me. He then switched to our Audeze cans and changed his preference to the black box. (He mentioned he didn’t think the change was because of the choice of cans.) He still thought he could use more volume on the red box, but that it was because the sound was tamer and the instruments less differentiated and more “equalized”, as opposed to the black box, where the instruments’ levels were more individualized and thus more separated. The guitar did “shriek” above other instruments, which was both good and bad, but overall, the black box had a bit better soundstage.

2.1.2 Kid B

Kid B did not perceive much difference between the two and said that, if he had to make a choice, he’d prefer the red box as being “clearer”. He didn’t elaborate beyond this. He used the Audeze cans for all rounds.

2.2 Round 2

2.2.1 Kid A

This time around, he didn’t find that much of a difference between the two, but he stuck to his preference for the black box, for similar reasons. He used his Grado cans for this round and the next.

2.2.2 Kid B

Same conclusion as the first time around: if anything, the red box had a small clarity edge over the black box.

2.3 Round 3

2.3.1 Kid A

This time around, he changed his preference back to the red box, but said that he didn’t really like either box and felt that both were a really big step down from the previous two rounds. The differences were more pronounced in terms of volume: the red box could again go to a higher volume than the black box, and he preferred it only because there was no other difference and the black box was over the edge in terms of loudness.

2.3.2 Kid B

Same conclusion: red box wins, but only if we really had to make a choice.

2.4 Red Box - Black Box

Before you draw any conclusions, I have to clarify what changes I made to the setup between the rounds. I made absolutely no changes to the OS or the protocol versions.

  • Round 1: red box was Diretta, black box was RAAT

  • Round 2: red box was RAAT, black box was Diretta. I literally swapped the boards between the two cases.

  • Round 3: both boxes were RAAT. I re-flashed both SD cards with AudioLinux, installed only Roon bridge, didn’t apply any tweaks, and connected them both to LAN. In other words, the two boxes were simply identical.

Note that I didn’t do this to trick anyone; I just wanted to eliminate potential biases unrelated to the sound, e.g. the appearance of the two boxes, their positions, the choice of device names in Roon etc. Since they did their own switching, it wasn’t a completely blind test.

2.5 Conclusion

If you go back to the 3 rounds and replace “red box” and “black box” with the corresponding “Diretta” and “RAAT”, as explained in section 2.4, I don’t think anyone can unequivocally say that either participant had a clear preference for Diretta or RAAT.

  • In case of Kid A, what I found most interesting is that he still found significant differences between the boxes in round 3, when they were identical. When I told him that, he didn’t believe me, so he had me measure the output level on the two boxes to see if they were indeed the same. He admitted it could have been because he used different sets of songs for each round. For me, the takeaway is clear: when you ask someone to find differences, it’s likely they will find them, and sometimes, those “differences” are not even subtle.
  • In case of Kid B, what’s interesting is that the red box won every time, no matter what it really was. Whether he liked red more than black, or the fact that “Diretta” was on the left, or whether he preferred the name “Diretta” over “Indiretta”, I can’t tell. What I can tell is that he felt like he had to make a choice, and the choice, once made, didn’t change.

According to listening tests, I can’t reject the null hypothesis, so I declare the Diretta myth BUSTED.
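
As a rough illustration of why the null hypothesis survives, the sketch below maps the round 1 and round 2 preferences to protocols using section 2.4 and runs an exact two-sided binomial test against a coin flip. Treating the four choices as independent trials is a simplifying assumption, and round 3 is left out because both boxes were RAAT.

```python
from math import comb


def binomial_two_sided_p(k: int, n: int, p: float = 0.5) -> float:
    """Exact two-sided binomial p-value: total probability of all
    outcomes that are no more likely than the observed count k."""
    pmf = [comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(n + 1)]
    observed = pmf[k]
    # Small tolerance so ties are not lost to floating-point rounding.
    return min(1.0, sum(q for q in pmf if q <= observed * (1 + 1e-9)))


# Final preference per participant and round, box colour mapped to protocol.
choices = {
    ("Kid A", 1): "RAAT",     # preferred black; black was RAAT
    ("Kid A", 2): "Diretta",  # preferred black; black was Diretta
    ("Kid B", 1): "Diretta",  # preferred red; red was Diretta
    ("Kid B", 2): "RAAT",     # preferred red; red was RAAT
}

diretta_picks = sum(1 for v in choices.values() if v == "Diretta")
p_value = binomial_two_sided_p(diretta_picks, len(choices))
print(f"Diretta preferred in {diretta_picks} of {len(choices)} trials, p = {p_value:.3f}")
```

With Diretta preferred in half of the trials, the result is exactly what chance would predict. Four trials is far too few to detect a small effect, so this only shows that the data give no support for a preference.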

This concludes my Diretta evaluation. For me, it’s busted in every aspect. And, after all this effort, I hope I’ll be allowed to express my own take on the whole experiment.

My Take

This exercise did nothing to change what I already knew about digital transports: when they work correctly and deliver unaltered bits in time, they can’t possibly make any difference in sound quality. There is no “buffer fallacy”; buffers work as intended and are your friend. Not only are they necessary to make every single digital audio solution work, they are the flywheel of digital transmissions.

It’s interesting that Diretta’s approach goes opposite to common sense. Making buffers larger actually reduces transmission frequency and the potential of the resulting EMI/RFI/XXI to affect sound. If transmissions happen seconds apart, they are already unlikely to influence quality. The ideal case would of course be to put an entire track or even album in memory and do zero transfers during playback. Instead, Diretta makes buffers as small as possible and transfers them continuously during playback, moving the system much closer to the edge in terms of stability. If you unplug the cable and the music stops [almost] immediately, you have a problem on your hands.
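
The relationship between buffer size and transfer frequency is simple arithmetic. Here is a small sketch, assuming uncompressed 16-bit/44.1kHz stereo PCM; the buffer sizes are hypothetical values picked for illustration, not Diretta's or Roon's actual settings.

```python
def seconds_per_buffer(buffer_bytes: int, sample_rate_hz: int = 44_100,
                       bits_per_sample: int = 16, channels: int = 2) -> float:
    """Playback time held by one buffer of uncompressed PCM."""
    bytes_per_second = sample_rate_hz * (bits_per_sample // 8) * channels
    return buffer_bytes / bytes_per_second


# Hypothetical buffer sizes, for illustration only.
examples = [
    ("1.5 kB (about one Ethernet frame)", 1_500),
    ("1 MB", 1_000_000),
    ("50 MB (roughly one CD-quality track)", 50_000_000),
]

for label, size in examples:
    secs = seconds_per_buffer(size)
    print(f"{label}: {secs:.4f} s of audio, {1 / secs:.4f} refills per second")
```

The smallest buffer has to be refilled more than a hundred times per second, while the largest one holds several minutes of music and needs no transfer at all during a typical track.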

I’m aware this is not going to change many minds. Some people will continue to “evolve” Diretta, others will continue to push it as a “groundbreaking” technology, and others will continue to hear the angels sing when listening to it. But for the ones who are willing to listen, I’ll say this: if you find yourself tweaking the digital side of your system to squeeze more quality out of a bit-perfect playback, what you’re trying to do is fix the hole in your roof by controlling the weather to make it rain less. If you think you have a hole in your roof, just plug it: get a DAC that is immune to input noise. It won’t break the bank. When you let computers, networks and DACs do their intended jobs, the only job left for you is to enjoy the music. Happy holidays!

Let me know if you have any comments or measuring suggestions.

Looks excellent, I’ve no idea if the tests are relevant (not being rude, I’m not technically capable of assessing what you’ve done) - but this is genuinely interesting

My question is, have you busted the theory of the (layman’s terms) smoooooth data flow and low CPU/power demand and/or that it’s doing anything at all?

If you have, why’s it sound better? Ho Ho - sorry couldn’t resist :grinning_face:

But seriously, as with David’s thread that spawned this one, can we keep this to experiential comments, not Me! Me! Psycho Bias Cognitive Acoustics or equally unhelpful ‘of course that wouldn’t work’? There are pros and cons to both positions (e.g. biases can be +ve and -ve and some prefer vinyl over digital)

i.e. if you actually tried it (I have) and tested it (Marian’s done both) then your comments should help move the game on (or whatever the forum guidelines say…)

That’s a tour de force! Thank you for running such a comprehensive test and publishing the results - much appreciated, for the rigour irrespective of the outcome.

My interpretation is:

@Marian didn’t report measurements of CPU load or power demand (I take it they weren’t measured) but these are supposed to induce noise in the power rails. The measurements showed no qualitative or quantitative difference in power rail noise.

The tests didn’t check smoothness of the data flow directly, but the rationale for smoothing the data flow was to reduce power rail noise, which wasn’t observed.

Is it possible to do that directly?

I realise sometimes we have to measure consequences in order to infer causes

Yes, I think it would be possible - for example by instrumenting the Ethernet driver or network stack so that the arrival time of in-bound frames could be collected. One could then work out a way to extract those arrival times and plot how the difference between arrival times is distributed. With Diretta one would expect to see a narrow distribution, without Diretta it should be broader. It might even be possible to do that on the Raspberry Pi with existing Linux software development (profiling?) tools.
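
Once arrival timestamps have been collected (for example from a packet capture on the target), the analysis itself is short. The sketch below is only an illustration of that step: the two timestamp series are made up to represent a perfectly steady flow and a bursty one with the same average rate, and the function name is my own.

```python
import statistics


def inter_arrival_stats(arrival_times_s: list[float]) -> dict[str, float]:
    """Summarise the gaps between consecutive frame arrival times."""
    deltas = [b - a for a, b in zip(arrival_times_s, arrival_times_s[1:])]
    return {
        "mean_ms": statistics.mean(deltas) * 1e3,
        "stdev_ms": statistics.pstdev(deltas) * 1e3,
        "min_ms": min(deltas) * 1e3,
        "max_ms": max(deltas) * 1e3,
    }


# MADE-UP DATA: 1000 frames, one every millisecond.
steady = [i * 0.001 for i in range(1000)]

# MADE-UP DATA: 1000 frames in bursts of 10 (0.1 ms apart), one burst every 10 ms.
bursty = [burst * 0.010 + j * 0.0001 for burst in range(100) for j in range(10)]

for name, times in (("steady", steady), ("bursty", bursty)):
    stats = inter_arrival_stats(times)
    print(name, {key: round(value, 3) for key, value in stats.items()})
```

Both series have nearly the same mean gap, so the spread (standard deviation, or a histogram of the gaps) is what would separate a narrow distribution from a broad one.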

Could you try and analyse a dynamic signal as well? Break it down into some FFT graphs as an example. I don’t have sufficient knowledge to advise on specific tests though.

And, my comment is directed at the results in the section 1.2 primarily.

(and to @david_snyder)

I thought there were some existing charts that claimed to show Diretta’s regular and steady flow of data - or were they just made up for illustrating the theory?

Yes, I think so. But it would be interesting to work out how this might be verified or disproved independently.

It is true that these tests were not done with signals that are considered to be musical. However the original hypothesis presented by @David_Snyder was that the improvements from diretta came from the different power draw caused by the different network activity pattern in diretta target device.

These tests are perfectly good for confirming or refuting that hypothesis since the workload patterns and behaviour of the diretta host, diretta target and even a normal Roon ready endpoint do not change according to the content of the transported signal. That only has significance at and beyond the DAC. From the digital transport’s point of view, streaming silence or a tone or multiple tones at 16bit 44.1kS/s is exactly the same as streaming any kind of music (including many genres that I don’t consider to be music at all although many other people do :smiley:) at that same 16bit 44.1kS/s.
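
A quick way to see this is to pack one second of silence and one second of full-scale random samples as 16-bit PCM and compare the sizes. This sketch assumes uncompressed stereo PCM on the wire, which is my assumption for illustration; a transport that compresses the stream would behave differently.

```python
import random
import struct


def pcm16_payload(samples: list[int]) -> bytes:
    """Pack signed 16-bit samples as little-endian PCM bytes."""
    return struct.pack(f"<{len(samples)}h", *samples)


# One second at 44.1 kS/s, two channels (stereo is an assumption).
N = 44_100 * 2

silence = [0] * N
rng = random.Random(0)  # fixed seed so the example is repeatable
loud = [rng.randint(-32768, 32767) for _ in range(N)]

silence_bytes = pcm16_payload(silence)
loud_bytes = pcm16_payload(loud)

print(len(silence_bytes), len(loud_bytes))
```

Both payloads have the same length, so the transport moves the same number of bytes per second whatever the samples contain.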

When, as in this setup, the DAC is separate from the streamer, the content of the data being transported is totally irrelevant as far as the behaviour and performance of the streamer are concerned. Hopefully, the same is true of streamers with integrated DACs and even pre and power amps as it should be with a good system architecture.

If you want to propose a different hypothesis to the one presented by @David_Snyder, then you have to account for the difference in apparent sound given that, in both cases the digital data transmitted over the USB cable to the DAC is the same, to that same DAC and in essentially the same environment.

If you just want to go on believing that diretta sounds better and don’t need to know why (or even if the difference is real) then that is fine - right up to the point where you start suggesting that other people should start spending their own time and money duplicating it. At that point, evidence beyond the subjective ‘I can hear the improvement’ is required.

Measurements are one form of such evidence. Records of suitable blind, controlled listening tests are another although they tend to prove the existence of differences rather than the existence of improvements.

For my part, evidence of a difference may be enough to pique my interest. Evidence of an improvement provided by measurements definitely will. A subjective opinion will not.

Another test that would be interesting to see the results from would be to record the analogue outputs of the DAC from

a) Roon > RAAT > DAC
b) Roon > RAAT > Diretta Host > Diretta target > DAC

invert one and overlay, null the difference and see what remains. This could highlight if an audible difference occurs or not.

Yes, I plan to do that, although, as explained on the other thread, I’m not expecting much success due to long-term (as in second or minute) clock drift in the ADC. It’s possible the clock is mighty stable, so I’ll try with a sine wave first and if that looks good enough, with music - and maybe even with a truly blind “offline” ABX test.
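
To show why clock drift matters for a null test, here is a minimal sketch that nulls a 1kHz sine against a copy captured with a slightly different clock. The drift values are hypothetical, and the model (a constant rate offset between two captures) is a simplification of what a real ADC would do.

```python
import math


def rms(x: list[float]) -> float:
    return math.sqrt(sum(v * v for v in x) / len(x))


def null_depth_db(reference: list[float], other: list[float]) -> float:
    """Level of (reference - other) relative to reference, in dB."""
    residual_rms = rms([a - b for a, b in zip(reference, other)])
    if residual_rms == 0:
        return float("-inf")  # perfect null
    return 20 * math.log10(residual_rms / rms(reference))


def sine(freq_hz: float, sample_rate_hz: float, seconds: float,
         drift_ppm: float = 0.0) -> list[float]:
    """Sine sampled by a clock running drift_ppm faster than nominal."""
    actual_rate = sample_rate_hz * (1 + drift_ppm * 1e-6)
    n = int(sample_rate_hz * seconds)
    return [math.sin(2 * math.pi * freq_hz * i / actual_rate) for i in range(n)]


reference = sine(1000, 48_000, 2.0)
for ppm in (0.0, 1.0, 10.0):  # hypothetical drift values
    capture = sine(1000, 48_000, 2.0, drift_ppm=ppm)
    print(f"{ppm:>4} ppm drift: null depth {null_depth_db(reference, capture):.1f} dB")
```

In this idealized case, even 1 ppm of drift over two seconds limits the null to somewhere in the low -40s dB, which is far worse than the noise floors in section 1.2. That is why the captures would need resampling or drift correction before a null says anything about the transport.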

I don’t think (hypothesis) that the Diretta calmness and resolution can be more than partly explained by the power consumption of the Host/Target.
I think (as stated elsewhere) that the benefits are to be found in the time domain, primarily.

And therein lies the crux, it is difficult to measure in a reliable way.

I understand, and respect, your position. I simply don’t share it, and have no need or obligation to convince anyone. I rather just listen to some more wonderful tunes! :slight_smile:

Excellent stuff! I’m developing an audio myths website (audiomyths.info) as a side project (there’s even an AI chatbot in the works, compliments of my kid). I’ll be adding a reference to this to the streamers-sound-different posts soon!

The multi-sine wave is quite dynamic and is approximating music much better than one or two-sine waves. If you look at a portion of the waveform in Audacity, you wouldn’t be able to tell it’s a test signal or a recording:

All the measurements in section 1.2 are FFT graphs. Do you have anything specific in mind?

Can you kindly annotate with your ADC make/model and other test gear for completeness and repeatability?

Sorry, Mark, but I just took a look at your site, and you are hardly myth busting when you dismiss all arguments by asserting that “a digital file is not affected by any noise if it arrives bit-perfect at the DAC.”

No one is saying that the file itself has changed; only that there are electrical, ground plane, timing and other anomalies tagging along which do impact the DAC’s performance.

The statement stands, though: the file contents are not affected. Note that I cover the other possibilities that you mention.

Sorry, of course they are FFT’s! I’ll try and get some input from an experienced electronics engineer, with suggestions.

OK. The following argument applies to USB connected DACs since that is the setup that @David_Snyder originally described. SPDIF (and similar source clocked interfaces) connected DACs will often behave somewhat differently. Whether these differences are audible is again up for debate. I have no opinion never having used an SPDIF connected DAC.

But, with USB, the time domain aspect is managed solely by the DAC. With UAC2 and asynchronous audio, the USB host clocks and the DAC sample clock are not directly tied together. The USB host sends packets at regular intervals. The samples within each packet arrive at the bus data rate, which is much higher than the rate required by the DAC, so they are placed in a buffer that decouples the USB transfer rate from the DAC sample rate. The number of samples in those packets is adjusted (by means of a feedback channel on the USB bus in the reverse direction) to maintain the correct average sample rate demanded by the DAC and determined by the DAC’s sample clock. The nominal number of samples in the packet is determined by comparing the packet frequency (either 1000 or 8000 per second for USB isochronous channels) with the sample rate. The average number of samples per packet is not likely to be an integer - at least for PCM.

For example, 44.1kS/s divided by 1000 packets per second means that, on average, 44.1 samples will need to be sent in each packet. Since it is not possible to send .1 samples, the nominal number of samples would, I imagine, be set to 44 which would result in a shortfall in samples over time. Thus a feedback channel is used to signal to the USB host that, occasionally (1 in 10), a packet with 45 samples is required. (Note: the number of samples in a packet is always nominal, nominal + 1 or nominal - 1)
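
The arithmetic can be sketched with an integer accumulator that carries the fractional remainder from one packet to the next. This only illustrates the numbers; in a real asynchronous setup the cadence follows the DAC's feedback rather than a fixed schedule, and the function name is my own.

```python
def samples_per_packet(sample_rate_hz: int, packets_per_second: int,
                       n_packets: int) -> list[int]:
    """Spread samples over packets so the long-run average equals
    sample_rate_hz / packets_per_second, using whole samples only."""
    sizes = []
    remainder = 0
    for _ in range(n_packets):
        remainder += sample_rate_hz
        size, remainder = divmod(remainder, packets_per_second)
        sizes.append(size)
    return sizes


one_second = samples_per_packet(44_100, 1000, 1000)
print(one_second[:10])   # first ten packets
print(sum(one_second))   # total samples sent in one second
```

For 44.1kS/s at 1000 packets per second this yields nine packets of 44 samples followed by one of 45, repeating, for a total of exactly 44100 samples per second.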

The use of this feedback channel can also be used to accommodate small relative errors in the clocks on the USB host and the sample clock on the DAC. If the DAC sample clock is slightly slower than the USB host expects, the feedback channel will be used to adjust the average sample rate by reducing the number of nominal +1 packets sent (or sending some nominal - 1 packets). Similarly, if the DAC sample clock is slightly faster than the USB host expects, the number of nominal + 1 packets is increased (or the number of nominal - 1 packets is decreased). In this way the number of samples in the buffer can be managed so that buffer overruns and buffer underruns never occur and there is always a sample available for the DAC to use when its sample clock demands a sample be presented to the digital to analogue converter. Thus the time domain aspects of the DAC are controlled solely by the DAC and are independent of the USB host or the USB bus itself.
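
Here is an idealized simulation of that mechanism, assuming a DAC clock that runs 100 ppm fast (a hypothetical figure). With feedback the host sends at the rate the DAC actually consumes; without it, the host sticks to the nominal rate. Real feedback is reported periodically and with finite precision, which this sketch ignores.

```python
def simulate_buffer(dac_rate_hz: float, n_packets: int, use_feedback: bool,
                    nominal_rate_hz: float = 44_100.0,
                    packets_per_second: int = 1000) -> list[float]:
    """Buffer fill in samples, relative to the start, after each packet interval."""
    # Rate the host sends at: what the DAC asks for, or the nominal rate.
    send_rate = dac_rate_hz if use_feedback else nominal_rate_hz
    sent_per_packet = send_rate / packets_per_second
    consumed_per_packet = dac_rate_hz / packets_per_second

    fill, carry, levels = 0.0, 0.0, []
    for _ in range(n_packets):
        carry += sent_per_packet
        whole = int(carry)   # only whole samples fit in a packet
        carry -= whole
        fill += whole - consumed_per_packet
        levels.append(fill)
    return levels


dac_rate = 44_100.0 * (1 + 100e-6)   # HYPOTHETICAL: DAC clock 100 ppm fast
with_feedback = simulate_buffer(dac_rate, 60_000, use_feedback=True)
without_feedback = simulate_buffer(dac_rate, 60_000, use_feedback=False)

print(f"with feedback, worst deviation: {max(abs(x) for x in with_feedback):.2f} samples")
print(f"without feedback, after 60 s:   {without_feedback[-1]:.1f} samples")
```

With feedback the fill level never moves by more than one sample, while without it the buffer drains steadily and would eventually underrun. Either way, the sample clock that paces the conversion sits in the DAC.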

Thus, if you are suggesting that a time domain difference is observed, I would have to ask how the use of diretta can affect the sample clock in a USB connected DAC - even when the timing of packets (and their average size) over USB is not changed. diretta manages the flow of data over Ethernet to the diretta target but it does nothing to alter the traffic from the diretta target to the DAC over USB.
