Diretta Measurements and Listening Tests

I am starting a new thread to discuss the measurements and [single-]blind A/B listening tests of the Diretta protocol, as a follow-up to the original thread:

I used to think the DAC was 90% of the sound. This sub-$300 DIY project proved me wrong - Tinkering - Roon Labs Community

I divided it into sections, so that they can be easily referenced in the discussions. There will be two main sections: measurements and listening tests. This post presents the former, with the latter to come in a later post when completed.

I would like this to be a fact-based, technical discussion of Diretta. If you are interested in a purely subjective take on the subject, please see the original thread.

0. Setup

This is the diagram of the setup used. It is based on @David_Snyder’s instructions, which you will find in the original thread.

The host box on the left is connected to LAN through a USB Ethernet dongle and is visible to Roon server through RAAT protocol. The dongle is necessary to create an additional Ethernet port on the host, as the onboard port is used to communicate with the target box on the right through Diretta protocol.

In a normal, Diretta-only setup, the red path is missing and the DAC connects to one of the USB 2 ports on the target box (the “Diretta ON” connection in the diagram). The signal path in this case is:

Roon → [RAAT] → host → [Diretta] → target → USB DAC.

The red path was used to make measurements with the Diretta protocol taken out of the picture. In that case, the DAC connects to one of the USB 2 ports on the host box (the “Diretta OFF” connection in the diagram), and the signal path is the usual Roon playback path:

Roon → [RAAT] → host → USB DAC.

(There was no USB switch used in the setup; switching was done by simply unplugging the DAC from one of the USB ports and plugging it into the other.)

The DAC used is a Topping D70.

1. Measurements

This section is further divided into two parts: power rail measurements (for both the 5V and 3.3V rails) and DAC output measurements.

In all images, the diagrams on the left show the target box - i.e. with Diretta - and the ones on the right the host box - i.e. with RAAT only.

1.1 Power Rails

1.1.1 Power Rails Time Domain

These diagrams show the waveform of the power rails (yellow for 5V, blue for 3.3V) over a window of 170-or-so seconds at a time. I used AC coupling here so I could amplify both rails and put them on the same screen. One time division is 10s; one voltage division is 50mV.

The diagram above was captured when nothing was playing. Diretta is definitely quieter here, with a noise amplitude roughly half that of the host on both rails. This is not exactly a fair comparison, though, as the host box’s USB3 port connected to the Ethernet dongle was active all the time, while there was no USB (and barely any network) activity on the target box. Even so, spikes of activity can still be seen on the target. The 5V spikes are as high as the ones on the host (roughly 150mV peak-to-peak), and the ones on the 3.3V rail, although smaller, appear quite random.

Same as the previous capture, but with silence (more precisely, 32-bit dither) playing. The red horizontal bar at the top of the diagrams shows the playback interval, roughly one minute. We can see the noise on the target box rails “come alive” during streaming, while there’s not much change in the noise pattern on the host. The noise amplitudes on the target are now almost the same as on the host, although this is still not a fair comparison, since one USB port is active on the target and two on the host. Most interestingly, the spikes on both the 5V and 3.3V rails on the target seem random to me.

(You can see that Roon is streaming silence for about 5 seconds after playback is stopped. That’s the time after which the signal indicator light “turns off”.)

Same as the previous capture, but with death metal playing instead of silence. (I used the title track from Fractal Generator’s Macrocosmos album. That is loud!) I was curious whether the streamed data had any impact on the noise. It doesn’t. I think it’s plenty apparent here that the bits being streamed cannot be called “music” in any shape or form; the data may at times contain more ones or more zeroes, but the physical layers involved in transmission work just the same.

In the capture above, I zoomed 10x on the time axis to get a more detailed noise waveform. The division is now 1s and the span is about 17s. The noise on the target still looks quite random to me. The big spikes appear to occur more than a minute apart.

Apart from the slightly smaller amplitude of the noise on the target relative to the host - which is most probably due to one less USB port being active - I don’t really see anything that sets Diretta apart from RAAT. Let me know if I am missing something.

(I will perform another measurement of a RAAT-only box using only the onboard Ethernet port and with only one active USB, to see if it brings the noise to a level comparable to the one on the target box.)

1.1.2 Power Rails Frequency Domain

The following images show the spectrum of the power rail noise. I used DC coupling for these, to make sure I didn’t miss any low-frequency noise.

The image above shows the spectrum of the 5V rail noise between zero and about 106Hz. One horizontal division is 6.25Hz. As usual, the target box is on the left; the top graphs were captured during silence playback and the bottom ones during death metal. I am not seeing any significant upward spikes here, or any difference between the two boxes.

This image shows the 5V noise spectrum between about 15.5kHz and 20kHz. One horizontal division is 250Hz and the center of the frequency axis is at 17.75kHz. I’m not seeing any relevant spikes or differences here either.

This one captures the full spectrum between zero and 22.5kHz, so it fully contains the audible spectrum. One horizontal division is 1.25kHz and the center of the frequency axis is at 11.25kHz. Looks the same to me.

The following 3 images show the same thing as the previous 3, but for the 3.3V rail. Let me know if you can spot any differences between the rails or between target and host.

This concludes the power rails measurements.

1.2 DAC Output

Now comes the interesting part: measuring the actual output of the DAC! At the end of the day, this is what we are actually listening to - if of course we ignore all the artifacts of the analog chain. Again, the left side in each image is through Diretta and the right side through RAAT only.

The image above shows the output of the DAC when silence (32-bit dither) is played. This shows the DAC at its most silent; I think the zero-detect feature of the chip is muting the output.

I think this is the best proof that Diretta’s rationale for a two-box solution falls flat on its face. Every single component of the Topping D70 DAC is in one single box, including the power supply. This means that there’s a 110V/60Hz “signal” entering the box at all times. It’s a very high voltage at a very audible frequency. Still, the only indications that there’s a PSU inside are the tiny peaks at 60Hz, 120Hz and 180Hz. The highest is somewhere around -150dB! If the PSU noise is suppressed to this degree, what do you think happens to EMI/RFI noise measured in millivolts and extending into higher frequencies? The fact that everything is in one single box doesn’t seem to be of any practical consequence.
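To put figures like -150dB and -120dB in perspective: spectrum levels follow the usual 20·log10 amplitude convention, so every 20dB is a factor of 10 in voltage. A minimal sketch of the conversion; the 1V reference in the comment is a hypothetical example, since the reference level of the plots isn’t stated.

```python
import math

def db_to_ratio(db: float) -> float:
    """Convert a level in dB to an amplitude (voltage) ratio, 20*log10 convention."""
    return 10 ** (db / 20)

def ratio_to_db(ratio: float) -> float:
    """Inverse conversion: amplitude ratio to dB."""
    return 20 * math.log10(ratio)

# Hypothetical reference: if 0 dB corresponded to 1 V, a -150 dB spur would be
# about 3.2e-8 V (roughly 32 nV), and -120 dB would be 1e-6 V (1 uV).
spur = db_to_ratio(-150)
floor = db_to_ratio(-120)
```

In other words, -150dB is more than 30 million times smaller than the reference level.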

These are the output spectra for a 1kHz sine. They look the same to me, and the THD+N figures are the same for Diretta and RAAT.
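For readers unfamiliar with THD+N: it is essentially everything in the output that isn’t the test tone, relative to the whole signal. Below is a simplified pure-Python sketch. It assumes coherent sampling (a whole number of cycles in the capture) and applies no windowing or band-limiting, both of which real analyzer software normally handles. It only illustrates the idea and is not how the plots above were produced.

```python
import math

def thd_n_db(x, fs, f0):
    """Rough THD+N in dB: everything except DC and the f0 tone, relative to the total.

    Assumes coherent sampling (a whole number of f0 cycles in x), so projecting
    onto a sine/cosine pair at f0 is an exact least-squares fit of the tone.
    No windowing or band-limiting is applied.
    """
    n = len(x)
    mean = sum(x) / n
    x = [v - mean for v in x]                    # remove DC
    w = 2.0 * math.pi * f0 / fs
    c = 2.0 / n * sum(v * math.cos(w * i) for i, v in enumerate(x))
    s = 2.0 / n * sum(v * math.sin(w * i) for i, v in enumerate(x))
    # Subtract the fitted fundamental; what remains is distortion plus noise
    resid = [v - (c * math.cos(w * i) + s * math.sin(w * i)) for i, v in enumerate(x)]
    total_rms = math.sqrt(sum(v * v for v in x) / n)
    resid_rms = math.sqrt(sum(r * r for r in resid) / n)
    return 20.0 * math.log10(resid_rms / total_rms)

# Synthetic check: a 1 kHz tone with a 3rd harmonic 60 dB down
fs, f0 = 48_000, 1_000
tone = [math.sin(2 * math.pi * f0 * i / fs) + 1e-3 * math.sin(2 * math.pi * 3 * f0 * i / fs)
        for i in range(fs)]
```

Calling `thd_n_db(tone, fs, f0)` on this synthetic tone gives roughly -60dB, as expected for a harmonic at one thousandth of the fundamental’s amplitude.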

This is the 2-sine intermodulation test signal. Again, I see no notable differences.
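The post doesn’t say which two-tone signal was used. As a hypothetical example, a CCIF-style pair of 19kHz and 20kHz would reveal nonlinearity as intermodulation products at predictable frequencies. This sketch lists the second- and third-order products that fall below Nyquist, i.e. where to look for spurs in such a spectrum:

```python
def imd_products(f1, f2, max_order=3, nyquist=None):
    """List (order, frequency) of intermodulation products m*f1 + k*f2.

    Only mixed terms (m and k both non-zero) of order up to max_order are kept,
    so harmonics of a single tone are excluded.
    """
    found = set()
    for m in range(-max_order, max_order + 1):
        for k in range(-max_order, max_order + 1):
            order = abs(m) + abs(k)
            if m == 0 or k == 0 or order > max_order:
                continue
            f = m * f1 + k * f2
            if f > 0 and (nyquist is None or f < nyquist):
                found.add((order, f))
    return sorted(found)

# CCIF-style twin tone at a 48 kHz sample rate (assumed, for illustration)
products = imd_products(19_000, 20_000, max_order=3, nyquist=24_000)
```

For this pair, the difference tone lands at 1kHz and the third-order products at 18kHz and 21kHz.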

Same for the multitone test signal.
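A multitone signal can be approximated as a sum of sines at log-spaced frequencies with random phases. The result is a noise-like, more music-like waveform with a higher crest factor than a single sine. A sketch follows; the tone count, spacing, seed and peak level are arbitrary choices, not the parameters of the signal actually used here.

```python
import math
import random

def multitone(fs, n, freqs, peak=0.5, seed=0):
    """Sum of equal-amplitude sines with random phases, scaled to the given peak."""
    rng = random.Random(seed)
    phases = [rng.uniform(0.0, 2.0 * math.pi) for _ in freqs]
    x = [sum(math.sin(2.0 * math.pi * f * i / fs + p) for f, p in zip(freqs, phases))
         for i in range(n)]
    scale = peak / max(abs(v) for v in x)
    return [v * scale for v in x]

def crest_factor(x):
    """Peak-to-RMS ratio; sqrt(2) (about 1.414) for a pure sine."""
    peak = max(abs(v) for v in x)
    rms = math.sqrt(sum(v * v for v in x) / len(x))
    return peak / rms

# 32 log-spaced tones from 20 Hz to 20 kHz (hypothetical choice)
freqs = [20.0 * 1000.0 ** (k / 31) for k in range(32)]
signal = multitone(48_000, 4_800, freqs)
```

Comparing `crest_factor(signal)` with the crest factor of a plain sine shows how much more “peaky” the multitone waveform is.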

Same for the jitter test signal. All the artifacts are tucked well below -120dB in both cases.
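The post doesn’t specify the jitter signal either. A common choice is Julian Dunn’s J-Test, usually described as an Fs/4 tone at -3dBFS plus a square wave at Fs/192 toggling the least significant bit (11.025kHz and about 229.7Hz at 44.1kHz). Here is a J-Test-style sketch under that description. It is not guaranteed to be bit-identical to the signal used in these measurements.

```python
def jtest(fs=44_100, n=44_100, bits=16, level_db=-3.0):
    """J-Test-style integer PCM: an Fs/4 tone plus an LSB square wave at Fs/192."""
    full_scale = 2 ** (bits - 1) - 1
    amp = round(full_scale * 10 ** (level_db / 20))
    tone = (0, amp, 0, -amp)            # sine at Fs/4 sampled from phase 0
    samples = []
    for i in range(n):
        lsb = 1 if i % 192 < 96 else 0  # high for 96 samples, low for 96
        samples.append((tone[i % 4] & ~1) | lsb)  # force bit 0 to the square-wave state
    return samples

fs = 44_100
tone_hz, square_hz = fs / 4, fs / 192   # 11025.0 and 229.6875
x = jtest(fs)
```

Jitter that correlates with the data pattern would appear as sidebands around the 11.025kHz tone, at offsets related to the 229.7Hz square wave.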

1.3. Conclusion

According to measurements, the Diretta myth seems BUSTED.

Let me know if you have any comments or measuring suggestions, and stand by for the listening test results.


Looks excellent. I’ve no idea if the tests are relevant (not being rude, I’m not technically capable of assessing what you’ve done), but this is genuinely interesting.

My question is, have you busted the theory of the (layman’s terms) smoooooth data flow and low CPU/power demand and/or that it’s doing anything at all?

If you have, why’s it sound better? Ho Ho - sorry couldn’t resist :grinning_face:

But seriously, as with David’s thread that spawned this one, can we keep this to experiential comments, not Me! Me! Psycho Bias Cognitive Acoustics or an equally unhelpful ‘of course that wouldn’t work’? There are pros and cons to both positions (e.g. biases can be +ve and -ve, and some prefer vinyl over digital).

In other words, if you actually tried it (I have) and tested it (Marian’s done both), then your comments should help move the game on (or whatever the forum guidelines say…)


That’s a tour de force! Thank you for running such a comprehensive test and publishing the results - much appreciated, for the rigour irrespective of the outcome.


My interpretation is:

@Marian didn’t report measurements of CPU load or power demand (I take it they weren’t measured) but these are supposed to induce noise in the power rails. The measurements showed no qualitative or quantitative difference in power rail noise.

The tests didn’t check smoothness of the data flow directly, but the rationale for smoothing the data flow was to reduce power rail noise, which wasn’t observed.


Is it possible to do that directly?

I realise sometimes we have to measure consequences in order to infer causes.

Yes, I think it would be possible - for example by instrumenting the Ethernet driver or network stack so that the arrival time of in-bound frames could be collected. One could then work out a way to extract those arrival times and plot how the difference between arrival times is distributed. With Diretta one would expect to see a narrow distribution, without Diretta it should be broader. It might even be possible to do that on the Raspberry Pi with existing Linux software development (profiling?) tools.
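Once arrival timestamps have been collected (for example from a packet capture on the receiving box), summarising the intervals between them takes only a few lines. A minimal sketch, assuming timestamps in seconds; the “steady” and “jittery” data below are made up for illustration. A narrow distribution shows up as a small standard deviation and a 99th percentile close to the mean.

```python
import random
import statistics

def inter_arrival_stats(timestamps):
    """Summarise gaps between consecutive frame arrival times (seconds)."""
    gaps = [later - earlier for earlier, later in zip(timestamps, timestamps[1:])]
    return {
        "mean": statistics.fmean(gaps),
        "stdev": statistics.pstdev(gaps),
        "p99": statistics.quantiles(gaps, n=100)[98],   # 99th percentile
    }

# Hypothetical data: perfectly paced frames vs. frames with up to 0.5 ms of random delay
steady = [i * 0.001 for i in range(1000)]
rng = random.Random(1)
jittery = [t + rng.uniform(0.0, 0.0005) for t in steady]
```

Comparing `inter_arrival_stats(...)` for captures taken with and without Diretta would give a direct, if simplified, view of how evenly the data actually arrives.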

Could you try and analyse a dynamic signal as well? Break it down into some FFT graphs as an example. I don’t have sufficient knowledge to advise on specific tests though.

My comment is directed primarily at the results in section 1.2.

(and to @david_snyder)

I thought there were some existing charts that claimed to show Diretta’s regular and steady flow of data - or were they just made up for illustrating the theory?

Yes, I think so. But it would be interesting to work out how this might be verified or disproved independently.


It is true that these tests were not done with signals that are considered to be musical. However the original hypothesis presented by @David_Snyder was that the improvements from diretta came from the different power draw caused by the different network activity pattern in diretta target device.

These tests are perfectly good for confirming or refuting that hypothesis since the workload patterns and behaviour of the diretta host, diretta target and even a normal Roon Ready endpoint do not change according to the content of the transported signal. That only has significance at and beyond the DAC. From the digital transport’s point of view, streaming silence or a tone or multiple tones at 16bit 44.1kS/s is exactly the same as streaming any kind of music (including many genres that I don’t consider to be music at all although many other people do :smiley:) at that same 16bit 44.1kS/s.
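The content-independence of the workload can be checked with simple arithmetic. For uncompressed 16-bit/44.1kHz stereo PCM, one second of audio is always 176,400 bytes (1,411,200 bit/s), whether it is silence or full-scale noise. A small sketch, assuming interleaved little-endian 16-bit samples as the in-memory layout. That layout is an assumption and says nothing about how RAAT or Diretta frame the data on the wire.

```python
import random
import struct

def pcm_bytes(frames, channels=2, bits=16):
    """Bytes needed for uncompressed PCM: depends only on format and duration."""
    return frames * channels * bits // 8

fs = 44_100
n = fs * 2                                              # 1 s of interleaved stereo samples
silence = [0] * n                                       # digital silence
rng = random.Random(0)
loud = [rng.randint(-32768, 32767) for _ in range(n)]   # full-scale noise, the "loudest" content
silence_bytes = struct.pack(f"<{n}h", *silence)         # 16-bit little-endian
loud_bytes = struct.pack(f"<{n}h", *loud)
```

Both buffers have exactly the same length, so the amount of data to move per unit of time is identical.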

When, as in this setup, the DAC is separate from the streamer, the content of the data being transported is totally irrelevant as far as the behaviour and performance of the streamer are concerned. Hopefully, the same is true of streamers with integrated DACs and even pre and power amps as it should be with a good system architecture.

If you want to propose a different hypothesis to the one presented by @David_Snyder, then you have to account for the difference in apparent sound given that, in both cases, the digital data transmitted over the USB cable is the same, going to the same DAC, in essentially the same environment.

If you just want to go on believing that diretta sounds better and don’t need to know why (or even whether the difference is real), then that is fine, right up to the point where you start suggesting that other people should spend their own time and money duplicating it. At that point, evidence beyond the subjective ‘I can hear the improvement’ is required.

Measurements are one form of such evidence. Records of suitable blind, controlled listening tests are another although they tend to prove the existence of differences rather than the existence of improvements.

For my part, evidence of a difference may be enough to pique my interest. Evidence of an improvement provided by measurements definitely will. A subjective opinion will not.


Another test that would be interesting to see the results from would be to record the analogue outputs of the DAC from

a) Roon > RAAT > DAC
b) Roon > RAAT > Diretta Host > Diretta target > DAC

invert one and overlay, null the difference and see what remains. This could highlight if an audible difference occurs or not.
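A null test like this can be sketched in a few lines: align the two recordings, subtract one from the other (equivalent to inverting one and summing), and express what remains relative to the original. This sketch uses brute-force integer-sample alignment by cross-correlation. It also adds a least-squares gain match, which was not part of the proposal, so that a small level difference doesn’t dominate the residual. It does not correct sub-sample offsets or clock drift, so on real recordings those would limit how deep the null can get.

```python
import math

def best_lag(a, b, max_lag):
    """Integer lag (samples) that best aligns b with a, by brute-force cross-correlation."""
    n = min(len(a), len(b))
    best, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = sum(a[i] * b[i + lag] for i in range(max_lag, n - max_lag))
        if score > best_score:
            best, best_score = lag, score
    return best

def null_depth_db(a, b, max_lag=50):
    """Align b to a, match gain, subtract ("invert and overlay"); return (lag, residual dB)."""
    lag = best_lag(a, b, max_lag)
    idx = range(max_lag, min(len(a), len(b)) - max_lag)
    ref = [a[i] for i in idx]
    aligned = [b[i + lag] for i in idx]
    # Least-squares gain so a small level mismatch doesn't dominate the residual
    gain = sum(r * s for r, s in zip(ref, aligned)) / sum(s * s for s in aligned)
    resid = [r - gain * s for r, s in zip(ref, aligned)]
    ref_rms = math.sqrt(sum(r * r for r in ref) / len(ref))
    res_rms = math.sqrt(sum(e * e for e in resid) / len(resid))
    if res_rms == 0.0:
        return lag, float("-inf")
    return lag, 20 * math.log10(res_rms / ref_rms)
```

Given two captures as lists of float samples at the same rate, a null depth down near the ADC’s own noise floor would suggest no measurable difference between the two paths.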

Yes, I plan to do that, although, as explained on the other thread, I’m not expecting much success due to long-term (on the order of seconds or minutes) clock drift in the ADC. It’s possible the clock is mighty stable, so I’ll try with a sine wave first and, if that looks good enough, with music, and maybe even with a truly blind “offline” ABX test.
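The sine-wave check can also quantify the drift. Estimate the recorded tone’s frequency precisely and express its offset from the nominal value in ppm; this captures the combined DAC-versus-ADC clock mismatch, which is what matters for a null. For scale, a hypothetical 50ppm mismatch accumulates 3ms of misalignment over 60s, i.e. 144 samples at 48kHz. A sketch, assuming a clean tone (a real capture may need filtering first), using interpolated zero crossings:

```python
import math

def sine_frequency(x, fs):
    """Estimate a clean sine's frequency from interpolated positive-going zero crossings."""
    crossings = []
    for i in range(len(x) - 1):
        if x[i] <= 0.0 < x[i + 1]:
            # Linear interpolation between the two samples straddling zero
            crossings.append(i + (-x[i]) / (x[i + 1] - x[i]))
    if len(crossings) < 2:
        raise ValueError("need at least two positive-going zero crossings")
    cycles = len(crossings) - 1
    return cycles * fs / (crossings[-1] - crossings[0])

def drift_ppm(x, fs, nominal_hz):
    """Apparent frequency offset of the recorded tone, in parts per million."""
    return (sine_frequency(x, fs) - nominal_hz) / nominal_hz * 1e6
```

Repeating the measurement on several captures spread over time would show whether the offset is stable (and can be corrected by resampling) or wanders.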


I don’t think (hypothesis) that Diretta’s calmness and resolution can be explained, other than partly, by the power consumption of the host/target.
I think (as stated elsewhere) that the benefits are to be found primarily in the time domain.

And therein lies the crux, it is difficult to measure in a reliable way.

I understand, and respect, your position. I simply don’t share it, and have no need or obligation to convince anyone. I’d rather just listen to some more wonderful tunes! :slight_smile:

Excellent stuff! I’m developing an audio myths website (audiomyths.info) as a side project (there’s even an AI chatbot in the works, courtesy of my kid). I’ll be adding a reference to this to the streamers-sound-different posts soon!


The multi-sine wave is quite dynamic and approximates music much better than one- or two-sine signals. If you look at a portion of the waveform in Audacity, you wouldn’t be able to tell whether it’s a test signal or a recording:

All the measurements in section 1.2 are FFT graphs. Do you have anything specific in mind?


Can you kindly annotate with your ADC make/model and other test gear for completeness and repeatability?

Sorry, Mark, but I just took a look at your site, and you are hardly myth busting when you dismiss all arguments by asserting that “a digital file is not affected by any noise if it arrives bit-perfect at the DAC.”

No one is saying that the file itself has changed; only that there are electrical, ground plane, timing and other anomalies tagging along which do impact the DAC’s performance.

The statement stands, though: the file contents are not affected. Note that I cover the other possibilities that you mention.

Sorry, of course they are FFTs! I’ll try to get some input from an experienced electronics engineer, with suggestions.


OK. The following argument applies to USB-connected DACs, since that is the setup that @David_Snyder originally described. SPDIF (and similar source-clocked interfaces) connected DACs will often behave somewhat differently. Whether these differences are audible is again up for debate. I have no opinion, never having used an SPDIF-connected DAC.

But, with USB, the time domain aspect is managed solely by the DAC. With UAC2 and asynchronous audio, the USB host clock and the DAC sample clock are not directly tied together. The USB host sends packets at regular intervals. Within each packet, the samples arrive much faster than the DAC consumes them, so they are placed in a buffer, which decouples the USB transfer rate from the DAC sample rate. The number of samples in those packets is adjusted (by means of a feedback channel on the USB bus in the reverse direction) to maintain the correct average sample rate demanded by the DAC and determined by the DAC’s sample clock. The nominal number of samples per packet is determined by comparing the packet frequency (either 1000 or 8000 per second for USB isochronous channels, i.e. full speed or high speed) with the sample rate. The average number of samples per packet is not likely to be an integer, at least for PCM.

For example, 44.1kS/s divided by 1000 packets per second means that, on average, 44.1 samples need to be sent in each packet. Since it is not possible to send 0.1 of a sample, the nominal number of samples would, I imagine, be set to 44, which would result in a shortfall of samples over time. Thus the feedback channel is used to signal to the USB host that, occasionally (1 in 10), a packet with 45 samples is required. (Note: the number of samples in a packet is always nominal, nominal + 1 or nominal - 1.)
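The 44/45 pattern falls out of a small accumulator that carries the fractional remainder from packet to packet. This is a sketch of one plausible host-side approach. It assumes the host is simply given a samples-per-packet figure. Actual USB Audio Class feedback values use a fixed-point format, and host drivers differ in detail.

```python
from fractions import Fraction

def packet_sizes(samples_per_packet, n_packets):
    """Whole-sample packet sizes whose long-run average equals a fractional rate.

    The fractional remainder is carried over from packet to packet.
    """
    carry = Fraction(0)
    sizes = []
    for _ in range(n_packets):
        carry += samples_per_packet
        whole = int(carry)          # carry is never negative, so int() floors
        sizes.append(whole)
        carry -= whole
    return sizes

# 44.1 kHz at 1000 packets per second (USB full speed)
sizes = packet_sizes(Fraction(44_100, 1_000), 1_000)
```

Every tenth packet carries 45 samples and the rest carry 44, for exactly 44,100 samples per second. At 8000 packets per second the same code yields a mix of 5- and 6-sample packets.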

The same feedback channel can also be used to accommodate small relative errors between the clock on the USB host and the sample clock on the DAC. If the DAC sample clock is slightly slower than the USB host expects, the feedback channel will be used to reduce the average sample rate by reducing the number of nominal + 1 packets sent (or sending some nominal - 1 packets). Similarly, if the DAC sample clock is slightly faster than the USB host expects, the number of nominal + 1 packets is increased (or the number of nominal - 1 packets is decreased). In this way the number of samples in the buffer can be managed so that buffer overruns and buffer underruns never occur, and there is always a sample available for the DAC to use when its sample clock demands that a sample be presented to the digital-to-analogue converter. Thus the time domain aspects of the DAC are controlled solely by the DAC and are independent of the USB host and the USB bus itself.
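The buffer argument can be illustrated with a toy simulation. All of its assumptions are hypothetical and for illustration only: one packet per millisecond, a 200-sample target fill, and feedback modelled as the DAC’s actual consumption rate plus a small proportional correction toward the target. With feedback, the fill stays within about a sample of the target even with a DAC clock 100ppm off. Without feedback, the buffer drains by roughly 265 samples per minute and would underrun.

```python
import math

def simulate_buffer(dac_rate_hz, n_packets, use_feedback=True,
                    nominal_rate_hz=44_100.0, packets_per_s=1_000,
                    target_fill=200.0, gain=0.001):
    """Track the DAC's buffer fill (in samples) packet by packet; return (min, max) fill.

    With feedback, the host sends what the DAC reports it consumes, plus a small
    correction toward target_fill. Without it, the host assumes the nominal rate.
    """
    consume = dac_rate_hz / packets_per_s       # samples the DAC clock uses per packet interval
    nominal = nominal_rate_hz / packets_per_s
    fill, carry = target_fill, 0.0
    lowest = highest = fill
    for _ in range(n_packets):
        rate = consume + gain * (target_fill - fill) if use_feedback else nominal
        carry += rate
        whole = math.floor(carry)               # samples actually sent in this packet
        carry -= whole
        fill += whole - consume                 # a negative fill would be an underrun
        lowest, highest = min(lowest, fill), max(highest, fill)
    return lowest, highest

# DAC clock running 100 ppm fast (hypothetical), 60 s of 1 ms packets
fast = 44_100 * (1 + 100e-6)
with_fb = simulate_buffer(fast, 60_000)
without_fb = simulate_buffer(fast, 60_000, use_feedback=False)
```

Note that in both cases the DAC’s own clock sets the consumption rate; the feedback only decides whether the host keeps up with it.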

Thus, if you are suggesting that a time domain difference is observed, I would have to ask how the use of diretta can affect the sample clock in a USB-connected DAC, even when the timing of packets (and their average size) over USB is not changed. Diretta manages the flow of data over Ethernet to the diretta target, but it does nothing to alter the traffic from the diretta target to the DAC over USB.
