I used to think the DAC was 90% of the sound. This sub-$300 DIY project proved me wrong

This isn’t something to fear, but something to challenge. Whether or not a difference is perceived, understanding is crucial. With my DigiONE experiment, there was a change. But was it better? Subjectively, yes, because I anticipated an improvement. Objectively, no, since it introduced unwanted noise (discovered some time later).

In any experiment, we cannot rely on our perceptions. The experiment needs to be independent of our feelings.

I understand this. When using Roon, the DAC controls timing, and will fill up its buffer and request more data as playback continues.

But, I think you may have misunderstood my comment. That is, if Diretta is an audio protocol, it should perform its task independent of hardware, i.e., Diretta, not the real-time kernel, schedules data transfer. Incidentally, what happens to playback when you disconnect the Host from the network?

My remark about noise was seeking to understand how Diretta eliminates noise other techniques do not. Noise doesn’t necessarily enter the DAC at its digital input and is practically everywhere, even in a streamer running Diretta. Nonetheless, these sources of noise should be below the noise floor, or well above audio frequencies. Fortunately, all of this is measurable, so there is no good reason for mystery. This is why I expressed uncertainty in Diretta, and curiosity got the better of me. I’d like to understand what it is really doing.

So, like @Marian, I am tempted to build your project. However, my goal is to understand why it may sound better (different) to Roon in a traditional setup. As I said earlier, for an experiment to be successful, we need to have understanding. IMO, aesthetics and subjectivity are not enough for this kind of upgrade.

However, whilst I don’t mind buying a couple of Raspberry Pi 4s (I discovered I’m still running 3s), I would rather not buy a set of scripts masquerading as a Linux distribution, unless there is a trial period for evaluation, which I assume GentooPlayer offers.

5 Likes

Hi @Bruce_Barbour,

Those are excellent, logical questions. On paper, “Wireless + Buffering” sounds like the perfect solution: you break the physical connection (galvanic isolation) and you store the data locally so timing shouldn’t matter.

However, my experience (and the theory behind Diretta) suggests that this approach actually increases the specific type of noise we are hoping to eliminate.

Here is why:

1. The “Buffer” Paradox
We tend to think of a buffer as a calm reservoir, but from the CPU’s perspective, it’s a chaotic workload. To fill a buffer, the CPU has to wake up, process a burst of network packets at high speed (“Race to Idle”), and then sleep.

  • This creates a “Sawtooth” pattern of current draw: High… Low… High… Low.
  • That fluctuation creates low-frequency noise on the internal power rails—exactly the kind of noise that gets past a DAC’s filters.
  • Diretta does the opposite: it minimizes buffering to keep the CPU load constant and flat, eliminating those power fluctuations.

2. The Wireless Noise Source
You are right that Wi-Fi offers galvanic isolation from the router. But the trade-off is that you are activating a powerful radio transceiver inside your endpoint.

  • Even in a quiet rural area, that Wi-Fi chip is generating significant RFI/EMI inside the chassis just to maintain the link and decrypt the stream.
  • Processing the Wi-Fi stack (WPA2/3 encryption, retransmits, etc.) is also much more CPU-intensive than handling a wired Ethernet frame, which adds to the processing noise mentioned above.

The goal of this project is to create an “oasis of electrical quiet” right before the DAC. A wired, low-CPU connection (like the Diretta Target) is electrically much quieter inside the device than a Wi-Fi radio and a bursting CPU.

Regarding the License Cost:
Thanks for checking! I just looked at the shop again. The license required for this project is the “Diretta USB Target”, which is currently listed at €100 (excluding VAT). You might be looking at the “Alsa” or “ASIO” driver licenses (often used for the Host side or PC builds), which can be more expensive.

Since my project uses AudioLinux for the Host and Target, the only standalone license you need to buy from Diretta is for the Target. Note: It must be purchased from a link generated by Piero’s software on your specific Target computer. There’s no way to purchase the license separately and install it on an AudioLinux Target later.

1 Like

Maybe Bruce was looking, as I initially did, at Dnegozio (Diretta Shop) with its

Diretta Target USB Bridge for RaspberryPi5

Regular price €200,00

All of this is a bit confusing.

1 Like

Hi @mjw,

I appreciate you taking the time to engage with the technical details. I respect your rigorous approach to testing; while my “tinkering” approach is admittedly more experiential, I think we are both chasing the same understanding of why things sound the way they do.

To address your specific technical questions:

1. Why the Real-Time Kernel?

You asked:

…if Diretta is an audio protocol, it should perform its task independent of hardware, i.e., Diretta, not the real-time kernel, schedules data transfer.

This is a great question that gets to the heart of system architecture. The issue is that Diretta (the application) is at the mercy of the Linux Kernel.

  • In a standard kernel configuration, the scheduler prioritizes throughput and system stability. It can (and does) interrupt audio threads to handle network interrupts, disk I/O, or system housekeeping. This forces the application to “burst” data to catch up when it gets CPU time back.
  • The Real-Time Kernel changes the rules. It provides deterministic behavior. It ensures that when Diretta says “send packet now,” the CPU does it now, not “when I’m done with this background task.” The software defines the schedule; the RT Kernel guarantees the CPU is available to keep it.

2. What happens when you disconnect the Host?

Incidentally, what happens to playback when you disconnect the Host from the network?

It depends on which connection you break, but the result is telling either way. I just tested this to be sure:

  • Disconnecting the LAN Input (Core → Host): The music continues to play for 7–8 seconds. This confirms that Roon’s RAAT protocol sends data in large bursts to fill a small but significant buffer on the Host.
  • Disconnecting the Point-to-Point Link (Host → Target): The music stops instantly. There is effectively zero “coast” time.

This contrast is the key proof. The Target is not “filling and draining” a buffer (like the Host is); it is relying on a constant, stable stream. This lack of buffering is what keeps the CPU load flat and stable on the Target side.

3. AudioLinux vs. GentooPlayer

I would rather not buy a set of scripts masquerading as a Linux distribution…

I hear you on the cost concerns. The good news is that GentooPlayer is absolutely a suitable technical alternative. Many Diretta users prefer it on forums I’ve been following.

However, if cost is your primary driver, it’s worth noting that the GentooPlayer license is per-device. For a dual-PC setup (Host + Target), purchasing two licenses (plus the premium kernel options often recommended for best performance) usually ends up costing significantly more than the single AudioLinux subscription which covers both units.

I chose AudioLinux for my guide simply because it was the most cost-effective path to get a fully optimized, RT-kernel-equipped Host and Target up and running quickly.

If you already have a couple of Raspberry Pi 4 boards on hand (and yes, the RT kernel does require the Pi 4 over the 3), I would genuinely love to see you set this thing up, regardless of which OS you choose. Since you have the tools and the skepticism, you are in the best position to verify (or challenge) what I’m hearing!

5 Likes

You are exactly right. That listing you found is likely for their “turnkey” OS image. Since we are using AudioLinux as the OS, we get access to the lower “add-on” pricing (€100), but only if we buy it the right way.

Here is the safe sequence to ensure you don’t overpay:

  1. Acquire the Hardware (prerequisites listed in my guide).
  2. Purchase AudioLinux (link) for Raspberry. Currently, $69 for one year of support. Note: This one license covers both your Host and Target Pis (and any others you build at that location).
  3. Build the kit and get everything working at 44.1/48 kHz (sample rates for which Diretta is free to test).
  4. Listen for a week. Decide if you like it, hate it, or are indifferent.
  5. Buy the Diretta License (Target Only): If you like it, use the License menu option on the Target to generate the hardware-specific link you need to purchase the €100 Diretta license.
    (Pro Tip: run sudo /opt/diretta-alsa-target/diretta_app_activate on the Target to generate the URL).

CRITICAL WARNING: Do NOT buy any Diretta licenses using links on the main Diretta website. You generally cannot transfer those to your AudioLinux installation. You must use the link generated by your specific Pi.

David, I must say that you are so polite and respectful in all your replies. And the time you take to elaborate is truly impressive. It’s almost like writing 1000 words comes effortlessly to you. The only other exchanges that I have had that can compare are in my audio conversations with ChatGPT, but let’s return to the topic at hand.
Your beautiful analogy of the sawtooth operation of the CPU turning on and off as it processes batches of data from the buffer is quite intuitive. Who would want that anywhere near their DAC? But I would love to know how you actually measured this and determined that it is significant. In a similar vein, I am struggling to understand how the setup you are experimenting with manages to create that sea of tranquility around the Target. May I offer an analogy of my own: that somehow Diretta works as a “computer tranquilizer” to calm that overactive CPU?
Your image of a data buffer as a noisy, chaotic place is also striking and, once again, I’d love to know: how do you know this? In contrast, my image of a data buffer is that it’s a rather smooth efficient operation, because it simply consists of sorting packets of data by their assigned numbers, and I always thought this is the kind of job that computers do with almost no effort.

This is the most enjoyable thread. Like you, I love tinkering, and I have had a lot of good success with Raspberry Pis as data sources.

5 Likes

Thanks for clarifying the process.

Reminds me of the “Wizard of Oz” - Pay no attention to the man behind the curtain.

1 Like

Hi @Bruce_Barbour,

I will take the comparison to ChatGPT as a compliment! I try to be thorough because I know how frustrating it can be to piece this stuff together from scattered forum posts. I’ve spent way too much time on this stuff over the past five months.

I love your analogy of Diretta as a “computer tranquilizer.” That is brilliant, and it perfectly captures the intent. I may steal that. :wink:

You asked excellent questions: “How do you know?” and “How did you measure this?”

To be transparent: I have not yet hooked up an oscilloscope to the internal power rails of the Raspberry Pi to measure the voltage sag/spikes during playback. (Though, as a tinkerer, I admit the idea is tempting!)

My explanation relies on well-known computer architecture principles, specifically “Race-to-Idle” behavior.

Why Buffers = Electrical Noise

You mentioned that you view a buffer as a “smooth efficient operation” of sorting packets.

  • Logically (Software): You are right. It is just sorting numbers.
  • Electrically (Hardware): It is violent.

Modern CPUs are designed to save power. When a burst of data arrives (to fill a buffer), the CPU wakes up, spikes its voltage and frequency to process the data as fast as possible (“Race”), and then cuts power to drop back to sleep (“Idle”).

This rapid switching between “100% effort” and “0% effort” creates a steep dI/dt (change in current over time). In electrical engineering, high dI/dt causes voltage ringing and noise on the power rails.
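
To put rough numbers on that dI/dt idea, here is a minimal Python sketch of the textbook relation V = L × dI/dt. The inductance and the current steps are hypothetical values chosen for illustration, not measurements from a Raspberry Pi, so treat the output as a sense of scale rather than evidence.

```python
def rail_voltage_spike(inductance_h, delta_current_a, delta_time_s):
    """Voltage induced across a parasitic inductance: V = L * dI/dt."""
    return inductance_h * (delta_current_a / delta_time_s)


# Hypothetical figures (assumptions, not measured on any real board)
PARASITIC_INDUCTANCE_H = 10e-9   # 10 nH of trace/package inductance
EDGE_TIME_S = 1e-6               # current changes over 1 microsecond

# "Race-to-idle": a large current step as the CPU wakes up
burst_v = rail_voltage_spike(PARASITIC_INDUCTANCE_H, 1.0, EDGE_TIME_S)

# "Steady jog": a much smaller ripple around a constant load
steady_v = rail_voltage_spike(PARASITIC_INDUCTANCE_H, 0.05, EDGE_TIME_S)

print(f"burst step:  {burst_v * 1e3:.2f} mV")
print(f"steady load: {steady_v * 1e3:.2f} mV")
```

The induced voltage scales linearly with the size of the current step, so a smaller step means a proportionally smaller disturbance. Whether disturbances of this kind ever reach a DAC’s analog output is a separate question that only measurement can answer.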

The “Tranquilizer” Visualized

By “drip feeding” the data (Diretta), the protocol prevents the CPU from ever sleeping but also prevents it from ever sprinting. We force it into a constant, low-power “jog.”

You inspired me to try to capture this effect visually. I just ran a fresh network capture on my Diretta Host to compare the Input (RAAT, left) vs. the Output (Diretta, right).

The Top Row - 30 seconds at 32-bits, 88.2 kHz
This shows the throughput (Mbps) over time. I had to use a logarithmic scale here because the differences between RAAT and Diretta are so large.

  • RAAT (Left): You can clearly see the “Sawtooth”. It bursts up to ~100 Mbps to fill the buffer, then drops to zero, then bursts again. That violent up-and-down is what causes the “Race-to-Idle” power noise.
  • Diretta (Right): It is a “Flatline”. It’s a constant, calm stream of data hovering steadily around 6 Mbps. The CPU never has to sprint; it just jogs steadily.
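
As a sanity check on the ~6 Mbps figure, the sketch below computes the raw PCM payload rate for this format. It assumes a two-channel stream (the channel count is not stated above) and ignores packet headers, so the real on-wire rate will be somewhat higher. It also estimates what fraction of the time a link bursting at ~100 Mbps would need to be active to carry the same average payload.

```python
def pcm_bitrate_mbps(sample_rate_hz, bits_per_sample, channels=2):
    """Raw PCM payload rate in megabits per second (no protocol overhead)."""
    return sample_rate_hz * bits_per_sample * channels / 1e6


def burst_duty_cycle(average_mbps, burst_mbps):
    """Fraction of time a bursty sender must be active to hit the average."""
    return average_mbps / burst_mbps


# 32-bit, 88.2 kHz, assumed stereo
payload = pcm_bitrate_mbps(88_200, 32)
duty = burst_duty_cycle(payload, 100.0)

print(f"raw PCM payload: {payload:.4f} Mbps")
print(f"duty cycle at 100 Mbps bursts: {duty * 100:.1f}%")
```

The payload works out to about 5.64 Mbps, which is consistent with a steady stream of roughly 6 Mbps once headers are added. At 100 Mbps bursts, the same data would occupy the wire only about 6% of the time.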

The Bottom Row (4-Millisecond Zoom)
This zooms in to see individual packets during a 4ms window.

  • RAAT (Left): You see a dense “clump” of packets hammered into the interface back-to-back, followed by silence.
  • Diretta (Right): You see the “Drip Feed”. Packets are perfectly spaced, arriving with metronomic precision.
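
One way to quantify the difference between a “clump” and a “drip feed” is the coefficient of variation (standard deviation divided by mean) of the gaps between packets. The sketch below is an illustration only: the function name and both timestamp lists are made up, not taken from the capture described above.

```python
from statistics import mean, pstdev


def inter_arrival_cv(timestamps_s):
    """Coefficient of variation of the gaps between packet timestamps.

    Near 0 means evenly paced packets; larger values mean burstier traffic.
    """
    if len(timestamps_s) < 2:
        raise ValueError("need at least two timestamps")
    gaps = [later - earlier
            for earlier, later in zip(timestamps_s, timestamps_s[1:])]
    average_gap = mean(gaps)
    return pstdev(gaps) / average_gap if average_gap > 0 else 0.0


# Synthetic data (assumed, for illustration): nine packets over about 4-8 ms
paced = [i * 0.0005 for i in range(9)]            # one packet every 0.5 ms
bursty = [0.0, 0.00001, 0.00002, 0.00003,         # clump of four packets
          0.004, 0.00401, 0.00402, 0.00403,       # silence, then another clump
          0.008]

print(f"paced CV:  {inter_arrival_cv(paced):.3f}")
print(f"bursty CV: {inter_arrival_cv(bursty):.3f}")
```

Timestamps exported from a packet capture could be fed into the same function to compare the two sides of the Host with a single number each. Note that this measures traffic shape only; it says nothing about electrical noise.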

I think these plots effectively illustrate that ‘Computer Tranquilizer’ effect you described.

Thanks again for the nudge to dig deeper into this. I wouldn’t have thought to run this specific comparison without your questions. I hope these plots help bridge the gap between the logical idea of a buffer and the physical reality of what’s happening on the wire!

2 Likes

I think the dac is the most important piece in the digital setup. You can have the best source (whatever that is) but if the dac is bad, the sound will be bad.

Once you have a very good dac, one that has either i2s or Ethernet inputs (usb is a terrible interface into the dac), then the streamer is the next important piece. My streamer does a couple of things that provide the dac with the best signal:

  1. converts incoming Ethernet data to i2s for the dac
  2. streamer acts as a Roon endpoint
  3. streamer also allows it to talk to other sources like Qobuz connect

The next item on the list that makes a difference in sound quality is which computer OS you’re using for Roon. With Roon, the best sounding server was the one running Linux. Hardware matters too: it needs enough resources to run Roon without swapping/pausing.

All these steps have been demoed in listening tests with audio club members.

My goal was to make Roon sound its finest, and it does sound very good. But, at the same time, I made sure my hardware can accept any other source. I’ve said this in other forums, Roon isn’t the best sounding source, but Roon has the best graphical interface and the sound is very very good. Right now, Qobuz connect sounds the best, but no real GUI and no access to ripped music and no ARC. Using Connect, all you need is a good internet connection (prefer fiber) and a streamer that is Qobuz Connect capable.

Audirvana also sounds very very good, lets you play ripped music, but the GUI isn’t as elegant as Roon (nothing is) and no ARC.

Things change in a relatively short period of time, so make sure your hardware can accept the newer technologies.

Hi @maximasr,

You wrote: “I think the dac is the most important piece in the digital setup.”

I agree. In fact, I started this thread by admitting I used to believe the DAC dictated 90% of the sound quality, with the transport being a minor 10% player.

This project dramatically altered that ratio for me. I now believe the transport is at least as important as the DAC.

Why? Because in 2025, are there really any “bad DACs”? Even modest modern DACs are technically excellent. In my experience, when a modern DAC sounds “bad” or “digital,” it’s rarely the DAC’s fault—it’s the electrical noise riding in on the input from the transport, ground plane contamination, or environmental noise.

That brings me to your points about Roon and USB

I used to agree that “Roon isn’t the best sounding source.” I had better results with Audirvana driving a directly connected USB DAC, and I have friends who swear by Innuos Sense and get impressive results from JPlay and Aurender Conductor.

However, I found that the culprit wasn’t Roon itself—it was the transport mechanism.

Roon’s RAAT protocol is “chatty” and bursty, which causes rapidly fluctuating CPU activity. When you feed that into a standard USB interface, you get electrical noise that degrades the sound. This is why people often say “USB is terrible”—not because the USB protocol is bad, but because the computers feeding it are noisy.

Why specialized inputs (Ethernet/I2S) aren’t the answer

  1. Internal Streamers (Ethernet Inputs): While convenient, an Ethernet input on a DAC means putting a noisy computer (the streamer) inside the same chassis as your delicate analog circuits. Externalizing that computer (via a Diretta Target) physically isolates that noise, preserving the benefits of the inverse-square law. It keeps the “computer” outside of the DAC (preserving isolation) but makes that computer run so efficiently that it presents a cleaner signal than even an internal streamer could achieve.

  2. The I2S Rabbit Hole: I2S was designed as an internal protocol for moving data a few centimeters between chips on a circuit board. It was never meant to be a transmission protocol over external cables.

    • There is no standard pinout for HDMI-I2S.
    • Pushing that signal over a long cable through multiple fiddly HDMI connectors often introduces impedance mismatches and resulting transmission line reflections and losses which negate the theoretical benefits.

Why USB is actually great (when done right)

A high-quality Asynchronous USB implementation allows the DAC to act as the clock master. When you pair that with a Diretta Target—which “calms” the CPU to remove the processing noise—USB becomes an exceptionally quiet, transparent interface. Really, it is only lacking in galvanic isolation, which better DACs offer internally anyway.

My experience is that once you fix the transport with Diretta, Roon sounds just as good as (or better than) those other high-end solutions, allowing you to keep the great Roon interface without the sonic compromise. That’s what this project is all about.

4 Likes

I concur; I experienced exactly this when setting up my SOtM boxes

I’ve progressed from PC Roon Bridge/Asio => SOtM Diretta to SOtM => SOtM, which involved much fiddling/learning and restarting

2 Likes

I bought the system as David describes and I couldn’t be happier. The Diretta host and target replaced my Sonore optical converter and Opticalrendu deluxe in my system. Compared to the Sonore gear, the Diretta provides a cleaner presentation with improved soundstage and tone. A real bargain! Re the DAC - the Diretta is feeding my MSB Premier via the Pro USB/Pro ISL modules.

5 Likes

Hi David,

Out of interest, layman’s question, how ‘hard’ are the RPis working?

I’m asking from two angles of curiosity…

  1. If the Host is acting as Roon Bridge + Diretta source, isn’t it doing more work than the Target? I know the spec calls for the more powerful Pi5 (64-bit quad-core Arm Cortex-A76 processor running at 2.4GHz), but why is that?

…and that brings me to the second thought…

  2. My SOtM boxes have some ‘Arm 2-core running at about 1GHz’ (I understand), which would seem to be relatively underpowered vs the Pi5, and yet they work fine*. Last year, SOtM issued a firmware update that enabled us to reduce the clock speed as low as 144KHz when used as a Roon endpoint - this gave a noticeable improvement to SQ, in particular to bass weight and texture. Admittedly, getting into the Eunhasu interface was rather (very) slow but it was fairly stable - in the end I settled on ~340KHz. Again, if two cores at 144-340KHz can process Roon, does Diretta really have to work that hard? Could Diretta be slowed down (<1GHz) and still work, and sound even better?

To be clear, the Diretta solution sounds better than the reduced clock speed from before and the * above is to add a note that there is a slight delay now from Play-to-Sound (I guess 1/2 a second?) - I don’t know if the ‘more powerful’ RPi system has this delay, but it’s not an issue for me

Regards
Andy

2 Likes

Hi @Andrew_Stoneman,

These are not “layman’s questions” at all—they are actually quite advanced architectural questions! You’ve hit on a few design choices that I spent a lot of time debating during the testing phase.

1. How “hard” are they working?
I monitored the systems during playback of high-res content, and the results might surprise you:

  • Diretta Host (Pi 4): ~3% CPU usage (97% Idle)
  • Diretta Target (Pi 5): ~2.5% CPU usage (97.5% Idle)

So, you are absolutely right—neither of them is breaking a sweat. They are effectively “coasting.”
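
For anyone who wants to reproduce this kind of reading on their own Pi, here is a minimal sketch that derives a busy percentage from two snapshots of the aggregate `cpu` line in Linux’s `/proc/stat`. This is an assumption about one workable method, not necessarily how the figures above were obtained, and the two sample lines are invented for illustration.

```python
def parse_cpu_line(line):
    """Return (idle_ticks, total_ticks) from the aggregate 'cpu' line.

    Fields after the label: user nice system idle iowait irq softirq steal ...
    Guest time is already counted in 'user', so only the first 8 are summed.
    """
    parts = line.split()
    if not parts or parts[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line")
    values = [int(v) for v in parts[1:9]]
    idle_ticks = values[3] + values[4]        # idle + iowait
    return idle_ticks, sum(values)


def busy_percent(before, after):
    """CPU busy percentage between two snapshots of the 'cpu' line."""
    idle0, total0 = parse_cpu_line(before)
    idle1, total1 = parse_cpu_line(after)
    elapsed = total1 - total0
    if elapsed <= 0:
        return 0.0
    return 100.0 * (1.0 - (idle1 - idle0) / elapsed)


# Invented snapshots (assumptions, not real readings from a Pi)
BEFORE = "cpu  1000 0 500 98000 500 0 0 0 0 0"
AFTER = "cpu  1020 0 510 98965 505 0 0 0 0 0"

print(f"busy: {busy_percent(BEFORE, AFTER):.1f}%")
```

On a live system, read the first line of `/proc/stat`, wait a few seconds during playback, read it again, and pass both strings to `busy_percent`. An average like this hides short bursts, so it complements a packet-level view rather than replacing it.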

2. Why the Pi 5 for the Target?
If the CPU load is so low, why do we need the “more powerful” Pi 5 on the Target?
It’s not about processing power (muscle); it’s about latency and I/O architecture (reflexes).

The Raspberry Pi 5 introduces a dedicated I/O controller (the RP1 southbridge) that handles USB and Ethernet interrupts with greater precision and isolation than the Pi 4. Since the Target’s only job is to take packets from Ethernet and hand them to USB with perfect timing, that architectural improvement helps maintain the “flatline” steady state we want.

Furthermore, Yu-san (the developer of Diretta) performs his reference testing exclusively on the Raspberry Pi 5 Compute Module. Using the Pi 5 for the Target aligns this project as closely as possible with the developer’s own reference hardware while keeping costs low (the 2 GB RPi5 is only $49.95 in the US).

3. Why not Pi 5 for the Host, too?
This is where the network “tinkering” comes in. I suspect that connecting two Pi 5s directly might trigger Energy Efficient Ethernet (EEE) negotiation. EEE tries to save power by putting the link to sleep during tiny gaps in data.

  • For standard file transfers, EEE is fine.
  • For Diretta (which sends a constant stream of tiny packets and has no error correction/re-transmits), EEE can be disastrous, causing dropped packets and audio dropouts.

The Pi 4 (Host) to Pi 5 (Target) connection seems to be the “Goldilocks” zone—robust, stable, and fast, without the aggressive power-saving negotiation risks.

Regarding the “Delay”:
You asked if this “more powerful” RPi system has the ~0.5s delay you are seeing on your SOtM setup. It does not. In my setups, the start of playback is effectively instant (<0.1s). The Pis are faster than the SOtM units, but not that much faster. Perhaps this has to do with configuration?

Hope that satisfies the curiosity!

2 Likes

Excellent, even I get it :grinning_face:

Thanks
Andy

PS. not sure if I can iron out the start delay - I’ll have a fiddle in the settings, I vaguely recall there is a ‘latency’ drop down, maybe that will help…

1 Like

Yet, network traffic is not noise. The assumption is that “tranquility” results in less noise, but there is no evidence to support this, so far. I hope you do explore this further.

Likewise, it is unclear whether the DAC is controlling timing or whether Diretta essentially behaves like S/PDIF, where there is no signalling and the DAC has to reclock.

Describing this as a “quantum jump” causes me to scratch my head. Is it messing with timing, or altering the stream? I don’t find the argument that less network traffic in the final hop very convincing since noise is ubiquitous. Nor the need for a real time kernel. Low latency would make more sense in audio-intensive scenarios, but for an SBC doing nothing but processing a simple 44.1 kHz stream, this should be unnecessary (after all, it is Diretta that schedules the transfer, not the RTK.)

I do hope we can move beyond hypotheses. For now, I’ll sit back, enjoy reading, occasionally scratch my head, and make sure the discussion stays on track.

9 Likes

Correct. Yet, network traffic causes electrical activity which can generate noise.

I think David has demonstrated that the amount of noise generated at the Target can be minimized. It doesn’t have to be everywhere, as in standard TCP/IP.

2 Likes

Hi David,

Another question, sorry to keep piling on

I want to see if I can set up a totally free trial of Diretta at a friend’s house, he’s er, on the fence about this. I understand we can get a free trial of Gentoo Player, a 44.1 restricted trial of Diretta and he has a few Pi2s and Pi3s to hand - question is (and bearing in mind the low real-world processing requirements for Diretta, plus that it works on the lower-power SOtM boxes), will it work on, say, a pair of Pi3s?

I realise it might not sound as good as a Pi4/5 setup - but then I was intrigued enough by PC/SOtM (which doesn’t sound as good as SOtM/SOtM and may not sound as good as Pi4/5) to keep going - and I’m pleased I did. If we can get a PoC up and running, it’ll be an interesting project and could pave the way for the full system. Of course, the danger to me then is that I have to swap out my SOtM boxes - and his Auralic G2.2 may go up for sale at exactly the wrong time…

Any more pointers please?
Thanks again
Andy

1 Like

Hi Andy,

In addition to my dual RPi4 Diretta endpoint, I use an Allo USBridge Signature with 3B+ compute module as the surround endpoint in a 5.1 home theater. My AudioLinux license doesn’t help here, so I paid GentooPlayer to make this a Diretta Target, which does now sound better but can’t use David’s direct-connect isolation AFAIK.

Please correct me, David, if I’ve missed something…

2 Likes

I’m not disputing this, but that’s not my point, and I don’t believe Diretta is about network generated noise, either. Indeed, Ethernet rejects common mode noise effectively.

When I say noise is ubiquitous, that means it is everywhere all the time. Potentially reducing noise – we don’t know if this is true – in the target doesn’t reduce noise from those other sources. We live in very noisy environments, e.g. machines, power lines, Wi-Fi, microwave, radio, and so on, all of which affect electronic devices.

But steps are already taken to reduce or eliminate such noise. Diretta implies that the drip feed reduces noise in the target, and that there is a correlation between this and noise.

However, this isn’t proven, and there is no demonstration of this being the case. Moreover, DACs have galvanic isolation between the input stages, and the analogue output. That’s not to say that noise can’t appear at the analogue output through other means. But it begs the question: what is Diretta actually doing?

This is why this is such a head-scratcher.

3 Likes