Gabriel - my understanding of this issue is rather simplistic. I’m sure you will find much more robust explanations on-line. But at the risk of sounding pedantic, please allow me to weigh in…
Electrical signals to our DACs are intended to carry zeros and ones, as you say. But they are not sending zeros and ones, just analog signals that can be interpreted as zeros and ones by the DAC. And that interpretation depends on precise timing in the interpreting device (the DAC). And that DOES get messed up at times, resulting in jitter.
In something like the TCP/IP traffic that we’ve all come to know and love, information travels in discrete packets (or bundles). And those bundles carry checksums, so the recipient can do a little math on the packet it received and see if the result agrees with the checksum. If they don’t agree, it can send a “better send that packet again, something happened to that last one” message back to the sender. So it is zeros and ones that got encoded into an analog message, and ultimately decoded by the recipient as the exact same set of zeros and ones. This is how Roon would communicate with a network endpoint. But that has nothing to do with the music streaming into the digital input of your DAC. There are generally no checksums in that process.
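To make that concrete, here is a toy sketch of the checksum idea, in Python. This is NOT real TCP (TCP’s actual checksum is a ones’-complement sum defined in its spec); the names `make_packet` and `verify` are made up for illustration. The point is just that the receiver can redo the math and detect that something got mangled in transit:

```python
def checksum(payload: bytes) -> int:
    """A toy additive checksum, far simpler than TCP's real one."""
    return sum(payload) % 65536

def make_packet(payload: bytes) -> dict:
    """Sender side: attach the checksum to the data (hypothetical format)."""
    return {"payload": payload, "checksum": checksum(payload)}

def verify(packet: dict) -> bool:
    """Receiver side: does our math agree with the attached checksum?"""
    return checksum(packet["payload"]) == packet["checksum"]

pkt = make_packet(b"some audio data")
print(verify(pkt))                    # intact packet: checks out

pkt["payload"] = b"some audiX data"   # one byte corrupted in transit
print(verify(pkt))                    # mismatch: receiver asks for a resend
```

A raw S/PDIF or AES stream into a DAC has no equivalent of that `verify` step, which is the asymmetry I’m pointing at.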
If something gets messed up in that process, it stays messed up, unless the DAC figures out that it looks wrong and approximates what it thinks it should have been. So in the case of digital music coming into your DAC (which arrives as an analog signal merely representing zeros and ones), timing becomes a critical issue. That analog signal is a waveform. The troughs are zeros, the peaks are ones. But the recipient reads the signal on a timing beat (like a fast metronome), and the “tick” of that beat may not exactly line up with the peaks and troughs. So if the recipient ticks a little early, it may see the wave while it was still moving toward a peak or a trough, not quite there yet. That leaves the recipient with a problem. That beat was not quite at peak level, nor at the lowest. Was it a zero or a one? Sometimes the resulting decision by the recipient is wrong.
And if the beat of the timing (the time between ticks) for the sender and receiver is not exactly the same (one slightly longer, one slightly shorter), then the chance of misinterpreting a zero or a one increases over time. But if you are using an asynchronous connection (like USB), the receiver can detect the problem while it’s still easy to correct, and has the chance to say, “I think you are going a little fast (or slow). Please alter your timing rate to match mine.” That, of course, reduces jitter.
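The drift part is easy to put numbers on. Here is a small sketch (my own toy model, not any real protocol) where the receiver’s tick is a mere 0.2% longer than the sender’s bit period. Each tick the sampling point creeps a little later, and after a few hundred bits it has drifted clean out of its own bit period, so the receiver is reading the wrong bit:

```python
SENDER_PERIOD = 1.0       # sender emits one bit per time unit
RECEIVER_PERIOD = 1.002   # receiver ticks 0.2% slower: a tiny mismatch

def first_misread_bit():
    """Index of the first bit whose sampling point has drifted out of
    its own bit period (toy model: sample at each period's centre)."""
    n = 0
    while True:
        t = (n + 0.5) * RECEIVER_PERIOD      # receiver's nth sampling point
        if int(t / SENDER_PERIOD) != n:      # landed in the wrong period
            return n
        n += 1

print(first_misread_bit())  # the misread arrives after only a few hundred bits
```

At the millions of bits per second a digital audio link carries, that failure would arrive in a fraction of a second, which is why the asynchronous “please adjust your rate to mine” feedback matters: it zeroes out the accumulated drift before it gets anywhere near that point.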
Bottom line - it’s NOT just zeros and ones. It’s a waveform that was encoded from zeros and ones, to be decoded back to the same zeros and ones, if - and only if - the timing of said encoding and decoding precisely align.
Hope that helps. Sorry for the long - if overly simplified - explanation. And for all those tech cognoscenti here… if I misrepresented anything, please do correct me.