PS Audio made an excellent engineering choice given their DAC's architecture, but it does not make sense to repeat that choice in a software system. Like everyone else, PS Audio is optimizing the performance of their product within a set of constraints: product cost, R&D budget, engineering limitations. Roon has a different set of constraints to contend with.
Before I get into some thoughts about this idea, I want to point out that PS Audio is optimizing their output for a discrete DAC running at DSD128. This is a very different thing than a chip-based DAC that accepts DSD128 as its maximum input rate, which is very much the “norm” that we would be targeting if we went down a road like this in software.
There are a very small number of DACs out there that are truly discrete (i.e. our DSD stream would pass to the DAC without further rate processing) AND also limited to a DSD rate that you could reasonably downsample to (i.e. DSD128 or less). I think Lampizator has made some. The current batch of discrete DACs seems to take DSD512 directly now, which is awfully close to the sample rate where PS Audio does their processing anyways.
Anyways, let’s try a little thought exercise…
Pretend we’re the engineer designing the DirectStream and assume that the DSD128 output stage and the input modules are already done and it’s time to work on the FPGA.
So we need to design an FPGA-based signal processing path that takes input at any rate up to 384kHz/DSD256 and outputs it as DSD128.
Let’s assume for a second that we solve this the most “normal” way that one would in software: by employing a polyphase FIR sample rate converter to convert the input to DSD128 in one stage, then passing that through a sigma-delta modulator.
I’m sure PS Audio’s approach differs in some details (in fact, Ted has said elsewhere that they use IIR filters at the DSD64x10 rate, so it is quite different!), but this thought exercise really isn’t about PS Audio–it’s about Roon. So getting too far from the way that Roon would extend itself to implement this isn’t very useful.
When using this approach in the most direct way, the filter within the sample rate converter runs at the lowest common multiple of the input and output sample rates. That means:
- for 44.1kHz -> DSD128, the filter runs at DSD128
- for 96kHz -> DSD128, the filter runs at 28.224MHz (DSD64x10, the same rate as PS Audio uses)
This is, in fact, exactly how Roon’s signal processing works today. If you do 96kHz->DSD128, we process the SRC filters at DSD64x10. But in Roon, when you go 44.1kHz->DSD128, the filters are processed at DSD128.
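For anyone who wants to check the arithmetic behind those two cases, here is a small sketch. It is not Roon's code; `src_plan` and the constant names are made up for illustration. It just works out the lowest common multiple of the two rates, along with the upsample/downsample factors a rational converter would use.

```python
from math import gcd

DSD64 = 64 * 44_100    # 2,822,400 Hz
DSD128 = 2 * DSD64     # 5,644,800 Hz

def src_plan(in_rate, out_rate):
    """Hypothetical helper: (up, down, intermediate_rate) for a rational SRC.

    The intermediate rate is the lowest common multiple of the two rates,
    i.e. the rate at which the polyphase filter is conceptually defined.
    """
    g = gcd(in_rate, out_rate)
    up = out_rate // g      # zero-stuffing factor applied to the input
    down = in_rate // g     # decimation factor applied after filtering
    return up, down, in_rate * up

for in_rate in (44_100, 96_000):
    up, down, mid = src_plan(in_rate, DSD128)
    print(f"{in_rate} Hz -> DSD128: up {up}, down {down}, "
          f"filter rate {mid} Hz ({mid / DSD64:g} x DSD64)")
```

The 44.1kHz case comes out at 5,644,800 Hz (DSD128 itself), and the 96kHz case at 28,224,000 Hz, which is ten times the DSD64 rate.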
Isn’t this worse? Mustn’t there be a benefit to running the filters at a higher rate?
When using the polyphase approach (how all of Roon’s sample rate conversion works), there is no mathematical difference in the output, but if you do the processing at the higher rate, you’re going to spend some more resources doing it. For 44.1kHz -> DSD128 processed at DSD64x10 instead of DSD128, you’d use a much larger filter (5x larger), and 4/5ths of the coefficients making up that filter would never impact the final output, since they would be perpetually out of phase with both the input and output streams. The 1/5th of the filter that remained would be identical to the filter used when running at the DSD128 rate. The multiply/adds would be the same.
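Here is a toy version of that argument, as a sketch rather than anything resembling production code. The numbers are tiny stand-ins (4x upsampling instead of 128x, a 24-tap filter with random integer coefficients), and `resample` is a hypothetical helper. The "big" filter is assumed to agree with the small one at every 5th tap; the in-between taps are filled with junk on purpose, to show they never matter.

```python
import random

def resample(x, h, up, down):
    """Rational resampler: conceptually zero-stuff by `up`, filter with `h`,
    keep every `down`-th sample. Multiplies against stuffed zeros are skipped
    (the polyphase trick). Also returns the set of taps actually touched."""
    used = set()
    y = []
    for n in range(len(x) * up // down):
        acc = 0
        for m, sample in enumerate(x):
            k = n * down - m * up      # tap index at the intermediate rate
            if 0 <= k < len(h):
                acc += h[k] * sample
                used.add(k)
        y.append(acc)
    return y, used

random.seed(1)
K = 5                                   # how much higher the intermediate rate is
up, down = 4, 1                         # stand-in for 44.1kHz -> DSD128 (128:1)
x = [random.randint(-100, 100) for _ in range(40)]
h = [random.randint(-100, 100) for _ in range(24)]

# 5x larger filter for the 5x higher rate: every K-th tap matches h,
# the taps in between are arbitrary.
h_big = [random.randint(-100, 100) for _ in range(K * len(h))]
h_big[::K] = h

y, used = resample(x, h, up, down)
y_big, used_big = resample(x, h_big, K * up, K * down)

print("identical output:", y == y_big)
print(len(used_big), "of", len(h_big), "taps touched")
```

The outputs match exactly, and only 24 of the 120 taps in the larger filter are ever read, which is the 1/5th described above.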
Why did I explain all of that? To make two points:
- There is nothing at all unusual about using a high-rate intermediate format when upsampling. In fact, it would be unusual if the product was able to meet its functional spec without doing so.
- Using a higher-than-strictly-necessary intermediate rate would not change the math in Roon. It would just make more work for the computer.
Ok, so now that that’s explained, we can see that the interesting thing about PS Audio’s product is that it uses a fixed intermediate rate (DSD64x10) regardless of the input sample rate. Why would they do that? Well, consider that they are doing this on an FPGA.
On an FPGA, compute performance is “pass/fail”. Either the DSP fits within the performance envelope of the chip or it doesn’t. Also, on an FPGA, code size and filter storage can be significant constraints–and building much of the system around a single sample rate takes pressure off of those constraints. Think about it: once the input stream is converted to that DSD64x10 format, everything that comes after that point can be the same. Same code. Same filters. One thing to focus on when optimizing, and no waste of those precious FPGA resources.
On a computer, things are different. Code size + filter storage are virtually unlimited. Having many large filters pre-generated for a variety of rate pairs is practical, as is having multiple copies of the code itself, tailored to different use cases. So we do.
But, unlike on the FPGA, we are obligated to scale our performance demands gracefully since we are not the only thing making demands on the CPU. It’s ok to have more expensive approaches that provide benefit (we support some pretty CPU-bashing configurations already), but we won’t throw away performance for nothing in return.
Finally…
To be very clear: I am not trying to describe how PS Audio’s DAC works inside. I am just trying to illuminate some design tradeoffs, and explain why certain approaches may make more sense in software vs in hardware.
I also want to be clear that many products perform processing/filtering at very high intermediate rates as a matter of course (including most upsampling software products: Roon, HQPlayer, Audirvana, and JRiver for sure). It is just part of how a lot of sample rate converters work. PS Audio is associated with this sort of upsampling because they developed a strong marketing message around it, and took the time to optimize their system very nicely, but it’s really a normal thing, especially in software.
As for Roon, I don’t think there is much for us to take away here.
Increasing the intermediate rate during SRC filtering so that we could say your media spent a few milliseconds at some ultra-high rate would spend resources without changing the math. That is a little too much like snake oil for me.
In order for this kind of thing to make sense, we really need to be in the context of an architecture more like PS Audio’s. But that architecture doesn’t make much sense within our design constraints. They target one DAC design with an FPGA, and we target every possible DAC design with a computer. There is value in both things, but that doesn’t mean that they should necessarily converge.