This is 100% guesswork on my part…
If I understand what they are saying (and I am really guessing here), in DSD mode the nominally 1 V / 0 V DSD signal is fed into a 24-stage pipeline, and each stage is attached to a resistor in the R2R network.
For example, if the DSD signal happened to have 24 1s in a row, the output cap would be charged toward a nominal 1 V (through a resistor). A mixture of 1s and 0s would charge and discharge the output cap. And of course the charging and discharging of the cap would vary as the mixture of 1s and 0s varied.
Decoding a PDM (pulse density modulation) signal, which is what the output of a DSD A to D is, is sort of like doing a simple sliding boxcar average over the 1s and 0s. The output level is proportional to the density of 1s in a PDM signal, so a boxcar average sort of works.
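To make the boxcar idea concrete, here is a minimal Python sketch. It is only an illustration, not anything FIIO has published: the first-order sigma-delta encoder is a stand-in I am assuming just to get a PDM bit stream to play with, and the names pdm_encode and boxcar_decode are made up. The 24-tap window is borrowed from the 24 stages mentioned above.

```python
def pdm_encode(samples):
    """Toy first-order sigma-delta modulator (an assumption for illustration).

    Takes floats in [-1, 1] and returns a list of 0/1 bits whose density
    of 1s follows the input level.
    """
    bits = []
    integrator = 0.0
    feedback = 0.0  # previous output bit mapped to -1.0 or +1.0
    for x in samples:
        integrator += x - feedback
        bit = 1 if integrator >= 0.0 else 0
        feedback = 1.0 if bit else -1.0
        bits.append(bit)
    return bits


def boxcar_decode(bits, taps=24):
    """Sliding average over the last `taps` bits, mapped back to [-1, 1]."""
    out = []
    window_sum = 0
    for i, b in enumerate(bits):
        window_sum += b
        if i >= taps:
            window_sum -= bits[i - taps]  # drop the bit that left the window
        if i >= taps - 1:
            density = window_sum / taps   # fraction of 1s in the window
            out.append(2.0 * density - 1.0)
    return out
```

With a constant input of 0.5 the toy encoder settles into a repeating 1, 1, 0, 1 pattern, so three quarters of the bits are 1s and the 24-bit average decodes back to 0.5.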
I say “sort of” because a simple boxcar would let a lot of noise in. The noise is there because the PDM signal is, roughly, the result of modulating a pulse train with an audio signal, which produces, without going into a lot of detail, a result with “extra” frequency components. So what FIIO is doing here looks like a sliding triangle rather than a boxcar, which would improve the results by removing more of those “extra” frequencies than a boxcar would.
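The triangle versus boxcar difference can be checked numerically. The sketch below is my own illustration with made-up helper names, and the 24-tap length is again just taken from the 24 stages. It builds both sets of weights and evaluates how much of a tone at a given frequency gets through.

```python
import math


def boxcar_weights(n):
    """n equal weights that sum to 1."""
    return [1.0 / n] * n


def triangle_weights(n):
    """Weights that ramp up to the middle and back down, summing to 1."""
    raw = [min(i + 1, n - i) for i in range(n)]
    total = sum(raw)
    return [w / total for w in raw]


def gain(weights, freq):
    """Magnitude response at `freq`, in cycles per sample (0 to 0.5)."""
    re = sum(w * math.cos(2 * math.pi * freq * k) for k, w in enumerate(weights))
    im = sum(w * math.sin(2 * math.pi * freq * k) for k, w in enumerate(weights))
    return math.hypot(re, im)
```

Both filters pass DC with a gain of 1, but at a high frequency such as 0.3 cycles per sample, gain(triangle_weights(24), 0.3) comes out well below gain(boxcar_weights(24), 0.3). One caveat: the boxcar has exact nulls at multiples of 1/24 where the triangle does not, so "removes more" is a statement about the overall sidelobe level, not about every single frequency.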
At a high level, they are using a low-pass FIR filter on the PDM signal to extract the audio signal. And, at a high level, this will work well because a symmetric FIR filter has linear phase: it delays every frequency by the same amount, so it does not mess up the phase of its output the way a high-order analog filter would.
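The phase claim can be tested, with one assumption: it holds for FIR filters whose weights are symmetric, which both a boxcar and a triangle are. A symmetric filter of length N delays every frequency by exactly (N - 1) / 2 samples. The sketch below (helper names are mine) removes that delay from the frequency response and looks at what is left. If the delay accounts for all of the phase, the imaginary part left over is zero.

```python
import cmath
import math


def freq_response(weights, freq):
    """Complex frequency response at `freq`, in cycles per sample."""
    return sum(w * cmath.exp(-2j * math.pi * freq * k)
               for k, w in enumerate(weights))


def residual_phase_term(weights, freq):
    """Imaginary part left after undoing a pure (N - 1) / 2 sample delay.

    Zero (to rounding) at every frequency means the filter is linear phase.
    """
    delay = (len(weights) - 1) / 2
    h = freq_response(weights, freq)
    return (h * cmath.exp(2j * math.pi * freq * delay)).imag
```

For a symmetric set of weights such as [1, 2, 3, 3, 2, 1] the residual is zero at any frequency you try, while a lopsided set such as [0.5, 0.3, 0.2] leaves a clearly nonzero residual, meaning different frequencies come out shifted by different amounts.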
At the highest level, whenever you do signal interpolation, you need an infinite number of exact interpolation points, over an infinite time interval, to perfectly recover the signal. The noise (error or whatever you want to call it) in a digital signal is the result of meeting none of those conditions. This manifests itself in many ways; for example, this is why using a triangle, a Blackman window, or a rectangle gives different, approximate, but maybe good enough, results.
The MSB Cascade has 8 ladders per channel (but they are not regular R2R). In DSD mode they say that they convert these ladders into massively parallel native DSD converters. And (alert… guessing on guessing here) they are making multiple D to A’s and summing the results, which would be different from the implementation FIIO is describing.