Transcode/Downsample For Devices Over Slower Links

This is a very general request that will very likely end up being part of your mobile solution. The issue is that I can VPN into my network and stream data just fine to my endpoints as long as the link is fast enough. But over lesser connections, Roon does not support transcoding or downsampling to a level sufficient for playback over high-latency or slow links. Cellular networks cannot reliably stream lossless all the time (they can some of the time, though).

I realize this would lessen the quality of the output, but a loss in quality is better than no audio. So the feature request is this:

EDIT: Requirement/Feature Request Re-worded 4/08.

Provide the capability to send audio to endpoints over slow, high-latency links. Offer at least two options that trade sound quality against playback reliability to different degrees.

Roon architecture puts all codecs in the core. Endpoints receive only PCM or DSD. Streaming any codecs – lossy or lossless – seems unlikely. But downsampling in the core to 16 bit 44.1 kHz PCM already is possible, if needed.


That PCM or DSD is RAAT encoded right? So then why not enable compression on the link or approach it that way?

RAAT is a proprietary transport mechanism, not an encoding scheme. The Roon guys can answer definitively, but doubtful that RAAT has an optional compression mechanism.


So you’re saying that it doesn’t use TCP/IP? That it’s a transport mechanism in the purest sense of Network Engineering terms, replacing TCP?

Per the developers, Roon uses TCP/IP, changed over from UDP.


Sounds like it’s relying on TCP as the transport mechanism over the network. Perhaps they have their own built-in CRC/error checking and other data consistency routines, but as a Network Engineer by trade, I wouldn’t say it’s a true Transport Mechanism unless it completely drops TCP/UDP/etc and they have their own layer 4 protocol. Which would make it a disaster to work with modern systems. I get you when you (and Roon) want to call RAAT a “Transport Protocol”, but it’s still using the built-in OSI networking standards which allow for compression (which is my point).

Not trying to be pedantic or difficult, but I do want to be sure we’re talking the correct language here.

RAAT = Roon Advanced Audio Transport

If you disagree, take it up with the developers who named it.


RAAT is, I would think, an application level protocol in OSI stack terms. More on its design goals here.

Thanks @Geoff_Coupe – my point exactly.

Call it what you want, it’s not a “true” OSI Transport layer.

They do have a hint to OSI layer changes they are making, in the reference you provided @Geoff_Coupe:

The way this reads is that they may play with things like packet/datagram size and other optimizations, which can get pretty complex. So it sounds like RAAT works in conjunction with setting specific fields when they utilize the existing stack. But they’re not re-writing TCP here.

I’d also add, anything done at Layer-7 (the application layer) is essentially an encoding mechanism that is then sent over the existing network stack.

Final edit (hopefully) – the thought occurs to me that you could technically write your own version of a “full stack” and utilize the existing stack to send the data (sort of a network within a network). It would be costly, complex and make me scratch my head as to why you wouldn’t want to use what is provided natively by the existing OSI stack, but it’s still possible. One reason you may want to do it would be to create perhaps a mesh network where you wanted to add more redundancy and reliability as an “overlay”. So in that sense it is an application layer program that is acting like a network transport (potentially a full stack even but I would think they wouldn’t recreate the physical layer). I’d still scratch my head and wonder why not utilize the application in a more optimized manner and save resources. But I’m not an application developer, I’m a Network Engineer.

And the details of what’s happening at Layer-7 are not provided by RAAT documentation that I can find. You’d probably need an NDA for those details.

> No support for under-specced platforms or un-proven network stacks. RAAT is built to evolve over time. We continue to improve the network protocol. We might decide to change the buffer size requirements on the device to increase stability. We might decide to build a second network protocol optimized for streaming over WAN, or something else like that. We give the same advice for users of Roon as we do to manufacturers building RAAT-based products: under-specced systems lead to bad user experiences; hardware is cheaper than ever and getting cheaper all the time; don’t over-economize if you want the best result.

This should really go hand in hand with the mobile music sync / stream.

2 posts were split to a new topic: How did you get Roon to work over VPN?

No one has. I would not read too much into the RAAT name and the OSI networking layers.

I thought about what @WiWavelength had to say about the codecs being server-side and everything being sent as PCM. This begs a question: I assume this means that if your database was all 128K MP3, Roon would send a 16/44 PCM bitstream to the client. This would be more than a tenfold increase in data size. Here’s an example: a 128kbps MP3 file of 4,689KB becomes a 51,176KB PCM file (WAV), which is what is sent as a bitstream (if it’s indeed sent as PCM/LPCM). I realize that most people using Roon are storing their data as high-resolution or lossless, not 128K MP3. Just giving an example.
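To put numbers on that comparison, here is a rough sketch of the arithmetic in Python. It assumes stereo audio and ignores any framing or protocol overhead on the wire; the helper name `pcm_bitrate_kbps` is made up for illustration and has nothing to do with Roon's code.

```python
def pcm_bitrate_kbps(sample_rate_hz: int, bit_depth: int, channels: int) -> float:
    """Raw (uncompressed) PCM bitrate in kilobits per second."""
    return sample_rate_hz * bit_depth * channels / 1000


MP3_KBPS = 128  # nominal bitrate of the MP3 in the example

# 16-bit / 44.1 kHz PCM, assuming two channels (stereo).
cd_pcm_kbps = pcm_bitrate_kbps(44_100, 16, 2)

# Ratio of the nominal bitrates.
nominal_ratio = cd_pcm_kbps / MP3_KBPS

# Ratio of the two file sizes quoted in the post (KB).
quoted_ratio = 51_176 / 4_689

print(f"16/44.1 stereo PCM: {cd_pcm_kbps:.1f} kbps")
print(f"PCM vs 128 kbps MP3 (nominal bitrates): {nominal_ratio:.2f}x")
print(f"PCM vs MP3 (quoted file sizes): {quoted_ratio:.2f}x")
```

The two ratios land close to each other, a little under and a little over 11x. The small gap is plausibly down to tags or other non-audio data in the MP3 file, but that is a guess.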

EDIT: So I found the answer to my own questions at
Roon does send the full upsampled datastream over the network.

So I have a couple of questions (yes, I could go through the exercise, but the point is to document this):

  1. Is Roon then “upsampling” and sending that full PCM (LPCM??) file over the wire?
  2. Does this mean that any upsampling that’s done (as in upsampling everything to DSD) ends up sending DOP (or dCS DSD earlier DSD) over the wire and therefore the equivalent of my comparison of mp3 to PCM? That’s about a tenfold increase if I’m just eyeballing it, perhaps just a bit less, maybe around 7x.
  3. If 1 and 2 are true, then why don’t I hear more about network congestion issues being the core of problems people are having rather than hardware issues?
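For question 2, the raw data rates can be estimated the same way. This sketch assumes stereo DSD64 (1-bit samples at 2.8224 MHz) and the usual DoP packing of 16 DSD bits into each 24-bit PCM sample at 176.4 kHz. Whether Roon actually puts DoP or native DSD on the network is not established in this thread, so treat the figures as an estimate only.

```python
def bitrate_kbps(sample_rate_hz: int, bits_per_sample: int, channels: int = 2) -> float:
    """Raw bitrate in kilobits per second, ignoring network overhead."""
    return sample_rate_hz * bits_per_sample * channels / 1000


cd_pcm = bitrate_kbps(44_100, 16)        # 16/44.1 stereo PCM
dsd64_raw = bitrate_kbps(2_822_400, 1)   # native DSD64, 1 bit per sample
dsd64_dop = bitrate_kbps(176_400, 24)    # DSD64 carried as DoP in 24-bit frames

print(f"16/44.1 PCM : {cd_pcm:.1f} kbps")
print(f"DSD64 native: {dsd64_raw:.1f} kbps ({dsd64_raw / cd_pcm:.1f}x PCM)")
print(f"DSD64 as DoP: {dsd64_dop:.1f} kbps ({dsd64_dop / cd_pcm:.1f}x PCM)")
```

Under these assumptions the DoP stream comes out at about 6x the 16/44.1 rate, in the same ballpark as the eyeballed 7x. Higher DSD rates would scale it up proportionally.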

Technically, it’s possible to compress PCM. There have been open and closed algorithms since phone systems went digital in the 80’s, albeit those are focused on the narrow band of the voice spectrum. A good example is the now “mostly” outdated GSM spec. But modern VoIP/H.323 does it. Another that comes to mind is ADPCM. My point being, there are a lot of techniques for compressing PCM after the fact. It just ends up putting some of the processing requirements on the distant end to decompress – the more compressed, the greater the processing requirements. But with modern systems it’s negligible – think about it, it’s built into your cell phone already.
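As a concrete illustration of compressing PCM after the fact, here is a minimal sketch of mu-law companding, the idea behind telephony's G.711. To be clear, this is not ADPCM or the GSM codec, it is not bit-exact G.711, and it is not anything Roon is known to do. It only shows how a 16-bit sample can be squeezed into 8 bits with a small, level-dependent error.

```python
import math

MU = 255  # companding constant used by mu-law telephony


def mulaw_encode(sample: int) -> int:
    """Compress one signed 16-bit PCM sample into an 8-bit code (0..255)."""
    x = max(-1.0, min(1.0, sample / 32768.0))
    # Logarithmic curve: fine steps for quiet samples, coarse steps for loud ones.
    y = math.copysign(math.log1p(MU * abs(x)) / math.log1p(MU), x)
    return int(round((y + 1.0) / 2.0 * 255))


def mulaw_decode(code: int) -> int:
    """Expand an 8-bit code back to an approximate signed 16-bit sample."""
    y = code / 255 * 2.0 - 1.0
    x = math.copysign(math.expm1(abs(y) * math.log1p(MU)) / MU, y)
    return int(round(x * 32767))


samples = [-32768, -1000, 0, 1000, 32767]
codes = bytes(mulaw_encode(s) for s in samples)   # 1 byte per sample instead of 2
restored = [mulaw_decode(c) for c in codes]

for original, approx in zip(samples, restored):
    print(f"{original:>7} -> {approx:>7}")
```

Each sample goes from two bytes to one, so the stream halves in size. The price is quantization error that grows with signal level, which is exactly the kind of fidelity-for-bandwidth trade being asked about.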

The bottom line: I’m simply saying it’s possible to compress PCM and offer that as a solution to an endpoint willing to sacrifice a little bit of fidelity. I’m not asking to destroy it, maybe just a reasonable amount of compression that is a trade-off between fidelity, processing and “difficulty to implement”. There are certainly cases where Roon has done something unique in how they’re sending out data, as Roon isn’t sending RAAT to endpoints like Sonos.

And this is just for discussion purposes for the group, I defer to Roon for the implementation. Any Engineer knows that when it comes to a requirement – or a “Feature Request” the point is to NOT tell Developers or Engineers how to come up with the solution. It’s to present a need and ask for a suitable solution as they are the domain owners of the technology and know best how to implement a potential solution.

So the requirement/feature request is this:

Provide the capability to send audio to endpoints over slow, high-latency links. Offer at least two options that trade sound quality against playback reliability to different degrees.

1.) WHY??? would you UPSAMPLE from a 128k MP3 to PCM 44.1? That’s a contradiction in itself.
2.) Also, a VPN tunnel over the Internet may take a lot of different hops; even depending on the L7 application, different packets can take different ways to the end (that’s why TCP was invented :-)). So why should the application do the safety mechanism a second time, and also in the adaptive way you seem to want? Also, some VPN applications work on L7. So what do you really want?
Latency problems on cellular networks may differ (users in the same cell, QoS profile), so ask for the spec on the max. latency Roon supports to guarantee no dropouts.

best regards,


@Armin_Moesslacher: are you a Roon Developer?

No offense, but any VPN that’s going to work with Roon is going to be Layer2 (there’s been one case of someone using a Layer3 VPN but that only worked because the VPN server and the Roon server were the same device). Still that does go over layer 3 and is encapsulated, but I’m certainly getting beyond the scope of this discussion.

And “hops” are determined at layer 3. TCP is not layer 3, and it has nothing to do with the number of hops or the path a packet takes over the Internet. TCP was created to allow for multiplexing of traffic in a reliable fashion over IP. The application layer has nothing to do with the path you take to get to your destination. That’s called routing, and it’s done at the carrier level via BGP, and via other autonomous routing protocols when you’re not on a carrier-class segment. Just Google ASNs, routing, BGP, and how routes are determined, because it sounds like you don’t understand routing, which has nothing to do with this requirement or discussion. Do your homework please, or become a network engineer, if you want to make such incorrect statements.

And I would not want to upsample 128K MP3. If you had completely read and understood what I wrote, Roon does this because the lowest it supports is a 16/44 bitstream. 128k MP3 is not that; it’s a lossy version of that 16/44 PCM bitstream in file format. Since you cannot send MP3 over the wire, because all the codecs are on the server, the only alternative is a 16/44 bitstream. I did the packet captures and the math actually works out. Go do it yourself if you don’t believe me or don’t understand. The numbers I posted are quite accurate as to what’s sent over the wire for a very small MP3 file.

I believe my requirement is very clear and purposely open-ended enough to allow the Roon devs to implement the solution in the way they see fit. The point is to support playback over slow or high-latency links. They might even decide to just add a caching function on the client side to store a large portion of the file before it begins playback, or play back from memory like JRiver does (though that has issues with very large files and mobile devices, but I’m getting way too detailed here and that’s not the point).
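To make the caching idea concrete, here is a back-of-the-envelope sketch of how long a client would need to buffer before starting playback when the link is slower than the stream. It assumes a constant link throughput and a constant stream bitrate, which real cellular links will not deliver, and the function name `prebuffer_seconds` is purely illustrative.

```python
def prebuffer_seconds(track_seconds: float, stream_kbps: float, link_kbps: float) -> float:
    """Minimum start delay so playback never outruns the download.

    The whole track (track_seconds * stream_kbps) takes
    track_seconds * stream_kbps / link_kbps to arrive. Playback may start once
    the remaining download time is no longer than the track itself.
    """
    if link_kbps <= 0:
        raise ValueError("link throughput must be positive")
    if link_kbps >= stream_kbps:
        return 0.0  # the link keeps up, so no head start is needed
    return track_seconds * (stream_kbps / link_kbps - 1.0)


# A 5-minute track sent as 16/44.1 stereo PCM (1411.2 kbps) over a 1000 kbps link.
delay = prebuffer_seconds(300, 1411.2, 1000)
print(f"buffer for about {delay:.0f} s before starting playback")
```

The delay grows quickly as the link gets slower, which is why buffering alone only goes so far and some reduction in the stream bitrate would still help.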

Here is my requirement again – please tell me what you don’t understand about it.

Provide the capability to send audio to endpoints over slow, high-latency links. Offer at least two options that trade sound quality against playback reliability to different degrees.

No, I am not. There are L2 VPN solutions, yes, but it always depends on the device the tunnel is made with (for instance a QNAP device which Roon server is running on), so L7 VPN solutions are also possible and it is NOT always L2/L3.

OK, my Cisco CCNP certificate seems to be for nothing :) Now I know it, thanks to you.
Let’s talk TCP multipath now :)

So with MP3->PCM (as an example) we are talking about a lot of unnecessary interpolation. OK, I understand what you mean about how the music is transferred in a PCM bitstream.

Still, end-to-end latency is very important (I work in telecommunications at a technical level) in determining whether end-to-end audio works in good quality, and of course a bigger buffer at the application level would help, agreed.
Still, as I said, the end-to-end QoS in a cellular network can never and will never be guaranteed.

Take voice over LTE as an example, since voice has the highest priority and voice packets are very, very small.
A high-res audio stream is way more data, though, and not losing packets is crucial.

So your request makes kind of sense now to me.

best regards,


@Armin_Moesslacher: You are technically correct that there are Layer 7 VPN’s…

But as a Network Engineer for almost two decades, I (and most of my colleagues) have a bit of disdain for them because L7 VPN’s are hogwash. They’re just SSL connections with HTTP/S rewriters that force HTTP/S traffic over the SSL connection. It’s not a full VPN, it’s more like a secure HTTPS connection into an enclave of Web Servers. You cannot send arbitrary layer 4 traffic over application (layer 7) VPN’s like you’re talking about. They’re more commonly referred to as WebVPN’s, and it’s a bastardization of the true intent of the acronym VPN – Virtual Private Network – meaning you can essentially “remote” a whole network to a device over the public/non-private Internet. Instead it’s just a couple of protocols to a few servers.

Also, unless Roon entirely changes how they work on the wire, the program will never be compatible with a L7 VPN. I’m confused when you say that your Qnap L7 VPN is working with Roon. Maybe if it’s storing data on a remote site and using that connection as a drive and presenting it to Roon Server then OK. But it won’t allow for a client to connect to a server remotely through that L7 VPN. For starters the IGMP/Multicast traffic won’t make it. There are other issues as well.

Personally, I think L7 VPN’s are the product of marketing. If Engineers had a say we would have never let it get called that. Because it’s not a true VPN. But it sells. I just wanted to acknowledge that you are correct about saying there are L7 VPN’s – however most people don’t understand the difference and think that they are a true/pure VPN in the sense I have described above which is why I tend to discard them as real solutions.

And you’re spot on about voice networks. Fortunately they only need a smaller frequency range to compress and packetize. Think back to GPRS and how all that got started even further back, then consider how much of a leap GSM was at the time it came out. Modern cellular networks are quite reliable at the data stream level these days. We can thank LTE (and 4G, but we know there’s really no such thing or formal spec, but you know that if you’re in telecom).

I’m glad the requirement makes sense. This is where Roon is going to need attention, as a good number of networks are going cellular/mobile and that’s the future. People wanting to remote their solutions, and eventually a “mobile Roon”, will need this kind of capability, not just someone unfortunate enough to be stuck on a DSL128K connection over a VPN.

Thanks for the reply.

You are welcome.
For LTE it still depends on the QoS class you get on the radio network interface, which is the most crucial part, and on the number of users in the same cell (this sometimes varies a lot as well). But my VPN over LTE keeps a stable connection most of the time, so anything is possible, and 5G is starting here in Vienna soon. Hybrid solutions are in the loop; that’s why I mentioned TCP multipath, which is not so easy to implement for that purpose for now (then it’s REAL hybrid, LTE and DSL at the same time, for instance).

best regards,