Roon skips and stops during mixed RAAT zone playback (ref#GRVTPQ)

Marcelo_Bulgueroni · July 7, 2025, 6:50pm

Trying to re-open an unsolved case the support system closed automatically for the second time. Please let me know if you are actively investigating it. Thanks

benjamin · July 8, 2025, 11:15pm

Hey @Marcelo_Bulgueroni,

We’re sorry to hear you’re still having issues here! And apologies if this has already been covered - Does it happen with one zone at a time? Where are the zones themselves connected, near the primary router?

If you could reproduce the issue and share another specific track name, we’ll gather a fresh diagnostic report to share with senior development. Thank you!

daniel · July 10, 2025, 6:26pm

Hi @Marcelo_Bulgueroni,

We just wanted to follow up with one more question from our developers, in addition to what my colleague @benjamin asked earlier. Could you let us know where the affected zones are located in relation to your router?

Marcelo_Bulgueroni · July 11, 2025, 1:48pm

Dear @benjamin and @daniel ,

Thanks for getting back. I apologize in advance, but my understanding from your last contact was that you were analyzing the logs from the repeated crashes I experienced.

We already have, through extensive tests:

1 - a very consistent way to repeat the crash, using the playlist I created with DSP enabled;

2 - lots of tests involving wi-fi grouped endpoints, ethernet grouped endpoints, and mixed grouped endpoints, all behaving in the same manner, as per my post #6 in “Roon skips and stops during mixed RAAT zone playback (ref#GRVTPQ)”;

3 - Tests with zorloo and without zorloo as per your requests;

4 - A description of my network setup, with a mesh network consisting mostly of routers connected among themselves through ethernet (only one is connected through wi-fi to the router, and there were some tests in which I connected even this access point to ethernet). My house is almost entirely run on ethernet through a gigabit tp-link switch, only three endpoints are wi-fi, and the three endpoints are VERY CLOSE to each mesh router with excellent signal quality (but again, the problem happens even on endpoints connected through ethernet).

5 - the problem occurs with multiple songs, and we already have a 24-hour test I made to demonstrate that.

The feeling I have is that there is a constant, specific focus on finding the problem in my network and my devices, and even though we went through this extensive A/B testing, not once did I see any consideration that maybe the RAAT protocol is the problem. This is kind of tiresome, respectfully, considering that I have been in this process for almost two months, having had my tickets closed twice and, when I reopen them, being faced with similar kinds of questions.

I would greatly appreciate it if you could help me understand why the tests and information provided are not enough for a diagnosis. It is ok as well to have an answer like “RAAT cannot deal with this currently” so I can finally give up on the investment I made in “upgrading” to Roon’s native protocol. What is not okay is feeling that the information is not being interpreted as a whole, since we are not talking about a wi-fi problem here, as was thoroughly demonstrated and tested.

Again, I really appreciate your help and hope to get this fixed with your help and I am fully available to test new variables or information.

Thank you for understanding.

connor · July 11, 2025, 8:51pm

Hi @Marcelo_Bulgueroni,

Thank you again for your patience. RAAT’s uncompressed protocols rely on multicast traffic - you’re not necessarily encountering something unresolvable here, but we will likely need to reconfigure more settings. We also need to pinpoint what’s happening with the Zorloo.

We need to clarify two points to proceed:

1. When you say “crash,” do you mean the Roon GUI itself actually freezes or closes down? Or are you referring to a dropout in the audio transport (stream gets paused or stops playing)? The team needs to clarify the precise symptom you’re experiencing with the Zorloo.
2. The line-of-sight between mesh nodes and routers isn’t the problem. RAATServer distributes audio between endpoints that must also communicate with one another to synchronize clocking and playback. If the mesh network is attempting to optimize bandwidth distribution and packet handling, it might intentionally delay or re-route some of this traffic between endpoints. This can wreak havoc on RAAT and it will affect Grouped, not individual Zone, playback.

The logged dropouts are systemic and always include a second of audio (precisely 96000 samples). The dropouts occur simultaneously across each Zone, even when RAATServer itself is online and distributing audio actively. The endpoints aren’t receiving the data that RoonServer is sending to them over the network, and they all stop receiving the data at the same time even though Roon keeps sending it. They eventually receive it again.
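For reference, the one-second figure follows from the sample rate: a dropout of 96000 samples lasts exactly one second if the stream is running at 96 kHz. A minimal sketch of that arithmetic in Python, assuming the logged count refers to a 96 kHz stream; the function name is hypothetical and not part of Roon or RAAT:

```python
def dropout_seconds(samples_lost: int, sample_rate_hz: int) -> float:
    """Convert a count of lost samples into a duration in seconds."""
    if sample_rate_hz <= 0:
        raise ValueError("sample rate must be positive")
    return samples_lost / sample_rate_hz


# 96000 samples lost from a 96 kHz stream is one second of audio.
print(dropout_seconds(96000, 96000))

# The same one-second gap in a 44.1 kHz stream would be 44100 samples.
print(dropout_seconds(44100, 44100))
```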

In the vast majority of cases, this is due to the STP implementation in any managed switches or the multicast settings of the mesh network. It’s for this reason that we’re asking about network settings, not just network topology.

Have you tried setting up manual ethernet backhauls between all of the mesh nodes instead of relying on the mesh network?

Try disabling Beamforming, if applicable, in the Deco M5 settings admin page.

We’ll watch for your response.

Marcelo_Bulgueroni · July 15, 2025, 1:19am

Hello @connor

Thanks for the detailed explanation and questions!

Concerning the points you raised:

Sorry if I have been misleading in my wording. What I call a “crash” is only the behavior described since the beginning: Roon skips the next song entirely, then plays 3 or 4 seconds of the following song and stops playing. The GUI remains fully functional, and as soon as I press play the song resumes playing. It plays fully, and some time later a different song shows the same behaviour. It only happens when transitioning from one song to another.

Have you tried setting up manual ethernet backhauls between all of the mesh nodes instead of relying on the mesh network?

Yes, I have already done that, using only ethernet connected router points in order to avoid the backhaul chatter through wi-fi, but the problems continued. I can try that again if you want.

I tried disabling Beamforming as well and the problems remained, but I will try again. I also disabled fast roaming, but it seems unnecessary since I set each endpoint to connect to a specific router in order to avoid being tossed around between routers.

A point on which I would appreciate your insight: it is clear that a wireless setup has its challenges in any home. However, the problems appear even when grouping ethernet-only endpoints. The difference is that they are less frequent, but they still happen easily when testing. Shouldn’t this rule out the wireless investigation and bring the focus to other technical points? Please forgive me if I am missing something in my interpretation of the current facts.

If you think it is productive I can run a battery of tests only with ethernet-connected endpoints so you can better compare the logs. Just let me know!

Best,

Marcelo

connor · July 15, 2025, 9:49pm

Hi @Marcelo_Bulgueroni,

Thank you for precisely clarifying these points. We’re making progress pinning this down and we’re very grateful for your patient and diligent troubleshooting so far.

RAAT can be sensitive to network topology and settings, but it’s an industry-standard, robust group playback protocol that has matured over more than a decade of harsh QA and user testing. We haven’t received reports of protocol-level failures with RAAT in many years, which is why we’re being so infuriatingly scrutinous of your network here.

Let’s summarize what we know is not happening.

We’ve eliminated the possibility of WiFi-related interference or dropouts. We’ve discounted the possibility that the router or mesh nodes are failing to forward multicast commands.

The precise symptom you’ve reported is a dropout of 1-4 seconds, accompanied by track transition issues and overall playback failure (you need to hit play).

With this information in mind, and considering the symptom, we’d like to focus on two possibilities:

1. The Deco M5 can still mishandle RAAT clock synchronization traffic, particularly during track transitions, even when nodes are in ethernet backhaul. Deco firmware doesn’t allow users to customize STP or multicast handling. This might not be a factor if all your endpoints (RoonBridges) are behind the unmanaged switch in your network setup. But let’s set that aside for now, because we can likely test this without rearranging your topology.
2. A single Raspberry Pi might be slightly underpowered; resource constraints on that machine could be causing slow responses to RAAT clock sync during track transitions. This clock sync failure would then cascade across the grouped Zone.

What we recommend as a robust test for both possibilities:

Create a new grouped Zone, starting with whichever Zone uses the Raspberry Pi with the most processing power in your setup.

Play a queue containing 96 kHz content in the same file format.

Add only a single Zone at a time. As soon as you encounter a dropout, note the Zone name that you just added to the group and share it here along with the track that was playing and stopped. Please also describe how this Zone connects to the network relative to your RoonServer.

Thank you and we’ll watch closely for your response.

Marcelo_Bulgueroni · July 18, 2025, 5:36pm

Dear @connor

Considering the possible power investigation, I did tests for almost 30 hours, with the following setup:

Reactivated a NUC 8 (core i7) which used to be my Roon Core, installed Roon Bridge on it and connected the Meridian Explorer 2 to it (ethernet);
Activated the Realtek audio of the NUC as an endpoint (ethernet);
Activated the onboard audio of a rpi 3B as endpoint (ethernet);
Activated the onboard audio of a rpi 2 as endpoint (ethernet);
Used my Khadas Tone Pro 2 connected to the rpi (ethernet);
Used the three endpoints with Zorloo (rpi zero 2w, wifi 5ghz);
Used the Coreaudio of the Mac Mini as endpoint (ethernet).

Started with an ethernet-only group, led by the Khadas Tone, joined by Coreaudio and the Meridian Explorer (NUC) - everything went smoothly, and continued ok when adding the three wi-fi endpoints. It ran fine with these six endpoints for hours.

After that, adding the RPI2 caused problems almost instantly.

Judging that I had pinpointed the issue to power problems, I started another group, this time with the Meridian Explorer (NUC) first. Then the problems started to appear regardless of the components of the group.

Went back to starting the group with the Khadas Tone Pro. Problems started happening again, with every configuration of zones possible.

Rebooted: whole network, roon core, endpoints. Problems continued.

Tried again to adjust re-sync intervals and buffers, to no avail. I simply cannot establish a good baseline in almost 48h of continual tests.

The main problem is the transition from a 96 kHz song to a 44.1 kHz song on Tidal. This is when the general failure most consistently happens, so you will find in the logs that I focused many times on this moment to see if the transition would happen or not.

I hope the logs can help bring some light to this crazy situation…

benjamin · July 21, 2025, 9:16pm

Hey @Marcelo_Bulgueroni

Thank you for the above testing - we very much appreciate your thoroughness in your process and your reporting!

We reviewed a fresh set of diagnostics around the timestamp of your testing, and it looks like this could be related to timing problems. Your system is consistently missing chunks of audio that line up with exact time intervals, like one second or one-tenth of a second.

This usually happens when timing sync signals between devices are delayed or lost, which can cause the audio to drop out or stutter. It can show up at moments like track transitions, which also cause the buffer to flush and refill with new data.
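To illustrate what "lining up with exact time intervals" means, here is a rough sketch in Python that checks whether a gap is a whole number of tenths of a second at a given sample rate. The helper name and the sample values are hypothetical, chosen only for illustration; they are not taken from any Roon log or API:

```python
def aligns_to_interval(samples_lost: int, sample_rate_hz: int,
                       steps_per_second: int = 10) -> bool:
    """True if the gap is a whole number of 1/steps_per_second-second steps."""
    if sample_rate_hz <= 0 or steps_per_second <= 0:
        raise ValueError("rate and steps must be positive")
    if samples_lost <= 0:
        return False
    # Integer arithmetic avoids floating-point rounding issues.
    return (samples_lost * steps_per_second) % sample_rate_hz == 0


# Hypothetical (samples_lost, sample_rate_hz) pairs, for illustration only.
gaps = [(96000, 96000), (9600, 96000), (4410, 44100), (5000, 96000)]
for samples, rate in gaps:
    kind = "interval-aligned" if aligns_to_interval(samples, rate) else "irregular"
    print(f"{samples} samples at {rate} Hz: {kind}")
```

Gaps that always land on such round boundaries point toward something timer-driven, whereas random packet loss would tend to produce irregular gap sizes.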

As another step in troubleshooting, try upsampling everything via MUSE to 96 kHz first and try to reproduce a dropout on:

1. Your MacOS Zone
2. Your RPI 2 Zone
3. A grouped Zone with three or more endpoints

After that, set everything to 44.1 kHz and try the same tests above (1-3) on the same Zones. We’ll monitor for your results - thank you! 👍

Marcelo_Bulgueroni · July 26, 2025, 1:59pm

Hello @benjamin

Did the tests as directed.

When zones were alone no problem occurred (which is normal, it never skips on isolated zones, only on groups).

At 96 kHz it took a LONG while for the skip to happen. Many hours. It happened eventually, and only when I used a zone with more than 4 endpoints grouped together (all set to 96 kHz fixed) - I am sure that if I added a convolution equalizer to any of them, for example, the skips would happen much earlier.

At 44.1 kHz the behavior was more or less the same. I needed a zone with more endpoints to reproduce the skips (3 or more), but it seemed to happen faster than at 96 kHz.

Tests were run over approximately the last 36h.

Thanks!

connor · July 30, 2025, 8:09pm

Hi @Marcelo_Bulgueroni

Let’s summarize what we’ve learned from these tests:

The dropouts occur when you play hi-res files (large files) on distributed endpoints across your mesh network. This includes Ethernet-only groups, but all are still managed by one of the two Deco mesh routers.
The dropouts include a full second of sample loss in each case.
While this might be exacerbated by underpowered endpoints (like the Raspberry Pi 2), resource constraints at the endpoint-level are not the sole source of the dropouts.
While you haven’t bypassed the mesh network entirely, you have replaced one Deco mesh router with another Deco router using a different generation of firmware. The issues still occurred.

RAAT requires tight timing coordination and low jitter across zones. Older Pis or unexpected multicast handling by the network can break this - you’re up against an environmental limit of using RAAT within this ecosystem.

It sounds like you’ve been able to pinpoint certain configurations that won’t manifest the symptoms for some time.

However, even when relying on ethernet backhaul, the Deco is likely imposing its own traffic management to the nodes. This affects multicast traffic handling and can impact Roon - this is why you see the symptoms on Grouped Zones that are communicating across the network with one another and with Roon.

The strongest recommendation would be to rely on longer cable runs and connect all of your endpoints to the unmanaged switch, which is in turn connected directly to the router. In other words, bypass all of the mesh nodes (even those backhauled). This is obviously a challenging topology to impose in a residential environment.

Where would you like to proceed from here? Your testing has been incredibly detailed, and it’s clear you’ve taken every logical step to isolate this problem. However, based on logs and behavior, this appears to be a real-world edge case; RAAT’s group playback synchronization is sensitive to both network timing jitter and endpoint processing latency with a Group this large and files of this sample rate on this specific network. Older Pis and Deco mesh firmware are likely amplifying these effects. Unfortunately, developers have uncovered no architectural flaws with Roon’s clock sync in the logs we’ve reviewed extensively over the last three months. What we do see are environmental vulnerabilities on this network and with the older endpoints. There’s not a clear fix we can release that would improve or relieve this behavior for RAAT with the combination of hardware, topology, and firmware that you’re currently using. However, the recommendations and best practices we’ve attempted to outline can hopefully optimize performance and prevent dropouts even at higher-resolution playback and across large Groups.

Please let us know if we can elaborate further. Thank you.

Marcelo_Bulgueroni · August 4, 2025, 7:53pm

Hello @connor

Thanks again for the reply. Concerning your recommendation to connect all of the endpoints to the unmanaged switch:

This is already done for every endpoint I referred to as ethernet-connected. They all connect directly to the unmanaged gigabit switch, and the main router (Deco X60) connects to the same switch - the other Deco M5s only provide connection to the three wi-fi endpoints. Maybe that helps in reviewing the logs, maybe not, but I think it is important that this “optimal” setting is already used for the majority of endpoints, and the problem appears as well when using these all-cabled endpoints connected to the same switch (although less frequently than when we add the wi-fi endpoints, where I am sure the mesh setting contributes negatively to overall stability).

The moment I decided to look for help was exactly when through A/B testing I found that even the “best” hardwired endpoints would struggle sometimes, that DSP would amplify the problems, and it is specifically worse with Tidal. I expected problems with wi-fi, did not expect the endpoints connected to the same switch to be a challenge to the protocol.

connor · August 4, 2025, 9:36pm

Hi @Marcelo_Bulgueroni,

Thank you for providing such clear context. Logs in the last several days have illuminated some more context that may provide a next step.

Both the RAATServer instance and the RoonServer instance on the Mac Mini show instability during low-level TCP sessions. Sockets are closing while RoonServer is still sending data. This is wreaking havoc across connections to endpoints on this network because RoonServer has to reset the connection entirely, sometimes during playback.

The fact that some of these TCP errors occur during upstream cloud handshakes indicates that multicast handling and/or RAAT clock sync might not be the causal factors. Both cloud and RAAT TCP connections are getting killed while Roon is still sending them data.

Again, it’s natural to suspect the protocol when facing persistent issues. However, RAAT’s clock synchronization and multi-zone streaming have been rigorously designed and widely proven in complex setups. The consistent one-second dropouts and connection errors you’re seeing are classic signs of network-related interruptions when Roon has to reset a network connection to one of the many active Zones. This is why we’re focusing on the Deco mesh’s handling of multicast and TCP connections.

Please double-check that Cloudflare (1.1.1.1), Quad9 (9.9.9.9), or another reliable DNS server is assigned in the router.

In the background, the Cast and Shairport-based devices are constantly broadcasting their availability. This broadcast/multicast traffic isn’t competing with RAAT but it raises the floor for both a) bandwidth saturation and b) resource constraint across the network, particularly when you have the WiFi-based endpoints involved.

Just for due diligence, please power down all Chromecast devices during the next test or disable their network access temporarily. Please also disable Shairport Sync on all Pis, or turn off the Airplay receivers on any endpoints. This will greatly reduce background noise and traffic on the network and in logs.

Next, perform the same test of playing high-res Tidal content and adding in a single Zone at a time to the group. Please, per our usual cadence at this point, note the name of the track that first dropped out (or the approximate time).

We’re very eager to help see this through and restore a reliable playback setup. Thanks again for your patience.

Marcelo_Bulgueroni · August 11, 2025, 6:55pm

Hi @connor !

Thanks again for the thorough explanation!

It took me some time to be able to carry out this test since I needed to manually turn off the chromecasts. I learned as well that ropieee does not make it easy to turn off shairport, so instead I turned off all wi-fi pis with ropieee as well.

I started with only the following endpoints, all connected by ethernet through the same switch, each with a different DSP setting (on purpose, to “force” the issue).

1. The Mac Mini (roon server) coreaudio on exclusive mode;
2. Rpi4 with ropieee connected to khadas tone pro 2
3. NUC i7 with Meridian Explorer 2
4. Raspberry pi 3 headphone out
5. Raspberry pi 2 headphone out

Played back and forth for a while with no problems.

Then I turned on and joined in the group:
6. Pi zero 2 w with ropieee - wi-fi 5ghz (bedroom)
7. Pi zero 2 w with ropieee - wi-fi 5ghz (kitchen)
8. Pi zero 2 w with ropieee - wi-fi 5ghz (bathroom)

No problems happened.

Decided to turn on all chromecasts again to see if any “noise” would force the issue to appear, again everything smooth.

However, when I decided to stop testing, something interesting happened:
I moved the Zone “1” (Mac Mini coreaudio) out of the group in order to use the audio of the computer. Problems appeared instantly and could be reproduced all the time.

Intrigued, I added the zone “1” to the group again. Problems disappear.

Removing the zone “1” would make the problems appear again, instantly.

For reviewing in the logs:
20250811 - 3h29 pm - skips and stops after removing coreaudio
20250811 - 3h32 pm - plays fine with coreaudio included
20250811 - 3h36 pm - skips and stops after removing coreaudio

It seems the combination of endpoints is playing a major part in the problem. Maybe this could open another line of investigation? Let me know. And thanks again!

vadim · August 20, 2025, 11:56am

Hello @Marcelo_Bulgueroni,

Thank you for your detailed testing and for sharing your observations. We apologize for the delay in our responses.

To help us investigate further, could you please provide a bit more detail:

The exact grouping pattern when the issues are reproducible.
Whether you can reproduce the problem if the Mac Mini is set to non-exclusive mode before removing it from the group.
If possible, please try to reproduce the issue again and provide the exact date and time when it occurs.

This information will help us understand the interaction between endpoints and grouping, and may guide further investigation into the behavior you observed.

Thank you again for your thorough testing and patience!

Marcelo_Bulgueroni · August 25, 2025, 7:59pm

Hello @vadim,

Thanks. Here are the answers:

As per my previous post the group used comprises 1 to 8 of the endpoints described on my post.

I also tested now with a group comprising only endpoints 1 to 5, to keep out the wi-fi ones.

Yes, just tested, and the same issue appears, both with the 1-8 group and the 1-5 group. When the coreaudio endpoint is included in the group there are no issues. As soon as the endpoint is not part of the group all issues come back.

Today, August 25, at:

04:46 PM (grouped 1-8)

04:47 PM (grouped 1-8)

04:51 PM (grouped 1-5)

04:53 PM (grouped 1-5)

connor · August 29, 2025, 8:12pm

Hi @Marcelo_Bulgueroni,

Thank you for your thorough report and testing.

We’ve examined these timestamps and see a continued pattern of precisely half the generated samples dropping before they reach endpoints. What we don’t see is any evidence of packet loss directly. We’d like to try and zero in on clock sync from here.

Adding the Mac Mini provides a RAATServer instance to serve as the lead clock sync instance. Removing this Zone - after having created the rest of the grouped Zone - forces RAAT to pick a new clock master. RAAT doesn’t always pick the second Zone added to the group chronologically as the next clock lead, but it often will. What happens if you repeat the creation of the group Zone 1-8 (with Mac Mini as the first Zone, Rpi 4 to Khadas Tone Pro 2 as the second), but this time remove both of those Zones, leaving the remaining six?

The intention here resembles that of our earlier effort when we were testing to isolate an “underspec’d” Raspberry Pi with inconclusive results. We want to try to isolate (with your current topology, don’t worry about rewiring at the moment) which Zone is not working well as a clock lead.

If you’d like, you can continue to remove Zones in the order in which they were added, hammering playback at 96 kHz with DSP like before.

Let us know the results - we’re eager to ensure we can resolve this issue for you. Thanks again for your patience.

Marcelo_Bulgueroni · September 8, 2025, 1:02pm

Dear Connor,

Testing now and will be back with results. Thanks!

Marcelo_Bulgueroni · September 8, 2025, 2:12pm

Dear @connor

Can’t say it will be of much use, but here are the new tests. I kept trying to find a group combination without coreaudio that would work, in order to experiment with the clock sync, but I was mostly unsuccessful.

Just keeping here for reference:

1. The Mac Mini (roon server) coreaudio on exclusive mode;
2. Rpi4 with ropieee connected to khadas tone pro 2
3. NUC i7 with Meridian Explorer 2
4. Raspberry pi 3 headphone out
5. Raspberry pi 2 headphone out
6. Pi zero 2 w with ropieee - wi-fi 5ghz (bedroom)
7. Pi zero 2 w with ropieee - wi-fi 5ghz (kitchen)
8. Pi zero 2 w with ropieee - wi-fi 5ghz (bathroom)

Tests conducted TODAY, September 8, 2025; times are Brazilian time.

I will put the “order” in which I added the zones to the group

Group 3-2-4 : problems 10:05 / 10:06

As soon as I add the zone 1 problems stop

Group 3 - 2 - 4, freshly formed group - no problems

Added all the other zones except 1: problem repeats at 10h20

Went back to 3 - 2 - 4 - problem now happened at 10h21 AM and 10h23 AM - fewer errors though

Adding Zone 1 (core audio) instantly solves problems

Removing Zone 1 and keeping 3 - 2 - 4: problem at 10h25 AM

Group 8 - 7 - 6 (wi-fi only!) - worked fine at 96 kHz with 3s crossfade and audio leveling

Added convolution equalizer to zone 8 - problem appears at 10h32 AM

Removed convolution from zone 8 - plays fine

Added zone 1 (thus 8 - 7 - 6 - 1): plays fine (expectedly)

Removed zone 1 (8 - 7 - 6) - still plays fine

Added zone 2 (8 - 7 - 6 - 2) - problem at 10h40 AM

Turned off convolution equalizer on zone 2 (keeping only 96 khz upsampling for all zones): played fine

Added zone 3 (8 - 7 - 6 - 2 - 3) - all at 96khz - problem appeared at 10h48 (it is much less frequent when DSP is “lighter”)

Adding other zones (8 - 7 - 6 - 2 - 3 - 1 - 4 - 5) and enabling convolution for zones 1, 2 and 3: consistent problem at 10h58 AM, 10h59 AM

This with core audio included

Removing convolution from all zones and keeping upsampling at 96khz: seems to play fine A LOT of times, but ended having the same problem at 11:06 AM

This shows that even having core audio among the zones has its limits…
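The runs above can be tabulated to compare outcomes with and without zone 1 in the group. This is only a rough sketch in Python: the tuples are a hand transcription of the entries listed above (zone numbers follow the 1 to 8 endpoint list), and the `runs` structure and `summarize` helper are illustrative assumptions, not anything provided by Roon:

```python
# Each entry: (zones in the group, was a problem observed?)
runs = [
    ({3, 2, 4}, True),               # 10:05 / 10:06
    ({3, 2, 4, 1}, False),           # adding zone 1 stops the problems
    ({3, 2, 4}, False),              # freshly formed group
    ({2, 3, 4, 5, 6, 7, 8}, True),   # all zones except 1, 10h20
    ({3, 2, 4}, True),               # 10h21 and 10h23
    ({3, 2, 4, 1}, False),           # adding zone 1 again
    ({3, 2, 4}, True),               # 10h25
    ({8, 7, 6}, False),              # wi-fi only, crossfade and leveling
    ({8, 7, 6}, True),               # convolution on zone 8, 10h32
    ({8, 7, 6}, False),              # convolution removed
    ({8, 7, 6, 1}, False),
    ({8, 7, 6}, False),
    ({8, 7, 6, 2}, True),            # 10h40
    ({8, 7, 6, 2}, False),           # convolution off on zone 2
    ({8, 7, 6, 2, 3}, True),         # 10h48
    (set(range(1, 9)), True),        # convolution on zones 1-3, 10h58 / 10h59
    (set(range(1, 9)), True),        # upsampling only, 11:06
]


def summarize(entries):
    """Return {zone_1_present: (runs_with_problems, total_runs)}."""
    counts = {True: [0, 0], False: [0, 0]}
    for zones, problem in entries:
        bucket = counts[1 in zones]
        bucket[0] += int(problem)
        bucket[1] += 1
    return {key: tuple(value) for key, value in counts.items()}


for has_zone_1, (bad, total) in summarize(runs).items():
    label = "with zone 1" if has_zone_1 else "without zone 1"
    print(f"{label}: {bad} of {total} runs showed the problem")
```

A tally like this does not account for DSP load or run length, so it is only a starting point for comparing group compositions.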