Frequent Connectivity Problems with Roon-Titan during Playback (ref#HXIQSD)

Hello,

One important addition: the initial request started with a different issue and behavior.
→ The Roon-Nucleus server disconnected from the network. I observed this again today, with Wi-Fi running stably on all my other devices, both Escape speakers disconnected from the group, and both EMM Labs NS1 playing simultaneously. The message “can’t find/connect to Roon server” appears, and after a few minutes it reconnects and I can resume playback from the point where it was disconnected.

I would like to go back and solve this problem first. Please also see the beginning of the thread where I have explained it in more detail.

I have checked the logs; unfortunately, no log is created when the Nucleus gets disconnected, as happened just this morning. All logs are from yesterday, when there was the issue with the Escape.

The stopping and skipping of songs related to the EscapeP6 is a different behavior. The Nucleus remains connected to the network, and it’s a queue interruption/skipping. It adds to the frustration, but this might be related to an issue with the EscapeP6. I know that if you turn off and disconnect one of the speakers from the group, it stops playing and so on.

Thanks for getting back to me on the first and initial problem. All my computers have a stable connection to my network, with the exception of the Titan (a very expensive server), which disconnects randomly but frequently and reconnects by itself within a few minutes.

Regards
Herbert

Hey @Herbert_Zwartkruis,

We’d be happy to focus on the initial issue of your Nucleus disconnecting from your network.

As a next step, could you please navigate to your webUI and perform a RoonOS reinstall?

Here’s more info on how to access the webUI:

Let me know if you run into additional disconnects afterwards. Thank you! :raised_hands:

Hello Benjamin,

This is the troubleshooting 101 I have already tried a few times: checking for updates, software reinstall (at least 2 times), as well as rebooting multiple times. This hasn’t solved the problem, which is the reason for seeking expert help. I do not see another repetition as meaningful, so please advise what we/you can do next to troubleshoot further.

Thanks Herbert

Hi @Herbert_Zwartkruis,

Just a quick note—since this wasn’t mentioned in any of your previous replies, we need to go through all troubleshooting steps, even if they’ve been tested already. If you’ve tried something before, please let us know so we can avoid repeating steps unnecessarily. :slightly_smiling_face:

Based on your report, you’ve only mentioned two remote devices, an iPhone and an iPad. Do you have any other devices you can test as a remote? Have you removed and reinstalled Roon on both mobile devices?

Are you running any other background applications when the connection drops? We’ve yet to connect and enable diagnostics on either device. Could you please reproduce the issue, share the date/time or track name, and then use the directions found here to send a set of Roon logs from either your iPhone or iPad to our File Uploader?

Looking at a recent Nucleus diagnostic report, we see disconnects from the iOS device. It doesn’t look like the Nucleus itself is disconnecting from your network, but rather the mobile device disconnecting from your Nucleus. Here’s an example:

Info: [.NET ThreadPool Worker] [remoting/serverconnectionv2] Client disconnected: 10.0.0.131:62064
Warn: [Broker:Misc] [remoting/brokerserver] [initconn 10.0.0.131:62081=>10.0.0.48:9332] failed: System.Threading.Tasks.TaskCanceledException: A task was canceled.
   at Sooloos.Broker.Distributed.Extensions.Cancelify[T](Task`1 task, CancellationToken ct)
   at Sooloos.Broker.Distributed.InitConnectionV2.Go()

Do you by chance have any low power settings enabled on the mobile device?

This could also be occurring due to the nature of your mesh system. If using mesh Wi-Fi, ensure devices aren’t switching between nodes, which can cause brief disconnections.

It may also be worth checking whether you can set a static IP for your mobile devices. Let me know if any of the above help! :+1:

Hello and thanks for now looking into this.

Also just a quick note from my side: the initial request is more or less a month old. This is the most advanced response I have seen, and we are finally going through some of the basic troubleshooting steps, which in essence means trying to exclude other sources of error. As I am an engineer working in production/operations, this is also what I try to do first before I start asking for help… I hope that we can now stay current and on top of the troubleshooting.

Mobile devices: we have several in operation in our household, 4 of which have the Roon app installed: 2 iPhones and 2 iPads. As good engineering practice, we keep them up to date with software versions and app updates.

When the message “Waiting for Roon Server” shows up, I have cross-checked on all of the mobile devices with the same result: none of them is able to connect to the Titan server during this brief period. I have even tried to connect from my hard-wired MacBook, also with the same result. For good housekeeping I will delete and reinstall the apps on the various devices I have; however, given the pattern above, I don’t think this will make a difference. It is hard to believe that all devices go sour at the same time.

If a disconnect of one iPhone from the network (or one iPhone changing nodes) leads to the Titan disconnecting completely from all other devices and not responding to anything until it has recovered by itself after 3-5 min, isn’t that strange? This is what I honestly cannot understand, and I would find it strange to point to an iPhone or the like.

I can say for sure that on our iPad mini I don’t run other applications in the background, as it is mostly used for music selection and listening only. I can’t say for sure that something else isn’t running in the background on my iPhone at times, but it is limited, as I do not like a crowded device. I do not use the low power mode setting.

I also will reach out to Netgear again and see if they can help. From all the exclusions I have been doing, it’s either the Roon-Titan or the Netgear. What leads me to the Roon is that during the unresponsive time of the Roon-Titan all other devices I have in my household are still running.

I am happy to check other “exclusions”, but I hope that we can focus on something.

Finally: I cannot “reproduce” the issue. I can only wait until it happens (it is unpredictable and its frequency changes) and then either have the time or make the time to look into it further. I have already spent many hours on this subject, which is why I am not as patient anymore.

Anyway, I do not seem to have any option other than to wait until I can capture the logs, which hopefully leads to more fruitful results and stops the “ping-pong”. Technology is beautiful, if it works…

Regards
Herbert

Hi @Herbert_Zwartkruis,

We understand that this has been an ongoing issue, and we truly appreciate your patience. That said, regardless of when the thread was started, we aren’t mind readers :slightly_smiling_face: and can only work with the information provided to us. If troubleshooting steps aren’t mentioned, we have no way of knowing what’s already been tried, which is why we ask users to clearly share what they’ve tested. Thank you in advance for providing more details about the troubleshooting steps you have already tried.

All we need is a more specific date and time - nothing else, no need to look further into things on your end.
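If it helps to capture that date and time without having to watch for the disconnect, a small script on a wired computer can record when the server stops answering. The following is only a minimal sketch: the address `10.0.0.48` and port `9332` are assumptions taken from the `=>10.0.0.48:9332` target in the log excerpt above, and the function names are hypothetical, not part of any Roon tooling.

```python
import socket
import time
from datetime import datetime, timezone


def is_reachable(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and "no route to host".
        return False


def state_changes(samples):
    """Given (timestamp, reachable) pairs, keep only the points where the state flips."""
    changes = []
    previous = None
    for stamp, reachable in samples:
        if previous is not None and reachable != previous:
            changes.append((stamp, "UP" if reachable else "DOWN"))
        previous = reachable
    return changes


def monitor(host: str, port: int, interval: float = 5.0) -> None:
    """Poll until interrupted and print a UTC timestamp whenever reachability changes."""
    previous = None
    while True:
        now = datetime.now(timezone.utc)
        reachable = is_reachable(host, port)
        if previous is not None and reachable != previous:
            print(f"{now:%Y-%m-%d %H:%M:%S} UTC  {'UP' if reachable else 'DOWN'}")
        previous = reachable
        time.sleep(interval)
```

Calling `monitor("10.0.0.48", 9332)` and leaving it running would print one line per outage start and one per recovery. The times are printed in UTC so they can be compared directly with the server logs.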

Given that all devices experience the same issue concurrently, it is unlikely that reinstalling the Roon app on each device will resolve the problem. However, ensuring that all apps and devices are up to date is always a good practice.

Roon relies on multicast for device discovery and communication. Some mesh systems don’t handle multicast traffic well or isolate devices across different nodes, making it hard for Roon components to communicate.

Another next step in troubleshooting, if you haven’t already tested this, would be to review any QoS settings within your Orbi router and give some additional network priority to your Nucleus.

We’ll be monitoring for your reply, thank you! :+1:

Hello Benjamin,

I am not a mind reader either, and not an expert by any means. That’s why this troubleshooting feels like pulling teeth. By the way, it is no different from my experience with Netgear. They pointed the finger at Roon and my other network devices, and now the same goes on vice versa with Roon. When this goes on, as in my case, for over 6 months, it gets frustrating. But it is what it is in this industry, it seems. And buying premium products (Netgear, Roon, Apple and so on) on the assumption that they cause fewer problems and are more reliable doesn’t seem to be a successful strategy…

No, I haven’t checked the QoS settings. Dr Google unfortunately is unable to help me here. The setting isn’t available to change, neither directly in the advanced settings nor in the “hidden version”. I would have to ask Netgear whether this is even possible and, if so, how to change it. The manual doesn’t have a chapter on it. If I understand correctly from what I have read, the router seems to have a self-optimizing algorithm.

Are there any other tricks in the box that I could try or look into?

Regards
Herbert

Update
Netgear confirmed: no QoS setting options on the 971 router

Hi @Herbert_Zwartkruis ,

I noticed that you mentioned you are an engineer, so I will go ahead and provide you all the technical details that we see in the logs based on the timestamps you provided in the past.

The logs for this event show the following:

02/27 23:40:56 Trace: [RaatSender] [EscapeP6-2] [Enhanced, 24/96 QOBUZ FLAC => 24/96] [100% buf] [PLAYING @ 1:15/4:34] If You Love Me - Melody Gardot / Cliff Masterson / Royal Philharmonic Orchestra / Dadi
02/27 23:41:00 Warn: [Worker (2)] [LivingRoom + MBedRoom + EscapeP6-1 + EscapeP6-2] [zoneplayer/raat] long rtt sync Escape P6Air (): realtime=4215197632667 rtt=146500us offset=-6777446367us delta=-124235us drift=135231us in 81.0735s (1668.009ppm, 6004.833ms/hr)
02/27 23:41:00 Trace: [.NET ThreadPool Worker] [Escape bv P6Air () @ 10.0.0.172:41696] [raatclient] GOT [888] {"samples":48000,"status":"Dropout"}
02/27 23:41:00 Trace: [.NET ThreadPool Worker] [Escape bv P6Air () @ 10.0.0.172:41696] [raatclient] GOT [888] {"samples":47400,"status":"Dropout"}
02/27 23:41:00 Warn: [Worker (3)] [LivingRoom + MBedRoom + EscapeP6-1 + EscapeP6-2] [zoneplayer/raat] Too many dropouts (>3s dropped out in the last 30s). Killing stream
02/27 23:41:00 Trace: [Worker (3)] [LivingRoom + MBedRoom + EscapeP6-1 + EscapeP6-2] [zoneplayer/raat] too many dropouts. stopping stream
02/27 23:41:00 Debug: [Worker (3)] FTMSI-B closed file for qo/323FA260; open files:0
02/27 23:41:00 Debug: [Worker (3)] FTMSI-B qo/323FA260 download status: AllBlocksDownloaded accessTimeout:True openFiles:0 prev:(AllBlocksDownloaded,True,1)
02/27 23:41:00 Debug: [Worker (3)] FTMSI-B closed file for qo/1FB50BE6; open files:0
02/27 23:41:00 Info: [Worker (6)] [audio/env] [zoneplayer -> stream] All streams were disposed
02/27 23:41:00 Debug: [Worker (3)] FTMSI-B qo/1FB50BE6 download status: AllBlocksDownloaded accessTimeout:False openFiles:0 prev:(AllBlocksDownloaded,False,1)
02/27 23:41:00 Trace: [Worker (1)] [LivingRoom + MBedRoom + EscapeP6-1 + EscapeP6-2] [zoneplayer/raat] Endpoint EMM NS1 State Changed: Playing => Prepared
02/27 23:41:00 Trace: [Worker (1)] [EMM Labs NS1 @ 10.0.0.100:47671] [raatclient] SENT [910]{"request":"end_stream"}
02/27 23:41:00 Warn: [Broker:Transport] [zone LivingRoom + EMM NS1 + Escape P6Air () + Escape P6Air ()] Track Stopped Due to Slow Media
02/27 23:41:00 Info: [Worker (7)] [audio/env] [zoneplayer -> stream -> endpoint] All streams were disposed
02/27 23:41:00 Info: [Worker (12)] [audio/env] [zoneplayer -> stream -> endpoint] All streams were disposed

As you can see here, the buffer was at 100%, meaning the track was downloaded properly to your Nucleus, but the dropout occurred at the Escape bv P6Air () @ 10.0.0.172:41696 endpoint: the Nucleus was unable to get the stream to the endpoint on time.
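For anyone who wants to tally these events in a longer log, here is a rough sketch that sums the reported dropout per endpoint address. It rests on two assumptions that the log itself does not confirm: that `samples` in a `"status":"Dropout"` message is the number of dropped samples, and that the stream runs at 96 kHz (inferred from the `24/96` label). The function name is hypothetical, and this is not how Roon itself decides to stop a stream.

```python
import json
import re
from collections import defaultdict

# Matches raatclient messages such as:
# [Escape bv P6Air () @ 10.0.0.172:41696] [raatclient] GOT [888] {"samples":48000,"status":"Dropout"}
RAAT_MESSAGE = re.compile(
    r"@ (?P<ip>\d+\.\d+\.\d+\.\d+):\d+\] \[raatclient\] GOT \[\d+\] (?P<body>\{.*\})"
)


def dropped_seconds_by_endpoint(lines, sample_rate=96_000):
    """Sum the duration of reported dropouts per endpoint IP address.

    sample_rate is an assumption (96 kHz); pass a different value if the
    stream format in the log differs.
    """
    totals = defaultdict(float)
    for line in lines:
        match = RAAT_MESSAGE.search(line)
        if not match:
            continue
        try:
            message = json.loads(match.group("body"))
        except json.JSONDecodeError:
            continue  # skip truncated or malformed payloads
        if message.get("status") == "Dropout":
            totals[match.group("ip")] += message.get("samples", 0) / sample_rate
    return dict(totals)
```

Feeding it the lines of a Roon server log (for example `dropped_seconds_by_endpoint(open(path, encoding="utf-8"))`, where `path` is wherever the log was saved) gives a quick view of which endpoint reports the dropouts.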

“A Case Of You” shows a similar pattern:

02/27 23:41:00 Trace: [Broker:Transport] [EscapeP6-1] [Enhanced, 24/96 QOBUZ FLAC => 24/96] [100% buf] [LOADING @ 0:00] A Case Of You (Live) - Diana Krall / Joni Mitchell
02/27 23:41:01 Trace: [Broker:Transport] [EscapeP6-2] [Enhanced, 24/96 QOBUZ FLAC => 24/96] [100% buf] [LOADING @ 0:00] A Case Of You (Live) - Diana Krall / Joni Mitchell
02/27 23:41:02 Trace: [.NET ThreadPool Worker] [Escape bv P6Air () @ 10.0.0.172:41696] [raatclient] GOT [888] {"samples":48000,"status":"Dropout"}
02/27 23:41:02 Trace: [.NET ThreadPool Worker] [Escape bv P6Air () @ 10.0.0.172:41696] [raatclient] GOT [888] {"samples":48600,"status":"Dropout"}
02/27 23:41:02 Trace: [.NET ThreadPool Worker] [Escape bv P6Air () @ 10.0.0.172:41696] [raatclient] GOT [888] {"samples":48000,"status":"Dropout"}
02/27 23:41:02 Trace: [Worker (3)] [LivingRoom + MBedRoom + EscapeP6-1 + EscapeP6-2] [zoneplayer/raat] sync Escape P6Air () -> Escape P6Air () result: Success
02/27 23:41:02 Debug: [.NET ThreadPool Worker] [easyhttp] [10671] GET to https://www.qobuz.com/api.json/0.2/track/getFileUrl?format_id=27&intent=stream&request_sig=27099c3b905b9004e9b2afe462843653&request_ts=1740699661&track_id=641946 returned after 246 ms, status code: 200, request body size: 0 B
02/27 23:41:02 Trace: [.NET ThreadPool Worker] [Escape bv P6Air () @ 10.0.0.156:45535] [raatclient] GOT [906] {"status":"Ended"}

100% buffer, but not able to get the stream sent to the endpoint.

Looking over the logs, I also noticed quite a few networking errors right around your timestamp (the times below are in UTC):

02/06 14:47:15 Warn: [Broker:Media] [music/slurp] Failed to get update_timestamp: 400
<?xml version="1.0" encoding="iso-8859-1"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
         "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
 <head>
  <title>400 Bad Request</title>
 </head>
 <body>
  <h1>400 Bad Request</h1>
 </body>
</html>

02/06 14:52:48 Warn: [.NET ThreadPool Worker] [easyhttp] [37693] POST https://api.dropboxapi.com/2/files/get_metadata check network configuration: socketerr (HostUnreachable): No route to host (192.168.1.254:443)
02/06 14:52:48 Debug: [Broker:Misc] [dropbox] Received status: 999, url:https://api.dropboxapi.com/2/files/get_metadata, method:POST, body:System.Net.WebException: No route to host (192.168.1.254:443)

Looking further through your Nucleus system logs, I can see that the Ethernet link keeps going up/down. I suspect this is also when you are having issues (keep in mind these logs are in UTC):

    Line  769: Mar  8 16:48:10 (none) daemon.info ifplugd(eth0): link is down
    Line  771: Mar  8 16:48:10 (none) user.notice network/watch.sh: ifplugd(eth0): link is down
    Line 4599: Mar  9 05:31:46 (none) daemon.info ifplugd(eth0): link is down
    Line 4600: Mar  9 05:31:46 (none) user.notice network/watch.sh: ifplugd(eth0): link is down
    Line 6168: Mar 12 11:56:03 (none) daemon.info ifplugd(eth0): link is down
    Line 6169: Mar 12 11:56:03 (none) user.notice network/watch.sh: ifplugd(eth0): link is down
    Line 7197: Mar 14 03:39:35 (none) daemon.info ifplugd(eth0): link is down
    Line 7198: Mar 14 03:39:35 (none) user.notice network/watch.sh: ifplugd(eth0): link is down
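Because these entries are in UTC, it can help to convert them to local time before comparing them with the router’s own log. Below is a minimal sketch of that conversion. The syslog lines carry no year, so it must be supplied; the year 2025 and the UTC-5 offset in the usage note are assumptions for illustration only, and the function name is hypothetical.

```python
import re
from datetime import datetime, timedelta, timezone

# Matches e.g. "Mar  8 16:48:10 (none) daemon.info ifplugd(eth0): link is down"
LINK_DOWN = re.compile(
    r"(?P<stamp>[A-Z][a-z]{2}\s+\d{1,2} \d{2}:\d{2}:\d{2}) .*ifplugd\(eth0\): link is down"
)


def link_down_times(lines, year, local_tz):
    """Return the distinct eth0 link-down times, converted from UTC to local_tz.

    Each event is logged twice (daemon.info and network/watch.sh), so
    identical timestamps are reported only once.
    """
    seen = set()
    events = []
    for line in lines:
        match = LINK_DOWN.search(line)
        if not match:
            continue
        stamp = " ".join(match.group("stamp").split())  # "Mar  8" -> "Mar 8"
        # %b expects English month abbreviations (the default C locale).
        utc_time = datetime.strptime(f"{year} {stamp}", "%Y %b %d %H:%M:%S")
        utc_time = utc_time.replace(tzinfo=timezone.utc)
        if utc_time in seen:
            continue
        seen.add(utc_time)
        events.append(utc_time.astimezone(local_tz))
    return events
```

For example, `link_down_times(lines, 2025, timezone(timedelta(hours=-5)))` would return the four events above shifted to a hypothetical UTC-5 local time, ready to be lined up against the Orbi log.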

To sum it up, it looks like the issue is on the internal network from my point of view. The Nucleus’ Ethernet connection to the router keeps disconnecting, and while the track is generally downloaded ok, it has issues sending it to the 10.0.0.172 Escape P6Air endpoint.

I see no evidence of anything going wrong on the Nucleus side; all appears to be working as expected, and I suspect that if you check your Orbi system log, you’ll also notice some errors reported there for those times.

Hello Noris,

Thanks, finally we are getting somewhere!

This is useful information, and I appreciate getting some facts (although, as a chemical engineer and operations manager, I do not understand everything). It is nevertheless something I will be able to share with Netgear. Hopefully this breaks some ice and we can get to the bottom of this and away from the ping-pong.

The EscapeP6 itself seems to be a problem child. I noticed this too in some of my own testing and observations. That is another troubleshooting exchange I am truly not looking forward to. But maybe this company will surprise me with a different response than my past experiences. Let’s see…

I’ll keep you posted on the response and, hopefully, a resolution Netgear can offer me.

Thanks again
Herbert

Hi @Herbert_Zwartkruis ,

Happy to help! Do keep us informed on how your dealings with Netgear/Escape go. Hopefully you’ll be able to find the root cause of the issue and address it!