Android Roon Remote loses connection to Core (daily)

It is interesting. I’m on the other side of things, fortunately. I’ve never had an issue connecting my Note 3. Nor any other of my Galaxy Notes (4, 7, 8) or S6s, S7s, S8s, etc.

I have run the core on Windows, Rock, and about 8 different Linux distros when the Linux version was released, and never had an issue with Android connections.

I think I still have the hard drive with Ubuntu and RoonCore set up. Maybe I’ll spin that up and see if it still provides a stable connection to my Android phones after I update it to the latest version.

Maybe a comparison between a setup which fails daily and one which never fails would have some value.

Hi Rugby,

oh yes, that’d be quite interesting.

My two Android devices (with consistently reproducible problems) are a Blackberry Priv and a Samsung Note 3 (SM-N9005, i.e. the international GSM version).

-Chris

Hello @CRo,

Thank you for sharing your results so far and apologies for the delay in getting back to you here. I spoke to the dev team again regarding this issue and in the past we have narrowed it down to a setting on managed switches.

To confirm that this is actually the switch and not some other external variable here, would you by any chance have an unmanaged switch you can put in place of the managed one, keeping all of the other connections the same, to see if that helps resolve the issue?

I understand that you intend to use the managed switch at the end of this testing but this will be a great data point to have. If you don’t have an unmanaged switch, then we can try connecting the AP directly to the Linux server as you suggested but logistically, the best test here would be to have everything remain constant and just have the managed switch be replaced by an unmanaged one.

If you are able to perform these tests, please do let us know your results. I appreciate you looking into this issue and hope that we can make some progress here that will benefit Roon users with similar issues in the future.

Thanks,
Noris

Hello @Noris,

Yes, I can get an unmanaged Ethernet switch from a colleague next week, and will try as you suggest.

In the meantime, I have created a second infrastructure setup, and am testing the functionality of Roon core/bridge/remotes without changing any of the networking stack.

Setup #1                    Setup #2
[CentOS 7 - Roon Core]      [CentOS 7 [Win 10 Pro 1709 - Roon Core]]
(GigE)                      (GigE)
[Cisco SG300 Switch]        [Cisco SG300 Switch]
(GigE)                      (GigE)
[TPLink EAP 245 AP]         [TPLink EAP 245 AP]
(Wifi N)                    (Wifi N)
[Android - Roon remote]     [Android - Roon remote]

What you see delineated above are two setups that are identical, except for the following point:
In setup #1, Roon Core is running directly on CentOS 7.
In setup #2, Roon Core is running on a Windows 10 virtual machine (KVM) running on CentOS 7.

This is a nice comparison because, as far as networking goes, the identical setup is used in both cases. The CentOS server is also the same; there are literally no configuration differences whatsoever. The Android remotes are also the identical units.

The result:

  • As already exhaustively documented, in setup #1, the Android remotes are unable to connect to the Roon core about 12 hours after start of the core service. They cannot connect again, until the Roon core service is restarted.

  • In setup #2, the Android remotes function perfectly, and are able to connect to the Roon core anytime, even after the Android device has been turned off and turned back on more than a day after the start of the Roon core service.

Perhaps the implementation of multicast in the Linux Roon core has some issues?
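One way to probe that question from the Linux side is to check whether multicast datagrams reach a process on the server at all. The following is a minimal Python sketch, not Roon's actual discovery code. The group address 239.255.0.1 is a made-up placeholder, since this thread does not state which multicast group Roon uses; only UDP port 9003 is taken from the Roon firewall rules discussed in this thread.

```python
import socket
import struct

# Placeholder values: GROUP is hypothetical, not Roon's real multicast group.
GROUP = "239.255.0.1"
PORT = 9003  # UDP port listed for the Roon core in this thread


def membership_request(group: str, local_ip: str = "0.0.0.0") -> bytes:
    """Pack an ip_mreq struct: multicast group + local interface address."""
    return struct.pack("4s4s", socket.inet_aton(group), socket.inet_aton(local_ip))


def listen(group: str = GROUP, port: int = PORT, timeout: float = 30.0) -> None:
    """Join the group and print each datagram that arrives before the timeout."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind(("", port))
    # Joining the group makes the kernel send an IGMP membership report.
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP,
                    membership_request(group))
    sock.settimeout(timeout)
    try:
        while True:
            data, sender = sock.recvfrom(65535)
            print(f"{len(data)} bytes from {sender[0]}:{sender[1]}")
    except socket.timeout:
        print("no (more) multicast traffic within the timeout")
    finally:
        sock.close()
```

Calling `listen()` on the server while a test sender on the LAN transmits to the same group and port shows whether the traffic gets through. If the bind fails because another service already owns the port, a different port can be used for the test.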

-Chris

That’s always been my suspicion, actually. Some weird interaction between the Linux network stack, Roon’s use of multicast (maybe something involving Mono?), and some network switches.

Hello @Noris,

it took a while getting a non-smart gigabit switch. :wink:

I have switched out the Cisco SG300 switch with a Netgear GS108 (a classic, blue metal case, these days they go for $22).

The test setup is identical to Setup #1 (with a Cisco SG300 fully-managed switch, as documented above in this thread) except for the dumb switch.

Setup #3
[CentOS 7 - Roon Core]	
(GigE)
[Netgear GS108 Switch (non-managed)]
(GigE)
[TPLink EAP 245 AP]
(Wifi N)
[Android - Roon remote]

The results are the same as with Setup #1, namely:

  • The Android Roon clients are unable to connect to the core after a few hours.
  • Restarting the Roon core service allows the Android remotes to connect again for a few hours, until they fail to connect again.

-Chris

Hi @CRo,

Thanks for performing that test and confirming that. These results do seem very interesting!

Previously we have been able to narrow it down to IGMP snooping on managed switches/routers but maybe this is not actually the root cause and just something that mitigates the issue.

In all of the setups you have provided, I do not see a router model/manufacturer listed. Can you let me know what you are using, or have you made the CentOS machine itself the router?

It looks like we’ll need to narrow the setup down even further to get closer to the root cause. I have two follow-up suggestions:

  1. You mentioned that it would be possible to take the TPLink AP out of the equation by using a USB -> Ethernet adapter for the Android device. If you are still willing to give this a try, it would be interesting to see if bypassing the AP has any effect.

  2. If you are using the CentOS to manage the routing, I am wondering if there is any change if you temporarily use a standard consumer-grade router instead. Would you perhaps have a regular router around the house and can verify this aspect?

I think we’re definitely making some progress here, I look forward to hearing your reply!

Thanks,
Noris

Hi @Noris,

the router is pfsense 2.4.4-RELEASE-p2 (AMD64) running in a KVM VM running on the same CentOS 7.6 Server where the Linux Roon core runs (and currently also the Windows 10 VM with the -other- Roon core also on it.)

Since the entire LAN, Roon core, AP, Androids, remotes, and endpoints are in the same subnet, is the router important? Yes to contact your (Roon Labs) servers, but if we leave the internet connection out of it, what does the router do on the local network? It does DNS and DHCP, but it does not route.

In fact, the ARP table on the CentOS Server should prevent any packets destined between Roon core and the other Roon devices from ever being sent to the router.
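The same-subnet reasoning can be sketched in a few lines of Python. The addresses below are made-up examples on a 192.168.0.0/24 LAN, not the actual hosts in this setup. The sketch only shows the on-link test a host applies before deciding whether a packet goes straight to the peer (resolved via ARP) or to the gateway.

```python
import ipaddress


def delivered_directly(src_ip: str, dst_ip: str, prefix_len: int) -> bool:
    """True if dst_ip lies in the subnet of src_ip, i.e. no router hop is needed."""
    local_net = ipaddress.ip_interface(f"{src_ip}/{prefix_len}").network
    return ipaddress.ip_address(dst_ip) in local_net


# Hypothetical addresses for illustration only.
core = "192.168.0.10"
remote = "192.168.0.57"
internet_host = "203.0.113.5"

print(delivered_directly(core, remote, 24))         # same /24: direct delivery
print(delivered_directly(core, internet_host, 24))  # off-link: via the router
```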

For testing, I am trying to think of a way of not using pfsense and instead using a “commodity” appliance without completely destroying my LAN and internet connection in the process.

I have a half-modern fritz!box 7390 (Router / Switch / DSL Modem / Wireless AP / DLNA Server / DECT base-station / VOIP Client all-in-one consumer “internet appliance”) lying around, but I’m afraid you’ll tell me that that is too smart of a device!! I am only half kidding!!! :grimacing:

-Chris

PS. Since I had opened my mouth and suggested trying a USB-Ethernet device, fair enough then, I ordered a “Plugable USB 2.0 OTG Micro Type-B 10/100 Ethernet Adapter” and hope that one of the Android devices in the house will work with it. We shall see. :wink:

Hello @noris,

On we go testing Roon core connectivity in fun ways whilst selectively removing/changing single components of the infrastructure.

In setup #4, ROCK is running in a virtual machine (KVM) running on CentOS 7.6.
The rest of the infrastructure is identical to Setup #1 and #2 (detailed previously in the thread).

Setup #4
[CentOS 7 [ROCK]]
(GigE)
[Cisco SG300 Switch]
(GigE)
[TPLink EAP 245 AP]
(Wifi N)
[Android - Roon remote]			

The result:

  • In setup #4 (same as in Setup #2), the Android remotes function perfectly, and are able to connect to the Roon core anytime, even after the Android device has been turned off and on many hours after the start of ROCK.

-Chris

Hi @CRo,

Thanks for confirming the type of router you are using and for trying ROCK on the same machine.
Hmm, so everything works as expected when Roon is not communicating directly with the CentOS and you are still using the managed switch?

If so, that is most intriguing and would indicate that there is a setting on the CentOS itself which could be blocking this communication. I’m going to bring your findings to the team at our next weekly sync meeting and let them know this information, will let you know what they say. If the Ethernet adapter for your Android phone arrives, it would be interesting to note the behavior on the CentOS with and without a VM.

I wonder if there could be a 12 hour lease on some multicast functions imposed by Linux itself. I have also found this article which may provide some more clues here, since it seems that Hythim’s comment is very similar to the situation you are experiencing. Would any of his suggestions or the suggestions listed on that thread be applicable to your CentOS setup?

Thanks,
Noris

I’m also using CISCO SG300 series switches (IGMP snooping not active, multiple VLANs) and PfSense as router for my network. Wireless Access Points are CISCO WAP371. Core is running on a QNAP TBS-453a (Linux) behind a MikroTik hAPac configured as WLAN-bridge. Therefore, as far as the communication between Core and working as well as non-working controllers goes, they are all connected to the same AP and WLAN. Doesn’t this rule out the Router (PfSense) and Switches (CISCO SG300 series) from the equation?

Suffering from the endless “Searching for Roon Core” problem

  • Lenovo P2 (P2a42) phone running Android 7.0
  • Lenovo Yoga Tab3 Pro (YT3-X90F) tablet running Android 6.0.1

Connects fine every time I use it

  • Samsung Galaxy Tab S2 (SM-T810) running Android 7.0

If that’s the case, shouldn’t my Samsung device also be affected, or are you talking about the Android Linux kernel?

Interestingly my Lenovo devices show up in Roon (on my PC) under Settings > Audio as long as the Roon app is running on them (showing the “Searching for Roon Core” message). If I activate and configure them from the PC, I can even play music on them. Looks like the Bridge part of Roon is working.

Hello @Noris,

I found the problem, and the solution. :smile:
tl;dr at the bottom of this post.

After looking at all that has been discussed here, and looking for commonalities in the test setups, I find the following points:

  • Roon core running natively on Linux (CentOS 7.6) loses connectivity to / is not reachable from Android Roon remotes / endpoints connected via Wifi, after some number of hours (far less than a day) of successful functioning. A restart of the Roon core (not the entire Linux OS) is required to get the Androids to function again.
  • Roon core running in a KVM VM (ROCK, Windows 10) on the same CentOS server communicates fine, with no failures, with Android Roon remotes/endpoints connected via Wifi.
  • Roon uses multicast communication
  • IGMP snooping (on or off) on the switch does not change the outcome. A dumb vs. smart switch does not change the outcome.

I wanted to test another VM, running DietPi for example, but then I thought - it’ll probably work fine, again. I asked myself why I think that.

What’s the difference between an application in a KVM VM running on Linux, and that application running natively on the machine hosting the KVM VMs?

  • The VMs and the host Linux system are using the same ethernet interface (NIC).
  • The NIC is shared with the VMs via a bridge (in the Linux OS)

So I looked at the bridge. A bridge passes along everything, it’s not a router. A bridge is like a switch in that sense. Does Linux mess with IGMP or anything in a bridge? No.

Any further differences?

  • The VMs (on the bridge) bypass the Linux firewall.

So I looked at the firewall rules.

My active firewall rules (stated in the very first post of this thread) conform to the Roon documentation. So, perhaps there is something that has as yet not been taken into account in the firewall documentation.

BINGO.

tl;dr:
It turns out Linux IPTables firewalls have to be explicitly configured to pass along IGMP.

The following is my (current) iptables configuration as pertains to Roon. In CentOS (and RHEL) this configuration is located in /etc/sysconfig/iptables :

*filter
:INPUT ACCEPT [0:0]
:FORWARD ACCEPT [0:0]
:OUTPUT ACCEPT [0:0]
## Defaults ##
-A INPUT -m state --state RELATED,ESTABLISHED -j ACCEPT
-A INPUT -p icmp -j ACCEPT
-A INPUT -i lo -j ACCEPT
## IGMP ##
-A INPUT -s 224.0.0.0/4 -j ACCEPT
-A INPUT -d 224.0.0.0/4 -j ACCEPT
-A INPUT -s 240.0.0.0/5 -j ACCEPT
-A INPUT -m pkttype --pkt-type multicast -j ACCEPT
-A INPUT -m pkttype --pkt-type broadcast -j ACCEPT
## Roon ##
# Core
-A INPUT -s 192.168.0.0/24 -p udp --dport 9003 -j ACCEPT
-A INPUT -s 192.168.0.0/24 -p tcp --match multiport --dports 9100:9200 -j ACCEPT
# Web Controller
-A INPUT -s 192.168.0.0/24 -p tcp --dport 8080 -j ACCEPT

The IGMP and Broadcast rules (in the ## IGMP ## block) are new (to my setup). The previously existing Roon rules (## Roon ##) are directly based on the Roon ports documentation.
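For reference, what the two address ranges in the ## IGMP ## block cover can be checked with Python's standard `ipaddress` module. This is only an illustration of the CIDR arithmetic, not part of the firewall configuration.

```python
import ipaddress

multicast = ipaddress.ip_network("224.0.0.0/4")
upper = ipaddress.ip_network("240.0.0.0/5")


def describe(net):
    """Return (first address, last address, number of addresses) of a network."""
    return str(net.network_address), str(net.broadcast_address), net.num_addresses


print(describe(multicast))  # the whole IPv4 multicast range
print(describe(upper))      # lower half of the reserved 240.0.0.0/4 block

# The limited broadcast address lies outside both ranges, so broadcast
# traffic would have to be matched by the pkttype broadcast rule instead.
limited_broadcast = ipaddress.ip_address("255.255.255.255")
print(limited_broadcast in multicast, limited_broadcast in upper)
```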

Result: with Setup #1 (documented in this thread) plus the IGMP rules added to iptables on the CentOS server, the Android Roon remotes/endpoints are able to function properly, even after the Roon core service on the CentOS server has been running for many hours (a day) since last being restarted.

I have tested this for one day now on three Android phones of various models (also previously documented in this thread). Before the addition of the IGMP iptables rules, all three Android devices had been unable to connect to the Roon core with Setup #1 within that time.

Further points for me to look at:

  • Long-term performance of the setup (we’ll see in a few more days / weeks if it holds up)
  • The firewall is set to accept 224.0.0.0/4 - well, that’s a lot. I want to test whether a narrower subset will work, so as not to allow everything. This firewall is NATed and firewalled separately to the internet (via pfsense), so fine, the entire world can’t come knocking anyway, but still, let’s see if this is necessary. :wink:

-Chris

@noris,

A status update: 48 hours since core restart and the Android endpoints / controllers are still connecting fine to the core.

-Chris

@Noris,

it’s been three days since the last Linux Roon core restart, and the Android remotes / endpoints are continuing to connect fine.

The Roon documentation currently does not treat IGMP and Broadcast as something that needs to be addressed in the local firewall configuration, only that it’s possibly an issue on the switch.

At this point, IGMP snooping is ON on my switch, and I don’t think it is relevant on or off. The point is that the Linux firewall natively blocks IGMP and broadcast (rightly so.)

I would suggest that perhaps you add to the Roon Linux documentation that the Roon core must be reachable not just via UDP 9003 and TCP 9100 - 9200 ports, but multicast and broadcast communication must also be allowed in the firewall.

Most Linux distributions use either iptables or firewalld. (Red Hat Enterprise Linux and CentOS have iptables as the default up to version 6; as of RHEL 7 / CentOS 7, firewalld is started by default, though iptables is still fully supported.)

iptables is the “classic” firewall tool of Linux and IMHO much easier to configure if you haven’t wrapped your head around firewalld. I use iptables with CentOS 7. Hence here again, distilled down to the point, are the Linux iptables entries that are required for Roon to work properly:

## IGMP / Broadcast - required by Roon ##
-A INPUT -s 224.0.0.0/4 -j ACCEPT
-A INPUT -d 224.0.0.0/4 -j ACCEPT
-A INPUT -s 240.0.0.0/5 -j ACCEPT
-A INPUT -m pkttype --pkt-type multicast -j ACCEPT
-A INPUT -m pkttype --pkt-type broadcast -j ACCEPT

## Roon ##
# Core
-A INPUT -s [Your Subnet]/[Subnet Mask] -p udp --dport 9003 -j ACCEPT
-A INPUT -s [Your Subnet]/[Subnet Mask] -p tcp --match multiport --dports 9100:9200 -j ACCEPT

[Your Subnet] would typically be for example 192.168.0.0 and [Subnet Mask] is typically 24 (the prefix length), ergo the UDP and TCP port entries would be for 192.168.0.0/24 .
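To fill in the placeholders without typos, the two Roon rules can be generated from any address on the LAN. A small Python sketch follows; the helper name `roon_rules` is made up for this example, and the ports are the ones quoted above.

```python
import ipaddress
from typing import List


def roon_rules(address_with_prefix: str) -> List[str]:
    """Build the two Roon core iptables lines for the subnet of the given address."""
    # strict=False lets a host address such as 192.168.0.42/24 stand in for its subnet.
    subnet = ipaddress.ip_network(address_with_prefix, strict=False)
    return [
        f"-A INPUT -s {subnet} -p udp --dport 9003 -j ACCEPT",
        f"-A INPUT -s {subnet} -p tcp --match multiport --dports 9100:9200 -j ACCEPT",
    ]


for rule in roon_rules("192.168.0.42/24"):
    print(rule)
```

The printed lines can then be pasted into the ## Roon ## section of the iptables file.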

Even if you don’t want to get into the details of how to enable the required ports because they are dependent on the user’s subnet, whatever that may be, the IGMP and broadcast entries are essentially universal; IGMP is not specific to any local network config, and can be documented for everyone.

If you want, I can also make some firewalld entries; then just about every modern Linux distribution is covered. We could even cover IPv6 (but seriously, everything about Roon networking is based on the assumption of IPv4 networking, which is obviously the only sane thing to do at home.)

-Chris

Hello @CRo,

Thanks for the update, this was some very impressive sleuthing and I’m glad that we were able to get to the bottom of this issue!

I have a feeling that this thread will be helpful for others having similar issues, and the firewall settings you mentioned will be a great reference to have.

I discussed your findings with our technical team, and our QA team is going to look into these distinctions a bit more and update our documentation as needed.

Please keep me in the loop for any new updates you have to share and thanks again for your work on this thread, great job!

Thanks,
Noris

The Poirot of roon.

Thank you Noris and Ged_Hickman. =)

At work we have the network folks, and those are split off into the firewall folks, and then we have the Unix folks like me. And although I asked a network colleague about IGMP, and also talked to a firewall guy, that didn’t initially help much because these guys don’t mess with (wouldn’t stoop down to) OS-based firewalling; they work with Cisco Nexus switches and Juniper firewalls in an enterprise environment.

And I’m the Linux guy, and although I know the rudiments of networking (as the network folks make clear to me every time I deal with them), I rely on them to make the LAN and WAN and the DMZs work for us.

So no, I didn’t have a clue that iptables blocks IGMP, and neither did my networking folks.

Now we know. :wink:

-Chris

Ah the UNIX guy in my day was the chap with the socks and sandals and the tooled leather holster for the mobile phone :peace_symbol:
