I found the problem, and the solution.
tl;dr at the bottom of this post.
After looking at all that has been discussed here, and looking for commonalities in the test setups, I find the following points:
- Roon core running natively on Linux (CentOS 7.6) loses connectivity to (or becomes unreachable from) Android Roon remotes/endpoints connected via Wifi after some hours of normal operation (far less than a day). Restarting the Roon core (not the entire Linux OS) is required to get the Androids working again.
- Roon core running in a KVM VM (ROCK, Windows 10) on the same CentOS server communicates with Android Roon remotes/endpoints connected via Wifi without any failures.
- Roon uses multicast communication
- IGMP snooping (on or off) on the switch does not change the outcome. A dumb vs. smart switch does not change the outcome.
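For anyone who wants to poke at the multicast side independently of Roon, here is a minimal Python sketch of a multicast receiver. Joining a group with IP_ADD_MEMBERSHIP is what makes the kernel send an IGMP membership report. The group 239.1.2.3 and port 5007 in the usage comment are placeholders I made up, not Roon's actual values.

```python
import socket
import struct


def build_mreq(group: str, interface: str = "0.0.0.0") -> bytes:
    """Pack the ip_mreq structure: group address followed by local interface."""
    return struct.pack("4s4s", socket.inet_aton(group), socket.inet_aton(interface))


def open_multicast_listener(group: str, port: int) -> socket.socket:
    """Open a UDP socket, bind it to the port and join the multicast group."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind(("", port))
    # Joining the group triggers an IGMP membership report from the kernel.
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP, build_mreq(group))
    return sock


# Usage (placeholder group and port, not taken from Roon):
#   sock = open_multicast_listener("239.1.2.3", 5007)
#   data, sender = sock.recvfrom(1500)
```

Running something like this on the host and in a VM at the same time would be one way to compare what each of them actually receives.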
I wanted to test another VM, running DietPi for example, but then I thought - it’ll probably work fine, again. I asked myself why I think that.
What’s the difference between an application in a KVM VM running on Linux, and that application running natively on the machine hosting the KVM VMs?
- The VMs and the host Linux system are using the same ethernet interface (NIC).
- The NIC is shared with the VMs via a bridge (in the Linux OS)
So I looked at the bridge. A bridge passes along everything, it’s not a router. A bridge is like a switch in that sense. Does Linux mess with IGMP or anything in a bridge? No.
Any further differences?
- The VMs (on the bridge) bypass the Linux firewall.
So I looked at the firewall rules.
My active firewall rules (stated in the very first post of this thread) conform to the Roon documentation. So, perhaps there is something that has as yet not been taken into account in the firewall documentation.
It turns out that a Linux iptables firewall has to be explicitly configured to pass IGMP.
The following is my (current) iptables configuration as it pertains to Roon. In CentOS (and RHEL) this configuration is located in /etc/sysconfig/iptables:
:INPUT ACCEPT [0:0]
:FORWARD ACCEPT [0:0]
:OUTPUT ACCEPT [0:0]
## Defaults ##
-A INPUT -m state --state RELATED,ESTABLISHED -j ACCEPT
-A INPUT -p icmp -j ACCEPT
-A INPUT -i lo -j ACCEPT
## IGMP ##
-A INPUT -s 224.0.0.0/4 -j ACCEPT
-A INPUT -d 224.0.0.0/4 -j ACCEPT
-A INPUT -s 240.0.0.0/5 -j ACCEPT
-A INPUT -m pkttype --pkt-type multicast -j ACCEPT
-A INPUT -m pkttype --pkt-type broadcast -j ACCEPT
## Roon ##
-A INPUT -s 192.168.0.0/24 -p udp --dport 9003 -j ACCEPT
-A INPUT -s 192.168.0.0/24 -p tcp --match multiport --dports 9100:9200 -j ACCEPT
# Web Controller
-A INPUT -s 192.168.0.0/24 -p tcp --dport 8080 -j ACCEPT
The IGMP and Broadcast rules (in the ## IGMP ## block) are new (to my setup). The previously existing Roon rules (## Roon ##) are directly based on the Roon ports documentation.
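To see which of the new address-based rules would catch a given packet, the source/destination checks can be modelled in a few lines of Python. This is only a sketch, assuming the address rules in the ## IGMP ## block refer to the IPv4 multicast range 224.0.0.0/4 plus 240.0.0.0/5. It is not how iptables itself evaluates rules, it ignores the pkttype rules, and the sample addresses in the usage lines are made up.

```python
import ipaddress
from typing import List

MULTICAST = ipaddress.ip_network("224.0.0.0/4")  # IPv4 multicast block
RESERVED = ipaddress.ip_network("240.0.0.0/5")   # lower half of the reserved 240.0.0.0/4 range


def matching_rules(src: str, dst: str) -> List[str]:
    """Return the address-based rules that a packet with this src/dst would match."""
    source = ipaddress.ip_address(src)
    destination = ipaddress.ip_address(dst)
    hits = []
    if source in MULTICAST:
        hits.append("-s 224.0.0.0/4")
    if destination in MULTICAST:
        hits.append("-d 224.0.0.0/4")
    if source in RESERVED:
        hits.append("-s 240.0.0.0/5")
    return hits


# An IGMP general query from a router to the all-hosts group:
print(matching_rules("192.168.0.1", "224.0.0.1"))     # ['-d 224.0.0.0/4']
# Ordinary unicast traffic on the LAN matches none of the new rules:
print(matching_rules("192.168.0.23", "192.168.0.10"))  # []
```

Since a multicast address should not normally appear as a source address, the destination rule is probably the one doing the real work here.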
Result: with Setup #1 (documented in this thread) plus the new IGMP rules in iptables on the CentOS server, the Android Roon remotes/endpoints function properly, even after the Roon core service on the CentOS server has been running for many hours (a day) since its last restart.
I have tested this for one day now on three Android phones of various models (also previously documented in this thread). Before the IGMP iptables rules were added, all three Android devices would have been unable to connect to the Roon core with Setup #1 within that time.
Further points for me to look at:
- Long-term performance of the setup (we’ll see in a few more days / weeks if it holds up)
- The firewall is set to accept 224.0.0.0/4 - well, that’s a lot. I want to test whether a subset will work, so as not to allow everything. This firewall is NATed and firewalled separately to the internet (via pfsense), so fine, the entire world can’t come knocking anyway, but still, let’s see if this is necessary.
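To put a number on "a lot", here is a small Python sketch comparing the full IPv4 multicast block with two narrower blocks that could be tried instead. The candidates are just the well-known sub-ranges of 224.0.0.0/4. Which of them Roon actually needs is exactly what still has to be tested, so treat the selection as an assumption.

```python
import ipaddress

FULL_MULTICAST = ipaddress.ip_network("224.0.0.0/4")

# Candidate narrower blocks (an assumption, not verified against Roon).
CANDIDATES = {
    "224.0.0.0/24": "local network control block (e.g. 224.0.0.1, all hosts)",
    "239.0.0.0/8": "administratively scoped block",
}


def describe(cidr: str) -> dict:
    """Summarise a block: its size and whether it lies inside 224.0.0.0/4."""
    net = ipaddress.ip_network(cidr)
    return {
        "cidr": cidr,
        "addresses": net.num_addresses,
        "inside_multicast": net.subnet_of(FULL_MULTICAST),
    }


print(describe("224.0.0.0/4")["addresses"])  # 268435456
for cidr, label in CANDIDATES.items():
    info = describe(cidr)
    print(f"{cidr}: {info['addresses']} addresses, {label}")
```

Note that 240.0.0.0/5 is not part of the multicast block at all, so that rule may well turn out to be unnecessary.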