Not sure where to post this so please move it to wherever it is appropriate.
I wanted to summarize an issue I ran into and how I solved it. It presented as a “Roon Problem” but was ultimately a wireless network issue. It’s important to know that, like many problems reported on these forums, Roon was the only application having problems. Everything else was operating as expected including streaming video and the generally touted “if X is working fine it cannot be my network” statements.
Problems: Had a single Fire 10 tablet / remote that would disconnect from Roon every 30-90 seconds. It would re-connect automatically. The progress slider would pause / jump. Buttons were unresponsive at times and then it would rapidly catch up, only to become unresponsive again. Roon extensions were regularly crashing (they run on a hardwired RasPi, so this was weird and pointed away from this being a wireless network issue). Other remotes were not having issues. What was unique about this remote is that it is in my office. Other remotes in other parts of the house behaved “better”.
Troubleshooting: Rebooted everything multiple times including all wireless APs. Upgraded extensions. Upgraded DietPi. Upgraded everything else that could be upgraded. Minimal change in behavior. Now it would disconnect / reconnect every 90-120 seconds. Extensions still unstable.
OK, time to look at some hard data:
Starting right on the 3rd there is a spike across all error counts. The particularly disturbing one is retries. Retries indicate an RF or signal strength issue. Since the physical distances between my devices did not change, it pointed to something in the RF environment that changed. This kind of issue can quickly pop up if the spectrum gets congested. Your neighbors can cause this by occupying the same channels that you’re currently occupying. I have no idea what my neighbors did but I blame them. One other thing to note is that the increase in “attempts” does correlate with the increase in errors. Sometimes an increase in errors can simply be because that AP, for whatever reason, is handling more traffic. If I had been having no issues, I would have attributed the increased error counts to increased usage. But since I was having an issue with an application, I could not make that assumption. There really was a problem here and it needed correction.
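For anyone who wants to put a number on the “more errors or just more traffic?” question, one rough approach is to divide retries by attempts. This is only a sketch: the function name and the counts below are made up for illustration and are not the values from my charts.

```python
def retry_rate(retries, attempts):
    """Fraction of transmit attempts that had to be retried."""
    if attempts == 0:
        return 0.0
    return retries / attempts


# Hypothetical daily counters for one AP: (attempts, retries).
days = {
    "quiet day": (100_000, 2_000),
    "busy day": (300_000, 6_000),      # 3x the traffic, 3x the retries
    "bad RF day": (300_000, 45_000),   # 3x the traffic, far more retries
}

for name, (attempts, retries) in days.items():
    print(f"{name}: {retry_rate(retries, attempts):.1%}")
```

If the rate stays flat while the raw counts climb, the AP is probably just busier. If the rate itself jumps, something in the RF environment likely changed.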
Now, let’s look at a different AP in another part of the house:
This AP actually gets cleaner starting on the 3rd. This would explain why the Remote in my office was having issues but other remotes were not. This issue was directly influenced by which AP my device was connected to. Now, in this case, the cleaner graphs are partially because of the lower traffic levels. But, overall, this AP is behaving better than the other one. In fact, if I just turned off the offending AP and pushed all clients onto this one my problem would have probably gone away.
I ran a scan of all the channels at ~5 PM. It’s important to pick a “busy” time when all your neighbors are online. This allowed me to identify the least congested channels. I did this from each location where I had an AP. It’s important to do this at every location, as even a few feet of distance can change what is congested and what is not. After identifying the less congested channels I reconfigured the APs to use them. Once that was done, and the devices had settled down and re-attached to their closest AP, I had zero problems with Roon. Everything was cleaned up.
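If your scanner can export a list of the networks it sees, picking a channel can be sketched roughly like this. Everything here is an assumption for illustration: the function names, the made-up scan data, the crude “RSSI + 100” weighting, and the idea that channels within 4 of each other interfere (which fits 20 MHz channels on 2.4 GHz; adjust it for your band and channel width).

```python
def congestion_score(candidate, networks, overlap=4):
    """Sum a crude weight for every neighbor network near `candidate`.

    networks: list of (channel, rssi_dbm) tuples from a scan.
    overlap:  channels within this distance are treated as interfering
              (4 is an assumption for 20 MHz channels on 2.4 GHz).
    """
    score = 0
    for channel, rssi_dbm in networks:
        if abs(channel - candidate) <= overlap:
            # Stronger (less negative) signals count more; clamp at zero.
            score += max(0, rssi_dbm + 100)
    return score


def least_congested(candidates, networks, overlap=4):
    """Return the candidate with the lowest score (first one wins a tie)."""
    return min(candidates,
               key=lambda ch: congestion_score(ch, networks, overlap))


# Made-up scan taken at one AP location: (channel, RSSI in dBm).
office_scan = [(1, -45), (1, -60), (3, -70), (6, -80), (11, -55), (11, -50)]

for ch in (1, 6, 11):
    print(ch, congestion_score(ch, office_scan))
print("pick:", least_congested([1, 6, 11], office_scan))
```

Run it once per AP location with that location’s scan, since the answer can differ from room to room.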
Going back to the two charts I posted earlier, you can see when the configuration change occurred after the 4th. I’m not looking for zero errors but I am looking for consistently flat performance where my applications are performing without error. I then consider that my baseline. When I check my graphs I simply look for things rising off of the baseline.
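The “look for things rising off of the baseline” check can also be automated if your gear lets you export the counters. A minimal sketch, assuming daily error counts in a list. The function name, the 1.5x tolerance, and the numbers are all hypothetical, and I use the median of a known-good stretch as the baseline.

```python
from statistics import median


def rising_off_baseline(daily_counts, baseline_days, tolerance=1.5):
    """Return indexes of days whose count exceeds tolerance * baseline.

    The baseline is the median of the first `baseline_days` entries,
    which are assumed to be a known-good period.
    """
    baseline = median(daily_counts[:baseline_days])
    return [
        i for i, count in enumerate(daily_counts)
        if i >= baseline_days and count > tolerance * baseline
    ]


# Made-up daily retry counts; the first five days are the "good" period.
counts = [120, 110, 130, 125, 115, 140, 260, 310]
print(rising_off_baseline(counts, baseline_days=5))
```

The median keeps one odd day in the good period from skewing the baseline, and the tolerance is just a knob to avoid flagging normal day-to-day wobble.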
You can see these charts are from August. I’ve not had to reconfigure my APs yet but over time I am seeing a general increase in error counts. I have more work to do but the truth is… it’s only going to get worse. The best thing you can do is be a good neighbor and turn down the TX power of your gear. If you’re not trying to provide network access to your neighbors there is no reason why you need to blast them with signal.
The main point of this post is to show how issues with Roon can be visualized in the network. Of course, not all wireless gear gives you this visibility, especially for historical data. But my situation was typical of issues I read about on the forum, and if I had dismissed it as “cannot be network, everything else works fine” then I’d still be having issues with this remote. My example illustrates that, even without this visibility, it often really is “the network”. You can get similar visibility from your phone by downloading a Wi-Fi analyzer and just looking at the congestion across the channels. Like most things, the network requires maintenance to keep performing optimally. As the days get darker and the kids start doing more homework I’m sure I’ll have to rerun this exercise and find other ways to keep the error counts down.