Information on Roon's March 6, 2024 Connectivity Incident (inability to connect to Roon services, Qobuz, Tidal, and KKBOX)

On March 6, 2024, Roon released version 2.0.30, which included build 1382 of Roon Server. In the day following the release, our support team received reports of connectivity issues from a very small subset of the Roon user community. Our investigation found that a small, maintenance-focused change in Roon was leaving some devices unable to establish a connection with Roon’s cloud services or with the cloud services of our streaming partners (TIDAL, Qobuz, and KKBOX). Due to the late release time on March 6, many of these user reports did not start appearing until the morning of March 7 (US EDT).

Our support, QA, and engineering teams sprang into action to investigate the issue and restore connectivity to the affected users. While we were able to determine the cause of the failure early on in the process, its true nature made resolution quite difficult. Ultimately, the team was successful and by the evening of March 7 we had a workaround developed and ready for testing. By the morning of March 8 we received confirmation that the fix was successful and impacted systems were able to communicate again.

With instructions in hand our support team has been working to disseminate information on the workaround to the impacted users.

What happened?

Since August 2023 our engineering team has been revamping and improving the way in which Roon talks to our cloud services and streaming partners. This has been a long-term project that has touched every aspect of our infrastructure and has had the benefit of providing improved performance and more reliable connectivity. This change has been in place with some parts of our infrastructure for many months and the final stage of the project was to release it to our users in production, which happened in build 1382.

This type of maintenance work is always ongoing at Roon and any given release may contain dozens of these housekeeping changes. More impactful changes, such as this one, go through significant testing in order to ensure a smooth transition for our users. In this particular case the transition went as expected for most users, but a very small group (less than 0.1% of our user base) reported a loss of connectivity that was not experienced in testing.

Background

Roon uses industry-standard protocols for network communication and we rely on the underlying operating system to manage this communication. This is no different than any other application running on the device. In other words, we’re doing the same thing as a web browser or mail client. Similar to any other application developer, we assume that the host computer is properly configured and reasonably up-to-date.

Our change involved moving to a newer and more modern toolset for managing network connections. There is nothing remarkable about this code and it is in use by countless other software packages on a daily basis.

What went wrong

In our development and testing we assumed that the devices running Roon Server would have operating systems that are reasonably up-to-date and at least have a somewhat recent set of what OS vendors call “critical and security updates” applied. What we discovered in this outage is that there is a subset of devices running Roon Server which are woefully out-of-date.

Please keep in mind that the deficiency in these systems will impact any application attempting to apply modern connectivity and security best practices, not just Roon.

The vast majority of these devices are purpose-built music servers, NAS devices, or custom Linux distributions. In all cases these are seen as appliances which typically run a limited suite of applications, and this limited use served to mask the deficiency in their underlying operating system code.

When the updated version of RoonServer started on these devices the new networking code was not able to establish a connection to cloud services due to out-of-date components of the operating system. This resulted in symptoms ranging from update errors, to search failures, to lack of connectivity to streaming services.
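The post does not say which operating system component failed, so the following is only a generic Python sketch of how an outdated system can show up as a connection failure. The function names `classify_connection_error` and `probe` are hypothetical and are not part of Roon; the sketch assumes a standard TLS connection on port 443.

```python
import socket
import ssl


def classify_connection_error(exc: BaseException) -> str:
    """Map a connection exception to a rough, human-readable cause."""
    # Order matters: SSLCertVerificationError is a subclass of SSLError,
    # and the network-related exceptions here are all subclasses of OSError.
    if isinstance(exc, ssl.SSLCertVerificationError):
        return "certificate verification failed (stale CA roots or wrong system clock?)"
    if isinstance(exc, ssl.SSLError):
        return "TLS handshake failed (outdated TLS library or unsupported protocol?)"
    if isinstance(exc, socket.gaierror):
        return "name resolution failed (DNS resolver problem?)"
    if isinstance(exc, (TimeoutError, socket.timeout)):
        return "connection timed out"
    if isinstance(exc, OSError):
        return "network error"
    return "unknown error"


def probe(host: str, port: int = 443, timeout: float = 5.0) -> str:
    """Try one TLS handshake against host:port and describe the outcome."""
    context = ssl.create_default_context()  # relies on the system trust store
    try:
        with socket.create_connection((host, port), timeout=timeout) as raw:
            with context.wrap_socket(raw, server_hostname=host) as tls:
                return f"ok ({tls.version()})"
    except OSError as exc:
        return classify_connection_error(exc)
```

Calling `probe("example.com")` on an affected device (the hostname is a placeholder) would be expected to return one of the failure descriptions rather than `ok`, but the exact result depends on what is out of date on that system.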

The fix

We are currently working with impacted users to implement a fix which will revert them back to Roon’s prior network connectivity method. This is a simple process and our support team is working with users who need assistance with getting their systems patched.

Going forward

While it is reasonable for us to assume that all devices running Roon meet a baseline for connectivity and security, we do understand that many of our customers have significant investments in specialty hardware and we don’t want to abandon them or force them to abandon their hardware. The entire Roon team is currently working on a long-term solution to accomplish the following:

  1. We will be in contact with our partners who distribute server products to inform them of the deficiencies in their operating systems and the steps they will need to take to rectify them. We expect this to result in updates from the server manufacturers which will eliminate this issue entirely.

  2. Our engineering team is threading the needle on some code changes which will allow devices with these security and connectivity deficiencies to continue to function as long as possible.

  3. We are also investigating some other changes to Roon’s infrastructure to better address incidents like this in the future.

If you own an impacted device then you can also help yourself by reaching out to the manufacturer or operating system distributor for instructions on how to get the most up-to-date networking, security, and encryption updates for your product. This should include any updates to the following:

  • Packages related to basic TCP/IP networking
  • Web retrieval tools such as curl or wget
  • Samba (Windows file sharing)
  • ntp (network time protocol)
  • Name resolution tools
  • Up-to-date SSL CA root certificates
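For owners who can run Python on the device, a rough first look at some of the items above might resemble the sketch below. It is an illustration only: the helper name `tls_environment_report` is hypothetical, and it reports the TLS library that the Python interpreter is linked against, which is not necessarily the one Roon Server uses.

```python
import shutil
import ssl


def tls_environment_report() -> dict:
    """Collect basic facts about the local TLS stack and related tools."""
    paths = ssl.get_default_verify_paths()
    return {
        # Version string of the OpenSSL-compatible library Python is linked against.
        "openssl_version": ssl.OPENSSL_VERSION,
        # Whether that library supports TLS 1.3 at all.
        "has_tls13": ssl.HAS_TLSv1_3,
        # Location of the CA bundle file and directory; None if not present.
        "ca_file": paths.cafile,
        "ca_dir": paths.capath,
        # Full path of each tool if it is on PATH, otherwise None.
        "curl": shutil.which("curl"),
        "wget": shutil.which("wget"),
    }
```

Printing the returned dictionary gives a starting point for a conversation with the manufacturer. An old library version string or a missing CA bundle suggests the system needs updates, but it does not prove that this incident affected the device.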

The devices that we noted during our investigation have operating system components that are at least 4 years out of date and, in some cases, much more than that. Given the nature of internet threats and security, those systems are dangerously out-of-date and could present a significant security risk to the networks to which they are connected.

Again, we do apologize for the inconvenience to those who were impacted. We strive to provide a 100% positive experience for all of our users and are sorry to have missed that goal in this case.

Seems the thread is not pinned?

As an owner of an “older” device, a Salkstream III purchased in June 2016 (hard drive was replaced in Nov 2020 with current OS at the time, Linux Arch 5.8.5), this concerns me. Fortunately, I wasn’t negatively impacted by the current Roon update, but the above statement from Roon has me concerned about current security of my Salkstream and future compatibility with Roon.

Updating the Salkstream’s OS is way beyond my limited computer skills, and with Salk’s recent retirement, I don’t know how much longer Salk will be able to offer support. I did ask and Jim Salk stated he would continue to offer support (but for how long?).

Am I concerned about nothing at this point?

It’s hard to believe it’s been almost 8 years already. :astonished:

Remember, computer years are like dog years but with a larger multiplier! An eight-year-old computer is actually 90, so no wonder it has issues :grinning:

The transparency and informative nature of this is highly commendable, thank you Andrew!

I don’t get it.
We are using specialized software that most people wouldn’t call cheap to listen to our music, but some of us are running it on outdated hardware?
Why would anyone do that?
The hardware can be older, no problem, but keeping your most precious data on systems that are no longer maintained? The risk of losing this data to security fuc****s is huge, btw.

Just to clarify, I’m not currently having any issues with my “90 year old” Salkstream III. :grin:

I find it hard to believe I’m going to lose the Salkstream. It’s probably 7-ish years old? Just because I got new Wi-Fi service??

Oh, brother…

Perhaps contact Jim Salk. I’m sure he will do what he can to get your Salkstream running properly again.

This has got nothing whatsoever to do with your wifi service. Did you read the detailed first post of the thread? Ask if something is unclear.

I’ve got it in the back of my mind that when it comes time to retire the Salkstream my best option is likely to build a NUC/Rock unit (with LOTS of handholding to get it done :blush:). I figure that way I don’t have to rely on a 3rd party for OS maintenance and support (though Salk’s support has been exemplary).

PS - The downside to going the NUC/Rock route is losing my current “one box” solution that works well with my particular setup, i.e., the ability to AirPlay TO the Salkstream and an optical S/PDIF connection directly to my older processor (no USB, no HDMI). I would have to add additional equipment to get the functionality I have now, which I’d rather not do if possible.

Thanks Roon for the detailed explanation. The time and effort is appreciated. I was not affected by this issue but it is interesting to read about what is happening.

Perhaps I should clarify. Roon is presently working perfectly for me. The reason for my comment is that I experienced the outage the Roon team described last week, and I wanted to let the Roon team and the community know that the explanation doesn’t seem to apply to my equipment. That raises the question of whether the issue might have had more causes than the Roon team thinks.

I run Rock on a NUC 11 (11th Generation Intel Core i7-1165G7, 2.8 GHz, 4.7 GHz Turbo, 4 cores, 8 threads, 12 MB cache). The BIOS was updated when I set this up in January 2024.

Given the newness of this unit and the fact that it is on Roon’s list of recommended NUCs, it doesn’t seem like Roon’s explanation that the problem was caused by “operating systems that are [not] reasonably up-to-date and [do not] at least have a somewhat recent set of what OS vendors call “critical and security updates” applied” would explain why I experienced the problem.

I could speculate that the problem could be caused by a router (mine’s not that old, from 2018, but it’s from my broadband provider so perhaps it wasn’t the latest tech), but I guess the Roon team would need to clarify if that could be a cause.

Wow, the first blameless post-mortem! Thanks, Roon, for sharing insights into what caused the incident. Please keep providing this transparency. I know from experience that it pays off at the end of the day.

Today something very interesting happened. I turned on my Grimm Audio Mu1 and tried to listen, but for some odd reason my Tidal and Qobuz accounts were logged out and a pink pop-up kept prompting me to retry.

Since the services were not reconnecting, I logged out and tried to log in again. Unfortunately, I can log in only in the browser; when it tries to redirect back to the unit via the app, the process just stops. In short, I cannot log in to Roon using a Mac, iPhone, Android device, or PC.

Could someone tell me how to get my Grimm Mu1 reconnected with Roon?

Something doesn’t make sense with this explanation.
My Roon Server is a standard Windows 11 PC, and has all updates.
Yet I also had the problem and had to replace the “bits” file.

The devices that we noted during our investigation have operating system components that are at least 4 years out of date