Unintended updates to Roon build 886 -- [Issue description and Roon response]

As many of you may have noticed, we had a glitch with our cloud services that pushed out updates to Roon and RoonServer on macOS, Windows, Linux, NAS, and RoonOS. This resulted in approximately 1% of our users having their cores and/or remotes migrated from Roon build 884 stable to build 886 beta. We wanted to let you know what happened and the steps we’re taking to ensure it doesn’t happen again.

What Happened

Yesterday (January 24) we experienced a widespread outage of some older cloud infrastructure which hosts a few services for secondary functions and internal Roon operational processes. One of the services impacted was our update service and this resulted in Roon users not being able to update Roon, RoonServer, or RoonOS.

Our team worked diligently to get these services back online, but there were some errors in the bring-up process with the update server. In correcting those errors, a mistake was propagated that led the update server to treat build 886 beta as the build to push out to all users.

This issue was discovered when users started to report the problem here on Community and when our QA team saw their stable cores installing beta software. Once notified, our team was able to quickly correct the error and reverse the process, but not before several hundred cores had been migrated.

At this point all users who have not yet been migrated back to build 884 stable should be able to prompt the migration by restarting Roon on their core machine.

User Ramifications

Build 886 beta is identical to build 884 stable save for a few lines of code related to a specific partner device type. We are confident that installation of build 886 and migration back to 884 had no negative impact on user libraries, databases, or system configurations. Aside from the inconvenience and confusion that this event created there is no further impact for our users.

Going Forward

The cloud infrastructure which was impacted by this outage is in the process of being retired in favor of the more modern platform used for the majority of our services. The update server was on the old infrastructure in order to maintain compatibility with some partner devices that do not support current versions of TLS.

In light of this event we are now planning to migrate the update service to our new infrastructure in the second quarter of this year. This will greatly improve the stability of our update system and significantly reduce the chances that an outage could lead to a situation like this.

Beyond that we are planning some changes to our update service which will provide better visibility into the impact of changes as well as some enhanced sanity checking.
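
As an illustration only, and not a description of Roon's actual update service, a sanity check of the kind mentioned above might refuse to publish a build whose release channel does not match the channel it is being pushed to. The `Build` type, the channel labels, and the `check_rollout` function below are all hypothetical names made up for this sketch.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Build:
    number: int
    channel: str  # hypothetical labels: "stable" or "beta"


def check_rollout(build: Build, target_channel: str) -> list[str]:
    """Return the reasons a rollout should be blocked (empty list = OK)."""
    problems = []
    if build.channel != target_channel:
        problems.append(
            f"build {build.number} is {build.channel}, "
            f"but the target channel is {target_channel}"
        )
    return problems


# The situation described in this post: a beta build aimed at stable users.
print(check_rollout(Build(886, "beta"), "stable"))
# A matching build and channel passes with no problems reported.
print(check_rollout(Build(884, "stable"), "stable"))
```

A check like this would run before any change to the update server's configuration takes effect, so a mismatch is reported instead of being pushed out.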

The entire team at Roon would like to extend our apologies for the confusion that this event caused and assure you that we are taking steps to ensure that something like this doesn’t happen again. Thank you for your patience and understanding.



Thanks. This kind of write up is really appreciated. It’s probably a bit incomprehensible to those without software exposure, but for those who have worked in cloud environments it’s incredibly helpful.


That’s ok, and thanks for the great explanation, much appreciated

Something I still don’t understand:
I did not update to 886 but nevertheless still receive update messages in Roon. Can I just ignore that, as I am on the 884 stable version? And will this message disappear automatically?

I’m part of the lucky 1%, maybe I should go buy a lottery ticket today ;). My question is this, what should I do to get things working again? What do you recommend? Should I upgrade all my remotes to the beta, or downgrade my core? What is the easiest/safest? This is the first issue I’ve had in a long, long time. I’m running the core on a SonicTransporter with a QNAP NAS and it’s been wonderful.

Restart your core and that should get you back to normal.

This is likely an artifact. Restart Roon on your core and those messages should go away.

Thanks! I rebooted and that did the trick. All good.

Just did a restart. All good now. Thank you Andrew.
