As many of you may have noticed, we had a glitch with our cloud services that pushed out updates to Roon and RoonServer on macOS, Windows, Linux, NAS, and RoonOS. As a result, approximately 1% of our users had their cores and/or remotes migrated from Roon build 884 stable to build 886 beta. We want to let you know what happened and the steps we’re taking to ensure it doesn’t happen again.
What Happened
Yesterday (January 24), we experienced a widespread outage of some older cloud infrastructure that hosts a few services for secondary functions and internal Roon operational processes. One of the services affected was our update service, which left Roon users unable to update Roon, RoonServer, or RoonOS.
Our team worked diligently to get these services back online, but there were some errors in the bring-up process for the update server. While those errors were being corrected, a mistake was propagated that caused the update server to treat build 886 beta as a build to be pushed out to all users.
This issue was discovered when users started to report the problem here on Community and when our QA team saw their stable cores installing beta software. Once notified, our team was able to quickly correct the error and reverse the process, but not before several hundred cores had been migrated.
At this point, any users who have not yet been migrated back to build 884 stable should be able to prompt the migration by restarting Roon on their core machine.
User Ramifications
Build 886 beta is identical to build 884 stable save for a few lines of code related to a specific partner device type. We are confident that installation of build 886 and migration back to 884 had no negative impact on user libraries, databases, or system configurations. Aside from the inconvenience and confusion that this event created, there is no further impact for our users.
Going Forward
The cloud infrastructure affected by this outage is in the process of being retired in favor of the more modern platform used for the majority of our services. The update server was on the old infrastructure in order to maintain compatibility with some partner devices that do not support current versions of TLS.
In light of this event, we are now planning to migrate the update service to our new infrastructure in the second quarter of this year. This will greatly improve the stability of our update system and significantly reduce the chances that an outage could lead to a situation like this.
Beyond that, we are planning some changes to our update service that will provide better visibility into the impact of changes, as well as some enhanced sanity checking.
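As a rough illustration of what a sanity check of this kind could look like, here is a minimal Python sketch. It is not the actual update service code: the `Build` type, the channel labels, and the function names are all hypothetical, chosen only to show the idea of refusing to publish a build to a channel it was not tagged for.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Build:
    """Hypothetical record describing a build known to an update server."""
    number: int
    channel: str  # assumed labels: "stable" or "beta"


def check_rollout(build: Build, target_channel: str) -> None:
    """Raise ValueError if a build is about to be served to the wrong channel."""
    if build.channel != target_channel:
        raise ValueError(
            f"build {build.number} is tagged {build.channel!r} "
            f"but is being published to {target_channel!r}"
        )


def rollout_allowed(build: Build, target_channel: str) -> bool:
    """Boolean wrapper around check_rollout for callers that prefer a flag."""
    try:
        check_rollout(build, target_channel)
    except ValueError:
        return False
    return True
```

Under these assumptions, `rollout_allowed(Build(886, "beta"), "stable")` is rejected, while `rollout_allowed(Build(884, "stable"), "stable")` passes. A check like this would run before any change to the update configuration takes effect.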
The entire team at Roon would like to extend our apologies for the confusion that this event caused and assure you that we are taking steps to ensure that something like this doesn’t happen again. Thank you for your patience and understanding.