Random Roon Server Crash?

Hi @support !

I keep experiencing what appear to be random software crashes.

It’s noticeable while I’m playing music, and the bitrate doesn’t seem to matter (16/44.1 FLAC).

System:
Host-
Dell PowerEdge R510: dual Xeon 3 GHz server / Microsoft Windows Server 2016 / PERC 700 RAID, hosting both Roon Server (headless) and music. I have verified via all logs that there are no system / data / storage issues. OS drives are 15K RPM and the data drives are 10K SAS in RAID 5. No separate NAS.

Rendering-
Raspberry Pi with Roon Bridge feeding a Mytek Brooklyn DAC via USB, connected via wired Gigabit Ethernet (two gigabit switches in between, both Ubiquiti managed)

Control-
iPad on WiFi

Symptoms - While playing music, audio will stop and the iPad shows the “roon working” logo. About 15 seconds later everything returns to normal, and I have to hit the play button to resume where the music left off.

Looking at the server, the Roon software doesn’t appear to be disappearing; I see the icon in the system tray saying “running”.

I look in my logs folder at “RAATdata.log” and see stuff like:

05/27 09:16:39 Info: Starting RAATServer v1.3 (build 223) stable on windows
05/27 09:16:39 Info: [RAATServer] creating RAAT__manager
05/27 09:16:39 Info: [RAATServer]     appdata_dir  = C:\Users\Administrator\AppData\Local\RAATServer
05/27 09:16:39 Info: [RAATServer]     unique_id    = 383d4e14-1023-4e9b-9efe-d88a3789b9b6
05/27 09:16:39 Info: [RAATServer]     machine_id   = e93a4182-4aa4-34dd-0a8c-305ddc3e9626
05/27 09:16:39 Info: [RAATServer]     machine_name = SERVER2
05/27 09:16:39 Info: [RAATServer]     os_version   = Windows 10
05/27 09:16:39 Info: [RAATServer]     service_id   = d7634b85-8190-470f-aa51-6cb5538dc1b9
05/27 09:16:39 Info: [RAATServer]     is_dev       = False
05/27 09:16:39 Trace: [raatmanager] starting
05/27 09:16:39 Trace: [raatmanager] [System Output] loaded config from C:\Users\Administrator\AppData\Local\RAATServer\Settings\device_9b961fc2035f3dd343e04f8f136e7682.json
05/27 09:16:39 Trace: [raatmanager/windows] failed to get default device HRESULT=0x80070490
05/27 09:16:39 Trace: [raatmanager] initialized
05/27 09:16:39 Info: [RAATServer] running RAAT__manager
05/27 09:16:39 Trace: [raatmanager] starting discovery
05/27 09:16:39 Trace: [discovery] starting
05/27 09:16:39 Info: [discovery] [iface:192.168.50.10] multicast recv socket is bound to 0.0.0.0:9003
05/27 09:16:39 Info: [discovery] [iface:192.168.50.10] multicast send socket is bound to 0.0.0.0:63479
05/27 09:16:39 Info: [discovery] [iface:127.0.0.1] multicast recv socket is bound to 0.0.0.0:9003
05/27 09:16:39 Info: [discovery] [iface:127.0.0.1] multicast send socket is bound to 0.0.0.0:63480
05/27 09:16:39 Info: [discovery] unicast socket is bound to 0.0.0.0:9003
05/27 09:16:39 Trace: [raatmanager] starting server
05/27 09:16:39 Info: [jsonserver] listening on port 50789
05/27 09:16:39 Trace: [raatmanager] announcing
05/27 09:16:43 Debug: [discovery] broadcast op is complete
05/27 09:16:49 Trace: [RAATServer] refreshing @ 10s
05/27 09:16:49 Trace: [raatmanager] refreshing platform
05/27 09:16:49 Trace: [raatmanager/windows] failed to get default device HRESULT=0x80070490
05/27 09:16:49 Trace: [raatmanager] announcing
05/27 09:16:53 Debug: [discovery] broadcast op is complete
05/27 09:17:03 Trace: [jsonserver] [127.0.0.1:50805] accepted connection
05/27 09:17:03 Trace: [jsonserver] [127.0.0.1:50805] GOT[LL] [1] {"request":"enumerate_devices","subscription_id":"0"}
05/27 09:17:03 Trace: [jsonserver] [127.0.0.1:50805] SENT [1] [nonfinal] {"status": "Success", "devices": [{"device_id": "default", "type": "wasapi", "name": "System Output", "config": {"output": {"name": "System Output", "device": "default", "type": "wasapi"}, "external_config": {}, "volume": {"device": "default", "type": "wasapi"}, "unique_id": "edf7dd82-2418-faea-f4e6-f0fe42312175"}, "is_system_output": true}]}
05/27 09:17:39 Trace: [raatmanager/windows] failed to get default device HRESULT=0x80070490
05/27 09:18:39 Trace: [raatmanager/windows] failed to get default device HRESULT=0x80070490
05/27 09:19:39 Trace: [raatmanager/windows] failed to get default device HRESULT=0x80070490
05/27 09:20:39 Trace: [raatmanager/windows] failed to get default device HRESULT=0x80070490
05/27 09:21:40 Trace: [raatmanager/windows] failed to get default device HRESULT=0x80070490
05/27 09:22:40 Trace: [raatmanager/windows] failed to get default device HRESULT=0x80070490
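
As an aside, the HRESULT in the repeated trace line can be split into its standard fields with a few bit operations. This is only a minimal sketch: `decode_hresult` is a hypothetical helper name, not part of Roon or RAATServer.

```python
def decode_hresult(hresult: int) -> dict:
    """Split a 32-bit HRESULT into its standard fields."""
    return {
        "failure": bool((hresult >> 31) & 0x1),  # severity bit: 1 means failure
        "facility": (hresult >> 16) & 0x7FF,     # 11-bit facility code
        "code": hresult & 0xFFFF,                # 16-bit error code
    }


fields = decode_hresult(0x80070490)
print(fields)  # {'failure': True, 'facility': 7, 'code': 1168}
```

Facility 7 is FACILITY_WIN32 and code 1168 is ERROR_NOT_FOUND ("Element not found"). That would be consistent with a headless server that has no default audio output device, but the excerpt alone does not show whether it has anything to do with the crashes.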

I’m an IT director / network engineer (well who isn’t these days) so feel free to ask me anything.

I am 100% confident it’s not my network; I’ve run ping / pathping tests to the renderer and there is no packet loss. Even if the problem were with the renderer, why would it show the “working” icon on the iPad? That leads me to believe the actual Roon server software is faulting somewhere.

The server has a few other duties but is under very light load (32GB RAM), and isn’t running low on any resources or showing any high disk activity at the time.

Thanks!

Can you zip up a full folder of logs for @support? Instructions are here.

A time stamp for when a few of these crashes would help too. Thanks!

Sure - can you provide me with a link to upload the logs to?

Hi @narkotic ---- Can you send them over to us via a dropbox download link in a PM?

-Eric

I don’t have a dropbox account. Before when I submitted issues I was provided with a link. Are you no longer offering this service?

You can use the same instructions we’d given you previously here.

Just rename the file to include your user name and let us know when it’s been uploaded.

Hi - I’ve uploaded the logs. I know it’s been a few months since I opened this ticket, but I forgot to attach the logs. The system still crashes - running latest version.

The server resides on a Server 2016 box with a Xeon processor, 24 GB RAM. I know it crashes because the music stops; I look at the Roon client, which shows the animated logo for a few seconds, then it’s back to where it was and the music is paused.

Looking in the logs I keep seeing - 09/22 14:19:38 Trace: [raatmanager/windows] failed to get default device HRESULT=0x80070490 before a crash happens.
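
One way to check whether that trace line really lines up with the crashes is to pull out its timestamps and compare them with the times the music stopped. A rough sketch, assuming the `MM/DD HH:MM:SS Level: message` layout seen in the excerpt above; `timestamps_of` is a made-up helper name and the sample lines are copied from that excerpt.

```python
import re

# Matches lines such as "05/27 09:16:49 Trace: [raatmanager] refreshing platform"
LINE_RE = re.compile(r"^(\d{2}/\d{2} \d{2}:\d{2}:\d{2}) (\w+): (.*)$")


def timestamps_of(lines, needle):
    """Return the timestamps of log lines whose message contains `needle`."""
    hits = []
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match and needle in match.group(3):
            hits.append(match.group(1))
    return hits


sample = [
    "05/27 09:16:49 Trace: [raatmanager] refreshing platform",
    "05/27 09:16:49 Trace: [raatmanager/windows] failed to get default device HRESULT=0x80070490",
    "05/27 09:16:53 Debug: [discovery] broadcast op is complete",
    "05/27 09:17:39 Trace: [raatmanager/windows] failed to get default device HRESULT=0x80070490",
]
print(timestamps_of(sample, "failed to get default device"))
# ['05/27 09:16:49', '05/27 09:17:39']
```

In the earlier excerpt the same line repeats roughly once a minute, so it may simply be periodic rather than tied to a crash; a comparison against noted crash times would help tell the two apart.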

This happens regardless of my endpoint - I have a Roon Bridge hardwired, a few Apple TVs (AirPlay) and an Integra AVR (AirPlay).

Hi @narkotic ----- Thank you for touching base with us, much appreciated!

I just checked the provided package and it appears that you have sent over your RAATServer logs. Would you kindly provide your Roon logs as well, so I can attach those to your ticket?

-Eric

Oh - sorry. I’ve just uploaded the log files from the roon folder.

Thanks @narkotic, we’ve got them!

-Eric

Hi @narkotic ----- Thank you for your patience while our techs have been looking into this issue you’ve reported to us.

Moving forward, the team is having a hard time determining what could be triggering this behavior in your setup. In light of this, the team has asked if you could provide any further insight into this problem. They would like to have a better sense of what is happening, in terms of your “usage”, when the error occurs. Furthermore, do you ever make this observation while in an idle state, or is it only noticeable during playback?

-Eric

Hi - The machine that is hosting the core is a dedicated Dell Xeon server (16 cores) with over 20GB RAM. Its other duty is as a file server using a hardware RAID solution (Dell PERC), which is where I store all of my music.

It’s not doing much more…

I’ll admit I have seen it happen while it’s not playing anything back. I have an iPad next to the couch that is usually running Roon Remote, and I’ll periodically see the “loading animation” come and go. It’s just not as obvious unless I’m playing music back, either on Apple TV, the Integra AVR (via AirPlay) or my laptop. It’s bizarre.

Thanks for the insight, @narkotic. Your continued feedback and, more importantly, your patience have been much appreciated!

I have updated my report with your latest comment(s), and it is now back in our tech team’s queue for review. Once they have had a chance to evaluate your statements, I will be sure to provide you with an update in a timely manner.

-Eric

Hi - Checking status on this. It’s driving me nuts.

Hi @narkotic ---- Thank you for touching base with me and sorry for the frustrations.

I checked on the status of your ticket and can see that it is still with our techs who have been conducting the investigation into this behavior you reported to us. Upon seeing your post I placed a request for an update and should hopefully have some feedback for you soon.

Your continued patience and understanding are both much appreciated!
-Eric

Please - is there any sort of fix for this in place yet? Nothing more annoying than a random crash while listening to music.

Hi @narkotic ---- Thank you for the follow up.

Moving forward, we are doing our best to understand the cause of these intermittent crashes you are experiencing with Roon under Windows Server 2016. Without being able to reproduce this in house, I am sure you can understand that this has been hard for us to pin down. However, please rest assured that we will keep at it :microscope:

We have new diagnostics/log gathering capabilities that may aid our understanding of your issue. I have gone ahead and enabled this new feature on your account so our techs can take a “closer” look. Furthermore, can you please verify what plugins were installed to get Roon running on WS2016?

-Eric

Hi - Thanks for the update. Initially I was going to suggest some sort of extended debugging .EXE or .DLL (I used to work for a large software company), but it’s awesome that you guys can do this remotely. It does raise some privacy concerns for me though - if you’re able to do this remotely, what sort of info is it sending you about my server? I’m a private person and would prefer that it not send you the data that is on my hard drives, file names, IP addresses, etc. It would be good for the software to show a pop-up asking permission for this, and to let the end user disable it at will, so they can feel better about privacy.

Please keep me apprised of the situation and also forward my concern about privacy to Danny. I am one of the original reviewers of this software for positive feedback.com.

I’ll directly explain the concerns from @eric’s post, and the rest is covered under our privacy policy.

We have 2 new systems in play that you are probably interested in: Push & Bits.

  1. Push
    With our new Push system, we are able to send small commands to a Roon Core. This is used for things like ‘get roon-logs/diagnostic information’ (as defined by the above privacy policy), or ‘update your metadata because David Bowie died’. We plan on using this more for things like new releases, interesting concerts, etc… but none of that is done yet. One other Push command that is relevant here, is one to “recheck bits”. See the next item for an explanation.

  2. Bits
    Many production systems have a way to remotely turn on/off code. For example, at Facebook, their system is well documented and called Gatekeeper. We rarely push out code in a ‘let it fly’ manner, as things can go wrong in the wild, so all new code additions are on what we call a “bit”. These bits can be turned on/off remotely, so we can enable or disable code/features quickly and without complex rollout schedules.

    For example, we had a bug in a recent release that affected a small handful of users (under 10, out of many tens of thousands). We were able to revert a small change for that minority of users by flipping the “bit” for that feature, and their issue was resolved after a restart (a restart is needed unless the code was written in a manner where the bit can be flipped dynamically). Immediate satisfaction for that minority of users (those on OSX that had a drive name with a parenthesis in it), and in a later release, we are able to solve this problem for everyone.
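
To illustrate the general idea of a “bit” (a remotely controlled feature flag), here is a very small sketch. It is not Roon’s implementation: `BitStore`, `apply_remote_update`, `scan_drive_name` and the bit name `new_drive_name_parser` are all hypothetical names made up for this example.

```python
class BitStore:
    """Hypothetical in-memory store of remotely controlled on/off bits."""

    def __init__(self, defaults):
        self._bits = dict(defaults)

    def apply_remote_update(self, update):
        """Merge a bit update received from an (assumed) backend."""
        self._bits.update(update)

    def is_on(self, name):
        # Unknown bits default to off, so code behind them stays dormant.
        return self._bits.get(name, False)


def scan_drive_name(name, bits):
    """Choose the new or the old code path depending on a bit."""
    if bits.is_on("new_drive_name_parser"):
        return "new:" + name
    return "old:" + name


bits = BitStore({"new_drive_name_parser": True})
print(scan_drive_name("Music (backup)", bits))  # new:Music (backup)

# Flipping the bit off reverts to the old code path without shipping new code.
bits.apply_remote_update({"new_drive_name_parser": False})
print(scan_drive_name("Music (backup)", bits))  # old:Music (backup)
```

Only the on/off value travels over the wire in a scheme like this; both code paths already ship in the installed release.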

While Eric didn’t bring up or use Bits in this case, it’s important to note that things that appear to be DLL injection or downloading of dynamic code are indeed nothing of the sort. We are well aware of the risk of building such a thing.

As for your concern about privacy and private data: in our backend, we don’t hold your private user data anywhere near any other data, and we don’t hold credit card information at all (it is passed to our credit card processor and then forgotten, so as to be PCI compliant). Your passwords are hashed using BCrypt with proper salting and then the original is forgotten as well. Afterwards, we are never in possession of your password, encrypted or otherwise. Etc, etc… we take this stuff really seriously.
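
For readers unfamiliar with salted password hashing, the pattern looks roughly like the sketch below. Note the substitution: it uses PBKDF2 from Python’s standard library as a stand-in for BCrypt so that it runs without extra packages, and it says nothing about how Roon’s backend is actually written. The function names and the iteration count are illustrative assumptions.

```python
import hashlib
import hmac
import os

ITERATIONS = 200_000  # illustrative work factor, not a recommendation


def hash_password(password: str):
    """Return (salt, digest); the plaintext is not kept anywhere."""
    salt = os.urandom(16)  # fresh random salt per password
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt, digest


def verify_password(password: str, salt: bytes, digest: bytes) -> bool:
    """Re-derive the digest from the candidate password and compare."""
    candidate = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return hmac.compare_digest(candidate, digest)  # constant-time comparison


salt, digest = hash_password("correct horse")
print(verify_password("correct horse", salt, digest))  # True
print(verify_password("wrong guess", salt, digest))    # False
```

Only the salt and the digest are stored, which is why the service can check a login without ever holding the password itself.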

We are not taking random data from your hard drive, but we do pull part of your Roon data (like the Roon generated logs and the settings in your Roon database). If some of those Roon data items include information about the paths to your storage locations or the names of your NAS shares, that falls under diagnostic information that is accessible to support people at Roon Labs. For example, our sales team does not have access to this data.

As for your IP address, we have that and we go one step further and actually use GEO-IP databases to figure out what region of the world you are in. We use this for copyright notice information in-app and reporting to our metadata license providers (for example, different regions have different lyrics copyrights, and reporting requirements from the provider require region) and TIDAL shows different libraries available to people in different regions. We also use your location to figure out which concerts to show you and in what order.

Thanks for the detailed response.

Giving the end user the power to allow remote code execution, or the retrieval of logs / extended information, would still be a nice peace-of-mind option. I ask that you still consider it - it would gain a lot of respect from some users. For example, a button like “allow remote support” that, if enabled, shows a reminder in the client screen that it’s enabled. Roon is running as a user with administrative access - with you guys having a back door, of sorts, one could execute commands and/or see other running processes, obtain registry info, etc, no?

That said - I’m truly interested to know how the debugging information has helped with this ticket. I’m still left with a system randomly crashing. Sometimes it’s just internet radio playing something in the living room and it suddenly stops and then the wife starts asking - what happened to the music?

Thanks