Crashing every few hours; constantly rescanning

So, I’m using ROCK on a NUC with normal ROON clients, all up to date.

Since a few days ago, ROON has been constantly rescanning the library. It’ll do that, indicate that “76 items scanned; 76 files added”, but then spins and spins…and keeps doing that until, eventually, the server crashes and restarts.

And the process repeats.

Nothing about my setup has changed in a while, save for ROON versions. Help me, @support, you’re my only hope!

Hi @David_Nanian ----- Thank you for the report and sharing your feedback with us. Sorry to hear of the troubles.

Moving forward, I have gone ahead and enabled diagnostics on your account so we can try to get a sense of what is causing the behavior described in your report. The next time your NUC (i.e., the Core machine) comes online, a diagnostics report will be automatically generated and uploaded to our servers, which will contain a fresh set of Roon logs.

Once the report has reached our servers I will be sure to get it into our tech team’s queue for analysis/feedback.

-Eric

Thanks. I fully rebooted the ROCK server yesterday and it’s been up for a day… hopefully it has previous crash logs that it’s sending, Eric.

Hi @David_Nanian ----- Thank you for your patience while our tech team has been looking into this issue for you. Much appreciated!

Moving forward, I touched base with the team today to get an update on your ticket and as per the conversation the team has requested the following information:

  • Please describe your network configuration/topology, being sure to provide insight into any networking hardware you are currently using.

  • I seem to remember (from a previous thread) that you are making use of a NAS to store/access your musical collection. Is memory serving me correctly?

-Eric

Hi, @Eric - router is an EdgeRouter Lite. That goes into a switch that has the NAS (Synology). That feeds to another switch on the other side of the room, to which the NUC/ROCK is attached. The switches are identical Netgear ProSAFE GSS116E units.

The NAS is a Synology DS2415+.

Hi @David_Nanian ----- Thank you again for your continued feedback here and, more importantly, thank you for your patience. Both are much appreciated!

Moving forward, I had a chance to touch base with the tech team today to discuss your issue, and the team has informed me that they are noticing traces in your logs that point to some network-related troubles. In light of this, the team has asked if you could please perform the following troubleshooting exercises to help us try to determine what is causing the reported issues (i.e., crashing and constant scanning). Please see below.

Test #1 - Leave network storage location “disabled” (temporarily):

  • Please leave your network storage disabled in the application for a day, and confirm if ROCK crashes.

Test #2 - NAS + ROCK on the same switch:

  • Please connect the NUC hosting ROCK and the DS2415+ to the same switch and confirm how things hold up. Are you still getting the constant re-scanning of your library?

-Eric

So - hm. What specific “network related troubles” are you seeing? More information about what they’re noticing would be helpful, since I’m not seeing anything on my end that seems bad (and I do a ton of network related stuff).

Certainly, you’d see occasional network shutdowns (the last one was probably two weeks ago) because I occasionally bring up alternate network configurations. But general stability has been good.

The server has been up itself for two weeks. ROON looks like it crashed a few days ago. What kind of traceback was produced for that crash? The network didn’t go down then, although it’s quite possible I installed a Synology update a few days ago…

I’m not seeing constant scanning at present. But I obviously did have this recent crash (or other event)…

Hi, Eric. So, I didn’t turn off the network storage, but I did attach them to the same switch. The 1.4 update installed, and Roon seemed to crash once within a few minutes of starting, which was weird. But it didn’t crash again after.

Also, a continued problem that might be related in some way - when I play an album from TIDAL, often the first track says that it’s “unavailable on TIDAL”, while the next track works. Hitting |<- will go back to the “unavailable” track and then it’ll work.

That happened here this evening, about four minutes ago, with the album (just went to check, and Roon has crashed again while I’m typing this) “But Not For Me” (Ahmad Jamal Trio).

Anyway, not sure what’s going on, but if it’s networking, it’s not the switch…and literally no other devices are having network issues that I can see. And the Synology is being used extensively for iSCSI transfers and it’s not failing for those…

I had switched from a Skull Canyon NUC to a “plain” NUC with ROCK installed to simplify management. Was that a mistake? (ROCK doesn’t seem to be crashing, just the server.)

Hey David,

We are currently investigating an issue where loss of connection to a network storage device can cause Roon to crash in some circumstances. Our QA team is working on making this happen consistently so our developers can resolve the issue – we have heard a handful of reports of this, but haven’t quite nailed it internally yet.

Eric and I just looked at your logs, and while it’s hard to say exactly what’s going on, it definitely appears that the network isn’t as stable as we’d like. Normally, Roon should perform ok here, but it’s possible you’re running into the elusive crash I mentioned above. That’s why I would still recommend this be your next step:

“Please leave your network storage disabled in the application for a day, and confirm if ROCK crashes.”

At least then if things are not stable, we’ll have a much clearer understanding of this issue (namely that you are not experiencing the network storage crash).

What I honestly don’t understand is the network instability. I’m currently running with a single switch to the Roon. The same switch is plugged into my upstream Ubiquiti EdgeRouter Lite. That goes directly to the cable modem.

I’m not seeing any packet loss. Looking at the switch statistics, I don’t see any errors on any port. Cable tests (using the switch) show no problems. The router doesn’t seem to indicate errors. As far as I can tell, no other devices are having trouble.

So… when you say that the network isn’t stable, I’m just trying to understand what you’re seeing.

This is kind of similar to something that was bugging me a week ago. Roon core all of a sudden restarting over and over without any reason to do so. I deleted half of the content that was in the database (2 of the total of 3 folders roon is scanning for content) and added them half an hour later after a cleanup of the database.
This resolved the restarting of the core (it hasn’t restarted while scanning since, auto scan or a manually triggered one), so to me it seems I had a broken database of some sort.
While investigating I also discovered that when I restart my synology nas, roon core crashes almost instantly and keeps doing this until the nas is up again. The same happens when I just shut down the port on my switch that the nas is connected to. I would expect roon core to stay running and just throw errors for the watched folder in the library settings.

Certainly, the Synology is restarted when an update is applied (which happens fairly often). But otherwise I don’t turn it off.

And the corrupt database part? I left roon playing for over 24 hours with just a single watchfolder on my synology containing something like 200 tracks (used radio). It didn’t crash once. Before that, with the original 100000 tracks in 3 folders on the same synology it crashed almost instantly when hitting rescan watchfolder while it was playing. (I have auto scan enabled every 8 hours.)

The @support team hasn’t indicated there’s a problem with the database, in their investigation, so I don’t think that’s what’s going on. I still don’t understand their observation about my network, though. Hopefully they’ll be able to clarify.

@mike and @support

I’m in kind of a bind here.

Although I’ve got backups turned on, and the UI says the last backup was last night at 21:00, when I look at the actual backup folder, there hasn’t been a successful backup since October 14th. So, at the very least, you’ve got a bad bug where it says backups have been happening when they haven’t. What’s weirder is that even though the backups don’t seem to be occurring (again, based on looking at the place where the data is stored), ROON thinks they have been when you browse the backups in the UI, too. I haven’t yet tried restoring the backups that seem like they’re bad. [Follow-up - I restored the most recent one. It took a long time, so I assume I’m just misinterpreting the contents of the folder, which haven’t been updated since October 14th.]

On top of that, ROON is now crashing almost constantly, and I can’t get a manual backup to happen. Turning off network storage does seem to stop constant crashing…but it also means my database isn’t really being accessed, because I have no local storage. The only content would be TIDAL.

Even with storage turned off, though, asking it for an immediate backup fails silently. I click, and nothing happens - the dialog closes. There’s no indication of any action taking place, and the folder it’s supposed to be backing up to doesn’t change. So, it seems that it’s the scanning of files and database that’s really the problem, not the content as such.

It looks to me like my database is, indeed, hosed in some way that’s causing serious problems. This can’t have to do with networking, since it can’t even back up with networked shares off - and I’m not sure how I can diagnose when I can’t even get things to run long enough to back up. (I was going to try moving back to the other NUC with Windows rather than ROCK.)

I’m going to try to move to the other NUC using an old database backup. I’ll retain this setup so it can be diagnosed. Hopefully that’ll provide what’s needed…?

Further update: up and running on the old NUC (Skull Canyon running W10Pro). Updated Windows to newest version, updated ROON to newest version, restored most recent backup (which was close to current AFAICT). Has been working since without crashing (as best I can see). Networked share behavior is significantly better (new files show up almost immediately, whereas with ROCK it takes until the next scheduled update from what I can determine). Constant rescanning not happening. There’s no “Uptime” indicator for the server process, but the process IDs seem normal.

Hey David,

Sorry for the slow response here.

I suggested this was related to your network or environment because there are lots of network-related errors in the logs. That could be a red herring, of course, but I’m not sure I can give you a comprehensive answer of all the issues that could cause those sorts of errors – they vary from mis-configuration to hardware failures to bad cables, and so on.

The issues you and @Edwin_Muskee are describing could also be related to media – if there was a file in your library with certain types of corruption, it could potentially cause your Core to crash, after which the system would restart and rescan. This also matches your symptoms.

I think a good next step here is to try and eliminate variables. If you’d like to confirm this isn’t related to media, try disabling all storage and running with a folder that only contains a few pieces of known good content. If the crashes continue, we can be reasonably sure this isn’t related to media.

You can also access the Data directory on ROCK and rename the RoonServer folder to something else (like RoonServer_main) and then try with a fresh database. Make sure you have a backup before doing this. If Roon is stable with a new database and minimal media, add the rest of your media – if the crashes come back, we’ll know this is related to something in your media library.

The same kind of testing can be done with your network. If you are experiencing these issues with your current switch and router, try connecting everything to a different router with different cables and seeing if the issue occurs – this will help us rule out networking or other environmental factors (like bad cables).
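
For anyone who wants to collect their own evidence on the network side, here is a rough sketch of a reachability logger. It is not something Roon provides and it does not reproduce what the Roon logs record; the function names, the example address, and the choice of TCP port 445 (SMB) are all assumptions made for illustration. It tries to open a TCP connection to the NAS at a fixed interval and records a timestamp each time the attempt fails, so the failures can be compared against crash times.

```python
import socket
import time
from datetime import datetime


def check_once(host, port, timeout=2.0):
    """Return True if a TCP connection to (host, port) succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def monitor(host, port, attempts, interval=5.0, probe=check_once):
    """Probe the host `attempts` times and return timestamps of failed probes."""
    failures = []
    for i in range(attempts):
        if not probe(host, port):
            stamp = datetime.now().isoformat(timespec="seconds")
            failures.append(stamp)
            print(f"{stamp} could not reach {host}:{port}")
        if i < attempts - 1:
            time.sleep(interval)
    return failures


# Example (hypothetical NAS address; 445 is the usual SMB port):
# monitor("192.168.1.50", 445, attempts=720, interval=5.0)
```

A clean run does not prove the network is fine (it only tests one TCP port from one machine), but a failure timestamp that lines up with a crash would be useful to include in a report.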

Finally, two other points:

First:

We are aware that the backup folder doesn’t show current modification time and we are fixing that, but this shouldn’t be causing the issues you’re describing. Can you reproduce the issue described above and let us know a timestamp? Then @Eric can grab some diagnostics and get a better sense of what’s happening.
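
As an aside on checking backups from the file system rather than the UI: because the top-level folder’s modification time may not change, one way to see whether anything was written recently is to walk the whole tree and find the newest file. A minimal sketch, assuming the backup location is reachable as an ordinary directory; the function name and the example path are hypothetical, and it makes no assumptions about how Roon lays out its backups.

```python
import os
from datetime import datetime


def newest_file(root):
    """Return (path, mtime) of the most recently modified file under root, or None."""
    newest = None
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                mtime = os.path.getmtime(path)
            except OSError:
                continue  # file vanished or is unreadable; skip it
            if newest is None or mtime > newest[1]:
                newest = (path, mtime)
    return newest


# Example (hypothetical path to the backup share):
# result = newest_file("/Volumes/backups/roon")
# if result:
#     print(result[0], datetime.fromtimestamp(result[1]))
```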

Second:

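@Edwin_Muskee – I would be very careful with the process described here:

“I deleted half of the content that was in the database (2 of the total of 3 folders roon is scanning for content) and added them half an hour later after a cleanup of the database.”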

By removing watched folders and cleaning the database, you are telling Roon “forget everything you know about these files” – any edits, play counts, favorites, playlist or tag appearances, etc will all be deleted.

This is a pretty extreme step, and while that might not matter if you haven’t made many edits, if something related to your media is causing crashes, we can help debug that with you, and you shouldn’t have to wipe out your database.

If you’re still having this issue, please let us know in a new thread, with all the details listed here. We can generate some diagnostics and see if there’s a file that is causing issues.

Looking forward to getting this stable for both of you – your patience is appreciated, guys!

So, my last message went through a series of edits that obviously didn’t do it any favors, so let me try to summarize where I am and what I did.

  • After trying to substitute various cables, switches, etc without any improvement, I ruled out the network “issue” at least with regard to anything outside the ROCK NUC.
  • Backing up looked weird when I was checking the folders, but seemed fine when I tried restoring, so that (based on the dates, etc) was a red herring, as you indicate.
  • Substituting the Skull Canyon NUC running W10 for the ROCK NUC, and putting everything else back as it was (cables, switches, connections, etc), and restoring the same database, referencing the same NAS content, worked fine, and (as best I can tell) is stable, crash-free and fast.
  • I’m sorry I can’t provide a specific time for the backup problems I was having, but it’s going to be +/- 30 minutes from the time of the original post. I just tried doing the manual backup as I’m typing, to the same folder I used when I was trying to do this originally, and it worked (is working) exactly as expected.
  • The content didn’t change at all, and the same scanning is happening, and it seems fine, so I don’t think the problem was content, unless backing up and restoring ‘fixed’ it.

Still really confused about the network problems. Your description implies my network is ridiculously unstable, yet I see no evidence of it. Maybe it’s the old NUC? Are you seeing the same errors on my current setup?

Hi @David_Nanian ---- Thank you again for the continued feedback, and more importantly, thank you for your patience.

Moving forward, I want to just touch base with you and see what the current status of your Roon setup is after your most recent post. It sounds like you are currently still running Roon under Win10 and things are functioning as expected. Are you still configured this way OR have you gone back to ROCK?

Furthermore, in your update I noticed you mentioned the following:

“The content didn’t change at all, and the same scanning is happening, and it seems fine, so I don’t think the problem was content, unless backing up and restoring ‘fixed’ it.”

I want to just point out that it’s important to remember (as I am sure you are aware) that different Core platforms will respond to media differently. So the same media can behave differently, depending on the Core operating system.

-Eric

I’m still on the W10 setup, since it’s been (as far as I can tell) completely stable. All of the content came from a Mac system, almost all ALAC, some AAC. As you know, it’s all stored on a Synology DS2415+. But the content has been generally the same since I started with Roon. Certainly, there was no significant change between when ROCK was stable and when it wasn’t.

Part of the reason I moved to the other NUC was that you found my network to be unstable. Do you see the same network instability with the current setup? I’m using all the same cables, switches, even the same port…