Recurring database corruption due to image issue requiring manual intervention (ref#13MXR2) [Ticket In]

Describe the issue

This is the same issue as before: Qobuz Playback Issues and Missing Artwork in Roon (ref#CAFOPT) - #2 by daniel

Database corruption keeps happening.

Yesterday (after the build 1510 updates) I had to manually delete (rename) the RoonServer database folder
before I could even initiate another backup restore (following this guide https://help.roonlabs.com/portal/en/kb/articles/error-message-there-was-an-issue-loading-your-database).

Today, I get the tell-tale signs that my Roon instance is still very, very sick:

I’m tired. I just want to browse and listen. There’s no substitute for Roon. How do I get rid of this corruption?

Describe your network setup

1 Gb switch, wired

Hello @S_Heeren ,

Thanks for reaching out. Do you by any chance still have the previous RoonServer Logs folder, from when the corruption occurred? If you can upload it to the link below and let us know once it is uploaded, we can take a look to see if there are any clues to the corruption type.

https://workdrive.zohoexternal.com/collection/8i5239cc05950ac07456889838d9319545a82/external

@noris Thanks for the quick reply; I have uploaded the logfiles both from before the rename and after. Note though that the symptoms shown in the screenshots are with the newly restored database/data folder anyways.

Hi @S_Heeren,

Thanks for sharing both sets of logs! That said, I should note that running Roon or Roon Server in a Docker setup falls outside the scope of support. Could you please temporarily move your server onto a supported setup?

We reviewed the log sets you’ve sent over, and didn’t see any signs of corruption. We did however see repeated failures in connecting to your external storage devices, for example:

Warn: [.NET ThreadPool Worker] [roon/cifs] failed to connect to CIFS storage: mount error(111): could not connect to 192.168.50.25 Unable to find suitable address.
Info: [Broker:Misc] [broker/filebrowser/volumeshare] Volume's CIFSMount's availability changed: False

Do you have any potential firewalls blocking or restricting access? Or, can you try adding a username and password to the network share settings in Roon?

It may be worth ensuring the IP address is correct and not assigned to another device as well. We’ll be on standby for your reply! :+1:

Those are expected. The file shares are on my desktop computer which isn’t always on.

Can you explain to me how moving the installation to the bare metal would make a difference? I fail to see it and it’s exceptionally intrusive to do on a whim, with no expected gain.


PS. I appreciate that you probably get loads of standard PEBCAK problems, but really there’s very little reason here to invent firewall problems. If I couldn’t reach my file-server (A) I would have noticed and told you (B) the images for Qobuz new releases would not fail to load…

Please consider looking at the entire report (including the details of my previous spate with this, when I found out that it was silent database corruption after all). I’m too tired to entertain frivolous “maybe it’s your DNS” type troubleshooting ideas. If I’m somehow wrong about those being frivolous, I will happily respond to good arguments why I should consider such.

Hi @S_Heeren ,

I took a look over your previous log set as well, but as @benjamin noted, there are no corruption traces present. I do see quite a few image errors though, e.g.:

02/23 22:13:17 Critical: Library.EndMutation: System.InvalidOperationException: Nullable object must have a value.
   at System.Nullable`1.get_Value()
   at Sooloos.Broker.Image.get_IsRoughlySquare()
   at Sooloos.Broker.Music.LibraryAlbum.OnCompute(LibraryMutationEnv env)
   at Sooloos.Broker.Music.LibraryMutationEnv.Finish()
   at Sooloos.Broker.Music.Library.EndMutation()

I have not seen this error type before, so I am inquiring with the team, but it is very possible that this error is related to the docker installation somehow.

When you noticed the corruption error, did you also have to rename the Roon Remote database (the Roon folder on your Windows PC)? Do you happen to have the logs from that instance and if so, can you upload them to the previous uploader as well?

but it is very possible that this error is related to the docker installation somehow.

Explain it like I’m 5. This doesn’t make logical sense and has not been a problem before. Note that I’m not exactly a new user. I have never run a different type of installation.

did you also have to rename the Roon Remote database (the Roon folder on your Windows PC)

I did not. I see that the images not loading half corrected itself overnight. There are still some images refusing to load (about 4 per screenful; also in My Library/Albums view). It occurs on different clients (iOS app as well as Windows client; installations don’t get much more controlled than the iOS app, I suppose).

Interestingly, the images not loading are different for each client, e.g. “Bach: The Art of Fugue” is loading on my desktop, but not on the iPad. Several forced restarts do not seem to help.

Note that my issue is NOT about images not loading. It is about the database corruption that prevented the clients from starting /at all/ until I manually renamed the RoonServer folder¹. Note that I went back to the same backup (Jan 26th) that I went to when I had the previous report.

It actively manifested immediately after I clicked “update all” in the UI. I can try to actually reproduce the upgrade if I can tell what version of RoonServer to install prior to the current build, and see whether it reproduces the breakage.

¹ I merely intuit that the image not-loading is a related symptom, because it also happened the previous time (that time being the most notable symptom before I found out that fileshare sync and playback were also blocked).

After clearing the image cache on the Windows client, the missing images do match the missing images on the iPad…

Hi @S_Heeren ,

Thanks for your feedback! We spoke with the development team today, and it does appear that the issue is related to image aspects, though it doesn’t seem to be the typical type of corruption we usually encounter.

We’ve submitted a ticket to improve Roon’s resilience against these kinds of errors, but I can’t provide a timeline for any potential fixes at this time. Let us know if you have any other questions in the meantime!

Mmm.

  1. Any idea why no one else is encountering this?
  2. How would trouble with image processing cause the database to become irrecoverably lost?

I’m not confident this is a root cause. Besides, the most important question:

  1. what can I do to keep this from wrecking my Roon experience every other week?

Restoring the Jan 26th backup has proven to only be a reprieve for 17 days. And to an extent that might be due to the fact that I didn’t use Roon a lot during that time…

Hi @S_Heeren ,

I searched for those image errors in our ticket tracking software and couldn’t find any other tickets ever created with this trace. I am not sure why exactly you seem to be the only one affected. It is still possible that the container is part of the equation, though we’ve submitted the bug ticket so that this issue is handled as just a library error instead of a critical error. If you’re able to reproduce the issue on a standard non-Docker PC, that would also help us provide that data point to the dev team.

@noris Understood. Thanks for the thoughtful reply.

I worry that reproducing it on a fresh installation may still cast doubt on the origin of the issue when using a backup. I seem to recall reading somewhere else that Roon db corruption can be silent and manifest much later on.

That to me is the real issue here. Shouldn’t we strive to crystallize new detectors from the current failures, so we can stop having corruptions going unnoticed?

If I start without a backup I’m going to lose my listen history/statistics, MUSE settings, playlists no matter what. Plus I will have to do a whole lot of reconfiguring storage locations and audio setup.

I’m really tempted to just wing it and hope the breakages are going to be less frequent.

Hi @S_Heeren ,

Yes, we are always working on improving our corruption detection and there was a big push for corruption detection a while back, but your corruption type is a novel one.

You can still try to use a backup, but just try to restore it on a bare-metal machine without the docker involved.

The problem is that this only makes sense with the preconceived assumption that Docker is a factor. Which, frankly, doesn’t make any sense at this moment. Like I said, it’s going to be pretty invasive to make this change.

I might find the time, but alas I have a life with many higher priorities, so don’t hold your breath.

The kicker is, when it proves to reproduce outside docker (which in my estimation is a 95%+ chance), we’re back to square 1 because the next step would be to assume the backup is already compromised.

A post was split to a new topic: Issue with missing images on Roon mobile

Hi @S_Heeren,

We value your opinion, but as it stands, Roon has never officially supported running in a Docker setup. While we understand that some users prefer this approach, our software is designed to run on native operating systems to ensure stability, proper hardware access, and database integrity. Docker environments introduce complexities with networking, storage, and hardware passthrough that can cause issues.

If you’re unwilling to change your setup, the Tinkering category may have community-driven insights, but unfortunately, we can’t continue to provide official support in this setup, regardless of the situation.

If you find the time and reproduce the same issue within a supported setup, we’d be more than happy to continue digging into potential solutions!

Thank you for your understanding and cooperation. :+1:

Just keeping the thread alive. I’ve had another case of the database becoming “too” corrupted to start and having to go back to the same backup. I’ll keep you posted when I find the time to go through the motions you describe.

Hi @S_Heeren,
Thanks for the update. We’ll keep an eye out for the results of trying @benjamin’s suggestion.

I finally wiped the entire box and installed ROCK, jumping through all the hoops ¹.

So far, it is working. Notably, there’s a weird UI glitch:

Somehow the display is unable to decide whether it is showing 506 (!!) artists out of 505 or maybe out of 504 artists (both seem to be mathematically impossible). The behavior is exactly the same for all clients (Android, iOS and Windows).

(¹ note that installing the ffmpeg static binary doesn’t work as advertised for ffmpeg-git-20240629-amd64-static with md5 95c383c030917837143e0cc6e339e322, it still reports missing codecs)


UPDATE @daniel

Sadly, a lot is still broken. I removed any mp3s again, just to be sure they weren’t interfering in some way. I also noticed one album with banned/tagged “corruption” (I may have manually tagged this at one point, I don’t recall). However, no matter how often I “Force Rescan” all shares, and despite reboots, the tagged/banned album stuck around.

It finally disappeared only when I unbanned the album…

Now, the artist count jitters between 504 out of 502/503. Clicking any artist gives zero UI response - just the “Loading Artist…” spinner.

Showing History does the same - no result. To top it off, since the last two reboots I’m unable to play back any track, regardless of source (local share or Qobuz). Playback had worked since I moved to ROCK, as noted earlier.

I have uploaded the logs (rock-logs.zip) to the uploader page.

Really hope we can salvage my listen stats and MUSE settings. I’m ready to nuke the DB for other purposes and just reimport everything from scratch. This is taking a lot of energy. Frankly it’s ruining my Roon experience. Roon has been completely broken for quite a while now.

And just like that, I decided to try another server restart, this time with the fileshares disabled. Now I can play back tracks again, even after re-enabling the fileshares. History shows again.

However, I still cannot show any artist view via the artist browser. After a LOOONG wait (at least a minute or even 2) the top banner of the artist page sometimes appears, but no other content, and still the spinner. A second attempt (to make a screenshot) didn’t show the banner even after 7 minutes.

Hopefully this might give new leads. Something particular is afoot related to Artists in the db index, less around images this time. I always restart from the same Jan 26th backup, for reproducibility.