Corruption issue - "Can't read library - restore from backup"

BlackJack · September 24, 2023, 6:50pm

As stated in my post, it’s written from the perspective of an onlooker, which I am (and I assume we all are).
For example, there might be additional copies of the DB involved in Roon’s backup process, but if so, this is hidden from us simple observers.

That Roon’s DB includes error detection mechanisms of some sort is a given, I guess, since Roon Labs wrote that they (now) check the DB as a first step when using their integrated backup. What I tried to do was give my thoughts on why there can still be corrupted backups, or why restoring may fail. What my post was not about, though, is runtime error/corruption detection in the master DB in use. We already know this exists and is done (it’s the reason for the “There was an issue loading your database …” message that we might get unrelated to the Roon backup process).

The gist of my post is that Roon seems to check its master DB (and maybe the main DB too?) when doing its backup, but not any of the copies created from it during the rest of the process. Roon seems to simply assume that those copies are free of errors and corruption. Given the reports of corrupted backups in the forum, though, this assumption seems flawed and does not hold in all cases.

A user can verify a backup manually by simply (attempting to) restore it, but to be honest, who has ever done this for every backup, just to make sure? So when a user actually needs the backup, he might be in for a surprise.
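A minimal sketch of the extra verification step described above: check the source, make the copy, then check the copy too instead of assuming it is fine. Purely for illustration it assumes a SQLite database and Python’s standard `sqlite3` module; nothing in this thread says which database engine Roon uses, and `integrity_ok` and `verified_backup` are hypothetical names.

```python
import sqlite3


def integrity_ok(path: str) -> bool:
    """Return True if SQLite's built-in integrity check reports no problems."""
    conn = sqlite3.connect(path)
    try:
        rows = conn.execute("PRAGMA integrity_check").fetchall()
    finally:
        conn.close()
    # A healthy database yields exactly one row containing the string "ok".
    return rows == [("ok",)]


def verified_backup(src_path: str, dst_path: str) -> bool:
    """Check the source, copy it, then check the copy as well."""
    if not integrity_ok(src_path):
        # Refuse to produce a backup from a database that already fails the check.
        return False
    src = sqlite3.connect(src_path)
    dst = sqlite3.connect(dst_path)
    try:
        src.backup(dst)  # online page-by-page copy (Python 3.7+)
    finally:
        dst.close()
        src.close()
    # The step suspected to be missing: verify the copy, not just the source.
    return integrity_ok(dst_path)
```

A real backup pipeline with several intermediate copies (staging, compression, upload) would repeat the final check after each hop.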

gTunes · September 24, 2023, 7:21pm

This isn’t necessarily true. If Roon backs up a database by simply file copying all of the database files into the backup location, a “restoration” might simply be the reversing of that process. Whether or not corruption is detected the next time the database is opened depends on a combination of implementation-specific factors that we can only speculate about.

In any case, we’re all just speculating here. I think there’s room for improvement in early detection and in reducing the blast radius (including not propagating corruption to backups and not aging out good backups in favor of corrupted ones). I suspect we all agree on this, but maybe not.
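One way to limit that blast radius is a retention rule that never ages out the newest backup known to be good. This is a hypothetical sketch, not Roon’s actual policy: it assumes each backup carries a `verified` flag set by a post-write integrity check, and `Backup` and `backups_to_delete` are illustrative names.

```python
from dataclasses import dataclass


@dataclass
class Backup:
    name: str       # e.g. a timestamped folder name (hypothetical)
    verified: bool  # True only if the backup passed a post-write integrity check


def backups_to_delete(backups: list[Backup], keep: int) -> list[Backup]:
    """Pick backups to age out, oldest first, never deleting the newest verified one.

    `backups` is ordered oldest -> newest; `keep` is the user's retention count.
    If the protected backup is the only remaining candidate, fewer than the
    excess may be returned, so the last known-good copy always survives.
    """
    excess = len(backups) - keep
    if excess <= 0:
        return []
    verified = [b for b in backups if b.verified]
    newest_good = verified[-1] if verified else None
    doomed = []
    for b in backups:  # oldest first
        if len(doomed) == excess:
            break
        if b is newest_good:
            continue  # never age out the last known-good backup
        doomed.append(b)
    return doomed
```

With backups `[good, corrupt, corrupt]` and a retention of two, this deletes the older corrupt one and keeps the good one, instead of deleting the oldest backup blindly.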

BlackJack · September 24, 2023, 7:50pm

The next time the DB gets opened is when you restore a backup; otherwise restoring a backup would be pointless. If circumstances exist that prevent corruption from being detected, then no reliable way of detecting corruption exists at all. It’s then hard to see how Roon could do better. You can’t prevent something from happening if you don’t know about it and have no way to find out.
What are checksums good for if they don’t work?
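Checksums do work, but only for corruption that happens after the digest is recorded. A minimal sketch, using SHA-256 from Python’s `hashlib` and hypothetical helper names, that records a digest per file at backup time and compares on restore; file contents are held in a dictionary here to keep the example self-contained.

```python
import hashlib


def sha256_of(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def make_manifest(files: dict[str, bytes]) -> dict[str, str]:
    """Record a digest per file at backup time (file name -> SHA-256 hex)."""
    return {name: sha256_of(content) for name, content in files.items()}


def verify_against_manifest(files: dict[str, bytes], manifest: dict[str, str]) -> list[str]:
    """Return the names of files that are missing or whose digest no longer matches."""
    bad = []
    for name, expected in manifest.items():
        content = files.get(name)
        if content is None or sha256_of(content) != expected:
            bad.append(name)
    return bad
```

Note the limitation raised in this thread: if the data was already corrupt when `make_manifest` ran, its digest matches the corrupt bytes and verification passes.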

Suedkiez · September 24, 2023, 9:12pm

At work, across more than 1 million customer machines, we have seen occasional weird file write errors on Windows. Most of these machines are corporate, with all kinds of invasive and buggy security tools installed, so this may well not be a Windows issue.

Our software writes a simple settings.xml file every so often, and does so in the most straightforward way: it has a memory buffer with the file content and writes it out as a simple stream in one go. One would expect this to always work, but it does not. Sometimes, when we later tried to read settings.xml, the file turned out to be corrupt. We never found out why: access to customer machines is limited, and it happens infrequently enough that all we ever got to see was the corrupt file after the fact.

We have now resorted to writing the in-memory buffer to the file, immediately reading it back in, and comparing the result with the buffer. If this fails, we can at least retry and alert the user if it continues to fail.
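The write, read back, and compare approach could look roughly like this in Python. The retry count, delay, `write_verified`, and `WriteVerificationError` are assumptions for illustration. One caveat: the read-back may be served from the operating system’s file cache rather than the physical disk, so it may not catch every problem that occurs further down the storage stack.

```python
import time
from pathlib import Path


class WriteVerificationError(Exception):
    """Raised when the file never reads back identical to the buffer."""


def write_verified(path: Path, buffer: bytes, attempts: int = 3, delay_s: float = 0.5) -> None:
    for attempt in range(1, attempts + 1):
        path.write_bytes(buffer)          # write the whole buffer in one go
        if path.read_bytes() == buffer:   # read it straight back and compare
            return
        if attempt < attempts:
            time.sleep(delay_s)           # brief pause before retrying
    raise WriteVerificationError(
        f"{path} did not match the in-memory buffer after {attempts} attempts"
    )
```

The caller would catch `WriteVerificationError` and alert the user, matching the “retry, then alert” behavior described above.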

My library is small enough (4.5K albums) for the backup time to be insignificant: it takes maybe a minute to prepare and a minute to write out. I would very much prefer a “paranoid” mode in Roon, as suggested by @gtunes (if it does not do that already), in which it performs such a check automatically, even if that takes significantly longer. There is enough time while I sleep. For people with huge libraries this probably has to be optional, but I suppose they would also prefer having the option.

gTunes · September 24, 2023, 9:13pm

You selectively quoted, and replied to, a strange part of my post.

You wrote:

And that is the comment that my post was in response to. My point was simply that restoring a database may or may not provide a clear signal about the integrity of the backup. Whether it does or not depends on Roon’s implementation of backup and restore, and I don’t think any of us have enough information to have a definitive opinion.

gTunes · September 24, 2023, 9:19pm

This is a good practice. Of course, if the issue is in the generator or in something else that participates in streaming to the buffer, you’ll have bad data in memory and the check will still succeed. The same goes for machines with bad RAM. Software is hard.

Suedkiez · September 24, 2023, 9:22pm

Sure, but you have to rely on something. If you have RAM corruption, the software won’t work well, anyway. And we can check the integrity of the settings in RAM.

Edit: We have the advantage that the settings are simple, not a huge, complex DB. I mainly wanted to emphasize that even simple file writes that you would never expect to fail do fail occasionally, for reasons you may not be able to control (such as a f%&$i~g buggy “security” tool interfering).
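The objection that bad data already in memory passes a read-back check can be narrowed, though not eliminated, by validating the buffer before it is written. A minimal sketch, assuming a flat key/value settings dictionary whose keys are valid XML tag names; `to_xml`, `from_xml`, and `serialize_checked` are hypothetical names built on Python’s `xml.etree.ElementTree`.

```python
import xml.etree.ElementTree as ET


def to_xml(settings: dict[str, str]) -> bytes:
    root = ET.Element("settings")
    for key, value in settings.items():
        ET.SubElement(root, key).text = value
    return ET.tostring(root, encoding="utf-8")


def from_xml(data: bytes) -> dict[str, str]:
    root = ET.fromstring(data)
    # An empty element parses with text None; map it back to "".
    return {child.tag: child.text or "" for child in root}


def serialize_checked(settings: dict[str, str]) -> bytes:
    """Serialize, then parse the buffer back and compare before it is ever written."""
    buffer = to_xml(settings)
    if from_xml(buffer) != settings:
        raise ValueError("serialized settings do not round-trip; refusing to write")
    return buffer
```

This catches a serializer that produces the wrong bytes, but not a settings dictionary that was already wrong, nor RAM that corrupts the buffer after the check.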

BlackJack · September 24, 2023, 9:28pm

Why? How?

Roon tells you if it can’t use a backup because it’s corrupted. Again:

If the backup is flawed in an undetectable way, then Roon will simply use it. You will not get a warning. No “There was an issue loading your database …”. Nothing. You might see strange content in Roon (or not), or Roon may just stop working because of the undetectable database error.

This, to me, implies that we are talking about detectable corruption errors (checksums), or else you would never know that backups are corrupted at all.

gTunes · September 24, 2023, 9:34pm

My experience, and the comments in this thread, suggest that reality is different from what you are saying in a nuanced, but important, way. Based on the comments in this thread, I believe the following to be true:

1. Roon will, at times, create a backup from a corrupted database, producing a corrupted backup.
2. When Roon creates a corrupted backup, an earlier backup may be aged out and deleted based on the backup policy set by the user.
3. Roon will restore any backup, corrupted or not.
4. Roon may detect that a library is corrupt at any time, at which point it displays an error message.

You are making absolute claims about how and when corruption is detected and you are asserting the absolute effectiveness of Roon’s strategies. I don’t think those claims are merited.

I believe Roon could improve on #1. That would help mitigate or eliminate the consequences of #2 and #3.

BlackJack · September 24, 2023, 9:51pm

I don’t know how one can come to such a conclusion after I pointed out in several posts in this thread how flawed Roon’s backup process is: it’s based on assumptions (skipping verification steps) that may lead to precisely the observed cases of Roon creating corrupt backups.

gTunes · September 24, 2023, 9:57pm

I think our wires got crossed somewhere along the way. Maybe we’ve both said enough on this topic. We seem to agree that the process is flawed and maybe don’t actually disagree on anything.

BlackJack · September 24, 2023, 10:16pm

In my experience, Roon told me that there was an issue with my backup and it couldn’t restore it. I could also see on my disk that Roon created a separate directory for the DB from the backup, to preserve the original DB in case of an error during restoring. Roon then simply reverted to that original DB.
If a user’s reason for restoring a backup is to resolve a “There was an issue loading your database …” message, though, this fallback strategy is futile.
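The fallback described here, restoring into a separate directory and reverting to the original DB on failure, can be sketched as follows. The directory suffixes and the `is_valid` callback are hypothetical, and this sketch validates the restored copy before swapping it in, which may differ from what Roon actually does.

```python
import shutil
from pathlib import Path
from typing import Callable


def restore_with_fallback(backup_dir: Path, live_dir: Path,
                          is_valid: Callable[[Path], bool]) -> bool:
    """Restore backup_dir over live_dir, keeping the original DB if the restore fails is_valid."""
    staging = live_dir.with_name(live_dir.name + ".restoring")
    previous = live_dir.with_name(live_dir.name + ".previous")
    shutil.rmtree(staging, ignore_errors=True)
    shutil.copytree(backup_dir, staging)   # unpack the backup next to the live DB
    if not is_valid(staging):
        shutil.rmtree(staging)             # bad backup: the live DB was never touched
        return False
    shutil.rmtree(previous, ignore_errors=True)
    if live_dir.exists():
        live_dir.rename(previous)          # keep the original DB aside...
    staging.rename(live_dir)               # ...and swap the restored copy in
    return True
```

If the live DB was the broken one to begin with, falling back to it only brings back the original error.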

But you had a different experience?

PS: I have to admit, though, that this experience of mine dates back to before Roon Labs claimed to check the DB before backups.

xxx · September 25, 2023, 1:04am

The bottom line, no matter how it’s arrived at, is that Roon’s Backup and Restore is as reliable as Roon’s Search function.

gTunes · September 26, 2023, 3:22pm

Interesting.

The sequence of events was this:

1. Launch Roon and see the “can’t read library - restore from backup” error, the first time I’ve seen an error like this. Try a few things, including rebooting my ROCK NUC and using a few different client devices in case a client-server interaction is causing the issue.
2. Attempt a restore from the most recent backup. The restore appears to complete successfully.
3. Try to access Roon again and get the same “can’t read library” error.

What I thought I was seeing was Roon successfully restoring the backup. That’s still what I think happened because I didn’t get any indication in the UI or in the logs that restoration failed. If your description of what Roon does with restoration is correct, I should have been advised that the backup was bad.

After this, I tried the previous backup. Restoration completed, Roon came up just fine. Haven’t had an issue since and didn’t see evidence of corruption in the logs.