The solution to DB corruption

Thanks for the advice, will do that.
I had no issue with 880/882, that's why I was worried that the fix would scramble some of my tracks (as seems to have happened to others).

1 Like

Trying a restore of a B831 backup to B884 to see if anything has changed in terms of the application of the update on the database once restored.

I still have my new Roon database on my other NUC, which is now on 884 also.
In terms of stability this has been solid, just working on the structuring of my local library so it matches how Asset reads it from the metadata associated with the files, and re-identifying where Roon hasn’t been able to.

You totally missed the point, it was a rebuttal of your comment, nothing more.
Thanks and bye.

3 Likes

And no - restoring a B831 backup into a B884 build just bombs out with a system halt, so no change there.

Been using Roon for about 2, maybe 3 weeks.

Nothing is corrupt. Please let me know the nature of whatever this is about because this post is written as an assumption.

Thanks.

Hi,

There’s a huge amount of discussion on this topic, I’d recommend using the forum’s search function to investigate.

In the meantime here’s a copy of topics to start your quest…

1 Like

I maintain it would be nicer to detect the beginning of the failure and tell the user “change your SSD now and restore from a backup.” I don’t think that’s too tech. Even cars now tell you when to change the oil, my home air cleaner and furnace tell me when to change its filters, and so on. Much nicer to know it’s time than just find one day the thing doesn’t work.

Guys, I have been working in the storage industry for 15+ years. Solid State Drives (SSDs) don't behave like that, because SSDs do not contain erasable magnetic coatings. The claim that data is suddenly lost or corrupted in untouched regions is incorrect. That does not mean an SSD cannot fail suddenly, but creeping data corruption in "untouched regions" is not a common phenomenon, due to the technology used.

SSDs do use cells to store data, and these have a limited number of write cycles. While this is true, wear leveling is a technique that most SSD controllers use to increase the lifetime of the memory cells. The principle is simple: distribute writes evenly across all blocks of an SSD so they wear evenly. All cells receive roughly the same number of writes, which avoids writing too often to the same blocks. The lifetime of the cells differs for each NAND flash memory technology. Also, every NAND flash device uses Error Correcting Code (ECC) on the controller. The bad-block management of SSDs ensures that data is moved from faulty areas (cells) to functioning cells. The defective cell is then excluded from future data storage and a spare one takes its place.

In other words: a lot of safeguards (wear leveling, ECC, bad-block management, etc.) take care of your data on SSDs. While it is still possible for SSDs to fail or lose parts of their data, it is very unlikely.
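As a rough illustration of the wear-leveling principle described above, here is a minimal Python sketch. It is a toy model, not how any real SSD controller is implemented: the function names, the eight-block "drive", and the write counts are all made up for the example.

```python
def write_with_wear_leveling(erase_counts, num_writes):
    """Toy model: send every write to the block with the fewest erases so far."""
    counts = list(erase_counts)
    for _ in range(num_writes):
        target = counts.index(min(counts))  # least-worn block
        counts[target] += 1
    return counts


def write_without_wear_leveling(erase_counts, num_writes, hot_block=0):
    """Toy model: every write lands on the same 'hot' block."""
    counts = list(erase_counts)
    counts[hot_block] += num_writes
    return counts


# Hypothetical drive with 8 blocks and 800 writes.
leveled = write_with_wear_leveling([0] * 8, 800)
naive = write_without_wear_leveling([0] * 8, 800)

print("with wear leveling:   ", leveled)
print("without wear leveling:", naive)
```

With leveling, the wear is spread across all blocks; without it, a single block absorbs every write and would reach its cycle limit far sooner.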

7 Likes

Unfortunately not the case: SSD failure rates not far behind HD

Well, again… Not according to my personal experience. I had multiple customers with hundreds of petabytes of storage and many thousands of drives. The failure rate of "traditional" magnetic spinning drives was several times higher. And believe me, those customers hammered their systems with millions of IOs on a daily basis…

And one more time: an SSD still might fail, but creeping data corruption of "untouched regions" is not a common phenomenon of SSD technology. It was more of a "possible" (also not common) problem on magnetic spinning disks, which is why our systems used a special formatting that included ECC (as it was not a common technology on disk controllers then). SSDs, however, have largely solved that issue in the hardware already.

3 Likes

I think you need to put things into perspective. The short version (from the story linked above):

"In the first table, Backblaze shows the lifetime SSD and HDD failure rates starting from 2013. You can see that HDDs have a significantly higher failure rate than SSDs, making us think that SSDs are indeed much more durable than HDDs, like we’ve been told all along.

However, there are a few problems with this, the main one being drive age. Backblaze only began installing SSDs in 2018. But the company has data pertaining to hard drive health going all the way back to 2013, which is skewing the results quite a bit.

After taking into account drive age and equalizing it between SSDs and HDDs, we can see that the results have changed significantly. SSDs aren’t that far behind hard drives in failure rate, with a 1.05% annualized failure rate compared to 1.38%."
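For anyone wondering how a figure like 1.05% comes about: as far as I understand Backblaze's published method, the annualized failure rate is the number of failures divided by the total drive-years in service. Here is a minimal sketch of that arithmetic. The article quoted above does not give the drive-day count, so the 1,460,000 drive days below are a hypothetical number chosen only to make the example work out.

```python
def annualized_failure_rate(failures, drive_days):
    """AFR in percent: failures per drive-year of service, times 100."""
    drive_years = drive_days / 365
    return failures / drive_years * 100


# Hypothetical input: 42 failures over 1,460,000 drive days (4,000 drive-years).
afr = annualized_failure_rate(42, 1_460_000)
print(f"AFR: {afr:.2f}%")
```

The point of dividing by drive-years rather than by drive count is that a fleet of young drives has had less time to fail, which is the drive-age effect the article describes.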

3 Likes

Sorry to ask: Backblaze who?
I prefer not to base my opinion about the durability of SSDs on non-scientific research from a not so successful company (total revenue 53M, net growth -564%) that I think nobody has ever heard of before… We don't know anything about the technology and architecture, nor the brands, the drive technology, the production batches etc. used in this report. And a sample of fewer than 3,000 drives with a total of 42 failed ones cannot seriously lead to such a bold statement as that SSDs have failure rates similar to magnetic spinning disk drives. Again, I was dealing with data center technologies from 1992 to 2018 and started with SSDs in large volumes in early 2008 in the enterprise market. My clients have mainly been Fortune 500 companies in financial services and pharma/life science, as well as the government sector. And in my experience, I can't support Backblaze's statement. But I think we're missing the point of what this is really about…

As I said multiple times before: An SSD still might fail, but sneaking data corruption of “untouched regions” is not a common phenomenon of SSD technology. Also, please keep in mind that these “untouched regions” don’t really exist, because whenever you run a backup, these data areas are read out. Got it?

OK, let’s explain it again in another way. The main point is that Roon made a strategic choice of database technology in favor of performance over data integrity. For this reason we now have the situation that some users see their valuable work of countless hours over the past years as lost. But instead of helping affected users, Roon points the finger at customers’ faulty SSDs as the main reason.

I hereby question this statement by Roon. Please stop pointing at faulty hardware on the customer side. The problem is your software, which does not protect enough against data corruption. You recommended your own backup tool to your customers, which now turns out to be useless because you failed to check your backups for data consistency in the past. And yes, failed, because this has been best practice for many years. And Roon should now go one step further and introduce regular consistency checks outside the backup process as well. How about a background process that creates consistent snapshots to which you can automatically roll back in case of data corruption? I think it is also not a problem to take the database offline (no need for a more complex online backup) for this task.
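To make the idea of checking backups for consistency concrete, here is a minimal sketch in Python. To be clear, this is not how Roon's backup works and it knows nothing about Roon's database format; it is a generic checksum manifest, and the function names are my own. The assumption is simply that a backup is a directory of files whose SHA-256 digests are recorded at backup time and re-checked later.

```python
import hashlib
from pathlib import Path


def file_digest(path, chunk_size=1 << 20):
    """SHA-256 of a file, read in chunks so large files do not fill memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def build_manifest(backup_dir):
    """Record a digest for every file in the backup (done at backup time)."""
    root = Path(backup_dir)
    return {
        p.relative_to(root).as_posix(): file_digest(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def verify_backup(backup_dir, manifest):
    """Return (file, problem) pairs for files that are missing or changed."""
    root = Path(backup_dir)
    problems = []
    for rel, expected in manifest.items():
        path = root / rel
        if not path.is_file():
            problems.append((rel, "missing"))
        elif file_digest(path) != expected:
            problems.append((rel, "checksum mismatch"))
    return problems
```

A check like this only proves that the backup files have not changed since they were written. It would not catch a database that was already corrupt when the backup was taken, which is why a consistency check of the live database is the other half of the suggestion.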

In the meantime, it would be nice if your support staff would offer proper help to the affected users.

2 Likes

Probably - https://www.backblaze.com/

1 Like

This was not a question. It was irony.

The often-cited Backblaze reliability report shows up in my feed hundreds of times a year.

Strange that you have never heard of it, given how widely read it is within the technology community.

It's OK, we will all just listen to your feedback.

Carry on sir

4 Likes

when you add a ? it’s a question :wink:

4 Likes

Alex, thank you for sharing parts of your really impressive CV. Unfortunately that’s no scientific proof for your claims, either.

So why not accept that SSDs are prone to errors and that these errors may lead to database corruption, as do glitches on rotating media? I mean, just look at the specs.

2 Likes

No, they don't. And it has nothing to do with specs. But it's OK. Never mind.

1 Like

These arguments are getting way too technical. From my point of view, the essentials:

  1. SSDs fail. Many of us have had that happen.
  2. SSD manufacturers offer user tools to test and analyze SSDs.
  3. The Roon OS doesn’t incorporate such tools.
  4. It would be helpful to the user if it did and told the user when error rates were increasing.

This really is a different topic from the title – it’s not about DB corruption but rather avoidance of unexpected failures.
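As a sketch of what points 2 to 4 could look like in practice: the smartmontools `smartctl` tool can report drive health as JSON (version 7.0 or later, usually run as root). This is not something Roon OS does today, and the watched attributes, the 80% wear threshold, and the function names are my own assumptions. The JSON field names are what I believe smartctl emits and should be checked against real output from your drive.

```python
import json
import subprocess

# Assumption: ATA attributes commonly watched as early failure indicators.
WATCHED_ATA_IDS = {5, 187, 197}


def read_smart(device):
    """Run smartctl on a device (e.g. /dev/sda) and return the parsed JSON."""
    result = subprocess.run(
        ["smartctl", "--json", "-H", "-A", device],
        capture_output=True, text=True, check=False,
    )
    return json.loads(result.stdout)


def smart_warnings(report, wear_limit=80):
    """Turn a parsed smartctl report into human-readable warnings."""
    warnings = []
    if report.get("smart_status", {}).get("passed") is False:
        warnings.append("overall SMART health check failed")
    for attr in report.get("ata_smart_attributes", {}).get("table", []):
        raw = attr.get("raw", {}).get("value", 0)
        if attr.get("id") in WATCHED_ATA_IDS and raw > 0:
            warnings.append(f"{attr.get('name')} raw value is {raw}")
    nvme = report.get("nvme_smart_health_information_log", {})
    used = nvme.get("percentage_used")
    if used is not None and used >= wear_limit:
        warnings.append(f"NVMe rated endurance used: {used}%")
    return warnings
```

Usage would be something like `smart_warnings(read_smart("/dev/sda"))` on a schedule, with any non-empty result shown to the user as a "replace your SSD and restore from backup" notice.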

2 Likes

Dear Alex, thank you for your clear explanation.
From our own experience (we started using SSDs about 5 years ago) I can confirm your information, and our own conclusion is that SSDs are significantly more reliable than spinning disks.

Best :wink:
Sebastiaan

2 Likes