The solution to DB corruption

Sorry to ask: Backblaze who?
I prefer not to base my opinion about the durability of SSDs on non-scientific research by a not particularly successful company (Total Rev. 53M, Net Growth -564%) that I think nobody has ever heard of before… We don’t know anything about the technology and architecture, the brands, the drive technology, the production batches etc. used in this report. And a sample of fewer than 3’000 drives with a total of 42 failed ones cannot seriously support such a bold statement as that SSDs have failure rates similar to those of magnetic spinning disk drives. Again, I dealt with data center technologies from 1992 to 2018 and started with SSDs in large volumes in early 2008 in the enterprise market. My clients were mainly Fortune 500 companies in financial services, pharma/life sciences and, besides that, the government sector. And in my experience, I can’t support Backblaze’s statement. But I think we’re missing the point of what this is really about…

As I said multiple times before: an SSD might still fail, but creeping data corruption of “untouched regions” is not a common phenomenon of SSD technology. Also, please keep in mind that these “untouched regions” don’t really exist, because whenever you run a backup, these data areas are read out. Got it?

OK, let’s explain it again in another way. The main point is that Roon made a strategic choice of database technology in favor of performance over data integrity. For this reason we now have the situation that some users see their valuable work of countless hours over the past years as lost. But instead of helping affected users, Roon points the finger at customers’ faulty SSDs as the main reason.

I hereby question this statement by Roon. Please stop pointing at faulty hardware on the customer side. The problem is your software, which does not protect enough against data corruption. You recommended your own backup tool to your customers, which now turns out to be useless because you failed to check your backups for data consistency in the past. And yes, failed, because this has been best practice for many years. And Roon should now go one step further and introduce regular consistency checks outside the backup process as well. How about a background process that creates consistent snapshots to which you can automatically roll back in case of data corruption? I think it is also not a problem to take the database offline (no need for a more complex online backup) for this task.
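
To make the snapshot idea a bit more concrete, here is a minimal sketch in Python of an offline snapshot with a checksum manifest and a rollback helper. Everything in it is an assumption for illustration: the names `take_snapshot`, `verify_snapshot`, `latest_good_snapshot` and the `manifest.json` file are hypothetical and have nothing to do with how Roon actually stores or backs up its database. Note also that file hashes only show that a snapshot has not changed since it was taken; a real consistency check would additionally have to ask the database engine whether the data itself is readable.

```python
import hashlib
import json
import shutil
from pathlib import Path
from typing import Optional

MANIFEST = "manifest.json"  # hypothetical manifest file name


def file_digest(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def take_snapshot(db_dir: Path, snap_dir: Path) -> bool:
    """Copy an offline database directory and record one checksum per file.

    Digests are taken from the source, so the final verification also
    confirms that the copy matches what was read from the source.
    """
    digests = {
        p.relative_to(db_dir).as_posix(): file_digest(p)
        for p in sorted(db_dir.rglob("*"))
        if p.is_file()
    }
    shutil.copytree(db_dir, snap_dir)
    (snap_dir / MANIFEST).write_text(json.dumps(digests, indent=2))
    return verify_snapshot(snap_dir)


def verify_snapshot(snap_dir: Path) -> bool:
    """Recompute every checksum and compare it with the manifest."""
    manifest_path = snap_dir / MANIFEST
    if not manifest_path.is_file():
        return False
    digests = json.loads(manifest_path.read_text())
    for rel, expected in digests.items():
        p = snap_dir / rel
        if not p.is_file() or file_digest(p) != expected:
            return False
    return True


def latest_good_snapshot(snap_root: Path) -> Optional[Path]:
    """Return the newest snapshot that still verifies, or None.

    Assumes snapshot directory names sort chronologically (e.g. dates).
    """
    snaps = sorted((d for d in snap_root.iterdir() if d.is_dir()), reverse=True)
    for snap in snaps:
        if verify_snapshot(snap):
            return snap
    return None
```

A background job could call `take_snapshot` while the database is offline and, when corruption is detected, roll back to whatever `latest_good_snapshot` returns.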

In the meantime, it would be nice if your support staff would offer proper help to the affected users.

2 Likes

Probably - https://www.backblaze.com/

1 Like

This was not a question. It was irony.

The often-cited Backblaze reliability report shows up hundreds of times a year in my feed.

Strange that you have never heard of it, given how widely read it is within the technology community.

It’s ok we will all just listen to your feedback.

Carry on sir

4 Likes

when you add a ? it’s a question :wink:

4 Likes

Alex, thank you for sharing parts of your really impressive CV. Unfortunately that’s no scientific proof for your claims, either.

So why not accept that SSDs are prone to errors and that these errors may lead to database corruption, as do glitches on rotating media? I mean, just look at the specs.

2 Likes

No, they don’t. And it has nothing to do with specs. But it’s ok. Never mind.

1 Like

These arguments are getting way too technical. From my point of view, the essentials:

  1. SSDs fail. Many of us have had that happen.
  2. SSD manufacturers offer user tools to test and analyze SSDs.
  3. The Roon OS doesn’t incorporate such tools.
  4. It would be helpful to the user if it did and told the user when error rates were increasing.

This really is a different topic from the title – it’s not about DB corruption but rather avoidance of unexpected failures.
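
As a rough illustration of points 2 to 4, a health check could be sketched like this in Python. It assumes that smartmontools 7.0 or later is installed (the `smartctl` tool with its `--json` output) and that the process is allowed to query the drive; the helper names `read_smart_json` and `health_summary` are made up for this example, and nothing here is part of Roon OS. Per-vendor error-rate attributes differ a lot, so the sketch only looks at the overall health verdict.

```python
import json
import subprocess


def read_smart_json(device: str) -> dict:
    """Run smartctl and return its JSON report for one device.

    smartctl uses non-zero exit codes as status bit masks, so the exit
    code is deliberately not treated as an error here.
    """
    result = subprocess.run(
        ["smartctl", "--json", "-H", "-A", device],
        capture_output=True,
        text=True,
        check=False,
    )
    return json.loads(result.stdout)


def health_summary(report: dict) -> str:
    """Map the overall SMART verdict to 'ok', 'failing' or 'unknown'."""
    passed = report.get("smart_status", {}).get("passed")
    if passed is None:
        return "unknown"
    return "ok" if passed else "failing"
```

Something like `health_summary(read_smart_json("/dev/sda"))` could then run on a schedule and warn the user as soon as the result is anything other than `ok`.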

2 Likes

Dear Alex, thank you for your clear explanation.
From our own experience (we started using SSDs about 5 years ago) I can confirm your information, and our own conclusion is that SSDs are significantly more reliable than spinning disks.

Best :wink:
Sebastiaan

2 Likes

Alex, the wiser one gives in …
Merry Christmas and a happy New Year :champagne:

1 Like

Well said.

1 Like

A good summary.
You bring a good point to the table: just check your SSD with tools that are widely available. Most probably you will find that your SSD is in good condition.

There is another point to add: OS & Applications fail.
And this to a much greater extent than your HW. Why? Software is prone to errors. Much more than HW. A lot more. That’s the main reason why systems/applications crash. But good software architectures implement various safeguards to prevent data loss or corruption in such an event. And in case there is an issue, there are tools to analyze and correct possible inconsistencies without losing all data (most likely with no data loss at all). And if there is data loss, the tooling tells you what data needs to be recovered. And if everything else fails: there is backup (which should also include consistency checks).

Just think about it: the database world moved to SSD some time ago. The most critical databases today are on SSD. Your bank account, the stock exchange, your health insurance, your ID, basically all critical transactions: all most likely on SSD. If there were such a critical issue with SSD and data corruption that the world could not deal with (and had to rely on consistent backups only, like Roon does right now), this move to SSD would never have happened.

1 Like

Yes, you are right. Maybe I’m a little too stubborn here. But when you have dealt with a topic day in, day out for so many years, you perhaps cannot simply accept claims that have nothing to do with reality. OK, I’m also a stubborn one. :innocent:

My main concern is to hold Roon to its obligation on this point and to get real help offered to the affected users, because they are now left with a shambles over the Christmas holidays. And Roon is trying to sneak out through the back door, pointing the finger at customers’ bad hardware. That is simply unfair.

Happy holidays to you too!
Kind regards, Alex

2 Likes

What we have here is a consumer application.

The most important data (well-tagged music tracks) is usually on an HDD for cost reasons.

Qobuz and Tidal will certainly think about the kind of service providers you are advocating here.

A small, fast SSD was recommended only for the core.

With all due respect to your special know-how, please do not teach every car driver here how to become a Formula 1 world champion.

Roon tries to give enthusiasts who don’t have a driver’s license a simple vehicle that gets the music from A to B.

It is only the IT engineers here who cause a full crash through their own additional “data maintenance”.

The expert discourse on data corruption goes back to 2015. The very experts who nevertheless committed themselves to this solution, which they consider far too simple, should now be much quieter.

Credibility always depends on which side you are on: am I a helper, a destroyer or, after a long passion, an implacable critic who only wants to smash china himself?

Everyone needs to decide where they stand; above all I want to be supportive, or else I would no longer be here!

Peace on earth and goodwill to all people.

Which is exactly why it should have been on RoonLabs’ plate to make sure there was an airbag in their race car, from the start.

In my mind, this ideally should extend to the music library as well, without relying on the crash-tastic “export” function through another computer (think “plug an external hard drive into the core, or point the core to a cloud account, and don’t worry about it too much anymore”), and I have a little bit of hope that it’ll happen with ROCK 2.0.

2 Likes

OK. I’m going to try to get back on track with the topic.

  1. Is there a solution against DB corruption?
  2. If the solution is the new update, all good then. Right?
  3. What check is done in the new update and how can I be confident corruption won’t happen again, even though it impacted only 75 users, so far?
  4. What about impacted users? Are we leaving them behind?
  5. Communication: why is it so difficult to have proper communication on this issue?
1 Like

  1. My understanding is there is no “solution”… all you can do is update and hope

  2. And if it fails, start over, with Roon’s promise that this will never happen again. We will see.

  3. Who knows, secret Roon stuff, but what we do know is that the check is done before a backup, which should mean backups stay dependable.

  4. With an apology apparently, but all they need to do is start over with a fresh DB

  5. Search the forums, if you’ve the time or inclination - Danny has given a few informative posts, but they’re unfortunately littered around a bit.

2 Likes

While this is way too late for some, it is better than never, as new users should not get into this situation again.

1 Like

Without getting one’s hopes up here, the following consideration still remains:

Do I still get the old version temporarily at the start, so that I can do some things myself?

  1. can I export my playlists or mark them with Love for Qobuz/Tidal?
  2. do I get my tags in the export, or can I note them manually for the transition?
  3. does the new art director bring a lot of new images quickly (more than ever before)?

The goal must always be to look forward and trust Roon, or not to fret twice, or ever. It’s just a software decision.

Anyone who knows something better should take it. I am staying, despite all the problems I read about here, and after a frustrating 1st year I booked the 2nd year in August, although not every wish can be fulfilled.

As further rules I have imposed on myself:

  1. always keep a complete 1:1 copy of the SSD/HDD as a system backup.

  2. tag only with MP3TAG, Foobar2000 or SongKong, following industry standards such as this one:
    Foobar2000:ID3 Tag Mapping - Hydrogenaudio Knowledgebase

  3. always upload or embed pictures; never keep them private and inaccessible inside your own Roon setup.

After all, the new solution must contain multiple lines of backup. Even in the future there will be no 100% security or guarantee of working hardware and software.

According to human judgement it is good for almost everyone now, but what do we humans know about the problems of the few affected here? Nevertheless, we do not have to call each other amateurs or idiots. Good communication is needed, especially from Roon, but we too can maintain our community only through peaceful and civilized interaction with each other.

1 Like

  1. Generically, yes, but it can be multifaceted and is not generally something a home user wants to go through. Specific to your question as it applies to Roon, the short answer is “no”. Roon isn’t doing anything different between now and last month as it relates to a “solution against DB corruption”.
  2. The new update will verify the integrity of the database before a backup. That’s it (from my understanding). If the database is determined to be in an unhealthy state at this point, it will refuse to back up and ask you to restore a previous backup.
  3. The database structure used is a bit complicated for a general audience, but answering this question requires a basic understanding of it. It is a Log-Structured Merge Tree (LSM-tree) in-memory database. There are two important parts here, with the first being “in-memory”. When something is “written” to the DB, it actually updates the in-memory part of the DB. Later it flushes these writes to disk for persistence. Think of the database structure as a tree with ever-increasing levels of trunks and branches. As your data expands, the number of branches expands. When the DB is read, one or more of these branches is traversed. During the write or, more commonly, the flushing of data from memory to disk, a branch can break (I’m oversimplifying here). If, while reading, the DB traverses a broken branch, that is DB corruption and the DB is unusable. If you don’t traverse that branch, you don’t hit the corruption. Before 880, Roon rarely traversed all branches of the DB, which is why you could have latent corruption and not know it. In fact, a reboot or upgrade might be the only time the entire tree structure was read, because this is part of the DB start-up process to rebuild the in-memory part. A backup would flush the DB to disk and back up this persistent data but not run all the branches. Now, in 880+, the entire tree is traversed as part of the backup process, which is why Roon knows there are no broken branches, no DB corruption, and your DB is good at the time of backup.
  4. See 2. You cannot be confident corruption won’t happen again without the many multifaceted things I hinted at in 1. However, if your DB becomes corrupt, you’ll know at the next backup and you’ll be able to restore a known good copy of the DB. This will limit data loss.
  5. The database used has a philosophy of halting upon identifying inconsistencies in the data (corruption). There are no official tools to correct these inconsistencies. There are some unofficial tools. I cannot speak for Roon (I don’t work for Roon) on any plans they may or may not have for providing a tool to try to correct data inconsistencies in the DB. However, as there is no official published tool, I assume they will not be able to help here. Additionally, unlike the DB itself with its amazing cross-platform support, none of the tools I speak of have the same cross-platform support, so they cannot be universally distributed.
  6. Because it’s complicated. Overly complicated for those who don’t have a background in this stuff. Roon has been very transparent here, and it has generated thousands of posts of misinformation, inaccurate assumptions, anger, inaccurate troubleshooting, false impact numbers, etc. This is the exact reason companies stop communicating and take their support processes out of the public eye. I’ve seen it time and time again with small companies who try to do the right thing but are ultimately forced to pull their public support and transparency. It’s sad, but how many times would you put up with pitchforks and torches at your front door from the misguided mob before pulling back on the transparency that incited them?
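
For those who prefer code to trees, the behaviour described in 2 and 3 can be imitated with a toy model in Python. This is a deliberately oversimplified sketch under my own assumptions: `ToyLsmDb`, `Segment` and `CorruptionError` are invented names, the CRC check stands in for whatever the real engine does, and none of it reflects Roon’s or the actual database’s internals. It only shows why a damaged part can go unnoticed until something reads it, and why a full traversal before backup catches it.

```python
import json
import zlib


class CorruptionError(Exception):
    """Raised when a flushed segment no longer matches its checksum."""


class Segment:
    """One immutable 'branch' flushed to disk, stored with a CRC32."""

    def __init__(self, items: dict):
        self.keys = frozenset(items)
        self.data = json.dumps(items, sort_keys=True).encode()
        self.crc = zlib.crc32(self.data)

    def is_intact(self) -> bool:
        return zlib.crc32(self.data) == self.crc

    def read(self, key: str):
        if not self.is_intact():
            raise CorruptionError("segment failed its checksum")
        return json.loads(self.data)[key]


class ToyLsmDb:
    def __init__(self):
        self.memtable = {}   # in-memory part, receives all writes
        self.segments = []   # flushed segments, newest last

    def put(self, key: str, value) -> None:
        self.memtable[key] = value

    def flush(self) -> None:
        """Persist the in-memory writes as a new segment."""
        if self.memtable:
            self.segments.append(Segment(self.memtable))
            self.memtable = {}

    def get(self, key: str):
        """Read only the newest segment holding the key; others stay untouched."""
        if key in self.memtable:
            return self.memtable[key]
        for seg in reversed(self.segments):
            if key in seg.keys:
                return seg.read(key)
        raise KeyError(key)

    def verify_all(self) -> bool:
        """Traverse every segment, as the pre-backup check is described to do."""
        return all(seg.is_intact() for seg in self.segments)

    def backup(self) -> list:
        """Flush, verify everything, and refuse to back up a corrupt DB."""
        self.flush()
        if not self.verify_all():
            raise CorruptionError("refusing to back up; restore a previous backup")
        return [seg.data for seg in self.segments]
```

If one segment is damaged, `get` keeps working for every key stored elsewhere, which is the latent corruption case; only `verify_all` (and therefore `backup`) touches every segment and notices.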

My personal opinion (summary):
Roon should never have been in a state where it backed up a corrupted DB. Roon agrees and has said so. That’s fixed now. I’m not a fan / supporter of the DB that Roon uses in the kind of application that Roon is. My very strong personal feeling is that this DB should only be used in situations where rebuilding the entire DB, without data loss, from another source is possible. This isn’t supported other than by restoring a backup. However, I think the number of those who encountered actual corruption is lower than what you’d statistically expect, so maybe, just maybe, my strong personal feeling that Roon should be using a different DB is pure paranoia when looking at the actual number of users who have encountered a DB corruption issue. The vast majority of the issues seen with 880 were software bugs. It’s unfortunate that these software bugs manifested themselves in the same release as the way to detect DB corruption. It’s bad timing and communication that made it all too easy to inaccurately associate the bugs with the new backup check. It’s made worse by the fact that official Roon support communication lags significantly behind the community’s ability to generate fodder for the mob.

Go do your backups, check they are good, and go enjoy the music. DB corruption is not inevitable and, based on this little exercise over the past two weeks, the chances that your issue is a software bug are significantly higher than that it is DB corruption (I still wish Roon would move away from the current DB though). In the two weeks we’ve spent arguing over this… you probably missed the chance to listen to some unknown new release that would have changed your life.

On another personal note… I have a valve that has gone microphonic which is more frustrating than anything Roon has ever done / not done.

22 Likes