The solution to DB corruption

How and why Roon chose LevelDB is an interesting question. Cost isn’t an issue: there are plenty of reliable open-source database implementations, covering most use cases and types, e.g. RDBMS, NoSQL, triple stores, and so on. With LevelDB Roon have gone for a performant key/value store, little more than a phone directory at heart (the simplest analogy). So it’s fast and simple. The downsides are no data model or query language, access from a single process at a time (though multi-threaded within it), no client-server deployment, and a rep for being corruptible. There are more resilient solutions, e.g. most RDBMSs, but they have their own downsides, possibly performance among them.

On another note: while it’s not a solution to corruption, a good backup strategy helps. Rather than a single schedule, multiple schedules and locations with strategic retention can work well. Something like:

  • Every day and keep 3
  • Every 2 days and keep 3
  • Every week and keep 3
  • Every month and keep 5
  • Every 6 months and keep 2
  • Every year and keep 5

This keeps plenty of recent backups in different places and you retain a long, sparse tail should the worst happen.
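
For anyone who wants to see what that schedule holds on a given day, here is a rough Python sketch. The `Schedule` class, the `retained_dates` helper and the day counts used for a month, 6 months and a year (30, 182, 365) are my own illustrative assumptions; this says nothing about how Roon's backup scheduler is actually implemented.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class Schedule:
    name: str
    every_days: int  # interval between backups, in days (approximate for months/years)
    keep: int        # how many of the newest backups this schedule retains


# The six schedules from the list above (intervals are approximations).
SCHEDULES = [
    Schedule("daily", 1, 3),
    Schedule("2-day", 2, 3),
    Schedule("weekly", 7, 3),
    Schedule("monthly", 30, 5),
    Schedule("6-month", 182, 2),
    Schedule("yearly", 365, 5),
]


def retained_dates(schedule, today, start):
    """Dates of the backups a schedule still holds on `today`, newest first.

    Assumes the schedule first ran on `start` and has never missed a run.
    """
    elapsed = (today - start).days
    if elapsed < 0:
        return []
    runs = elapsed // schedule.every_days
    newest = start + timedelta(days=runs * schedule.every_days)
    dates = []
    for i in range(schedule.keep):
        d = newest - timedelta(days=i * schedule.every_days)
        if d < start:
            break  # the schedule has not been running long enough yet
        dates.append(d)
    return dates


if __name__ == "__main__":
    for s in SCHEDULES:
        print(s.name, retained_dates(s, date(2021, 12, 31), date(2020, 1, 1)))
```

At most 21 backups exist at any time (the sum of the keep counts), and the oldest one can be several years old once the yearly schedule has filled up.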

That’s interesting. Doesn’t LevelDB offer a checksum or anti-corruption mechanism? I mean, I’m not from the tech world, but when I hear about DBs, it’s usually MySQL, Azure SQL or Cosmos and their equivalents from Oracle or AWS.

From Wikipedia:

LevelDB has a history of database corruption bugs.[15][16][17][18][19][20] A study from 2014 has found that, on older (non-checksummed) file systems, the database could become corrupted after a crash or power failure.[21]

:man_facepalming::man_facepalming::man_facepalming:

LevelDB, despite the name, is not actually a database, as the term is usually understood. It’s a fast key-value store, with persistence features that are kind of a side-effect of its design, which is to handle tables that don’t fit in available memory. To make the persistence features robust, more engineering would be needed. “Conventional” engineering would probably use a completely different database design for backups, not just a copy of the current LevelDB state.

I see thanks for the explanation. But why use this DB, then? That doesn’t make sense.

Performance, I suspect, judging from this post:

I have to say, I’ve moved my Core, updated many times over the past three years, and have only had one corruption incident, when I was running the Core on a MacBook. On my Linux/ext4 systems, the database persistence seems pretty stable, conventional engineering be damned. I’ve just updated to 880 without an issue.

I am troubled by this statement in a (closed) Roon Labs post:

We know that many of us have a carefully curated database: our settings, album covers, metadata, tags, playlists and favorites are exactly as we want them to be. Having to start fresh sounds like a nightmare and we do hope it won’t come to it. But, if it does, please know this is so it will never happen again.

That makes no sense. Every database started with a fresh, new database. So how can anyone say that starting fresh will ensure it never gets corrupted again?

What is needed in Roon is a mechanism to periodically check and repair the database if it becomes corrupted. That is hardly the fault or responsibility of the user! Good software always should have that. It is really malfeasance to allow the user to create a large database and have no way of fixing it when the software ()#*%#$&s it up.

It is especially troubling that Roon allows backups of corrupted databases. That defeats the purpose of backups entirely! A consistency check needs to be part of a backup.
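
To make "a consistency check needs to be part of a backup" concrete, here is a minimal Python sketch of the idea. Everything in it is hypothetical: the in-memory `dict` standing in for the database, the `seal`/`unseal` record format with a trailing CRC32, and the `backup` function are my own illustration, not Roon's or LevelDB's actual mechanism.

```python
import zlib


class CorruptDatabaseError(Exception):
    """Raised when a stored record fails its checksum."""


def seal(value: bytes) -> bytes:
    """Store a value with a 4-byte CRC32 appended (hypothetical record format)."""
    return value + zlib.crc32(value).to_bytes(4, "big")


def unseal(record: bytes) -> bytes:
    """Return the value if its checksum matches, otherwise raise."""
    if len(record) < 4:
        raise CorruptDatabaseError("record too short to hold a checksum")
    value, stored = record[:-4], int.from_bytes(record[-4:], "big")
    if zlib.crc32(value) != stored:
        raise CorruptDatabaseError("checksum mismatch")
    return value


def backup(store: dict) -> dict:
    """Copy the store only if every record verifies; never copy a corrupt one."""
    for key, record in store.items():
        try:
            unseal(record)
        except CorruptDatabaseError as exc:
            raise CorruptDatabaseError(
                f"refusing to back up, key {key!r}: {exc}"
            ) from exc
    return dict(store)


if __name__ == "__main__":
    store = {b"album:1": seal(b"Kind of Blue")}
    print(backup(store) == store)
```

The point of the design is simply that the check runs before the copy, so a damaged database produces an error instead of quietly replacing good backups with bad ones.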

It’s very troubling that the luxury solution to music playing is so short on good engineering basics.

Looks like that is one of the fixes in 880.

If so, it’s a start. Repair tools?

Apparently the server just stops if it detects corruption. You need to restore from a backup.

Well, if the db really is checked before backing up, that should at least limit the damage to a few days’ work. Not ideal, but surely better than what some users are reporting.

There is this: Smithfarm - the Brain: How to repair a leveldb database

Wonder what it actually does?

Good question. I suppose it is like CHKDSK - making the db correct in structure, though possibly with data loss. What else could it do?

Reliability in modern software is far better than I remember from 20, or even 5, years ago. I can’t remember the last time I experienced a problem. I hope that Roon Labs will find a way to bring their database up to modern standards.

I looked at the code (https://files.pythonhosted.org/packages/48/6e/9da3c29c0cbeb5871241ef154f11196867712ee3adc4077a94fa5f7d9dbd/leveldb-0.201.tar.gz, the file leveldb-0.201/leveldb/db/repair.cc, if you’re curious). It rebuilds the DB from the log files.
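
As a toy illustration of what "rebuilding from the log" means in general, here is a small Python sketch. To be clear, this is not LevelDB's on-disk format or its repair algorithm: the record layout, `encode_put` and `replay` are invented for the example. It only shows the principle that replaying a checksummed write log gives you back a consistent table, minus whatever records were damaged.

```python
import struct
import zlib

HEADER = struct.Struct(">II")  # payload length, CRC32 of payload (made-up layout)


def encode_put(key: bytes, value: bytes) -> bytes:
    """Encode one 'put' as header + payload, where payload = key length + key + value."""
    payload = struct.pack(">I", len(key)) + key + value
    return HEADER.pack(len(payload), zlib.crc32(payload)) + payload


def replay(log: bytes):
    """Rebuild the key/value table from the log.

    Returns (table, dropped), where dropped counts records that failed
    verification and were skipped. Trailing bytes too short to hold a
    header are ignored.
    """
    table, dropped, pos = {}, 0, 0
    while pos + HEADER.size <= len(log):
        length, crc = HEADER.unpack_from(log, pos)
        payload = log[pos + HEADER.size : pos + HEADER.size + length]
        pos += HEADER.size + length
        # Drop truncated, malformed or checksum-failing records.
        if length < 4 or len(payload) < length or zlib.crc32(payload) != crc:
            dropped += 1
            continue
        (key_len,) = struct.unpack_from(">I", payload)
        if key_len > length - 4:
            dropped += 1
            continue
        table[payload[4 : 4 + key_len]] = payload[4 + key_len :]
    return table, dropped


if __name__ == "__main__":
    log = encode_put(b"a", b"1") + encode_put(b"b", b"2") + encode_put(b"a", b"3")
    print(replay(log))
```

Later writes to the same key win, as in any log replay, and a damaged record is lost rather than repaired, which fits the CHKDSK comparison above: the structure comes back consistent, possibly with data loss.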

Interesting, but see the suggestion? Use a better backup system. Is he being sarcastic, or does he just not care? I’m puzzled by this attitude; it defeats the whole purpose of having backups.

If Google Chrome and AutoCAD use such a DB, don’t they have checksum mechanisms? I mean, we are not talking about a toy for a few thousand audiophiles but enterprise-class applications serving millions of users.

https://community.roonlabs.com/t/i-think-roon-should-not-be-crowd-sourcing-it-s-metadata-for-content-enrichment/182415/44?u=gigatoaster

I got an answer from Danny.

So saying leveldb is prone to corruption is false in the case of Roon?

I would say it would be better to stop attacking a straw man, if ever there was one. How the corruption was caused is largely irrelevant. The relevant bit, as I see it, is why Roon let corrupt database backups happen for such a long time. Corrupt backups defeat their purpose and are no help in recovering from an unusable database. How that database became unusable is another discussion altogether.

This thread is very useful: Danny weighs in quite a bit in this message and later on.

https://community.roonlabs.com/t/i-think-roon-should-not-be-crowd-sourcing-it-s-metadata-for-content-enrichment/182415/44?u=johnny_ooooops

In fairness, Danny recognised this.

.sjb