Can't restore from backup without "hacking" things

For the record, I’m not 100% convinced Synology support is correct about this. AFAIK the inotify limit is 8K on my system:

$ cat /proc/sys/fs/inotify/max_user_watches
8192

So no idea how Roon could be using 51K like they claim.
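For what it’s worth, the kernel doesn’t expose a per-process watch counter directly, but the total number of registered watches can be estimated by counting the `inotify wd:` lines in `/proc/*/fdinfo/*`. A rough sketch (assumes Linux; a complete count across all users needs root):

```shell
# Sum inotify watches registered across all processes by scanning fdinfo.
# Each inotify fd lists one "inotify wd:<n> ..." line per registered watch.
total=0
for f in /proc/[0-9]*/fdinfo/*; do
  n=$(grep -c '^inotify' "$f" 2>/dev/null) && total=$((total + n))
done
echo "inotify watches in use: $total"
```

Comparing that total against `fs.inotify.max_user_watches` would show how close the system actually is to the limit.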

My response to Synology:

So, looking deeper at this, there are a few things going on…

That lsof command you showed me doesn’t list the number of inotify handles that Roon is using. That is the number of open files (lsof == list open files).
The output of the command you provided makes no sense. It should be in the format of: <count> <PID> <process name>
You provided output of <number> <name> <number>. So is the first number the PID or the count?
When I run the command you provided and grep for roon, I get the following output:

$ lsof | awk '{print $1,$2}' | uniq -c | sort -n | grep -i roon
4 19321 /app/RoonServer/Server/processreaper
54 12732 /app/RoonServer/RoonDotnet/dotnet
123 19428 /app/RoonServer/RoonDotnet/dotnet
381 19320 /app/RoonServer/RoonDotnet/dotnet

So Roon is using a few hundred file descriptors/open files. These numbers seem pretty reasonable, nothing like what you were reporting. Of course, I can’t tell if you were really reporting 51514 open files, or a PID of 51514 and only 12781 open files, which is still a lot.
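As a cross-check that sidesteps lsof’s column ambiguity entirely, open descriptors can be counted straight from `/proc` (Linux only; processes the current user can’t read are silently skipped):

```shell
# Print "<fd count> <PID>" per process, sorted, highest counts last.
for pid in /proc/[0-9]*; do
  n=$(ls "$pid/fd" 2>/dev/null | wc -l)
  [ "$n" -gt 0 ] && echo "$n ${pid##*/}"
done | sort -n | tail
```

Here the first column is unambiguously the descriptor count, since it is just the number of entries under `/proc/<pid>/fd`.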

But it’s more than that… none of my processes right now are using anything like the numbers of open file handles that you are reporting:

$ lsof | awk '{print $1,$2}' | uniq -c | sort -n | tail
97 1523 /usr/bin/nginx
97 1524 /usr/bin/nginx
97 1525 /usr/bin/nginx
101 14267 /usr/bin/nginx
106 7133 /usr/bin/syslog-ng
123 19428 /app/RoonServer/RoonDotnet/dotnet
151 8469 /volume1/@appstore/ContainerManager/usr/bin/dockerd
275 12856 /usr/bin/transmission-daemon
287 15685 /volume1/@appstore/ContainerManager/usr/bin/docker-proxy
381 19320 /app/RoonServer/RoonDotnet/dotnet

Cherry-picking the same processes from your list of processes, I see:

50 31215 /opt/miner/erts-12.3.2.7/bin/beam.smp
152 8469 /volume1/@appstore/ContainerManager/usr/bin/dockerd
281 15685 /volume1/@appstore/ContainerManager/usr/bin/docker-proxy
7 23225 /usr/local/packages/@appstore/SynoFinder/sbin/synoelasticd

All these processes are using orders of magnitude fewer open file descriptors than what you reported. So I really have to wonder why the numbers you are seeing are so high for a number of processes, while I see totally reasonable values. Again, it’s not just one application (Roon) but a lot of different processes.

So what could possibly impact all these different processes on the same system? Some of them are running in Docker, some are not. Maybe the problem is with the Linux kernel? Or some other low-level issue in glibc?

I dunno, but it doesn’t at all seem to point at Roon.


Hey @Aaron_Turner,

Thanks for keeping us in the loop! I also brought this information to one of our lead developers, and we’re curious to see how things function if you increase the watch limit. The system, as well as Roon, most likely won’t work well with the watch limit set to ~8K.

Now, it’s hard to say if this is the cause of backup problems, but it would certainly cause performance issues.

Doing some independent research on the subject, we found this link that sparked some curiosity, although I realize it doesn’t fit the exact description of the issue you’re dealing with:

https://www.synoforum.com/threads/inotify-doesnt-appear-to-be-working-in-my-ds1819.4622/

Let me know if you’re able to test out the above, and what results may come from it. :pray:

@benjamin So I have almost 14K tracks today. Do I need to increase my inotify limit to support all 14K tracks, plus headroom for growth?

Also, you still haven’t provided any guidance on restoring from backups and verifying my backup is not corrupted.

Increased max inotify to 65535:

sysctl -n -w fs.inotify.max_user_watches=65535

Hopefully that is enough :slight_smile:
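Note that `sysctl -w` only changes the value for the current boot. A sketch for making the change stick (the `/etc/sysctl.conf` path is standard Linux; on Synology DSM that file may be overwritten by updates, so a scheduled boot-up task may be needed instead):

```shell
# Raise the limit now (requires root):
sysctl -w fs.inotify.max_user_watches=65535

# Persist across reboots on stock Linux:
echo 'fs.inotify.max_user_watches=65535' >> /etc/sysctl.conf

# Verify the live value:
cat /proc/sys/fs/inotify/max_user_watches
```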

Hi @Aaron_Turner,

In case this slipped through the cracks:

The database mentioned above was indeed corrupt. I have found recent corruption in your most recent database as well:

07/25 02:01:30 Warn: [backup] syncprepare, manifest file is corrupt: b_20230312100000_9530cb7f-861e-4b93-8c0e-c17f2ee5ae98 - System.NullReferenceException: Object reference not set to an instance of an object.

Can you temporarily migrate the core to a different machine and test restoring a backup and see if issues persist? I see you have a macOS device that may be useful to test on.

Let me know, thanks!

Hi @Benjamin ,

I’m still confused how to determine if there is database corruption. You said you looked in my logs. What should I be looking for to see if it was successful or not?

Hi @Aaron_Turner, perhaps I can help here.

  • Roon checks for database corruption before a backup is made, to ensure backup integrity.
  • Roon checks the backup for corruption during a restore, to ensure the new DB is OK.

If the backup or restore operation completes with no reported errors, all is OK at that time.

After a successful restore, if the DB is then reported (at run time) as corrupted again, then assuming no power interruptions have occurred, it’s likely that either the system’s NVMe SSD or RAM is failing.

If this were my system, I’d look at swapping these out, probably the NVMe first, followed by the RAM.

What @benjamin is suggesting is to run the Core on a different machine as a test; if all is well on that new machine, it’s another indicator of a hardware fault on your Synology.

Hope this helps.


Hi @Carl ,

So I guess there is some kind of confusion…

I’ve done a successful restore without any errors being reported in the Roon UI. So does that mean my DB/backup is not corrupted and all is good? The understanding I had was that there was some kind of corruption that Benjamin was seeing in my logs that wasn’t apparent in the UI.

Hi @Carl

I have had serious problems with my Roon Server database on my new server, due to a faulty processor, which has now been replaced.

I just wanted to verify once more that what you are stating here is 100% correct?
There are numerous topics in the forum where reference is made to backups containing errors that have been present over the course of months.

This conflicts with your statement, as in that case Roon would never have ‘finished’ these backups successfully.

Or am I missing something here?

I really hope your statement is correct, as I have been able to make backups to 3 different locations, all successfully. And I really would like to have & keep a reliable system from now on.

Thanks for your guidance

Dirk

There was definitely this:

As stated there, a database can be valid at backup time and still be corrupted later due to hardware issues. Plus, of course, there is never a “100%” guarantee, because bugs can happen and corruption can go undetected. I seem to recall that there was one other fix after build 880 that resolved one such issue, but I can’t find it now. I don’t think anyone did a formal mathematical proof that no other such bugs exist (because that would be unreasonable).


See @Suedkiez’s post above and read the linked Roon release notes.

Did you check the dates of those topics?
If they are before 2021 ignore them.
If 2021 or later, can you share links to some examples?


Hi Carl

I am fine with the response that @Suedkiez provided, seems that this issue is more or less under control.

So thank you for your clear explanation in your previous response.

It would be nice, though, if Roon made a proper DB validity test available to us users.
Probably wishful thinking.

Dirk

I’m not sure about that as the DB is validated every time a backup is made.

And it’s good practice to have those scheduled frequently.

I have my system scheduled to take a backup in the early hours each night, retaining 99 copies.

I have also setup a monthly backup, to a different location, for long term (years) security.

FWIW, I’m still hoping to hear back from @Benjamin, since he seems to be saying that my successful restore from backup (without any error messages) was corrupted and my DB is currently corrupt.

Hey @Aaron_Turner,

Indeed, your current database appears to still be corrupt, per my previous reply. And yes, that is correct: Roon checks the database for corruption at each backup. The logs mention validation when this happens.

However, the corruption type you’re experiencing isn’t typical corruption. A corrupt manifest file is more related to the transfer of the backup itself. If the backup is not transferred properly to its destination, this type of corruption can occur.

In this case, it won’t matter if Roon says the backup itself is OK at the time it is backed up, since it has no control over the transfer of the database itself. This log snippet breaks things down a bit more:

02/09 02:02:20 Trace: [backup] writing backup manifest file b_20230209100129_a30420de-50e9-497a-abbd-82fb7d2f31ef
02/09 02:02:20 Trace: [backup] found 35 backups (we need to have at maximum 31)
02/09 02:02:20 Trace: [backup] computing which files are from the 4 excessive backups found
02/09 02:02:20 Warn: [backup] retrieving backup manifest for cleanup: AttachedDir:/backup/9892a01e-ff48-b763-06c7-85996c98f7f7/xx/b_20230103100130_2f86af5c-6d4d-45e4-a7e6-51ed7a37ffd1
02/09 02:02:20 Critical: [backup] failed to parse manifest for cleanup: AttachedDir:/backup/9892a01e-ff48-b763-06c7-85996c98f7f7/xx/b_20230103100130_2f86af5c-6d4d-45e4-a7e6-51ed7a37ffd1: System.NullReferenceException: Object reference not set to an instance of an object.

You could try setting up a new backup location, but it may also be an issue with the NAS itself writing the file.

Thanks, that’s useful information, @benjamin. Now that I know what to look for, it seems that error is no longer happening. Does that mean my backups are okay?

$ grep -F 'Critical: [backup]' *
RoonServer_log.08.txt:07/26 02:00:46 Critical: [backup] failed to parse manifest for cleanup: AttachedDir:/backup/9892a01e-ff48-b763-06c7-85996c98f7f7/xx/b_20230312100000_9530cb7f-861e-4b93-8c0e-c17f2ee5ae98: System.NullReferenceException: Object reference not set to an instance of an object.
RoonServer_log.08.txt:07/26 02:00:46 Critical: [backup] failed to parse manifest for cleanup: AttachedDir:/backup/9892a01e-ff48-b763-06c7-85996c98f7f7/xx/b_20230313090130_a122ac01-1e49-4339-a4b7-279ec837598e: System.NullReferenceException: Object reference not set to an instance of an object.
RoonServer_log.08.txt:07/26 02:00:46 Critical: [backup] failed to parse manifest for cleanup: AttachedDir:/backup/9892a01e-ff48-b763-06c7-85996c98f7f7/xx/b_20230621090130_09c44163-9350-447e-9bd7-838843b9b0f0: System.NullReferenceException: Object reference not set to an instance of an object.
RoonServer_log.08.txt:07/26 02:00:46 Critical: [backup] failed to parse manifest for cleanup: AttachedDir:/backup/9892a01e-ff48-b763-06c7-85996c98f7f7/xx/b_20230714090130_4c1c3303-3ef9-448a-a627-5d5529a26397: System.NullReferenceException: Object reference not set to an instance of an object.
RoonServer_log.10.txt:07/23 02:00:53 Critical: [backup] failed to parse manifest for cleanup: AttachedDir:/backup/9892a01e-ff48-b763-06c7-85996c98f7f7/xx/b_20230312100000_9530cb7f-861e-4b93-8c0e-c17f2ee5ae98: System.NullReferenceException: Object reference not set to an instance of an object.
RoonServer_log.10.txt:07/23 02:00:53 Critical: [backup] failed to parse manifest for cleanup: AttachedDir:/backup/9892a01e-ff48-b763-06c7-85996c98f7f7/xx/b_20230313090130_a122ac01-1e49-4339-a4b7-279ec837598e: System.NullReferenceException: Object reference not set to an instance of an object.
RoonServer_log.10.txt:07/23 02:00:53 Critical: [backup] failed to parse manifest for cleanup: AttachedDir:/backup/9892a01e-ff48-b763-06c7-85996c98f7f7/xx/b_20230621090130_09c44163-9350-447e-9bd7-838843b9b0f0: System.NullReferenceException: Object reference not set to an instance of an object.
RoonServer_log.10.txt:07/23 02:00:53 Critical: [backup] failed to parse manifest for cleanup: AttachedDir:/backup/9892a01e-ff48-b763-06c7-85996c98f7f7/xx/b_20230714090130_4c1c3303-3ef9-448a-a627-5d5529a26397: System.NullReferenceException: Object reference not set to an instance of an object.
RoonServer_log.14.txt:07/19 02:00:43 Critical: [backup] failed to parse manifest for cleanup: AttachedDir:/backup/9892a01e-ff48-b763-06c7-85996c98f7f7/xx/b_20230312100000_9530cb7f-861e-4b93-8c0e-c17f2ee5ae98: System.NullReferenceException: Object reference not set to an instance of an object.
RoonServer_log.14.txt:07/19 02:00:43 Critical: [backup] failed to parse manifest for cleanup: AttachedDir:/backup/9892a01e-ff48-b763-06c7-85996c98f7f7/xx/b_20230313090130_a122ac01-1e49-4339-a4b7-279ec837598e: System.NullReferenceException: Object reference not set to an instance of an object.
RoonServer_log.14.txt:07/19 02:00:43 Critical: [backup] failed to parse manifest for cleanup: AttachedDir:/backup/9892a01e-ff48-b763-06c7-85996c98f7f7/xx/b_20230621090130_09c44163-9350-447e-9bd7-838843b9b0f0: System.NullReferenceException: Object reference not set to an instance of an object.
RoonServer_log.14.txt:07/19 02:00:43 Critical: [backup] failed to parse manifest for cleanup: AttachedDir:/backup/9892a01e-ff48-b763-06c7-85996c98f7f7/xx/b_20230714090130_4c1c3303-3ef9-448a-a627-5d5529a26397: System.NullReferenceException: Object reference not set to an instance of an object.
RoonServer_log.14.txt:07/20 02:02:15 Critical: [backup] failed to parse manifest for cleanup: AttachedDir:/backup/9892a01e-ff48-b763-06c7-85996c98f7f7/xx/b_20230312100000_9530cb7f-861e-4b93-8c0e-c17f2ee5ae98: System.NullReferenceException: Object reference not set to an instance of an object.
RoonServer_log.14.txt:07/20 02:02:15 Critical: [backup] failed to parse manifest for cleanup: AttachedDir:/backup/9892a01e-ff48-b763-06c7-85996c98f7f7/xx/b_20230313090130_a122ac01-1e49-4339-a4b7-279ec837598e: System.NullReferenceException: Object reference not set to an instance of an object.
RoonServer_log.14.txt:07/20 02:02:15 Critical: [backup] failed to parse manifest for cleanup: AttachedDir:/backup/9892a01e-ff48-b763-06c7-85996c98f7f7/xx/b_20230621090130_09c44163-9350-447e-9bd7-838843b9b0f0: System.NullReferenceException: Object reference not set to an instance of an object.
RoonServer_log.14.txt:07/20 02:02:15 Critical: [backup] failed to parse manifest for cleanup: AttachedDir:/backup/9892a01e-ff48-b763-06c7-85996c98f7f7/xx/b_20230714090130_4c1c3303-3ef9-448a-a627-5d5529a26397: System.NullReferenceException: Object reference not set to an instance of an object.
RoonServer_log.17.txt:07/18 02:00:46 Critical: [backup] failed to parse manifest for cleanup: AttachedDir:/backup/9892a01e-ff48-b763-06c7-85996c98f7f7/xx/b_20230312100000_9530cb7f-861e-4b93-8c0e-c17f2ee5ae98: System.NullReferenceException: Object reference not set to an instance of an object.
RoonServer_log.17.txt:07/18 02:00:46 Critical: [backup] failed to parse manifest for cleanup: AttachedDir:/backup/9892a01e-ff48-b763-06c7-85996c98f7f7/xx/b_20230313090130_a122ac01-1e49-4339-a4b7-279ec837598e: System.NullReferenceException: Object reference not set to an instance of an object.
RoonServer_log.17.txt:07/18 02:00:46 Critical: [backup] failed to parse manifest for cleanup: AttachedDir:/backup/9892a01e-ff48-b763-06c7-85996c98f7f7/xx/b_20230621090130_09c44163-9350-447e-9bd7-838843b9b0f0: System.NullReferenceException: Object reference not set to an instance of an object.
RoonServer_log.17.txt:07/18 02:00:46 Critical: [backup] failed to parse manifest for cleanup: AttachedDir:/backup/9892a01e-ff48-b763-06c7-85996c98f7f7/xx/b_20230714090130_4c1c3303-3ef9-448a-a627-5d5529a26397: System.NullReferenceException: Object reference not set to an instance of an object.

$ ls -l
total 90136
-r-xr-xr-x+ 1 root root 4387368 Jul 31 02:00 RoonServer_log.01.txt
-r-xr-xr-x+ 1 root root 3894118 Jul 30 02:00 RoonServer_log.02.txt
-r-xr-xr-x+ 1 root root 8454198 Jul 29 20:32 RoonServer_log.03.txt
-r-xr-xr-x+ 1 root root 3170382 Jul 29 20:32 RoonServer_log.04.txt
-r-xr-xr-x+ 1 root root 5244275 Jul 29 02:00 RoonServer_log.05.txt
-r-xr-xr-x+ 1 root root 2863580 Jul 28 02:00 RoonServer_log.06.txt
-r-xr-xr-x+ 1 root root 1891751 Jul 27 02:01 RoonServer_log.07.txt
-r-xr-xr-x+ 1 root root 4732283 Jul 26 16:03 RoonServer_log.08.txt
-r-xr-xr-x+ 1 root root 3688243 Jul 25 02:01 RoonServer_log.09.txt
-r-xr-xr-x+ 1 root root 8452440 Jul 23 21:42 RoonServer_log.10.txt
-r-xr-xr-x+ 1 root root 4765040 Jul 22 18:03 RoonServer_log.11.txt
-r-xr-xr-x+ 1 root root 3662217 Jul 22 02:00 RoonServer_log.12.txt
-r-xr-xr-x+ 1 root root 1606390 Jul 21 02:00 RoonServer_log.13.txt
-r-xr-xr-x+ 1 root root 8308440 Jul 20 06:45 RoonServer_log.14.txt
-r-xr-xr-x+ 1 root root  126295 Jul 18 19:23 RoonServer_log.15.txt
-r-xr-xr-x+ 1 root root 8439382 Jul 18 19:22 RoonServer_log.16.txt
-r-xr-xr-x+ 1 root root 8439921 Jul 18 07:59 RoonServer_log.17.txt
-r-xr-xr-x+ 1 root root 8349914 Jul 17 18:23 RoonServer_log.18.txt
-r-xr-xr-x+ 1 root root   93899 Jul 17 14:00 RoonServer_log.19.txt
-r-xr-xr-x+ 1 root root   88561 Jul 17 13:58 RoonServer_log.20.txt
-r-xr-xr-x+ 1 root root 1593066 Jul 31 13:11 RoonServer_log.txt
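To see at a glance which backup sets are actually affected, the repeated errors above can be reduced to the distinct corrupt manifest IDs. A sketch, assuming the logs sit in the current directory:

```shell
# List each corrupt backup manifest once, instead of once per log line.
grep -hF 'Critical: [backup] failed to parse manifest' RoonServer_log*.txt \
  | grep -oE 'b_[0-9]+_[0-9a-f-]+' \
  | sort -u
```

On the logs above this would collapse the twenty Critical lines down to the four distinct `b_…` manifests being reported.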

Hey @Aaron_Turner,

My extreme apologies for the delay in getting back to this thread. Are you still running into issues?

So I just did a test and restored the backup without error. Increasing the inotify limit has seemingly addressed the root cause of the issues, as best as I can tell. Honestly, this whole thing is very black-box and hard to diagnose.


This topic was automatically closed 36 hours after the last reply. New replies are no longer allowed.