The drives run a monthly health check and just passed with flying colors.
My (rather educated) guess is that when the Linux kernel on the NAS starts having issues (it’s definitely related to CPU utilization, and it makes the system unstable and somewhat unresponsive), it prevents files from being written or updated during backups. But rather than ending up with an incomplete or failed backup, it corrupts the metadata files, which seems to prevent Roon from recognizing these as backups.
The expected behavior is that this should work like the write-ahead log (WAL) in a relational/SQL database, which ensures that a transaction (or in this case a backup) either completes successfully or does not “exist” at all, rather than making all previous backups unavailable (the worst possible outcome). Luckily, I’m pretty tech savvy and was able to figure out what was going on!
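To make the all-or-nothing behavior concrete, here is a minimal Python sketch of the write-to-temp-then-rename pattern. This is not a WAL and not Roon’s actual implementation; it is a simpler technique that gives a single file similar commit semantics. The function name `write_metadata_atomically` and the JSON format are hypothetical, and the directory `fsync` assumes a POSIX system such as the Linux running on the NAS.

```python
import json
import os
import tempfile


def write_metadata_atomically(path: str, metadata: dict) -> None:
    """Hypothetical helper: replace `path` so readers see either the old
    contents or the complete new contents, never a partial write."""
    directory = os.path.dirname(os.path.abspath(path))
    # Create the temp file in the same directory so the rename stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-", suffix=".json")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            json.dump(metadata, f)
            f.flush()
            # Push the data out of the OS page cache to the storage device.
            os.fsync(f.fileno())
        # Atomically swap the new file into place (POSIX rename semantics).
        os.replace(tmp_path, path)
    except BaseException:
        # On any failure, discard the temp file and leave the old metadata untouched.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
    # Also persist the directory entry created by the rename (POSIX only).
    dir_fd = os.open(directory, os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)
```

If the system hangs or loses power before `os.replace`, the previous file is still intact (at worst a stray `.tmp-` file is left behind); after it, the complete new file is in place. Either way, earlier backups would remain readable.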
The real first step here is to solve this hard power cycle problem. Who knows what problems that causes for the data on the NAS. A hard power cycle can prevent cached data from being written to the drives, which is absolutely awful if you are running any form of RAID.
Are the file permissions by any chance wrong for that directory? It sounds like Roon might not have write access to those particular files.
It may be interesting to see what Roon logs show when the backup runs. Next time you encounter the issue, can you upload a manual set of your NAS Core logs (accessed via these instructions) to the following link and let us know?
Roon reports that my backups complete successfully, and as I described above, once I fix the two metadata files I can restore. As for whether the permissions are correct, can you tell me what the correct permissions are? I don’t know what Roon expects.
No, I am not aware of those changes affecting your case. I looked over the logs and I’m seeing errors around cleaning up previous backups. If you create a new folder and set it as the new backup directory, does that work without issue?
So I shut down Roon, moved /backup out of the way and created a new directory in its place.
I ran a manual backup and it created /backup/RoonBackup/9892a01e-ff48-b763-06c7-85996c98f7f7, which I didn’t expect: the old backups were in /backup/9892a01e-ff48-b763-06c7-85996c98f7f7. I’m not sure why there is an extra directory or whether that is a concern.
You said there were errors. Can you tell me which errors to look for in the logs to confirm this fixes it? Nothing in the UI indicates any errors, and when I dig through the logs I only see some errors like:
Are you still seeing this issue? If so, then yes, can you please upload a new log set and let us know? I saw a strange error related to your backup and I’m curious if it’s still the same one or if there’s any new information in the logs. Thanks!
Well, I’ve disabled the process on my NAS that was causing the kernel to hang, so I no longer have to hard restart the NAS. This was the second time I’ve found a userland process causing the issue (it seems related to high CPU utilization).
But yeah, the only way I know of to recreate this issue is to make the NAS hang or crash. I’d rather not do that, to be honest. The purpose of this post was to inform Roon of this bug so that hopefully you can fix it.
Ah, so you did give my suggestion some thought (the one about fixing the problem that forces the hard power cycles), even though you insulted it when I offered it.
I have hard power cycled a Roon Core running on Linux many times, and the backups were just fine. I have done the same with Roon on macOS. Never have I had issues with the database.
So why do you think this is a problem with Roon instead of Synology? Maybe Synology is doing something with the caching of written data that lets Roon think the backups are done and the data written when, in fact, that is not the case. Maybe the writes were cached by the Synology OS and never actually written to the storage media.
Look @musicjunkie917, you don’t know what is going on. And I’m going to guess you probably know less than I do about “Synology OS”, which is Linux. I was literally running and hacking on the Linux kernel in the early ’90s, before the 1.0 kernel was released. I’m not saying I’m the most knowledgeable person by any means, but I definitely have more information about what is going on than you do.
That is to say, your suggestions, in answer to a question I did not ask, are not helpful.
For the record, Synology support claimed this was caused by third-party RAM. So I went out and spent over $700 on official Synology-branded RAM with exactly the same specs as what I had installed. Guess what: it happened again, and I’m continuing to work with them to see if they can fix the issue.
In the meantime, it would be great if Roon could be as reliable as the other software I run on my Synology, including software built on databases like PostgreSQL and SQLite, which suffers no ill effects when this happens.
I have been hacking on various systems and operating systems since the ’70s. I have worked in the computer industry for over 40 years. I started by cracking copy protection schemes on Apple IIs and went on to develop software in assembly language and a variety of higher-level languages. I worked at Apple for a decade. I ran my own company with hundreds of servers running various Unix-based operating systems in a data center. I’ve run storage pools using a variety of RAID and ZFS implementations. Now that we have our résumés (or CVs) out of the way…
Do you know more about Synology than the folks at Synology? Are they saying the problem is with Roon, or is the problem the hard crashes and restarts? I’ll tell you, as a data center professional, that Synology was not held in high regard in that market segment.
Is this type of problem happening with Roon on all Linux-based systems? Or just the Synology systems? Does it happen on Windows or macOS-based systems? If not, why is Roon reliable on these systems but not Synology systems? What’s different?