Since my last post, another restart sadly broke playback permanently. So for 12 of the last 13 hours, no playback (neither local nor Qobuz) has succeeded. Another restart just now provided no relief.
UPDATE: Random observations to add:
ROCK refuses to reboot some of the time, while Roon is failing to play back on some device.
Then, when rebooting succeeds, playback still fails.
Meanwhile, ROCK being inscrutable makes it impossible for me to track any system health indicators beyond the Roon log files.
Loading the home screen takes about 2 minutes after a restart. Oh darn, it's still not fully loaded. It apparently loads top-down in sequence.
History also fails to load. Maybe I'll let that sit for some time too: nope, not even after 10 minutes.
No rescans stuck this time. Maybe it's because of the failed backup? Redirecting my backup to a different location… just gets stuck doing the database snapshot. Yup. Restarting the client shows it stuck in exactly the same state as usual: no playback, no artist page load, no history page load. No new backups have been made.
To rule out complete hardware failure, let me move this entire party to different hardware. It would be nicer if I still had my old Linux installation so I could actually diagnose it live.
@benjamin That's why I told you guys… I assumed you got that the first time you replied.
I've been running with the faulty DIMM removed, but without restoring any database. Looks fine now. However, I realized that a LOT of work went into tidying up my Roon library aside from the parts I already mentioned before (playlists, MUSE profiles, history and statistics). So I'll probably select a suitably old backup (I should have one from early 2024) and re-test with that sometime next week. If that works, then I should be out of the woods.
I've run with a new database-from-scratch for a few days, realizing that I was missing a LOT of album edits (identifying box sets and customizing cover art between different re-releases) as well as all the playlists and MUSE profiles.
So I went back to the oldest of the recent backups I had (December 25th 2024). It ran well for about 8 days, but then disaster struck and the database was found corrupt at a restart.
So now I'm trying my luck with a very old backup that I found lingering on an old medium (Jan 13th 2023). That's really old, but at least it contains some of the work I've done.
Sadly, I still have intermittent glitches:
covers not loading until I zoom the art in Album View
specific library albums missing (e.g. "Viola Sonata, Shostakovich")
the search for that specific string consistently crashes the Windows client and locks up the Android client (until force closed).
Worse, both these clients got stuck in a crash loop because a restart brings back the same search screen. The only way to get out was to reinstall the clients.
Interestingly, the iOS client executes the same search without problems (literally repeated it from the "recent searches" drop-down, no server restarts in between) and shows the album that was missing from the play history (no album art, "album not found" when clicked)… Even AFTER that, the Android and Windows clients still kept crashing on the same search.
I don't know what to do here.
[PS. I've repeated about an hour of memtest86 a few days back just to make sure no new failures had appeared with the remaining DIMM - it ran completely clean, can't be paranoid enough]
Sorry to hear you're still running into issues. Could you please share a set of Roon logs from the affected Windows machine? Please follow the directions found here and send over a set of logs to our File Uploader.
@benjamin Thanks for the interest. I've not been able to reproduce the crashes (I think (?) I've updated the server from 1510 to 1517 in the meantime). However, the weird album art glitches still present themselves:
We've taken a look at the diagnostics you shared from the client-side and compared them to logs from RoonServer during the corresponding time period. Here are the patterns we observe:
RoonServer is logging generic network failures when requesting content from Qobuz's servers. Qobuz playback requests occasionally time out. This occurs when playing to System Output.
Network reachability changes interrupt the connection between RoonServer and the upstream servers responsible for certain background processes.
Certain image requests from Qobuz or Roon's discovery service return 404. There aren't any caching issues.
Roon doesn't show signs of corruption events during indexing.
Reviewing this thread, it seems you'd like to target either RAM/resource constraints or latent corruption as the underlying problem here and not focus on loading failures.
For background: Roon attempts to detect corruption "on the fly" during background indexing by checksumming every page in the database, on a single core. That doesn't guarantee that tiny changes can't accumulate.
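For illustration only, here is a minimal Python sketch of the general idea of per-page checksumming. It is not Roon's actual implementation: the 4096-byte page size, CRC32 as the checksum, and the function names are all assumptions made for the example.

```python
import zlib

PAGE_SIZE = 4096  # hypothetical page size, chosen only for this sketch


def page_checksums(data: bytes, page_size: int = PAGE_SIZE) -> list[int]:
    """Return one CRC32 value per fixed-size page of `data`."""
    return [
        zlib.crc32(data[offset:offset + page_size])
        for offset in range(0, len(data), page_size)
    ]


def find_corrupt_pages(
    data: bytes, expected: list[int], page_size: int = PAGE_SIZE
) -> list[int]:
    """Return indices of pages whose checksum differs from the recorded one.

    Pages present on only one side (length mismatch) are also reported.
    """
    actual = page_checksums(data, page_size)
    corrupt = []
    for index in range(max(len(actual), len(expected))):
        have = actual[index] if index < len(actual) else None
        want = expected[index] if index < len(expected) else None
        if have != want:
            corrupt.append(index)
    return corrupt


# Build a three-page "database" and record its checksums.
original = bytes(range(256)) * 48  # 12288 bytes = 3 pages of 4096
recorded = page_checksums(original)

# Simulate a single flipped bit in the second page (index 1).
damaged = bytearray(original)
damaged[PAGE_SIZE + 10] ^= 0x01
bad_pages = find_corrupt_pages(bytes(damaged), recorded)
```

A check like this only catches damage that happens after a page's checksum was recorded. If a page is already wrong when its checksum is computed (for example because of faulty RAM at write time), the check passes anyway.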
The only way to ensure that a DB doesn't contain latent corruption at this point would be to fully recreate it; we recognize how time-consuming and frustrating this can be. However, a current hardware failure inducing corruption independently in each restored backup is also a possibility.
What do you see in the RoonOS Web UI when you reboot and it fails? What happens if you reinstall RoonOS itself?
Also, we understand you're resistant to troubleshooting your network. But what is the basic topology serving this ROCK?
I don't remember the exact message, but the web page dryly informs me that "the roon server failed to reboot" (or similar message). I'm not currently considering re-implementing ROCK (since it doesn't give me the basic means of system monitoring or maintenance; I prefer to keep my system up-to-date and be able to monitor for health).
Not really hesitant, but I had the feeling it was an unlikely culprit. I think the finding of the faulty RAM DIMM confirmed that hunch. The network topology is really simple: it's all directly attached 1Gbit LAN cables to the source router (Asus RT-AX92U, which is 2.5Gbit capable) connected to a fiber-optic WAN operating a 1Gbit network.
The LAN is rock solid (the non-wired clients use a WiFi mesh with 3 WiFi6 access points, bridging to the same subnet as the LAN, but the server and desktop are connected by cable anyways).
That's good to hear. I'll probably keep running with the current DB then. I have a suspicion that the particular album I focused on to show "missing artwork" is somehow no longer available on Qobuz. The fact that the search triggered a crash is no longer reproducible, so I'll write it off as a freak incident.
The only reachability changes should be:
A daily router restart at 4:30am (the router firmware has a bug in its TZ database, so for one week that would appear as 3:30am in the logs). I'm unlikely to be listening at that time, but it does happen.
We've gotten an upgrade enabling 8Gbit upstream fiber, which left us with no internet for a few hours on April 1st (not a joke, or at least not a good one).
I might have tinkered a bit with the firewall setup (to allow for ARC) but that was longer ago, I think. I'll leave the network config stable for a few weeks at least, so we can see that the only "holes" are at 4:30am now.
Slight update to keep this alive.
Another update made things unstable again. I've now gone back to regenerating the entire database from scratch. Will post conclusions later.