When I was troubleshooting this while the issue was happening, I also didn't see anything in those logs, which is why I stopped checking them. How do you propose we move forward? To summarize, when the issue starts:
The Roon service is still running but with a state of "not responding", as shown in some of the log snippets in my previous posts.
While the service is in this state, it consumes more and more CPU and RAM until the OOM killer kills it.
Reinstalling Roon without deleting /var/roon does not solve the issue.
Reinstalling Roon after deleting /var/roon and starting fresh (no database restore) works temporarily, until the issue returns after around 1.5 days of uptime.
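One way to capture the runaway growth before the OOM killer fires is to sample the service's resident memory periodically. This is only a sketch; it assumes the process is named RoonServer (verify with pgrep on your system):

```shell
# Print a process's resident set size (kB) from /proc.
rss_kb() { awk '/VmRSS/{print $2}' "/proc/$1/status"; }

# Sample the (assumed) RoonServer process once a minute until it
# exits, leaving a growth curve in /tmp/roon-rss.log:
#   while pid=$(pgrep -o RoonServer); do
#       echo "$(date -Is) rss_kb=$(rss_kb "$pid")"
#       sleep 60
#   done >> /tmp/roon-rss.log
```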
danny
(Danny Dulai)
April 11, 2022, 6:39pm
#53
I need the newest logs from when the problem occurs. The logs you sent show me a functioning RoonServer.
OK. You posted a little bit too late, since the issue happened 30 minutes before your first post here. Now that I've applied the workaround (reinstall, delete /var/roon, restore from backup), I'll need to wait approximately 1.5 days for the issue to happen again. I'll wait until it gets to the point where the OOM killer kills the service, and then I'll get you the updated Logs folder. While I'm at it, do you need any other set of logs for when the issue happens again?
danny
(Danny Dulai)
April 12, 2022, 2:30am
#55
/var/log/messages or journalctl is good
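A minimal sketch of pulling the OOM-related lines out of either source; the grep pattern is a guess at the usual kernel wording, so adjust it if your kernel phrases things differently:

```shell
# Filter OOM-related kernel messages from a log stream.
oom_lines() { grep -iE 'out of memory|oom-killer|killed process'; }

# Typical usage on the affected box:
#   journalctl -k -b | oom_lines
#   oom_lines < /var/log/messages
```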
Ok. Right now, one endpoint is streaming from the Core and it's using a lot of RAM. Is this normal?
This might be related to this: Memory leak in Roon?! [See Staff Post] - #70 by Bill_Janssen
The person in the last post says his was using 48 GB of RAM! That is insane.
@danny
Here are my Roon logs so far (shared via Dropbox).
The service hasn't stopped yet, but the RAM usage is way off. It got to the point of 90% usage (of 32 GB) even without any endpoint streaming. I see tons of Tidal errors in there, but I'm not sure if that's what's causing it.
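To gauge whether the Tidal errors dominate the log, a quick line count helps. A sketch; the log path in the comment is an assumption based on the default Linux RoonServer layout under /var/roon, so adjust it to your install:

```shell
# Case-insensitive count of lines matching a pattern on stdin.
count_matches() { grep -ci "$1"; }

# Compare Tidal-related lines against the total, e.g.:
#   log=/var/roon/RoonServer/Logs/RoonServer_log.txt   # assumed path
#   count_matches tidal < "$log"
#   wc -l < "$log"
```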
@danny OK, the OOM killer just kicked in and killed Roon. And as expected, the service won't start anymore (the "not responding" message). Here is a fresh set of logs, shared via Dropbox:
the Roon Logs folder
/var/log/messages
journalctl -u roonserver.service -b
These logs should cover all events from the time I reported the high RAM usage a few hours ago up to the point where the OOM killer kicked in. The physical RAM usage is very evident in the Roon logs, mostly sitting at 28 GB or so.
noris
April 14, 2022, 3:52am
#63
Hi @Kevin_Mychal_Ong ,
Your var-log is showing quite a few hard disk errors; perhaps this is contributing to the memory leak:
Apr 10 03:14:35 nuc kernel: [738485.103974] EXT4-fs (sda1): error count since last fsck: 138
Apr 10 03:14:35 nuc kernel: [738485.103982] EXT4-fs (sda1): initial error at time 1629819004: htree_dirblock_to_tree:1003: inode 70516840
Apr 10 03:14:35 nuc kernel: [738485.103988] EXT4-fs (sda1): last error at time 1647358208: ext4_empty_dir:3005: inode 95815045
You may want to run a disk check, reinstall the OS and/or use a different hard drive.
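As a side note, the "error at time N" values in the ext4 lines above are Unix timestamps, so they can be decoded to see when the corruption was first and last recorded:

```shell
# Decode the ext4 error timestamps quoted above (UTC):
date -u -d @1629819004   # initial error: falls in August 2021
date -u -d @1647358208   # last error: falls in March 2022
```

This relies on GNU date's `-d @epoch` syntax; BSD date would need `date -r` instead.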
Is it common for hard disk errors to cause memory leaks? Shouldn't they be affecting other things on my server too, if that's the case?
@noris
Also, sda1 is the partition on my storage SSD where my local music files are. It's not the OS disk (an NVMe drive), so I don't think reinstalling the OS will do anything about those errors.
root@nuc:~# df -h
Filesystem Size Used Avail Use% Mounted on
udev 16G 0 16G 0% /dev
tmpfs 3.2G 4.4M 3.2G 1% /run
/dev/nvme0n1p2 23G 5.3G 17G 25% /
tmpfs 16G 4.0K 16G 1% /dev/shm
tmpfs 5.0M 0 5.0M 0% /run/lock
/dev/sda1 1.8T 999G 742G 58% /mnt/storage
/dev/nvme0n1p5 1.9G 6.4M 1.7G 1% /tmp
/dev/nvme0n1p3 9.2G 3.3G 5.4G 38% /var
/dev/nvme0n1p6 193G 26G 158G 14% /home
/dev/nvme0n1p1 511M 18M 494M 4% /boot/efi
overlay 193G 26G 158G 14% /home/kevindd992002/docker/var-lib-docker/overlay2/7e4b7bc557d6dbebe65848a8e647c81480fa8607e8f81e7e127051b37004096a/merged
overlay 193G 26G 158G 14% /home/kevindd992002/docker/var-lib-docker/overlay2/b99eec5fc2214896f6f23a788607149f579d217c15c6c11e996677c031313c2f/merged
overlay 193G 26G 158G 14% /home/kevindd992002/docker/var-lib-docker/overlay2/505205bf611016752431128729b71ffbc2f13139f53f7ddcb7ed6b77befd7f3a/merged
overlay 193G 26G 158G 14% /home/kevindd992002/docker/var-lib-docker/overlay2/25f374918223be019b25f35f21a053d8b4b455ddaada0259437c215bd85c78cc/merged
overlay 193G 26G 158G 14% /home/kevindd992002/docker/var-lib-docker/overlay2/6b47ff466d400750ad8cd9dffb86e415426a3f6b2cfc81e43befdf4fbb744304/merged
overlay 193G 26G 158G 14% /home/kevindd992002/docker/var-lib-docker/overlay2/dfd78587070db6673a0b6c32e6cb38cc2b3a00ffd8cb272252f473a7cb96730e/merged
overlay 193G 26G 158G 14% /home/kevindd992002/docker/var-lib-docker/overlay2/bc93535b62129f20b981d88af9990350a47ba40e9fc6a6f5702f4e209f1b2871/merged
overlay 193G 26G 158G 14% /home/kevindd992002/docker/var-lib-docker/overlay2/8c75355d40afdac24635870bc92fae92927b02e020718c1ec63887ea85fba062/merged
overlay 193G 26G 158G 14% /home/kevindd992002/docker/var-lib-docker/overlay2/71c1393dd3d2b40e72bc5c6081d1123c9635d2fcd012d58b39e7727f8f38d80b/merged
overlay 193G 26G 158G 14% /home/kevindd992002/docker/var-lib-docker/overlay2/8eb664d4fcce3c4afe32e302a13763884dbc170369c3bb8d14b0a21dfe26c55c/merged
overlay 193G 26G 158G 14% /home/kevindd992002/docker/var-lib-docker/overlay2/c9ce4eb7c674ebc8fcfae7f629c8c8097d3600603668d7655d0b2e6834343cbd/merged
overlay 193G 26G 158G 14% /home/kevindd992002/docker/var-lib-docker/overlay2/ca999edab1cd3beb0bbd53795f02e6c768ee17108f5831ce79c978f20df0f178/merged
overlay 193G 26G 158G 14% /home/kevindd992002/docker/var-lib-docker/overlay2/73034525e50a1794c6ea849b0a1e595cca9ceaf592243c2f969025eb39a0653a/merged
overlay 193G 26G 158G 14% /home/kevindd992002/docker/var-lib-docker/overlay2/97a5e015299c33a497f3d4649333ea0981abb49f7f60b2646b879c894f15273b/merged
overlay 193G 26G 158G 14% /home/kevindd992002/docker/var-lib-docker/overlay2/a0b81d42d80ede5627b0f29f9352c9664ae8e3a33791269903188060a9ff2a85/merged
tmpfs 3.2G 0 3.2G 0% /run/user/1000
synology.home.arpa:/volume1/data 104T 65T 40T 63% /mnt/data
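As a cross-check that the errors really belong to the music drive and not the OS disk, findmnt maps a path back to its backing device. A minimal sketch using the mount points from the df output above:

```shell
# Which block device backs each mount point?
findmnt -n -o SOURCE /mnt/storage   # should report /dev/sda1
findmnt -n -o SOURCE /var           # should report /dev/nvme0n1p3
```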
This is just my experience, but once you start seeing errors somewhere, it snowballs. If the base OS isn't stable, everything depending on that OS becomes unpredictable.
I understand that. But we’re not seeing OS errors here, are we?
I’ll go back to reading along.
If the sda drive only holds music files, I'd simply disconnect it and see if Roon starts up and runs without memory problems. That Linux reports problems with sda seems clear; what effect that has on the working of Roon remains to be tested. In any case, sda should be checked and probably replaced.
I wasn’t trying to be offensive. I just want to base my next course of action on facts. If any of the logs point me to the OS being corrupted or anything, then I’m all for fixing that. But if the logs are pointing to the storage disk (which only holds my local music files) having errors, then reinstalling a whole OS won’t really do anything to solve those errors.
Yes, that makes sense and this will be my next course of action.
EDIT: I already ran fsck against /dev/sda1 and fixed a couple of corrupted directories. I'm guessing this is because there are lots of albums in there with Chinese titles. But anyway, I have it clean now:
root@nuc:~# fsck -y /dev/sda1
fsck from util-linux 2.36.1
e2fsck 1.46.2 (28-Feb-2021)
/dev/sda1: clean, 1113099/122101760 files, 270035392/488378385 blocks
I'm keeping it unmounted for now to see how the Roon Core behaves without the drive.
I'm not seeing any change in RAM usage after unmounting the /dev/sda1 partition from the system. I'll post another set of logs when the OOM killer kicks in to kill the roonserver service.
root@nuc:~# df -h
Filesystem Size Used Avail Use% Mounted on
udev 16G 0 16G 0% /dev
tmpfs 3.2G 4.1M 3.2G 1% /run
/dev/nvme0n1p2 23G 7.0G 15G 33% /
tmpfs 16G 4.0K 16G 1% /dev/shm
tmpfs 5.0M 0 5.0M 0% /run/lock
/dev/nvme0n1p3 9.2G 4.1G 4.6G 48% /var
/dev/nvme0n1p5 1.9G 6.3M 1.7G 1% /tmp
/dev/nvme0n1p6 193G 25G 159G 14% /home
/dev/nvme0n1p1 511M 18M 494M 4% /boot/efi
synology.home.arpa:/volume1/data 104T 65T 40T 63% /mnt/data
overlay 193G 25G 159G 14% /home/kevindd992002/docker/var-lib-docker/overlay2/25f374918223be019b25f35f21a053d8b4b455ddaada0259437c215bd85c78cc/merged
overlay 193G 25G 159G 14% /home/kevindd992002/docker/var-lib-docker/overlay2/c9ce4eb7c674ebc8fcfae7f629c8c8097d3600603668d7655d0b2e6834343cbd/merged
overlay 193G 25G 159G 14% /home/kevindd992002/docker/var-lib-docker/overlay2/8eb664d4fcce3c4afe32e302a13763884dbc170369c3bb8d14b0a21dfe26c55c/merged
overlay 193G 25G 159G 14% /home/kevindd992002/docker/var-lib-docker/overlay2/6b47ff466d400750ad8cd9dffb86e415426a3f6b2cfc81e43befdf4fbb744304/merged
overlay 193G 25G 159G 14% /home/kevindd992002/docker/var-lib-docker/overlay2/505205bf611016752431128729b71ffbc2f13139f53f7ddcb7ed6b77befd7f3a/merged
overlay 193G 25G 159G 14% /home/kevindd992002/docker/var-lib-docker/overlay2/b99eec5fc2214896f6f23a788607149f579d217c15c6c11e996677c031313c2f/merged
overlay 193G 25G 159G 14% /home/kevindd992002/docker/var-lib-docker/overlay2/dfd78587070db6673a0b6c32e6cb38cc2b3a00ffd8cb272252f473a7cb96730e/merged
overlay 193G 25G 159G 14% /home/kevindd992002/docker/var-lib-docker/overlay2/ca999edab1cd3beb0bbd53795f02e6c768ee17108f5831ce79c978f20df0f178/merged
overlay 193G 25G 159G 14% /home/kevindd992002/docker/var-lib-docker/overlay2/73034525e50a1794c6ea849b0a1e595cca9ceaf592243c2f969025eb39a0653a/merged
overlay 193G 25G 159G 14% /home/kevindd992002/docker/var-lib-docker/overlay2/a0b81d42d80ede5627b0f29f9352c9664ae8e3a33791269903188060a9ff2a85/merged
overlay 193G 25G 159G 14% /home/kevindd992002/docker/var-lib-docker/overlay2/35b60600b6e354127b36965bb2930545fc4356cd7d9e464713fd7122b9a9250e/merged
overlay 193G 25G 159G 14% /home/kevindd992002/docker/var-lib-docker/overlay2/71c1393dd3d2b40e72bc5c6081d1123c9635d2fcd012d58b39e7727f8f38d80b/merged
overlay 193G 25G 159G 14% /home/kevindd992002/docker/var-lib-docker/overlay2/97a5e015299c33a497f3d4649333ea0981abb49f7f60b2646b879c894f15273b/merged
overlay 193G 25G 159G 14% /home/kevindd992002/docker/var-lib-docker/overlay2/8c75355d40afdac24635870bc92fae92927b02e020718c1ec63887ea85fba062/merged
tmpfs 3.2G 0 3.2G 0% /run/user/1000
@danny @noris The OOM killer just killed the process a few hours ago, and here's a new set of logs (three Dropbox links):
Again, sda1 is unmounted here, so you should not see any errors related to it. The last error I saw in /var/log/messages for /dev/sda1 was yesterday, before I unmounted it and ran fsck against it:
Apr 14 21:20:18 nuc kernel: [184845.834857] EXT4-fs (sda1): last error at time 1647358208: ext4_empty_dir:3005: inode 95815045
If you grep "nvme" (my OS disk), you shouldn't see any errors either:
root@nuc:~# cat /var/log/messages | grep nvme
Apr 11 04:57:11 nuc kernel: [ 2.021043] nvme nvme0: pci function 0000:3a:00.0
Apr 11 04:57:11 nuc kernel: [ 2.030490] nvme nvme0: 12/0/0 default/read/poll queues
Apr 11 04:57:11 nuc kernel: [ 2.033344] nvme0n1: p1 p2 p3 p4 p5 p6
Apr 11 04:57:11 nuc kernel: [ 3.604254] EXT4-fs (nvme0n1p2): mounted filesystem with ordered data mode. Opts: (null). Quota mode: none.
Apr 11 04:57:11 nuc kernel: [ 3.905338] EXT4-fs (nvme0n1p2): re-mounted. Opts: errors=remount-ro. Quota mode: none.
Apr 11 04:57:11 nuc kernel: [ 4.065875] Adding 1000444k swap on /dev/nvme0n1p4. Priority:-2 extents:1 across:1000444k SSFS
Apr 11 04:57:11 nuc kernel: [ 4.107595] EXT4-fs (nvme0n1p5): mounted filesystem with ordered data mode. Opts: (null). Quota mode: none.
Apr 11 04:57:11 nuc kernel: [ 4.123826] EXT4-fs (nvme0n1p3): mounted filesystem with ordered data mode. Opts: (null). Quota mode: none.
Apr 11 04:57:11 nuc kernel: [ 4.216924] EXT4-fs (nvme0n1p6): mounted filesystem with ordered data mode. Opts: (null). Quota mode: none.
Apr 11 20:02:49 nuc kernel: [ 1.397951] nvme nvme0: pci function 0000:3a:00.0
Apr 11 20:02:49 nuc kernel: [ 1.412712] nvme nvme0: 12/0/0 default/read/poll queues
Apr 11 20:02:49 nuc kernel: [ 1.415762] nvme0n1: p1 p2 p3 p4 p5 p6
Apr 11 20:02:49 nuc kernel: [ 2.971730] EXT4-fs (nvme0n1p2): mounted filesystem with ordered data mode. Opts: (null). Quota mode: none.
Apr 11 20:02:49 nuc kernel: [ 3.250802] EXT4-fs (nvme0n1p2): re-mounted. Opts: errors=remount-ro. Quota mode: none.
Apr 11 20:02:49 nuc kernel: [ 3.406679] Adding 1000444k swap on /dev/nvme0n1p4. Priority:-2 extents:1 across:1000444k SSFS
Apr 11 20:02:49 nuc kernel: [ 3.439309] EXT4-fs (nvme0n1p5): mounted filesystem with ordered data mode. Opts: (null). Quota mode: none.
Apr 11 20:02:49 nuc kernel: [ 3.440938] EXT4-fs (nvme0n1p3): mounted filesystem with ordered data mode. Opts: (null). Quota mode: none.
Apr 11 20:02:49 nuc kernel: [ 3.441442] EXT4-fs (nvme0n1p6): mounted filesystem with ordered data mode. Opts: (null). Quota mode: none.
Apr 12 17:59:32 nuc kernel: [ 1.396130] nvme nvme0: pci function 0000:3a:00.0
Apr 12 17:59:32 nuc kernel: [ 1.412406] nvme nvme0: 12/0/0 default/read/poll queues
Apr 12 17:59:32 nuc kernel: [ 1.415251] nvme0n1: p1 p2 p3 p4 p5 p6
Apr 12 17:59:32 nuc kernel: [ 2.955983] EXT4-fs (nvme0n1p2): mounted filesystem with ordered data mode. Opts: (null). Quota mode: none.
Apr 12 17:59:32 nuc kernel: [ 3.304213] EXT4-fs (nvme0n1p2): re-mounted. Opts: errors=remount-ro. Quota mode: none.
Apr 12 17:59:32 nuc kernel: [ 3.471838] Adding 1000444k swap on /dev/nvme0n1p4. Priority:-2 extents:1 across:1000444k SSFS
Apr 12 17:59:32 nuc kernel: [ 3.496214] EXT4-fs (nvme0n1p5): mounted filesystem with ordered data mode. Opts: (null). Quota mode: none.
Apr 12 17:59:32 nuc kernel: [ 3.496695] EXT4-fs (nvme0n1p3): mounted filesystem with ordered data mode. Opts: (null). Quota mode: none.
Apr 12 17:59:32 nuc kernel: [ 3.499688] EXT4-fs (nvme0n1p6): mounted filesystem with ordered data mode. Opts: (null). Quota mode: none.
Apr 15 01:52:55 nuc kernel: [ 1.437639] nvme nvme0: pci function 0000:3a:00.0
Apr 15 01:52:55 nuc kernel: [ 1.453691] nvme nvme0: 12/0/0 default/read/poll queues
Apr 15 01:52:55 nuc kernel: [ 1.456548] nvme0n1: p1 p2 p3 p4 p5 p6
Apr 15 01:52:55 nuc kernel: [ 3.012376] EXT4-fs (nvme0n1p2): mounted filesystem with ordered data mode. Opts: (null). Quota mode: none.
Apr 15 01:52:55 nuc kernel: [ 3.286245] EXT4-fs (nvme0n1p2): re-mounted. Opts: errors=remount-ro. Quota mode: none.
Apr 15 01:52:55 nuc kernel: [ 3.427827] Adding 1000444k swap on /dev/nvme0n1p4. Priority:-2 extents:1 across:1000444k SSFS
Apr 15 01:52:55 nuc kernel: [ 3.466271] EXT4-fs (nvme0n1p5): mounted filesystem with ordered data mode. Opts: (null). Quota mode: none.
Apr 15 01:52:55 nuc kernel: [ 3.468530] EXT4-fs (nvme0n1p3): mounted filesystem with ordered data mode. Opts: (null). Quota mode: none.
Apr 15 01:52:55 nuc kernel: [ 3.470367] EXT4-fs (nvme0n1p6): mounted filesystem with ordered data mode. Opts: (null). Quota mode: none.
Apr 15 02:14:21 nuc kernel: [ 1.382223] nvme nvme0: pci function 0000:3a:00.0
Apr 15 02:14:21 nuc kernel: [ 1.397743] nvme nvme0: 12/0/0 default/read/poll queues
Apr 15 02:14:21 nuc kernel: [ 1.400574] nvme0n1: p1 p2 p3 p4 p5 p6
Apr 15 02:14:21 nuc kernel: [ 2.967067] EXT4-fs (nvme0n1p2): mounted filesystem with ordered data mode. Opts: (null). Quota mode: none.
Apr 15 02:14:21 nuc kernel: [ 3.296254] EXT4-fs (nvme0n1p2): re-mounted. Opts: errors=remount-ro. Quota mode: none.
Apr 15 02:14:21 nuc kernel: [ 3.452466] Adding 1000444k swap on /dev/nvme0n1p4. Priority:-2 extents:1 across:1000444k SSFS
Apr 15 02:14:21 nuc kernel: [ 3.478669] EXT4-fs (nvme0n1p3): mounted filesystem with ordered data mode. Opts: (null). Quota mode: none.
Apr 15 02:14:21 nuc kernel: [ 3.480230] EXT4-fs (nvme0n1p5): mounted filesystem with ordered data mode. Opts: (null). Quota mode: none.
Apr 15 02:14:21 nuc kernel: [ 3.480586] EXT4-fs (nvme0n1p6): mounted filesystem with ordered data mode. Opts: (null). Quota mode: none.
Is this info enough for us to say that this isn't an OS issue?
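Beyond grepping for the device name, a broader sweep for generic kernel I/O and filesystem error signatures can back up the "no OS errors" claim. A sketch; the pattern is a guess at common kernel wordings, not exhaustive:

```shell
# Surface generic disk/filesystem error lines from a log stream.
io_errors() { grep -iE 'i/o error|ext4-fs error|error count since last fsck'; }

# Usage on the affected box:
#   io_errors < /var/log/messages
#   journalctl -k | io_errors
```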
noris
April 15, 2022, 8:47pm
#74
Hi @Kevin_Mychal_Ong ,
Thanks for the further checks here.
It is possible that something about that drive impacted the Roon database. Can you please confirm: if you set up a fresh database and hold off on importing content from the sda1 drive, does Roon still have the OOM issue?
Please confirm with a small library of content not on that drive, and then, after confirming, try to import content from the sda1 drive and see whether the system remains stable or the OOMs start then.