Ubuntu 16.04.3 / RoonServer/Unhandled Exception error


(Steven Neighorn) #23

Hi Noris, sorry for the delay. Well, your suggestion puts me in a bit of a spot. The server that it is running on is a general-purpose server with a lot of active interfaces, including a 2-gig bonded port and a couple of VPN tunnels. I could move it to a completely new server that is running Ubuntu 18 and test it there. I guess that is probably what you would suggest? I wanted it on its current server because that is pretty fast hardware, but I would really like to get this figured out. I realize you normally prefer a dedicated server for this purpose, which avoids issues with a lot of interfaces. I can try that in the next few days and let you know how it goes. Do I need to shut off the existing server to bring up the new one, or can they both be active at the same time?

list of interfaces, FYI:
bond0
eno1
eno1:1
eno1:2
eno1:3
eno1:4
eno1:5
eno1:6
eno1:7
eno1:8
eno1:9
eno2
eno2:1
eno2:2
eno2:3
eno2:4
eno2:5
eno2:6
eno2:7
eno2:8
eno3
eno4
lo
tun0
tun1


(Noris) #24

Hi @Steven_Neighorn,

Last time I spoke to QA, they indicated that this behavior was similar to another case involving multiple network interfaces, and eliminating this variable will be our best step forward here.

Yes, this suggestion would be a good diagnostics tool.

You can only have one active core per Roon subscription, so you can feel free to leave Roon installed on both but you will be presented with an “unauthorize” screen once you try to log in to the new core.

– Noris


(Steven Neighorn) #25

Thank you again Noris. I shut down the U16 server and installed Roon on an Ubuntu 18 system. One note: due to the complexity of my network (I have an “office” network and a “music/storage” network), this new system did have two interfaces. I thought, just guessing and hoping, that maybe the Roon server was crashing due to the number of interfaces I had on the U16 box. So this new box just had two, one for management/admin duties and one for the NFS-mounted storage device. Everything came up and I went ahead and added the NFS music library. It crashed 3 times in 12 hours, with the exact same error as before. So what I have just now done is completely remove any extra interfaces, so there is just one now; that interface has access to the NFS storage, and I can get to it from the Roon application as well. So now we are at the place you wanted me to test: one interface only. I will let you know how it goes. It will either continue to crash, or it will now run and I will know the issue is what you and your team suspected: issues with servers having multiple interfaces. Thank you.


(Steven Neighorn) #26

Hi Noris, very sorry to report that the crashes continue. Same errors as before. I am getting 1-2 a day now.

Apr 15 12:38:09 qiclab04 start.sh[3194]: Unhandled exception NetworkError at Os/OsWrapper.cpp:91 in thread SsdpListenerM
Apr 15 12:38:09 qiclab04 start.sh[3194]: /opt/RoonServer/Appliance/libohNet.so(+0x14ae50) [0x7f8f9810de50]
Apr 15 12:38:09 qiclab04 start.sh[3194]: /opt/RoonServer/Appliance/libohNet.so(_ZN8OpenHome9ExceptionC2EPKcS2_j+0x25) [0x7f8f980b1b85]
Apr 15 12:38:09 qiclab04 start.sh[3194]: /opt/RoonServer/Appliance/libohNet.so(+0x14a01a) [0x7f8f9810d01a]
Apr 15 12:38:09 qiclab04 start.sh[3194]: /opt/RoonServer/Appliance/libohNet.so(+0x12d630) [0x7f8f980f0630]
Apr 15 12:38:09 qiclab04 start.sh[3194]: /opt/RoonServer/Appliance/libohNet.so(+0xbe8cf) [0x7f8f980818cf]
Apr 15 12:38:09 qiclab04 start.sh[3194]: /opt/RoonServer/Appliance/libohNet.so(+0xbf385) [0x7f8f98082385]
Apr 15 12:38:09 qiclab04 start.sh[3194]: /opt/RoonServer/Appliance/libohNet.so(+0xb29be) [0x7f8f980759be]
Apr 15 12:38:09 qiclab04 start.sh[3194]: /opt/RoonServer/Appliance/libohNet.so(+0xb36e1) [0x7f8f980766e1]
Apr 15 12:38:09 qiclab04 start.sh[3194]: /opt/RoonServer/Appliance/libohNet.so(+0x1414d9) [0x7f8f981044d9]
Apr 15 12:38:09 qiclab04 start.sh[3194]: /opt/RoonServer/Appliance/libohNet.so(+0x14a978) [0x7f8f9810d978]
Apr 15 12:38:09 qiclab04 start.sh[3194]: /lib/x86_64-linux-gnu/libpthread.so.0(+0x76db) [0x7f90e316a6db]
Apr 15 12:38:09 qiclab04 start.sh[3194]: /lib/x86_64-linux-gnu/libc.so.6(clone+0x3f) [0x7f90e2c7b88f]
Apr 15 12:38:09 qiclab04 start.sh[3194]: /proc/self/maps:
Apr 15 12:38:09 qiclab04 start.sh[3194]: 00400000-0096e000 r-xp 00000000 fd:00 54394973 /opt/RoonServer/RoonMono/bin/mono-sgen
Apr 15 12:38:09 qiclab04 start.sh[3194]: 00b6d000-00b71000 rw-p 0056d000 fd:00 54394973 /opt/RoonServer/RoonMono/bin/mono-sgen
Apr 15 12:38:09 qiclab04 start.sh[3194]: 00b71000-00bc8000 rw-p 00000000 00:00 0
Apr 15 12:38:09 qiclab04 start.sh[3194]: 01088000-01bb3000 rw-p 00000000 00:00 0 [heap]
Apr 15 12:38:09 qiclab04 start.sh[3194]: 4122a000-4123a000 rwxp 00000000 00:00 0
Apr 15 12:38:09 qiclab04 start.sh[3194]: 41879000-41f75000 rwxp 00000000 00:00 0

Apr 15 12:38:09 qiclab04 start.sh[3194]: 7f90e392b000-7f90e3b2a000 ---p 0019d000 fd:00 4194776 /lib/x86_64-linux-gnu/libm-2.27.so
Apr 15 12:38:09 qiclab04 start.sh[3194]: 7f90e3b2a000-7f90e3b2b000 r--p 0019c000 fd:00 4194776 /lib/x86_64-linux-gnu/libm-2.27.so
Apr 15 12:38:09 qiclab04 start.sh[3194]: 7f90e3b2b000-7f90e3b2c000 rw-p 0019d000 fd:00 4194776 /lib/x86_64-linux-gnu/libm-2.27.so
Apr 15 12:38:09 qiclab04 start.sh[3194]: 7f90e3b2c000-7f90e3b53000 r-xp 00000000 fd:00 4194706 /lib/x86_64-linux-gnu/ld-2.27.so
Apr 15 12:38:09 qiclab04 start.sh[3194]: 7f90e3b53000-7f90e3b56000 rw-p 00000000 00:00 0
Apr 15 12:38:09 qiclab04 start.sh[3194]: 7f90e3b56000-7f90e3b73000 r--p 00000000 fd:00 54395187 /opt/RoonServer/Appliance/Base.dll
Apr 15 12:38:09 qiclab04 start.sh[3194]: 7f90e3b73000-7f90e3be1000 r--p 00000000 fd:00 54395186 /opt/RoonServer/Appliance/RoonBase.dll
Apr 15 12:38:09 qiclab04 start.sh[3194]: 7f90e3be1000-7f90e3c50000 r--p 00000000 fd:00 54395194 /opt/RoonServer/Appliance/Roon.Broker.Api.dll
Apr 15 12:38:09 qiclab04 start.sh[3194]: 7f90e3c50000-7f90e3cd3000 rw-p 00000000 00:00 0
Apr 15 12:38:09 qiclab04 start.sh[3194]: 7f90e3cd3000-7f90e3d31000 ---p 00000000 00:00 0
Apr 15 12:38:09 qiclab04 start.sh[3194]: 7f90e3d31000-7f90e3d39000 rw-p 00000000 00:00 0
Apr 15 12:38:09 qiclab04 start.sh[3194]: 7f90e3d39000-7f90e3d42000 r--p 00000000 fd:00 54395174 /opt/RoonServer/Appliance/Roon.Client.Models.dll
Apr 15 12:38:09 qiclab04 start.sh[3194]: 7f90e3d42000-7f90e3d43000 rw-p 00000000 00:00 0
Apr 15 12:38:09 qiclab04 start.sh[3194]: 7f90e3d43000-7f90e3d47000 r--p 00000000 fd:00 54395136 /opt/RoonServer/Appliance/RoonAppliance.exe
Apr 15 12:38:09 qiclab04 start.sh[3194]: 7f90e3d47000-7f90e3d52000 rw-p 00000000 00:00 0
Apr 15 12:38:09 qiclab04 start.sh[3194]: 7f90e3d52000-7f90e3d53000 rw-s 00000000 00:19 4 /dev/shm/mono.27410
Apr 15 12:38:09 qiclab04 start.sh[3194]: 7f90e3d53000-7f90e3d54000 r--p 00027000 fd:00 4194706 /lib/x86_64-linux-gnu/ld-2.27.so
Apr 15 12:38:09 qiclab04 start.sh[3194]: 7f90e3d54000-7f90e3d55000 rw-p 00028000 fd:00 4194706 /lib/x86_64-linux-gnu/ld-2.27.so
Apr 15 12:38:09 qiclab04 start.sh[3194]: 7f90e3d55000-7f90e3d56000 rw-p 00000000 00:00 0
Apr 15 12:38:09 qiclab04 start.sh[3194]: 7fff9ea6f000-7fff9ea90000 rw-p 00000000 00:00 0 [stack]
Apr 15 12:38:09 qiclab04 start.sh[3194]: 7fff9eaaa000-7fff9eaad000 r--p 00000000 00:00 0 [vvar]
Apr 15 12:38:09 qiclab04 start.sh[3194]: 7fff9eaad000-7fff9eaaf000 r-xp 00000000 00:00 0 [vdso]
Apr 15 12:38:09 qiclab04 start.sh[3194]: ffffffffff600000-ffffffffff601000 r-xp 00000000 00:00 0 [vsyscall]
Apr 15 12:38:09 qiclab04 start.sh[3194]: Native stacktrace:
Apr 15 12:38:09 qiclab04 start.sh[3194]: #011/opt/RoonServer/RoonMono/bin/RoonAppliance() [0x5064e6]
Apr 15 12:38:09 qiclab04 start.sh[3194]: #011/opt/RoonServer/RoonMono/bin/RoonAppliance() [0x5ca974]
Apr 15 12:38:09 qiclab04 start.sh[3194]: #011/lib/x86_64-linux-gnu/libpthread.so.0(+0x12890) [0x7f90e3175890]
Apr 15 12:38:09 qiclab04 start.sh[3194]: #011/lib/x86_64-linux-gnu/libc.so.6(gsignal+0xc7) [0x7f90e2b98e97]
Apr 15 12:38:09 qiclab04 start.sh[3194]: #011/lib/x86_64-linux-gnu/libc.so.6(abort+0x141) [0x7f90e2b9a801]
Apr 15 12:38:09 qiclab04 start.sh[3194]: #011/opt/RoonServer/Appliance/libohNet.so(+0x14ae29) [0x7f8f9810de29]
Apr 15 12:38:09 qiclab04 start.sh[3194]: #011/opt/RoonServer/Appliance/libohNet.so(+0xeefc3) [0x7f8f980b1fc3]
Apr 15 12:38:09 qiclab04 start.sh[3194]: #011/opt/RoonServer/Appliance/libohNet.so(+0x1416c7) [0x7f8f981046c7]
Apr 15 12:38:09 qiclab04 start.sh[3194]: #011/opt/RoonServer/Appliance/libohNet.so(+0x14a978) [0x7f8f9810d978]
Apr 15 12:38:09 qiclab04 start.sh[3194]: #011/lib/x86_64-linux-gnu/libpthread.so.0(+0x76db) [0x7f90e316a6db]
Apr 15 12:38:09 qiclab04 start.sh[3194]: #011/lib/x86_64-linux-gnu/libc.so.6(clone+0x3f) [0x7f90e2c7b88f]
Apr 15 12:38:09 qiclab04 start.sh[3194]: =================================================================
Apr 15 12:38:09 qiclab04 start.sh[3194]: Got a SIGABRT while executing native code. This usually indicates
Apr 15 12:38:09 qiclab04 start.sh[3194]: a fatal error in the mono runtime or one of the native libraries
Apr 15 12:38:09 qiclab04 start.sh[3194]: used by your application.
Apr 15 12:38:09 qiclab04 start.sh[3194]: =================================================================
Apr 15 12:38:09 qiclab04 start.sh[3194]: Error
Apr 15 12:38:11 qiclab04 start.sh[3194]: Initializing
Apr 15 12:38:11 qiclab04 start.sh[3194]: Started
Apr 15 12:38:12 qiclab04 start.sh[3194]: Not responding
Apr 15 12:38:13 qiclab04 start.sh[3194]: aac_fixed decoder found, checking libavcodec version...
Apr 15 12:38:13 qiclab04 start.sh[3194]: has mp3float: 1, aac_fixed: 1
Apr 15 12:38:17 qiclab04 start.sh[3194]: Running
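
For anyone wanting to track how often this happens, a minimal Python sketch along these lines could pull the crash header lines out of a saved log and tally them per day. It is only an illustration based on the line format shown above; the function names and the regular expression are my own, and where the log is stored on a given system is an assumption left to the reader.

```python
import re
from collections import Counter

# Matches the first line of each crash dump, in the syslog format shown above.
# The pattern is an assumption derived from that single sample line.
CRASH_RE = re.compile(
    r"^(?P<stamp>\w{3}\s+\d+ \d{2}:\d{2}:\d{2}) (?P<host>\S+) \S+: "
    r"Unhandled exception (?P<kind>\w+) at (?P<where>\S+) in thread (?P<thread>\S+)"
)

def find_crashes(lines):
    """Yield one dict per 'Unhandled exception' header line."""
    for line in lines:
        match = CRASH_RE.match(line)
        if match:
            yield match.groupdict()

def crashes_per_day(lines):
    """Count crashes keyed by the 'Mon DD' part of the syslog timestamp."""
    counts = Counter()
    for crash in find_crashes(lines):
        month, day, _time = crash["stamp"].split()
        counts[f"{month} {day}"] += 1
    return counts

# Two lines copied from the log above: one crash header, one ordinary message.
sample = [
    "Apr 15 12:38:09 qiclab04 start.sh[3194]: Unhandled exception NetworkError "
    "at Os/OsWrapper.cpp:91 in thread SsdpListenerM",
    "Apr 15 12:38:11 qiclab04 start.sh[3194]: Initializing",
]
crashes = list(find_crashes(sample))
```

Passing an open log file instead of `sample` works the same way, since both functions only need an iterable of lines. Comparing the `thread` and `where` fields across crashes would show whether every crash really lands in the same spot.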


(Noris) #27

Hi @Steven_Neighorn,

Thank you for giving that a try, and sorry to hear that the issue is still present. I can confirm the new Core’s diagnostics have been received by our servers, and I have forwarded them to QA along with the changes you made to the network interfaces. I will be sure to let you know once I hear back.

Thanks,
Noris


(Noris) #28

Hi @Steven_Neighorn,

I just spoke to the QA team regarding your case again. There are a few points which could use clarification:

  • You mentioned that you switched over to Ubuntu 18 and it is still crashing. Is this OS located on the same machine as Ubuntu 16 or are they physically two different machines?

  • What kind of machine are you using? Can you let me know the model/manufacturer/specs?

  • Is this a VM, Docker or normal install?

  • Were there any changes to Ubuntu in the network stack?

  • Are you using any additional network-related software?

  • If you disconnect the network from the box that’s crashing and operate Roon with just a local library and no network connection, do you still experience the crashes?

Thanks,
Noris


(Steven Neighorn) #29

The Ubuntu 18 system is a brand new (to me) server. I had installed it as a testbed for U18 since I have not switched to it on any of my other servers yet. This is a completely new server (physically different than the U16 server I first used).

This is a Dell R720.
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 40
On-line CPU(s) list: 0-39
Thread(s) per core: 2
Core(s) per socket: 10
Socket(s): 2
NUMA node(s): 2
Vendor ID: GenuineIntel
CPU family: 6
Model: 62
Model name: Intel(R) Xeon(R) CPU E5-2690 v2 @ 3.00GHz
Stepping: 4
CPU MHz: 1662.568
CPU max MHz: 3600.0000
CPU min MHz: 1200.0000
BogoMIPS: 6000.02
Virtualization: VT-x
L1d cache: 32K
L1i cache: 32K
L2 cache: 256K
L3 cache: 25600K
NUMA node0 CPU(s): 0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38
NUMA node1 CPU(s): 1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39

[0:2:0:0] disk DELL PERC H710P 3.13 /dev/sda
[0:2:1:0] disk DELL PERC H710P 3.13 /dev/sdb
[0:2:2:0] disk DELL PERC H710P 3.13 /dev/sdc
[0:2:3:0] disk DELL PERC H710P 3.13 /dev/sdd
[0:2:4:0] disk DELL PERC H710P 3.13 /dev/sde
[0:2:5:0] disk DELL PERC H710P 3.13 /dev/sdf
[0:2:6:0] disk DELL PERC H710P 3.13 /dev/sdg
[5:0:0:0] cd/dvd PLDS DVD-ROM DS-8DBSH CD51 /dev/sr0
Filesystem 1K-blocks Used Available Use% Mounted on
udev 132015664 0 132015664 0% /dev
tmpfs 26409692 3164 26406528 1% /run
/dev/mapper/qiclab04--vg-root 1920282912 71236100 1751431964 4% /
tmpfs 132048448 12 132048436 1% /dev/shm
tmpfs 5120 0 5120 0% /run/lock
tmpfs 132048448 0 132048448 0% /sys/fs/cgroup
qiclab4-pool01 37756567296 0 37756567296 0% /qic04pool
wdex4100:/nfs/media 17499105024 7579037184 9744361856 44% /mnt/media
tmpfs 26409688 0 26409688 0% /run/user/0
tmpfs 26409688 0 26409688 0% /run/user/1000

No VM or Docker; this is a normal install. I used the Roon install script.

No changes to the network stack. Hardly anything is on this system. I did a basic server install.

No other network-related programs beyond the default apache stuff.

This system doesn’t have a local library. I could sync my network drive (nearly 8TB) to a local disk and point Roon at that, but then I couldn’t reach the box remotely. Maybe I don’t fully understand the question about running with no network connection.


(Steven Neighorn) #30

I don’t know if it is any help, but I could send you the complete crash log and also the RoonServer log from right before the crash and right after. The server crashed again today at 3:02 PM. The last thing logged pre-crash was a music file check, and the next thing logged is the server starting up.

I will also mention, but this is probably a SEPARATE topic, that I have a music file collection of over 300K songs, but about 50K of them are considered “corrupt” by Roon. Other programs like Audirvana and MediaMonkey are able to see and play all those songs. I was going to open another support thread after the server crash issue was solved (hopefully). I very much doubt the music collection is at issue because we removed it and just served the access to my Tidal library and it still crashed.

More food for thought I guess.


(Noris) #31

Hi @Steven_Neighorn,

Thank you for letting me know that information. Sure, the complete crash logs might help. If you can send them over to me as a shared Dropbox/Google Drive/Firefox Send link, I can get them added to the case notes.

The question about disconnecting the network was this: if you have just a keyboard, mouse, and monitor attached to the machine, with no LAN connection and all of the network interfaces disabled, do the crashes still occur?

The best way to go about this would be to have a small sample of your library on a USB drive and see if the same issues occur.

A few other notes from the technical team:

  1. Have you also restored your Roon database on the new machine, or was the backup not even restored for the new build?

  2. Since it’s crashing in the same place on two different systems, there must be something that is the same between the two systems; we just need to figure out what this is.

  3. The crash report says it’s in thread “SsdpListenerM”. SSDP is the protocol used to discover UPnP devices. That might be a place to look for things the two systems have in common. Would any of your Apache add-ons possibly be using these protocols?

Thanks,
Noris
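
As a rough way to follow up on the SSDP point above, the sketch below sends a single standard SSDP M-SEARCH request to the well-known multicast group (239.255.255.250, UDP port 1900) and lists which hosts answer. It is a generic probe, not Roon or ohNet code, and it does not explain the crash by itself; it only shows which UPnP devices and services respond to discovery on the network the server sits on. The function names, the timeout, and the TTL value are illustrative assumptions.

```python
import socket

# Well-known SSDP multicast group and port used for UPnP discovery.
SSDP_ADDR = "239.255.255.250"
SSDP_PORT = 1900

def build_msearch(mx=2, search_target="ssdp:all"):
    """Build a standard SSDP M-SEARCH request as bytes."""
    lines = [
        "M-SEARCH * HTTP/1.1",
        f"HOST: {SSDP_ADDR}:{SSDP_PORT}",
        'MAN: "ssdp:discover"',
        f"MX: {mx}",            # max seconds a device may wait before replying
        f"ST: {search_target}", # search target; ssdp:all asks everything to answer
        "",
        "",                     # the request ends with an empty line
    ]
    return "\r\n".join(lines).encode("ascii")

def discover(timeout=3.0):
    """Send one M-SEARCH and return {responder_ip: first_response_line}."""
    responders = {}
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP) as sock:
        sock.settimeout(timeout)
        sock.setsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_TTL, 2)
        sock.sendto(build_msearch(), (SSDP_ADDR, SSDP_PORT))
        try:
            while True:
                data, (host, _port) = sock.recvfrom(65507)
                status = data.split(b"\r\n", 1)[0].decode("ascii", "replace")
                responders.setdefault(host, status)
        except socket.timeout:
            pass  # no reply within the timeout: assume discovery is finished
    return responders

# Example use on the affected box (needs a working network interface):
#     for host, status in sorted(discover().items()):
#         print(host, status)
```

Running it on both the Ubuntu 16 and Ubuntu 18 machines and comparing the responders could help narrow down what the two systems have in common on the SSDP side.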