Qua Continuum, Continuum One causes Roon Server to crash during Audio Analysis (see February 3, 2019 posts)

Hey there @Noris, I’ve just uploaded a new RoonServer_log201901161200.txt file. I restarted the Roon Server at 11:41 this morning, waited till it seemed to stabilize and then started up the iPhone Roon Client app at 11:49am. The iPhone app was unable to connect. The Roon Server did a whole bunch of stuff and then settled into its endless status update messages.

Casey

Hey @Casey_Leedom,

Thanks for providing that info. I have submitted these notes in your internal case and have added your case to be discussed by the senior devs again, I will let you know their thoughts after I hear back.

– Noris

Thanks @noris!

By the way, I ~think~ I’ve got the DNS stuff worked out. But I’ll work with a friend who’s much more familiar with bind9.

Casey

1 Like

Hey @noris,

So I had the bright idea to go and look at /var/log/syslog right at the time when the RoonServer “spontaneously restarted” (it was archived into syslog.2.gz by this afternoon). And in there I found evidence of a crash of the Roon Server. I’ve uploaded that as RoonServer_log.01-201901161430.txt.syslog …

Casey

Thanks Casey, @noris and I were discussing this issue earlier.

Can you run dmesg on this machine and dump us the output as well?

Will do! I’ve done that in the past and didn’t see anything. I didn’t remember to look at /var/log/syslog till today — dmesg is used predominantly for messages from the kernel, while syslog contains important messages from all services (including the kernel).

I’ll spend some time reproducing the issues and generate complete correlated data sets of Roon Server Logs, dmesg and syslog output. To recap, I see two different phenomenologies:

  1. The Roon Server just stops responding to the Roon Controller, but doesn’t restart and I see an endless series of “Status Messages” in the Roon Server Log. Usually there’s something “interesting” in the Roon Server Log right around the time the Roon Server stoped responding.

  2. The Roon Server spontaneously restarts, rotates its Logs, etc. This afternoon was the first time I thought to check /var/log/syslog and discovered that one of those restarts was correlated with a process crash …

These are probably all related — the Roon Server was running flawlessly (and awesomely) for just about 13 months after I installed it. I’ll check to see what new music was installed near that time. Maybe we’re overflowing a File Pathname array variable or something silly like that …

Casey

Hi @noris, Is there any progress on debugging this at your end? The Roon Server was stable and mostly working for almost three days starting on Saturday and then it started flaking out again either stalling or spontaneously rebooting with a SEGV. The new 1.6 Roon Server just loaded and I had hoped that would change things, but it’s hitting SEGV as well.

In the mean time, I had noticed that the Roon Server was running the system out of notify nodes, so I boosted that from 8192 to 1048576 via:

root@hypatia:/var/log echo 1048576 >/proc/sys/fs/inotify/max_user_watches

thinking that there might be some bugs in the Roon Server when its efforts to get notify nodes failed. But no luck there.

Casey

I’m seeing the following error coming from my currently stalled 1.6 Roon Server in its Logs:

...
01/23 20:40:20 Info: [stats] 5717mb Virtual, 1414mb Physical, 791mb Managed, 0 Handles, 137 Threads
01/23 20:40:35 Info: [stats] 5717mb Virtual, 1414mb Physical, 791mb Managed, 0 Handles, 137 Threads
01/23 20:40:49 Warn: Error in web request https://push.roonlabs.com/push/1/connect: NetworkError (Aborted.)
01/23 20:40:49 Trace: [push] request to manager failed
01/23 20:40:49 Trace: [push] retrying connection in 175569ms
01/23 20:40:50 Info: [stats] 5717mb Virtual, 1414mb Physical, 791mb Managed, 0 Handles, 138 Threads
01/23 20:41:05 Info: [stats] 5717mb Virtual, 1414mb Physical, 791mb Managed, 0 Handles, 137 Threads
...

When I do a wget to push.roonlabs.com I get a 301 response: “Permanently moved” …

Also, would it be of any use to send in the current SEGV for the 1.6 Roon Server?

Casey

Hi @Casey_Leedom,

Apologies for the slow response on our part, Roon 1.6 release was a little busy :slight_smile:

I have a meeting scheduled tomorrow with our senior devs to discuss your case and make sure to get you the best possible info here. If you think the SEGV may help please do send it to us.

I really do appreciate your patience and will let you know as soon as I have more info or any follow up steps.

Thanks,
Noris

Okay @noris, no problem! I know how busy releases can make you. Congratulations on the new streaming integration with Qobuz!

I’ve uploaded the last SEGV from last night with the 1.6 Roon Server as the file “syslog-segv-1.6”. I haven’t been able to get it to do that tonight in a short amount of testing. Tonight it’s just stalled and failing to play anything. (sigh)

Casey

Hello @Casey_Leedom,

Thanks for syslogs. Those traces look like a crash during file audio analysis. Would you mind to share your Roon Core logs, because we are failing to receive them through our diagnostic tool.
If this is an analysis crash we can try to identify the cause.

Hi @vova,

Sorry for the delay in answering — I was at an all-day conference yesterday followed by a lecture in the evening.

So, are you talking about the Roon Server Logs in /var/roon/RoonServer/Logs/ or the RAAT Server Logs in /var/roon/RAATServer/Logs/? I’ll try to get another SEGV crash and provide matching logs of /var/logs/syslog, the above two, and anything else. The Roon Server behavior when this happens is a restart so I’ll need to send you two Roon Server Logs: the previous one RoonServer_log.01.txt and the current one RoonServer_log.txt. The previous one will simply abruptly end, and the current one will show a normal startup.

Trying to reproduce now!

Casey

Okay, so here is a case where the behavior was that I started the Roon Server at “01/25 10:04:24”, waited till it settled into a steady state at “01/25 10:12:56” when I started up the Roon Controller app on my iPhone. I selected “most recent” and then tried to play the 96kHz/24bit copy of Robert Rich Tactile Ground. Nothing happened. I stopped the Roon Server at “01/25 10:16:15”. I’ve uploaded /var/log/syslog for this period, and the complete /var/roon/RoonServer/Logs/RoonServer_log.txt and /var/roon/RAATServer/Logs/RAATServer_log.txt as files with the same names appended with “-01252019-stalled”.

Interestingly, in the syslog excerpt, we see the following:

Jan 25 10:11:12 hypatia start.sh[5614]: Stacktrace:
Jan 25 10:11:12 hypatia start.sh[5614]:   at <unknown> <0xffffffff>

but the RoonServer_log.txt continues with its business. So it’s like a thread of the Roon Server crashed but didn’t completely hose the Roon Server. Unfortunately, the stack trace is pretty worthless.

I’ll continue trying to reproduce the issue.

Casey

And I’ve captured a crash @noris, @vova! The files are named:

/tmp/RAATServer_log.01.txt-01252019-crashed
/tmp/RAATServer_log.txt-01252019-crashed
/tmp/RoonServer_log.01.txt-01252019-crashed
/tmp/RoonServer_log.txt-01252019-crashed
/tmp/syslog-01252019-crashed

The “.01.txt” files represents the logs of the Roon/RAAT Server crash and the ones without, are the next iteration of the Roon/RAAT Servers which were automatically restarted. I immediately stopped the new servers so I could grab everything. The “syslog-01252019-crashed” is the excerpt of /var/log/syslog covering this period.

This time when I started the Roon Controller app on my iPhone I simply selected “Play” for the Robert Rich album (which was the remembered selection). It played for a bit and then abruptly stopped, coinciding with the SEGV crash and restart of the servers.

Hopefully these will provide some clue as to what’s going on. Roon is fundamentally unusable for me and has been for most of the last weeks since I initially reported this …

Casey

Ah, small correction: The “RoonServer_log.txt-01252019-crashed” appears to be from the crash, not the “RoonServer_log.01.txt-01252019-crashed”. I guess the Roon Server hadn’t gotten restarted yet and rotated the logs before I stopped it. Sorry for the confusion!

Casey

Also, one final issue of interest on the “crashed” files: when I selected the Robert Rich album to play, it took an incredibly long time before it actually started to play. Perhaps 15s. And the overall experience has been very sluggish as well with lots of pauses, etc. It was quite snappy before all of this started happening.

Casey

Ah, never mind on the long startup. That’s the recording. But the rest of the comments regarding sluggishness are right.

Casey

So I have quite a few different stack traces from today. They all start with:

Stacktrace:
at <unknown> <0xffffffff>
... { different subsequent next trace backs [ some with nothing more ] } ...

Some of the stack traces appear to represent a thread streaming content, others doing HTTP requests and still others Heap Management. I can of course collect all of the stack traces over the last few days and provide them if you think this will help.

If I had to guess, this feels like random memory corruption. Maybe because of the size of our collection (110089 tracks) or possibly because of the size of a particular artists contribution (Bob Dylan, 72 albums), or possibly because of UTF-16 characters in filenames triggering a problem (e.g. " Naomi Berrill - <E3><80><90>NEW<E3><80><91>To the Sky(ETH04) - 01 Lady Lighthouse.flac" added on December 11, 2018), etc. Do we have a special DEBUG version of the software that I can run to try to suss out what’s going on?

Casey

@noris, @vova,

Huh, now here’s a particularly interesting set of traces. This one is a SIGABRT which was triggered by a Roon Server ASSERT. From the syslog I just uploaded (see below) we have:

Jan 25 19:30:40 hypatia start.sh[9122]: mono_class_from_mono_type: implement me 0x23
Jan 25 19:30:40 hypatia start.sh[9122]: mono_class_from_mono_type: implement me 0x23
Jan 25 19:30:40 hypatia start.sh[9122]: * Assertion: should not be reached at class.c:6467
Jan 25 19:30:40 hypatia start.sh[9122]: * Assertion: should not be reached at class.c:6467

And the Roon Server Log has:

...
01/25 19:30:40 Trace: [zone MacBook] Playing 11 Items
01/25 19:30:40 Trace: [zone MacBook] Selecting Source state=Stopped
01/25 19:30:40 Trace: [zone MacBook] Playing TransportItem
01/25 19:30:40 Trace: [zone MacBook] Selecting Source state=Stopped
01/25 19:30:40 Trace: [MacBook] [Inactive] [LOADING @ 0:00] Lady Lighthouse - Naomi Berrill
01/25 19:30:40 Trace: [musicpowerstate] music is playing, preventing idle sleep
01/25 19:30:40 Info: [MacBook] [zoneplayer] Playing: /music/flac/Naomi Berrill/To the Sky/Naomi Berrill - <E3><80><90>NEW<E3><80><91>To the Sky(ETH04) - 01 Lady Lighthouse.flac
01/25 19:30:40 Info: [MacBook] [zoneplayer] Queueing: /music/flac/Naomi Berrill/To the Sky/Naomi Berrill - <E3><80><90>NEW<E3><80><91>To the Sky(ETH04) - 02 Sparkling Sea.flac
01/25 19:30:40 Error: [decoder.trycreate] unexpected error: System.InvalidCastException: Specified cast is not valid.
  at Sooloos.Media.CustomMediaFile.CallbackableLengthCallback (System.IntPtr userdata, System.Int64& out_length) [0x0000e] in <db6aa5e4fc674a67bae693437c8149ae>:0 
  at (wrapper managed-to-native) Sooloos.Media.Native.soom_decoder_get_params(intptr,int&,int&,int&,int&,Sooloos.Audio.StreamChannelLayout&,long&)
  at Sooloos.Media.Decoder.TryCreate (System.String filename_or_url, Sooloos.Media.MediaFile mediafile, System.Int32 seek_ms, Sooloos.Media.Decoder& decoder) [0x00048] in <db6aa5e4fc674a67bae693437c8149ae>:0 
01/25 19:30:40 Error: [decoder.trycreate] unexpected error: System.InvalidCastException: Specified cast is not valid.
  at Sooloos.Media.CustomMediaFile.CallbackableLengthCallback (System.IntPtr userdata, System.Int64& out_length) [0x0000e] in <db6aa5e4fc674a67bae693437c8149ae>:0 
  at (wrapper managed-to-native) Sooloos.Media.Native.soom_decoder_get_params(intptr,int&,int&,int&,int&,Sooloos.Audio.StreamChannelLayout&,long&)
  at Sooloos.Media.Decoder.TryCreate (System.String filename_or_url, Sooloos.Media.MediaFile mediafile, System.Int32 seek_ms, Sooloos.Media.Decoder& decoder) [0x00048] in <db6aa5e4fc674a67bae693437c8149ae>:0 

Additionally, the Roon Server was stuck and sucking up 100% of a CPU.

I’ve uploaded these files as:

RAATServer_log.01.txt-01252019-abort
RoonServer_log.01.txt-01252019-abort
syslog-01252019-abort

Casey

Hello @Casey_Leedom,

Thanks for getting back to us. We are going to talk to our dev team on Monday regarding your latest crashes, since they are pretty different from what we saw in a previous report.

Your library size or amount of albums associated with Bob Dylan shouldn’t be an issue. I could look at few media examples where you were using special chars, just to make sure they are being processed fine, but this also shouldn’t be an issue.

There is no debug version and there is absolutely no need to debug Roon on your own. My biggest problem at the moment is that you are getting too many different and unrelated to each other ( at glance ) issues at the same time, as I said earlier we will discuss all those crashes with the dev team and will try to get to the bottom of this issue.