Playback failing when playing MQA Content to multiple Grouped Rendu Devices

I’ve been seeing errors like this recently:

Things which I believe have always been true when this has happened:

  • The track was MQA-encoded (whether on local disk or coming from TIDAL).

  • The group of zones I had synchronized contained one or more non-MQA DACs which I was asking Roon to do the MQA first unfold for, plus one actual native hardware MQA DAC (a Meridian Prime, FWIW).

I don’t think this was happening before Roon 1.5, but I can’t be 100% sure. It’s definitely pretty recent.

In these cases, Roon fails to play any MQA tracks which are in the Queue and, depending on how many sequential failures there were, either continues playing at the first non-MQA track or gives up entirely.

All DACs in question are being fed by assorted Rendu models (micro, ultra, or signature) all at their latest firmware.

These tracks will all play with no problem if I send them to either all the non-MQA DACs or just the MQA DAC.

In case this is causing trouble: note that many releases ago I manually set Clock Master Priority values for all zones, giving the lowest to the zones whose sound I cared about the most, but reserving the absolutely lowest in use (in my case, 2) for the native MQA DAC - this in an attempt to guarantee that the MQA device would get an uncorrupted bitstream. Is this likely to be a cause of problems, and is it still necessary (or do you now have special provisions built in to make sure MQA streams are delivered undamaged)?

Hello @Jeffrey_Moore,

Thanks for contacting support, I’d be happy to take a look at this issue with you. Can you please provide more information regarding your Core and your network setup here? Please let me know the make/manufacturer of the Core and how it is connected to your DACs: is it Ethernet to the Rendu models and then USB to the DACs? Please list all of the DACs present in this setup as well.

Additionally, I have gone ahead and enabled diagnostics mode on your account. The next time your Core is active, a set of logs will automatically be generated and uploaded to our servers for analysis. I kindly ask you to attempt to replicate this issue, note the exact local time in your country (ex. 4:10PM), and let me know the time of the failure. We can then cross-reference the time you notice the issue with the logs and see if there are any specific kinds of errors being output. Please let me know when possible.

Thanks,
Noris

While trying to reproduce this so that something would get logged in diagnostics mode, I discovered that I had apparently found a bad correlation. While trying to simplify, I found that I was indeed now able to play MQA material to the native MQA device and to non-MQA devices which were having the MQA core unfold done for them, all grouped together, as long as the number of zones grouped together was small.

Failed:

Didn’t fail:

…so, maybe this is unrelated to whether or not there’s a native MQA device in there, and is more like this oldie but goodie:

[earlier bug](https://community.roonlabs.com/t/playback-significantly-less-reliable-recently-more-frequent-an-audio-file-is-loading-slowly-errors/34465)

Sure. I guess I should save myself a boilerplate system description for use in bug reports, because it feels like I’ve done this a lot.

RoonServer 1.5 (334) is running on an Ubuntu 16.04 LTS system, Supermicro motherboard with Xeon E3-1270 v3 CPU, 32GB of memory.

Audio tracks are stored in a filesystem local to that machine, in a ZFS pool.

Networking is all wired Gigabit (except for wired 100Mb to one zone: Bedroom).

Network hardware is UniFi. IGMP snooping is currently turned off, because it’s been possibly-implicated in Android app flakiness.

The micro-, ultra-, and signature rendu devices are currently in their own VLAN, which RoonServer has to my delight been happily feeding music to via a second ethernet interface on the Core box which is talking only to that VLAN; but I don’t believe the current problem could be related to that because the problem began occurring while everything was still on the one main VLAN. For the purposes of reading logs you’ll pull: 10.0.1.* is the main VLAN, which user-interface clients are using to control RoonServer; but you’ll see the devices being played to in 10.0.22.*

These are the zones being played to:

The one called “Office” is a Meridian Prime, the only native MQA device; “Bedroom” is a Schiit Yggdrasil being fed via its Gen5 USB interface; “Kitchen” is a Schiit Bifrost via its Gen5 USB interface; the others use USB-to-S/PDIF boxes (a couple of Schiit Eitrs, one Matrix X-Spdif) to feed various Meridian, TacT and Wadia gear, but all of those are out of sight to Roon behind the S/PDIF interfaces.

Known failures at 20:05:06, 20:05:26, and 20:05:37 EDT today, 15 July - because those are the timestamps on screenshots I took while the error messages were still up.

Let me know if I need to send you logs – in the past, you’ve had trouble pulling logs directly from my system for some reason.

Hello @Jeffrey_Moore,

Thank you for taking the time to write that comprehensive description of your setup and performing those group tests.

I can confirm that we have received the necessary diagnostics information from your machine and I have started a case for you with our QA team, who will be taking a closer look at what could be going wrong here.

Just in case this hasn’t been mentioned yet, I would also recommend rebooting your Rendus, Core and any relevant networking equipment to see if that changes the zone grouping behavior. I would also double check the Signal Path information as you are adding these zones, just to confirm that the processing speed doesn’t drop too low. With the Core you describe it shouldn’t, but I would like to be sure if possible.

Thanks,
Noris

Oh, good!

The Rendus all got rebooted a couple of days ago to encourage them to ask DHCP for their new IP addresses; this issue occurred both before and after that operation.

The Core host hasn’t been rebooted in a while; I can do that for giggles.

[What most often helps when things get sluggish is not necessarily rebooting the Core’s host, but just restarting RoonServer. There’s still a tendency for RoonServer’s memory footprint to increase over time, and it seems to be more snappy and forgiving after a restart (once any post-start rescans have finished).]

I didn’t see processing speed drop below something over 20x yesterday, and that was only when transcoding from DSD was going on followed by PCM resampling (to different target rates for different zones) all at the same time. There’s plenty of processing power, and the simple MQA unfold going on when things went funny is, I believe, way less expensive than the other pipeline described above.

Hello @Jeffrey_Moore,

I have just received a reply from the QA team asking if you can please run a test:

Can you please group the following 3 zones together

  • Bedroom
  • Living room
  • Office

and attempt to play back these tracks

/z/nonitunes/indexed/purchased/lossless/MQA/OnkyoMusic/Herbert Von Karajan/Beethoven_Symphonies_Nos_1-9_n_Overture/Herbert Von Karajan - 29. Symphony No. 6 in F Major, Op.flac
/z/nonitunes/indexed/purchased/lossless/MQA/OnkyoMusic/Herbert Von Karajan/Beethoven_Symphonies_Nos_1-9_n_Overture/Herbert Von Karajan - 30. Symphony No. 6 in F Major, Op.flac

Please let me know if that group of 3 zones works.

Afterwards, add the Kitchen to the group and attempt to play those same tracks now with 4 devices in the group and let me know if that works as well. Please let me know your findings when possible and I will forward your results to the QA team.

Thanks,
Noris

Will do after I get back home!

That was cannily judged by QA!

{Bedroom, Living Room, Office} did indeed consent to play those tracks, but when I added the Kitchen zone:

(at 00:20:56). And just because I was curious, I tried that last play again after restarting RoonServer fresh - no change, same failure.

Now I’m wondering if they noticed something about the run rates of the master clocks at those zones, such that something pessimal happened in the 4-zone set…

Anyway, this pinpoint predicting by QA of no-fail / fail cases gives me a good feeling!

I hope the RoonServer restart didn’t screw up your log capture.

Hello @Jeffrey_Moore,

Thank you for running those tests for us. I can confirm we have just received the new diagnostic info from your machine and I have sent it to QA for further analysis which will hopefully show why adding that Kitchen Zone causes issues.

No worries about the RoonServer restart; the information you provided before doing that should be sufficient to give us a few more clues as to what’s going wrong. I will update you after I receive the report back from QA and let you know what the next troubleshooting steps are here. I appreciate your patience in the meantime!

Thanks,
Noris

Hello @Jeffrey_Moore,

I have some good news: we know where the issue lies and have been able to reproduce it. A ticket has been submitted to our dev team, and while we don’t have a firm time-frame for when the fix will be released, we are hoping to resolve this issue soon. I or someone else from the support team will update you once we have any more information to share.

Thanks,
Noris

Excellent! This is exactly how I hope bug-hunting will turn out.

I suspected y’all were hot on the trail when you asked me to test two cases with one difference between them and you’d correctly predicted the outcome.

Hey @Jeffrey_Moore,

I wanted to reach out and let you know that we just released build 339 of Roon, which includes a fix for a bug causing MQA playback to fail when large numbers of zones are grouped. Please give this update a go and let us know if you’re still experiencing any difficulties.

You can read more about this release here:

We genuinely appreciate your patience here!

Kind regards,
Dylan
