Tidal playback randomly skips to next track part way through playback

Hi @support,

I’ve been having this problem on and off at least since I built a new Roon server back in August '18, possibly earlier on my older setup. I’ve been working through diagnosing it on and off and finally have enough info to post a support request. I think it is either something to do with my local network, possibly my ISP, or regional Tidal cache. I’ve read through a few posts in here where other people have had a similar problem, however the resolutions in those cases have not worked for me so far. I’m really just posting here to get some feedback on my diagnosis so far, and see if you have any other suggestions.

The problem

Part way through playback of a Tidal track, sometimes 65%, sometimes 75%, often 85% or 95%, playback suddenly stops, and then a few seconds later the next track begins playback.

This only occurs when playing back an album from Tidal. Playback of local media is always fine. More often than not, it is a track part way through the album, sometimes a much later track, and less often it happens for the first track.

The problem only seems to occur for albums I’ve never listened to before, or possibly (ie I’m not yet 100% sure about this scenario) that I haven’t listened to in a long time.

If I immediately tell Roon to replay the track that had the problem, so far it has always played back fine ie it completes 100% of playback, so the problem is not track or album specific, rather it seems to be a random streaming or connection problem.

I have a Tidal HiFi account, and the problem occurs with playback of both redbook FLAC or MQA files from Tidal.

Some days I might play 2 or 3 new albums from Tidal before the problem occurs, other days, at least one track on every new album from Tidal has an issue. Most days, I’m listening to local content or Tidal albums I’ve been playing regularly, ie it’s only every week or two that I go on a bit of discovery for new music and will see this issue.

The pattern of errors and warnings in the Core log file is 100% consistent every time I’ve looked, which is useful at least.

Detailed log examples follow at the end of this post, however, here’s the summary:

There is a pair of ‘Connection reset by peer’ messages from the [streamingmediafile] logger in the Core’s log whilst caching the Tidal file. Then sometime later, ie during playback of the file, playback stops and there is another error in the Core’s log from the [zoneplayer] logger that ‘Track ended unexpectedly’. Then there are log entries where it tidies that up and skips to the next track.
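For anyone wanting to check their own Core log for the same pattern, here is a minimal sketch in Python. It assumes log lines start with a `MM/DD HH:MM:SS` timestamp, as in the excerpts at the end of this post, and that any line without a timestamp is a continuation (stack trace or hard-wrapped text) of the previous entry. The function names are illustrative only and are not part of Roon.

```python
import re

# Assumption: each log entry starts with "MM/DD HH:MM:SS ", as in the excerpts in this post.
ENTRY_START = re.compile(r"^(\d{2}/\d{2} \d{2}:\d{2}:\d{2}) ")


def group_entries(lines):
    """Group raw log lines into (timestamp, full_text) entries.

    Lines without a leading timestamp are appended to the previous entry,
    which also re-joins messages that were hard-wrapped mid-word.
    """
    entries = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        match = ENTRY_START.match(line)
        if match:
            entries.append([match.group(1), line])
        elif entries:
            entries[-1][1] += line
    return [(ts, text) for ts, text in entries]


def find_skip_events(lines):
    """Return (reset_timestamps, skip_timestamps) for the pattern described above."""
    resets, skips = [], []
    for ts, text in group_entries(lines):
        if "[streamingmediafile]" in text and "Connection reset by peer" in text:
            resets.append(ts)
        if "Track ended unexpectedly" in text:
            skips.append(ts)
    return resets, skips
```

Passing an open log file (or a list of lines) to `find_skip_events` gives two lists of timestamps, which makes it easy to see whether every skip was preceded by a reset while the same track was being cached.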

My setup

ADSL line enters house -> xDSL Master filter -> DrayTek Vigor 130 -> pfSense router -> Netgear GS108 switch -> Netgear Orbi Router (in AP mode) -> Netgear Orbi Satellite -> Netgear GS108 switch -> Roon Core server (with local media on SSD) -> bridged network on 2nd NIC -> KEF LS50 Wireless.

I put the xDSL master filter in 12 months ago and since then the ADSL line has consistently had a rock solid connection at 16Mb/s (as measured by SpeedTest) with no drops, and ping times to the ISP’s gateway are typically 20-25ms. On a side note, I’ve requested an upgrade to VDSL, which should go through next week sometime (scheduled for 22nd).

The Roon Core is an Intel NUC 7i7 with 8GB and 2 local SSD’s, one for OS + Roon Server + DB, and one for media. It is running Arch Linux which is updated on average every 2 weeks. I used your supplied script to install RoonServer.

All connectivity in the house is 1Gb/s ethernet over Cat6 or 6a, except the ADSL line and the Orbi Wifi backhaul.

There is triple NAT’ing because my ISP (flip.co.nz) does CG-NAT, the Vigor 130 is NAT’ing the PPPoA connection and then pfSense is also NAT’ing. Ideally only the pfSense router would NAT, however a) my current ISP doesn’t offer a service without CG-NAT and b) despite trying, I have not yet been successful at setting up a PPPoE/PPPoA bridge mode between the Vigor 130 and pfSense. When the VDSL upgrade happens, the PPPoA will change to PPPoE, so I’ll be able to set that to PPPoE bridged and thus drop the NAT layer on the Vigor at least.

Control points are an old Dell Laptop, iPad and iPhone on the Orbi Wifi network. There are plenty of other devices on my network, however none are involved in streaming or playback.

Over the dozens and dozens of times this has happened over the months, no other devices on the network or indeed any connectivity to the Core has been affected by any network drops or internet connectivity issues at the same time. As far as I’m aware the other devices off the Orbi satellite or the 2nd GS108 are all having no problems. That said, the AppleTV does have the very occasional (ie relatively rare compared to the Roon issue) buffering issue on Netflix or Apple Movies, however I’ve never been using it at the same time as Roon, and a few buffering issues streaming HD movies or Netflix on a 16Mb/s ADSL line is to be expected.

Everything in my setup is regularly checked and updated to latest firmware, patches etc (Vigor, pfSense, Arch, Roon, Kef).

Working theory

Something is dropping or actively closing the connection between my Core and the local Tidal cache server. I’m basing this on the relatively specific ‘Connection reset by peer’ message in the logs. From my knowledge of such things after 20+ years working in IT, I know this specific error is typically because a network device, ie a proxy, firewall, router or NAT’ing layer, etc, or the server itself has closed the connection. My current suspects are a) one of the 3 NAT’ing layers, b) the Tidal cache server.
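To illustrate what that error means at the socket level, here is a small self-contained Python sketch that provokes a reset on the loopback interface. It is only a demonstration of the mechanism, not a reproduction of the Roon/Tidal path: the "server" side closes with `SO_LINGER` set to zero, which makes the OS send a RST instead of a normal FIN, and the client then sees `ConnectionResetError`. The `struct.pack("ii", ...)` layout assumes Linux or macOS.

```python
import socket
import struct


def provoke_reset():
    """Return True if the client observes 'Connection reset by peer' on loopback."""
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    server.listen(1)

    client = socket.create_connection(server.getsockname())
    client.settimeout(5)  # avoid hanging forever if no RST arrives
    conn, _ = server.accept()

    # l_onoff=1, l_linger=0: close() aborts the connection with a RST
    # rather than performing the normal FIN handshake.
    conn.setsockopt(socket.SOL_SOCKET, socket.SO_LINGER, struct.pack("ii", 1, 0))
    conn.close()

    try:
        client.recv(1024)
        return False  # clean close or data, no reset seen
    except ConnectionResetError:
        return True
    finally:
        client.close()
        server.close()
```

A middlebox that has lost its state for a connection can send the same kind of RST on the server's behalf, which is why the error message alone cannot tell you which device closed the connection.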

As I say this is my current working theory, happy to hear other suggestions.

Other diagnosis notes thus far

I’ve swapped out network cables between both the Core and switch, the switch and Orbi Satellite, and the core and the Kef endpoint. No other devices running off the Orbi’s or off the switches are experiencing any drop outs (that I’m aware of).

I noted a number of people have had issues with Netgear Orbi’s in their setup, however in every case (from what I’ve read so far at least) these have been in Router mode and have been solved either by changing the IGMP proxy setup or MTU. In my case my Orbi’s are in AP mode so these are not applicable.

That said, I have also tried bypassing the Orbi’s and 2nd GS108 by running a spare 30m Cat 6a cable from the 1st GS108 directly to the Core, ie ADSL line enters house -> xDSL Master filter -> DrayTek Vigor 130 -> pfSense router -> Netgear GS108 switch -> Roon Core server (with local media on SSD) -> bridged network on 2nd NIC -> KEF LS50 Wireless. However the problem persists with this config as well.

It occurs to me as I write this I haven’t tried temporarily connecting the Kef endpoint to the GS108 and disabling the bridge network on the Core, I’ll add that to my list of things to try this week.

I initially wondered, based on some other similar posts in these forums, if it was a DNS issue, so I have switched back and forth a number of times between my ISP’s DNS and CloudFlare’s DNS, including reboots of the router, Orbi’s, and the Core server to flush any old state, however there is no change in behaviour.

I’ve also checked the pfSense firewall logs and there are no firewall events at the times of the playback issues or the ‘Connection reset’ errors in the Core’s log file. There are occasional entries for out of state Fin Ack or Reset Ack packets being dropped by the firewall for connections between the Core and the Tidal server, however these appear to be consistent throughout playback, regardless of whether the problem occurs or playback is uninterrupted.

I’m running NetData on the Core and I see a small spike in RST’s on the network interface at the same time as the ‘Connection reset by peer’ message. Otherwise the Core server is not showing anything different from normal playback where there is no problem (or at least nothing I’ve spotted yet). NetData on the Core also shows regular UDP errors, however these are present 24/7 ie even during no playback. Note they go away if RoonServer is stopped, and come back when it is started again. I was surprised by this, as I thought Roon had stopped using UDP many releases ago?
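As a cross-check on the RST spike, the kernel's own TCP counters can be sampled before and after a playback session. The sketch below parses the two `Tcp:` lines of `/proc/net/snmp` on Linux and reports how much `EstabResets` and `OutRsts` changed. Which counters NetData actually graphs is not something this sketch relies on, and the helper names are illustrative.

```python
def parse_tcp_counters(snmp_text):
    """Parse the 'Tcp:' header and value lines of /proc/net/snmp into a dict of ints."""
    tcp_lines = [line.split() for line in snmp_text.splitlines()
                 if line.startswith("Tcp:")]
    if len(tcp_lines) < 2:
        raise ValueError("expected a 'Tcp:' header line and a 'Tcp:' value line")
    names, values = tcp_lines[0][1:], tcp_lines[1][1:]
    return {name: int(value) for name, value in zip(names, values)}


def reset_delta(before, after):
    """Change in reset-related counters between two samples."""
    return {key: after[key] - before[key] for key in ("EstabResets", "OutRsts")}
```

Reading `/proc/net/snmp` into a string at the start and end of an album and passing both parsed samples to `reset_delta` shows whether established connections were reset during that window.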

On the odd occasion I’ve been watching the Traffic Graphs on pfSense, during Tidal playback I see the inbound bandwidth usage jump to around 16Mb/s and sit there until either the file has been cached, or until the ‘Connection reset by peer’ message appears, ie it doesn’t appear to be due to a slow internet connection.
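As a rough sanity check on that reading, the time to pull one track at line rate can be estimated. The sketch assumes the `length=40970428` value in the log excerpt at the end of this post is the size in bytes of the remaining file, and that the link sustains the full 16Mb/s; both are assumptions rather than confirmed facts.

```python
def transfer_seconds(size_bytes, link_mbit_per_s):
    """Seconds needed to move size_bytes over a link of the given speed in Mb/s."""
    return size_bytes * 8 / (link_mbit_per_s * 1_000_000)


# Values taken from this post: file length from the log, line speed from SpeedTest.
estimate = transfer_seconds(40_970_428, 16)
```

That works out to about 20.5 seconds, which is in the same range as the 23 seconds between the `immediate read` line (08:59:53) and the `finished caching` line (09:00:16) in the excerpt, so the download itself does appear to run at close to line rate.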

I have checked the logs on the Vigor 130 and its firewall log appears to be empty, which seems strange. I haven’t figured out if/where it logs NATing events yet. I’ll have a look into that this week sometime as well.

I’m based in Wellington, New Zealand. The ping times to my local Tidal cache server from the Core server are often around 70ms, and the trace routes lead me to think it is probably based either in Auckland, New Zealand, or in a datacentre in Australia (does Tidal use AWS?). This is fairly typical for content providers in my part of the world.

Questions

Is it possible the Tidal server is closing the connection despite not having served the entire stream? If so is there a way to determine that other than by process of elimination in all the other layers?

Why doesn’t the streamingmediafile component of the Core retry a new connection at least once before abandoning the caching of the file?
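To make the question concrete, here is a minimal Python sketch of the kind of retry meant here: on a dropped connection, open a new one and resume from the byte offset already received. This is not how Roon is implemented; `fetch_range` is a hypothetical callable standing in for an HTTP request with a `Range: bytes=<offset>-` header, and whether the Tidal servers honour range requests is an assumption.

```python
def read_with_retry(fetch_range, total_length, max_retries=1):
    """Download total_length bytes, resuming after a dropped connection.

    fetch_range(offset) is a hypothetical callable that yields byte chunks
    starting at `offset` and may raise ConnectionError (which includes
    ConnectionResetError) part way through.
    """
    data = bytearray()
    retries = 0
    while len(data) < total_length:
        try:
            for chunk in fetch_range(len(data)):
                data.extend(chunk)
            if len(data) < total_length:
                # The stream ended early without an explicit error.
                raise ConnectionError("short read")
        except ConnectionError:
            if retries >= max_retries:
                raise  # give up, as the Core appears to do today
            retries += 1  # otherwise loop and resume from len(data)
    return bytes(data)
```

With `max_retries=1` a single reset mid-download is absorbed, while a server that keeps resetting still surfaces the error to the caller.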

Logs

Here’s an example of the messages when the caching of the Tidal stream errors out:

01/13 08:59:51 Trace: [tidal/http] GET https://api.tidalhifi.com/v1/tracks/89267305/streamurl?countryCode=NZ&sessionId=c72fa599-c963-4af3-a63f-95889d29c58a&soundQuality=LOSSLESS => Success
01/13 08:59:51 Trace: [tidal/http] GET https://api.tidalhifi.com/v1/tracks/89267305?countryCode=NZ&sessionId=c72fa599-c963-4af3-a63f-95889d29c58a& => Success
01/13 08:59:51 Info: [Kef] [zoneplayer] Queueing: http://ab-pr-ak.audio.tidal.com/adae746a4e84adaff631fc32765f9b6a_39.flac
01/13 08:59:51 Trace: [roonapi] [apiclient 192.168.1.9:48344] CONTINUE Changed {"message":"Core paired","is_error":false}
01/13 08:59:52 Info: [stats] 2716mb Virtual, 611mb Physical, 228mb Managed, 0 Handles, 66 Threads
01/13 08:59:53 Info: [Kef] [zoneplayer] Open result (Queueing): Result[Status=Success]
01/13 08:59:53 Trace: [streamingmediafile] immediate read of http://ab-pr-ak.audio.tidal.com/adae746a4e84adaff631fc32765f9b6a_39.flac at 524288 length=40970428
01/13 08:59:54 Warn: [streamingmediafile] in immediate read: System.Net.WebException: Error getting response stream (ReadDone1): ReceiveFailure ---> System.IO.IOException: Unable to read data from the transport connection: Connection reset by peer. ---> System.Net.Sockets.SocketException: Connection reset by peer
  at System.Net.Sockets.Socket.EndReceive (System.IAsyncResult asyncResult) [0x00012] in <126998f2e5ae42fe95554117eb649feb>:0
  at System.Net.Sockets.NetworkStream.EndRead (System.IAsyncResult asyncResult) [0x00057] in <126998f2e5ae42fe95554117eb649feb>:0
   --- End of inner exception stack trace ---
  at System.Net.Sockets.NetworkStream.EndRead (System.IAsyncResult asyncResult) [0x0009b] in <126998f2e5ae42fe95554117eb649feb>:0
  at System.Net.WebConnection.ReadDone (System.IAsyncResult result) [0x0001b] in <126998f2e5ae42fe95554117eb649feb>:0
   --- End of inner exception stack trace ---
  at System.Net.HttpWebRequest.EndGetResponse (System.IAsyncResult asyncResult) [0x00058] in <126998f2e5ae42fe95554117eb649feb>:0
  at System.Net.HttpWebRequest.GetResponse () [0x0000e] in <126998f2e5ae42fe95554117eb649feb>:0
  at Sooloos.Media.StreamingMediaFileImpl._ReadImmediate (System.Int64 file_off, System.Byte[] buf, System.Int32 off, System.Int32 count) [0x00086] in <67cdbfbc98564a67a315f25b32208b8c>:0
01/13 08:59:54 Error: [cachingseekableurimediafile] while reading: System.Net.WebException: Error getting response stream (ReadDone1): ReceiveFailure ---> System.IO.IOException: Unable to read data from the transport connection: Connection reset by peer. ---> System.Net.Sockets.SocketException: Connection reset by peer
  at System.Net.Sockets.Socket.EndReceive (System.IAsyncResult asyncResult) [0x00012] in <126998f2e5ae42fe95554117eb649feb>:0
  at System.Net.Sockets.NetworkStream.EndRead (System.IAsyncResult asyncResult) [0x00057] in <126998f2e5ae42fe95554117eb649feb>:0
   --- End of inner exception stack trace ---
  at System.Net.Sockets.NetworkStream.EndRead (System.IAsyncResult asyncResult) [0x0009b] in <126998f2e5ae42fe95554117eb649feb>:0
  at System.Net.WebConnection.ReadDone (System.IAsyncResult result) [0x0001b] in <126998f2e5ae42fe95554117eb649feb>:0
   --- End of inner exception stack trace ---
  at System.Net.HttpWebRequest.EndGetResponse (System.IAsyncResult asyncResult) [0x00058] in <126998f2e5ae42fe95554117eb649feb>:0
  at System.Net.HttpWebRequest.GetResponse () [0x0000e] in <126998f2e5ae42fe95554117eb649feb>:0
  at Sooloos.Media.StreamingMediaFileImpl._ReadImmediate (System.Int64 file_off, System.Byte[] buf, System.Int32 off, System.Int32 count) [0x0013d] in <67cdbfbc98564a67a315f25b32208b8c>:0
  at Sooloos.Media.StreamingMediaFileImpl._Read (System.Int64 file_off, System.Byte[] buf, System.Int32 off, System.Int32 count) [0x00089] in <67cdbfbc98564a67a315f25b32208b8c>:0
  at Sooloos.Media.StreamingMediaFileImpl.Read (System.Int64 file_off, System.Byte[] buf, System.Int32 off, System.Int32 count) [0x0005e] in <67cdbfbc98564a67a315f25b32208b8c>:0
  at Sooloos.Media.StreamingMediaFile.Read (System.Int64 file_off, System.Byte[] buf, System.Int32 off, System.Int32 count) [0x00000] in <67cdbfbc98564a67a315f25b32208b8c>:0
  at Sooloos.Media.CachingSeekableUriMediaFile.ReadCallback (System.IntPtr userdata, System.IntPtr buf, System.IntPtr count, System.IntPtr& out_bytesread) [0x00055] in <67cdbfbc98564a67a315f25b32208b8c>:0
01/13 08:59:54 Warn: [prebuffer] in buffer threadSystem.Exception: Read failure: IoFailure
  at Sooloos.Audio.MediaDecoderAudioSignal.Read (System.Byte[] buffer, System.Int32 offset, System.Int32 frames) [0x00232] in <5535008d6900425285ae140d60437aa6>:0
  at Sooloos.Broker.Transport.FormatDetectAudioSignal.Read (System.Byte[] buffer, System.Int32 offset, System.Int32 frames) [0x000ca] in <bbda6a553eb048f1b5c5c7aa2e58a1a5>:0
  at Sooloos.Audio.SeekableBufferedAudioSignal._Buffer (System.Int32 buffer_seq) [0x0003e] in <5535008d6900425285ae140d60437aa6>:0
  at Sooloos.Audio.SeekableBufferedAudioSignal+<>c__DisplayClass30_0.<_StartBuffering>b__0 () [0x00000] in <5535008d6900425285ae140d60437aa6>:0
01/13 08:59:54 Trace: [Kef] [Lossless 37.2x, 24/44 MQA TIDAL FLAC => 24/88] [100% buf] [PLAYING @ 0:05/4:39] Bloodlines - The Adults / Estére / Jessb
01/13 08:59:59 Trace: [Kef] [Lossless 37.3x, 24/44 MQA TIDAL FLAC => 24/88] [100% buf] [PLAYING @ 0:10/4:39] Bloodlines - The Adults / Estére / Jessb
01/13 09:00:05 Trace: [Kef] [Lossless 37.3x, 24/44 MQA TIDAL FLAC => 24/88] [100% buf] [PLAYING @ 0:16/4:39] Bloodlines - The Adults / Estére / Jessb
01/13 09:00:07 Info: [stats] 2751mb Virtual, 643mb Physical, 301mb Managed, 0 Handles, 66 Threads
01/13 09:00:10 Trace: [Kef] [Lossless 36.9x, 24/44 MQA TIDAL FLAC => 24/88] [100% buf] [PLAYING @ 0:21/4:39] Bloodlines - The Adults / Estére / Jessb
01/13 09:00:16 Trace: [Kef] [Lossless 37.0x, 24/44 MQA TIDAL FLAC => 24/88] [100% buf] [PLAYING @ 0:26/4:39] Bloodlines - The Adults / Estére / Jessb
01/13 09:00:16 Trace: [streamingmediafile] finished caching http://ab-pr-ak.audio.tidal.com/adae746a4e84adaff631fc32765f9b6a_39.flac
01/13 09:00:21 Trace: [Kef] [Lossless 37.0x, 24/44 MQA TIDAL FLAC => 24/88] [100% buf] [PLAYING @ 0:32/4:39] Bloodlines - The Adults / Estére / Jessb
01/13 09:00:22 Info: [stats] 2708mb Virtual, 600mb Physical, 256mb Managed, 0 Handles, 66 Threads
01/13 09:00:26 Trace: [Kef] [Lossless 37.0x, 24/44 MQA TIDAL FLAC => 24/88] [100% buf] [PLAYING @ 0:37/4:39] Bloodlines - The Adults / Estére / Jessb

And here’s the same track then erroring out at the sudden end of playback about four minutes later, at 4:16 of 4:39:

01/13 09:04:01 Trace: [Kef] [Lossless 36.6x, 24/44 MQA TIDAL FLAC => 24/88] [70% buf] [PLAYING @ 4:11/4:39] Bloodlines - The Adults / Estére / Jessb
01/13 09:04:06 Trace: [Kef] [Lossless 36.7x, 24/44 MQA TIDAL FLAC => 24/88] [18% buf] [PLAYING @ 4:16/4:39] Bloodlines - The Adults / Estére / Jessb
01/13 09:04:07 Info: [Kef] [zoneplayer] Track ended unexpectedly: Sooloos.Audio.BufferedReadException: error during buffered read ---> System.Exception: Read failure: IoFailure
  at Sooloos.Audio.MediaDecoderAudioSignal.Read (System.Byte[] buffer, System.Int32 offset, System.Int32 frames) [0x00232] in <5535008d6900425285ae140d60437aa6>:0
  at Sooloos.Broker.Transport.FormatDetectAudioSignal.Read (System.Byte[] buffer, System.Int32 offset, System.Int32 frames) [0x000ca] in <bbda6a553eb048f1b5c5c7aa2e58a1a5>:0
  at Sooloos.Audio.SeekableBufferedAudioSignal._Buffer (System.Int32 buffer_seq) [0x0003e] in <5535008d6900425285ae140d60437aa6>:0
  at Sooloos.Audio.SeekableBufferedAudioSignal+<>c__DisplayClass30_0.<_StartBuffering>b__0 () [0x00000] in <5535008d6900425285ae140d60437aa6>:0
   --- End of inner exception stack trace ---
  at Sooloos.Audio.SeekableBufferedAudioSignal.Read (System.Byte[] buffer, System.Int32 offset, System.Int32 frames) [0x0003b] in <5535008d6900425285ae140d60437aa6>:0
  at Sooloos.Broker.Transport.ZonePlayerTrack._ReadBacking (System.Byte[] buffer, System.Int32 offset, System.Int32 frames) [0x00041] in <bbda6a553eb048f1b5c5c7aa2e58a1a5>:0
  at Sooloos.Broker.Transport.ZonePlayerTrack+_Stream.ReadImp (Sooloos.Audio.AudioBuffer buf, System.Int32 nsamples) [0x00041] in <bbda6a553eb048f1b5c5c7aa2e58a1a5>:0
  at Sooloos.Audio.AudioStream.Read (Sooloos.Audio.AudioBuffer buf, System.Int32 nsamples) [0x0005e] in <ded7aa6d297649a693b703f945f65998>:0
  at Sooloos.Broker.Transport.AudioFileStreamWrapper.ReadImp (Sooloos.Audio.AudioBuffer buf, System.Int32 nsamples) [0x00047] in <bbda6a553eb048f1b5c5c7aa2e58a1a5>:0
  at Sooloos.Audio.AudioStream.Read (Sooloos.Audio.AudioBuffer buf, System.Int32 nsamples) [0x0005e] in <ded7aa6d297649a693b703f945f65998>:0
  at Sooloos.Broker.Transport.ZonePlayerBase.ReadImp (Sooloos.Audio.AudioBuffer buf, System.Int32 nsamples) [0x00115] in <bbda6a553eb048f1b5c5c7aa2e58a1a5>:0
01/13 09:04:07 Info: [zone Kef] OnPlayFeedback StoppedEndOfMediaNatural
01/13 09:04:07 Debug: [zone Kef] _Advance
01/13 09:04:07 Info: [library] recorded play for profile 37622537-cfea-48bf-83d0-ef3c4d802e57: mediaid=50:1:bef74ea5-7733-4e15-af46-0f3af222f8f6 metadataid= contentid=168:0:89267304 libraryid=50:1:bef74ea5-7733-4e15-af46-0f3af222f8f6
01/13 09:04:07 Trace: [library] finished with 20 dirty tracks 2 dirty albums 9 dirty performers 20 dirty works 20 dirty performances 0 clumping tracks, 0 clumping auxfiles 0 compute tracks, 0 deleted tracks, 0 tracks to (re)load, 0 tracks to retain, 0 auxfiles to (re)load, 0 auxfiles to retain, and 52 changed objects
01/13 09:04:07 Debug: [library/index] updating search indices: 10 ops 0 adds, 0 removes
01/13 09:04:07 Trace: [Kef] [Lossless 36.7x, 24/44 MQA TIDAL FLAC => 24/88] [6% buf] [LOADING @ 0:00] That Gold - The Adults / Raiza Biza / Aaradhna
01/13 09:04:07 Debug: [query] Sooloos.Broker.Transport.TransportItem: 4148 dirty items. rebuilding query instead of re-sorting item-by-item (internaltype=TransportItem)
01/13 09:04:07 Debug: [query] Sooloos.Broker.Transport.TransportItem: 4148 dirty items. rebuilding query instead of re-sorting item-by-item (internaltype=TransportItem)
01/13 09:04:07 Trace: [Kef] [zoneplayer/kef] reached end of stream, closing connection
01/13 09:04:07 Trace: [Kef] [zoneplayer/kef] transaction canceled, isplaying: True, did stream end: True, tx path: /54bc6249b68d4d098c4663b80eed8129/Roon614d9ea2054d499491dc62e200af8849.flac, stream path: /54bc6249b68d4d098c4663b80eed8129/Roon614d9ea2054d499491dc62e200af8849.flac, method: Get
01/13 09:04:08 Info: [stats] 2733mb Virtual, 626mb Physical, 291mb Managed, 0 Handles, 65 Threads

I’ll be grateful for any help, feedback, or tips! :slight_smile:

Cheers,
Dunc

DNS settings? Are the tracks that fail MQA or higher bit rates?

Hello @Duncan_Simpson,

Thanks for your very detailed post here. This will likely end up as a case for the QA team to look over but I would try changing your DNS servers from the ISP provided ones to Google DNS or Cloudflare DNS as a troubleshooting step.

Your network here seems pretty complex too. We have some suggestions for setting it up properly in our Networking Best Practices Guide that you may want to take a look over, but I would also simplify it a bit and see if you are able to experience the same issue if you connect both your Core and the Endpoints (KEF via Ethernet) directly to the pfSense Router, as this would give us another good data point. Can you please give those suggestions a try and let me know if it helps?

Thanks,
Noris

Hi @noris and @wizardofoz, thanks for the quick replies and the suggestions.

Regarding the DNS alternatives you both suggested, I’ve already tried swapping my local ISP DNS for CloudFlare’s DNS on numerous occasions, as I noted in the initial post. This included reboots of the stack to clear any cached DNS states. However there was no change in behaviour. I have not tried using Google’s DNS however, so I’ll add that to my list of things to try and report back, thank you.

@wizardofoz - Regarding your MQA question, it fails for both red book FLAC and MQA encoded files from Tidal (as I also noted in the initial post)

@noris - I’ve already been through the Networking Best Practices Guide and simplified it as much as possible with the current kit, for example I removed the pair of Orbi’s and the 2nd switch from the path to the Core, and the problem remains (that said, the Orbi’s and 2nd switch are still connected to the network so that I can connect my control points to the network).

Regarding the suggestion to connect the Core directly to the pfSense router, the pfSense router box currently has only 1 WAN port and 1 LAN port. This means that I can’t plug the Core directly into the router and still retain connectivity for my control points. That said, I do have an old Netgear R7000 4 port router with Tomato firmware that I can put in place of the pfSense router, which would also remove the pfSense appliance from the mix, and add Tomato firmware to the mix.

The KEF endpoint is wired to the bridged pair of NICs on the Core server. I’ll retain that for the initial test with the R7000 (ie try to change one thing at a time to isolate where the problem is), then try the Kef’s wired to the R7000 on a separate link if I can find another 30m cable.

While I have the R7000 in play I can enable the Wifi radios on the R7000 and disconnect the Orbi’s altogether to see if their mere presence on the network is the issue (tho I doubt it, it’s worth a try at this point).

Thanks again for the suggestions, any others welcome! :slight_smile:

Also, is there a way to clear the Roon Core’s TIDAL cache so I can speed up my testing by playing back the same albums?

Cheers,
Dunc


I had this same problem. Very eventually, I found that I had set up the network such that a loop generated a lot of excess traffic.

So maybe temporarily disconnecting everything else from the network except for Roon may shed some light.

My only training in networking is my limited experience. So, my 2c

Hey @Duncan_Simpson,

If you go about replacing the pfSense router with the R7000, please be sure to disable the “Enable Smart Connect” setting on it as we have seen that sometimes causes issues. If everything is working on the R7000 you might want to try adding an unmanaged switch directly after that pfSense port and use that as a method to verify where the issue lies.

As for TIDAL cache clearing, you can use these instructions:

Thanks,
Noris

Thanks for the replies @noris and @John_V .

I changed my DNS to Google’s yesterday and queued up some albums to play overnight. The problem occurred again however. I realised this morning that I hadn’t rebooted the Core after changing the DNS servers. I suspect this doesn’t matter because Arch Linux doesn’t cache DNS queries by default, however I have rebooted it this morning and will queue up some more tracks, just to be sure.

If the problem persists today, I will swap the pfSense appliance out for the R7000 and go from there.

I will also do some more digging regarding the loop suggestion from John. I don’t think I have a loop, and NetData isn’t showing excessive traffic on the Core, and neither is pfSense, however 20+ years in IT has taught me that when you’re chasing a needle in a haystack, so to speak, you should validate your assumptions and shouldn’t rule anything out without checking it first.

Thanks again!

Cheers,
Dunc


I think I’ve resolved the problem, although I will continue testing over the weekend to be sure.

It appears to be something to do with the triple NAT’ing I had in my setup. I tried a number of things since my earlier posts (listed below), based on suggestions here from @noris and others, however it wasn’t until I removed one of the 3 NAT’ing layers in my setup, in an attempt to further simplify my setup, that the problem went away.

My Roon system has now been playing new music for about 24hrs from Tidal with no problem. Previous best stretch was about 5-6 hours before it would skip part way through a track with this error. I will continue testing over the weekend, however for now I’m relatively confident the problem is gone.

For reference, in case it helps others with similar issues, I had 3 NAT’ing layers because my ISP does Carrier Grade NAT (CG-NAT), my Vigor 130 modem/router got the IP address from the ISP after authenticating via PPPoA, then NAT’ed to the R7000 (or pfSense when I had that in the mix) which is a more feature rich NAT’ing firewall router and the R7000 used DHCP to get an address from the Vigor. While the Vigor supports various full bridge modes, neither the R7000 nor pfSense supports the PPPoA authentication required by my ISP, hence the need to do NATing twice in this case. The Vigor 130 does also have a PPPoE/PPPoA pass through mode (sort of a half bridge?) however that worked only with the R7000 in PPPoE mode, and not the pfSense in PPPoE mode for some reason, so I didn’t have that enabled earlier. This PPPoE/PPPoA mode on the Vigor allows the R7000 router, when in PPPoE mode, to obtain an IP address directly from my ISP and thus removes a NATing layer.

I enabled this last night which changed my setup from 3 layers of NATing, each with its own NAT session state tables and timeouts, to 2. It isn’t clear if the issue was with one specific layer, or some combination of two or all three layers.

From what I’ve read on NAT layers, they typically hold a state table for each TCP or UDP connection traversing the NAT layer. In order to manage memory on the device, these tables have a max number of entries they allow, and each entry has a timeout (presumably an idle timeout, but possibly also a max TTL timeout regardless of activity).
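A toy model of such a state table, in Python, shows both ways an entry can disappear. It is a deliberately simplified sketch (one entry per flow, a single idle timeout, oldest-first eviction when full) and does not describe how the Vigor, pfSense or the ISP's CG-NAT actually behave; the class and the return values are made up for illustration.

```python
from collections import OrderedDict


class NatTable:
    """Toy NAT/state table with a size limit and an idle timeout (in seconds)."""

    def __init__(self, max_entries, idle_timeout):
        self.max_entries = max_entries
        self.idle_timeout = idle_timeout
        self._last_seen = OrderedDict()  # flow -> time of last packet, oldest first

    def _expire(self, now):
        stale = [flow for flow, seen in self._last_seen.items()
                 if now - seen > self.idle_timeout]
        for flow in stale:
            del self._last_seen[flow]

    def packet(self, flow, now, syn=False):
        """Return 'forward' if the packet matches or creates state, else 'no-state'."""
        self._expire(now)
        if flow in self._last_seen:
            self._last_seen[flow] = now
            self._last_seen.move_to_end(flow)
            return "forward"
        if not syn:
            # Mid-stream packet for a flow the table has forgotten: a real
            # device would typically drop it or answer with a RST.
            return "no-state"
        if len(self._last_seen) >= self.max_entries:
            self._last_seen.popitem(last=False)  # evict the least recently used flow
        self._last_seen[flow] = now
        return "forward"
```

For example, with `NatTable(max_entries=2, idle_timeout=30)` a flow that sends a packet at t=0 and t=20 is forwarded, but a packet at t=60 comes back as `'no-state'` because the entry idled out in between.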

My theory for the problem I was seeing is one of the NAT layers was removing the entry for the NAT session for the link between the Roon Core to the Tidal Cache server. This may have been because it reached max entries, or timed out, or possibly packets were traversing with delays and getting out of sequence causing one of the layers to close the connection because it didn’t like the out of sequence packets. When Roon next tried to reuse the connection, the Tidal server, or one of the NAT layers would have sent a RST packet as it should in this scenario because it legitimately thinks the connection no longer exists. Thus the ‘Connection reset by peer’ error occurs in Roon whilst streaming the media file. Some apps and servers get around this issue by offering a keepalive option, however from what I’ve read, this can create other problems, so it isn’t always turned on by default.
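For reference, this is roughly what turning on TCP keepalive looks like from an application's side, sketched in Python. The timing values are arbitrary examples, the `TCP_KEEPIDLE`, `TCP_KEEPINTVL` and `TCP_KEEPCNT` options are Linux-specific (hence the guard), and nothing here says whether Roon sets or should set these options.

```python
import socket


def enable_keepalive(sock, idle_s=60, interval_s=10, probes=5):
    """Enable TCP keepalive on sock so idle connections still generate traffic."""
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # Fine-grained timing is platform specific; only apply it where available.
    if hasattr(socket, "TCP_KEEPIDLE"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, idle_s)       # idle time before first probe
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, interval_s)  # gap between probes
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, probes)        # failed probes before giving up
    return sock
```

The idea is that the probes keep refreshing the entry in each NAT layer's state table, provided the idle time is shorter than the shortest timeout along the path.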

The part that still puzzles me is why would I notice this only on new tracks, and not any track streamed from Tidal? I’m starting to doubt myself that it is only on new tracks, but I’m choosing to move on given it seems to be working now :smiley:

@noris - while my situation is a relative corner or edge case, it might make sense for the Roon Core to have a retry for the streaming media cache component, perhaps the devs could look into this, maybe retry a new connection once before giving up and moving on? Just a suggestion of course.

For completeness in case it helps others, here are the other things I tried (other than what is already noted in earlier posts):

  • I rebooted my Core server after switching the router to use Google DNS, problem persists.
  • Next I swapped the pfSense appliance out for my old R7000 running Tomato firmware, with the first GS108 (an unmanaged switch) still in play. So the chain became ADSL line enters house -> xDSL Master filter -> DrayTek Vigor 130 -> R7000 -> Netgear GS108 switch -> Roon Core server (with local media on SSD) -> bridged network on 2nd NIC -> KEF LS50 Wireless. The problem persists. The constant UDP errors reported in NetData on the Core (that happened regardless of playback) are now gone tho so that seems to be a separate interoperability issue between pfSense and RoonServer.
  • After that I connected the Core and the Kef LS50W endpoint to the 2nd GS108 and removed the link from that switch to the Orbi (to take that out of a loop), replacing the Orbi link with a 30m cat 6 cable to the first GS108 by the router. So the chain became ADSL line enters house -> xDSL Master filter -> DrayTek Vigor 130 -> R7000 -> Netgear GS108 switch -> 2nd Netgear GS108 -> Roon Core server (with local media on SSD) and the KEF LS50 Wireless also off the 2nd Netgear switch. Thus the Kefs are no longer using the network bridge on the core (the bridge is still defined, just 1 link is active). This ran for quite a few hours with no problem and I thought ah ha! To be sure, I ran this over night, and unfortunately the problem came back.
  • My next step was to remove the network bridge on the Core server (given the KEF endpoint is no longer served via that bridge after previous test). The problem came back on the 2nd track of the first Tidal album I played.
  • The two GS108s were replacements I installed in December for a pair of aging fanless 8-port Cisco 2960’s, and the problems started well before then, at least since I built the Arch server for Roon back in August. So I don’t suspect the GS108’s. That said, I plugged the 2nd GS108 directly in to the R7000. After a day of playing the problem persists.
  • Next I re-enabled the radios on the R7000 and disconnected and turned off the Orbis and the other switches so they are completely disconnected and powered down. Problem persisted. (And wifi was terrible because it’s at the wrong end of the house - this was why I got the Orbis).
  • I reviewed my Arch server build log to see if anything could be network related, can’t see anything obvious.
  • Reconnected and powered up the Orbi’s, disabled the radios on the R7000, so I could get good Wifi back at least.
  • Put the Vigor into PPPoE/PPPoA pass through, rebooted and put R7000 into PPPoE mode. Connection is good, state tables on Vigor show it is no longer doing any firewalling or NATing, so that reduces NATing from 3 to 2 layers. After playing constantly over night, ie about 9-10 hours, problem appears to have gone. There were two endpoint connection errors (that I’ve occasionally seen in the past after long playback sessions), however no streaming media file errors. It was at this point I began to feel sure it might have been some interaction between the NATing layers.
  • Next I disconnected the 30m cat6 cable from the R7000 to the switch that has Roon Server and my endpoint, and I put the Orbi link back in. Then I queued up a day’s worth of new Tidal tracks. With these last two steps, that represents 24 hrs of playback without the issue occurring.

As I said earlier, I will continue testing this weekend to be sure it is working now, and after that sometime next week I will also reinstate the network bridge on the Core and reconnect the KEF end point to that (and re-validate whether there is an improvement in SQ that warrants the extra complexity)

Thanks again for the suggestions. If anything changes over the weekend, I’ll post again.

Also, let me know if you choose to make any changes to Roon Core to handle connection drops in the streaming media component and maybe I can do some testing :slight_smile:

Cheers,
Dunc

Think of a fractal edge, go into the fractal, see an edge, repeat…at the far, far edge is your edge case :blush:

Indeed! :grin:

Hey @Duncan_Simpson,

Thank you for your post outlining everything you have done to correct the triple NAT issue. I’m going to pass your feedback to the tech team and will let you know what they say.

– Noris

Thanks Noris. FYI another 12+ hours of playback and it’s still going well so far :+1:

Cheers,
Dunc

