Errors while identifying added music, paused metadata improver: hints for roonlabs to troubleshoot

I’m sorry that you don’t seem to be convinced by the evidence that this is not a networking issue. That both identifying new music and the metadata improver work from time to time on my network when the network hasn’t changed. That when the network does change and roon breaks, roon doesn’t start working correctly again when the network changes are reverted. That sometimes the bugs appear for long periods on one core while, when authorized, a different core doesn’t have these problems on an unchanged network. That roon connects to roonlab’s backend servers to authenticate my account and download metadata and identify music via the “Edit” and subsequent dialogs even when the bugs are manifest at the same time in the same running instance of the core.

I don’t believe providing the information you’re after will convince roonlab to put the necessary resources into fixing this, as they’ve chosen to let it fester for at least two years. But to satisfy your curiosity, I will tell you that I have either tried roonlab’s specific recommendations, or my lan already met the relevant criteria, so there was nothing to be changed.

If one is highly concerned about security, they wouldn’t take all of roonlab’s recommendations, especially those in #4 in the list you quote.

This is for troubleshooting purposes, not for ongoing operation.

Don’t think so. Directly before the list in Resolving “Metadata Improver: Halted” error messages it is stated that

If you’re seeing this type of issue frequently, it likely means that your Core machine or network is having trouble communicating with our servers. Below is a list of common troubleshooting steps that will help resolve this type of problem

If turning off your VPN “helps resolve this type of problem”, and the bug vanishes, and when you turn the VPN back on, the bug returns, the troubleshooting has revealed that the metadata improver won’t work over your VPN. Does one then have to find an alternative to using a VPN, say securely proxying through a rented machine in a datacenter in a different country, or reconfigure their VPN, or find a new VPN provider, so the metadata improver bug might go away, but only until DHCP hands out a different IP address, or if it’s Wednesday?

I have only ever seen the error once; rebooting fixed it.

I suspect that the answer lies in your specific config. I see reports of this error, but if it were a widespread bug then this forum would be alight.

Just my 2p

As @Mike_O_Neill mentioned, there’s little mention of this problem on the forum at the moment, so the issue is most likely related to the configuration of your problematic core - something about that device, or the way in which it connects to your “unchanged network” is triggering an edge case bug. If I were you, and I’ve mentioned this before, I would edit your original post to include as much info as possible regarding the devices in question, your network, connected audio devices and so on.

When you initially raised this problem in 2020 you were advised to try connecting your core to your primary router rather than going through a bunch of wifi extenders. Did you try that?

I wasn’t using any “wifi extenders”. I had a pair of routers running in AP mode acting as a wifi bridge. That’s very different.

After @noris suggested that they “believe” the problem occurs because of packet fragmentation, I analyzed the MTU from my roon core to the wan and saw no settings that would cause packet fragmentation. I asked him if they detected any service request timeouts in the logs that I sent and then the conversation went dead on his end.
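One common way to run that kind of check is to send pings with fragmentation forbidden and search for the largest payload that still gets through. The sketch below is in Python and assumes a Linux host with iputils `ping` (`-M do` prohibits fragmentation, `-s` sets the payload size). The function names are made up for illustration, and this is not necessarily the method used for the analysis described above. With a standard 1500-byte Ethernet MTU, the largest unfragmented ICMP payload is 1500 - 20 - 8 = 1472 bytes.

```python
import subprocess

IP_HEADER = 20    # bytes, IPv4 header without options
ICMP_HEADER = 8   # bytes, ICMP echo header


def ping_unfragmented(host, payload):
    """Send one ping with fragmentation prohibited (Linux iputils syntax).

    Returns True if a reply came back, False otherwise.
    """
    cmd = ["ping", "-M", "do", "-c", "1", "-W", "2", "-s", str(payload), host]
    return subprocess.run(cmd, capture_output=True).returncode == 0


def largest_unfragmented_payload(probe, low=0, high=1472):
    """Binary-search the largest payload size for which probe(size) is True.

    Assumes that if a given size gets through, every smaller size does too.
    Returns None if even the smallest probe fails.
    """
    if not probe(low):
        return None
    while low < high:
        mid = (low + high + 1) // 2
        if probe(mid):
            low = mid
        else:
            high = mid - 1
    return low


def path_mtu(probe):
    """Convert the largest working payload into an MTU estimate."""
    payload = largest_unfragmented_payload(probe)
    return None if payload is None else payload + IP_HEADER + ICMP_HEADER
```

To try it against a real host, pass something like `lambda size: ping_unfragmented("35.231.208.158", size)` as the probe. A result of 1500 would suggest nothing on the path is forcing fragmentation of full-size packets.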

In the interim, that machine has been put on the wired lan for a bit and roon was still showing the bug. Another roonlab network-at-fault hypothesis bit the dust.

The core server whose sudden bugginess prompted this thread has just as suddenly self-healed after a few days.

Good news. If it happens again, try connecting it directly to your primary router.

It is unfortunate that roonlabs is still ignoring this bug.

It is a time sink for me to continue responding to all of you who are trying to be helpful but are only reiterating the party line that it’s the network when all evidence points elsewhere.

It is time for roonlabs to stop ignoring the 160 reported instances of this bug, plus the who-knows-how-many unreported instances when it magically heals itself and so goes unreported. I count approx 6 reports in the last 6 weeks. Given 130 weeks since my first bug report on this and around 160 hits from the query I posted earlier, the reporting frequency is pretty consistent (1 report per week x 130 weeks = 130, in the same ballpark as 160).
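As a quick check on that arithmetic, here is the same comparison in Python, using only the counts quoted above. The function name is just for illustration.

```python
def reports_per_week(reports, weeks):
    """Average number of reports per week over a period."""
    return reports / weeks


recent = reports_per_week(6, 6)        # last 6 weeks
overall = reports_per_week(160, 130)   # since the first report

print(f"recent: {recent:.2f}/week, overall: {overall:.2f}/week")
```

That works out to about 1 report per week recently against roughly 1.2 per week over the whole period.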

It is time for roonlabs to fix this nuisance and give users back the time they waste working around this or fiddling with their networks, hoping that some random change will trigger another edge case that temporarily “fixes” the bug.

FWIW, gentlepersons, I’ve been writing software since the 70s and architecting large systems since the 80s. I designed and built back-end systems and user-facing apps at amazon. In 1987, we put our dev team on the NSFNET (a successor to the ARPANET) before the internet existed and enabled remote developers. I’ve been maintaining my home networks since the dawn of the PC era. I built the first browser-based ATM locator mapping app on the internet (and the back end for it) for Visa for the 1996 Atlanta Summer Olympic games a decade before google gave free maps to the world. We have sufficient expertise in-house to isolate and tell the difference between network and software problems without having the code to root-cause bugs.

Again, @DaveN, I’ll answer your question even though it’s not relevant to resolving this. Yes, I’ve had the NUC on the wired part of the lan once and it was still broken. noris postulated that packet fragmentation might be the problem, and as you read, I showed there was no packet fragmentation here. I continued this discussion via PM with other support staff a couple of months later. They didn’t want any more information and said they’re still looking into it, yet after 2y4m I doubt they are, since they are definitely capable of fixing this.

Thanks for taking the time to answer my question. I guess the bottom line is that this is a relatively rare bug - 1 report per week, from several hundred thousand users - and doubly difficult to pin down as it’s intermittent. There’s clearly something in your setup (by which I mean both hardware and software), and the way in which it interacts with the relevant Roon server, that triggers the problem.

Yeah, you would think they would be able to fix it, but I think it depends on what needs fixing. There are a few examples here that demonstrate that it’s not always a straightforward process, as I’m sure you’re aware.

Hi @ezman,

I’d like to take a look at your core’s logs. Can you leave the core online and if possible, cite the date/time you observe the issue occurring?

I already know from reading your thread here that this question won’t make you happy but I need to know. Has your modem or router config changed? From what I see in your old post you have a modem gateway (and I know from experience that your IP scheme is Xfinity when I look at the server side of things). It also sounds like you have another router being used, is that correct?

Thanks,
Wes

Thanks for responding, @Wes.

The bug had gone into hibernation by the time I wrote this post four days ago. The logs from before then, when the bugs were active, are gone. Log backup retention seems to be set to 20.

My ISP is Charter/Spectrum. The router is still the same R6400 as when I made my first report 2 years ago. The NUC is still on the other side of the wifi bridge, same as before. Its metadata improver was paused for most of the past 2 years while running roon core under Windows 8.1. A few months ago, I installed ubuntu on the NUC and roon core has been behaving well on it. Note that there was no change in network topology, just a fresh install on a fresh/different OS.

Charter/Spectrum replaced my gateway (cable modem) in 2020 or 2021. That was almost certainly after I filed the original bug report.

I looked at one of the recent logs on the core server that this report pertains to, while roon was working correctly, and could see that it was having trouble getting to some of your google cloud hosts. The log had several “[easyhttp] … No route to host” errors though it was able to resolve the IP addresses! I don’t know how easyhttp was doing the DNS name resolution because the resolved configuration was screwed up by me while trying to troubleshoot these bugs after adding the second NIC. (Since fixed!) In spite of these errors, I was pulling metadata on demand.

08/17 12:22:37 Warn: [easyhttp] [1] Post https://accounts5.roonlabs.com/accounts/3/login web exception without response: No route to host (34.148.110.116:443) No route to host (34.148.110.116:443)
08/17 12:22:37 Warn: [easyhttp] [4] Post https://accounts5.roonlabs.com/accounts/3/login web exception without response: No route to host (34.148.110.116:443) No route to host (34.148.110.116:443)
08/17 12:22:37 Warn: [easyhttp] [2] Post https://accounts5.roonlabs.com/accounts/3/machineallocate web exception without response: No route to host (34.148.110.116:443) No route to host (34.148.110.116:443)
08/17 12:22:37 Warn: [easyhttp] [5] Get https://geoip.roonlabs.net/geoip/1/lookup web exception without response: No route to host (35.231.208.158:443) No route to host (35.231.208.158:443)
08/17 12:22:37 Warn: [easyhttp] [3] Get https://devicedb.roonlabs.net/1/devicedb-prod.zip web exception without response: No route to host (35.231.208.158:443) No route to host (35.231.208.158:443)
08/17 12:22:37 Warn: [devicedb] While refreshing, status: 999, body: System.Net.WebException: No route to host (35.231.208.158:443)
 ---> System.Net.Http.HttpRequestException: No route to host (35.231.208.158:443)
 ---> System.Net.Sockets.SocketException (113): No route to host
   at System.Net.Sockets.Socket.AwaitableSocketAsyncEventArgs.ThrowException(SocketError error, CancellationToken cancellationToken)
   at 
<snip>
   at Base.EasyHttp.QueryAsyncInternal(HttpMethod method, Params p, CancellationToken canceltoken, IAuthProvider auth)
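
For anyone who wants to summarize warnings like these across a whole log file, here is a minimal Python sketch that tallies them by host, IP address, and error text. The regular expression is modeled only on the `[easyhttp]` lines quoted above; it is an assumption that other lines in the log follow the same shape, and the function name is made up for illustration.

```python
import re
from collections import Counter

# Modeled on the quoted lines, e.g.
# "08/17 12:22:37 Warn: [easyhttp] [1] Post https://host/path web exception
#  without response: No route to host (1.2.3.4:443) ..."
EASYHTTP_RE = re.compile(
    r"^(?P<stamp>\d{2}/\d{2} \d{2}:\d{2}:\d{2}) Warn: \[easyhttp\] \[\d+\] "
    r"(?P<method>\w+) https?://(?P<host>[^/\s]+)\S* "
    r"web exception without response: (?P<error>.+?) "
    r"\((?P<ip>[\d.]+):(?P<port>\d+)\)"
)


def tally_easyhttp_failures(lines):
    """Count [easyhttp] failures keyed by (host, ip, error text).

    Lines that do not match the pattern are ignored.
    """
    counts = Counter()
    for line in lines:
        match = EASYHTTP_RE.match(line.strip())
        if match:
            counts[(match["host"], match["ip"], match["error"])] += 1
    return counts
```

Feeding it the lines of a log file (for example `tally_easyhttp_failures(open(path))`, with `path` pointing at wherever the log lives) gives a quick view of which hosts and addresses fail, and with which error.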

Finally, the core server is usually on a 10Gbe QNAP managed switch with another machine with a fast NIC. Both it and the other machine were temporarily on an old D-Link 1Gbe 8-port unmanaged switch while troubleshooting this. They’re both back on the QNAP and working correctly.

The next time the bugs rear their heads, I’ll grab logs and get back in touch.

- Eric

This topic was automatically closed 36 hours after the last reply. New replies are no longer allowed.

About a week ago, I again reported this 2+ year old bug where roon cannot identify music when it’s added to a library and the metadata improver pauses indefinitely. It’s been closed by someone so I’m continuing it here.

I notice in the log files that roon appears to cache dns entries.

08/17 12:22:31 Info: Starting RoonServer v1.8 (build 1021) stable on linuxx64
08/17 12:22:31 Info: Local time is 8/17/2022 12:22:31 PM, UTC time is 8/17/2022 7:22:31 PM
08/17 12:22:31 Trace: [roondns] loaded 22 last-known-good entries

@Wes, does this explain how roon seems able to resolve FQDNs when it can’t route to google’s cloud as noted in my last post in the above thread?
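
To make the question concrete, here is a purely hypothetical Python sketch of what a "last-known-good" cache could look like. It is an assumption about what that log line might mean, not a description of how roondns actually works; the class and parameter names are invented for illustration. The point is only that a resolver with this kind of fallback can keep returning an address even when live lookups fail, which is independent of whether that address can then be routed to.

```python
import socket


class LastKnownGoodResolver:
    """Resolve hostnames, falling back to the last address that resolved.

    Hypothetical illustration only; the lookup function is injectable so the
    behavior can be exercised without touching the network.
    """

    def __init__(self, lookup=socket.gethostbyname):
        self._lookup = lookup
        self._cache = {}

    def resolve(self, hostname):
        try:
            ip = self._lookup(hostname)
        except OSError:
            # Live lookup failed: reuse the last good answer if there is one.
            if hostname in self._cache:
                return self._cache[hostname]
            raise
        self._cache[hostname] = ip
        return ip
```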

This has been reported five more times in the last five days. Maybe some users have resolved this by following the usual advice to check their lans, reboot, and so on; maybe some have been resolved by random acts of buggy behavior; maybe some are still open.

Step up, roonlabs!

@ezman, I read your previous posts. Have you had any more instances with the metadata improver since your last post? If so, were you able to submit your logs to @Wes? If no issues, has Roon been stable otherwise for you?

Hi @ezman,

You’ve definitely got a lot of data for us to peruse here. I’ve been discussing your case with research and development and we’d like you to try something for us.

Will you please try changing your core to a different machine, adding music, and seeing if you can duplicate the issue? Temporarily changing your core to a Windows or Mac machine could help us tremendously. If you duplicate the issue, please note the date/time and provide logging for the instance.

We’ll look forward to hearing back from you.

Thanks,
Wes

@Wes I’d like to make a plan for this experiment with you, and perhaps with input from your R&D team, regarding the network location/configuration of another machine. I’m moving our conversation to PM for now.

- Eric

@ezman - I appreciate your fighting the cause here. I am one of the five users that you highlighted a few posts ago and the issue has still not gone away. Please keep us updated with progress. Like you, I’m a software veteran, admittedly more on the mainframe side of things, although my current role involves branching into many Linux-related areas.

I must admit I haven’t got around to the router / modem restart yet, but there are two reasons for that, one is that it would inconvenience other users of the network, and the second is that none of my other devices are experiencing any network or connectivity issues whatsoever, so we’re sort of living without the metadata improver right now.

But why is my networking hardware assumed to be the problem here? Every other component within Roon that requires network comms is working perfectly. I have also highlighted the anomaly between this message in the log:

08/18 18:24:19 Warn: [easyhttp] [16] Post https://identifier.roonlabs.net/identifier/2/album web exception without response: Network is unreachable (35.231.208.158:443) Network is unreachable (35.231.208.158:443)

yet the same box can ping that IP address from the command line without fault:

dloader@deb-mus-svr:~$ ping 35.231.208.158
PING 35.231.208.158 (35.231.208.158) 56(84) bytes of data.
64 bytes from 35.231.208.158: icmp_seq=1 ttl=104 time=106 ms
64 bytes from 35.231.208.158: icmp_seq=2 ttl=104 time=100 ms

Can someone explain to me how my router could cause that? Either a remote IP address is reachable through a network device or it isn’t. If ping can reach it but the software can’t, then I’m sorry but the problem is with the software.
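
For a closer like-for-like comparison: ping sends ICMP echo requests, while the failing request in the log is a TCP connection to port 443. A minimal Python sketch that attempts that kind of connection from the same box and reports the OS error name, if any, is below. The function name and the injectable `connect` parameter are just for illustration, and this says nothing about how Roon itself opens its connections.

```python
import errno
import socket


def try_tcp_connect(ip, port=443, timeout=5.0, connect=socket.create_connection):
    """Attempt a TCP connection and report the outcome.

    Returns ("ok", None) on success, or ("failed", name) where name is the
    errno symbol (e.g. "ENETUNREACH", "EHOSTUNREACH") or the exception type
    name when no errno is set, as with a timeout.
    """
    try:
        sock = connect((ip, port), timeout)
    except OSError as exc:
        return ("failed", errno.errorcode.get(exc.errno, type(exc).__name__))
    sock.close()
    return ("ok", None)
```

Calling `try_tcp_connect("35.231.208.158")` from a Python prompt while the warning is appearing in the log would show whether an ordinary process on the same machine gets "Network is unreachable" (`ENETUNREACH`) too, or connects fine.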

As for those 4 networking ‘best practices’ steps - I’m sorry, what? Why? Roon should damned well work with whatever DNS I have set, thank you very much. Every other piece of network-dependent software I run manages to. Disable IPv6 and not use certain bits of networking gear? I expect this kind of nonsense with open source but this is a $600 piece of proprietary software. This stuff should be taken care of within the code, not lumped onto the back of the customer to deal with.

@ezman - if you need me to provide any logs from my machine in order to make meaningful progress, please let me know.

We concur. Thanks for the support.

For me, the only plausible network-related problem that could make these two functions fail while everything else network-related works is some critical latency requirement, probably related to an authentication call.

Along the way – PM? bug thread? – someone from roonlabs speculated that packet fragmentation might have been the culprit in my lan. I surmised from that that if some network call to an authentication service sends packets that fragment, and the fragments take very different routes, and the round trip latency for the call after reassembly exceeds some threshold, their code assumes that the user isn’t licensed and so they turn off these two metadata-related functions in the code.

Otherwise, yeah, if you can ping their cloud hosts, and dns is resolving, and your browser is working, and everything else in roon that uses the internet is working, then the problem with the metadata improver and batch loading metadata for new music isn’t related to the lan or wan interface.

- Eric