Help me understand why my NUC keeps becoming unavailable?

Hi all. I’ve been running my Roon core on a NUC 7i7DNH1E for about a year; in the last month or so it has started to behave unexpectedly. Every few days, sometimes more often and sometimes less, I fire up Roon and it can’t find my core. The NUC won’t allow an incoming remote desktop connection, and indeed the router shows that it’s no longer connected to my network.

To troubleshoot, I hook up a monitor/kb/mouse (it’s running headless) but it never sends any video output in this state, either, so I can’t observe if there’s something obvious. Inevitably, I just power cycle it, and it reconnects, finds the monitor again, loads up Roon, and things appear back to normal.

It’s additionally hard to troubleshoot because it’s erratic. I’ve tried adjustments to the power modes, ensuring that it doesn’t sleep, etc., but I have to wait a few days before I can see whether they’ve had any effect. I’ve swapped it from ethernet to wifi and vice versa. I’ve made sure that my profile was properly logging in after updates, so that it wasn’t getting hung up in some “finishing update” state, so I don’t think that’s the cause. The fact that it’s not sending video output (or perhaps it’s not even recognizing the KB/mouse input) or appearing on the network makes me think maybe something is up in the BIOS, perhaps after what would otherwise be a routine Windows event like a patch-restart. But I really don’t know what to look for, and am particularly puzzled that this didn’t happen for nine months or so. What changed? I really don’t know.

Anybody have any ideas what I should try? I’m open to just about anything!

Are you running ROCK? Or some other OS?

From your short description it sounds like something is causing it to lock up hard. That can be a number of things, including overheating, disk errors, memory errors, etc. We need to know more about the set-up: OS, memory, drive configuration, etc. What can change in 9 months? Hardware failure, sadly. But… provide more and the community will dive in, I’m sure.
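Since the lock-ups are erratic, one low-effort way to pin down when they happen is a heartbeat log: have the machine append a timestamp to a file every minute, then look for gaps after the next power cycle. This is only a rough sketch, not a tested tool. The function names, the file path, and the five-minute threshold are all made up for illustration, and it assumes you schedule `append_heartbeat` yourself (for example with a Task Scheduler job).

```python
from datetime import datetime, timedelta


def append_heartbeat(path, now=None):
    """Append one ISO-8601 timestamp line to the heartbeat file (hypothetical path)."""
    now = now or datetime.now()
    with open(path, "a", encoding="utf-8") as f:
        f.write(now.isoformat(timespec="seconds") + "\n")


def find_gaps(path, max_gap=timedelta(minutes=5)):
    """Return (last_seen, next_seen) pairs where consecutive heartbeats
    are further apart than max_gap (an arbitrary, assumed threshold)."""
    with open(path, encoding="utf-8") as f:
        stamps = [datetime.fromisoformat(line.strip()) for line in f if line.strip()]
    return [(a, b) for a, b in zip(stamps, stamps[1:]) if b - a > max_gap]
```

The first timestamp of each gap is roughly when the box went dark, which gives you a time window to check in the Windows logs.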

Ah, thanks @ipeverywhere, I should have noted: My NUC is running Windows 10 Pro.

Component details:
OS Build: 19042
Processor: Intel® Core™ i7-8650U CPU @ 1.90GHz 2.11 GHz
Installed RAM: 16.0 GB (15.9 GB usable)
Disk 1: 1TB Samsung M.2 (boot/OS/apps)
Disk 2: 500GB SSD (library/storage)

I did a little more spelunking tonight. Chkdsk turns up no bad sectors on either disk.

Then I ran a memory test; the Windows log says this produced no errors. BUT: the NUC didn’t come back up successfully after the post-test reboot.

I started exploring the Event Viewer around the time the memory test was completed and see a bunch of warnings like this:

The application-specific permission settings do not grant Local Launch permission for the COM Server application with CLSID
to the user NT AUTHORITY\SYSTEM SID (S-1-5-18) from address LocalHost (Using LRPC) running in the application container Unavailable SID (Unavailable). This security permission can be modified using the Component Services administrative tool.

This is where the guts of Windows are way beyond me. I see clusters of errors like that which might correspond in time to the NUC going dark on me. But I’ll never figure out what to do about them using the Component Services admin tool.

So — maybe I’ve found something?

Good hunting… also beyond my Windows Foo :frowning:

I’d say switch to Linux but that’d be rude. I’d still assume a hardware error, but I’m hopeful someone with deeper Windows experience will jump in.

Nope, those are useless Windows errors and mean nothing. The errors are by design and are harmless. So, not the issue.

I’d run an OS drive test.

Oh, I’m thinking about that, already! I have plenty of room to dual boot and try it out. Should probably make sure I’m not dealing with something in hardware, first, but this is definitely on the list.

Thanks, Rugby. By this, do you mean something beyond chkdsk? That didn’t turn up anything.

No, I meant chkdsk.

Check to see if there is a pending Microsoft update. Sometimes, a waiting update can cause system weirdness.

I’ve had the NUC hooked up to a spare monitor for the weekend and wandered past it in the middle of the night to find a blue screen of death proclaiming a WHEA UNCORRECTABLE ERROR, which apparently is a pretty broad category but at least gives me something to hunt for.
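To start hunting, something like the sketch below could pull the WHEA entries out of an Event Viewer export so they aren’t buried under the harmless warnings. It rests on a couple of assumptions: that the System log has been saved as CSV with a `Source` column, and that the hardware-error entries carry "WHEA" in that column. The function name `whea_events` is made up for illustration.

```python
import csv
import io


def whea_events(csv_text):
    """Return the rows whose Source column mentions WHEA.

    Assumes a CSV export with a header row that includes a 'Source'
    column; adjust the column name if your export differs.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    return [row for row in reader if "whea" in (row.get("Source") or "").lower()]
```

You would read the exported file into a string and pass it in. The timestamps on whatever comes back can then be lined up against the times the NUC dropped off the network.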