Possible solution to NUC11 and Samsung 980 Pro random reboots

Hi Everyone, this is my first post but I figured I would share.

Recently I purchased a NUC11 (NUC11PAHI7) and put a 2TB Samsung 980 Pro and 32GB of RAM in it. Of course I intended to run ROCK, but I realize I am more than capable of running it this way, so it’s not a problem.

I am running Ubuntu Server 20.04. In the NUC BIOS I have disabled all forms of sleep and extras such as FireWire, the SD card reader, and WLAN.

THE SOLUTION

Basically, grab the firmware for your Samsung SSD from Samsung's download page: SSD Tools & Software | Download | Samsung Semiconductor

Then paste the download URL into the script below and run it. The post advised unmounting, but since I only had the one disk I ran it as is and took the gamble that a reboot would fix everything. So far so good.

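url="PASTE_THE_URL_HERE"
# install the tools first (wget is needed for the download itself)
sudo apt-get -y install gzip unzip wget cpio
wget "${url}"
sudo mkdir -p /mnt/iso
# ${url##*/} strips everything up to the last slash, leaving the ISO filename
sudo mount -o loop "./${url##*/}" /mnt/iso/
mkdir -p /tmp/fwupdate
cd /tmp/fwupdate
# unpack the ISO's initrd to get at the updater binary
gzip -dc /mnt/iso/initrd | cpio -idv --no-absolute-filenames
cd root/fumagician/
sudo ./fumagician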
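
After the reboot it is worth confirming that the drive actually reports the new firmware. A minimal sketch, assuming the kernel exposes the usual `model` and `firmware_rev` attributes under /sys/class/nvme (the function name `nvme_firmware_report` is just something made up for this example):

```shell
# Print the model and firmware revision of every NVMe controller found under
# the given sysfs root (defaults to the real /sys/class/nvme).
nvme_firmware_report() {
    sysfs_root="${1:-/sys/class/nvme}"
    for ctrl in "$sysfs_root"/nvme*; do
        # skip anything that is not a controller directory
        [ -r "$ctrl/firmware_rev" ] || continue
        # sysfs pads these fields with trailing spaces, so trim them
        model=$(sed 's/[[:space:]]*$//' "$ctrl/model" 2>/dev/null)
        fw=$(sed 's/[[:space:]]*$//' "$ctrl/firmware_rev")
        printf '%s: %s (firmware %s)\n' "${ctrl##*/}" "$model" "$fw"
    done
}

nvme_firmware_report
```

Compare the reported revision with the one listed on Samsung's download page for your model.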

I also want to say that 32GB is overkill in the memory department. 16GB is more than enough breathing room.
Below is what I have going on with 5 endpoints playing simultaneously, all with varying degrees of DSP:

music@music:~$ free -h
              total        used        free      shared  buff/cache   available
Mem:           31Gi       4.7Gi       260Mi       5.0Mi        26Gi        25Gi
Swap:         8.0Gi       1.0Mi       8.0Gi


Welp, it rebooted again. Since the previous post I have removed thermald and set the CPU governor to performance.

sudo apt remove thermald
sudo apt autoremove

for CPUFREQ in /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor; do [ -f "$CPUFREQ" ] || continue; echo performance | sudo tee "$CPUFREQ" > /dev/null; done
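
To check that the governor change took on every core, something like the following sketch can help. It only reads the same sysfs files the loop above writes to; `governor_summary` is a name invented for this example. Keep in mind that values written to sysfs do not survive a reboot, so the loop has to be rerun (or automated) after each boot.

```shell
# Count how many CPUs are using each scaling governor.
# The optional argument is the CPU sysfs root (defaults to the real one).
governor_summary() {
    cpu_root="${1:-/sys/devices/system/cpu}"
    for f in "$cpu_root"/cpu[0-9]*/cpufreq/scaling_governor; do
        [ -f "$f" ] || continue
        cat "$f"
    done | sort | uniq -c
}

governor_summary
```

If every core is set correctly, the output is a single line with the core count followed by `performance`.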

I have also disabled all the sleep targets:

sudo systemctl mask sleep.target suspend.target hibernate.target hybrid-sleep.target

At this point I am throwing mud at the wall and seeing what sticks.

Crashed again last night. Below is what I have done today. I am not saying any of these things are good ideas, or even necessary. Just documenting what I have done in my endeavors.


/etc/modprobe.d/blacklist.conf
blacklist snd_hda_intel
blacklist snd_hda_codec_hdmi
blacklist i915
blacklist iwlwifi
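
A blacklist entry only stops a module from being auto-loaded at the next boot, and on Ubuntu the initramfs generally needs rebuilding (`sudo update-initramfs -u`) for it to apply to modules that load early, such as i915. Here is a rough sketch for checking after a reboot whether any blacklisted module is still loaded. The helper name `still_loaded` is made up, and it assumes module names are written with underscores, the way /proc/modules lists them.

```shell
# Report blacklisted modules that are currently loaded.
# Arguments (both optional): the blacklist file and the loaded-modules list.
still_loaded() {
    blacklist_file="${1:-/etc/modprobe.d/blacklist.conf}"
    modules_file="${2:-/proc/modules}"
    [ -r "$blacklist_file" ] || return 0
    # pull the module name out of every "blacklist <name>" line
    awk '$1 == "blacklist" { print $2 }' "$blacklist_file" | while read -r mod; do
        # /proc/modules lists one module per line, name in the first column
        if awk -v m="$mod" '$1 == m { found = 1 } END { exit !found }' "$modules_file"; then
            echo "$mod is still loaded"
        fi
    done
}

still_loaded
```

No output means none of the blacklisted modules are loaded.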

I also upgraded my kernel to 5.4.0-92

5.4.0-92-generic #103-Ubuntu SMP Fri Nov 26 16:13:00 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux

Is this a known or common problem? I have a NUC11 with an i7 CPU running Windows and a 980 Pro (256GB) that doesn’t reboot.

I would suspect RAM rather than the 980. Given you have 32GB, I’m assuming that it’s two sticks, which means you could try removing one, then the other, and see if the reboots stop. If not, you have eliminated the RAM from the equation.

It very well could be the RAM. I’ve seen many threads with similar issues where people blame the memory, but the common thread seems to be the Samsung 980 and Ubuntu 20.04. I am working on the assumption that my memory is fine, and seeing what I can disable/tweak to get it to work. If nothing works by this time next week, then I will probably take a stick out; that’s easy enough.


Yeah, at least with so much RAM at your disposal you can drop to half and still have loads while you’re ruling out the possibility of a bad stick!

Didn’t know about the Ubuntu and 980 thing at all. Wondering if I dodged a bullet. I went with Windows on a bit of a whim.

I am also running Windows on a NUC11 with a 2TB Samsung 980 Pro, in a fan-less case without issue.
I even went with Windows 11, against some recommendations, yet have been pleasantly surprised.

It seems like a 24 hour thing. I have removed one of the RAM sticks. I also have a WD SN850 I can swap with. I don’t feel like transferring all the data over, but I might end up doing that if this doesn’t work out.

I think I just had bad hardware. Also I am mad at Intel, so I got an Asus PN50 (AMD CPU) and moved everything over. It’s not been 24 hours but I’m sure it’s fine.

It may be too late for me to say this, but I think you should try disabling Turbo Boost in the NUC.

@wklie Not too late, in that I disabled that along with basically everything else. Most of the time this thing was headless with just Ethernet and power cables plugged in, so it would reboot and the music would stop. I realized it was dying when the stupid thing powered off for no reason in the BIOS screen while I was making changes.

This is a hardware fault, unrelated to the SSD firmware (unless the M.2 has bad contact). I guess it’s probably unrelated to RAM too - but it does not hurt to test it, and reseat everything.

If you have already disabled Turbo Boost, the next thing to do is send it to be repaired.

Alternatively, clean the CPU thermal pad, and apply an aftermarket CPU thermal paste.

Definitely a hardware fault. I was barking up the wrong tree. That’s why it’s all in a new box now, because even without the hardware fault, the thing was still flaky in Linux. I moved the memory and drive over. I ended up wiping and restoring from backup. It took a few hours to transfer my library back over, but no big deal. Everything is smooth sailing now.