M.2 SSD in NUC8 keeps dying

If I were you I’d have purchased a different brand after it failed a second time.

It could be an issue with the controller hardware in the NUC. One option would be to install Windows as a test and run it for several weeks. That would allow you to run software to check the drive. It might give you some clues as to what is happening, for example if you notice bad sectors increasing every day.
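To make "increasing every day" concrete, here is a minimal sketch of comparing daily readings of an error counter. The function name and the sample numbers are made up for illustration; the counts would come from whatever the diagnostic tool reports.

```python
from datetime import date


def daily_increases(readings):
    """Return (day, delta) for every day on which the error count rose.

    `readings` is a list of (date, count) pairs, one per day, in any order.
    """
    ordered = sorted(readings)
    return [
        (later_day, later - earlier)
        for (_, earlier), (later_day, later) in zip(ordered, ordered[1:])
        if later > earlier
    ]


if __name__ == "__main__":
    # Hypothetical readings, not taken from any real drive.
    sample = [
        (date(2020, 6, 1), 0),
        (date(2020, 6, 2), 0),
        (date(2020, 6, 3), 4),
        (date(2020, 6, 4), 9),
    ]
    for day, delta in daily_increases(sample):
        print(f"{day}: +{delta}")
```

A count that keeps climbing day after day says more than a single non-zero reading.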

A SMART error is a hardware error.

If this were a desktop, I’d recommend replacing the ATX power supply. However, a NUC doesn’t use an ATX power supply, so this cannot be done.

Although it is possible that the NUC8 itself is faulty, this cannot be proven until a different brand of M.2 SSD also fails. I suspect an RMA for the NUC8 would only result in a “no problem found” diagnosis and the unit being returned, wasting time and shipping cost in the process.

Another suspect is heat (although an SSD should throttle instead of dying when overheated). What type of CPU are you using in the NUC8? i3, i5, or i7? Are you using a fanless chassis? If you’re not using a fanless chassis, have you set the BIOS fan option to something non-default such as Quiet? Is your NUC stored in a closet instead of an open area?

A cure?

Mean time between failures (MTBF) reaching 1.5 million hours

Lowest capacity is 500 GB. A waste for running RoonOS, but it’s cheap on Amazon at $75, which is competitive with a Samsung 970 EVO of 256 GB.

I don’t see how it improves reliability other than reducing costs. According to one 2020 SSD reliability study, “Higher density cells exhibit more failures”.

I don’t think MTBF can be interpreted on a standalone basis at the user level. It might, however, point to relative reliability among products from the same brand (only). Other than that, I’d ignore MTBF. I’d say user reports are far more useful. I regularly read all the one-star and two-star comments before I buy something, to see if there is a recurring pattern of failure.
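For a rough sense of scale, the quoted 1.5 million hours can be converted into an annualized failure rate. This is only a sketch: it assumes a constant failure rate (exponential model) and a drive powered on around the clock, neither of which comes from the datasheet, and the function name is made up for illustration.

```python
import math

HOURS_PER_YEAR = 24 * 365  # assumption: powered on 24/7


def annualized_failure_rate(mtbf_hours, hours_per_year=HOURS_PER_YEAR):
    """Probability that a unit fails within one year.

    Assumes a constant failure rate (exponential model); this is a
    modelling assumption, not something the vendor states.
    """
    if mtbf_hours <= 0:
        raise ValueError("mtbf_hours must be positive")
    return 1.0 - math.exp(-hours_per_year / mtbf_hours)


if __name__ == "__main__":
    afr = annualized_failure_rate(1_500_000)
    print(f"AFR for an MTBF of 1.5 million hours: {afr:.2%}")
```

Under those assumptions the figure works out to a bit under 0.6% per year across a large population of drives. It says nothing about how long one particular drive will last.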

I still don’t think this is a particular problem of the Kingston SSD. The manufacturer gives a 5-year warranty for that product. They wouldn’t do that if it had a systematic hardware issue that causes early failures.
Anyhow, I followed Peter’s suggestion and installed a different SSD from WD. Let’s see how that goes.

Heat shouldn’t be an issue. I use the original housing with unchanged fan options in the BIOS.
It’s an i7 NUC, which doesn’t get warm at all. The fan usually only starts during audio analysis when I add a few new albums. The NUC stands in an open rack.

You can’t get better clues for what might be causing things unless you can run diagnostic software. Could it be heat? You are arguing it can’t be because of the environment, but that doesn’t mean it isn’t heat. You need accurate real-time temperature readings.

What you know is that the system is throwing a SMART error on boot. SMART collects error information and raises an error when the threshold for a given attribute has been reached; see here for more info. What error is actually triggering it?

The next step should be running software to access the SMART data on the drive and see if there are better indicators/descriptors of the actual error. That software can come from the drive manufacturer, like SeaTools for a Seagate drive, or from other tools, like Linux’s “smartmontools”.
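As a rough illustration of what that looks like with smartmontools, here is a minimal sketch that calls smartctl with JSON output and pulls out a few NVMe health fields. It assumes smartmontools 7.0 or newer (for the --json option), root privileges, and a device path such as /dev/nvme0. The JSON key names are how I recall the smartctl 7.x NVMe output, so check them against your own report.

```python
import json
import subprocess


def read_smart_json(device):
    """Run `smartctl -a --json <device>` and return the parsed report."""
    result = subprocess.run(
        ["smartctl", "-a", "--json", device],
        capture_output=True,
        text=True,
        check=False,  # smartctl can exit non-zero and still print a report
    )
    return json.loads(result.stdout)


def summarize_nvme_health(report):
    """Pick the fields most relevant to a failing NVMe drive.

    Key names are assumed from smartctl 7.x JSON output; missing keys
    simply come back as None.
    """
    log = report.get("nvme_smart_health_information_log", {})
    return {
        "model": report.get("model_name"),
        "passed": report.get("smart_status", {}).get("passed"),
        "critical_warning": log.get("critical_warning"),
        "temperature_c": log.get("temperature"),
        "available_spare_pct": log.get("available_spare"),
        "percentage_used": log.get("percentage_used"),
        "media_errors": log.get("media_errors"),
        "error_log_entries": log.get("num_err_log_entries"),
    }
```

Calling summarize_nvme_health(read_smart_json("/dev/nvme0")) from a Linux live session would give a short dictionary that is easy to post here. The temperature field also gives a reading to compare against the heat theory.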

My earlier point is that you have to use a different OS to run these, as ROCK does not let you. So you can use Windows, install Linux, or even set up a Linux live boot to run the utilities.


If you can obtain a SMART report from another OS as Rugby suggested and post it, I’ll read it for you.

(Although I don’t think this is in my job description, I actually read several SMART reports every month for SATA and USB HDDs at work.)


Hi Rugby and Peter,

I appreciate your ideas, but I won’t be able to do that. I don’t have the equipment and patience to do all these installations and tests. I really bought into Roon’s idea of the NUC as an “appliance”. Set it up, put it in a corner and forget it.
If this won’t work out, I’ll have to look into another solution.
I hope I’ll get a final statement from Noris and the QA team.

This is so until it fails… same with all appliances or gadgets, isn’t it? Now, when it fails, you can either put in some energy and try to repair it, or you throw it in the garbage and get a new one… It’s your call.

You have your NUC, and a USB stick to boot a downloadable Linux live distro isn’t new equipment either, is it? So, lack of patience it is.

See, how could anybody possibly support your assumption if you cannot be bothered to do some homework and gather diagnostic data? People offer to help you with the hard part, but you won’t even take that. I sometimes do wonder what folks expect when coming onto the forum with a problem…

If this is the way people are treated in an official support forum, I’m off.

Seems like a lot of people tried to help you. That includes Roon employees as well as other Roon users who are not Roon employees. If this was happening to me, I would not have the knowledge to try some of the suggestions. I guess I would swap out the SSD with a different brand another time or two. If that didn’t work, maybe I would return the NUC as defective.

Hi Matthias_Dahms,

I spoke to our QA team regarding your report, and QA agrees that this issue is hardware related. It is not clear what is causing the drives to fail, and further diagnostics are required to troubleshoot this issue. One possible suggestion is installing another operating system and running a diagnostics tool.

While ROCK is an “appliance style” device, it is built using your own hardware, and hardware sometimes has issues and needs to be replaced.

Since this is not a Nucleus, we cannot comment on the quality of the SSD in use or on whether there are other issues with your specific NUC’s hardware.

If you don’t want to troubleshoot this further yourself, you can also look up your local Roon dealer; they might have some experience with ROCK and/or troubleshooting bad SSDs.

Hi @Supersonic,
Stumbled upon your post. I too have had a bad experience with a NUC and a Kingston NVMe disk, similar to yours and @Ken_Talbot’s (I’m not running Roon on it as of yet though). I would be very happy if you continued to share your experience with the forum. I’m about to open a support case at Intel, and any help and further information about this problem would be highly appreciated. It’s somewhat comforting not being alone in this.

I have a NUC8I3BEH and a Kingston SA2000M8/1000G. It has failed on me twice since March due to SMART bad blocks. It was replaced the first time and refunded the second.

edit: Spelling item=>Intel

I don’t understand. Why should this be an Intel support request when a Kingston model failed twice?

Unless an SSD from a different brand using a different SSD controller design also fails, there is no reason to suspect the NUC to be faulty.

Are these Kingston modules listed by Intel as compatible?

Thanks for your reply @wklie. The two Kingston disks I tested may both have been faulty of course, I see your point. What leads me to believe that this is an Intel issue is that Kingston seems to be a good brand in general and that the faulty disks behaved in a very similar way before crashing. The model is also listed as Manufacturer Validated on Intel’s website.

Additionally, I would indeed like to test another disk and get my system up and running, but I feel that I’m running the risk of spending more time and money on failing disks. I hope that Intel support can shed some light on this matter, or at least I can let them know that these issues occur.

Hi Fullscreen,

Have you seen this thread?

I am as sure as I can be that the issue I had was with a Transcend M.2 SSD.

This is the thread (below) where I raised the issue. I replaced the SSD well over a month ago and so far there have been no more issues (fingers crossed). When the Transcend SSD was in place, I had to reboot every day or so.

Yes, under “Drives: M.2 PCIe”, model “960GB A1000 PCIe NVMe”, part number “SA1000M8/960G” in my case.