General discussion: What is killing boot SSDs?

The boot SSD on my Nucleus+ became corrupted, and this seems to be a worryingly common problem.

Power failures are one reason why SSDs can get corrupted. I think this is a big problem with the Nucleus, which is presented as an easy-to-use, turn-key solution with reliable and robust operation. Pulling the power plug should not damage a consumer device like this.

The Nucleus should have an internal battery, to be able to handle temporary interruptions in power supply and perform a graceful shutdown in case of total loss of power.

I am not suggesting that this is the reason for every dead SSD. I am just pointing out one possible cause, and a related Nucleus-specific problem.

I have a separate support topic for the problem with my Nucleus; I created this topic for a more general discussion about what is going on with SSDs. I’m not sure about the best category for this topic, but it is most likely not Support.

I also very much hope the cause is not Roon software.

My Nucleus came with a 64GB ADATA SATA drive with a Mfg Date of 0608, which I assume means June 2008. I’m not sure whether others have checked their installed hardware, but I know of at least 3 units whose SSDs have died, crashed, or otherwise become problematic, and which have needed (and received) replacements, either directly or via Roon distributor channels. Some of these are Rev A and others Rev B units.

I recently replaced my Nucleus SSD with an M.2 NVMe drive onto which I cloned the original SSD. It’s not something most people could do, though, given the steps and hardware involved, plus it requires a working Nucleus image, which people in this situation no longer have even if they wanted to try. As a bonus, NVMe is a bit zippier than the older SATA drive.

Note my Nucleus is no longer in warranty and was a Rev A unit.

That is a feature request, and not a trivial one to implement. Unexpected loss of power is a system issue and is easily dealt with at home by running everything that matters from a small UPS. It isn’t just the NUC that might suffer corruption; any powered audio file storage drives could too.
Performing graceful shutdowns requires suitable UPS software to be running on the computer. When power fails, the UPS signals the computer to shut down, usually via a dedicated USB connection; otherwise the computer has no way of knowing it should. So clearly, this is not currently possible with ROCK/Nucleus.

Alternatively, using a laptop as Core and storage would achieve the same thing, since it has a built-in battery. It would also likely shut itself down gracefully when the battery ran low.

I disagree. Servers should be run on a UPS. RoonServer is, in essence, a database server using leveldb. And yes, it should have utilities to perform a graceful shutdown when connected to a UPS via USB.

This is also one of the reasons I run Roon on Windows: I can run UPS monitoring software and have things shut down gracefully (in addition to being able to run hardware diagnostics and preventative maintenance). I haven’t had any corruption in 6 years.

There is a Linux utility called apcupsd that would do the job if it were loaded on RoonOS, but I’m not sure about its licensing. Perhaps this should be a Feature Request.
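For anyone curious what such a utility actually decides, here is a minimal Python sketch of the shutdown logic, loosely modelled on apcupsd’s BATTERYLEVEL, MINUTES and TIMEOUT settings in apcupsd.conf: once running on battery, start a graceful shutdown as soon as any threshold is crossed. The `UpsStatus` type and `should_shut_down` function are hypothetical illustrations, not part of apcupsd or RoonOS, and the threshold values are just examples. In a real setup apcupsd reads the status from the UPS over USB and runs the shutdown command itself.

```python
from dataclasses import dataclass


@dataclass
class UpsStatus:
    """Hypothetical snapshot of what a UPS reports over its USB link."""
    on_battery: bool            # True once mains power has failed
    charge_percent: float       # remaining battery charge, 0-100
    runtime_minutes: float      # UPS estimate of remaining runtime
    seconds_on_battery: float   # how long we have been running on battery


def should_shut_down(status: UpsStatus,
                     battery_level: float = 5.0,
                     minutes: float = 3.0,
                     timeout: float = 0.0) -> bool:
    """Return True when a graceful shutdown should start.

    Mirrors the idea behind apcupsd's BATTERYLEVEL, MINUTES and TIMEOUT:
    while on battery, shut down as soon as ANY threshold is reached.
    A timeout of 0 disables the time-based trigger.
    """
    if not status.on_battery:
        return False  # mains power is fine, nothing to do
    if status.charge_percent <= battery_level:
        return True   # battery charge too low
    if status.runtime_minutes <= minutes:
        return True   # estimated runtime too short
    if timeout > 0 and status.seconds_on_battery >= timeout:
        return True   # been on battery for too long
    return False


if __name__ == "__main__":
    # A short blip: on battery for 10 s with plenty of charge left.
    blip = UpsStatus(True, 95.0, 40.0, 10.0)
    # A long outage: charge nearly exhausted.
    outage = UpsStatus(True, 4.0, 2.5, 1800.0)
    print(should_shut_down(blip))    # False
    print(should_shut_down(outage))  # True
```

The point is that short blips are ridden out on battery, and the server only shuts down cleanly when the outage is long enough to matter, which is exactly what a bare Nucleus on mains power cannot do.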

I don’t know whether Nucleus/Rock runs fsck at startup; if not, it possibly should. The fast M.2 drives might also be cooking; I was surprised how hot they get. My Akasa case included an M.2 heatsink; I’m not sure whether the Nucleus has one.
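On a general-purpose Linux machine where you have shell access, recent kernels expose the NVMe drive’s temperature sensor through sysfs, so checking whether a drive is cooking takes only a few lines. This Python sketch assumes the kernel’s nvme driver registers an hwmon device whose `name` file reads `nvme` and that `temp1_input` reports the drive’s composite temperature in millidegrees Celsius; the function names are my own, not from any Roon tool.

```python
from pathlib import Path


def millidegrees_to_celsius(raw: str) -> float:
    """hwmon reports temperatures as integer millidegrees Celsius."""
    return int(raw.strip()) / 1000.0


def nvme_temperatures(hwmon_root: Path = Path("/sys/class/hwmon")) -> dict[str, float]:
    """Return {hwmon device: temperature in C} for NVMe drives.

    Assumes a kernel whose nvme driver registers an hwmon device with a
    "name" file containing "nvme"; temp1_input is typically the
    drive's composite temperature sensor.
    """
    temps: dict[str, float] = {}
    if not hwmon_root.is_dir():
        return temps  # not Linux, or sysfs not available
    for dev in sorted(hwmon_root.iterdir()):
        name_file = dev / "name"
        temp_file = dev / "temp1_input"
        if not name_file.is_file() or not temp_file.is_file():
            continue  # skip devices without the expected files
        if name_file.read_text().strip() != "nvme":
            continue  # CPU, chipset and other sensors are ignored
        temps[dev.name] = millidegrees_to_celsius(temp_file.read_text())
    return temps


if __name__ == "__main__":
    for dev, celsius in nvme_temperatures().items():
        print(f"{dev}: {celsius:.1f} C")
```

Running it while a large library scan is in progress would give a rough idea of whether a heatsink is worth adding.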

The Nucleus throttles the CPU to maintain temperatures. I haven’t taken a Nucleus apart, so I can’t speak to any special heat sinks like those in an Akasa case. However, I’m not sure the Nucleus actually uses the “fast and hot M2 drives” you are referencing.

Yes, to clarify, my limited “hot M2 drives” experience is with Rock, not Nucleus.

This is my Nucleus Rev A, upgraded with a 128GB NVMe M.2 drive with a heatsink clamped to it. On top of that I reused the old pink thermal pad, bridging to the 2.5" SSD mounting plate for added heat transfer.

It seems the Rev B units now use an NVMe part.

I also removed the NUC board, cleaned off the old thermal paste on the CPU die and applied new paste, as the original had dried up (it was maybe 4 years old). This is a good preventative measure if you have the skills to do it.

Nice pics, thanks, and good thinking on thermal paste.

Once you have stripped it down this far, there is no point in not going the whole hog, so to speak.

Do note that cloning the original SATA SSD over to an NVMe disk is not for the average user. It involved 2 different M.2 carrier units and utilities from 2 different OSes (maybe I could have just used Ubuntu). But it was an exercise in what is possible, and due diligence on my part, given the stories of so many Nucleus SSDs calling it a day in one form or another. Plus, mine was SATA rather than NVMe, had what looks like an old manufacturing date, and my Nucleus is out of warranty.

My Nucleus is essentially a backup unit for my beefier Win10 machine, but it now has the 128GB NVMe drive and 12GB of RAM instead of the original 4GB. It’s still an i3 CPU, though, so it isn’t going to match the performance of my W10 i7-7700, where I have 270K tracks in the database.

What drive shipped with the Nucleus and was that pink stuff the entirety of the heat sink for the drive?

It was an ADATA 64GB SATA M.2 drive, Mfg Date 0608. And yes, the pink stuff (2 layers, approx 30x15mm and maybe 4-5mm thick) was stacked to reach the 2.5" SSD plate to cool the original drive, which is rather shorter. In my installation, with the 128GB 2280-length NVMe drive and the aluminium heat sink (plus an aluminium plate on the bottom), the 2 old pink pads were a good solution for the top side, though I wonder whether just an air gap would be similarly OK.

I don’t disagree. But given the way the Nucleus is presented, the way it looks, and the way it is sold in Hi-Fi stores, I wonder how many people don’t realize it can’t be treated like most other consumer devices.

To make things worse, the power button is located at the back of the unit, so it can be hard to see whether the Nucleus is on. Yes, I know it should be kept on all the time, but in a domestic setting this might not always be possible or even desired.

I guess this is probably too difficult to implement for a company the size of Roon Labs. But I think there is a market for “an easy-to-use, turn-key solution with reliable and robust operation”, without having to worry about things like losing power. To many people it may seem that the Nucleus is in that category, and this could be causing dead drives.

I have the new fanless MacBook Air, and I have to say it is an absolutely wonderful computer that makes the Nucleus, basically a big and very heavy heatsink, look like ancient technology. And I believe Apple silicon performance is adequate for running a Roon server. I don’t know much about other fanless laptops; are they performant enough, and can they run Roon software natively yet?

This sounds very scary. My Nucleus at least does not have an M.2 heatsink.

So I wonder whether the majority of dead drives have been in a Nucleus. Would that make this a Nucleus-specific problem?

Please identify the brand, model and capacity of the m.2 SSD you have.

Yes. SSDs are especially prone to this problem. In fact, older SSDs also had firmware issues with sleep, so they were best run 24/7 and never powered off.

Yes, some of them do get really hot. I recall a hardware review website published a measurement of over 110 degrees Celsius several years ago.

No, this is not a Nucleus specific problem.

I reiterate my position that people should buy only from an SSD manufacturer that makes its own NAND chips.

Do not buy QLC, especially not for a boot drive, even though everyone else thinks QLC is a good idea for read-only music data.
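Part of the reasoning behind avoiding QLC is physical: each extra bit stored per NAND cell doubles the number of charge levels the controller has to tell apart, which leaves smaller margins between levels and is generally associated with lower write endurance. The arithmetic is small enough to show in a few lines of Python; `CELL_TYPES` and `voltage_states` are just illustrative names, and MLC here means 2 bits per cell, as in the Samsung comparison below.

```python
# Bits stored per NAND cell for each cell type discussed in this thread.
CELL_TYPES = {"SLC": 1, "MLC": 2, "TLC": 3, "QLC": 4}


def voltage_states(bits_per_cell: int) -> int:
    """A cell storing n bits must distinguish 2**n charge levels."""
    return 2 ** bits_per_cell


if __name__ == "__main__":
    for name, bits in CELL_TYPES.items():
        print(f"{name}: {bits} bit(s)/cell -> {voltage_states(bits)} levels")
    # SLC: 2 levels, MLC: 4, TLC: 8, QLC: 16
```

Going from TLC to QLC means squeezing 16 levels into roughly the same voltage window that used to hold 8, which is why QLC is usually the least attractive choice for a drive that is written to constantly.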

Transcend TS128GMTE110S, manufacture date 10 2019.

I ordered a new Transcend drive with the same product code and noticed it had a different number of chips (4 instead of 2).

I would have gone with Samsung. My Samsung Pro has been in my ROCK NUC for years without a hiccup.

If you can cancel the purchase, I highly recommend getting a better one, perhaps a Samsung 970. The EVO and cheaper versions should be using TLC. Rugby’s 970 PRO is even better and uses MLC.

Search this forum for the brand you purchased, and look at its track record.

Given that the boot drive on a Nucleus is a fairly critical application, why not spend a few more pennies and invest in an enterprise-grade SSD rated at ~3 DWPD (drive writes per day) with a 5-year warranty?
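To put a DWPD rating in perspective, it converts to a total-terabytes-written (TBW) figure by multiplying through by capacity and the warranty period. This Python sketch just does that arithmetic; the 128 GB capacity is only an example matching the replacement drives mentioned in this thread, and `endurance_tbw` is an illustrative name, not a vendor tool.

```python
def endurance_tbw(capacity_gb: float, dwpd: float, warranty_years: float) -> float:
    """Total terabytes written implied by a DWPD rating.

    TBW = DWPD * capacity * 365 days * warranty years, converted from GB
    to TB with decimal units (1 TB = 1000 GB). Leap days are ignored.
    """
    return dwpd * capacity_gb * 365 * warranty_years / 1000


if __name__ == "__main__":
    # Example: a 128 GB boot drive rated at 3 DWPD over a 5-year warranty.
    print(f"{endurance_tbw(128, 3, 5):.1f} TB")  # 700.8 TB
```

For comparison, you can plug in the TBW figure from a consumer drive’s datasheet and divide back out to see what DWPD it implies.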

I have a ROCK/NUC so I suppose I have this risk too. By the way, can you point me to the source data for these failures? I’m not questioning anyone’s veracity, just want to read more about the problem.