Absolutely zero mention of retention for a storage device is disturbing.
The endurance figures seem to suggest anywhere between 6.6k and 11k cycles, which is both a wide range and unusually high for TLC flash - this is the normally expected range for decent MLC and 5 years of retention, so I suspect they're massaging the retention downwards to get those numbers.
I don't think most people grasp how absurdly high even 1 DWPD is compared to enterprise HDDs. On the enterprise side you'll often read that a hard drive is rated for maybe 550TB/year, which translates to 0.05~0.1 DRWPD [1] (yes, combined reads AND writes), and you have to be fine with that. (Yeah, admittedly the workloads for each are quite different; you can realistically achieve >1 DWPD on an NVMe drive with e.g. a large LSM database.)
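If you want to sanity-check that conversion, it's one line of arithmetic (the 550TB/year figure is from above; the drive capacities are just illustrative guesses):

```python
# Convert an enterprise HDD workload rating (TB/year, combined
# reads+writes) into the DWPD-style metric used for SSDs.
def dwpd(tb_per_year: float, capacity_tb: float) -> float:
    return tb_per_year / 365 / capacity_tb

# A 550 TB/year rating across a few plausible capacities:
for cap_tb in (16, 24, 30):
    print(f"{cap_tb} TB drive: {dwpd(550, cap_tb):.3f} DRWPD")
```

which lands right in the 0.05~0.1 range quoted above.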
What makes NVMe endurance ratings even better (though not for warranty purposes) is that a workload with sequential writes can expect much higher effective endurance, as most DWPD metrics are calculated for random 4k writes, which is just about the worst case for flash with multi-megabyte erase blocks. It's my understanding that this is also a large part of why there is some push for Zoned (HM-SMR-like) NVMe, where you can declare much higher DWPD.
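A back-of-the-envelope model of why random 4k is the worst case: once blocks hold a mix of live and dead pages, garbage collection has to copy the live ones before erasing. The valid-fraction numbers below are made up for illustration, not measured:

```python
# Toy write-amplification model for a log-structured FTL.
# When a block is erased, its still-valid pages must be copied
# elsewhere first; that copying is extra NAND writes.
def gc_copy_cost(valid_fraction: float) -> float:
    """Pages copied per page of space reclaimed when erasing a
    block that is `valid_fraction` full of live data."""
    return valid_fraction / (1 - valid_fraction)

# Sequential writes tend to invalidate whole blocks at once,
# so GC finds nearly empty blocks; random 4k writes on a full
# drive leave blocks mostly valid at GC time.
print("sequential WAF ≈", round(1 + gc_copy_cost(0.0), 1))   # 1.0
print("random 4k  WAF ≈", round(1 + gc_copy_cost(0.9), 1))   # 10.0
```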
I assume that flash translation layers use LSM-like patterns underneath to cope with small random writes. The best case for flash with a good translation layer is data that can be erased/discarded in bulk, since this minimizes write amplification. This is close enough to the "large sequential writes" case but not necessarily equivalent.
To ground these endurance figures a little more concretely: consumer drives will advertise 600 TBW per TB of capacity, or 600 cycles! Starting at 10x that is very solid!
On the other hand, an enterprise drive like the Kioxia CM7 will offer either 1 or 3 drive writes per day (for the regular and write-intensive models, respectively) across its 5-year warranty. That's ~1800 cycles or just shy of 5500 cycles.
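Both cycle counts fall straight out of the ratings quoted (TBW-per-TB is already "full drive writes", i.e. cycles):

```python
# P/E-cycle equivalents of the endurance ratings above.
def dwpd_to_cycles(dwpd: float, warranty_years: int = 5) -> int:
    # One full drive write per day per unit of DWPD.
    return int(dwpd * 365 * warranty_years)

consumer_cycles = 600   # 600 TBW per TB of capacity = 600 full-drive writes
print(consumer_cycles, dwpd_to_cycles(1), dwpd_to_cycles(3))   # 600 1825 5475
```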
Generally they just chop off some space as spare area to substitute in as cells wear out, which is why enterprise drives are 3.84TB rather than 4TB. If you partition the drive smaller and never use the rest, you can achieve similar endurance.
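To first order the gain is linear, since wear leveling spreads the same host writes over more physical flash (the real-world benefit is a bit better still, because extra spare area also lowers write amplification). The rated-cycle number here is an arbitrary placeholder:

```python
# First-order endurance gain from using only part of a drive.
def effective_cycles(rated_cycles: float, physical_tb: float,
                     used_tb: float) -> float:
    return rated_cycles * physical_tb / used_tb

# 4 TB of physical flash sold as 3.84 TB usable (enterprise-style):
print(effective_cycles(1000, 4.0, 3.84))   # ~1041.7
# Same flash, but you partition and use only 3.2 TB of it:
print(effective_cycles(1000, 4.0, 3.2))    # 1250.0
```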
Do you have a source for that claim? Everything I've read suggests that data retention is mostly about the write temperature vs storage temperature, and that enterprise drives have about equivalent retention to consumer drives.
By retention I'm assuming you're referring to the amount of time it takes for data loss to occur on SSDs in cold unpowered storage.
Correct me if I'm mistaken but it looks to me like your graphs are talking about the number of P/E cycles accumulated, rather than the number of P/E cycles that a drive is rated for?
What this seems to suggest is that as a drive gets more "worn out", its data retention gets worse.
But I don't see how that can be taken to imply that enterprise drives have worse data retention than consumer drives. Nothing that I've seen suggests this.
> What this seems to suggest is that as a drive gets more "worn out", its data retention gets worse.
This is what informal tests (e.g. scrubbing/resilvering the drive after leaving it powered off for a long time) have found. Retention/data remanence is remarkably good for a drive that has been written over just once, and quite bad (i.e. you start seeing bit errors) for one that's almost worn out. This is actually very good news for the EEPROM-like use case where rewrites are quite rare.
(Note that "almost worn out" in this case can mean going far beyond the formal total-data-written rating of the drive. We're talking the range where the hardware itself is about to croak.)
I'm not sure I fully understand what that page is saying.
It is my understanding that the JEDEC standard tests the data retention of the "worst case" scenario where a drive is fully worn out, i.e. the drive has reached its maximum rated P/E cycles.
If I'm understanding correctly, that page you linked is saying that enterprise drives have firmware that essentially allows more P/E cycles, which then means that at the end of those cycles, the drive will be more "worn out" and thus will have a worse data retention.
But in a real world usage scenario where we subject a consumer SSD and an enterprise SSD to the same number of P/E cycles, would they have different data retention? I thought the JEDEC data was only for end-of-life scenarios.
> fully worn out, i.e. the drive has reached its maximum rated P/E cycles.
That differs between enterprise and consumer - the former is rated higher because they've reduced the retention spec (to almost a quarter).
> But in a real world usage scenario where we subject a consumer SSD and an enterprise SSD to the same number of P/E cycles, would they have different data retention?
No, and that's the whole point of this: the same flash, with different definitions of "worn out", is quite misleading as they're just looking at different points on the same curve. It's all obfuscated marketing.
Incidentally, this is also why those widely-publicised tests that claim SSD endurance is not a problem by continuously writing until absolute failure and seeing many times the rated endurance (e.g. https://linustechtips.com/topic/327024-the-ssd-endurance-exp... ) are extremely misleading (albeit enlightening on how "enterprise" ratings are being calculated): they are showing how many cycles the flash will take before it's too leaky to store data long enough for the next verification pass, which may be less than an hour away. At that point it's almost behaving more like DRAM than nonvolatile memory.
Oh wow, yeah I see your point, that is indeed extremely misleading.
I had always thought that enterprise SSDs used higher quality flash than consumer SSDs because of the higher endurance guarantees - that was why I bought them. Now I feel like a big reason to buy enterprise SSD has been removed.
I am flabbergasted, to be honest. And I feel a bit cheated that I am paying much more for the same quality flash.
So would it be fair to say that the only reason to pay for the enterprise SSD premium is the power loss protection and more reliable firmware?
Definitely don't think of enterprise vs consumer SSDs as good vs bad; they're optimizing for different use cases.
Enterprise SSDs get you power loss protection, firmware QA aimed at server workloads and operating systems, performance tuning that prioritizes consistent sustained performance, and larger form factors enabling higher capacity and higher power.
Consumer SSDs get you higher peak performance (e.g. SLC caching), orders of magnitude better idle power savings, and QA against Windows and its NVMe driver, in form factors suitable for laptops and not requiring direct airflow over the SSD.
A very long time ago when the SSD market was still quite immature, there was a time period where "consumer" SSDs were little more than cut-down enterprise SSDs with inferior NAND and fewer features. But that changed well before NVMe showed up.
> But in a real world usage scenario where we subject a consumer SSD and an enterprise SSD to the same number of P/E cycles, would they have different data retention?
Probably not, assuming they're using the same underlying media and same strength of ECC and that the amount of host data written was appropriately adjusted to account for the different capacities and overprovisioning ratios to ensure the actual P/E cycles seen by the NAND were the same.
As you write more data, the consumer drive would be out of warranty first, while the enterprise drive would still be under warranty but not spec'd to retain data for as long as the worn-out consumer drive. So for either drive, the manufacturer isn't guaranteeing 1 year retention past the rated endurance of the consumer drive.
Very disturbing that the article talks about the number of bits read per cell as if it were only a matter of speed.
In addition to your comment (related): when 3D-NAND cells are read, interference along the traces (charge-trap disturb) requires the neighbouring cells to be refreshed with writes if the controller wants to preserve data integrity. This did not happen with 2D-NAND in the past.
Reading data from one cell in 3D-NAND involves writing other cells; reading data in 3D-NAND consumes drive endurance.
(Not to mention, temperature/number of layers/endurance)
I used to work in SSD controller firmware. This kind of issue existed more than 10 years ago. To achieve higher capacity with each generation of NAND you are trading off everything else (endurance, retention, read/program times, etc) little by little. 20 years ago it was so rare to see any kind of errors with SLC NAND that error detection and handling was fairly simple.
I have some small USB drives from that era with SLC rated for 10 years of retention after 100k cycles, meaning a 64MB drive has a total endurance of 6.4TB. True binary capacity too, with no wear leveling either, as only the spares on each page were needed for a simple ECC.
Too bad planned obsolescence got in the way, or we would've ended up today with bigger SLC drives that are fast and simple and just as reliable.
There's no planned obsolescence for my use case. I've never done enough writes to wear out an SSD. While I might buy an SLC flash drive if it was properly available, for internal drives TLC has barely any downsides. I very rarely get limited to 500MB/s, I just need to power the drive on occasionally, and I get 2.8x the space. Sounds great.
tl;dr: worn flash shows severe degradation, but more worrying is that there are even signs of retention failure with TLC flash that has been programmed only once.
Worrying if you want to leave it unpowered. I made that an explicit part of the tradeoff. I don't value that feature very much, and I don't think many other people value it much either.
It's important to make people aware of that problem, but they can solve it in other ways.
I'd pay an extra $20 to upgrade a 0.25TB flash drive to SLC and get unlimited retention. For a drive in my computer, I don't need it and it would cost far more.
Edit: I could even get an extra hard drive just to act as a backup, and TLC+HDD would still be half the price of SLC.
Good TLC/QLC drives will likely store your data as SLC if you overprovision them aggressively enough. (I.e. use only one fourth or less of the rated storage, since QLC stores four bits per cell. You can sort of verify this via benchmarking, since the SLC "mode" has far better performance overall.)
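Rule-of-thumb partition sizes per cell type, assuming the firmware actually keeps everything at 1 bit/cell when the logical usage fits (that's firmware-dependent, so treat these as upper bounds to stay under):

```python
# Upper bound on how much of a drive to use so all data could
# plausibly be held in SLC mode (firmware-dependent assumption).
def slc_budget_gb(rated_gb: float, bits_per_cell: int) -> float:
    return rated_gb / bits_per_cell

print(slc_budget_gb(1000, 3))   # TLC 1 TB drive: ~333 GB
print(slc_budget_gb(1000, 4))   # QLC 1 TB drive: 250 GB
```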
Related: https://news.ycombinator.com/item?id=43702193