Interesting SSD wear rate - Mac Pro

Wait until the drive throws a predictive failure, then you can swap it for a new one under warranty.

When I maintained older SANs, part of my duty was working with the vendor to get drives with predictive failures swapped out.

The last few drives that have thrown a predictive failure on my PACS servers have been out of warranty (the servers need an evergreen refresh, but it was put off by COVID). I'm interested to see how that goes with the newer ones I have in service now that I've transitioned to SSDs instead of SAS spinners.
 
Most drive makers have their own tools for reading drive life, and those may not align with SMART; use the OEM tool instead if possible.
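For a quick sanity check against whatever the OEM tool says, here is a minimal sketch (assuming smartmontools is installed and usually run as root, and that the drive is NVMe and reports a "Percentage Used" field; SATA drives and vendor-specific attributes need different parsing, and the /dev/nvme0 path is just an example):

```python
import re
import subprocess

def reported_wear_percent(device: str):
    """Best-effort read of the NVMe 'Percentage Used' figure via smartctl.

    Returns None when the field isn't there (SATA drives, or vendors that
    only expose wear through their own tooling).
    """
    result = subprocess.run(["smartctl", "-a", device],
                            capture_output=True, text=True, check=False)
    match = re.search(r"Percentage Used:\s*(\d+)%", result.stdout)
    return int(match.group(1)) if match else None

wear = reported_wear_percent("/dev/nvme0")  # device path is just an example
print(f"SMART-reported wear: {wear}%" if wear is not None else "not exposed via smartctl")
```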

Lite-On is mainly a contract manufacturer: customers supply the design and source the parts, and Lite-On builds the drives for them, much like Foxconn builds machines for everyone else. Kingston is also a slightly smaller player in the contract-manufacturing space, similar to Lite-On. Lite-On's SSD business was acquired by Toshiba / Kioxia, and Toshiba / Kioxia has a very good relationship with Kingston's owner, so they will likely be a major player in the retail brand space and end up being the drive you get if you are not buying a big brand like WD / Micron / Samsung / Intel.

Drive life remaining is really more of a probability/statistics estimate than a deterministic calculation. Manufacturers know from testing how many writes the drive can handle based on the NAND used, how strong their ECC algorithm is (most people don't realize NAND has gradually gotten worse, and what was unusable in the past due to weak ECC is now OK thanks to strong ECC), how fast the erase and program operations are (voltage vs. time), how many spare blocks are provided (roughly every 7% of reserved space adds another 1x of lifespan), etc. So manufacturers know how long most drives should last based on that calculation, and they use it to report how much life is left.
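As a toy illustration of the "probability, not deterministic" point, here is a crude back-of-the-envelope version of the calculation. The write-amplification factor and the rated cycle count below are made-up example numbers, and real firmware folds ECC strength, spare-block consumption, and temperature on top of this:

```python
def estimated_life_remaining(host_bytes_written: float,
                             capacity_bytes: float,
                             rated_pe_cycles: int,
                             write_amplification: float = 2.0) -> float:
    """Crude 'life remaining' fraction from rated P/E cycles.

    Only tracks bytes against a rated total; real firmware also accounts
    for ECC strength, spare-block usage and operating temperature.
    """
    nand_bytes_written = host_bytes_written * write_amplification
    total_endurance_bytes = capacity_bytes * rated_pe_cycles
    return max(0.0, 1.0 - nand_bytes_written / total_endurance_bytes)

# Example: 1 TB drive rated for 600 cycles, 120 TB written by the host so far.
print(estimated_life_remaining(120e12, 1e12, 600))  # 0.6 -> drive reports "60% life left"
```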

Power-on hours don't factor much into the life calculation; you are not running a mechanical drive spinning at 7200 rpm. Yes, electronics die over time, but that's not what typically limits SSD lifespan. It is mainly the number of erase and program cycles FOR THE TEMPERATURE you operate it at. So if you run it inside a hot attic with no circulation, it will not last as long as in an office with AC.
 
Apple's problem with the M1 SSD is that the machines have very little DDR and rely on swap and compression to the SSD to buy back the speed lost to the small DDR size. It isn't the fault of Firefox or Chrome; they just didn't design the OS swap mechanism correctly. I would be happy if they went with more DDR and a user-replaceable SSD, but instead they have their own high-speed DDR, high-speed SSD, and lots of swap to make up for the small size, and you cannot just put a cheap SSD in there because it won't be fast enough.

This is my $25 solution for my dad's $450 computer with one stick of 8 GB DDR4: another stick off eBay from people pulling theirs to upgrade. No compression, no fast swap, no BS, just plenty of brute-force memory.

 
How is "drive life" defined? Does 85% remaining life mean that 15% of the spare space was used to reallocate bad sectors? Or is it just some formulae of X number of writes divided by the expected write lifecycle? From what I understand, the write cycle limit on most SSD's is pretty conservatively rated and most will go quite a bit longer than specified before degrading too much.

It was mentioned above what it means. However, I would point out that useful life also relies on error correction: there will always be error-correction bits as well as spare area in use. If error correction is needed after a read, that will slow down the entire read. I'm not an error-correction expert, but I do understand that once an error is detected, the data has to go through an error-correction algorithm to be reconstructed. I'm not sure how much this affects performance. It might not matter as much with a SATA drive, where the interface speed is the limiting factor, but it may matter with many of the newer, much faster interfaces.

There are also a lot of variables. Quite a few drives use a single-level cell cache to help speed things up and to help with reliability. A lot of the 3D NAND flash is more reliable, although I don't understand the physics behind it.
 

The common practice in the industry (each company may have its own secret sauce on top of the common algorithm) goes roughly as follows; a rough code sketch of the whole ladder appears at the end of this post:

1. You do a typical read: the data is wrapped with a checksum so you know whether it decoded correctly, plus extra data that helps you correct the problem. ECC / LDPC has a "strength" typically measured in how many bits of error it can correct. Some stronger / more efficient codes correct very well "most of the time" but occasionally fail, while others are less efficient but always correct up to the performance they guarantee. These days we use the "corrects very well most of the time but occasionally fails" kind, and rely on other mechanisms to handle the rare cases they cannot correct.

2. When the ECC fails, the controller sends a command down to the NAND telling it to read the "soft" bits. Sometimes deciding between a 1 and a 0 is a toss-up, so these "soft" bits tell you whether a cell looks more like a 1 or a 0, helping the controller try again and decide which is more likely. This is the first line of recovery.

3. If that still fails, the controller starts adjusting the voltage threshold that decides whether a bit is a 1 or a 0, swinging it back and forth further and further from the original value to see if it can get a good guess. This is due to the nature of how "wear" happens: the stored value can drift slightly above or below spec as NAND ages, so the guessing helps in some situations. This can be trial and error up to tens (but not hundreds) of times, depending on what the customer wants.

4. If that swinging still fails, the controller starts reading the analog voltage distribution of the whole block and runs some statistical analysis to figure out where the threshold should be. This is like using a very slow computer to do a big math calculation (because it doesn't happen a lot, nobody spends money building that performance into a cheap controller), and then it tries again to see if that helps.

5. It is time to bring out the secret sauce; every company does this differently. I could tell you, but then I'd have to kill you.

6. Still failing? It is time to use the internal RAID: read from between 7 and 31 other chips and RAID the data together to reconstruct what is missing. When you get to this point, you need to decide whether to reallocate this sector/page somewhere else and mark the block BAD.

7. If the drive runs out of reserved blocks, it is time to make it read-only and fail the drive.

The above assumes the drive is performing as designed (it did not run into an unknown bug and was manufactured with parts that don't fail prematurely). If you have a bug in the firmware or bad parts, all bets are off; you have to fix it or you can go out of business (OCZ, anyone?).

Since ECC is so important, it is one of the deciding factors in how long a drive will last. The same drive that failed and died would likely still be alive if the same NAND had been paired with a stronger ECC algorithm. NAND is getting worse every generation, and ECC is getting stronger every generation to compensate. Life is typically measured as an amount of writes, but it correlates with reserved spare blocks / reallocated blocks.
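For anyone who wants the ladder in one place, here is a toy sketch of the escalation order described above. Every function name and probability in it is made up for illustration; it is not any vendor's firmware, just a simulation of "try the cheap thing first, escalate until you either recover the data or retire the block":

```python
import random
from typing import Optional

# Every function below is a made-up stand-in for controller firmware;
# each one just simulates success/failure with an arbitrary probability.
def _attempt(p_success: float) -> Optional[bytes]:
    return b"recovered-page-data" if random.random() < p_success else None

def hard_decision_ecc(page):     return _attempt(0.999)  # 1. normal LDPC decode
def soft_bit_retry(page):        return _attempt(0.90)   # 2. re-read with soft bits
def threshold_sweep(page):       return _attempt(0.90)   # 3. shift read voltages
def histogram_recalibrate(page): return _attempt(0.80)   # 4. analyze the whole block
def vendor_secret_sauce(page):   return _attempt(0.50)   # 5. proprietary step
def internal_raid_rebuild(page): return _attempt(0.99)   # 6. parity across other dies

def read_with_recovery(page: int) -> bytes:
    """Walk the escalating recovery ladder described in the post above."""
    ladder = [hard_decision_ecc, soft_bit_retry, threshold_sweep,
              histogram_recalibrate, vendor_secret_sauce, internal_raid_rebuild]
    for step in ladder:
        data = step(page)
        if data is not None:
            return data
    # 7. Nothing worked: firmware would reallocate the data, mark the block
    # bad, and go read-only once the reserved blocks run out.
    raise IOError("uncorrectable page; block retired")

print(read_with_recovery(page=0))
```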
 
3D NAND is typically more reliable because it has more surface area to hold charge per bit. Without 3D there probably isn't enough surface area holding charge to make TLC reliable, and definitely not QLC. The typical reason to use some SLC as a cache for reliability is that a lot of writes land on top of each other; if they get overwritten on top of each other as SLC, the cells won't wear out as much. It also helps absorb write amplification before it becomes unbearable for the TLC / QLC, and it speeds up writes by quite a lot too.
 

Well, yes. In the quest for lower costs it's going to quad-level, which is inherently going to be less reliable. That part I absolutely understand. I remember someone telling me that fairly new NAND flash will usually work on the first pass, but then the cells start getting marginal because the rewrites break down the insulator, and reconstruction and retries take more time.

Ultimately it's going to be time to replace an SSD once it fails. Of course that's going to be more difficult in some devices where the SSD is really just a part of the complete main board.
 
It took almost 5 years from when TLC first appeared to make it stable enough for SSD duty. QLC was released a few years ago by Micron as a SATA drive whose rated life most users will likely never max out; it likely won't reach enterprise drives for a few more years other than for write-once-read-many (WORM) duty.

There is a trend toward moving the translation-layer work to the host computer, so some workloads in the future will be all sequential writes, like a tape drive or a DVD-R (they call it zoned namespaces, or ZNS). This helps drive life a lot (it reduces write amplification) because the host is aware of what should or should not be overwritten. We may need to go that direction if we want to use QLC and beyond.
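Here is a toy model of what a ZNS zone looks like to the host, assuming only the basic append-only / reset-whole-zone rules; the class and numbers are illustrative, not the actual NVMe ZNS command set:

```python
class Zone:
    """Toy zoned-namespace (ZNS) zone: append-only writes, whole-zone reset.

    Illustrates why ZNS reduces write amplification: the host groups data
    with similar lifetimes per zone and resets the whole zone at once, so
    the drive never has to copy live pages around during garbage collection.
    """
    def __init__(self, capacity_blocks: int):
        self.capacity = capacity_blocks
        self.write_pointer = 0
        self.blocks: list[bytes] = []

    def append(self, block: bytes) -> int:
        """Sequential-only write; returns the address the host must remember."""
        if self.write_pointer >= self.capacity:
            raise IOError("zone full: open a new zone or reset this one")
        self.blocks.append(block)
        address = self.write_pointer
        self.write_pointer += 1
        return address

    def reset(self) -> None:
        """The only way to reclaim space: the host erases the entire zone."""
        self.blocks.clear()
        self.write_pointer = 0

zone = Zone(capacity_blocks=4)
addresses = [zone.append(f"record-{i}".encode()) for i in range(4)]
print(addresses)   # [0, 1, 2, 3]: strictly sequential, like a tape
zone.reset()       # the host decides when all of the zone's data is dead
```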
 

Of course the biggest problem with most drives is that system data typically goes to a reserved area. I get that it wasn't a problem with hard drives, which could handle writing to the same area indefinitely. I know that with solid-state drives, that's where wear-leveling comes in (a rough sketch of the idea follows below).

I'm still wondering why even bother going to quad-level. MLC doubled the density; TLC is what, 50% more density than MLC? 3D was supposed to be the big one, especially since it doesn't require more packages.
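On that wear-leveling point, here is a deliberately simplified sketch of the idea: always steer the next write to the least-erased block so repeated system-area writes get spread around. Real flash translation layers are far more involved (they also migrate long-lived data), and everything below is made up for illustration:

```python
import heapq

class WearLeveler:
    """Simplified wear leveling: always hand out the block with the fewest
    erases so repeated writes don't hammer one physical spot. Real FTLs
    also move static (rarely rewritten) data off low-wear blocks."""

    def __init__(self, n_blocks: int):
        # Min-heap of (erase_count, block_id).
        self.heap = [(0, block) for block in range(n_blocks)]
        heapq.heapify(self.heap)

    def next_block(self) -> int:
        erases, block = heapq.heappop(self.heap)
        heapq.heappush(self.heap, (erases + 1, block))  # record the erase
        return block

wl = WearLeveler(n_blocks=4)
print([wl.next_block() for _ in range(8)])  # [0, 1, 2, 3, 0, 1, 2, 3]: wear spread evenly
```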
 

It is the "relative" pricing between companies. One Micron VP was interviewed by a blogger (Moore's Law is Dead) and he said that in his entire career he couldn't find someone willing to pay for anything other than cost per byte. So in a way if someone was able to sell something slightly less than 1/3 cheaper there will be customer.

3D is also hitting diminishing returns. The staircase that lets you connect each layer to the control circuits eventually takes up more and more space, and the quality of the channel holes gets worse the more layers you pile on top. Right now some vendors have decided to stop at around 96 layers, while others have to build another staircase from scratch to reach 128 (you are basically making a new chip on top of another and have to use the lithography again, so there are no savings from going to more layers).

I do agree with you that I'd rather have fewer levels and more layers; it is just more stable that way. But it costs slightly more, and at the end of the day it is the cost that matters.
 
It is the "relative" pricing between companies. One Micron VP was interviewed by a blogger (Moore's Law is Dead) and he said that in his entire career he couldn't find someone willing to pay for anything other than cost per byte. So in a way if someone was able to sell something slightly less than 1/3 cheaper there will be customer.

3D is also getting diminishing return. The staircase that let you connect each layer to the control circuits eventually take up more and more spaces, the quality of the channel holes gets worse the more layer you pile on top, basically right now some has decided to stop at around 96 while other have to do another staircase from scratch to reach 128 (basically you are making a new chip on top of another, need to use the lithography again, so there is no saving going more layers).

I do agree with you that I'd rather have fewer level and more layers, it is just more stable that way, but it cost slightly more and at the end of the day it is the cost that matters.

Well - I've worked in places where we were pin constrained because that was an important part of the cost. Silicon wasn’t free, but anything we could do to reduce pin count was a major cost savings. I understand that eMMC is a major cost savings in part because of the pin-count reduction.

I’m actually kind of shocked at how little endurance there is now. I was shocked at a 1 million write endurance with SLC flash, but now sub 1000 is considered OK. But then the real world doesn’t necessarily burn through write cycles like we think.
 
It depends on the product, I think. I was told USB drives have been sold rated for as little as 30-50 cycles, because most are only ever written about 24 times their capacity. SD cards would be rated higher because they can be written continuously by a dash cam or other automated devices. Enterprise SSDs can be sold rated between 5-7k writes, and customers can always pay more for more reserve capacity, a slower speed that makes the drive last longer, a stronger ECC algorithm with a more expensive controller, etc.
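To put those cycle ratings in perspective, here is a rough sketch of what they translate to in total host writes. The capacities and the 600-cycle consumer figure are my own example numbers; only the 30-50 and 5-7k cycle figures come from the post above, and write amplification / over-provisioning are ignored:

```python
def rated_total_writes_tb(capacity_gb: float, rated_cycles: int) -> float:
    """Very rough ceiling on writes: capacity x rated P/E cycles.
    Ignores write amplification and over-provisioning entirely."""
    return capacity_gb * rated_cycles / 1000  # GB -> TB

examples = [
    ("cheap USB stick", 64, 50),       # 30-50 cycle figure from the post
    ("consumer TLC SSD", 1000, 600),   # 600 cycles is my own example number
    ("enterprise SSD", 3840, 6000),    # 5-7k cycle figure from the post
]
for name, cap_gb, cycles in examples:
    print(f"{name:17s} ~{rated_total_writes_tb(cap_gb, cycles):>8,.0f} TB of writes")
```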

My first SSD died just 2 years ago, after being used for 10 years; it was 50 GB and I RAID 0'd it with another one to make a 100 GB partition. I'm sure today's drives would last long enough that I wouldn't care if one died in 7 years. But really, people don't write as much to storage now, with phone-based photography and streaming instead of copying from a digital camera to a hard drive, downloading movies and music off the internet, etc. People don't even care much about speed: most OEMs have gone with DRAM-less SSDs and nobody cared. The move from mechanical HDDs to SSDs is fast enough that people are happy even if the drives are DRAM-less.

I think the main reason is that actual usage / writes haven't changed much, and people just store more and more write-once-read-many data on the drive. Back then people really used the whole capacity for writing and kept the write-once-read-many data on ROM; they don't do that anymore. They throw everything on the same NAND, and after wear leveling there really aren't that many cycles being consumed.
 
I've had some early USB flash drives that got really, really slow after a while. They were USB 2.0 High-Speed but still slowed down after I'd used them for a few years. I suspect they weren't ready to die, but were going heavily into error correction.

There are also a lot of people who look at the numbers and ask why they should settle for "lower" numbers, even though 5000 write cycles may actually be enough for most people, especially when the device warns the user as it approaches failure. Heck, there were tons of people angry about the Intel FDIV bug even though it would almost never have affected them.
 
People buy with emotion. You have seen that in the auto industry and the home-appliance industry (why do people buy stainless-steel fridges when we cover them up with magnets anyway, or why do we buy RED laundry machines in pairs when most of the time you only need to replace the washer).
 

We've mentioned two things in this topic. One was your point that people want cheaper when they can get it. But then there are those who, once they get something at their price level, ask why there should be any compromise at all. And there are a lot of people who overspend because they want to "use what the pros use," whether that's a Viking range or professional-style golf clubs. Or the big thing on BITOG, which is motor oil.

We can see that with golf equipment. The weekend golfer (even a good one) is probably better off with mid-level clubs and long-distance balls. Only a really high level player can control pro-style equipment, but a lot of people insist on buying "the best".
 
... but really, people don't write as much to storage now, with phone-based photography and streaming instead of copying from a digital camera to a hard drive, downloading movies and music off the internet, etc.

I think the main reason is that actual usage / writes haven't changed much, and people just store more and more write-once-read-many data on the drive.
I was watching the Resource Monitor on my Win10 laptop, and there are a lot of reads/writes going to the SSD even when I'm not doing anything on the computer. There is constant disk read/write activity from all the programs running in the background. I think those processes probably contribute more to the disk reads/writes than people's own personal files and photos. Also, if you watch a YouTube video or a live stream, there's quite a bit of disk activity going on.
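If you want to put a number on that idle write activity, here is a small sketch using the third-party psutil package to sample system-wide bytes written over a minute; it counts writes from every process, so it is the same view Resource Monitor gives you, just summed up:

```python
import time
import psutil  # third-party package: pip install psutil

# Sample total bytes written system-wide, wait a minute, sample again.
before = psutil.disk_io_counters().write_bytes
time.sleep(60)
after = psutil.disk_io_counters().write_bytes
print(f"~{(after - before) / 1e6:.1f} MB written by everything in the last minute")
```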
 

I think something like a newer Mac may be better for this. They fully specify the operating system and the SSD, and could tailor the disk access to be more efficient - minimizing the number of writes to the drive.

There is a difference between the number of writes going into the SSD and the number going into the lowest-level flash. The rated endurance (1,000,000 / 100,000 / 10,000 / 1,000 cycles) is really about that lowest level. The SSD is more than just the lowest-level flash memory: it will have a RAM cache buffer, and on most drives these days the first level is single-level-cell flash storage.

PandaBear got a little technical earlier about modifying threshold voltages, and that might have gone over most people's heads. I promise what I present here will also go way into the weeds. NAND flash memory isn't like RAM. With RAM, the value of each bit is stored in a cell, or in the case of DRAM in a single transistor and capacitor. With NAND flash, each transistor represents 1/2/3/4 bits and therefore 2/4/8/16 (more or less) discrete levels. But these discrete levels are really threshold voltages needed to make a transistor allow the flow of electrons. So to read data, the controller needs to test against these levels, and as the cell wears out, the threshold levels tend to shift.
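Here is a toy version of that threshold-voltage idea, with made-up reference voltages (real parts use vendor-specific levels and per-page calibration). It just shows how one stored voltage maps to a level, and how a drifted voltage can read back as the wrong level:

```python
# Toy model of reading a multi-level cell: compare the cell's stored
# threshold voltage against a set of reference voltages and map the
# resulting window to a level. All voltages here are made up.

def levels_for_bits(bits_per_cell: int) -> int:
    return 2 ** bits_per_cell   # SLC=2, MLC=4, TLC=8, QLC=16 levels

def read_cell(v_threshold: float, read_references: list[float]) -> int:
    """Count how many reference voltages the cell's threshold exceeds."""
    return sum(1 for ref in read_references if v_threshold > ref)

# TLC: 8 levels need 7 reference voltages between them (made-up spacing).
tlc_refs = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5]
print(levels_for_bits(3), "levels for TLC")      # 8
print(read_cell(1.7, tlc_refs))   # a cell programmed into the 1.5-2.0 window reads as level 3
print(read_cell(1.45, tlc_refs))  # the same cell drifted below 1.5 now misreads as level 2,
                                  # which is exactly the kind of error ECC and read retries fix
```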
 
I was an early-ish adopter of SSD, my 2010 MacBook Air runs one.

And it only has 4 GB of RAM. Run any browser with more than one window and the computer is ALWAYS down to something like 25 MB of free RAM, so it must be swapping a lot. I don't have any way to tell drive life. Based on this thread, I really wish I knew.
 
My dad just had an SSD put in his old laptop with 2 GB RAM, running W10. I told him to just get a new laptop instead, but you can't fix stubborn.
 