Should an SSD be replaced after X years?

Flash is really "flash EEPROM", which is a type of EEPROM that is erased in entire blocks.

Flash memory is a type of electrically erasable programmable read-only memory (EEPROM), but it can also be a standalone memory storage device such as a USB drive.

And absolutely blocks get erased. The idea is to get all the trapped charge in the "floating gate" out of there by zapping it. Cells have to be erased before they can be programmed, which involves injecting a precise amount of charge into the floating gate to change its threshold voltage - the voltage that turns the transistor on. I remember working with older flash that stored just one bit per transistor, and those cells could theoretically be written again without erasing. A block erase always set everything to 1s, and programming could only change bits to 0. So if a bit was still a 1, it could be programmed to 0, but if it was a 0 it could only stay at 0 until the block was erased. There were creative ways to use it so blocks didn't need to be erased.
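The erase-to-1s, program-to-0s behavior described above can be sketched as a toy model (illustrative only; real flash works on much larger blocks and in hardware):

```python
# Toy model of the 1 -> 0 programming behavior described above:
# an erase sets every bit in a block to 1; programming can only
# pull bits down to 0. A 0 cannot return to 1 without a block erase.

BLOCK_BITS = 8  # tiny block for demonstration

def erase(block):
    """Block erase: every bit goes back to 1."""
    return [1] * len(block)

def program(block, data):
    """Programming can only flip bits from 1 to 0 (AND semantics)."""
    return [b & d for b, d in zip(block, data)]

block = erase([0] * BLOCK_BITS)                   # all 1s after erase
block = program(block, [1, 0, 1, 0, 1, 1, 1, 1])  # flipping 1 -> 0 works
block = program(block, [1, 1, 0, 0, 1, 1, 1, 1])  # existing 0s stay 0
print(block)  # [1, 0, 0, 0, 1, 1, 1, 1]
```

This is also why the "creative ways" mentioned above work: as long as a write only turns 1s into 0s, no erase is needed.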

As far as programming - that's pretty convoluted these days because single-level cells are no longer the standard, except maybe for a high-endurance cache. But the key is that it's not really programming the cell to sample an output like with SRAM or DRAM; it's programming the threshold voltage that turns on a specific transistor, and then that turn-on voltage is tested. With single-level (1 bit), the goal is to program for two different threshold voltages. With multi-level (2 bits) it's four. With triple-level (3 bits) it's eight. And with quad-level (4 bits) it's sixteen. It's actually quite convoluted how this works, and the more threshold voltages that might be programmed, the more complicated it is to test. When there are errors, there are different ways to adjust the thresholds and sample again. I don't fully understand all of that; I only understand this at a relatively high level. But there's less margin for error with more bits per cell.
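The level counts above follow directly from the bits per cell: each extra bit doubles the number of distinct threshold voltages that must fit in roughly the same voltage window, which is why the margin shrinks. A quick illustration:

```python
# Bits per cell vs. distinct threshold-voltage levels: the level count
# doubles with each extra bit while the usable voltage window stays
# roughly fixed, so the margin between adjacent levels shrinks.
for name, bits in [("SLC", 1), ("MLC", 2), ("TLC", 3), ("QLC", 4)]:
    levels = 2 ** bits
    print(f"{name}: {bits} bit(s)/cell -> {levels} levels, "
          f"{levels - 1} threshold boundaries to distinguish")
```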

There are all sorts of issues with accuracy as the insulator in a floating gate is damaged from erasing and programming. There's also the issue of charge in the floating gate leaking, which can be temperature dependent.

This shows the target threshold voltage for different cell capacities of flash.

[Image: multi-level-cell_img_008_en_pc.png]


I guess when you say EEPROM, I think of the old black, bug-looking chips that needed a charge-pump circuit to erase and all that jazz, for a chip that holds only 8 bytes or something small like that. I have lots of equipment that still runs them, and even some EPROM and PROM. Somewhere at work we have a UV lamp for erasing EPROMs.

When I think flash memory, I think of surface-mount components embedded on a board in something like a flash drive or other newer devices. I believe the erasing process is not as difficult as it used to be with the EEPROMs of old, which had only one erase pin that cleared the whole chip. I am fairly sure that with modern SSDs, like I said, data is not physically erased off the chip until other data needs to be written into that block, under normal circumstances, to limit wear on the gates. This is why they tell you not to defrag an SSD: it just wears the drive out while not making anything faster, because it forces a lot of actual erasing.
 

Not sure exactly how those worked. I remember using windowed EPROMs that were erased with a UV eraser for an undergraduate class project. We could basically use them to create whatever random logic we wanted, with however many address inputs and 8 outputs (since it was byte-wide).

I guess there were later EEPROMs that either needed a high voltage or other mechanism (you said a charge pump) to provide the erase voltage.

But flash is really just a flavor of EEPROM, just way more complicated in the newer versions. Again, it comes down to testing the threshold voltage of a particular transistor.
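The "random logic" trick mentioned above - using a byte-wide (E)PROM as a lookup table, with address lines as inputs and data bits as outputs - can be sketched like this (the XOR choice and the function name are just illustrative):

```python
def build_image(n_inputs):
    """Build a tiny (E)PROM image whose data bit 0 computes A XOR B.

    Address lines act as logic inputs; the stored byte at each
    address is the logic output for that input combination.
    """
    image = bytearray(2 ** n_inputs)
    for addr in range(len(image)):
        a = addr & 1          # address line A0 = input A
        b = (addr >> 1) & 1   # address line A1 = input B
        image[addr] = a ^ b   # output data bit 0 = A XOR B
    return image

rom = build_image(2)
print(list(rom))  # [0, 1, 1, 0]: an XOR gate stored as data
```

Burn that image into the chip and you have combinational logic with no gates, just one memory lookup per input combination.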
 
There is no "replace after X years" deadline or design life. In theory they don't age just from being powered on. They do age based on the following factors:

1) Erase and program cycles. For a drive with wear leveling, endurance is typically rated as how many times the entire drive can be written over, and that hard limit depends on what kind of flash they use. I think most PCs will never wear out a drive just booting and loading apps, ever. Data center drives do get worn out; operators typically profile their use case and custom-order the kind of drive from the vendors (i.e. write once read many, frequent write and frequent read, etc.), and the drives are priced and designed accordingly. As an example, most people would be fine running 5 years on a WD Blue or Green unless they use it for data processing (i.e. video editing or chia farming). A WD Black would be more durable, but people buy them more for performance than durability.
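A rough back-of-the-envelope for the "written over X times" idea (every number below is made up for illustration; check your drive's datasheet for its actual rated endurance):

```python
# Hypothetical endurance estimate. All values are illustrative
# assumptions, not specs for any real drive.
capacity_gb = 1000         # 1 TB drive
rated_cycles = 600         # assumed program/erase cycles for its flash
write_amplification = 2.0  # assumed extra internal writes from garbage collection

# Total terabytes the host can write before the rated cycles are used up.
tbw = capacity_gb * rated_cycles / write_amplification / 1000

daily_writes_gb = 20       # light desktop use
years = tbw * 1000 / daily_writes_gb / 365
print(f"~{tbw:.0f} TBW -> roughly {years:.0f} years at {daily_writes_gb} GB/day")
```

Even with conservative assumptions, light desktop use sits far below the endurance limit, which matches the point above about PCs that just boot and load apps.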

2) Data retention. Do you turn your computer off for a year without powering it back on? If so, you may lose data in those TLC and QLC cells. Most SSDs and USB drives are designed to be powered on once in a while - figure they can sit unpowered no more than about 12 months at room temperature, or about a month inside a hot car. Don't just put a bunch of video on one, toss it in a drawer, and forget about it for 5 years. If you want to do that, buy a mechanical hard drive, and even then keep it 5 years max. An SSD should be powered on once in a while so it can refresh itself internally. This usually isn't a problem for MLC and SLC, but TLC and QLC can be problematic.

3) Component aging. Even if the flash cells don't die, most electronics have a design life of maybe 10 years. So have a backup plan if you want to use something beyond 10 years, is my guess.

Even if the cells are designed for, say, 5000 cycles, that is still just a statistic. Your drive can gradually start to fail at 4700 and use up reserved blocks, then run out of reserved blocks at 5500 cycles. It can also start early at 4000 and fail at 4800, or start failing at 6000 and fail at 8000, etc.
 
Eh, I think I got my circuits mixed up. Maybe EEPROMs don't need a "high" voltage, just a reversed voltage or something. Like I said, I am too lazy to google it, but not to argue about it for some dumb reason o_O. The video I was watching about a voltage doubler was for serial comms. A charge pump is the same as a voltage doubler. Ben Eater on YouTube has some great videos explaining stuff like this, as he built an 8-bit computer on breadboards.
 
if you ask WD... they say life is only 3 years... take that for what it's worth.....
they wanna sell more drives of course...
Their internal design life depends on what the customers buy. AWS / Microsoft would likely demand 5 years of warranty and design life, HP / Dell probably 1 year, retail customers probably 3-5 years. Most designs are limited by erase/program cycles, not by hours of operation. I've never seen a cost-saving trade-off an SSD engineer has to make that would cut into hours of operation, but very often they trade off performance and erase/program cycles.

Rule of thumb is that you double the expected design life if you add 15% extra reserved blocks for wear leveling, and if you double the reserved blocks you add another 20% performance.
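That rule of thumb, expressed numerically (this is the poster's heuristic, not a vendor formula, and extrapolating it beyond one step is purely illustrative):

```python
def estimated_life(base_years, extra_reserve_pct):
    """Heuristic from above: each extra 15% of reserved blocks
    roughly doubles the expected design life."""
    doublings = extra_reserve_pct / 15.0
    return base_years * (2 ** doublings)

print(estimated_life(5, 15))  # 10.0: one doubling from +15% reserve
print(estimated_life(5, 30))  # 20.0: extrapolated, for illustration only
```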
 
Not sure about EEPROMs, or which generation of them, but I know in today's flash memory the internal erase voltage is about 20V or so - certainly not enough to hurt people, and no shielding is required.
 
Most of the advances today that fit more and more bits in the same cell come from advances in error correction. The same page that would have been bad years ago is now OK because of the better error-correction engine used in the controller. If you look at a typical flash controller these days, you will see most of its area is used for error correction. These days even the best and fastest codes (LDPC) can't guarantee 100% correction, so they are layered with internal RAID as a last line of defense. The same is happening in data networks, where faster speeds come from better error correction and digital filtering.

The levels between 1100 and 1110 in QLC mentioned above are typically characterized by engineers for a particular temperature and chip model, then stored at manufacturing time. The temperature sensor on the chip tells the controller which corresponding table to send down. If a read fails, the controller tries a finer table to get a "finer" reading that helps tell whether a bit that falls between 1100 and 1101 is closer to the higher or lower voltage. If it still fails, it tries reading again and again with a set of tables shifted left and right a few times. If it still fails, it reads a statistical plot of the page to get the bell curves, then runs a math equation to determine the valleys separating each of the humps. Still failing? Time to read the other 31 pages from other blocks to do a RAID recovery and mark this block bad.
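The recovery escalation described above is essentially an ordered ladder of increasingly expensive read attempts. A minimal sketch of that control flow (step names and the toy data are hypothetical; real controllers do this in hardware):

```python
def read_with_recovery(steps):
    """Try each recovery step in order.

    steps: ordered list of (name, fn); each fn returns the page
    data on success or None on failure.
    """
    for name, fn in steps:
        data = fn()
        if data is not None:
            return name, data
    raise IOError("uncorrectable page: mark block bad")

# Toy run where the first two attempts fail and a shifted table succeeds.
steps = [
    ("default table",  lambda: None),
    ("fine table",     lambda: None),
    ("shifted tables", lambda: b"page data"),
    ("valley search",  lambda: b"page data"),
    ("RAID rebuild",   lambda: b"page data"),
]
print(read_with_recovery(steps))  # ('shifted tables', b'page data')
```

The ordering matters: cheap, fast attempts first, with the RAID rebuild across other blocks as the expensive last resort.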
 
My old work Crucial SSDs would eventually fail. I ended up needing to leave them powered on via USB to do some sort of "trim"? I just ended up buying new drives, making clones, and throwing the old ones in a drawer.
 
On a consumer drive, you need to monitor the SMART values for predicted failure and replace the drive when they indicate one.

The trim function tells the OS what parts of the drive are available to write - basically how much free space you have. This is done automatically on modern operating systems, on a schedule, in the background.

On a SAN this is usually done with a weekly scripted task.
 
Trim's primary purpose is for the OS to tell the drive that something is erased, instead of writing "data" there to mark it as erased like on a mechanical drive. This way the SSD can basically skip that area when garbage collecting, instead of doing unnecessary writes that are 1) slow and 2) use up erase and program cycles.

It is like taking the contents out of a box and then moving the empty box around on the shelf, instead of just taking the empty box away and leaving the spot open for the next box with things inside.
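The box analogy translates into a very small model of garbage collection (a toy sketch, not how any real controller is structured):

```python
# Toy model of why TRIM helps garbage collection: before erasing a
# block, the controller must relocate every page it still believes is
# valid. Trimmed pages can simply be skipped.

def gc_copies(block_pages, trimmed):
    """Pages the controller must copy elsewhere before erasing a block."""
    return [p for p in block_pages if p not in trimmed]

pages = ["a", "b", "c", "d"]   # logical pages stored in one block
deleted_by_os = {"b", "d"}     # files the OS has deleted

print(gc_copies(pages, set()))           # no TRIM: all 4 pages get copied
print(gc_copies(pages, deleted_by_os))   # with TRIM: only ['a', 'c'] get copied
```

Without TRIM the drive wastes write cycles moving "empty boxes"; with it, only live data is relocated.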
 
I constantly backup my computers, so when a drive dies I curse it and buy another. ;) All kidding aside they're wear items with a limited life cycle and planned obsolescence, so I plan accordingly.
 
The real trick would be to mirror the drive. Not sure if that option is available for home computers. We did that at work for crucial data. Essentially you run two identical drives. Chances of two failing at the same time would be very small.
I think that’s called a RAID.
 
You are describing RAID 1. There are many different levels of RAID. Most modern operating systems support several different levels of RAID, including the Windows workstation I'm on, which enjoys the protection of RAID 1.
 
Still have old EPROMs and EEPROMs for my old GM OBD1 tuning stuff. Most things have moved on to flash, but these were used in lots of ECM/PCMs for vehicles until the mid-90s, when OBD2 required programming through the OBD port. I still have my old programmer and UV eraser for when I need to write a chip for an old vehicle that uses them, like my '89 Camaro, although I did pick up a bunch of 28SF512 chips so I wouldn't have to use the EPROMs and could just offset the programming depending on which chip it originally used. I rarely have a need to write an EPROM, but I still have them lying around just in case.

EPROM required the UV eraser; EEPROM could be erased via a high-voltage wipe with the programmer and had no window. Similar to flash but slower and not as sophisticated. That was fine back when sizes were in KB, but you wouldn't want to try writing GB or TB to an EEPROM chip - it would take forever.
 

I don't think they ever got that big. How many address lines would have been needed?

I remember working with SPI EEPROM (and later flash, which had a different block-erase mechanism) and they were never that big. Maybe 64K or 128K bytes? I mean, these days there's up to 1-2 TB in a single IC package, although that might include 3D stacking and perhaps multiple dies in the same package.
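On the address-lines question: a byte-wide memory with n address lines selects 2^n bytes, so the line count is just the base-2 log of the size. A quick calculation:

```python
# Address lines needed for a byte-wide memory: n lines select 2^n bytes.
import math

for size_bytes in [64 * 1024, 128 * 1024, 1024 ** 3]:
    lines = int(math.log2(size_bytes))
    print(f"{size_bytes} bytes -> {lines} address lines")
```

So a 64KB part needs 16 address lines, and a hypothetical 1GB parallel EEPROM would need 30, which is part of why large parallel-addressed parts never made sense.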
 
Yeah, I think the largest EEPROMs I've seen are 1MB. Most of the ones I've worked with are 64K, 128K, 256K, and 512K, as those are the common sizes for the EPROMs and EEPROMs used in the GM OBD1 ECM/PCMs.
 