Weird hard drive issue – questions

SMART data from SeaTools:
View attachment 236613

Stats missing from the screenshot are Ultra DMA CRC Error (0), plus head flight hours and lifetime reads/writes, which I assume aren't important here (someone please correct me if I'm wrong).
At only 1202 hours and 40 power cycles, that is a really young hard drive. It must have been defective from the start.
 
Agreed that it's super young to be having these issues.

I did manage to get all the data off it (AFAICT), none of the SMART values seems to be near a condemnation point, and the drive passed SeaTools's diagnostics while connected via USB.

Given what I've posted so far, do you think it's more likely that the drive has failed than that the cable or port went bad?

I just plugged it back in with a different SATA cable and am running more diagnostics on it. Let's see what happens...
 
Reallocated sector count, or in your case retired sectors count, is an important indicator. It’s zero.
Pending count is another, and it is also zero.
It doesn’t seem like a bad HDD.
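For anyone who wants to script the same sanity check, here is a minimal sketch. The function name and dictionary keys are made up for illustration; the values are the zeros reported in this thread, typed in by hand rather than read from any real tool. Zeros here only mean these particular counters aren't flagging anything, not that the drive is proven healthy.

```python
# Hypothetical helper: the attribute names below are illustrative labels,
# not the exact strings SeaTools or any other tool prints.
WATCHED = ("reallocated_sectors", "pending_sectors", "udma_crc_errors")

def smart_red_flags(raw_counts):
    """Return the watched counters whose raw value is above zero."""
    return [name for name in WATCHED if raw_counts.get(name, 0) > 0]

# Raw values as reported in this thread (all zero).
reported = {"reallocated_sectors": 0, "pending_sectors": 0, "udma_crc_errors": 0}
print(smart_red_flags(reported))  # []
```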
 
What do you mean by timing?

I "trust" the Marvell controller in the sense that I think it's a lot less likely to be the problem than the hard drive or cable is. I don't trust it enough to rule it out completely as a potential cause.

I just looked back in the event log and found many more of the same kinds of events, apparently on the same channel. So, maybe that channel or the cable is bad...
Some background on hard drive engineering before we continue. The write head's magnetic field sets a minimum width for a written track, but the read head can read back a narrower track than the write head can write. So in theory, if you write tracks sequentially from one side to the other, you can overlap them so that each new track partly covers the previous one, leaving a strip that is still wide enough to read. This is called shingled magnetic recording (SMR). It increases density, which makes the drive higher capacity or cheaper. The cost is that you can no longer do a small random write quickly, because the random write has to be folded into a much larger sequential rewrite of the overlapping tracks.
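To put a rough number on that cost, here is a toy model, not how any real firmware works. It assumes tracks are grouped into bands and that changing one track means rewriting it plus every later track in the same band. The band size of 20 is an arbitrary assumption, and real drives hide much of this behind caches.

```python
def tracks_rewritten(band_size, track_index, shingled=True):
    """Tracks that must be physically written to update one track in a band.

    Toy assumption: with shingling, each track partly covers the next one,
    so updating track i forces a rewrite of i and all later tracks in the band.
    """
    if not 0 <= track_index < band_size:
        raise ValueError("track_index must be inside the band")
    return band_size - track_index if shingled else 1

# Updating the first track of an (assumed) 20-track band:
print(tracks_rewritten(20, 0))                  # 20
print(tracks_rewritten(20, 0, shingled=False))  # 1
```

A large sequential write starts at the beginning of a band and pays this cost once, which is why the penalty mostly shows up on small scattered writes.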

NAS drives typically aren't expected to handle very small reads and writes that would time out if not completed quickly, so in the last several years hard drive companies have basically assumed NAS drives will be given a lot of time to complete writes. That lets them stack a lot of tracks together with SMR and still have enough time to write them. It takes time, and would be a bad idea for an OS drive or a random-access internal drive. For a NAS this is USUALLY ok, and for security camera drives it is perfect anyway.

WD Red got busted for using SMR and failing in NAS setups, people got angry, and WD's response was that they should buy the Red Pro, while letting buyers who weren't happy get a refund. I don't know enough about IronWolf, but if it is a NAS drive and you use it in a PC, I wonder if this is what is happening to you as well.

 
The UDMA error count is also 0, so it does not look like an interference issue either. It could be a cable with a bad connection, or bad soldering on a board or chip; those don't always show up in SMART. The bit error rate on the drive's reads is good enough that correctable errors rarely turn into failures.

It could be anything from the HDD's RAM to the PC's chipset, or anything in between. It is not a server, so it does not have layers and layers of error correction along the way. I would just swap this drive out, see if it fails in another computer, and call it a day; it's not worth analyzing something that is hard to reproduce.

I have a laptop with a similar issue on its port: two drives both failed there about once a month, corrupting the entire OS. I eventually gave up, got an optical-bay-to-SATA adapter, swapped an SSD in for the DVD-ROM drive, and now use an external USB DVD-RW drive instead.
 
It very well could have been a bad SATA cable. It sounds too easy, but it obviously happens.
 
Thanks for the explanation!

Yes, Ironwolf is a NAS drive. I'm using it in a PC, but I'm using it for NAS things – all it does is store large media files. Basically all it ever does is read; writes are very few, very far between, and typically large. The OS, all apps, and all related stuff (pagefile, temp files, etc.) are on an SSD.

Though, thinking about it now, maybe my backup and AV software are doing some tiny writes in the background...

But the other identical Ironwolf drive sees similar usage on the same controller and hasn't missed a beat – not what I'd imagine if the drive were a mismatch for the controller and/or application.

Does that make sense?
 
Yes, it makes sense. In that case, if possible, I would swap this drive's port with the other Ironwolf's port and see whether the failure stays with this drive or with the port it was plugged into. Swapping things is the easiest engineering you can do without special equipment or know-how.

As long as it is worth your time and you have backups.
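The logic of the swap test can be written out as a small sketch. The function name and the result strings are made up for illustration, and it assumes each cable stays with its port when the drives trade places, so "port" here really means port plus cable plus controller channel.

```python
def diagnose_swap(errors_follow_drive, errors_stay_on_port):
    """Interpret what is observed after the two drives trade ports.

    errors_follow_drive: the suspect drive still logs errors on its new port.
    errors_stay_on_port: the other drive starts logging errors on the old port.
    """
    if errors_follow_drive and not errors_stay_on_port:
        return "suspect the drive"
    if errors_stay_on_port and not errors_follow_drive:
        return "suspect the port, cable, or controller channel"
    if errors_follow_drive and errors_stay_on_port:
        return "inconclusive: possibly more than one fault"
    return "inconclusive: intermittent fault, or reseating fixed a bad connection"

print(diagnose_swap(errors_follow_drive=False, errors_stay_on_port=True))
```

Since the errors in this thread are intermittent, a clean run after the swap doesn't prove much until enough time has passed for them to have shown up.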
 
Thanks again.

Still curious about any Watt-conscious HBAs that might be worth buying!
 
Thanks again, everyone, for the feedback. Switching the cable out seems so obvious in retrospect but evidently it wasn't to me at the time.

The drive has been running for a few days now. In the SMART data per SeaTools, the "normalized" values for Read Error Rate and ECC On The Fly have both worsened slightly from 84 to 82, but nothing else has changed. Fingers crossed!
 