Originally Posted By: Subdued
We're not talking about a desktop "server" here
Even 14 years ago Dell's PERC was a pretty darn good RAID controller.
Replace one drive, wait for rebuild, replace another drive, etc. and there would be literally no downtime
Uh... no. I have a lot of experience with 14-year-old PERC controllers.
There is no "another" nor any "etc". RAID 5 (and there was no RAID 6 in old PERCs back then) can only rebuild from a single drive failure, no matter how many drives make up the array. Thus, no "another". When RAID 6 came along that number rose to two, thus there is no "etc". Even 10 years ago not every PERC sold offered RAID 6.
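To illustrate why single parity only covers one missing drive, here is a minimal sketch of XOR parity on one stripe. The block contents and the `xor_blocks` helper are made up for illustration; a real controller does this in firmware across rotating parity stripes.

```python
from functools import reduce

def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte position by byte position."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

# One stripe of a 4-drive RAID 5: three data blocks plus one parity block.
# The contents are arbitrary illustration data.
data = [b"AAAA", b"BBBB", b"CCCC"]
parity = xor_blocks(data)

# One drive lost (data[1]): XOR of the survivors and the parity recovers it.
rebuilt = xor_blocks([data[0], data[2], parity])
assert rebuilt == data[1]

# Two drives lost (data[0] and data[1]): the survivors only yield the XOR of
# the two missing blocks, which is not enough to separate them again.
two_lost = xor_blocks([data[2], parity])
assert two_lost == xor_blocks([data[0], data[1]])
assert two_lost != data[0] and two_lost != data[1]
```

With one parity block per stripe there is one equation, so only one unknown block can be solved for. RAID 6 adds a second, independent parity, which is what raises the limit to two.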
Originally Posted By: Subdued
So I'm a little confused why someone would take an image on a server just to proactively swap out some identical drives.
While not necessary, the ability to swap back a full-up working system is a HUGE timesaver. An O/S install takes an hour, installing the backup software takes a half hour, restoring the backup catalog can take many hours, depending on the size of the volumes. Some reboots and some configuration later and then... you are ready to BEGIN restoring the original system.
Contrast that with: a.) swap in the old HDD and boot, and b.) run "restore".
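As a rough tally of the bare-metal path, here is a small sketch that adds up the steps. The OS and backup software figures are the ones quoted above; `CATALOG_RESTORE_HOURS` is an assumed placeholder, since the catalog restore depends on volume size, and reboots and configuration are not counted.

```python
# Hours per step before the actual data restore can even begin.
# CATALOG_RESTORE_HOURS is an assumption: the real figure depends on volume size.
CATALOG_RESTORE_HOURS = 4.0

rebuild_steps = {
    "OS install": 1.0,
    "backup software install": 0.5,
    "backup catalog restore": CATALOG_RESTORE_HOURS,
}

def hours_before_restore(steps):
    """Total elapsed hours across all preparation steps."""
    return sum(steps.values())

total = hours_before_restore(rebuild_steps)
print(f"Bare-metal path: {total:.1f} h before the data restore starts")
```

Change the placeholder to match your own catalog size; the point is only that the preparation time stacks up before step b.) is even possible.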
Originally Posted By: Subdued
Granted I am assuming RAID, but IMO if you're not running some kind of RAID1/5/6 you don't really have a reliable server...
Agreed. If it doesn't have ECC and RAID it's not a Server.
I would have loved to run RAID 50 or another more redundant configuration, but older PERCs simply didn't support those modes, and even if they had, there simply weren't enough drive slots, not to mention we weren't working with 2TB drives back then. We needed all the space we could get. A typical server had 6 drive slots, which we configured as a 5-drive RAID 5 plus one hot spare. If you had 8 drive slots, then a mirrored boot pair, a 5-drive RAID 5 and one hot spare was a typical setup. Trying to run a 7-drive RAID 5 just drove up the chances of a simultaneous 2-drive failure.
And yes, simultaneous RAID drive failures occurred: for us, 4 times over roughly 5 years, to my recollection. The first was with Maxtor drives, when the data center A/C failed during a holiday. After that it was buggy WD firmware, the TLER bug. Once WD fessed up and issued a firmware update, things got a lot more stable.
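The trade-off between array width and double-failure risk can be sketched with a simple model. It assumes each surviving drive fails independently during the rebuild window with the same probability; `P_FAIL` is a made-up illustrative value, not a measured one.

```python
def usable_drives(n_drives):
    """RAID 5 gives up one drive's worth of capacity to parity."""
    return n_drives - 1

def second_failure_risk(n_drives, p_fail):
    """Chance that at least one surviving drive fails during the rebuild.

    Assumes independent failures with the same per-drive probability p_fail
    over the rebuild window, which is a simplification.
    """
    survivors = n_drives - 1
    return 1.0 - (1.0 - p_fail) ** survivors

P_FAIL = 0.01  # assumed per-drive failure probability during one rebuild

for n in (5, 7):
    risk = second_failure_risk(n, P_FAIL)
    print(f"{n}-drive RAID 5: {usable_drives(n)} drives usable, "
          f"second-failure risk {risk:.2%}")
```

Under this model the risk grows with every drive added to the array. Real failures are often correlated (shared heat, shared firmware), so the independent model likely understates the actual risk.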