*Sigh* Spent the whole morning rebuilding the web server's IDE RAID array. And that's IDE as in "Integrated Drive Electronics", not the "Integrated Development Environment" that most programmers mean.
Back in the day (we are going nearly 4 years back) it came time to move our web server off the Windows box, a P-II server, onto something else. You know, one of those "There is a surplus in our budget, can you think of anything we need?" type questions you might get from a department head at a mid-size library.
Oh, I was thinking of a few things, believe me. Then I was told, "You can only spend $3000". So I had this "Great Idea" about giving our web site a new server.
Going for the best bang for the buck, I configured a workstation-turned-server out of an Athlon 800MHz, 512 MB of PC-133 RAM and four 30 GB IBM Deskstar hard drives set up in an IDE RAID 0+1 config, using the infamous Abit KT7A-RAID mainboard with its HighPoint 370 Ultra100 software RAID controller.
Time went on, and a year later I was offered another chance to purchase something: a proper tape backup drive with a SCSI controller. I had to fight with that bit, as the HighPoint BIOS and the Adaptec BIOS overlapped each other, preventing boot. I updated the Adaptec BIOS and all was well.
IPAC for Dynix grew and grew, so additional hardware needed to be purchased. Needing to keep the existing RAID config, I went with an Athlon XP 2100+ and 1 GB of DDR RAM, carrying on with a HighPoint controller via the Abit KR7-RAID.
That upgrade went through on pure fluke, I think, as there were spots on the web that told horror stories of RAID arrays not carrying over correctly between controllers/mainboards. Fortunately for me, I encountered no problems on the mainboard exchange, and the old web server hardware went into the making of the new email server (which had been running on a Pentium 90; the IMAP protocol is not fast on a Pentium 90).
Add another gig of RAM later and that brings us up to the present day. Last week I got to questioning the RAID array on the web server (after dealing with a RAID failure on the file server that same day), so I decided to research and install the RAID monitoring software on the web server and find out the actual status of the array by scheduling a consistency check for last Sunday evening.
You know it or I wouldn't be writing it: the consistency check failed.
Various attempts between yesterday and today to restore the RAID array to working condition failed. But which disk had the problem? The HighPoint 372 RAID controller software isn't programmed to tell you. Or they have it hiding in a really obscure place.
So yesterday I bolted off to the computer hardware store to pick up a couple of extra IDE drives in the hope of fixing this little problem. Since I was only replacing 30 GB hard drives, I grabbed some Western Digital 40 GB hard drives. Normally I don't recommend buying Western Digital, but these had the 3 year warranty on them plus an 8 MB (rather than the standard 2 MB) hardware cache.
Now the second stage of the RAID rebuild has begun. And again I'm thinking that, with the way the RAID is set up, I had some luck (considering which devices had bad blocks on them).
Controller 0 Drive 0 !Bad Block!Need Replacement!
Controller 0 Drive 1 *OK*
Those two drives were striped with
Controller 1 Drive 0 *OK*
Controller 1 Drive 1 !Bad Block!Need Replacement!
You see, even with two drives having bad blocks and/or other problems, the way this RAID is set up means the array can be rebuilt without difficulty.
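To spell out why I think I got lucky, here is a rough sketch in Python. It rests on an assumption I can't confirm from the HighPoint software: that each drive on controller 0 has its copy at the same position on controller 1. The names `MIRROR_PAIRS` and `rebuild_sources` are made up for illustration and have nothing to do with the HighPoint tools.

```python
from typing import Dict, List, Tuple

Drive = Tuple[int, int]  # (controller number, drive number)

# ASSUMPTION: drive N on controller 0 holds the same data as drive N on
# controller 1. The real layout on the HighPoint controller may differ.
MIRROR_PAIRS: List[Tuple[Drive, Drive]] = [
    ((0, 0), (1, 0)),
    ((0, 1), (1, 1)),
]


def rebuild_sources(health: Dict[Drive, bool]) -> Dict[Drive, Drive]:
    """Map every bad drive to the healthy partner it can be rebuilt from.

    Raises ValueError when both drives holding the same data are bad,
    which is the case where only the backup tape would help.
    """
    sources: Dict[Drive, Drive] = {}
    for first, second in MIRROR_PAIRS:
        if not health[first] and not health[second]:
            raise ValueError(f"both copies lost: {first} and {second}")
        if not health[first]:
            sources[first] = second
        elif not health[second]:
            sources[second] = first
    return sources


# The situation from this post: True means OK, False means bad blocks.
health = {(0, 0): False, (0, 1): True, (1, 0): True, (1, 1): False}
plan = rebuild_sources(health)
for bad, good in sorted(plan.items()):
    print(f"rebuild controller {bad[0]} drive {bad[1]} "
          f"from controller {good[0]} drive {good[1]}")
```

Under that assumption, each bad drive still has a healthy twin to copy from. Had the two bad drives been twins of each other, the function would raise instead, and I would have been reaching for the tape.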
The Good: RAID is good. Even with two problematic drives, a 0+1 config is very recoverable without resorting to backup tape. I think it is a good thing I caught this problem before it really got out of control.
The Bad: I had hoped my drives fell outside the bad batch of Deskstar hard drives. I guess I was wrong. RAID array rebuilding on the HighPoint controller could be better too, as the rebuild process could work on a per-drive basis rather than a per-array basis. I found myself sitting through 3.5 complete drive rebuilds because the HighPoint controller didn't tell me that the second drive had problems, and neither did the third-party diagnostic utility until I ran an advanced scan rather than a quick scan.
The Ugly: Shame on HighPoint for not being able to point out which drive on which controller is actually causing the problem. "Consistency check failed" or "Unable to rebuild" error messages don't tell me enough about how to fix the problem or what needs to be replaced.
Limited to the bad block scenario only? Perhaps, but it would have saved me a lot of time not having to download a third-party testing utility to get the answer. And to find the problem I had to spend the time running an advanced scan to turn up the bad blocks on the second hard disk.
Well, the rebuild is almost complete, and I need to make sure Windows 2000 Server reloads as expected. I'll reschedule a consistency check for Sunday.
On a side note: I really should think about getting a third fan in that web server. Those 7200rpm drives run real hot.