2 HDD´s in Raid1 dead...


alejandro_m1
10-17-2006, 08:27 AM
So here goes my tragic story

I had a file server set up with Suse Linux, using a couple of Maxtor DiamondMax Plus 9 250 GB SATA HDDs working in RAID 1 for "extra protection". I set it up with the YaST partitioning program because, to be honest, I'm quite a noob in Linux and know very few commands, so I need a graphical interface most of the time. Everything worked fine and stayed that way for a few months, from February until last week, when I decided it was time to put all my files in order.

While I was classifying some textures, I started to get a pretty slow response from the server. For example, in Max it took about 10 minutes to load a simple scene whose textures resided on the file server, so I knew something was wrong. After checking that the file server was in fact acting terribly slow, I decided to restart it; after all, it had been running for about 2 weeks without being turned off.

After restarting there was no way to get into Linux: it kept looping a disk check, trying without success to move data from some damaged sectors. So I thought, OK, one of the disks died, what a shame, I will try to get the data from the other disk... The problem is that the "clone" disk had the exact same error... To make the story short, after a lot of work trying to get either one of the disks to work and to get the array working again, I finally managed to get some data by plugging the disks into my main workstation and running some EXT2 recovery programs in XP. Most of the info that I was moving and classifying was lost on both HDDs, but most of the other files were OK.

So I decided that Linux isn't for me. I thought to myself that maybe if I knew a bit more about Linux I would have recovered more info, so I went back to XP for the file server. I reformatted the HDDs with NTFS, mounted them both as slaves with no RAID, and ran chkdsk on each one to find out how bad the problem was. To my surprise, both reported the exact same number of KB in bad sectors (something around 560 KB). So after this long story, here goes my question: how could this happen? How can they have the same damaged sectors and fail at the exact same time? I can't find any culprit for the HDDs' death: no power failure, no problems with the PSU (as far as I know), and the temps are a bit hot but not that much.

BTW, the Maxtor utility states that both disks are healthy.

I only had one HDD die on me years ago, and that was a factory problem (a Seagate). That's why I changed to Maxtor and recently to Western Digital, but now two Maxtors dying together is way too much.


PS
So I tried to RMA the disks and found out that the warranty expired on the 8th of this month. HDDs are pretty cheap now, so for my file server I'm getting new ones (most probably WD; I'll stay away from Maxtor this time). But just so these somewhat new HDDs don't go to waste: how bad do you think the damage to the disks is? Would you use them for non-critical data, or is it time to toss them both into the trash can? In the old days I remember using an HDD with a lot of damaged sectors, just running chkdsk from time to time, and it worked for years. But with new HDDs I've heard that once you get some bad sectors you should kiss your disk goodbye. Is this true?

lots
10-17-2006, 12:57 PM
It's possible that you have bad RAM or a faulty controller (which would explain why both drives received bad sectors at the same time and in the same place). But generally, if there is a problem with the RAM or the controller, the data on the hard drive gets corrupted rather than the drive developing bad sectors. Out of curiosity, what motherboard do you have? Perhaps the hard drive controller on that particular motherboard has some issues. To check the RAM, try memtest86. It should find errors fairly quickly (within the first 30 minutes to an hour).

I would definitely rule out the computer itself as the problem before putting new drives into it. I would hate to have the new drives corrupted in the same way.

I also don't see why you can't keep using the drives, though I'd be cautious of using them for critical data. Perhaps just as a temporary storage solution. Just be sure to clean them up, and all of that fun stuff..

When it comes to hard drive brands, they're all basically the same in terms of reliability. Though most people agree that Maxtor drives suck :P All the big HD makers have had some problem at some point in the past. Personally I use Seagate because of their longer warranty periods (5 years).

alejandro_m1
10-18-2006, 05:03 PM
Thanks Lots

The mobo is an old MSI KT4 Ultra running an Athlon XP 1800+. It was an old workstation that I decided to turn into a server; I just changed the case and PSU. The case is now a Thermaltake Bach, which came with a pretty decent number of fans.

I agree that I should check the computer hardware to see if there is a problem before getting new HDDs, so here is what I have checked so far.

I checked the memory with memtest and it all appears to be fine, so maybe the problem isn't the RAM.

I replaced the PSU with one with a little more power, the SATA cables, and a Northbridge fan that apparently died some time ago without my noticing until now (this last one I'm pretty sure wouldn't affect the HDDs, but just in case).

I also moved the hard drives from where they were to just in front of the intake fan, in case it was heat related, because around the time they crashed the weather was pretty hot around here. I have also installed an air conditioner in the room where the render nodes and file server reside.

Do you think any of this would help with my specific problem?
