Windows Vista Tips


Mirror Drive Failure

 
 
TheScullster
      06-22-2011
Hi all

We have an HP ML370 G6 domain controller, 1 year old, running Windows Server 2008 R2.
The OS is on a mirror, which recently dropped one of the drives.
Basically the drive showed the orange failure light, but when the server was
rebooted, it rebuilt the array and operated fine for approx 1 month (until
today).

Again the same drive is showing failure.

OK so this time I will suggest that our IT support company (I don't do
server work) replace the failed drive rather than allowing a rebuild.
But what else can cause this, and how can I build resilience against this
type of failure?
Fail-over server perhaps?


Thanks

Phil


 
 
 
 
 
wert
      06-29-2011

Not good, hopefully you were fully backed up?


--
wert

 
 
Dave Warren
      06-29-2011
Someone claiming to be "TheScullster" typed:

>We have a HP ML370 G6 Domain controller - 1 year old Windows Server 2008 R2.
>The OS is on a mirror, but recently dropped one of the drives.
>Basically the drive showed the orange failure light, but when the server was
>rebooted, it rebuilt the array and operated fine for approx 1 month (till
>today ).
>
>Again the same drive is showing failure.


Almost like the drive is bad...

>OK so this time I will suggest that our IT support company (I don't do
>server work) replace the failed drive rather than allowing a rebuild.
>But what else can cause this, and how can I build resilience against this
>type of failure?


To be entirely honest, I'd be a little... annoyed... if a drive reported
as failing wasn't immediately removed from service and replaced in the
first place.

Maybe that's just me.

>Fail-over server perhaps?


A fail-over server is certainly an option. RAID-6 would isolate you from
any two simultaneous failures, and RAID-10 from two failures that hit
different mirror pairs, at the cost of needing at least 4 drives. If
you're in a situation where possibly-failing drives might be put back
into service, this might be worth the minor up-front cost.
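
The two RAID levels differ in which double failures they tolerate. A minimal sketch, assuming a 4-drive array where RAID-10 is laid out as two mirrored pairs (the drive names and pairing are made up for illustration, not taken from the ML370's actual configuration):

```python
from itertools import combinations

# Hypothetical 4-drive array; drive names are illustrative only.
DRIVES = ["d0", "d1", "d2", "d3"]

# Assumed RAID-10 layout: two mirrored pairs striped together.
MIRROR_PAIRS = [{"d0", "d1"}, {"d2", "d3"}]


def raid10_survives(failed):
    """RAID-10 survives as long as no mirror pair has lost both members."""
    return not any(pair <= failed for pair in MIRROR_PAIRS)


def raid6_survives(failed):
    """RAID-6 keeps two parity blocks per stripe, so any two losses are tolerated."""
    return len(failed) <= 2


# Every possible way for exactly two of the four drives to fail together.
double_failures = [set(c) for c in combinations(DRIVES, 2)]
raid10_ok = sum(raid10_survives(f) for f in double_failures)
raid6_ok = sum(raid6_survives(f) for f in double_failures)

print(f"RAID-10 survives {raid10_ok} of {len(double_failures)} double failures")
print(f"RAID-6 survives {raid6_ok} of {len(double_failures)} double failures")
```

Under those assumptions RAID-10 survives 4 of the 6 possible double failures and RAID-6 survives all 6.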

It really depends on how sensitive to downtime you are; the odds of two
simultaneous failures are (IMO) fairly low. However, if the downtime of
restoring from backups or rebuilding will cost you more than the cost of
a couple extra drives, the math is easy.
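
That comparison can be written down in a few lines. This is only a sketch: the function name and every figure in it are hypothetical placeholders, so plug in your own drive prices and downtime costs.

```python
def extra_drives_pay_off(drive_cost, extra_drives, downtime_cost_per_hour, outage_hours):
    """Return True when one avoided outage costs more than the extra drives."""
    hardware_cost = drive_cost * extra_drives
    outage_cost = downtime_cost_per_hour * outage_hours
    return outage_cost > hardware_cost


# Hypothetical figures: two extra drives at 300 each versus a 4-hour outage
# that costs the business 500 per hour.
print(extra_drives_pay_off(drive_cost=300, extra_drives=2,
                           downtime_cost_per_hour=500, outage_hours=4))
```

With those made-up numbers the outage (2000) outweighs the drives (600), so the call prints True.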
 
 
TheScullster
      07-04-2011

<wert> wrote
>
> Not good, hopefully you were fully backed up?
>

Yes, backups of backups!
I expected the server to fail over onto the hot spare drive and continue
operating as normal.
What actually happened was the server stopped performing its DHCP role and
the network ground to a halt.

Once the server was rebooted and a replacement drive introduced, the server
rebuilt the OS array in the background and functioned normally.

Phil


 
 
TheScullster
      07-04-2011

"Dave Warren" wrote

>
>>We have a HP ML370 G6 Domain controller - 1 year old Windows Server 2008 R2.
>>The OS is on a mirror, but recently dropped one of the drives.
>>Basically the drive showed the orange failure light, but when the server was
>>rebooted, it rebuilt the array and operated fine for approx 1 month (till
>>today ).
>>
>>Again the same drive is showing failure.

>
> Almost like the drive is bad...


The fact that it failed twice suggested this, so I had the support company
check it out after the second failure to prove (hopefully) that it was the
drive rather than, say, a controller issue that was causing the grief.


>
>>OK so this time I will suggest that our IT support company (I don't do
>>server work) replace the failed drive rather than allowing a rebuild.
>>But what else can cause this, and how can I build resilience against this
>>type of failure?

>
> To be entirely honest, I'd be a little... annoyed... if a drive reported
> as failing wasn't immediately removed from service and replaced in the
> first place.
>
> Maybe that's just me.


No, I agree wholeheartedly.
The problem was that as soon as the server was rebooted (after the first
failure), the controller started a rebuild back onto the dodgy drive.
It rebuilt without errors and so it was considered OK to proceed.

>
>>Fail-over server perhaps?

>
> A fail-over server is certainly an option. RAID-10 or RAID-6 would
> isolate you from two simultaneous failures, at the cost of needing at
> least 4 different drives. If you're in a situation where
> possibly-failing drives might be put back into service, this might be
> worth the minor up front cost.
>
> It really depends on how sensitive to downtime you are, the odds of two
> simultaneous failures are (IMO) fairly low. However, if the downtime of
> restoring from backups or rebuilding will cost you more than the cost of
> a couple extra drives, the math is easy.


The migration to the new servers (we swapped domain controller and Exchange
box at the same time) was troublesome to say the least.
Since the bugs in this operation were resolved, downtime has been minimal.
The fact is that very occasional downtime is probably tolerable in our case,
but repeated system loss lasting half a day each time is certainly
intolerable.

Thanks for your thoughts Dave


Phil


 
 
Dave Warren
      07-05-2011
Someone claiming to be "TheScullster" typed:

>"Dave Warren" wrote
>
>> It really depends on how sensitive to downtime you are, the odds of two
>> simultaneous failures are (IMO) fairly low. However, if the downtime of
>> restoring from backups or rebuilding will cost you more than the cost of
>> a couple extra drives, the math is easy.

>
>The migration to the new servers (we swapped domain controller and Exchange
>box at the same time) was troublesome to say the least.
>Since the bugs in this operation were resolved, downtime has been minimal.
>The fact is that very occasional downtime is probably tolerable in our case,
>but when you have repeated system loss lasting 1/2 a day then this level is
>certainly intolerable.


When I consider something like how "sensitive to downtime" you might be,
I'd consider 3-4 days of downtime as the likely consequence of a
hardware failure unless you have replacement parts for absolutely
everything on hand in some fashion (online, hot spares, or spare parts).

At least from my point of view, if you can't handle 3-4 days of downtime
you should probably have a completely redundant architecture of some
sort in place. If you don't have the budget for that, you should be
prepared for a hardware failure to take you down 3-4 days assuming it
can take 1-2 days to get replacement hardware (and to get it working --
Assume your replacement hardware will fail too) plus 1-2 days to cover
troubleshooting time and rebuilding time once the hardware issues are
resolved.
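
The two estimates above add up as follows. This is a trivial sketch with a made-up helper name; the ranges are the 1-2 days for replacement hardware and 1-2 days for troubleshooting and rebuilding mentioned above.

```python
def add_ranges(*ranges):
    """Sum (low, high) day estimates into one overall (low, high) range."""
    return (sum(r[0] for r in ranges), sum(r[1] for r in ranges))


hardware_days = (1, 2)  # getting replacement hardware and getting it working
rebuild_days = (1, 2)   # troubleshooting and rebuilding afterwards

total = add_ranges(hardware_days, rebuild_days)
print(f"Plan for {total[0]}-{total[1]} days of downtime")
```

The sum spans 2 to 4 days, so a 3-4 day planning figure sits toward the pessimistic end of that range.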

I realize a lot of server administrators expect that hardware failures
can be resolved in 1-2 hours. They usually can. However, I prefer to
be prepared for the worst, either technically (redundancy) or
politically (user expectations).

No user that expects a 2-4 day recovery time screams when you fix it in
12 hours. Try it the other way around and see what happens, regardless
of how busy you look while you're fixing it.

That's my perspective anyway.
 
 
TheScullster
      07-06-2011

"Dave Warren" wrote

>
> When I consider something like how "sensitive to downtime" you might be,
> I'd consider 3-4 days of downtime as the likely consequence of a
> hardware failure unless you have replacement parts for absolutely
> everything onhand in some fashion (online, hot spares, or spare parts)
>
> At least from my point of view, if you can't handle 3-4 days of downtime
> you should probably have a completely redundant architecture of some
> sort in place. If you don't have the budget for that, you should be
> prepared for a hardware failure to take you down 3-4 days assuming it
> can take 1-2 days to get replacement hardware (and to get it working --
> Assume your replacement hardware will fail too) plus 1-2 days to cover
> troubleshooting time and rebuilding time once the hardware issues are
> resolved.
>
> I realize a lot of server administrators expect that hardware failures
> can be resolved in 1-2 hours. They usually can. However, I prefer to
> be prepared for the worst, either technically (redundancy) or
> politically (user expectations).
>
> No user that expects a 2-4 day recovery time screams when you fix it in
> 12 hours. Try it the other way around and see what happens, regardless
> of how busy you look while you're fixing it.
>
> That's my perspective anyway.


Thanks Dave

I guess I've been lucky so far in that whole system outages have been
restricted to one day max.
I do try to limit servers to a 4 year life, but with the (lack of)
reliability of the HP servers recently installed, having new equipment
clearly isn't any guarantee of a stress-free life.

Phil


 