Network failure detection and recovery in Windows Server 2003 Clus

Discussion in 'Clustering' started by itziks, Jun 30, 2008.

  1. itziks

    itziks Guest

    itziks, Jun 30, 2008
    #1

  2. Assuming you have your crossover network properly configured, what should
    happen when you disconnect all the public adapters is that any group
    containing an IP resource goes into a failed state and starts
    ping-ponging back and forth between nodes. Once those groups have
    failed over past the default threshold, each group will sit on whatever
    node it ends up on with the IP addresses in a failed state. You won't be
    able to bring them back online until network connectivity is restored.
     
    Jeff Hughes [MSFT], Jun 30, 2008
    #2

  3. itziks

    itziks Guest

    Do I have to bring them online manually?
    Or does the resource group come back online automatically when the network
    link recovers?
     
    itziks, Jun 30, 2008
    #3
  4. It depends. If the network link recovers while the groups are bouncing back
    and forth, the groups will come online automatically. If, however, the link
    returns after the cluster has gone through enough failover repetitions
    (10 times in a 6 hour timeframe) to stop failing over, you'll have to
    manually bring the resource groups online.
    --
    Jeff Hughes, MCSE
    Support Escalation Engineer
    Microsoft Enterprise Platforms Support (Server Core/Cluster)
     
    Jeff Hughes [MSFT], Jun 30, 2008
    #4
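
    The "10 times in a 6 hour timeframe" rule described above can be modeled
    roughly as follows. This is only an illustrative sketch of the behavior as
    it is explained in this thread, not the cluster service's actual
    implementation. The class name `FailoverWindow` and its fields are
    hypothetical; the defaults of 10 failovers and 6 hours come from the post
    above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FailoverWindow:
    """Hypothetical model of a group's failover threshold and period."""

    threshold: int = 10          # failovers allowed within one period
    period_hours: float = 6.0    # length of the counting window
    window_start: Optional[float] = None
    count: int = 0

    def record_failover(self, now_hours: float) -> bool:
        """Record a failover; return True if the cluster would keep retrying."""
        # The clock starts at the first failover and restarts once the
        # period has fully elapsed (assumption based on the description).
        if (self.window_start is None
                or now_hours - self.window_start >= self.period_hours):
            self.window_start = now_hours
            self.count = 0
        self.count += 1
        # Reaching the threshold inside the window means the group is left
        # failed until someone brings it online manually.
        return self.count < self.threshold


# Rapid ping-ponging after a network loss: ten failovers within minutes.
rapid = FailoverWindow()
rapid_results = [rapid.record_failover(i * 0.01) for i in range(10)]

# One failover per hour: the window resets before the threshold is reached.
slow = FailoverWindow()
slow_results = [slow.record_failover(float(i)) for i in range(10)]
```

    In the rapid case the tenth call returns False, which corresponds to the
    group staying in a failed state. In the slow case every call returns True
    because the count never reaches the threshold inside a single window.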
  5. itziks

    itziks Guest

    "10 times in a 6 hour timeframe"
    In my case, I didn't see the cluster fail over once the link had been down
    for 2 minutes.
    The IP resource is in a failed state and the cluster doesn't come back
    automatically.
     
    itziks, Jun 30, 2008
    #5
  6. Can you restate that last post? I didn't really understand your scenario. 10
    times in 6 hours just means that once we trigger a failover, a clock starts
    ticking. If we have 10 failovers before we hit 6 hours, we'll no longer try
    to bring resources online. In your case, that failing over back and forth
    should have stopped several minutes after the loss of network connectivity.
    The only exception to that is if you are using Exchange. If we lose
    connectivity to the GC, that may prevent the Exchange group from coming
    offline for 8-10 minutes.
    --
    Jeff Hughes, MCSE
    Support Escalation Engineer
    Microsoft Enterprise Platforms Support (Server Core/Cluster)
     
    Jeff Hughes [MSFT], Jun 30, 2008
    #6
  7. itziks

    itziks Guest

    My scenario is a 2-node, 64-bit cluster for SQL 2005. When I disconnect the
    public network adapter on both nodes, the groups ping-pong several times,
    as you wrote, and then the resource fails along with the default group and
    the SQL group.
    Then I reconnect both nodes to the public network, and nothing happens.
    If I wait long enough, the resource will come online automatically,
    because you wrote that the group will try to start in 6 hours.

    itziks
     
    itziks, Jun 30, 2008
    #7
  8. itziks

    itziks Guest

    My scenario is a 2-node, 64-bit cluster for SQL 2005. When I disconnect the
    public network adapter on both nodes, the groups ping-pong several times,
    as you wrote, and then the resource fails along with the default group and
    the SQL group.
    Then I reconnect both nodes to the public network, and nothing happens.
    If I wait long enough, will the resource come online automatically?
    I ask because you wrote that the group will try to start in 6 hours.

    itziks
     
    itziks, Jun 30, 2008
    #8
  9. The cluster won't try to restart in 6 hours; that's just the amount of
    time over which we count failovers before we assume there is a permanent
    problem. In your scenario, the 10 failovers have already occurred, so the
    groups stay offline until you, the user, bring them back online, assuming
    the network connectivity has been restored. As mentioned, if you reconnect
    the network AFTER the failover process has stopped, you will have to
    manually bring the resources back online.
    --
    Jeff Hughes, MCSE
    Support Escalation Engineer
    Microsoft Enterprise Platforms Support (Server Core/Cluster)
     
    Jeff Hughes [MSFT], Jul 1, 2008
    #9
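
    Bringing the groups back online manually can be done from Cluster
    Administrator or from the command line with cluster.exe. As a hedged
    sketch, the helper below only builds the command lines and does not run
    them. The function name `online_commands` is hypothetical, the group names
    are assumptions (yours will differ), and the `/online` and `/wait` switches
    should be checked against `cluster group /?` on your own nodes.

```python
from typing import List


def online_commands(groups: List[str], wait_seconds: int = 60) -> List[List[str]]:
    """Build one cluster.exe argument list per group to bring it online.

    Nothing is executed here; the caller decides whether to run the commands.
    """
    return [
        ["cluster", "group", name, "/online", f"/wait:{wait_seconds}"]
        for name in groups
    ]


# Assumed group names for a two-node SQL cluster; replace with your own.
commands = online_commands(["Cluster Group", "SQL Group"], wait_seconds=60)
for argv in commands:
    print(" ".join(argv))
```

    Each argument list could be passed to `subprocess.run` on a cluster node
    once public network connectivity has been restored.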
  10. itziks

    itziks Guest

    Thanks a lot.
    It was very helpful.
     
    itziks, Jul 1, 2008
    #10