Can't fail over resources on Win2k3 cluster: odd IP address conflict

Discussion in 'Clustering' started by kb1lxm, Oct 23, 2008.

  1. kb1lxm

    kb1lxm Guest

    We have a two-node Win2k3 SP1 cluster that we use as a file server. Two weeks
    ago we had to reboot a node following the reboot of our core network switches.
    We have two resources that we balance between the two nodes of the cluster,
    but after the reboot everything was on Node 1. Noticing a performance lag, we
    attempted to move a resource to Node 2. The IP resource failed, citing an IP
    address conflict, and the MAC address it reported turned out to be the MAC
    address of Node 2.

    We are able to move the Cluster resource (Cluster IP, Name, Quorum) to Node
    2, but neither of the File Server resources will move; both fail with this
    same problem.

    Every time we tried moving resources, Veritas NetBackup was backing up the
    File Server resources. Could the running backups be preventing us from
    moving a resource from Node 1 to Node 2, or could I have another issue?
    Keep in mind this cluster is about 2 years old and this is the first time
    we've had this issue.

    Here is one of the errors:
    Event Type: Error
    Event Source: Tcpip
    Event Category: None
    Event ID: 4199
    Date: 10/23/2008
    Time: 4:16:53 AM
    User: N/A
    Computer: Node 1
    Description:
    The system detected an address conflict for IP address 10.1.20.74 with the
    system having network hardware address 00:09:6B:F5:0F:9A. Network operations
    on this system may be disrupted as a result.

    For more information, see Help and Support Center at
    http://go.microsoft.com/fwlink/events.asp.
    Data:
    0000: 00 00 00 00 03 00 50 00 ......P.
    0008: 00 00 00 00 67 10 00 c0 ....g..À
    0010: 00 00 00 00 00 00 00 00 ........
    0018: 00 00 00 00 00 00 00 00 ........
    0020: 00 00 00 00 00 00 00 00 ........

    The MAC Address displayed is for Node 2.
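For anyone collecting several of these 4199 events, a minimal Python sketch of pulling the conflicting IP and MAC out of the description text and matching the MAC against known adapters. The function names and the `known_macs` mapping are hypothetical; only Node 2's MAC is known from the event above.

```python
import re

# Matches the wording of the Tcpip 4199 description quoted above.
CONFLICT_RE = re.compile(
    r"address conflict for IP address\s+(?P<ip>\d{1,3}(?:\.\d{1,3}){3})\s+"
    r"with the\s+system having network hardware address\s+"
    r"(?P<mac>[0-9A-Fa-f]{2}(?:[:-][0-9A-Fa-f]{2}){5})"
)


def normalize_mac(mac):
    """Upper-case a MAC and use ':' separators so comparisons are consistent."""
    return mac.upper().replace("-", ":")


def parse_conflict(description):
    """Return (ip, mac) from a 4199 description, or None if it does not match."""
    # Collapse line wraps so wrapped event text still matches.
    flat = " ".join(description.split())
    m = CONFLICT_RE.search(flat)
    if m is None:
        return None
    return m.group("ip"), normalize_mac(m.group("mac"))


def owner_of(mac, known_macs):
    """Return the name whose MAC equals `mac`, or None if it is unknown."""
    for name, known in known_macs.items():
        if normalize_mac(known) == normalize_mac(mac):
            return name
    return None


if __name__ == "__main__":
    text = (
        "The system detected an address conflict for IP address 10.1.20.74 "
        "with the system having network hardware address 00:09:6B:F5:0F:9A."
    )
    ip, mac = parse_conflict(text)
    # Only Node 2's MAC is known from the post; add others as needed.
    print(ip, mac, owner_of(mac, {"Node 2": "00:09:6B:F5:0F:9A"}))
```

If `owner_of` names the other cluster node, as it does here, the address is being claimed by a cluster member and not by a stray machine.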

    Thanks,
     
    kb1lxm, Oct 23, 2008
    #1

  2. kb1lxm

    kb1lxm Guest

    Sure thing. Also, it turned out not to be NetBackup's fault: we attempted to
    move the File Server resource while no backup was running.

    Here is the layout:

    Node 0 - 10.1.20.65 This is the Cluster Management IP
    Node 1 Main - 10.1.20.67
    Node 1 Heartbeat - 10.1.40.115
    Node 1 iSCSI - 10.1.40.114
    Node 2 Main - 10.1.20.69
    Node 2 Heartbeat - 10.1.40.116
    Node 2 iSCSI - 10.1.40.117
    FileServer 1 - 10.1.20.74
    FileServer 2 - 10.1.20.165
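A quick sanity check of this address plan can be sketched in Python: look for any address assigned twice, and group the entries by subnet. The subnet masks are not given in the post, so the `/24` prefix below is purely an assumption, and the function names are hypothetical.

```python
import ipaddress
from collections import defaultdict

# Addresses copied from the layout above.
LAYOUT = {
    "Node 0 (Cluster Management)": "10.1.20.65",
    "Node 1 Main": "10.1.20.67",
    "Node 1 Heartbeat": "10.1.40.115",
    "Node 1 iSCSI": "10.1.40.114",
    "Node 2 Main": "10.1.20.69",
    "Node 2 Heartbeat": "10.1.40.116",
    "Node 2 iSCSI": "10.1.40.117",
    "FileServer 1": "10.1.20.74",
    "FileServer 2": "10.1.20.165",
}


def duplicate_addresses(layout):
    """Return {ip: [names]} for every address assigned to more than one entry."""
    by_ip = defaultdict(list)
    for name, ip in layout.items():
        by_ip[ipaddress.ip_address(ip)].append(name)
    return {str(ip): names for ip, names in by_ip.items() if len(names) > 1}


def group_by_subnet(layout, prefix_len=24):
    """Group entry names by network. prefix_len=24 is an ASSUMPTION."""
    groups = defaultdict(list)
    for name, ip in layout.items():
        net = ipaddress.ip_network(f"{ip}/{prefix_len}", strict=False)
        groups[str(net)].append(name)
    return dict(groups)


if __name__ == "__main__":
    print("duplicates:", duplicate_addresses(LAYOUT))
    for net, names in group_by_subnet(LAYOUT).items():
        print(net, names)
```

The plan itself contains no duplicate. If the mask really is `/24`, the heartbeat and iSCSI adapters land in the same `10.1.40.0/24` network, which may be worth a separate look.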

    The nodes connect to the shared storage via iSCSI. It's not the best
    configuration and was actually before my time.

    We've had this for a few years, and before the switch maintenance a few weeks
    ago we never had a problem moving FileServer 1 and FileServer 2 from node
    to node. Also, the Cluster resource (Node 0 name and IP and the quorum) has
    no problem moving between Node 1 and Node 2.

    I'm now wondering if something is wrong on the switch side. It's been a
    weird problem. The Cluster.log file doesn't tell me much either.

    These lines are the only places I see issues:

    2008/10/23-08:16:58.481 WARN [ClNet] Tcpip is not bound to adapter
    A3688E05-4B26-4717-9348-BBA01806D352.
    2008/10/23-08:16:58.481 WARN [ClNet] Tcpip is not bound to adapter
    76890C41-573E-4637-94AE-B8EF15A5E73F.

    - and later -

    2008/10/23-08:17:03.746 ERR IP Address <fileserver1-ip>: IP address
    10.1.20.74 is already in use on the network, status 5057.
    2008/10/23-08:17:03.746 INFO [RM] RmpSetResourceStatus, Posting state 4
    notification for resource <ctdayfs004-ip>
    2008/10/23-08:17:03.746 INFO [FM] NotifyCallBackRoutine: enqueuing event
    2008/10/23-08:17:03.746 INFO [FM] Calling RmNotifyChanges in monitor 0e2c.
    2008/10/23-08:17:03.746 INFO [CP] CppResourceNotify for resource ctdayfs004-ip
    2008/10/23-08:17:03.746 INFO [FM] FmpRmDoHandleCriticalResourceStateChange:
    call InterlockedDecrement on gdwQuoBlockingResources, Resource
    fe02d2ab-9d9b-461d-b0e8-21390dce6b22
    2008/10/23-08:17:03.746 WARN [FM] FmpHandleResourceTransition: Resource Name
    = fe02d2ab-9d9b-461d-b0e8-21390dce6b22 [ctdayfs004-ip] old state=129 new
    state=4
    2008/10/23-08:17:03.746 WARN [FM] FmpHandleResourceTransition: Resource
    failed, post a work item
    2008/10/23-08:17:03.746 INFO [GUM] GumSendUpdate: queuing update type 0
    context 8
    2008/10/23-08:17:03.746 INFO [GUM] GumSendUpdate: Dispatching seq 257420
    type 0 context 8 to node 2
    2008/10/23-08:17:03.746 INFO [GUM] GumSendUpdate: completed update seq
    257420 type 0 context 8
    2008/10/23-08:17:03.746 INFO [FM] FmpPropagateResourceState: resource
    fe02d2ab-9d9b-461d-b0e8-21390dce6b22 failed event.
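The WARN and ERR entries are easy to lose among the INFO lines. A minimal Python sketch for filtering them, assuming entries start with the timestamp and level in the form shown above. The optional `xxxx.yyyy::` prefix in the pattern is an assumption about the raw log file, and the function names are hypothetical. Wrapped lines are joined back onto the entry they continue.

```python
import re

# One entry starts with "YYYY/MM/DD-HH:MM:SS.mmm LEVEL message".
# The optional "something::" prefix is an ASSUMPTION about the raw file.
ENTRY_START = re.compile(
    r"^(?:\S+::)?"
    r"(?P<ts>\d{4}/\d{2}/\d{2}-\d{2}:\d{2}:\d{2}\.\d{3})\s+"
    r"(?P<level>INFO|WARN|ERR)\s+(?P<msg>.*)$"
)


def parse_entries(text):
    """Return a list of (timestamp, level, message) tuples."""
    entries = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        m = ENTRY_START.match(line)
        if m:
            entries.append([m.group("ts"), m.group("level"), m.group("msg")])
        elif entries:
            # Not a new entry: treat it as a wrapped continuation line.
            entries[-1][2] += " " + line
    return [tuple(e) for e in entries]


def problems(entries):
    """Keep only WARN and ERR entries."""
    return [e for e in entries if e[1] in ("WARN", "ERR")]


if __name__ == "__main__":
    import sys

    # Usage: python filter_log.py path-to-cluster.log
    with open(sys.argv[1], errors="replace") as fh:
        for ts, level, msg in problems(parse_entries(fh.read())):
            print(ts, level, msg)
```

Run against the excerpt above, this would surface the two ClNet binding warnings, the "already in use" error with status 5057, and the two FmpHandleResourceTransition warnings.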

    Thanks,
     
    kb1lxm, Oct 23, 2008
    #2

  3. Edwin vMierlo [MVP]

    Seems to me that you have a duplicate IP address somewhere on your network.

    This is what you do:
    1) Take the whole group "FileServer1" offline (this will take its IP address
    offline as well).
    2) Try to ping 10.1.20.74. If it responds, then *some* other machine on
    your network is using the same IP.
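The ping test in step 2 can be paired with a look at the ARP cache to see which MAC answered. A minimal Python sketch, assuming it runs on a Windows machine in the same subnet and that `arp -a` prints rows of address, dash-separated MAC, and type. The function names are hypothetical.

```python
import re
import subprocess

# A row of "arp -a" output: IP, dash-separated MAC, entry type.
ARP_ROW = re.compile(
    r"^\s*(?P<ip>\d{1,3}(?:\.\d{1,3}){3})\s+"
    r"(?P<mac>[0-9A-Fa-f]{2}(?:-[0-9A-Fa-f]{2}){5})\s+\w+",
    re.MULTILINE,
)


def parse_arp_table(text):
    """Return {ip: 'AA:BB:CC:DD:EE:FF'} from the text of 'arp -a'."""
    return {
        m.group("ip"): m.group("mac").upper().replace("-", ":")
        for m in ARP_ROW.finditer(text)
    }


def probe(ip):
    """Ping once, then read the ARP cache (Windows command syntax).

    Returns (ping_succeeded, mac_or_None). The ping exit code alone can be
    misleading, so a MAC in the ARP cache is the stronger sign that some
    machine is holding the address.
    """
    ping = subprocess.run(["ping", "-n", "1", ip], capture_output=True)
    arp = subprocess.run(["arp", "-a"], capture_output=True, text=True)
    return ping.returncode == 0, parse_arp_table(arp.stdout).get(ip)


if __name__ == "__main__":
    # Run this only after the FileServer1 group is offline.
    print(probe("10.1.20.74"))
```

If a MAC comes back while the group is offline, compare it with the address in the 4199 event to identify the machine that is still answering for 10.1.20.74.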

    rgds,
    Edwin



     
    Edwin vMierlo [MVP], Oct 28, 2008
    #3