Disk Cluster Resource stays in online pending state

Discussion in 'Clustering' started by System C Rob, Nov 13, 2006.

  1. System C Rob

    System C Rob Guest

    Hi,

    We have a Win2000 cluster which has a group that works fine on one node, but
    if I try to move the group over to the other node, then the disk resource
    just stays in an online pending state. Other resources (i.e. Server IP and
    Name) come online OK. Any ideas most welcome.

    Thanks ..... Rob
     
    System C Rob, Nov 13, 2006
    #1

  2. Rodney R. Fournier [MVP], Nov 13, 2006
    #2

  3. System C Rob

    System C Rob Guest

    Hardware involved is as follows (same on both nodes):

    HP DL580 G2
    2 x FCA 2101 2GB Fibre Channel HBA
    Dual Connected to MSA1000 Controllers
    OS: Windows 2000 (SP4)

    It is strange, as everything looks OK from the HP management console and cluster
    administrator can read the quorum OK; it just won't fail over the cluster disk in
    this group. I have another group with disk resources in that I have not
    tried failing over yet (It's a live hospital system, so it's kind of scary to
    be honest). I have some downtime in 30 mins (21:00 GMT) when I can try
    failing over again so I may try this.

    Have done some looking up and it may be performing a chkdsk, but it seems to
    be taking too long for it to be this and I was just looking to see if I can
    disable chkdsk for this new failback attempt.

    Thanks .... Rob
     
    System C Rob, Nov 13, 2006
    #3
  4. Rodney R. Fournier [MVP], Nov 13, 2006
    #4
  5. System C Rob

    System C Rob Guest

    No, it is about 3 years old, and we have had successful failovers between
    the two nodes (Active/Active) in the past with failbacks always working. The
    node currently with the problem rebooted last week and we tracked it down to
    a faulty PCI slot, so the system board has been replaced along with the HBA
    that was in that slot, all firmwares (system board and HBA's) are the same
    level as they were before (and the same as the sister node). One difference
    is that the HBAs in the failing node have now got newer drivers as
    (for some reason) the server did not recognise them correctly with the
    original (and admittedly old) drivers, hence I can't back rev them to the
    same as the working node.
     
    System C Rob, Nov 13, 2006
    #5
  6. Rodney R. Fournier [MVP], Nov 13, 2006
    #6
  7. System C Rob

    System C Rob Guest

    Hi,

    Yes, we have a call logged with HP, but they are dragging their heels a
    bit. I will look into the driver issues as you have suggested, thanks for
    your help and quick responses.

    Thanks ..... Rob
     
    System C Rob, Nov 13, 2006
    #7
  8. Wait... did you just say that the HBA was replaced? I bet you need to update
    the LUN mask with the new WWN of the new HBA. That should fix it.


    --
    Russ Kaufmann
    MVP - Windows Server - Clustering
    ClusterHelp.com, a Microsoft Certified Gold Partner
    Web http://www.clusterhelp.com
    Blog http://msmvps.com/clusterhelp
     
    Russ Kaufmann, Nov 13, 2006
    #8
  9. Russ's suggestion is definitely worth looking into if you can confirm for
    us that none of the disk resources fail over and come online. If there is a
    problem with just the one disk and you can schedule a 15-20 min outage, I
    would:

    1. Shut down the working node.

    2. Set the cluster disk driver to 'Demand' and the cluster service to
    'Manual' on the problem node and reboot the server.

    3. When the server comes up, inspect the Disk Management interface to see if
    you can see all the disks as you would expect to see them, especially the
    problem disk. If no, there is your problem.

    4. If yes, verify you can read and write to the problem disk.

    5. If you can, start the cluster disk driver, start the cluster service,
    open cluadmin and see if all resources come online.

    6. If they do not, troubleshoot by looking at the cluster log and system
    log.

    7. If they come online, bring up the second node and test a failover.

    Note: Please do not forget to set the cluster disk driver and cluster
    service back to 'System' and 'Automatic' respectively.
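
Step 2 and the closing note above can be sketched as a small dry-run helper that only builds the `sc config` command lines for review, without running anything. This is a sketch under assumptions that are not in the original post: that the cluster disk driver is registered under the service name `ClusDisk` and the cluster service under `ClusSvc`, and that `sc.exe` is available on the node. The function names are hypothetical. In `sc` terms, 'Demand' and 'Manual' are both the `demand` start type, and 'Automatic' is `auto`.

```python
# Assumed service names; verify them on the actual node before use.
DRIVER = "ClusDisk"   # cluster disk driver (assumption)
SERVICE = "ClusSvc"   # cluster service (assumption)


def sc_config(name, start):
    """Build an `sc config` argument list.

    sc expects the option key ("start=") and its value as separate tokens.
    """
    return ["sc", "config", name, "start=", start]


def isolate_commands():
    """Step 2: driver to 'Demand', cluster service to 'Manual'."""
    return [sc_config(DRIVER, "demand"), sc_config(SERVICE, "demand")]


def restore_commands():
    """Closing note: driver back to 'System', service back to 'Automatic'."""
    return [sc_config(DRIVER, "system"), sc_config(SERVICE, "auto")]


if __name__ == "__main__":
    # Print the commands for inspection only; nothing is executed.
    for cmd in isolate_commands() + restore_commands():
        print(" ".join(cmd))
```

Printing the restore commands alongside the isolate commands keeps the "set it back" step in view, since leaving either start type on manual would stop the node from rejoining the cluster after its next reboot.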

    --
    Chuck Timon, Jr.
    Microsoft Corporation
    Longhorn Readiness Team
    This posting is provided "AS IS" with no
    warranties, and confers no rights.
     
    Chuck Timon [Microsoft], Nov 15, 2006
    #9