Hyper-V cluster guests losing network access and/or hosts getting blue screens :-(

Discussion in 'Clustering' started by Tiago Lock Martins, Mar 30, 2010.

  1. Hello,

    I'm having some issues with my Hyper-V environment.

    My environment:

    Windows Server 2008 R2 Enterprise Edition running on Dell PowerEdge R710
    servers with 32 GB RAM.
    3-node cluster set up with a quorum disk, with both standard cluster disks
    and CSV disks.
    1 NIC dedicated to the host, 1 NIC set to trunk with two VLANs attached to
    virtual switch 2, and 2 NICs in link aggregation attached to virtual switch 1.
    2 Fibre Channel HBAs attached to Brocade switches and Dell EMC CX300
    storage.
    Around 52 LUNs associated with the cluster.
    Some LUNs contain VHDs and others are raw disks.
    All guest NICs are synthetic.

    When I had a 2-node cluster, some guests would just lose network
    connectivity. Even disconnecting the guest NIC from the virtual switch and
    attaching it back doesn't resolve the problem; I need to restart the guest
    to get it back on the network. Sometimes the guests just hang.

    After adding a 3rd node to the cluster, the hosts started restarting
    after blue screens (I got an Overlapped I/O message on one of them).
    Sometimes the host doesn't restart, but it simply restarts the guests
    running on it.

    I've been searching for a week already and haven't found anything that
    really helps.

    Does anyone have any clue?

    If you need more info, please tell me. I'll be happy to help you help me
    :-D

    Thanks for your time
     
    Tiago Lock Martins, Mar 30, 2010
    #1

  2. Hello,

    The managed support service for the Hyper-V newsgroup is now available
    instead on:

    Hyper-V
    http://social.technet.microsoft.com/Forums/en-US/winserverhyperv/threads

    Would you please repost the question in the forum with the Windows Live ID
    used to access your Subscription benefits? Our engineers will assist you in
    the new platform. In the future, please post Windows Server related
    questions directly to the forums. If you have any questions or concerns,
    please feel free to contact us: .

    Regards,
    Mervyn Zhang
    Microsoft Online Community Support

    ==================================================
    This posting is provided "AS IS" with no warranties, and confers no rights.
     
    Mervyn Zhang [MSFT], Mar 31, 2010
    #2

  3. Hi Tiago,

    First of all, did you change the quorum model when you added the 3rd
    node? Please verify this: with 3 nodes, the quorum model should now be
    "Node Majority".
    http://technet.microsoft.com/en-us/library/cc770620(WS.10).aspx#BKMK_RequirementsNandFS
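    For reference, the vote arithmetic behind this advice can be sketched in
    Python. This is a minimal sketch assuming the Windows Server 2008 R2 rules
    as commonly documented (one vote per node, one extra vote for the witness
    disk under Node and Disk Majority, more than half of all votes needed to
    keep quorum); the function names are hypothetical helpers, not part of any
    cluster tool.

```python
# Hypothetical helpers: quorum vote arithmetic for the two Windows Server
# 2008 R2 quorum models discussed in this thread. Each node has one vote;
# under "Node and Disk Majority" the witness disk adds one more vote.


def total_votes(nodes: int, model: str) -> int:
    """Number of quorum votes in the cluster for the given model."""
    if model == "Node Majority":
        return nodes
    if model == "Node and Disk Majority":
        return nodes + 1  # the witness disk counts as one extra vote
    raise ValueError(f"unsupported model: {model}")


def votes_needed(nodes: int, model: str) -> int:
    """More than half of all votes must be online to keep quorum."""
    return total_votes(nodes, model) // 2 + 1


def failures_tolerated(nodes: int, model: str) -> int:
    """How many voters (nodes or the witness disk) can be lost."""
    return total_votes(nodes, model) - votes_needed(nodes, model)


def recommended_model(nodes: int) -> str:
    """Usual 2008 R2 guidance: Node Majority for odd node counts,
    Node and Disk Majority for even node counts."""
    return "Node Majority" if nodes % 2 == 1 else "Node and Disk Majority"


if __name__ == "__main__":
    for n in (2, 3, 4, 5):
        model = recommended_model(n)
        print(f"{n} nodes -> {model}: {total_votes(n, model)} votes, "
              f"{votes_needed(n, model)} needed, "
              f"survives {failures_tolerated(n, model)} failure(s)")
```

    With 3 nodes, Node and Disk Majority (4 votes, 3 needed) tolerates the
    same single failure as Node Majority (3 votes, 2 needed), which is why the
    witness disk adds nothing for an odd node count; with 4 nodes, the witness
    disk raises the tolerance from 1 failure to 2.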

    Can you also please share more details from your event logs and, more
    importantly, the cluster log, which should be generated (cluster log /gen)
    shortly after this happens.

    Regards
    Ramazan
     
    RCan, Mar 31, 2010
    #3
  4. Hello Tiago,

    I am curious to know if the suggestions from RCan helped you resolve this.

    This sounds suspiciously like an issue I had at the University of Ottawa.
    As a band-aid, I would connect to the VM console and disable/re-enable the
    NIC in the guest to reset the IP stack. This was on IBM servers.

    At the time, based on information from IBM, we thought this was due to the
    Broadcom firmware, but the problem was never completely solved, and since I
    don't work there anymore, I still don't know whether it was...

    I realize your scenario is not the same, nor do you have the same hardware.
    I hope the band-aid saves you time, if it restores connectivity faster than
    a reboot...

    Are you monitoring any network statistics? That is where I would start
    looking...
     
    Brad Bird (MVP), Apr 1, 2010
    #4
  5. Hi folks

    This sounds like a problem we're having, and we are using Broadcom NICs.
    We are running Hyper-V on Windows Server 2008 R2 on a Dell R610 with
    Windows Server 2003 SP2 guests. The symptom is that guests lose network
    connectivity randomly (about once a week), and some perform so poorly after
    losing connectivity that they have to be forcibly shut down rather than
    rebooted cleanly.

    From what information I can find, there seems to be a link with Broadcom
    adapters. Some suggest disabling 'IPv4 Large Send Offload' on the physical
    adapters on the host, which we have done; however, servers still fall
    over. Another suggestion was to disable 'IPv4 Large Send Offload' on the
    guest virtual adapters (inside the guest Windows Server 2003 OS), but this
    caused servers to fall over every few hours. The only errors I can find
    before the guests lose network connectivity are "Event ID 5 - The miniport
    'Microsoft Virtual Machine Bus Network Adapter' hung." followed by "Event
    ID 4 - The miniport 'Microsoft Virtual Machine Bus Network Adapter' reset."
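    To check whether these hung/reset pairs line up with the moments guests
    drop off the network, the pattern can be searched for in a guest's System
    log. This is a minimal Python sketch that assumes the entries have already
    been exported into simple (time, event ID, message) records by some other
    means; the LogEntry type, the hung_reset_pairs helper and the 5-minute
    window are hypothetical choices, not part of any Windows tool.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Tuple

# Hypothetical record shape for entries already exported from the guest's
# System event log; how they get exported is outside this sketch.
@dataclass
class LogEntry:
    time: datetime
    event_id: int
    message: str


ADAPTER = "Microsoft Virtual Machine Bus Network Adapter"


def hung_reset_pairs(entries: List[LogEntry],
                     window: timedelta = timedelta(minutes=5)
                     ) -> List[Tuple[LogEntry, LogEntry]]:
    """Find each 'hung' (ID 5) entry for the VMBus adapter that is followed
    by a 'reset' (ID 4) entry for the same adapter within `window`."""
    ordered = sorted(entries, key=lambda e: e.time)
    pairs = []
    for i, hung in enumerate(ordered):
        if hung.event_id != 5 or ADAPTER not in hung.message:
            continue
        for later in ordered[i + 1:]:
            if later.time - hung.time > window:
                break  # entries are sorted, so nothing later can match
            if later.event_id == 4 and ADAPTER in later.message:
                pairs.append((hung, later))
                break
    return pairs


if __name__ == "__main__":
    # Illustrative timestamps only, not real log data.
    t0 = datetime(2010, 1, 1, 9, 0)
    log = [
        LogEntry(t0, 5, f"The miniport '{ADAPTER}' hung."),
        LogEntry(t0 + timedelta(seconds=30), 4, f"The miniport '{ADAPTER}' reset."),
    ]
    for hung, reset in hung_reset_pairs(log):
        print(hung.time, "->", reset.time)
```

    Comparing the timestamps of the returned pairs with the times the outages
    were noticed would show whether every connectivity loss is preceded by a
    miniport hang.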

    We have a call open with Microsoft regarding this issue but we haven't got
    very far.

    Cheers

    Andy
     
    AndyS, Apr 12, 2010
    #5
  6. Thanks for the reply, RCan.

    Yes, I changed the quorum to Node Majority.

    In fact, it is now set to Node and Disk Majority, as I have 4 nodes now
    :) ... on my way to a 5th :-D

    I'll check the cluster log /gen results as soon as it happens again.

    I'll keep the group informed.
     
    Tiago Lock Martins, Apr 20, 2010
    #6
  7. Hi Brad,

    I'll check the results of cluster log /gen when the issue happens again.

    Unfortunately, disabling/enabling the guest NIC doesn't restore
    connectivity :-(

    I'd be glad to hear other thoughts.
     
    Tiago Lock Martins, Apr 20, 2010
    #7
  8. Hi AndyS,

    Quote: "The only errors I can find before the ..."
    Same thing on my side :-(

    Does anyone know why this happens (or what may cause this behavior)?

    Thanks for your time so far, folks.
     
    Tiago Lock Martins, Apr 20, 2010
    #8
  9. Hi Tiago,

    Let me know the results of your cluster logs. This is mainly related to
    network communication issues between nodes. Cross-check your network
    configuration against Microsoft's best practices.

    Regards
    Ramazan
     
    RCan, Apr 20, 2010
    #9
  10. Hello,

    The managed support service for the Windows Server Clustering newsgroup is
    now available instead on:

    Clustering
    http://social.technet.microsoft.com/Forums/en-US/winserverClustering/threads

    Would you please repost the question in the forum with the Windows Live ID
    used to access your Subscription benefits? Our engineers will assist you in
    the new platform. In the future, please post Windows Server related
    questions directly to the forums. If you have any questions or concerns,
    please feel free to contact us: .

    Regards,
    Mervyn Zhang
    Microsoft Online Community Support

    ==================================================
    This posting is provided "AS IS" with no warranties, and confers no rights.
     
    Mervyn Zhang [MSFT], May 3, 2010
    #10