Entire Network stops responding!

Discussion in 'Server Networking' started by Mike Bailey, Aug 29, 2006.

  1. Mike Bailey

    Mike Bailey Guest

    My basic network is three MS Server 2003 machines: one that is the PDC, DNS,
    and DHCP server and also serves our accounting app, one file and IIS server,
    and one FTP server. I've got 2 Netgear 10/100/1000 smart switches and 1 HP
    10/100 switch. I've got about 50 XP workstations.

    What is happening is that all of a sudden, every PC on the network,
    including the main server, stops responding - seems to hang. They don't
    respond to anything - even trying to Ctrl-Alt-Del or trying to open
    other icons. I've even had all icons on desktops disappear then
    eventually come back.

    My main server responds this way also. I can click on program icons
    like Event Viewer, TS Manager, Active Directory..., and they don't respond.

    Then after a little while, maybe 5 mins, things start responding again -
    even workstations.

    When I check my event viewer, I don't see anything logged at all and
    there are no other indications of any problems.

    The only thing that I can think of that has changed recently is that I
    added one of the Netgear 1000 Mbps switches listed above. I'm using it as
    the "backbone", with all of my servers, other switches, and firewall
    connected to it.

    I have no idea how to begin to figure out what is causing this and need
    some help and suggestions.

    Thanks,
    Mike Bailey
     
    Mike Bailey, Aug 29, 2006
    #1

  2. Mike Bailey

    mike Guest

    Mike,

    Just throwing something into the mix: do you have any power cleaning / UPS
    devices protecting your servers / switches? If not, it could be a power
    problem.

    Mike.
     
    mike, Aug 29, 2006
    #2

  3. Mike,

    Check the resource usage on your DC. Everything "windows" is losing
    connectivity, but I doubt that your entire network is losing connectivity.
    I'm willing to bet that your DC becomes unavailable for a short period of
    time, and because it's the only DC on the network, nothing can authenticate
    or perform DNS queries etc...

    I've actually seen this same issue before. A client of mine on a W2K network
    had a similar problem. I resolved it by reducing the load on the DC. In
    their particular case, they were also using the DC as a router, and this was
    causing too much overhead for the DC to perform its Active Directory duties
    in a timely manner.

    Another factor that could be causing this is also related to excess network
    card usage on your DC (in other words, it may not be related to excess CPU
    or RAM usage). If your NIC is pumping out too much traffic, and your network
    card isn't optimized for this (if it even can be), then clients will not be
    able to contact the DC.

    The common theme here, is that your clients might be losing contact with the
    DC.

    Hope this helps!
    Oliver
     
    Oliver O'Boyle, Aug 30, 2006
    #3
  4. Mike Bailey

    Mike Bailey Guest

    Thanks Oliver. These do sound like likely causes. I have in the past
    seen a few errors for Active Directory, DNS, and such - but not
    recently with this issue.

    When you say that you reduced the load on the DC, what did you do? This
    sounds very likely also since that one server is doing everything: PDC,
    AD, DHCP, DNS... Could I solve this by using my second 2003 server as
    a secondary DC?

    As for the NIC possibility, how would I know if this was the case? And
    if it was, does this mean just getting a higher end NIC, or can I do
    something like run two NICS? This server actually does have two nics,
    but only 1 is being used currently. I think the original card was a
    10/100 and I added the currently used 10/100/1000.
     
    Mike Bailey, Aug 30, 2006
    #4
  5. "When you say that you reduced the load on the DC, what did you do?"

    I reduced the load by cutting the number of duties that server was
    responsible for, e.g. routing, DHCP, DNS, etc., though in a smaller network
    DHCP and DNS don't usually take up that many resources. But if they become
    unavailable, they can wreak havoc.

    As a general rule, you should try and have two DC's on your network anyway.
    So, if you can put a DC on another server (keep it off ISA servers and
    Exchange servers) then I would.

    That being said, you could also offload DHCP and DNS to the secondary server
    as well.

    Check your NIC load by going into Task Manager and looking at the Networking
    tab. If your NIC is pumping out 80% or more traffic (on a 100 Mbps link),
    it's possible that you are starving your clients of AD and DNS resources,
    etc. While you are there, check the Performance tab as well on the server
    the next time your network clients lose connectivity (if you can).

    Most NICs these days are fine. If you are operating at 1 Gbps, make sure your
    NIC and switch auto-negotiated with each other properly. Look for Duplex
    (should be Full) and Speed (1000 Mbps) settings.
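
    As a rough illustration of the 80% figure above, here is a minimal Python
    sketch that turns two byte-counter readings into a link utilization
    fraction. The function names and the sample counter values are hypothetical;
    on a real server the byte counts would come from Task Manager or a
    performance counter, not from this script.

```python
def link_utilization(bytes_start, bytes_end, seconds, link_mbps):
    """Return the fraction of link capacity used over the sampling interval."""
    if seconds <= 0 or link_mbps <= 0:
        raise ValueError("seconds and link_mbps must be positive")
    bits_sent = (bytes_end - bytes_start) * 8
    capacity_bits = link_mbps * 1_000_000 * seconds
    return bits_sent / capacity_bits


def is_saturated(utilization, threshold=0.80):
    """True when utilization meets or exceeds the threshold (80% by default)."""
    return utilization >= threshold


# Hypothetical counter readings taken 10 seconds apart on a 100 Mbps link.
sample = link_utilization(bytes_start=0, bytes_end=105_000_000,
                          seconds=10, link_mbps=100)
print(f"{sample:.0%} utilized, saturated: {is_saturated(sample)}")
# prints "84% utilized, saturated: True"
```

    The same readings on a 1000 Mbps link would come out at roughly 8%, which
    is why the negotiated speed is worth checking first.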
     
    Oliver O'Boyle, Aug 30, 2006
    #5
  6. That is not a heavy load. Both my DCs (which are identical) do AD, DHCP,
    DNS, and WINS. My DCs are only running on single 10/100 NICs, P2-350s with
    256 meg of RAM, and they do fine on a system of about 100 machines.
    No, two NICs would create a disaster.

    You can't always blame software!
    Looking over the posts it seems to me you might have a failing switch. It
    would be a switch that was located in such a way that all traffic would stop
    if it failed.
    Next time it happens, yank the power on the suspected switch and plug it
    back in (effectively reboot it) and see if it picks up again. The LEDs on
    the switch may or may not help; they may blink when the port receives
    traffic, but that doesn't mean the traffic is getting over the switch's
    backplane and through the processing between the ports.

    You could also have a failing cable, or EM interference with a cable,
    depending on what the cable comes close to along the way. If it passes a
    strong EM source that switches on and off, connectivity fails while the
    source is on and resumes when it goes off. Also, bad cables, pinched/kinked
    cables, or failing cables will be more sensitive to EM than a good cable
    would be.
     
    Phillip Windell, Aug 30, 2006
    #6
  7. Mike Bailey

    Mike Bailey Guest

    Why do you say that "two nics would create a disaster?" What about load
    balancing, or Teaming which is available on my Proliant server? I don't
    know anything about these two things, but do know they require two nics
    - and it *sounds* like it would be something that helps.

    But, reading your reply above, makes me suspicious about one of my
    switches. It's new and was actually put in shortly before this seemed to
    happen. After I installed it, I moved all of my servers, and firewall
    onto it. I've just moved them back off to the older switch. Let's see
    what happens.

    Thanks,
    Mike
     
    Mike Bailey, Aug 31, 2006
    #7
  8. Teaming is fine if it was a bandwidth issue, but it is not, and there is
    no way it would be in this case. You could run a single *old* 10 Mbps NIC
    and it would handle this amount of traffic just fine. Traffic to a DC in a
    LAN with 50 workstations is not even noticeable, let alone able to overload
    anything. I'm running twice as many machines as you.
    Maybe if you had a multi-segment LAN with 3000 workstations.

    Not seeing the DC isn't going to make all the machines act like they are
    locked up either; that sounds more like a virus now that I think about it.
    Perhaps like the W32.Spybot that just recently hit in our area. It could be
    virus related, which, if thoroughly spread to every machine, could affect
    them all at the same time.

    That is about all the guesswork I can do. There is no way I can know for
    sure what it is; I can only toss out ideas. Next time it happens, unplug the
    network cable on one affected machine. What happens with that machine then?
     
    Phillip Windell, Sep 1, 2006
    #8
  9. Mike Bailey

    Mike Bailey Guest

    Well, I just had another "episode" where my main server and many
    workstations "hung". Still no idea what may have caused this. Here is
    something maybe related - or not. The only thing that I could connect
    to this happening previously was that I had put in a new Netgear Smart
    switch and moved all of my servers onto this switch. It of course was
    connected to another switch, which connected to another to which most
    of the workstations are connected. After reading some of the responses
    here, I totally disconnected this switch and basically "put things back
    like they were." I returned the switch and received a new replacement.
    Here is what may be related - I just put the new smart switch (Netgear
    GS748T) in this morning. But, this time, it is only connected to the
    other switch - didn't move anything or plug anything else into it. And
    again today, at about 5pm-ish, everything hung for about 5+ mins, then
    returned to normal. This seemed to have stopped after removing the
    switch before.

    I also went ahead (once everything was responding again) and made my
    other server a backup domain controller. I don't really see what this
    new switch could have been doing since nothing was plugged into it - it
    was just plugged into another switch.

    I'm totally lost here as to what to do or what I can do to track down
    what is causing this - is there some type of monitoring software or
    something? I do check Event Viewer - nothing, and do have a program
    called SecurePoint network analyzer that I don't really understand, but
    I can see in some places that it's not reporting errors, nor is the new
    switch, the HP switch, or the other switch - no errors reported on the
    ports? I've also done a whole network virus scan - nothing found.
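
    One low-tech way to get timestamps for these episodes is a small probe
    script left running on a workstation. The Python sketch below is only an
    illustration, not a replacement for a real monitoring product: it times a
    TCP connection to each target and logs whether it was ok, slow, or down.
    The function names, the target names, and the 1-second "slow" threshold
    are all assumptions made for the example.

```python
import socket
import time
from datetime import datetime


def tcp_probe(host, port, timeout=3.0):
    """Return seconds taken to open a TCP connection, or None on failure."""
    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return time.monotonic() - start
    except OSError:
        # Covers refused connections, unreachable hosts, and timeouts.
        return None


def classify(latency, slow_after=1.0):
    """Label one probe result as 'ok', 'slow', or 'down'."""
    if latency is None:
        return "down"
    return "slow" if latency > slow_after else "ok"


def watch(targets, rounds, interval=30.0, probe=tcp_probe, log=print):
    """Probe every target once per round and log a timestamped status line."""
    for i in range(rounds):
        for name, (host, port) in targets.items():
            status = classify(probe(host, port))
            stamp = datetime.now().isoformat(timespec="seconds")
            log(f"{stamp} {name} {status}")
        if i < rounds - 1:
            time.sleep(interval)
```

    With hypothetical addresses, a call such as
    watch({"dc-dns": ("192.168.1.10", 53)}, rounds=120) would probe the DC's
    DNS port every 30 seconds for about an hour. If the "slow" and "down"
    lines cluster at the same time each day, that supports the idea that
    something scheduled is behind it.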

    Mike
     
    Mike Bailey, Sep 11, 2006
    #9
  10. Well Mike, I'm thinking now that this has nothing to do with networking. I
    think there is something running on a schedule that is jamming everything up
    for a period of time until it finishes. I have no idea what it would be. I
    have no idea how it would be tracked down. Heck, it may even be some kind of
    virus doing a DoS attack.

    I don't know what else to tell you.
     
    Phillip Windell, Sep 12, 2006
    #10
  11. Hi Mike,

    Can you provide some sort of topology diagram for us? I would like to see
    how everything is arranged now, and also, a diagram of how it was arranged
    before. A third diagram showing how it was arranged when things were working
    would also be helpful.

    Thanks,
    Oliver
     
    Oliver O'Boyle, Sep 12, 2006
    #11
  12. Mike Bailey

    Mike Bailey Guest

    I've made a diagram of my network that you can view at
    http://www.trewaxdirect.com/network.jpg

    The only thing that is different in this diagram (the before and after
    you requested) is the new Netgear switch. Keep in mind though that
    currently nothing is plugged into it - it is only plugged into the other
    switch. A note about the "file server with two nics" shown. This is a
    Dell PowerVault and that is the way it was built. I don't think there
    is much traffic on the 2nd nic though.

    Last night we had another "outage" around 5:30 again. I agree, this has
    got to be something that is scheduled. Interestingly though, I had
    promoted one of my other servers to a Backup Domain Controller the other
    day. When this was occurring (first brought to my attention because
    users were complaining that their workstations were hanging), I checked
    both my PDC and BDC. Both were not responding either - at least not to
    everything. I noticed that on both, apps like Event Viewer, DNS, DHCP,
    Services, Computer Mgt would not open right up when clicked on. But,
    Active Directory, Task Mgr, and IE would open right away. Several mins
    after clicking on the other icons, they would suddenly all open at once.

    Still clueless...

    Mike
     
    Mike Bailey, Sep 14, 2006
    #12
  13. http://www.trewaxdirect.com/network.jpg

    Get the second NIC out of the file server, or set it to DHCP, then disable
    and unplug it. You can't do that; you can't run two NICs like that on the
    same subnet.

    That probably isn't your main problem,...but you still can't run it like
    that anyway.
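
    To check whether two NICs really do sit on the same subnet, compare the
    network each address/mask pair belongs to. Here is a minimal Python sketch
    using the standard ipaddress module; the helper name and the addresses are
    hypothetical, since the actual addresses on the file server are not given
    in this thread.

```python
import ipaddress


def same_subnet(addr_a, addr_b):
    """True when two interface addresses (in CIDR form) share one network."""
    net_a = ipaddress.ip_interface(addr_a).network
    net_b = ipaddress.ip_interface(addr_b).network
    return net_a == net_b


# Hypothetical addresses for the file server's two NICs.
print(same_subnet("192.168.1.20/24", "192.168.1.21/24"))  # True
print(same_subnet("192.168.1.20/24", "192.168.2.20/24"))  # False
```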
     
    Phillip Windell, Sep 15, 2006
    #13
  14. Mike Bailey

    Mike Bailey Guest

    Like I had said in my email, this is a Dell PowerVault storage device
    which came configured like this - to use two nics. I just checked and
    it is doing DHCP, but also appears to be running Intel Network Teaming.
     
    Mike Bailey, Sep 18, 2006
    #14
  15. If it is running Teaming then that is fine, but if not, then disable one of
    them.
     
    Phillip Windell, Sep 19, 2006
    #15