Discussion in 'Server Security' started by Mr Durand, Feb 22, 2006.

  1. Mr Durand

    Mr Durand Guest

    I have a Dell PowerEdge 2550 server running Windows Server 2003 SP1, and I
    see strange behavior every 7-14 days. The server always responds to ping;
    however, when the problems start, services become choppy in their responses
    and eventually stop responding altogether. Once it gets into this state, I
    cannot access the system with RDP, and the system console loses the
    Ctrl-Alt-Del screen that was displayed before the trouble started.

    The server runs the following apps/services

    IIS 6.0
    Windows SharePoint Services
    MSDE for SharePoint
    XWall 3.34

    The server is protected by firewall appliances, and the event logs show no
    signs of trouble on the system at any time. A straight 2-hour run of the
    Dell hardware diagnostics found no problems on the system whatsoever.

    Any idea what direction I can take to resolve the issue? I'm convinced it
    is software, maybe even specifically the lsass engine, but I can't confirm
    that without any event entries to go on. This doesn't happen frequently,
    though it does happen consistently. The server has 3GB of RAM, and every
    time I look in on it, more than half of the physical RAM is available and
    paging is normal.

    I'm stumped. I'll call PSS if I have to, but I can't force the failure to
    occur, which makes it more difficult to troubleshoot.

    Phoenix, Arizona
    Mr Durand, Feb 22, 2006

  2. Does a reboot solve the problem? If so, it sounds like you may have a
    memory leak in some process. I have not tried them myself, but there are
    Resource Kit tools called memmonitor and memtriage that may help
    troubleshoot this, as shown in the link below. As soon as you notice
    performance dropping, check Task Manager for memory/CPU usage. It would also
    be a good idea to use something like Process Explorer from SysInternals to
    save a baseline of memory and CPU usage for your processes while everything
    is running well. --- Steve

    Steven L Umbach, Feb 23, 2006
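Steve's advice to save a baseline while the server is healthy only pays off if you compare it against a snapshot taken when the slowdown starts. Below is a minimal Python sketch, assuming both snapshots have been written down or exported as process name -> memory in MB; the function name `growth_report`, its thresholds, and the numbers in the demo are illustrative and not part of Process Explorer or any Resource Kit tool. A process is flagged only if it grew by both an absolute amount and a ratio, so small processes wobbling by a few MB do not hide a real leak.

```python
def growth_report(baseline, current, min_growth_mb=50.0, min_ratio=1.5):
    """Return (name, before_mb, after_mb) for processes that grew notably.

    baseline/current: dicts of process name -> memory in MB (hypothetical
    snapshots). A process is flagged when it grew by at least min_growth_mb
    AND to at least min_ratio times its baseline. Processes absent from the
    baseline are flagged if they exceed min_growth_mb.
    """
    flagged = []
    for name, after in current.items():
        before = baseline.get(name, 0.0)
        grew = after - before
        if grew >= min_growth_mb and (before == 0.0 or after >= before * min_ratio):
            flagged.append((name, before, after))
    # Largest absolute growth first, so the likeliest culprit is on top.
    flagged.sort(key=lambda row: row[2] - row[1], reverse=True)
    return flagged


if __name__ == "__main__":
    # Made-up numbers, for illustration only.
    healthy = {"w3wp.exe": 120.0, "sqlservr.exe": 300.0, "lsass.exe": 40.0}
    slow = {"w3wp.exe": 130.0, "sqlservr.exe": 900.0, "lsass.exe": 45.0}
    for name, before, after in growth_report(healthy, slow):
        print(f"{name}: {before:.0f} MB -> {after:.0f} MB")
```

The two thresholds are a judgment call; tighten them if the baseline was taken shortly before the problem window.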

  3. Mr Durand

    Mr Durand Guest

    The only problem is that it runs along fine and then goes out the window
    within a minute or two, which makes it difficult to see what state the
    system is in at the moment the problem starts.

    Mr Durand, Feb 23, 2006
  4. Does a reboot fix the problem? Maybe enabling Performance Monitor logging
    for memory and CPU can give you an idea of what is going on. It could also
    be a hardware problem such as a flaky power supply or an overheating CPU,
    but those problems usually don't occur on such a regular schedule. --- Steve
    Steven L Umbach, Feb 23, 2006
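When the server hangs, a counter log simply stops, so the last samples it wrote are the interesting ones. This Python sketch assumes the log was exported to CSV in the layout `typeperf` writes (a header row whose first cell labels the timestamp column and whose other cells are counter paths, then one timestamped row per sample); that layout, the function name `summarize_counter_log`, and the sample values are assumptions for illustration. It reports each counter's peak and its final few values, skipping empty cells.

```python
import csv
import io


def summarize_counter_log(csv_text, tail=3):
    """Summarize a counter log exported as CSV (assumed typeperf-style).

    Returns {counter_path: {"peak": float or None, "last": [floats]}},
    where "last" holds the final `tail` samples written before the log
    ended (for example, just before the server stopped responding).
    """
    rows = list(csv.reader(io.StringIO(csv_text)))
    header, data = rows[0], rows[1:]
    summary = {}
    for col, counter in enumerate(header[1:], start=1):
        series = []
        for row in data:
            cell = row[col].strip() if col < len(row) else ""
            if cell:  # empty cell = missing sample
                series.append(float(cell))
        summary[counter] = {
            "peak": max(series) if series else None,
            "last": series[-tail:],
        }
    return summary


if __name__ == "__main__":
    # Made-up sample: CPU climbs to 100% just before logging stops.
    sample = r'''"(PDH-CSV 4.0)","\Processor(_Total)\% Processor Time","\Memory\Available MBytes"
"02/23/2006 10:00:00.000","4.2","1650"
"02/23/2006 10:00:15.000","97.5","1640"
"02/23/2006 10:00:30.000","100","1638"
'''
    for counter, info in summarize_counter_log(sample).items():
        print(counter, "peak:", info["peak"], "last:", info["last"])
```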
  5. Mr Durand

    Mr Durand Guest

    Yes, it fixes the issue temporarily. The last time there were 12 days
    between failures; now it seems to be failing again a mere 7 hours after my
    last reboot.

    Mr Durand, Feb 23, 2006
  6. What happens if you take XWall out of the mix?
    (Why XWall at all? Why 3.34 instead of 3.36?)
    Roger Abell [MVP], Feb 23, 2006
  7. Mr Durand

    Mr Durand Guest

    XWall is my mail relay; it handles all spam filtering far better than
    Exchange does. I've been running XWall for 3 years on this system and have
    never had a problem. There has been no abnormal SMTP traffic or behavior,
    and version 3.34 ran fine for just over 6 months before this problem
    started. I keep pretty good change logs, and no changes were made to the
    system that correlate with this behavior, nor does it follow the pattern of
    the normal updates distributed from my WSUS server.

    Mr Durand, Feb 23, 2006
  8. Makes sense. I was wondering why you listed XWall but not Exchange.
    So this box, so to speak, straddles the wall to the outside, at least for
    web and mail? I am in much the same boat as Steve: I would try to get a
    glimpse with perf metrics and check hardware health via OpenManage
    diagnostics and the baseboard logs (I know you mentioned a 2-hour diag
    session, but you also said that back then the problem was less frequent).
    Roger Abell [MVP], Feb 24, 2006
  9. If nothing shows in the logs, I would enable some Performance Logging for
    CPU and memory. It could show whether use of either increased dramatically
    before the failure, and it can post alerts in the system or application log
    when thresholds you define are crossed. Otherwise it could be hardware
    related; a failing power supply, CPU overheating, or flaky memory would be
    the top suspects. The link below explains Performance Monitoring basics. In
    cases where I have had similar software-related problems, a fresh install
    of the operating system often fixed them for me. --- Steve

    Steven L Umbach, Feb 24, 2006
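Threshold alerts are only useful if a single spike does not trigger them: a CPU sample at 100% is normal, CPU pinned for minutes is not. The Python sketch below shows just that "sustained for N samples" logic, assuming the samples are already available as time-ordered (timestamp, value) pairs. It is not how Performance Logs and Alerts evaluates its own alerts, and the function name `sustained_breaches` and the demo values are hypothetical.

```python
def sustained_breaches(samples, threshold, min_consecutive, above=True):
    """Find runs where a counter stays past a threshold.

    samples: list of (timestamp, value) pairs in time order.
    above=True flags values > threshold (e.g. % Processor Time);
    above=False flags values < threshold (e.g. Available MBytes).
    Returns (start_ts, end_ts, run_length) for every run of at least
    min_consecutive breaching samples.
    """
    runs = []
    start = last_ts = None
    count = 0
    for ts, value in samples:
        breached = value > threshold if above else value < threshold
        if breached:
            if count == 0:
                start = ts
            count += 1
            last_ts = ts
        else:
            if count >= min_consecutive:
                runs.append((start, last_ts, count))
            count = 0
    # A run still open when the log ends (e.g. the server hung) counts too.
    if count >= min_consecutive:
        runs.append((start, last_ts, count))
    return runs


if __name__ == "__main__":
    cpu = [(0, 10.0), (1, 95.0), (2, 97.0), (3, 20.0), (4, 96.0), (5, 98.0), (6, 99.0)]
    print(sustained_breaches(cpu, threshold=90.0, min_consecutive=3))  # [(4, 6, 3)]
```

The short two-sample spike at timestamps 1-2 is ignored, while the run that is still open when the log ends is reported.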
  10. Mr Durand

    Mr Durand Guest

    Not going to speak too soon, but I moved the configuration and content
    databases for my SharePoint sites over to a dedicated SQL server and
    reconfigured SharePoint in server farm mode so I could remove the local
    instance of MSDE, which may have been using far too many resources.
    Are there known issues with MSDE memory leaks?
    Mr Durand, Mar 1, 2006
  11. I am not aware of any documented memory leaks, speaking
    of MSDE 2K. On the other hand, MSDE is the same core
    bits as shipped in SQL, and can be used to run external code,
    so it is possible something in the total process space is leaking.

    Thanks for posting.
    Roger Abell [MVP], Mar 2, 2006
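Following Roger's point that something in the process space could still be leaking, one way to check is to log a counter such as `\Process(sqlservr)\Private Bytes` for a few days and look at the trend rather than at any single reading. The sketch below is a plain least-squares slope; the function name `leak_slope` and the sample numbers are illustrative. A positive slope is only a hint, since caches such as SQL Server's buffer pool legitimately grow until they reach their configured limit.

```python
def leak_slope(samples):
    """Least-squares slope of memory use over time.

    samples: list of (hours_since_start, private_mb) pairs.
    Returns the fitted growth rate in MB per hour.
    """
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_t = sum(t for t, _ in samples) / n
    mean_m = sum(m for _, m in samples) / n
    cov = sum((t - mean_t) * (m - mean_m) for t, m in samples)
    var = sum((t - mean_t) ** 2 for t, _ in samples)
    if var == 0:
        raise ValueError("timestamps must not all be equal")
    return cov / var


if __name__ == "__main__":
    # Made-up readings: private bytes creeping up 10 MB every hour.
    readings = [(0, 100.0), (1, 110.0), (2, 120.0), (3, 130.0)]
    print(f"{leak_slope(readings):.1f} MB/hour")
```

A slope that stays positive over several days without flattening out is the pattern worth chasing.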
  12. Mr Durand

    Mr Durand Guest

    Removing MSDE did reduce system resource usage; however, I finally found
    the real cause.

    I have a process that runs via a scheduled task at the top of every hour.
    It runs a compiled Actuate report against an ODBC data source and generates
    dynamic HTML pages from the data retrieved by the query. If for some reason
    the data source is down, the process keeps running but sits idle without
    using any system resources. The big problem was that the task could not
    start at the top of the next hour because the process was already running
    (or so it thought). Well, like a bonehead, I set up another scheduled task
    at 5 minutes before the top of the hour that ran a batch file with the
    KILL -f command to kill the process by name. This appeared to work fine for
    many months; however, it seems that since SP1 was installed there are times
    when this command simply hangs and uses maximum CPU while it hangs.

    A couple of days ago someone complained that my web site was down. I
    immediately got console access, which was slow, but once in I saw two
    instances of KILL.EXE fighting over the CPU. As soon as I killed them, all
    services were responsive again. I removed that task and configured the
    other task to stop itself if it is still running after 45 minutes, since it
    should finish in less than 10. The kill command can be a savior at times,
    but this time it really killed me.

    Thanks for your help.

    Mr Durand
    Mr Durand, Mar 4, 2006
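The fix described above, having the hourly job stop itself instead of running a separate task that kills it by name, can be sketched in Python with `subprocess.run(..., timeout=...)`: when the timeout expires, it kills the child process it started, waits for it, and raises `TimeoutExpired`. The wrapper name `run_with_timeout` and the demo command are hypothetical. Note that on Windows the kill reaches only the direct child, so if the job is launched through a batch file that starts another program, that grandchild is not terminated. In Task Scheduler itself, the equivalent is the task's "Stop the task if it runs for" setting, which matches what the thread describes.

```python
import subprocess
import sys


def run_with_timeout(cmd, timeout_s):
    """Run cmd; kill it if it runs longer than timeout_s seconds.

    Returns ("ok", returncode) or ("timed_out", None). Only the process
    this function started is killed, unlike a kill-by-name that can hit
    any process with a matching name.
    """
    try:
        result = subprocess.run(cmd, timeout=timeout_s)
    except subprocess.TimeoutExpired:
        # subprocess.run has already killed and reaped the child here.
        return ("timed_out", None)
    return ("ok", result.returncode)


if __name__ == "__main__":
    # Demo: a child that sleeps 5 s but is only given 1 s. For the real
    # hourly job, cmd would be the report command and timeout_s 45 * 60.
    demo = [sys.executable, "-c", "import time; time.sleep(5)"]
    print(run_with_timeout(demo, timeout_s=1))
```

Because the job can never outlive its own 45-minute limit, the next hourly run never finds a stale instance still holding on.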