Domain Controller Stops Processing All Login Requests Randomly

Discussion in 'DNS Server' started by Josh-UCDHSC, Apr 29, 2005.

  1. Josh-UCDHSC

    Josh-UCDHSC Guest

    I have a Windows 2003 server running as a Domain Controller in a lab
    setting. Randomly the server will stop responding to workstation client
    login requests in the lab and at the server's terminal as well. Eventually
    users in the lab will be able to log in, however no GPOs are applied and
    their mapped network drive is absent. At the server itself, if I log in as
    Admin it takes about 7 to 10 minutes and I then have to reboot the server.
    After a reboot, the problem disappears (usually for anywhere from two to
    seven days). Before rebooting I run dcdiag, whose output I have added
    below (incidentally, dcdiag also takes forever to run when producing this
    output). When I run dcdiag while the system is healthy, there are no
    errors.

    I am/was working with Microsoft Support on the issue and we thought it was
    fixed last week. Unfortunately the problem reoccurred today.

    If anyone has any ideas on the issue please help!

    If I am posting to the wrong group kindly redirect.

    Thanks,

    JC


    Domain Controller Diagnosis

    Performing initial setup:
    Done gathering initial info.

    Doing initial required tests

    Testing server: Default-First-Site-Name\WAIMEA
    Starting test: Connectivity
    ......................... WAIMEA passed test Connectivity

    Doing primary tests

    Testing server: Default-First-Site-Name\WAIMEA
    Starting test: Replications
    ......................... WAIMEA passed test Replications
    Starting test: NCSecDesc
    ......................... WAIMEA passed test NCSecDesc
    Starting test: NetLogons
    [WAIMEA] An net use or LsaPolicy operation failed with error 1203,
    No network provider accepted the given network path..
    ......................... WAIMEA failed test NetLogons
    Starting test: Advertising
    ......................... WAIMEA passed test Advertising
    Starting test: KnowsOfRoleHolders
    ......................... WAIMEA passed test KnowsOfRoleHolders
    Starting test: RidManager
    ......................... WAIMEA passed test RidManager
    Starting test: MachineAccount
    Could not open pipe with [WAIMEA]:failed with 64: The specified
    network name is no longer available.
    Could not get NetBIOSDomainName
    Failed can not test for HOST SPN
    Failed can not test for HOST SPN
    * Missing SPN :(null)
    * Missing SPN :(null)
    ......................... WAIMEA failed test MachineAccount
    Starting test: Services
    Could not open Remote ipc to [WAIMEA]:failed with 64: The specified
    network name is no longer available.
    ......................... WAIMEA failed test Services
    Starting test: ObjectsReplicated
    ......................... WAIMEA passed test ObjectsReplicated
    Starting test: frssysvol
    [WAIMEA] An net use or LsaPolicy operation failed with error 64,
    The specified network name is no longer available..
    ......................... WAIMEA failed test frssysvol
    Starting test: frsevent
    ......................... WAIMEA failed test frsevent
    Starting test: kccevent
    Failed to enumerate event log records, error The specified network
    name is no longer available.
    ......................... WAIMEA failed test kccevent
    Starting test: systemlog
    Failed to enumerate event log records, error The specified network
    name is no longer available.
    ......................... WAIMEA failed test systemlog
    Starting test: VerifyReferences
    ......................... WAIMEA passed test VerifyReferences

    Running partition tests on : ForestDnsZones
    Starting test: CrossRefValidation
    ......................... ForestDnsZones passed test
    CrossRefValidation
    Starting test: CheckSDRefDom
    ......................... ForestDnsZones passed test CheckSDRefDom

    Running partition tests on : DomainDnsZones
    Starting test: CrossRefValidation
    ......................... DomainDnsZones passed test
    CrossRefValidation
    Starting test: CheckSDRefDom
    ......................... DomainDnsZones passed test CheckSDRefDom

    Running partition tests on : Schema
    Starting test: CrossRefValidation
    ......................... Schema passed test CrossRefValidation
    Starting test: CheckSDRefDom
    ......................... Schema passed test CheckSDRefDom

    Running partition tests on : Configuration
    Starting test: CrossRefValidation
    ......................... Configuration passed test
    CrossRefValidation
    Starting test: CheckSDRefDom
    ......................... Configuration passed test CheckSDRefDom

    Running partition tests on : coe
    Starting test: CrossRefValidation
    ......................... coe passed test CrossRefValidation
    Starting test: CheckSDRefDom
    ......................... coe passed test CheckSDRefDom

    Running enterprise tests on : coe.cudenver.edu
    Starting test: Intersite
    ......................... coe.cudenver.edu passed test Intersite
    Starting test: FsmoCheck
    ......................... coe.cudenver.edu passed test FsmoCheck
     
    Josh-UCDHSC, Apr 29, 2005
    #1

  2. Herb Martin

    Herb Martin Guest

    Usually this type of thing is due to MULTIPLE DNS servers
    from different "sets" (e.g., ISP and Internal, or lab vs. production)
    being listed on either the DC or the Client NICs.

    DNS clients expect that EVERY DNS server listed will return the
    SAME answers.

    This jibes with the scenario above -- after reboot the client
    (or the DC) accidentally picks the other (correct or incorrect)
    DNS server.
    Check this stuff:


    --
    DNS for AD
    1) Dynamic for the zone supporting AD
    2) All internal DNS clients NIC\IP properties must specify SOLELY
    that internal, dynamic DNS server (set.)
    3) DCs and even DNS servers are DNS clients too -- see #2
    4) If you have more than one Domain, every DNS server must
    be able to resolve ALL domains (either directly or indirectly)
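
    Rule 2 above can be sketched as a simple check. This is a minimal
    illustration in Python, not a tool mentioned in this thread: the function
    names, host names, and addresses are all hypothetical, and it assumes the
    DNS server list of each machine has already been collected by some other
    means (for example, copied from ipconfig /all).

```python
# Hypothetical internal, dynamic DNS server set (e.g., the DCs themselves).
INTERNAL_DNS = ["10.0.0.10", "10.0.0.11"]

# Hypothetical per-machine DNS client settings, gathered by hand or by script.
MACHINES = {
    "dc1": ["10.0.0.10", "10.0.0.11"],
    "dc2": ["10.0.0.11", "10.0.0.10"],
    "lab-pc-01": ["10.0.0.10", "10.0.0.11", "192.0.2.53"],  # also lists an outside server
}


def foreign_dns_servers(configured, internal_set):
    """Return the configured DNS servers that are NOT in the internal set."""
    allowed = set(internal_set)
    return [ip for ip in configured if ip not in allowed]


def misconfigured_machines(machines, internal_set):
    """Map each machine that lists an outside DNS server to the offending entries."""
    report = {}
    for name, configured in machines.items():
        foreign = foreign_dns_servers(configured, internal_set)
        if foreign:
            report[name] = foreign
    return report


if __name__ == "__main__":
    for name, foreign in misconfigured_machines(MACHINES, INTERNAL_DNS).items():
        print(f"{name}: remove {', '.join(foreign)} from the NIC DNS list")
```

    Any machine that shows up in the report, DC or workstation, lists a DNS
    server from outside the internal set and would need its NIC settings
    changed.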

    netdiag /fix

    ....or maybe:

    dcdiag /fix

    (Win2003 can do this from Support tools):
    nltest /dsregdns /server:DC-ServerNameGoesHere
    http://support.microsoft.com/kb/q260371/

    Ensure that DNS zones/domains are fully replicated to all DNS
    servers for that (internal) zone/domain.

    Also useful may be running DCDiag on each DC, sending the
    output to a text file, and searching for FAIL, ERROR, WARN.
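
    A minimal sketch of that search step, in Python. It assumes the dcdiag
    output has been saved as text and that result lines look like the ones
    pasted earlier in this thread ("... WAIMEA failed test NetLogons"); the
    function names and the sample text are illustrative and are not part of
    dcdiag itself.

```python
import re

# Matches result lines such as:
#   ......................... WAIMEA failed test NetLogons
RESULT_RE = re.compile(r"^\s*\.+\s+(\S+)\s+(passed|failed)\s+test\s+(\S+)\s*$")

# Sample text modeled on the output format quoted in this thread.
SAMPLE = """\
    Starting test: Replications
    ......................... WAIMEA passed test Replications
    Starting test: NetLogons
    [WAIMEA] An net use or LsaPolicy operation failed with error 1203,
    No network provider accepted the given network path..
    ......................... WAIMEA failed test NetLogons
    Starting test: Services
    ......................... WAIMEA failed test Services
"""


def failed_tests(dcdiag_text):
    """Return (target, test_name) pairs for every 'failed test' result line."""
    failures = []
    for line in dcdiag_text.splitlines():
        match = RESULT_RE.match(line)
        if match and match.group(2) == "failed":
            failures.append((match.group(1), match.group(3)))
    return failures


def flagged_lines(dcdiag_text, keywords=("FAIL", "ERROR", "WARN")):
    """Return every line containing one of the keywords, case-insensitively."""
    return [
        line.strip()
        for line in dcdiag_text.splitlines()
        if any(keyword in line.upper() for keyword in keywords)
    ]


if __name__ == "__main__":
    for target, test in failed_tests(SAMPLE):
        print(f"{target}: {test} FAILED")
```

    To use it on a real run, read the saved text file into a string and pass
    it to failed_tests() or flagged_lines() in place of SAMPLE.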

    Single-label domain zone names are a problem. Google:
    [ "SINGLE LABEL" domain names DNS 2000 | 2003 microsoft: ]
     
    Herb Martin, Apr 29, 2005
    #2

  3. Josh-UCDHSC

    Josh-UCDHSC Guest

    I have two DCs doing multi-master replication. They use split-brain DNS
    where the main BIND servers on campus are the only forwarders. Both now
    list themselves as the only DNS entries in their NIC TCP/IP configurations.
    This is something that Microsoft fixed, as I had both DC IP addresses
    listed in the DNS entries along with the main BIND servers.
    Incidentally, the clients have all four DNS entries listed in their NIC
    configuration. I can see why that would cause a problem on the DCs. The
    fact that all the clients go down at once would indicate to me that it isn't
    the clients' problem but the DCs'. As a precaution, I am ghosting the TCP/IP
    DNS configuration out to all the clients tonight where the DCs are the only
    DNS entries.
    check - secure only
    ghosting this change tonight. Why would they all go down at once? Does it
    matter if the two DCs' own IP addresses in the DNS configuration of
    the NIC are in the same order? Does each one have to be pointing to itself
    first?
    So my DCs are DNS servers, using the main forwarders on campus. If they
    somehow lose connection to these two forwarders, could that cause the
    problem?

    I have only one domain. It is its own forest separate from the main campus
    domain.
    I do get a warning: "[WARNING] At least one of the <00> 'WorkStation
    Service', <03> 'Messenger Service', <20> 'WINS' names is missing."
    WINS service test. . . . . : Skipped
    There are no WINS servers configured for this interface.

    The netdiag /fix added the second PASS line.
    DNS test . . . . . . . . . . . . . : Passed
    PASS - All the DNS entries for DC are registered on DNS server
    '132.194.21.250' and other DCs also have some of the names registered.
    PASS - All the DNS entries for DC are registered on DNS server
    '132.194.21.96' and other DCs also have some of the names registered.
    No errors or warnings..

    What is the difference between dcdiag and dcdiag2k3?
    C:\Documents and Settings\Administrator>nltest /dsregdns /server:WAIMEA
    Flags: 0
    Connection Status = 0 0x0 NERR_Success
    The command completed successfully
    reading... thanks!
    _msdcs
    _sites
    _tcp
    _udp

    all show up in the DNS (AD integrated)...
    No problems on either DC...
    I am looking into exactly what this means - single label domain names. I've
    not heard of it before...

    Thanks for your help!

    For a while I was blocking all external access to the DCs with a separate
    Linux firewall. I've turned that off and instead I am only blocking
    specific ports. Perhaps the DCs were sending out DNS requests and the
    responses were being blocked. I'm not sure; the firewall should allow
    through all related and established connections, but Microsoft support said
    I should try it turned off.

    Let me know if there is anything in my response that appears out of the
    ordinary or is suspicious.
     
    Josh-UCDHSC, Apr 30, 2005
    #3
  4. Herb Martin

    Herb Martin Guest

    Then if the BIND servers are used only as forwarders in a split DNS or
    shadow DNS setup, they are irrelevant, as long as NO client (including DCs)
    tries to use them directly in its NIC->IP properties, which is the mistake
    that matches your symptoms most closely.

    That is fine too -- as long as they are replicating AD successfully.

    (There is a lot of naive advice floating around, even in some MS KB
    articles, that claims you should point them at their opposite DC, but
    this is NOT correct if you have replication working.)
    That's FINE.
    This is NOT. You must only use the "internal DNS server set"
    on domain clients.
    The clients must NOT list DNS server from outside the domain set --
    i.e., DNS servers which cannot resolve the AD-internal version of the
    domain.

    This is your problem -- change the clients.

    There may be OTHER problems due to some (likely temporary) DC
    glitch, but the problem is persistent and worse due to this
    misconfiguration.
    That is correct. It is not really a precaution but a serious requirement.

    Likely the support person doesn't understand this issue -- it is a random
    and intermittent problem.

    Well, if you had a glitch where the DCs got slow while replicating, or a
    switch was overloaded (or anything), then the clients would try the BIND
    DNS and would "latch on" for a while.

    Eventually they usually go back to the first or earlier DNS, but they don't
    do this right away.
    No, because that would affect the INTERNET resolution, not the internal
    AD resolution.
    Then for you the word 'all' means 'one', but the truth of the sentence
    remains.

    The word all is there to cover those who have 2 or more domains and forget
    to arrange for domain1 to find domainC etc.
    If you have more than one subnet (for domain clients and servers) then
    you need a WINS server and need to configure ALL NIC->IP properties
    to use the same "set of WINS servers" - this includes
    DCs and other servers.
    This is because the two DCs are using only the internal, dynamic DNS
    server set (e.g., themselves) for DNS client settings.

    Your problem is (mostly) on your clients that refer to the BIND servers
    in those NIC properties.
    It's named DCDiag.exe in both cases and I don't pay attention
    to any differences.

    That's because the DCs are correct.
    You must NOT (although it is legal for some silly reason) name an AD
    domain like this: "domain", but must use at least "domain.com" or even
    "child.domain.edu".
    Perhaps. Firewalls between DCs or between DCs and their clients are
    POSSIBLE but tricky to get right.
    Probably until you get the rest fixed. But you might open the
    filter up selectively to just your domain subnets IF you can isolate
    YOUR domain machines by subnet (e.g., they don't come in
    through the university, dorms, WAN from home, etc.)
    Fix the clients, and then we will look for anything that is "slowing" or
    hosing up the client-to-DC or DNS access.
     
    Herb Martin, Apr 30, 2005
    #4
  5. Josh-UCDHSC

    Josh-UCDHSC Guest

    Herb,

    Thanks for the help. I really appreciate it. I have made the necessary
    changes and so far everything is working fine. This coming Thursday will
    mark a week with no problems. I will keep this thread active with updates
    on the status of the problem. The unfortunate nature of the problem is that
    it takes 2 to 7 days to recur. Hopefully I will report that I have nothing
    to report later this week!

    Josh

     
    Josh-UCDHSC, May 2, 2005
    #5
  6. Herb Martin

    Herb Martin Guest

    We are here if you need us.
    Now you understand how many people naively
    believe they can get away with such practices (a mixture
    of DNS servers listed on clients).

    Superstitiously it SEEMS to work, some of the time.
     
    Herb Martin, May 3, 2005
    #6
  7. Josh-UCDHSC

    Josh-UCDHSC Guest

    Herb,

    The server crashed yesterday. I am going to push the DNS server
    configuration out with Group Policy just to be sure I get all the machines
    before I throw in the towel and switch back to Linux :). I believe that will
    override the local policy.

    Josh
     
    Josh-UCDHSC, May 3, 2005
    #7
  8. Herb Martin

    Herb Martin Guest


    You don't have a Windows problem, but a misconfiguration.

    Linux won't help if you misconfigure it or the clients similarly.

    While GPOs can be used for this, it involves added
    settings that aren't there by default so it is harder than
    just getting the clients correct unless you have dozens
    of clients.
     
    Herb Martin, May 4, 2005
    #8
  9. Josh-UCDHSC

    Josh-UCDHSC Guest

    One thing I am still baffled by is why the problem is exhibited on the DCs
    when they are configured correctly. The whole subnet can't log in to the
    domain and I can't log in to the DC (which is essentially logging into the
    domain as a client too). Do the clients that are misconfigured somehow
    communicate a DNS error and cause the DNS server to hang on the DC? If I
    restart the DNS service without rebooting it doesn't help. Could this in
    any way be Active Directory related?
     
    Josh-UCDHSC, May 4, 2005
    #9
  10. Herb Martin

    Herb Martin Guest

    That implies the DC is itself misconfigured in its
    own client DNS settings.

    What DNS servers are configured on the DC NIC?

    Are they all holding the Domain zone, or able to fully
    resolve that zone?
    Which is why it implies a client NIC->IP
    problem on the DC unless the DNS server itself
    is misconfigured.
    Well, yes, but in the sense that almost all AD replication
    OR authentication (logon) problems are really DNS problems.

    Practically all of those DNS problems are due to
    misconfiguration. And a high percentage of those
    are caused by trying to configure "two sets" of DNS
    servers on the client NICs (DCs are DNS clients too.)
     
    Herb Martin, May 4, 2005
    #10
  11. Josh-UCDHSC

    Josh-UCDHSC Guest

    Many thanks; Comments inserted below, also -
    Microsoft thinks they have fixed it again. They created a global catalog on
    the 2nd domain controller. They also changed the DNS dynamic update
    settings on the DC from "secure" to "non secure and secure". I questioned
    this but they said it was needed for both DCs to work as global catalogs.

    Also, the 2nd domain controller's A record appeared to have been pointing to
    the wrong location, it was replying to pings from the first DC as
    halcyon.cudenver.edu instead of halcyon.coe.cudenver.edu.

    The server stopped responding today. The forward lookup zone on waimea
    wasn't present in DNS after the reboot. MS Support recreated it.
    This was the first time I've seen this happen.

    I ran MPSRPT_DirSvc.EXE before rebooting this time. If you want to
    see any output from the myriad of tests performed, let me know. I have
    pasted the WAIMEA_DCDIAG.TXT at the end of this post. From Google I found
    http://support.microsoft.com/?kbid=839880. I'm not sure if this is relevant.

    Clients are running Windows XP Professional With SP2. Firewall is turned
    on.

    Full computer name: WAIMEA.coe.cudenver.edu
    Domain: coe.cudenver.edu

    DNS Server Addresses, in order of use:
    132.194.21.250
    132.194.21.96

    "append primary and connection specific DNS suffixes" is selected

    DNS suffix for this connection: cudenver.edu

    "Register this connection's address in DNS" is checked
    "Use this connection's DNS suffix in DNS registration" is checked
    Not sure what you mean by "holding the Domain zone"; the DCs resolve nslookups
    for computers in the domain as computername.coe.cudenver.edu for nslookups
    outside the
    WAIMEA_DCDIAG.TXT Output below:


    Domain Controller Diagnosis

    Performing initial setup:
    * Verifying that the local machine WAIMEA, is a DC.
    * Connecting to directory service on server WAIMEA.
    [WAIMEA] Directory Binding Error 1753:
    There are no more endpoints available from the endpoint mapper.
    This may limit some of the tests that can be performed.
    * Collecting site info.
    * Identifying all servers.
    * Identifying all NC cross-refs.
    * Found 2 DC(s). Testing 1 of them.
    Done gathering initial info.

    Doing initial required tests

    Testing server: Default-First-Site-Name\WAIMEA
    Starting test: Connectivity
    * Active Directory LDAP Services Check
    * Active Directory RPC Services Check
    [WAIMEA] DsBindWithSpnEx() failed with error 1753,
    There are no more endpoints available from the endpoint mapper..
    Printing RPC Extended Error Info:
    Error Record 1, ProcessID is 1908 (DcDiag)
    System Time is: 5/4/2005 20:54:21:884
    Generating component is 2 (RPC runtime)
    Status is 1753: There are no more endpoints available from the
    endpoint mapper.

    Detection location is 500
    NumberOfParameters is 4
    Unicode string: ncacn_ip_tcp
    Unicode string:
    fb8e829f-b7de-4769-a6da-214e38a0bd8c._msdcs.coe.cudenver.edu
    Long val: -481213899
    Long val: 65537
    Error Record 2, ProcessID is 1908 (DcDiag)
    System Time is: 5/4/2005 20:54:21:884
    Generating component is 2 (RPC runtime)
    Status is 1722: The RPC server is unavailable.

    Detection location is 761
    NumberOfParameters is 1
    Unicode string: 4020
    Error Record 3, ProcessID is 1908 (DcDiag)
    System Time is: 5/4/2005 20:54:21:884
    Generating component is 8 (winsock)
    Status is 1722: The RPC server is unavailable.

    Detection location is 313
    Error Record 4, ProcessID is 1908 (DcDiag)
    System Time is: 5/4/2005 20:54:21:884
    Generating component is 8 (winsock)
    Status is 10048: Only one usage of each socket address
    (protocol/network address/port) is normally permitted.

    Detection location is 311
    NumberOfParameters is 3
    Long val: 4020
    Pointer val: 0
    Pointer val: 0
    Error Record 5, ProcessID is 1908 (DcDiag)
    System Time is: 5/4/2005 20:54:21:884
    Generating component is 8 (winsock)
    Status is 10048: Only one usage of each socket address
    (protocol/network address/port) is normally permitted.

    Detection location is 318
    ......................... WAIMEA failed test Connectivity

    Doing primary tests

    Testing server: Default-First-Site-Name\WAIMEA
    Skipping all tests, because server WAIMEA is
    not responding to directory service requests
    Test omitted by user request: Topology
    Test omitted by user request: CutoffServers
    Test omitted by user request: OutboundSecureChannels
    Test omitted by user request: VerifyReplicas
    Test omitted by user request: VerifyEnterpriseReferences

    Running partition tests on : ForestDnsZones
    Starting test: CrossRefValidation
    ......................... ForestDnsZones passed test
    CrossRefValidation
    Starting test: CheckSDRefDom
    ......................... ForestDnsZones passed test CheckSDRefDom

    Running partition tests on : DomainDnsZones
    Starting test: CrossRefValidation
    ......................... DomainDnsZones passed test
    CrossRefValidation
    Starting test: CheckSDRefDom
    ......................... DomainDnsZones passed test CheckSDRefDom

    Running partition tests on : Schema
    Starting test: CrossRefValidation
    ......................... Schema passed test CrossRefValidation
    Starting test: CheckSDRefDom
    ......................... Schema passed test CheckSDRefDom

    Running partition tests on : Configuration
    Starting test: CrossRefValidation
    ......................... Configuration passed test
    CrossRefValidation
    Starting test: CheckSDRefDom
    ......................... Configuration passed test CheckSDRefDom

    Running partition tests on : coe
    Starting test: CrossRefValidation
    ......................... coe passed test CrossRefValidation
    Starting test: CheckSDRefDom
    ......................... coe passed test CheckSDRefDom

    Running enterprise tests on : coe.cudenver.edu
    Starting test: Intersite
    Skipping site Default-First-Site-Name, this site is outside the scope
    provided by the command line arguments provided.
    ......................... coe.cudenver.edu passed test Intersite
    Starting test: FsmoCheck
    GC Name: \\WAIMEA.coe.cudenver.edu
    Locator Flags: 0xe00001fd
    Warning: Couldn't verify this server as a PDC using DsListRoles()
    PDC Name: \\WAIMEA.coe.cudenver.edu
    Locator Flags: 0xe00001fd
    Time Server Name: \\WAIMEA.coe.cudenver.edu
    Locator Flags: 0xe00001fd
    Preferred Time Server Name: \\WAIMEA.coe.cudenver.edu
    Locator Flags: 0xe00001fd
    KDC Name: \\WAIMEA.coe.cudenver.edu
    Locator Flags: 0xe00001fd
    ......................... coe.cudenver.edu passed test FsmoCheck
     
    Josh-UCDHSC, May 5, 2005
    #11
  12. Herb Martin

    Herb Martin Guest

    For small forests and single domain forests it
    is usual to make ALL DC into GCs.
    Replies to pings don't happen by name;
    seeing that would be based on reverse
    records or some such.
    Gone entirely?

    Someone deleted it. They don't just disappear.
    Generally irrelevant EXCEPT that the machines
    with the firewall cannot be "servers" by default;
    you must enable services/port they wish to share,
    and that is not an issue for you here.
    Are these both holding the SAME exact zone (now)?

    They both must do that.
    Irrelevant to the type of problems you are
    having; this is a user convenience setting.
    Irrelevant, but usually unnecessary -- the key is to
    get the FULL computer name correct in the System
    control panel, then this setting is never needed with
    ONE NIC, and seldom needed with multiple NICs.
    Same as previous.
    Does the DNS server have the zone defined and have a
    full copy of it (not some external partial copy of a zone
    with the same name)?
    This error is disturbing -- has someone been messing
    with the registry in an attempt to alter the way that the
    RPC server works?

    If not you may have a corrupted DC which would benefit
    from a "REPAIR Install" (from the original CDROM.)
    Something wrong with the RPC Server?
    How many IP addresses does this DC have? How many NICs?
    Likely a DNS or RPC server problem still...

    Has anyone ever SEIZED a role (PDC Emulator) in this domain?

    Is this the only (current) DC? Has there ever been more?

    For DNS check all of this:


    DNS for AD
    1) Dynamic for the zone supporting AD
    2) All internal DNS clients NIC\IP properties must specify SOLELY
    that internal, dynamic DNS server (set.)
    3) DCs and even DNS servers are DNS clients too -- see #2
    4) If you have more than one Domain, every DNS server must
    be able to resolve ALL domains (either directly or indirectly)

    netdiag /fix

    ....or maybe:

    dcdiag /fix

    (Win2003 can do this from Support tools):
    nltest /dsregdns /server:DC-ServerNameGoesHere
    http://support.microsoft.com/kb/q260371/

    Ensure that DNS zones/domains are fully replicated to all DNS
    servers for that (internal) zone/domain.

    Also useful may be running DCDiag on each DC, sending the
    output to a text file, and searching for FAIL, ERROR, WARN.
     
    Herb Martin, May 5, 2005
    #12
  13. Josh-UCDHSC

    Josh-UCDHSC Guest

    Comments inserted below~
    Would you like to have a look at things? I could setup Remote Assistance...

    Yes. They are both holding the same exact zone.
    This is set correctly, other than waimea is in all capital letters in the
    System Control Panel. I took out the cudenver.edu to match the TCP/IP
    settings on the 2nd DC, which didn't have it. Network Load Balancing was
    checked on the 2nd DC so I unchecked it as it is not doing any network load
    balancing. It was interesting, when I had the cudenver.edu suffix entered
    running "nslookup waimea" about every second it would return
    "waimea.cudenver.edu" in the server field and the next time
    "waimea.coe.cudenver.edu". It would switch back and forth between the two.
    When I took the cudenver.edu suffix out, "nslookup waimea" only lists
    "waimea.coe.cudenver.edu" in the server field.
    Yes both DCs have the same zone set.
    No, I think the problem is with the version of dcdiag used. To generate
    this data I used MPSRPT_DirSvc.EXE instead of dcdiag.
    Using "dcdiag" doesn't show the problems.
    I will consider this after the semester is over at the end of next week.
    Running a regular dcdiag doesn't show the endpoint mapper problem.
    This server has 4 NICs. All but one are disabled.
    Are there any more advanced tools than dcdiag and the like? Is it advisable
    to run windump on a Win2k3 DC?
    No, not to my knowledge.
    This is the only current DC besides the 2nd active DC. There hasn't been
    any other apart from a test domain with a different name (coe-test), which
    was running on a different machine and was shut down months ago.
    Yep all of this checks out.

    Would you like to have a look at things? I could setup Remote Assistance,
    there could be something I am missing or don't understand. I could give you
    a call, my email is: if you want send me some contact
    info.
     
    Josh-UCDHSC, May 5, 2005
    #13
  14. Herb Martin

    Herb Martin Guest


    I might, but unless you already have RA working then we are going to
    have to ditz around with firewalls/NATs probably.

    [I wrote the following lines last -- after interspersing comments all
    the way down.]

    Normally, I would say yes -- but I am running a little ragged right now.


    I usually offer to let people call so if you think it will help go ahead.

    But you seem pretty competent and if you have checked all of the DCDiag
    (or equivalent) stuff, and are sure about checking the DNS then that is
    all I would likely be doing.

    You might chase down that missing role holder (unless it was just a
    spurious RPC error.)

    Or the "REPAIR" install. You should back up first, but I have never
    hurt a machine doing that -- and have yet to fail to recover one either.
    Repair install is the best-kept secret in Windows these days.

    [...more inline...]
    Caps don't matter, DNS is not case sensitive and although NetBIOS
    is TECHNICALLY case sensitive, the machines always UPPERCASE
    their computer name, domain names and such.
    Load balancing only makes sense if you have at least two
    servers in NLB set.
    There is a later version of DCDiag at the MS site.

    I always use it.
    Only active ones with working IPs count. So that is good.
    Some people get weird problems with multiple NICs or multiple
    IPs being active.

    I can usually solve most any AD/DNS problem with DCDiag.

    You might look at ReplMon or RepAdmin though if you are having
    replication problems.

    Sometimes you have to use NTDSUtil to clean out "dead" servers,
    i.e., DCs that have died or been uninstalled.
    I was CONSIDERING that maybe the role had been seized but the
    other DCs hadn't replicated that info.

    Or a role was seized but the original role holder was still brought
    back online (never do the latter.)
    Ok, then you likely don't have the "seize" issue. That makes for weird and
    unpredictable problems that might fit your circumstances though.
     
    Herb Martin, May 5, 2005
    #14
  15. Josh-UCDHSC

    Josh-UCDHSC Guest

    It just happened again. This is so frustrating. You mentioned a repair
    install and described it as relatively safe. So I just boot off the Win2k3
    CD, do the repair and theoretically it should magically reboot to its good
    old self? I will back up the entire system to tape with Veritas Backup Exec.
    I just want to be clear on what repair does and what the system state will
    be after a repair. In a perfect world, everything will be as it was before
    the repair except that the core OS binaries that were corrupt or overwritten
    will be replaced/installed? Theoretically I won't lose any data, the AD, or
    any custom settings?

    When the system stops responding to clients and I am logged into the DC, I
    can't start Task Manager with ctrl-alt-del or ctrl-shift-esc; I have to do
    "start"->"run" and type taskmgr. Also, I tried to start an mmc.exe
    application, dnsmgmt.msc, while the server was zombied and it caused an
    "Application Hang" Event ID 1002, Category (101) and a fault bucket Event ID
    1001.

    "Hanging application mmc.exe, version 5.2.3790.0, hang module hungapp,
    version 0.0.0.0, hang address 0x00000000."


    I understand if you're busy, maybe we can work something out when you free
    up. RA should be relatively simple to get working, there is no NAT and the
    firewall is easy to shutdown :).

    Maybe I'll try to give you a call,

    JC

     
    Josh-UCDHSC, May 6, 2005
    #15
  16. I hope you don't mind if I jump in here; it sounds like it is getting
    serious if you want to run a repair on Windows. You can just as easily run
    setup as an in-place upgrade from within Windows.
    But before you go through that, post the results from a netdiag /debug.

    --
    Best regards,
    Kevin D4 Dad Goodknecht Sr. [MVP]
    Hope This Helps
    ===================================
    When responding to posts, please "Reply to Group"
    via your newsreader so that others may learn and
    benefit from your issue, to respond directly to
    me remove the nospam. from my email address.
    ===================================
    http://www.lonestaramerica.com/
    ===================================
    Use Outlook Express?... Get OE_Quotefix:
    It will strip signature out and more
    http://home.in.tum.de/~jain/software/oe-quotefix/
    ===================================
    Keep a back up of your OE settings and folders
    with OEBackup:
    http://www.oehelp.com/OEBackup/Default.aspx
    ===================================
     
    Kevin D. Goodknecht Sr. [MVP], May 6, 2005
    #16
  17. Josh-UCDHSC

    Josh-UCDHSC Guest

    Kevin,

    I tried to send it as a zipped attachment of about 10K, but that is
    apparently too large. Is there an email address I can send it to?

    I was just on the phone with Microsoft again, this time they demoted the
    active directory zone on the Primary Server, deleted the zone on the
    secondary server, promoted the DNS to an AD integrated zone on the 1st DC
    and set up the secondary server again.

    Output from Windows 2003 Server, Enterprise Edition running on a Dell
    PowerEdge 2850.

    How do I run a setup as an in-place upgrade from within Windows?

    I have also scheduled a sfc for the next reboot.
     
    Josh-UCDHSC, May 6, 2005
    #17
  18. Herb Martin

    Herb Martin Guest

    You will first make a backup so it is entirely safe. <grin>

    And it is reasonably safe anyway.

    Boot from the original CDROM, choose install to the same location
    as the current OS and MAKE SURE it offers to (and you select)
    repair the system.

    It will leave apps and data in place but will repair the DLLs etc.

    Afterwards you should check with Windows Update to make sure
    the machine is fully patched.

    If your problem is due to corruption (we don't know this) then it
    will likely repair it.

    BTW, if you don't have a backup you should have one anyway so
    that isn't really an "extra" step.

    --
    Herb Martin, MCSE, MVP
    Accelerated MCSE
    http://www.LearnQuick.Com
    [phone number on web site]

     
    Herb Martin, May 6, 2005
    #18
  19. It posted. I just looked it over; all records appear to be registered
    for this DC.
    Put the CD in and choose upgrade.


    --
    Best regards,
    Kevin D4 Dad Goodknecht Sr. [MVP]
    Hope This Helps
     
    Kevin D. Goodknecht Sr. [MVP], May 6, 2005
    #19
  20. Josh-UCDHSC

    Josh-UCDHSC Guest

    Herb, thanks for the help. I back up the DCs and a few other servers using
    the grandfather-father-son backup scheme, so not to worry on that, I hope.
    I use Backup Exec 10 and find it really easy to use.

    So far the server hasn't zombied since Thursday night. I'm guardedly
    optimistic that rebuilding the DNS fixed the problem. I will update the
    thread at the end of this week.

    JC

     
    Josh-UCDHSC, May 9, 2005
    #20