Strange networking problem

Discussion in 'Windows Vista Talk' started by Speed Dial, Mar 17, 2010.

  1. Speed Dial

    Speed Dial Guest

    From time to time, my copy of Vista seems to go into a "spasming" state
    which affects networking. When in this state, the following symptoms
    recur frequently, some of them at intervals of roughly five minutes.
    (One time, the first symptom group kept recurring like clockwork almost
    exactly every 2 minutes.) The state itself lasts hours; its absence also
    tends to last hours and sometimes days. It can be induced by restarting
    either the router (a BEFSR41) or the computer. I know of no way to
    induce its absence.

    Symptom group 1 recurs in a particular sequence when this state exists:
    * First the mouse pointer briefly shows a busy-cursor-with-arrow, even
    if the user and all applications are (supposed to be) idle.
    * About five seconds later, the networking icon in the tray loses its
    globe, usually replacing it with a red X and occasionally with a
    yellow ! in a triangle or with nothing.
    * The pointer indicates "busy" again but more briefly about a second
    later.
    * The networking icon returns to normal about a second later.
    This full sequence takes less than ten seconds and recurs every few
    minutes, sometimes less frequently, sometimes as often as every two
    minutes and (in that case) with clockwork regularity.
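    One way to put a number on "clockwork regularity" is to write down when
    each red-X sequence starts and look at the gaps. The following is only a
    minimal Python sketch of that idea; the function names, the 10%
    tolerance, and the sample times in the usage note are illustrative
    assumptions, not measurements from this machine.

```python
from statistics import mean, pstdev


def spasm_intervals(timestamps):
    """Return the gaps, in seconds, between consecutive event times."""
    ordered = sorted(timestamps)
    return [later - earlier for earlier, later in zip(ordered, ordered[1:])]


def looks_periodic(timestamps, tolerance=0.1):
    """Return True if the gaps vary by at most `tolerance` of the mean gap.

    `tolerance` is an arbitrary threshold chosen for illustration.
    At least three events (two gaps) are needed to say anything.
    """
    gaps = spasm_intervals(timestamps)
    if len(gaps) < 2:
        return False
    average = mean(gaps)
    return average > 0 and pstdev(gaps) / average <= tolerance
```

    With made-up onset times of 0, 120, 241, 360 and 479 seconds the gaps
    are all within a second or so of two minutes, so looks_periodic()
    returns True; with onsets at 0, 100, 400 and 450 seconds it returns
    False.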

    The other symptom groups involve applications that use networking. As a
    rule these symptoms involve various intermittent or sporadic erroneous
    behaviors by these applications, some of which could be explained by
    network timeouts or similar effects from a hypothetical brief network
    outage but some of which cannot.

    The timing of these symptoms generally does NOT correlate with the
    timing of the events in symptom group 1; e.g. Firefox may fail to load
    pages while the tray icon shows normal connectivity, and may succeed
    while the tray icon shows a red X (request initiated by mouse click and
    web page displayed by browser all during the two-second duration of the
    red X). This mismatch, together with the symptoms that loss of network
    response cannot explain, casts doubt on any theory where this is simply
    the network connection itself spasming. The fact that this state can be induced
    intentionally by resetting the router or computer (without touching the
    DSL modem) also points away from a simple network flakiness explanation,
    as does the fact that the modem lights do not exhibit any symptoms
    whatsoever. The problem thus seems to exist closer to the computer than
    the modem, or at a higher level of the networking stack than the wiring
    or PPPoE protocol (so, the IP layer or higher), or both.
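    The "closer to the computer or further away" question could be tested
    more directly by probing three things at once during a spasm: the
    router on the LAN, some remote host by IP address, and a hostname
    lookup. Below is a rough Python sketch of that, under stated
    assumptions: all function names are hypothetical, the router is assumed
    to serve its web interface on TCP port 80, the remote host is assumed
    to accept connections on port 80, and the mapping from probe results to
    a location is just the reasoning above written out, not a definitive
    diagnosis.

```python
import socket


def tcp_reachable(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port opens within `timeout`."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Timeouts, unreachable networks, and refused connections all
        # count as "not reachable" in this simple sketch.
        return False


def dns_resolves(name):
    """Return True if `name` resolves to an IPv4 address."""
    try:
        socket.gethostbyname(name)
        return True
    except OSError:  # socket.gaierror is a subclass of OSError
        return False


def localize(router_ok, remote_ip_ok, dns_ok):
    """Map three probe results to the nearest layer that appears to fail."""
    if not router_ok:
        return "local: router not answering on the LAN"
    if not remote_ip_ok:
        return "upstream: router answers but nothing beyond it does"
    if not dns_ok:
        return "dns: IP traffic works but name lookup fails"
    return "ok: no fault seen by these probes"


def probe_once(router_ip, remote_ip, hostname):
    """Run all three probes once; every argument is supplied by the caller."""
    return localize(
        tcp_reachable(router_ip, 80),
        tcp_reachable(remote_ip, 80),
        dns_resolves(hostname),
    )
```

    Calling probe_once() every few seconds with the router's LAN address, a
    known-good remote IP and a hostname, and logging the result with a
    timestamp, would show whether the red X really coincides with the
    router itself going quiet.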

    In the below symptom groups, symptoms marked with a (!) at the end
    cannot be explained solely by inability of the application to contact a
    remote host. If it's (!!) the application's behavior seems to be
    patently incorrect no matter what -- i.e., it looks like an application
    bug is involved, not just the networking, router, and/or operating
    system behaving flakily.

    Symptom group 2 involves Firefox:

    * Pages may intermittently fail to load normally, load only part way,
    time out, or the like, and hostname lookup may hang or fail (hanging
    being particularly common).
    * The browser may ignore a link click, or spin for a bit but then stop
    with "Done" displayed in the status bar without having gone
    anywhere.(!!)
    * The browser may display "This page cannot be displayed because the
    browser is in Offline Mode. Select "Work Offline" from the File menu."
    or words to that effect; the user did not activate offline mode and if
    the file menu is dropped down, there is no checkmark by "Work
    Offline". Furthermore, again without user intervention the browser
    will behave as if offline mode were toggled off again after roughly
    five seconds.(!!)
    (Clearly an application bug involved here; the state of the check on
    the menu item and whether it considers itself to be in offline mode
    for the purpose of page retrieval should never get out of synch, nor
    should the mode toggle without user input. On the other hand something
    external is clearly the trigger, since FF only does this when Vista/my
    network/whatever is in the state being described by this news post.)
    * There are problems with specific web sites above and beyond the above.
    For example, some sites, including Sourceforge, will log me out every
    five minutes like clockwork while my computer is in this state. This
    suggests cookie destruction is occurring, but it's selective to
    particular sites' cookies. No cookie blocking addons or software
    are installed on this system, and Firefox's cookie policy is left
    default; furthermore, this only occurs during the "spasming state" of
    the machine/network.(!)
    * At the same time as the red-X spasms mentioned in Symptom Group 1,
    Firefox may briefly (for a few seconds) consume up to 25% CPU.(!!)
    The other FF symptoms either don't correlate with SG1 at all, or
    correlate but not in a way where both can be explained simply as a
    brief loss of connectivity -- in particular, the "premature Done"
    error tends to be followed by the spurious "offline mode" toggle,
    THEN by the symptoms of SG1, with "offline mode" toggling back off
    AND normal connectivity (web pages successfully retrieved) before SG1
    subsides. The timing is wrong for it to be simple connectivity loss
    causing it, where the red X should coincide with, rather than follow,
    a period of web pages being unretrievable (and the browser should
    simply display timeout errors, not any of the other symptoms!).

    Symptom group 3 involves Thunderbird:

    * Attempting to view a news post may hang with the throbber going and
    no progress being made.
    * When the above occurs, the "Stop" button may fail to abort the
    connection so as to enable a retry; Thunderbird then has to be closed
    and restarted.(!!)
    * News posts may hang while sending.
    * These posts are eventually reported as successful, but never appear
    on the server.(!)
    * Canceling a news post send that's hung and then manually recreating
    the post (aided by copy/paste) and resubmitting tends to get
    interrupted by a pair of dialogs complaining about some error moving
    something to the "sent" folder (despite the fact that the user is
    currently editing, not sending, a news post).(!!)
    * Thunderbird may hang with unstoppable, no-results network activity
    on startup and have to be restarted three or four times (stop
    button doesn't work).(!!)

    Symptom group 4 involves the BEFSR41:

    * During one of these seizures, the BEFSR41 may stop functioning
    correctly in a manner requiring it to be powered down and back
    up again.(!!) The symptoms of this are:
    * No network connectivity behind the router.
    * Router may display normal status (connected, normal IP/DNS/etc.
    data) over its web interface, or said interface may become
    unreachable.
    * Using the web interface (if functional) to Disconnect and then
    Connect rarely fixes the problem; usually causes the web
    interface to hang when Connect is clicked with no restoration
    of connectivity.
    * Windows shows a networking icon with no X or triangle or globe
    once this has occurred. Power cycling the BEFSR41 invariably
    fixes it, but I suspect that it prolongs the spasming, just as
    it may cause it to begin with.

    Other applications that use the network may display symptoms of
    intermittent connectivity, but by and large do not display any symptoms
    not explainable solely by their connectivity being intermittent.

    There's an XP box behind the same router and it gets affected at the
    same times, but in a milder way: all network-using applications behave
    as if connectivity is intermittent, and "Local Area Connection is now
    connected" balloons pop up randomly from time to time from the tray,
    probably XP's equivalent of Symptom Group 1.

    My suspicion at this time is that the root problem lies with the BEFSR41
    but has symptoms elsewhere.

    1. That the BEFSR41 gets affected (Symptom Group 4) and the XP box gets
    affected proves the problem is not confined to the Vista box; the
    lack of any abnormality with the DSL modem strongly implies that it
    isn't sited further away, either.
    Even if it is sited further away (likely at the DSL hub) the router's
    reaction to it indicates bugs in the router.
    2. Symptom Groups 2 and 3 indicate that there are a plethora of bugs and
    warty behaviors in Mozilla's products that are provoked by
    intermittent connectivity. The cookie crunchage may, however, be the
    operating system's fault, since the OS can obviously delete files,
    though it seems unlikely given the specific targeting of not only
    cookies, but non-Internet Explorer cookies *from particular web
    sites*.
    3. Symptom Group 1 is inconsistent, however, with simple brief
    connectivity losses. Those would only make the globe icon come and
    go, and should not cause busy cursors. Furthermore, I occasionally
    get connectivity losses unrelated to these "spasm states"; in these, the
    DSL modem's external-connectivity light goes off for a while. The
    following symptoms occur:

    * Globe icon vanishes from tray for the duration.
    * All network-using apps act like connectivity has been lost.
    * TB may exhibit any of the bugs in Symptom Group 3, particularly stop
    button failures and startup failures, at onset; this suggests that
    SG3 is due to bugs in TB triggered by any loss of connectivity.
    * FF does NOT display any of the bugs in SG2 other than the simple
    inability to load web pages. In particular none of the bugs marked
    with a (!) or (!!) occurs. It does not spontaneously toggle "offline
    mode", it does not claim a page load is "done" with neither a page
    load nor an error message having occurred, and it does not lose any
    login cookies or any other cookies.
    This suggests that the bugs in FF are NOT provoked by any old loss
    of connectivity; there's something "special" about these "spasms".

    This points back to the BEFSR41. Let's take one of the SG1 symptoms at
    face value: the red X indicating that there's not even a *local* network
    for a short time. This suggests that the BEFSR41 is in some kind of
    failure mode independently of whether it has lost its connection
    upstream. No problem upstream (e.g. at my ISP, or with the DSL modem)
    should cause the BEFSR41 to stop functioning even as a local router
    between the XP box and the Vista box. Perhaps a problem in one of those
    two places does exist and is the trigger, but what it triggers is a
    buggy behavior in the router, not merely a loss in connectivity, and
    moreover loss of connectivity is not by itself sufficient to cause the
    buggy behavior in the router.

    When the router does exhibit the buggy behavior, my theory is that it
    triggers some buggy behavior in Vista and in Firefox, in turn. Probably
    the first domino is Vista deciding there's no network connection at all
    rather than one that temporarily isn't working, and doing some kind of
    busy work (busy cursor) to unload (and later load again) a bunch of
    drivers or something. Plus it tells all applications that the network
    doesn't even exist anymore, rather than merely being down. This in turn I
    theorize triggers the buggy behaviors in FF that don't occur with any
    old loss in connectivity.

    So we have bugs in Thunderbird caused by any loss in connectivity, bugs
    in Firefox caused by the OS telling FF the network's been physically
    unplugged (even if you grant that aborting a page load without any error
    message, or spontaneously toggling on "offline mode", is a legitimate
    response to such, a) not keeping the "offline mode" menu item checkmark
    in synch with the actual mode state is a bug and b) crunching cookies in
    response is a bug, and chewing 25% CPU for 2 seconds whenever this
    happens and even if idle at the time is questionable at best), bugs in
    the router, and an unknown cause.

    There are four plausible locations for the cause.

    1. The router itself. Since it exhibits indisputably wrong behavior at
    times, the simplest explanation is this one. The router sometimes
    gets into a wonky state (causing SG1-3), which may resolve after a
    while or may cause it ultimately to hang (SG4). It starts up in this
    state and possibly rebooting the computer (either computer?) sends
    some signal down the line (a DHCP request?) to it that may trigger
    it.
    That it starts up in this state points away from overheating being
    the cause, as does its being well ventilated, with the holes in its
    case unobstructed, and the fact that this has been observed to happen
    when the room ambient temp was only 18 deg C.
    One candidate trigger is the DHCP client in the router. If the router
    got kicked into the spasming state by its DHCP client requesting a new
    IP address from my ISP's network, then spasming would be triggered by
    any reboot of the router AND whenever the DHCP lease from my ISP
    expired. I've several times witnessed spastic behavior start right at
    7 in the morning on the dot; perhaps my ISP's DHCP leases all expire
    at that time of day.
    So my #1 suspect site for the bug is the router's DHCP client
    functionality.
    2. The DSL modem. This is the third device directly connected to the
    router. Weighing against this is the absence of any overt symptoms
    involving the modem itself, such as changes in its status lights.
    3. My ISP's network/config. Perhaps something the ISP does from time
    to time (some ICMP message involved in keeping their network
    running and keeping track of connected endpoints, so they can free
    up an IP address early if a user shuts their computer down or
    whatnot?) triggers a bug in the BEFSR41.
    4. Something further out there; there's some type of network packet that
    any random Internet host can send that sends BEFSR41s into a spastic
    state for hours. (This is unlikely, but scary if true; it means the
    routers have a moderately nasty denial-of-service vulnerability.)
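
    The lease-expiry idea under candidate 1 is at least checkable: keep a
    log of when each spasm state begins and see how many onsets land near
    the same time of day. This is a small Python sketch of that check; the
    function name, the five-minute window, and the dates in the usage note
    are made up for illustration, and it does not handle a target time
    close to midnight.

```python
from datetime import datetime, timedelta


def onsets_near_time_of_day(onsets, hour, minute=0, window_minutes=5):
    """Return the onsets within `window_minutes` of hour:minute on their own day.

    `onsets` is a list of datetime objects. A window that would cross
    midnight is not handled by this sketch.
    """
    window = timedelta(minutes=window_minutes)
    hits = []
    for onset in onsets:
        target = onset.replace(hour=hour, minute=minute, second=0, microsecond=0)
        if abs(onset - target) <= window:
            hits.append(onset)
    return hits
```

    For example, given hypothetical onsets at 07:00:30 and 07:02 on two
    different mornings plus one at 14:30, calling
    onsets_near_time_of_day(onsets, 7) returns the two morning entries. If
    most onsets that do not follow a router or computer restart cluster
    like that, the lease theory gains some weight; if they are scattered,
    it loses some.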

    Does anyone here have any insight? Perhaps some of the symptoms can be
    mitigated, e.g. some Vista setting that prevents it from ever treating
    the network as actually gone rather than just temporarily down?

    If anyone here knows of a way to fix a BEFSR41 to be immune to this,
    that would of course be even better.
     
    Speed Dial, Mar 17, 2010
    #1

  2. Speed Dial

    Speed Dial Guest

    Well?
     
    Speed Dial, Mar 18, 2010
    #2

  3. Speed Dial

    Speed Dial Guest

    Check the crosspost list.
     
    Speed Dial, Mar 18, 2010
    #3
  4. Speed Dial

    Extravagan Guest

    I guess I need to spell it out explicitly for you.

    I selected TWO Vista newsgroups to post to. I found one in microsoft.*
    and posted to that so my post would appear where Microsoft was most
    likely to notice it, and additionally I posted to a Vista newsgroup in
    alt.* because that one seemed to be the highest-traffic Vista newsgroup.

    If you're meaning to suggest that there's a third newsgroup that would
    be a good place to post it, then please stop beating around the bush and
    simply name the newsgroup. Then I'll see if my server carries it, and if
    so post a copy of my original post to that group.

    If you have no intention of being constructive, EITHER with regard to
    the original post's queries OR by naming a newsgroup where you think
    they would get more attention (of a constructive kind!), then please
    don't waste time and bandwidth by posting again to this thread.
     
    Extravagan, Mar 19, 2010
    #4
  6. Speed Dial

    Extravagan Guest

    Then why were you rude? Twice?
    I don't notice any newsgroup names, real information about the problem,
    or silence from you. Shouldn't there have been at least one of those three?
     
    Extravagan, Mar 19, 2010
    #6
  7. Speed Dial

    Extravagan Guest

    Sure there is, unless you for some reason believe the problem can never
    be solved by anyone other than Microsoft.

    Perhaps there's a newsgroup focused on Linksys routers?
    How fortunate then that I didn't do so -- I merely ASKED you to EITHER
    be silent OR do one of two other things (none of which you have so far
    done).
    I am, by attempting to elicit a more constructive post from you. However
    I begin to suspect that that, sadly, isn't going to happen.
     
    Extravagan, Mar 19, 2010
    #7
  8. Speed Dial

    Speed Dial Guest

    Updating the firmware on the BEFSR41 appears to have fixed it. If anyone
    suggested that amid all the flaming between Speed Dial and whoever else,
    thanks. :p
     
    Speed Dial, Apr 2, 2010
    #8
  9. Speed Dial

    John Raser Guest

    I read newsgroups to understand more about Windows Vista Professional. From
    the discussion, it seemed to be a hardware problem. That was cleared up, and
    I'm glad it was. Vista seems overall to have many peculiar behaviors, e.g.,
    taking eight minutes to launch Internet Explorer (that happened once). I
    feel that the more information I can get the better; newsgroups are a good
    means to that end.
     
    John Raser, Jun 29, 2010
    #9
