High DPC Use and a Method to Reboot a network card?

Discussion in 'Windows Vista Drivers' started by IMH, May 3, 2006.

  1. IMH

    IMH Guest

    Hi All

    I am using a 3COM 3C2000 NIC to capture MPEG from proprietary hardware.
    After we have sent the NIC 2^32 packets of data the NIC or WinPCap which we
    are using to perform the capture becomes unstable, spending 50% of the CPU
    time servicing DPCs. This is very bad as on some machines these DPCs all
    schedule on the same processor (it's a hyperthreaded box) and this makes the
    machine unusable.

    I was hoping to be able to reboot the NIC in case that is the thing at fault
    but thus far I've had no success and cannot find any IOCTL commands that
    might achieve this for me.

    Can anyone help or advise on a newsgroup more appropriate for this subject?

    Thanks a lot
    Cheers
     
    IMH, May 3, 2006
    #1

  2. Bill Paul

    Bill Paul Guest

    I'm not sure exactly how to do it, but it sounds like you need to get
    the driver's MiniportHalt() and MiniportStart() routines to be invoked
    in order to stop and restart the NIC. You can't really 'reboot' the NIC.
    There is a MiniportReset() method, however I think you're only supposed
    to call that if MiniportCheckForHang() tells you the device has become
    wedged.

    The actual problem sounds like a driver bug. I don't know what chipset
    the 3c2000 card you have is using (I think it's a Marvell Yukon), but I
    suspect it has hardware statistics counters. Typically, when a counter
    reaches its wraparound value (and it's very likely the frame counters
    wrap at 32 bits), the chip will generate a 'statistics overflow' interrupt
    to notify the host of the counter wrap. Exactly how you cancel the
    interrupt varies depending on the chip. Generally you have to read all
    the stats registers, which automatically resets them to zero.

    It sounds like the card is generating a stats overflow interrupt, the
    ISR is acknowledging it and scheduling a DPC to handle the condition,
    but the interrupt handler is not clearing the stats counters, which causes
    the interrupt to fire again once the handler completes. This causes the
    handler DPC to be resubmitted by the ISR over and over again.
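
The failure mode described above can be illustrated with a toy model. This is only a sketch under stated assumptions: `ToyNic`, its read-to-clear counter, and the `count_dpcs` loop are hypothetical names invented for illustration, and nothing here reflects how the actual chip or its driver is implemented. It shows why a handler that acknowledges the overflow without reading the counters keeps getting requeued, while one that reads them runs once.

```python
from dataclasses import dataclass

COUNTER_BITS = 32
COUNTER_MAX = 1 << COUNTER_BITS  # a 32-bit counter wraps at 2^32


@dataclass
class ToyNic:
    """Hypothetical NIC with a single read-to-clear frame counter."""
    rx_frames: int = 0
    overflow_pending: bool = False

    def receive(self, frames: int) -> None:
        # Count frames; flag an overflow condition when the counter wraps.
        total = self.rx_frames + frames
        if total >= COUNTER_MAX:
            self.overflow_pending = True
        self.rx_frames = total % COUNTER_MAX

    def read_stats(self) -> int:
        # Reading the counter resets it and clears the overflow condition.
        value = self.rx_frames
        self.rx_frames = 0
        self.overflow_pending = False
        return value


def count_dpcs(nic: ToyNic, clears_stats: bool, max_passes: int = 1000) -> int:
    """Count how many times the handler is queued before the condition clears."""
    dpcs = 0
    while nic.overflow_pending and dpcs < max_passes:
        dpcs += 1              # the ISR queues the handler again
        if clears_stats:
            nic.read_stats()   # a correct handler reads (and so clears) the counters
    return dpcs


buggy_nic = ToyNic(rx_frames=COUNTER_MAX - 1)
buggy_nic.receive(1)
fixed_nic = ToyNic(rx_frames=COUNTER_MAX - 1)
fixed_nic.receive(1)

print(count_dpcs(buggy_nic, clears_stats=False))  # hits the max_passes cap
print(count_dpcs(fixed_nic, clears_stats=True))   # handled in a single pass
```

The `max_passes` cap exists only so the model terminates; in the scenario described above there would be no such cap.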

    Stopping and restarting the NIC should reinitialize it and clear the
    counters, however ideally you shouldn't have to do this. Also, it's very
    likely that doing this will cause the NIC to drop and reacquire the link,
    which may take several seconds. This could cause a significant disruption
    in traffic reception.

    I suggest you either search for an updated driver that has this bug fixed,
    or else switch to a different card which doesn't suffer from this problem.

    -Bill

    --
    =============================================================================
    -Bill Paul (510) 749-2329 | Senior Engineer, Master of Unix-Fu
    wpaul!#%removethis%^%#windriver.com | Wind River Systems
    =============================================================================
    <adamw> you're just BEGGING to face the moose
    =============================================================================
     
    Bill Paul, May 3, 2006
    #2

  3. Are you saying you are sending 2^32 = ~4 billion *packets*, or are you
    talking about the amount of *data* sent?

    In either case, the number of packets and/or the amount of data sent
    should of course never cause such (or any other) problems.

    Now what do you mean by "the NIC or WinPCap [..] becomes unstable"? I
    don't think any protocol driver has a DPR, so it's most likely the
    NIC's *driver* (not the NIC itself) that keeps spending time in DPCs.

    In any event, you should try to get an updated driver.

    A NIC driver cannot "reboot", but you can unload and reload it either
    manually in the network configuration dialogue or in device manager or
    by using SetupDi functions. For the latter see the DevCon sample source
    code in the DDK.
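
As a rough illustration of the scripted route, the DevCon command-line tool accepts `disable`, `enable` and `restart` followed by a hardware ID pattern. The sketch below only builds such a command and, by default, does not run it. The `devcon.exe` location and the `PCI\VEN_XXXX&DEV_YYYY*` pattern are placeholders, not values taken from this thread, and actually running the command needs administrator rights on the target machine.

```python
import subprocess

ALLOWED_ACTIONS = ("disable", "enable", "restart")


def build_devcon_command(action, hwid_pattern, devcon_path="devcon.exe"):
    """Return the argument list for a DevCon call on matching devices."""
    if action not in ALLOWED_ACTIONS:
        raise ValueError(f"unsupported action: {action!r}")
    return [devcon_path, action, hwid_pattern]


def run_devcon(action, hwid_pattern, dry_run=True):
    """Build the command and, unless dry_run is set, execute it."""
    cmd = build_devcon_command(action, hwid_pattern)
    if not dry_run:
        # Raises CalledProcessError if DevCon reports a failure.
        subprocess.run(cmd, check=True)
    return cmd


# Placeholder pattern: substitute the real hardware ID of the adapter.
print(run_devcon("restart", r"PCI\VEN_XXXX&DEV_YYYY*"))
```

Keeping `dry_run=True` as the default makes it harder to bounce the wrong adapter by accident while experimenting.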

    Stephan
     
    Stephan Wolf [MVP], May 4, 2006
    #3
  4. IMH

    IMH Guest

    Hi Stephan

    Thanks for your input.

    We are sending UDP packets with a payload of 1028 bytes but I am talking
    about quantity of packets; we are recording CBR MPEG2 constantly and the
    system grinds to a halt at a time which would coincide roughly with having
    received 2^32 packets.

    I am not familiar with how drivers work and so I don't know how the winpcap
    driver interacts with the NIC or whether it could influence time spent in a
    DPC.

    Are you saying that it is impossible for winpcap to influence the DPC time
    in any way? In which case this almost certainly appears to be driver
    related and I am talking with 3Com about this at present.

    I was figuring that the IC on the NIC performing the brains of the operation
    could be reset in some way (or so our hardware genius here suggests as it's
    the case with the ICs he is working with) so I hoped we could ask the driver
    to reboot it; clearly it's a little more complicated than that however. I'm
    already investigating the SetupDi commands to see what happens there...

    Thanks for your help with this
    Ian

     
    IMH, May 4, 2006
    #4
  5. IMH

    IMH Guest

    Hi Bill, Thanks for responding...

    Could the device be wedged? What happens when a device is wedged? Also, I
    see numerous methods that I'd like to call but I have to be running at a
    level lower than user level to run those.

    Is there something I could do via DeviceIoControl that would allow me to do
    this or would I need to know the correct IOCTL commands that might initiate
    such a read?

    Yes, my initial thought was that a DPC was being scheduled for the same event
    over and over again. However when I view the DPCs Queued/Sec during normal
    operation I see a peak of around 3000 when recording a combined bandwidth of
    30Mbits/sec. When the machine enters this high DPC Time we still appear to
    continue recording but Windows becomes sluggish. If I view the DPC
    information remotely via the Windows Performance Counter interface I see DPC
    time of 50% and one of two cases; case 1, the DPCs are scheduled evenly on
    both processors but yielding a total DPC time of 50%, or case 2, the DPCs
    are ALL scheduled on 1 processor (it sits at 100%) but the other sits at 0%,
    yielding a total DPC time of 50%. In case 1 the machine is still usable,
    but quite poorly, in case 2 the machine is unusable but continues running
    for a time at least (perhaps until the next 2 ^ 32 packets have been
    received). Observing a machine in case 2 remotely via the WPC interface I
    see DPCs queued/sec has dropped from a peak of 4000 to a peak of around 100.
    I like the theory you are suggesting here but would this drop in DPCs
    queued/sec support this? I ask in all ignorance as I have no idea, having
    barely any experience at driver/hardware development.

    Tell me about it! :)

    Yes that might be a problem but it might be acceptable short term provided
    the device doesn't alter. I suspect we'd need to re-open the device through
    winpcap however?

    We are testing 2 alternative gigabit network cards at present and I am in
    communication with 3Com. There is no new driver for the card at present
    however.
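
For scale, the figures quoted in this thread (1028-byte UDP payloads at a combined 30 Mbit/s) can be turned into a rough time-to-wrap estimate for a 32-bit packet counter. This is a back-of-the-envelope sketch only: it assumes the 30 Mbit/s figure counts payload bytes and ignores UDP/IP/Ethernet header overhead, so the real interval would differ somewhat.

```python
PAYLOAD_BYTES = 1028        # UDP payload size quoted in the thread
BITRATE_BPS = 30_000_000    # combined bandwidth quoted in the thread
WRAP = 2 ** 32              # packets until a 32-bit counter wraps


def packets_per_second(bitrate_bps, payload_bytes):
    """Packet rate if the whole bitrate were payload (headers ignored)."""
    return bitrate_bps / (payload_bytes * 8)


def seconds_to_wrap(bitrate_bps, payload_bytes, wrap=WRAP):
    """Seconds until `wrap` packets have been received at a constant rate."""
    return wrap / packets_per_second(bitrate_bps, payload_bytes)


pps = packets_per_second(BITRATE_BPS, PAYLOAD_BYTES)
days = seconds_to_wrap(BITRATE_BPS, PAYLOAD_BYTES) / 86_400

print(f"{pps:.0f} packets/s, wrap after about {days:.1f} days")
```

Under these assumptions the counter would wrap after roughly two weeks of continuous recording, and the packet rate lands in the same few-thousand-per-second range as the DPCs Queued/Sec peaks reported above.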

    Thanks a lot for your input Bill

    Cheers
    IMH
     
    IMH, May 4, 2006
    #5
  6. IMH wrote:
    [..]
    Well, ok, protocol drivers like the one that comes with WinPCap usually
    don't have a DPR but they can actually run in the context of the
    miniport driver underneath them:

    DPC -> MiniportHandleInterrupt() -> NdisMIndicateReceivePacket() ->
    ProtocolReceivePacket() [in e.g. WinPCap]

    Since WinPCap is AFAIK open source, you can probably download and
    inspect the source code here:

    http://www.winpcap.org
    Protocol drivers should usually do as little as possible in
    ProtocolReceivePacket(), which runs in DPC context (i.e. at IRQL =
    DISPATCH_LEVEL).
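
The "do as little as possible in the receive path" advice can be sketched as a user-mode analogy. This is not NDIS code: `CaptureQueue`, `on_receive` and `drain` are hypothetical names, and the only point is the shape of the pattern, where the hot path just enqueues (or drops) and the expensive work happens later in another context.

```python
import queue


class CaptureQueue:
    """Bounded hand-off between a fast receive path and a slower consumer."""

    def __init__(self, maxsize=1024):
        self._q = queue.Queue(maxsize=maxsize)
        self.dropped = 0

    def on_receive(self, packet: bytes) -> None:
        # Hot path: never block, never process; enqueue or count a drop.
        try:
            self._q.put_nowait(packet)
        except queue.Full:
            self.dropped += 1

    def drain(self, handler) -> int:
        # Slow path: run the real work outside the receive callback.
        handled = 0
        while True:
            try:
                packet = self._q.get_nowait()
            except queue.Empty:
                return handled
            handler(packet)
            handled += 1


cq = CaptureQueue(maxsize=2)
for payload in (b"a", b"b", b"c"):
    cq.on_receive(payload)

seen = []
print(cq.drain(seen.append), cq.dropped)
```

Dropping when the queue is full is a deliberate choice in this sketch: it keeps the receive path bounded in time at the cost of losing data under overload.
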
    Well, an NDIS protocol driver can call NdisReset(), which "forwards a
    reset request to an underlying driver."

    See the DDK docs for more information on the aforementioned
    NDIS-related functions:

    http://msdn.microsoft.com/library/
    Although that might help, I strongly recommend you try to eliminate the
    actual cause of the problem.

    Stephan
     
    Stephan Wolf [MVP], May 4, 2006
    #6
  7. IMH

    IMH Guest

    Hi Stephan

    I've been looking through the source and I cannot find the methods you
    mention. I'll post to the WinPCap community to see if there is a known
    issue...

    We've replicated the same thing on a DLink card which "appears" to have a
    different chipset, so I am now wondering if it might be a Windows 2000 issue
    or perhaps even to do with the IBM boxes we are using!?

    Or... is it really likely that two NIC manufacturers have fallen foul
    of the same problem!?

    Any thoughts would be greatly appreciated...

    Cheers
    Ian

     
    IMH, May 5, 2006
    #7
  8. IMH

    IMH Guest

    Hi Steph, Bill

    Is there any chance this could be an OS issue!?

    Cheers
    Ian
     
    IMH, May 5, 2006
    #8
  9. There are only very few manufacturers of Gigabit Ethernet chipsets.
    Thus, most GigE cards out there are OEM products that use the drivers
    provided by the respective manufacturer.

    Drivers from different vendors thus usually show the same behaviour for
    the same chipset (i.e. Marvell, Broadcom, whatever).

    Stephan
     
    Stephan Wolf [MVP], May 5, 2006
    #9
  10. Bill Paul

    Bill Paul Guest

    No, 'wedged' usually means it stops transmitting and/or receiving entirely.
    As for the miniport methods, I'm not quite sure how to invoke them (I'm
    not really a Windows expert). I suspect you can do it via DeviceIoControl(),
    but offhand I don't know how.

    You can't read the statistics via DeviceIoControl() unless there's a custom
    driver routine to do it (and if there were, Marvell would never tell you
    about it). You have to read the chip registers to do it. Again, the driver
    is supposed to do this automatically for you, however it's possible whoever
    wrote it forgot, and if the frame counters only wrap after 2^32
    packets (which is a "long time") they might not have noticed it.

    It depends. :) The one way to be sure is to swap in a network card from
    another manufacturer and see if the problem persists. The fact the problem
    occurs after 2^32 packets have been received is somewhat telling: it
    points to a 32-bit overflow problem somewhere. But since I'm way over
    here and can't observe the problem first hand, I can't really make a
    more accurate determination.

    Tell Marvell about it. It's their chip and driver.

    I don't know if stopping and restarting the interface would invalidate
    your winpcap handle. I don't think it would.

    So... do the other cards exhibit the same problem? If they do, then
    it's some sort of software issue. If they don't, then the problem is
    specific to that particular card and driver.

    Note that 3Com is unlikely to be much help. I don't think they actually
    manufacture any of their own NIC silicon anymore: everything is
    OEM'ed from other companies.

    -Bill

     
    Bill Paul, May 6, 2006
    #10
  11. Bill Paul

    Bill Paul Guest

    *sigh*

    You know we can't see the card from here, right? So if all you tell
    us is that it's "a DLink card" that's not going to tell us a damn thing
    about what kind of chip it really has on it.

    You could have at least told us the model number.

    It's very likely. Was it this card?

    http://www.newegg.com/Product/ShowImage.asp?image=33-127-134-05.JPG

    If so, then congratulations: all you did was buy another Marvell-based
    NIC, so you haven't really tested anything new at all, and you've been
    wasting your time. Don't you love the retail computer market? I don't.
    I hate it with a passion. I hate it because it's specifically geared
    towards people like you; you can't tell one card from another anyway.

    Go and get a TOTALLY DIFFERENT card. With a DIFFERENT CHIPSET. If you
    can't tell what chipset it is, then ask someone to help you find out.
    Here's a hint: DON'T GET ONE WITH A BIG GIANT LETTER 'M' ON IT!
    Better still, learn how to check the PCI vendor/device ID.
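
For readers who have not done this before: on Windows the hardware ID of a PCI device (visible in Device Manager) has the form `PCI\VEN_xxxx&DEV_yyyy&...`, where the two hex fields are the vendor and device IDs. The sketch below parses such a string and looks the vendor up in a deliberately tiny table. The sample ID string is made up for illustration, the table is far from complete, and the vendor field can carry the board seller's ID rather than the chip maker's, so a lookup like this is a starting point, not proof of what silicon is on the card.

```python
import re

# A few well-known PCI vendor IDs; a partial table for illustration only.
KNOWN_VENDORS = {
    0x8086: "Intel",
    0x14E4: "Broadcom",
    0x10EC: "Realtek",
    0x11AB: "Marvell",
    0x10B7: "3Com",
    0x1186: "D-Link",
}

HWID_RE = re.compile(r"VEN_([0-9A-Fa-f]{4})&DEV_([0-9A-Fa-f]{4})")


def parse_pci_hwid(hwid):
    """Extract (vendor_id, device_id) from a PCI hardware ID string."""
    match = HWID_RE.search(hwid)
    if match is None:
        raise ValueError(f"not a PCI hardware ID: {hwid!r}")
    return int(match.group(1), 16), int(match.group(2), 16)


def vendor_name(vendor_id):
    """Name for a vendor ID, or 'unknown' if it is not in the table."""
    return KNOWN_VENDORS.get(vendor_id, "unknown")


# Made-up example string; the device ID here is not a real product.
vendor, device = parse_pci_hwid(r"PCI\VEN_8086&DEV_1234&SUBSYS_00000000&REV_00")
print(f"vendor {vendor:04X} ({vendor_name(vendor)}), device {device:04X}")
```

Posting the raw vendor/device pair alongside the retail model number gives people who know the hardware something they can actually look up.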

    Get one of these instead:

    http://www.newegg.com/Product/Product.asp?Item=N82E16833106123

    Or one of these:

    http://www.newegg.com/Product/Product.asp?Item=N82E16833156139

    Or one of these:

    http://www.newegg.com/Product/Product.asp?Item=N82E16833181009

    I'll bet you a quarter the problem doesn't show up with these. And
    next time you write, tell us EXACTLY which one you got so we don't
    have to guess.

    You really don't want to know what I'm thinking right now.

    -Bill

    =============================================================================
    -Bill Paul (510) 749-2329 | Senior Engineer, Master of Unix-Fu
    [email protected]#$(%($windriver###!#com | Wind River Systems
    =============================================================================
    "Ignorance may be bliss, but delusion is ecstasy!" -Perki
    =============================================================================
     
    Bill Paul, May 6, 2006
    #11
  12. Go and get a TOTALLY DIFFERENT card. With a DIFFERENT CHIPSET. If

    Intel produces decent and cheap GigE cards and chips.
     
    Maxim S. Shatskih, May 6, 2006
    #12
  13. IMH

    IMH Guest

    *shamed face*

    My manager bought the cards in a panic and I guess I should have told him to
    specifically check the chip manufacturer. I'll post the model of the DLink
    and go through the links you posted and ensure we test with wildly different
    cards. 3COM are actually being very helpful thus far (but I neglected to
    inform their tech a second card was producing the same results! If said tech
    reads this, please ignore this post!!).

    Thanks to all for their input...

    :)

    IMH
     
    IMH, May 6, 2006
    #13
  14. [off-topic mode on]

    This guy is programming an application that uses information provided
    by a network card/driver. He is apparently not (very) familiar with
    driver programming or hardware details. Much like I can drive my car
    without having to know anything about the internals of my car's
    machine.

    The whole spirit and purpose of forums like this one is to help each
    other, not SHOUT or make fun of someone else's perplexity.

    Thanks, Stephan
    [off-topic mode off]
     
    Stephan Wolf [MVP], May 6, 2006
    #14
  15. Bill Paul

    Bill Paul Guest

    I'm sorry, but you're wrong.

    I have to deal with support issues more often than I'd like, and my
    number one peeve is customers (or even support staff) who don't pay
    enough attention to detail. I'm not talking about pushing them out
    of their limited realm of expertise, either: I'm talking about really
    simple stuff. Like, what OS version is the customer using? What
    CPU are they running it on? What kind of networking adapter(s) do they
    have? Either this information has an immediate bearing on the problem
    at hand, or I'm going to need it later anyway when I sit down to try
    to reproduce it. You'd think people would know by now to provide
    all of this information up front, but trust me, they don't.

    Now.

    You may not be required to know how your car works in order to drive
    it, but when it breaks, you _ARE_ required to pay enough attention to
    detail so that when you take it to the mechanic to fix it, you can
    provide an accurate description of what's wrong. If all you say is
    "sometimes the engine makes a funny noise and belches smoke," it could
    take the mechanic weeks to figure out what's causing the problem. It's
    possible he might not be able to reproduce it at all. "Oh," you say,
    when he tells you he can't find anything wrong, "I forgot to mention:
    it only happens when I go past 50 mph. Does that matter?" At this point,
    you'll be lucky if all the mechanic does is shout at you.

    This is not a question of being unfamiliar with the technology. It's a
    question of alertness and attention to detail. In some cases, sadly,
    it can even be a question of reading comprehension. There is a vast
    difference between saying "I tried a different card, and the problem
    persists" and saying "I tried card model Blah from Vendor Foo and
    the problem persists." He was alert enough to provide the make and
    model of the first card, but dropped the ball when it came to positively
    identifying the second. (And I don't care how screwed up the retail NIC
    market is: you're never allowed to identify a device just by saying
    "it's a DLink card." When you go to the shelf and see a huge pile of
    DLink products staring back at you, it should become immediately obvious
    that they don't make just one card.)

    If the idea here is to help each other, everyone has a responsibility
    to be equally helpful. The people who ask questions have to be every bit
    as attentive and conscientious as the people who try to answer them.
    And I'm not going to coddle people who choose to abdicate that
    responsibility.

    -Bill

     
    Bill Paul, May 8, 2006
    #15
  16. IMH

    IMH Guest

    I didn't even read all that...

    I'm grateful for the help, but if you're going to be like that please don't
    bother. I am not one of your customers. I am a peer and I expect there is
    knowledge I have you might need one day, and even if that isn't the case, it
    will be the case for someone out there. Do you want *them* to poke fun at
    your ignorance? You know stuff; that's why I'm asking people in this forum.
    I don't think it's the place to get snotty because I don't know to post more
    specifics. If I knew, would I be asking?

    I think Stephan's words (and thank him for them) are very true. I know
    very, very little about driver writing. I know a lot about other things;
    such is the way of the world. In my ignorance I have come here and asked for
    some help.

    And to continue the car analogy my mechanic doesn't take the p*ss out of me
    when I describe the fault badly, he asks *politely* to see the car... you
    might have done the same with regard to the NIC model numbers, chipsets,
    etc... Perhaps even sagely informed me that I might be comparing apples and
    apples and as such might discover them to both be... erm apples.

    You have knowledge, clearly on this subject but *maybe* you should use it to
    help unconditionally, instead of vaunt your knowledge over less informed
    individuals from some lofty place. If helping annoys or frustrates you (as
    clearly your job appears to do) then don't bother; I think you'd be happier
    for it by the sounds of your opening rant.

    All that said, thanks for your help; you have been very helpful. It's a
    terrible shame you had to react so badly to the quite well deserved wrist
    slap you received from Stephan.

    Kind Regards
    Ian

    p.s. I'm sorry this is off topic and please feel free to flame on in return.
    I've said my bit in response to your post and if you rant away at whoever
    then you will merely be reinforcing the above; I won't need to add a thing.
    Thanks again for your help.

     
    IMH, May 8, 2006
    #16
  17. IMH

    IMH Guest

    Sadly I just read what you wrote and I must add two things and then my mouth
    really will stay shut:

    1. Get another job
    2. Please re-read your post as if you were me reading it, if you don't
    feel ashamed then while getting your new job, get yourself a new personality

    Regards
    Ian

     
    IMH, May 8, 2006
    #17
  18. Bill Paul

    Bill Paul Guest

    You don't get the point. It's not about ignorance. I'm not expecting
    you to know all about driver internals or the various features of
    gigabit ethernet chipsets. I'm not calling you an idiot for not being
    an expert in the field of networking. (If anything, *I'm* the idiot
    for becoming an expert. I mean, look at what it's got me.) What I *am*
    saying is that since you're not an expert, it's vital that you preserve
    as many details as possible when consulting the experts so that you
    don't accidentally send them down a blind alley.

    The whole car/mechanic analogy doesn't really work well in this case
    anyway, because you have to actually bring the car to the mechanic, which
    gives him a chance to analyze it first hand. That makes finding the
    problem much easier. But that's often not how it works with troubleshooting
    computer problems, which is frequently done via e-mail, or telephone, or
    forum postings. In those cases, you're dependent on other people to relay
    information to you because you never have the opportunity to see the
    failing system, and that complicates matters significantly.

    Again, this is not about what you know. It's about how you think.
    I don't expect you to know that the 3c2000 and the D-Link 530T both
    use a Marvell chipset, and that Marvell uses the same driver code base
    for both. However I do expect you to actually say exactly what the
    new card is, so that someone who does know can tell you: "hey, just
    a minute: you're not really testing what you think you're testing."

    When you thought to yourself "hmm, I switched the card, and I still
    see the same problem," you drew a conclusion based on raw data. Your
    next step should have been to relay that exact same raw data here, so
    that others could see it and draw their own conclusions. This is what
    I try to do when I find myself in the same position. There's plenty
    of crap I don't know, and when I'm stuck, I gather up all the data I
    have and try to seek out someone smarter than me, and I try to present
    as much context as I can about the problem. This is both for their
    benefit and mine, because it forces me to *think*. I remind myself that
    the people I'm about to pester haven't been watching over my shoulder,
    which means I have to describe the problem with as much background and
    detail as I can, otherwise they'll just be confused. In the process,
    I usually have to go back and re-examine the problem from the
    beginning, with a slightly different perspective, and sometimes I'll
    discover something I overlooked the first time around, which turns out
    to be the missing piece of the puzzle. If I end up being that lucky
    (and yes, a large part of this is more luck than skill), I won't have
    to pester anyone at all: I'll have solved the problem on my own. If not,
    at least I've done my homework.

    Now, the question is, should I have to tell you that you need to
    provide these details? Is the model of the card such an esoteric and
    obscure piece of information that I have to take special steps to
    ensure that you don't forget to tell it to me?

    I'm not really certain, but my opinion is: I don't think so. On another
    day, I might not have remembered that D-Link also sells gigE cards with
    Marvell chipsets. But given the part number, I would have gone out and
    looked it up and realized you'd just exchanged one Marvell card for
    another: I wouldn't have needed to rely on my sometimes faulty memory.
    And this information is not esoteric or obscure: it's written on the
    box and on the card. You may not know about network controllers,
    but you do know how to read. Should I have to remind you to read?

    I don't have a choice in the matter. As you say, I know stuff, and they
    pay me to use what I know. It used to be, when I was young and stupid,
    I would try to answer questions cheerfully and politely no matter how
    simple or silly they seemed, even when I didn't get paid for it. I
    tried to solve everyone's problems in spite of the lack of detail they
    would provide, thinking that after a while people would learn to ask new
    and more intelligent questions. Except it didn't work out that way. I kept
    getting asked the same questions over and over. Year after year. I'd get
    the same vague, detail-free problem reports and have to go through the
    same tedious information extraction process to get to the bottom of
    things.

    Exactly how polite am I supposed to be? Should I give everyone a pass
    and encourage this steady erosion of common sense, or should I remind
    people not to waste that incredible chunk of grey matter between their
    ears? Thinking is one of the most amazing things we can do as human
    beings. It's a tremendous gift. All too often though, I see it being
    squandered. And it bugs me.

    Can you honestly say it wouldn't bug you too?

    I'm not flaming you, but I'm also not letting you get away that easily.
    If you choose not to give due consideration to what I've said, then the
    only thing being reinforced is your desire not to learn. Harsh or not,
    I'm not wrong, no matter how much you might like me to be.

    The only thing that disappoints me about all this is that now I'll
    probably never find out if I was right about the damn Marvell driver
    being the cause of your problem.

    -Bill
     
    Bill Paul, May 8, 2006
    #18