ZwWriteFile CPU DMA usage vs Device DMA usage

Discussion in 'Windows Vista Drivers' started by Prafulla, May 5, 2009.

  1. Prafulla

    Prafulla Guest

    Hi,

    I am using ZwWriteFile to write data to files from kernel mode. My device
    receives data from external sources, and I need to write that data to
    files.
    I am using an LSI Megaraid card connected to multiple hard disks (not the
    primary disks, just for storage), and I have installed the LSI Megaraid
    driver for it.

    I observed that while reading or writing data to these files, CPU usage
    stays at 100% until the data transfer completes. I expected the LSI
    Megaraid DMA controller to be responsible for the data transfer, but
    instead the transfer is using CPU resources. If CPU usage reaches 100%,
    does that indicate that even the CPU's DMA controller is not in use?
    FILE_FLAG_NO_BUFFERING is also used.

    I thought that with this method the data flow would be file system
    driver ==> SCSI class driver ==> port driver ==> miniport driver (LSI).

    Is there any way to make the transfer use the LSI DMA resources, so that
    system resources are free to do something else?

    Thanks,
    Prafulla
     
    Prafulla, May 5, 2009
    #1

  2. Prafulla

    Don Burn Guest

    It does not necessarily mean anything about the controller and DMA. You are
    calling the file system and saying "do this action now" (thanks to the no
    buffering), and there are a lot of things that have to be done before you
    get to the transfer. Your model is incomplete:

    file system filters ==> file system ==> volume manager ==> disk driver ==>
    SCSI port driver ==> miniport driver


    --
    Don Burn (MVP, Windows DDK)
    Windows Filesystem and Driver Consulting
    Website: http://www.windrvr.com
    Blog: http://msmvps.com/blogs/WinDrvr
    Remove StopSpam to reply





     
    Don Burn, May 5, 2009
    #2

  3. If you copy files from/to a plain SATA drive to/from a Megaraid drive (using
    copy command or Explorer), does it also cause 100% CPU?
     
    Alexander Grigoriev, May 6, 2009
    #3
  4. Prafulla

    Prafulla Guest

    Hi Alexander,

    I tried copying files from SATA to Megaraid and observed that CPU usage
    did not shoot up, but it is not clear whether the processor DMA controller
    or the Megaraid DMA controller was responsible for the data transfer.

    I also tried using ZwWriteFile at driver level to copy from RAM, instead
    of device memory, to the Megaraid volume. In this case, too, CPU usage did
    not shoot up. I thought the processor DMA controller was in use here for
    copying the data between RAM and the Megaraid disk.

    In my case, I would like to copy from device memory (PCI) to the Megaraid
    (PCIe card) volume. Do you have any suggestions for forcing the use of the
    Megaraid DMA controller (LSI driver) instead of the processor or the
    processor DMA controller?

    Thanks,
    Kota
     
    Prafulla, May 6, 2009
    #4
  5. Prafulla

    Jakob Bohm Guest

    1. There is generally no "Processor DMA controller", what you may be
    seeing is ordinary driver or kernel code doing memcpy/RtlCopyMemory from
    one address to another. To find out where this happens, try using a
    debugger to single step into your ZwWriteFile() call as it calls
    NtWriteFile(), then various Fs and Io functions, then into NTFS.SYS,
    MegaRaid.SYS etc. etc.

    2. ZwWriteFile(), like all the Zw functions, is really designed to deal
    with application memory, so the logic inside ZwWriteFile() may not be
    well optimized for hardware-to-hardware transfers. If this is your
    problem, using lower level calls (such as submitting your own IRPs to
    the file system) may allow you to pass the device memory all the way to
    the MegaRaid DMA code.

    3. Some motherboard chipset designs may have a limitation that prevents
    the Bus Master (DMA) functionality on one PCI card (the MegaRaid) from
    directly accessing RAM located on another PCI card. Also, there may be
    some limitation in the NT kernel or HAL enforcing this limitation even
    if it does not exist in the hardware. If one of these is your problem,
    you may be out of luck.


    --
    Jakob Bøhm, M.Sc.Eng. * * direct tel:+45-45-90-25-33
    Netop Solutions A/S * Bregnerodvej 127 * DK-3460 Birkerod * DENMARK
    http://www.netop.com * tel:+45-45-90-25-25 * fax:+45-45-90-25-26
    Information in this mail is hasty, not binding and may not be right.
    Information in this posting may not be the official position of Netop
    Solutions A/S, only the personal opinions of the author.
     
    Jakob Bohm, May 6, 2009
    #5
  6. Prafulla

    Prafulla Guest

    Hi Jakob,

    Thanks for your detailed mail. As you mentioned, if it is case 3, I may be
    in trouble, but I would like to try your suggestion from case 2: sending
    the IRP to the file system (is that the file system driver?).

    I also established remote I/O target communication in the kernel between
    device 1 (the PCI device) and device 2 (the Megaraid miniport driver). I
    don't think this is useful, because the miniport driver hooks to the upper
    level driver and does not support direct IRPs. I also have access to the
    miniport driver code.

    I guess you are referring to sending the requests from the PCI device
    driver to the file system driver. If so, could you please explain more:
    which driver is this (is it NTFS.sys)? How can I get the device object of
    NTFS.sys, and how can I direct my requests to the Megaraid drive, given
    that NTFS.sys may be common to all of the drives?
    I would appreciate it if you could point me to any links or documentation.

    Thanks,
    Prafulla
     
    Prafulla, May 6, 2009
    #6
  7. Prafulla

    Don Burn Guest

    You are writing files, so it has to go through a file system, in this case
    NTFS. Not only that, you are expanding files, which means blocks have to
    be allocated and, depending on the blocking of the data you are using,
    they are zeroed and then your data is copied into part of them.

    So a few specific questions:

    1. How big are the records you are writing at a time? Can you make those
    records 4K bytes, which is likely to be faster?

    2. Do you know how much data you will be writing? Or can you waste a lot
    of disk space? If so, preallocate the file before you start your writes.

    3. How critical is this? There are more complex approaches, but one really
    wants to avoid them unless absolutely necessary.

    I would not try to walk through the Zw calls down to the disk driver; it
    is huge and confusing.

    --
    Don Burn (MVP, Windows DDK)
    Windows Filesystem and Driver Consulting
    Website: http://www.windrvr.com
    Blog: http://msmvps.com/blogs/WinDrvr
    Remove StopSpam to reply





     
    Don Burn, May 6, 2009
    #7
  8. Prafulla

    Prafulla Guest

    Hi Don,

    Please see my answers to your queries.

    Not really, because the analog data coming from the device can last for a
    few hours.
    If I transfer the data PCI device ==> host ==> PCI device, the same data
    crosses the bus twice (upload and download). If I can instead establish a
    device-to-device transfer through the second device's DMA controller, I
    think that approach would meet my data rate requirement, and the bus
    bandwidth used would be 1x instead of 2x.
     
    Prafulla, May 6, 2009
    #8
  9. Prafulla

    Don Burn Guest

    I would try blocking the data into 4K (or a multiple of 4K) chunks, and
    consider doing an initial allocation that is huge and then truncating the
    file when done. Worst case, if you preallocate, there are calls that can
    get the file's blocks on disk, and you can then roll your own calls to
    write to them directly, but I would avoid this until you have exhausted
    the other avenues.
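
As a rough illustration of the 4K blocking and preallocate-then-truncate arithmetic described above, here is a minimal sketch in portable user-mode C rather than driver code. The 4096-byte block size comes from the advice itself; `BLOCK_SIZE`, `count_blocks`, `round_up_to_block` and `bytes_to_truncate` are hypothetical names made up for this example and are not part of any Windows API.

```c
#include <assert.h>
#include <stdint.h>

/* Block size taken from the advice above; whether 4096 is the best value
   for a particular volume is an assumption to verify. */
#define BLOCK_SIZE 4096u

/* Hypothetical helper: number of BLOCK_SIZE writes needed to cover `bytes`. */
uint64_t count_blocks(uint64_t bytes)
{
    return bytes / BLOCK_SIZE + (bytes % BLOCK_SIZE != 0 ? 1u : 0u);
}

/* Hypothetical helper: `bytes` rounded up to a whole number of blocks.
   The caller is responsible for padding the tail of the last block. */
uint64_t round_up_to_block(uint64_t bytes)
{
    uint64_t blocks = count_blocks(bytes);
    assert(blocks <= UINT64_MAX / BLOCK_SIZE); /* guard against overflow */
    return blocks * BLOCK_SIZE;
}

/* Hypothetical helper: how many bytes to cut off when a preallocated file
   is truncated back to the amount of real data written. */
uint64_t bytes_to_truncate(uint64_t preallocated, uint64_t written)
{
    assert(written <= preallocated);
    return preallocated - written;
}
```

For example, a 10000-byte record would be issued as three 4096-byte blocks (12288 bytes), and a file preallocated at 12288 bytes would be cut back by 2288 bytes once the 10000 real bytes are on disk.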


    --
    Don Burn (MVP, Windows DDK)
    Windows Filesystem and Driver Consulting
    Website: http://www.windrvr.com
    Blog: http://msmvps.com/blogs/WinDrvr
    Remove StopSpam to reply





     
    Don Burn, May 6, 2009
    #9
  10. Prafulla

    Jakob Bohm Guest

    I was referring to the following scenario A:

    +-----------------------+      +-----------------------+
    | Your device    |||    |      |   |||     MegaRaid    |     ~~~~
    |               -RAM-   |      |  -DMA->>>>>>>>>>>>>>>>>>>>>(DISK)
    |                |||    |      |   |||                 |     ~~~~
    |                 V     |      |    ^                  |
    +-------+         V  +--+      +--+ ^          +-------+
            +---------V--+            +-^----------+
                      V                 ^
                    ||V|||||||||||||||||^||
                    - +>>>PCI bridge>>>>+ -
                    |||||||||||||||||||||||

    Assuming there is a RAM chip on your own PCI device, this RAM holds the
    data and is mapped into PCI memory space so it can be accessed with
    ordinary pointers once it has been mapped to a kernel mode virtual
    address by your driver or the framework it is loaded in. I was then
    referring to the case of the DMA controller on the MegaRaid card
    accessing that RAM chip on your card by
    issuing bus master requests through the PCI bridge to the physical
    address range which the PCI bridge has mapped to your device and then
    expecting the PCI bridge to route those DMA requests to your device.
    This means the data will be moving from your PCI card to the MegaRaid
    PCI card without going through the System RAM or the CPU.

    Alternative scenario B:

    +-----------------------+      +-----------------------+
    | Your device    |||    |      |   |||     MegaRaid    |     ~~~~
    |               -DMA-   |      |  -DMA->>>>>>>>>>>>>>>>>>>>>(DISK)
    |                |||    |      |   |||                 |     ~~~~
    |                 V     |      |    ^                  |
    +-------+         V  +--+      +--+ ^          +-------+
            +---------V--+            +-^----------+
                      V                 ^
                    ||V|||||||||||||||||^||
                    - V   PCI bridge    ^ -
                    ||V|||||||||||||||||^||
                      V                 ^
                    |||||||||||||||||||||||
                    -     System RAM      -
                    |||||||||||||||||||||||

    If there is no memory mapped RAM chip on your device, but your device
    has a DMA controller of its own copying data to system RAM, then there
    will always be two DMA operations, one by your DMA controller and one by
    the MegaRaid DMA controller. The trick is to avoid also spending time
    copying the data from one place in System RAM to another place in system
    RAM.

    Alternative scenario C:

    +-----------------------+      +-----------------------+
    | Your device   ||||    |      |   |||     MegaRaid    |     ~~~~
    |              -Dumb-   |      |  -DMA->>>>>>>>>>>>>>>>>>>>>(DISK)
    |               ||||    |      |   |||                 |     ~~~~
    |                 V     |      |    ^                  |
    +-------+         V  +--+      +--+ ^          +-------+
            +---------V--+            +-^----------+
                      V                 ^
                    ||V|||||||||||||||||^||
                    - V   PCI bridge    ^ -
                    ||V|||||||||||||||||^||
                      V                 ^
                     |||                ^
                    -CPU-               ^
                     |||                ^
                      V                 ^
                    |||||||||||||||||||||||
                    -     System RAM      -
                    |||||||||||||||||||||||

    If there is no memory mapped RAM chip and no DMA controller on your
    device, and your device is not a SCSI device connected to the same
    MegaRaid controller as the disk, then there is no way to avoid having
    some piece of code related to your device processing every byte of data
    just to get it into RAM in the first place, and you should initially
    focus on optimizing that code.


    --
    Jakob Bøhm, M.Sc.Eng. * * direct tel:+45-45-90-25-33
    Netop Solutions A/S * Bregnerodvej 127 * DK-3460 Birkerod * DENMARK
    http://www.netop.com * tel:+45-45-90-25-25 * fax:+45-45-90-25-26
    Information in this mail is hasty, not binding and may not be right.
    Information in this posting may not be the official position of Netop
    Solutions A/S, only the personal opinions of the author.
     
    Jakob Bohm, May 6, 2009
    #10
  11. (quoting Prafulla) "I verified the file copying from SATA to megaraid,
    Observed that CPU usage ..."

    The processor has no DMA controller, and the onboard system DMA is not
    used for PCI(e), only for ISA/LPC.
    What kind of RAID do you use? RAID5?
     
    Maxim S. Shatskih, May 6, 2009
    #11
  12. (quoting Jakob Bohm) "3. Some motherboard chipset designs may have a
    limitation that prevents ..."

    All of this would be a nasty violation of the PCI spec.
     
    Maxim S. Shatskih, May 6, 2009
    #12
  13. (quoting Prafulla) "it NTFS.sys)?, How can i get the device object of
    this NTFS.sys and how can i ..."

    Call ZwCreateFile on a file, then ObReferenceObjectByHandle to get the
    FILE_OBJECT, then IoGetRelatedDeviceObject to get the DEVICE_OBJECT, and
    then send an IRP to it.

    You can also use a user app that opens the NTFS file in noncached mode
    and writes to it.
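
If the noncached route is tried, it may help to keep in mind that unbuffered file I/O on Windows generally expects the buffer address, the file offset and the transfer length all to be aligned to the volume sector size. The following is only a sketch of that check in portable C; `is_noncached_transfer_ok` is a hypothetical helper, and the sector size passed to it is an assumption that should be queried from the real volume.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: returns true when a transfer satisfies the usual
   alignment rules for noncached I/O on a volume whose sector size is
   `sector_size` (assumed to be a power of two, e.g. 512 or 4096). */
bool is_noncached_transfer_ok(const void *buffer, uint64_t file_offset,
                              uint32_t length, uint32_t sector_size)
{
    /* The mask trick below only works for power-of-two sector sizes. */
    assert(sector_size != 0 && (sector_size & (sector_size - 1)) == 0);

    uint64_t mask = (uint64_t)sector_size - 1;

    return ((uint64_t)(uintptr_t)buffer & mask) == 0 /* buffer address aligned   */
        && (file_offset & mask) == 0                 /* offset on a sector bound */
        && length != 0
        && ((uint64_t)length & mask) == 0;           /* whole number of sectors  */
}
```

A check like this before each write makes misaligned requests fail early in your own code instead of coming back as an invalid-parameter error from the storage stack.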
     
    Maxim S. Shatskih, May 6, 2009
    #13
  14. Prafulla

    Prafulla Guest

    Hi Jakob,

    My PCI device corresponds to scenario A: it has some RAM, and that RAM is
    mapped to a kernel virtual address.
    You mentioned giving this pointer to the second device's DMA controller.
    I was actually trying this method by establishing remote I/O target
    communication from driver 1 to driver 2, so that I can pass a pointer and
    the second device's (Megaraid) DMA controller takes it and completes the
    transaction.

    Driver 2 is the Megaraid driver and follows the SCSI port model:
    file system filters ==> file system ==> volume manager ==> disk driver ==>
    SCSI port driver ==> miniport driver

    The problem is: from device 1, to which driver should I provide the
    pointer address so that it passes the pointer on to the miniport driver's
    DMA object?

    I tried sending it directly from the device 1 driver to the miniport
    driver (Megaraid). I sent a few commands, such as IOCTL_SCSI_GET_ADDRESS,
    IOCTL_SCSI_GET_INQUIRY_DATA and PASS_THROUGH_DIRECT, but only the inquiry
    data request returned a result; none of the others provided any details.
    The miniport driver doesn't support any read/write/IOCTL operations for
    doing DMA (in case I want to write the data to a raw disk without any
    file system on it).

    Since this was not successful, I thought I should send the commands to a
    slightly higher layer. Which one will accept my requests and forward them
    to the lower level driver?
    I am also concerned that if I send the data to the upper level drivers,
    the bandwidth used will be doubled, i.e. PCI ==> host ==> PCIe.

    Please let me know your thoughts.

    Thanks,
    Prafulla
     
    Prafulla, May 7, 2009
    #14
  15. (quoting Prafulla) "My pci device will be referred to Scenario A, It has
    some RAM memory and it ..."

    Try IoAllocateMdl + MmBuildMdlForNonPagedPool, and pass this MDL down to
    the storage stack.
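
An MDL describes a virtual buffer as a list of the physical pages behind it, so what matters for it is how many pages the buffer touches, not just its length. The sketch below is a user-mode C rendering of the arithmetic behind the WDK macro ADDRESS_AND_SIZE_TO_SPAN_PAGES; `span_pages` and the `SKETCH_` constants are hypothetical names for this example, and the 4 KiB page size is an assumption (it holds on x86/x64 Windows).

```c
#include <stdint.h>

/* Assumed page size: 4 KiB. */
#define SKETCH_PAGE_SHIFT 12u
#define SKETCH_PAGE_SIZE  (1u << SKETCH_PAGE_SHIFT)

/* Hypothetical user-mode equivalent of ADDRESS_AND_SIZE_TO_SPAN_PAGES:
   the number of pages touched by the byte range [va, va + size). */
uint64_t span_pages(uint64_t va, uint64_t size)
{
    uint64_t offset_in_page = va & (SKETCH_PAGE_SIZE - 1);
    return (offset_in_page + size + SKETCH_PAGE_SIZE - 1) >> SKETCH_PAGE_SHIFT;
}
```

A 4096-byte buffer that starts on a page boundary spans one page, while the same buffer starting 2048 bytes into a page spans two, which is one more reason to keep device buffers page aligned where possible.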
     
    Maxim S. Shatskih, May 7, 2009
    #15
  16. Prafulla

    Prafulla Guest

    Hi,
    I have a few queries on your suggestions. Could you please explain?

    1) "MDL down to the storage stack":
    file system filters ==> file system ==> volume manager ==> disk driver ==>
    SCSI port driver ==> miniport driver
    Query 1: Could you please point out which driver you are referring to,
    i.e. driver 1 to which driver?

    Yesterday, you suggested the following:
    2) ZwCreateFile on a file, then ObReferenceObjectByHandle to get
    FILE_OBJECT, then IoGetRelatedDeviceObject to get DEVICE_OBJECT, then send
    an IRP to them.
    Query 2: Using the above method, I can get the "I/O target". Can I then
    use these read/write routines to send the data to the device:
    WdfIoTargetFormatRequestForRead and WdfIoTargetFormatRequestForWrite?
    If I issue this read/write to the NTFS.sys device object, will the data be
    written to the file object mentioned above? How does the device object
    know where to write?

    3) "You can also use a user app which will open the NTFS file in noncached
    mode and write to it"
    Query 3: I tried creating an MDL for my device memory and passed the user
    virtual address to an application, and from there I wrote to the Megaraid
    volume using WriteFile.
    Observation: This also used the CPU, but less than the ZwWriteFile method
    from the kernel.
    The MDL improved the performance.
    Is this the method you are referring to?

    4) "What is the RAID kind you use? RAID5?" It will be RAID0; we are not
    looking for redundancy.

    5) "onboard System DMA is not used for PCI(e), only for ISA/LPC."
    Query 5: When I used ZwWriteFile from device memory to the Megaraid disk,
    CPU usage reached 100%, so in this case the CPU performed the transfer and
    not the Megaraid DMA controller.
    When I used ZwWriteFile from RAM to the Megaraid disk, there was no CPU
    usage, so the Megaraid DMA controller would have performed the data
    transfer.

    Query 6: Currently, I am simulating the hardware environment using two
    different cards, PCI and PCIe (the Megaraid volume). Here the two are in
    different address spaces.
    In the actual hardware, both devices will be PCIe, connected to a PCIe
    switch and to the host.
    In this case, if a transfer is initiated from PCIe-1 to PCIe-2, does the
    transfer take place just between these two (PCIe-1 ==> PCIe switch ==>
    PCIe-2), or does it go PCIe-1 ==> PCIe switch ==> host ==> PCIe-2?
    I suspect that ZwWriteFile should not eat CPU in this case, unlike in the
    simulated environment, because both devices fall in the same address
    space, but I am not sure.

    Please let me know your thoughts.
    Sorry for so many queries and the long mail.

    Thanks,
    Prafulla
     
    Prafulla, May 7, 2009
    #16
  17. (quoting Prafulla) "5) onboard System DMA is not used for PCI(e), only
    for ISA/LPC."

    Try allocating a buffer in ordinary RAM, memcpy from device memory to
    this buffer, and then write it to the Megaraid controller.

    BTW, can your device do DMA? If yes, then that is a better way than using
    on-device memory.
     
    Maxim S. Shatskih, May 7, 2009
    #17
  18. Prafulla

    Jakob Bohm Guest

    I think the whole point of the exercise is to completely eliminate the
    extra copy in system RAM by doing direct DMA from HisDevice device RAM
    to the MegaRaid SCSI bus, by passing a physical address in the HisDevice
    memory range (disguised in an MDL or as an equivalent kernel range
    virtual address) to the MegaRaid driver.

    To do this the OP needs to:

    1. Figure out (at runtime!) which kind of memory address needs to be
    passed to IRP_MJ_WRITE of the top driver corresponding to the FILE_OBJECT
    obtained from ZwCreateFile() or from user mode. This is indicated by
    the DO_ flags in the device object at the top of the stack; the same
    device object also gives the size (in stack levels) of the IRP he needs
    to allocate.

    2. Figure out which way of submitting IRP_MJ_WRITE IRPs to that driver
    would actually cause the physical address of HisDevice to make it
    all the way to the MegaRaid DMA hardware, without any of the
    intermediate drivers deciding to copy the data to ordinary RAM before
    passing it on down the stack.

    #2 can be extra difficult if there is an on-access antivirus program
    installed (even if it is logically disabled).
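
Step 1 above can be sketched roughly as follows. This is a user-mode C model, not driver code: the two flag values are the ones defined in the WDK header wdm.h, redefined locally so the sketch builds anywhere, while `io_method` and `classify_io_method` are hypothetical names. In a real driver the flags would be read from the `Flags` field of the top DEVICE_OBJECT, and the IRP size from its `StackSize` field.

```c
#include <stdint.h>

/* Values as defined in wdm.h; redefined here only so this sketch
   compiles outside the WDK. */
#ifndef DO_BUFFERED_IO
#define DO_BUFFERED_IO 0x00000004u
#endif
#ifndef DO_DIRECT_IO
#define DO_DIRECT_IO   0x00000010u
#endif

/* Hypothetical enum naming the three ways a write buffer can be described. */
typedef enum {
    IO_METHOD_BUFFERED, /* data copied to a system buffer (Irp->AssociatedIrp.SystemBuffer) */
    IO_METHOD_DIRECT,   /* buffer described by an MDL (Irp->MdlAddress) */
    IO_METHOD_NEITHER   /* raw caller address passed through (Irp->UserBuffer) */
} io_method;

/* Hypothetical helper: decide which buffer description the top device
   expects. Checking DO_BUFFERED_IO first when both bits are set is an
   assumption of this sketch; drivers are expected to set at most one. */
io_method classify_io_method(uint32_t device_flags)
{
    if (device_flags & DO_BUFFERED_IO)
        return IO_METHOD_BUFFERED;
    if (device_flags & DO_DIRECT_IO)
        return IO_METHOD_DIRECT;
    return IO_METHOD_NEITHER;
}
```

Only the direct and neither cases leave room for the device memory address to travel down the stack unchanged; with buffered I/O the data is copied into system RAM before the driver sees it, which is exactly the extra copy this exercise is trying to avoid.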

    --
    Jakob Bøhm, M.Sc.Eng. * * direct tel:+45-45-90-25-33
    Netop Solutions A/S * Bregnerodvej 127 * DK-3460 Birkerod * DENMARK
    http://www.netop.com * tel:+45-45-90-25-25 * fax:+45-45-90-25-26
    Information in this mail is hasty, not binding and may not be right.
    Information in this posting may not be the official position of Netop
    Solutions A/S, only the personal opinions of the author.
     
    Jakob Bohm, May 7, 2009
    #18
  19. (quoting Jakob Bohm) "I think the whole point of the exercise is to
    completely eliminate the ..."

    The whole point of the exercise is to optimize the performance. So, if
    DMA device->RAM followed by DMA RAM->disk controller is faster, then this
    is the way to go.

    I don't know how his PCI hardware manages the read/write cycles to its
    device memory. Maybe it is suboptimal in this case, and employing its
    busmaster DMA facility (if it has one) will be faster.
    Just pass the mapped device memory (from MmMapIoSpace) to ZwWriteFile.
    A machine with major perf requirements just plain cannot have this
    configuration :).
     
    Maxim S. Shatskih, May 7, 2009
    #19
