How to get 4K alignment DMA transfer for SCSI miniport and others.

Discussion in 'Windows Vista Drivers' started by flyingfly, Aug 17, 2010.

  1. flyingfly

    flyingfly Guest

    Hi,
    I am writing a SCSI miniport driver to present a RAM disk on an FPGA-based
    platform, and there are some problems I need to solve:
    1. For bus-master DMA with scatter/gather support, is there a way to
    make the start address of each physical segment aligned to 4KB? I found in
    http://support.microsoft.com/kb/153264/en-us/ that a maximum of double
    DWORD alignment is supported, but I think that may not be the real limit.
    2. Following on from question 1, can the size of each data segment be a
    multiple of 512 bytes? The WDK docs point to MaximumTransferLength and
    NumberOfPhysicalBreaks in PORT_CONFIGURATION_INFORMATION as the related
    parameters, but they do not solve my problem.
    3. In Windows XP, which SCSI commands are mandatory for the OS to use
    the RAM disk? I implemented INQUIRY, READ CAPACITY, READ, WRITE and FORMAT
    UNIT in my driver, and answer every other command with
    SRB_STATUS_INVALID_REQUEST. With this implementation, the OS fails to
    format the disk. During initialization, the OS also sends INQUIRY for vital
    product data and a MODE SENSE command, both of which I answer with
    SRB_STATUS_INVALID_REQUEST. What is wrong with my driver, and what else do
    I need to do to make the disk work properly?

    Thanks a lot!
     
    flyingfly, Aug 17, 2010
    #1

  2. > 1. For bus master DMA with scatter/gather supported, is there a way to
    > make the start address of each physical segment aligned to 4KB?

    No. The Windows DMA API cannot provide these guarantees, so the hardware must be smart enough to never require them.

    As for the mandatory commands: just look at which commands arrive at your code and fail.

    I think that at least some mode pages (MODE SENSE) are mandatory.

    FORMAT UNIT is not used by the Windows "format" command; VERIFY is used.
     
    Maxim S. Shatskih, Aug 17, 2010
    #2

  3. > Answer to your 3rd question: add the REPORT LUNS command, which is required.

    REPORT LUNS is only required IIRC if you have non-contiguous LUN numbering.

    If you have contiguous LUNs, or only LUN0 - then it is not needed.

    Well, STORPORT has probably changed the rules and made REPORT LUNS mandatory.
     
    Maxim S. Shatskih, Aug 17, 2010
    #3
  4. flyingfly

    flyingfly Guest

    So DMA scatter/gather cannot guarantee data buffer alignment beyond
    double DWORD, and the size of each segment cannot be fixed in advance.
    Obviously the hardware has to be smart.
    I now reply to MODE SENSE (page code = 0x3F) with one mode parameter block
    descriptor (number of blocks set to 0, block length set to 0x200), and I
    always reply to VERIFY with SRB_STATUS_SUCCESS.
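
For reference, the MODE SENSE reply described above (a header followed by one block descriptor) can be laid out as in the following sketch. It assumes the 6-byte MODE SENSE variant and the short 8-byte block descriptor, and it appends no mode pages. The function name `build_mode_sense6` is hypothetical, and truncation to the CDB's allocation length is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MODE_HEADER6_LEN 4u  /* mode parameter header for MODE SENSE(6) */
#define BLOCK_DESC_LEN   8u  /* one short-form block descriptor */

/* Fill buf with a MODE SENSE(6) header plus one block descriptor and
 * return the number of bytes written. No mode pages are appended. */
size_t build_mode_sense6(uint8_t *buf, size_t buf_len,
                         uint32_t num_blocks, uint32_t block_len)
{
    const size_t total = MODE_HEADER6_LEN + BLOCK_DESC_LEN;
    assert(buf != NULL && buf_len >= total);
    assert(num_blocks <= 0xFFFFFFu && block_len <= 0xFFFFFFu); /* 24-bit fields */

    memset(buf, 0, total);
    buf[0] = (uint8_t)(total - 1);  /* mode data length excludes this byte */
    buf[1] = 0;                     /* medium type */
    buf[2] = 0;                     /* device-specific parameter (bit 7 = write protect) */
    buf[3] = BLOCK_DESC_LEN;        /* block descriptor length */

    /* Block descriptor: density code, 24-bit block count, reserved byte,
     * 24-bit block length. Multi-byte fields are big-endian. */
    buf[4]  = 0;
    buf[5]  = (uint8_t)(num_blocks >> 16);
    buf[6]  = (uint8_t)(num_blocks >> 8);
    buf[7]  = (uint8_t)num_blocks;
    buf[8]  = 0;
    buf[9]  = (uint8_t)(block_len >> 16);
    buf[10] = (uint8_t)(block_len >> 8);
    buf[11] = (uint8_t)block_len;
    return total;
}
```

Calling it with `num_blocks = 0` and `block_len = 0x200` reproduces the descriptor values mentioned in the post.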

    I define a BYTE array of 16*1024*1024 elements to present a 16MB
    disk. READ and WRITE operations read from and write to this array.

    I passed initialization, but the "format" command can't be processed
    correctly.

    During debugging, the OS first writes 1 block of data (512 bytes) to
    block address 0x3F, then repeatedly sends READ CAPACITY, MODE
    SENSE (page code = 0x3F), TEST UNIT READY (I always reply with success),
    a READ of 8 blocks at block address 0x3F, and a READ of 4 blocks at block
    address 0x7F.

    After sending the above commands a number of times, the OS begins to send
    VERIFY commands, starting at block address 0x3F with a length of 512 blocks
    and increasing the address by 0x200 each time, until the last VERIFY
    command at address 0x7C3F with a length of 0x143 blocks.

    After that, the OS WRITEs 1 block of data to address 0x3F and again repeats
    READ CAPACITY, MODE SENSE, and the READs of 8 blocks at address 0x3F and
    4 blocks at address 0x7F.

    In the end, "format" finishes with a failure.

    I'm wondering what exactly the OS expects to get from the 8 blocks of
    data at address 0x3F and the 4 blocks at address 0x7F. The OS has written
    only 1 block of data to 0x3F, so what more does it want?

    There are 3 requests I did not handle:
    1. SRBs whose function is SRB_FUNCTION_IO_CONTROL. I am not quite sure what
    this function is for, and I replied with SRB_STATUS_INVALID_REQUEST.
    2. INQUIRY with EVPD set to 1. I don't think vital product data is
    mandatory, so I replied with SRB_STATUS_ERROR.
    3. MODE SENSE with page code set to 0x1C. My INQUIRY data says that the
    device supports SCSI-2, where this page code is reserved, so I also replied
    with SRB_STATUS_ERROR.

    What is the problem with my driver? Is there anything else I did not
    handle?

    Thanks a lot!
     
    flyingfly, Aug 19, 2010
    #4
  5. > So the DMA scatter/gather can not guarantee the alignment of data buffer to
    > exceed double DWORD, and the size of each segment can not be preassigned.

    Windows DMA can work in 2 modes:

    a) Double-buffering provided by the OS (HAL or pci.sys). This is used for 32-bit bus masters (not 64-bit capable, no dual address cycle) on machines with >4GB of RAM, and can also be requested by any device by setting DEVICE_DESCRIPTION::ScatterGather = FALSE.

    In these cases, the OS copies all data from your MDL to its internal page-aligned, physically contiguous buffer, and then returns to you a single-entry SGL describing _this_ buffer.

    Here the alignment is fine and there is only one segment, but at the cost of an extra memcpy() inside the DMA adapter object.

    b) Transparent mode, where the SGL is built blindly from the MDL you have provided to the DMA adapter.

    In this mode, Windows cannot control things like alignment and SGL entry size. Moreover, IIRC it can even hand you an SGL with physically adjacent entries without merging them into one larger entry.

    So, your alignment is up to the upper layers, which created the MDL in question.

    And this is logical. In mode b), Windows promises to never do an extra memcpy() and just blindly creates an SGL from the MDL. Nothing else.

    Note that you cannot convert unaligned data to aligned data without memcpy/memmove.

    So, if you need to satisfy a DMA alignment requirement, you need to:
    a) use the first mode, with a total memcpy() of all data
    OR
    b) implement your own smarter double-buffering, which will still do memcpy(), but only when really necessary
    OR
    c) influence the upper layers in Windows so that they only send you properly aligned MDLs
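
Option b) could be sketched roughly as follows: inspect the SGL, and copy through an aligned bounce buffer only when some segment violates the device's constraints. This is a user-space illustration of the decision logic under assumed constraints (a power-of-two start alignment and a segment length multiple). `dma_segment`, `sgl_needs_bounce`, and `stage_write` are hypothetical names, and a real driver would also need the bounce buffer's physical address.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

typedef struct {
    uint64_t addr;  /* physical start address of the segment */
    uint32_t len;   /* segment length in bytes */
} dma_segment;

/* Return 1 if any segment starts off the required alignment or has a
 * length that is not a multiple of len_multiple, 0 otherwise. */
int sgl_needs_bounce(const dma_segment *sgl, uint32_t count,
                     uint32_t start_align, uint32_t len_multiple)
{
    assert(sgl != NULL && len_multiple > 0);
    assert(start_align != 0 && (start_align & (start_align - 1)) == 0);
    for (uint32_t i = 0; i < count; i++) {
        if (sgl[i].addr & (start_align - 1))
            return 1;
        if (sgl[i].len % len_multiple)
            return 1;
    }
    return 0;
}

/* Choose the buffer to hand to the device for a write: the caller's data
 * when no bounce is needed, otherwise a copy in the bounce buffer. */
const void *stage_write(const void *data, size_t len, int needs_bounce,
                        void *bounce, size_t bounce_len)
{
    if (!needs_bounce)
        return data;  /* fast path: no memcpy() */
    assert(bounce != NULL && len <= bounce_len);
    memcpy(bounce, data, len);
    return bounce;
}
```

Reads would need the mirror-image step, copying from the bounce buffer back to the caller's buffer after the transfer completes.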

    The storage stack traditionally uses option c). You can provide the maximum MDL size (in pages, called NumberOfPhysicalBreaks, and in bytes, called MaximumTransferLength), and the alignment requirement in AlignmentMask.
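
The page-count and byte-count limits are related: a buffer of a given length touches one more page when it does not start on a page boundary. The sketch below computes the number of pages spanned, assuming 4KB pages. `pages_spanned` and `max_segments` are hypothetical helpers, and whether the port driver expects a count of segments or of breaks between segments should be checked against the WDK documentation.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SIZE 4096u  /* assumed page size */

/* Number of pages touched by `length` bytes that start `offset` bytes
 * into the first page. */
uint32_t pages_spanned(uint32_t offset, uint32_t length)
{
    assert(offset < PAGE_SIZE);
    assert(length <= 0xFFFFFFFFu - 2u * PAGE_SIZE);  /* avoid overflow */
    return (offset + length + PAGE_SIZE - 1) / PAGE_SIZE;
}

/* Worst case over all start offsets for a given maximum transfer
 * length: the buffer starts on the last byte of a page. */
uint32_t max_segments(uint32_t max_transfer_length)
{
    if (max_transfer_length == 0)
        return 0;
    return pages_spanned(PAGE_SIZE - 1, max_transfer_length);
}
```

For example, a 64KB transfer spans 16 pages when page-aligned but 17 when it starts at a merely sector-aligned offset such as 512.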

    The main client of the storage stack is the low-level noncached paths of the FSD, and they guarantee that everything is sector-aligned. Is this enough for you?

    To bypass this guarantee, one must either use SCSI passthrough, or use Win32 noncached IO on a usual disk file _with the buffer pointer not aligned_, which the Win32 docs explicitly forbid (and correctly say that this can work or fail, depending on storage controller hardware details).

    In these cases, it is possible for the storage port to see non-sector-aligned MDLs.

    From practice, I can say that alignment is indeed an issue in CD/DVD burning code (to avoid sending a misaligned MDL to the port, which the port will immediately fail), and that atapi/aic78xx/usbstor/sbp2port all have different alignment requirements.

    As for the failing "format": this is usually the sign that Disk.sys failed some important IOCTL. You can trace them all until the failure, and then look at the Disk source to see how these IOCTLs are mapped to SCSI commands.
     
    Maxim S. Shatskih, Aug 19, 2010
    #5
  6. flyingfly

    flyingfly Guest

    Thank you very much for your detailed explanation; the alignment
    requirements are becoming clearer to me.

    So in a), the buffer shown to me will be page-aligned (4K), which is good for
    me. b) is too complex to handle, and c) can only guarantee an alignment of
    double DWORD, which is not sufficient.

    I have another question. For a SCSI miniport driver,
    ScsiPortGetUncachedExtension cannot allocate more than 100KB of uncached
    extension in XP and earlier OSes.
    If I need more than 100KB of space for data and control information shared
    between the host and the adapter, can I just define the data structures in
    the DeviceExtension?
    Will the OS guarantee that those data structures have a fixed address in
    memory and are physically contiguous, so that I can provide their physical
    memory address to the adapter?

    Thanks again!
     
    flyingfly, Aug 25, 2010
    #6
  7. > If I need more than 100KB space for shared data and control information
    > between host and adapter, can I just define the data structures in
    > DeviceExtension?

    No, they will not be DMAable.

    Simplify your "shared data and control information" to not require such a huge common buffer.
     
    Maxim S. Shatskih, Aug 25, 2010
    #7