PCI express payload mystery

Discussion in 'Windows Vista Drivers' started by theo, Oct 11, 2006.

  1. theo

    theo Guest

    We have got our WDF driver up and running with an 8 lane PCI express
    card. It uses hardware scatter-gather DMA and appears to be working
    well. However we are only getting about 50% of the theoretical
    bandwidth over 8 lanes (1GB/s instead of 2GB/s). Unfortunately we need
    to hit around 60%. The 50% figure doesn't include driver overhead; it
    is measured by the hardware during the transfer.
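
    As a rough sketch of where the 2GB/s figure comes from: it matches a
    first-generation link, assuming 2.5 GT/s per lane and 8b/10b encoding
    (neither is stated in the post, and the function name is only
    illustrative).

```python
# Assumption: first-generation PCI express signalling, 2.5 GT/s per lane.
GEN1_TRANSFERS_PER_S = 2_500_000_000


def raw_link_bandwidth(lanes, transfers_per_s=GEN1_TRANSFERS_PER_S):
    """Bytes per second in one direction after 8b/10b encoding,
    before any packet overhead is subtracted."""
    # 8b/10b encoding carries 8 data bits in every 10 bits on the wire.
    data_bits_per_s = lanes * transfers_per_s * 8 // 10
    return data_bits_per_s // 8


peak = raw_link_bandwidth(8)      # the 8 lane card from the post
measured = 1_000_000_000          # the 1GB/s reported by the hardware

print(f"peak:       {peak / 1e9:.2f} GB/s")
print(f"measured:   {measured / peak:.0%} of peak")
print(f"60% target: {0.6 * peak / 1e9:.2f} GB/s")
```

    The peak here is the raw link rate only; packet headers, flow control
    and acknowledgements all come out of it.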

    One of the main factors that can affect PCI express efficiency is the
    payload size. From what I have read the maximum payload is 4k, which is
    supposed to give close to 100%. For 128 byte payloads with an estimated
    application overhead it's around 55%, and for 64 bytes it is around
    40%. Since we are not measuring the overhead on our AMD platform, it is
    feasible that we're getting 64 byte payloads (our card supports 4k).
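
    The dependence on payload size can be sketched with a simple per-packet
    model. The 20 bytes of overhead per packet is an assumption (framing,
    sequence number, a 3-dword header and the link CRC on a
    first-generation link); a 4-dword header or an optional end-to-end CRC
    would add more, and the function name is hypothetical.

```python
def tlp_efficiency(payload_bytes, overhead_bytes=20):
    """Fraction of wire bytes that are payload for a single packet.

    overhead_bytes is an assumed per-packet cost; it ignores flow control
    updates, acknowledgements and any application-level overhead.
    """
    return payload_bytes / (payload_bytes + overhead_bytes)


for size in (64, 128, 256, 4096):
    print(f"{size:5d} byte payload: {tlp_efficiency(size):.1%}")
```

    This gives an upper bound per packet only, so it comes out higher than
    the 40% and 55% quoted above, which also fold in an estimated
    application overhead.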

    A few things we have noticed so far:
    ATI X1900 XT has a maximum payload size of 128 (according to SiSoft
    Intel 965 chipset has a maximum payload size of 128 (according to the
    data sheet).
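
    For checking what a given platform actually negotiated, the payload
    and read request sizes are stored as 3-bit codes in the PCI express
    Device Capabilities and Device Control registers, where code 0 means
    128 bytes and each step doubles it up to 4096. A minimal decoding
    sketch follows; the function names are hypothetical, and reading the
    raw register values out of config space is left out.

```python
def decode_size(code):
    """Convert a 3-bit size code to bytes (0 -> 128 ... 5 -> 4096)."""
    if not 0 <= code <= 5:
        raise ValueError(f"reserved size encoding: {code}")
    return 128 << code


def max_payload_supported(device_capabilities):
    # Max_Payload_Size Supported: bits 2:0 of Device Capabilities.
    return decode_size(device_capabilities & 0x7)


def max_payload_in_use(device_control):
    # Max_Payload_Size: bits 7:5 of Device Control.
    return decode_size((device_control >> 5) & 0x7)


def max_read_request(device_control):
    # Max_Read_Request_Size: bits 14:12 of Device Control.
    return decode_size((device_control >> 12) & 0x7)


# Example Device Control value, chosen for illustration only.
devctl = 0x2810
print(max_payload_in_use(devctl), max_read_request(devctl))
```

    The smallest payload limit these fields can express is 128 bytes, so a
    64 byte limit would not show up here as a Max_Payload_Size setting.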

    Here are a couple of questions I hope someone can shed some light on:

    1) On an AMD system PCI express is interfaced by HyperTransport which
    is packet based, and has a maximum payload size of 64 bytes. Does this
    mean its PCI express interface is also limited to 64 byte payloads?

    2) Does anyone know of a motherboard chipset that supports large
    payloads (4k)?

    Apologies if this is slightly off topic, but I guessed it is something
    that will affect anyone writing a PCI express driver.
    theo, Oct 11, 2006

  2. theo

    Tim Roberts Guest

    It has been my experience that the PCI express bus overhead (ping,
    keepalives, credit exchanges) consumes about 50% of the bus bandwidth. We
    were able to sustain only about 50% of a PCI express x1 endpoint -- about
    I don't think there is very much that a driver-writer can do about this. As
    you say, this is almost entirely a root complex chipset issue.
    Tim Roberts, Oct 12, 2006

  3. theo

    theo Guest

    Thanks Tim, that is very interesting.

    It is a shame PCI express hasn't really been implemented as well as it
    could be regarding the payload sizes (on the current generation

    E.g. comparing bandwidth of PCI express with AGP the picture roughly
    looks like this (based on our AMD test platform):

    AGP x4 = PCI express 8 lane
    AGP x8 = PCI express 16 lane

    I have managed to find an Intel chipset which supports 256 byte
    payloads (it's limited to 8 lanes though).

    As a reminder, if anyone can confirm whether AMD systems are inherently
    limited to 64 byte PCIe payloads due to HyperTransport I will be very
    theo, Oct 16, 2006
  4. Read or write?
    already5chosen, Oct 16, 2006
  5. theo

    theo Guest

    For those benchmarks our device was just reading data from system
    theo, Oct 16, 2006
  6. First, I should say that reading at 1GB/s is not bad at all.
    The rest of my post is based on PCI experience. I don't know if the
    same applies to PCIe.

    For regular PCI on Opteron-based systems the PCI Read command code,
    i.e. "Memory Read" vs. "Memory Read Line" vs. "Memory Read Multiple"
    makes a big difference.
    Of course, plain Memory Read sucks - it should to according to PCI
    However "Memory Read Line" also doesn't perform all that great - it
    seems Opteron treats this code literally and delivers exactly one cache
    line per burst which happens to be 64 bytes.
    So "Memory Read Multiple" was the only way we were able to achieve good
    bus utilization on Opteron systems.

    Once again, I have no experience with PCIe and have never read the PCIe
    spec, so quite likely all I said doesn't apply to your case.
    already5chosen, Oct 16, 2006
  7. theo

    Tim Roberts Guest

    Even so, there are huge advantages to PCIe: (1) that connection is
    point-to-point, not shared with other devices, (2) the PCIe x16 board only
    needs 82 pins, whereas AGPx8 needs 132 pins; and (3) AGP had just about
    reached its theoretical speed limit, whereas Intel already has PCI express
    running at 8Gbit in their labs, and expects to push it well beyond that.
    Tim Roberts, Oct 18, 2006
