RE: Read Speed
Olaf (and others),
I think that you need to be careful when doing this. By setting your
device's address space to write-back cacheable in the processor, you now
have a potential consistency problem.
Because the processor can now cache your PCI device's memory space, when you
do a read, you have no idea if that read is actually getting to the PCI
device or if the CPU is returning data from its own cache. (And with
write-back cacheable, the same is true of writes, which might get stuck in
the cache for an undetermined period of time.)
How safe you are depends on how you are accessing your device. You've
mapped 4 MB of space using the MTRR and I don't know of any Intel processor
with 4 MB of cache (but I've not been deeply involved with Intel CPUs in
about 1 1/2 years), so if your code is simply looping through the 4MB
window, then you're probably not seeing stale data, because by the time
you come back to the same address again it has been pushed out by some other
data elsewhere in the block.
If you are only using a smaller portion of the window (say, 64K-128K), then
I can almost guarantee that some of the data that you think you are reading
from the device will be stale.
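
A toy model may make the window-size argument concrete. The sketch below is an illustration only, not a model of any real Intel cache: the direct-mapped layout, the 8-line cache, the 1 KB device window, and all the toy_* names are assumptions made up for this example. Only the 32-byte line size comes from the discussion in this message.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define LINE_BYTES  32u    /* cache line size, as mentioned in this message */
#define CACHE_LINES 8u     /* toy cache of 8 lines = 256 bytes (assumption) */
#define DEV_BYTES   1024u  /* toy device window (assumption) */

static uint8_t dev_mem[DEV_BYTES];   /* what the device really holds */

struct toy_line { int valid; uint32_t tag; uint8_t data[LINE_BYTES]; };
static struct toy_line cache[CACHE_LINES];

static void toy_reset(void) {
    memset(dev_mem, 0, sizeof dev_mem);
    memset(cache, 0, sizeof cache);
}

/* Read one byte through the direct-mapped toy cache. */
static uint8_t toy_read(uint32_t addr) {
    assert(addr < DEV_BYTES);
    uint32_t line_no = addr / LINE_BYTES;
    struct toy_line *l = &cache[line_no % CACHE_LINES];
    if (!l->valid || l->tag != line_no) {
        /* Miss: fetch a whole line from the device. */
        memcpy(l->data, &dev_mem[line_no * LINE_BYTES], LINE_BYTES);
        l->tag = line_no;
        l->valid = 1;
    }
    /* Hit: the device is never consulted. */
    return l->data[addr % LINE_BYTES];
}

/* The device changes its own memory; the CPU cache is not told. */
static void toy_device_update(uint32_t addr, uint8_t value) {
    assert(addr < DEV_BYTES);
    dev_mem[addr] = value;
}

/* Read address 0, let the device change it, walk the rest of a window of
 * `window` bytes, then read address 0 again.
 * Returns 1 if the second read saw the old value, 0 if it saw the new one. */
static int toy_second_read_is_stale(uint32_t window) {
    assert(window <= DEV_BYTES);
    toy_reset();
    (void)toy_read(0);
    toy_device_update(0, 0xAB);
    for (uint32_t a = LINE_BYTES; a < window; a += LINE_BYTES)
        (void)toy_read(a);
    return toy_read(0) != 0xAB;
}
```

In this model, a 128-byte window fits inside the toy cache, so the second read of address 0 is served from the cache and is stale. A 1024-byte window is larger than the toy cache, so walking it evicts the line and the second read goes back to the device.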
Also, you're thrashing your cache pretty badly doing this, and while the PCI
transfer rate is higher, your CPU execution has probably slowed down.
If you set an MTRR as write-back for control & status registers then you are
really in trouble. Reading status will return old values from cache and
will not be updated properly. Control writes will stick in the cache as
modified lines and won't actually affect your device until they are pushed
out. And when they are pushed out, they will go out as a full 32-byte cache
line and some PCI devices don't like having a burst write to control
registers.
In summary, this is probably ok to do for development purposes, if you are
careful about its use. I don't think that anyone should ever ship a
product using this solution.
And anyone else following this thread, please don't just turn on an MTRR and
expect wonderful performance enhancements with no consequences. Think
about what you are doing and if it is safe.
My .02...
-Richard Walter
Senior Hardware Engineer
Brocade Communications Systems
rwalter@brocade.com
Note: I speak for myself, not for Brocade Communications.
-----Original Message-----
From: Olaf Birkeland [mailto:Olaf.Birkeland@fast.no]
Sent: Monday, May 07, 2001 2:43 AM
To: pci-sig@znyx.com
Cc: Andre David
Subject: RE: Read Speed
Hi,
We've tried this as well (although we of course use DMA for real application
SW ;-), but with better success. In Linux, we added an MTRR with the command:
echo "base=0xd9000000 size=0x400000 type=write-back">/proc/mtrr
This setting is of course dependent on where your card is mapped, and whether
your memory is prefetchable or not. In our case, we reached almost 50 MB/s
reading from the CPU with this setup (8 dword bursts), including some added
latency as the target device is behind a PCI bridge. We used memcpy() for
reading the memory.
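
For anyone wanting to repeat this kind of measurement, a timing loop around memcpy() could look roughly like the sketch below. It is not the code used for the figure above. The function names are hypothetical, 1 MB is taken as 10^6 bytes, and the source buffer is assumed to be supplied by the caller (in a real test it would be the mapped device window, which needs a driver or mmap that is not shown here).

```c
#define _POSIX_C_SOURCE 199309L  /* for clock_gettime() under strict C modes */
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <time.h>

/* Convert a byte count and an elapsed time to MB/s (1 MB = 10^6 bytes). */
static double mb_per_s(size_t bytes, double seconds) {
    assert(seconds > 0.0);
    return (double)bytes / seconds / 1e6;
}

/* Time `reps` memcpy() passes of `len` bytes from src to dst.
 * Returns the elapsed wall-clock time in seconds.
 * Note: an optimizing compiler may drop copies whose result is never used,
 * so a real benchmark should consume dst afterwards. */
static double time_memcpy(void *dst, const void *src, size_t len, int reps) {
    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (int i = 0; i < reps; i++)
        memcpy(dst, src, len);
    clock_gettime(CLOCK_MONOTONIC, &t1);
    return (double)(t1.tv_sec - t0.tv_sec)
         + (double)(t1.tv_nsec - t0.tv_nsec) / 1e9;
}
```

The rate would then be mb_per_s(len * reps, elapsed). Use enough repetitions that the elapsed time is well above the clock resolution.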
MTRR is only available on PentiumII and upwards. Test system had 733 MHz
PIII, i815E chipset. But again: memory transfers by CPU reads are not
advisable...
Regards,
- Olaf Birkeland
> -----Original Message-----
> From: adavid@pcru00.cern.ch [mailto:adavid@pcru00.cern.ch]On Behalf Of
> André David
> Sent: Thursday, April 05, 2001 17:19
> To: pci-sig@znyx.com
> Subject: Read Speed
>
>
> Hi!
>
> I am working in a group developing a PCI board for data acquisition.
> Since our priority is getting it running, busmastering capabilities for
> such things as DMA to the host memory are not on the front line of
> development.
>
> Now, since I'm the guy behind the device driver, I have done some
> benchmarking with a simple device driver (in Linux, of course) using a
> standard PCI VGA adapter.
> This "driver" just uses memcpy() to transfer some data between the main
> memory and the board's framebuffer.
>
> I have tried three different processor/chipset combinations and the
> results I get, are:
>
> (results after BIOS and MTRR parameters tweaking)
>
>                             Reading      Writing
>                             (Mbyte/s)    (Mbyte/s)
> Intel 440FX (PII@233)       7.03         36.16
> Intel 440BX (2*PII@400)     8.62         102.4
> VIA KT133 (Athlon@900)      7.46         119.6
>
> Now this points to a pattern in which the north bridge seems unable to
> read with a reasonable speed from the board. I know writing is always
> easier than reading (from the specs a single data phase read is slower
> than a single data phase write (4 clock cycles vs. 3)).
>
> The north bridge behaviour is inadmissible even if we assume that all
> the reads are single data phase reads (4 clock cycles), with even medium
> devsel (1 more clock cycle lost) and a wait-state from the VGA board
> (another clock cycle lost), because this would give a total of 6 clock
> cycles, or 22 MB/s total bandwidth.
>
> So my questions are:
>
> - Since it looks like north bridges have always been like this, has
> anyone found one that is not?
> - Is it admissible (logical) that the north bridge is like this?
> - Since I have only talked about commodity PC's, could there be
> something on the industrial market that does not suffer from this
> apparent "feature"?
>
> Thanks in advance for all comments,
>
> Andre David
>
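
The 22 MB/s estimate in the original message can be reproduced with a few lines of arithmetic. The sketch below assumes a 33 MHz, 32-bit PCI bus (4 bytes per data phase), which the message does not state explicitly, and takes 1 MB as 10^6 bytes. The function names are made up for this example.

```c
#include <assert.h>

/* Peak rate in MB/s if every transfer moves `bytes_per_transfer` bytes
 * and costs `clocks` bus clocks. */
static double pci_mb_per_s(double bus_hz, double bytes_per_transfer,
                           double clocks) {
    assert(clocks > 0.0);
    return bus_hz * bytes_per_transfer / clocks / 1e6;
}

/* Inverse: bus clocks per transfer implied by a measured rate in MB/s. */
static double implied_clocks(double bus_hz, double bytes_per_transfer,
                             double measured_mb_per_s) {
    assert(measured_mb_per_s > 0.0);
    return bus_hz * bytes_per_transfer / (measured_mb_per_s * 1e6);
}
```

Under these assumptions, pci_mb_per_s(33e6, 4, 6) gives the 22 MB/s quoted above, while implied_clocks(33e6, 4, 7.03) suggests that the measured 440FX read rate corresponds to somewhere around 19 bus clocks per 4-byte read, about three times the 6-clock estimate.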