RE: Read Speed
Hi Richard,
Point taken! As I wrote, this is *NOT* what I use or recommend for
production software. I was more trying to get Andre (literally) up to speed
in his lab testing.
Anyhow, in my specific case, the WB mapping is ok as the memory buffer is
only written by the CPU itself, thus no stale data. Device registers are
mapped in another, non-cached region. The main reason I tested this was
to observe device behavior under burst reads... (And I'm not using the
entire 4MB, but rather two 64kB regions, which were mapped into the 4MB
window because they belong to separate PCI devices.)
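For anyone who wants to repeat this kind of test, here is a minimal
user-space sketch of what I mean (purely illustrative: the physical
address and window size depend on your card, and it assumes an MTRR like
the one from my earlier posting is already in place):

    #include <fcntl.h>
    #include <stdint.h>
    #include <stdio.h>
    #include <string.h>
    #include <sys/mman.h>
    #include <unistd.h>

    #define BAR_PHYS 0xd9000000UL  /* illustrative: must match your card's BAR */
    #define WIN_SIZE 0x10000UL     /* exercise a single 64kB region */

    int main(void)
    {
        static uint8_t buf[WIN_SIZE];
        int fd = open("/dev/mem", O_RDWR);
        if (fd < 0) {
            perror("open /dev/mem");
            return 1;
        }
        /* The caching policy comes from the MTRR that covers BAR_PHYS,
         * not from mmap() itself.  (On 32-bit hosts, build with
         * -D_FILE_OFFSET_BITS=64 so the high offset fits in off_t.) */
        void *win = mmap(NULL, WIN_SIZE, PROT_READ | PROT_WRITE,
                         MAP_SHARED, fd, (off_t)BAR_PHYS);
        if (win == MAP_FAILED) {
            perror("mmap");
            return 1;
        }
        /* CPU-driven read: with a write-back MTRR the processor fetches
         * whole cache lines, which show up as burst reads on PCI. */
        memcpy(buf, win, WIN_SIZE);
        munmap(win, WIN_SIZE);
        close(fd);
        return 0;
    }
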
Hopefully this forum is knowledgeable enough to read my first posting, even
if it lacked extensive warnings of the type "don't try this at home".
Regards,
- Olaf
> -----Original Message-----
> From: Richard Walter [mailto:rwalter@brocade.com]
> Sent: Monday, May 07, 2001 7:49 PM
> To: 'Olaf Birkeland'; pci-sig@znyx.com
> Cc: Andre David
> Subject: RE: Read Speed
>
>
> Olaf (and others),
>
> I think that you need to be careful when doing this. By setting your
> device's address space to write-back cacheable in the processor, you now
> have a potential consistency problem.
>
> Because the processor can now cache your PCI device's memory space,
> when you do a read, you have no idea if that read is actually getting
> to the PCI device or if the CPU is returning data from its own cache.
> (And with write-back cacheable, the same is true of writes, which might
> get stuck in the cache for an undetermined period of time.)
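>
> To make that concrete, a tiny sketch of the read side (my illustration;
> "dev" is assumed to be a write-back-cached mapping of the device's BAR):
>
>     #include <stdint.h>
>
>     /* dev: hypothetical write-back-cached pointer into the device BAR */
>     uint32_t read_status_twice(volatile uint32_t *dev)
>     {
>         uint32_t a = dev[0];  /* miss: a real PCI read; line is now cached */
>         /* ... the device updates its status register here ... */
>         uint32_t b = dev[0];  /* hit: served from the CPU cache */
>         return a ^ b;         /* 0, no matter what the device did */
>     }
>
> The volatile keeps the compiler honest, but it can't keep the cache honest.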
>
> How safe you are depends on how you are accessing your device. You've
> mapped 4 MB of space using the MTRR, and I don't know of any Intel
> processor with 4 MB of cache (but I've not been deeply involved with
> Intel CPUs in about 1 1/2 years), so if your code is simply looping
> through the 4MB window, then you're probably not seeing stale data:
> by the time you come back to the same address, it has been pushed out
> by other data elsewhere in the block.
>
> If you are only using a smaller portion of the window (say, 64K-128K),
> then I can almost guarantee that some of the data that you think you
> are reading from the device will be stale.
>
> Also, you're thrashing your cache pretty badly doing this, and while
> the PCI transfer rate is higher, your CPU execution has probably slowed
> down.
>
> If you set an MTRR as write-back for control & status registers, then
> you are really in trouble. Reading status will return old values from
> the cache and will not be updated properly. Control writes will stick
> in the cache as modified lines and won't actually affect your device
> until they are pushed out. And when they are pushed out, they will go
> out as a full 32-byte cache line, and some PCI devices don't like
> having a burst write to their control registers.
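>
> (For what it's worth, the usual way out in a Linux driver is to map the
> registers uncached and touch them a dword at a time; a rough 2.4-style
> sketch with made-up offsets, not anyone's actual driver:)
>
>     #include <linux/kernel.h>
>     #include <linux/errno.h>
>     #include <linux/pci.h>
>     #include <asm/io.h>
>
>     #define CTRL_REG   0x00   /* hypothetical register offsets */
>     #define STATUS_REG 0x04
>
>     static void *regs;
>
>     static int attach_regs(struct pci_dev *pdev)
>     {
>         unsigned int status;
>
>         /* Uncached mapping: every readl()/writel() is a single dword
>          * transaction on PCI -- no bursts, no stale cache lines. */
>         regs = ioremap_nocache(pci_resource_start(pdev, 0), 0x1000);
>         if (!regs)
>             return -ENOMEM;
>
>         writel(0x1, regs + CTRL_REG);      /* reaches the device now */
>         status = readl(regs + STATUS_REG); /* a real PCI read every time */
>         printk(KERN_INFO "status = 0x%x\n", status);
>         return 0;
>     }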
>
> In summary, this is probably ok to do for development purposes, if you
> are careful about its use. I don't think that anyone should ever ship a
> product using this solution.
>
> And anyone else following this thread, please don't just turn on an
> MTRR and expect wonderful performance enhancements with no consequences.
> Think about what you are doing and whether it is safe.
>
> My .02...
>
> -Richard Walter
> Senior Hardware Engineer
> Brocade Communications Systems
> rwalter@brocade.com
> Note: I speak for myself, not for Brocade Communications.
>
>
> -----Original Message-----
> From: Olaf Birkeland [mailto:Olaf.Birkeland@fast.no]
> Sent: Monday, May 07, 2001 2:43 AM
> To: pci-sig@znyx.com
> Cc: Andre David
> Subject: RE: Read Speed
>
>
> Hi,
>
> We've tried this as well (although we of course use DMA for real
> application SW ;-), but with better success. In Linux, we added an MTRR
> with the command:
>
> echo "base=0xd9000000 size=0x400000 type=write-back">/proc/mtrr
>
> This setting of course depends on where your card is mapped and on
> whether your memory is prefetchable or not. In our case, we reached
> almost 50 MB/s reading from the CPU with this setup (8-dword bursts),
> including some added latency, as the target device is behind a PCI
> bridge. We used memcpy() for reading the memory.
>
> MTRRs are only available on the PentiumII and upwards. The test system
> was a 733 MHz PIII on an i815E chipset. But again: memory transfer by
> CPU reads is not advisable...
>
> Regards,
> - Olaf Birkeland
>
> > -----Original Message-----
> > From: adavid@pcru00.cern.ch [mailto:adavid@pcru00.cern.ch] On Behalf Of
> > André David
> > Sent: Thursday, April 05, 2001 17:19
> > To: pci-sig@znyx.com
> > Subject: Read Speed
> >
> >
> > Hi!
> >
> > I am working in a group developing a PCI board for data acquisition.
> > Since our priority is getting it running, busmastering capabilities
> > for such things as DMA to host memory are not on the front line of
> > development.
> >
> > Now, since I'm the guy behind the device driver, I have done some
> > benchmarking with a simple device driver (in Linux, of course) using
> > a standard PCI VGA adapter.
> > This "driver" just uses memcpy() to transfer some data between the
> > main memory and the board's framebuffer.
> >
> > I have tried three different processor/chipset combinations, and the
> > results I get are:
> >
> > (results after BIOS and MTRR parameters tweaking)
> >
> > Chipset (CPU)              Reading (Mbyte/s)   Writing (Mbyte/s)
> > Intel 440FX (PII@233)            7.03               36.16
> > Intel 440BX (2*PII@400)          8.62              102.4
> > VIA KT133 (Athlon@900)           7.46              119.6
> >
> > Now this points to a pattern in which the north bridge seems unable
> > to read from the board at a reasonable speed. I know writing is
> > always easier than reading (per the specs, a single data phase read
> > is slower than a single data phase write: 4 clock cycles vs. 3).
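> > To put numbers on that (assuming a 33 MHz, 32-bit bus): a 3-clock
> > single write tops out at 33 MHz / 3 * 4 bytes = 44 Mbyte/s, and a
> > 4-clock single read at 33 Mbyte/s.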
> >
> > The north bridge behaviour is inadmissible even if we assume that
> > all the reads are single data phase reads (4 clock cycles), with
> > medium devsel (1 more clock cycle lost) and a wait-state from the
> > VGA board (another clock cycle lost): that would give a total of 6
> > clock cycles per dword, i.e. 33 MHz / 6 * 4 bytes = 22 Mbyte/s total
> > bandwidth.
> >
> > So my questions are:
> >
> > - Since it looks like north bridges have always been like this, has
> > anyone found one that is not?
> > - Is it admissible (logical) that the north bridge is like this?
> > - Since I have only talked about commodity PC's, could there be
> > something on the industrial market that does not suffer from this
> > apparent "feature"?
> >
> > Thanks in advance for all comments,
> >
> > Andre David
> >
>
>