Andre,
I don't have much (read: none) experience with north bridges per se, but I
can tell you what we've seen in our embedded work. The read performance
numbers you quote don't seem too far out of line with what I'd expect for
single-beat reads from any PCI part. In other words, what you're seeing is
endemic to PCI; I think you're missing some clock cycles in your
calculations. When the Master (the processor) wishes to do a PCI read, the
Master sends the address, waits for DEVSEL, then gets tons of wait states
while the Target's (the VGA board) back-end fetches the data from whatever
RAM is back there. That adds an entire memory fetch cycle into your wait
state time, plus the multi-cycle pipelines out and back through the Bridge
itself. The process then completes when the read data from the back-end is
actually transferred back to the Master. If the Target were a little more
intelligent, it might Retry in there somewhere instead of wait-stating the
Master, but the effect on the overall throughput for a single-Master
architecture is similar. The only ways that I know of to mitigate this
"feature" are 1) make every PCI transaction a Write (which you said was not
likely), or 2) do burst reads (which can help a lot, but still won't get
your throughput up to the PCI Write numbers).
The Write numbers are better because Writes can finish as soon as the
Target receives the data (known as Posted Writes). The Master doesn't have
to wait for the Target to write the data to the back-end before continuing.
BTW, it also appears as though your first platform is doing single-beat or
small-burst Writes, and the latter two platforms are doing big-burst
Writes (inferring from the differences in the bandwidth numbers you quote).
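
To put rough numbers on all of this, here's a back-of-the-envelope
throughput calculation (a sketch only: the 33 MHz, 32-bit bus is standard
PCI, but the clocks-per-data-phase figures are illustrative guesses, not
measurements of your parts):

#include <stdio.h>

/* Back-of-the-envelope PCI throughput: one 32-bit data phase
 * every `clocks_per_phase` bus cycles on a 33 MHz bus. */
static double mbytes_per_s(double clocks_per_phase)
{
    const double clock_hz = 33.0e6;  /* 33 MHz PCI clock */
    const double bytes    = 4.0;     /* 32-bit data phase */
    return clock_hz / clocks_per_phase * bytes / 1.0e6;
}

int main(void)
{
    /* Single-beat read: address + DEVSEL + the target's back-end
     * fetch and the bridge pipelines; call it ~16 clocks total. */
    printf("single-beat read, 16 clk: %6.1f Mbyte/s\n", mbytes_per_s(16.0));
    /* Posted single-beat write: ~4 clocks, no back-end wait. */
    printf("posted write,      4 clk: %6.1f Mbyte/s\n", mbytes_per_s(4.0));
    /* Long burst: startup latency amortized, ~1 clock per phase. */
    printf("long burst,        1 clk: %6.1f Mbyte/s\n", mbytes_per_s(1.0));
    return 0;
}

Note that the 16-clock read estimate lands right in the 7-8.6 Mbyte/s
range you measured, and the posted-write and burst cases roughly bracket
your Write numbers.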
mike nemeth
Rockwell-Collins, Inc.
{Expressed herein is my opinion and not that of Rockwell Collins, Inc.}
André David <Andre.David@cern.ch>
Sent by: adavid@pcru00.cern.ch
04/05/01 10:18 AM
To: pci-sig@znyx.com
Subject: Read Speed
Hi!
I am working in a group developing a PCI board for data acquisition.
Since our priority is getting it running, bus-mastering capabilities for
such things as DMA to host memory are not on the front line of
development.
Now, since I'm the guy behind the device driver, I have done some
benchmarking with a simple device driver (in Linux, of course) using a
standard PCI VGA adapter.
This "driver" just uses memcpy() to transfer some data between the main
memory and the board's framebuffer.
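
Roughly, the measurement looks like this (a simplified sketch: BAR 0, an
already-allocated buffer, and the ktime calls are assumptions standing in
for whatever timer your kernel provides; error handling omitted):

#include <linux/module.h>
#include <linux/pci.h>
#include <linux/io.h>
#include <linux/ktime.h>

/* Sketch: time memcpy-style transfers to/from a PCI BAR.
 * Assumes the framebuffer sits behind BAR 0 and that `buf`
 * is a kernel buffer of `len` bytes. */
static void fb_benchmark(struct pci_dev *pdev, void *buf, size_t len)
{
        void __iomem *fb = pci_iomap(pdev, 0, len);
        ktime_t t0, t1;

        t0 = ktime_get();
        memcpy_toio(fb, buf, len);      /* host -> board: PCI writes */
        t1 = ktime_get();
        pr_info("write: %zu bytes in %lld ns\n", len,
                ktime_to_ns(ktime_sub(t1, t0)));

        t0 = ktime_get();
        memcpy_fromio(buf, fb, len);    /* board -> host: PCI reads */
        t1 = ktime_get();
        pr_info("read:  %zu bytes in %lld ns\n", len,
                ktime_to_ns(ktime_sub(t1, t0)));

        pci_iounmap(pdev, fb);
}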
I have tried three different processor/chipset combinations, and the
results I get (after BIOS and MTRR parameter tweaking) are:
                          Reading (Mbyte/s)   Writing (Mbyte/s)
  Intel 440FX (PII@233)          7.03               36.16
  Intel 440BX (2*PII@400)        8.62              102.4
  VIA KT133 (Athlon@900)         7.46              119.6
Now this points to a pattern in which the north bridge seems unable to
read from the board at a reasonable speed. I know writing is always
easier than reading (from the specs, a single data phase read is slower
than a single data phase write: 4 clock cycles vs. 3).
The north bridge behaviour is hard to justify even if we assume that all
the reads are single data phase reads (4 clock cycles), with medium
DEVSEL (1 more clock cycle lost) and a wait state from the VGA board
(another clock cycle lost), because this would give a total of 6 clock
cycles, i.e. 33 MHz x 4 bytes / 6 clocks = 22 Mbyte/s of bandwidth.
So my questions are:
- Since it looks like north bridges have always been like this, has
anyone found one that is not?
- Is it reasonable (logical) for a north bridge to behave like this?
- Since I have only talked about commodity PCs, could there be
something on the industrial market that does not suffer from this
apparent "feature"?
Thanks in advance for all comments,
Andre David