
Re: Read Speed




Andre,

  the problem is worse than you are estimating.  First off, the host bridge
on a 486 class machine could start a read about every 6 clock cycles (i.e.
fast decode on the target device and 0 wait states).  When you move to a
Pentium class machine you get to 7 clock cycles per read.  And a
Pentium III class machine is even slower.

  Those numbers don't include any target latency.  You are going to need
to assume at least a medium decode speed, and if you are accessing DRAM or
some other off-chip memory, you should add another 5-10 clock cycles.

  So the number of clock cycles that you will roughly see for a read
access is going to be about:
	 8 (new processor limitations)
	 1 (decode speed)
	10 (memory wait states)
	= 19 clock cycles @ 33MHz => 1.7 Mdwords/second => 6.9 Mbytes/second

  Those are just rough estimates (and I didn't do the math until I had
chosen all of the clock cycle estimates).  You will notice that my
estimates are actually a bit low compared to what you are getting.
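
  For anyone who wants to plug in their own cycle counts, here is a rough
sketch of the arithmetic above.  The function names are made up for
illustration, and it assumes a nominal 33MHz clock, 4-byte dwords, one
dword per read, and that 1 Mbyte means 10^6 bytes.  It also turns the
measured read figures from the quoted message back into an implied
number of clock cycles per read.

```python
PCI_CLOCK_HZ = 33_000_000   # nominal 33MHz PCI clock (assumption)
BYTES_PER_DWORD = 4         # one 32-bit dword per single-data-phase read


def read_bandwidth(cycles_per_read, clock_hz=PCI_CLOCK_HZ):
    """Return (Mdwords/s, Mbytes/s) when every read takes cycles_per_read clocks."""
    dwords_per_s = clock_hz / cycles_per_read
    return dwords_per_s / 1e6, dwords_per_s * BYTES_PER_DWORD / 1e6


def implied_cycles(mbytes_per_s, clock_hz=PCI_CLOCK_HZ):
    """Invert the estimate: clocks per read implied by a measured read rate."""
    return clock_hz * BYTES_PER_DWORD / (mbytes_per_s * 1e6)


# Estimate from the message: 8 (processor) + 1 (decode) + 10 (wait states)
cycles = 8 + 1 + 10
mdwords, mbytes = read_bandwidth(cycles)
print(f"{cycles} cycles/read: {mdwords:.1f} Mdwords/s, {mbytes:.1f} Mbytes/s")

# Measured read rates quoted below, in Mbyte/s
measured = {"440FX": 7.03, "440BX": 8.62, "KT133": 7.46}
for chipset, rate in measured.items():
    print(f"{chipset}: about {implied_cycles(rate):.1f} cycles per read")
```

  Under those assumptions the measured rates work out to somewhere
between 15 and 19 clocks per read, which is in the same range as the
estimate.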

  If you try going to a 66MHz bus speed, your bus frequency will double,
but ALL of your latencies (counted in clock cycles) will double also, and
you will be running at the same bandwidth.
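
  A small sketch of why the faster clock does not help, under the
assumption stated above that every latency is fixed in wall-clock time
(so its cost in clock cycles scales with the bus frequency).  The
19-cycle figure is the estimate from this message; the helper name is
hypothetical.  A real system may have some latencies that do shrink with
the clock, so treat this as the pessimistic case.

```python
BYTES_PER_DWORD = 4


def bandwidth_mbytes(clock_hz, cycles_per_read):
    """Mbytes/s for single-dword reads costing cycles_per_read clocks each."""
    return clock_hz / cycles_per_read * BYTES_PER_DWORD / 1e6


# Total read latency in seconds: 19 clocks at 33MHz (estimate from the message)
latency_s = 19 / 33e6

results = {}
for clock_hz in (33e6, 66e6):
    # A latency that is fixed in time costs more clocks on a faster bus.
    cycles = latency_s * clock_hz
    results[clock_hz] = bandwidth_mbytes(clock_hz, cycles)
    print(f"{clock_hz / 1e6:.0f}MHz: {cycles:.0f} cycles/read, "
          f"{results[clock_hz]:.2f} Mbytes/s")
```

  Both lines report the same Mbytes/s figure: the clock frequency
cancels out, and the bandwidth is just 4 bytes divided by the fixed
latency.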

  Lesson to be learned: use DMA if you need peripheral to host transfers
to be fast.

-- Neal

On Thu, 5 Apr 2001, André David wrote:

> Hi!
>
> I am working in a group developing a PCI board for data acquisition.
> Since our priority is getting it running, busmastering capabilities for
> such things as DMA to the host memory are not on the front line of
> development.
>
> Now, since I'm the guy behind the device driver, I have done some
> benchmarking with a simple device driver (in Linux, of course) using a
> standard PCI VGA adapter.
> This "driver" just uses memcpy() to transfer some data between the main
> memory and the board's framebuffer.
>
> I have tried three different processor/chipset combinations and the
> results I get are:
>
> (results after BIOS and MTRR parameters tweaking)
>
>                            Reading (Mbyte/s)   Writing (Mbyte/s)
> Intel 440FX (PII@233)            7.03               36.16
> Intel 440BX (2*PII@400)          8.62              102.4
> VIA KT133 (Athlon@900)           7.46              119.6
>
> Now this points to a pattern in which the north bridge seems unable to
> read with a reasonable speed from the board. I know writing is always
> easier than reading (from the specs a single data phase read is slower
> than a single data phase write (4 clock cycles vs. 3)).
>
> The north bridge behaviour is inadmissible even if we assume that all
> the reads are single data phase reads (4 clock cycles), with even medium
> devsel (1 more clock cycle lost) and a wait-state from the VGA board
> (another clock cycle lost), because this would give a total of 6 clock
> cycles, or 22 Mbyte/s total bandwidth.
>
> So my questions are:
>
> - Since it looks like north bridges have always been like this, has
> anyone found one that is not?
> - Is it admissible (logical) that the north bridge is like this?
> - Since I have only talked about commodity PCs, could there be
> something on the industrial market that does not suffer from this
> apparent "feature"?
>
> Thanks in advance for all comments,
>
> Andre David
>

-- 
-- Neal Palmer

The Dini Group
1010 Pearl St #6
La Jolla, CA 92037
(858) 454-3419 x16
(858) 454-1728 (Fax)