
RE: How to achieve a long burst?



I hate to add to the flow, but it might be useful to point
out an achievable level.  My group made a board using an
Intel 80960RD which achieved 118MB/second "burst" over a
few (8?) MB, and averaged 97MB/second over 12 hours.  This
was with a 166MHz Pentium, 430FX chip set, NT4.0 with a
"checked" kernel (read: slow).  These numbers are significantly
higher than I have seen kicked around here as good achievable
numbers.
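
For perspective, here is a rough sketch of how those two figures compare
with the theoretical peak of a PCI bus.  The 32-bit / 33 MHz parameters
are an assumption for these machines (the thread states them only for the
CA810E setup), "MB" is taken as 10^6 bytes, and the function name is
purely illustrative.

```python
def pci_peak_mbytes_per_sec(bus_bits=32, clock_mhz=33.0):
    """Theoretical peak: one data phase of bus_bits per clock, zero overhead."""
    return bus_bits / 8 * clock_mhz

# Assumed bus: 32 bits wide at 33 MHz, which gives 132.0 MB/s.
peak = pci_peak_mbytes_per_sec()

# Figures reported above for the 80960RD board.
for label, rate in [("burst", 118.0), ("12-hour average", 97.0)]:
    print(f"{label}: {rate:.0f} MB/s is {rate / peak:.0%} of the assumed peak")
```

If the real clock was 33.33 MHz, the peak is closer to 133 MB/s and the
percentages shift by less than a point.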

We did not do anything heroic, or even study much - the
960 had the PCI interface premade, and we just programmed
the appropriate transfer types.

We ran this in a few other PCs of the era with similar results.
'TX chip sets, K5's, 133's, etc.  They were all EDO memory.

Oh, the 960RD is a single chip with a processor, DRAM
controller, DMA transfer controller, and a bridge from
its local bus to the PCI bus.  These transfers were from
its local bus to the PC's memory by way of the PCI bus.

Hope this puts a useful perspective on things.

-----Original Message-----
From: Scott C. Karlin [mailto:scott@CS.Princeton.EDU]
Sent: Friday, December 15, 2000 9:32 AM
To: peter.marek@marekmicro.de
Cc: pci-sig@znyx.com
Subject: Re: How to achieve a long burst?


Peter,

I think we are in agreement here.  Please allow me to clarify
and update my position a little bit.

My experiment used 64-byte chunks because that is what my
application needs.  Larger sized transfers will definitely
be faster as they will have less overhead per byte.
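
The overhead-per-byte point can be sketched with a toy model: each burst
pays a fixed number of non-data clocks, then moves 4 bytes per clock on a
32-bit bus.  The 8-clock overhead is a hypothetical placeholder (for
arbitration, address phase, target latency, and turnaround), not a
measured value from either board, and the function name is illustrative.

```python
def burst_efficiency(payload_bytes, overhead_clocks=8, bus_bytes=4):
    """Fraction of bus clocks that carry data for one burst transaction.

    overhead_clocks is a hypothetical fixed per-transaction cost.
    bus_bytes=4 corresponds to a 32-bit PCI bus.
    """
    data_clocks = -(-payload_bytes // bus_bytes)  # ceiling division
    return data_clocks / (data_clocks + overhead_clocks)

# Compare the 64-byte chunks used in the experiment with longer bursts.
for size in (64, 256, 4096):
    print(f"{size:5d} bytes: {burst_efficiency(size):.0%} of clocks carry data")
```

With these assumed numbers a 64-byte chunk spends a third of its bus time
on overhead, while a 4 kB burst spends under one percent.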

I've suspected that my DMA numbers were a little on the slow
side.  I have received some suggestions on how I can improve
the speed more (including what you suggest in point #3, below).
I have received reports that the board I am using can perform
DMA reads at 64 MByte/sec.  I obviously need to review all
my settings to get a better understanding as to why my
performance isn't as good as it could be.  If anyone on the
list has some specific experience with the RAMiX PMC694, I'd
like to start an off-line conversation to discuss this issue.

My first reading of Ted Firlit's post made me think that he
wanted a DMA engine on the motherboard because he thought a
DMA engine would always be "better" than PIO (programmed I/O).
My response was intended to point out (1) that PIO can be faster
than DMA (for small transfers) and (2) that since PIO on some
motherboard/CPU combinations is relatively fast (68 MByte/sec
writes), one may discover that DMA is not needed for a particular
application.

After thinking about this some more, I think that Ted Firlit was
pointing out that to get reasonable transfer rates from a PCI
peripheral to the host's local memory, you either need a high-speed
PCI DMA engine on the motherboard to read the data or you need the
peripheral to be a master and write the data across the bus.
Note that the peripheral does not need a DMA engine; it can also
use a local CPU to perform the transfer.

I'm all for putting a high-speed PCI DMA engine on the motherboard.

Finally, here's what DMA means to me.  DMA is simply Direct
Memory Access: something other than the processor in question
directly accesses that processor's memory.  If I'm writing code
on processor A, and data can either move within, leave, or enter
A's local memory space without processor A directly using CPU
instructions (such as reg->mem, mem->reg, or even mem->mem) then
that's DMA.  It doesn't matter to me if the data was moved with
a specialized state machine (traditional DMA engine) or with a
CPU on a peripheral PCI board.

Regards,

Scott


Peter Marek writes:
> 
> you're comparing a horse and a cow in the scenario pointed out.
> I have to throw my hat in for DMA transfers.
> 
> Some of my experience:
> 
> 1) You just get reasonable DMA performance if you do long bursts (4kB or
> so)
> 2) You need to use the right PCI commands during the transfer (Memory Read
> Line/Multiple and Memory Write and Invalidate)
> 3) For small chunks of data, host initiated transfers may work as you
> pointed out.
> 4) PCI2PCI bridges kill DMA performance. If you could do the DMA directly
> from MPC8240 and with no 21554 in between you would receive much higher
> bandwidth.
> 5) We have implemented systems which transfer 80MB/s !!!sustained!!!, not
> peak using Bus master DMA.
> 
> 
> Regards,
> 
> 
> Peter Marek
> General Director
> MarekMicro GmbH
> Kropfersrichter Str. 6-8
> D-92237 Sulzbach-Rosenberg
> Germany
> Phone: 049 - 9661 - 908 - 210
> Fax:      049 - 9661 - 908 - 100
> ----- Original Message -----
> From: Scott C. Karlin <scott@CS.Princeton.EDU>
> To: <pci-sig@znyx.com>
> Sent: Thursday, December 14, 2000 6:03 PM
> Subject: Re: How to achieve a long burst?
> 
> 
> > Maybe you don't need DMA on the host bridge...
> >
> > I did some recent measurements of data transfer across the PCI bus
> > to and from a board with DMA capability.  The motherboard is an
> > Intel CA810E with a 733MHz Pentium III.  The target PCI board is
> > a RAMiX PMC694 which has a 266 MHz Motorola MPC8240 (PPC core with
> > other goodies including a DMA engine) behind an Intel 21554 bridge.
> > The PCI bus is 32 bits / 33 MHz.  I'm running Linux on the Pentium
> > and "raw" code on the 8240.
> >
> > Here's what I measured:
> >
> > Bus Master    Mode    Direction   MBytes/sec (64-byte chunks)
> > ===========   ====    =========   ===========================
> > Pentium III   memcpy  read                3.98
> > Pentium III   memcpy  write              68.70
> > MPC8240       DMA     read               22.66
> > MPC8240       DMA     write              34.19
> >
> > (memcpy is the gcc compiler supplied routine)
> >
> > The host bridge seems to be doing a good job of write combining.
> >
> > What this tells me is that for many applications the transfer
> > speed of a host Pentium may be fast enough.  It also confirms
> > what has been observed before: it is better to push data across
> > the bus than to pull it.  Certainly, if you want more speed, you
> > can implement a DMA controller on the host side; however, you'll
> > always be limited by the slower device (master or target) of the
> > transaction.  (DMA also has other advantages: you can overlap DMA
> > transfers with other computation and you might be able to use a
> > slower CPU as a result.)
> >
> > Scott
> >
> > Ted Firlit writes:
> > >
> > > Unfortunately, it appears that the bridge chips on PC motherboards do
> > > not implement DMA engines.  Therefore it falls to the peripherals to
> > > implement DMA.. which also mandates they must be capable of becoming
> > > the PCI bus master.
> >
> >
> > ---------------------------------------------------------------------
> > Scott C. Karlin                        Princeton University
> > http://www.cs.princeton.edu/~scott     Department of Computer Science
> > ---------------------------------------------------------------------