
Re: How to achieve a long burst?



Hi Scott,

I agree that putting a DMA engine on the CPU may be a way to speed up target
performance... if only a single device uses the central DMA engine.
If you integrate the DMA engine on the CPU, !!ALL!! device drivers
would have access to it and could potentially make use of it. The device
drivers would have to arbitrate and synchronize access to the DMA engine
before they could start any DMA transfer. Synchronization among drivers is
time consuming, and there would be no predictable latency between a request
for the DMA engine and the time the actual transfer starts.
Drivers would depend heavily on each other and on the underlying device
hardware.
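
To make the latency point concrete, here is a minimal sketch, assuming a
single shared engine that serves requests strictly first-come, first-served.
The function name, the workload, and all timings are hypothetical and are not
taken from any real driver or chipset; the point is only that a short
transfer can wait behind someone else's long one.

```python
def fifo_start_times(requests):
    """Schedule requests on one shared DMA engine, first-come first-served.

    requests: list of (arrival_us, duration_us), sorted by arrival time.
    Returns a list of (start_us, wait_us), one entry per request.
    """
    engine_free_at = 0.0
    schedule = []
    for arrival, duration in requests:
        # A transfer cannot start before the engine has finished the
        # previous one, no matter how small the new transfer is.
        start = max(arrival, engine_free_at)
        schedule.append((start, start - arrival))
        engine_free_at = start + duration
    return schedule


# Hypothetical workload: one driver starts a long transfer, then two
# other drivers each ask for a short one.
requests = [(0.0, 50.0), (10.0, 5.0), (12.0, 5.0)]
schedule = fifo_start_times(requests)

for (arrival, duration), (start, wait) in zip(requests, schedule):
    print(f"arrival={arrival:5.1f} us  length={duration:5.1f} us  "
          f"start={start:5.1f} us  wait={wait:5.1f} us")
```

In this made-up workload the two 5 us transfers wait 40 us and 43 us, so
their latency is set by another driver's traffic rather than by their own
size. A real arbiter could use priorities instead of FIFO, but the
dependence between drivers remains.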

Regards,

Peter Marek
General Director
MarekMicro GmbH
Kropfersrichter Str. 6-8
D-92237 Sulzbach-Rosenberg
Germany
Phone: 049 - 9661 - 908 - 210
Fax:      049 - 9661 - 908 - 100
----- Original Message -----
From: Scott C. Karlin <scott@CS.Princeton.EDU>
To: <peter.marek@marekmicro.de>
Cc: <pci-sig@znyx.com>
Sent: Friday, December 15, 2000 3:32 PM
Subject: Re: How to achieve a long burst?


> Peter,
>
> I think we are in agreement here.  Please allow me to clarify
> and update my position a little bit.
>
> My experiment used 64-byte chunks because that is what my
> application needs.  Larger sized transfers will definitely
> be faster as they will have less overhead per byte.
>
> I've suspected that my DMA numbers were a little on the slow
> side.  I have received some suggestions on how I can improve
> the speed more (including what you suggest in point #3, below).
> I have received reports that the board I am using can perform
> DMA reads at 64 MByte/sec.  I obviously need to review all
> my settings to get a better understanding as to why my
> performance isn't as good as it could be.  If anyone on the
> list has some specific experience with the RAMiX PMC694, I'd
> like to start an off-line conversation to discuss this issue.
>
> My first reading of Ted Firlit's post made me think that he
> wanted a DMA engine on the motherboard because he thought a
> DMA engine would always be "better" than PIO (programmed I/O).
> My response was intended to point out (1) that PIO can be faster
> than DMA (for small transfers) and (2) that since PIO on some
> motherboard/CPU combinations is relatively fast (68 MByte/sec
> writes), one may discover that DMA is not needed for a particular
> application.
>
> After thinking about this some more, I think that Ted Firlit was
> pointing out that to get reasonable transfer rates from a PCI
> peripheral to the host's local memory, you either need a high-speed
> PCI DMA engine on the motherboard to read the data or you need the
> peripheral to be a master and write the data across the bus.
> Note that the peripheral does not need a DMA engine; it can also
> use a local CPU to perform the transfer.
>
> I'm all for putting a high-speed PCI DMA engine on the motherboard.
>
> Finally, here's what DMA means to me.  DMA is simply Direct
> Memory Access: something other than the processor in question
> directly accesses that processor's memory.  If I'm writing code
> on processor A, and data can either move within, leave, or enter
> A's local memory space without processor A directly using CPU
> instructions (such as reg->mem, mem->reg, or even mem->mem) then
> that's DMA.  It doesn't matter to me if the data was moved with
> a specialized state machine (traditional DMA engine) or with a
> CPU on a peripheral PCI board.
>
> Regards,
>
> Scott
>
>
> Peter Marek writes:
> >
> > you're comparing a horse and a cow in the scenario pointed out.
> > I have to throw my hat in for DMA transfers.
> >
> > Some of my experience:
> >
> > 1) You only get reasonable DMA performance if you do long bursts (4 kB
> > or so).
> > 2) You need to use the right PCI commands during the transfer (Memory
> > Read Line/Multiple and Memory Write and Invalidate).
> > 3) For small chunks of data, host-initiated transfers may work as you
> > pointed out.
> > 4) PCI-to-PCI bridges kill DMA performance. If you could do the DMA
> > directly from the MPC8240, with no 21554 in between, you would get much
> > higher bandwidth.
> > 5) We have implemented systems which transfer 80 MB/s !!!sustained!!!,
> > not peak, using bus master DMA.
> >
> >
> > Regards,
> >
> >
> > Peter Marek
> > General Director
> > MarekMicro GmbH
> > Kropfersrichter Str. 6-8
> > D-92237 Sulzbach-Rosenberg
> > Germany
> > Phone: 049 - 9661 - 908 - 210
> > Fax:      049 - 9661 - 908 - 100
> > ----- Original Message -----
> > From: Scott C. Karlin <scott@CS.Princeton.EDU>
> > To: <pci-sig@znyx.com>
> > Sent: Thursday, December 14, 2000 6:03 PM
> > Subject: Re: How to achieve a long burst?
> >
> >
> > > Maybe you don't need DMA on the host bridge...
> > >
> > > I did some recent measurements of data transfer across the PCI bus
> > > to and from a board with DMA capability.  The motherboard is an
> > > Intel CA810E with a 733MHz Pentium III.  The target PCI board is
> > > a RAMiX PMC694 which has a 266 MHz Motorola MPC8240 (PPC core with
> > > other goodies including a DMA engine) behind an Intel 21554 bridge.
> > > The PCI bus is 32 bits / 33 MHz.  I'm running Linux on the Pentium
> > > and "raw" code on the 8240.
> > >
> > > Here's what I measured:
> > >
> > > Bus Master    Mode    Direction   MBytes/sec (64-byte chunks)
> > > ===========   ====    =========   ===========================
> > > Pentium III   memcpy  read                3.98
> > > Pentium III   memcpy  write              68.70
> > > MPC8240       DMA     read               22.66
> > > MPC8240       DMA     write              34.19
> > >
> > > (memcpy is the gcc compiler supplied routine)
> > >
> > > The host bridge seems to be doing a good job of write combining.
> > >
> > > What this tells me is that for many applications the transfer
> > > speed of a host Pentium may be fast enough.  It also confirms
> > > what has been observed before: it is better to push data across
> > > the bus than to pull it.  Certainly, if you want more speed, you
> > > can implement a DMA controller on the host side; however, you'll
> > > always be limited by the slower device (master or target) of the
> > > transaction.  (DMA also has other advantages: you can overlap DMA
> > > transfers with other computation and you might be able to use a
> > > slower CPU as a result.)
> > >
> > > Scott
> > >
> > > Ted Firlit writes:
> > > >
> > > > Unfortunately, it appears that the bridge chips on PC motherboards
> > > > do not implement DMA engines.  Therefore it falls to the peripherals
> > > > to implement DMA.. which also mandates they must be capable of
> > > > becoming the PCI bus master.
> > >
> > >
> > > ---------------------------------------------------------------------
> > > Scott C. Karlin                        Princeton University
> > > http://www.cs.princeton.edu/~scott     Department of Computer Science
> > > ---------------------------------------------------------------------
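
The thread's two recurring claims (small chunks carry more overhead per byte,
and long bursts are needed for good DMA rates) can be sketched with a simple
model. A 32-bit bus at a nominal 33 MHz moves at most 4 bytes per clock, or
132 MByte/sec. The per-transfer overhead of 16 clocks used below is purely an
assumed figure for illustration; it was not measured on any of the hardware
discussed, and the real overhead depends on arbitration, target latency, and
bridge behaviour. The measured rates are copied from the table quoted above.

```python
PCI_WIDTH_BYTES = 4            # 32-bit bus
PCI_CLOCK_HZ = 33_000_000      # nominal 33 MHz
PEAK_BYTES_PER_SEC = PCI_WIDTH_BYTES * PCI_CLOCK_HZ   # 132,000,000


def effective_rate(chunk_bytes, overhead_clocks):
    """Bytes/sec when every chunk costs a fixed number of extra clocks.

    overhead_clocks is an assumption standing in for arbitration, address
    phase, wait states and so on; it is not a measured value.
    """
    data_clocks = chunk_bytes / PCI_WIDTH_BYTES
    total_clocks = overhead_clocks + data_clocks
    return chunk_bytes * PCI_CLOCK_HZ / total_clocks


ASSUMED_OVERHEAD_CLOCKS = 16   # hypothetical, for illustration only

for chunk in (64, 256, 1024, 4096):
    rate = effective_rate(chunk, ASSUMED_OVERHEAD_CLOCKS)
    print(f"{chunk:5d}-byte chunks: {rate / 1e6:7.2f} MByte/sec "
          f"({100 * rate / PEAK_BYTES_PER_SEC:5.1f}% of peak)")

# Measured figures from the table in the thread (64-byte chunks).
measured_mbytes_per_sec = {
    "Pentium III memcpy read": 3.98,
    "Pentium III memcpy write": 68.70,
    "MPC8240 DMA read": 22.66,
    "MPC8240 DMA write": 34.19,
}
for name, mb in measured_mbytes_per_sec.items():
    share = 100 * mb * 1e6 / PEAK_BYTES_PER_SEC
    print(f"{name:26s} {mb:6.2f} MByte/sec = {share:5.1f}% of peak")
```

Under the assumed overhead, 64-byte chunks reach only half of the bus peak
while 4 kB bursts come close to it, which is the shape of the argument made
in the thread even though the actual numbers will differ on real hardware.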