
Re: How to achieve a long burst?




Hi,
The right place to put a set of DMA channels is not on the CPU, but on the
first PCI bridge from the CPU.

If the DMA registers are defined in the PCI bridge's space as a standard,
they are accessible to all system applications. Everything else is done in
hardware, and the latency is the same as for any other PCI transaction. The
bridge will handle all the necessary housekeeping.
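
To make the idea concrete, here is a minimal sketch in C of how one
bridge-resident DMA channel might look to software. Everything in it is an
assumption for illustration only: the register names, the bit positions and
the dma_start helper are hypothetical and do not come from any PCI
specification or existing bridge chip.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical register block for one DMA channel inside the bridge.
 * Names, order and bit assignments are illustrative assumptions. */
typedef struct {
    uint32_t src_addr;    /* bus address to read from */
    uint32_t dst_addr;    /* bus address to write to */
    uint32_t byte_count;  /* transfer length in bytes */
    uint32_t control;     /* bit 0: start the transfer */
    uint32_t status;      /* bit 0: busy, bit 1: done */
} bridge_dma_channel;

#define DMA_CTRL_START 0x1u
#define DMA_STAT_BUSY  0x1u
#define DMA_STAT_DONE  0x2u

/* Program a channel and set its start bit.
 * Returns 0 on success, -1 if the channel is still busy
 * (in which case no register is modified). */
static int dma_start(volatile bridge_dma_channel *ch,
                     uint32_t src, uint32_t dst, uint32_t len)
{
    assert(ch != NULL);
    if (ch->status & DMA_STAT_BUSY)
        return -1;
    ch->src_addr = src;
    ch->dst_addr = dst;
    ch->byte_count = len;
    ch->control = DMA_CTRL_START;  /* written last, after the addresses */
    return 0;
}
```

In this sketch a driver would call dma_start on a channel mapped from the
bridge and then wait for DMA_STAT_DONE. How channels are shared between
drivers is left open here.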

As I said before, PCI is a highway; it is defined as a bus system, like ISA,
EISA, SCSI and so on. A DMA engine is a bus company that can run on any
road, whether it is ISA, EISA, SCSI or PCI. If a DMA engine is defined on a
board, it is a private company and can only be used by that board's
applications. If the PCI bridge contains DMA engines and this becomes a
standard, the bus company becomes a public company, and every system with a
PCI bus and the modified bridge will benefit.

I see a great potential technical advantage in installing a set of DMA
engines in the PCI bridge, and they may be good material for future
extensions of the PCI specification.

The major advantage of the PCI bus over ISA and EISA is that the CPU can run
concurrently with the DMA engines, which is the main objective of the PCI
specification.

Weng Tianxiang
Micro Memory Inc.
9540 Vassar Av.
Chatsworth, CA 91311
Phone: 818-998-0070, Fax: 818-998-4459



----- Original Message -----
From: Peter Marek <peter.marek@marekmicro.de>
To: <scott@CS.Princeton.EDU>
Cc: <pci-sig@znyx.com>
Sent: Friday, December 15, 2000 8:16 AM
Subject: Re: How to achieve a long burst?


> Hi Scott,
>
> I agree that putting a DMA engine on the CPU may be a way to speed target
> performance... if just a single device uses the central DMA engine.
> In case you integrate the DMA engine on the CPU, !!ALL!! device drivers
> would have access to it and would potentially make use of it. The device
> drivers would have to arbitrate and synchronize access to the DMA engine
> before they could start any DMA transfer. Synchronization amongst drivers
> is time consuming, and there would be no predictable latency between a
> request for the DMA engine and the time the actual transfer starts.....
> Drivers would heavily depend on each other and the underlying device
> hardware.
>
> Regards,
>
> Peter Marek
> General Director
> MarekMicro GmbH
> Kropfersrichter Str. 6-8
> D-92237 Sulzbach-Rosenberg
> Germany
> Phone: 049 - 9661 - 908 - 210
> Fax:      049 - 9661 - 908 - 100
> ----- Original Message -----
> From: Scott C. Karlin <scott@CS.Princeton.EDU>
> To: <peter.marek@marekmicro.de>
> Cc: <pci-sig@znyx.com>
> Sent: Friday, December 15, 2000 3:32 PM
> Subject: Re: How to achieve a long burst?
>
>
> > Peter,
> >
> > I think we are in agreement here.  Please allow me to clarify
> > and update my position a little bit.
> >
> > My experiment used 64-byte chunks because that is what my
> > application needs.  Larger sized transfers will definitely
> > be faster as they will have less overhead per byte.
> >
> > I've suspected that my DMA numbers were a little on the slow
> > side.  I have received some suggestions on how I can improve
> > the speed more (including what you suggest in point #3, below).
> > I have received reports that the board I am using can perform
> > DMA reads at 64 MByte/sec.  I obviously need to review all
> > my settings to get a better understanding as to why my
> > performance isn't as good as it could be.  If anyone on the
> > list has some specific experience with the RAMiX PMC694, I'd
> > like to start an off-line conversation to discuss this issue.
> >
> > My first reading of Ted Firlit's post made me think that he
> > wanted a DMA engine on the motherboard because he thought a
> > DMA engine would always be "better" than PIO (programmed I/O).
> > My response was intended to point out (1) that PIO can be faster
> > than DMA (for small transfers) and (2) that since PIO on some
> > motherboard/CPU combinations is relatively fast (68 MByte/sec
> > writes), one may discover that DMA is not needed for a particular
> > application.
> >
> > After thinking about this some more, I think that Ted Firlit was
> > pointing out that to get reasonable transfer rates from a PCI
> > peripheral to the host's local memory, you either need a high-speed
> > PCI DMA engine on the motherboard to read the data or you need the
> > peripheral to be a master and write the data across the bus.
> > Note that the peripheral does not need a DMA engine; it can also
> > use a local CPU to perform the transfer.
> >
> > I'm all for putting a high-speed PCI DMA engine on the motherboard.
> >
> > Finally, here's what DMA means to me.  DMA is simply Direct
> > Memory Access: something other than the processor in question
> > directly accesses that processor's memory.  If I'm writing code
> > on processor A, and data can either move within, leave, or enter
> > A's local memory space without processor A directly using CPU
> > instructions (such as reg->mem, mem->reg, or even mem->mem) then
> > that's DMA.  It doesn't matter to me if the data was moved with
> > a specialized state machine (traditional DMA engine) or with a
> > CPU on a peripheral PCI board.
> >
> > Regards,
> >
> > Scott
> >
> >
> > Peter Marek writes:
> > >
> > > you're comparing a horse and a cow in the scenario pointed out.
> > > I have to throw my hat in for DMA transfers.
> > >
> > > Some of my experience:
> > >
> > > 1) You only get reasonable DMA performance if you do long bursts
> > >    (4kB or so)
> > > 2) You need to use the right PCI commands during the transfer (Memory
> > >    Read Line/Multiple and Memory Write and Invalidate)
> > > 3) For small chunks of data, host initiated transfers may work as you
> > >    pointed out.
> > > 4) PCI2PCI bridges kill DMA performance. If you could do the DMA
> > >    directly from the MPC8240, with no 21554 in between, you would get
> > >    much higher bandwidth.
> > > 5) We have implemented systems which transfer 80MB/s !!!sustained!!!,
> > >    not peak, using bus master DMA.
> > >
> > >
> > > Regards,
> > >
> > >
> > > Peter Marek
> > > General Director
> > > MarekMicro GmbH
> > > Kropfersrichter Str. 6-8
> > > D-92237 Sulzbach-Rosenberg
> > > Germany
> > > Phone: 049 - 9661 - 908 - 210
> > > Fax:      049 - 9661 - 908 - 100
> > > ----- Original Message -----
> > > From: Scott C. Karlin <scott@CS.Princeton.EDU>
> > > To: <pci-sig@znyx.com>
> > > Sent: Thursday, December 14, 2000 6:03 PM
> > > Subject: Re: How to achieve a long burst?
> > >
> > >
> > > > Maybe you don't need DMA on the host bridge...
> > > >
> > > > I did some recent measurements of data transfer across the PCI bus
> > > > to and from a board with DMA capability.  The motherboard is an
> > > > Intel CA810E with a 733MHz Pentium III.  The target PCI board is
> > > > a RAMiX PMC694 which has a 266 MHz Motorola MPC8240 (PPC core with
> > > > other goodies including a DMA engine) behind an Intel 21554 bridge.
> > > > The PCI bus is 32 bits / 33 MHz.  I'm running Linux on the Pentium
> > > > and "raw" code on the 8240.
> > > >
> > > > Here's what I measured:
> > > >
> > > > Bus Master    Mode    Direction   MBytes/sec (64-byte chunks)
> > > > ===========   ====    =========   ===========================
> > > > Pentium III   memcpy  read                3.98
> > > > Pentium III   memcpy  write              68.70
> > > > MPC8240       DMA     read               22.66
> > > > MPC8240       DMA     write              34.19
> > > >
> > > > (memcpy is the gcc compiler supplied routine)
> > > >
> > > > The host bridge seems to be doing a good job of write combining.
> > > >
> > > > What this tells me is that for many applications the transfer
> > > > speed of a host Pentium may be fast enough.  It also confirms
> > > > what has been observed before: it is better to push data across
> > > > the bus than to pull it.  Certainly, if you want more speed, you
> > > > can implement a DMA controller on the host side; however, you'll
> > > > always be limited by the slower device (master or target) of the
> > > > transaction.  (DMA also has other advantages: you can overlap DMA
> > > > transfers with other computation and you might be able to use a
> > > > slower CPU as a result.)
> > > >
> > > > Scott
> > > >
> > > > Ted Firlit writes:
> > > > >
> > > > > Unfortunately, it appears that the bridge chips on PC motherboards
> > > > > do not implement DMA engines.  Therefore it falls to the
> > > > > peripherals to implement DMA.. which also mandates they must be
> > > > > capable of becoming the PCI bus master.
> > > >
> > > >
> > > > ---------------------------------------------------------------------
> > > > Scott C. Karlin                        Princeton University
> > > > http://www.cs.princeton.edu/~scott     Department of Computer Science
> > > > ---------------------------------------------------------------------
>
>
>
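
Scott's remark that larger transfers have less overhead per byte, and
Peter's point 1 about long bursts, can be put into rough numbers. The sketch
below is a simplified model, not a measurement. It assumes the 32 bit /
33 MHz bus from the thread moves 4 bytes per clock (a nominal 132 MBytes/sec
peak, taking 1 MByte as 10^6 bytes), and that every burst pays a fixed
number of non-data clocks. The overhead_clocks parameter and the function
name are assumptions made for illustration; real overhead depends on
arbitration, target latency and any bridges in the path.

```c
#include <assert.h>
#include <stdint.h>

/* 32-bit / 33 MHz PCI: one 4-byte data phase per clock at best. */
#define PCI_BYTES_PER_CLOCK 4u
#define PCI_CLOCK_MHZ       33u

/* Estimated throughput in MBytes/sec when data moves in bursts of
 * burst_bytes and each burst costs overhead_clocks extra clocks
 * (address phase, turnaround, wait states, idle cycles and so on). */
static double burst_mbytes_per_sec(uint32_t burst_bytes,
                                   uint32_t overhead_clocks)
{
    /* The model only handles whole 32-bit data phases. */
    assert(burst_bytes > 0 && burst_bytes % PCI_BYTES_PER_CLOCK == 0);

    uint32_t data_clocks = burst_bytes / PCI_BYTES_PER_CLOCK;
    double efficiency = (double)data_clocks /
                        (double)(data_clocks + overhead_clocks);

    return efficiency * PCI_BYTES_PER_CLOCK * PCI_CLOCK_MHZ;
}
```

With an assumed overhead of 16 clocks per burst, the model gives 66
MBytes/sec for 64-byte bursts and about 130 MBytes/sec for 4 kB bursts,
which matches the direction of the advice in the thread even though the
overhead figure itself is a guess.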