
Re: 440FX PCI CHIPSET performance



Here are a couple of things to look at on these problems.

Dave New wrote:

> Moving to a 200 MHz PPro single CPU EDO machine, running the same
> instruction sequence produces an arbitrarily long burst, but
> it actually wastes a lot of PCI bus bandwidth, because the host
> bridge is inserting wait states (deasserting IRDY) about 50-75%
> of the time during the multiple data phase burst.  The result
> is overall poorer throughput, while the PCI bus is essentially
> saturated by this activity.  I can't believe that a 200 MHz
> (66 MHz local bus) 64-bit PPro can't keep a host bridge that
> is talking to a 33 MHz 32-bit PCI bus busy.

I would guess that you are running into some of the PPro performance
weaknesses. Try a loop or in-line code instead of the REP to see if it
makes any difference. My memory is getting weak on this, but from some
testing I did about 18 months ago, I seem to remember that the loop was
the fastest and the REP was the slowest. The PPro does have a few
"interesting" behaviors related to performance.


Graeme Gill wrote:
> I have a very similar experience. I have a PCI mastering card that is
> capable of testing sustained PCI accesses at rates between 1 and
> 133 mbytes/sec, using full speed MRM/MWI bursts of up to 256 cycles.
> (It uses the PLX 9060SD chip)
> .........................
> For much of the time these systems seem to have comparable (or better)
> performance to the 430VX based system, but for certain periods of time,
> (e.g. when there is any slight CPU activity) available PCI <-> DRAM
> bandwidth drops alarmingly.
> 
>...........................
> During these periods I see the following activity on the PCI bus:
> 
> Start:
>   Request PCI bus, and are granted it in 1 cycle
>   Assert FRAME and IRDY
>   After 31 cycles, the 440FX asserts STOP for 2 cycles [ Retry ]
> 
>   De-assert FRAME, IRDY and REQ.
> 
>   Request PCI bus, and are granted it in 1 cycle
>   Assert FRAME and IRDY
>   After 17 cycles, the 440FX asserts TRDY for 8 cycles [Transfer 8 32 bit words]
>   TRDY goes inactive for 2 cycles.
>   The 440FX asserts STOP for 2 cycles. [ Retry ]
> 
>   De-assert FRAME, IRDY and REQ.
> 
>   Loop to start.

You don't say, but I assume that you are performing reads here. I will
also assume that you double-checked to be sure that you are using the
MRM command. Given that, I am a little bit confused about what is going
on. The first read is treated as a delayed read request. Then, when
the master comes back, it is treated as a delayed read completion,
although it is an awfully long latency for the read (31+17+idle time or
~1.5 us). This would seem to imply that the P6 bus is VERY busy and
that there were several bus transactions queued up in front of it.
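
The ~1.5 us figure can be checked with a few lines of arithmetic. This
is only a rough sketch: the 31 and 17 cycle counts come from the quoted
trace, while the 3 idle cycles between the two attempts and the nominal
33 MHz clock are assumptions made for illustration.

```python
# Rough latency estimate for the delayed read in the quoted trace.
PCI_CLOCK_HZ = 33_000_000  # nominal 33 MHz PCI clock (assumed exact here)


def cycles_to_us(cycles, clock_hz=PCI_CLOCK_HZ):
    """Convert a number of PCI clock cycles to microseconds."""
    return cycles * 1e6 / clock_hz


first_attempt_cycles = 31   # wait before the first Retry (from the trace)
second_attempt_cycles = 17  # wait before TRDY on the second attempt
idle_cycles_guess = 3       # ASSUMPTION: turnaround/re-arbitration cycles

total_cycles = first_attempt_cycles + second_attempt_cycles + idle_cycles_guess
latency_us = cycles_to_us(total_cycles)

print(f"{total_cycles} cycles is about {latency_us:.2f} us")
```

Changing the idle guess by a couple of cycles moves the result by well
under 0.1 us, so the estimate is not sensitive to that assumption.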

It is also a little bit surprising that only one cache line of data is
delivered. It should always deliver at least 2 cache lines of data when
an MRM command is used. This actually sounds more like the behavior of a
standard read command.
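
To put the single cache line per loop in perspective, here is a rough
throughput estimate for one pass through the quoted trace. The phase
lengths are read straight from the trace; the single turnaround cycle
for each "De-assert FRAME, IRDY and REQ" step and the nominal 33 MHz
clock are assumptions, so treat the result as an order-of-magnitude
figure rather than a measurement.

```python
# Rough effective-bandwidth estimate for one loop of the quoted trace.
PCI_CLOCK_HZ = 33_000_000   # nominal 33 MHz PCI clock (assumed exact here)
BYTES_PER_WORD = 4          # 32-bit PCI data phase
TURNAROUND_GUESS = 1        # ASSUMPTION: cycles spent de-asserting signals

# (description, cycles) for each phase of one loop, in trace order.
loop_phases = [
    ("grant, first attempt", 1),
    ("wait before Retry", 31),
    ("STOP asserted", 2),
    ("de-assert", TURNAROUND_GUESS),
    ("grant, second attempt", 1),
    ("wait before TRDY", 17),
    ("data transfer", 8),
    ("TRDY inactive", 2),
    ("STOP asserted", 2),
    ("de-assert", TURNAROUND_GUESS),
]

cycles_per_loop = sum(cycles for _, cycles in loop_phases)
bytes_per_loop = 8 * BYTES_PER_WORD  # 8 words = one 32-byte cache line

effective_bytes_per_s = bytes_per_loop * PCI_CLOCK_HZ / cycles_per_loop
peak_bytes_per_s = BYTES_PER_WORD * PCI_CLOCK_HZ
fraction_of_peak = effective_bytes_per_s / peak_bytes_per_s

print(f"{cycles_per_loop} cycles per {bytes_per_loop} bytes")
print(f"about {effective_bytes_per_s / 1e6:.0f} Mbytes/sec, "
      f"{fraction_of_peak:.0%} of peak")
```

Under these assumptions only 8 of the cycles in each loop move data,
which is consistent with the alarming drop in bandwidth described in
the quoted message.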

My first cut at this is that the P6 bus is much more than "slightly"
busy when this occurs. But this still does not explain why only 32 bytes
are delivered. Any of you current Intel guys want to take a cut at this
one?

-Bruce Young
On the net I speak for myself, not Gateway 2000 (or Intel or anyone
else)

