
Ordering problem---response to comments



Thanks for all the feedback on the proposal to fix the ordering
problem in PCI 2.1.  Here are a few replies to the comments:

******** Comments from: koontz@netapp.com (Dave Koontz)

> >Master A issues a Delayed Read request to X
>  
> 3.1.1 Command Definition
>  
> PCI bus command encodings and types are listed below...
>  
> C/BE[3::0]#     Command Type
>  
> 0000            Interrupt Acknowledge
>  ...
> 1111            Memory Write and Invalidate
> 
> (There is no way for Master A to intentionally issue a Delayed Read request)

You're right, it should be "Delayable" rather than "Delayed"...

> 3.6 Exclusive Access
> 
> (Where you can have conflicting accesses you should implement locks )
> 
> .... However, in some system architectures the host bus bridge cannot
> guarantee exclusivity of the locked resource to a PCI master that uses
> LOCK#.  In these systems, device drivers that require exclusivity use a 
> software mechanism to guarantee exclusion and do not rely on LOCK#.
> 
> (This can be viewed as a memory coherency problem)
> 
> If you do have a system where LOCK# is honored, I would suggest multiple
> BARs for different contexts (Masters A and B).
> 
> Coherency schemes generally require knowledge of the size of the transfer.
> Yours assumes one datum.

I didn't understand this, nor how it relates to the proposal.
What are BARs?  Note that LOCK# is mostly intended for backwards
compatibility, according to Section 3.6.

******** Comments from: wei@netcom.com (Wei-Ti Liu)

> Hello Sir:
> Let us keep the way it is. You have a superb proposal but to make 
> everyone to modify the design and chips. It just does not worth it.
> Thank you. It is a good idea.
> W. Liu(10/25/96)

Would the proposal be more acceptable to you if its implementation were
optional?  The last section of the proposal explained how devices that
do not implement the proposal can be used with devices that do.

******** Comments from: kevin.normoyle@Eng.Sun.COM (Kevin Normoyle)

> 0) The problem with master_id's: Their finite size. Also what
> happens if you have a multi-function master. You probably want to create
> ordering within each function's requests, not within each master.
> 
> (two functions on one master could create the scenario you describe with
> producer/consumer/observer?)
> 
> 
> 2) If you ignore that: Could achieve your goal by distributing the GNT signals to all
> targets. The GNT could be tracked to create your master id.
> Would have a problem distributing this "virtual id" across bridges though.

In the proposal, the Master IDs are *local* to one bus.  It seems that
4 bits, which make it possible to distinguish among 16 masters on one
bus, would be enough.  I don't know what you mean by "distributing the
GNT signals".  The GNT signals are point-to-point; there is one from
the arbiter to each master.  In the proposal, the Master ID is
precisely the index of the asserted GNT# signal in the array of GNT#
signals...  So "The GNT could be tracked to create your master id" is
the solution in the proposal.
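
As a rough illustration of "the index of the asserted GNT# signal",
here is a minimal Python sketch.  The function names, the list
representation of the lines, and the LSB-first encoding on the four
Master ID lines are assumptions made for this example only; they are
not taken from the proposal or from the PCI specification.

```python
def master_id_from_gnt(gnt_n):
    """Return the index of the single asserted GNT# line, or None if idle.

    gnt_n holds one line level per master on this bus.  GNT# is active
    low, so 0 means asserted and 1 means deasserted.
    """
    asserted = [i for i, level in enumerate(gnt_n) if level == 0]
    if len(asserted) > 1:
        raise ValueError("arbiter asserted more than one GNT#")
    return asserted[0] if asserted else None


def encode_master_id(index, width=4):
    """Encode a master index on `width` Master ID lines (assumed LSB first)."""
    if not 0 <= index < (1 << width):
        raise ValueError("index does not fit on the Master ID lines")
    return [(index >> bit) & 1 for bit in range(width)]
```

With 4 lines, encode_master_id accepts indices 0 through 15, which
matches the 16 masters per bus mentioned above.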

Multi-function masters are also discussed in the proposal.  Each
multi-function master would be responsible for not having two
outstanding Delayed (i.e. Delayable) transactions for the same address
at the same time.
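
One way a multi-function master might enforce this rule is sketched
below in Python.  The class and method names are hypothetical, and the
bookkeeping (a table keyed by address) is only one possible approach,
not something the proposal prescribes.

```python
class OutstandingTracker:
    """Tracks Delayable transactions that have been issued but not completed."""

    def __init__(self):
        # address -> function number that owns the outstanding request
        self._pending = {}

    def issue(self, function, address):
        """Try to issue a request; refuse if the address is already pending."""
        if address in self._pending:
            return False  # another function must wait for completion
        self._pending[address] = function
        return True

    def complete(self, address):
        """Mark the outstanding request for this address as completed."""
        self._pending.pop(address, None)
```

Here a second function asking for the same address is simply held off
until complete() is called for that address.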

******** From: png@woof.net (Peter N. Glaskowsky)

> I think this is a bad idea. The scenarios described can be avoided without
> adding such a painful extension to PCI. Only a single master should be
> allowed (that is, eligible) to access a single PCI target address at any
> one time. If multiple masters need to access the same target address, the
> system should provide some other mechanism to arbitrate between them by
> assigning "ownership" of the resource to one master at a time.

Why is the extension so painful?  Would it be less painful if it were
optional?  That is, targets for which it matters would take the Master
ID lines into account, while others would be free to ignore them.  (More
details on how to mix devices that implement the solution and devices
that do not can be found at the end of the proposal.)  There would be
a robustness/cost tradeoff in deciding whether to support the Master
ID lines.

Limiting access to a target to one master at a time seems a very
severe limitation on a protocol which is otherwise nice and general,
except for the ordering problem.  We think that the ordering problem
is just a bug in the protocol that can be fixed with a minimal
change...

> It seems to me that some of the scenarios described wouldn't necessarily
> work right even if the proposed changes were made-- observers wouldn't
> observe things consistently, multiple controllers wouldn't necessarily see
> their commands executed in the right order, etc.

Could you be more specific?  We believe that all the scenarios would
work fine with the proposed solution.

******** From: kevin.normoyle@Eng.Sun.COM (Kevin Normoyle)

> I find it interesting to compare the PCI producer/consumer model
> with the more stringent memory models we have to meet for Sparc v-9.
> I've always been curious about what performance or complexity constraints
> are created by the memory model.
> 
> 
> * The producer/consumer model doesn't say anything about the behavior
> of multiple reads. All flags and data have one writer and one reader.
> 
> * Semaphores can be built without multiple readers to the same address, i.e. 
> Dekker's Algorithm. (basically two parallel producer/consumer threads are
> used to guarantee mutual exclusion. Random backoff is used to prevent deadlock)
> 
> 
> 
> So one could argue that since your error cases with delayed transactions
> necessarily need multiple readers, it's outside the minimum model 
> required by PCI.

Yes, Scenario 2 involves a Producer, a Consumer, and an Observer,
while the Producer/Consumer model only involves a Producer and a
Consumer.  However, the presence of an Observer doing a read without
side-effects should not upset things.  (And scenarios 1, 3 and 4 are
independent of the Producer/Consumer model.)

******** From: kimmel@dg-rtp.dg.com (Jeff Kimmel)

> Hi.  I haven't had a chance to read and digest your entire post yet,
> but you're hitting on a topic that I have also worried about.  At first
> blush, I would think your proposed enhancement is just what the doctor
> ordered for the multiple master problem.
> 
> Richard Schmitt, a colleague of mine, forwarded my ramblings on 2.1
> ordering topics to the reflector back in May.  I've attached that post
> for your information, and hope that you'll review it to see whether you
> anticipated and/or have a solution in mind for all the scenarios I've
> documented.  In particular, my Example 2 does not involve multiple
> masters reading the same location, so I suspect you have not addressed
> it.  I frankly don't see how to solve it without risking deadlocks with
> older P2Ps, but maybe you'll have some ideas.

It's nice to see that you had already thought about this issue, along
the same lines as us.

I believe your scenarios 1, 3 and 4 are similar to our scenario 2, and
would be fixed by the proposed solution.  Your scenario 2 is:

>      2. One processor is communicating with a DMA-capable device:
> 
>        a. The device attempts to read memory location M, and is
>           issued a retry response after creating a DRR.
> 
>        b. Memory services the DRR, creating a DRC in its place.
> 
>        c. The processor writes new data to M.
> 
>        d. The processor performs an OUT to register R on the
>           device, and is retried after creating a DWR.
> 
>        e. The DWR bypasses the DRC.
>        
>        f. The device services and observes the DWR (creating a DWC).
> 
>        g. The device retries its read of M, and claims the DRC
>           created in step (b).
> 
>       The processor's writes to M and R are observed out-of-order
>       by the device, because M was written before R, yet the device
>       observed new data for R followed by old data for M.

I think step f should be viewed as including an internal read by the
device to the location written by the DWR.  And steps a and g are part
of the same external read operation.  The external read cannot be
viewed as following the internal read, since the external read is
initiated earlier.  So we cannot say here that "the device observed
new data for R followed by old data for M."
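
The timeline can be replayed with a toy Python sketch.  It is not a
model of real PCI hardware: the dictionaries standing in for memory
and for the device registers, and the function name, are assumptions
made for illustration only.

```python
def replay_scenario_2():
    """Replay steps (a) through (g) and return what the device ends up seeing."""
    memory = {"M": "old"}
    device_regs = {"R": "old"}
    log = []

    # a. The device initiates its read of M and is retried (DRR created).
    log.append("a: device read of M initiated")
    # b. Memory services the DRR; the DRC captures the value of M now.
    drc = memory["M"]
    log.append("b: DRC holds M=" + drc)
    # c. The processor writes new data to M, after the DRC was created.
    memory["M"] = "new"
    log.append("c: processor writes M=new")
    # d-f. The write to R bypasses the DRC and is observed by the device.
    device_regs["R"] = "new"
    log.append("f: device observes R=new")
    # g. The device retries its read and claims the DRC from step (b).
    read_result = drc
    log.append("g: device read of M returns " + read_result)

    return device_regs["R"], read_result, log
```

The log shows the read of M being initiated at step (a), before the
write to M at step (c) and before R is observed at step (f), even
though its data is only claimed at step (g).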

More generally, this scenario raises the issue of whether two read
requests to the same address by the same master (steps a and g above)
are to be considered part of the same transaction or not.  The
proposal addresses this issue by requiring multi-function masters and
bridges to avoid having two outstanding read (or I/O port write)
requests to the same address at the same time.

******** From: "Thomas Schutt" <Thomas_Schutt@splashtech.com>

> This is ok. However it adds 4 pins to targets, 8 pins to bridges, and 
> doesn't address the fact that all these delayed transactions require the 
> masters to poll for completion. I would prefer to see a solution that 
> returns the data to the master without polling the bus. Here are two 
> suggestions.
> 
> Have the target return a transaction ID with TRDY on a read transaction 
> that will be delayed. The target then uses this ID to write the data to 
> the initiator when it is acquired. This does require hardware in the 
> initiator and the target. And does nothing for delayed writes.
> 
> Convert all bus traffic into write, or write request operations. Then 
> let everything complete in the order presented to the write buffers. 
> Propose a write request command to replace reads. The initiator would 
> send the address it wants the data to be written to. The target would do 
> a local read and write the data back to the initiator as a memory write 
> or memory write invalidate to the given address. A write that would 
> require the initiator to wait for completion would be implemented as a 
> write followed by a write request for the acknowledge. The bridges in 
> between would treat both transactions as posted writes. ( Actually I 
> thought Sun was going to propose a split transaction system over a year 
> ago. )
> 
> Basically I would like to see the bus move towards a posted write 
> operational model. And would rather not promote protocols that burn up 
> bus bandwidth with retries, and block bridges waiting for a master to 
> retry an operation.

These proposals are interesting, but they do not seem to be backwards
compatible.

******** From: dvj@apple.com (David V. James)

Thanks for your very positive response.

> The proposal from HP has correctly observed that this can occur
> in not-that-unusual software scenarios. I could add a few more
> examples, that include DMA adapters fetching and executing
> stale command entries. However, if the previous examples were
> insufficient to force a progression from the denial phase,
> I doubt that one (or a few) more examples would really help.

We'll ask for your help if more examples are needed!

> The potential remaining gotcha is to accurately specify timeouts,
> so that:
>   1) The response data is not held forever (if requester is reset).
>   2) The response data is discarded before a requester has the
>      chance to be reset and then generate a second matching request.

Thanks for pointing this out.  We'll look into it.

Francisco

-----------------------------------------------------------------------
| Hewlett-Packard Company      |                                      |
| Francisco Corella   M/S 5649 | Tel  : +1 916 785 3504               |
| 8000 Foothills Boulevard     | Fax  : +1 916 785 3096               |
| Roseville, CA 95747-5649     | Email: fcorella@rosemail.rose.hp.com |
| USA                          |                                      |
-----------------------------------------------------------------------

