Re: "out-of-context" delayed completions
- To: Mailing List Recipients <pci-sig-request@znyx.com>
- Subject: Re: "out-of-context" delayed completions
- From: Joe Cowan <jpc@hpmckee.fc.hp.com>
- Date: Mon, 27 Jan 97 14:10:09 MST
- In-Reply-To: <199701271732.RAA04895@numoe.corollary.com>; from "David O'Shea" at Jan 27, 97 5:32 pm
- Mailer: Elm [revision: 70.85]
- Resent-Date: Mon, 27 Jan 97 14:10:09 MST
- Resent-From: pci-sig-request@znyx.com
- Resent-Message-Id: <"Uqftc3.0.ji5.zaHxo"@dart>
- Resent-Sender: pci-sig-request@znyx.com
Responding to David O'Shea:
> See my not very helpful comments below.
To the contrary, they confirmed what I assumed would be the answer(s).
Thanks.
> Your device is broken and non-compliant to the 2.1 revision of the
> specification. The specification says pretty clearly that a 2.1
> compliant device MUST retry FOREVER any RETRIED transaction specifically
> so that the PPB or HB will not hang until such time as the request is
> finally completed by the HB or PPB. ...
My question arose from internal host platform architecture discussions
on ordering requirements, not from a specific device. It was reported
that certain SCSI drivers on our current non-PCI platforms occasionally
need to reset the device as a work-around for "hangs" that occur in
certain large configurations. There was speculation that standard NT
PCI drivers, which we as a platform provider can't always modify, might
do similar things, and we want our future platforms to handle reasonable
cases correctly.
> ... The only exception is when
> the ENTIRE PCI bus is reset using PCIRST#. In that case, the HB
> or PPB is also reset and forgets about the uncompleted Delayed Request.
From the perspective of the platform needing to handle the reset case, I
like this answer.
> To make the device compliant, your programmed I/O, vendor specific
> "reset" would have to wait until all outstanding requests which have
> been retried have been completed, and then undergo the internal reset
> operations of the device. You are not allowed to just drop the
> attempted request in mid-stream even if your device defines a "self-
> reset" condition port. The PCI bus does not recognize that a device
> is ever allowed to just forget about transactions unless the entire
> bus is reset using PCIRST#.
I do have a concern with the "2.1 compliance" argument, though. PCI
2.0-compliant devices know nothing about delayed transactions, but they
obviously know to reattempt a transaction that's retried. I can imagine
a hypothetical case where the driver of a 2.0-compliant device might see
no need for the device to "complete" a transaction that's been retried,
and think it can safely reset the device without resetting the PCI bus,
perhaps even claiming that the behavior is "2.0 compliant". Having the
device "work correctly" on existing 2.0-HBs, but "fail" on our 2.1-HBs
is not a situation I want to be in.
> In short, there is no way to overcome this problem. Your device
> driver should never reset your device in this manner if the behavior
> of the device would be to forget transactions. As you point out,
> and as has been demonstrated on a few known systems already, the bus
> would hang in this situation, or your device would receive incorrect
> data.
My main concern is that a significant number of PCI chip designers
and driver writers may not adhere to this policy.
> IF AND ONLY IF the HB or PPB device designer used some foresight, then
> that designer took advantage of the rule that allows the HB or PPB
> to drop the outstanding Delayed Request after 2**15 PCI clocks. In
> that case, you could feel safe if after "internally resetting" your
> device, you then waited around for 2**15 clocks before re-enabling
> your device. This has two problems. First, the HB and PPB are
> NOT required to dump the transaction after 2**15 clocks, as opposed
> to the DEFINITE requirement that your device never stop retrying the
> transaction, and second 2**15 clocks is a really long time.
BTW, the main case of concern is with devices doing DMA accesses to host
memory, which is prefetchable. Since no loss of data is involved, HBs
and PPBs are allowed to discard DRCs with prefetchable data earlier than
the 2**15 cycle limit for non-prefetchable data. However, as you point
out, there apparently is no requirement that HBs or PPBs discard delayed
request completions within some specified limit. Thus, a driver can't
wait some specific amount of time, and be guaranteed that any
"out-of-context" delayed request would have been discarded by then.
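To make the failure mode concrete, here is a toy model of the scenario
discussed above. It is only a sketch under loud assumptions: the bridge
holds a single delayed read request, the optional discard timer fires at
exactly 2**15 clocks, and the names ToyBridge, read, and tick are
invented for illustration. A real HB or PPB is considerably more
involved.

```python
from dataclasses import dataclass
from typing import Optional

DISCARD_CLOCKS = 2 ** 15  # discard limit quoted in the thread


@dataclass
class ToyBridge:
    """Hypothetical bridge with one delayed-request slot (illustration only)."""
    has_discard_timer: bool
    pending_addr: Optional[int] = None
    age: int = 0  # clocks since the pending request was latched

    def read(self, addr: int) -> str:
        if self.pending_addr is None:
            # Latch the request and tell the master to repeat it later.
            self.pending_addr, self.age = addr, 0
            return "retry"
        if addr == self.pending_addr:
            # The master repeated the same request: complete it.
            self.pending_addr = None
            return "data"
        # Slot is still occupied by a request nobody is repeating.
        return "retry"

    def tick(self, clocks: int) -> None:
        """Advance time; drop a stale request only if a timer is implemented."""
        if self.pending_addr is None:
            return
        self.age += clocks
        if self.has_discard_timer and self.age >= DISCARD_CLOCKS:
            self.pending_addr = None


def device_reset_scenario(bridge: ToyBridge) -> str:
    bridge.read(0x1000)          # device's read is retried
    bridge.tick(DISCARD_CLOCKS)  # driver resets the device, then waits
    bridge.read(0x2000)          # re-enabled device starts a new read
    return bridge.read(0x2000)   # and repeats it as required


if __name__ == "__main__":
    print("no timer:  ", device_reset_scenario(ToyBridge(has_discard_timer=False)))
    print("with timer:", device_reset_scenario(ToyBridge(has_discard_timer=True)))
```

In this model the bridge without a discard timer keeps answering
"retry" forever, while the one with a timer recovers after the wait.
This is exactly why a fixed driver delay helps only if the bridge
designer chose to implement the timer.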
As far as 2**15 clocks being a "really long time", I calculate it to be
about one millisecond at 33 MHz. So, for the "occasionally hung SCSI"
case I described above, I imagine that some drivers might be better off
attempting this "quiet reset" approach before they resort to resetting
the entire bus, possibly creating a much larger disruption to the
overall system. Do folks see show-stopper problems with this?
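For anyone who wants to check the "about one millisecond" figure, the
arithmetic is just the clock count divided by the bus frequency. The
helper name below is made up for illustration, and 33 MHz is taken as
exactly 33e6 Hz.

```python
DISCARD_CLOCKS = 2 ** 15  # 32768 clocks, the limit quoted in the thread


def discard_window_seconds(bus_hz: float, clocks: int = DISCARD_CLOCKS) -> float:
    """Return how long `clocks` PCI clock cycles last at `bus_hz`."""
    return clocks / bus_hz


if __name__ == "__main__":
    window = discard_window_seconds(33e6)
    # 32768 / 33e6 is a little under one millisecond.
    print(f"{DISCARD_CLOCKS} clocks at 33 MHz = {window * 1e3:.3f} ms")
```

At 33 MHz this comes to roughly 0.99 ms, so a driver waiting that long
after a "quiet reset" costs little, provided the bridge actually
implements the discard timer.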
> No easy solutions here.
Agreed!
Thanks again for the comments.
Joe Cowan
*----------------------------------------------------------------------*
| JOE COWAN |
| System Architecture and Design Lab |
| Email: jpc@fc.hp.com Joe Cowan - MS B5 |
| Work: (+1) 970 229-2404 | (T)229-2404 Hewlett-Packard Company |
| Fax: (+1) 970 229-3197 | (T)229-3197 3404 East Harmony Road |
| Fort Collins, CO 80525 USA |
*----------------------------------------------------------------------*