long burst writes under NT 4.0?
Apologies up front for the length of this message, and apologies if this is the wrong forum. I’ve tried to summarize the problem at the top and leave the gory details for later. The whole message is available at
ftp://parcftp.parc.xerox.com/transient/foote/BurstDebug.htm
Bottom line question: Is there a way to get long PCI combining write bursts out of NT 4.0?
Or, stated differently: what’s the right way to track down the cause of short PCI target burst writes? I suspect I’ve misconfigured my system in some way that has convinced it that it can’t burst more than 4 Lwords at a time, but I’m stumped as to how to go about tracking down the misconfiguration.
Short summary: I’m trying to figure out why my Intel 440BX and VIA MVP3 based systems won’t burst more than 4 Lwords at a time from main memory through a PLX 9050 target bridge chip to a static RAM buffer. With a logic analyzer I can see that bursts are happening, but the 440BX ends each burst (deasserts FRAME#) after at most 4 Lwords, even though the 9050 holds STOP# high (deasserted). What I can’t work out is what’s possessing the 440BX to limit its combining bursts to 4 Lwords.
Ugly details:
Test setup:
Tyan S1830 Tsunami AT motherboard with 500 MHz PIII and 256 MB memory http://www.tyan.com/products/html/s1830s_sl.html
Adaptec PCI SCSI adapter
PLX 9050 RDK prototyping board http://www.plxtech.com/products/9050/briefs/9050rdk.pdf
Windows NT 4.0 service pack 6a
KRF’s WinDriver 4.0 for the PLX 9050 http://www.krftech.com/windriver.html
HP 15400C logic analyzer monitoring PCI bus and PLX 9050 local bus at 4 ns sampling
AMI BIOS version 1112991500; some of the potentially relevant BIOS settings:
Advanced Chipset settings:
Master Latency Timer (Clks): 224
Multi-Trans Timer (Clks): 224
PCI1 to PCI0 access: disabled
PIIX4 Passive Release: enabled
PIIX4 Delayed Transaction: disabled
Plug & Play Settings:
PCI Latency Timer (PCI Clocks): 224
(So here I’ve set the various latency timers to the maximum values that the BIOS setup program will let me enter.)
The KRF WinDriver code sets up the memory mappings for the PLX 9050’s configuration registers and local address space, but otherwise just calls memcpy() to do block writes and relies on the underlying software and hardware configuration to turn these block operations into burst transfers.
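At the CPU level, a block write like this amounts to a run of ordered 32-bit stores to the mapped region. The sketch below is a hypothetical helper (not part of WinDriver) that makes that explicit; it only shows what the software side issues. Whether consecutive stores are then combined into a single PCI burst is decided by the memory type of the mapping and by the host bridge, not by this loop.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not a WinDriver API: copy nDwords 32-bit words to a
   (possibly memory-mapped) destination using one aligned 32-bit store per
   word, in ascending address order. The volatile qualifier keeps the
   compiler from merging, reordering, or eliding the stores. Whether the
   CPU and chipset turn them into a PCI burst is outside this code. */
static void copy_dwords_ordered(volatile uint32_t *dst,
                                const uint32_t *src, size_t nDwords)
{
    size_t i;
    for (i = 0; i < nDwords; i++)
        dst[i] = src[i];
}
```

In a test like the one here, `dst` would be the mapped address of the 9050’s local space 2 and `src` the source buffer; pointing it at an ordinary array gives a quick sanity check of the copy itself.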
Here’s the relevant section of the test code using the WinDriver API:
static void WriteDBlock (P9050_HANDLE hPlx, DWORD dwOffset, PVOID buf,
                         DWORD dwBytes, P9050_ADDR addrSpace, P9050_MODE mode)
{
    WD_TRANSFER trans;
    DWORD dwAddr = hPlx->addrDesc[addrSpace].dwAddr + dwOffset;

    BZERO(trans);
    if (!hPlx->addrDesc[addrSpace].fIsMemory)
        exit(-1);                 /* error if it's not memory */

    trans.cmdTrans = WM_SDWORD;   /* write memory, string of DWORDs */
    trans.dwPort = dwAddr;
    trans.fAutoinc = TRUE;        /* advance the address after each DWORD */
    trans.dwBytes = dwBytes;
    trans.dwOptions = 0;
    trans.Data.pBuffer = buf;
    WD_Transfer (hPlx->hWD, &trans); /* This boils down to a memcpy() (?) */
}
static void TestBurstWrite(burstObjHandle p)
{
    /* The RDK has 128 Kbytes of static RAM. Use that static RAM to
       simulate a FIFO and do burst writes in target mode. */
    unsigned bufSize = 64*1024;
    DWORD inputVal = 0x55555555;
    char *writeBufPtr = (char *)malloc(bufSize);

    if (writeBufPtr == NULL)
        return;                   /* allocation failed */

    /* memset() uses only the low byte of inputVal (0x55), which here
       still produces 0x55555555 in every DWORD. */
    (void)memset((void *)writeBufPtr, (int)(inputVal & 0xFF), bufSize);
    WriteDBlock (p->hPlx, 0, (PVOID)writeBufPtr,
                 bufSize, P9050_ADDR_SPACE2, P9050_MODE_DWORD);
    free(writeBufPtr);
}
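A constant fill such as 0x55555555 makes every data phase look the same on the analyzer. One possible alternative, sketched below as a hypothetical helper, gives each DWORD a distinct value (base plus its index), so each data phase in the trace can be matched to its buffer offset and it is easy to see exactly which DWORDs travel in which burst.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical fill routine: DWORD i of the buffer gets base + i, so every
   data phase captured on the PCI bus identifies its own buffer offset. */
static void fill_incrementing(uint32_t *buf, size_t nDwords, uint32_t base)
{
    size_t i;
    for (i = 0; i < nDwords; i++)
        buf[i] = base + (uint32_t)i;
}
```

In the test above it could replace the memset() call, e.g. `fill_incrementing((uint32_t *)writeBufPtr, bufSize / 4, 0xA5000000u);` (the base value is arbitrary, and malloc() returns memory aligned well enough for the cast).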
When I run this test case I get the logic analyzer trace shown in: ftp://parcftp.parc.xerox.com/transient/foote/brst_fst.jpg
This trace shows that the 440BX is indeed doing combining bursts, but it bursts at most 4 Lwords between assertions of FRAME#, for a net bandwidth of about 50 Mbytes/second. The 9050 never asserts STOP#, so as far as I can tell it would happily accept longer bursts if only the 440BX would send them.
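Some rough arithmetic puts the 50 Mbytes/second figure in context. The sketch assumes 32-bit PCI at 33.33 MHz (30 ns per clock), zero target wait states (one data phase per clock), and a hypothetical fixed number of overhead clocks per transaction for the address phase, turnaround, and idle time; none of these values were measured. With 2 overhead clocks, 4-DWORD bursts would reach about 89 Mbytes/s and 16-DWORD bursts about 118 Mbytes/s, against a 133 Mbytes/s peak. Working backwards, 50 Mbytes/s at 16 bytes per burst implies roughly 10 to 11 clocks per transaction, which would suggest several idle clocks between bursts in addition to the short burst length.

```c
/* Rough throughput model for 32-bit PCI at 33.33 MHz (30 ns clock).
   Assumptions: zero target wait states, so one clock per data phase, plus
   a hypothetical overheadClks per transaction. 1 Mbyte = 10^6 bytes. */
#define PCI_MHZ      33.33
#define BYTES_PER_DW 4u

/* Mbytes/s for bursts of nDwords data phases plus overheadClks per burst. */
static double burst_mbytes_per_sec(unsigned nDwords, unsigned overheadClks)
{
    double bytes = (double)(nDwords * BYTES_PER_DW);
    double clks  = (double)(nDwords + overheadClks);
    return bytes * PCI_MHZ / clks;   /* bytes per clock times Mclocks/s */
}

/* Clocks per transaction implied by an observed rate at a given burst size. */
static double implied_clks_per_burst(unsigned nDwords, double observedMbps)
{
    return (double)(nDwords * BYTES_PER_DW) * PCI_MHZ / observedMbps;
}
```

Plugging in a measured rate and the burst length seen on the analyzer gives a quick check of how much of the lost bandwidth comes from short bursts and how much from gaps between them.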
PLX’s technical support says they’ve been able to use a two-card configuration, with a PLX 9054 RDK board acting as a bus-mastering DMA initiator, to send streams to a PLX 9050 RDK in target mode at over 120 Mbytes/sec. So the 9050 seems able to sink fast streams, if only I could convince the 440BX to send them.
I’ve tried the same setup with a Tyan S1590S Trinity AT motherboard with an AMD K6-III 450 MHz CPU and a VIA MVP3 chipset http://www.tyan.com/products/html/s1590s.html and seen virtually identical results. So my intuition is that either I’ve somehow misconfigured the BIOSes on both motherboards or, more likely, the Windows NT 4.0 drivers that WinDriver is layered on aren’t properly configured to support longer PCI target-mode bursts.
Neither PLX’s nor KRF’s technical support has been able to suggest a fix or a debugging strategy.
So what’s the right debugging strategy from here?
For example, is it possible that NT drivers reset the latency timers away from the values the BIOS programs at startup? It seems unlikely, but if I knew how to read these registers under NT I could verify that the before and after values are the same.
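For reference, the standard way to reach these registers is PCI configuration mechanism #1 from the PCI Local Bus Specification: a 32-bit address written to I/O port 0xCF8 selects bus, device, function, and register, and the selected DWORD is then read from port 0xCFC. Under NT 4.0 that port I/O needs kernel-mode privilege (for example, a small driver calling the DDK’s HalGetBusData()), so the sketch below leaves out the I/O and only builds the address and decodes the DWORD at configuration offset 0x0C, which holds the Cache Line Size and Latency Timer bytes. The function names are hypothetical.

```c
#include <stdint.h>

/* Build a configuration mechanism #1 address (written to port 0xCF8). */
static uint32_t pci_cfg_address(unsigned bus, unsigned dev,
                                unsigned func, unsigned reg)
{
    return 0x80000000u                        /* enable bit */
         | ((uint32_t)(bus  & 0xFFu) << 16)
         | ((uint32_t)(dev  & 0x1Fu) << 11)
         | ((uint32_t)(func & 0x07u) << 8)
         | ((uint32_t)reg & 0xFCu);           /* DWORD-aligned register */
}

/* The DWORD at offset 0x0C holds Cache Line Size (byte 0), Latency Timer
   (byte 1), Header Type (byte 2) and BIST (byte 3). */
static unsigned cfg_latency_timer(uint32_t dword0C)
{
    return (dword0C >> 8) & 0xFFu;            /* in PCI clocks */
}

static unsigned cfg_cache_line_size(uint32_t dword0C)
{
    return dword0C & 0xFFu;                   /* in DWORD units */
}
```

Reading offset 0x0C for the host bridge (on 440BX systems, typically bus 0, device 0, function 0) and for the 9050 once from DOS or the BIOS and once under NT would show whether anything rewrote the timers; a value of 224 appears as 0xE0 in byte 1.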
PLX tech support suggested that I might be able to get longer bursts by using a ‘Memory Write and Invalidate’ command and a large Cache Line Size instead of just a ‘Memory Write’ command. But they didn’t think that this was something that the KRF WinDriver would support directly and didn’t have any insights into how one might get this operation from NT. And in any case the Intel 440BX data sheet says in section 3.3.3 that Bit 4 of the PCI Command Register is hardwired to disable this command. And for that matter, the PLX 9050 data sheet says the same thing for its PCI Configuration ID register in its section 4.2.1.
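The two configuration fields involved in the Memory Write and Invalidate suggestion are both standard header fields: bit 4 of the Command register (offset 0x04) is Memory Write and Invalidate Enable, and Cache Line Size (offset 0x0C) is given in DWORD units. MWI is generated by the initiator, so in this setup it is the 440BX’s bit that matters, and the data sheet section cited above says that bit is hardwired. The hypothetical helper below simply checks both conditions a master needs before it may use MWI.

```c
#include <stdint.h>

/* Command register (offset 0x04), bit 4: Memory Write and Invalidate Enable. */
#define PCI_CMD_MWI_ENABLE 0x0010u

/* MWI requires the enable bit to be set and a nonzero cache line size. */
static int mwi_usable(uint16_t command, uint8_t cacheLineDwords)
{
    return (command & PCI_CMD_MWI_ENABLE) != 0 && cacheLineDwords != 0;
}

/* Cache Line Size is programmed in DWORDs; convert to bytes. */
static unsigned cache_line_bytes(uint8_t cacheLineDwords)
{
    return 4u * cacheLineDwords;
}
```

For example, a Command value of 0x0006 (memory space and bus master enabled, MWI bit clear) makes `mwi_usable()` return 0 regardless of the cache line size.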
I found an article on Intel’s web site (http://developer.intel.com/design/PentiumII/applnots/24442201.pdf) that gives hints to miniport device driver writers on flags to pass to VideoPortMapMemory() and VideoPortGetDeviceBase() to enable Write Combining memory. But this seems specific to video adapters, and it’s not clear how I’d use it for a more vanilla data mover.
If somebody knows an obvious fix I sure wouldn’t turn it down! But mostly I’m looking for a debugging strategy that might help me narrow down the cause, or a pointer to the right set of documentation to tell me how to enable longer write bursts.
Thanks,
Jim Foote
foote@parc.xerox.com