
RE: long burst writes under NT 4.0?



Hello Jim,

> The KRF WinDriver code sets up the memory mappings for the PLX 9050’s
> configuration registers and local address space, but otherwise just calls
> memcpy() to do block writes and relies on the underlying software and hardware
> configuration to turn these block operations into burst transfers.

Correct.

You can also perform the memcpy() yourself, directly in user-mode, using
dwUserDirectAddr as the BAR's base virtual address in user-mode.

This way no kernel-mode code is involved in the memory transfer itself.
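
A minimal sketch of such a user-mode copy, assuming the dwUserDirectAddr
value has already been obtained from WinDriver and cast to a pointer.  The
function name bar_write_dwords is hypothetical and not part of the WinDriver
API.  It writes consecutive 32-bit words to ascending addresses, which is the
access pattern a bridge can combine; whether the chipset actually merges them
into longer bursts still depends on how the region is mapped and on the
hardware.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Copy 'bytes' bytes from 'src' to a memory-mapped BAR, one 32-bit word at a
   time.  'bar' is assumed to be the BAR's user-mode virtual address
   (dwUserDirectAddr cast to a pointer); 'bytes' must be a multiple of 4. */
static void bar_write_dwords(volatile uint32_t *bar, const uint32_t *src,
                             size_t bytes)
{
    size_t i;
    size_t n = bytes / sizeof(uint32_t);

    assert(bytes % sizeof(uint32_t) == 0);

    /* volatile keeps the compiler from dropping or merging the stores;
       the words go out in ascending address order. */
    for (i = 0; i < n; i++)
        bar[i] = src[i];
}
```

The same loop can be exercised against an ordinary array standing in for the
BAR before it is pointed at real hardware.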

> large Cache Line Size instead of just a ‘Memory Write’ command.  But they
> didn’t think that this was something that the KRF WinDriver would support
> directly and didn’t have any insights into how one might get this operation
> from NT.  And in any case the

As I wrote above, you can write the assembly code that performs the transfer
yourself, in user-mode, without calling the WD_Transfer() function.

Best regards,
Derry 
______________________________________
KRFTech R&D
email: derry@krftech.com
Phone: 1-877-514-0537(USA)  +972-9-8859365(Worldwide), ext. 105
Fax: 1-877-514-0538(USA)  +972-9-8859366(Worldwide)

At 11:23 AM 2/20/00 +0200, you wrote: 
Resent-Date: Thu, 17 Feb 2000 09:14:47 -0800
To: pci-sig@znyx.com
From: Jim Foote <foote@parc.xerox.com>
Subject: long burst writes under NT 4.0?
Date: Thu, 17 Feb 2000 09:11:17 PST
Resent-From: pci-sig@znyx.com
Resent-Sender: pci-sig-request@znyx.com
X-MBF-FILE: MDaemon Gateway to RFC822 (RFC822.MBF v1.0)

Apologies up front for the length of this message, and if it's the wrong
forum.  I’ve tried to summarize it at the top and leave the gory details for
later.  The whole message is available at
ftp://parcftp.parc.xerox.com/transient/foote/BurstDebug.htm 

Bottom line question:  Is there a way to get long PCI combining write bursts
out of NT 4.0? 

Or, stated differently, what’s the right way to track down the cause of short
PCI target burst writes?  I suspect that I’ve misconfigured my system in some
way so that it’s convinced it can’t burst more than 4 Lwords at a time; I’m
just stumped as to how I go about tracking down the misconfiguration.

Short summary:  I’m trying to figure out why my Intel 440BX or VIA MVP3 based
systems won’t burst more than 4 Lwords at a time out of main memory and through
a PLX 9050 target bridge chip to a static ram buffer.  With a logic analyzer I
can see that bursts are happening, but the 440BX is toggling FRAME# after at
most 4 Lwords even though the 9050 is holding STOP# high.  What I’m trying to
figure out is what’s possessing the 440BX to limit the size of its combining
bursts to 4 Lwords.

Ugly details:

Test setup:
Tyan S1830 Tsunami AT motherboard with 500 MHz PIII and 256 MB memory
http://www.tyan.com/products/html/s1830s_sl.html
Adaptec PCI SCSI adapter
PLX 9050 RDK prototyping board
http://www.plxtech.com/products/9050/briefs/9050rdk.pdf 
Windows NT 4.0 service pack 6a
KRF’s WinDriver 4.0 for the PLX 9050 http://www.krftech.com/windriver.html 
HP 15400C logic analyzer monitoring PCI bus and PLX 9050 local bus at 4ns
sampling
AMI bios version 1112991500; some of the potentially relevant BIOS settings:
        Advanced Chipset settings:
                Master Latency Timer (Clks): 224
                Multi-Trans Timer (Clks): 224
                PCI1 to PCI0 access: disabled
                PIIx4 Passive Release: enabled
                PIIx4 Delayed Transaction: disabled
        Plug & Play Settings:
                PCI Latency Timer (PCI Clocks): 224

(So here I’ve set the various latency timers to the maximum values that the
BIOS setup program will allow me to enter.)

The KRF WinDriver code sets up the memory mappings for the PLX 9050’s
configuration registers and local address space, but otherwise just calls
memcpy() to do block writes and relies on the underlying software and hardware
configuration to turn these block operations into burst transfers.

Here’s the relevant section of the test code using the WinDriver API:

static void WriteDBlock (P9050_HANDLE hPlx, DWORD dwOffset, PVOID buf,
    DWORD dwBytes, P9050_ADDR addrSpace, P9050_MODE mode)
{
    WD_TRANSFER trans;
    DWORD dwAddr = hPlx->addrDesc[addrSpace].dwAddr + dwOffset;

    BZERO(trans);
    if (!hPlx->addrDesc[addrSpace].fIsMemory)
        exit(-1);  /* error if it's not memory */

    trans.cmdTrans = WM_SDWORD;
    trans.dwPort = dwAddr;
    trans.fAutoinc = TRUE;
    trans.dwBytes = dwBytes;
    trans.dwOptions = 0;
    trans.Data.pBuffer = buf;
    WD_Transfer (hPlx->hWD, &trans);  /* This boils down to a memcpy() (?) */
}

static void TestBurstWrite(burstObjHandle p)
{
        /* RDK has 128 Kbytes of static ram.  Use that static ram to simulate
           a FIFO and do burst writes in target mode. */
        unsigned bufSize = 64*1024;
        char *writeBufPtr = (char *)malloc(bufSize);

        if (writeBufPtr == NULL)
                exit(-1);  /* error if the buffer can't be allocated */

        /* memset() fills byte by byte, so every Lword reads 0x55555555 */
        (void)memset((void *)writeBufPtr, 0x55, bufSize);

        WriteDBlock (p->hPlx, 0, (PVOID)writeBufPtr,
                bufSize, P9050_ADDR_SPACE2, P9050_MODE_DWORD);

        free(writeBufPtr);
}


When I run this test case I get the logic analyzer trace shown in:
ftp://parcftp.parc.xerox.com/transient/foote/brst_fst.jpg

This trace shows that the 440BX is indeed doing combining bursts, but it is
only bursting at most 4 Lwords between toggles of FRAME#, at a net bandwidth
of about 50 Mbytes/second. The 9050 is never asserting STOP#, so as far as I
can tell the 9050 would happily accept larger bursts if only the 440BX would
send them.
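
As a rough sanity check on those numbers, here is a small sketch that relates
burst length to sustained bandwidth.  It assumes a 32-bit PCI bus clocked at
33 MHz (4 bytes per data phase), which is not stated above, and lumps address
phase, wait states and idle clocks into a single per-burst overhead figure.
The function names are made up for illustration.

```c
#include <assert.h>

/* Sustained bandwidth in Mbytes/s when every burst moves 'data_phases'
   Lwords and costs 'overhead_clocks' additional bus clocks. */
static double burst_mbytes_per_sec(double clock_mhz, unsigned data_phases,
                                   double overhead_clocks)
{
    assert(data_phases > 0 && overhead_clocks >= 0.0);
    return clock_mhz * 4.0 * data_phases / (data_phases + overhead_clocks);
}

/* Per-burst overhead, in bus clocks, implied by a measured bandwidth. */
static double implied_overhead_clocks(double clock_mhz, unsigned data_phases,
                                      double measured_mbytes_per_sec)
{
    assert(data_phases > 0 && measured_mbytes_per_sec > 0.0);
    return clock_mhz * 4.0 * data_phases / measured_mbytes_per_sec
           - data_phases;
}
```

Under those assumptions, implied_overhead_clocks(33.0, 4, 50.0) comes out
between 6 and 7 clocks per 4-Lword burst, so the measured rate is consistent
with a fixed cost being paid on every short burst.  Longer bursts would
amortize that cost over more data phases.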

PLX’s technical support claims that they’ve been able to use a two card
configuration of a PLX 9054 RDK board acting as a bus mastering DMA initiator
to send streams to a PLX 9050 RDK in target mode at over 120Mbytes/sec, so it
seems that the 9050 is able to sink fast streams if only I could convince the
440BX to send them.

I’ve tried the same setup with a Tyan s1590s Trinity AT motherboard with an AMD
K6-III 450 MHz CPU and a VIA MVP3 chipset
http://www.tyan.com/products/html/s1590s.html and seen virtually identical
results.  So my intuition is that either I’ve somehow misconfigured both of the
different BIOSs on the two motherboards, or more likely, the Windows NT 4.0
drivers that WinDriver is layered on aren’t properly configured to support
longer PCI target mode bursts.

Neither PLX technical support nor KRF’s technical support has been able to
suggest a fix or a debugging strategy.

So what’s the right debugging strategy from here?

For example, is it possible that NT drivers are resetting the latency timers
from what the BIOS sets them to at startup?  Seems unlikely, but if I knew how
to read these registers from NT I guess I could verify that the before and
after values were the same.
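
A sketch of the comparison itself, assuming the first 64 bytes of a device's
PCI configuration space have somehow been dumped into a byte buffer (how to
obtain that dump under NT is exactly the open question and is not shown).
The offsets are the standard PCI configuration header ones: Cache Line Size
at 0x0C and Latency Timer at 0x0D.  The helper names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define PCI_CFG_CACHE_LINE_SIZE 0x0C  /* in units of 32-bit words */
#define PCI_CFG_LATENCY_TIMER   0x0D  /* in units of PCI clocks   */

/* 'cfg' points at a dump of at least the 64-byte configuration header. */
static unsigned pci_latency_timer(const unsigned char *cfg)
{
    assert(cfg != NULL);
    return cfg[PCI_CFG_LATENCY_TIMER];
}

static unsigned pci_cache_line_size(const unsigned char *cfg)
{
    assert(cfg != NULL);
    return cfg[PCI_CFG_CACHE_LINE_SIZE];
}

/* Nonzero if the latency timer differs between two dumps, for example one
   taken before NT boots and one taken afterwards. */
static int latency_timer_changed(const unsigned char *before,
                                 const unsigned char *after)
{
    return pci_latency_timer(before) != pci_latency_timer(after);
}
```

With the BIOS setting shown earlier, an unmodified latency timer would read
back as 224 (0xE0).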

PLX tech support suggested that I might be able to get longer bursts by using a
‘Memory Write and Invalidate’ command and a large Cache Line Size instead of
just a ‘Memory Write’ command.  But they didn’t think that this was something
that the KRF WinDriver would support directly and didn’t have any insights into
how one might get this operation from NT.  And in any case the Intel 440BX data
sheet says in section 3.3.3 that Bit 4 of the PCI Command Register is hardwired
to disable this command.  And for that matter, the PLX 9050 data sheet says the
same thing for its PCI Configuration ID register in its section 4.2.1.
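
For reference, the bit in question can be checked from a configuration-space
dump.  This is a sketch under the same assumption of having a byte dump of the
64-byte header in hand; it uses the standard layout, where the 16-bit Command
register sits at offset 0x04 (little-endian) and bit 4 is the Memory Write and
Invalidate enable.  The names are illustrative only.

```c
#include <assert.h>
#include <stddef.h>

#define PCI_CFG_COMMAND     0x04
#define PCI_CMD_MWI_ENABLE  (1u << 4)  /* Memory Write and Invalidate enable */

/* Assemble the 16-bit Command register from a little-endian header dump. */
static unsigned pci_command(const unsigned char *cfg)
{
    assert(cfg != NULL);
    return (unsigned)cfg[PCI_CFG_COMMAND] |
           ((unsigned)cfg[PCI_CFG_COMMAND + 1] << 8);
}

/* Nonzero if the device is allowed to generate Memory Write and Invalidate. */
static int pci_mwi_enabled(const unsigned char *cfg)
{
    return (pci_command(cfg) & PCI_CMD_MWI_ENABLE) != 0;
}
```

On a device where the bit is hardwired to zero, pci_mwi_enabled() stays 0 no
matter what software writes to the register.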

I found an article on Intel’s web site
(http://developer.intel.com/design/PentiumII/applnots/24442201.pdf) that gives
hints to miniport device driver writers on flags to pass to
VideoPortMapMemory() and VideoPortGetDeviceBase() to enable Write Combining
memory.  But this seems specific to video adapters; it’s not clear how I’d use
this for a more vanilla data mover.



If somebody knows an obvious fix I sure wouldn’t turn it down!  But mostly I’m
looking for a debugging strategy that might help me narrow down the cause, or a
pointer to the right set of documentation to tell me how to enable longer write
bursts.

Thanks,

Jim Foote
foote@parc.xerox.com