
Write Combining difference between 98 and NT



 
 
I have a PCI device to which I am writing data across large regions of consecutive memory locations. The memory region is mapped as Write Combined (WC) memory by the device driver, using the MTRRs. The actual writes to memory are performed by user-level code. I have two versions of the driver: one for Windows 98 and one for Windows NT 4.0. The user application that actually performs the writes is the same for both OSes.
 
The application performs the writes in chunks, so each call to a particular function writes the next N words to the WC memory region.
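
To make the access pattern concrete, here is a minimal C sketch of the chunked writer. It is only an illustration: the real writeData() is written in assembler, and the names wc_region, next_word, REGION_WORDS and CHUNK_WORDS, as well as the sizes, are assumptions made up for this example. An ordinary array stands in for the WC-mapped PCI region.

```c
#include <stddef.h>
#include <stdint.h>

#define REGION_WORDS 4096u  /* assumed size of the mapped region, in 32-bit words */
#define CHUNK_WORDS  20u    /* assumed N: words written per call */

/* Stand-in for the WC-mapped device memory. In the real setup the driver
   hands the application a pointer into the PCI region instead. */
static volatile uint32_t wc_region[REGION_WORDS];

/* Write position, carried over from one call to the next. */
static size_t next_word = 0;

/* Write the next CHUNK_WORDS consecutive words, wrapping at the end
   of the region. The value written is just the word index. */
static void writeData(void)
{
    for (size_t i = 0; i < CHUNK_WORDS; i++) {
        wc_region[next_word] = (uint32_t)next_word;
        next_word = (next_word + 1) % REGION_WORDS;
    }
}
```

With N = 20, each call ends partway through an 8-word WC line, so the line it leaves open can only be completed by the following call.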
 
I am seeing a significant difference in the bandwidth achieved under the two OSes. Further analysis of the writes being performed across the PCI bus revealed that the bursts occurring on the bus were significantly different.
 
Under Windows 98, you see the expected result: bursts of multiples of 8 words occur across the PCI bus as the WC buffers get flushed once full.
 
Under Windows NT, however, bursts do not seem to be generated for WC blocks that are filled across two calls to the function that writes data to the region. In other words, if a particular call to the function does not completely fill a WC buffer, then that buffer gets written out using between 1 and 4 partial writes, according to how much data was actually written during that call. It does not seem to wait until the next call to the function, which would have completed the WC buffer. Any WC buffer that does get completely filled is written out as an 8-word burst, which indicates that WC is actually enabled. Also, consecutive WC flushes never seem to get amalgamated into single longer bursts of 16, 24, 32, etc. words, as is the case under Windows 98.
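
The difference between the two behaviours can be put into numbers with a small counting model in C. This is a sketch of the behaviour described above, not a measurement: it assumes an 8-word WC line (taken from the 8-word bursts seen on the bus), counts each flush of an incomplete line as one "partial flush" regardless of how many partial writes it produces on the bus, and ignores the merging of consecutive bursts. The names FlushCount, count_flushes and LINE_WORDS are invented for the example.

```c
#define LINE_WORDS 8ul  /* assumed WC buffer size in words */

typedef struct {
    unsigned long full_bursts;      /* lines flushed completely full */
    unsigned long partial_flushes;  /* lines flushed while incomplete */
} FlushCount;

/* Count flushes for `calls` calls that each write `chunk_words`
   consecutive words. If flush_each_call is nonzero, any open line is
   flushed when a call returns (the pattern seen under NT); otherwise it
   stays open for the next call to complete (the pattern seen under 98). */
static FlushCount count_flushes(unsigned long calls,
                                unsigned long chunk_words,
                                int flush_each_call)
{
    FlushCount c = {0, 0};
    unsigned long pos = 0;      /* index of the next word to write */
    unsigned long pending = 0;  /* words held in the open line */

    for (unsigned long call = 0; call < calls; call++) {
        for (unsigned long w = 0; w < chunk_words; w++) {
            pending++;
            pos++;
            if (pos % LINE_WORDS == 0) {      /* reached end of a line */
                if (pending == LINE_WORDS)
                    c.full_bursts++;
                else
                    c.partial_flushes++;      /* start was flushed earlier */
                pending = 0;
            }
        }
        if (flush_each_call && pending > 0) { /* flush at call boundary */
            c.partial_flushes++;
            pending = 0;
        }
    }
    if (pending > 0)                          /* whatever is left at the end */
        c.partial_flushes++;
    return c;
}
```

For two calls of 20 words each, the model gives 5 full bursts and no partial flushes when the open line survives the call boundary, but 4 full bursts and 2 partial flushes when it is flushed on return, since the line spanning words 16 to 23 is split in two.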
 
This seems to reduce performance under NT by around 25%.
 
In my test code, the calls to the function are generated from a FOR loop that simply calls the function on each iteration and does nothing else, e.g.:
 
for (i = 0; i < 1000000; i++)
    writeData();
 
No data is accessed by the writeData() function that would not already be in the primary data cache.
 
The writeData() function is written in assembler and does not contain any instruction that, to my knowledge, would force a WC buffer to be flushed. I've checked the list of flush events in Intel's documentation and cannot see any that would occur on each call to the function.
 
So my question is this: does anyone know what could be causing the WC buffers to get flushed between successive calls to the function, and why this would occur only under Windows NT 4.0 and not under Windows 98? Is there a way to speed things up under NT?
 
FYI: I am using a 700MHz Intel PIII processor on a Tyan S1867 Thunder 2500 motherboard, which has a ServerWorks ServerSet III HE chipset. I am dual booting the PC, so the tests were performed on the same hardware under both OSes. The speed difference was also noticed on a motherboard using the Intel 440BX chipset.
 
Thanks for any help offered.
 
-Paul