Write Combining difference between 98 and NT
I have a PCI device to which I am writing data in large
regions of consecutive memory locations. The memory region is mapped by the
device driver as Write Combined (WC) memory using the MTRRs. The actual writes
to memory are performed by user-level code. I have two versions of the driver -
one for Windows 98 and one for Windows NT 4.0. The user application that
actually performs the writes is the same for both OSes.
The application performs the writes in chunks, so each call to
a particular function writes the next N words to the WC memory
region.
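
To make the write pattern concrete, here is a minimal C sketch of such a chunked writer. It is only an illustration, not the real code (which is written in assembler): the names fake_region, wc_base, next_word, write_chunk, REGION_WORDS and CHUNK_WORDS are all hypothetical, an ordinary array stands in for the mapped WC region, and a word is assumed to be 32 bits.

```c
#include <stddef.h>
#include <stdint.h>

#define REGION_WORDS 64u  /* hypothetical region size, in 32-bit words */
#define CHUNK_WORDS  5u   /* hypothetical N: words written per call */

/* Stand-in for the device memory. In the real setup this would be the
   pointer returned by the driver's mapping of the WC region. */
static uint32_t fake_region[REGION_WORDS];
static volatile uint32_t *wc_base = fake_region;

/* Index of the next word to write; it persists across calls so that
   successive calls write to consecutive locations. */
static size_t next_word = 0;

/* Write the next CHUNK_WORDS words, continuing where the previous call
   stopped. Returns the number of words actually written. */
static size_t write_chunk(uint32_t first_value)
{
    size_t written = 0;

    while (written < CHUNK_WORDS && next_word < REGION_WORDS) {
        wc_base[next_word++] = first_value + (uint32_t)written;
        written++;
    }
    return written;
}
```

With CHUNK_WORDS set to 5, the first call covers words 0 to 4 and the second covers words 5 to 9, so the first 8-word WC buffer is only completed part-way through the second call.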
I am seeing a significant difference between the two OSes in the
bandwidth achieved. Further analysis of the writes being
performed across the PCI bus revealed that the bursts occurring on the bus
were significantly different.
Under Windows 98, you see the expected result, with bursts of
multiples of 8 words occurring across the PCI bus as the WC buffers get flushed
once full.
Under Windows NT, however, bursts do not seem to be generated
for WC buffers that are filled across two calls to the function that writes data
to the region. In other words, if a particular call to the function does not
completely fill a WC buffer, then that buffer gets written out using between
1 and 4 partial writes, according to how much data was actually written during
that call. It does not seem to wait until the next call to the function, which
would have completed the WC buffer. Any WC buffer that does get completely
filled is written out as an 8-word burst, which indicates that WC is actually
enabled. Also, consecutive WC flushes never seem to get amalgamated into single
longer bursts of 16, 24, 32, etc. words, as is the case under Windows
98.
This seems to reduce performance under NT by around
25%.
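
The difference between the two bus patterns can be illustrated with a toy model in C. This is a sketch of the observations only, not of how the processor works: the names bus_stats, partial_count and model_writes are hypothetical, a word is assumed to be 32 bits, writes are assumed to start on a WC buffer boundary, and a partly filled buffer is assumed to cost one partial write per 8-byte pair of words touched (chosen so that the count falls between 1 and 4). The model does not attempt to reproduce the amalgamation of flushes into longer bursts.

```c
#include <stddef.h>

#define WC_WORDS 8u  /* words per WC buffer, matching the 8-word bursts */

struct bus_stats {
    unsigned long full_bursts;     /* buffers written out as 8-word bursts */
    unsigned long partial_writes;  /* partial writes from unfilled buffers */
};

/* Assumption: a fragment of `fill` words starting at word offset `start`
   within a buffer costs one partial write per 8-byte pair it touches. */
static unsigned partial_count(unsigned start, unsigned fill)
{
    return (start + fill - 1u) / 2u - start / 2u + 1u;
}

/* Model `calls` calls that each write `chunk` consecutive words.
   flush_each_call != 0: a partly filled buffer is written out when the
   call returns (the pattern seen under NT).
   flush_each_call == 0: a partly filled buffer stays open until a later
   call completes it (the pattern seen under 98). */
static struct bus_stats model_writes(unsigned long calls, unsigned chunk,
                                     int flush_each_call)
{
    struct bus_stats s = {0, 0};
    unsigned fill = 0;      /* words pending in the open buffer */
    unsigned long pos = 0;  /* absolute index of the next word */

    for (unsigned long c = 0; c < calls; c++) {
        for (unsigned w = 0; w < chunk; w++) {
            fill++;
            pos++;
            if (pos % WC_WORDS == 0) {  /* reached the end of a buffer */
                unsigned start = (unsigned)((pos - fill) % WC_WORDS);
                if (fill == WC_WORDS)
                    s.full_bursts++;
                else
                    s.partial_writes += partial_count(start, fill);
                fill = 0;
            }
        }
        if (flush_each_call && fill > 0) {  /* evicted at end of call */
            unsigned start = (unsigned)((pos - fill) % WC_WORDS);
            s.partial_writes += partial_count(start, fill);
            fill = 0;
        }
    }
    if (fill > 0) {  /* whatever is still pending at the very end */
        unsigned start = (unsigned)((pos - fill) % WC_WORDS);
        s.partial_writes += partial_count(start, fill);
    }
    return s;
}
```

For example, under these assumptions model_writes(8, 5, 0) yields 5 full bursts and no partial writes, while model_writes(8, 5, 1) yields no full bursts and 24 partial writes for the same 40 words.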
In my test code, the calls to the function are generated from
a for loop that simply calls the function on each iteration and does nothing
else, e.g.:
    for (i = 0; i < 1000000; i++)
        writeData();
No data is accessed by the writeData() function that would not
already be in the primary data cache.
The writeData() function is written in assembler and does not
contain any instruction that, to my knowledge, would force a WC buffer to be
flushed. I've checked the list of flush-causing events in Intel's documentation
and cannot see any that would occur on each call to the function.
So my question is this - does anyone know what could be
causing the WC buffers to get flushed between successive calls to the function,
and why this would only occur under Windows NT 4.0 and not under Windows 98? Is
there a way to speed things up under NT?
FYI: I am using a 700MHz Intel PIII processor on a Tyan S1867
Thunder 2500 motherboard, which uses a ServerWorks ServerSet III HE chipset.
I am dual booting the PC, so the tests were performed on the same hardware
under both OSes. The speed difference was also observed on a motherboard using
the Intel 440BX chipset.
Thanks for any help offered.
-Paul