
Re: Deterministic transactions from the local bus side



Hi Yves,

I give up on following all the text. A simple block diagram for the HW 
and a simple flow diagram for the SW would be very handy indeed.

Kindest regards,
Martijn Emons




Yves Chartier wrote:

> Thank you all for your comments.
>
> I would like to give more details about a possible configuration with 
> numbers:
>
> A. A typical setup would be to use 4 inexpensive 1U rack servers with 
> a single 64 bit PCI slot.
>
> B. That slot would be used by a board housing a custom link (ex: 16 
> bit at 66 MHz) aimed at feeding data to the image mixer through a flat 
> cable. That board could comprise a FIFO in order to obtain maximum 
> peak bandwidth from the 9656 or another master chip.
>
> C. The image mixer would be another 1U rack enclosure with 4 inputs, 
> each having a peak bandwidth of 128 MB/s.
>
> D. The output of the mixer would have to sustain 256 MB/s.
>
> E. The custom hardware image mixer would be managed by a real time OS 
> like QNX.
>
> 1. Overall, it seems quite easy because each system has to provide only 
> 64 MB/s sustained. There is enough power in those 4 PCs. The only 
> problem is the determinism of the whole system. The image mixer 
> also includes SRAM buffers able to accommodate the 4 links 
> at the same time. With a ping-pong or other image buffer 
> configuration, the determinism problem is shifted to the page level, 
> which gives more time (ex: 10 milliseconds or more). But still, it has 
> to be managed properly.
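
The figures in A to E and in point 1 can be checked with a short back-of-envelope sketch. It assumes decimal megabytes and a 10 ms page period, and all function and variable names are illustrative only. Under these assumptions a 16 bit link at 66 MHz peaks at 132 MB/s, roughly in line with the 128 MB/s quoted for each mixer input.

```python
# Back-of-envelope bandwidth figures; names are illustrative, not from the thread.
MB = 1_000_000  # decimal megabyte (an assumption)


def link_peak_bytes_per_s(width_bits: int, clock_hz: float) -> float:
    """Peak rate of a parallel link that moves width_bits on every clock."""
    return width_bits / 8 * clock_hz


def page_bytes(sustained_bytes_per_s: float, page_period_ms: float) -> float:
    """Data one link delivers during one ping-pong page period."""
    return sustained_bytes_per_s * page_period_ms / 1000


mixer_output = 256 * MB                   # item D: sustained mixer output
servers = 4                               # item A: four 1U servers
per_server = mixer_output / servers       # sustained rate each server must supply
peak = link_peak_bytes_per_s(16, 66e6)    # item B: 16 bit at 66 MHz
page = page_bytes(per_server, 10)         # data per link in one 10 ms page

print(f"per server: {per_server / MB:.0f} MB/s")
print(f"link peak:  {peak / MB:.0f} MB/s")
print(f"page size:  {page / 1000:.0f} kB per 10 ms")
```

With these numbers each link runs at about half of its peak rate, and a ping-pong buffer holds on the order of 640 kB per link per page.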
>
> 2. From your comments, it seems to me that the Image Mixer Manager 
> (IMM) has to call the shots in the following manner:
>
>     * It sends image processing requests to each image processing
>       server through a TCP/IP link.
>     * Whenever one image is processed, it is placed in the allocated
>       memory output area. I suppose there is a way with Windows NT to
>       block further access to that memory space until it has been read
>       by the IMM. I suppose also that, even with Windows NT, this
>       memory space may be set up as a physically contiguous segment (ex:
>       4 MB). Then a TCP/IP message is sent to the IMM mentioning the
>       start pointer and the length of the image. More than one image
>       may be stored in a queue to obtain an elastic buffer of a few
>       images in order to keep absolute determinism. We are not
>       concerned about the overall latency as long as, once the process
>       is started, it stays clean.
>     * The IMM then has the responsibility to gather those images by
>       programming the 9656 from the local side.
>     * Once the reading of an image by the IMM is done, the IMM
>       sends a message to the processing server to unlock the previously
>       locked memory space.
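
The request / ready / read / unlock handshake described in the bullets above can be sketched as a small in-process model. Everything here is hypothetical: the class names, the slot-based output area, and the direct method calls that stand in for the TCP/IP messages and for the 9656 DMA read. It only illustrates the ordering of the steps and the role of the elastic queue.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class ImageDescriptor:
    start: int           # start pointer of the image in the server output area
    length: int          # image length in bytes
    locked: bool = True  # the server must not overwrite the image while locked


class ProcessingServer:
    """Stands in for one image processing server (hypothetical model)."""

    def __init__(self, slots: int, slot_bytes: int):
        self.free = deque(range(slots))  # free slots in the contiguous area
        self.slot_bytes = slot_bytes
        self.locked = {}                 # start pointer -> descriptor

    def process(self, length: int):
        """Handle one request; return a 'ready' descriptor, or None if full."""
        if not self.free or length > self.slot_bytes:
            return None
        start = self.free.popleft() * self.slot_bytes
        desc = ImageDescriptor(start, length)
        self.locked[start] = desc
        return desc

    def unlock(self, start: int) -> None:
        """Release a slot after the IMM has read it."""
        desc = self.locked.pop(start)
        desc.locked = False
        self.free.append(start // self.slot_bytes)


class ImageMixerManager:
    def __init__(self, server: ProcessingServer):
        self.server = server
        self.ready = deque()  # elastic buffer of images waiting to be read

    def request(self, length: int) -> bool:
        # Real system: request and 'ready' reply travel over TCP/IP.
        desc = self.server.process(length)
        if desc is not None:
            self.ready.append(desc)
        return desc is not None

    def read_next(self) -> ImageDescriptor:
        desc = self.ready.popleft()
        # Real system: program the 9656 from the local side to read
        # desc.length bytes starting at desc.start, then send the unlock.
        self.server.unlock(desc.start)
        return desc


server = ProcessingServer(slots=2, slot_bytes=4 * 1024 * 1024)
imm = ImageMixerManager(server)
accepted = [imm.request(1_000_000) for _ in range(3)]  # third finds no free slot
first = imm.read_next()                                # frees slot 0 again
```

In this model the number of slots sets the depth of the elastic buffer: a request is refused while every slot is still locked, and it succeeds again as soon as the IMM has read and unlocked one image.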
>
> With this approach, the coupling between Windows NT and the real time 
> OS is reduced to managing a few hundred bytes at most, every 10 
> milliseconds or more. The only challenging constraints are then that 
> the custom hardware of the mixer has to operate at the required speed 
> and that the processing power of the servers has to meet the 
> imaging needs.
>
> 3. I took a look at the documentation of some Intel chipsets. It is 
> not easy to find answers to simple questions. In the present case, if 
> there is a separate PCI 64 bus just for the custom link in the 
> processing server, how can we figure out how the chipset will handle 
> sharing the host memory bandwidth with that PCI 64 bus? A server 
> chipset usually has more than one PCI bus. It must also keep all the 
> internal peripherals happy. With those 2 GHz motherboards, we may 
> think that a required 64 MB/s sustained rate from a PCI 64 bus is 
> nothing compared to the overall bandwidth of the chipset. But let us 
> say that we want more bandwidth in order to reduce the number of PCs 
> from 4 to 3 or 2. How can we figure it out simply?
>
> Yves Chartier, Eng.
> Epsimage Inc.
> T 450.974.9109
> F 450.974.3628
>
> wengt wrote:
>
>> Hi,
>> I disagree with your opinions.
>>
>> A smart software driver scheme can make it happen in the 
>> 'deterministic' way you want.
>>
>> The following is a skeleton your system and driver may follow to get 
>> the operations ordered in a 'deterministic' way:
>> 1. The software driver queues all image requests in a list;
>> 2. Never issue the instructions to start the next DMA operation until 
>> the previous DMA operation has finished;
>> 3. All DMA operations strictly follow the PCI specification, i.e., a 
>> master keeps the PCI bus until both of these hold: its Latency Timer 
>> has expired and its bus grant (nGNT) has been removed;
>> 4. Keep as few devices on the PCI bus as possible. In your case, that 
>> means no boards other than one hard disk drive and as many of your own 
>> boards as you want. In particular, disconnect the network board while 
>> the system is working;
>> 5. Select a system with a PCI-X or PCI-66/64 bus;
>> 6. Select a system with as much memory as possible and set the 
>> operating system's swap area on disk to 0, which prevents the 
>> operating system from paging data out to disk;
>> 7. Eliminate all unnecessary disk operations while the boards are 
>> working, or delay them until the boards are idle;
>> 8. Select the best combination of CPU and PCI bridge;
>> 9. Select a PCI chip with low latency;
>> 10. All boards share the same interrupt pin;
>>
>> Why:
>> 1. Issuing one DMA operation at a time is equivalent to reducing the 
>> number of boards competing for the PCI bus to 1;
>> 2. When only one board is on the PCI bus, it can burst for as many 
>> clocks as it wants, limited only by the bridge capacity;
>>
>> The transaction sequence on the PCI bus looks like this:
>> System command to start one DMA;
>> The board starts a 2K byte transaction;
>> The board generates an interrupt;
>> System command to start the next board's DMA;
>> ...
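
The "one DMA at a time" rule and the transaction sequence above can be sketched as a small scheduler. This is only a model: the class and method names are made up for illustration, the 2048 byte default mirrors the 2K byte transaction mentioned above, and `on_interrupt` stands in for the driver's real completion interrupt handler.

```python
from collections import deque


class SerializedDmaQueue:
    """Keeps at most one DMA in flight; the next starts on completion."""

    def __init__(self):
        self.pending = deque()  # queued (board, nbytes) requests
        self.in_flight = None   # request currently on the bus, if any
        self.log = []           # ordered record of starts and completions

    def submit(self, board: int, nbytes: int = 2048) -> None:
        """Queue a transfer; it starts at once only if the bus is idle."""
        self.pending.append((board, nbytes))
        self._start_next()

    def _start_next(self) -> None:
        if self.in_flight is None and self.pending:
            self.in_flight = self.pending.popleft()
            self.log.append(("start", *self.in_flight))

    def on_interrupt(self) -> None:
        """Called when the active board signals that its transfer finished."""
        if self.in_flight is None:
            return  # spurious interrupt, nothing to complete
        self.log.append(("done", *self.in_flight))
        self.in_flight = None
        self._start_next()


q = SerializedDmaQueue()
for board in (0, 1, 2):
    q.submit(board)
while q.in_flight is not None:
    q.on_interrupt()

for event in q.log:
    print(event)
```

Because a new transfer is only started from `on_interrupt`, two boards never compete for the bus, which is the point made under "Why" above.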
>>
>> Weng
>>
>> -----Original Message-----
>> From: Martijn Emons [mailto:martijn.emons@arcobel.nl]
>> Sent: Friday, December 06, 2002 12:51 AM
>> To: Yves Chartier
>> Cc: pci-sig@znyx.com
>> Subject: Re: Deterministic transactions from the local bus side
>>
>> Hi Yves,
>>
>> Sorry for trashing your party but here's the blunt answer:
>> * nothing involving a Windows OS is 'deterministic'
>> * nothing involving a PCI bus with multiple masters (i.e. multiple 9656s)
>> is 'deterministic'.
>>
>> The answer lies in 'deterministic enough'.
>> We have designed a PCI card that uses PCI master writes/reads; the driver
>> supports multiple cards for image processing.
>> Our experiences:
>> * A normal Windows/PC-system using multiple boards can handle more than
>> 2000 interrupts easily.
>> * Tweaking the master latency timer values has some effect when the
>> BIOS uses these values.
>>    It's better to read the BIOS manuals of some PCs to get a grip on what
>> the BIOS supports via the
>>     normal BIOS settings and/or what you can tweak using editors like
>> WPCEDIT (search the internet).
>> * I cannot imagine that your PCI card will have no buffer capability
>> whatsoever. So.... calculate:
>>
>>    Latency in your 'deterministic reads' => the number of 9656 masters X
>> (burst-size/PCI-speed).
>>
>>    For MS-Windows, the scatter-gather entries in the kernel-mode driver
>>    are 4 kbyte max. In combination with normal PCI-bus performance (say
>>    100 Mbyte/sec sustained), each entry takes about 40 us to transfer.
>>    A PC system with, say, five 9656 masters means a delay of 5 x 40 =>
>>    200 us before every card gets the opportunity to receive (and
>>    process) 4 kbyte of data.
>>    A normal bit of pixel processing on 4 kbyte of pixel data takes
>>    roughly 80 us on a high-speed DSP/CPU (all depending on the
>>    algorithm, of course), but image mixing falls in this ballpark.
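
The latency estimate above is easy to reproduce. The sketch below uses the same inputs (4 kbyte bursts, 100 Mbyte/sec sustained, five masters) and assumes 4 kbyte means 4096 bytes and 1 Mbyte means 10^6 bytes, so it lands slightly above the rounded 40 us and 200 us figures. The function names are illustrative only.

```python
def burst_time_us(burst_bytes: int, bus_bytes_per_s: float) -> float:
    """Time to move one burst over the bus, in microseconds."""
    return burst_bytes / bus_bytes_per_s * 1e6


def worst_case_wait_us(masters: int, burst_bytes: int,
                       bus_bytes_per_s: float) -> float:
    """Latency = number of masters x (burst size / PCI speed)."""
    return masters * burst_time_us(burst_bytes, bus_bytes_per_s)


one_burst = burst_time_us(4096, 100e6)               # one scatter-gather entry
round_trip = worst_case_wait_us(5, 4096, 100e6)      # every card served once

print(f"one 4 kbyte burst:  {one_burst:.2f} us")
print(f"five masters, each: {round_trip:.1f} us")
```

Changing `masters` or `bus_bytes_per_s` gives a quick feel for how much buffering each card needs to ride out the wait.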
>>
>>   Is this 'deterministic enough' for you? Only you can answer this 
>> question.
>>
>> -- 
>>
>> Kind regards, Martijn Emons
>>
>> - Designer Consultant -
>> Arcobel ASIC Design Centre B.V.
>> Hambakenwetering 1
>> 5231 DD 's-Hertogenbosch,
>> The Netherlands.
>> tel.:  +31 73 64 60 100
>> fax :  +31 73 64 60 115
>>
>> Yves Chartier wrote:
>>
>> > We are considering using multiple PLX PCI 9656s to link many rackmount
>> > PCs to a kind of image mixer.
>> >
>> > Each PC host CPU running a standard image processing application on
>> > Windows 2000 or XP will fill a big image buffer up to 4 GBytes in the
>> > host memory. Then a real time subsystem running on the local bus side
>> > of the 9656 will set up requests to read data from that buffer.
>> >
>> > Our concern is that the reading has to be done in a
>> > *deterministic* manner, thus eliminating the need for a big elastic
>> > buffer. When there is a request to read a given part of that buffer
>> > (ex: a line of 2 Kbytes), it has to be delivered immediately with
>> > almost no latency. I understand that the 9656 may be set up as a
>> > master, but what about its priority? I do not know enough about
>> > Windows 2000 or XP to see if there is a way to give absolute
>> > priority to the 9656. I would like to stay with Windows 2000 or XP
>> > instead of a real time OS in order to be able to use standard image
>> > processing applications.
>> >
>> > Is it possible to set up the arbitration to give absolute priority to
>> > the 9656 ?
>> >
>> > Thanks,
>> >
>> > Yves Chartier, Eng.
>> > Epsimage Inc.
>> > T 450.974.9109
>> > F 450.974.3628
>> >
>>