Re: Deterministic transactions from the local bus side
Hi Yves,
I give up on all the text. A simple block diagram for the HW
and a simple flow diagram for the SW
would be very handy indeed.
Kindest regards,
Martijn Emons
Yves Chartier wrote:
> Thank you all for your comments.
>
> I would like to give more details about a possible configuration with
> numbers:
>
> A. A typical setup would be to use 4 inexpensive 1U rack servers with
> a single 64 bit PCI slot.
>
> B. That slot would be used by a board housing a custom link (e.g., 16
> bits at 66 MHz) that feeds data to the image mixer through a flat
> cable. That board could include a FIFO in order to obtain maximum
> peak bandwidth from the 9656 or another master chip.
>
> C. The image mixer would be another 1U rack enclosure with 4 inputs,
> each having a peak bandwidth of 128 MB/s.
>
> D. The output of the mixer would have to sustain 256 MB/s.
>
> E. The custom hardware image mixer would be managed by a real time OS
> like QNX.
>
> 1. Overall, it seems quite easy because each system has to provide
> only 64 MB/s sustained (256 MB/s shared among 4 servers). There is
> enough power in those 4 PCs. The only problem is the determinism of
> the whole system. The image mixer also includes SRAM buffers able to
> accommodate the 4 links at the same time. With a ping-pong or other
> image buffer configuration, the determinism problem is shifted to the
> page level, which gives more time (e.g., 10 milliseconds or more). But
> it still has to be managed properly.
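As a rough sanity check of the figures in items B to D and item 1, the arithmetic can be sketched in Python. This is only a back-of-envelope sketch: it assumes decimal megabytes, one word transferred per clock on the custom link, and a 10 ms ping-pong page. The function and variable names are illustrative and do not come from the original message.

```python
MB = 1_000_000  # assumption: decimal megabytes


def link_peak_mb_s(width_bits: int, clock_mhz: int) -> float:
    """Peak rate in MB/s of a parallel link moving one word per clock."""
    return width_bits / 8 * clock_mhz


servers = 4
sustained_per_server = 64   # MB/s, item 1
mixer_input_peak = 128      # MB/s per input, item C
mixer_output = 256          # MB/s, item D

# Custom link from item B: 16 bits at 66 MHz.
custom_link_peak = link_peak_mb_s(16, 66)

# Sum of the sustained rates that the mixer output has to carry.
total_sustained = servers * sustained_per_server

# Data one server delivers during one ping-pong page (assumed 10 ms).
page_ms = 10
bytes_per_page = sustained_per_server * MB * page_ms // 1000

print(custom_link_peak, total_sustained, bytes_per_page)
```

Under these assumptions the custom link peaks slightly above the 128 MB/s mixer input, and the four sustained streams add up exactly to the 256 MB/s mixer output.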
>
> 2. From your comments, it seems to me that the Image Mixer Manager
> (IMM) has to call the shots in the following manner:
>
> * It sends image processing requests to each image processing
> server through a TCP/IP link.
> * Whenever an image has been processed, it is placed in the
> allocated memory output area. I suppose there is a way with
> Windows NT to block further access to that memory space until it
> has been read by the IMM. I suppose also that, even with Windows
> NT, this memory space may be set up as a physically contiguous
> segment (e.g., 4 MB). Then a TCP/IP message is sent to the IMM
> giving the start pointer and the length of the image. More than
> one image may be stored in a queue to obtain an elastic buffer of
> a few images in order to maintain absolute determinism. We are
> not concerned about the overall latency as long as, once the
> process has started, it stays clean.
> * The IMM is then responsible for gathering those images by
> programming the 9656 from the local side.
> * Once the IMM has finished reading an image, it sends a message
> to the processing server to unlock the previously locked memory
> space.
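The handshake in the bullets above can be sketched as a small Python model. It is a simulation only: there is no TCP/IP, no Windows NT memory locking and no 9656 programming here, and `ImageReady`, `OutputArea`, `publish` and `unlock` are hypothetical names introduced for illustration. The output area is assumed to be split into fixed-size slots, which is one possible way to build the elastic queue of a few images.

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class ImageReady:
    """Hypothetical 'image ready' message sent from a server to the IMM."""
    start: int   # offset of the image inside the contiguous output area
    length: int  # image length in bytes


class OutputArea:
    """One server's output area, modelled as a queue of lockable slots."""

    def __init__(self, slots: int, slot_size: int):
        self.slot_size = slot_size
        self.free = deque(range(slots))
        self.locked = set()

    def publish(self, length: int):
        """Server side: lock a slot for a finished image, build the message."""
        if length > self.slot_size:
            raise ValueError("image does not fit in one slot")
        if not self.free:
            return None  # queue full: the server waits for an unlock
        slot = self.free.popleft()
        self.locked.add(slot)
        return ImageReady(start=slot * self.slot_size, length=length)

    def unlock(self, msg: ImageReady) -> None:
        """Server side: handle the IMM's 'unlock' message after its read."""
        slot = msg.start // self.slot_size
        self.locked.remove(slot)
        self.free.append(slot)


area = OutputArea(slots=2, slot_size=4 * 1024 * 1024)
first = area.publish(1_000_000)
second = area.publish(1_000_000)
stalled = area.publish(1_000_000)  # no free slot until the IMM unlocks one
area.unlock(first)                 # the IMM has read the first image
third = area.publish(1_000_000)    # reuses the slot that was just released
```

The number of slots sets how far a server may run ahead of the IMM, which is the elasticity the message relies on to hide Windows scheduling jitter.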
>
> With this approach, the coupling between Windows NT and the real time
> OS is reduced to managing a few hundred bytes at most, every 10
> milliseconds or more. The only challenging constraints are then that
> the custom hardware of the mixer has to operate at the required speed
> and that the processing power of the servers has to meet the
> imaging needs.
>
> 3. I took a look at the documentation of some Intel chipsets. It is
> not easy to find answers to simple questions. In the present case, if
> there is a separate 64-bit PCI bus just for the custom link in the
> processing server, how can we figure out how the chipset will handle
> sharing the host memory bandwidth with that PCI bus? A server chipset
> usually has more than one PCI bus. It must also keep all the internal
> peripherals happy. With those 2 GHz motherboards, we may think that a
> required 64 MB/s sustained rate from a 64-bit PCI bus is nothing
> compared to the overall bandwidth of the chipset. But let us say that
> we want more bandwidth in order to reduce the number of PCs from 4 to
> 3 or 2. How can we figure it out simply?
>
> Yves Chartier, Eng.
> Epsimage Inc.
> T 450.974.9109
> F 450.974.3628
>
>
>
>
>
> wengt wrote:
>
>> Hi,
>> I disagree with your opinions.
>>
>> A smart software driver scheme can make it happen in the
>> 'deterministic' way you want.
>>
>> The following is a skeleton your system and driver may follow to
>> order the transfers in a 'deterministic' way:
>> 1. The software driver queues all image requests in a list;
>> 2. Never issue the instructions to start the next DMA operation until
>> the previous DMA operation has finished;
>> 3. All DMA operations strictly follow the PCI specification, i.e., a
>> master keeps the PCI bus until both of these conditions hold: its
>> Latency Timer has expired and its bus grant (nGNT) has been removed;
>> 4. Keep as few devices on the PCI bus as possible. In your case, there
>> should be no other boards except one hard disk drive and as many of
>> your boards as you want. In particular, disconnect the network board
>> while the system is working;
>> 5. Select a system with a PCI-X or PCI-66/64 bus;
>> 6. Select a system with the largest memory space and set the
>> operating system's page swap area on disk to 0, to prohibit the
>> operating system from swapping data to disk;
>> 7. Eliminate all unnecessary disk operations while the boards are
>> working, or delay them until the boards are idle;
>> 8. Select the best combination of CPU and PCI bridge;
>> 9. Select a low-latency PCI chip;
>> 10. All boards share the same interrupt pin;
>>
>> Why:
>> 1. Issuing one DMA operation at a time is equivalent to reducing the
>> number of boards competing for the PCI bus to 1;
>> 2. When only one board is using the PCI bus, the board can run for as
>> many clocks as it wants, limited only by the bridge capacity;
>>
>> The transaction map on PCI looks like this:
>> System command to start one DMA;
>> The board starts a 2 KB transaction;
>> The board generates an interrupt;
>> System command to start the next board's DMA;
>> ...
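The one-DMA-at-a-time rule and the transaction map above can be sketched as a small Python model. It is not driver code: `SerializedDmaQueue`, `submit` and `on_interrupt` are hypothetical names, and the completion interrupt is simulated by a direct method call.

```python
from collections import deque


class SerializedDmaQueue:
    """Toy model of a driver that keeps at most one DMA in flight."""

    def __init__(self):
        self.pending = deque()   # queued (board, nbytes) requests
        self.in_flight = None    # the single active DMA, if any
        self.log = []            # transaction map, in order of events

    def submit(self, board: int, nbytes: int) -> None:
        """Queue a request; start it only if no DMA is running."""
        self.pending.append((board, nbytes))
        self._start_next()

    def _start_next(self) -> None:
        if self.in_flight is None and self.pending:
            self.in_flight = self.pending.popleft()
            self.log.append(("start",) + self.in_flight)

    def on_interrupt(self) -> None:
        """Completion interrupt: retire the DMA, then start the next one."""
        if self.in_flight is None:
            return  # spurious interrupt, nothing to retire
        self.log.append(("done",) + self.in_flight)
        self.in_flight = None
        self._start_next()


queue = SerializedDmaQueue()
queue.submit(board=0, nbytes=2048)
queue.submit(board=1, nbytes=2048)  # waits: board 0 is still transferring
queue.on_interrupt()                # board 0 done, board 1 starts
queue.on_interrupt()                # board 1 done
for event in queue.log:
    print(event)
```

Because the next transfer starts only from the completion path, at most one board competes for the bus at a time, which is the property the "Why" items rely on.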
>>
>> Weng
>>
>> -----Original Message-----
>> From: Martijn Emons [mailto:martijn.emons@arcobel.nl]
>> Sent: Friday, December 06, 2002 12:51 AM
>> To: Yves Chartier
>> Cc: pci-sig@znyx.com
>> Subject: Re: Deterministic transactions from the local bus side
>>
>> Hi Yves,
>>
>> Sorry for trashing your party, but here's the blunt answer:
>> * nothing involving a Windows OS is 'deterministic'
>> * nothing involving a PCI bus with multiple masters (i.e., multiple
>> 9656s) is 'deterministic'.
>>
>> The answer lies in 'deterministic enough'.
>> We have designed a PCI card that uses PCI master writes/reads; the
>> driver supports multiple cards for image processing.
>> Our experiences:
>> * A normal Windows/PC system using multiple boards can easily handle
>> more than 2000 interrupts.
>> * Tweaking the master latency timer values has some effect when the
>> BIOS uses these values. It's better to read the BIOS manuals of some
>> PCs to get a grip on what the BIOS supports via the normal BIOS
>> settings and/or what you can tweak using editors like WPCEDIT
>> (search the internet).
>> * I cannot imagine that your PCI card will have no buffer capability
>> whatsoever. So, calculate:
>>
>> Latency in your 'deterministic reads' => the number of 9656 masters x
>> (burst size / PCI speed).
>>
>> For MS-Windows, the scatter-gather entries in the kernel-mode driver
>> are 4 kbyte max. In combination with normal PCI bus performance (say
>> 100 Mbyte/sec sustained), each entry will take about 40 us to be
>> transferred.
>> A PC system with, say, five 9656 masters means that it will take 5 x
>> 40 => 200 us before every card has had the opportunity to receive
>> (and process) 4 kbyte of data.
>> A normal bit of pixel processing on 4 kbyte of pixel data will take
>> roughly 80 us on a high-speed DSP/CPU (all depending on the algorithm,
>> of course), but image mixing falls within this ball-park figure.
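The latency estimate above is easy to recompute for other burst sizes or master counts. The following Python lines are only a sketch of that arithmetic: they assume 1 Mbyte = 10^6 bytes and 4 kbyte = 4096 bytes, and the function names are made up for illustration.

```python
def transfer_time_us(nbytes: int, rate_mb_s: float) -> float:
    """Time to move nbytes at rate_mb_s; bytes / (MB/s) gives microseconds."""
    return nbytes / rate_mb_s


def worst_case_wait_us(masters: int, nbytes: int, rate_mb_s: float) -> float:
    """Number of masters x (burst size / PCI speed), as in the formula above."""
    return masters * transfer_time_us(nbytes, rate_mb_s)


burst_us = transfer_time_us(4096, 100)       # one 4 kbyte scatter-gather entry
round_us = worst_case_wait_us(5, 4096, 100)  # five 9656 masters taking turns
print(f"{burst_us:.2f} us per burst, {round_us:.1f} us per round")
```

With exact 4096-byte entries this gives about 41 us per burst and about 205 us per round, consistent with the rounded 40 us and 200 us quoted above.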
>>
>> Is this 'deterministic enough' for you? Only you can answer this
>> question.
>>
>> --
>>
>> Kind regards, Martijn Emons
>>
>> - Designer Consultant -
>> Arcobel ASIC Design Centre B.V.
>> Hambakenwetering 1
>> 5231 DD 's-Hertogenbosch,
>> The Netherlands.
>> tel.: +31 73 64 60 100
>> fax : +31 73 64 60 115
>>
>> Yves Chartier wrote:
>>
>> > We are considering using multiple PLX PCI 9656s to link many rackmount
>> > PCs to a kind of image mixer.
>> >
>> > Each PC host CPU running a standard image processing application on
>> > Windows 2000 or XP will fill a big image buffer up to 4 GBytes in the
>> > host memory. Then a real time subsystem running on the local bus side
>> > of the 9656 will set up requests to read data from that buffer.
>> >
>> > Our concern is that the reading has to be done in a
>> > *deterministic* manner, thus eliminating the need for a big elastic
>> > buffer. When there is a request to read a given part of that buffer
>> > (e.g., a line of 2 Kbytes), it has to be delivered immediately with
>> > almost no latency. I understand that the 9656 may be set up as a
>> > master, but what about its priority? I do not know enough about
>> > Windows 2000 or XP to see if there is a way to give absolute
>> > priority to the 9656. I would like to stay with Windows 2000 or XP
>> > instead of a real time OS in order to be able to use standard image
>> > processing applications.
>> >
>> > Is it possible to set up the arbitration to give absolute priority to
>> > the 9656 ?
>> >
>> > Thanks,
>> >
>> > Yves Chartier, Eng.
>> > Epsimage Inc.
>> > T 450.974.9109
>> > F 450.974.3628
>> >
>>
>> ____________________________________________________________________________
>>
>>