
Re: Deterministic transactions from the local bus side



Thank you all for your comments.

I would like to give more details about a possible configuration with numbers:

A. A typical setup would be to use 4 inexpensive 1U rack servers with a single 64 bit PCI slot.

B. That slot would be used by a board housing a custom link (e.g. 16 bit at 66 MHz) that feeds data to the image mixer through a flat cable. That board could include a FIFO in order to obtain maximum peak bandwidth from the 9656 or another master chip.

C. The image mixer would be another 1U rack enclosure with 4 inputs, each having a peak bandwidth of 128 MB/s.

D. The output of the mixer would have to sustain 256 MB/s.

E. The custom hardware image mixer would be managed by a real time OS like QNX.

1. Overall, this seems quite easy, because each system only has to provide 64 MB/s sustained (the 256 MB/s mixer output shared among the 4 servers). The 4 PCs have enough power for that. The only problem is the determinism of the whole system. The image mixer also includes SRAM buffers able to accommodate the 4 links at the same time. With ping-pong or other image buffer configurations, the determinism problem is shifted to the page level, which gives more time (e.g. 10 milliseconds or more). But it still has to be managed properly.
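
The numbers in points A to E and in point 1 can be checked with a few lines of arithmetic. The following is only a sketch: the function names are illustrative, 1 MB is assumed to be 10^6 bytes, and the 10 ms page period is the example figure from point 1, not a measured value.

```python
MB = 1_000_000  # assumption: decimal megabytes


def per_server_rate(output_rate_mb_s, num_servers):
    """Sustained rate each server must deliver, in MB/s."""
    return output_rate_mb_s / num_servers


def page_bytes(rate_mb_s, page_period_ms):
    """Bytes one link moves during a single ping-pong page period."""
    return rate_mb_s * MB * page_period_ms / 1000


def link_peak_mb_s(width_bits, clock_mhz):
    """Nominal peak of a parallel link moving one word per clock, in MB/s."""
    return width_bits / 8 * clock_mhz


rate = per_server_rate(256, 4)   # mixer output shared among 4 servers
page = page_bytes(rate, 10)      # one page at the 10 ms example period
peak = link_peak_mb_s(16, 66)    # the 16 bit / 66 MHz custom link

print(f"per-server sustained rate: {rate} MB/s")
print(f"bytes per 10 ms page per link: {page:.0f}")
print(f"nominal custom link peak: {peak} MB/s")
```

Under these assumptions, each ping-pong page holds about 640 KB per link, which is the amount of data whose delivery has to be guaranteed within each page period.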

2. From your comments, it seems to me that the Image Mixer Manager (IMM) has to call the shots in the following manner:

With this approach, the coupling between Windows NT and the real time OS is reduced to managing a few hundred bytes at most, every 10 milliseconds or more. The only challenging constraints are then that the custom hardware of the mixer has to operate at the required speed and that the processing power of the servers has to meet the imaging needs.

3. I took a look at the documentation of some Intel chipsets. It is not easy to find answers to simple questions. In the present case, if there is a separate 64 bit PCI bus just for the custom link in the processing server, how can we figure out how the chipset will share the host memory bandwidth with that bus? A server chipset usually has more than one PCI bus, and it must also keep all the internal peripherals happy. With those 2 GHz motherboards, we may think that a required 64 MB/s sustained rate from a 64 bit PCI bus is nothing compared to the overall bandwidth of the chipset. But let us say that we want more bandwidth in order to reduce the number of PCs from 4 to 3 or 2. How can we figure that out simply?
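
As a first rough bound for the question in point 3, the required rate per PC can be compared with the nominal peak of a 64 bit PCI bus. This is a sketch under stated assumptions: the peaks are the theoretical width times clock figures (264 MB/s at 33 MHz, 528 MB/s at 66 MHz), the function names are illustrative, and the result says nothing about how a particular chipset arbitrates host memory, which still has to come from the chipset documentation or from measurement.

```python
def pci_peak_mb_s(width_bits, clock_mhz):
    """Nominal PCI peak in MB/s: one data phase per clock, no overhead."""
    return width_bits // 8 * clock_mhz


def required_rate_mb_s(total_output_mb_s, num_pcs):
    """Sustained rate each PC must supply when the load is split evenly."""
    return total_output_mb_s / num_pcs


def utilization(required_mb_s, peak_mb_s):
    """Fraction of the nominal bus peak consumed by the required rate."""
    return required_mb_s / peak_mb_s


for num_pcs in (4, 3, 2):
    need = required_rate_mb_s(256, num_pcs)
    for clock_mhz in (33, 66):
        peak = pci_peak_mb_s(64, clock_mhz)
        share = utilization(need, peak)
        print(f"{num_pcs} PCs: {need:.1f} MB/s each, "
              f"{share:.0%} of the 64 bit / {clock_mhz} MHz peak ({peak} MB/s)")
```

Real sustained throughput is lower than these nominal peaks, so the percentages are optimistic lower bounds on bus load rather than a guarantee.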

Yves Chartier, Eng.
Epsimage Inc.
T 450.974.9109
F 450.974.3628
 
 
 
 

wengt wrote:

Hi,
I disagree with your opinions.

A smart software driver scheme can make it happen in the 'deterministic' way
you want.

The following is the skeleton your system and driver may follow to get
operations to execute in the 'deterministic' order:
1. The software driver queues all image requests in a list;
2. Never issue the instruction to start the next DMA operation until the
previous DMA operation has finished;
3. All DMA operations strictly follow the PCI specification, i.e., a master
keeps the PCI bus until both of these conditions hold: its Latency Timer has
expired and its bus nGNT is no longer asserted;
4. Keep as few devices on the PCI bus as possible. In your case, there should
be no other boards except one hard disk drive and as many of your boards as
you want. In particular, disconnect the network board while the system is
working;
5. Select a system with a PCI-X or PCI-66/64 bus;
6. Select a system with the largest memory space and set the operating
system's page swap area on disk to 0, which prohibits the operating system
from swapping data to disk;
7. Eliminate all unnecessary disk operations while the boards are working, or
delay them until the boards are idle;
8. Select the best combination of CPU and PCI bridge;
9. Select a low-latency PCI chip;
10. All boards share the same interrupt pin;

Why:
1. Issuing one DMA operation instruction at a time is equivalent to reducing
the number of boards competing for the PCI bus to 1;
2. When only one board is active on the PCI bus, that board can run for as
many clocks as it wants, limited only by the bridge capacity;

The transaction map on PCI looks like this:
System command to start one DMA;
The board starts a 2K byte transaction;
The board generates an interrupt;
System command to start the next board's DMA;
...
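
The queueing scheme above (one DMA in flight, the next one started only from the completion interrupt) can be modelled in a few lines. This is a hypothetical sketch, not driver code: the class name, the method names and the event log are illustrative, and a real Windows kernel-mode driver would start the transfer by programming the board and would call the completion path from its interrupt handling.

```python
from collections import deque


class SerializedDmaScheduler:
    """Hypothetical model: at most one DMA operation is in flight."""

    def __init__(self):
        self.pending = deque()   # queued (board, nbytes) requests
        self.in_flight = None    # request currently owning the bus
        self.log = []            # ordered record of bus activity

    def submit(self, board, nbytes):
        """Queue a request; start it only if the bus is idle."""
        self.pending.append((board, nbytes))
        self._start_next()

    def on_interrupt(self):
        """Completion interrupt from the active board."""
        if self.in_flight is None:
            return
        self.log.append(("done",) + self.in_flight)
        self.in_flight = None
        self._start_next()

    def _start_next(self):
        if self.in_flight is None and self.pending:
            self.in_flight = self.pending.popleft()
            self.log.append(("start",) + self.in_flight)


sched = SerializedDmaScheduler()
for board in range(3):
    sched.submit(board, 2048)        # one 2K byte line per board

while sched.in_flight is not None:   # each pass stands for one interrupt
    sched.on_interrupt()

for event in sched.log:
    print(event)
```

The log strictly alternates start and done events, which is the property the scheme relies on: no two boards ever compete for the bus at the same time.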

Weng

-----Original Message-----
From: Martijn Emons [mailto:martijn.emons@arcobel.nl]
Sent: Friday, December 06, 2002 12:51 AM
To: Yves Chartier
Cc: pci-sig@znyx.com
Subject: Re: Deterministic transactions from the local bus side

Hi Yves,

Sorry for trashing your party, but here's the blunt answer:
* nothing involving a Windows OS is 'deterministic'
* nothing involving a PCI bus with multiple masters (i.e. multiple 9656s)
is 'deterministic'.

The answer lies in 'deterministic enough'.
We have designed a PCI card that uses PCI master writes/reads; the driver
supports multiple cards for image processing.
Our experiences:
* A normal Windows/PC system using multiple boards can handle more than
  2000 interrupts easily.
* Tweaking the master latency timer values has some effect when the BIOS
  uses these values. It's better to read the BIOS manuals of some PCs to
  get a grip on what the BIOS supports via the normal BIOS settings and/or
  what you can tweak using editors like WPCEDIT (search the internet).
* I cannot imagine that your PCI card will have no buffer capability
  whatsoever. So... calculate:

   Latency in your 'deterministic reads' => the number of 9656 masters x
   (burst-size / PCI-speed).

   For MS-Windows, the scatter-gather entries in the kernel-mode driver
   are 4 kbyte max. In combination with normal PCI bus performance (say
   100 Mbyte/sec sustained), each entry will take 40 us to be transferred.
   A PC system with, say, five 9656 masters means that it will take
   5 x 40 => 200 us before every card gets the opportunity to receive
   (and process) 4 kbyte of data.
   A normal bit of pixel processing on 4 kbyte of pixel data will take
   roughly 80 us on a high-speed DSP/CPU (all depending on the algorithm,
   of course), but image mixing falls in this ball-park figure.
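
The latency estimate above can be reproduced as follows. This is a minimal sketch with illustrative function names, assuming 1 Mbyte = 10^6 bytes and that every master transfers one full burst before the first one is served again. With a 4096 byte entry the figures come out slightly above the rounded 40 us and 200 us.

```python
def transfer_time_us(burst_bytes, bus_mbyte_per_s):
    """Time to move one burst, in microseconds.

    bytes / (Mbyte/s) gives microseconds when 1 Mbyte = 10**6 bytes.
    """
    return burst_bytes / bus_mbyte_per_s


def worst_case_wait_us(num_masters, burst_bytes, bus_mbyte_per_s):
    """Round-robin wait: number of masters x (burst-size / PCI-speed)."""
    return num_masters * transfer_time_us(burst_bytes, bus_mbyte_per_s)


single = transfer_time_us(4096, 100)         # one 4 kbyte scatter-gather entry
round_trip = worst_case_wait_us(5, 4096, 100)  # five 9656 masters

print(f"one 4 kbyte entry: {single:.2f} us")
print(f"five masters: {round_trip:.1f} us")
```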

  Is this 'deterministic enough' for you? Only you can answer this question.

--

Kind regards, Martijn Emons

- Designer Consultant -
Arcobel ASIC Design Centre B.V.
Hambakenwetering 1
5231 DD 's-Hertogenbosch,
The Netherlands.
tel.:  +31 73 64 60 100
fax :  +31 73 64 60 115

Yves Chartier wrote:

> We are considering using multiple PLX PCI 9656s to link many rackmount
> PCs to a kind of image mixer.
>
> Each PC host CPU running a standard image processing application on
> Windows 2000 or XP will fill a big image buffer up to 4 GBytes in the
> host memory. Then a real time subsystem running on the local bus side
> of the 9656 will set up requests to read data from that buffer.
>
> Our concern is that the reading has to be done in a
> *deterministic* manner, thus eliminating the need for a big elastic
> buffer. When there is a request to read a given part of that buffer
> (e.g. a line of 2 Kbytes), it has to be delivered immediately with
> almost no latency. I understand that the 9656 may be set up as a
> master, but what about its priority? I do not know enough about
> Windows 2000 or XP to see if there is a way to give absolute
> priority to the 9656. I would like to stay with Windows 2000 or XP
> instead of a real time OS in order to be able to use standard image
> processing applications.
>
> Is it possible to set up the arbitration to give absolute priority to
> the 9656 ?
>
> Thanks,
>
> Yves Chartier, Eng.
> Epsimage Inc.
> T 450.974.9109
> F 450.974.3628
>
