Inter-process really fast communication
Let's say I've got one application generating a video stream into whatever frame buffer (standard memory region) I want, and another application that wants to grab the video stream and do something with it, such as display it with as little latency as possible. Assume I have source to both of these apps.
What are the various ways to make this happen efficiently (Windows and UNIX), assuming truecolor NTSC resolution (.88 MB per frame) and 30 or 60 fps? Would a FIFO pipe or unix port be fast enough? How difficult would it be to negotiate a (triple buffered?) shared memory area with some semaphore locking?
Obviously the shared memory would be faster, but how do I negotiate that, with messages that each successive frame is ready?
It's always worth considering how to do this efficiently, especially since in this case Bob presumably wants to reserve as much CPU time as possible for the producer's and consumer's actual processing.
The multiple buffer/circular queue in shared memory is probably the best from a throughput perspective. You'd need to write a little more code yourself compared to the other options, but it's pretty simple. I've been using a circular queue in Win32 shared memory for passing messages (mostly to duplicate another platform's behavior rather than for throughput), and it's worked well.
Negotiating producer/consumer access to a circular queue with fixed-size elements is pretty easy. The producer and consumer are each responsible for tracking which entry they are supposed to look at next. Create a pair of counting semaphores. The producer waits on its semaphore (which signals that a queue entry is empty), produces, and posts to the consumer's semaphore. The consumer waits on its semaphore (which signals that a queue entry is full), consumes, and posts to the producer's semaphore. Initialize the consumer's semaphore to 0 and the producer's semaphore to the size of the queue. Semaphores can be implemented fairly efficiently (though I've never timed the Win32 or any *nix implementations), so this shouldn't impact the performance of the system significantly, though it does block the producer when the queue is full, which would be bad if you're responding to real-time events.
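A rough, untested sketch of that scheme using Win32 named counting semaphores (the semaphore names, queue depth, and frame size here are made up for illustration, and the fill/display steps are placeholders):

Code:
// Sketch only: QUEUE_SIZE-deep frame queue in shared memory, guarded by two
// named counting semaphores. Both processes call InitQueueSemaphores(); the
// second call just opens the objects the first one created.
#include <windows.h>

#define QUEUE_SIZE  3
#define FRAME_BYTES (640 * 480 * 3)     // the ~0.88 MB truecolor NTSC frame

static HANDLE emptySlots;   // counts empty entries, starts at QUEUE_SIZE
static HANDLE fullSlots;    // counts filled entries, starts at 0

void InitQueueSemaphores(void)
{
    emptySlots = CreateSemaphore(NULL, QUEUE_SIZE, QUEUE_SIZE, "FrameQueueEmpty");
    fullSlots  = CreateSemaphore(NULL, 0,          QUEUE_SIZE, "FrameQueueFull");
}

// Producer: wait for an empty slot, fill it, then signal the consumer.
void ProduceFrame(unsigned char *queue, int *writeIndex)
{
    WaitForSingleObject(emptySlots, INFINITE);
    // ... render or capture into queue + (*writeIndex * FRAME_BYTES) ...
    *writeIndex = (*writeIndex + 1) % QUEUE_SIZE;
    ReleaseSemaphore(fullSlots, 1, NULL);
}

// Consumer: wait for a filled slot, use it, then hand it back.
void ConsumeFrame(unsigned char *queue, int *readIndex)
{
    WaitForSingleObject(fullSlots, INFINITE);
    // ... display or process queue + (*readIndex * FRAME_BYTES) ...
    *readIndex = (*readIndex + 1) % QUEUE_SIZE;
    ReleaseSemaphore(emptySlots, 1, NULL);
}

Here queue would just point at a shared memory region big enough for QUEUE_SIZE frames.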
BSD sockets will probably be slower, since they will presumably perform at least one internal copy, and probably more if the implementation doesn't have a good short circuit for localhost transfers. And the needless complexity of BSD socket setup negates any simplicity advantage they would otherwise have had.
I haven't had cause to look at the internals of a Unix pipe, but I suspect they perform at least one and probably two copies internally. You can feed them data from any location, and it wouldn't be safe to give the other process access to that entire memory page, so the data is probably copied to a buffer in the receiving process's memory map. Assuming you use the read function or something like it, the data would then have to be copied again into the buffer you pass to read. The only advantages of a pipe are that it's probably pretty simple to set up and that it can be read and written with the standard I/O routines.
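For comparison, the named-FIFO version on UNIX would look roughly like this (a sketch only; the /tmp path and frame size are arbitrary, error checking is omitted, and read() is looped because it may return short counts):

Code:
// Sketch only: one process writes frames into a named FIFO, the other reads them.
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h>

#define FRAME_BYTES (640 * 480 * 3)

// Producer side: open blocks until a reader shows up.
void send_frame(const unsigned char *frame)
{
    mkfifo("/tmp/framepipe", 0666);             // harmless if it already exists
    int fd = open("/tmp/framepipe", O_WRONLY);
    write(fd, frame, FRAME_BYTES);              // in real code, loop over frames
    close(fd);
}

// Consumer side: accumulate until a whole frame has arrived.
void recv_frame(unsigned char *frame)
{
    int fd = open("/tmp/framepipe", O_RDONLY);
    ssize_t got = 0, n;
    while (got < FRAME_BYTES && (n = read(fd, frame + got, FRAME_BYTES - got)) > 0)
        got += n;
    close(fd);
}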
George wrote: Create a pair of counting semaphores.

Do you really need two semaphores for this? I think one semaphore is enough. Use one semaphore to indicate whether the frames are filled or expired: the producer fills an expired frame and marks it filled, the consumer uses a filled frame and marks it expired, going round in a circle. Of course, it has been a long time since I dealt with multi-threaded stuff, so maybe I'm thinking of the concept rather than the implementation.
You're thinking of simply preventing concurrent access to memory. That could be made to work (store a filled/empty bit for each queue entry in the shared memory region), but it wouldn't be very efficient. Imagine the consumer falls behind. The producer has just filled the last entry and releases the semaphore. Then the producer waits to try to fill the next entry. It immediately unblocks (because it just posted, and the consumer is presumably too busy to grab the semaphore), sees that the next entry is still full, and now what? A solution is to release the semaphore, sleep for a little bit, and then try again. However, that is just slowed-down polling, and it's very wasteful of CPU cycles at a time when the consumer really needs them.
It's better to make use of the OS-level blocking that's usually implied by the wait operation, but you can't block on a semaphore you just posted without running into the above problem. Therefore, you really need two semaphores.
Interestingly, two semaphores is sufficient for any queue size and any number of producers and/or consumers.
As a side note, in a real-time system where the producer may have to react to some other event, you'd have to go with polling (or block with a timeout) in order to ensure that the producer is available to react. This might be the case if Bob's producer is grabbing frames from a hardware source like a capture card. I'd still say use two semaphores in that case, but use the non-blocking wait, and then sleep a couple of milliseconds if you don't obtain the semaphore.
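In rough Win32 terms the producer's loop might look like this (a sketch only; emptySlots and fullSlots are the counting semaphores from the earlier sketch, and the 2 ms sleep is an arbitrary figure):

Code:
// Sketch only: polling variant for a producer that has to stay responsive.
#include <windows.h>

extern HANDLE emptySlots, fullSlots;    // the counting semaphores from the earlier sketch

void ProducerLoop(void)
{
    for (;;)
    {
        if (WaitForSingleObject(emptySlots, 0) == WAIT_OBJECT_0)
        {
            // ... fill the next queue entry ...
            ReleaseSemaphore(fullSlots, 1, NULL);
        }
        else
        {
            Sleep(2);   // queue is full; give the consumer a chance to catch up
        }
        // ... service the capture hardware / other real-time work here ...
    }
}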
bob wrote: How does one create semaphores, shared memory, and message queues under Win32? I'm guessing it's not semget(), shmget(), and msgget().

Briefly, to create a shared memory region:
Code:
char *bufferName;   // Set this to the name you want
int bufferSize;     // Set this to the number of bytes you want
HANDLE smbHandle;
char *buffer;

// INVALID_HANDLE_VALUE (i.e. (HANDLE)0xFFFFFFFF) backs the mapping with the system paging file
if ((smbHandle = CreateFileMapping(INVALID_HANDLE_VALUE, NULL, PAGE_READWRITE, 0, bufferSize, bufferName)) == NULL)
    ; // error
if ((buffer = (char *)MapViewOfFile(smbHandle, FILE_MAP_ALL_ACCESS, 0, 0, 0)) == NULL)
    ; // error
You can verify the arguments on MSDN if you're inclined. I haven't used the semaphores, but the functions you're looking for are probably CreateSemaphore, OpenSemaphore, WaitForSingleObject, and ReleaseSemaphore. I haven't used message queues except on VxWorks, so I don't know where to look for them on Windows.
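If it helps, attaching from the second process would look roughly like this (untested; "MySharedBuffer" and "MySemaphore" stand in for whatever names you actually used when creating the objects):

Code:
// Untested sketch: the second process opening objects the first one created.
#include <windows.h>

void AttachToSharedObjects(void)
{
    HANDLE smbHandle, semHandle;
    char *buffer;

    smbHandle = OpenFileMapping(FILE_MAP_ALL_ACCESS, FALSE, "MySharedBuffer");
    buffer    = (char *)MapViewOfFile(smbHandle, FILE_MAP_ALL_ACCESS, 0, 0, 0);
    semHandle = OpenSemaphore(SEMAPHORE_ALL_ACCESS, FALSE, "MySemaphore");

    WaitForSingleObject(semHandle, INFINITE);   // wait ("P")
    // ... read or write the shared region through buffer ...
    ReleaseSemaphore(semHandle, 1, NULL);       // post ("V")
}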
George wrote: It's better to make use of the OS-level blocking that's usually implied by the wait operation, but you can't block on a semaphore you just posted without running into the above problem. Therefore, you really need two semaphores.
Interestingly, two semaphores is sufficient for any queue size and any number of producers and/or consumers.

You know, I think I made the same mistake in OS for our file system (using a single lock instead of two). I need to stop thinking of everything in terms of physical resources and start taking time into consideration.
Peijen wrote: You know, I think I made the same mistake in OS for our file system...

To this day, I still don't know what went wrong with our file system. I wonder if I still have the code around anywhere. I'd probably be embarrassed by how bad it was, now that I have some actual real-time system experience.
Peijen wrote: You know, I think I made the same mistake in OS for our file system...
George wrote: To this day, I still don't know what went wrong with our file system. I wonder if I still have the code around anywhere. I'd probably be embarrassed by how bad it was, now that I have some actual real-time system experience.

Ha, I can say that I was able to answer our TA's question of why the fs crashes after creating the 47th file. It took Martin and the other guy a while to catch on to what I was saying. Of course, having a slow-ass fs probably cancelled that out...
Our class had the file system due date pushed back so far that the TAs didn't have a chance to ask questions. I assume they must have graded it, though, because it knocked my final grade down from an A to a B. Then again, they might have just run the test scripts. Our FS failed most of the test scripts due to bugs and didn't even implement the commands used by some of the others.