The goal of this project is to implement a scalable, real-time solution for streaming multiple high-definition videos and stitching them into a "super high-definition" panoramic video. This is accomplished by streaming feeds from multiple calibrated HD cameras to a centralized location, where they are processed for storage or live viewing on multi-panel displays. To account for the perspective distortion that affects displays with large vertical or horizontal fields of view, the videos are passed through a spherical warping algorithm on graphics hardware before being displayed.
Another important aspect of the project is streaming the video efficiently across a network with unreliable bandwidth: if the bandwidth between the sender and receiver drops, we wish to let the user choose which aspect of the video to degrade in order to maintain the desired performance (e.g. degrading frame rate to preserve quality, or vice versa).
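The trade-off described above can be illustrated with a little arithmetic. The following is a minimal sketch, not the project's actual rate-control code: the function name adapt_stream, its parameters, and the "quality"/"framerate" policy labels are all hypothetical, and it assumes the stream cost is simply frame rate times bits per frame.

```python
def adapt_stream(available_bps, fps, bits_per_frame, prefer):
    """Pick a (fps, bits_per_frame) pair that fits the available bandwidth.

    Hypothetical helper: assumes required bandwidth = fps * bits_per_frame.
    prefer="quality"   keeps bits per frame and lowers the frame rate.
    prefer="framerate" keeps the frame rate and lowers bits per frame.
    """
    required_bps = fps * bits_per_frame
    if available_bps >= required_bps:
        # Enough bandwidth: nothing needs to be degraded.
        return fps, bits_per_frame
    if prefer == "quality":
        return available_bps / bits_per_frame, bits_per_frame
    if prefer == "framerate":
        return fps, available_bps / fps
    raise ValueError("prefer must be 'quality' or 'framerate'")


if __name__ == "__main__":
    # A 30 fps stream at 200,000 bits per frame needs 6 Mbit/s.
    # With only 4 Mbit/s available, one of the two must give way.
    print(adapt_stream(4_000_000, 30, 200_000, "quality"))
    print(adapt_stream(4_000_000, 30, 200_000, "framerate"))
```

With the "quality" policy the frame rate falls while each frame keeps its bit budget; with the "framerate" policy the opposite happens.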
Currently, we are able to stream video from the two HD cameras we have available to us and display the warped videos simultaneously on the receiving end. The user can then interactively change the bitrate of the stream, change the focal length of the virtual camera, and manually align the images.
The current implementation of the project consists of two programs: one runs on the sending side and the other runs on the receiving side.
Each camera in the system is connected to its own machine, which obtains raw Bayer-pattern data from the camera, converts it to 32-bit RGB, compresses it with the XVid codec, and sends it across a wide-area network. As the SDK for interacting with the cameras (the GigE Link SDK, available from the downloads section of the Silicon Imaging website) is only available for Windows, these machines run Windows XP.
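To make the Bayer-to-RGB step concrete, here is a minimal sketch of a nearest-neighbour demosaic. It is not the conversion performed by the camera SDK or by xvidsend: the RGGB cell layout, the 8-bit sample depth, the 0xAARRGGBB packing, and the function name demosaic_rggb are all assumptions made for illustration.

```python
def demosaic_rggb(raw, width, height):
    """Convert a flat, row-major list of 8-bit Bayer samples to packed 32-bit pixels.

    Assumptions (not from the original project): each 2x2 cell is laid out as
        R G
        G B
    width and height are even, and output pixels are packed as 0xAARRGGBB.
    Every pixel in a cell receives the same colour (nearest-neighbour quality).
    """
    out = [0] * (width * height)
    for y in range(0, height, 2):
        for x in range(0, width, 2):
            r = raw[y * width + x]
            # Average the two green samples in the cell.
            g = (raw[y * width + x + 1] + raw[(y + 1) * width + x]) // 2
            b = raw[(y + 1) * width + x + 1]
            pixel = (0xFF << 24) | (r << 16) | (g << 8) | b
            for dy in (0, 1):
                for dx in (0, 1):
                    out[(y + dy) * width + (x + dx)] = pixel
    return out


if __name__ == "__main__":
    # One 2x2 cell: R=10, G=20 and 40, B=30.
    print([hex(p) for p in demosaic_rggb([10, 20, 40, 30], 2, 2)])
```

A production converter would interpolate neighbouring cells rather than replicate one colour per cell; the sketch only shows how one-sample-per-pixel sensor data expands to four bytes per pixel, which is part of why the frames are compressed before being sent.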
On the receiving end, there is one Linux box (currently running CentOS, though it has also been tested on SuSE 10.0) that receives all of these streams, processes them, and creates a buffer for the entire image that it then sends to SAGE.
The program on the sending end, xvidsend, and the code on the receiving end that receives the data across the network and obtains a single frame at a time were written by Andrew. The program on the receiving end, the Amalgamator, was written by Alex. The code for the Amalgamator has been commented and is in that sense self-documented, but a high-level overview of the structure of the program is provided here for clarity.
If you intend to compile the program yourself, you will most likely have to modify the makefile. The makefile is rather simple and contains a few hardcoded rules that enable or disable certain options:
- make local : Builds the Amalgamator without any special options.
- make sage : Enables SAGE support.
- make sage-tile : Enables tiled rendering for SAGE output (which is almost always necessary).
- make sage-tile-shader : Enables tiled rendering for SAGE output and warps each of the images with a fragment shader.
make network and other variations are deprecated.
The Amalgamator is a multithreaded application. The idea is to have one thread for each incoming stream and one thread for main processing. The stream threads sit and wait for network input; on receiving data, each one decompresses it and updates a shared buffer. The main thread reads the images from these buffers and draws each one on a textured quad so that a pixel shader can be applied to it. As the resulting image is too large for the local screen, and as the graphics card on our current machine does not have Linux drivers that support OpenGL Framebuffer Objects, we make use of a tile rendering library to break the image into smaller chunks and render each chunk individually.
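The thread layout described above can be sketched as follows. This is an illustration of the pattern only, not the Amalgamator's code: the names FrameSlot, stream_worker, and run_demo are hypothetical, an in-memory queue stands in for the network socket, and bytes.upper stands in for the XVid decoder.

```python
import queue
import threading


class FrameSlot:
    """Shared buffer holding the most recent decoded frame of one stream."""

    def __init__(self):
        self._lock = threading.Lock()
        self._frame = None
        self._serial = 0  # counts how many frames have been published

    def publish(self, frame):
        with self._lock:
            self._frame = frame
            self._serial += 1

    def latest(self):
        with self._lock:
            return self._serial, self._frame


def stream_worker(packets, slot, decode):
    """Block on input, decode each packet, and update the shared buffer."""
    while True:
        packet = packets.get()  # blocks, like waiting on a socket
        if packet is None:      # sentinel: the stream has ended
            break
        slot.publish(decode(packet))


def run_demo():
    sources = [queue.Queue() for _ in range(2)]  # stand-ins for two cameras
    slots = [FrameSlot() for _ in sources]
    workers = [
        threading.Thread(target=stream_worker, args=(q, s, bytes.upper))
        for q, s in zip(sources, slots)
    ]
    for w in workers:
        w.start()
    sources[0].put(b"left-1")
    sources[0].put(b"left-2")
    sources[1].put(b"right-1")
    for q in sources:
        q.put(None)
    for w in workers:
        w.join()
    # The main thread reads whatever frame is newest in each slot.
    return [slot.latest() for slot in slots]


if __name__ == "__main__":
    print(run_demo())
```

In this sketch the main thread never waits on the network: it reads the newest frame from each slot, so a stalled stream only leaves its slot showing an older frame.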
Tile rendering is beneficial in that it doesn't complicate the code very much, but detrimental in that it results in a drastic slowdown: it seems that the overhead of switching from tile to tile effectively doubles the rendering time. The optimal choice, then, is to make each tile as large as possible on the local end, which in this case is 1920x1080 (the size of half of the total image for two streams), as the window into which it is being rendered must fit on the desktop of the local screen.
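The tile arithmetic can be sketched as below. This is not the API of the tile rendering library the project uses; tile_rects is a hypothetical helper, and the 3840x1080 total size assumes the two 1920x1080 images sit side by side.

```python
def tile_rects(image_w, image_h, tile_w, tile_h):
    """Split an image into tiles, returning (x, y, width, height) rectangles.

    Tiles are listed row by row; tiles on the right and bottom edges are
    clipped so that they never extend past the image.
    """
    rects = []
    for y in range(0, image_h, tile_h):
        for x in range(0, image_w, tile_w):
            rects.append((x, y, min(tile_w, image_w - x), min(tile_h, image_h - y)))
    return rects


if __name__ == "__main__":
    # Two full-size tiles cover the assumed 3840x1080 panorama.
    print(len(tile_rects(3840, 1080, 1920, 1080)))
    # A smaller 1024x768 window would need eight tiles instead.
    print(len(tile_rects(3840, 1080, 1024, 768)))
```

Since each tile switch carries overhead, fewer and larger tiles mean fewer switches per frame, which is the reasoning behind rendering at the largest size the local desktop can hold.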
The user can then control the position and warping amount of each image to align them manually. As of now there is no GUI on the local end; it wouldn't be difficult to use the window space already needed for drawing the images to display a GLUI GUI that provides the same functionality, but for now everything is operated with keyboard commands. Some commands print output to the console, so it's useful to keep the console within view. The commands are:
- W, A, S, D : Move the current image up, left, down, or right, respectively.
- N : Make the next image the current image.
- E, C : Increase or decrease the focal length, respectively. Decreasing the focal length will cause a greater spherical warping in the image.
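The link between focal length and warping strength can be seen in the standard spherical projection formulas. The sketch below uses that textbook formulation and is not taken from the project's fragment shader; the function names are hypothetical, coordinates are measured in pixels from the image center, and the focal length f is also in pixels.

```python
import math


def plane_to_sphere(x, y, f):
    """Map a point on the flat source image to the spherically warped image."""
    xs = f * math.atan2(x, f)
    ys = f * math.atan2(y, math.hypot(x, f))
    return xs, ys


def sphere_to_plane(xs, ys, f):
    """Inverse mapping: for a warped pixel, find where to sample the source.

    This is the direction a shader would typically use, since it is
    evaluated once per output pixel.
    """
    theta = xs / f
    phi = ys / f
    x = f * math.tan(theta)
    y = f * math.tan(phi) / math.cos(theta)
    return x, y


if __name__ == "__main__":
    # The same source point moves further when the focal length is smaller.
    for f in (2000.0, 500.0):
        xs, _ = plane_to_sphere(400.0, 0.0, f)
        print(f, 400.0 - xs)
```

The image center is left unchanged for any focal length, while points near the edges are pulled inward by an amount that grows as f shrinks, which matches the behaviour of the E and C keys described above.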
Andrew Prudhomme: Andrew has worked on the back end of the project, specifically in receiving input from the cameras, performing the necessary steps to obtain the data in a usable format, and compressing and streaming it efficiently across a network.
Alex Zavodny: Alex has worked on the front end of the project, which involves receiving the individual streams from the cameras, processing them, and displaying them in an efficient and timely manner.