Real-Time Geometry Scanning System

From Immersive Visualization Lab Wiki
Revision as of 19:28, 26 April 2011 by Dtenedor (Talk | contribs)

Introduction

The field of structure from motion within the study of computer vision is active and evolving. Existing approaches for using cameras to obtain the 3D structure of a scene use visual correspondence and tracking across multiple views to triangulate the position of points in the scene. This is a well-studied problem with entire textbooks written about the various stages of its solution, such as An Invitation to 3-D Vision: From Images to Geometric Models, by Yi Ma, Stefano Soatto, Jana Kosecka, and Shankar Sastry.

Unfortunately, purely vision-based approaches for using camera images to calculate the 3D geometry of a scene suffer from a number of well-known drawbacks. High-quality visual features must exist, and correspondences between them must be established across multiple views. The process of matching correspondences is subject to noise that varies from view to view. Views without visual features, like images of the floors or walls of a building, are not suitable for use at all. In addition, aligning the views and triangulating the geometry involve considerable computational expense. For many applications this expense is acceptable, but once the geometry is constructed, it may still be incomplete due to the presence of "holes" in places the user forgot to scan.

The recent emergence of geometry cameras that use structured patterns of infrared light to construct a camera-space depth map in hardware solves a number of these problems. The infrared light projection and reconstruction occur outside of the visible light spectrum, so the system does not depend on visible features at all. Many of the cameras are relatively inexpensive, and the geometry construction occurs in real time. They do, however, come with a number of novel problems of their own. Infrared light from energy sources such as the sun or other infrared cameras may interfere with the projected pattern. The projected light may also reflect off different surfaces, confusing the algorithm used to perform the reconstruction. Finally, the data is only available as a depth map in camera space; it is impossible to obtain data about any geometry that is not immediately visible in the current view.

The first two problems may be solved by keeping the infrared camera out of direct sunlight and avoiding highly reflective surfaces like mirrors and the liquid-crystal displays of consumer electronics. This system presents a novel approach to solving the last problem by saving the data obtained from each view of the camera into a global data structure. Provided with a means to obtain the pose of the camera at each frame, each frame's depth map can be transformed into a colored point cloud in world space, relative to some origin and set of coordinate axes. Once enough such vertices are obtained, they may be linked together into a triangle mesh, assigned texture coordinates, and saved to a common geometry definition file format like Wavefront OBJ. The scanned geometry is then ready for use in real-time computer graphics applications like virtual tourism or video games, or in offline applications like ray-tracing rendering systems.
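The final export step can be illustrated with a minimal sketch. It assumes the mesh is already available as a list of vertices, one texture coordinate per vertex, and triangles given as zero-based vertex indices; the function name `mesh_to_obj` and this in-memory layout are hypothetical and not taken from the system's source code.

```python
from typing import List, Tuple

Vec3 = Tuple[float, float, float]
Vec2 = Tuple[float, float]
Tri = Tuple[int, int, int]  # zero-based vertex indices


def mesh_to_obj(vertices: List[Vec3], uvs: List[Vec2], triangles: List[Tri]) -> str:
    """Serialize a textured triangle mesh as Wavefront OBJ text.

    Assumes exactly one texture coordinate per vertex, so the same
    index is used for the position and the texture coordinate.
    """
    if len(vertices) != len(uvs):
        raise ValueError("expected one texture coordinate per vertex")
    lines = []
    for x, y, z in vertices:
        lines.append(f"v {x:.6f} {y:.6f} {z:.6f}")
    for u, v in uvs:
        lines.append(f"vt {u:.6f} {v:.6f}")
    for a, b, c in triangles:
        # OBJ indices are 1-based; each corner is written as vertex/texcoord.
        lines.append(f"f {a + 1}/{a + 1} {b + 1}/{b + 1} {c + 1}/{c + 1}")
    return "\n".join(lines) + "\n"
```

The returned string can be written to a `.obj` file as-is. How the vertices are linked into triangles is left open here, since the text does not specify the meshing method.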

This system uses a high-quality tracking system to obtain the pose of the camera in real time. The projected points are inserted into a novel but simple data structure based on spatial hashing. The system uses ray tracing to enforce a maximum of one layer of scanned points for each corresponding real surface. While positional noise in the 3D data cannot be eliminated due to hardware constraints, it can be acknowledged and corrected for as far as possible. We believe that this system is able to scan and generate triangle meshes in many types of scenes that other geometry scanning systems are completely unable to process, such as those with few or no visual features. In addition, our data structures keep the total data size of a scanned scene small, even though our camera sends us millions of colored points per second.
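To make the idea of spatial hashing concrete, here is a minimal sketch of a hash grid that keeps at most one representative point per cell, which is one simple way to bound the stored data regardless of how many points arrive. It is an illustration only: the class name `SpatialHash`, the cell size, and the averaging of positions are assumptions, and the averaging stands in for (and is not) the ray-tracing check described above.

```python
import math
from typing import Dict, List, Tuple

Vec3 = Tuple[float, float, float]
Color = Tuple[int, int, int]
Key = Tuple[int, int, int]


class SpatialHash:
    """Hash grid storing one averaged point per cubic cell (hypothetical)."""

    def __init__(self, cell_size: float) -> None:
        if cell_size <= 0.0:
            raise ValueError("cell_size must be positive")
        self.cell_size = cell_size
        # key -> [sum_x, sum_y, sum_z, sample_count, latest_color]
        self.cells: Dict[Key, list] = {}

    def _key(self, p: Vec3) -> Key:
        # floor() keeps negative coordinates in the correct cell.
        return tuple(math.floor(c / self.cell_size) for c in p)

    def insert(self, p: Vec3, color: Color) -> None:
        """Merge a world-space point into the cell that contains it."""
        key = self._key(p)
        cell = self.cells.get(key)
        if cell is None:
            self.cells[key] = [p[0], p[1], p[2], 1, color]
        else:
            cell[0] += p[0]
            cell[1] += p[1]
            cell[2] += p[2]
            cell[3] += 1
            cell[4] = color  # keep the most recent color sample

    def points(self) -> List[Tuple[Vec3, Color]]:
        """Return one averaged position and color per occupied cell."""
        return [
            ((sx / n, sy / n, sz / n), color)
            for sx, sy, sz, n, color in self.cells.values()
        ]
```

Because repeated samples of the same surface patch land in the same cell, the number of stored points grows with the scanned surface area rather than with the number of frames. Averaging also damps some positional noise, at the cost of detail below the cell size.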

Features and Progress

  • Reading in data from the camera - accomplished using libfreenect
  • Projecting the depth map into camera space - accomplished by modifying the glview example source code
  • Transforming the camera space point cloud into world space - accomplished by constructing a camera matrix using data obtained by a Calit2 tracking system, installed and calibrated by Andrew Prudhomme (thanks!)
  • Ensuring that scanned surfaces are represented by at most one layer of points
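
The projection and transformation steps in the list above can be sketched as follows. This is a minimal illustration under stated assumptions, not the code used by the system: it assumes a pinhole camera model, depth values already converted to metres, and a row-major 4x4 camera-to-world matrix supplied by the tracker. The intrinsic constants `FX`, `FY`, `CX`, `CY` are placeholder values, and all function names are hypothetical.

```python
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]
Mat4 = Sequence[Sequence[float]]  # row-major 4x4 camera-to-world matrix

# Placeholder pinhole intrinsics; real values come from camera calibration.
FX, FY = 580.0, 580.0  # focal lengths in pixels (assumed)
CX, CY = 320.0, 240.0  # principal point in pixels (assumed)


def depth_to_camera(u: int, v: int, depth_m: float) -> Vec3:
    """Back-project pixel (u, v) with metric depth into camera space."""
    x = (u - CX) * depth_m / FX
    y = (v - CY) * depth_m / FY
    return (x, y, depth_m)


def camera_to_world(p: Vec3, pose: Mat4) -> Vec3:
    """Apply a rigid 4x4 camera-to-world transform to a camera-space point."""
    x, y, z = p
    return tuple(
        pose[r][0] * x + pose[r][1] * y + pose[r][2] * z + pose[r][3]
        for r in range(3)
    )


def depth_map_to_world(depth: Sequence[Sequence[float]], pose: Mat4) -> List[Vec3]:
    """Convert a depth map (rows of metres, 0 meaning no reading) to world points."""
    points = []
    for v, row in enumerate(depth):
        for u, d in enumerate(row):
            if d > 0.0:  # skip pixels where the camera returned no depth
                points.append(camera_to_world(depth_to_camera(u, v, d), pose))
    return points
```

For example, with a pose that only translates the camera by (1, 2, 3), a reading of 2 m at the principal point maps to the world-space point (1, 2, 5).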