Real-Time Geometry Scanning System
Introduction
The field of structure from motion within the study of computer vision is active and evolving. Existing approaches for using cameras to obtain the 3D structure of a scene use visual correspondence and tracking across multiple views to triangulate the position of points in the scene. This is a well-studied problem with entire textbooks about the various stages of its solution written, such as An Invitation to 3-D Vision: From Images to Geometric Models, by Yi Ma, Stefano Soatto, Jana Kosecka, and Shankar Sastry.
Unfortunately, purely vision-based approaches for using camera images to calculate the 3D geometry of a scene suffer from a number of well-known drawbacks. High-quality visual features must exist, and correspondences between them must be established within multiple views. The process of matching correspondences is subject to noise which depends on each view. Views without visual features, like images of the floors or walls of a building, are not suitable for use at all. In addition, aligningment of the views and triangulation of the geometry involves a considerable amount of computational expense. For many applications, this expense is acceptable, but once the geometry is constructed, it may be incomplete due to the presenece of "holes" from places where the user forgot to scan.
The recent emergence of geometry cameras that use structured patterns of infrared light to construct a camera-space depth map in hardware solve a number of these problems. The infrared light projection and reconstruction occur outside of the visible light spectrum, so the system does not depend on visible features at all. Many of the cameras are relatively inexpensive, and the geometry construction occurs in real-time. They do, however, arrive with a number of their own novel problems. Infrared light from energy sources such as the sun or other infrared cameras may interfere with the projected patten. The projected light may also reflect off of different surfaces, confusing the algorithm used to perform the reconstruction. Also, the data is only available as a depth map in camera space; it is impossible to obtain data about any geometry that is not immediately visible in the current view.
The former problems may be solved keeping the infrared camera out of direct sunlight and avoiding highly reflective surfaces like mirrors and consumer electronics liquid-crystal displays. This system presents a novel approach to solving the latter problem by saving the data obtained by each view of the camera into a global data structure. Provided with a means to obtain the pose of the camera at each frame, each frame's depth map can be transformed into a colored point cloud in world space, relative to some origin and set of coordinate axes.