3D Data Extraction

CS433b - Image Analysis Final Project

Problem

The problem being examined for this project is extracting depth information from stereo pair images.

A stereo pair consists of two images taken facing the same direction from slightly different locations. These images could be taken by two cameras on a plane both with the same orientation facing in a direction perpendicular to the plane.

The relative distances of objects from the plane can be determined by finding the change in position of each object between the two images. An object that appears in the same place in both images is far from the plane; an object that shifts a lot between the images is close to it.
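The relationship above can be made quantitative: for a rectified stereo pair, depth is inversely proportional to the pixel shift (disparity). A minimal sketch, with illustrative focal-length and baseline values that are not taken from this project:

```python
# Depth from disparity for a rectified stereo pair:
#   Z = f * B / d
# where f is the focal length (in pixels), B is the baseline
# (distance between the two camera centres) and d is the disparity
# (horizontal shift of a point between the two images).
def depth_from_disparity(d, focal_length, baseline):
    if d <= 0:
        raise ValueError("zero disparity corresponds to a point at infinity")
    return focal_length * baseline / d

# A point that shifts a lot between the images is close to the cameras;
# one that barely moves is far away. (f and B values are illustrative.)
near = depth_from_disparity(32, focal_length=500, baseline=0.1)  # large shift
far = depth_from_disparity(2, focal_length=500, baseline=0.1)    # small shift
```

This inverse relationship is also why depth resolution degrades with distance: a one-pixel disparity change near the camera covers far less depth than one far away.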

This technique along with many others is used by humans for depth perception.

For this project an application will be produced that outputs a grey-scale depth map from a given pair of stereo images.

Approach

The method used to solve the depth extraction problem will be to apply graph cuts to a Roy-Cox graph construction in order to minimize the energy equation:

where Lp is a disparity label and C1 and C2 are constants.

The Min-Cut/Max-Flow Algorithm that will be used can be found here: http://www.csd.uwo.ca/faculty/yuri/Abstracts/pami04-abs.html
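To make the energy being minimized concrete, the sketch below evaluates a stereo energy for a candidate disparity labelling. The exact data and smoothness terms used in this project are not reproduced here; the sketch assumes a truncated absolute-difference data term (capped at C1) and a Potts smoothness penalty (C2), a common choice in graph-cut stereo. The graph cut itself would search for the labelling that minimizes this quantity.

```python
# Illustrative stereo energy: data term + smoothness term.
# left, right: grey-scale images as lists of rows; labels: per-pixel
# integer disparities of the same shape. C1, C2 are the constants
# referred to in the energy equation (values here are assumptions).
def stereo_energy(left, right, labels, C1=20, C2=10):
    rows, cols = len(left), len(left[0])
    energy = 0
    for y in range(rows):
        for x in range(cols):
            d = labels[y][x]
            # Data term: how well the pixel matches its shifted partner
            # in the right image, truncated at C1.
            if 0 <= x - d < cols:
                diff = abs(left[y][x] - right[y][x - d])
            else:
                diff = C1  # shifted outside the image: maximum penalty
            energy += min(diff, C1)
            # Smoothness term: Potts penalty for label changes between
            # 4-connected neighbours (right and below).
            if x + 1 < cols and labels[y][x] != labels[y][x + 1]:
                energy += C2
            if y + 1 < rows and labels[y][x] != labels[y + 1][x]:
                energy += C2
    return energy
```

A labelling that matches pixels well and varies smoothly scores a low energy; a noisy labelling pays both data and smoothness penalties, which is what drives the minimizer toward piecewise-constant disparity maps.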

Results

Test Images

Images used to test the implementation (unless otherwise noted) were obtained from the Middlebury College Stereo Vision Research Page: http://cat.middlebury.edu/stereo/newdata.html.

Note: Intensities in the result image do not correspond to intensities in the ground truth images, as the two use different scales.
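One way to compare maps produced on different scales is to rescale each to a common intensity range before viewing. A minimal sketch (not part of the project's implementation):

```python
# Rescale a grey-scale depth map to the 0-255 range so that maps
# produced with different disparity scales can be compared visually.
def normalize(depth_map):
    flat = [v for row in depth_map for v in row]
    lo, hi = min(flat), max(flat)
    if hi == lo:
        # A constant map carries no relative depth information.
        return [[0 for _ in row] for row in depth_map]
    return [[round(255 * (v - lo) / (hi - lo)) for v in row]
            for row in depth_map]
```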

Teddy Image

Source Left:

Source Right:

Result:

Ground Truth:

Analysis

The resultant image is generally close to the ground truth. However, occlusion is not taken into account.

Banding is also apparent on sloped surfaces. This is because the labelling system is discrete: each pixel is assigned an integer disparity, so a continuously sloping surface is quantized into bands of constant depth.
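The banding effect can be demonstrated in a few lines. A sloped surface has smoothly varying true disparity, but rounding to integer labels collapses runs of pixels onto the same value (the ramp values below are illustrative, not taken from the test images):

```python
# Why sloped surfaces band: the true disparity varies continuously
# across the surface, but the labelling is discrete, so neighbouring
# pixels snap to the same integer disparity and form flat bands.
true_disparities = [10 + 0.3 * x for x in range(10)]  # smooth ramp
labels = [round(d) for d in true_disparities]         # discrete labels
# Ten distinct true disparities collapse into only a few label values,
# each repeated over a run of pixels: one band per integer.
```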

Vertical edges to the left of object boundaries, where the object is closer than the background, are blurred. This is most likely due to occlusion: pixels in the left image have no corresponding pixel in the right image.

The background to the right of the teddy bear is not recognized as background. This is because the region lacks texture, so there is nothing to differentiate candidate matches between the left and right images.