- Corner Detection + Non-Maximal Suppression
- Feature Descriptors
- Feature Matching and Homography Computation
- Image Warping
This project implements a pipeline for the automatic stitching of images into panoramic compositions via reprojection of the images onto a common plane. In order to compute a homography to reproject the images, common features between pairs of images are detected and matched automatically. The project is an application of image registration and is based on the work by Brown, Szeliski and Winder .
The first step in the pipeline is detecting several Harris corners in both (grayscale) images and, in order to keep the computational cost of the subsequent matching stage low , the number of corners is further decreased by a suppression scheme. Additionally, in this project the corner detection and generation of feature descriptors are carried out in parallel (one thread per image) in order to further reduce computation time.
The suppression scheme presented in  is Adaptive Non-Maximal Suppression (ANMS). This scheme preserves a good spatial distribution for the remaining corners, which is desirable. In practice however, I found that just keeping the N strongest corners a pixel apart, over 3x3 pixel regions sometimes produced better corners, even if it resulted in a biased distribution.
peak_local_max. Notice the better convergence on the top-right.
From the remaining corners, 9x9 feature descriptors centered at each corner are extracted. Instead of following the exact approach in , I subsampled 45x45 patches with stride 5, resulting in 9x9 patches centered at each corner, which are then normalized.
Descriptors between both images are matched in a two-step process. First, features are matched based on the ratio of their two nearest neighbors (as determined by the metric). Then the resulting candidate features are evaluated with 4-point Random Sample Consensus (RANSAC) in order to further filter out outliers.
The RANSAC algorithm computes successive homographies in order to evaluate how good random 4-point (4 feature centers) sets are in mapping from source to target image. Once a suitable 4-point set is found, another homography is computed from those 4 points plus all points which mapped correctly (according to a pixel threshold) under that homography, in order to create a larger set of correspondences which translates to a more robust homography. This process is repeated for a number of times and the homography for the best candidate set is used to finally warp the images.
The homography maps source pixel locations to their corresponding target locations . That is, if is a source pixel location in homogeneous coordinates, the homography is such that , where and is the target in cartesian coordinates.
A homography satisfying the above equation can be recovered by formulating the search as a least squares approximation (i.e.: ), where represents the unknown parameters in .
The constraints to this system of equations are essentially derived from wanting each component in a target point to match that of a source point transformed by the homography. At least 4 matching feature pairs are needed (each pair provides 2 constraints; there are 8 unknowns / DOF in H) but the more the better:
Once a good homography has been found, the source image’s 4 corners are forward warped onto the target space in order to compute the extent of the final composite image and two empty images of this new size are created.
The target image is copied to one of these images without warping (though accounting for its offset in the new image size). The source image is instead inverse warped in order to interpolate pixel values. First, all coordinates from target space are warped onto source space (via the inverse of ). Then, for all target coordinates which mapped to within the range of the original source image (i.e.: between (0,0) and (w, h) of the unmodified source), pixels in the other empty image are assigned values bilinearly interpolated from the source space values.
Compositing is done by adding both final images, each multiplied by an alpha mask. The process is similar to linear interpolation of two images but instead each image has its own alpha mask with a nonlinear gradient applied only to the area where both images intersect. The full process can then be repeated with another image pair consisting of the composited image and a new image.