In this project, I take two or three photographs and create an image mosaic by registering, projectively warping, resampling, and compositing them. Along the way, I learned how to compute homographies and use them to warp images.
Below are some images I used in this project:
First, using the online tool provided by the course, I identified correspondence points in each pair of images: one image will be warped onto the other, which serves as the reference. We want to find a 3x3 matrix, H, that computes the projective transformation between the two, as shown in the left image. The right image shows how we can rearrange the equations to solve for H:
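For reference, here is the same setup written out (a reconstruction of the equations in those figures, using the usual homogeneous-coordinate convention):

$$
w \begin{bmatrix} x' \\ y' \\ 1 \end{bmatrix} = \begin{bmatrix} h_1 & h_2 & h_3 \\ h_4 & h_5 & h_6 \\ h_7 & h_8 & h_9 \end{bmatrix} \begin{bmatrix} x \\ y \\ 1 \end{bmatrix}
$$

Eliminating the scale factor $w$, each correspondence $(x, y) \rightarrow (x', y')$ contributes two linear equations in the entries of $H$:

$$
\begin{aligned}
h_1 x + h_2 y + h_3 - x'(h_7 x + h_8 y + h_9) &= 0 \\
h_4 x + h_5 y + h_6 - y'(h_7 x + h_8 y + h_9) &= 0
\end{aligned}
$$

Stacking these equations for all $n$ correspondences yields a $2n \times 9$ homogeneous system $Ah = 0$.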
After setting up the overdetermined homogeneous system, we solve it in the least-squares sense using SVD. The SVD gives us the matrices U, S, and V transpose. To minimize the loss, we want the right singular vector corresponding to the smallest singular value, which is the last row of V transpose, since the singular values are ordered in decreasing magnitude. Once we have solved for all the unknown values, we reshape the 9x1 vector into a 3x3 matrix. This becomes our H matrix.
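Below is a minimal sketch of this solver, assuming NumPy; the function name compute_homography and the final normalization by H[2, 2] are my own choices for illustration:

```python
import numpy as np

def compute_homography(pts1, pts2):
    """Fit the homography H mapping pts1 to pts2 (each an (N, 2) array, N >= 4)."""
    A = []
    for (x, y), (xp, yp) in zip(pts1, pts2):
        # Two rows of the homogeneous system per correspondence.
        A.append([x, y, 1, 0, 0, 0, -xp * x, -xp * y, -xp])
        A.append([0, 0, 0, x, y, 1, -yp * x, -yp * y, -yp])
    A = np.array(A)
    # Minimize ||Ah|| subject to ||h|| = 1: take the right singular vector
    # with the smallest singular value, i.e. the last row of V transpose.
    _, _, Vt = np.linalg.svd(A)
    H = Vt[-1].reshape(3, 3)
    return H / H[2, 2]  # fix the overall scale
```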
Now that we have calculated the homography matrix, we can use it to warp each image towards the reference image. First, I find the corners of the first image to identify its bounding box before warping. Then I forward-warp those corners and normalize the homogeneous coordinates so that I know the region in which pixels need to be filled. I then find the bounding box of the final image, which starts as an empty canvas, and copy image 2 into it. Once I have found the polygon in which the warped image will be placed, I apply the inverse homography to perform a backward warp from each output coordinate back into image 1. Finally, I interpolate the pixels in image 1 that map to each pixel in the output.
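Here is a minimal sketch of that backward warp, assuming H maps image-1 (x, y) coordinates to output coordinates and using scipy.ndimage.map_coordinates for bilinear interpolation (the writeup does not name a specific interpolation routine):

```python
import numpy as np
from scipy.ndimage import map_coordinates

def warp_image(img, H, out_shape):
    """Inverse-warp img onto a canvas of shape out_shape = (height, width)."""
    H_inv = np.linalg.inv(H)
    h_out, w_out = out_shape
    ys, xs = np.mgrid[0:h_out, 0:w_out]
    # Send every output pixel through H^-1 to find its source location.
    pts = np.stack([xs.ravel(), ys.ravel(), np.ones(xs.size)])
    src = H_inv @ pts
    src /= src[2]  # normalize homogeneous coordinates
    rows = src[1].reshape(h_out, w_out)
    cols = src[0].reshape(h_out, w_out)
    out = np.zeros((h_out, w_out, img.shape[2]))
    for c in range(img.shape[2]):
        # Bilinear interpolation; pixels outside the source stay 0.
        out[..., c] = map_coordinates(img[..., c], [rows, cols], order=1, cval=0.0)
    return out
```

Below are the resulting images: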
Here is an example of a failure case. I believe the railing in the foreground of the image is difficult to align, whereas the background is well-aligned. Perhaps a spherical warp or more correspondence points could help, so that the features at the very front are also aligned.
To ensure that my homography and warping work correctly, I performed image rectification. I took pictures, at an angle, of objects I knew to be square, so that they weren't square in the image. I then rectified the images by warping them so that those features become square. First, using the online tool, I selected four points in the original image that should map to a square, which I defined manually, and computed the H matrix from these correspondence points. Using this H matrix, we warp the image so that the features become square. The black areas of the image are caused by the perspective distortion. For the Moffitt ceiling, I chose the four corners of one tile on the ceiling, and as you can see, all the tiles in the rectified image are now square. For the Campanile, I chose the four corners around the clock face, and the bounding box around the clock face has become a square.
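As a usage sketch, reusing the two functions above (the corner coordinates below are made up for illustration; the real ones came from the online tool):

```python
import numpy as np

# Hypothetical clicked corners of one ceiling tile, and the manually
# defined 300x300 square they should map to.
src_pts = np.array([[412, 318], [655, 302], [688, 540], [430, 567]], dtype=float)
dst_pts = np.array([[0, 0], [300, 0], [300, 300], [0, 300]], dtype=float)

H = compute_homography(src_pts, dst_pts)
rectified = warp_image(img, H, out_shape=(600, 600))
```

Here are the images before and after rectification: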
After checking that the warping works as expected, I proceeded to blend the images seamlessly so that the mosaic looks like a single image. I do this by first computing the distance transforms of the warped image and the reference image: each pixel inside an image's footprint is assigned its distance to the nearest edge of that footprint, and the result is normalized to a mask with values ranging from 0 to 1. I have included visualizations of these distance transforms below. I then created a mask such that a pixel's value is taken from the warped image whenever its distance transform there is larger than its distance transform in the reference image. Using this mask, the warped image, and the reference image on the shared canvas, I constructed Gaussian and Laplacian stacks, similar to Project 2, and blended the two images together seamlessly.
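A minimal sketch of the mask construction, assuming SciPy's distance_transform_edt and boolean footprint masks for each image on the shared canvas (the Laplacian-stack blend itself follows Project 2 and is not repeated here):

```python
import numpy as np
from scipy.ndimage import distance_transform_edt

def blend_mask(warped_footprint, ref_footprint):
    """1.0 where a pixel lies deeper inside the warped image's footprint
    than inside the reference image's footprint, else 0.0."""
    # Distance from each pixel inside a footprint to its nearest edge.
    d_warp = distance_transform_edt(warped_footprint)
    d_ref = distance_transform_edt(ref_footprint)
    return (d_warp > d_ref).astype(float)
```

Below are the final blended outputs: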
Even when blended, the misalignment of the railing is still obvious, whereas the background blends together seamlessly:
In part B of this project, we build a system for automatically stitching images into a mosaic by reading and implementing the MOPS research paper found at this link: https://inst.eecs.berkeley.edu/~cs180/fa24/hw/proj4/Papers/MOPS.pdf.
First, in order to detect features, we need to identify corners, because they are localized, distinct, and can be found in two images of the same scene; these properties will be useful later in the project. I used the corner_harris function from the skimage.feature library to find all the Harris corners in the image.
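A minimal sketch of the detection step; min_distance=5 is an illustrative spacing between peaks, not necessarily the value I used:

```python
import numpy as np
from skimage.feature import corner_harris, corner_peaks

def get_harris_corners(gray):
    """Harris response map plus corner coordinates as (row, col) pairs."""
    response = corner_harris(gray)                   # per-pixel corner strength
    coords = corner_peaks(response, min_distance=5)  # spaced local maxima
    return response, coords
```

Once I found the corners, I plotted them on the images, as you can see below.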
ANMS (adaptive non-maximal suppression) helps distribute feature points more evenly across the image instead of letting them cluster in highly textured areas, as the raw Harris corners do. For image stitching, we need good feature matches across the entire overlap region between images, not just in a few clustered areas. In terms of implementation, I keep only the points that are far from any significantly stronger corner, which spreads the points out evenly. For each point, I find the minimum distance to any other point whose corner strength exceeds the threshold; that minimum distance is the point's suppression radius. I then sort the points by suppression radius in descending order and keep the top 500.
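Here is a sketch of that loop; the robustness constant c_robust = 0.9 is taken from the MOPS paper and stands in for the threshold mentioned above, and strengths holds the Harris response at each coordinate:

```python
import numpy as np

def anms(coords, strengths, n_keep=500, c_robust=0.9):
    """Adaptive non-maximal suppression: keep the n_keep points with the
    largest suppression radii. coords is (N, 2), strengths is (N,)."""
    radii = np.full(len(coords), np.inf)
    for i in range(len(coords)):
        # Points that significantly dominate point i in corner strength.
        stronger = strengths[i] < c_robust * strengths
        if stronger.any():
            dists = np.linalg.norm(coords[stronger] - coords[i], axis=1)
            radii[i] = dists.min()  # suppression radius of point i
    keep = np.argsort(-radii)[:n_keep]  # descending by radius
    return coords[keep]
```

Below, you can see the selected points overlaid on the respective images.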
The feature descriptor extraction step creates a distinctive 8x8 descriptor for each feature point. By first extracting a 40x40 window around each corner and downsampling it to 8x8, we naturally blur it, making the descriptor more robust to small pixel variations and noise. Additionally, the bias/gain normalization step ((patch - mean)/std) ensures the descriptor is not affected by lighting differences, so we can match features even when one image is brighter or darker than the other. These normalized 8x8 patches are reliable descriptors for matching corresponding points between images, making the matching process much more efficient than using raw pixel values.
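A sketch of the extraction, assuming skimage's resize for the blurred downsampling; skipping corners whose 40x40 window falls off the image is my assumption:

```python
import numpy as np
from skimage.transform import resize

def extract_descriptors(gray, coords, half=20):
    """8x8 bias/gain-normalized descriptors from 40x40 windows."""
    descs, kept = [], []
    for r, c in coords:
        if r < half or c < half or r + half > gray.shape[0] or c + half > gray.shape[1]:
            continue  # 40x40 window would fall off the image
        patch = gray[r - half:r + half, c - half:c + half]
        # Downsampling low-pass filters the patch, adding robustness
        # to small shifts and noise.
        small = resize(patch, (8, 8), anti_aliasing=True)
        # Bias/gain normalization: zero mean, unit standard deviation.
        small = (small - small.mean()) / (small.std() + 1e-8)
        descs.append(small.ravel())
        kept.append((r, c))
    return np.array(descs), np.array(kept)
```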
For feature matching, I find corresponding points between two images by comparing the feature descriptors computed in the previous section. For each descriptor in the first image, I compute the Euclidean distances to all the descriptors in the second image, which creates a distance matrix. I wrote a find2nn function that finds the two nearest neighbours for each descriptor by keeping track of the two smallest distances and their indices. I learned that Lowe's ratio test computes the ratio between the best and second-best match distances. If this ratio is less than 0.6, the best match is significantly better than the second best, making it a good match; I chose 0.6 after consulting Figure 6b in the paper. The ratio test was necessary to effectively filter out ambiguous matches, where a descriptor is similar to several different regions in the other image.
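A sketch of the matching step; this version builds the full distance matrix with NumPy broadcasting and inlines the two-nearest-neighbour search that find2nn performs:

```python
import numpy as np

def match_features(desc1, desc2, ratio=0.6):
    """Match descriptors between two images using Lowe's ratio test."""
    # Pairwise Euclidean distances between all descriptor pairs.
    dists = np.linalg.norm(desc1[:, None, :] - desc2[None, :, :], axis=2)
    matches = []
    for i in range(len(desc1)):
        nn = np.argsort(dists[i])[:2]  # two nearest neighbours
        # Accept only if the best match clearly beats the runner-up.
        if dists[i, nn[0]] < ratio * dists[i, nn[1]]:
            matches.append((i, nn[0]))
    return matches
```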
RANSAC, which stands for Random Sample Consensus, handles outliers in the feature matches that could severely skew the least-squares solution. To implement it, I followed the RANSAC loop taught in lecture. Instead of using all of the matches to compute the homography, RANSAC repeatedly samples small random sets of four point pairs, computes a homography from each sample, and counts how many other points agree with that transformation (the inliers). After all iterations, I kept the largest set of inliers and recomputed the least-squares homography estimate, ransac_H, on all of them. RANSAC makes the least-squares homography estimate robust, which is essential for the image stitching in the next and final step.
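A sketch of the loop, reusing compute_homography from earlier; n_iters = 1000 and the inlier threshold eps = 2.0 pixels are assumed values:

```python
import numpy as np

def ransac_homography(pts1, pts2, n_iters=1000, eps=2.0):
    """4-point RANSAC over matched points pts1, pts2 (each (N, 2))."""
    pts1_h = np.hstack([pts1, np.ones((len(pts1), 1))])
    best_inliers = np.array([], dtype=int)
    for _ in range(n_iters):
        # Minimal sample: 4 correspondences define a homography.
        sample = np.random.choice(len(pts1), 4, replace=False)
        H = compute_homography(pts1[sample], pts2[sample])
        # Project all points and measure reprojection error.
        proj = pts1_h @ H.T
        proj = proj[:, :2] / proj[:, 2:3]
        err = np.linalg.norm(proj - pts2, axis=1)
        inliers = np.flatnonzero(err < eps)
        if len(inliers) > len(best_inliers):
            best_inliers = inliers
    # Final least-squares fit (ransac_H) on the largest inlier set.
    return compute_homography(pts1[best_inliers], pts2[best_inliers]), best_inliers
```

Below are some images with the inliers overlaid: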
Using the homography matrix produced by the RANSAC algorithm, I repeated the steps from part A to warp image 1 onto image 2 and blend the two images together into a smooth mosaic. Below are the final results of the automatic stitching, shown beside the results of the manual stitching.
The coolest thing I learned in this project is feature matching through automatic correspondences. I find it fascinating that we can create feature descriptors that uniquely identify a feature in one image and compare them against descriptors in other images to match features efficiently. It is so much faster, and produces better outputs, than manual correspondence selection.