Core concepts of SfM

Before we delve into the implementation of an SfM pipeline, let's revisit some key concepts that are an essential part of the process. The foremost class of theoretical topics in SfM is epipolar geometry (EG), or the geometry of multiple views (MVG), which builds upon knowledge of image formation and camera calibration; however, we will only touch on these basic subjects. After covering a few basics of EG, we will briefly discuss stereo reconstruction and look at subjects such as depth from disparity and triangulation. Other crucial topics in SfM, such as robust feature matching, are more mechanical than theoretical, and we will cover them as we advance in coding the system. We intentionally leave out some very interesting topics, such as camera resectioning, PnP algorithms, and reconstruction factorization, since these are handled by the underlying sfm module and we need not invoke them directly, although OpenCV does provide functions to perform them.

All of these subjects have been the source of an enormous amount of research and literature over the last four decades and serve as topics for thousands of academic papers, patents, and other publications. Hartley and Zisserman's Multiple View Geometry is by far the most prominent resource for SfM and MVG mathematics and algorithms; an excellent secondary resource is Szeliski's Computer Vision: Algorithms and Applications, which explains SfM in great detail, focusing on Richard Szeliski's seminal contributions to the field. As a third source, I recommend grabbing a copy of Prince's Computer Vision: Models, Learning, and Inference, which features beautiful figures and diagrams along with meticulous mathematical derivations.