An introduction to augmented reality

Location-based and recognition-based augmented reality are the two main types of augmented reality. Both try to derive where the user is looking. This information is key to the augmented reality process, and it depends on accurately estimating the camera pose. The two types are briefly described as follows:

Location-based augmented reality detects the user's location and orientation by reading data from several sensors that are very common in smartphones (for example, GPS, digital compass, and accelerometer) to derive where the user is looking. This information is used to superimpose computer-generated elements on the screen.
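As a rough illustration of the location-based idea, the following minimal sketch places a label for a point of interest (POI) on the screen from a GPS position and a compass heading. The function names (`bearing_deg`, `screen_x`), the 60 degree field of view, and the 1080 pixel screen width are assumptions made for this example only. It also ignores device tilt, which a real application would take from the accelerometer.

```python
import math


def bearing_deg(lat1, lon1, lat2, lon2):
    """Initial great-circle bearing from point 1 to point 2,
    in degrees clockwise from north, in the range [0, 360)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dlon = math.radians(lon2 - lon1)
    y = math.sin(dlon) * math.cos(phi2)
    x = (math.cos(phi1) * math.sin(phi2)
         - math.sin(phi1) * math.cos(phi2) * math.cos(dlon))
    return math.degrees(math.atan2(y, x)) % 360.0


def screen_x(user_lat, user_lon, heading_deg, poi_lat, poi_lon,
             fov_deg=60.0, width_px=1080):
    """Horizontal pixel at which to draw the POI label,
    or None if the POI lies outside the horizontal field of view."""
    # Signed angle between the viewing direction and the POI, in [-180, 180)
    bearing = bearing_deg(user_lat, user_lon, poi_lat, poi_lon)
    offset = (bearing - heading_deg + 180.0) % 360.0 - 180.0
    if abs(offset) > fov_deg / 2:
        return None
    # Pinhole mapping: focal length in pixels derived from the field of view
    focal_px = (width_px / 2) / math.tan(math.radians(fov_deg / 2))
    return width_px / 2 + focal_px * math.tan(math.radians(offset))
```

For instance, `screen_x(0.0, 0.0, 0.0, 1.0, 0.0)` describes a user on the equator facing north with a POI directly to the north, so the label lands at the horizontal center of the screen. With a heading of 90 degrees (facing east), the same POI falls outside the field of view and the function returns `None`.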
Recognition-based augmented reality, by contrast, uses image processing techniques to derive where the user is looking. Obtaining the camera pose from images requires finding the correspondences between known points in the environment and their camera projections. Two main approaches to finding these correspondences can be found in the literature:
- Marker-based pose estimation: This approach uses planar markers (square markers have gained popularity, especially in the augmented reality field) to compute the camera pose from their four corners. One major disadvantage of square markers is that the camera pose computation depends on accurately locating the four corners of the marker, which can be very difficult under occlusion. However, some marker detection approaches also handle occlusion well. ArUco is one example.
- Markerless-based pose estimation: When the scene cannot be prepared with markers, objects that are naturally present in the image can be used for pose estimation. Once a set of n 2D points and their corresponding 3D coordinates have been obtained, the camera pose is estimated by solving the Perspective-n-Point (PnP) problem. Because these methods rely on point matching techniques, the input data is seldom free of outliers. This is why techniques that are robust to outliers (for example, RANSAC) can be used in the pose estimation process.

In the next screenshot, the two aforementioned approaches (marker-based and markerless-based augmented reality) are shown in connection with image processing techniques:

In the preceding screenshot, on the left side, you can see an example of the marker-based approach, where the marker is used to compute the camera pose from its four corners. On the right side, you can see an example of the markerless-based approach, where the €50 note is used to compute the camera pose. Both approaches are explained in the following sections.