As mentioned before, one of the hard parts of building an AR experience is the amount of work it takes to make sense of all the sensor data your app receives. This becomes especially hard when you want the content you add to the AR world to behave as if it's part of the real world. Even if you successfully interpret device motion, acceleration, and orientation, that still isn't enough to create a proper AR experience, because your implementation would be unaware of what the user's surroundings look like. Simply knowing what the user is doing with their device is not enough information to create a convincing AR illusion. Apple has solved this problem in ARKit by continuously comparing the images coming in from the camera with each other.
By comparing these images, ARKit can detect and analyze different aspects of the user's surroundings. For instance, it can detect that one object is closer to the user than another object. It might also detect that there is a large flat surface (a plane) in the camera's view. It can even go so far as recognizing that a certain plane is a floor, or that two seemingly different planes are in fact part of a single larger plane. Apple calls this feature detection. Feature detection is one of the key aspects of how ARKit is able to make sense of a scene and the contents in it.
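To make this a bit more concrete, here is a minimal sketch of how you could ask ARKit to look for horizontal planes and respond when it finds one. It assumes a SpriteKit-based setup with an ARSKView outlet called sceneView whose delegate is the view controller; the class and outlet names are illustrative and not the exact code you'll write for the gallery later in this chapter:

```swift
import UIKit
import SpriteKit
import ARKit

class PlaneDetectionViewController: UIViewController, ARSKViewDelegate {
    // Assumed outlet, connected to an ARSKView in a storyboard.
    @IBOutlet var sceneView: ARSKView!

    override func viewDidLoad() {
        super.viewDidLoad()

        // Present an empty SpriteKit scene for ARKit to place nodes in.
        sceneView.presentScene(SKScene(size: view.bounds.size))
        sceneView.delegate = self
    }

    override func viewWillAppear(_ animated: Bool) {
        super.viewWillAppear(animated)

        // Track the full space around the user and look for horizontal planes.
        let configuration = ARWorldTrackingConfiguration()
        configuration.planeDetection = .horizontal
        sceneView.session.run(configuration)
    }

    override func viewWillDisappear(_ animated: Bool) {
        super.viewWillDisappear(animated)
        sceneView.session.pause()
    }

    // Called whenever ARKit adds an anchor to the session.
    // Detected planes are delivered as ARPlaneAnchor instances.
    func view(_ view: ARSKView, didAdd node: SKNode, for anchor: ARAnchor) {
        guard let planeAnchor = anchor as? ARPlaneAnchor else { return }
        print("Detected a plane with estimated extent \(planeAnchor.extent)")
    }
}
```

Every detected plane comes in as an ARPlaneAnchor, which carries an estimated center and extent that ARKit keeps refining as it learns more about the surface.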
The configuration you run your ARSession with determines how much of the world surrounding the user is tracked. In order to track all three dimensions of the space around the user, you use a configuration with world tracking enabled, such as ARWorldTrackingConfiguration. When world tracking is enabled, ARKit keeps track of the device's position and orientation relative to the user's surroundings. For instance, if the user rotates their device or moves it, ARKit will notice this and the rendered scene will update accordingly. This makes sure that any nodes you have added to your scene are rendered so they appear to be pinned to the real world, as you would expect. The following screenshots illustrate this:
![](assets/e6fdd66b-0737-45d1-abf4-8494ce49b2b0.jpg)
These screenshots show the end result of the gallery you'll have built by the end of this chapter. As you can see, the picture is placed at a certain location in the physical world, and when you pan the camera around this object, its position does not appear to change. It's as if the object is pinned to the real world.
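To give a rough idea of how this pinning works in code, the sketch below places an anchor half a meter in front of the camera and returns a SpriteKit node to represent it. It would live in the same view controller as the earlier sketch and reuse its sceneView outlet; the placeArtwork() method name and the "artwork" image are made up for illustration and are not the chapter's final gallery code:

```swift
// Call this, for example, from a tap gesture handler.
func placeArtwork() {
    guard let currentFrame = sceneView.session.currentFrame else { return }

    // Start from the camera's current transform and move 0.5 meters along
    // the negative z-axis, which points away from the camera into the scene.
    var translation = matrix_identity_float4x4
    translation.columns.3.z = -0.5
    let transform = simd_mul(currentFrame.camera.transform, translation)

    // Adding the anchor tells ARKit to keep tracking this real-world position.
    sceneView.session.add(anchor: ARAnchor(transform: transform))
}

// ARSKViewDelegate: return the SpriteKit node that should represent the anchor.
// ARKit keeps this node pinned to the anchor's position as the device moves.
func view(_ view: ARSKView, nodeFor anchor: ARAnchor) -> SKNode? {
    return SKSpriteNode(imageNamed: "artwork") // placeholder asset name
}
```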
In order to get the best results from world tracking, you will want to make sure that the scene your user is looking at is as static as possible. A living room with some furniture in it will work far better than a crowded shopping mall. If your user runs your AR app in a sub-optimal environment, tracking of the scene might become limited. If this happens, ARKit calls the session(_:cameraDidChangeTrackingState:) delegate method from the ARSessionObserver protocol. The ARSessionDelegate and ARSKViewDelegate protocols both extend ARSessionObserver, which means you can implement ARSessionObserver methods whenever you conform to any of the protocols that extend it. The trackingState property on the ARCamera instance that this method receives tells you what happened, whether tracking is limited, and why. It's a good idea to implement this delegate method so you can inform your users when their experience might not be as good as it could be.
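A minimal implementation of that delegate method might look like the sketch below. It switches over the camera's trackingState and updates a hypothetical statusLabel so the user knows why the experience may be degraded:

```swift
// Declared in ARSessionObserver; available here because ARSKViewDelegate extends it.
func session(_ session: ARSession, cameraDidChangeTrackingState camera: ARCamera) {
    // statusLabel is a hypothetical UILabel on your view controller.
    switch camera.trackingState {
    case .normal:
        statusLabel.text = "Tracking is working normally."
    case .notAvailable:
        statusLabel.text = "Tracking is currently not available."
    case .limited(.excessiveMotion):
        statusLabel.text = "Tracking is limited; try moving the device more slowly."
    case .limited(.insufficientFeatures):
        statusLabel.text = "Tracking is limited; point the device at a more detailed area."
    case .limited:
        statusLabel.text = "Tracking is limited."
    }
}
```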
Now that you are aware that ARKit tracks the user's environment by reading data from sensors and comparing camera images, let's see how ARKit knows what to do with all the information it receives. After all, ARKit is supposed to make implementing AR as simple as possible and, to do this, it must be quite clever!