Image Detection

Another way ARKit can interact with the real world is through image detection. Image detection involves storing images and then using the camera to recognize those exact same images when viewed through an augmented reality view.

Image detection is similar but different from machine learning image recognition. With image recognition, an app can recognize items it has never seen before such as different varieties of cars, pencils, or computers. Image detection simply recognizes a stored image. If you don’t store an image ahead of time, image detection can never recognize it.

Image detection can be handy because it allows an app to recognize a fixed image and then respond in different ways, such as displaying more information about that image. For example, a museum might offer an augmented reality app that lets users point an iPhone at a painting. As soon as the app recognizes that painting, it can display additional information about that painting in the user’s native language such as English, Spanish, Arabic, or Japanese.

To learn about image detection, let’s create a new Xcode project by following these steps:

1.
Start Xcode. (Make sure you’re using Xcode 10 or greater.)
2.
Choose File ➤ New ➤ Project. Xcode asks you to choose a template.
3.
Click the iOS category.
4.
Click the Single View App icon and click the Next button. Xcode asks for a product name, organization name, organization identifiers, and content technology.
5.
Click in the Product Name text field and type a descriptive name for your project, such as ImageDetection. (The exact name does not matter.)
6.
Click the Next button. Xcode asks where you want to store your project.
7.
Choose a folder and click the Create button. Xcode creates an iOS project.

Now modify the Info.plist file to allow access to the camera and to use ARKit by following these steps:

1.
Click the Info.plist file in the Navigator pane. Xcode displays a list of keys, types, and values.
2.
Click the disclosure triangle to expand the Required Device Capabilities category to display Item 0.
3.
Move the mouse pointer over Item 0 to display a plus (+) icon.
4.
Click this plus (+) icon to display a blank Item 1.
5.
Type arkit under the Value category in the Item 1 row.
6.
Move the mouse pointer over the last row to display a plus (+) icon.
7.
Click on the plus (+) icon to create a new row. A popup menu appears.
8.
Choose Privacy – Camera Usage Description.
9.
Type AR needs to use the camera under the Value category in the Privacy – Camera usage Description row.

Now it’s time to modify the ViewController.swift file to use ARKit and SceneKit by following these steps:

1.
Click on the ViewController.swift file in the Navigator pane.
2.
Edit the ViewController.swift file so it looks like this:
import UIKit
import SceneKit
import ARKit
class ViewController: UIViewController, ARSCNViewDelegate {
let configuration = ARWorldTrackingConfiguration()
    override func viewDidLoad() {
        super.viewDidLoad()
        // Do any additional setup after loading the view, typically from a nib.
    }
}

To view augmented reality in our app, add a single ARKit SceneKit View (ARSCNView) and expand it so it fills the entire user interface. Then add constraints by choosing Editor ➤ Resolve Auto Layout Issues ➤ Reset to Suggested Constraints at the bottom half of the menu under the All Views in Container category.

The next step is to connect the user interface items to the Swift code in the ViewController.swift file. To do this, follow these steps:

1.
Click the Main.storyboard file in the Navigator pane.
2.
Click the Assistant Editor icon or choose View ➤ Assistant Editor ➤ Show Assistant Editor to display the Main.storyboard and the ViewController.swift file side by side.
3.
Move the mouse pointer over the ARSCNView, hold down the Control key, and Ctrl-drag under the class ViewController line.
4.
Release the Control key and the left mouse button. A popup menu appears.
5.
Click in the Name text field and type sceneView, then click the Connect button. Xcode creates an IBOutlet as shown here:

@IBOutlet var sceneView: ARSCNView!

6.
Edit the viewDidLoad function so it looks like this:
override func viewDidLoad() {
    super.viewDidLoad()
    // Do any additional setup after loading the view, typically from a nib.
    sceneView.debugOptions = [ARSCNDebugOptions.showWorldOrigin, ARSCNDebugOptions.showFeaturePoints]
    sceneView.delegate = self
    sceneView.session.run(configuration)
}

Storing Images

Before ARKit can recognize physical objects in the real world, you need to store images of those items in your app. In addition to storing an image, you must also specify the width and height of that real-world object. That way when ARKit spots that actual item through the iOS device’s camera, it can compare that image with its stored image. If they match in both appearance and size, then ARKit can recognize that real-world item.

First, you must capture an image of the item you want to detect. Since these images need to be high resolution, you can capture public domain images off the Internet such as at NASA ( www.nasa.gov ). Then you can display these images on your computer screen for your iOS device to recognize. Once you have an image on your Macintosh, you’ll need to store it in your Xcode project.

To store one or more images that you want ARKit to recognize, follow these steps:

1.
Click the Assets.xcassets folder in the Navigator pane.
2.
Click + icon in the bottom of the pane. A popup menu appears.
3.
Choose New AR Resource Group, as shown in Figure 14-1. Xcode creates an AR Resources folder.

../images/469983_1_En_14_Chapter/469983_1_En_14_Fig1_HTML.jpg — Figure 14-1
The yellow plane blocks the green box from view

4.
Drag and drop the images into your newly added AR Resource folder that you want ARKit to recognize in the real world. Xcode displays a yellow alert icon in the bottom-right corner of your images.
5.
Click the Attributes Inspector icon or choose View ➤ Inspectors ➤ Show Attributes Inspector. An AR Reference Image pane appears, as shown in Figure 14-2.

../images/469983_1_En_14_Chapter/469983_1_En_14_Fig2_HTML.jpg — Figure 14-2
Defining the width and height of the item to recognize

6.
Click in the Width and Height text fields and type the actual width and height of the real item. You can also click the Units popup menu to change the default measurement unit from meters to something else, such as inches or centimeters.

Once we’ve added one or more images of the real objects we want ARKit to recognize, we need to write actual Swift code to recognize the image when spotted through an iOS device’s camera.

First, we need to access the folder containing the images of items to recognize. This folder can be called anything such as “AR Resources”. This means using a guard statement to verify that the image folder even exists like this:

guard let storedImages = ARReferenceImage.referenceImages(inGroupNamed: "AR Resources", bundle: nil) else {

fatalError("Missing AR Resources images")

}

This code looks for a folder named AR Resources . If it fails to find it, it ends the program and displays “Missing AR Resources Images.” If it finds an AR Resources folder, then we can define where the detected images are stored like this:

configuration.detectionImages = storedImages

Finally, we need to use the didAdd renderer function, which runs every time the camera updates its view. If the camera detects a recognized image (ARImageAnchor) then we can verify this by printing "Item recognized" like this:

func renderer(_ renderer: SCNSceneRenderer, didAdd node: SCNNode, for anchor: ARAnchor) {

if anchor is ARImageAnchor {

print("Item recognized")

}

The entire ViewController.swift file should look like this:

import UIKit

import SceneKit

import ARKit

class ViewController: UIViewController, ARSCNViewDelegate {

@IBOutlet var sceneView: ARSCNView!

let configuration = ARWorldTrackingConfiguration()

override func viewDidLoad() {

super.viewDidLoad()

// Do any additional setup after loading the view, typically from a nib.

sceneView.debugOptions = [ARSCNDebugOptions.showWorldOrigin, ARSCNDebugOptions.showFeaturePoints]

sceneView.delegate = self

guard let storedImages = ARReferenceImage.referenceImages(inGroupNamed: "AR Resources", bundle: nil) else {

fatalError("Missing AR Resources images")

}

configuration.detectionImages = storedImages

sceneView.session.run(configuration)

}

func renderer(_ renderer: SCNSceneRenderer, didAdd node: SCNNode, for anchor: ARAnchor) {

if anchor is ARImageAnchor {

print("Item recognized")

}

To test this app, place the item that you took a picture of and stored in the AR Resources folder and put that item on a table or floor. Then follow these steps:

1.
Connect an iOS device to your Macintosh through its USB cable.
2.
Click the Run button or choose Product ➤ Run. The first time you run this app, it will ask permission to access the camera so give it permission.
3.
Load the picture on your computer that you stored in your project’s AR Resources folder.
4.
Aim the iOS device’s camera at the screen that displays the picture you want ARKit to recognize. When ARKit recognizes the image, it displays "Item recognized" in the debug area of Xcode.
5.
Click the Stop button or choose Product ➤ Stop.

Detecting Multiple Images

Just recognizing a single image would be limiting. That’s why to recognize multiple images, an app just needs to store multiple images in its AR Resources folder. Now an app will be able to recognize different images and respond based on what image it recognizes.

To identify which particular image ARKit recognizes, you can give each image a unique name. Now each time the app recognizes an image, it can retrieve the name of that image to determine exactly which image is in front of the iOS device’s camera.

When you store an image in the AR Resources folder , you must not only define its width and height, but also its name, which can be any arbitrary, descriptive name, as shown in Figure 14-3.

../images/469983_1_En_14_Chapter/469983_1_En_14_Fig3_HTML.jpg — Figure 14-3
Every image needs a unique name

Once you’ve added two or more images into the AR Resources folder , you can retrieve the name property of a recognized image to determine which image currently appears in front of the iOS device’s camera.

To see how to identify a recognized image, let’s modify the ImageDetection project by following these steps:

1.
Drag and drop a second image into the AR Resources folder. Make sure you give that second image a unique name so both stored images have different names.
2.
Underneath the classViewController line, create a structure like this:
struct Images {
var title: String
var info: String
}
3.
Next, create an empty array to hold this structure like this:

var imageArray: [Images] = []
4.
In the viewDidLoad function, add the following function call:

getData()
5.
At the bottom of the ViewController.swift file, write the following getData function. In this example, the project contains two pictures of rockets so the text in the getData function reflects this, but you can type any text you want that best corresponds with the two images stored in your project:
func getData() {
    let item1 = Images(title: "CRS-15 SpaceX rocket", info: "Commercial Resupply Service")
    let item2 = Images(title: "Saturn V rocket", info: "Apollo moon launch vehicle")
    imageArray.append(item1)
    imageArray.append(item2)
}
The getData function creates two structures and fills the elements of those structures with text for the title and info properties. Then it stores that structure into an array. Now we need to use the name property of each image to identify what image the app currently recognizes.
6.
Write the following nodeFor renderer function :
func renderer(_ renderer: SCNSceneRenderer, nodeFor anchor: ARAnchor) -> SCNNode? {
    guard let imageAnchor = anchor as? ARImageAnchor else { return nil }
    switch imageAnchor.referenceImage.name {
    case "CRS-15":
        print(imageArray[0].title)
        print(imageArray[0].info)
    case "SaturnV":
        print(imageArray[1].title)
        print(imageArray[1].info)
    default:
        print("Nothing found")
    }
    return node
}

This function runs when it recognizes an image (ARImageAnchor). Then it uses the name property of the recognized image to determine which information to display. The entire ViewController.swift file should look like this:

import UIKit

import SceneKit

import ARKit

class ViewController: UIViewController, ARSCNViewDelegate {

@IBOutlet var sceneView: ARSCNView!

let configuration = ARWorldTrackingConfiguration()

struct Images {

var title: String

var info: String

}

var imageArray: [Images] = []

override func viewDidLoad() {

super.viewDidLoad()

// Do any additional setup after loading the view, typically from a nib.

sceneView.debugOptions = [ARSCNDebugOptions.showWorldOrigin, ARSCNDebugOptions.showFeaturePoints]

sceneView.delegate = self

guard let storedImages = ARReferenceImage.referenceImages(inGroupNamed: "AR Resources", bundle: nil) else {

fatalError("Missing AR Resources images")

}

configuration.detectionImages = storedImages

getData()

sceneView.session.run(configuration)

}

func renderer(_ renderer: SCNSceneRenderer, didAdd node: SCNNode, for anchor: ARAnchor) {

if anchor is ARImageAnchor {

print("Item recognized")

}

func renderer(_ renderer: SCNSceneRenderer, nodeFor anchor: ARAnchor) -> SCNNode? {

guard let imageAnchor = anchor as? ARImageAnchor else { return nil }

switch imageAnchor.referenceImage.name {

case "CRS-15":

print(imageArray[0].title)

print(imageArray[0].info)

case "SaturnV":

print(imageArray[1].title)

print(imageArray[1].info)

default:

print("Nothing found")

}

let node = SCNNode()

node.addChildNode(planeNode)

return node

}

func getData() {

let item1 = Images(title: "CRS-15 SpaceX rocket", info: "Commercial Resupply Service")

let item2 = Images(title: "Saturn V rocket", info: "Apollo moon launch vehicle")

imageArray.append(item1)

imageArray.append(item2)

}

To test this code, follow these steps:

1.
Connect an iOS device to your Macintosh through its USB cable.
2.
Click the Run button or choose Product ➤ Run.
3.
Point the iOS device’s camera at one of the images stored in the app’s AR Resources folder .
4.
The Xcode debug area displays text related to your image.
5.
Click the Stop button or choose Product ➤ Stop.

Displaying Information in Augmented Reality

Right now, our app only displays information about each image in the Xcode debug area, which users will never see. What we really need to do is take the information about each recognized image and display it within the augmented reality view.

First, we need to identify the boundaries of the image that the app recognizes. We can do that be creating a plane that’s the exact width and height of the recognized image like this:

let plane = SCNPlane(width: imageAnchor.referenceImage.physicalSize.width, height: imageAnchor.referenceImage.physicalSize.height)

Now we need to give the plane a color so we can see it. Later we’ll make this plane clear, but for now, we want to make sure that the plane completely covers any recognized image:

plane.firstMaterial?.diffuse.contents = UIColor.yellow

Next we need to create a node to hold the plane. Since the plane will appear flat, we also need to rotate it -90 degrees so it faces the camera:

let planeNode = SCNNode()

planeNode.geometry = plane

let ninetyDegrees = GLKMathDegreesToRadians(-90)

planeNode.eulerAngles = SCNVector3(ninetyDegrees, 0, 0)

Finally, we need to add this planeNode to the augmented reality view:

let node = SCNNode()

node.addChildNode(planeNode)

The entire nodeFor renderer function should look like this:

func renderer(_ renderer: SCNSceneRenderer, nodeFor anchor: ARAnchor) -> SCNNode? {

guard let imageAnchor = anchor as? ARImageAnchor else { return nil }

let plane = SCNPlane(width: imageAnchor.referenceImage.physicalSize.width, height: imageAnchor.referenceImage.physicalSize.height)

plane.firstMaterial?.diffuse.contents = UIColor.yellow

let planeNode = SCNNode()

planeNode.geometry = plane

let ninetyDegrees = GLKMathDegreesToRadians(-90)

planeNode.eulerAngles = SCNVector3(ninetyDegrees, 0, 0)

switch imageAnchor.referenceImage.name {

case "CRS-15":

print(imageArray[0].title)

print(imageArray[0].info)

case "SaturnV":

print(imageArray[1].title)

print(imageArray[1].info)

default:

print("Nothing found")

}

return node

}

let node = SCNNode()

node.addChildNode(planeNode)

return node

}

If you test this code, you’ll see that as soon as you point an iOS device’s camera at an image that the app can recognize, it immediately covers the entire image with a yellow plane, as shown in Figure 14-4. Try this with different size images to see that the yellow plane correctly recognizes the shape of every image stored in its AR Resources folder .

../images/469983_1_En_14_Chapter/469983_1_En_14_Fig4_HTML.jpg — Figure 14-4
A yellow plane completely masks a recognized image

The reason for knowing the recognized image’s size is so we can properly display text in the lower-left corner of that image. We’ll be displaying both title and info of each image based on the structure we created earlier:

struct Images {

var title: String

var info: String

}

Before we write any more code and we know that the plane completely covers up any recognized images, we don’t need to see the plane any more, so we can make it clear like this:

plane.firstMaterial?.diffuse.contents = UIColor.clear

Although the plane still exists, it’s hidden from the user. We just need to use the plane as a reference so we know where to place text. First, we want to place the title text in the lower-right corner of each recognized image. This involves creating SCNText using the title text stored in our structure, defining a flatness value for our SCNText as 0.1, and a font size of 10 points like this:

let title = SCNText(string: imageArray[0].title, extrusionDepth: 0.0)

title.flatness = 0.1

title.font = UIFont.boldSystemFont(ofSize: 10)

Now we need to create an SCNNode , assign the SCNText to that node, and color it white while scaling its size smaller:

let titleNode = SCNNode()

titleNode.geometry = title

titleNode.geometry?.firstMaterial?.diffuse.contents = UIColor.white

titleNode.scale = SCNVector3(0.0015, 0.0015, 0.0015)

Feel free to experiment with other scaling values besides 0.0015 until you achieve the text appearance you like best. Finally, we need to place this title text relative to the plane and add it to the planeNode like this:

titleNode.position.x = -Float(plane.width) / 2.2

titleNode.position.y = -Float(plane.height) / 2.2

planeNode.addChildNode(titleNode)

The x and y positions are defined by the plane’s width and height divided by 2.2. Experiment with different values so you can see how higher or lower values affect the location of text.

Displaying the info text for each image is similar like this

let info = SCNText(string: imageArray[0].info, extrusionDepth: 0.0)

info.flatness = 0.1

info.font = UIFont.boldSystemFont(ofSize: 8)

let infoNode = SCNNode()

infoNode.geometry = info

infoNode.geometry?.firstMaterial?.diffuse.contents = UIColor.gray

infoNode.scale = SCNVector3(0.0015, 0.0015, 0.0015)

infoNode.position.x = -Float(plane.width) / 2.2

infoNode.position.y = -Float(plane.height) / 1.8

planeNode.addChildNode(infoNode)

The entire ViewController.swift file should look similar to the following except you may choose different text to match the specific images you stored in your AR Resources folder:

import UIKit

import SceneKit

import ARKit

class ViewController: UIViewController, ARSCNViewDelegate {

@IBOutlet var sceneView: ARSCNView!

let configuration = ARWorldTrackingConfiguration()

struct Images {

var title: String

var info: String

}

var imageArray: [Images] = []

override func viewDidLoad() {

super.viewDidLoad()

// Do any additional setup after loading the view, typically from a nib.

sceneView.debugOptions = [ARSCNDebugOptions.showWorldOrigin, ARSCNDebugOptions.showFeaturePoints]

sceneView.delegate = self

guard let storedImages = ARReferenceImage.referenceImages(inGroupNamed: "AR Resources", bundle: nil) else {

fatalError("Missing AR Resources images")

}

configuration.detectionImages = storedImages

getData()

sceneView.session.run(configuration)

}

func renderer(_ renderer: SCNSceneRenderer, didAdd node: SCNNode, for anchor: ARAnchor) {

if anchor is ARImageAnchor {

print("Item recognized")

}

func renderer(_ renderer: SCNSceneRenderer, nodeFor anchor: ARAnchor) -> SCNNode? {

guard let imageAnchor = anchor as? ARImageAnchor else { return nil }

let plane = SCNPlane(width: imageAnchor.referenceImage.physicalSize.width, height: imageAnchor.referenceImage.physicalSize.height)

plane.firstMaterial?.diffuse.contents = UIColor.clear

let planeNode = SCNNode()

planeNode.geometry = plane

let ninetyDegrees = GLKMathDegreesToRadians(-90)

planeNode.eulerAngles = SCNVector3(ninetyDegrees, 0, 0)

switch imageAnchor.referenceImage.name {

case "CRS-15":

let title = SCNText(string: imageArray[0].title, extrusionDepth: 0.0)

title.flatness = 0.1

title.font = UIFont.boldSystemFont(ofSize: 10)

let titleNode = SCNNode()

titleNode.geometry = title

titleNode.geometry?.firstMaterial?.diffuse.contents = UIColor.white

titleNode.scale = SCNVector3(0.0015, 0.0015, 0.0015)

titleNode.position.x = -Float(plane.width) / 2.2

titleNode.position.y = -Float(plane.height) / 2.2

planeNode.addChildNode(titleNode)

let info = SCNText(string: imageArray[0].info, extrusionDepth: 0.0)

info.flatness = 0.1

info.font = UIFont.boldSystemFont(ofSize: 8)

let infoNode = SCNNode()

infoNode.geometry = info

infoNode.geometry?.firstMaterial?.diffuse.contents = UIColor.gray

infoNode.scale = SCNVector3(0.0015, 0.0015, 0.0015)

infoNode.position.x = -Float(plane.width) / 2.2

infoNode.position.y = -Float(plane.height) / 1.8

planeNode.addChildNode(infoNode)

case "SaturnV":

let title = SCNText(string: imageArray[1].title, extrusionDepth: 0.0)

title.flatness = 0.1

title.font = UIFont.boldSystemFont(ofSize: 10)

let titleNode = SCNNode()

titleNode.geometry = title

titleNode.geometry?.firstMaterial?.diffuse.contents = UIColor.white

titleNode.scale = SCNVector3(0.0015, 0.0015, 0.0015)

titleNode.position.x = -Float(plane.width) / 2.2

titleNode.position.y = -Float(plane.height) / 2.2

planeNode.addChildNode(titleNode)

let info = SCNText(string: imageArray[1].info, extrusionDepth: 0.0)

info.flatness = 0.1

info.font = UIFont.boldSystemFont(ofSize: 8)

let infoNode = SCNNode()

infoNode.geometry = info

infoNode.geometry?.firstMaterial?.diffuse.contents = UIColor.gray

infoNode.scale = SCNVector3(0.0015, 0.0015, 0.0015)

infoNode.position.x = -Float(plane.width) / 2.2

infoNode.position.y = -Float(plane.height) / 1.8

planeNode.addChildNode(infoNode)

default:

print("Nothing found")

}

let node = SCNNode()

node.addChildNode(planeNode)

return node

}

func getData() {

let item1 = Images(title: "CRS-15 SpaceX rocket", info: "Commercial Resupply Service")

let item2 = Images(title: "Saturn V rocket", info: "Apollo moon launch vehicle")

imageArray.append(item1)

imageArray.append(item2)

}

If you run this app, you’ll see the title text appear in the lower-right corner of the image while the info text appears below, as shown in Figure 14-5.

../images/469983_1_En_14_Chapter/469983_1_En_14_Fig5_HTML.jpg — Figure 14-5
Displaying text in an augmented reality view

Summary

Image detection works by storing one or more images in a special AR Resources folder and then using the iOS device’s camera to run multiple renderer functions when ARKit recognizes one of its many stored images. When you drag and drop an image into the AR Resources folder, you must define that image’s physical width and height. In addition, you also need to give each image a distinct name so you can use that name later to identify which recognized image the iOS device’s camera currently has found.

Once you’re able to use the name of each recognized image to specify which particular image has been recognized, you can display text related to that image within the augmented reality view. This text can appear positioned within or outside the boundaries of the recognized image.

Image detection works best with high-resolution pictures. By using image detection, your augmented reality apps can respond to real-world objects and provide information about that object as virtual text that’s only visible when seen through the camera of an iOS device.

14. Image Detection

Storing Images

Detecting Multiple Images

Displaying Information in Augmented Reality

Summary