9 Anatomy of a Driverless Car

Driverless cars “see” and “hear” by taking in real-time data that flows in from several different types of on-board sensors. Cars recognize their current location using a GPS device and a high-definition stored digital map. Let’s take an in-depth look at the suite of hardware devices that provide data to the car’s operating system.

High-definition digital maps

Humans learn their way around a new neighborhood by recognizing distinctive landmarks. Driverless cars find their way around with a GPS, with visual sensors, and by following a high-definition (HD) digital map, a detailed and precise model of a region’s most important surface features. Driverless cars use machine-learning software to deal with real-time traffic situations, and rich, detailed, and constantly updated high-definition digital maps to handle longer term navigation.

A driverless car knows its ballpark location by looking up its GPS coordinates on a high-definition digital map. GPS coordinates, however, tend to be a few feet off the mark, making them insufficient for autonomous driving. Driverless-car designers have come up with different techniques to compensate for the inability of GPS data to pinpoint the car’s exact location. The operating system of early driverless cars placed more weight on stored data from digital maps and less on real-time GPS and sensed data. As the performance of machine-learning software and visual sensors—particularly digital cameras—improves, it’s increasingly common for a car’s operating system to calculate its current location by relying on visual cues in the flow of real-time sensor data that depicts the nearby environment.

HD maps differ from standard digital maps in their degree of detail. An HD map depicts both big geographical features, such as mountains and lakes, and minor topographic details, such as the presence of trees and sidewalks. An HD map for a driverless car focuses on the static surface details of a road or intersection, for example, its lane markings, intersections, construction zones, and road signs.


Figure 9.1 High-definition map of an intersection, overlaid with sensor data.

Source: HERE

Traditional maps created for human eyes were two-dimensional pictorial depictions of a particular place where notable landmarks were indicated by static labels. In contrast, HD digital maps have a powerful concealed back end. While an HD map usually offers its user a pictorial depiction of the region, behind the scenes it's actually a database that contains millions of stored entries of topographical details, each logged along with other relevant details such as its geographical location, size, and orientation.
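To make the idea of a map-as-database concrete, here is a minimal sketch in Python of what one such entry and lookup might look like. The field names, coordinates, and the simple bounding-box search are all illustrative assumptions, not any map vendor's actual schema; production systems store millions of such rows in spatial databases with proper indexing.

```python
from dataclasses import dataclass

@dataclass
class MapFeature:
    """One illustrative entry in an HD map database (field names are hypothetical)."""
    feature_type: str    # e.g., "lane_marking", "stop_sign", "curb"
    latitude: float      # WGS84 degrees
    longitude: float     # WGS84 degrees
    elevation_m: float   # meters above sea level
    heading_deg: float   # orientation of the feature, degrees clockwise from north
    width_m: float       # approximate physical size

# A tiny in-memory "map"; a real HD map would hold millions of rows.
hd_map = [
    MapFeature("stop_sign", 40.7359, -73.9911, 12.3, 270.0, 0.75),
    MapFeature("lane_marking", 40.7360, -73.9913, 12.1, 180.0, 0.15),
]

def features_near(lat, lon, radius_deg=0.0005):
    """Crude bounding-box lookup; real systems use spatial indexes (R-trees, tiles)."""
    return [f for f in hd_map
            if abs(f.latitude - lat) < radius_deg and abs(f.longitude - lon) < radius_deg]

print(features_near(40.7359, -73.9912))
```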

The brain of the average human houses a high-quality local map. In fact, our brains enjoy an “auto-update” and “auto-correct” capacity that any software engineer or digital cartographer would envy. Updating a high-definition digital map is a laborious process that involves exhaustively driving around with several cameras and lidar (laser radar) sensors, a process we will discuss in detail later in this chapter.

Digital cameras

Digital maps are stored, static data that help identify a car’s location. In contrast, digital cameras are the equivalent of human eyes, capturing the visual environment outside the car in a stream of real-time data. As digital camera technology continues to get faster and more precise, roboticists have eagerly harnessed these rapid advancements to improve the performance of mid-level control software.

Just two decades ago, Apple’s QuickTake 100 was considered a cutting-edge digital camera. The QuickTake, manufactured by Kodak, was famed for its portability and the fact that it could store eight 640 × 480 color images at a time (while weighing in at a dainty 16 ounces). Today an average consumer camera can take thirty high-resolution images a second.

It’s important to understand how digital cameras work, since the structure of a digital image feeds directly into the deep-learning software. A digital camera gathers light through a lens in the form of photons. Each photon carries a certain amount of energy. As the photons stream through the camera’s lens, they land on a silicon wafer that’s made up of a grid of tiny individual photoreceptor cells.

Each photoreceptor absorbs its share of photons and translates the photons into electrons, which are stored as electrical charges. The brighter the stream of light, the higher the number of photons and the stronger the electrical charge. The amount of light hitting the grid of photoreceptors is then transformed into a format a computer can understand: a collection of numbers on a grid that represent the location of each individual “picture element,” or pixel. JPEGS, GIFs, and all other image files are just different ways of storing this array of light intensities.
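As a minimal illustration of this idea, the toy Python snippet below represents a grayscale image exactly as described: a grid of numbers, one per pixel, where larger values mean brighter light. The 4 × 4 array and its values are made up for illustration; a real automotive camera produces millions of such values per frame, usually per color channel.

```python
import numpy as np

# A toy 4x4 grayscale "image": each entry is a light intensity from 0 (black) to 255 (white).
image = np.array([
    [ 12,  15,  20, 200],
    [ 14,  18, 180, 210],
    [ 16, 170, 190, 220],
    [150, 160, 175, 230],
], dtype=np.uint8)

print(image.shape)   # (4, 4): rows x columns of pixels
print(image[0, 3])   # intensity of the pixel in row 0, column 3
print(image.mean())  # average brightness of the frame
```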

Digital cameras borrow some concepts from mammalian eyes. The silicon sensor is somewhat analogous to the retina: in both, visual data is broken up into several smaller units in order to be processed. On the retina, millions of specialized biological photoreceptor cells called rods and cones absorb photons and convert the energy into neural signals that are sent to the brain to be processed into visual information.

In the human eye, rods and cones are arranged in a random fashion, densely packed in the center of the retina and less densely packed around the edges. In contrast, in a silicon sensor inside a digital camera, individual pixels are arranged in a rectangular pattern with regular spacing. A one-megapixel camera contains a silicon sensor that has an array of 1000 × 1000 individual photoreceptors that correspond to a total of 1,000,000 pixels.

Some specialized digital cameras used for autonomous driving do more than just record pixel values. Rather than outputting an array of raw numbers direct from the silicon sensor’s grid of pixels, advanced automotive cameras also analyze the image data in real time, inside the camera’s hardware. This way, image processing is faster and the camera can eliminate irrelevant information before sending it upstream to the mid-level control software.


Figure 9.2 What your eye sees (left) versus what the camera sees (right). Can you tell the difference between the human and the background just by looking at the numbers?

Source: Photo of Manhattan 14th Street, looking west from Fifth Avenue; Wikipedia.

More sophisticated automotive cameras take it one step further and begin the process of making sense of what’s contained in the images by making a list of the objects being detected and tabulating the result. For example, an automotive camera might describe a scene as follows: “1. There’s a pedestrian in the upper left corner, moving left at a speed of 1.23 meters per second. 2. Far right, fire hydrant. Static. 3. Left lane, truck approaching at a speed of 5 meters per second. 4. Southeast, unidentified object. Static.”
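A target list like the one just described is essentially a small structured record per object. The sketch below shows one hypothetical way to represent it in Python; the class name, fields, and values simply mirror the example scene above and are not any camera vendor's actual output format.

```python
from dataclasses import dataclass

@dataclass
class DetectedObject:
    """One entry in a hypothetical camera target list; fields are illustrative."""
    label: str        # "pedestrian", "truck", "fire_hydrant", "unknown"
    bearing: str      # rough position in the frame, e.g., "upper_left"
    speed_mps: float  # estimated speed in meters per second (0.0 if static)
    heading: str      # direction of motion, or "static"

scene = [
    DetectedObject("pedestrian", "upper_left", 1.23, "left"),
    DetectedObject("fire_hydrant", "far_right", 0.0, "static"),
    DetectedObject("truck", "left_lane", 5.0, "approaching"),
    DetectedObject("unknown", "southeast", 0.0, "static"),
]

# Downstream software might filter for anything that is moving.
moving = [obj for obj in scene if obj.speed_mps > 0]
print(moving)
```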

Biological life forms have two (or more) eyes placed side by side, an adaptation that enables depth perception, or what biologists call stereo vision. Digital cameras, however, do not have stereo vision, a limitation that has been one of the biggest problems in their application to autonomous driving. Digital cameras capture information about the intensity of light in a grid of pixels, an elegant way to compress a three-dimensional world into a two-dimensional format. What gets omitted during this capture process, unfortunately, is a piece of information critical to depth perception: how far away objects are from the camera.

Several different techniques are being explored to overcome this inherent limitation. One solution is to place multiple digital cameras on the same car. On a driverless car, many cameras are strategically placed to capture the same scene from slightly different viewing angles, a placement that enables the car’s on-board computer to reconstruct a 3-D model of the scene, enabling better understanding of the surrounding space.
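The basic geometry behind two-camera depth recovery is simple enough to show in a few lines: an object that appears shifted by a certain number of pixels between the two views must lie at a particular distance. The Python sketch below uses the standard depth-from-disparity relation; the focal length, camera spacing, and pixel shift are illustrative numbers, not parameters of any particular driverless-car system.

```python
def depth_from_disparity(focal_length_px, baseline_m, disparity_px):
    """Classic two-camera (stereo) depth estimate.

    focal_length_px: camera focal length expressed in pixels
    baseline_m:      distance between the two camera centers, in meters
    disparity_px:    horizontal shift of the same object between the two images, in pixels
    """
    if disparity_px <= 0:
        return float("inf")  # no measurable shift -> object is effectively at infinity
    return focal_length_px * baseline_m / disparity_px

# Illustrative numbers: 1000-pixel focal length, cameras 30 cm apart,
# and an object that shifts 12 pixels between the left and right images.
print(depth_from_disparity(1000, 0.30, 12))  # -> 25.0 meters
```

Notice that distant objects produce tiny disparities, which is why stereo depth estimates get less reliable the farther away an object is.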

Another potential solution is structured-light cameras, which use a camera-projector combo that augments image data with depth information. To emulate depth perception, structured-light cameras project a pattern onto a scene and measure its distortion. By the degree of distortion, a structured-light camera can calculate depth. While structured-light cameras like the Xbox Kinect are great for indoor applications such as interactive video games, it’s not clear yet whether they’ll find a home in driverless cars.

One of the biggest weaknesses of structured-light cameras is that while they offer rapid depth perception, they don’t work so well during daylight hours, since the reflected light pattern can get muddled by natural light. Nor do they work well beyond about ten meters, a potentially fatal shortcoming when placed on speeding vehicles. Because of these limitations, the best application for structured-light cameras is in indoor settings, perhaps guiding driverless cars through parking garages, or inside the car’s cabin, sensing the physical whereabouts and movements of the car’s passengers.

Digital cameras continue to improve by leaps and bounds, yet ironically they have a low-tech Achilles heel: dirt. Even the best automotive digital camera will be rendered blind by a splash of muddy water. Roadside dust, sand, bird droppings, bugs, and other indignities of outdoor driving can render useless the most sophisticated digital cameras and machine-vision software. Perhaps the solution will be similarly low-tech: equipping on-board cameras with their own cleaning mechanism, similar to the windshield wipers that human drivers rely on, or the tears and eyelids that protect human eyes.

Many of the solutions to the limitations of digital cameras will eventually lie in the car’s operating system. To ensure its on-board cameras are clean and dry, each driverless car should be equipped with a software tool that executes a periodic self-test on the quality of the data from the digital cameras. As the artificial-intelligence software that guides the car’s mid-level control software continues to improve, someday it will have the ability to autocorrect faulty visual data, helping a car see in conditions that would blind a human driver, such as fog, heavy rain, and blinding sun.

Light detection and ranging (lidar)

Another primary image sensor is the lidar, an acronym of “light detection and ranging,” also called laser radar. A digital camera works by breaking down the three-dimensional visual world into a two-dimensional matrix of pixels. In contrast, a lidar device “spray paints” its surroundings with intense beams of pulsed light, measures how long it takes for each of those beams to bounce back, and then calculates a three-dimensional digital model of its nearby physical environment.

Like digital cameras, lidar sensors have also followed the Moore’s Law trajectory, morphing from gigantic and expensive stationary devices in the 1960s to today’s robust and portable devices. Unlike digital cameras, however, lidar sensors are still more expensive than the average person can afford. While their cost is dropping each year, in 2016 a sixteen-channel lidar sensor made by a company called Velodyne, weighing 600 grams and accurate to within a few centimeters, cost $8,000.

Lidar sensors have been used for decades by surveyors to capture the topographic details of parcels of land. The notion of mounting a device onto a moving vehicle to shoot laser beams into the environment is a more recent innovation. During the early years of robotic autonomous vehicles before the advent of modern digital cameras, lidar was the gold-standard sensor for visual data in autonomous vehicles. In all three DARPA Challenges, lidar played a critical role in capturing the visual scene in front of the car. The iconic cones on top of Google’s first-generation fleet of self-driving Priuses were lidar sensors.

Lidar sensors have been a crucial tool in driverless cars since the 3-D digital model they generate is highly detailed and contains accurate depth information. They work by sending one or more laser beams into the surrounding environment and recording the time it takes for the laser signal to reflect off the objects it strikes. Since light travels at a speed of about one foot per billionth of a second, a lidar sensor with gigahertz-speed microprocessors can measure depth at single-centimeter resolution.
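The underlying arithmetic is the time-of-flight calculation sketched below: the pulse travels out and back, so the range is half of the speed of light times the round-trip time. The 100-nanosecond example is illustrative, but it shows why nanosecond timing precision translates into centimeter-level range precision.

```python
SPEED_OF_LIGHT = 299_792_458.0  # meters per second

def lidar_range(round_trip_seconds):
    """Distance to the reflecting surface from the round-trip time of a laser pulse.

    The pulse travels out and back, so the one-way distance is half of c * t.
    """
    return SPEED_OF_LIGHT * round_trip_seconds / 2.0

# A pulse that returns after 100 nanoseconds came from a surface about 15 meters away.
print(lidar_range(100e-9))  # ~14.99 meters

# A 1-nanosecond timing error corresponds to roughly 15 cm of range error,
# which is why gigahertz-speed timing electronics matter.
print(lidar_range(1e-9))    # ~0.15 meters
```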

A laser beam is an ideal measurement tool. Unlike a candle or an incandescent lamp, which radiates light in all directions, a laser beam shines in a straight line for a great distance. The laser beam does not spread out like light from a flashlight; it remains collimated—that is, parallel—no matter whether it’s striking an object just a few feet or dozens of yards away.

To create a full 3-D digital image of the surrounding environment, a lidar sensor spins its laser beams around and around at high speeds. A set of rotating mirrors deflects the laser beam in a rotational scanning motion. A lidar sensor, like a digital camera, can vary in resolution. Multiple beams can work together continuously to scan and measure the surrounding environment in parallel. The more laser beams there are in play, the higher the resolution of the resulting digital model of the scene.

Imagine a room full of ornately shaped but invisible objects. Then imagine taking a can of red spray paint and coating all the invisible objects with paint until they are fully visible. If you were armed only with a single can of paint, it would take you a long time to “see” the shape of the invisible objects. However, if several people had cans of red spray paint, the invisible objects would quickly be coated with paint, and therefore visible. Lidar sensors work much the same way.

In a driverless car, the data generated by a lidar is fed to software that arranges the information in a digital model called a point cloud. If the laser beams had been pointed straight up into the distant sky, the digital model would be blank, void of solid objects that would reflect the beam’s light. If the laser beams were directed onto a city street during rush hour, however, the resulting point cloud would be full of interesting details.

Watching a digital point cloud emerge is somewhat akin to watching a hologram emerge in thin slices. The lidar’s laser beams are aimed outward in a specific pattern. The spinning mirrors train the lasers into a series of rapid, horizontal passes over the road in front of the car. The digital point cloud that’s built from lidar data is made up of lots of finely textured scan lines, each row in the digital model corresponding to one scan line of the spinning mirrors.
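Each point in that cloud comes from one laser return: a measured range plus the two angles the beam was pointing at that instant. The Python sketch below converts such a return into an (x, y, z) coordinate and builds one illustrative scan line; the specific ranges and angles are made-up values, and real lidars stack many such lines from multiple beams per rotation.

```python
import math

def lidar_point(range_m, azimuth_deg, elevation_deg):
    """Convert one lidar return (range + beam angles) into an (x, y, z) point.

    azimuth_deg:   horizontal angle of the spinning beam, in degrees
    elevation_deg: vertical angle of the beam (one per laser channel), in degrees
    """
    az = math.radians(azimuth_deg)
    el = math.radians(elevation_deg)
    x = range_m * math.cos(el) * math.cos(az)
    y = range_m * math.cos(el) * math.sin(az)
    z = range_m * math.sin(el)
    return (x, y, z)

# One horizontal sweep of a single beam: a full rotation sampled every 2 degrees,
# pretending every return came back from 15 meters away.
scan_line = [lidar_point(15.0, az, -5.0) for az in range(0, 360, 2)]
print(len(scan_line), scan_line[0])
```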


Figure 9.3 View of 3-D point cloud data captured from a lidar mounted on a car driving through a bustling intersection.

Source: Alex Kushleyev and Dan Lee, University of Pennsylvania

Figure 9.3 shows a point cloud generated by lidar. If you look closely, you can see the scan lines from the horizontal sweeps of the spinning mirrors. A casual observer might conclude that a lidar point cloud is pretty much the same thing as a digital image. In reality, a lidar point cloud and a digital photograph differ from one another in many ways.

One critical difference is that lidar sensors do not capture color information. The image shown above has a ghostly, romantic feel as if it were generated during a moonlight drive right after a snowstorm. In reality, the software that interpreted the point cloud adds the color, artificially, coding closer objects in blue tones and more distant objects in red ones. The black sky indicates the absence of any physical object, or that the laser beam was not reflected back.

A second difference between a lidar point cloud and a digital photograph is the point in time that’s depicted. A spinning lidar sensor continually refreshes the digital model it generates. On the one hand, this is advantageous since the point cloud is constantly updated; on the other hand, however, the entire process is nowhere near the instantaneous “snap” of a digital camera. Lidar sensors are slow, and while highly effective for depicting the contours of a static landscape or a slow-moving traffic jam, they cannot feed visual data to a computer fast enough to provide the split-second reflexes needed in some emergency driving situations.

Today’s driverless cars use both digital cameras and lidar. In the AI-poor world of decades past, lidar was the more essential visual sensor. Today, lidar sensors are expensive and slow compared to digital cameras, but the point clouds they generate can guide a moving car through the majority of routine driving environments.

In the past few years, digital cameras have finally come into their own as a tool for mobile robotics. The long-standing bottleneck that delayed the use of digital cameras as a machine-vision sensor was their poor 3-D perception. Unpacking arrays of pixels in order to process them requires vast amounts of computing power, which results in very poor real-time performance, a serious shortcoming in a mobile robot. As microprocessor speeds continue to improve the performance of both the digital camera and the software that processes the digital images, it’s quite likely that digital cameras could replace lidar as the queen of the visual sensors.

Some experts agree. In a press conference in October 2015, Tesla CEO Elon Musk commented on technologies for Tesla’s future driverless cars. “I don’t think you need lidar. I think you can do this all with passive optical and then with maybe one forward RADAR,” he said. “I think that completely solves it without the use of lidar. I’m not a big fan of lidar, I don’t think it makes sense in this context.”1

Radio detection and ranging (radar)

In addition to cameras and lidar, driverless cars use radar sensors to “look” at the nearby environment. If digital cameras capture a scene in a pixelated grid and lidar sensors are the equivalent of a can of digital spray paint, a radar sensor is similar to the surface of a pond. The way a radar sensor works is reminiscent of the process of throwing stones into a body of water and keeping track of where the ensuing ripples go as they ricochet back and forth.

Radar has its roots in military applications. During World War II, radar towers were placed on beaches and fields to detect the approach of enemy aircraft, ships, and incoming missiles. After the war, air-traffic controllers used radar to track and confirm flight trajectories of commercial airliners. Many people have felt the direct effects of radar technology in the form of a speeding ticket from the highway patrol.

In another demonstration of Moore’s Law, radar sensors have become small and robust enough to be mounted on a moving car. Radar sensors are used in modern human-driven cars in adaptive cruise control technology. A built-in radar device senses the speed and location of cars in front of and behind a car, so the cruise control can adjust the brake and gas pedal accordingly. Another common driver-assist application for radar sensors is to warn a driver if another car is in his blind spot.


Figure 9.4 Raw target density plot of a forward-looking radar (left) and the corresponding front view from the car (right). Large static objects are captured (parked cars, building barriers, street lamps). Grid-based operation at 24 GHz.

Source: SmartMicro 3DHD

A radar sensor detects the presence of physical objects in the nearby environment using electromagnetic wave echo. A radar device sends out a series of electromagnetic waves that radiate outward. A radar sensor consists of a transmitter, the unit that sends out the electromagnetic waves, and a receiver, the device that awaits their return.

If the waves do not encounter an object in their path, the waves continue their circular expansion outward until they are lost in the distance. If they do encounter an object in their path, the wave ricochets off of it and changes direction. Since electromagnetic waves, also known as radio waves, travel at the speed of light, this entire process works very rapidly.

Radar sensors are increasingly sensitive and intelligent. Since the returning waves are significantly weaker than when they departed, the receiver uses amplification techniques to detect the whisper-quiet echo. To prevent the sensor from accidentally picking up the waves emitted by another nearby radar transmitter, the electromagnetic wave is sent out accompanied by a unique signature “chirp.”

Waves convey a surprising amount of information. The shape and timing of the reflected waves provide insight into the reflecting object’s shape and what material it is made of. Some radar sensors can calculate which direction a reflecting object is moving by analyzing changes in the frequency of the reflected wave.

The wavelength of an electromagnetic wave is the distance from the crest of one wave to the crest of the one behind it. Different radar sensors employ different wavelengths. Waves spaced further apart—long wavelengths—travel farther. Yet since they are more likely to overlook small objects in the environment, they tend to offer a less precise reading of what’s in the nearby environment. Short-range radar sensors that send microwaves into the distance can detect objects as small as a cat or as thin as a bicycle.

Electromagnetic waves reflect best off surfaces that have high electric conductivity—think smooth, glossy surfaces such as those of a shiny metal bicycle or a wet road surface. Nonconductive objects, things made of porous plastic or wood, appear relatively “transparent” to the radar and are more difficult to detect. Fortunately, most cars, even those made up of a lot of plastic parts, have plenty of metal in them, and hence are easily detectable to a radar sensor.

A radar sensor can “look” only in a particular, narrow direction, so most radar sensors are mounted in arrays that overlap slightly. On a driverless car, a typical configuration mounts three radars side by side to give a combined 180-degree field of view. For autonomous driving, the great advantage of radar sensors is that, unlike cameras, they can “see” through fog, rain, dust, sand, and even blinding headlights.

Another advantage is that electromagnetic waves travel easily through nonconductive and thin materials, so they’re not going to be distracted by a plastic bag blowing across the highway or a tumbleweed. Electromagnetic waves respond best to larger objects, so they “notice” the bulky obstacles that drivers care about. Conversely, the biggest drawback of a radar sensor is its relatively low resolution.

Another advantage of a radar sensor is that it can detect not just the position of an object, but (much to the chagrin of speeding drivers) also its speed, using the Doppler effect, named after the nineteenth-century Austrian physicist Christian Doppler. One of the most commonly observed manifestations of the Doppler effect is when a person standing near the side of a highway hears the “ZZZZOOOOooom” of a fast-approaching car that suddenly drops in pitch as the car speeds past. The sudden drop in pitch happens because the sound waves emitted by a speeding engine get compressed (sounding higher pitched) as the car speeds closer. After the car roars past, the sound waves expand, resulting in a lower pitch.

Radar sensors use the Doppler effect to track the speed of moving objects. By recording the change in frequency between outgoing and incoming electromagnetic waves, a radar sensor can determine if the sensed object is approaching or moving away. The sensor can also calculate the object’s speed. This velocity information is helpful for classifying exactly what a sensed object is. Something moving along the roadside at thirty miles an hour is probably not a human pedestrian.
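The velocity calculation itself is a one-line formula: for a wave that bounces off a moving target, the frequency shift is proportional to the target's speed along the line of sight. The sketch below shows that relation in Python; the 24 GHz transmit frequency and 1.6 kHz shift are purely illustrative numbers, not readings from any particular sensor.

```python
SPEED_OF_LIGHT = 299_792_458.0  # meters per second

def radial_speed(transmit_freq_hz, doppler_shift_hz):
    """Speed of a target toward (+) or away from (-) the radar, from the Doppler shift.

    For a wave reflected off a moving target, the shift is approximately
    doppler_shift = 2 * v * f / c, so v = doppler_shift * c / (2 * f).
    """
    return doppler_shift_hz * SPEED_OF_LIGHT / (2.0 * transmit_freq_hz)

# Illustrative numbers for a 24 GHz automotive radar:
# a 1.6 kHz upward shift corresponds to a target closing at about 10 m/s (~36 km/h).
print(radial_speed(24e9, 1600))  # ~10 m/s
```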

On a driverless car, radar sensors complement the visual sensors in figuring out the surrounding environment. By sensing the size, density, speed, and direction of nearby objects, the radar sensor passes along useful information that can be compared against the images from the digital camera and the 3-D point cloud from the lidar sensor. Is that small object sitting by the side of the road a cat or an empty cardboard box? Watch out! A big metal box-like object is tailgating us!

Today’s radar sensors are much smarter than their ancestors. The radar of WWII displayed the raw analog echo signal on a greenish screen, with a rotating line representing the current scanning direction. Similar to the process used by advanced cameras, a modern radar sensor processes raw information into a target list consisting of objects, along with their size, location, and speed. That information is much more compact, requires less bandwidth to communicate, and is easier for the driverless “brain” to digest.

In order not to over-report, radars try to omit the presence of the road pavement itself from the list of sensed objects, even though the tarmac reflects waves. For this reason, some radar sensors automatically eliminate any nonmoving object from the report. While eliminating static objects from the target list is more efficient, it has its risks. The radar could overlook something large and deadly, such as a car stalled under a bridge, by assuming that the static car is merely part of the bridge’s infrastructure.

Ultrasonic sensors (sonar)

If lidar and cameras are the equivalent of human eyes, sonar is like a human ear. Sonar is the close-range cousin of radar. Like radar, a sonar device emits waves and detects their echoes, but it uses sound waves instead of radar’s electromagnetic waves. The term sonar stands for “sound navigation and ranging.”

A sonar sensor detects the position and speed of objects based on the time, frequency, and shape of sound waves reflecting off their surfaces. A sonar device is composed of two subunits: an emitter and a receiving sensor. The emitter generates sound waves with a frequency above 20 kHz, beyond the range of human hearing. The receiver listens for the echoes from the emitted sound waves and processes them.

Sonar sensors share many of the benefits and drawbacks of lidar and radar. Like radar sensors, they can see through fog and dust, and they are not blinded by the sun. Since sound waves travel much more slowly than electromagnetic waves, sonar can resolve much smaller objects at much higher resolution. But because their energy decays rapidly with distance and wind, sonar sensors can detect objects only at a much closer range. Sonar therefore often complements radar in applications that require close-range precision detection, such as parking.
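The ranging math is the same echo-timing idea as lidar, just with the speed of sound in place of the speed of light. The short sketch below illustrates this; the 6-millisecond echo delay is a made-up example typical of a parking maneuver, not a reading from any particular sensor.

```python
SPEED_OF_SOUND = 343.0  # meters per second in air at about 20 C

def sonar_range(echo_delay_seconds):
    """Distance to an obstacle from the round-trip time of an ultrasonic pulse."""
    return SPEED_OF_SOUND * echo_delay_seconds / 2.0

# A parking sensor that hears its echo after 6 milliseconds is about 1 meter from the obstacle.
print(sonar_range(0.006))  # ~1.03 meters
```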

Global positioning systems (GPS)

So far, we’ve covered digital maps and sensors. One of the more critical technologies of mobile robotics is neither map nor sensor, although it plays an essential role in guiding a driverless car. A global positioning system (GPS) device supplies coordinates to pin down a car’s exact location on its HD digital map.

GPS is another decades-old technology with roots in the military that, following Moore’s Law, has blossomed into a reliable, low-cost consumer appliance. Just a few decades ago, the typical GPS receiver was the size of a refrigerator; today they are small chips that can be embedded into cell phones, cameras, laptops, and cars.

GPS devices are miracles of advanced engineering that listen to signals from satellites orbiting overhead. A GPS receiver in your car or cell phone determines your latitude and longitude by listening to beeps that arrive from a family of satellites circling high above the earth. Each satellite follows an exactly prescribed orbit, all the while emitting a steady stream of radio pulses precisely once a second.

Twenty-four satellites provide GPS signals, but a GPS receiver needs only four to calculate its own location on Earth. Each satellite emits its own unique signature beep, which enables the GPS receiver to attribute a particular beep to its satellite of origin. As beeps stream into the GPS receiver, the receiver listens carefully. By calculating the time lapse between beeps, a GPS receiver is able to calculate its own exact location using a mathematical process known as trilateration (often loosely called triangulation). If beeps from two satellites arrive at exactly the same time, then the GPS receiver knows that it must be somewhere on the plane that lies halfway between them. A total of four satellites is needed to pinpoint exactly where the receiver is; additional satellite signals refine the position even further.
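A flattened, two-dimensional toy version of this position fix is small enough to show in code. The sketch below assumes the distances to three known anchor points are already measured, which lets the squared terms cancel and leaves a simple linear system; a real GPS receiver works in three dimensions, derives distances from signal travel times, and must also solve for its own clock error, which is exactly why a fourth satellite is required.

```python
import numpy as np

def trilaterate_2d(anchors, distances):
    """Solve for a 2-D position from known anchor positions and measured distances.

    Subtracting pairs of circle equations removes the squared unknowns,
    leaving two linear equations in x and y.
    """
    (x1, y1), (x2, y2), (x3, y3) = anchors
    d1, d2, d3 = distances
    A = np.array([
        [2 * (x2 - x1), 2 * (y2 - y1)],
        [2 * (x3 - x1), 2 * (y3 - y1)],
    ])
    b = np.array([
        d1**2 - d2**2 + x2**2 - x1**2 + y2**2 - y1**2,
        d1**2 - d3**2 + x3**2 - x1**2 + y3**2 - y1**2,
    ])
    return np.linalg.solve(A, b)

# Three "satellites" at known positions and the distances to an unknown receiver at (30, 40).
anchors = [(0.0, 0.0), (100.0, 0.0), (0.0, 100.0)]
true_pos = np.array([30.0, 40.0])
distances = [np.linalg.norm(true_pos - np.array(a)) for a in anchors]
print(trilaterate_2d(anchors, distances))  # ~[30. 40.]
```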

In a normal driving environment, a typical GPS receiver is accurate to a distance of about four meters, or roughly thirteen feet.2 If the location information GPS receivers provide were perfect and up-to-date, the job of building a driverless car would be much easier. Unfortunately, satellite signals can be blocked or delayed as a result of atmospheric turbulence, clouds, or rain, which can result in an inaccurate calculation.

Another serious problem in urban driving is that of reflected pulses. If you have ever used a GPS receiver in Manhattan, you’ve experienced what happens when satellite pulses bounce off tall skyscrapers. The confused GPS receiver begins to go berserk, reporting a new position every few minutes in a seemingly random fashion. What’s happening is that pulses arriving from satellites ricochet off tall skyscrapers, giving the GPS receiver the illusion that the pulses are arriving at a slightly different time. This urban canyon effect can mislead even the best GPS receivers.

The inner ear (IMU)

GPS failure can be catastrophic. The solution to the potentially deadly problem of losing satellite reception is another descendant of military technology, a device that serves two critical functions: it compensates for GPS inaccuracies, and it serves as a driverless car’s “inner ear,” sensing, literally, which way is up.

An inertial measurement unit (IMU) is a multipurpose device that serves several functions. An IMU contains acceleration and orientation sensors that keep track of the car’s position, which way its nose is pointing, whether the tires on the left side are level with those on the right, and so on. A modern IMU is a complex bundle of devices including an odometer, accelerometer, gyroscope, and compass, whose combined data is fused together and parsed with sophisticated estimation algorithms.

An IMU is a unique sensor in that its purview is confined to the car’s own body. Humans have a set of roughly equivalent senses called proprioceptive senses. Proprioceptive senses, unlike our outer-facing ones such as sight and hearing, keep track of what’s going on inside our bodies. The sense of balance is a proprioceptive sense. If you close your eyes in a train leaving the station, the sense of acceleration you feel, knowing you’re moving forward without visual confirmation, is another proprioceptive sense.

To keep track of a car’s exact location between GPS readings and to compensate for GPS inaccuracies, an IMU uses an ancient navigational technique known as dead reckoning. For centuries, mariners navigated the open seas by referencing the location of the stars. Problems arose, however, during stretches of stormy weather, when the stars were hidden behind a layer of clouds. Dead reckoning enabled sailors to calculate their ship’s location by measuring how far their ship had traveled since the last time they saw the stars. By measuring relative, rather than absolute, geographical location, sailors could keep their ship mostly on course until the skies cleared and the stars were once again visible to guide them.

Dead reckoning worked as follows. During a stretch of cloudy weather, sailors would drop a rope with regularly spaced knots over the back of the ship as it sailed forward. They would count how quickly the knots on the rope flew overboard, which enabled the sailors to calculate their ship’s speed. Even today the rate at which a ship is moving forward is measured in knots. Once sailors knew how fast their ship was moving and in which direction (using a compass), even without seeing the stars, they could calculate how far their ship had traveled from their last known location, a navigation point known as a fix.

On a driverless car, the IMU uses a similar approach when a car goes into a tunnel or travels through an urban canyon that blocks satellite signals. Rather than counting the rate at which knots on a rope slide overboard, the IMU uses its odometer to count the number of wheel revolutions from its last known location. Although wheel revolutions are a fairly precise mechanical action to count, the tally still accumulates uncertainty. The tires might slip as tire pressure changes or if the car changes lanes several times. On a curving segment of highway, the odometer reading might wind up with a different wheel revolution count depending on whether the car had driven on the inner or outer lane, a difference that, depending on distance traveled, can add up to tens of meters.
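A bare-bones version of that wheel-odometer dead reckoning can be written in a few lines. The sketch below assumes a constant compass heading since the last fix and a known tire circumference; both numbers are illustrative, and a real IMU integrates many small steps and fuses several sensors precisely to limit the drift described here.

```python
import math

def dead_reckon(fix_x, fix_y, wheel_revolutions, wheel_circumference_m, heading_deg):
    """Estimate position from a last known fix, a wheel-revolution count, and a compass heading.

    Assumes the heading stayed constant since the fix; real systems integrate
    many small steps and fuse several sensors to limit drift.
    """
    distance = wheel_revolutions * wheel_circumference_m
    heading = math.radians(heading_deg)  # 0 degrees = north, 90 = east
    x = fix_x + distance * math.sin(heading)
    y = fix_y + distance * math.cos(heading)
    return x, y

# 500 wheel revolutions on a tire with a ~2-meter circumference, heading due east from the fix:
print(dead_reckon(0.0, 0.0, 500, 2.0, 90.0))  # roughly (1000, 0) -> one kilometer east
```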

Since a simple odometer read from the fix isn’t perfectly accurate, an IMU brings in its accelerometer sensor to help address the problem. When a car is traveling at a constant speed, its acceleration—perhaps counterintuitively—is recorded as zero. Its acceleration varies only when the car increases its speed, slows down, or suddenly changes direction.

To calculate how far the car has driven since the GPS signal failed, the IMU combines data from the accelerometer and the odometer. An accelerometer, however, does not provide insight into which direction the car is driving. That’s where the compass comes in. An IMU paired with a GPS and a compass is a powerful combination. But the IMU is more than just a pinch hitter when the GPS fails. The IMU also provides a driverless car with a sense of balance.
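To see how accelerometer data turns into distance, consider the toy, one-dimensional sketch below: acceleration is integrated once to get speed and again to get distance. The sample values are invented, and the double integration is exactly why small accelerometer errors compound into the drift discussed later in this section.

```python
def integrate_acceleration(accel_samples, dt, v0=0.0):
    """Integrate forward acceleration (m/s^2) sampled every dt seconds into speed and distance.

    A toy 1-D version of inertial navigation: each small sensor error is
    integrated twice, which is why the estimate drifts over time.
    """
    velocity = v0
    distance = 0.0
    for a in accel_samples:
        velocity += a * dt         # first integration: acceleration -> speed
        distance += velocity * dt  # second integration: speed -> distance
    return velocity, distance

# Accelerate at 2 m/s^2 for 5 seconds (50 samples at 0.1 s), then cruise for 5 seconds.
samples = [2.0] * 50 + [0.0] * 50
print(integrate_acceleration(samples, dt=0.1))  # ~ (10 m/s, ~75 m traveled)
```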

Roboticists call a robot’s orientation in space its pose. A car’s pose is a measurement of which direction its nose is pointed and to what degree its body is tilted. To measure pose, we need to add another sensor to the bundle that makes up the IMU: a gyroscope. A gyroscope is a spinning wheel—either mechanical or, in some cases, optical—that is used to measure pose.

The IMU needs three pieces of information to measure and track the car’s physical orientation in space: which direction it’s facing, at what angle its nose is tilted up or down, and at what angle it’s tilted to the side. Once again, the ancient art of nautical navigation has left its imprint on the modern IMU. Ancient shipbuilders and modern aerospace engineers alike call these three different dimensions of a vehicle’s (or ship’s) orientation yaw (the degree to which a vehicle is turned left and right), pitch (how high or low the ship or vehicle’s nose points), and roll (how much it is tilting side to side).

That a car understands its own pose is an important safety feature, even if the GPS device is working just fine. Imagine an icy road where a car starts to skid. Its yaw measurement rapidly spins from zero to 360 degrees. As it skids downhill, its pitch shifts forward. If the downhill skid continues and two of its wheels leave the road, the car will begin to roll. Many modern cars have a built-in IMU that measures the car’s roll, pitch, and yaw in real time and feeds this data into software that can help the car right itself, unlock its brakes during a skid, or send out a distress call if it is tipping precariously. Since the IMU tracks the car’s motions, it can respond by tightening seatbelts or dynamically stabilizing the shock absorbers on the car’s wheels.

The modern IMU is another example of the effect of Moore’s Law. IMUs developed during World War II, before the advent of silicon chips, were elaborate mechanical devices that were built to calculate optimal launch trajectories for rockets. In the 1980s, IMU technology was transformed by software and the invention of tiny sensors called microelectromechanical systems (MEMS) devices. The invention of MEMS technology changed IMUs from expensive and highly specialized devices used in space travel and military operations into relatively tiny, affordable navigational units.

A high-end IMU that’s precise enough for commercial ships and submarines costs about a million dollars, but Moore’s Law is relentlessly driving down the cost. Simple IMUs are inside every cell phone. Most smartphones can tell which direction the phone is pointing using a built-in compass. If you’re stuck on a long flight and watching your fellow passengers play with their iPads, when they shake or jiggle the tablet to control a bouncy video game avatar, they’re using IMU technology.

Regardless of the price, the technology’s biggest weakness is that an IMU can’t operate for long without GPS before it gradually drifts off course. The calculations from the various sensors contain tiny inaccuracies that become a problem as they accumulate over time. Without the guidance of accurate satellite data, like an ancient ship navigating through weeks of cloud-covered night skies, even high-end IMUs gradually wander off course.

Drive by wire

The driverless car’s sensors—its digital cameras, lidar, radar, sonar, and IMU devices—supply a steady stream of real-time data. The magic happens when these data streams are merged so the car’s operating system can process the data. As we covered in previous chapters, the car’s operating system uses several types of artificial-intelligence techniques to make rapid-fire decisions. The final step is converting these decisions into actual physical motions such as turning the steering mechanism or pressing the brake or gas pedal.

In the old days, engineers transformed a regular car into a driverless car by retrofitting it with special custom-built mechanical “drive by wire” contraptions that substituted for human hands and feet. These contraptions, called actuators, had to actually swivel the steering wheel or compress the brake pedal. Building accurate and reliable mechanical actuators was an engineering specialty of its own, nearly as complex a process as that of creating workable artificial intelligence to guide the car.

As car subsystems have become progressively more automated over the past two decades, the work of creating artificial driving “muscles” has gotten considerably easier. Computer-guided controls have replaced hydraulic and mechanical controls. Most modern cars have several computer-guided subsystems (that is, low-level controls) that contain embedded microprocessors that run several millions of lines of code. These days, a roboticist rigging up a driverless car, rather than creating a special mechanical “foot” to press a gas pedal, simply tinkers with the car’s electrical system.

Software is the ghost in the machine. A driverless car translates instructions from its operating system to its high-, low-, and mid-level controls using an electronic communication system. Today, the average human-driven car contains several subsystems—for example, the engine control unit (ECU), the antilock braking system (ABS), and the transmission control unit (TCU). These subsystems communicate with each other using a bus.

In computer lingo, a bus is a communication channel that transfers data from one component inside a computer to another. Fittingly, a city bus and a computer data bus have a similar etymological origin, both deriving from the Latin word omnibus, meaning “for all.” Like a city bus that delivers people from place to place, in a driverless car a data bus transfers data between different subsystems, similar to the universal serial bus (USB) that yokes together your computer’s mouse, keyboard, and printer.

Many vehicles today use the CAN bus protocol (CAN stands for “controller area network”), which ferries data back and forth at a rate of approximately 1 megabit per second (Mbps). The CAN bus is a broadcast protocol governed by the international standards ISO 11898 and ISO 11519. The fact that it’s held to a publicly open standard means that any device that can plug into the CAN bus and “understand” the protocol can chime in to the conversation between the car’s modules.
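To make the protocol concrete, here is a simplified sketch of a classic CAN data frame in plain Python: an 11-bit message identifier plus at most eight data bytes. The chosen ID, the "brake" interpretation, and the percentage encoding are all hypothetical examples; real frames also carry start and end bits, a CRC, acknowledgment slots, and bit stuffing, and every manufacturer defines its own message meanings.

```python
def make_can_frame(arbitration_id, data):
    """Build a simplified representation of a classic CAN data frame.

    Real frames also include framing bits, a CRC, and acknowledgment slots;
    this sketch keeps only the fields most relevant to the discussion.
    """
    assert 0 <= arbitration_id < 2**11, "standard CAN IDs are 11 bits"
    assert len(data) <= 8, "a classic CAN frame carries at most 8 data bytes"
    return {
        "id": arbitration_id,  # lower IDs win arbitration, i.e., have higher priority
        "dlc": len(data),      # data length code: number of data bytes
        "data": bytes(data),
    }

# A hypothetical "brake" command: message ID 0x120, one byte of braking intensity (0-100%).
frame = make_can_frame(0x120, [0x32])  # 0x32 = 50 percent
print(frame)

# Every node on the bus sees every frame; each one decides, by ID, whether to act on it.
```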

Car companies do not normally advertise their specific control protocols publicly. However, they often share that information with other manufacturers that require access to a vehicle’s controls in order to install a new built-in device. Or they may share a detailed description of their network protocols with an automotive manufacturer that sells a vehicle chassis to another carmaker that builds RVs, for example.

In an automotive CAN bus, as in any network, key challenges are the network’s bandwidth and its reliability. Bandwidth is the maximum rate at which data can be transferred through the wires that make up the network, a metric that’s usually measured in bits per second (bps). In a wireless network, physical wires are replaced by radio waves, or frequency channels. A network’s bandwidth is determined by the speed of the microprocessors that encode and decode the electric pulses sent along the wires or channels, and how many parallel channels are available in the bus.

Bandwidth is important in any network, but the speed of a network inside a moving car is even more critical. Most networks save time by using a set of agreed-upon codes to represent specific actions. For example, imagine that a driverless car’s software just issued a command, “BRAKE!” The car’s software would have ready a specific two-digit number that means “BRAKE,” which it would send through the network to the braking subsystem. Because a two-digit number is a small and efficient unit of meaning, it would take the system only about sixteen microseconds to transmit and receive the message.

Sixteen microseconds is an acceptable response time for a driverless car. It is thousands of times faster than the blink of an eye, which takes about 100–400 milliseconds.3 Although a CAN bus can rapidly stream tiny units of data, bandwidth challenges arise in a driverless car when the car’s CAN bus is asked to handle the data streams that pour from its various sensor systems.

The system slows down when streaming large globs of real-time data from the car’s sensors. On a network that passes along data at a rate of one million bits per second, it will take an endless eight seconds to send a one-megabyte image from a camera to the car’s mid-level control software module. If you imagine adding to the network load the data flowing in from the car’s other sensors, it quickly becomes apparent that a million bits per second isn’t fast enough for driving. Under the load of real-time visual data, the car’s CAN bus would wind up limping along at a response speed unacceptable for real-world driving situations.
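The arithmetic behind those numbers is straightforward, as the short sketch below shows: a 16-bit command crosses a 1 Mbps bus in about sixteen microseconds, while a one-megabyte camera frame (eight million bits) takes eight full seconds. The 1 Gbps comparison at the end is simply an illustration of what a higher-bandwidth link would buy; it does not refer to any specific automotive standard, and framing overhead is ignored throughout.

```python
def transmit_time_seconds(payload_bits, bandwidth_bps=1_000_000):
    """How long a payload takes to cross a bus at a given bit rate (framing overhead ignored)."""
    return payload_bits / bandwidth_bps

# A 16-bit command code at CAN's ~1 Mbps:
print(transmit_time_seconds(16))          # 1.6e-05 -> about sixteen microseconds

# A one-megabyte camera frame (8 million bits) over the same 1 Mbps link:
print(transmit_time_seconds(8_000_000))   # 8.0 -> eight seconds, far too slow

# The same frame over an illustrative 1 Gbps link:
print(transmit_time_seconds(8_000_000, 1_000_000_000))  # 0.008 -> eight milliseconds
```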

At some point in the future, automakers will need to agree on a robust and transparent communications standard for driverless cars that can handle large amounts of streaming sensor data and is resistant to data breaches. In other words, driverless cars need a high-bandwidth bus. On any network there are basically two ways to resolve communication bottlenecks: one, by increasing the number of electrical wires or channels available and streaming data through them in parallel; and two, by using a compression algorithm to compact large chunks of data into smaller and therefore more efficient units. Compression can take place at the sensor level. For example, some automotive cameras contain built-in software that analyzes the images in real time and sends along only the information the camera considers relevant.

In addition to bandwidth, reliability is another critical characteristic of an on-board automotive network. Network reliability takes several forms. One is the ability to withstand hackers. A CAN bus can become a battlefield if a malevolent third party creates a device that interferes with the network’s ability to transmit data, somewhat like a belligerent and unruly guest who interferes with the polite conversation at the dinner table. A truly malevolent device could do more than just interfere; it could hijack control of the car once it gained access to its on-board network.

Another facet of reliability is the network’s tolerance of errors and its ability to correct ones caused by network noise. The CAN bus on a driverless car needs to use error-correction protocols that are as resilient and efficient as those used by aircraft avionics. Driving should be as smooth a process as downloading a music file, and more secure than the process of making a financial transaction.

Imagine the carnage that would result if a driverless car’s software sent a message along the bus saying “increase throttle by 1 percent” that was misinterpreted by the fuel injector subsystem as “increase throttle by 100 percent.” To prevent such errors of miscommunication, an error-correction protocol provides oversight similar to that offered by a calm yet stern proofreader, double-checking the content of every message sent. Subsystems on a driverless CAN bus need to trust one another. A good communication protocol also enables subsystems to verify that the message they received was actually the message sent.

Given how vulnerable we humans are to malevolent attacks when we’re speeding along inside a vehicle—whether autonomous or human-driven—one would assume that carmakers would carefully encrypt the communication protocols on the CAN bus. Unfortunately, a security mindset has not yet taken root in the automotive industry.4 The modern car is not that difficult to hack into, perhaps because, in the past, some car owners delighted in tinkering with their car engine.

Some of a car’s vulnerability is intentional. Most cars have a physical connector called the on-board diagnostics (OBD) jack that mechanics plug into to diagnose mechanical problems. The OBD jack can be used by hobbyists to plug their own devices into the CAN bus. The physical port for the jack is typically hidden somewhere near the steering column. Carmakers consider the jack to be secure because it is inside the car’s cabin, and is therefore accessible only to people who have a key to the car.

Several commercial products take advantage of the fact that the OBD jack offers an easy window into a car’s operational back panel. One ingenious product called DASH5 is a phone app that uses Bluetooth to connect to the car’s diagnostics, or as the company’s advertising proclaims, “give your car a voice.” The way DASH works is that it “eavesdrops” on the CAN bus, tracking how many times a driver presses the brake or gas pedal.

DASH is a well-intentioned device that’s intended to help car owners drive more efficiently. It also aggregates the data and lets municipalities know statistics such as where on their roads drivers tend to hit the brake or swerve abruptly. The problem with devices such as DASH is privacy. As DASH knows everything about your driving habits, your car becomes as intriguing to bosses and marketers as your web browsing habits.

Are we there yet? From the point of view of the hardware sensors that provide the data, the answer is a resounding “yes.” The quality and cost of today’s sensor suite is more than adequate for driverless cars. In fact, as Moore’s Law continues to prevail, sensors will continue to become faster, cheaper, and better at exponential rates, doubling their sensitivity and halving their costs every so many months. Now let’s turn our attention back to software and delve into one missing piece of the puzzle that is finally falling into place: deep-learning technology, the crown jewel of the control software that guides a robot’s artificial perception and response.


Figure 9.5 Key sensors used in driverless cars. Most autonomous vehicles use multiples and combinations of some of these sensors.

Notes