WHAT DRIVES SELF-DRIVING CARS
Self-driving cars have been a staple of futurist predictions for nearly a century. The 1939 World’s Fair in New York had a General Motors exhibit that took viewers on a ride through a futuristic model of a 1960 automated highway. We all love the idea of taking a nap or watching a movie while our car drives us to our destination. Self-driving vehicles would also improve independence for the elderly and the disabled.
More importantly, humans are bad drivers. The National Highway Traffic Safety Administration (NHTSA) in the US estimates that 94 percent of serious vehicle crashes are due to human error.1 We get bored, we get tired, and we get distracted by texts, phone calls, and the radio. Add those factors to the people driving under the influence of alcohol and other drugs, and it is easy to see why the idea of self-driving cars is so compelling.
And while we are at it, we might also consider the potential benefits of all types of autonomous vehicles, including trucks, motorcycles, buses, boats, helicopters, ferries, drones, lawnmowers, tractors, and golf carts. After all, airplanes have had an autopilot mode for many years, and self-driving trucks already operate in mining areas.2 We are also just starting to see self-driving baggage trucks at airports.3
A driverless car can, at least theoretically, do several things better than people. It can attend to multiple objects in the environment (pedestrians, stop signs, birds, and road hazards) simultaneously. It will never get tired, drunk, or distracted. It can react much faster than humans to avoid accidents.4
If all cars were driverless, highways might have much less congestion. Instead of stop-and-go traffic, cars could go sixty, eighty, or more miles per hour with minimal distances between the vehicles.5 The elderly would not have to give up their independent mobility when they can no longer drive. Self-driving cars could transform the lives of blind and disabled people. Shuttle services could run 24/7 without paying (or charging) overtime. Analysts estimate that using self-driving taxis will cost half as much as owning a car.6
So, you might be wondering, what is the holdup? Why aren’t we all piling into our self-driving minivans on the way to visit Grandma for the holidays? There is little question that, once perfected, driverless cars could greatly reduce stress and increase productivity tremendously. But we are not there yet.
THE ROAD SO FAR
Self-driving car research has a long history. Scientists created the first driverless vehicle in the mid-1920s. It was a radio-controlled car developed by the Houdina Radio Control Company. An operator in a trailing car used a remote control to maneuver the car on the streets of New York City.7 Other researchers patented an automated parallel parking system in 1933.8
The first car to visually recognize lane markings was developed in Japan in 1977 at the Tsukuba Mechanical Engineering Laboratory. It had two cameras that it used to capture images of the road, and it achieved speeds of up to nineteen miles per hour.
During the latter half of the 1980s, two independent projects, one in the US and one in Europe, produced compelling demonstrations of self-driving capabilities. The two projects used very different methods, but both achieved remarkable results.
The US program took place at Carnegie Mellon University (CMU), was named NavLab (for Navigation Laboratory), and produced a series of vehicles. The second NavLab vehicle (NavLab 2) was powered by a software program called ALVINN (for autonomous land vehicle in a neural network) that was funded by DARPA from 1986 to 1993.9 NavLab 2 was a Humvee that drove itself around CMU. Initially, it had a top speed of three and a half miles per hour, but it was able to drive seventy miles per hour by 1993. For all the NavLab test drives, a driver sat behind the wheel, put the car in gear, and handled the acceleration and braking, taking over the steering only when necessary. You can find a video on YouTube from a 1989 newscast showing the vehicle driving on a city street with the driver’s hands off the wheel.10 In 1995, a CMU research scientist and a graduate student drove NavLab 5 from Washington, DC, to San Diego with the driver’s hands off the wheel for 98.2 percent of the miles driven. The driver only had to steer for about 55 miles of the 3,100-mile-long trip.
ALVINN was a neural network–based supervised learning system that took as input the images from four cameras attached to the vehicle and learned to reproduce as output the steering movements made by a human driver. In less than ten minutes of driving on a given road, it could learn enough to steer on that road. However, researchers had to retrain it for each road. One of the remarkable innovations produced by the CMU team was a system that used maps to analyze the safe speed for upcoming curves and warned the driver if the car was going too fast for the curve.11 This 1990s innovation is a feature I would like to see on my Tesla, which never seems to slow down for curves.
The European effort, named the PROMETHEUS (Program for European Traffic with Highest Efficiency and Unprecedented Safety) Project, ran from 1987 to 1995 and received 749 million euros of funding from the European Commission.12 In 1994, a team led by Ernst Dickmanns, a professor at Bundeswehr University Munich, demonstrated a Mercedes-Benz van outfitted with self-driving capabilities that drove through the crowded streets of Paris at speeds of up to sixty miles per hour. A year later, the van drove 1,200 miles on an emptied Autobahn highway at speeds of up to eighty miles per hour. In both cases, a driver was behind the wheel and took over when necessary.13
The PROMETHEUS system did not use a learning algorithm. Instead, the developers hand coded complex models of each road (e.g., the Autobahn) and of the way the vehicle needed to respond to curves and bumps and other aspects of driving.14 The researchers paired these models with features extracted from the camera images using various filters. The PROMETHEUS team fed these features into a conventionally coded algorithm that produced near-instantaneous instructions on how to steer the car, how to manipulate the throttle, and when and how hard to apply the brakes.
In 2000, Congress mandated that one-third of all military vehicles needed to be autonomous by 2015, and as a result, in February 2003, DARPA was looking to jumpstart research into autonomous vehicles.15 It created a 142-mile race in the Mojave Desert named the Grand Challenge and offered a $1 million prize to whichever team’s vehicle finished the race first. It was an extremely challenging course with hard turns, elevation changes, and obstacles ranging from tumbleweeds to rocks. The idea was to build a vehicle that could use GPS and sensors to drive the course and avoid obstacles. It was an imposing challenge.
The race took place in March 2004. DARPA selected the 15 most promising teams out of 140 teams that applied. There was a wide variety of entrants. One team was composed of high school students. Other teams had designed vehicles in their home garages. There were also two CMU entries and a motorcycle entered by Anthony Levandowski, who was then a University of California at Berkeley engineering graduate student and would later go on to become one of the leading figures in autonomous vehicles. However, none of the vehicles made it to the eight-mile mark. CMU’s vehicle Sandstorm went the farthest, 7.4 miles, before getting stuck on a berm, where its front wheels caught fire. Another vehicle got stuck on an embankment. Another could not get up a hill. Another vehicle flipped over, and others suffered mechanical issues. Wired magazine reported that it was like a scene out of a Mad Max movie.16 The million-dollar prize went unclaimed.
Still, the vehicles showed enough promise for DARPA to announce a second race in October 2005 with a $2 million prize. This race was far more successful, with five vehicles completing the 132-mile course. The winning entry, a modified Volkswagen SUV, was created by a team led by Sebastian Thrun, then a Stanford University AI professor. Thrun named the vehicle Stanley, created its technology as a twenty-student class project, and then enlisted a handful of students after the course ended to work on the project from July 2004 until the October 2005 event. Early in the process, the team drove the challenge course in a human-driven SUV. Even manned, it took the team seven hours to drive the course, which made the ten-hour challenge limit appear daunting.17
Figure 8.1 Grand Challenge entry from CMU. Licensed from Getty Images ID 55940868.
The CMU vehicles took second and third place in 2005 but went on to win the third DARPA-sponsored race, the Urban Challenge, in 2007, and claim the $2 million purse for that race. Levandowski, Thrun, and many of the developers on the CMU team moved on to join autonomous vehicle companies.
COMMERCIALIZATION
Autonomous vehicles pose a massive commercial opportunity that is attracting the world’s top players in transportation and technology. Notably, technology companies, not automobile companies, have taken the lead in commercializing autonomous vehicles. Levandowski started a company that developed the camera technology that Google, at least initially, used for its Google Maps Street View. In 2007, Google recruited Thrun to head up its nascent self-driving car initiative.
In 2008, the Discovery Channel asked Levandowski’s company to build a prototype of a self-driving pizza delivery truck that successfully navigated a journey over the Bay Bridge in San Francisco.18 At least partly based on this successful pizza delivery, Google purchased the company in 2009. In 2016, Google spun off the self-driving vehicle project into a separate company, named Waymo, that is owned by Google’s parent company Alphabet. Waymo is one of the leaders in the development of self-driving taxis and trucks.
Uber hired forty researchers away from CMU in 2015 by doubling their previous salaries and offering hiring bonuses.19 In 2016, Levandowski left Google to start a self-driving truck company named Otto. Later that year, Uber purchased Otto for a little less than 1 percent of Uber’s stock, which was worth a reported $680 million, and put Levandowski in charge of Uber’s self-driving research.20 Tesla has also made a great deal of noise in the self-driving car space via its “autopilot” mode, which, despite the name, currently requires the driver to keep their hands on the wheel and to be ready to take over at all times.
Figure 8.2 A driverless shuttle bus.
© Haiyin | Licensed from Dreamstime.com ID 150893098.
Although independent technology companies have led the charge, the major car vendors have also experimented with self-driving technologies. For example, Mercedes-Benz funded the PROMETHEUS Project in 1987. However, the independent tech companies had the technology and made the first substantial investments.
The major automakers quickly followed the technology companies into the autonomous vehicle fray, and virtually every major automaker is making significant investments in the technology. Many are working together to offset the massive costs of development. GM and Honda are collaborating via GM’s Cruise division. Ford and Volkswagen formed Argo AI. Bosch and Daimler teamed up. Nissan, Renault, and Microsoft are working together. Many other vendors around the world, operating independently of the major automakers, are working on autonomous vehicle technology.
HOW SELF-DRIVING VEHICLES WORK
Self-driving technology starts with sensors that provide the car with information that enables it to adapt to its environment. Some vehicles can also communicate with other vehicles on the road, for example, to help avoid collisions. Finally, all that sensor and communication input must be processed somehow and turned into driving decisions.
SENSORS
Autonomous vehicles use four types of sensors to “see” other vehicles, pedestrians, lane markers, and other elements of the driving environment: cameras, radar, lidar, and ultrasound. As an example, as of January 2020, Tesla vehicle equipment includes eight cameras, a forward-facing radar unit, and twelve ultrasonic sensors.21
Cameras have the best resolution among vision sensors. They are the best type of sensor to provide input to deep learning systems for lane detection and traffic sign recognition. Infrared cameras, which will start to become available on cars in 2021,22 can work outside visible-spectrum light and can detect lane markers, pedestrians, bicycles, and animals at night. Cameras are also the only type of vision sensor that can detect color, which is important for distinguishing traffic lights and emergency vehicle lights. However, cameras struggle with wet roads, reflective surfaces, and low sun angles, because the light reflections obscure the images much like they do when taking a snapshot of someone with the sun directly behind them. One other important difference between cameras and the other sensors is that machine learning is required to interpret the pattern of pixels23 captured by the camera. In contrast, the other sensors directly provide information about the distance between the vehicle and various objects.
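To make the camera pipeline concrete, here is a minimal sketch of the kind of supervised network that turns a pattern of pixels into a label such as “stop sign.” It assumes the PyTorch library, and the image size, layer sizes, and class count are illustrative assumptions, not any manufacturer’s actual architecture.

```python
# Minimal sketch of a supervised traffic-sign classifier of the kind a
# camera-based perception stack might use. Layer sizes, image resolution,
# and class count are illustrative assumptions, not any vendor's design.
import torch
import torch.nn as nn

class TrafficSignNet(nn.Module):
    def __init__(self, num_classes: int = 4):  # e.g., stop, yield, speed limit, other
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        self.classifier = nn.Linear(32 * 16 * 16, num_classes)  # assumes 64x64 input

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = self.features(x)
        return self.classifier(x.flatten(start_dim=1))

# One training step on a hand-labeled batch (camera crops plus human-assigned labels).
model = TrafficSignNet()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
images = torch.randn(8, 3, 64, 64)      # stand-in for camera crops
labels = torch.randint(0, 4, (8,))      # stand-in for human labels
optimizer.zero_grad()
loss = nn.functional.cross_entropy(model(images), labels)
loss.backward()
optimizer.step()
```

Production systems apply the same idea with much deeper networks trained on millions of hand-labeled images, but the principle is identical: the network learns to map raw pixels to labels, whereas the other sensors report distances directly.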
Other types of sensors work by sending out signals and analyzing the echoes caused by objects in the signal’s path. Radar works by sending out a radio signal and measuring the time for the return echo. It has the longest range of these sensors and has the added advantage of being the least expensive. It is also effective when visibility is low, like at night and in fog. Radar can assist with lane detection, detecting objects in the vehicle’s blind spots, alerting the driver to cross traffic, sensing side impacts, avoiding collisions, and assisting with parking, emergency braking, and adaptive cruise control. However, radar reflects well only off materials that conduct electricity, like steel.24 It reflects poorly off trees, people, and other nonconductive objects, so it cannot reliably detect pedestrians and many other obstacles. Also, while radar is good at determining the speed of moving objects, it does a poor job of detecting stationary objects. This radar limitation caused Tesla vehicles to crash into the backs of stopped fire trucks on three separate occasions in 2018.25
Lidar (for light detection and ranging) works like radar but uses light waves instead of radio waves. A lidar unit emits laser light and computes distance by measuring the time it takes for the light to return to the unit. Lidar has a much higher resolution than radar and can create a 3D view of what it “sees.” However, it is expensive, and it cannot detect objects as far away as radar can.
Ultrasonic sensors emit sound waves that travel at known speeds, hit objects in the path, and bounce back to the sensor. The sensors use the round-trip time to calculate the distance to the object. Ultrasonic sensors are very short range (up to about twenty feet) and are mostly used today for parking and backup assistance.
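Radar, lidar, and ultrasonic sensors all share the same round-trip-time arithmetic: distance is the propagation speed multiplied by the echo time, divided by two. Here is a minimal sketch; the echo times are illustrative.

```python
# Illustrative time-of-flight ranging, the principle shared by radar, lidar,
# and ultrasonic sensors: distance = (propagation speed x round-trip time) / 2.
SPEED_OF_LIGHT_M_S = 299_792_458   # radar and lidar signals
SPEED_OF_SOUND_M_S = 343           # ultrasound in air at about 20 C

def range_from_echo(round_trip_seconds: float, wave_speed_m_s: float) -> float:
    """Distance to the reflecting object, in meters."""
    return wave_speed_m_s * round_trip_seconds / 2

# A lidar echo returning after 200 nanoseconds -> roughly 30 meters away.
print(range_from_echo(200e-9, SPEED_OF_LIGHT_M_S))   # ~29.98 m
# An ultrasonic echo returning after 35 milliseconds -> roughly 6 meters away.
print(range_from_echo(35e-3, SPEED_OF_SOUND_M_S))    # ~6.0 m
```

The slow speed of sound is one reason ultrasonic sensors are limited to short-range tasks like parking, while light-based sensors can range objects much farther away.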
The sensors discussed to this point all help the vehicle “see” road markings, pedestrians, stop signs, and other objects. AI decision-making needs this visual input, but it also relies on other sensors that automakers introduced years ago to provide safety and convenience features. For example, GPS is used to determine a car’s current location. However, while GPS indicates where the vehicle is on a map, it is not granular enough to determine the vehicle’s lane.
Autonomous vehicles use other sensors to detect several other variables, including temperature, shock, vibration, acceleration, and angular rotation. Inertial measurement units (IMUs) use a combination of accelerometers and gyroscopes to provide input to inertial navigation systems (INS). Vehicle computers use INS units for vehicle localization (i.e., aligning the vehicle location to the street map) in cities, where tall buildings can block the GPS signal. An autonomous vehicle can compensate for a short time by calculating its position based on an INS, using a navigation technique known as dead reckoning.26 Dead reckoning determines a vehicle’s current position from its previous position plus the output of an IMU or other sensors that measure acceleration and angular rotation.27
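Here is a minimal sketch of dead reckoning under the assumption of simple IMU readings (one acceleration value and one yaw rate per time step). Because each step adds a little error, the estimate drifts, which is why dead reckoning can only cover short GPS outages.

```python
import math

# Minimal sketch of dead reckoning: estimate the new position from the previous
# position plus IMU-style measurements (acceleration and angular rotation).
# The variable names and sample values are illustrative assumptions.

def dead_reckon(x, y, heading_rad, speed_m_s, accel_m_s2, yaw_rate_rad_s, dt):
    """Advance the position estimate by one time step of length dt seconds."""
    heading_rad += yaw_rate_rad_s * dt   # gyroscope: change in heading
    speed_m_s += accel_m_s2 * dt         # accelerometer: change in speed
    x += speed_m_s * math.cos(heading_rad) * dt
    y += speed_m_s * math.sin(heading_rad) * dt
    return x, y, heading_rad, speed_m_s

# Example: GPS drops out while the car is doing 20 m/s through a gentle curve.
state = (0.0, 0.0, 0.0, 20.0)            # x, y, heading, speed
for _ in range(100):                     # 10 seconds at 0.1 s per step
    state = dead_reckon(*state, accel_m_s2=0.0, yaw_rate_rad_s=0.05, dt=0.1)
print(state)  # estimated position, heading, and speed after the GPS gap
```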
COMMUNICATIONS
Vehicle-to-vehicle communication (V2V) may become a vital component of collision avoidance systems. Vehicles equipped with V2V relay data on their position, speed, and status to one another, which in theory allows them to coordinate to avoid collisions. V2V may also play a role in more efficient highway merging, successful right-of-way determination, and safe responses to other everyday driving situations.
However, V2V can’t be 100 percent reliable until all cars have it. Because communication is only possible when each vehicle has V2V, unequipped vehicles will be blind and invisible to the V2V system. Combine this with the unfortunate fact that there are two competing V2V standards, and the ubiquitous use of V2V is likely to be delayed until around 2050.28
An important special case of V2V is truck platooning, when two or more trucks drive together. The lead truck has a human driver. The other trucks drive autonomously. The lead truck uses V2V to communicate with the trucks behind it and instructs the trucks on the proper maneuvers. The trailing trucks will initially have human drivers who can rest. As technology progresses, the trailing trucks will be fully autonomous. The Netherlands is leading the way with a government and industry collaboration in truck platooning, with a goal of having convoys of one hundred trucks in the near future.29
Many states and municipalities have also invested in vehicle-to-infrastructure (V2I) communication. V2I enables cars to communicate with city infrastructure, such as smart traffic lights that determine when to change based on how many cars are waiting. Governments could require that all construction activity be filed in V2I systems so that these systems can notify cars approaching the area. For example, cars nearing a construction zone could be instructed to all move to the left lane in an orderly fashion, reducing traffic jams (and probably some road rage). V2I could also be used to warn cars of adverse road conditions, accidents, and emergency response vehicles.
Another standard being developed is vehicle-to-everything (V2X) communication. V2X is a combination of V2V and V2I. V2X has the potential to overcome the problem of the limited ranges of V2V and V2I. Imagine a city or highway that links all the V2I devices. Then a car that communicates to a V2I device could also gather information from faraway V2I devices, thereby extending the range of V2V communications.
PROCESSING THAT INFORMATION
Autonomous vehicles have in-car computers that process the sensor outputs and make decisions. Computer algorithms must identify obstacles and determine, for example, whether the sensor outputs include a traffic signal and whether that signal is red, yellow, or green. The computer also must determine what other cars and pedestrians are doing and must make predictions about their intent and what they will do next. The computer needs to plan routes and motions. And the computer needs to control the vehicle’s steering, braking, transmission, and acceleration. Autonomous vehicle computers contain a set of modules for controlling the vehicle.
In autonomous vehicles, perception systems identify pedestrians, cyclists, animals, and many different types of objects such as vehicles, lane markers, signs, and trees. The perception software is typically composed of many different machine learning systems, each of which learns to “see” a different category of animate or inanimate object. As of February 2020, Tesla vehicles had forty-eight different machine learning components, most of which use supervised learning networks.30 Each component recognizes different object types. For example, there is one component just for stop sign recognition.31
To develop the network for their stop sign detector, the Tesla team initially used video sent back to headquarters from Teslas driven by consumers. The team found stop signs in the videos and created a supervised learning training table by hand-labeling the images from the videos. They created an initial CAPTCHA-like test (e.g., “Find all the images with stop signs”) from these images and applied machine learning to create an initial stop sign detector that they deployed to all Tesla vehicles in over-the-air software updates. Each car’s machine learning system then rated each stop sign detection on a confidence scale. Sometimes, the detector had a low confidence in detecting a specific stop sign, or it detected a stop sign that was not on its map. In those cases, the video was sent back to headquarters and reviewed and labeled by Tesla staff for machine learning refinement.
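The fleet-side triage the preceding paragraph describes can be sketched as a simple rule: upload a clip when the detector’s confidence is low or when the detection disagrees with the map. The threshold, data structure, and clip identifiers below are hypothetical, not Tesla’s actual code.

```python
from dataclasses import dataclass

# Hypothetical triage logic for fleet data collection: detections with low
# confidence, or detections not expected by the map, get queued for human
# labeling back at headquarters. The threshold and fields are assumptions.
CONFIDENCE_THRESHOLD = 0.9

@dataclass
class StopSignDetection:
    confidence: float     # 0.0 to 1.0, from the onboard detector
    on_map: bool          # is a stop sign expected at this location?
    video_clip_id: str    # reference to the short clip around the detection

def should_upload_for_labeling(d: StopSignDetection) -> bool:
    return d.confidence < CONFIDENCE_THRESHOLD or not d.on_map

queue = [d for d in [
    StopSignDetection(0.97, True, "clip_001"),   # confident and expected: keep local
    StopSignDetection(0.55, True, "clip_002"),   # low confidence: upload
    StopSignDetection(0.95, False, "clip_003"),  # unexpected sign: upload
] if should_upload_for_labeling(d)]
print([d.video_clip_id for d in queue])  # ['clip_002', 'clip_003']
```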
Under certain circumstances, the Tesla team also builds specialized detectors to source data for special cases. For example, the stop sign detector was having trouble with occluded stop signs. A separate occluded stop sign detector was created and sent to the fleet for the sole purpose of sourcing possible occluded stop sign images that could then be hand-labeled and fed into the regular stop sign algorithm. The team also built a specialized detector for “Except right turn” signs posted below stop signs, trained on ten thousand images collected in this fashion.
The perception system also detects the motion of objects and can use this information to identify them. For example, a bicycle and a motorcycle might look alike, but a motorcycle travels faster. Motion information is also used to predict the trajectory of objects. For example, Tesla has developed a methodology for determining when a car will cut into a lane from another lane without requiring human labeling of images.32 The “cut-in” machine learning predictor works in shadow mode in all consumer-driven Teslas. In shadow mode, the car’s computers make calculations that are sent back to Tesla HQ but do not affect the operation of the consumer vehicle.33 For the “cut-in” predictor, the car computer predicts when a vehicle in one lane on a highway is about to switch lanes. Sometimes the vehicle will signal, and sometimes it will not signal. The machine learning algorithm constantly makes predictions about whether each vehicle in the field of vision will cut in. Human labeling is not necessary because the vehicle either will or will not cut in. Instead, each prediction is automatically labeled as correct or incorrect based on whether the vehicle actually cuts in or not. This creates additional data for the algorithm, which gets better and better.
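The appeal of this approach is that the label arrives automatically a few seconds after the prediction. Below is a toy sketch of that auto-labeling loop; the data and function names are hypothetical and greatly simplified compared with any production pipeline.

```python
# Sketch of automatic labeling for a cut-in predictor running in shadow mode:
# each prediction is compared with what the tracked vehicle actually did a few
# seconds later, so no human labeler is needed. All names are hypothetical.

def auto_label(predicted_cut_in: bool, actually_cut_in: bool) -> dict:
    """Turn one shadow-mode prediction into a training example."""
    return {
        "label": actually_cut_in,   # ground truth, observed for free a moment later
        "prediction_correct": predicted_cut_in == actually_cut_in,
    }

# Stream of (prediction made at time t, outcome observed at time t + horizon).
observations = [(True, True), (True, False), (False, False), (False, True)]
examples = [auto_label(pred, outcome) for pred, outcome in observations]
accuracy = sum(e["prediction_correct"] for e in examples) / len(examples)
print(f"shadow-mode accuracy: {accuracy:.0%}")  # 50% in this toy stream
# The mispredicted examples are the most valuable ones to feed back into training.
```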
LOCALIZATION AND MAPPING
An autonomous vehicle computer must also maintain an internal, two-dimensional map of the vehicle’s location. The vehicle has a GPS sensor and at least a coarse map of the roads, like those found on Google Maps. The GPS sensor, however, is only accurate to one or two meters. If the computer were to try to drive the vehicle based just on this information, that range of error would frequently put the vehicle on the road median or worse. Additionally, the GPS might not be available at all under certain atmospheric conditions or when surrounded by buildings.
Instead, autonomous vehicle computers use sensor information from the GPS, cameras, lidar, and radar to create a three-dimensional, high-definition internal map that contains lane markings, crosswalks, pedestrians, vehicles, buildings, bicycle lanes, and other objects relative to the location of the vehicle. These high-definition maps are accurate to a few centimeters. They primarily use conventional software code to fuse together the information from all the sensors. Map elements that are fixed, such as roads and stoplights, can be precomputed. Everything else needs to be computed in real time. As the car moves, both the map and the vehicle’s location on the map are continually updated.34
These maps are also used to aid the perception process. For example, if a stop sign is occluded by a bus but is present on the map, the autonomous vehicle system will still assume that the stop sign is there. The map alone cannot establish the sign’s presence with 100 percent likelihood, but the system needs only a small glimpse of the stop sign to move that likelihood to 100 percent. Each time the stop sign is confirmed, that fact can then be sent back to a central database, stored, and used by other vehicles that link to that central database.
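One simple way to think about combining a map prior with a weak camera cue is as a Bayesian update. The probabilities below are illustrative assumptions, not values from any real system.

```python
# Sketch of combining a map prior with a weak camera cue via Bayes' rule.
# Probabilities are illustrative; real systems fuse many cues continuously.

def posterior_sign_present(prior: float, cue_given_sign: float,
                           cue_given_no_sign: float) -> float:
    """P(sign present | camera cue observed)."""
    evidence = cue_given_sign * prior + cue_given_no_sign * (1 - prior)
    return cue_given_sign * prior / evidence

# The HD map says a stop sign should be here (strong prior), and the camera
# catches only a brief, low-confidence glimpse around the bus.
print(posterior_sign_present(prior=0.95, cue_given_sign=0.6, cue_given_no_sign=0.1))
# ~0.99 -- a small glimpse is enough to make the sign's presence near-certain.
```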
PREDICTION OF OBJECT TRAJECTORIES
The autonomous vehicle software computes the likely trajectory (i.e., the future position) for each person, animal, or object in the map. For example, the autonomous vehicle computer needs to know where each car and pedestrian will be in the near future (e.g., the car coming toward me will be in the intersection in two seconds). This can be done with conventionally coded rules using, for example, the speed and direction of the object. Some types of trajectory prediction are also done using machine learning algorithms.
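Here is a minimal sketch of the simplest rule-based version, a constant-velocity model that extrapolates an object’s future positions from its current speed and heading. The numbers are illustrative.

```python
import math

# Sketch of the simplest kind of trajectory prediction described above:
# extrapolate each tracked object's future positions from its current speed
# and direction (a constant-velocity model). Values are illustrative.

def predict_positions(x, y, speed_m_s, heading_rad, horizon_s=3.0, dt=0.5):
    """Future (x, y) positions at dt intervals out to horizon_s seconds."""
    vx = speed_m_s * math.cos(heading_rad)
    vy = speed_m_s * math.sin(heading_rad)
    steps = int(horizon_s / dt)
    return [(x + vx * t, y + vy * t) for t in (dt * (i + 1) for i in range(steps))]

# An oncoming car 30 m away, closing at 15 m/s: where will it be over 3 seconds?
print(predict_positions(x=30.0, y=0.0, speed_m_s=15.0, heading_rad=math.pi))
# x shrinks from 22.5 m to -15 m -- it reaches us in about 2 seconds.
```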
PATH PLANNING
The autonomous vehicle computer needs to track the end destination and constantly make decisions about how to reach that destination. Should it move into the left-hand lane in preparation for a left-hand turn? Should it switch lanes to pass a slow-moving vehicle? Should it navigate around a double-parked car?
To make these constant decisions, the autonomous vehicle computer must consider its internal HD map and its predictions of the trajectories of the other vehicles, pedestrians, and animals in its map. It must also consider many other variables, including goals of minimizing the trip time, avoiding obstacles and accidents, obeying road rules, and not accelerating or stopping too suddenly if it can be avoided. If V2V or V2I information is available, it needs to consider this data also. If the tire pressure monitor indicates low pressure, this fact must also be brought into the equation. These path planning decisions are mostly made using rule-based conventional software. However, some decisions are made using machine learning components. For example, Tesla uses machine learning to create algorithms for left-hand turns and for navigating cloverleaf curves.35 The learning occurs in shadow mode. The Tesla software predicts the driver behavior and then compares its prediction with the actual behavior. When it makes incorrect predictions, the computer images and the actual driver behavior are sent back to Tesla headquarters to be incorporated as new training examples.
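A toy way to picture the rule-based part of path planning is a cost function that scores a handful of candidate maneuvers against competing goals such as trip time, clearance from other vehicles, road rules, and smooth driving. The candidates, features, and weights below are invented for illustration; real planners are far more elaborate.

```python
# Toy cost-based maneuver selection, weighing the competing goals described
# above (trip time, safety margin, road rules, smooth driving). The candidate
# maneuvers, features, and weights are all illustrative assumptions.
WEIGHTS = {"time_s": 1.0, "min_gap_m": -2.0, "rule_violations": 100.0, "jerk": 5.0}

candidates = [
    {"name": "stay behind slow vehicle", "time_s": 60, "min_gap_m": 12, "rule_violations": 0, "jerk": 0.5},
    {"name": "pass in left lane",        "time_s": 25, "min_gap_m": 6,  "rule_violations": 0, "jerk": 1.5},
    {"name": "pass across double line",  "time_s": 22, "min_gap_m": 6,  "rule_violations": 1, "jerk": 1.5},
]

def cost(candidate: dict) -> float:
    """Lower is better: penalize time and jerk, reward clearance, punish rule breaking."""
    return sum(WEIGHTS[key] * candidate[key] for key in WEIGHTS)

best = min(candidates, key=cost)
print(best["name"])  # "pass in left lane": the time saved outweighs the smaller gap
```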
Finally, conventional software is needed to translate the path planning decisions into commands that control the throttle, brakes, and steering. Except for perception, most of the software in an autonomous vehicle is conventional software that contains hand-coded rules that take as input the output of the machine learning components. That said, most autonomous vehicle vendors have a goal of replacing more and more of the conventional codebase with machine learning components over time.
DIFFERENCES BETWEEN AUTONOMOUS VEHICLE VENDORS: WAYMO VERSUS TESLA
Tesla approaches self-driving car technology in a way that is dramatically different from nearly every other manufacturer. Rather than compare Tesla with all of them, let’s focus on just one competitor: Waymo, Google’s autonomous vehicle project. Tesla differs from most other manufacturers in the same ways it differs from Waymo.
TESLA SHADOW MODE VERSUS WAYMO SIMULATORS
Tesla turns every car owner into a participant in a massive self-driving experiment. There were over 825,000 Tesla vehicles on the road with Tesla Autopilot 2 software at the end of 2019.36 All these vehicles make shadow mode decisions, record the actions taken by the human driver, and send all that data back to the central Tesla data center, where the company’s computers compare the proposed and actual actions.
In 2017, Tesla vehicles started including short videos in the data sent back to Tesla HQ to improve recognition of lane lines, stoplights, and other roadway features. The Tesla team uses this data to determine the safety of the proposed actions. When certain types of decisions are proven safe after millions of miles of actual driving, Tesla rolls them out as improvements to the Tesla autopilot systems. Similarly, when Tesla makes changes to the self-driving algorithms, they can be tested for safety and effectiveness using a simulator that can replay all the captured miles using the new algorithms. Then, they can be rolled out in shadow mode until Tesla verifies them as safe and effective, and finally, Tesla rolls them into production use.
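Conceptually, this kind of replay or shadow-mode evaluation boils down to running the candidate software over recorded drives and counting where it disagrees with what human drivers actually did. The sketch below assumes a made-up log format and a stand-in policy; it is not Tesla’s pipeline.

```python
# Sketch of replay evaluation: run a candidate version of the driving software
# over recorded drives and count where its proposed action disagrees with what
# the human driver actually did. The log format and policy are hypothetical.
recorded_log = [
    {"sensor_frame": "f001", "human_action": "keep_lane"},
    {"sensor_frame": "f002", "human_action": "slow_down"},
    {"sensor_frame": "f003", "human_action": "keep_lane"},
]

def candidate_policy(sensor_frame: str) -> str:
    """Stand-in for the new self-driving algorithm under evaluation."""
    return "slow_down" if sensor_frame in ("f002", "f003") else "keep_lane"

disagreements = [entry for entry in recorded_log
                 if candidate_policy(entry["sensor_frame"]) != entry["human_action"]]
print(f"{len(disagreements)} disagreement(s) out of {len(recorded_log)} frames")
# The disagreement frames are the ones engineers review before rolling out an update.
```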
In contrast to Tesla’s 825,000 test cars, Waymo has approximately 600 test cars in its primary, 100-square-mile testing region near Phoenix, Arizona. These cars also collect data and send it back to the Waymo technology team; however, the volume of data is far smaller for Waymo than for Tesla. Tesla’s director of AI, Andrej Karpathy, wondered out loud in a talk how the other autonomous vehicle manufacturers could build robust detectors when they are only testing their vehicles in small geographies with small numbers of vehicles.37
To increase its test capacity, the Waymo team uses a combination of its detailed maps and the test car driving data to create a driving simulator. Every mile driven by Waymo test vehicles is recorded and simulated.38 Waymo uses the simulator for virtual testing of new features before rolling them out into test cars for real-world testing. The simulator enables Waymo to test enhancements to the self-driving software for safety and effectiveness and to make sure that the new improvements do not cause any previously working features to stop functioning.
Virtual cars in the simulator log billions of miles per year, and the simulators can use this information to make tweaks to the recorded data to simulate situations that test vehicles have yet to encounter. For example, programmers can vary the number of vehicles, pedestrians, and cyclists.
MAPS
Waymo and most other manufacturers use extremely detailed maps in their cars.39 Before Waymo sends self-driving test cars to a location, it maps the area by sending human-driven cars that create three-dimensional lidar maps. The mapping team at corporate headquarters then labels features such as driveways, fire hydrants, buildings, stop signs, traffic signals, crosswalks, lane boundaries, curb locations and heights, trees, construction zones, and other information.40
During self-driving tests, the software compares what it senses to what is on the map.41 Waymo and many other vendors argue that this is crucial for two reasons: First, the software can use these maps to help identify objects. Systems that rely entirely on vision sensors might not see a stop sign or stoplight if, for example, a bus or dense fog occludes the view. However, if the map detail indicates that there should be a stop sign in twenty yards, the system can actively look for it and accept lower-probability sensor-based cues as evidence of the stop sign, light, or other object.
Second, these maps also work where GPS signals are blocked (e.g., by tall buildings). Even when GPS signals are not blocked, the system can only localize a car’s position to within two meters of the car’s actual position. However, combining the GPS information with Waymo’s detailed maps plus what the car senses enables the cars to know their position within ten centimeters.
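A stripped-down way to see why the combination is so much better than GPS alone is inverse-variance weighting: each position estimate is weighted by how uncertain it is. The numbers below are illustrative, and production systems use full probabilistic filters (Kalman or particle filters) rather than this one-line fusion.

```python
# Sketch of fusing a coarse GPS fix with a precise lidar-to-map match by
# weighting each estimate by the inverse of its variance (the core idea
# behind Kalman-style localization). Numbers are illustrative assumptions.

def fuse(estimate_a, sigma_a, estimate_b, sigma_b):
    """Combine two position estimates (meters) given their standard deviations."""
    w_a, w_b = 1 / sigma_a**2, 1 / sigma_b**2
    fused = (w_a * estimate_a + w_b * estimate_b) / (w_a + w_b)
    fused_sigma = (1 / (w_a + w_b)) ** 0.5
    return fused, fused_sigma

# GPS: lateral position 1.2 m from lane center, good to about +/-2 m.
# Lidar matched against the HD map: 0.3 m from lane center, good to ~10 cm.
position, sigma = fuse(1.2, 2.0, 0.3, 0.1)
print(round(position, 3), round(sigma, 3))  # ~0.302 m, ~0.1 m -- the map match dominates
```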
Unfortunately, it is unreasonable to expect detailed maps to be available everywhere. They might be available in urban areas, but they are unlikely to be found in rural ones.42 Additionally, roadwork projects periodically pop up and will not appear on the maps. Waymo’s cars are programmed to determine when the map contents do not match what the vision systems see (e.g., a new construction zone is present). That information, including the vision sensor data, is sent back to Google, and the mapping staff updates the maps.
One reason most autonomous vehicle vendors are focusing on services like taxis and buses is that they can roll them out in small, well-mapped areas. In contrast, Tesla, which is trying to turn its Level 2 consumer vehicles into Level 3 and higher vehicles, cannot rely on high-definition maps, because it would need them for every road in the world.
NO LIDAR FOR TESLA
Nearly every manufacturer of autonomous vehicles is using or planning on using lidar in their vehicles. For example, Zoox, which is developing autonomous taxis for city use, equips each of its vehicles with eight lidar units (in addition to eighteen cameras and ten radar units).
Tesla, however, relies on cameras, radar, and ultrasonic sensors for vision. One possible reason for this is lidar’s expense. Tesla’s Model 3 was available in 2019 in the US for a base price of $39,000. In 2017, a top-of-the-line lidar unit was retailing for $75,000. That cost would likely triple the cost of a Tesla vehicle. The cost of lidar is coming down, but it is too late for vehicles already on the road or in production.
Interestingly, Tesla argues that lidar is unnecessary. Cameras are more like human eyes than lidar, because both the human eye and a camera capture only two-dimensional information. The human brain stitches together the images from the two eyes to produce a three-dimensional image. Tesla is using self-supervised learning techniques to develop three-dimensional images from multiple two-dimensional camera images that it claims are almost as good as lidar images and are getting better all the time.43 Manufacturers using lidar face the additional challenge of integrating what the lidar sees with what the cameras see.
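The classical version of this idea recovers depth from the disparity between two camera views: depth equals focal length times baseline divided by disparity. The camera parameters below are assumptions for illustration; Tesla’s approach replaces this formula with a learned network, but the geometry it exploits is the same.

```python
# Classical stereo depth: a point that appears shifted ("disparity") between
# two cameras a known distance apart can be placed in 3D, because
# depth = focal_length * baseline / disparity. Camera parameters are
# illustrative; learned approaches replace this formula with a neural network.

def depth_from_disparity(disparity_px: float,
                         focal_length_px: float = 1000.0,   # assumed camera intrinsics
                         baseline_m: float = 0.3) -> float:  # assumed camera spacing
    """Distance to the point, in meters."""
    return focal_length_px * baseline_m / disparity_px

for disparity in (30.0, 10.0, 3.0):
    print(f"disparity {disparity:>4} px -> depth {depth_from_disparity(disparity):.1f} m")
# 30 px -> 10 m, 10 px -> 30 m, 3 px -> 100 m: small shifts mean distant objects.
```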
ISSUES FOR AUTONOMOUS VEHICLES
Several issues stand in the way of the ubiquitous rollout of self-driving vehicles.
CARS DO NOT SEE LIKE PEOPLE
Computer vision systems are prone to incorrect classifications and can be fooled in ways that people usually are not. For example, researchers showed that minor changes to a speed limit sign could cause a machine learning system to read a 35 mph sign as 85 mph, which could cause the car to accelerate unsafely.44 Similarly, some Chinese hackers tricked Tesla’s autopilot into changing lanes.45 In both cases, these minor changes fooled cars but did not fool people, and a bad actor might devise similar ways of confusing cars or trucks into driving off the road or into obstacles. In real-world driving, many Tesla owners have reported that their cars often treat shadows, such as those of tree branches, as real objects.46 In the case of the Uber test car that killed the pedestrian, the car’s object recognition software first classified the pedestrian as an unknown object, then as a vehicle, and finally as a bicycle.47 I don’t know about you, but I would rather not be on the road as a pedestrian or a driver if vehicles cannot recognize pedestrians with 100 percent accuracy!
In 2009, Captain Sully Sullenberger had just piloted his plane into the air when a flock of Canada geese took out the engines. The plane was only 2,900 feet above the ground, and Sullenberger and his copilot had only a few minutes to maneuver before the plane hit the ground. They had received no training on this specific scenario; they could only apply a few basic rules and common sense. To decide the best course of action, they factored in the likelihood of their passengers surviving various crash alternatives, the likelihood of injuring people on the ground, where rescue vehicles would be quickly available, and many other factors. Then they heroically landed in the Hudson River, and all 155 people on board survived.
Pilots receive extensive training, but it is impossible to train them for every possible situation. For those edge cases—situations similar to but not exactly like their training—they must use their commonsense knowledge and reasoning capabilities.48
The same is true for automobile drivers. A Clearwater, Florida, high school student noticed a woman having a seizure while the woman’s car was moving. The student pulled her own car in front of the woman’s car and brought it to a stop, with no injuries and only minor bumper damage.49
Most of us have encountered unexpected phenomena while driving: A deer darts onto the highway. A flood makes the road difficult or impossible to navigate. A tree falls and blocks the road. The car approaches the scene of an accident or a construction zone. A boulder falls onto a mountain road. A section of new asphalt has no lines. You notice or suspect black ice. The car might fishtail when you try to get up an icy hill. We all have our stories.
We do not learn about all these possible edge cases in driving school. Instead, we use our commonsense reasoning skills to predict actions and outcomes. If we hear an ice cream truck in a neighborhood, we know to look out for children running toward the truck. When the temperature is below 32 degrees, there is precipitation on the road, and we are going down a hill, we know that we need to drive very slowly. We change our driving behavior when we see the car in front of us swerving, knowing that the driver might be intoxicated or texting. If a deer crosses the road, we are on the lookout for another deer, because our commonsense knowledge tells us they travel as families. We know to keep a safe distance and handle passing a vehicle with extra care when we see a truck with an “extra wide load” sign on the back. When we see a ball bounce into the street, we slow down because a child might run into the street to chase it. If we see a large piece of paper on the road, we know we can drive over it, but if we see a large shredded tire, we know to stop or go around it.
Because autonomous vehicles lack the commonsense reasoning capabilities to handle these unanticipated situations, their manufacturers have only two choices. They can try to collect data on human encounters with rare phenomena and use machine learning to build systems that can learn how to handle each of them individually. Or they can try to anticipate every possible scenario and create a conventional program that takes as input vision system identification of these phenomena and tells the car what to do in each situation. What will happen when autonomous vehicles encounter unanticipated situations for which there is no training or programming? A scary video filmed in 2020 illustrates what can happen. It shows a Tesla on a Korean highway approaching an overturned truck at high speed in autopilot mode. A man is standing on the highway in front of the truck waving cars into another lane. The Tesla never slows down, the man has to jump out of the way, and the Tesla crashes into the truck at full speed.50
It will be difficult, if not impossible, for manufacturers to anticipate every edge case. It may be possible for slow-moving shuttles on corporate campuses, but it is hard to imagine for self-driving consumer vehicles.
SAFETY VERSUS TRAFFIC JAMS
The autonomous vehicle industry will also need to navigate the trade-off between safety and traffic jams carefully. In early 2020, Moscow hosted a driverless vehicle competition. Shortly after it began, a vehicle stalled out at a traffic light. Human drivers would reason about this edge case and decide to just go around the stalled car. However, none of the driverless cars did that, and a three-hour traffic jam ensued.51 We do not want autonomous vehicles to crash, but we also do not want them to stop and block traffic every time they encounter an obstacle.
The Insurance Institute for Highway Safety analyzed five thousand car accidents and found that if autonomous vehicles do not drive more slowly and cautiously than people, they will only prevent one-third of all crashes.52 If manufacturers program cars to drive more slowly, the result will be more cars on the road at any given point in time. This will increase the already too-high levels of congestion on many of our roads.
A REALISTIC TIMELINE
As tech journalist Doug Newcomb noted,53 the move to driverless cars has a lot in common with the move to horseless carriages more than one hundred years ago. Back then we eliminated the horses; now we are eliminating the drivers. The problem with this analogy is that most drivers are smarter than their horses, but most cars are not smarter than their drivers.
Figure 8.3 Examples of autonomous vehicle use cases and the relative difficulty of technology implementation.
As is illustrated in figure 8.3, the intended use of an autonomous vehicle has a significant impact on the relative difficulty of creating the technology. Perhaps the simplest use is the development of shuttles that drive fixed routes on private land, such as corporate campuses and retirement villages. Because the shuttles retrace the same route over and over, they only need to learn to navigate a small number of routes. Additionally, the operators of the vehicles have the option to shut them down in bad weather and to operate only during specific times, such as during the day. In comparison, autonomous consumer vehicles need to learn how to navigate any drivable road, anywhere in the world, under a wide range of weather conditions, and at all hours of the day. More importantly, autonomous consumer vehicles need to be operable at high speeds, whereas campus shuttles can motor along slowly at speeds as low as five miles per hour. As a result, even if a low-speed shuttle has an accident, the likelihood of severe injury to passengers and pedestrians, or of damage to other vehicles, is far lower than it is for a consumer vehicle. So, consumer vehicle manufacturers must meet a much more rigorous safety standard than low-speed shuttle developers.
Moreover, the number of potential edge cases increases from left to right in figure 8.3. Campus shuttles are likely to encounter relatively few edge cases. Consumer vehicles are likely to encounter so many edge cases that it may be impossible to identify all of them and create autonomous vehicle code to handle them.
Slow-moving shuttles and delivery vehicles will likely be the first to be rolled into production. Their slow speeds minimize the risk of injury and property damage. Because they can be shut down at night or in bad weather, these vehicles will encounter the fewest edge cases. Shuttles have the additional advantage of traveling a fixed route. Also, since they often operate on private land, they are less likely to cause traffic jams.
The Mayo Clinic in Jacksonville, Florida, is testing a driverless shuttle to transport potentially contagious medical samples from one part of its campus to another.54 EasyMile, a French company, started a test rollout of driverless shuttles with a maximum speed of twelve miles per hour in sixteen US cities. Nuro is starting to operate tiny pizza and grocery delivery vehicles at speeds up to twenty-five miles per hour on the roads in Houston, Texas.55
We are also seeing tests of self-driving taxis in cities and suburban areas. From 2016 through 2019, Waymo tested driverless taxis in a well-mapped one-hundred-square-mile region in the Phoenix suburbs with a safety operator in the vehicle. However, starting in late 2019, Waymo began to offer taxi rides in a vehicle without a safety operator56 within a fifty-square-mile subregion of the test area. Waymo offers a compelling video of a car driving on public roads without anyone in the driver’s seat.57 Zoox and other vendors are also testing autonomous taxis in various cities.58
That said, there are far more edge cases for city-based taxis than for campus shuttles. If manufacturers somehow manage to identify and program all the edge cases, they will need to develop a different system for each city. For example, the edge cases for San Francisco will be different from those for Bangalore, India, where it is not unusual to see cattle in the same lanes as cars.
On the consumer side, nearly all major auto manufacturers are heading toward Level 3 capabilities by virtue of their Level 2 driver assistance offerings. Tesla is the furthest along, because it is unobtrusively testing these capabilities on over 825,000 vehicles. However, no consumer vehicles are close to ready for a true Level 3 rollout. It is hard to imagine manufacturers capturing enough edge cases for these vehicles to make them safe enough for a driver to read a book during vehicle operation.59
The prospect of fully autonomous consumer vehicles is particularly scary. Consumers can drive them anywhere they want, at any time of day, and in any weather conditions. These vehicles will encounter all the edge cases that human drivers encounter. But they do not have human-like commonsense reasoning skills. How will they respond? Will they cause serious accidents and traffic jams?
Autonomous trucks fall somewhere in between autonomous taxis and consumer vehicles. On one hand, trucking companies could decide to only roll out autonomous trucks on certain stretches of highway. For example, it might be possible to use human drivers or teleoperators to get them on and off the highways. This would reduce the number of edge cases they would encounter. Not operating the trucks in bad weather would further reduce the edge cases. On the other hand, autonomous trucks that do all the driving will need to account for perhaps even more edge cases than consumer vehicles. Additionally, they pose far more risk because of their size.
We will likely see some limited rollouts of autonomous vehicles over the next ten years. Initially they will be slow-moving vehicles with fixed routes on private land. They will likely progress to moderate speed vehicles with fixed routes on public roads. If we do see significant city-based rollouts of autonomous taxis, they will likely be limited to very specific, well-mapped areas and will need different software for each city. However, due to the lack of commonsense reasoning in autonomous vehicles, coupled with the seeming impossibility of anticipating every possible situation a vehicle might encounter, we will probably not see autonomous vehicles dominating our highways and city streets for a long time.60