★
The Four Waves of AI
The year 2017 marked the first time I heard Donald Trump speak fluent Chinese. During the U.S. president’s first trip to China, he showed up on a big screen to welcome attendees at a major tech conference. He began his speech in English and then abruptly switched languages.
“AI is changing the world,” he said, speaking in flawless Chinese but with typical Trump bluster. “And iFlyTek is really fantastic.”
President Trump cannot, of course, speak Chinese. But AI is indeed changing the world, and Chinese companies like iFlyTek are leading the way. By training its algorithms on large data samples of President Trump’s speeches, iFlyTek created a near-perfect digital model of his voice: intonation, pitch, and pattern of speech. It then recalibrated that vocal model for Mandarin Chinese, showing the world what Donald Trump might sound like if he grew up in a village outside Beijing. The movement of lips wasn’t precisely synced to the Chinese words, but it was close enough to fool a casual viewer at first glance. President Obama got the same treatment from iFlyTek: a video of a real press conference but with his professorial style converted to perfect Mandarin.
“With the help of iFlyTek, I’ve learned Chinese,” Obama intoned to the White House press corps. “I think my Chinese is better than Trump’s. What do all of you think?”
iFlyTek might say the same to its own competitors. The Chinese company has racked up victories at a series of prestigious international AI competitions for speech recognition, speech synthesis, image recognition, and machine translation. Even in the company’s “second language” of English, iFlyTek often beats teams from Google, DeepMind, Facebook, and IBM Watson in natural-language processing—that is, the ability of AI to decipher overall meaning rather than just words.
This success didn’t come overnight. Back in 1999, when I started Microsoft Research Asia, my top-choice recruit was a brilliant young Ph.D. named Liu Qingfeng. He had been one of the students I saw filing out of the dorms to study under streetlights after my lecture in Hefei. Liu was both hardworking and creative in tackling research questions; he was one of China’s most promising young researchers. But when we asked him to accept our scholarship offer and become a Microsoft intern and then an employee, he declined. He wanted to start his own AI speech company. I told him that he was a great young researcher but that China lagged too far behind American speech-recognition giants like Nuance, and there were fewer customers in China for this technology. To his credit, Liu ignored that advice and poured himself into building iFlyTek. Nearly twenty years and dozens of AI competition awards later, iFlyTek has far surpassed Nuance in capabilities and market cap, becoming the most valuable AI speech company in the world.
Combining iFlyTek’s cutting-edge capabilities in speech recognition, translation, and synthesis will yield transformative AI products, including simultaneous translation earpieces that instantly convert your words and voice into any language. It’s the kind of product that will soon revolutionize international travel, business, and culture, and unlock vast new stores of time, productivity, and creativity in the process.
THE WAVES
But it won’t happen all at once. The complete AI revolution will take a little time and will ultimately wash over us in a series of four waves: internet AI, business AI, perception AI, and autonomous AI. Each of these waves harnesses AI’s power in a different way, disrupting different sectors and weaving artificial intelligence deeper into the fabric of our daily lives.
The first two waves—internet AI and business AI—are already all around us, reshaping our digital and financial worlds in ways we can barely register. They are tightening internet companies’ grip on our attention, replacing paralegals with algorithms, trading stocks, and diagnosing illnesses.
Perception AI is now digitizing our physical world, learning to recognize our faces, understand our requests, and “see” the world around us. This wave promises to revolutionize how we experience and interact with our world, blurring the lines between the digital and physical worlds. Autonomous AI will come last but will have the deepest impact on our lives. As self-driving cars take to the streets, autonomous drones take to the skies, and intelligent robots take over factories, they will transform everything from organic farming to highway driving and fast food.
These four waves all feed off different kinds of data, and each one presents a unique opportunity for the United States or China to seize the lead. We’ll see that China is in a strong position to lead or co-lead in internet AI and perception AI, and will likely soon catch up with the United States in autonomous AI. Currently, business AI remains the only arena in which the United States maintains clear leadership.
Competition, however, won’t play out in just these two countries. AI-driven services that are pioneered in the United States and China will then proliferate across billions of users around the globe, many of them in developing countries. Companies like Uber, Didi, Alibaba, and Amazon are already fiercely competing for these developing markets but adopting very different strategies. While Silicon Valley juggernauts are trying to conquer each new market with their own products, China’s internet companies are instead investing in these countries’ scrappy local startups as they try to fight off U.S. domination. It’s a competition that’s just getting started, and one that will have profound implications for the global economic landscape of the twenty-first century.
To understand how this coming competition will play out at home and abroad, we must first take a dive into each of the four waves of AI washing over our economies.
FIRST WAVE: INTERNET AI
Internet AI already likely has a strong grip on your eyeballs, if not your wallet. Ever find yourself going down an endless rabbit hole of YouTube videos? Do video streaming sites have an uncanny knack for recommending that next video that you’ve just got to check out before you get back to work? Does Amazon seem to know what you’ll want to buy before you do?
If so, then you have been the beneficiary (or victim, depending on how you value your time, privacy, and money) of internet AI. This first wave began almost fifteen years ago but finally went mainstream around 2012. Internet AI is largely about using AI algorithms as recommendation engines: systems that learn our personal preferences and then serve up content hand-picked for us.
The horsepower of these AI engines depends on the digital data they have access to, and there’s currently no greater storehouse of this data than the major internet companies. But that data only becomes truly useful to algorithms once it has been labeled. In this case, “labeled” doesn’t mean you have to actively rate the content or tag it with a keyword. Labels simply come from linking a piece of data with a specific outcome: bought versus didn’t buy, clicked versus didn’t click, watched until the end versus switched videos. Those labels—our purchases, likes, views, or lingering moments on a web page—are then used to train algorithms to recommend more content that we’re likely to consume.
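To make the idea of "free" labels concrete, here is a minimal sketch in Python. It assumes a made-up interaction log and made-up event names (`watched_to_end`, `switched_away`); no real platform's schema or algorithm is implied. Each logged behavior becomes a label of 1 or 0, and those labels drive a very crude recommender that scores content categories by how often the user finished videos in them.

```python
from collections import defaultdict

# Hypothetical interaction log: (user, item, category, event).
# Event names are illustrative assumptions, not a real platform's schema.
LOG = [
    ("u1", "v1", "cooking", "watched_to_end"),
    ("u1", "v2", "sports", "switched_away"),
    ("u1", "v3", "cooking", "watched_to_end"),
    ("u1", "v4", "news", "switched_away"),
    ("u1", "v5", "sports", "watched_to_end"),
]

# Outcomes that count as a positive label.
POSITIVE_EVENTS = {"watched_to_end", "clicked", "bought"}


def to_labeled_examples(log):
    """Turn raw behavior into (user, category, label) rows; label is 1 or 0."""
    return [
        (user, category, 1 if event in POSITIVE_EVENTS else 0)
        for user, _item, category, event in log
    ]


def category_scores(examples, user):
    """Estimate P(positive outcome | category) for one user, with add-one smoothing."""
    positives = defaultdict(int)
    totals = defaultdict(int)
    for u, category, label in examples:
        if u == user:
            positives[category] += label
            totals[category] += 1
    return {c: (positives[c] + 1) / (totals[c] + 2) for c in totals}


def recommend(candidates, scores, default=0.5):
    """Rank candidate (item, category) pairs by the learned category score."""
    return sorted(candidates, key=lambda ic: scores.get(ic[1], default), reverse=True)


examples = to_labeled_examples(LOG)
scores = category_scores(examples, "u1")
ranked = recommend([("v6", "news"), ("v7", "cooking"), ("v8", "sports")], scores)
print(ranked)
```

In this toy log the user finished both cooking videos, so the cooking candidate is ranked first. Real recommendation engines learn from far richer signals, but the labeling step is the same in spirit: the user never rates anything, and the behavior itself is the label.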
Average people experience this as the internet “getting better”—that is, at giving us what we want—and becoming more addictive as it goes. But it’s also proof of the power of AI to learn about us through data and then optimize for what we desire. That optimization has been translated into massive increases in profits for established internet companies that make money off our clicks: the Googles, Baidus, Alibabas, and YouTubes of the world. Using internet AI, Alibaba can recommend products you’re more likely to buy, Google can target you with ads you’re more likely to click on, and YouTube can suggest videos that you’re more likely to watch. Adopting those same methods in a different context, a company like Cambridge Analytica used Facebook data to better understand and target American voters during the 2016 presidential campaign. Revealingly, it was Robert Mercer, founder of Cambridge Analytica, who reportedly coined the famous phrase, “There’s no data like more data.”
ALGORITHMS AND EDITORS
First-wave AI has given birth to entirely new, AI-driven internet companies. China’s leader in this category is Jinri Toutiao (meaning “today’s headlines”; English name: “ByteDance”). Founded in 2012, Toutiao is sometimes called “the BuzzFeed of China” because both sites serve as hubs for timely viral stories. But virality is where the similarities stop. BuzzFeed is built on a staff of young editors with a knack for cooking up original content. Toutiao’s “editors” are algorithms.
Toutiao’s AI engines trawl the internet for content, using natural-language processing and computer vision to digest articles and videos from a vast network of partner sites and commissioned contributors. It then uses the past behavior of its users—their clicks, reads, views, comments, and so on—to curate a highly personalized newsfeed tailored to each person’s interests. The app’s algorithms even rewrite headlines to optimize for user clicks. And the more those users click, the better Toutiao becomes at recommending precisely the content they want to see. It’s a positive feedback loop that has created one of the most addictive content platforms on the internet, with users spending an average of seventy-four minutes per day in the app.
ROBOT REPORTS AND FAKE NEWS
Reaching beyond simple curation, Toutiao also uses machine learning to create and police its content. During the 2016 Summer Olympics in Rio de Janeiro, Toutiao worked with Peking University to create an AI “reporter” that wrote short articles summing up sports events within minutes of the final whistle. The writing wasn’t exactly poetry, but the speed was incredible: the “reporter” produced short summaries within two seconds of some events’ finish, and it “covered” over thirty events per day.
Algorithms are also being used to sniff out “fake news” on the platform, often in the form of bogus medical treatments. Originally, readers discovered and reported misleading stories—essentially, free labeling of that data. Toutiao then used that labeled data to train an algorithm that could identify fake news in the wild. Toutiao even trained a separate algorithm to write fake news stories. It then pitted those two algorithms against each other, competing to fool one another and improving both in the process.
This AI-driven approach to content is paying off. By late 2017, Toutiao was already valued at $20 billion and went on to raise a new round of funding that would value it at $30 billion, dwarfing the $1.7 billion valuation for BuzzFeed at the time. For 2018, Toutiao projected revenues between $4.5 and $7.6 billion. And the Chinese company is rapidly working to expand overseas. After trying and failing in 2016 to buy Reddit, the popular U.S. aggregation and discussion site, in 2017 Toutiao snapped up a France-based news aggregator and Musical.ly, a Chinese video lip-syncing app that’s wildly popular with American teens.
Toutiao is just one company, but its success is indicative of China’s strength in internet AI. With more than 700 million internet users all digesting content in the same language, China’s internet juggernauts are reaping massive rewards from optimizing online services with AI. That has helped fuel the rapid rise of Tencent’s market cap—surpassing Facebook in November 2017 and becoming the first Chinese company to top $500 billion—and has allowed Alibaba to hold its own with Amazon. Despite Baidu’s strength in AI research, its mobile services lagged far behind Google. But that gap is more than made up for by upstarts like Toutiao, Chinese companies that are generating multibillion-dollar valuations by building their business foundation on internet AI. Massive profits will accrue to these internet companies as they become even better at holding our attention longer and harvesting our clicks.
Overall, Chinese and American companies are on about equal footing in internet AI, with around 50–50 odds of leadership based on current technology. I predict that in five years’ time, Chinese technology companies will have a slight advantage (60–40) when it comes to leading the world in internet AI and reaping the richest rewards from its implementation. Remember, China alone has more internet users than the United States and all of Europe combined, and those users are empowered to make frictionless mobile payments to content creators, O2O platforms, and other users. That combination is generating creative internet AI applications and opportunities for monetization unmatched anywhere else in the world. Add China’s tenacious and well-funded entrepreneurs into the mix, and China has a strong—but not yet decisive—edge over Silicon Valley.
But for all the economic value that the first AI wave generates, it remains largely bottled up in the high-tech sector and digital world. Bringing the optimization power of AI to bear on more traditional companies in the wider economy comes during the second wave: business AI.
SECOND WAVE: BUSINESS AI
First-wave AI leverages the fact that internet users are automatically labeling data as they browse. Business AI takes advantage of the fact that traditional companies have also been automatically labeling huge quantities of data for decades. For instance, insurance companies have been covering accidents and catching fraud, banks have been issuing loans and documenting repayment rates, and hospitals have been keeping records of diagnoses and survival rates. All of these actions generate labeled data points—a set of characteristics and a meaningful outcome—but until recently, most traditional businesses had a hard time exploiting that data for better results.
Business AI mines these databases for hidden correlations that often escape the naked eye and human brain. It draws on all the historic decisions and outcomes within an organization and uses labeled data to train an algorithm that can outperform even the most experienced human practitioners. That’s because humans normally make predictions on the basis of strong features, a handful of data points that are highly correlated to a specific outcome, often in a clear cause-and-effect relationship. For example, in predicting the likelihood of someone contracting diabetes, a person’s weight and body mass index are strong features. AI algorithms do indeed factor in these strong features, but they also look at thousands of other weak features: peripheral data points that might appear unrelated to the outcome but contain some predictive power when combined across tens of millions of examples. These subtle correlations are often impossible for any human to explain in terms of cause and effect: why do borrowers who take out loans on a Wednesday repay those loans faster? But algorithms that can combine thousands of those weak and strong features—often using complex mathematical relationships indecipherable to a human brain—will outperform even top-notch humans at many analytical business tasks.
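A small simulation can illustrate why many weak features, taken together, may beat a single strong one. The sketch below is an assumption-laden toy, not the deep learning described above: the data is synthetic, the match rates (75 percent for the strong feature, 55 percent for each of 200 weak ones) are invented, and the "algorithm" is a simple weighted vote whose weights are learned from how often each feature agreed with the outcome in training data.

```python
import math
import random

random.seed(0)  # fixed seed so the sketch is repeatable

N_WEAK = 200     # number of weak features (illustrative assumption)
P_STRONG = 0.75  # the strong feature matches the outcome 75% of the time
P_WEAK = 0.55    # each weak feature matches only 55% of the time


def make_example():
    """Return (features, outcome); feature 0 is strong, the rest are weak."""
    y = random.randint(0, 1)
    feats = [y if random.random() < P_STRONG else 1 - y]
    feats += [y if random.random() < P_WEAK else 1 - y for _ in range(N_WEAK)]
    return feats, y


def fit_weights(data):
    """Learn one log-odds weight per feature from its agreement rate with the outcome."""
    n = len(data)
    weights = []
    for j in range(len(data[0][0])):
        agree = sum(1 for x, y in data if x[j] == y)
        p = (agree + 1) / (n + 2)  # smoothed agreement rate
        weights.append(math.log(p / (1 - p)))
    return weights


def predict(weights, x, use):
    """Weighted vote over the feature indices in `use`."""
    score = sum(weights[j] * (1 if x[j] == 1 else -1) for j in use)
    return 1 if score > 0 else 0


def accuracy(weights, data, use):
    """Fraction of examples predicted correctly using only the features in `use`."""
    return sum(predict(weights, x, use) == y for x, y in data) / len(data)


train = [make_example() for _ in range(2000)]
test = [make_example() for _ in range(1000)]
w = fit_weights(train)

acc_strong = accuracy(w, test, use=[0])
acc_weak = accuracy(w, test, use=range(1, N_WEAK + 1))
acc_all = accuracy(w, test, use=range(N_WEAK + 1))
print(acc_strong, acc_weak, acc_all)
```

No single weak feature here is much better than a coin flip, yet their combined vote should comfortably outperform the strong feature alone. Note the simplification: these synthetic features are independent by construction, whereas real-world weak features overlap and interact, which is part of why more complex models are used in practice.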
Optimizations like this work well in industries with large amounts of structured data on meaningful business outcomes. In this case, “structured” refers to data that has been categorized, labeled, and made searchable. Prime examples of well-structured corporate data sets include historic stock prices, credit-card usage, and mortgage defaults.
THE BUSINESS OF BUSINESS AI
As early as 2004, companies like Palantir and IBM Watson offered big-data business consulting to companies and governments. But the widespread adoption of deep learning in 2013 turbocharged these capabilities and gave birth to new competitors, such as Element AI in Canada and 4th Paradigm in China.
These startups sell their services to traditional companies or organizations, offering to let their algorithms loose on existing databases in search of optimizations. They help these companies improve fraud detection, make smarter trades, and uncover inefficiencies in supply chains. Early instances of business AI have clustered heavily in the financial sector because it naturally lends itself to data analysis. The industry runs on well-structured information and has clear metrics that it seeks to optimize.
This is also why the United States has built a strong lead in early applications of business AI. Major American corporations already collect large amounts of data and store it in well-structured formats. They often use enterprise software for accounting, inventory, and customer relationship management. Once the data is in these formats, it’s easy for companies like Palantir to come in and generate meaningful results by applying business AI to seek out cost savings and profit maximization.
This is not so in China. Chinese companies have never truly embraced enterprise software or standardized data storage, instead keeping their books according to their own idiosyncratic systems. Those systems are often not scalable and are difficult to integrate into existing software, making the cleaning and structuring of data a far more taxing process. Poor data also makes the results of AI optimizations less robust. As a matter of business culture, Chinese companies spend far less money on third-party consulting than their American counterparts. Many old-school Chinese businesses are still run more like personal fiefdoms than modern organizations, and outside expertise isn’t considered something worth paying for.
FIRE YOUR BANKER
Both China’s corporate data and its corporate culture make applying second-wave AI to its traditional companies a challenge. But in industries where business AI can leapfrog legacy systems, China is making serious strides. In these instances, China’s relative backwardness in areas like financial services turns into a springboard to cutting-edge AI applications. One of the most promising of these is AI-powered micro-finance.
For example, when China leapfrogged credit cards to move right into mobile payments, it forgot one key piece of the consumer puzzle: credit itself. WeChat and Alipay let you draw directly from your bank account, but their core services don’t give you the ability to spend a little bit beyond your means while you’re waiting for the next paycheck.
Into this void stepped Smart Finance, an AI-powered app that relies exclusively on algorithms to make millions of small loans. Instead of asking borrowers to enter how much money they make, it simply requests access to some of the data on a potential borrower’s phone. That data forms a kind of digital fingerprint, one with an astonishing ability to predict whether the borrower will pay back a loan of three hundred dollars.
Smart Finance’s deep-learning algorithms don’t just look to the obvious metrics, like how much money is in your WeChat Wallet. Instead, they derive predictive power from data points that would seem irrelevant to a human loan officer. For instance, they consider the speed at which you typed in your date of birth, how much battery power is left on your phone, and thousands of other parameters.
What does an applicant’s phone battery have to do with creditworthiness? This is the kind of question that can’t be answered in terms of simple cause and effect. But that’s not a sign of the limitations of AI. It’s a sign of the limitations of our own minds at recognizing correlations hidden within massive streams of data. By training its algorithms on millions of loans—many that got paid back and some that didn’t—Smart Finance has discovered thousands of weak features that are correlated to creditworthiness, even if those correlations can’t be explained in a simple way humans can understand. Those offbeat metrics constitute what Smart Finance founder Ke Jiao calls “a new standard of beauty” for lending, one to replace the crude metrics of income, zip code, and even credit score.
Growing mountains of data continue to refine these algorithms, allowing the company to scale up and extend credit to groups routinely ignored by China’s traditional banking sector: young people and migrant workers. In late 2017, the company was making more than 2 million loans per month with default rates in the low single digits, a track record that makes traditional brick-and-mortar banks extremely jealous.
“THE ALGORITHM WILL SEE YOU NOW”
But business AI can be about more than dollars and cents. When applied to other information-driven public goods, it can mean a massive democratization of high-quality services to those who previously couldn’t afford them. One of the most promising of these is medical diagnosis. Top researchers in the United States like Andrew Ng and Sebastian Thrun have demonstrated excellent algorithms that are on par with doctors at diagnosing specific illnesses based on images—pneumonia through chest x-rays and skin cancer through photos. But a broader business AI application for medicine will look to handle the entire diagnosis process for a wide variety of illnesses.
Right now, medical knowledge—and thus the power to deliver accurate diagnoses—is pretty much kept bottled up within a small number of very talented humans, people with imperfect memories and limited time to keep up with new advances in the field. Sure, a vast wealth of medical information is scattered across the internet but not in a way that is navigable by most people. First-rate medical diagnosis is still heavily rationed based on geography and, quite candidly, one’s ability to pay.
This is especially stark in China, where well-trained doctors all cluster in the wealthiest cities. Travel outside of Beijing and Shanghai, and you’re likely to see a dramatic drop in the medical knowledge of doctors treating your illness. The result? Patients from all around the country try to cram into the major hospitals, lining up for days and straining limited resources to the breaking point.
Second-wave AI promises to change all of this. Underneath the many social elements of visiting a doctor, the crux of diagnosis involves collecting data (symptoms, medical history, environmental factors) and predicting the phenomena correlated with them (an illness). This act of seeking out various correlations and making predictions is exactly what deep learning excels at. Given enough training data—in this case, precise medical records—an AI-powered diagnostic tool could turn any medical professional into a super-diagnostician, a doctor with experience in tens of millions of cases, an uncanny ability to spot hidden correlations, and a perfect memory to boot.
This is what RXThinking is attempting to build. Founded by a Chinese AI researcher with deep experience in Silicon Valley and at Baidu, the startup is training medical AI algorithms to become super-diagnosticians that can be dispatched to all corners of China. Instead of replacing doctors with algorithms, RXThinking’s AI diagnosis app empowers them. It acts like a “navigation app” for the diagnosis process, drawing on all available knowledge to recommend the best route but still letting the doctors steer the car.
As the algorithm gains more information on each specific case, it progressively narrows the scope of possible illnesses and requests further clarifying information needed to complete the diagnosis. Once enough information has been entered to give the algorithm a high level of certainty, it makes a prediction for the cause of the symptoms, along with all other possible diagnoses and the percentage chance that they are the real culprit.
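The narrowing process described here can be sketched with Bayes' rule. What follows is a hypothetical illustration only: the illnesses, findings, and every probability are invented, the independence assumption is a textbook simplification, and nothing here reflects RXThinking's actual system or real medical data. The sketch ranks possible illnesses with percentage chances and, when it is not yet confident, asks for the finding that best separates the candidates.

```python
# All numbers below are invented for illustration; they are not medical data.
PRIOR = {"flu": 0.5, "cold": 0.4, "pneumonia": 0.1}

FINDINGS = ["fever", "cough", "chest_pain"]

# P(finding is present | illness), hypothetical values.
LIKELIHOOD = {
    "flu":       {"fever": 0.9, "cough": 0.8, "chest_pain": 0.1},
    "cold":      {"fever": 0.2, "cough": 0.7, "chest_pain": 0.05},
    "pneumonia": {"fever": 0.8, "cough": 0.9, "chest_pain": 0.6},
}


def posterior(findings):
    """Apply Bayes' rule, treating findings as independent given the illness.

    `findings` maps a finding name to True (present) or False (absent).
    Returns {illness: probability}, normalized to sum to 1.
    """
    unnormalized = {}
    for illness, prior in PRIOR.items():
        p = prior
        for name, present in findings.items():
            q = LIKELIHOOD[illness][name]
            p *= q if present else 1 - q
        unnormalized[illness] = p
    total = sum(unnormalized.values())
    return {k: v / total for k, v in unnormalized.items()}


def next_question(findings):
    """Pick the unasked finding whose likelihood differs most across illnesses."""
    remaining = [n for n in FINDINGS if n not in findings]
    if not remaining:
        return None

    def spread(name):
        values = [LIKELIHOOD[i][name] for i in PRIOR]
        return max(values) - min(values)

    return max(remaining, key=spread)


def diagnose(findings, threshold=0.6):
    """Return (ranked illnesses, follow-up question).

    The question is None when the top candidate clears the threshold
    or when there is nothing left to ask.
    """
    post = posterior(findings)
    ranked = sorted(post.items(), key=lambda kv: kv[1], reverse=True)
    if ranked[0][1] >= threshold:
        return ranked, None
    return ranked, next_question(findings)


ranked, question = diagnose({})
print(ranked, question)
ranked, question = diagnose({"fever": True})
print(ranked, question)
```

With no information entered, the sketch is not confident and asks about fever, the most discriminating finding in this toy table. Once fever is reported, the leading candidate rises to roughly 74 percent, and the other possibilities are still listed with their own chances so the doctor can overrule the suggestion.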
The app never overrides a doctor—who can always choose to deviate from the app’s recommendations—but it draws on over 400 million existing medical records and continually scans the latest medical publications to make recommendations. It disseminates world-class medical knowledge equally throughout highly unequal societies, and lets all doctors and nurses focus on the human tasks that no machine can do: making patients feel cared for and consoling them when the diagnosis isn’t bright.
JUDGING THE JUDGES
Similar principles are now being applied to China’s legal system, another sprawling bureaucracy with highly uneven levels of expertise across regions. iFlyTek has taken the lead in applying AI to the courtroom, building tools and executing a Shanghai-based pilot program that uses data from past cases to advise judges on both evidence and sentencing. An evidence cross-reference system uses speech recognition and natural-language processing to compare all evidence presented—testimony, documents, and background material—and seek out contradictory fact patterns. It then alerts the judge to these disputes, allowing for further investigation and clarification by court officers.
Once a ruling is handed down, the judge can turn to yet another AI tool for advice on sentencing. The sentencing assistant starts with the fact pattern—defendant’s criminal record, age, damages incurred, and so on—then its algorithms scan millions of court records for similar cases. It uses that body of knowledge to make recommendations for jail time or fines to be paid. Judges can also view similar cases as data points scattered across an X–Y graph, clicking on each dot for details on the fact pattern that led to the sentence. It’s a process that builds consistency in a system with over 100,000 judges, and it can also rein in outliers whose sentencing patterns put them far outside the mainstream. One Chinese province is even using AI to rate and rank all prosecutors on their performance. Some American courts have implemented similar algorithms to advise on the “risk” level of prisoners up for parole, though the role and lack of transparency of these AI tools have already been challenged in higher courts.
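One simple way to picture a tool like this is as a nearest-neighbor lookup. The sketch below is an assumption, not iFlyTek's method: the case records, the two-field fact pattern (prior convictions and damages), the scaling factors, and the 50 percent tolerance are all hypothetical. It finds the most similar past cases, suggests the median of their sentences, and flags a proposed sentence that strays far from that suggestion.

```python
import math
from statistics import median

# Hypothetical past cases: (prior_convictions, damages_in_thousands, sentence_months).
PAST_CASES = [
    (0, 5, 3), (0, 8, 4), (1, 10, 6), (1, 12, 8),
    (2, 20, 12), (3, 25, 18), (3, 40, 24), (5, 60, 36),
]


def distance(a, b, scales=(1.0, 10.0)):
    """Euclidean distance after dividing each field by a rough, assumed scale."""
    return math.sqrt(sum(((x - y) / s) ** 2 for x, y, s in zip(a, b, scales)))


def similar_cases(facts, k=3):
    """Return the k past cases whose fact patterns are closest to `facts`."""
    return sorted(PAST_CASES, key=lambda case: distance(facts, case[:2]))[:k]


def recommend_sentence(facts, k=3):
    """Suggest the median sentence (in months) of the k most similar cases."""
    neighbors = similar_cases(facts, k)
    return median(case[2] for case in neighbors), neighbors


def is_outlier(facts, proposed_months, tolerance=0.5):
    """Flag a proposed sentence that deviates from the suggestion by more than `tolerance`."""
    suggested, _ = recommend_sentence(facts)
    return abs(proposed_months - suggested) / suggested > tolerance


suggested, neighbors = recommend_sentence((1, 11))
print(suggested, neighbors)
print(is_outlier((1, 11), 24), is_outlier((1, 11), 7))
```

For a defendant with one prior conviction and damages of 11 thousand, the three closest toy cases carry sentences of 6, 8, and 4 months, so the suggestion is 6 months. A proposed 24 months would be flagged as far outside the pattern, while 7 months would not. The returned neighbors play the role of the dots on the judge's graph: each one can be inspected to see the facts behind its sentence.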
As with RXThinking’s “navigation system” for doctors, all of iFlyTek’s judicial tools are just that: tools that aid a real human in making informed decisions. By empowering judges with data-driven recommendations, they can help balance the scales of justice and correct for the biases present in even well-trained judges. American legal scholars have illustrated vast disparities in U.S. sentencing based on the race of the victim and the defendant. And judicial biases can be far less malicious than racism: a study of Israeli judges found them far more severe in their decisions before lunch and more lenient in granting parole after having a good meal.
WHO LEADS?
So which country will lead in the broader category of business AI? Today, the United States enjoys a commanding lead (90–10) in this wave, but I believe in five years China will close that gap somewhat (70–30), and the Chinese government has a better shot at putting the power of business AI to good use. The United States has a clear advantage in the most immediate and profitable implementations of the technology: optimizations within banking, insurance, or any industry with lots of structured data that can be mined for better decision-making. Its companies have the raw material and corporate willpower to apply business AI to the problem of maximizing their bottom line.
There’s no question that China will lag in the corporate world, but it may lead in public services and industries with the potential to leapfrog outdated systems. The country’s immature financial system and imbalanced healthcare system give it strong incentives to rethink how services like consumer credit and medical care are distributed. Business AI will turn those weaknesses into strengths as it reimagines these industries from the ground up.
These applications of second-wave AI have immediate, real-world impacts, but the algorithms themselves are still trafficking purely in digital information mediated by humans. Third-wave AI changes all of this by giving AI two of humans’ most valuable information-gathering tools: eyes and ears.
THIRD WAVE: PERCEPTION AI
Before AI, all machines were deaf and blind. Sure, you could take digital photos or make audio recordings, but these merely reproduced our audio and visual environments for humans to interpret—the machines themselves couldn’t make sense of these reproductions. To a normal computer, a photograph is just a meaningless splattering of pixels it must store. To an iPhone, a song is just a series of zeros and ones that it must play for a human to enjoy.
This all changed with the advent of perception AI. Algorithms can now group the pixels from a photo or video into meaningful clusters and recognize objects in much the same way our brain does: golden retriever, traffic light, your brother Patrick, and so on. The same goes for audio data. Instead of merely storing audio files as collections of digital bits, algorithms can now both pick out words and often parse the meaning of full sentences.
Third-wave AI is all about extending and expanding this power throughout our lived environment, digitizing the world around us through the proliferation of sensors and smart devices. These devices are turning our physical world into digital data that can then be analyzed and optimized by deep-learning algorithms. Amazon Echo is digitizing the audio environment of people’s homes. Alibaba’s City Brain is digitizing urban traffic flows through cameras and object-recognition AI. Apple’s iPhone X and Face++ cameras perform that same digitization for faces, using the perception data to safeguard your phone or digital wallet.
BLURRED LINES AND OUR “OMO” WORLD
As a result, perception AI is beginning to blur the lines separating the online and offline worlds. It does that by dramatically expanding the nodes through which we interact with the internet. Before perception AI, our interactions with the online world had to squeeze through two very narrow chokepoints: the keyboards on our computers or the screen on our smartphones. Those devices act as portals to the vast knowledge stored on the world wide web, but they are a very clunky way to input or retrieve information, especially when you’re out shopping or driving in the real world.
As perception AI gets better at recognizing our faces, understanding our voices, and seeing the world around us, it will add millions of seamless points of contact between the online and offline worlds. Those nodes will be so pervasive that it no longer makes sense to think of oneself as “going online.” When you order a full meal just by speaking a sentence from your couch, are you online or offline? When your refrigerator at home tells your shopping cart at the store that you’re out of milk, are you moving through a physical world or a digital one?
I call these new blended environments OMO: online-merge-offline. OMO is the next step in an evolution that already took us from pure e-commerce deliveries to O2O (online-to-offline) services. Each of those steps has built new bridges between the online world and our physical one, but OMO constitutes the full integration of the two. It brings the convenience of the online world offline and the rich sensory reality of the offline world online. Over the coming years, perception AI will turn shopping malls, grocery stores, city streets, and our homes into OMO environments. In the process, it will produce some of the first applications of artificial intelligence that will feel truly futuristic to the average user.
Some of these are already here. KFC in China recently teamed up with Alipay to pioneer a pay-with-your-face option at some stores. Customers place their own order at a digital terminal, and a quick facial scan connects their order to their Alipay account—no cash, cards, or cell phones required. The AI powering the machines even runs a quick “liveness algorithm” to ensure no one can use a photograph of someone else’s face to pay for a meal.
Pay-with-your-face applications are fun, but they are just the tip of the OMO iceberg. To get a sense of where things are headed, let’s take a quick trip just a few years into the future to see what a supermarket fully outfitted with perception AI devices might look like.
“WHERE EVERY SHOPPING CART KNOWS YOUR NAME”
“Nihao, Kai-Fu! Welcome back to Yonghui Superstore!”
It’s always a nice feeling when your shopping cart greets you like an old friend. As I pull the cart back from the rack, visual sensors embedded in the handlebar have already completed a scan of my face and matched it to a rich, AI-driven profile of my habits, as a foodie, a shopper, and a husband to a fantastic cook of Chinese food. While I’m racking my brain for what groceries we’ll need this week, a screen on the handlebar lights up.
“On the screen is a list of your typical weekly grocery purchase,” the cart announces. And like that, our family’s staple list of groceries appears on the screen: fresh eggplant, Sichuan pepper, Greek yogurt, skim milk, and so on.
My refrigerator and cabinets have already detected what items we’re short on this week, and they automatically ordered the nonperishable staples—rice, soy sauce, cooking oil—for bulk delivery. That means grocery stores like Yonghui can tailor their selection around the items you’d want to pick out for yourself: fresh produce, unique wines, live seafood. It also allows the supermarkets to dramatically shrink their stores’ footprint and place smaller stores within walking distance of most homes.
“Let me know if there’s anything you’d like to add or subtract from the list,” the cart chimes in. “Based on what’s in your cart and your fridge at home, it looks like your diet will be short on fiber this week. Shall I add a bag of almonds or ingredients for a split-pea soup to correct that?”
“No split-pea soup, but have a large bag of almonds delivered to my house, thanks.” I’m not sure an algorithm requires thanking, but I do it out of habit. Scanning the list, I make a couple of tweaks. My daughters are out of town, so I can cut a few items, and I’ve already got some beef in my fridge, so I decide to make my mother’s recipe of beef noodles for my wife.
“Subtract the Greek yogurt and switch to whole milk from now on. Also, add the ingredients for beef noodles that I don’t already have at home.”
“No problem,” it replies while adjusting my shopping list. The cart is speaking in Mandarin, but in the synthesized voice of my favorite actress, Jennifer Lawrence. It’s a nice touch, and one of the reasons running errands doesn’t feel like such a chore anymore.
The cart moves autonomously through the store, staying a few steps ahead of me while I pick out the ripest eggplants and the most fragrant Sichuan peppercorns, key to creating the numbing spice in the beef noodles. The cart then leads me to the back of the store where a precision-guided robot kneads and pulls fresh noodles for me. As I place them in the cart, depth-sensing cameras on the cart’s rim recognize each item, and sensors lining the bottom weigh them as they go in.
The screen crosses things off as I go and displays the total cost. The precise location and presentation of every item has been optimized based on perception and purchase data gathered at the store: What displays do shoppers walk right by? Where do they stop and pick up items to inspect? And which of those do they finally purchase? That matrix of visual and business data gives AI-enabled supermarkets the same kind of rich understanding of consumer behavior that was previously reserved for online retailers.
As I round the corner toward the wine aisle, a friendly young man in a concierge uniform approaches.
“Hi, Mr. Lee, how’ve you been?” he says. “We’ve just got in a shipment of some fantastic Napa wines. I understand that your wife’s birthday is coming up, and we wanted to offer you a 10 percent discount on your first purchase of the 2014 Opus One. Your wife normally goes for Overture, and this is the premium offering from that same winery. It has some wonderful flavors, hints of coffee and even dark chocolate. Would you like a tasting?”
He knows my weakness for California wines, and I take him up on the offer. It’s indeed fantastic.
“I love it,” I say, handing the wineglass to the young man. “I’ll take two bottles.”
“Excellent choice—you can continue with your shopping, and I’ll bring those bottles to you in just a moment. If you’d like to schedule regular deliveries to your home or need recommendations on what else to try, you can find those in the Yonghui app or with me here.”
All the concierges are knowledgeable, friendly, and trained in the art of the upsell. It’s far more socially engaged work than traditional supermarket jobs, with all employees ready to discuss recipes, farm-to-table sourcing, and how each product compares with what I’ve tried in the past.
The shopping trip goes on like this, with my cart leading me through our typical purchases, and concierges occasionally nudging me to splurge on items that algorithms predict I’ll like. As a concierge is bagging my goods, my phone buzzes with this trip’s receipt in my WeChat Wallet. When they’re finished, the shopping cart guides itself back to its rack, and I stroll the two blocks home to my family.
Perception AI–powered shopping trips like this will capture one of the fundamental contradictions of the AI age before us: it will feel both completely ordinary and totally revolutionary. Much of our daily activity will still follow our everyday established patterns, but the digitization of the world will eliminate common points of friction and tailor services to each individual. They will bring the convenience and abundance of the online world into our offline reality. Just as important, by understanding and predicting the habits of each shopper, these stores will make major improvements in their supply chains, reducing food waste and increasing profitability.
And a supermarket like the one I’ve described isn’t far off. The core technologies already exist, and it’s largely a matter now of working out minor kinks in the software, integrating the back end of the supply chain, and building out the stores themselves.
AN OMO-POWERED EDUCATION
These kinds of immersive OMO scenarios go far beyond shopping. These same techniques—visual identification, speech recognition, creation of a detailed profile based on one’s past behavior—can be used to create a highly tailored experience in education.
Present-day education systems are still largely run on the nineteenth-century “factory model” of education: all students are forced to learn at the same speed, in the same way, at the same place, and at the same time. Schools take an “assembly line” approach, passing children from grade to grade each year, largely irrespective of whether or not they absorbed what was taught. It’s a model that once made sense given the severe limitations on teaching resources, namely, the time and attention of someone who can teach, monitor, and evaluate students.
But AI can help us lift those limitations. The perception, recognition, and recommendation abilities of AI can tailor the learning process to each student and also free up teachers for more one-on-one instruction time.
The AI-powered education experience takes place across four scenarios: in-class teaching, homework and drills, tests and grading, and customized tutoring. Performance and behavior in these four settings all feed into and build off of the bedrock of AI-powered education, the student profile. That profile contains a detailed accounting of everything that affects a student’s learning process, such as what concepts they already grasp well, what they struggle with, how they react to different teaching methods, how attentive they are during class, how quickly they answer questions, and what incentives drive them. To see how this data is gathered and used to upgrade the education process, let’s look at the four scenarios described above.
During in-class teaching, schools will employ a dual-teacher model that combines a remote broadcast lecture from a top educator and more personal attention by the in-class teacher. For the first half of class, a top-rated teacher delivers a lecture via a large-screen television at the front of the class. That teacher lectures simultaneously to around twenty classrooms and asks questions that students must answer via handheld clickers, giving the lecturer real-time feedback on whether students comprehend the concepts.
During the lecture, a video conference camera at the front of the room uses facial recognition and posture analysis to take attendance, check for student attentiveness, and assess the level of understanding based on gestures such as nodding, shaking one’s head, and expressions of puzzlement. All of this data—answers to clicker questions, attentiveness, comprehension—goes directly into the student profile, filling in a real-time picture of what the students know and what they need extra help with.
But in-class learning is just a fraction of the whole AI-education picture. When students head home, the student profile combines with question-generating algorithms to create homework assignments precisely tailored to the students’ abilities. While the whiz kids must complete higher-level problems that challenge them, the students who have yet to fully grasp the material are given more fundamental questions and perhaps extra drills.
At each step along the way, students’ time and performance on different problems feed into their student profiles, adjusting the subsequent problems to reinforce understanding. In addition, for classes such as English (which is mandatory in Chinese public schools), AI-powered speech recognition can bring top-flight English instruction to the most remote regions. High-performance speech recognition algorithms can be trained to assess students’ English pronunciation, helping them improve intonation and accent without the need for a native English speaker on site.
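The feedback loop described above, in which performance updates the profile and the profile shapes the next assignment, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the `StudentProfile` class, the 0.5 starting estimate, the update rate, and the difficulty thresholds are hypothetical and do not describe any real education product. For simplicity it tracks only whether answers were correct, not the time spent on each problem.

```python
from dataclasses import dataclass, field


@dataclass
class StudentProfile:
    """Hypothetical profile: concept name -> estimated mastery in [0, 1]."""
    mastery: dict = field(default_factory=dict)

    def record_answer(self, concept: str, correct: bool, rate: float = 0.3) -> None:
        # Nudge the estimate toward 1 after a correct answer, toward 0 after a miss.
        # Unseen concepts start at 0.5, an assumed "no information yet" value.
        old = self.mastery.get(concept, 0.5)
        target = 1.0 if correct else 0.0
        self.mastery[concept] = old + rate * (target - old)


def pick_difficulty(profile: StudentProfile, concept: str) -> str:
    # The cutoffs below are arbitrary illustrative thresholds.
    level = profile.mastery.get(concept, 0.5)
    if level < 0.4:
        return "fundamental"
    if level < 0.75:
        return "standard"
    return "challenge"


# A student who misses two questions in a row is routed to fundamentals.
struggling = StudentProfile()
for _ in range(2):
    struggling.record_answer("fractions", correct=False)

# A student who answers three in a row correctly is given harder problems.
whiz = StudentProfile()
for _ in range(3):
    whiz.record_answer("fractions", correct=True)

print(pick_difficulty(struggling, "fractions"))
print(pick_difficulty(whiz, "fractions"))
```

A real system would presumably draw on far richer signals, such as response time, attentiveness, and reactions to different teaching methods, but the basic shape is the same: each answer adjusts the profile, and the profile selects the next problem.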
From a teacher’s perspective, these same tools can be used to alleviate the burden of routine grading tasks, freeing up teachers to spend more time on the students themselves. Chinese companies have already used perception AI’s visual recognition abilities to build scanners that can grade multiple-choice and fill-in-the-blank tests. Even in essays, standard errors such as spelling or grammar can be marked automatically, with predetermined deductions of points for certain mistakes. This AI-powered technology will save teachers’ time in correcting the basics, letting them shift that time to communicating with students about higher-level writing concepts.
Finally, for students who are falling behind, the AI-powered student profile will notify parents of their child’s situation, giving a clear and detailed explanation of what concepts the student is struggling with. The parents can use this information to enlist a remote tutor through services such as VIPKid, which connects American teachers with Chinese students for online English classes. Remote tutoring has been around for some time, but perception AI now allows these platforms to continuously gather data on student engagement through expression and sentiment analysis. That data continually feeds into a student’s profile, helping the platforms filter for the kinds of teachers that keep students engaged.
Almost all of the tools described here already exist, and many are being implemented in different classrooms across China. Taken together, they constitute a new AI-powered paradigm for education, one that merges the online and offline worlds to create a learning experience tailored to the needs and abilities of each student. China appears poised to leapfrog the United States in education AI, in large part due to voracious demand from Chinese parents. Chinese parents of only children pour money into their education, a result of deeply entrenched Chinese values, intense competition for university spots, and a public education system of mixed quality. Those parents have already driven services like VIPKid to a valuation of over $3 billion in just a few years’ time.
PUBLIC SPACES AND PRIVATE DATA
Creating and leveraging these OMO experiences requires vacuuming up oceans of data from the real world. Optimizing traffic flows via Alibaba’s City Brain requires slurping up video feeds from around the city. Tailoring OMO retail experiences for each shopper requires identifying them via facial recognition. And accessing the power of the internet via voice commands requires technology that listens to our every word.
That type of data collection may rub many Americans the wrong way. They don’t want Big Brother or corporate America to know too much about what they’re up to. But people in China are more accepting of having their faces, voices, and shopping choices captured and digitized. This is another example of the broader Chinese willingness to trade some degree of privacy for convenience. That surveillance filters up from individual users to entire urban environments. Chinese cities already use a dense network of cameras and sensors to enforce traffic laws. That web of surveillance footage is now feeding directly into optimization algorithms for traffic management, policing, and emergency services.
It’s up to each country to make its own decisions on how to balance personal privacy and public data. Europe has taken the strictest approach to data protection by introducing the General Data Protection Regulation, a law that sets a variety of restrictions on the collection and use of data within the European Union. The United States continues to grapple with implementing appropriate protections for user privacy, a tension illustrated by Facebook’s Cambridge Analytica scandal and subsequent congressional hearings. China began implementing its own Cybersecurity Law in 2017, which included new punishments for the illegal collection or sale of user data.
There’s no right answer to questions about what level of social surveillance is a worthwhile price for greater convenience and safety, or what level of anonymity we should be guaranteed at airports or subway stations. But in terms of immediate impact, China’s relative openness with data collection in public places is giving it a massive head start on implementation of perception AI. It is accelerating the digitization of urban environments and opening the door to new OMO applications in retail, security, and transportation.
But pushing perception AI into these spheres requires more than just video cameras and digital data. Unlike internet and business AI, perception AI is a hardware-heavy enterprise. As we turn hospitals, cars, and kitchens into OMO environments, we will need a diverse array of sensor-enabled hardware devices to sync up the physical and digital worlds.
MADE IN SHENZHEN
Silicon Valley may be the world champion of software innovation, but Shenzhen (pronounced “shun-jun”) wears that crown for hardware. In the last five years, this young manufacturing metropolis on China’s southern coast has turned into the world’s most vibrant ecosystem for building intelligent hardware. Creating an innovative app requires almost no real-world tools: all you need is a computer and a programmer with a clever idea. But building the hardware for perception AI—shopping carts with eyes and stereos with ears—demands a powerful and flexible manufacturing ecosystem, including sensor suppliers, injection-mold engineers, and small-batch electronics factories.
When most people think of Chinese factories, they envision sweatshops with thousands of underpaid workers stitching together cheap shoes and teddy bears. These factories do still exist, but the Chinese manufacturing ecosystem has undergone a major technological upgrade. Today, the greatest advantage of manufacturing in China isn’t the cheap labor—countries like Indonesia and Vietnam offer lower wages. Instead, it’s the unparalleled flexibility of the supply chains and the armies of skilled industrial engineers who can make prototypes of new devices and build them at scale.
These are the secret ingredients powering Shenzhen, whose talented workers have transformed it from a dirt-cheap factory town to a go-to city for entrepreneurs who want to build new drones, robots, wearables, or intelligent machines. In Shenzhen, those entrepreneurs have direct access to thousands of factories and hundreds of thousands of engineers who help them iterate faster and produce goods cheaper than anywhere else.
At the city’s dizzying electronics markets, they can choose from thousands of different variations of circuit boards, sensors, microphones, and miniature cameras. Once a prototype is assembled, the builders can go door to door at hundreds of factories to find one capable of producing their product in small batches or at large scale. That geographic density of parts suppliers and product manufacturers accelerates the innovation process. Hardware entrepreneurs say that a week spent working in Shenzhen is equivalent to a month in the United States.
As perception AI transforms our lived environment, the ease of experimenting with and producing smart devices gives Chinese startups an edge. Shenzhen is open to international hardware startups, but locals have a heavy home-court advantage. The many frictions of operating in a foreign country (language barrier, visa issues, tax complications, and distance from headquarters) can slow down American startups and raise the cost of their products. Massive multinationals like Apple have the resources to leverage Chinese manufacturing to the fullest, but for foreign startups small frictions can spell doom. Meanwhile, homegrown hardware startups in Shenzhen are like kids in a candy store, experimenting freely and building cheaply.
MI FIRST
The Chinese hardware startup Xiaomi (pronounced “sheow-me”) gives a glimpse of what a densely woven web of perception-AI devices could look like. Launched as a low-cost smartphone maker that took the country by storm, Xiaomi is now building a network of AI-empowered home devices that will turn our kitchens and living rooms into OMO environments.
Central to that system is the Mi AI speaker, a voice-command AI device similar to the Amazon Echo but at around half the price, thanks to the Chinese home-court manufacturing advantage. That advantage is then leveraged to build a range of smart, sensor-driven home devices: air purifiers, rice cookers, refrigerators, security cameras, washing machines, and autonomous vacuum cleaners. Xiaomi doesn’t build all of these devices itself. Instead, it has invested in 220 companies and incubated 29 startups—many operating in Shenzhen—whose intelligent home products are hooked into the Xiaomi ecosystem. Together they are creating an affordable, intelligent home ecosystem, with WiFi-enabled products that find each other and make configuration easy. Xiaomi users can then simply control the entire ecosystem via voice command or directly on their phone.
It’s a constellation of price, diversity, and capability that has created the world’s largest network of intelligent home devices: 85 million by the end of 2017, far ahead of any comparable U.S. networks. It’s also an ecosystem built on the Made-in-Shenzhen advantage. Low prices and China’s massive market are turbocharging the data-gathering process for Xiaomi, fueling a virtuous cycle of stronger algorithms, smarter products, better user experience, more sales, and even more data. That cycle has already produced four unicorn startups within Xiaomi’s ecosystem alone and is driving Xiaomi toward an IPO predicted to value the company at around $100 billion.
As perception AI finds its way into more pieces of hardware, the entire home will feed into and operate off digitized real-world data. Your AI fridge will order more milk when it sees that you’re running low. Your cappuccino machine will kick into gear at your voice command. The AI-equipped floors in your elderly parents’ home will alert you immediately if they’ve tripped and fallen.
Third-wave AI products like these are on the verge of transforming our everyday environment, blurring lines between the digital and physical world until they disappear entirely. During this transformation, Chinese users’ cultural nonchalance about data privacy and Shenzhen’s strength in hardware manufacturing give China a clear edge in implementation. Today, China’s edge is slight (60–40), but I predict that in five years’ time, the above factors will give China a more than 80–20 chance of leading the United States and the rest of the world in the implementation of perception AI.
These third-wave AI innovations will create tremendous economic opportunities and also lay the foundation for the fourth and final wave, full autonomy.
FOURTH WAVE: AUTONOMOUS AI
Once machines can see and hear the world around them, they’ll be ready to move through it safely and work in it productively. Autonomous AI represents the integration and culmination of the three preceding waves, fusing machines’ ability to optimize from extremely complex data sets with their newfound sensory powers. Combining these superhuman powers yields machines that don’t just understand the world around them—they shape it.
Self-driving cars may be on everyone’s mind these days, but before we dive into autonomous vehicles, it’s important to widen the lens and recognize just how deep and wide a footprint fourth-wave AI will have. Autonomous AI devices will revolutionize so much of our daily lives, including our malls, restaurants, cities, factories, and fire departments. As with the different waves of AI, this won’t happen all at once. Early autonomous robotics applications will work only in highly structured environments where they can create immediate economic value. That means primarily factories, warehouses, and farms.
But aren’t these places already highly automated? Hasn’t heavy machinery already taken over many blue-collar line jobs? Yes, the developed world has largely replaced raw human muscle with high-powered machines. But while these machines are automated, they are not autonomous. While they can repeat an action, they can’t make decisions or improvise according to changing conditions. Entirely blind to visual inputs, they must be controlled by a human or operate on a single, unchanging track. They can perform repetitive tasks, but they can’t deal with any deviations or irregularities in the objects they manipulate. But by giving machines the power of sight, the sense of touch, and the ability to optimize from data, we can dramatically expand the number of tasks they can tackle.
STRAWBERRY FIELDS AND ROBOTIC BEETLES
Some of these applications are already at hand. Picking strawberries sounds like a straightforward task, but the ability to find, judge, and pluck fruits from plants proved impossible to automate before autonomous AI. Instead, tens of thousands of low-paid workers had to walk, hunched over, through strawberry fields all day, using their eyes and dexterous fingers to get the job done. It’s grueling and tedious work, and many California farmers have watched fruit rot in their fields when they can’t find people willing to take it on.
But the California-based startup Traptic has created a robot that can handle the task. The device is mounted on the back of a small tractor (or, in the future, an autonomous vehicle) and uses advanced vision algorithms to find the strawberries amid a sea of foliage. Those same algorithms check the color of the fruit to judge ripeness, and a machine arm delicately plucks them without any damage to the berry.
Amazon’s warehouses give us an early glimpse of how transformative these technologies can be. Just five years ago, they looked like traditional warehouses: long aisles of stationary shelves with humans walking or driving down the aisles to fetch inventory. Today, the humans stay put and the shelves come to them. Warehouses are covered with roving bands of autonomous beetle-like robots that scurry around with square-shaped towers of merchandise sitting on their backs. These beetles roam the warehouse floor, narrowly avoiding one another and bringing a handful of items to stationary humans when they need those goods. All the employees need to do is grab an item off that tower, scan it, and place it in a box. The humans stand in one place while the warehouse performs an elegantly choreographed autonomous ballet all around them.
All of these autonomous robots have one thing in common: they create direct economic value for their owners. As noted, autonomous AI will surface first in commercial settings because these robots create a tangible return on investment by doing the jobs of workers who are growing either more expensive or harder to find.
Domestic workers in the United States—cleaners, cooks, and caretakers—largely fit those criteria as well, but we’re unlikely to see autonomous AI in the home any time soon. Counter to what sci-fi films have conditioned us to believe, human-like robots for the home remain out of reach. Seemingly simple tasks like cleaning a room or babysitting a child are far beyond AI’s current capabilities, and our cluttered living environments constitute obstacle courses for clumsy robots.
SWARM INTELLIGENCE
But as autonomous technology becomes more agile and more intelligent, we will see some mind-bending and life-saving applications of the technology, particularly with drones. Swarms of autonomous drones will work together to paint the exterior of your house in just a few hours. Heat-resistant drone swarms will fight forest fires with hundreds of times the current efficiency of traditional fire crews. Other drones will perform search-and-rescue operations in the aftermath of hurricanes and earthquakes, bringing food and water to the stranded and teaming up with nearby drones to airlift people out.
Along these lines, China will almost certainly take the lead in autonomous drone technology. Shenzhen is home to DJI, the world’s premier drone maker and what renowned tech journalist Chris Anderson called “the best company I have ever encountered.” DJI is estimated to already own 50 percent of the North American drone market and even larger portions of the high-end segment. The company dedicates enormous resources to research and development, and is already deploying some autonomous drones for industrial and personal use. Swarm technologies are still in their infancy, but when hooked into Shenzhen’s unmatched hardware ecosystem, the results will be awe-inspiring.
As these swarms transform our skies, autonomous cars will transform our roads. That revolution will also go far beyond transportation, disrupting urban environments, labor markets, and how we organize our days. Companies like Google have clearly demonstrated that self-driving cars will be far safer and more efficient than human drivers. Right now, dozens of startups, technology juggernauts, legacy carmakers, and electric vehicle makers are in an all-out sprint to be the first to truly commercialize the technology. Google, Baidu, Uber, Didi, Tesla, and many more are building teams, testing technologies, and gathering data en route to taking human drivers entirely out of the equation.
The leaders in that race—Google, through its self-driving spinoff Waymo, and Tesla—represent two different philosophies for autonomous deployment, two approaches with eerie echoes in the policies of the two AI superpowers.
THE GOOGLE APPROACH VERSUS THE TESLA APPROACH
Google was the first company to develop autonomous driving technology, but it has been relatively slow to deploy that technology at scale. Behind that caution is an underlying philosophy: build the perfect product and then make the jump straight to full autonomy once the system is far safer than human drivers. It’s the approach of a perfectionist, one with a very low tolerance for risk to human lives or corporate reputation. It’s also a sign of how large a lead Google has on the competition due to its multiyear head start on research. Tesla has taken a more incremental approach in an attempt to make up ground. Elon Musk’s company has tacked on limited autonomous features to its cars as soon as they became available: autopilot for highways, autosteer for crash avoidance, and self-parking capabilities. It’s an approach that accelerates speed of deployment while also accepting a certain level of risk.
The two approaches are powered by the same thing that powers AI: data. Self-driving cars must be trained on millions, maybe billions, of miles of driving data so they can learn to identify objects and predict the movements of cars and pedestrians. That data draws from thousands of different vehicles on the road, and it all feeds into one central “brain,” the core collection of algorithms that powers decision-making across the fleet. It means that when any autonomous car encounters a new situation, all the cars running on those algorithms learn from it.
Google has taken a slow-and-steady approach to gathering that data, driving around its own small fleet of vehicles equipped with very expensive sensing technologies. Tesla instead began installing cheaper equipment on its commercial vehicles, letting Tesla owners gather the data on its behalf when they use certain autonomous features. The different approaches have led to a massive data gap between the two companies. By 2016, Google had taken six years to accumulate 1.5 million miles of real-world driving data. In just six months, Tesla had accumulated 47 million miles.
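To put the two figures on a common footing, the short calculation below converts each into miles per month. The mileage totals and time spans are the ones quoted above. The per-month normalization and the variable names are an illustration only, and it assumes, for simplicity, that each company gathered data at a steady pace over its period.

```python
# Figures quoted in the text; the per-month normalization is an illustration.
GOOGLE_MILES = 1_500_000   # accumulated over six years (by 2016)
GOOGLE_MONTHS = 6 * 12
TESLA_MILES = 47_000_000   # accumulated over six months
TESLA_MONTHS = 6


def miles_per_month(total_miles: float, months: int) -> float:
    # Average rate, assuming data was gathered evenly across the period.
    return total_miles / months


google_rate = miles_per_month(GOOGLE_MILES, GOOGLE_MONTHS)
tesla_rate = miles_per_month(TESLA_MILES, TESLA_MONTHS)
ratio = tesla_rate / google_rate

print(f"Google: about {google_rate:,.0f} miles per month")
print(f"Tesla:  about {tesla_rate:,.0f} miles per month")
print(f"Tesla's average rate is roughly {ratio:,.0f} times Google's")
```

On these numbers, Google averaged roughly 21,000 miles a month and Tesla nearly 8 million, a gap of several hundred to one. The comparison covers only the volume of data, not the quality of the sensors that gathered it.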
Google and Tesla are now inching toward one another in terms of approach. Google—perhaps feeling the heat from Tesla and other rivals—accelerated deployment of fully autonomous vehicles, piloting a program with taxi-like vehicles in the Phoenix metropolitan area. Meanwhile, Tesla appears to have pumped the brakes on its rapid rollout of fully autonomous vehicles, a deceleration that followed a May 2016 crash that killed a Tesla owner who was using autopilot.
But the fundamental difference in approach remains, and it presents a real tradeoff. Google is aiming for impeccable safety, but in the process it has delayed deployment of systems that could likely already save lives. Tesla takes a more techno-utilitarian approach, pushing its cars to market once they are an improvement over human drivers, hoping that the faster rates of data accumulation will train the systems earlier and save lives overall.
CHINA’S “TESLA” APPROACH
When managing a country of 1.39 billion people, one in which 260,000 people die in car accidents each year, the Chinese mentality is that you can’t let the perfect be the enemy of the good. That is, rather than wait for flawless self-driving cars to arrive, Chinese leaders will likely look for ways to deploy more limited autonomous vehicles in controlled settings. That deployment will have the side effect of driving exponential growth in the accumulation of data and a corresponding advance in the power of the AI behind it.
Key to that incremental deployment will be the construction of new infrastructure specifically made to accommodate autonomous vehicles. In the United States, in contrast, we build self-driving cars to adapt to our existing roads because we assume the roads can’t change. In China, there’s a sense that everything can change—including current roads. Indeed, local officials are already modifying existing highways, reorganizing freight patterns, and building cities that will be tailor-made for driverless cars.
Highway regulators in the Chinese province of Zhejiang have already announced plans to build the country’s first intelligent superhighway, infrastructure outfitted from the start for autonomous and electric vehicles. The plan calls for integrating sensors and wireless communication among the road, cars, and drivers to increase speeds by 20 to 30 percent and dramatically reduce fatalities. The superhighway will have photovoltaic solar panels built into the road surface, generating energy that feeds into charging stations for electric vehicles. In the long term, the goal is to be able to continuously charge electric vehicles while they drive. If successful, the project will accelerate deployment of autonomous and electric vehicles, leveraging the fact that long before autonomous AI can handle the chaos of urban driving, it can easily deal with highways and gather more data in the process.
But Chinese officials aren’t just adapting existing roads to autonomous vehicles. They’re building entirely new cities around the technology. Sixty miles south of Beijing sits the Xiong’an New Area, a collection of sleepy villages where the central government has ordered the construction of a showcase city for technological progress and environmental sustainability. The city is projected to take in $583 billion worth of infrastructure spending and reach a population of 2.5 million, nearly as many people as Chicago. The idea of building a new Chicago from the ground up is fairly unthinkable in the United States, but in China it’s just one piece of the government’s urban planning toolkit.
Xiong’an is poised to be the world’s first city built specifically to accommodate autonomous vehicles. Baidu has signed agreements with the local government to build an “AI City” with a focus on traffic management, autonomous vehicles, and environmental protection. Adaptations could include sensors in the cement, traffic lights equipped with computer vision, intersections that know the age of pedestrians crossing them, and dramatic reductions in space needed for parked cars. When everyone is hailing his or her own autonomous taxi, why not turn those parking lots into urban parks?
Taking things a step further, brand-new developments like Xiong’an could even route the traffic in their city centers underground, reserving the heart of town for pedestrians and bicyclists. It’s a system that would be difficult, if not impossible, to implement in a world of human drivers prone to human errors that clog up tunnels. But by combining augmented roads, controlled lighting, and autonomous vehicles, an entire underground traffic grid could be running at the speed of highways while life aboveground moves at a more human pace.
There’s no guarantee that all of these high-flying AI amenities will be rolled out smoothly—some of China’s technologically themed developments have flopped, and some brand-new cities have struggled to attract residents. But the central government has placed a high priority on the project, and if successful, cities like Xiong’an will grow up together with autonomous AI. They will benefit from the efficiencies AI brings and will feed ever more data back into the algorithms. America’s current infrastructure means that autonomous AI must adapt to and conquer the cities around it. In China, the government’s proactive approach is to transform that conquest into coevolution.
THE AUTONOMOUS BALANCE OF POWER
While all of this may sound exciting and innovative for the Chinese landscape, the hard truth is that no amount of government support can guarantee that China will lead in autonomous AI. When it comes to the core technology needed for self-driving cars, American companies remain two to three years ahead of China. In technology timelines, that’s light-years of distance. Part of that stems from the relative importance of elite expertise in fourth-wave AI: safety issues and sheer complexity make autonomous vehicles a much tougher engineering nut to crack. It’s a problem that requires a core team of world-class engineers rather than just a broad base of good ones. This tilts the playing field back toward the United States, where the best engineers from around the globe still cluster at companies like Google.
Silicon Valley companies also have a substantial head start on research and development, a product of the valley’s proclivity for moonshot projects. Google began testing its self-driving cars as early as 2009, and many of its engineers went on to found early self-driving startups. China’s boom in such startups really didn’t begin until around 2016. Chinese giants like Baidu and autonomous-vehicle startups like Momenta, JingChi, and Pony.ai, however, are rapidly catching up in technology and data. Baidu’s Apollo project—an open-source partnership and data-sharing arrangement among fifty autonomous-vehicle players, including chipmakers like Nvidia and automakers like Ford and Daimler—also presents an ambitious alternative to Waymo’s closed, in-house approach. But even with that rapid catch-up by Chinese players, there’s no question that as of this writing, the most experienced self-driving technologists still call America home.
Predicting which country takes the lead in autonomous AI largely comes down to one main question: will the primary bottleneck to full deployment be one of technology or policy? If the most intractable problems for deployment are merely technical ones, Google’s Waymo has the best shot at solving them years ahead of the nearest competitor. But if new advances in fields like computer vision quickly disseminate throughout the industry—essentially, a rising technical tide lifting all boats—then Silicon Valley’s head start on core technology may prove irrelevant. Many companies will become capable of building safe autonomous vehicles, and deployment will then become a matter of policy adaptation. In that universe, China’s Tesla-esque policymaking will give its companies the edge.
At this point, we just don’t yet know where that bottleneck will be, and fourth-wave AI remains anyone’s game. While today the United States enjoys a commanding lead (90–10), in five years’ time I give the United States and China even odds of leading the world in self-driving cars, with China having the edge in hardware-intensive applications such as autonomous drones. In the table below, I summarize my assessment of U.S. and Chinese capabilities across all four waves of AI, both in the present day and with my best estimate for how that balance will have evolved five years in the future.
The balance of capabilities between the United States and China across the four waves of AI, currently and estimated for five years in the future
CONQUERING MARKETS AND ARMING INSURGENTS
What happens when you try to take these game-changing AI products global? Thus far, much of the work done in AI has been contained within the Chinese and U.S. markets, with companies largely avoiding direct competition on the home turf of the other nation. But despite the fact that the United States and China are the two largest economies in the world, the vast majority of AI’s future users still live in other countries, many of them in the developing world. Any company that wants to be the Facebook or Google of the AI age needs a strategy for reaching those users and winning those markets.
Not surprisingly, Chinese and American tech companies are taking very different approaches to global markets: while America’s global juggernauts seek to conquer these markets for themselves, China is instead arming the local startup insurgents.
In other words, Silicon Valley giants like Google, Facebook, and Uber want to directly introduce their products to these markets. They’ll make limited efforts at localization but will largely stick to the traditional playbook. They will build one global product and push it out to billions of users around the globe. It’s an all-or-nothing approach with a huge potential upside if the conquest succeeds, but it also carries a high chance of leaving them empty-handed.
Chinese companies are instead steering clear of direct competition and investing in the scrappy local startups that Silicon Valley looks to wipe out. For example, in India and Southeast Asia, Alibaba and Tencent are pouring money and resources into homegrown startups that are fighting tooth and nail against juggernauts like Amazon. It’s an approach rooted in the country’s own native experience. People like Alibaba founder Jack Ma know how dangerous a ragtag bunch of insurgents can be when battling a monolithic foreign giant. So instead of seeking to both squash those startups and outcompete Silicon Valley, they’re throwing their lot in with the locals.
RIDE-HAILING RUMBLE
There are already some precedents for the Chinese approach. Ever since Didi drove Uber out of China, it has invested in and partnered with local startups fighting to do the same thing in other countries: Lyft in the United States, Ola in India, Grab in Singapore, Taxify in Estonia, and Careem in the Middle East. After investing in Brazil’s 99 Taxi in 2017, Didi outright acquired the company in early 2018. Together these startups have formed a global anti-Uber alliance, one that runs on Chinese money and benefits from Chinese know-how. After taking on Didi’s investments, some of the startups have even rebuilt their apps in Didi’s image, and others are planning to tap into Didi’s strength in AI: optimizing driver matching, automatically adjudicating rider-driver disputes, and eventually rolling out autonomous vehicles.
We don’t know the current depth of these technical exchanges, but they could serve as an alternate model of AI globalization: empower homegrown startups by marrying worldwide AI expertise to local data. It’s a model built more on cooperation than conquest, and it may prove better suited to globalizing a technology that requires both top-quality engineers and ground-up data collection.
AI has a much higher localization quotient than earlier internet services. Self-driving cars in India need to learn the way pedestrians navigate the streets of Bangalore, and micro-lending apps in Brazil need to absorb the spending habits of millennials in Rio de Janeiro. Some algorithmic training can be transferred between different user bases, but there’s no substitute for actual, real-world data.
Silicon Valley juggernauts do have some insight into the search and social habits in these countries. But building business, perception, and autonomous AI products will require companies to put real boots on the ground in each market. They will need to install hardware devices and localize AI services for the quirks of North African shopping malls and Indonesian hospitals. Projecting global power outward from Silicon Valley via computer code may not be the long-term answer.
Of course, no one knows the endgame for this global AI chess match. American companies could suddenly boost their localization efforts, leverage their existing products, and end up dominating all countries except China. Or a new generation of tenacious entrepreneurs in the developing world could use Chinese backing to create local empires impenetrable to Silicon Valley. If the latter scenario unfolds, China’s tech giants wouldn’t dominate the world, but they would play a role everywhere, improve their own algorithms using training data from many markets, and take home a substantial chunk of the profits generated.
LOOKING AHEAD
Scanning the AI horizon, we see waves of technology that will soon wash over the global economy and tilt the geopolitical landscape toward China. Traditional American companies are doing a good job of using deep learning to squeeze greater profits from their businesses, and AI-driven companies like Google remain bastions of elite expertise. But when it comes to building new internet empires, changing the way we diagnose illnesses, or reimagining how we shop, move, and eat, China seems poised to seize global leadership. Chinese and American internet companies have taken different approaches to winning local markets, and as these AI services filter out to every corner of the world, they may engage in proxy competition in countries like India, Indonesia, and parts of the Middle East and Africa.
This analysis sheds light on the emerging AI world order, but it also showcases one of the blind spots in our AI discourse: the tendency to discuss it solely as a horse race. Who’s ahead? What are the odds for each player? Who’s going to win?
This kind of competition matters, but if we dig deeper into the coming changes, we find that far weightier questions lurk just below the surface. When the true power of artificial intelligence is brought to bear, the real divide won’t be between countries like the United States and China. Instead, the most dangerous fault lines will emerge within each country, and they will possess the power to tear them apart from the inside.