What are we talking about with regard to data?
For most people, the word data may bring to mind the figures on monthly water, electricity, and heating bills, the red and green candlesticks on a stock chart, or obscure strings of source code in a computer file.
From the standpoint of artificial intelligence, the meaning of data is far more extensive. Data spans human civilization, from the initial sounds, words, pictures, and figures, to every image, speech sound, and video of the electronic age; every mouse click in the Internet age; every finger slip on a mobile phone; every heartbeat and breath; and even all human actions and trajectories in economic production.
Nowadays, human beings can turn things big and small into data records and make them part of their lives, from eternal gravitational waves to the complex subtleties of DNA. Data has seeped into every detail of our lives. Just as biologists believe that half the cells in the human body are microbial, in the digital age, half of our lives are already data.
History always spirals forward. Let us go back to the past, long before the birth of artificial intelligence, when human beings also practiced the discovery, calculation, and utilization of data.
More than five thousand years ago, ancient Egyptians discovered a law by observing the positions of the stars: the Nile began to flood each year when the wolf star (Sirius) appeared on the eastern horizon. They planned their farming accordingly and, by summarizing the cycle, arrived at a 365-day solar calendar. The distant Sirius has no causal relationship with the earth; its appearance in that position simply coincided with the earth entering a certain season. This is the predecessor of the correlation calculations of the big-data era.
More than four thousand years ago, Stonehenge appeared on the territory of today's Britain: large stones, each weighing about fifty tons, arranged in a circular array. It is a primitive timepiece. At the summer solstice, its main axis, the ancient avenue leading to the stone pillars, and the first rays of the morning sun fall on the same line; in the opposite direction, the last rays of the winter solstice also cross the stone gate. The ancients used this cumbersome stone gauge for data measurement. It was the earliest data-visualization technology.
More than two thousand years ago, Claudius Ptolemy studied the motion of celestial bodies and built the geocentric model that dominated astronomy for more than a millennium. His method is very interesting; in a nutshell, he conveyed a right idea in a wrong way. He mistakenly thought that the trajectories of celestial motion were circular, when they are in fact elliptical. To describe the actual motion of a celestial body with circle functions, he nested multiple circular motions, as many as forty. That is equivalent to fitting an overall function with multiple circular-motion functions. His work is the earliest instance of the idea of fitting functions.
What is a fitting function? When there is a lot of data, we can imagine the data as many points in a coordinate system. How do you find a function whose curve passes through as many of those points as possible? If the points are distributed regularly, such as along a line, they can be described by a linear equation.
If the points form a parabola, the function is also easy to obtain; it takes the form x² = 2py. But if the data points appear irregular, it is difficult to find a single function. The modern approach is to superimpose multiple functions to approximate the overall one: the weight of each component function is adjusted so that the superposed curve passes through as many points as possible. Ptolemy recorded a large amount of data on celestial motion and then tried to reproduce the elliptical trajectory by superimposing multiple circular functions so as to account for all the data he had recorded. The fitting-function method is suited to finding laws in large numbers of discrete data records; it is a foundation of today's artificial intelligence and the basic mathematical method of machine learning. Many modern mathematical methods existed long ago but were never put to use because computing capacity was limited.
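Ptolemy's trick of stacking many simple curves is exactly what modern least-squares fitting does. As a minimal sketch (the data here is synthetic, and NumPy is assumed), we can approximate a noisy curve by adjusting the weights of a handful of sine and cosine basis functions:

```python
import numpy as np

# Sample points drawn from an unknown curve (a noisy sine as a stand-in).
rng = np.random.default_rng(0)
x = np.linspace(0, 2 * np.pi, 50)
y = np.sin(x) + 0.1 * rng.standard_normal(x.size)

# Basis functions whose weighted superposition will approximate the curve --
# the same idea as Ptolemy stacking circular motions.
basis = np.column_stack([np.sin(k * x) for k in range(1, 5)] +
                        [np.cos(k * x) for k in range(1, 5)])

# Least squares finds the weight of each basis function so that the combined
# curve passes as close as possible to every data point.
weights, *_ = np.linalg.lstsq(basis, y, rcond=None)
fit = basis @ weights

print("max fit error:", np.abs(fit - y).max())
```

The leftover error is just the random noise that no smooth superposition should chase; the fitted weights single out the true underlying component.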
Today, humans can restore history by digital mapping. Even in a game like Minecraft (known in China as My World), the computer can calculate the angle and length of each tile and reproduce a perfect three-dimensional image of ancient walls built thousands of years ago. You feel that all the splendid history of ancient Egypt, ancient Greece, and ancient China is reconnected with us. But compared with fading gold crowns and silver belts, the ancients' wisdom in using data may be mankind's most precious inheritance.
Data civilization is progressing, but most people remain unfamiliar with data. In everyday life, the concept of data is at once familiar and strange. It is familiar because everyone learns basic arithmetic: addition, subtraction, multiplication, and division. After entering society, we inevitably deal with all kinds of documents, statements, and bills, whatever job we take. But when we face fashionable and complicated data, such as memory capacity and display resolution, we find it difficult to understand, or even to notice. With the advent of big data, machine algorithms, and artificial intelligence, this estrangement has deepened.
So is data far from our lives? On the contrary, under new technical conditions, the connection between data and daily life has never been so close. Our ancestors learned to store data in structured ways, but never as actively or as pervasively as we do today.
From calculators and cameras to home computers and smartphones, and on to big data and artificial intelligence, we are constantly upgrading the way we collect and use data. Now, from the daily carbon emissions of a car to the monitoring of global temperature, from the analysis of everyone's online comments to the prediction of voting trends in a presidential election, from predicting the rise and fall of a stock to evaluating the development of an entire economic system, everything is within reach. Data connects person with person, and people with the world, forming a dense network. Everyone is affecting the world, and everyone is being influenced by others. This dialectical relationship from micro to macro is like a quantum-mechanical phenomenon playing out across all of humanity, and it bears on countless questions. Traditional statistical methods cannot process such interaction data. So what can we do? The answer is to let the machine handle the data itself and learn from the data. This is the essence of contemporary artificial intelligence.
As early as sixty years ago, scientists were studying artificial intelligence seriously, and ordinary people took an interest in it. Yet amid the rapid development of science and technology after World War II, artificial intelligence produced few breakthroughs. That is, until today, when it has suddenly sprung up, breaking into our lives with new features such as big data, AlphaGo, and Baidu's driverless vehicles.
If we compare the technology of artificial intelligence to a heart, then that heart suffered from two congenital deficiencies. First, before the Internet, the amount of data artificial intelligence could use was too small: an "insufficient blood supply." Second, the limits of hardware left it without the computing power to solve complex problems: an "insufficient heartbeat." Data is like blood; hardware is like blood vessels. These two problems remained unsolved until the Internet advanced by leaps and bounds, the computing power of computers doubled again and again, and revolutionary changes took place in computing architecture. Now the rushing blood of data reaches every corner of the body: image recognition, speech recognition, natural-language processing. The machine's eyes and mouth are open, its ears are alert, and its heart is alive!
We are deeply immersed in data. Computers, smartphones, and various smart appliances collect our words and actions; through computational modeling they know more and more about us, and the simplest daily activities, such as reading news, exercising, eating, listening to songs, and traveling, have become grand data festivals.
A smartphone can produce 1 GB of data for its owner in a single day, roughly the total capacity of thirteen sets of the Twenty-Four Histories (a multivolume history of China). Every day, we use data to write the vast chronicle of our lives.
Unlike data records as traditionally defined, this type of data is "alive." It is neither an objective, absolute mathematical measure nor a one-to-one historical record. It is more like a natural extension of our bodies: intelligent agents listen to our speech, broaden our vision, and deepen our memories; data is even forming another "me." If the smartphone has become a new human organ, then data is the "sixth sense" received by that organ, and the new brain that processes this sixth sense is the rising artificial intelligence.
Humans have been using data for a long time, and since the industrial revolution data has become ever more commonplace. Why, then, has the concept of big data emerged only in recent years? Can big data do anything beyond recording and calculating more data? Its power comes from several characteristics.
Size
First, consider the bigness of big data. Compared with traditional data storage, big data is not larger by degrees on the same order of magnitude; it is larger geometrically. Think of the 72 billion daily positioning requests on Baidu Map, of the number of clicks across the Internet, of the words and images posted to social media every day. The data collected by big-data platforms in a single day can surpass the sum of the words and images humans accumulated over thousands of years.
Multidimensionality
Second is multidimensionality: big data can describe a thing from multiple directions and thus describe it more accurately.
In the movie Jason Bourne, a big-data company helps the US Central Intelligence Agency (CIA) quickly track and locate suspects using data collected from various dimensions, such as Internet data, traffic data, and historical archives. In reality, Palantir Technologies has helped the US government track down Osama bin Laden and has provided counterterrorism intelligence and warnings of social crises; its technology is more often used to detect financial fraud.
Take a financial credit application as an example. In traditional credit reporting, financial institutions generally collect about twenty types of data, including age, income, education, occupation, property, and borrowing history. Once they arrive at a comprehensive score for the customer's repayment ability and willingness to repay, the credit limit is determined.
Internet companies adopt big-data methods, and the number of dimensions they acquire would startle a traditional bank. BAT (Baidu, Alibaba, and Tencent) have each opened their own financial services. With comprehensive and enormous user data, they can query a customer's various online records for abnormal behaviors, such as bulk applications for loans; compare the customer's information against information from across the Internet, scrutinizing the records for known fraud patterns; and analyze the customer's consumption behaviors and habits to weigh the applicant's repayment ability against the reported income. Out of respect for user privacy, the data is not disclosed. For users, the convenience is that the wait for a credit investigation is greatly shortened: big data can retrieve and review the original information of more than ten thousand applicants in a few seconds and quickly check tens of thousands of indicator dimensions.
Examining the creditworthiness of a stranger used to resemble the parable of the blind men and the elephant: each touches only part of the animal and concludes what the whole must be like. The traditional method evaluates a customer's credit "elephant" through twenty "blind men," whose understanding may be flawed. The multidimensionality of big data is like tens of thousands of people touching the elephant simultaneously and then pooling their feedback. The more dimensions there are, the more accurate the conclusion.
Unstructured Data
Third is the ability to process unstructured data. Structured data, the most basic numbers and symbols, can be stored in a database with fixed fields, lengths, and logical structures and presented to humans as data tables (think of an ordinary Excel spreadsheet), which are very convenient to handle. The Internet era, however, has produced a huge amount of unstructured data: pictures, videos, audio, and other content that is enormous in volume but has no clear structure. An image, for instance, can be regarded only as myriad pixels in a two-dimensional matrix. Unstructured data is growing rapidly and is projected to account for 90 percent of all data within the next ten years. Big-data technology can calculate and analyze large amounts of unstructured data through image recognition, speech recognition, natural-language analysis, and other techniques, greatly increasing the dimensionality of the data.
The amount of unstructured data is far greater than that of structured data; it consumes huge resources and has broad application prospects. In the past, personal identification in public places such as airports could be verified only against identity documents provided by passengers. With face recognition, speech recognition, and other technologies, big data can check a passenger directly through a camera, adding dimensions to the judgment of personal identity and making security checks accurate and efficient.
Duration
Fourth, big data is a continuous flow, like time. It never returns to the past, just as a person cannot step into the same river twice. For one thing, the amount of data is too large to be stored in full; for another, big data tracks human actions, which are constantly changing. For this reason, Baidu's Big Data Lab proposed the concept of "spatiotemporal big data."
The map is the mother of spatiotemporal big data. Baidu Map has a road-congestion warning function: a clear road is displayed in green, a congested one in red, prompting the user to choose another route. This is a concise example of how we interact with data. Suppose there are two routes, A and B; A is congested and B is clear, so we choose B. As more and more drivers choose route B, it becomes congested, and route A clears again. Everything keeps changing. Relying on the positioning function of smartphones, Baidu Map updates its road-condition monitoring in real time and reports current conditions accurately to each user. Data-visualization techniques and various assessment methods can describe the daily pulse of a city, such as changes in commuter flow, as if watching the city breathe. Beyond what is recorded, such data is valid only in the moment. It is impossible to store it all; even the land of an entire city might not hold the hard disks. It can only be applied immediately, and it vanishes after use.
Keeping pace with real-time data is genuinely challenging. In November 2016, Baidu officially connected to the Ministry of Public Security's emergency-release platform for missing-children information. Whenever a child goes missing, Baidu Map and the mobile Baidu app forward the vital information, such as the child's name, appearance, and time of disappearance, to users in the vicinity, so that they can join the search without delay. After the child is found, Baidu Map and the mobile Baidu app also announce the case's resolution so that people can keep abreast of its progress. The less time it takes to reach users, the more hope the anxious family has.
Repetition
Last but not least, the bigness of big data lies in endless repetition. In speech recognition, people's repetition of the same statements helps the machine fully grasp human speech by repeatedly identifying its nuances; likewise, people's recurring movements help systems capture the laws of urban motion. The mathematical meaning of repetition is "exhaustion." In the past, human beings could not grasp the law of a thing by exhaustive methods; they could only estimate by sampling, or represent the law with simple, clean functions derived from observation. Big data has made the "stupid method" of exhaustion possible.
Quantitative change promotes qualitative change. In the field of machine intelligence, the amount of data and the speed of processing can directly determine the level of intelligence. Google’s story of improving the quality of translation through the amount of data is no longer a secret.
In 2005, the National Institute of Standards and Technology held its annual evaluation of machine-translation software. Many university institutes and large companies had applied to the US government for machine-translation research funding and were required to take the evaluation; teams that received no government funding also joined, Google among them. The other participants included IBM, Germany's RWTH Aachen, and many veteran companies with years of experience in machine translation, while it was Google's first appearance.
However, the results stunned everyone: Google won first place, scoring far higher than the other teams. In Chinese-English translation, Google achieved a BLEU score of 51.37 percent, while the second- and third-place teams scored only 34.03 and 22.57 percent. Afterward, Google revealed its secret: use more data! Google used not twice as much data as other teams, but more than ten thousand times as much. Through its search engine, Google could collect massive amounts of bilingual text from the Internet. For the same sentence there are many different translations, and the computer uses this repetition to find the most commonly used one. Without any other changes, relying solely on larger data samples, Google trained a product a full generation ahead of other machine-translation systems. The secret of Google's success was its super-exhaustion ability.
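The repetition-as-vote idea can be sketched in a few lines of Python; the candidate translations below are invented for illustration, not Google's actual data:

```python
from collections import Counter

# Hypothetical renderings of one source sentence, harvested from many
# bilingual pages; repeats of the same rendering act as votes.
candidates = [
    "the weather is fine today",
    "today the weather is good",
    "the weather is fine today",
    "the weather is fine today",
    "it is a fine day today",
]

# Repetition as exhaustion: the most frequently seen rendering wins.
best, votes = Counter(candidates).most_common(1)[0]
print(best, votes)
```

With ten thousand times more data, the vote counts become overwhelming, which is why sheer scale could beat more carefully engineered systems.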
The data advantages of Internet companies such as Google and Baidu are wide-ranging. Beyond translation, they can easily be carried over to other fields, such as speech recognition and image recognition. Baidu's poem-writing app, a small game, combines big data and artificial intelligence. According to Zhongjun He, Baidu's chief architect and head of machine-translation technology, traditional poetry-writing software generally uses statistical models to generate the first verse from a given keyword, then generates the second verse from the first, repeating the process until the poem is complete. Baidu writes poetry in another way: the user inputs any word or sentence, and the system draws on the big data in Baidu's search engine to deeply analyze the user's expression and derive highly relevant keywords. For instance, if the user enters "West Lake" at random, Baidu's poetry system analyzes a large corpus of poetry and prose to find which keywords a poem describing West Lake should include; here they may be "broken bridge and unmelted snow," "misty rain," "drooping willow," and so on. Next, a poem is generated by deep neural-network technology on the basis of each subject word. These keywords are equivalent to the outline we often use in writing: creating from an outline keeps the whole poem unified in artistic conception and the verses logically coherent. People used to say that the individual lines of machine-written poems seemed fine but the overall artistic conception fell far short; that problem can now be effectively remedied. Each line is generated by machine-translation techniques: the first line is "translated" to generate the second, the second to generate the third, and so on.
We use "West Lake" as input, and the seven-verse poem generated by Writing Poems for You is beautiful and coherent.
Spoiled by technology products, humans are becoming increasingly picky, and big data can turn dull choices into dazzling variety. In the past, a television set did not respond to our emotions; now video websites patiently and carefully collect every kind of feedback from us: downloads, early exits, fast-forwards, every action recorded. Big data then computes various indicators, such as our preferences and spending power.
The American TV series House of Cards was popular for a time. The politicians on screen are playing cards, but behind them big data is playing invisible chess. The show was produced by Netflix, the famous American Internet TV company, which knows the power of big-data analysis well. Beyond the user behaviors described above, Netflix collected viewing times, viewing devices, viewer numbers, and viewing scenes, and analyzed the stars and directors of users' favorite programs. Big-data analysis concluded that a show like House of Cards would be a hit, so Netflix bought the remake rights from the BBC (British Broadcasting Corporation) at a high price and identified Kevin Spacey as the most suitable lead. The result proved Netflix's bet completely correct. As we sighed before the screen at how the president played by Spacey controlled everything, we did not notice the power of the "data president."
The current US president, Trump, made full use of data in his election campaign. According to reports by Bloomberg and other media, his technical team used public data from Facebook, Twitter, and other platforms, such as likes, shares, and bookmarks, to draw accurate portraits of voters and push individualized campaign ads to them. Each Twitter and Facebook message was targeted, with different content directed at different users.
Baidu Brain is also good at drawing accurate portraits of users from big data. In 2016, Legendary Pictures, producer of the popular movie Warcraft, cooperated with Baidu Brain to push the film's ads precisely to potential audiences identified through massive user analysis. Although the film fared poorly at the North American box office, it took in $221 million in China. When Warcraft fans shouted, "For the Horde!" in theaters, perhaps it was big data that quietly gave them the Force.
Chinese people have a saying: "To the people, food is heaven." Movies aside, how to eat well is a hot topic for everyone. In 2013, Baidu published the "Ranking of China's Top Ten Foodie Provinces and Cities," which became very popular on the Internet. The list drew on big data from Baidu Knows and Baidu Search, based on netizens' 77 million questions and answers about eating, and summed up the eating habits and characteristics of different regions.
Many interesting phenomena have been unearthed from the massive data. "Which fruit helps you lose weight fastest?" has been asked by as many as three hundred thousand people; evidently many netizens are watching their figures even while eating. "The crab was still alive last night but died today; can I still eat it?" has drawn as many as sixty thousand responses; clearly Chinese foodies have a particular passion for crabs. And of course there are everyday questions such as "Can I eat X?" and "How do I cook X?" The simple question "Can spinach and tofu be eaten together?" alone has triggered countless discussions.
The questions are vast in number and seemingly chaotic, but repetition is exactly the beauty of big data, which can capture the deeper meaning behind them. For example, netizens in Fujian and Guangdong often ask whether certain insects are safe to eat, while netizens in the Northwest are quite uncertain about how to cook seafood. Different users care about different ingredients and methods. From this, Baidu summed up a foodie profile for each province and city, weighing the geographic locations of the netizens, the times of their questions and answers, the information they provided about eating or cooking, and even dimensions such as the brands of mobile phones they used.
Beyond portraying what information we attend to, big data is even mapping our bodies. Many people now wear fitness wristbands, which analyze our health and make recommendations based on daily exercise data such as step counts, calorie consumption, and sleep duration. In the future, we may upload personal data and use big data to detect the possibility of various diseases or potential threats, the better to prevent them.
Examples of big data in daily life abound. Most of the advanced Internet products we use today, on computers or smartphones, are related to big data in one way or another. By using these services without a second thought, we have already invited big data into our lives. It silently watches every detail, subtly encouraging and advising us as we make our choices.
In 1950, Alan Turing proposed a test for machines, later famous as the Turing test. The legendary scientist held that if a machine could converse with humans (via teletype) without being identified as a machine, the machine could be considered intelligent. This simplification convinced people that thinking machines are possible, and the Turing test has remained an important criterion for artificial intelligence to this day.
This standard carries an implication: as long as the machine behaves like a human, we need not worry much about its internal operating rules. Accordingly, some people proposed letting the machine learn the rules by itself, so that humans would not have to supply them.
In 1949, Donald Hebb took the first step in machine learning, drawing on the learning mechanisms of neuropsychology to create what was later called the Hebbian learning rule. Hebb believed that learning in neural networks occurs at the synapses between neurons: the strength of a synaptic connection changes with the activity of the neurons on either side of it, and correct feedback strengthens the connection between the two neurons. The mechanism resembles Pavlov's conditioning experiment with dogs: the experimenter rings a bell before each feeding, and after some time the dog's nervous system connects the ringing with the food. Hebb used a set of weighting formulas to simulate the neural network, the weights representing the strength of the connections between neurons, and devised a simple way for a machine to distinguish between things: for each piece of data, let the decision program make a judgment; reward it if it is right (increase the weights), punish it if it is wrong (reduce the weights). With this method he created a classifier that extracts the statistical properties of a data set and sorts inputs into several classes by similarity. It looks like the way humans observe, summarize, and distinguish things, but the machine's "observation" is more a conditioned reflex achieved through training: it rests not on internal thinking, as ours does, but on the correlations contained in the data rather than the causal relationships of human thought.
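A bare-bones sketch of the Hebbian rule, strengthening connections between co-active units (Δw = η·y·x), might look like the following; the two toy patterns and the NumPy formulation are illustrative choices, not Hebb's own notation:

```python
import numpy as np

def hebbian_train(inputs, outputs, lr=0.1):
    """Plain Hebbian rule: units that fire together wire together."""
    w = np.zeros((outputs.shape[1], inputs.shape[1]))
    for x, y in zip(inputs, outputs):
        w += lr * np.outer(y, x)  # co-active input/output pairs strengthen
    return w

# Two made-up input patterns, each paired with its own output unit.
X = np.array([[1.0, 0.0], [0.0, 1.0]])
Y = np.array([[1.0, 0.0], [0.0, 1.0]])
w = hebbian_train(X, Y)

# After training, each input most strongly activates its paired output unit.
print(w @ X[0])
```

Note the rule never needs to be told why a pairing is correct; repeated co-activation alone carves the association into the weights, which is exactly the correlation-over-causation point above.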
Over the following decade, research on artificial intelligence intensified. In 1952, the IBM scientist Arthur Samuel developed a checkers program that could improve itself. He coined the term machine learning and defined it as "a field of study that gives computers the ability to learn without being explicitly programmed."
In 1957, Frank Rosenblatt proposed the perceptron algorithm, which became the basis for the later development of neural networks and support vector machines (SVMs). The perceptron is a linear classification model. Its principle is to separate the data by finding a suitable hyperplane through repeated trial and error during training. (A hyperplane can be understood this way: in three-dimensional space, a two-dimensional plane can divide the space; for multidimensional data, in N-dimensional space an (N-1)-dimensional hyperplane divides the space.) Given two piles of balls labeled CORRECT and WRONG, the perceptron can find the dividing line between them for you.
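A minimal perceptron sketch, with made-up linearly separable points standing in for the two piles of balls (NumPy assumed):

```python
import numpy as np

def perceptron(X, y, lr=0.1, epochs=100):
    """Learn a separating hyperplane w.x + b = 0; labels y must be +1/-1."""
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            if yi * (w @ xi + b) <= 0:  # misclassified: nudge the boundary
                w += lr * yi * xi
                b += lr * yi
    return w, b

# Toy data: class +1 in the upper right, class -1 in the lower left.
X = np.array([[2.0, 2.0], [3.0, 3.0], [-2.0, -1.0], [-3.0, -2.0]])
y = np.array([1, 1, -1, -1])
w, b = perceptron(X, y)

print(np.sign(X @ w + b))  # predictions match the labels
```

When the data is separable, the trial-and-error updates are guaranteed to converge; when the "correct" and "wrong" balls are mixed together, as the next paragraph notes, no such line exists and the loop never settles.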
A perceptron is like a neural network with only one layer between input and output. When faced with a complicated situation, it is powerless. For example, when the “correct” and “wrong” balls are mixed with each other, or when a third kind of ball appears, the perceptron cannot find the boundary of the classification. This makes it difficult for the perceptron to make a breakthrough on some seemingly simple issues.
Nowadays, humans no longer input the rules (the program); they let the machine search for rules by itself, so that the machine can exercise its own intelligence. Today's artificial intelligence is built on machine learning, its speed of growth limited only by hardware and methods.
When machine learning uses neural networks with multiple layers, it falls under so-called deep learning. In the late 1970s, Professor Geoffrey Hinton and others realized that if a multilayer neural network could be implemented, patterns within patterns could be found step by step, allowing the computer to solve complex problems by itself, and they developed the "backpropagation" algorithm for training neural networks. However, the complexity of multilayer networks made training far more difficult, and the lack of data and hardware computing capability became a constraint.
From the mid-1960s to the end of the 1970s, machine learning nearly stagnated. The situation did not improve until the 1980s. With rapid gains in computer performance and the advent of the Internet, artificial-intelligence research finally gathered strength, and by the 1990s modern machine learning had taken initial shape.
The Internet entered commercial use in the 1990s, spurring the development of distributed computing. Supercomputers are expensive; distributed computing wins by numbers, letting many ordinary computers work together, each undertaking part of the task, with the aggregated result able to outpace a supercomputer. The distributed structure suits the ever-growing volume of data.
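The divide-and-aggregate idea can be sketched with Python's standard multiprocessing module; the task here (summing squares) is a toy stand-in for a real workload:

```python
from multiprocessing import Pool

def partial_sum(chunk):
    # Each worker computes its own share of the overall task.
    return sum(x * x for x in chunk)

if __name__ == "__main__":
    data = list(range(100_000))
    chunks = [data[i::4] for i in range(4)]  # split the work four ways
    with Pool(4) as pool:
        # Aggregate the partial results into the final answer.
        total = sum(pool.map(partial_sum, chunks))
    print(total == sum(x * x for x in data))  # same answer as one machine
```

On one machine the workers are processes; in a real cluster they are computers on a network, but the split-compute-aggregate shape is the same.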
Because traditional artificial intelligence relies entirely on rules and models input by scientists, it can work effectively only on problems with relatively clear rules. Deep Blue, which defeated world chess champion Kasparov, is one such example. However, when faced with the simple task of recognizing a picture—something humans learn in infancy—such artificial intelligence is at a loss, because this kind of cognitive problem involves only vague concepts, with no clear and simple rules available. The computer neural network does not require humans to declare the rules in advance—it can identify the patterns (rules) from a large amount of raw data by itself.
As the name implies, the neural network resembles the human brain and consists of many neurons. Each neuron is connected to several other neurons to form a net. A single neuron only solves the simplest problems, but, when combined into layers, neurons can solve complex problems.
Geoffrey Hinton believes that the traditional machine-learning method used only a single layer of the chip network, so its processing efficiency was very low when dealing with truly complex problems. The core idea of deep learning is to gain efficiency by increasing the number of neural-network layers, abstracting and simplifying the complex input data layer by layer; in other words, the machine solves a complex problem by dividing it into stages. Each layer of the neural network solves its own subproblem and passes its result to the next layer for further processing.
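This layer-by-layer division of labor can be made concrete with the very XOR problem that defeats a single perceptron. In the sketch below the weights are chosen by hand for illustration (in real deep learning they are learned): each first-layer neuron solves its own simple subproblem, and the second layer combines their answers.

```python
import numpy as np

def step(v):
    """Hard threshold activation: fires (1) when the input is positive."""
    return (v > 0).astype(int)

# Layer 1: two neurons, each solving a simple subproblem.
W1 = np.array([[1.0, 1.0],      # neuron A fires if x1 OR x2
               [-1.0, -1.0]])   # neuron B fires if NOT (x1 AND x2)
b1 = np.array([-0.5, 1.5])

# Layer 2: combines the sub-results (A AND B), which yields XOR.
W2 = np.array([1.0, 1.0])
b2 = -1.5

def forward(x):
    h = step(W1 @ x + b1)            # layer 1 hands its result to layer 2
    return 1 if W2 @ h + b2 > 0 else 0

for x in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(x, "->", forward(np.array(x)))   # 0, 1, 1, 0 — XOR solved
```

Neither neuron in the first layer can solve XOR alone, but each finds a simple pattern, and the next layer finds the pattern within those patterns.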
With one layer of neural network, simple patterns can be found; with multiple layers, patterns within patterns can be found. Take face recognition as an example. The first layer of the neural network focuses only on image areas a few dozen pixels on a side, from which it recognizes some shapes (shapes are patterns)—eyes, noses, and mouths. These recognized shapes are then handed to the next layer of the neural network, which finds a bigger pattern in the existing recognition results and combines them into whole faces. Stated more mathematically, today’s popular deep neural networks can be divided into the CNN (convolutional neural network), which responds to data with a spatial distribution, and the RNN (recurrent neural network), which responds to data with a temporal distribution.
CNNs are often used for image recognition. The first layer of the network is trained to accomplish a small goal: recognizing local, independent elements of the image, such as a square, a triangle, or an eye. At this level, humans input a large amount of image data, and the layer learns to discern only the basic edges of local patches—features no more than a few pixels across. Each following layer looks for a higher-level pattern in the information derived from the previous layer. This method simulates the way human vision composes information, discarding minor details and prioritizing certain salient patterns. For instance, several small pieces and a circle combine to form a face; no matter where it appears in the image, the human eye will first pay attention to the face, not to every single part of the image.
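The first-layer operation is a convolution: a tiny filter slides over the image and responds wherever its pattern appears. In this minimal sketch the edge filter is set by hand; in a real CNN such filters are learned from data.

```python
import numpy as np

def conv2d(img, kernel):
    """Valid 2-D convolution (strictly, cross-correlation, as in CNN layers)."""
    kh, kw = kernel.shape
    h, w = img.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(img[i:i + kh, j:j + kw] * kernel)
    return out

# A tiny image: dark left half, bright right half.
img = np.array([[0, 0, 1, 1]] * 4, dtype=float)

# A hand-set vertical-edge filter, a few pixels wide.
edge = np.array([[-1.0, 1.0]])

print(conv2d(img, edge))
# Each row comes out [0. 1. 0.]: the filter responds only where dark meets bright.
```

The same filter fires wherever the edge appears in the image, which is why a face (or an eye) is found regardless of its position.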
RNNs are often used for speech recognition and natural-language processing because speech and language are data distributed over time—the meaning of the next sentence is related to the previous one. An RNN can remember historical information. Suppose we need to develop a language model that uses the preceding text to predict the following word. Given “I was born in China in 1976. My college major is mathematics. I speak fluent _____,” the last word of the sentence is obviously Chinese (the language that Chinese people speak), which is very simple for humans to grasp. But a computer neural network needs to retrieve the earlier information “China” to finish the job, which requires a recurrent design, so that the neural network has temporal depth.
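A minimal sketch of the recurrent idea, with made-up dimensions and random, untrained weights: the only point is that the hidden state at each step is computed from both the current input and the previous state, so a word like “China” early in the sentence still influences the state at the end.

```python
import numpy as np

def rnn_step(h, x, Wh, Wx, b):
    """One recurrent step: the new state mixes the old state with the new input."""
    return np.tanh(Wh @ h + Wx @ x + b)

# Toy, untrained setup — real RNNs learn these weights from data.
rng = np.random.default_rng(0)
dim_h, dim_x = 4, 3
Wh = rng.normal(size=(dim_h, dim_h)) * 0.5   # state-to-state ("memory") weights
Wx = rng.normal(size=(dim_h, dim_x)) * 0.5   # input-to-state weights
b = np.zeros(dim_h)

sentence = [rng.normal(size=dim_x) for _ in range(6)]  # six stand-in word vectors
h = np.zeros(dim_h)
for word in sentence:
    h = rnn_step(h, word, Wh, Wx, b)   # each step carries history forward

print(h)  # the final state summarizes the whole sentence
```

Because `h` is fed back into every step, the network has depth in time as well as in layers.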
Deep neural networks have greatly accelerated machine learning and produced breakthroughs in artificial intelligence. On this basis, great progress has been made in image recognition, speech recognition, and machine translation. Voice input is much faster than typing; machine translation lets us basically understand a piece of foreign-language information; image recognition can precisely pick one person out of a pile of adult photos using a photo from his or her teenage years, and can even restore very blurry photos to clear, accurate ones.
Artificial intelligence based on deep learning differs in principle from earlier artificial intelligence but shares its logic with what we know of data mining: get the results first, then work backward to find the patterns. This process is called training.
With simple mathematical knowledge, we can explain the basic mode of thinking behind machine learning, training, and deep learning. The method amounts to a Copernican reversal within mathematics. A simple function will serve to illustrate it.
In the past, we solved mathematical problems by generally knowing formulas (functions), then using input data to find results. Take the function y=ax+b as an example. If you know y=2x+1 and let x=1, you can find y=3. Here x is input, and the resulting y is output.
A higher-order mathematical ability is to know the formula and the output and use them to find the input value. For example, if y=2x+1, let y=5, and find x.
One step further and we arrive at machine learning. When we do not know the coefficients a and b but do know the values of y and x, we need to find a and b; that is, knowing the input and output, we find the coefficients of the function. For the function y=ax+b, we need only two pairs of x and y values to determine a and b.
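Continuing the book’s own example, two observed (x, y) pairs are enough to recover the coefficients:

```python
# Two observations of y = a*x + b pin down a and b.
# The points (1, 3) and (2, 5) come from the book's function y = 2x + 1.
x1, y1 = 1, 3
x2, y2 = 2, 5

a = (y2 - y1) / (x2 - x1)   # slope from the two points
b = y1 - a * x1             # intercept from either point

print(a, b)  # 2.0 1.0 — the hidden function y = 2x + 1 is recovered
```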
Further, suppose we have a set of input and output data but do not know the form of the function at all. What then? This requires constructing a function. For example, given x=2 and y=5, find f(x). With so few input-output data, this cannot be determined: f(x) may be 2x+1, x+3, or even x²+1, among countless other possibilities. However, if there are enough pairs of x and y, mathematicians can adjust the weights of a formula through approximation methods and approach the true function.
The problem is that the data generated in modern production and life is extremely large and complex, and extracting the functions contained in it demands great efficiency. The human brain is no longer up to the task, but the job can be handed over to the computer. This is where function fitting shows its magic. A deep-learning neural network simulates the neural nodes of the human brain: each node is in effect a function regulator, and numerous functions are connected to one another. Through various mathematical methods—matrices, optimization, regularization, and so on—the deep-learning process continuously adjusts the weight of each function’s coefficients. When the data is adequate and the construction is appropriate, the evolving function fits most of the data more and more accurately, and we can then use this set of functions to predict what has not yet happened. This process is what we call “training.”
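A tiny sketch of this “training” process under simple assumptions: the data follows the hidden rule y = 2x + 1, and gradient descent repeatedly nudges the coefficients a and b to reduce the squared error, recovering the rule without anyone programming it in.

```python
import numpy as np

# Data generated by the hidden rule y = 2x + 1; the machine must recover it.
xs = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
ys = 2 * xs + 1

a, b = 0.0, 0.0            # start from a deliberately wrong guess
lr = 0.02                  # learning rate: how big each nudge is
for _ in range(5000):      # "training": repeatedly reduce the error
    err = (a * xs + b) - ys
    a -= lr * 2 * np.mean(err * xs)   # gradient of mean squared error w.r.t. a
    b -= lr * 2 * np.mean(err)        # gradient w.r.t. b

print(round(a, 3), round(b, 3))  # ≈ 2.0 and 1.0 — the coefficients are learned
```

Deep learning does the same thing at vastly greater scale: millions of coefficients, adjusted a little at a time, until the function fits the data.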
When Andrew Ng was working at Google, he led his team to successfully train the famous cat-identifying computer system. If we used the old-fashioned symbolic artificial-intelligence method of programming, we would first need to carefully define the cat—sharp ears, round eyes, straight whiskers, four legs, long tail, and so on; convert these characteristic definitions into proper functions; input the functions into the computer; and then present a picture to the computer. The computer would break the picture into elements and compare them with the rules defined in the program. If they matched the characteristics of sharp ears, round eyes, straight whiskers, four legs, long tail, and so on, then the picture would be of a cat.
The method of machine learning is quite different. Scientists do not write the definition of a cat in advance but let the computer find it. They just “feed” a large number of pictures to the computer and have it output a tag: cat or not cat. Numerous pathways that can identify cats are generated in the neural network. Just as in human brains, each pathway outputs its own result. If a pathway is correct, scientists increase its weight (think of it as a green light); if it is wrong, the weight is reduced (a red light). After enough attempts—say, testing one hundred thousand pictures of various cats—the weighted neural pathways form a recognizer (a complex set of linked functions). The cat in a new picture can then be identified without scientists supplying the answer. The more training data there is, the more complex but accurate the set of functions becomes.
This is “supervised learning,” which relies on a large amount of tagged data. Yet the cat-identifying project led by Andrew Ng could even learn from scratch, identifying cats without labels. When researchers showed millions of static cat pictures to the neural network, it arrived at a stable model by itself. From then on, it identified a cat’s face without any hesitation, just as a child does.
Andrew Ng’s doctoral student Quoc Viet Le wrote a paper based on this work, showing that machine learning can also work with raw, unlabeled data and establish its own knowledge model. Its significance is by no means limited to identifying cats.
More than two decades ago, intrigued by the beehive effect, Kevin Kelly set out his views in his outstanding science book Out of Control. He used this way of thinking to predict the emergence of new technologies such as distributed computing, even though he may not have seen the machine-learning principle behind the beehive effect. Each bee’s movement is random, yet the bees in a hive always manage to fly in one direction. A large number of bees’ individual actions (inputs) aggregate into a collective movement (output), and the logic (function) behind it is the beehive effect. The information moving through a computer neural network is like a swarm of supersonic bees collecting data pollen; from their seemingly frantic trajectories, a cat’s face emerges. Baidu’s ability to identify cats has gone far beyond a human’s—it can even accurately distinguish different breeds of cats.
So, for humans, machine learning often forms a black box. Some people have warned that this kind of black box that transcends human understanding will bring danger because we don’t know how the machine thinks and whether it creates dangerous thinking. But, more often than not, deep learning can bring unexpected surprises.
An engineer on the Baidu speech-recognition development team once related an interesting story: when a member of the speech team was testing the speech-recognition program at home, he inadvertently sang a few lyrics, and the lyrics were accurately recognized. This surprised him, because other companies’ speech-recognition technology could not do that. The Baidu team had not trained for unaccompanied singing, nor had they set that goal, and they do not know how the system does it. The training data must simply have reached a sufficient level; the program cultivated this amazing skill in the course of continuous training and learning.
People tend to learn about changes in the world slowly and to feel behind the times. In the days before deep learning, the world seemed to be all right, but some invisible burdens were being silently shouldered by certain people. Zhou Kehua, a serial killer, had come and gone like a shadow for more than a decade. To capture him, the public-security department mobilized almost all video-surveillance material to discover his traces. How did police officers search the video at that time? With the naked eye! They had to watch hundreds or even thousands of hours of video clips; some officers even fainted at work. Visual recognition based on deep-learning techniques changes that. At present, advanced monitoring systems have strong artificial-intelligence support. After training with big data, they can instantly recognize faces, license plates, and vehicle models from video and support semantic retrieval by humans. Give the computer a few photos of a suspect, and the neural network can quickly find the suspect-related footage in the massive video archive for human review. The security enterprise Yushi Technology has developed such a smart camera system; combined with Baidu Map, it can quickly trace the path of suspects or vehicles.
Deep learning has already changed our lives in many invisible ways. To collect and maintain map information, it is necessary to capture images along the road with collection vehicles. Traditionally, a collecting car needs two staff members, and the process is divided into fieldwork and office work. The fieldwork is to drive the vehicle and record everything along the way: besides the video recording, the copilot narrates by voice. Every time they pass a certain place, one person has to say that there is a surveillance camera ahead, or a traffic light, or four lanes, or tell the driver to turn left, go straight, turn right, and so forth. In this traditional way, the staff record everything they see as video and sound and send the data to the data-processing center, whose office personnel transcribe and compare the data minute by minute and finally mark the road elements on the map—essentially labor-intensive work.
After applying intelligent image-recognition technology, we first train the machine with deep learning to identify road elements such as traffic lights, lanes, and surveillance cameras. After that, we need only submit the panoramic images taken along the road directly to the machine for identification, and we get the complete map information. This greatly saves manpower and improves both efficiency and accuracy.
In addition to software algorithms, deep learning also has a hardware story. History holds many inventions whose later applications deviated from their original intentions. Nitroglycerin, an explosive, can be used for first aid in heart disease; an attempt to synthesize a substitute for a strategic material ended up producing Silly Putty. In the field of deep learning, the role of the GPU has likewise changed. The GPU was originally the chip on a graphics card, used to render images and accelerate graphics calculations, but it later became the main hardware for deep learning. Because a graphics chip has more floating-point computing power than a CPU and was originally designed to process matrix data such as images, it is very well suited to the calculations of machine learning. In the early days, when Andrew Ng’s team took the lead in using GPUs for machine learning, many people did not understand the choice. Today it is mainstream.
But the most impressive anecdote is still about a search engine.
Baidu’s focus on artificial intelligence caused some incomprehension in the past. Why did Baidu show a special preference for artificial intelligence rather than any of the countless popular fields, from the PC to mobile-Internet applications such as e-commerce, games, social media, and communication?
The answer to the question may be contrary to what many people believe. Instead of saying that Baidu chose artificial intelligence, we would rather say that artificial intelligence chose Baidu. This is the mission in Baidu’s genes. Failure to live up to this mission will be the loss of Baidu, China, and even the world.
Everything Comes from Searching
For general users, a search engine is just a tool that helps them find the information they need; for the websites providing content, search engines are media that help deliver their content to users in need. In this process, the search engine first “listens” to the user’s needs; that is, it determines what the keywords typed into the small search box are meant to find. Next, it “retrieves” a large amount of content and picks out the results that best meet the requirements.
Let’s examine this process. Isn’t it very similar to the deep-learning model? The inputs and outputs are both here, and every search can even be seen as a training exercise for the search engine. So, who tells the search engine whether the results shown are good or bad? The user does. The user’s click is the answer. If the user does not click on the top results but on a result from the second page, that demotes the system’s recommendation.
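The feedback loop can be sketched in a few lines. This is only a toy illustration, not Baidu’s actual ranking algorithm: each result carries a weight, a click promotes it, and a skip demotes it.

```python
# Toy click-feedback ranking: page names and weights are made up for illustration.
weights = {"page_a": 1.0, "page_b": 1.0, "page_c": 1.0}

def record_search(shown_order, clicked, lr=0.1):
    """Promote the clicked result; demote every result the user skipped over."""
    for page in shown_order:
        if page == clicked:
            weights[page] += lr      # the click confirms this result
            break
        weights[page] -= lr          # shown above the click but ignored

# Users keep skipping page_a and clicking page_b...
for _ in range(20):
    record_search(["page_a", "page_b", "page_c"], clicked="page_b")

ranking = sorted(weights, key=weights.get, reverse=True)
print(ranking)  # ['page_b', 'page_c', 'page_a'] — page_b rises to the top
```

Every search thus doubles as a small training step: the user’s behavior is the label.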
In this process, the search engine not only improves the accuracy of its recommendations but also learns more about the hits and misses of the indexed web pages, gradually learning to distinguish web pages as humans do. Initially, it could read only page elements such as titles, keywords, and descriptions. Now, search engines like Baidu can identify which pages hide false information, which are advertisements, and which contain truly valuable content.
When people get information through a search engine, it is a process of dialogue between human and machine. Unlike earlier human-computer interaction, this process is based on natural language. More than image recognition or speech recognition, natural-language processing (NLP) is the core technology of search engines.
Wang Haifeng believes that the ability to think and to acquire knowledge is what has made human beings what they are today. This ability finds its objects and methods of thought through language and is externalized as our abilities to see, listen, speak, and act. Among all these abilities, language is one of the most important characteristics distinguishing human beings from other creatures. Visual, auditory, and behavioral abilities are not limited to humans but belong to all animals—many animals even surpass humans in them—but language is uniquely human. The summarization, refinement, inheritance, and contemplation of knowledge through language are likewise unique to human beings.
From the beginning of human history, knowledge has been recorded and transmitted in the form of language, and the tools used to write language have constantly improved: from oracle bones to paper and then to today’s Internet. So both Baidu and Google believe that natural-language processing is a very big challenge for the future of artificial intelligence. By contrast, speech recognition—speech to text, or text to speech—is really a simple signal-conversion process. Language is not like this: it involves human knowledge and holistic thinking rather than simple conversion.
Projects like AlphaGo astonish ordinary people, and we regard them as great achievements. But we cannot ignore that they rest on complete information, clear rules, and a closed, specific space. An intelligent system trained to play Go is not good at chess. By comparison, natural-language processing is a harder problem to solve. In Go there is almost no uncertainty as long as computing power and data are sufficient, but language is full of uncertainties, such as the diversity of semantics. To let computers “understand” and generate human language, scientists have already done a great deal of work. Building on big data, machine learning, and linguistics, Baidu has developed a knowledge map; built systems for question answering, machine translation, and dialogue; and acquired the ability to analyze and understand queries and emotions.
The knowledge map can be divided into three categories based on different application requirements: entity graph, attention graph, and intent graph.
In the entity graph, each node is an entity, and each entity has many attributes. The connection between the nodes is the relationship between the entities. At present, Baidu’s entity graph already contains hundreds of millions of entities, tens of billions of attributes, and billions of relationships, which all derive from a large number of structured and unstructured data.
Now let’s look at an example: someone searches for the ex-husband of Dou Jingtong’s father’s ex-wife. (Dou Jingtong, a.k.a. Leah Dou, is a Chinese singer-songwriter.)
The relationship of the characters contained in this request is very complicated. However, our reasoning system can easily analyze the relationship among entities and finally get the correct answer.
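The reasoning behind such a query can be sketched as a walk over an entity graph. The toy graph below holds just a few hand-entered relations for illustration; it is not Baidu’s system, which spans hundreds of millions of entities.

```python
# A toy entity graph: nodes are entities, keyed edges are named relations.
graph = {
    ("Dou Jingtong", "father"): ["Dou Wei"],
    ("Dou Wei", "ex-wife"): ["Faye Wong"],
    ("Faye Wong", "ex-husband"): ["Dou Wei", "Li Yapeng"],
}

def follow(entities, relation):
    """Follow one named relation from a set of entities to the next set."""
    found = []
    for e in entities:
        found.extend(graph.get((e, relation), []))
    return found

# "The ex-husband of Dou Jingtong's father's ex-wife":
# chain the three relations in order.
result = follow(follow(follow(["Dou Jingtong"], "father"), "ex-wife"),
                "ex-husband")
print(result)  # ['Dou Wei', 'Li Yapeng']
```

The complicated-sounding request reduces to three hops along labeled edges—exactly the kind of traversal an entity graph makes easy.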
Baidu’s natural-language processing technology can also analyze complex grammar and even identify ambiguities in sentences, not just literal matches.
Let’s look at an example: Who is Liang Sicheng’s son? Whose son is Liang Sicheng? (Liang Sicheng was a Chinese architect.)
If we use traditional keyword-based search techniques, then the two queries will get almost the same results. However, through the analysis of semantic understanding techniques, the machine can find that the semantics of these two sentences are completely different, and correspondingly, completely different answers can be retrieved from the knowledge map.
Consider a third query: Who are Liang Sicheng’s parents? Literally, this is different from the second query, but through semantic understanding, the machine can find that the two sentences are looking for the same object.
Deep-learning technology further enhances natural-language-processing capabilities. Baidu has applied the deep-neural-network (DNN) model to its search engine since 2013, and the model has since been upgraded dozens of times. The DNN semantic feature is a very important signal in Baidu search. In fact, not only has the relevance of search results improved, but passage understanding, focus perception, and machine translation have also improved greatly.
The technical foundation required for searching is also required for artificial intelligence. For example, in terms of cloud computing, Zhang Ya-Qin, former president of Baidu, believes that searching is the largest cloud-computing application. Without the cloud, there would be no way to search. Baidu was born in the cloud.
Search Engine Continues to Evolve
With the rise of the mobile Internet and artificial intelligence, the form of searching has changed a great deal. The search portal has changed: in addition to searches through the web search box, searches from different platforms and hardware are increasing, and voice or image search is replacing part of text search. While people actively search for information, information is also being recommended to those who need it. Many people judge by appearances and think this process is a challenge to search engines. But Wang Haifeng believes that the developers of search engines have been aware of this change all along.
Many Internet companies embrace the idea that “information looks for people.” However, people looking for information and information looking for people—search and feed—are not mutually exclusive but complementary. They play different roles in different scenarios and at different times, each performing its own duties while cooperating with the other. For example, when you need some information, sometimes a friend makes a recommendation, and sometimes the system guesses your preferences and recommends it. Suppose someone recommends an article to you, and while reading it you find a word you don’t quite understand: you need to search for the meaning of that particular word. Of course, the machine will also guess which words users may be interested in. When a piece of content is not so popular, you have to look for it through the search engine. In different scenarios, the user’s needs for search and feed convert into each other, and judging these scenarios is a test of the system’s intelligence. The more data and technology reserves there are, the better it may perform.
With technical reserves and data, it is technically not difficult to make a feed. But it is more difficult to start from feed and make up for the lack of search and data. The Baidu search engine collects and analyzes hundreds of billions of web pages. Such large-scale data is the necessary guarantee for Baidu to continuously improve the performance of its feed products.
Search engines continue to evolve amid the data torrent, and feed is just the next step. Eventually a ubiquitous search engine plus recommendation will form, and increasingly intelligent machines will draw inferences about other cases from a single instance. In the end, the user will say only a few words, and the machine will know the whole meaning the user wants to express. Machines can also automatically analyze the user’s location, identity, habits, and so on, and with this information easily determine which search results to provide. In the future, in many cases, we won’t even have to start a search: a search-engine-based feed will guess and push the information we need. Imagine, for example, having a meal at a restaurant: the search engine has inferred the user’s next arrangement from the user’s previous searches. Even without being asked, it will voluntarily collect the information the user might need later, such as which movies are currently showing and where the nearest movie theater is. Baidu has tried this idea in some of its products. Information of less interest does not appear in the feed but is stored sensibly, like an invisible library for users to explore later. Intelligent search engines are growing up with us.
Searching Is the Largest Artificial-Intelligence Project
Search engines work nonstop, mirroring the human spirit of learning. They collect and process large amounts of data at all times, crawling pages and content across the entire Internet and visiting everything, including e-commerce sites, social media, and news portals.
The search engine is a seeder, a laboratory, and a digital collider. Combining speech recognition, image recognition, and machine translation, it can collect more valuable data through the actual use of a large number of users, which in turn helps the neural network to optimize the training effect and to form a benign development loop.
The development of natural-language processing technology will bring more surprises in the future. Writing formatted financial and sports news will become possible; even in literature, Tang Dynasty–style poetry written by machines will read like the real poems. During basketball and football games, a commentary robot will not only report the game quickly but also answer many people’s questions at the same time. This is a bit like Samantha, the intelligent program in the sci-fi movie Her, who falls in love with countless people simultaneously. Love is probably the deepest linguistic, intellectual, and emotional communication of human beings, and Samantha is a high-level symbol of natural-language processing technology, depicting the deep relationship between humans and machines. Perhaps in the future, search engines will be like Samantha, exhausting symbolic information to close the gap between language and meaning in ways beyond human imagination.
Strictly speaking, artificial intelligence is a kind of physical labor: it must have enough physical strength to withstand such huge volumes of data and calculation. For colleges, universities, and smaller Internet companies, the thresholds of data volume and hardware cost have greatly limited the development of artificial intelligence. Even excluding the purchase cost of hardware such as CPUs and GPUs, the cost of running the hardware is high: AlphaGo consumes $3,000 worth of electricity in one round of Go. In addition to traditional servers, bandwidth, and other infrastructure, Baidu now has hundreds of GPU servers supporting artificial intelligence; up to sixteen GPU cards can be installed in the highest-configured servers. On this basis, the data reserve, hardware foundation, market scale, and talent team are coordinated to maximize the advantage. What we pursue is not one-time gains but the largest and most basic artificial-intelligence platform, one that works to help human beings achieve the “know more, do more, be more” vision.
It can be said that artificial intelligence is an intrinsic pursuit for companies like Baidu and Google, and of the Internet, the mobile Internet, and the data explosion itself. It is very difficult for other domestic companies to compete in this field with companies like Google and Microsoft and their enormous advantages of scale. Establishing the infrastructure and the talent base is Baidu’s unshakable responsibility.
Passing the artificial-intelligence torch to more people, creating real value, making life better, and making the nation stronger—this is what motivates Baidu’s workers and why Baidu has been able to assemble so many artificial-intelligence scientists.
Lin Yuanqing originally studied artificial intelligence at NEC’s American laboratory. Its outstanding conditions, atmosphere, and academic strength helped him focus on research and publication. But he finally left that familiar environment and chose Baidu. The most important reason, he says, is that as an artificial-intelligence researcher he feels it is vital to apply deep-learning technology in practice. At present, China has more than 700 million Internet users and more than 1.2 billion mobile-phone users—the most in the world. How can we let users enjoy the changes brought by artificial intelligence and participate in them? The value of this exploration can affect the lives of everyone in China. He keenly felt that “this is the best moment, the most promising opportunity for artificial intelligence. It would be a pity to miss it.”
But artificial intelligence never rests. While human beings sleep, machines are still racing through their own world, and in that endless cycle they press on toward success and out into the world!
Here I want to end this chapter with a passage written in the 1990s by a well-known professor of philosophy:
In heaven, humans are not human. More precisely, humans have not been placed on the road of human beings. Now, I have been thrown out for a long time, flying through the void of time in a straight line. In a deep place, there is still a thin rope that binds me, and the other end is connected to the paradise where the clouds are covered in the distance. The individual soul is not her own choice, but the thin line from paradise is tied to her, controlling her body. It is impossible for Violica to find a passion for life. She can only find her life enthusiasm from herself, which means to discover the thin line that binds her body to the shadow. The thin line from the paradise determines the life direction of Violica’s body and the burden of the individual soul, and makes her feel the individual destiny. The so-called individual destiny is nothing more than a person feeling that only such a passion for life can let her have a feeling of a beautiful life, and then she has the happiness of her own life, so that she must live like this.4