8
Are Robots and AI the Future of the Media?

In the fall of 2016, Kodomoroid and Otonaroid, two presenters in Japan, surprised their audience. Human-looking and feminine in appearance, the two “journalists” were in fact robots that came to read the day’s headlines at a conference at the National Museum of Emerging Science and Innovation (Miraikan) in Tokyo. One of them even stammered. These two androids were installed in order to collect human reactions to machines, not to replace a television presenter.

That being said, in early 2018, the robot Erica was presented as the next star of a TV news program in Japan. According to the Wall Street Journal (Bellini 2018), the android journalist, who appears to be a 23-year-old woman, should, as per her creator Hiroshi Ishiguro, director of the Intelligent Robotics Laboratory in Osaka, be the first robot to present the news. Hiroshi Ishiguro even believes that his robot, which can express simple emotions, will soon develop a consciousness of its own.

These examples illustrate the advances in the field of journalistic robots in recent years.

8.1. Robot journalists are already in action

On July 1, 2014, the Associated Press (AP) used the Wordsmith robot platform for the first time in order to write articles on companies’ quarterly financial results. Wordsmith drew on the Zacks Investment Research database (Colford 2014). The AP explained that journalists had previously processed about 300 quarterly financial reports, but that database processing and a robot now allowed 4,400 financial reports to be covered with short articles of 150 to 200 words. The AP points out that these are financial documents and that journalists remain the only ones able to analyze these figures. The following year, in 2015, the AP extended this method of operation to sports information.

Also in 2014, the Los Angeles Times made its mark by publishing an automatically written article on an earthquake. The newspaper also uses robots to cover homicides. The robots collect the data, write the descriptive part of the article and forward it to the journalists, who write the comments and analyses.

The Guardian, by contrast, has robots write longer articles for the weekend section called “The Long Good Read” (Moses 2014). The technology is based on two algorithms: one that extracts the data from the database, and another that uses this extraction to write an article.
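To make the principle concrete, here is a minimal Python sketch of such a two-stage pipeline: one function extracts structured facts from a database record and another fills a fixed narrative template. The field names, figures and wording are invented for illustration; real systems such as Wordsmith or the Guardian’s tools are considerably more sophisticated.

  # Minimal sketch of the two-stage principle: (1) extract structured facts,
  # (2) turn them into a short templated text. Field names and wording are
  # illustrative only and do not reproduce any real system.
  def extract_facts(record: dict) -> dict:
      """Stage 1: pull the figures needed from a raw database record."""
      change = record["revenue"] - record["revenue_prev"]
      return {
          "company": record["company"],
          "quarter": record["quarter"],
          "revenue_m": record["revenue"] / 1e6,
          "direction": "rose" if change > 0 else "fell",
          "change_pct": abs(change) / record["revenue_prev"] * 100,
      }

  def generate_text(facts: dict) -> str:
      """Stage 2: fill a fixed narrative template with the extracted facts."""
      return (
          f"{facts['company']} reported revenue of {facts['revenue_m']:.1f} million euros "
          f"for {facts['quarter']}. Sales {facts['direction']} by {facts['change_pct']:.1f}% "
          f"compared with the previous quarter."
      )

  raw = {"company": "Acme SA", "quarter": "Q3 2015",
         "revenue": 12_400_000, "revenue_prev": 11_800_000}
  print(generate_text(extract_facts(raw)))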

During the 2015 regional elections in France, many sites such as Le Monde, France Bleu (Radio France), Le Parisien and L’Express used robots, or more precisely algorithms, in order to produce texts on the results city by city. One of the main advantages of these robots is that they can process a large amount of information in a short amount of time, which in this case was information on 36,000 municipalities, something that an army of journalists could probably not cover so quickly.

More than just raw data, the robot writers produced short texts, which can also be indexed by Google. According to Syllabs, a French company specializing in robot journalists (which raised €2 million in 2018), the cost of covering two election nights averaged between €20,000 and €40,000, in other words, well below the cost of employing journalists for several months.

Some newspapers have gone even further. In Sweden, the MittMedia press group has been experimenting with robots since 2017 in order to enrich its content. Robots have taken an important place in the newsroom, producing reports on sports competitions, stock market information and real estate information. “They are even our most efficient employees!”, enthused Robin Govik, digital director of the group, which publishes approximately 30 newspapers, as reported by Eric Scherer, director of foresight at France Télévisions. Journalists are reportedly delighted to work with them. Robin Govik says:

“Before, they were the robots […]. With the 25% drop in the number of staff in Swedish newsrooms in recent years, the machines are producing the content that journalists have stopped providing but that people are demanding”.

“Until now we thought that journalists, traditionally conservative, hated robots and considered them unreliable. In fact, this is not true”, said Finnish journalist Hanna Tuulonen, a researcher at the University of Helsinki working on news automation.

Specifically, the machines operate in several segments: the automatic publication, directly on the Web, of information with little journalistic added value, or the sending of messages to the editorial staff, who then choose whether or not to use them (information on road traffic, etc.).

According to Eric Scherer, a study conducted by MittMedia showed that more than a third of readers did not see any difference, even though the articles written by robots were indicated.

These examples remind us that in the space of a few years, the robotization of information has developed considerably.

8.2. What is artificial intelligence?

Artificial intelligence (AI) will play an important role in the future. AI consists of using complex computer algorithms that process large amounts of data in order to reproduce reasoning or to learn (machine learning). This artificial intelligence can be incorporated into “machines” that are designed to perform tasks or provide assistance. Each of us already uses AI every day, for example in our mobile phones, which have a personal assistant that understands our voice, responds to our requests and can, in some cases, anticipate them.

Artificial intelligence can be “weak” or “strong”.

“Weak” AI is what we are already dealing with, more and more every day: a car that drives instead of us, a refrigerator that orders missing products online for us, a personal assistant like Google Home in our house, etc. The examples will multiply with the development of connected objects.

Intelligence is said to be “strong” when it is not limited to reproducing human intelligence faster or with fewer errors, but develops consciousness and emotions. Recent developments in algorithms are based on the concept of biologically inspired “neural networks”, hence the term bio-cybernetics, which is sometimes used. This evolution also draws on recent advances in quantum computing: whereas a conventional computer calculates with bits that are either 0 or 1, a quantum computer works with qubits (quantum bits), which can be in a superposition of 0 and 1, and this superposition of states multiplies computing capacity. The possibility of a strong AI is debated among scientists and raises major philosophical questions, as in Professor Jean-Gabriel Ganascia’s book (Ganascia 2017). Since the future can only be a source of multiple questions, publications and essays are multiplying, set either in a technophilic future or in a technophobic apocalypse. All raise the threat to humanity, as well as the hope of a trans-humanity, going so far as to declare the “death of death” (Alexandre 2011).
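For reference, the standard textbook notation for a qubit is the following; this is general quantum-computing background rather than anything specific to the systems discussed here:

  \[
    \lvert \psi \rangle = \alpha \lvert 0 \rangle + \beta \lvert 1 \rangle,
    \qquad \lvert \alpha \rvert^{2} + \lvert \beta \rvert^{2} = 1 .
  \]

A register of $n$ qubits can hold a superposition over $2^{n}$ basis states, which is where the multiplication of computing capacity comes from.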

“Weak” AI promises us a Smart Life; in other words, it will help farmers optimize watering, help individuals manage their waste or energy consumption, help a doctor read an X-ray without error or make a diagnosis, etc. A smart and green life. However, in the background, there is also the concern about robots making jobs disappear (Decanio 2017), and a life where journalists could be replaced by robots producing articles for them.

8.3. Research on automatic journalism

2014 was the year of robotic journalism, and also the year of research on “automatic journalism” (Montal and Reich 2017). Although many articles on the role of algorithms in information production were published before this date, the research took a step forward following Philip Napoli’s seminal article (Napoli 2014).

The research was initially more interested in the consequences of the explosion of mass data and approached the problem of content creation through the prism of big data and data journalism (Le Cam and Trédan 2017; Lewis 2015; Lewis and Westlund 2015; De Maeyer et al. 2015; Trédan 2014). The algorithm is here a technical tool that facilitates the journalist’s work and allows them to say more interesting things about the quantitative data at their disposal, which they may sometimes have missed in the past due to the lack of easy and efficient analysis tools. The algorithm is often coupled with formatting tools that present digital information in the form of graphs or visuals that are enlightening, simplifying and attractive.

For Napoli, echoing an article by Daniel Orr (Orr 1987) that presented the media as institutions with respect to their position in society, algorithms are institutions in the sense that they modify the production and consumption of information.

“Algorithms can, in many ways, embody the complex mix of human and non-human factors that is at the heart of the actor-network theory perspective on institutions” (Napoli 2014, p. 344).

8.3.1. From quantitative journalism to robot journalism

However, before the robots, there were several steps:

Mark Coddington (Coddington 2015) refers to the notion of quantitative journalism, that is, journalism that uses the processing of information in the form of data. Quantitative journalism is divided into three categories corresponding to three periods:

  • – computer-assisted journalism;
  • – data journalism;
  • – computational journalism.

The author retraces the main historical steps. Computer-assisted journalism is a “precision journalism” that was born in the 1950s with the arrival of office automation and the deployment of statistical tools for the general public. From the 2000s, the use of new technologies allowed the emergence of the data journalist, who mines digitized masses of figures (big data) and transforms the information extracted from the databases by statistical processing into journalistic writing. The journalist’s statistical ability gives them access to the new source of raw data.

The third step is computational journalism, which is seen as an evolution of computer-assisted journalism. This expression, which appeared in 2006, describes an increased role for computer technology in the writing of articles, which nevertheless remain at the initiative of the journalist, both in the choice of subjects and in the editorial style.

Sylvain Parasie, a researcher at the University of Paris-Est, and Eric Dagiral, a researcher at Paris Descartes University, speak of a “programmer journalist” (Parasie and Dagiral 2012). The boundaries between the three categories (quantitative, computer-assisted and computational) identified by social science researchers are permeable because journalistic practices are permeable too. This permeability means that the general public, and even researchers, use the different expressions in different ways.

This confusion of terminology is due to the rapid changes in editorial and journalistic practices, as shown in the ICFJ (International Center for Journalists) study published in October 2017, a survey of 2,000 journalists in 130 countries. The disparities between countries regarding the role of technologies are huge. In 82% of U.S. or European newsrooms, 18% of roles are dedicated to digital technology, and a large part of them are occupied by computer specialists who specialize in databases or by professionals in community management or multi-platform content distribution, most often without journalism training. Digital newsrooms also tend to have significantly younger teams.

The most recent step, the fourth step in the evolution of article production, is the arrival of robots or automated (or automatic) journalism. Researchers classify journalistic robots into three categories according to the tasks they are assigned:

  • – robot 1.0 without narrative creation;
  • – robot 2.0 with narrative creation. These first two versions exist and are commonly used in the press;
  • – robot 3.0, which currently remains a hypothetical version, but one that computer scientists are considering: it would have a narrative capacity going beyond the financial or sports comments of a robot 2.0. This implies, on the one hand, that the robot is able to decide to cover a subject and, on the other hand, that it has the analytical and editorial capacity to process it; in other words, it has the ability to judge.

8.3.2. Do readers and advertisers enjoy articles that have been written automatically?

A lot of research tests how the public receives articles that have been written automatically. In line with the famous Turing test, experiments are performed with audiences divided into groups, who are given articles written both by journalists and by robots. The subjects covered are the same; only the authors are different. The studies are relatively recent, dating back to around the mid-2010s.

Christer Clerwall (Clerwall 2014) thus shows that the articles produced by the robot are considered more descriptive, sometimes even boring, but also more objective than the articles written by journalists. Andreas Graefe, from Columbia University, and Mario Haim, Bastian Haarmann and Hans-Bernd Brosius (Graefe et al. 2016), from LMU Munich, show on the contrary that it is very difficult to differentiate between the two types of articles. Mario Haim and Andreas Graefe (Haim and Graefe 2017) are interested in the expectations of readers, who believe a priori that articles written by journalists have a better style and are of better quality than those written by robots; contrary to what they imagined, readers see little difference between the two types of articles when reading them.

Yair Galily (Galily 2018) recounts the evolution of attitudes towards “automatic articles” in the field of sport. The author refers to “backtracking” because, although readers are seduced by the speed of automated information, they still appreciate finding comments from journalists with whom they are familiar. As a result, journalists’ personal blogs have been widely read, testifying to a perceived “lack of humanity” in automated articles.

8.3.3. The impact of robotization

Elisabeth Blankespoor and Christina Zhu, researchers at Stanford, and Ed deHaan, from the Foster School of Business at the University of Washington, offer an economic analysis of automatic journalism (Blankespoor et al. 2017).

More precisely, the authors show that automatic journalism allows an increase in the number of articles on companies and that companies benefit in the form of changes in their share price. For this purpose, the authors study the case of Inventure Foods. They analyzed the company’s financial results in 2015, after the Associated Press deployed the first robot journalist system in 2014, increasing the production of articles on companies’ financial results from 400 per quarter to 4,000. The company’s case is interesting because it did not receive any media coverage until 2014. The authors monitored the evolution of the company’s market capitalization through the Zacks database. The time between the publication of results and the publication of the automatic article in the AP news feed is 2 hours 30 minutes. Once the article is in the AP news feed, it is immediately picked up by redistribution platforms such as Yahoo Finance. The authors then used RavenPack, a content curation company, in order to verify that the financial information about the company was instantly published on CNBC, NBCNews.com and Investor’s Business Daily.

From this case, the authors extended their analysis to 4,292 listed companies between 2012 and 2015; 56% of them had press coverage within the AP before the transition to automated articles, which led the authors to refocus on the remaining 2,268 companies. The result is an increase in trading volumes in these companies’ shares. Better known and more widely covered in the press, their shares occupy a more important place in investors’ strategies and are better valued. The study is in line with the extensive research on the relationship between media and finance, which shows that the media improve the functioning of financial markets by informing investors and exposing fraud and rumors.

8.3.4. What do human journalists think about it?

An important issue concerns the reasons that drive a newspaper to adopt robot journalists, and the impact of this adoption on “human journalists”.

Two South Korean researchers, Daewon Kim and Seongcheol Kim, interviewed 42 leaders in 24 media groups (Kim and Kim 2017). The two authors indicate that the term “robot journalist” was first used in 1998. The concept of a robot journalist is related to that of a “computational journalist”; the difference between the two lies in the degree of autonomy. The robot journalist is a player that the authors describe as active in the production of information. Once its programming is complete, the robot carries out the role it was programmed for autonomously, without human intervention. The algorithm is the key to the process, along with the database it uses as raw material. As a result, the human journalist is no longer the only producer of information and articles. Robots are currently mainly used when databases are objective (such as statistics on sports results) and for articles that need to be published quickly. The choice of using them depends on the willingness to reduce costs (reducing or not replacing human journalists), the type of information in the newspaper (fast news versus news with analysis) and the resistance of working journalists.

The very rapid development of robot journalists is causing great concern, leading some, such as Noam Latar, to question the end of human journalism (Latar 2015).

The two researchers, from Korea University, then analyzed the attitudes of 47 journalists working in 17 newspapers in South Korea (Kim and Kim 2018). There were three types of reactions from journalists. Using the terms chosen by the authors of the article, the first type of journalist suffers from “Frankenstein syndrome”, predicting a catastrophic future for journalists and the quality of information. The second type of journalist is called the “elitist”, in other words, one who thinks that the robot has abilities far below those of a human journalist and that it cannot replace them, except very marginally. The third type is “neutral”, in the sense that they can identify advantages and disadvantages in the arrival of robot journalists in the newsrooms. Human journalists’ opposition to the implementation of robots decreases across the three groups, but remains strong even in the last case. These concerns among journalists follow on from concerns already felt during the deployment of data journalism, as Sylvain Parasie notes (Parasie 2015).

Journalists’ concerns are all the more significant because they have been confronted with continuous technological innovations over the past 20 years.

Studies in this area are multiplying, reflecting the upheavals and concerns associated with the proliferation of robots in information production. Neil Thurman, Konstantin Dörr and Jessica Kunert, for example, interviewed 641 London journalists from several organizations such as the BBC, CNN, Trinity Mirror and the Thomson Reuters agency, highlighting similar results and emphasizing that journalists insist on the necessary “human angle” in an article. Journalists also point out three consequences of the robotization of information (Thurman et al. 2017): information is produced at a lower cost, at an accelerated speed and with less journalist time (which some identify as a concern, related to the fear of a reduction in the number of journalists, and others see as additional time to focus on other subjects or to investigate further).

Konstantin Dörr (Dörr 2017) discusses the ethical issues of automated journalism. While the author acknowledges that journalism has always been confronted with technological innovations, the arrival of algorithms and artificial intelligence is, according to him, a different kind of change. Ethical issues related to technology are common, such as those raised by automatic trading (robots that buy or sell financial shares on the stock markets). Dörr shows the specificity of the ethics of robot journalism by breaking it down into three sub-sets: media ethics, individual ethics (which he links to the attitude and morals of the journalist in their professional activity) and the ethics of the audience, which is part of the social sphere.

This creates ethical challenges for journalists. The main one is related to the shift in the focal point of responsibility within the media. With automated journalism, the human journalist is no longer the major moral player in the information production process. An essential subject that is not taken into account sufficiently is the undoubtedly growing relationship between the journalist and the coder.

In a 2017 paper, Jaemin Jung, Haeyeop Song, Youngju Kim, Hyunsuk Im and Sewook Oh, all researchers based in Seoul, analyze the views of both readers and journalists regarding the work of robots (Jung et al. 2017). The paper recalls the strong criticism of journalists in South Korea following the scandal linked to a shipwreck in April 2014, when the public blamed the media for their outrageous stories and accused them of having partly hidden the truth. The credibility of journalists was then called into question.

For their tests, the researchers selected an article on a baseball game written by an algorithm developed in Korea, and another written on the same subject by a human. The study was conducted with 201 people: each person had to read both papers and rate them on several criteria (writing quality, clarity, credibility, etc.), without knowing the source. Verdict: the difference in perception is very small.

In a second step, another sample of 400 people was asked about both articles, but this time knowing who had written each one (a human or a robot). The results show that the public gives a higher rating to the article when it is presented as written by an algorithm. Conversely, they are less positive about an article actually written by a robot but presented as the work of a journalist, and they give a high score to a paper written by a journalist but presented as written by an algorithm. In other words, readers seem to favor what they believe to be the work of “robots”.

The researchers also conducted the test on a panel of 164 professional journalists. Contrary to their initial hypotheses (which assumed a certain resistance of journalists to robots), the researchers show that journalists also give a higher rating to algorithms than to their peers.

Nicholas Diakopoulos and Michael Koliska (Diakopoulos and Koliska 2016) analyze the reasons why it is difficult for journalists to improve their understanding of algorithms. To do this, they interviewed about 50 people, professionals (28) and academics (22), who were used to working in an environment with robots, in order to gather their experiences. The result is that the understanding of algorithms is limited by two phenomena: the withholding of information by vendors and the large volume of complex information that users of algorithms have to process.

Kevin Hamilton (Hamilton 2014) talks about the need for research on “interactions with the invisible”.

However, the “robot” phenomenon goes beyond the media world.

Software suppliers such as Arria, Narrative Science, Ax Semantics, Retresco and Automated Insights initially created tailor-made, media-specific products. In order to increase their market share and boost their business model, these suppliers have since developed a consumer offer that is easy to access via web interfaces, APIs and plug-ins integrated into an Excel spreadsheet. The users concerned belong to many economic sectors, such as e-commerce and financial technology. For example, bank sites can install a media API that connects to a news feed, such as the BBC’s (whose API is called Juicer), which scans multiple free sources of information in order to extract a list of articles.
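As an illustration of what such an integration can look like, the short Python sketch below queries a news-aggregation REST service and extracts a list of articles. The endpoint, parameters and response fields are hypothetical, invented for the example; a real service such as the BBC’s Juicer documents its own URL scheme, authentication and response format.

  # Hypothetical client for a news-aggregation API. The endpoint, query
  # parameters and JSON fields are invented for illustration only.
  import requests

  API_URL = "https://api.example-news-aggregator.com/v1/articles"  # hypothetical
  API_KEY = "YOUR_API_KEY"                                         # hypothetical

  def fetch_articles(query: str, limit: int = 10) -> list:
      """Return a list of {title, source, url} dictionaries matching the query."""
      response = requests.get(
          API_URL,
          params={"q": query, "size": limit, "apikey": API_KEY},
          timeout=10,
      )
      response.raise_for_status()
      return [
          {"title": a.get("title"), "source": a.get("source"), "url": a.get("url")}
          for a in response.json().get("articles", [])
      ]

  for article in fetch_articles("interest rates"):
      print(f"{article['source']}: {article['title']}")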

8.4. How do these editorial algorithms work?

Everything is based on NLG (natural language generation) technology, in other words, automatic text-generation software.

In an article in Science magazine published in 2015, Julia Hirschberg, a researcher in the Department of Computer Science at Columbia University, and Christopher Manning, a researcher in the Linguistics and Computer Science departments at Stanford University, point out that natural language processing has made significant progress in recent years (Hirschberg and Manning 2015). Natural language processing uses computational techniques to learn, understand and produce content in human language. The first computational approaches to language research focused on automating the analysis of the linguistic structure of language and on developing basic technologies such as machine translation, speech recognition and speech synthesis. Today’s researchers are refining and using these tools in more complex applications, such as simultaneous speech-to-speech translation systems, the use of information contained in social networks to extract health or financial information, or the identification of feelings and emotions towards products or services.
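The last point can be illustrated with a deliberately naive, self-contained Python sketch: a lexicon-based scorer that classifies the polarity of an opinion about a product. The word lists are tiny and purely illustrative; real systems rely on large lexicons or on trained statistical and neural models.

  # Toy lexicon-based sentiment scorer. The word lists are illustrative;
  # real systems use large lexicons or trained statistical/neural models.
  POSITIVE = {"good", "great", "excellent", "love", "reliable", "fast"}
  NEGATIVE = {"bad", "poor", "terrible", "hate", "unreliable", "slow"}

  def sentiment(text: str) -> str:
      words = [w.strip(".,!?").lower() for w in text.split()]
      score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
      return "positive" if score > 0 else "negative" if score < 0 else "neutral"

  print(sentiment("The new phone is fast and reliable, I love it!"))  # positive
  print(sentiment("Terrible battery life and slow delivery."))        # negative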

Bhargavi Goel (Goel 2017) summarizes the progress of NLP in three main periods. NLP emerged in the 1940s with Turing’s work, and a first period of structured research developed over the 1960s and 1970s. During this period, Joseph Weizenbaum of the MIT Artificial Intelligence Laboratory developed ELIZA, which simulates a conversation between a psychoanalyst and a patient. Also at MIT, Terry Winograd developed SHRDLU, which simulates a conversation with a computer about the identification of various geometric shapes. An important step was taken in the early 1980s with the development of algorithms that make predictions from data and the rise of machine learning. The machine learning models of this period were based on new probabilistic models (hidden Markov models, HMMs). The scientific advances of this period were particularly significant in Europe, in connection with the Eurotra research program funded by the European Community.
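As a reminder of what an HMM does in this context, the Python sketch below applies the classic Viterbi algorithm to a tiny, invented part-of-speech example: given a sequence of words, it recovers the most probable sequence of hidden tags. The states, vocabulary and probabilities are made up for illustration.

  # Viterbi decoding for a tiny hidden Markov model (HMM). All states,
  # observations and probabilities are invented for illustration.
  states = ["NOUN", "VERB"]
  start_p = {"NOUN": 0.6, "VERB": 0.4}
  trans_p = {"NOUN": {"NOUN": 0.3, "VERB": 0.7},
             "VERB": {"NOUN": 0.6, "VERB": 0.4}}
  emit_p = {"NOUN": {"robots": 0.5, "write": 0.1, "articles": 0.4},
            "VERB": {"robots": 0.1, "write": 0.8, "articles": 0.1}}

  def viterbi(observations):
      """Return the most probable sequence of hidden states."""
      # V[t][s] = (probability of the best path ending in state s, previous state)
      V = [{s: (start_p[s] * emit_p[s][observations[0]], None) for s in states}]
      for obs in observations[1:]:
          V.append({
              s: max((V[-1][prev][0] * trans_p[prev][s] * emit_p[s][obs], prev)
                     for prev in states)
              for s in states
          })
      # Backtrack from the most probable final state.
      path = [max(states, key=lambda s: V[-1][s][0])]
      for t in range(len(V) - 1, 0, -1):
          path.append(V[t][path[-1]][1])
      return list(reversed(path))

  print(viterbi(["robots", "write", "articles"]))  # ['NOUN', 'VERB', 'NOUN']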

Research accelerated from the 2000s onwards, thanks to the development of the Internet and the digitalization of data, which gave access to an unlimited playground. Recent advances have focused on the ability to process multiple and complex data, in addition to taking the context into account. This requires powerful new statistical tools such as Word2vec, a two-layer neural network model developed by Google. There are five main NLP applications: information retrieval (IR), information extraction (IE) from one or more databases, question answering (QA) with a machine using natural language, machine translation (MT) (the original purpose of NLP) and automatic summarization (AS), which allows essential information to be extracted from a database. This last technique involves providing the algorithm with abstraction capabilities, in other words, the ability to link information in the database to words that are not in the database.
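As an illustration of the kind of tool mentioned above, the open-source gensim library offers a widely used implementation of Word2vec. The sketch below assumes gensim 4.x parameter names, and its toy corpus is far too small to learn meaningful vectors; it is only there to show the workflow.

  # Training a small Word2vec model with gensim (pip install gensim).
  # The corpus is a toy example; real models need millions of sentences.
  from gensim.models import Word2Vec

  corpus = [
      ["robots", "write", "short", "financial", "articles"],
      ["journalists", "write", "long", "investigative", "articles"],
      ["algorithms", "extract", "data", "from", "databases"],
      ["editors", "review", "articles", "before", "publication"],
  ]

  model = Word2Vec(
      sentences=corpus,
      vector_size=50,  # dimension of the word vectors
      window=3,        # context window size
      min_count=1,     # keep every word, even rare ones
      epochs=50,
  )

  # Words that occur in similar contexts end up with similar vectors.
  print(model.wv.most_similar("articles", topn=3))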

Bhargavi Goel concludes this historical overview of NLP by predicting the possibility of extracting a “permanent survey” of our political opinions or of our opinions about the products we consume.

“In the coming years, NLP will have a major impact on the Big Data economy. The technology would no longer be limited to enriching the data, but could eventually predict the future. From understanding, the technology would become predictive. Through the advancement of related technologies such as cognitive computing and deep learning, NLP will offer a competitive advantage to companies in the field of digital advertising, legal, media and medical science services. Price patterns could be predicted and advertising campaigns assessed by data mining. It would become possible to predict the appeal and performance of candidates in elections by searching political forums. Social networks could be examined to find indicators of influence and power. Medical forums could be studied to uncover common questions and misconceptions about patients and diseases, so that the information on the site could be improved” (p. 27).

David Caswell and Konstantin Dörr, both from the University of Zurich, are interested in the latest forms of automatic journalism, in other words, cases where a robot produces an article that goes beyond writing a few descriptive lines based on simple data such as sports results (Caswell and Dörr 2017). While research has focused massively on readers’ impressions and has shown that readers generally make little distinction between automatic articles and those produced by a journalist, at least for short articles, the researchers suggest changing the research framework by shifting from the study of algorithm performance (do they work as well as a journalist can?) to the study of how the databases are built.

In other words, algorithms are becoming increasingly efficient, but the way in which they are supplied with raw material has not been sufficiently studied. The issue that remains to be addressed is therefore that of the quality of the databases, which are now made up not only of texts, but also of sounds, images, videos, etc. This complexity makes them both rich and limited. An algorithm processes the data provided to it. The authors talk about algorithmic authority. Who provides the databases? How do the media choose the databases they use? What about the platform they will use? The fields of analysis are therefore still vast.

Konstantin Dörr provides an economic description of the market for companies offering NLG solutions (Dörr 2016). They are generally recent (the oldest dates back to 2010) and, according to the author, limited in number (the author counts 13 of significant size at the date of publication of the article, that is, 2017, in addition to the Chinese company Tencent, which is entering the market). The market is therefore not very competitive because the number of players is still small, but it is growing as companies multiply rapidly. One example is the recent company Urbs Media, with its remarkably ambiguous slogan: “written by a human, produced by a robot”. This company specializes in automated local information and is developing, with the Press Association, a new NLG tool called RADAR (Reporters And Data And Robots). It is financed by the Google Digital News Initiative program.

Lastly, researchers are interested in algorithms that are inaccessible because they are “proprietary”. Proprietary algorithms cannot be examined because they are the core of the software or platform developer’s business model. These algorithms are highly protected and frequently modified, making them even more difficult to understand.

While artificial intelligence has not yet fulfilled all expectations, the future remains open. The press must therefore accelerate its transformation and take control of its data in order to become a player capable of properly valuing its contribution to the digital economy.

Finally, to conclude, let us take up Eric Scherer’s deliberately provocative question (Scherer 2017): what if the media were to become intelligent again?