Third-Wave AI

I decided to join the fray. I put together a plan for a new artificial intelligence engine to detect campaign finance fraud and investigate privacy. It’s a human-in-the-loop system that automates the process of discovering new investigative story ideas. Like many AI projects, it works beautifully, but it also doesn’t work well. Looking at how the project was built is a way of gaining insight into why AI is wonderful and useless at the same time.

Some investigative stories are like shooting fish in a barrel, and these are the perfect stories on which to deploy AI. To use a computer to find a story, you first need to be fairly sure there’s a story to be found. Stories abound in pools of money. Whenever you have a big pool of money, there’s inevitably someone trying to steal it. Hurricane recovery, economic stimulus packages, no-bid contracts: if you want to find someone up to no good, it’s usually easy to find them lurking wherever there’s a lot of cash.

The big pile of money that is federal political campaigns always attracts a few bad actors. In general, politicians are excellent stewards of public funds and are devoted public servants. Sometimes, they are not. In general, the sentiment among data journalists was that we should keep a close eye on campaign funding in advance of the 2016 presidential election.

I had built AI software for the textbook project I mentioned in chapter 5. I was curious to see if I could apply the software, which I called a Story Discovery Engine, to a different context. In tech, we talk a lot about the value of iteration—building something, then rebuilding it even better. I wanted to iterate. I had built a tool that visualized problem sites in one school district. Could I build a tool that would visualize problem sites in the ultimate district, Washington, DC?

With the generous support of a grant from the Tow Center for Digital Journalism at Columbia Journalism School, I decided to develop a new Story Discovery Engine focused on campaign finance. The tool would let reporters quickly and efficiently uncover new investigative story ideas in campaign finance data. The previous engine was built around the idea of helping reporters to write one story—about books in schools. This time, I wanted to build something that would help reporters find a wide variety of stories in a general topic area. I wanted to build a bigger system, and I wanted it to automate more of the grunt work of investigative journalism. It was well in advance of the election, so there would be plenty of time to build the tech and roll it out and use it for reporting on the election.

I had heard a lot about dark money and super PACs in the wake of the 2010 Citizens United decision, but I knew there was a vast amount I didn’t understand about this complex system. Like public education, campaign finance was a complex bureaucratic system with abundant data. It was a good test case to develop a new engine against. I wondered: Could I identify lawmakers who weren’t following their own rules?

I started with a design-thinking approach. In other words, I talked to people who knew a lot about the thing I wanted to do and was guided by what they said their world was like. I talked to experienced reporters and campaign finance experts. I interviewed a diverse range of campaign finance data experts: journalists, Federal Election Commission (FEC) officials, lawyers, people who run campaign finance watchdog groups. Particularly helpful were the designers and developers who worked for 18F, the government’s rapid-response technology team.

While I was building my tool, 18F was building a new user interface for the outdated FEC website. FEC.gov is the primary distribution channel for all US campaign finance data. It’s traditionally been difficult to navigate and thus difficult to understand. The new interface, rolled out gradually, made information more visible. However, it didn’t surface exactly the information that journalists would need to find stories. Instead, it was focused on simple and effective distribution of FEC data (a noble goal). Therefore, I focused my efforts on designing an interface that would do for journalists what 18F’s new website didn’t. My most essential informant was Derek Willis at ProPublica, a journalist who (probably) knows more about campaign finance data than people who work at the FEC. Willis, who has been reporting on campaign finance for decades, has created a full slate of helpful automated tools for campaign reporting: OpenElections, Politwoops, and others. His work is so good that there’s no point in redoing it. I wanted to make something in the margins, something that would add to the discovery tools (like Willis’s) out there but would make the reporting process faster. In addition, I read. The most challenging part was reading hundreds of pages of US Code and FEC legislation and policies. I took notes on the common themes that emerged. I paid close attention to the vocabulary that people used.

The first step was designing the architecture of the system. Software has underlying architecture, just like buildings do. The Story Discovery Engine is an AI system, but it doesn’t rely on machine learning. It comes from a different branch of AI programs called expert systems. The original idea, back in the 1980s, was that an expert system would be like an expert in a box. You would ask the box a question, like you ask a question to a doctor or a lawyer, and the box would give you an informed answer. Expert systems never worked, unfortunately. Human expertise is too complex to be represented in a simple binary system (which is what computers are). However, I decided to hack the expert system idea, and turn it into a human-in-the-loop system that ran based on rules taken from reporters’ subject matter expertise. It worked well. I didn’t make a box that told me the answers, but I did make an engine that helped me as a reporter to find stories faster.

I decided that the rules of the new Story Discovery Engine would be determined by the rules of the real-world political system. This was both a smart decision—because I wouldn’t have to create new computational rules myself—and a flawed one because the rules for campaign finance in the United States are of Talmudic complexity. I’ll attempt to cover them briefly: Each candidate for federal office has an authorized committee. Individual citizens are limited in the amount they can give to individual candidates through an authorized committee; this limit is currently $2,700 per election. Other political action committees (PACs) may fundraise and may donate to candidates’ committees. There are also limits to what PACs may say and may give. Super PACs, or independent expenditure-only committees, may raise and spend unlimited funds on behalf of a candidate. However, they may not coordinate such spending with the candidate or the candidate’s official committee. Other groups of interest are leadership PACs, Carey committees, joint fundraising committees, 527s, and 501(c)s, all of which collect, spend, or engage in electioneering on behalf of or in opposition to one or more candidates. Committees and PACs are required to report their expenditures and receipts to the FEC. 527s and 501(c)s are required to report to the Internal Revenue Service (IRS).

Say what you will about American government bureaucracy, but it truly is well-suited to database modeling. Bureaucracy is a byzantine maze of rules and regulations, carefully spelled out. Fraud, or at least shenanigans, happen in the cracks between the rules; computer code is a giant set of rules. Therefore, if we get a little creative in how we express the rules computationally, we can efficiently model how things are supposed to work in campaign finance. Then, we can figure out where to find things that went wrong. I put together a diagram that modeled the entities and the relationships between them. The entities became objects.

Campaign finance fraud is a useful phrase; it’s only the top level, however. There’s actually very little fraud in campaign finance because very little is illegal anymore. In the 1970s, the United States put in place very strict limits on how much candidates could raise and spend and from whom they could raise money. Landmark decisions since then have rolled back those limits. In 2002, the Bipartisan Campaign Reform Act made it so that the limits on contributions to federal candidates and political parties increase every few years. The 2010 Citizens United decision made it so that outside groups like super PACs can raise and spend unlimited funds on behalf of candidates, so long as the super PACs don’t coordinate these expenditures with candidates. Another 2010 decision, Speechnow.org v. FEC, removed restrictions on what outside groups like 527s can raise. These groups now merely need to disclose their donors. In 2014, McCutcheon v. FEC removed the upper limit on what an individual can give to candidates, political parties, and PACs combined.² A full explanation of campaign finance is beyond the scope of this book, but I highly recommend reading the website of the Center for Responsive Politics, which offers an excellent primer on what laypeople need to know about campaign finance.

After I talked to all the experts, I extracted the common elements from the conversations. All the experts had certain types of anomalies that they looked for when they were looking for (or at) campaign finance shenanigans, certain red flags that kept coming up over and over again. Administrative overspending was one such red flag. To understand administrative overspending, we need to start with a definition: All political committees are technically nonprofit corporations. Unlike regular nonprofits, however, political committees file financial reports with the FEC rather than the IRS. At every nonprofit corporation, some of the organization’s money is spent on its purpose, and some is spent on keeping the organization running. The purpose-driven expenses are called program expenses. The internal expenses are called administrative expenses. At a political committee, program expenses might be electioneering expenses: the costs of buying television, print, and digital ads; the costs of purchasing yard signs; or donations to political candidates. Administrative expenses are expenses like salaries, office supplies, or the costs of organizing a fundraiser. The ratio of administrative expenses to overall expenses is a measure of health for any nonprofit. Many people use this ratio to evaluate whether the nonprofit is well-run when they’re deciding who to donate to.

Another example of something to look for is a vendor network. Let’s say that candidate Jane Doe is running for president. Supporter Joe Biggs wants to give a million dollars to Doe’s cause. Donated money doesn’t go directly to the candidate, remember; it goes to the candidate’s primary campaign committee, Jane Doe for President (JDP). However, Biggs can’t give a million dollars to Doe’s campaign committee; the personal donation limit is $2,700. However, Biggs is welcome to give that million dollars to a super PAC, Justice and Democracy Political Action Committee (JDPAC), which can spend the money however it likes in order to get Doe elected. JDPAC spends Biggs’s money on what are called independent expenditures. The catch with an independent expenditure–only group (like a super PAC) is that it can’t coordinate with the official campaign committee at all—so JDPAC is prohibited from coordinating its efforts with JDP.

Now, let’s say that JDP hires a graphic design firm in Wichita to create its campaign ads. That firm’s name, Wichita Design, will show up in JDP’s spending reports filed with the FEC. Let’s also say that JDPAC happens to hire the same graphic design firm. This will also show up in JDPAC’s spending reports filed with the FEC. It’s possible that there’s no coordination. The graphic design firm could have really good industrial hygiene: it could set up a firewall internally and educate its staff that there isn’t supposed to be any coordination, and it could do a good job of keeping the two accounts separate. This is entirely possible, and it’s legal and appropriate. It’s also the case that many, many committees use the same vendors for ordinary tasks. There are only a limited number of payroll-processing companies in the United States, for example. Most campaigns and outside groups use ADP for their payrolls, and it’s not a story. However, it’s equally possible that there’s coordination happening at the vendor level. Therefore, if a journalist can easily see that JDPAC and JDP are using the same graphic design firm in Wichita, which happens to be run by Jane Doe’s college roommate, the journalist definitely is going to follow up and see if there is illicit coordination happening. It’s quite likely to be a story.

It’s traditional to give a software project a name, like naming a pet. It provides a shared reference point that the people on the project can use. I decided to name my project Bailiwick. Bailiwick has two definitions, according to Merriam-Webster: “the office or jurisdiction of a bailiff” or “a special domain.” Both definitions seemed to fit, especially since a bailiff is “an officer in a court of law who helps the judge control the people in the courtroom.” I imagined that my program would fulfill the role of the tall, bald bailiff named Bull or the wisecracking bailiff Roz on the 1980s TV show Night Court. It would carry documents back and forth between the people and the data, and it would serve a quasi-official function as an intermediary. I also liked that the word bailiwick sounded kind of cute and playful. In my world, anything that can make campaign finance data more playful is entirely welcome.

On a more practical note, a software application must have a name because you have to put it into a directory on your computer, and that directory has to have a name. It’s essential to pick the name at the beginning, just as with a baby. On the other hand, if you name your baby Joseph and decide two days later that you want to call your baby Yossi, you just start calling the baby Yossi and write “Yossi” inside his t-shirts. With a computer program, if you change the name of the base directory, you can create major headaches inside your code.

Which brings us to the development process. Some of the difficulties I faced during the project are emblematic of the kinds of ordinary challenges that arise during any coding project. For example, I decided to hire someone to help me code because my deadline was tight. Hiring a developer is not unlike hiring a lawyer: The good ones are insanely expensive. They are also hard to find, because they don’t advertise. They don’t need to. There are a few directories, sure, but for the average person it’s quite difficult. I searched online for “hire Django developer.” I got a big pile of garbage. Here’s a sample of one of the search results:

Searching online for a developer was just too hard. Instead, I found myself working my personal networks for recommendations. Online hiring for professional services is an example of an area in which technology was supposed to make things easier, but it actually made things harder. The algorithmic layer on top, which can be manipulated for profit, interferes with the average individual’s ability to do something simple like find a software developer. The same problem arose when I tried to find a handyman to fix something in my house. It reminded me why curation is so useful. In an online world in which everyone is supposed to find their own truth, it can sometimes take forever to do simple things. The paradox of choice can be a burden.

Regrettably, I found myself in the same position as the nineteenth-century mathematicians who needed more human computers and couldn’t find them. I wanted to hire an entire team of women and people of color. I worked all my networks; it was far more difficult than I anticipated. I talked with a developer who runs her own shop who is a woman of color; I couldn’t afford her. Nor could I afford the heavily discounted services offered by a friend’s software firm. Eventually, I hired a woman and three men, all independent contractors, bringing the project total to a 2:3 women:men ratio. On a small team with a very close deadline, that would have to do.

It’s an open secret in project management that nobody knows how to estimate time for a software project. Part of the problem is that writing computer code is more like writing an essay than like manufacturing. Original code hasn’t been written before, so there’s not really a good way to estimate how long it will take to make it—especially if the code is intended to do something that hasn’t been done before. Another problem is that people, not machines, write code. People are bad at estimating time and effort: they go on vacation; they spend the afternoon messing around on Facebook instead of programming. In short, they are people. They are variables, not constants.

Representing the complex relationships of the campaign finance world in a simple, easy-to-discover manner was a challenge. I worked with a user-interface expert, Andrew Harvard, who designed a set of pages that allowed reporters to efficiently organize and sort through the information that mattered to them. State reporters generally care about finding stories relevant to races in their states. National reporters generally focus on the presidential race plus key state races. Regardless, the system lets you select which races and candidates you care about. These are shown in the favorites list when you log in. Figure 11.1 shows what a reporter sees if she has favorited 2016 presidential candidates Hillary Clinton, Donald Trump, and Bernie Sanders. Clicking one of the names takes you to a page that shows a candidate. Each candidate files a series of financial reports with the FEC. A reporter can use Bailiwick to scroll through and read the individual financial reports or see the financial report totals organized in a convenient way.

We tend to think of donations as being for and against. However, the campaign finance laws divide the donations into groups. Remember that there are authorized donations and independent expenditures? Bailiwick parses these reports and organizes them into supporting and opposing groups. This saves time and effort. It’s also easy to skim through and see relevant names.

The inside and outside groups form the substance of a tree map, a common type of data visualization, that appears at the bottom of each candidate’s page. It’s hard to parse numbers; it’s far easier to see patterns in the data when each expenditure is grouped into categories. In a treemap, each category is a rectangle. The relative size of each rectangle matters, as does the number of donors and the total donation amount. I can click any rectangle to see more detail. As of the inauguration, Great America PAC was the group that spent the most in independent expenditures (see figure 11.2). Clicking in, we can see that this donor spent $12.7 million dollars in support of Trump’s campaign, in dozens of separate transactions over the course of the race.

Data visualizations often trigger story ideas. For example, the first time I saw the tree map of Trump’s campaign committee spending pattern, I saw there was a fairly large rectangle devoted to hats (see figure 11.3). As of December 2016, the campaign had spent $2.2 million on hats from a company called Cali-Fame (see figure 11.4).

I didn’t know anything about Cali-Fame in the fall of 2016, but it seemed to me like there might be a story in Trump’s spending on hats. Reporter Philip Bump had the same idea. On October 25, 2016, he published a story in the Washington Post titled “Donald Trump’s Campaign Has Spent More on Hats than on Polling.”³ Not only that, but the Trump campaign also spent $14.3 million on t-shirts, mugs, stickers, and freight, all with a single company, Ace Specialties LLC, which specializes in workwear for the oil and gas industry. The company owner, Christl Mahfouz, is on the board of the Eric Trump Foundation.⁴ Does this mean anything? I don’t know. If I were a reporter on the political beat, that would be another story idea to run down.

Andrew Sheivachman, a reporter for a travel-industry site called Skift, had a different perspective. He used the tool to develop a story called “Clinton vs. Trump: Where Presidential Candidates Spend Their Travel Dollars.” In it, he analyzed how Trump was using campaign funds to pay his own company, TAG Air, for campaign travel.⁵ This is not illegal, but it is notable. It’s also an opportunity to talk about the many things in campaign finance that are legal, but perhaps not appropriate. The only way we’re going to launch a public conversation about these issues is by telling stories. Telling stories is how we understand the world. There aren’t easy answers. We need a public conversation, a conversation that includes diverse voices, to resolve these questions in a democratic manner.

The Story Discovery Engine is a human-in-the-loop system rather than an autonomous system. The difference between a human-in-the-loop system and an autonomous system is like the difference between a drone and a jet pack. The difference matters for effective software design. If you expect the computer to do magical things, you’ll be disappointed. If you expect it to speed things up, you’ll be fine. This attitude of preferring humans assisted by machines is catching on in the $2.9 trillion US hedge fund business, which has always been on the cutting edge of using quantitative methods. Billionaire Paul Tudor Jones, head of Tudor Investment Corp, famously told his hedge fund team in 2016: “No man is better than a machine, and no machine is better than a man with a machine.”⁶

A more general way of thinking about how the tool works is that it surfaces the difference between what is and what should be. What should be is that a group’s administrative expenses should be less than or equal to 20 percent of its total expenses. What is, is whatever percentage of the annual expenses are categorized as administrative according to financial-reporting documents filed with the FEC. If there’s an anomaly—if the administrative expenses are greater than 20 percent—then there’s an opportunity for a story.

Note that I say opportunity. There isn’t definitely a story, because there can be a perfectly good reason for having a large amount of administrative expenses in any given quarter. We don’t want to create a machine that says there’s a 47 percent chance that a political group is acting unlawfully because its administrative expenses are 2 percent higher this month than last month. That would be absurd—and possibly libelous.

Often, when I talk to computer scientists, they suggest looking at the five highest results, the five lowest results, and the average values in a dataset. This is a good instinct, but it isn’t always interesting from a journalistic perspective. Let’s say that we pull a list of salaries for employees of a school district. The five highest-paid employees are likely to be the superintendent and the highest-ranking executives. The five lowest-paid employees are likely to be nonunionized, part-time employees. This isn’t news. It might be surprising or mildly interesting to someone who hasn’t seen a lot of salary scales, but that’s different than being newsworthy. In journalism, we have an obligation to be both accurate and interesting to a mass audience. Computer scientists have the liberty to be interesting on a smaller scale to a highly trained audience (which is something that makes me eternally jealous). The threshold for interestingness is radically different in each field.

If I was going to look at groups that had large administrative expenses, I would probably look at the ones that had really large administrative expense percentages first. The outliers are the low-hanging fruit. I would look at the groups with the highest percentages and the lowest percentages, and I would see if there was something interesting there.

I made one major modification to the Story Discovery Engine. When I tried to explain the textbook engine, people often asked me, “You mean that you made a machine that spits out story ideas?” I explained that it wasn’t a machine that spit out story ideas, that it was subtler, and I talked about automation. Most people’s eyes glazed over at that point. So, for the second Story Discovery Engine, I decided I would try to make an actual machine that spit out story ideas. Figure 11.5 shows what the feature looks like.

I should specify that unlike other features, the story ideas feature is a minimum viable product (MVP). It works, and you can see an actual result—but only for one case, not for all the cases that we planned. We say this very specifically in the documentation. It works well enough for me to feel confident claiming that it works; from my perspective as a developer, it’s a solved problem. But in software, things can work without really working well. It’s not a binary situation. A person can’t be a little bit pregnant, but a software program can be a little bit functional. The point of an MVP is to get the product working well enough that you can demonstrate it to people and get customers or get funding for the next round of development. This isn’t good design, nor is it good practice, nor is it good for users to have half-functional pieces of software out there in the world. However, it’s evolved to be standard practice. I think we can do better. The problem, most of the time, is the problem I ran into with Bailiwick: the team ran out of money, and thus development time, before we could finish the story ideas feature.

Here’s another example of a problem that’s quite typical for the development process, but can have wide-ranging effects unless caught. One day, my code was throwing an error that I didn’t understand. I decided to make a new database and test the code by loading in all my 3.5 million records again from scratch. It worked great for the first ten seconds—then I got a different error. I fixed something that I thought was the problem. Then, I tried to load the data again. It didn’t work. I changed something else that I thought was the problem. That made it worse. I switched back to the first database and tried to recreate the error. I got a totally different error. I realized I wasn’t going to be able to fix the first database—ever—so I switched over to the second database for good. I felt bad; the other people on my team were using the first database, and by having it available but corrupted I was getting in the way of other people’s ability to code. It was a mundane version-control problem, but because precision matters in computation, the errors I caused were probably creating a cascade of other inexplicable, frustrating errors for other people.

These are the kinds of obstacles that get in the way of adopting technology in the newsroom. By working through the obstacles on a small scale, it’s possible to see how a large-scale effort could work. It’s also possible to see why the large-scale effort might be failing. We can also see why writing code is not the kind of thing that can be accomplished in an assembly line. There’s the factory model—with the assembly line—and the small-batch model. In a factory, you look at all the tasks and decide which ones can be automated and are repeatable. In small-batch production, you do the same thing—but you still do some stuff by hand. Think of computational journalism as being like the slow-food movement.

Thus far, the impact of the tool has been small but mighty. I don’t track how many reporters use the tool for stories, but I do use the tool regularly in my classes. I teach about thirty students every semester. This means that at least sixty stories a year can be produced out of the tool. This isn’t bad for what it cost to develop the tool. If the tool were used regularly in a newsroom, each story written out of it could generate revenue via the advertising placed next to it on the page. It’s never going to be a blockbuster source of revenue, but it would be a drop in the bucket. The tool wouldn’t generate as much money as mass-produced assembly-line products, but it would be a revenue-generating, artisanally produced product.

For now, my campaign finance tool doesn’t generate any money. In financial terms, it doesn’t have a path to sustainability. Bailiwick has value as a teaching tool, as a model for investigative projects, and as an example of applied research (meaning “the opposite of theoretical research”) in computational journalism. Much to my chagrin, that intangible value doesn’t help with the $1,000 a month it costs to keep Bailiwick’s servers running. This is another secret of the tech world: innovation is expensive. If I’d known that the project would cost this much, I might have made different choices along the way—but because nobody had created this type of software before, there was literally no way to forecast expenses. I had a kind of blind spot when it came to the project’s operating expenses. This is the kind of blind spot that almost always happens when you build new technology: you need to have faith that you can invent the thing you’re trying to make and have faith that the financials will work out. Engineering is sometimes a thrilling leap into the unknown.

11 Third-Wave AI

Notes