-A TINKER BY ANY
OTHER NAME-

August, 2008.

An enormous number of people had found their way onto “The List.” When people talked about being on a watch list, what they usually meant, though they likely didn’t know it, was either TIDE, the Terrorist Identities Datamart Environment, or the FBI’s domestic terrorist watch list. Although TIDE, maintained by the NCTC, had been repeatedly subject to newer, cleverer acronyms, the main characteristics stayed constant:

 

1. You don’t want to be on it.

2. If you’re on it, you probably don’t know it.

3. There are many reasons you could be put on it and only a tiny number of reasons for you to ever be removed from it.

4. If you’re on it, you probably know others who are on it.

5. If you’re not on it, you probably know others who are on it.

6. There are a lot of people on it, and it will double in size before it gets a new acronym.

 

It was #6 that troubled those who worked with this list, not the fact that its name would change. The list grew too quickly. The problem was not in finding people to add to the list; it was in finding reasons to keep people off the list.

Within three years of the 9/11 atrocities, in not-so-fringe segments of society, on college campuses in parts of the U.S., and in much of Europe, it was, sad to say, entirely fashionable to empathize with the terrorists, even while fastidiously denouncing their tactics. Youths, Muslim or not, were finding that the “angry teenager rebelling against the establishment” phase of life that had only twenty years ago meant a spiked haircut and too much colored hair gel, had come to mean “rebelling against American values.” For too many, it meant at least empathizing with the world’s frustrations toward America.

Fortunately, these frustrations typically amounted to bouts of angst and even self-discovery, but rarely to any action. Unfortunately for anyone in the business of eavesdropping, it was typically all talk. This meant the triggers for the first level of alarms that were in place to mark a person as “interesting” were perpetually being tripped. Further, with the easy availability of all types of information on the Internet, too many details and trigger words were readily available for the fastidious rebel to stumble upon. These words, when said by a person who had just activated a level-one alarm, would set off the second level of alarms. The more well-versed you were, and the more intelligently you spoke of your dissatisfaction and frustrations with America, the more alarms you triggered.

The result was that the watch lists were enormous and expanding rapidly. The gargantuan amounts of data on “these people” far exceeded the capacity to analyze it. Data collection inside the NSA, FBI, CIA, NCTC, and everywhere else, was not the problem. The problem was knowing what to do with it. Until that was addressed, the mounds of evidence that may or may not be contained in the data collected were left untapped.

Insights on how to tackle this problem came from, curiously, one of the academic-outreach programs that the NSA had sponsored. In exchange for funding to support a graduate student, professors would reconfigure their work, or at least their student’s work, to match the goals of the funding agency. In fact, such had been the case for Molly; her advisor’s funding had come directly from the Department of Defense, one of the largest funders of anthropological research.

image

At the same time that Harry Chaff walked into Professor Aore’s lab at Georgia Tech, the blue skies of the last few precious days between summer’s end and the start of classes swiftly turned to a dismal grey. Harry Chaff was the newly appointed Dean of the College of Computing and had been making his rounds of the faculty members’ labs, getting to know his department. Dean Chaff loudly and awkwardly introduced himself from the doorway and added, “I’d like to hear about the work you’re doing. Is right now a good time?”

Had it been a few days later, when classes were in full swing, Professor Aore would have had the ready excuse of preparing for classes to avoid being disturbed. Instead, Professor Aore had no choice but to drop everything to accommodate Dean Chaff. Within ten minutes, he found himself demoing his latest research, trying to impress. Professor Aore, his students, and Professor Mikens from the linguistics department had been working on a new approach to online video recommendations. Based on a user’s personal video viewing history (for example, DVDs rented or online videos watched), their newly created system would recommend other videos that the user would enjoy. The system, though steeped in the well-studied graph-theoretic mathematical techniques of finding stationary points in large graphs, was constructed with a far more accessible goal: to sell their invention to a dot-com and reap a portion of the lucrative rewards that were so easily handed out in Silicon Valley.

Professor Aore started his demo to Dean Chaff with a few compelling examples. “Imagine you’ve just seen a few movies and want to find another one to watch.” He clicked on a few videos, The Break-Up, The Holiday, and Monster-in-Law. Then he sat back and let his system take over.

The screen sputtered out a few cryptic lines of gibberish, at the bottom of which appeared the automatically recommended movie titles. Professor Aore happily explained, “We’ve figured out what your tastes are and we’ve recommended that you watch Bridget Jones’s Diary.” Professor Aore watched as Dean Chaff sat rocking slowly back and forth in his chair, expressionless.

There really wasn’t more to show at this point. Professor Aore tried to spell out the accomplishments for Dean Chaff, “To make this simple recommendation, we figured out that you’re most like a twenty-five to thirty-five-year-old female,” he said, pointing to some words in the lines of gibberish that had just scrolled by. “Then, we looked at the actors in these movies and matched them to similar actors.” He pointed to another line, “We looked at the movies’ genres, directors, viewers’ reviews, professional reviews, and then we synthesized all of this information to give you movies you would most like,” he concluded. Still nothing from Dean Chaff, just a slow back and forth in his squeaky chair.

Professor Aore waited expectantly before continuing. “Why don’t you tell me some movies you like, and we’ll see what it comes up with?”

To this, Dean Chaff responded quickly, with no hesitation, “In alphabetical order, my top three favorite movies are Apollo 13, Close Encounters of the Third Kind, and Contact.” Professor Aore hurriedly found the movies and entered them into his system.

A moment later, Professor Aore was reading the results aloud. “It decided that you are likely a male—88 percent probability, and that you are over forty-five years old—64 percent probability. It recommended 2001: A Space Odyssey for you.”

Dean Chaff nodded his head approvingly in time with his rocking. “I’ve already seen that one, though,” he replied. “What else?”

Professor Aore scanned further down the page, “Battlestar Galactica?”

The rocking and nodding was discernibly faster. “Really? The original or the remake?”

Professor Aore returned to his screen, trying to contain his quickly growing annoyance. The anger soon turned to alarm when he realized that nothing on his screen indicated which version of the movie was recommended. “Remake,” Professor Aore guessed, voice cracking as he failed to suppress his anxiety.

“Well done!” Dean Chaff replied enthusiastically. He was impressed enough to give it further thought—but not by the underlying mathematics or even the potential for a sale to a dot-com. Instead, he imagined an untapped pipeline carrying copious amounts of funding straight from the NSA to Georgia Tech’s College of Computing. “Does this thing work for anything other than movies?”

“Of course it does,” Professor Aore stated defensively, as if merely questioning the system’s broad usefulness was a direct insult cast upon the intellect of generations of his family. “The mathematics behind it are solid. Naturally, it’ll work on any type of data.”

“Perfect, I’d like you and Dr. Mikens to apply for a grant with the NSA—to help them catch terrorists. I think that this is just what they need . . .”

Though Dean Chaff never had it in him to find the right words to explain his notions enough to enthrall Professor Aore the way he would have liked, it was his job to find the connections between people and projects that others would normally miss. What the NSA needed, simply put, was to figure out the problem of ranking. Given that a million people were in the TIDE list, they needed to figure out which ones should be prioritized higher than others—which ones should make it to the top? Which ones were more likely to act rather than to just complain? What the NCTC needed was their own version of tinkers to help rank potential terrorists. And this, like ranking movies that were similar to each other, is exactly what Dean Chaff hoped Professor Aore’s system could do. Though all of this should have been done from the start within the NSA or the NCTC, to do so demanded in-house experts with access to enormous amounts of data, years of hands-on experience and innovation, and most of all unwavering dedicated focus and support. And as Rajive had described in his presentation, these were not always cultivated in-house as they should have been.

Within a few months, Professor Aore and Professor Mikens had adapted their algorithm and submitted a grant proposal that was funded directly by the NSA. They had proposed completing a tightly directed morphological analysis of well-publicized web sites of known terrorism organizations and of terrorist supporters. They had matched the words, phrases, and idioms found on those web sites to pamphlets, newspapers, and other web sites from around the world. Based on the ones that matched closely, they derived a list of books, authors, pamphlets, and web sites that should be actively monitored. It was no different than finding and recommending similar movies. With this information, the NSA only had to find the people who read those pamphlets, had those books, or visited those web sites, and use this information to help prioritize the one million and growing names in the TIDE list.

Dean Harry Chaff may have been awkward, too abrupt, and unable to stay still, but his vague notion proved right. At the completion of their grant, the NSA offered Professors Aore and Mikens follow-up funding. This time, they were asked to create a tool that could be deployed autonomously behind the secure walls of the NSA, with no access given to the professors. The NSA wanted to repeat the same process on a confidential list of currently monitored, but far less publicized, web sites. These were the web sites on which the activities (postings, conversations, e-mails of the participants) were of interest to both the NSA and NCTC.

Enter Rajive.

Rajive took charge of the project. In a joint collaboration between the NSA and NCTC, their system was created and deployed behind secure walls.

The content of the web sites was analyzed, and, using the good professors’ program, it was automatically matched to a list of books. From this, CL-72 was born, a list of books that were the closest matches to the documents that their program was asked to analyze. This was just one of the many CLs to emerge from the program. Each CL detailed some of the attributes and activities to look out for, besides just books and reading patterns, when hunting down would-be terrorists.

Perhaps most interesting was the fact that nobody could accurately articulate why any of the books or other attributes were actually on the list. The two professors who created the program didn’t know anything about the conversations, e-mails, and web sites upon which the program was run (they had just handed over the program; nobody inside the NSA or NCTC told them the actual data that was going to be used) and therefore had no idea of what lists were created, nor the content of the lists (nor did they even know the existence of such lists). The government scientists, who actually ran the computer program, hadn’t taken the time yet to fully understand the specifics of the algorithms. Needless to say, the expectations weren’t high for the project.

image

“I have a meeting in an hour, so just give me the highlights. Tell me what I should tell everyone about your project. I’ll have about ten minutes,” Alan said impatiently.

Sure. I’ll spoon feed you months of work so you can gloss over it at your meeting. No problem, Rajive thought. “The most important thing you can tell them is that Professor Aore’s and Mikens’s work is completed, and it’s deployed. We have 119 candidate lists, CLs, that we expect to get out of this. Once we have the people who match the lists, we can re-rank all the people in TIDE and any other terrorist lists we may have floating around—according to their importance to us.”

“Sounds like a good start. What do you call this program?”

“We don’t have a name for it,” Rajive replied.

“I can’t present it without a name. Just make one up.”

“How about Tide-Sorter?”

Alan contorted his face. “Tide-Sorter? Tide-Sorter? Come on, Rajive, sounds like a laundry detergent. You’re the whiz-kid. Think of a decent name. Give me an acronym. We need an acronym.”

“I’ll think about it.”

“Fine. You were saying something about 119? 119 what?”

“We have 119 attributes of potential terrorists that we examine. The more attributes the would-be terrorists match, the more likely they are to be of high interest to us. Like CL-45, for example; it’s a list of talks, lectures, and concerts. If the suspect went to those talks, his importance gets notched higher. CL-72 is a list of sixty books—read the books on the list, and that will up your importance to us, too. Some other examples—CL-11, that’s a list of travel destinations. That’s a particularly good one. Go to places on that list—like Iraq, Afghanistan, that will add a few more points to your name. We have 119 of these lists.”

“That’s what you’ve been working on? I can’t present that. I’m going to walk into the meeting and tell them that we’re finding terrorists by looking at what books they read? If they read one of your sixty books then they’re a terrorist? You really want me to tell them that?”

“That’s not all that there is to—”

Alan interrupted, “Or, better yet, if someone went to some concert or talk that you listed on CL-45, then we should be watching them? Rajive, what are you talking about? This is what you’ve been working on?”

Disgust, amusement, whatever the emotion on Alan’s face, it wasn’t appreciation. Unfortunately, Alan wasn’t alone in this reaction. Rajive had been fighting resistance continuously throughout the development of the CLs. CL-1 through CL-119 were lists of TV shows, DVDs, symposiums, textbooks, universities, travel destinations, birthplaces, web sites and search queries, religious orientations, music, and so on that had also been matched with the good professors’ program. When seen individually, the reaction to any of these lists ranged from skepticism to pure disbelief, even among the staff who ran the programs to create the lists.

“I know it sounds crazy. But think about all of these together, Alan. Individually, each one of these lists isn’t important, but together, they’re phenomenal. Together, these lists provide a comprehensive profile of known terrorists and terrorist supporters. When we aggregate the results, the decision of how to prioritize a person is simple: If the same person matches too many lists, for example by traveling to some place on CL-11, and using the phrases on CL-13, and searching for web sites on CL-91, logically they should be prioritized higher in the TIDE list. The more a person matches, the higher their priority is raised. It’s about looking at the complete profile of the person. Not just one or two characteristics by themselves.”

At least the disgust from Alan’s face was dissipating. Alan asked, “And, where do you plan on getting all this information about these people? We don’t have anything close to complete profiles on the million people on our lists.”

“We’ve already started gathering the information. We’re handing out individual CLs to a number of our partners. We’re just asking them to return the names of anybody who matches them. For the airlines, who are always happy to work with us, we’ve given them the list of places on CL-11; they’re returning a list of people who fly to any of those places. A list of stores, CL-61, and a list of products, CL-62, were handed over to credit card companies; the name of anyone who shops at any of the stores in CL-61 or buys an item from CL-62 will be given to us. CL-89 was given to phone companies; they’re always more than happy to hand us phone records. TV shows on CL-19 were handed to cable and satellite providers. CL-83 was handed out to—”

“You have partners for all 119 CLs?”

“TIPS!” Rajive exclaimed proudly. “Let’s call the system TIPS—Terrorist Identity Profiling and Sorting.”

“Sure. Whatever,” Alan replied, irked to have his train of thought interrupted—even if it was to answer one of his own questions—though he did take a moment to write down the name. “So,” he asked again, “do you have contacts for all the CLs?”

“No, not all of them. Anything related to the Internet is tough. If we can’t gather it ourselves by our own online traffic surveillance, it’s been difficult to find partners to help. No Internet company, except for a handful of minor players hoping to gain some small favor from us, is willing to hand out any information.”

“So, what are you going to do?”

“We’ve got a number of middle men working with us to make contact into the companies where we don’t have access. We’ve hired the usual contractors for the task.”

“They’re already in play?”

“Of course.”

“Fine. Make sure you check them out carefully.”

“Such sage wisdom and insight... That’s why you’re the boss,” Rajive wanted to say. Instead, he replied, “Good idea. Thanks.”

image

Enter Sebastin.

When the National Counterterrorism Center, NCTC, came looking for discreet, patriotic contacts in Silicon Valley for a potentially lucrative payout—at least in terms of Washington, DC dollars—in exchange for some lightweight data, Sebastin’s name popped up. Cory Waxman, a mutual acquaintance who worked in a venture capital fund that incubated companies primarily created to be acquired and integrated into DC’s numerous intelligence agencies, had helped broker a small acquisition deal for iJenix to the Mahabishi Keiretsu. It was a deal of last resort. Nobody else, especially in DC, had wanted to acquire the company. Though the reasons weren’t clear to Cory, what was obvious enough was that Sebastin had fared far less well than the other founders. To throw a few dollars his way, Cory gave Sebastin’s name to Rajive. For Rajive, the fact that Sebastin was now associated with ACCL was workable. The non-profit’s reputation as the poster child for vocal, trendy, white-hot, liberal charities might even be an asset to acquire the data from Silicon Valley techies.

Originally, Sebastin had been slated for handling a different CL, CL-91, one that listed web sites and search terms to look for in people’s activities. However, someone more appropriate was found for that CL. Instead, Sebastin was eventually given the book list, CL-72. His goal was simple: Find all the people who purchased, looked for, read either online or in print, or ever were in any way interested in reading, the books on CL-72B.

The B designation was an indication that the list had been intentionally obfuscated. If anyone looked at CL-72, they would have seen a list that contained numerous book titles with very easily recognizable and unmistakable themes: terrorism, Middle East and Islamic studies, security, and extremist politics. To supplement the original books that were on CL-72, CL-72B was created. In CL-72B, numerous random books were mixed into the list to provide the viewer no immediately observable pattern of topics. It was Rajive who was tasked with selecting the books, since he was in the room at the time it was decided that CL-72 needed to be augmented and since he, as he readily admitted, was no longer technical enough to help in the actual hard work being done.

To augment CL-72, Rajive simply scanned Amazon.com to find books with interesting covers that also matched the criteria given to him—not too popular, a wide variety of topics, and none related to the subject at hand. He added 900 books to the 60 originally on the list, just as requested. The obfuscation had worked well on Sebastin, too. Until Stephen had told Sebastin that there seemed to be two groups of books, Sebastin never noticed or thought to look that deeply. For Sebastin, the plan would have worked exactly as anticipated.

Enter Stephen.

Sebastin would have continued to be an ideal candidate for the task laid out had it not been for Stephen’s desire to impress him. In the planning stages at NCTC, when it was decided to use outside sources to determine the people who matched the CLs, the possibility of the lists being intercepted was, of course, considered. The hypothetical adversaries may have been just as self-servingly motivated as Sebastin, but certainly did not have the resources and support that Stephen had to uncover the full potential of the lists. The combination of motivation and resources led to unanticipated outcomes. The results that Stephen came up with far surpassed finding the buyers of books that CL-72B had been intended to discover. He found the people who not only bought the books, but visited the same web sites, talked about the same subjects, and had similar profiles in many more ways than simply their choice of reading material. In short, he managed to recreate numerous other CLs, and in doing so, figured out how to take into account the evidence locked within them.

All of the data was put into Stephen’s graph, and the connections between people, books, web sites, chats, and phone calls were naturally represented. Each connection revealed vital, if individually minute, pieces of evidence. The amount of data was massive and the computational requirements to do something meaningful with it even more gargantuan. But it was all completed in only one of Ubatoo’s datacenters in India, while the local audience there slept and left the machines sitting idle, waiting for someone to put them to good use. In this datacenter in India resided the names of the people that NCTC would, beyond a doubt, find “interesting.” At the moment of discovery, all these people were happily living their lives in the middle of the day, oblivious to their habits, personalities, and desires being systematically scrutinized and seconds later being marked for further examination by a program an intern had deployed on a farm of computers an ocean away.

Enter Molly.

These “interesting” people, like Molly, were flagged for review by Stephen, who was connected to Atiq, who was connected to Sebastin, who had found a connection into Rajive’s plan.

And now, back to Rajive and Alan.