Massively Collaborative Mathematics

TIMOTHY GOWERS AND MICHAEL NIELSEN

On 27 January 2009, one of us—Gowers—used his blog to announce an unusual experiment. The Polymath Project had a conventional scientific goal: to attack an unsolved problem in mathematics. But it also had the more ambitious goal of doing mathematical research in a new way. Inspired by open-source enterprises such as Linux and Wikipedia, it used blogs and a wiki to mediate a fully open collaboration. Anyone in the world could follow along and, if they wished, make a contribution. The blogs and wiki functioned as a collective short-term working memory, a conversational commons for the rapid-fire exchange and improvement of ideas.

The collaboration achieved far more than Gowers expected, and showcases what we think will be a powerful force in scientific discovery—the collaboration of many minds through the Internet.

The specific aim of the Polymath Project was to find an elementary proof of a special case of the density Hales-Jewett theorem (DHJ), which is a central result of combinatorics, the branch of mathematics that studies discrete structures (see ‘Multidimensional noughts and crosses’). This theorem was already known to be true, but for mathematicians, proofs are more than guarantees of truth: they are valued for their explanatory power, and a new proof of a theorem can provide crucial insights. There were two reasons to want a new proof of the DHJ theorem. First, it is one of a cluster of important related results, and although almost all the others have multiple proofs, DHJ had just one—a long and complicated proof that relied on heavy mathematical machinery. An elementary proof—one that starts from first principles instead of relying on advanced techniques—would require many new ideas. Second, DHJ implies another famous theorem, called Szemerédi’s theorem, novel proofs of which have led to several breakthroughs over the past decade, so there was reason to expect that the same would happen with a new proof of the DHJ theorem.

The project began with Gowers posting a description of the problem, pointers to background materials and a preliminary list of rules for collaboration (see go.nature.com/DrCmnC). These rules helped to create a polite, respectful atmosphere, and encouraged people to share a single idea in each comment, even if the idea was not fully developed. This lowered the barrier to contribution and kept the conversation informal.

Building Momentum

When the collaborative discussion kicked off on 1 February, it started slowly: more than seven hours passed before Jozsef Solymosi, a mathematician at the University of British Columbia in Vancouver, made the first comment. Fifteen minutes later a comment came in from Arizona-based high-school teacher Jason Dyer. Three minutes after that, Terence Tao (winner of a Fields Medal, the highest honour in mathematics) at the University of California, Los Angeles, made a comment. Over the next 37 days, 27 people contributed approximately 800 substantive comments, containing 170,000 words. No one was specifically invited to participate: anybody, from graduate student to professional mathematician, could provide input on any aspect. Nielsen set up the wiki to distil notable insights from the blog discussions. The project received commentary on at least 16 blogs, reached the front page of the Slashdot technology-news aggregator, and spawned a closely related project on Tao’s blog. Things went smoothly: neither Internet ‘trolls’ (persistent posters of malicious or purposefully distracting comments) nor well-intentioned but unhelpful comments were significant problems, although spam was an occasional issue on the wiki. Gowers acted as a moderator, but this involved little more than correcting a few typos.

Progress came far faster than anyone expected. On 10 March, Gowers announced that he was confident that the Polymath participants had found an elementary proof of the special case of DHJ, but also that, very surprisingly (in the light of experience with similar problems), the argument could be straightforwardly generalized to prove the full theorem. A paper describing this proof is being written up, along with a second paper describing related results. Also during the project, Tim Austin, a graduate student at the University of California, Los Angeles, announced another new (but non-elementary) proof of DHJ that made crucial use of ideas from the Polymath Project.

The working record of the Polymath Project is a remarkable resource for students of mathematics and for historians and philosophers of science. For the first time one can see on full display a complete account of how a serious mathematical result was discovered. It shows vividly how ideas grow, change, improve and are discarded, and how advances in understanding may come not in a single giant leap, but through the aggregation and refinement of many smaller insights. It shows the persistence required to solve a difficult problem, often in the face of considerable uncertainty, and how even the best mathematicians can make basic mistakes and pursue many failed ideas. There are ups, downs and real tension as the participants close in on a solution. Who would have guessed that the working record of a mathematical project would read like a thriller?

Broader Implications

The Polymath Project differed from traditional large-team collaborations in other parts of science and industry. In such collaborations, work is usually divided up in a static, hierarchical way. In the Polymath Project, everything was out in the open, so anybody could potentially contribute to any aspect. This allowed ideas to be explored from many different perspectives and allowed unanticipated connections to be made.

The process raises questions about who should count as an author: it is difficult to set a hard-and-fast bar for authorship without causing contention or discouraging participation. What credit should be given to contributors with just a single insightful contribution, or to a contributor who is prolific but not insightful? As a provisional solution, the project is signing papers with a group pseudonym, ‘DHJ Polymath’, and a link to the full working record. One advantage of Polymath-style collaborations is that, because all contributions are out in the open, it is transparent what any given person contributed. If it is necessary to assess the achievements of a Polymath contributor, then this may be done primarily through letters of recommendation, as is done already in particle physics, where papers can have hundreds of authors.

The project also raises questions about preservation. The main working record of the Polymath Project is spread across two blogs and a wiki, leaving it vulnerable should any of those sites disappear. In 2007, the US Library of Congress implemented a programme to preserve blogs by people in the legal profession; a similar but broader programme is needed to preserve research blogs and wikis.

New projects now under way will help to explore how collaborative mathematics works best (see go.nature.com/4ZfIdc). One question of particular interest is whether the process can be scaled up to involve more contributors. Although DHJ Polymath was large compared with most mathematical collaborations, it fell short of being the mass collaboration initially envisaged. Those involved agreed that scaling up much further would require changes to the process. A significant barrier to entry was the linear narrative style of the blog. This made it difficult for late entrants to identify problems to which their talents could be applied. There was also a natural fear that they might have missed an earlier discussion and that any contribution they made would be redundant. In open-source software development, this difficulty is addressed in part by using issue-tracking software to organize development around ‘issues’—typically, bug reports or feature requests—giving late entrants a natural starting point, limiting the background material that must be mastered, and breaking the discussion down into modules. Similar ideas may be useful in future Polymath Projects.

Multidimensional Noughts and Crosses

To understand the density Hales-Jewett theorem (DHJ), imagine a multidimensional noughts-and-crosses (or tic-tac-toe) board, with k squares on a side (instead of the usual three), and in n dimensions rather than two. Any square in this board has n coordinates between 1 and k, so for instance if k = 3 and n = 5, then a typical point might be (1,3,2,1,2). A line on such a board has coordinates that either stay the same from one point to the next, or go upwards or downwards. For instance, the three points (1,2,3,1,3), (2,2,3,2,2) and (3,2,3,3,1) form a line. DHJ states that, once the number of dimensions is large enough, filling in any fixed fraction of the board, however tiny, always forces a line to be filled in somewhere: there is no possible way of avoiding such a line. More than this, there is no way to avoid a ‘combinatorial line’, in which the coordinates that vary have to vary in the same direction (rather than some going up and some going down), as in the line (1,2,3,1,1), (2,2,3,2,2) and (3,2,3,3,3). The initial aim of the Polymath Project was to tackle the first truly difficult case of DHJ, which is when k = 3.
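The definitions above can be made concrete with a short sketch. The Python below is purely illustrative and is not code from the Polymath Project; the function names classify_line, combinatorial_lines and largest_line_free_size, and the use of None as a wildcard when describing a combinatorial line, are hypothetical choices made for this example. It checks whether an ordered list of points forms a line or a combinatorial line, and exhaustively searches a tiny board for the largest set of squares containing no combinatorial line.

```python
from itertools import combinations, product


def classify_line(points, k):
    """Classify an ordered sequence of k points in [k]^n (coordinates 1..k).

    Returns "combinatorial" if every varying coordinate moves in the same
    direction, "line" if some go up and some go down, and None otherwise.
    """
    if len(points) != k:
        return None
    up = tuple(range(1, k + 1))
    down = up[::-1]
    directions = set()
    for column in zip(*points):  # the values of one coordinate across the points
        if column == up:
            directions.add("up")
        elif column == down:
            directions.add("down")
        elif len(set(column)) != 1:
            return None  # neither fixed nor sweeping 1..k, so not a line
    if not directions:
        return None  # every coordinate is constant: one point repeated
    return "combinatorial" if len(directions) == 1 else "line"


def combinatorial_lines(k, n):
    """Yield every combinatorial line in [k]^n as a frozenset of points.

    Illustrative encoding: a template over {1..k} plus a wildcard (None),
    with at least one wildcard; substituting j = 1..k for every wildcard
    gives the k points of the line.
    """
    for template in product([None, *range(1, k + 1)], repeat=n):
        if None not in template:
            continue
        yield frozenset(
            tuple(j if c is None else c for c in template)
            for j in range(1, k + 1)
        )


def largest_line_free_size(k, n):
    """Brute-force the largest subset of [k]^n containing no combinatorial line.

    The search is exponential in k**n, so it only works for tiny boards.
    """
    points = list(product(range(1, k + 1), repeat=n))
    lines = list(combinatorial_lines(k, n))
    for size in range(len(points), -1, -1):  # try the biggest subsets first
        for subset in combinations(points, size):
            chosen = set(subset)
            if not any(line <= chosen for line in lines):
                return size
    return 0
```

For k = 3, the two example triples in the box are classified as "line" and "combinatorial" respectively, and the search returns 2 when n = 1 and 6 when n = 2. Because exhaustive search quickly becomes infeasible as n grows, a sketch like this says nothing directly about the large-n regime that DHJ concerns; it only makes the definitions tangible.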

Towards Open Science

The Polymath process could potentially be applied to even the biggest open problems, such as the million-dollar prize problems of the Clay Mathematics Institute in Cambridge, Massachusetts. Although the collaborative model might deter some people who hoped to keep all the credit for themselves, others could see it as their best chance of being involved in the solution of a famous problem.

Outside mathematics, open-source approaches have only slowly been adopted by scientists. One area in which they are being used is synthetic biology. DNA for the design of living organisms is specified digitally and uploaded to an online repository such as the Massachusetts Institute of Technology Registry of Standard Biological Parts. Other groups may use those designs in their laboratories and, if they wish, contribute improved designs back to the registry. The registry contains more than 3,200 parts, deposited by more than 100 groups. Discoveries have led to many scientific papers, and a 2008 study showed that most parts are not primitive but rather build on simpler parts (J. Peccoud et al. PLoS ONE 3, e2671; 2008). Open-source biology and open-source mathematics thus both show how science can be done using a gradual aggregation of insights from people with diverse expertise.

Similar open-source techniques could be applied in fields such as theoretical physics and computer science, where the raw materials are informational and can be freely shared online. The application of open-source techniques to experimental work is more constrained, because control of experimental equipment is often difficult to share. But open sharing of experimental data does at least allow open data analysis. The widespread adoption of such open-source techniques will require significant cultural changes in science, as well as the development of new online tools. We believe that this will lead to the widespread use of mass collaboration in many fields of science, and that mass collaboration will extend the limits of human problem-solving ability.