Accessing Government Data: Open Distribution Versus Jealous Control

Much like Josh Tauberer’s GovTrack.us, the Washington Post’s Congress Votes database allows users to easily search and sort through a database of congressional bills and votes. When the Post was building its site, the House offered its roll call votes in XML, a standard machine-readable format, while the Senate did not. This forced the Post, like GovTrack, to rely on cumbersome “screen scraping” of Senate web pages to make the Senate’s roll call votes usable (see Chapter 18).
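
To make the difference concrete, here is a minimal sketch contrasting the two approaches. The sample records, element names, and function names are all invented for illustration and are not the actual House or Senate formats; the point is only that a structured record can be read by its field names, while a scraper has to guess at a page layout meant for human readers.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical structured record; the element and attribute names are invented.
SAMPLE_XML = """
<roll_call number="101">
  <member name="Smith" state="OH"><vote>Yea</vote></member>
  <member name="Jones" state="TX"><vote>Nay</vote></member>
</roll_call>
"""

# Hypothetical HTML page presenting the same tally for human readers.
SAMPLE_HTML = """
<html><body>
<h2>Vote Number 101</h2>
<p><b>Smith (OH)</b>, Yea<br>
<b>Jones (TX)</b>, Nay<br></p>
</body></html>
"""


def votes_from_xml(text):
    """Read each member's vote by element name; no layout guessing needed."""
    root = ET.fromstring(text.strip())
    return {m.get("name"): m.findtext("vote") for m in root.iter("member")}


def votes_from_html(text):
    """Screen-scrape the same data by pattern-matching the page markup."""
    pattern = re.compile(r"<b>(\w+) \((\w{2})\)</b>,\s*(\w+)")
    return {name: vote for name, _state, vote in pattern.findall(text)}


if __name__ == "__main__":
    print(votes_from_xml(SAMPLE_XML))
    print(votes_from_html(SAMPLE_HTML))
```

Both functions recover the same votes from these samples, but the scraper depends on incidental markup: if the hypothetical page switched from `<b>` to `<strong>` tags, the pattern would silently match nothing, whereas the XML reader is unaffected by presentation changes.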

In 2007, however, co-creator Derek Willis was poking around the Senate website when he discovered a directory of XML files of vote data for past sessions. This demonstrated that the Senate had the ability to make its votes available online in a structured format. Willis was elated at the thought that perhaps there was easy access to Senate vote data after all. He wrote to the Senate webmaster asking whether structured voting data was available for the current session and, if so, whether this data would be made public. The telling response read, in part, as follows:

A few representative votes (only a few from the early congresses) were published out to the active site during some testing periods. I really need to remove them from the site.

We are not authorized to publish the XML structured vote information. The Committee on Rules and Administration has authorized us to publish vote tally information in HTML format [not a structured format]. Senators prefer to be the ones to publish their own voting records. As you know, looking at a series of vote results by Senator or by subject does not tell the whole story. Senators have a right to present and comment on their votes to their constituents in the manner they prefer. This issue was reviewed again recently and the policy did not change.

Senators doubtless would “prefer to be the ones to publish their own voting records.” But jealous control over information by government is anathema to democracy. Looking at a series of votes by a senator does in fact tell the “whole story” of that senator’s voting record, and despite what the webmaster said, senators do not have a “right” to present their votes to the public “in the manner they prefer.” Of course, the rebuff only motivated hackers such as Willis and Tauberer further.

When third parties make government data available, they demonstrate that it is possible to do so cheaply and efficiently. In some cases, officials may be unaware of what is technically possible, or they may believe that state-of-the-art technology is prohibitively expensive. Freeing information can also generate awareness of, and demand for, the newly accessible data among citizens. This can lead to embarrassing questions for government: Why isn’t it making the data available itself? Why are citizens forced to hack the data in order to access it? Hacking government data can also demonstrate to cautious officials that when information is made accessible and useful, the world does not end.

In fact, after GovTrack.us and the Washington Post Congress Votes database brought attention to the issue, the Senate Rules Committee finally relented and recently began making roll call votes available in XML. Two years after Derek Willis was rebuffed by the Senate webmaster, a group of seemingly embarrassed senators wrote to Committee Chairman Chuck Schumer demanding a repeal of the prohibition on XML.

“This policy has created a situation where outside groups are forced to create databases that are more likely to contain errors and omissions,” they wrote. “The suggestion that the Senate would intentionally hamstring the distribution of roll call votes so Senators could put a better spin on them is concerning. The public is capable of interpreting our votes on its own.”