Give Us the Data

Guardian, 7 October 2011

Bad things happen when problems are protected by a force field of tediousness. Here is an example. Data is the fabric of the modern world: just as we walk down pavements, so we trace routes through data, and build knowledge and products out of it. The government has lots of data that has already been collected, because it was needed to run the country properly: simple stuff like maps, postcode areas, land ownership, procurement data, endless weather readings, and so on.

Right now a fight is happening in Whitehall between two factions in government: one group thinks we should give this data away for free, as a matter of principle, because it will make good things happen; the other thinks we should restrict access, and sell it. A consultation is under way. Despite a positive ministerial introduction, each of the three options it gives for releasing data is foolishly restrictive. Here’s why that’s a problem.

As things stand, much everyday government data is locked down so hard that nerds are forbidden to repurpose it. You could have a map of who owns what in your town, on your screen, at a click. You could find out what company boards someone sits on, and map their relationships and overlaps with all the other directors in the country. You could download transcripts of court proceedings that affect you. All this is blocked by the government’s restrictive data policies.

There are areas where access has been won by the shame of a simple moral argument. Hansard is a record of everything that happens in Parliament. TheyWorkForYou.com is a repurposing of that data which adds huge value, not just by being more usable than Hansard, but by identifying patterns in MPs’ voting behaviour. When TheyWorkForYou first came out, Hansard argued – embarrassingly – that this was an illegal breach of copyright.

But there are also straight commercial applications. If you’re making services or things that you sell to government, then seeing what they use and need helps you sell them stuff. That data is even internally useful: if you can see what everyone else is paying for toilet paper, you might get a better deal for your own department.

All this data has to be created, regardless of whether or not it gets sold, simply in order to run the country. You could ‘sweat the asset’, and charge money for access; but if you release it for free, at barely any cost to yourself, without fiddliness, in its raw form, the benefits are potentially huge.

This becomes especially clear when you notice how the restrictions extend beyond specific realms of data, and into the kind of core structural information that is needed as a civic skeleton for simple, everyday activity. The Royal Mail still owns all our postcode information, and you can’t get the house-number boundaries of each specific postcode without paying. All the most interesting data projects involve linking one dataset with another, and for addresses, that often means using postcodes, as a commonly used structural spine (I’m willing to bet that you don’t know your house’s latitude and longitude). This kind of framework data is the pavement of data space, and if you’re not allowed to use it, projects go unmade.

The economic loss is almost impossible to measure: if any of the projects I’ve already described sound trivial to you, remember that this is a crippled field, where innovators have barely had a chance to get their eyes in. Amazing things happen when you pull individual pieces of information together into larger linked datasets: meaning emerges, as you produce facts from figures. If you’ve ever wished you were born in the nineteenth century, when there were so many obvious inventions and ideas to hook for yourself, then I seriously recommend you become a coder, because future nerds will look back on this time with the exact same envy. But that leap forward will be tediously retarded if we don’t make the government allow us to use the pavements.