Chapter 8

The challenge of data

IN MARCH 2007, Nick Pearce was running the British think tank the Institute for Public Policy Research. That month, one of his young interns, Amelia Zollner, was killed by a lorry while cycling in London. Amelia was a bright, energetic Cambridge graduate, who worked at University College London. She was waiting at traffic lights when a lorry crushed her against a fence and dragged her under its wheels. Two years later, in March 2009, Pearce was head of Prime Minister Gordon Brown’s Number 10 Policy Unit. He had not forgotten Amelia and wondered to a colleague if the publication of raw data on bicycle accidents would help. Perhaps someone might then build a website that would help cyclists stay safe?

The first dataset was put up on 10 March. Events then moved quickly. The file was promptly translated by helpful web users who came across it online, making it compatible with mapping applications. A day later, a developer e-mailed to say that he had ‘mashed up’ the data on Google Maps. (‘Mashing’ means combining two or more sets of data.) The resulting website allowed anyone to look up a journey and instantly see any accident spots along the way. Within 48 hours, the data had been turned from a pile of figures into a resource that could save lives, and which could help people to pressure government to deal with black spots.
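The mechanics of such a mash-up are simple in principle. Here is a minimal sketch, in Python, of joining a table of accident records with a planned route; all coordinates and records are invented for illustration, and a real application would use the published dataset and a proper mapping API:

```python
# A minimal 'mash-up' sketch: join raw open-data accident records
# with a cyclist's planned route, flagging any known accident spot
# within roughly 100 metres of a waypoint. All data here is invented.
from math import radians, sin, cos, asin, sqrt

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in metres."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    a = (sin((lat2 - lat1) / 2) ** 2
         + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371000 * asin(sqrt(a))

accidents = [  # raw records: (latitude, longitude, severity)
    (51.5205, -0.1300, "serious"),
    (51.5312, -0.1201, "slight"),
]

route = [  # waypoints of a planned cycle journey
    (51.5200, -0.1305),
    (51.5250, -0.1250),
]

# The 'mash': accident records matched against the route by distance.
black_spots = [
    acc for acc in accidents
    if any(haversine_m(acc[0], acc[1], lat, lon) < 100 for lat, lon in route)
]
print(black_spots)
```

The same join, rendered on a map rather than printed, is essentially what the 2009 bicycle accident website did.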

Now, imagine if the British government had produced a bicycle accident website in the conventional way. Progress would have been glacial. The government would have drawn up requirements, put it out to tender, and eventually gone for the lowest bidder. Instead, within two days, raw data had been transformed into a powerful public service.

Politicians, entrepreneurs, academics, even bureaucrats spend an awful lot of time these days lecturing each other about data. Every minute, YouTube users upload 300 hours of new video, Google receives over 3.5 million search queries, more than 150 million e-mails and over 450,000 tweets are sent, and Facebook users share almost 700,000 pieces of content. Facebook users are prolific in another sense: in 2016, the average Facebook user, according to Mark Zuckerberg, spent fifty minutes a day on the site and its sister platforms Instagram and Messenger. As The New York Times points out, that is around one sixteenth of their waking hours, every day of the year. Roughly 500 petabytes (500 million gigabytes) of information were stored. There is big data, personal data, open data, aggregate data, and anonymised data. Each variety has issues: where does it originate? Who owns it? What is it worth?

Not all of the data in the extraordinary masses of datasets will be open. Much data is owned by corporations, which use it to perform sophisticated analyses of their customers’ behaviours. They are understandably reluctant to open any channel to it that might be used by their competitors. Nevertheless, conditional access, for health researchers to review food purchasing in supermarkets, for example, could be immensely useful.

In many fields, so much data is now gathered that radical new techniques are needed to process it. Governments and corporations are waking up to the potential of this exponential growth. If a business has customers, it can predict what they want and supply it. If it has a supply chain, it can make it more efficient. If it uses raw materials and fuel, it can scan multiple sources and prices. Strategic managers ought, in principle, to be able to map out multiple ways forward for their business, adjusting them on the basis of myriad incoming data streams. The trouble, however, is that in many business environments, competitors are doing just the same, just as rapidly, intelligently, and comprehensively, and so are their suppliers. Businesses get better at the game, but the game itself gets more complicated. This can be dangerous, as in the personal finance and home loans industry, where too much innovative complexity, too quickly, broke several banks and dragged whole economies backwards. But there are benefits, too: the personal and home electronics industry is as competitive as any in the world, and its range of products has never been more powerful.

*

The World Wide Web is a gloss that covers billions of documents, videos, photos, music files. By enabling people to rapidly identify, and move to, almost endless resources, all of them within a simple address system, it has been a powerful catalyst for other changes. Much the same principles can be applied to machines seeking out data. A similar codification of addresses for datasets enables them to be linked to every other dataset. Machines can and do trawl them to locate and match cognate sets. The World Wide Web, comprehensible to humans, is extended by that semantic web of linked data, comprehensible to machines, which will decode it for us. The semantic web, as it comes about in the next few years, will greatly amplify the web’s capacities. Every object in the digital ape’s life, every database that touches on that life, will in principle be able to talk with every other.
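The linked-data idea can be illustrated with a toy example: facts stored as subject-predicate-object triples, each named by a web-style address, which a machine can match and follow from one dataset into another. The addresses and facts below are invented for illustration; real systems use standards such as RDF and SPARQL:

```python
# A toy linked-data store: each fact is a (subject, predicate, object)
# triple, with web-style addresses as identifiers. The URIs and facts
# are invented for illustration.
triples = [
    ("http://data.example/road/A40", "passesThrough", "http://data.example/place/Oxford"),
    ("http://data.example/place/Oxford", "hasPopulation", "152000"),
    ("http://data.example/road/A40", "surfaceType", "asphalt"),
]

def query(subject=None, predicate=None, obj=None):
    """Return all triples matching the pattern (None acts as a wildcard)."""
    return [
        t for t in triples
        if (subject is None or t[0] == subject)
        and (predicate is None or t[1] == predicate)
        and (obj is None or t[2] == obj)
    ]

# A machine can 'follow its nose': find where the A40 passes through,
# then look up facts about that place, which could sit in another dataset.
for _, _, place in query("http://data.example/road/A40", "passesThrough"):
    print(query(place, "hasPopulation"))
```

The point of the shared address scheme is that the second lookup works even if the population figure lives in an entirely different database, published by an entirely different body.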

*

On the face of it, open data is an idea too simple and right to fail. Assuming that the correct safeguards around private and personal information are in place, then the vast information hoards held by central and local government, quangos, and universities should form a resource for entrepreneurs who wish to start new businesses; private suppliers of goods and services who believe they can undercut the prices of existing contractors; journalists and campaigners who wish to hold power to account. Economic innovation and democratic accountability would both benefit. Bureaucrats would learn more about how their organisations function, and manage them better.

A good start has been made in publishing previously untapped public datasets, with some impressive early benefits. In the US, the federal government established data.gov, while in the UK data.gov.uk and the Open Data Institute were launched. Transport for London (TfL), which runs London’s tube trains and buses, and manages the roads, began to publish masses of information, much of it in real time, about its services. This enabled developers to quickly build smartphone applications telling travellers about delays and jams. Commuters and goods deliverers could plan their journeys better. An estimate for TfL puts the resulting savings at over £130 million per year. The Home Office, on the back of falling crime rates across the UK, was emboldened to publish very detailed, localised crime statistics. Analyses of prescriptions written by GPs show hundreds of millions of pounds’ worth of cases where cheaper and better drugs could have been prescribed.

The fast crunching of numbers by outsiders new to a field does not guarantee good results. The fact that family doctors prescribe the wrong things has been known for decades: so has the difficulty of imposing any rational management on doctors, who remain a powerful professional elite. Hospital doctors rightly point out that the publication of raw death rates for individual specialists can be misleading. It might look like a good plan to go to the heart specialist with the highest patient survival rate. But the best surgeons often get the most difficult cases: patients who are, by definition, more likely to die. ‘Transparency’ can mislead.

Open data also raises important questions about intellectual property. Patents and copyright have been great engines of innovation. It does not, however, seem right that the Ordnance Survey and the Royal Mail, both run for centuries by the government, should insist on their strict intellectual property rights over, respectively, mapping data and postcode addresses, compiled at the public expense. At the moment they do.

As a matter of practice, open data may in part be catching on by a peculiar default, for the wrong motives even. Policy-makers and executives who make decisions about open data are usually the same people who own data strategies generally. Data has been a puzzling, dangerous area for a few years now. Companies and public agencies face real risks, risks which may affect the reputations and affordable lifestyles of those decision makers. An expensive investment in big data could be the wrong move, and fail to pay off. A failure to grasp a big data opportunity could be fatal, or at least embarrassing. Now, open data is easier to effect, cheaper, and the top web people, led by Tim Berners-Lee himself, promote it. The decision maker reasons: ‘I’ve no idea really what my own outfit would win or lose from it: but I’d like to seem cutting-edge. If I do it, I’ll look as if I’m actively part of the new wave. Seems safer than that other big, chancy corporate thing.’

Those of us in favour of open data would prefer a more sophisticated understanding of its value. Nevertheless, these are respectable enough motives in their way, and come to the right result. The single thing that every citizen and every corporate decision maker needs to understand is that the enormous data stores that government, government agencies, corporations, trusts, and individuals hold are as much a key part of national and international infrastructure as the road network. Countries take national responsibility for ensuring that transport infrastructure is fit for purpose and protected against the elements and attack. They should take the same responsibility for data infrastructure. The digital estates of the modern nation and the modern corporation are vast. Much of the architecture is designed to be inward looking in the case of the nation, and cash generating in the case of the corporation. They are not merely poorly tapped information libraries — although better access, for citizens, entrepreneurs, and researchers, is important. They also enable much of everyday life to happen. Most of us would prefer our doctors to have our medical records when they treat us. Most of us would prefer not to lose the accumulated data on our friendship and business networks held by Google, Facebook, Microsoft, and their wholly owned applications. Property ownership is only as good as the national and local government ownership records. Wealth and income are only as good as the databanks of financial institutions. Some of this should be open, at least as metadata. Much of it should be utterly private, at least in the detail. All of it needs to be protected from attack, decay, accident.

Attack is not merely a theoretical danger, as we have all come to know. Judicious leaking of e-mails from the Democratic camp by Russian state-sponsored hackers was a feature of the 2016 US election. Here is a typical news report, of a kind we will see increasingly:

WannaCry malicious software has hit Britain’s National Health Service, some of Spain’s largest companies including Telefónica, as well as computers across Russia, the Ukraine and Taiwan, leading to PCs and data being locked up and held for ransom. The ransomware uses a vulnerability first revealed to the public as part of a leaked stash of NSA-related documents in order to infect Windows PCs and encrypt their contents, before demanding payments of hundreds of dollars for the key to decrypt files … When a computer is infected, the ransomware typically contacts a central server for the information it needs to activate, and then begins encrypting files on the infected computer with that information. Once all the files are encrypted, it posts a message asking for payment to decrypt the files — and threatens to destroy the information if it doesn’t get paid, often with a timer attached to ramp up the pressure.

Alex Hern and Samuel Gibbs, ‘What is WannaCry ransomware and why is it attacking global computers?’, The Guardian, 12 May 2017

It is unclear whether WannaCry was initially state-sponsored, although once launched it spread like a virus, not like a target-seeking missile. Its ransom demands appear in dozens of languages, translated by computer program; experts believe only the Chinese version was written by a native speaker. But North Korean groups closely linked to that country’s government have certainly had a hand in much cyber trouble, possibly including this attack. Old-fashioned amateur — but very expert — hacking also still abounds.

*

Linked to the idea of data as essential infrastructure is the notion of net neutrality. The internet is a key feature in the lives of everyone in the developed economies. Even the minority who do not personally access it every day, at home, at work, on their phones, depend on the businesses and services it enables. A free and open society should look askance at any attempt to restrict access to it. The internet service providers — mostly big corporations — can be sorely tempted to give priority to the users or organisations they like best, those prepared to pay more, and to slow down or otherwise inhibit the service provided to the rest of us. Government should help those corporations to stay neutral and resist the temptation to sell priority, not encourage them.

*

A brief word about a technical question for economists, of some importance to all of us. Is open data a public good? And, if so, what does that imply about whether, when, and at what level prices should properly be attached to it? Obviously open data is a good thing, we think, for a whole set of good reasons. But that is not the same as being a public good as the economics textbook would have it. And establishing, as some commentators would like to, that open data is a public good then enables the wholesale importation of a bunch of classic arguments about who should pay for public goods, how to deal with free-rider problems, and so forth. There are serious and strong arguments that many large datasets owned by both public and private agencies should be opened up to private companies and citizens to exploit. It could be a diversion into dry academicism to dispute the issue of public goodness, but a solid view could lead to an even more helpful characterisation.

In the textbook, public goods are ‘non-excludable’, ‘non-rivalrous’, and ‘non-rejectable’. Non-excludable in the sense that if a country has protection against nuclear attack, that protection will apply to individual citizens who didn’t pay for it: children, poor people, tax dodgers. The same is broadly true of clean air and street lights and traffic controls and the state providing last-resort care for indigent or old or otherwise vulnerable people. Non-rivalrous, in the sense that you breathing clean air, or being protected by nuclear weaponry, does not prevent me breathing and being protected. Non-rejectable in the sense that, once a government implements a public good, or indeed if it just happens without state intervention, then the individual cannot, generally, opt out. A citizen may not approve of a government seeding clouds with silver iodide to make rain, but they can’t opt out of the water falling from the sky.

Datasets are not an obvious, easy fit with the classic definition of a public good. Governments can and do easily and normally exclude citizens from data: the distinction ‘open data’ only means something because the option to close data is easily implemented. The overwhelming mass of individuals have no personal, direct ability to engage with, manipulate, or understand data. In that sense, they are excluded, if you like, by their own nature, talents, and experience. (We advocate that children should code from an early age, and become familiar with the joys of data.) Data stores are certainly surrounded by rivalry — many app products, based on open data, compete for our attention. And I can easily reject my data being available to anyone else, and, we hope, will increasingly be offered the choice of rejecting it.

So the entity which may be supposed to be a public good is the collection of data bordered on one side by governmental and commercial secrecy and on the other by personal privacy and aptitude.

Datasets do have an interesting characteristic, as cut-and-come-again cake. That person over there in Boston interrogating any particular file does not prevent this person over here in Belgrade interrogating the same file. But also, the data collection as such remains the same size. Data is some of the time more like a flow, an activity, than a stock — one does not have a bit of data, nor use it up. One uses it, without corrupting or destroying it. We don’t need it; we need access to it, to enable our desired activity. It is infinitely reusable and endlessly copyable.

But the same is true these days of words and music as of data. There is an obvious parallel with patents and copyright as concepts, and even more now in practice, given that music, television, and films can travel about the world in digital form. Productisation of data is not dissimilar to productisation of banjo playing. You can charge for access to data, you can charge for rental of music delivery software, or CDs, or digital downloads.

There are physical limits to do with the real costs of capture, storage, processing, and end use. There is a useful comparison with other powerful non-physical economic attributes: intellectual capital, for instance, and the skills of employees.

There is even a parallel to some renewable resources. In many countries, hydroelectric power stations in a hilly region provide power to cities hundreds of miles away in the plain. In the nineteenth century, the English Lake District was given another beautiful water-filled valley to add to Windermere, Coniston, Ullswater. The new one, Thirlmere, was created to provide Manchester with water, and still does. The rain that fuels these enterprises is free. The cost arises from planning, then building, then maintaining the massive infrastructure. Much the same is actually true of data capture and reuse.

Well then, should we not ensure that the price of reusing data recovers only the marginal cost? This is an argument associated with the Cambridge economist Rufus Pollock. We think, no, make it free. Briefly, marginal cost theory says that price should equal only the additional costs of getting the stuff to you, not the capital already invested. The capital cost is history; adding it in today just distorts today’s economy, makes people, for instance, use longer, more socially inefficient, routes rather than pay a toll on a bridge. People will use data less if they are overcharged for it, and that will be damaging to the economy as a whole. Mmm, hold on a second. Know any countries that charge only marginal cost for the electricity created out of free rain, or for the rain itself when it comes out of the tap? Countries that never impose tolls on long bridges or highways? (Extra credit answer: the former Soviet Union operated just so. How did that work out?) The reasons are simple enough: first, if the capital is not funded by loans repaid, eventually, by the individual customers, then in a free market economy it has to be found from taxation or other means. Those means often distort the economy even more than charging for the stuff. Second, if electricity is ‘free’ or very cheap, nobody turns their lights off. When the tolls on the Dartford Crossing over the Thames Estuary in England had, over 40 years, finally paid off the construction debt in 2003, the traffic authorities kept them in place. The already massively overcrowded London Orbital Motorway, the M25, would have ground to a halt had they been removed.
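The arithmetic of the argument is easy to sketch. The figures below are invented, but they show the difference between pure marginal-cost pricing, which leaves the capital to be found from taxation, and average-cost pricing, which recovers it from users over time:

```python
# A toy numerical illustration of the marginal-cost pricing argument.
# All figures are invented. Suppose a bridge cost 100 million to build,
# and each crossing adds 0.10 in wear and toll-collection costs.
capital_cost = 100_000_000       # sunk construction cost
marginal_cost = 0.10             # extra cost per crossing
crossings_per_year = 50_000_000

# Pure marginal-cost pricing: the toll covers only the extra cost,
# so the whole capital cost must be found from taxation or elsewhere.
toll_marginal = marginal_cost

# Average-cost pricing over a 40-year payback period instead:
toll_average = marginal_cost + capital_cost / (40 * crossings_per_year)

print(toll_marginal)             # 0.10: capital unrecovered
print(round(toll_average, 2))    # 0.15: capital repaid by users
```

The gap between the two tolls is exactly the sum that, under marginal-cost pricing, has to be raised by other, possibly more distorting, means.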

Now, there is a specific data argument that the immortal nature of numbers means that the resource is never used up, and there is some truth in that. (There are marginal cost issues here, but also anti-profit ones, and they get muddled.) And in a perfect political economy it might be a powerful one. But why should old people for whom a local authority is responsible have to pay fees and charges for their own residential or community care, while the same local authority gives valuable mapping data away for free to Google or the Ordnance Survey or the private owners of the postcode file, so they can charge their customers for it? Well, the response may come, those big beasts should be tamed, two at least of them should be in public ownership and not hoarding and charging for … Yes, yes, but since they are, as a matter of fact, in private ownership, they should either charge socially responsible prices, or, much preferably in our view, make their data freely available, neither of which is likely to match marginal cost.

Public datasets should definitely be open to all comers, subject to privacy and security concerns. On the whole, we do think that public data should be non-profit, provided the right quantity of free public services is delivered, consistent with using price properly to ration them. Marginal cost is strictly irrelevant in a second-best world. It’s not the cost recovery, it’s the profit, and how that fits into the global pattern of costs, that must be thought about.

*

‘Sunlight is the best disinfectant’ began its current vogue as a widely used phrase a dozen years ago. It was coined in the early years of the twentieth century by Louis Brandeis, the reforming lawyer who later became a justice of the US Supreme Court. David Cameron used it in speeches about transparency as UK prime minister. (Little bandied about, one suspects, at his domestic hearth. His family, and his wife’s, have benefitted from extensive offshore tax avoidance schemes.) The Sunlight Foundation, an admirable not-for-profit which has been a key player in the United States, was established in 2006. The phrase brings earthy, green, planet-friendly common sense to a complex public policy issue.

Such a pity that, as a patent matter of fact, sunshine is a rather poor disinfectant. Know any young parents who leave their baby’s bottle in the garden to sterilise it? Anyone hark back to the lavatory in the yard, just leave the door open and the sun will clean it? Opening sunroofs above hospital wards to sort out the scalpels and doctors’ hands?

Yes, it is the case that water without any brackish lumps in it, decanted into a transparent plastic bottle and left long enough in the solar glare in hot countries, becomes much more potable: that is the effect of ultraviolet radiation. Useful, but not a dependable water supply.

To further belabour the point, Cameron and others have used the phrase in an international context. The UK, US, and the EU pride themselves, with the beginnings of some justice, on having embraced at least the basic principles of open government. (Everyone agrees there is quite some way to go.) It does seem genuinely odd to lecture much more closed places in equatorial Africa and tropical South America that what they need is more sunlight. They have world-beating quantities of that already.

*

One next big thing amongst the many next big things is self-describing data. This does not mean data that can work out for itself what it means. It means data arranged so that, as well as (a) a pile of numbers, there is also (b) a body of descriptive, categorical, canonical metadata: data about the data itself. How, when, and where it was generated; how often it is updated; its presumed accuracy, or inherent uncertainty. This is a large expansion of the metadata that normally accompanies any data file. The advantages are plain if such a system is up and running and widely complied with. Different piles of data already find it easy to connect with each other, and the technical means to enable that increase by the day. If data can (as it were) refer to and seek out other data, and merge their usefulnesses, that will lead, it is argued, to exponential increases in the power and uses of that data.
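A minimal sketch of what a self-describing data file might look like, with the kinds of metadata fields just described. All names and figures are invented for illustration:

```python
# A minimal self-describing dataset: the raw numbers travel together
# with machine-readable metadata saying what they are. All names and
# figures here are invented for illustration.
import json

dataset = {
    "metadata": {
        "title": "Monthly cycle-hire journeys",
        "publisher": "city transport authority (hypothetical)",
        "generated_on": "2017-06-01",
        "update_frequency": "monthly",
        "units": "journeys per month",
        "accuracy": "exact counts from docking stations",
    },
    "data": [812345, 790112, 845003],
}

# The whole package can be published as one file...
payload = json.dumps(dataset)

# ...and a consumer, human or machine, reads the description first,
# then knows how to interpret the bare numbers.
restored = json.loads(payload)
print(restored["metadata"]["units"])
print(sum(restored["data"]))  # 2447460
```

Because the description travels with the numbers, another program can discover what the file contains, and whether it matches its own data, without human intervention.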

The possible problems are two-fold. First, the cost, perhaps primarily labour cost, to data managers. This is probably small for each single heap of data: somebody who knows how to build a data file and publish it pretty much by definition knows how to follow logical rules about arrangement and description of numbers, and almost (but not quite) certainly knows roughly what it is that the data describe, or can find out. Second, imagining success for the scheme takes a leap of faith, a heroic assumption about the general desire amongst data holders for openness and cooperation. Again, whilst it is possible to imagine a negotiated settlement with, and some regulation of, big business and government holders of data about private individuals, enabling the success of radical approaches to personal data, which we will discuss in detail in a moment, it is really difficult to see why or how a government or other regulator would step in here. And equally difficult at present to see where a strong decisive commercial interest would emerge. Data professionals will be interested and will comply with the kind of voluntary standards which have worked very well in the industry because a big part of their job is to connect kit with kit. That will begin to develop a useful resource. Products which exploit that will emerge. Will they include ones so popular there will be intense pressure, or financial motivation, on …

Of course, a description only makes sense within a framework of assumptions, and an operating language or translation process, common to the compiler of the data store and the potential readers. The World Wide Web Consortium (W3C), as leader of a coalition of many other bodies, has devised such frameworks, notably the Resource Description Framework (RDF).

*

Data analytics is a powerful set of new instruments in many, perhaps unexpected, fields. Franco Moretti and many others, for instance, have brought new insight to literary criticism, as has sentiment analysis, which applies mathematics to words and draws conclusions about what the speaker or writer might have felt or meant. So, in a recent example, the TheySay consultancy, a spin-off from Oxford University, examined entries in a UK-wide children’s story competition run by the BBC. Here is how TheySay describe their work:

Using advanced computational linguistics and machine learning techniques TheySay was able to unearth fascinating information on the emotional signals detected in the stories and highlight how these change in different age groups and locations. The text from all submitted stories was analysed and data was collected around the positive, neutral, and negative sentiment as well as the emotional content of every story. TheySay also determined what entities, ideas, or opinions appeared most frequently in a positive or negative context in the entire set of submissions.

Overall, the stories submitted were complex tales that contained both negative and positive sentiment, with happiness and fear being the most common emotions. There was a significant drop of average positive sentiment with age. In fact, a 20% drop of average positive sentiment was detected from the youngest age group to the oldest one, showing that older children submitted stories that on average were darker, more complex, and multi-layered.

Happiness peaked in stories submitted by 7-year-old children, with a noticeable drop after that. The detected levels of fear and anger rose in stories submitted by children in the older age groups, perhaps a result of teenage angst. There was also a small difference between the sentiment levels in stories submitted by girls and those submitted by boys. On average, girls’ stories contained slightly higher levels of positive and neutral sentiment than those written by boys. Similarly, there was variation observed in the levels of related emotions: boys’ stories expressed more fear and anger while girls’ stories had higher levels of happiness and surprise.

Perhaps surprisingly, the words ‘school’ and ‘teacher’ were among those used in a positive context most frequently. Schools were often mentioned in association with happiness and excitement. The words ‘adventure’, ‘heart’, and ‘chocolate’ were also very popular words associated with positive sentiment and happiness. On the other end of the spectrum, the word ‘door’ was used most often in a highly negative context; many of the submitted stories talked about ‘locked’ or ‘creaky’ doors, or doors behind which scary creatures like dragons or monsters were hiding.

Intriguing differences appeared between stories submitted from different parts of the country. There were more mentions of scary or unpleasant aunts in stories from Northern Ireland than any other region; aunts in other parts of the UK were presented as mostly harmless. The word ‘maths’ was used in a highly positive context much more frequently in stories submitted in Scotland compared to those submitted elsewhere. In stories written by English children, the words ‘refugees’ and ‘Syria’ were among those most frequently used in association with positive sentiment. Interestingly, these words appeared most often in stories that expressed high levels of hope and happiness, with the children’s attitude towards refugees being largely positive and empathetic.

Finally, TheySay was able to provide a heat-map of happiness, showing how happiness in the children’s stories varied by post-code. The highest average happiness levels were detected in stories submitted in Llandudno, Wales.

The insights provided by TheySay around the sentiment and emotions contained in the children’s stories gave a new layer of understanding of children’s language and a unique look at how age, gender, and location can affect children’s writing.

Andy Pritchard, ‘500 Words Competition: TheySay collaborates with OUP on BBC’s 500 words competition’, TheySay website

That’s an intriguing picture, if difficult to use as the basis for any kind of policy. (One suspects that, if the rest of us moved to Llandudno, we would soon drag their kids down to our level.) Other sentiment analysis often seems to manage to use massive datasets to produce tendentious conclusions, as if the authors assume that big data just must lead to big insights. The University of Warwick ran its algorithms over the 8 million books in Google’s database published in the UK since 1778, counted the positive words, and gave every year a happiness rating for the UK population, claiming, for instance, that in the twentieth century they were happiest in 1957 and most miserable in 1978. But clearly there is no reason to suppose that the happiness of people who both write books and have the social connections to get them published is the same as that of the population as a whole. And every reason, for instance, to doubt that the landed gentry and factory owners (who wrote books) thought the same about the agricultural and industrial revolutions of the nineteenth century as the masses (who didn’t).

Managers of medium to large clerical and administrative workforces can plausibly anonymise the thousands of e-mails their staff send each other, and mine them for evidence of staff morale. Twitter accounts can be used to predict voting behaviour, or, indeed, to estimate how people voted earlier in the day.
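The crudest form of sentiment analysis can be sketched in a few lines: count positive and negative words and report the balance. Systems such as TheySay’s use machine learning and computational linguistics rather than a fixed word list; the tiny lexicons below are invented for illustration:

```python
# A deliberately crude sketch of lexicon-based sentiment scoring:
# count positive and negative words and report the balance.
# The word lists are tiny and invented for illustration; real systems
# use machine learning, grammar, and context, not bare word counts.
POSITIVE = {"happy", "adventure", "chocolate", "hope", "excited"}
NEGATIVE = {"fear", "angry", "scary", "monster", "creaky"}

def sentiment_score(text):
    """Return (positive count, negative count, net score) for a text."""
    words = text.lower().split()
    pos = sum(w.strip(".,!?") in POSITIVE for w in words)
    neg = sum(w.strip(".,!?") in NEGATIVE for w in words)
    return pos, neg, pos - neg

story = "The happy adventure began, but behind the creaky door hid a scary monster."
print(sentiment_score(story))  # (2, 3, -1)
```

Even this toy version shows why the method is seductive and why it can mislead: the count is objective, but the mapping from words to feelings is only as good as the lexicon and the context it ignores.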

*

The nineteenth-century philosopher Jeremy Bentham stipulated in his will that his head and skeleton should be preserved as an ‘auto-icon’. They are displayed at University College London. Bentham is best known as the founder of modern utilitarianism, which argues that we should judge ideas and actions by whether they cause the greatest happiness for the greatest number. In his lifetime, he spent much energy advocating his own design for institutional buildings, such as prisons, called the Panopticon. In these buildings, inmates would be housed according to a circular plan, so that a few attendants at a central ‘inspection house’ could efficiently oversee large numbers of prisoners, who would not know at any moment whether they were actually being watched. The result, Bentham predicted, would be that prisoners would always behave as if they were being watched. Bentham never quite managed to persuade the authorities in any country to take up the physical idea, although after his death many institutions around the world adopted some of its principles. He was also a founder of one of the earliest police forces, the Thames River Police. The Panopticon is a powerful metaphor for urban society today, where activity in most public spaces in big cities is unobtrusively watched and recorded via closed-circuit television. Research suggests this is one reason for falling crime rates across the western world.

Mass surveillance by the US National Security Agency (NSA) and its British equivalent the Government Communications Headquarters (GCHQ) is not new as a concept. Novels and movies have assumed for decades that it is possible to track characters like Jason Bourne in real time across the globe. As so often, fiction preceded science, and in this case it seems to have helped us become accustomed to a shocking concept many years before it was feasible. Now that this technology is largely real, most people seem to be indifferent to it (except when we use it to track a lost phone, or a wandering child, when we feel pleased).

The digital age was born in conflict. The original code-breaking machines of Bletchley Park were the forerunners of modern digital computers. The radar-guided artillery of the last days of the Second World War was grounded in cybernetics. Today’s conflicts have given rise to new digital capabilities. These capabilities, in turn, pose significant challenges. Governments — at present, largely western governments — now have drones that acquire targets after consulting their databases. The order to kill is given by humans, who work within a framework of law and legitimacy — but a framework that is secret. Terrorism may well be a material threat to our way of life. So far, it has been much less lethal in the western world than the motorcar, and were that to continue, western governments might find it harder to justify expensive anti-terror measures. The scene will change forever if a terror group ever sets off some kind of weapon of mass destruction.

The digital ape needs urgently to debate and define the reasonable boundaries for the collection and analysis of information by government agencies in the age of terror. Restraints and accountability are essential. It is absolutely clear that if hyper-complex liberal republics are to work, they will still need police forces and clandestine security services. They will need to monitor us, although a better phrase would be, we will need to monitor us. This is not because there is the slightest prospect of a communist insurrection in Milwaukee, or that agents of a future Islamic Caliphate will one day soon overwhelm the stockbrokers of the English home counties. It is because the mayhem and pain that tiny groups or individuals with grievances can cause has increased dramatically in the past decades, and will continue to do so.

Moore’s Law means that even obscure governments with moderate means a long way away from the great decision capitals of the world will soon have the processing muscle and memory to compete with NSA and GCHQ. That new world has already become a Panopticon in which we trade our privacy for security. Processing power has become yet another weapon, and we badly need conventions that curb the continued weaponisation of the digital realm.

*

‘The past is a foreign country; they do things differently there’, wrote L. P. Hartley in The Go-Between. Like all foreign countries, it is easily reached now. Photographs of ourselves doing things differently haunt us online. So, too, do pictures or videos of departed relatives, of former friends we argued with, and of damp houses and dismal haircuts. Indeed, more photos were taken per day in 2016 than were taken per year in 1960, and 2016 will rapidly seem a long while ago. Letters take time to write, can be regretted and torn up before they reach the post box, or can be discarded by the recipient. E-mails and texts are quick to compose, sent instantly, and are then immortal. Grief is essential to humans: a world without it would be a travesty of our values. But grief is also a transition, a difficult and painful journey between steadier states. Perhaps Queen Victoria, in a world of royal privilege, never did recover from the early death of her beloved Prince Albert. Fortunately, everybody else did, and nineteenth-century Britain steamed ahead. Perhaps when we are able to live with an electronic version of a departed loved one, a posthumous companion forever, or for a while, grief will change some of its qualities. Contemporary humans must devise new ways to forget, and enforce historic privacy, if they are to also move on with their lives after setbacks.

A study by Nominet, which manages the register of internet domains, showed that in 2016 young parents had posted an average of 1500 images of their children on social media by the time the child started primary or grade school. The context of the study is interesting. Nominet wanted to know about parents’ awareness of privacy, and commissioned The Parent Zone to undertake a study called ‘Share with Care’, of 2000 parents with children under 13. It showed, perhaps unsurprisingly in the light of the Cambridge Analytica scandal, that many parents have little or no grasp of how the privacy settings on their favourite social media site work. Nominet says:

The study found that 85% of parents last reviewed their social media privacy settings over a year ago and only 10% are very confident in managing them. In fact, half of the parents said they understood only the basics when it comes to managing the privacy settings of their most used social network while 39% are unsure on how to do so.

After testing parents’ knowledge of Facebook’s privacy settings in a ten question quiz, 24% of parents answered all of the true/false questions incorrectly. The questions which caused particular confusion amongst parents included:

If you post a photo and tag other people in it, strangers could see it even if you’ve only allowed it to be viewed by your friends. The answer is true but 79% of parents answered incorrectly or didn’t know the answer

You can set individual privacy settings for each photo album you have. The answer is true but 71% of parents answered incorrectly or didn’t know the answer

It is possible for people that aren’t on Facebook to see your profile picture and cover photo. The answer is true but 65% of parents answered incorrectly or didn’t know the answer

‘Parents “Oversharing” Family Photos Online, But Lack Basic Privacy Know-How’, Nominet, 5 September 2016

They also found that parents had an average of 295 friends on Facebook. Many admitted that half were not real friends.

We leave digital fingerprints all over the web. Perhaps DNA might be a more modern metaphor? Facebook and LinkedIn are new kinds of hybrid space — semi-private, semi-public. Future potential spouses see silly pictures on Facebook pages; future possible employers see incautious comments on the Twitter accounts of hopeful job applicants. Full career CVs are written up on LinkedIn, and opened to wide networks of people who we feel proud to have made contact with, but may not actually know in any ordinary sense of the word. All, in practice, are the polar opposite of privacy or anonymity.

There are some private things the web does better, such as finding a life companion, or a partner for casual sex. The web also makes it easier for researchers to pose and answer questions about the nature of our social interaction, since the transactions are all online, readable, and measurable.

*

There is no contradiction between the desire to live in a society that is open and secure, and the desire to protect privacy. Open and private apply to different content, handled in appropriately different ways. One of the authors of this book, Nigel Shadbolt, along with Tim Berners-Lee, is researching new forms of decentralised architectures that present a different way of managing personal information from that of the monolithic platforms of Google, Amazon, eBay, and Facebook.

The present authors hope we are at the start of a personal asset revolution, in which our personal data, held by government agencies, banks, and businesses, will be returned to us to store and manage as we think fit. (Which may, of course, be to ask a trusted friend or professional — a new branch of the legal profession perhaps — to manage it for us.) Several companies have practical designs that offer each individual their own data account, on a cloud independent of any agency or commercial organisation. The data would be unreadable as a whole to anyone other than the individual owner, who would allow other people access to selected parts of it, at their own discretion. The owners will be able to choose how much they tell (say) Walmart or Tesco about themselves. One person might choose the minimum — here is my address to send the groceries to, here is access via a blind gate to my credit card. Another might agree that in return for membership goodies, it is okay for the company to track previous purchases, as long as it provides helpful information in return.
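
In code, the core of such an account is small. Here is a toy sketch, in Python, of a personal data store that releases only the fields its owner has granted to a named organisation; the class, field names, and organisations are all invented for illustration, not any real product’s design:

```python
class PersonalDataStore:
    """A toy personal data account: the owner holds the data
    and grants named organisations access to selected fields."""

    def __init__(self, data):
        self.data = data      # the owner's complete record
        self.grants = {}      # organisation -> set of permitted fields

    def grant(self, organisation, fields):
        # The owner decides, field by field, what each organisation may see.
        self.grants.setdefault(organisation, set()).update(fields)

    def view_for(self, organisation):
        # An organisation sees only its permitted fields, nothing else.
        allowed = self.grants.get(organisation, set())
        return {k: v for k, v in self.data.items() if k in allowed}


store = PersonalDataStore({
    "address": "1 Acacia Avenue",
    "card_token": "tok_4421",   # a blind gate to the card, not the number
    "purchase_history": ["kettle", "banjo strings"],
})

# One shopper tells the supermarket the minimum...
store.grant("walmart", {"address", "card_token"})
# ...another trades purchase history for membership goodies.
store.grant("tesco", {"address", "card_token", "purchase_history"})

print(store.view_for("walmart"))   # no purchase history released
```

The essential point of the sketch is the default: an organisation with no grant sees an empty record, and every disclosure is an explicit act by the owner.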

This has radical implications for public services, to begin with. Health, central government, state, and local authority databases are already huge. They contain massive overlaps, not least because of the Orwellian implications of building one huge public database, which have made the public and politicians very wary. The personal data model is one way to produce a viable alternative. There are obviously problems: how would welfare benefit claims be processed, if the data were held by the claimant, not by the benefit administrators? How would parking permits work? School admissions? Passports? We are certain these are solvable problems. There are real gains to be made if citizens hold their own data and huge organisations don’t. The balance of power, always grossly in the big guy’s favour, tilts at least somewhat in each case towards the little guy. A lot of those small movements, added up over a lot of people, can transform the relationship.

There are encouraging signs here. Some government departments seem to be up for it in principle and doing a small amount in practice. The public sector will start to do it where there are advantages for the politicians, bureaucrats, or organisations involved. If a team of bright civil servants proves it’s cheaper for the state to let citizens hold their own tax records — because that way the citizens pay for the server time, the checking, the administration — a shine will rightly be added to the careers of those bureaucrats.

It seems to us that this requires very senior political leadership from the start, and probably regulation at the end. The key, though, and where legislation may well be decisive, will be in the private sector, where widespread, perhaps wholesale, adoption would be needed, on which the public sector would, in part, piggyback. Citizens concerned about data rights are unlikely to take to the streets in their millions. There are constitutional, democratic balance of power gains for citizens who manage their own public sector data, and we strongly advocate them, but there are cash gains and exciting new applications for the same people in their role as consumers. Mass opting in to that will drive the change, if it is fostered by the powers that be.

How likely is it that this can be achieved? Indulge for the moment a contrary metaphor. The World Wide Web raced like a brush-fire across the internet in the 1990s. One of the many reasons was the simplicity of hyperlinks. In the early days of the web, the convention arose that these would be underlined and coloured. Tim Berners-Lee doesn’t remember who chose blue. But that colour emerged and stuck: like any successful mutation, it outlived other mutations, met the challenges of, and therefore became an integral part of, its environment. Partly perhaps because the colour choice was a happy one. Mild colour blindness is common, and affects perception of red and green much more often than blue. Stand-alone links are, these days, predominantly, but by no means exclusively, blue, everywhere. Practically every designer will use blue hyperlinks, much of the time. The main brands of word-processing software, if asked to create a hyperlink, will underline it and colour it blue.

And yet there is no law or regulation in any territory on earth requiring hyperlinks to be blue. UK zoning and licensing laws specify the font size of text in statutory notices informing neighbours about late night drinking and loft extensions. President Obama signed the Plain Writing Act into law in 2010, requiring government agencies to use clear and easy styles of expression in public documents. Germany, Sweden, Japan, and others have rules about names babies can be given, often using an official list. Myriad countries have laws about which language must be used in which social context: French sociolinguist Jacques Leclerc has collected over 470 such laws from around the world, including the requirement that business is conducted in French in his own country’s workplaces. Still nobody anywhere legislates the colour of hyperlinks, and still they are mostly blue. The logic is plain. A useful convention emerged. If a web page author wants their link to be noticed, they do it that way. Most designers want their link to be noticed; so they use the convention; so it becomes more established and a yet more effective signal. If the intended style of their page is different, and they don’t want their link to shout at the reader, or prefer to use pink radio buttons or click-on photographs or whatever, they are free to do so, and they do.

This is one of the many freedoms at the core of the success of the web, and a powerful, neat metaphor for those freedoms. It may, however, not be the best model for a struggle to wrest our data back from big corporations and Big Brother generally, who on the face of it, unlike web page designers, have big incentives not to cooperate. One of Berners-Lee’s many current ventures, arguably one of the most important, is the Solid project at MIT, which is constructing software to allow just the separation, argued for above, of our personal data from the apps and the servers that capture it.

With Solid, you decide yourself where your data lives — on your phone, on a server at work, on the cloud somewhere. Friends can look after it for you. At present, your book preference data may be with Amazon, your music preferences with iTunes, your Facebook friends hang out at a club owned by Mr Zuckerberg etc. Solid aims to keep all these descriptor keys to your life in the one place of your choice. (You will also want to keep a copy of it in another place of your choice. Turn it all into colour barcodes and make a rainbow poster for your kitchen wall. It belongs to you, back it up how you like.) Apps will need your permission before using your data, and it may for a while be the new normal to refuse to let them, just because we can. Other parallel developers want to enable you to charge a tiny sum every time your data is used to advertise to you. At present, the big corporations do this on your behalf then pocket the value of your interest. They sell your eyes to third, fourth, and fifth parties.

In theory, the Solid platform could be used not merely to let you personally hold, for instance, your health and education data, a useful step in itself with which many health care systems are beginning to experiment. (Hard-pressed doctors drop your case notes on the floor; hard-pressed administrators click send to Queensland rather than send to Queens, or the delete button instead of save. Nobody cares more about my health than me and my family. So let me take care of at least one copy of the records.) More radically, in principle Solid or similar platforms could also hold all the information the government has about a citizen. Yes, yes, the spooks and cops want to keep their own files about terrorists and not discuss the morals of data retention much with the lucky names on the list, and the present authors are, as we emphasise elsewhere, perfectly happy with that. (We’d like more public discussion of how that data is compiled, the categories and facets of behaviour regarded as suspicious.) Yes, yes, data might often need to be compiled in a way that, whilst held by you, could not be amended or destroyed by you. Conceptually easy, surely? It would need to include, in effect, a coded version which verified the open one. Therefore, we would need to trust government as to what the code contained. Quite, but nothing like the present level of trusting them that unseen stuff on unseen servers collated by lord knows who is accurate. National Insurance records, driving licences, benefit and council tax payment history, in the UK; credit records the world over; could perfectly easily be portable, as long as, to repeat, the carrier isn’t able to alter or destroy key parts of them.
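
The coded seal is conceptually simple. A toy sketch using a keyed hash held by the issuing agency follows; a real scheme would more likely use public-key signatures, so that anyone could verify without the agency’s secret, and the record fields here are invented:

```python
import hashlib
import hmac
import json

# Illustrative only: a secret held by the issuing department.
ISSUER_KEY = b"held-by-the-issuing-department"

def issue(record):
    """The agency hands the citizen the open record plus a coded
    seal computed over it. The citizen stores both."""
    payload = json.dumps(record, sort_keys=True).encode()
    seal = hmac.new(ISSUER_KEY, payload, hashlib.sha256).hexdigest()
    return {"record": record, "seal": seal}

def verify(held):
    """The agency (or anyone it cooperates with) can confirm that
    the citizen has not amended the record they carry."""
    payload = json.dumps(held["record"], sort_keys=True).encode()
    expected = hmac.new(ISSUER_KEY, payload, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, held["seal"])

licence = issue({"name": "A. Citizen", "entitled_to_drive": True})
assert verify(licence)

licence["record"]["entitled_to_drive"] = "HGV as well, honest"
assert not verify(licence)   # tampering breaks the seal
```

The citizen holds the open record and can read it freely; they simply cannot change it without the seal giving them away.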

Clearly the same must apply to court records, prison sentences, and fines. We surely all actually prefer, whatever our views on penal reform, that somebody has a list of who is supposed to be in gaol today, and takes a roll call every day. Car registration plates perform many useful functions — reduction of bad driving and theft of vehicles — which require there to be unimpeded access to records. If we want people to pay the tax they owe, we need some system of collecting it, and some way of knowing collectively that we have done so. Imagination will be needed to turn all these into data stores held by individuals. The central requirement being, if, for instance, you own a car, that fact and details of your car must be in your data store, whether you like it or not; and authorised agencies must be able to look simultaneously at everyone’s store, to find a car they are interested in, and must be able to do it without you knowing. (Only sometimes — they can tell me they are doing their monthly check for who has paid car excise duty, and they and I will be happy about that.)

Let’s suppose then that Solid emerges as the brand leader here, propelled by Tim Berners-Lee’s wisdom, reputation, and technical excellence. (Many even more worthy leaps forward for mankind than this one have failed utterly, but let’s suppose it.) Solid stands for Social Linked Data. It aims to be a platform that you choose to use, at home and in your business, and for which developers will write new programs, replacing or supplementing many that most of us now use. Solid, to quote its possibly biased parents, is:

Customizable and easy to scale, relying as much as possible on existing web standards. Like [a] multiuser application, applications on Solid talk to each other through a shared filesystem, and [through the magic of linked data] that file system is the World Wide Web.

‘Right now we have the worst of both worlds, in which people not only cannot control their data, but also can’t really use it, due to it being spread across a number of silo-ed websites,’ says Berners-Lee. ‘Our goal is to develop a web architecture that gives users ownership over their data, including the freedom to switch to new applications in search of better features, pricing, and policies.’

‘Web Inventor Tim Berners-Lee’s Next Project: A Platform that Gives Users Control of Their Data’, MIT’s Computer Science and Artificial Intelligence Laboratory website, 2 November 2015

Solid won’t want legal ownership of its users’ data, of the kind Dropbox or Google Drive claim to possess. It will have the capacity to connect easily to many other applications, but the user will always decide, and know when they are deciding, what parts of their data can be shared.

At a guess, were Solid to be widely successful, it would be because existing big players produced seamless Solid versions of their well-known stuff. How will big corporations with a lot of money at stake behave in the face of the threat? Which is not, we truly feel, an existential one for the enormous players, who have multiple revenue streams.

We would need to know that Amazon, to concentrate for the moment on that very visible and totemic player, when they asked permission to use our data, were not then simply copying it, keeping it, selling it on, no doubt authorised by some murky white small print on a murky grey background on page 11 of an agreement that flashed up at a carefully calculated inconvenient moment. Much of the picture of you that Amazon reflects back to you — we’ve been thinking about your taste in books and here are some new ones we thought you’d like — is in fact done at the moment of the transaction, in real time. Your web browser says hello to their site, their site digs out what it ‘knows’ about you. So that aspect of your Amazon account, derived from your history with them, only anyway functionally exists either for you or them when you are online. If, rather than letting Amazon build server farms to house it, you insist on keeping it yourself (perhaps on a server ultimately rented from … Amazon) they might not care much. Their website, instead of going to its own store, would knock on the door of yours, yours would ask you if Amazon can doff its cap, wipe its feet, and wander round not touching anything except when it’s told to. You would say yes. Amazon would take only what it needs. There would need to be a way to search its pockets as it leaves again.

Equally, Amazon at present proactively sends you e-mails about goods they tell you they think you might want to buy and they definitely do want you to buy. One likes to imagine that an intelligent, hard-working, subtle, and experienced librarian, dedicated to the spread of knowledge and entertainment, who sits next to the garden window in one of the 14 buildings at Amazon’s South Lake Union campus in Seattle, notices that Haruki Murakami has written a new small book about his love of jazz. She wonders who would most like to have that over the holiday and … your name springs to her mind!

But one knows she is a set of algorithms coded into silicon in a metal box, perhaps in Norway today, perhaps in Alaska tomorrow, engaged in varieties of collaborative filtering.
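
The contents of that metal box can be caricatured in a few lines. A toy item-based collaborative filter, with invented customers and purchases, follows; it is nothing like Amazon’s actual system, which is vastly more elaborate, but the underlying idea is the same:

```python
# Bare-bones item-based collaborative filtering over invented purchase data.
purchases = {
    "ann":  {"murakami_jazz", "kafka_shore", "banjo_manual"},
    "bob":  {"murakami_jazz", "kafka_shore"},
    "cara": {"kafka_shore", "banjo_manual"},
}

def recommend(customer):
    """Suggest items bought by customers whose baskets most overlap ours."""
    mine = purchases[customer]
    scores = {}
    for other, basket in purchases.items():
        if other == customer:
            continue
        overlap = len(mine & basket)      # shared purchases = similarity
        for item in basket - mine:        # items we haven't bought yet
            scores[item] = scores.get(item, 0) + overlap
    return sorted(scores, key=scores.get, reverse=True)

print(recommend("bob"))   # banjo_manual, via Ann's and Cara's baskets
```

People who bought what you bought also bought this: that, at heart, is the librarian.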

It must be possible, as a functional alternative, to invent a Solid-based send-me-promotions app, which works like this: Amazon has your e-mail address and you are happy with that. Amazon then e-mails you asking if you would like some recommendations today. The Solid send-me-promotions app is set to say yes automatically to Amazon, or yes but only books and banjo strings, or yes but it has to ask you first. When the thumbs up comes through, your app shows Amazon the purchase history lodged with you, and they finally ram the history of your expressed preferences into their recommendation engine. This sounds like 16 steps of palaver, but it’s not really a much more complex handshake than many internet transactions, and could be made seamless in practice.
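
A toy version of that handshake might look like this; the app name, policy names, and categories are all invented for illustration, not any real Solid interface:

```python
# Three illustrative policies for the hypothetical send-me-promotions app.
ASK_OWNER, ALLOW, CATEGORY_ONLY = "ask", "allow", "category-only"

class SendMePromotions:
    def __init__(self, policy, categories=None, owner_says_yes=lambda: False):
        self.policy = policy
        self.categories = categories or set()
        self.owner_says_yes = owner_says_yes   # stands in for asking you first

    def respond(self, retailer, purchase_history):
        """Decide what slice of the locally held history the retailer sees."""
        if self.policy == ALLOW:
            return purchase_history
        if self.policy == CATEGORY_ONLY:
            return [p for p in purchase_history
                    if p["category"] in self.categories]
        if self.policy == ASK_OWNER and self.owner_says_yes():
            return purchase_history
        return []   # default: nothing leaves the owner's store

history = [
    {"item": "paperback", "category": "books"},
    {"item": "banjo strings", "category": "music"},
    {"item": "toaster", "category": "kitchen"},
]

app = SendMePromotions(CATEGORY_ONLY, {"books", "music"})
print(app.respond("amazon", history))   # the toaster stays private
```

Sixteen steps of palaver compressed into one policy check, which is roughly why it could be made seamless in practice.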

There might be an arms-length Solid-squared version, in which The Utterly Ethical Not-For-Profit Run By Archangels holds a list of millions of e-mail addresses in a protected Solid account, and you have named Amazon to TUEN-F-PRBA as one of the organisations you don’t mind receiving e-mails from, for promotional purposes. Amazon has to connect to them, pass them a generic e-mail for TUEN-F-PRBA to send out to millions on their behalf, or accept a temporary list before they even get to connecting to you. Does that make you feel safer? If the NSA and GCHQ are worth what we pay them — to repeat, the present authors hope they are — they will have the lists in two shakes of a lamb’s backdoor API. But they have access to all e-mail in the present anyway. And perhaps it’s not them we’re afraid of.

To take another aspect, Amazon might equally say, of course we will delete some of your personal details, your credit card number for instance, if that is what you want. But our record of transactions with you goes back nearly 20 years now, and that commercial history belongs to us, in fact in some jurisdictions we are required to keep records for a long time, for tax and other purposes. By the way, we are Americans and proud to uphold American values, but our data-storing arms-length satellite company is based in the gloomiest corner of darkest Peru, where the laws are different. Sorry.

Alternatively, the game might all go sideways early on. One — only one — of the strong reasons why the sensible citizen should like Solid is that it would add power to their elbow as a consumer, in the face of powerful producers and sellers. That list of book and washing machine purchases, which can be used to fine-tune analyses of other things you might like, by reference to big catalogues of new and existing products, is of immense value to outfits who want to sell stuff, and indeed to outfits thinking up new products. Amazon love the fact that they own it, and can use it to be your best friend. The last thing they want is for the ordinary civilian to be able to show their history to Barnes and Noble, or Walmart, via a Solid app. When the consumer owns the lists from all of the people who supply them with stuff, somebody is going to build a nice little product called WhatYouLike on the Solid platform that collates all of them (or uses Solid’s built-in collate software), does the recommendations, and tells the customer which retailer has the best price at the moment. WhatYouLike will cream a couple of dollar points from each eventual purchase. Another absurdly young billionaire created, perhaps. This one will have done every customer in the world a big favour. A whole lot of stuff that economic textbooks fancifully claim about consumer power in free markets will have come true.

Nothing in law in any state entitles anyone to an Amazon account. In well brought up countries they rightly could not refuse to trade with somebody on the grounds of their race or creed. Be fair, that’s just not the kind of people Amazon are anyway, they genuinely welcome every flavour of customer or employee, all over the variegated world. But any enterprise anywhere can set out reasonable terms and conditions for how it wants to go about its business, and invite the public to take it or leave it.

So Amazon might say, ah yes, Solid, we’ve heard of that. Very intelligent and imaginative. Loved Tim Berners-Lee’s web thing — currently making revenue of $100 billion a year out of that one. Truly grateful to the chap. Solid, not so much. If you want to buy books, beans, and bicycles from us, you need an old-fashioned account, sorry pal, that’s how our business works.

Would that not look like the safety play to them? Disintermediation is an ugly word and an unreliable ally. There simply is, at the time of writing, no Orinoco, Volga, Yukon ready to flow seamlessly into our lives if Amazon make a mistake about access to their website. A smart young woman could set up a portal with one of those names, even do some fancy coding, squat half a dozen scruffy friends in a garage, borrow a couple of hundred thousand dollars, build a website that claimed to sell the same range of products as Amazon. Fat lot of good that would be. Amazon, perhaps the great commercial success of the web era, is not, in point of fact, a web company. Facebook is a web company, eBay is a web company, Google is a web company. (But they’ll feel pretty physical if you wear a white shirt with the sun behind you in front of that driverless car.) Amazon lives in the real, not the virtual, world. As of October 2017, Amazon had 295 facilities (warehouses, fulfilment centres, hubs, and co.) in the continental United States alone, and 482 fulfilment centres worldwide, including 45 in India, 17 in China, and 14 in Japan, with many dozens in Europe. They were 177 million square feet in total, with 25 million more square feet actively planned. About seven square miles, in which 350,000 people work. No bunch of kids in a garage can pull the rug out from under Amazon, in the way that Jeff Bezos pulled the rug out from under the bookshop and other trades. Because actually he didn’t. The new wave of web technology did, and he, brilliantly, caught the electronic ride, and used it to build a very grounded empire.

So what should frighten Amazon is not imitator kids in garages, per se. What they should watch out for is the next new wave. Each thing that looks like a new wave they need to either freeze out or co-opt. Why would they co-opt the very new wave Solid, rather than informally combine with other big retailers and social media platforms to squash it?

*

Facebook and other social network sites could, in part, work much the same way as we say Amazon could. When the consumer arrives at their site, the local Solid app gives the Facebook website access to information equivalent to that which populates that user’s Facebook page, at present called up from Facebook’s server. Okay.

Facebook, like Amazon, will not be pleased for crowds of ‘Facebook friends’ to be available to other programs. Negotiable, but a real difficulty.

Perhaps a more fundamental issue: Facebook sells advertising, which appears on the same screen as those friends’ funny photos. Screen acreage on mobile phones is now more lucrative for them than desktops and tablets, but that’s only relative. In 2016, Facebook’s revenue was about $14 per year on average for each user. Actually, if we just take for granted for a moment the technical business proposition that the $14 could be harvested by the user instead, and even go a step further to the moral proposition that therefore it should belong to the user, it is, nevertheless, what surely every user would regard as a terrific deal. Facebook does add immensely to people’s lives. (In the only opinion that matters here, their own.) Way more than $14 per year worth of addition. Nevertheless, that $14 per year is tiny right up until the moment you remember that Facebook have over 2 billion users worldwide, so that would be around $30 billion per year revenue. Which, to come to the point, they will want to protect when Solid comes knocking.

And Google? There actually are other search engines. They are simply nowhere near as good, for general purposes. (Specialist so-called vertical search engines, dedicated to particular purposes, helping lawyers to find relevant case law for instance, can be very useful.) Bing’s first page is, by strategic intention, much more beautiful than Google’s, a new gorgeous photograph every day. Worth visiting the site just for the picture. Then go to Google to search for stuff. Google has an extraordinary capacity to surface the right stuff in response to our search requests. Did we mention that Google sells ads by the many million, too? We did. Along with their other very smart plays, they built two clever and lightning-quick advertising models, Google AdWords and Google AdSense. In the first, they auction the search terms every time a search is made by a Google user. In the second, website owners make money by displaying Google ads. Google’s revenue from these is $90 billion per year, perhaps three times greater than Facebook’s.
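
The auction at the heart of the first model can be sketched in a few lines. This is a toy second-price keyword auction; real ad auctions also weight bids by quality scores and rank many slots at once, and the bidders and figures here are invented:

```python
# A toy second-price keyword auction, run afresh for every search.
def second_price_auction(bids):
    """Highest bidder wins the slot but pays only the runner-up's bid."""
    ranked = sorted(bids.items(), key=lambda kv: kv[1], reverse=True)
    winner = ranked[0][0]
    # With a single bidder, they pay their own bid.
    price = ranked[1][1] if len(ranked) > 1 else ranked[0][1]
    return winner, price

bids = {"acme_shoes": 1.20, "best_boots": 0.90, "cheap_clogs": 0.40}
winner, price = second_price_auction(bids)
print(winner, price)   # acme_shoes wins, pays 0.90
```

Charging the runner-up’s price encourages advertisers to bid what a click is genuinely worth to them, which is one reason the model scaled so profitably.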

Facebook, to concentrate on them, use a large range of techniques to present these ads to the user, to measure how effective they are, and to bill advertisers accurately. The latter, of course, are concerned to ensure that they are only billed for parts of pages that were truly present in front of real eyeballs. These techniques inevitably involve user information of sorts passing backwards and forwards between user, Facebook, and advertiser, including specialist cookies, and devices to ensure that robots are not being used to cheat the system. They also involve Facebook owning big heaps of detailed information about users in general, and also about every individual in particular. All this data is processed offline, on Facebook’s private machines, as well as, perhaps more than, at the particular moment the user is online.

Some part of this would work perfectly well with anonymised data, at a guess. No doubt somebody could build a Solid app — MeNotMe — that looked at the data in an individual’s private stockpile and stripped out of it personal identifiers, to various degrees, and then passed it, on specific request and explicit permission from the owner, to external sites who ‘need’ it. So Facebook might say, to make our business work we need second-degree information (or whatever the category might be called) to transfer to us. This means, our back office will know that (say) a white woman, aged 25, who lives in rural California, did x, y, z on the site, but that woman will not in any way be personally identifiable; if you are happy with this, on we go. If not, we’re afraid you need to find another social network site.
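
The stripping itself is easy to imagine. A toy version of the hypothetical MeNotMe idea follows, with the degrees, field names, and record entirely invented; real anonymisation is harder than this, since combinations of innocuous fields can re-identify people:

```python
# Degrees of stripping: which fields count as identifying at each level.
FIRST_DEGREE = {"name", "email"}
SECOND_DEGREE = FIRST_DEGREE | {"street", "birthdate"}

def strip_identifiers(record, degree):
    """Return a copy of the record with identifying fields removed,
    coarsening exact ages to a decade band at second degree."""
    cleaned = {k: v for k, v in record.items() if k not in degree}
    if degree is SECOND_DEGREE and "age" in cleaned:
        cleaned["age"] = f"{cleaned['age'] // 10 * 10}s"   # 25 -> "20s"
    return cleaned

record = {
    "name": "Jane Doe", "email": "jane@example.com",
    "street": "14 Vine St", "birthdate": "1993-04-01",
    "age": 25, "region": "rural California",
    "activity": ["liked x", "shared y", "posted z"],
}

print(strip_identifiers(record, SECOND_DEGREE))
# what remains: a woman in her 20s, rural California, and what she did
```

Facebook’s back office would get the behaviour and the demographic band; the name, address, and birthdate never leave the owner’s store.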

And if current practice is anything to go by, said white woman, aged 25, won’t even be aware that she just had that conversation with Facebook. It will have been disguised in the small print, one way or another. Max Van Kleek of Oxford University is one of many attacking this problem from different angles:

Our work on Data Terms of Use (DToU) explores the idea [of] giving people control of information in new ways. First, DToU lets people specify how particular pieces of information they post (e.g. tweets, instagrams) get experienced by their audiences. To enable people to control how their info is stored and retained, DToU lets them also state requirements and constraints on the methods of storage and preservation (including by whom, geopolitical constraints and so forth). Finally DToU also enables people to specify how they want to be kept informed about the use and handling of their information item, otherwise known as the provenance trail(s) left by them as they are used and passed from one person to the next.

‘Everyday Surveillance – The 3rd Privacy, Identity & Data Protection Day 2016 Event’, University of Southampton website
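Van Kleek's three strands, audience rules, storage constraints, and provenance notification, can be pictured as a machine-readable policy attached to each item of information. The structure below is a guess at what such a policy might contain, not the actual DToU research format, and every field name is an assumption.

```python
# A guessed-at shape for a Data Terms of Use policy; the real DToU
# format from the research may differ in every detail.
from dataclasses import dataclass, field

@dataclass
class DataTermsOfUse:
    # 1. How the item may be experienced by its audiences.
    audience: list            # e.g. ["friends"] or ["public"]
    expires_after_days: int   # withdrawn from view after this
    # 2. Requirements and constraints on storage and retention.
    storage_regions: list     # geopolitical constraint, e.g. ["EU"]
    allow_third_party_copies: bool
    # 3. How the owner is kept informed of use and handling.
    notify_on_access: bool
    provenance_trail: list = field(default_factory=list)

    def record_use(self, actor, action):
        """Append one hop to the provenance trail as the item
        passes from one person or service to the next."""
        self.provenance_trail.append((actor, action))

# A policy attached to one hypothetical tweet:
tweet_policy = DataTermsOfUse(
    audience=["friends"], expires_after_days=30,
    storage_regions=["EU"], allow_third_party_copies=False,
    notify_on_access=True)
tweet_policy.record_use("friend_app", "displayed")
```

The point of the provenance trail is the third strand: the owner can inspect, after the fact, exactly who handled the item and how.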

If it comes to a stand-off, Facebook doesn't have Amazon's massive real-world infrastructure as leverage against the insurgent woman in the garage building a competitor site: no workforce in fulfilment centres, delivery vehicles, and store-based collection points the world over. It does have, however, its very strong first-mover position: the huge historical investment, made in small proportion by Facebook itself and in great proportion by its users, whose friendship history, connections, timelines and networks are on Facebook servers. To repeat, it is possible that Facebook might be party to a deal which negotiates away its monopoly possession of that history, but it scarcely seems likely to be an easy deal.

Credit rating agencies make a lot of money for their owners and operators. The three largest worldwide are Experian, Equifax, and TransUnion. (Callcredit is the third largest in the UK.) Experian has revenue of $4.8 billion a year. They make this kind of cash by giving scores to the creditworthiness of individuals, based on their accumulated history of paying their mortgage, loans, and utility bills on time. That information is collected widely across the finance industry. (Having no debts doesn't get you a high score; having lots of debt and paying on time does.) They are happy for you to see and check your rating, and to challenge it to make it more accurate. Accuracy is what they are selling: the more accurate their data, the more their customers can rely on it, and the more they can charge for it. Perfectly possible, then, to build an Experian Solid app which sits on the individual borrower's own device, and contains Experian's data about that borrower. Correctable only by that borrower. (See above.) Experian would be highly unlikely to cooperate if the citizen could then pass the data on to possible new lenders, or give it to other companies, with no Experian fee. And again, like Facebook, they have a huge pile of historic data, which makes life very difficult for a new entrant to the market.

An anti-Experian open source app would be tricky, but not impossible, to implement. Consumer groups could perhaps persuade banks and finance houses, who sell the debt information in the first place (they can do so because they sold the actual debt before that), to provide the initial feed to individuals, exactly what they send to Experian and its rivals, to keep on their own Solid app. It would have to be an odd app, one which would show a red flag, perhaps close itself down, unless the user allowed data from all qualified sources to be inserted into it. No point in a creditworthiness device from which the user can exclude embarrassing mistakes. An independent, at least quasi-governmental, agency would need to check which banking sources were able to insert information.
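The odd, self-disabling app described above might behave like the sketch below: it refuses to produce any score at all unless every qualified source has supplied its feed, so the user cannot cherry-pick. The source names and the scoring rule are invented; only the completeness check is the point, and in reality the register of qualified sources would be maintained by the independent agency.

```python
# Sketch of the self-disabling creditworthiness app. Source names
# and the scoring rule are illustrative assumptions only.
QUALIFIED_SOURCES = {"barclays", "hsbc", "thames_water"}

class CreditPod:
    def __init__(self):
        self.feeds = {}  # source -> list of (bill, paid_on_time)

    def insert_feed(self, source, records):
        """Only sources on the agency-checked register may insert."""
        if source not in QUALIFIED_SOURCES:
            raise ValueError(f"{source} is not a qualified source")
        self.feeds[source] = records

    def score(self):
        """Red-flag (return None) unless ALL qualified sources have
        fed the app: no score from an incomplete picture."""
        if set(self.feeds) != QUALIFIED_SOURCES:
            return None  # red flag: feeds missing
        records = [r for recs in self.feeds.values() for r in recs]
        on_time = sum(1 for _, ok in records if ok)
        return round(100 * on_time / len(records)) if records else None

pod = CreditPod()
pod.insert_feed("barclays", [("mortgage", True), ("loan", True)])
print(pod.score())  # None: hsbc and thames_water feeds still missing
pod.insert_feed("hsbc", [("card", True)])
pod.insert_feed("thames_water", [("water", False)])
print(pod.score())  # 75: three of four bills paid on time
```

A lender asking the citizen's pod for a score would either get the complete picture or a red flag, never a flattering subset.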

Now the reality is that practically every ordinary civilian allows their devices, and the programs and information on them, to be updated by processes they don't understand, with data that is a mystery to them, and which they can control only in a yes/no fashion. Best guess, therefore, most civilians would be happy for changes to their credit status to update their app. The resulting picture of the individual would be much more transparent to them. A very small proportion of people take the trouble to track down their credit records, unless they are in serious trouble with money. Nevertheless, many might prefer this model to the one in which big secretive agencies own everybody's creditworthiness and sell it. In principle, the price of loans would fall slightly, since the lenders would no longer have to pay the big agencies. Individuals could even, in principle, charge a (tiny) fee to bodies wanting to collate credit information about wide populations.

*

One further aspect should be noted. The ownership of the cloud estate, all that immense quantity of hardware consuming massive energy and holding vast quantities of data, is in the hands of a few private corporations and governments. Oddly, though, the total memory and processing power in all that cloud is small compared to the memory and processing power in the notebook computers and smartphones of the general population. And that presents one of the most attractive alternative futures. Instead of power and information being corralled by the big beasts at centres they own, it could be dispersed across all of us, distributed, at the edge. This idea, sometimes called the fog rather than the cloud, is technically feasible. Data stores could be built into residential and office buildings, coordinated with a goodly proportion of the existing devices.

On top of the political and social gains, there are at least two practical advantages. First, every use of smart machinery involves so many trillions of minuscule transactions that, paradoxically, the distances the electrical signals must travel add up, and the speed of light becomes a constraint on how quickly those transactions can be made. Making the building blocks of storage and function smaller and local speeds them up. Reduces latency. Second, if everything is in the same place, everything is as vulnerable as that one place. Many local authorities and large corporations in the UK keep back-ups of all their information, and process local taxes and payroll, with the Northgate data organisation, who, up until 2005, kept their servers in a clean modern office facility in Buncefield, Hertfordshire. Very sensible all round. Until early one Sunday morning, when the oil storage depot next door blew up, in the largest explosion heard in Europe in the 60 years since the end of the Second World War, measuring 2.4 on the Richter scale. The Northgate facility disappeared. The timing luckily prevented serious casualties; Northgate recovered rapidly and thoroughly, with textbook resilience. But common sense argues for dispersing important data infrastructure across as wide a geography as possible. The coherence with the Solid idea is obvious.
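The speed-of-light constraint is easy to put rough numbers on. Light in optical fibre travels at about two-thirds of its speed in a vacuum, roughly 200,000 km per second, so a round trip to a distant data centre costs milliseconds before any computation happens at all, while a store in the building next door costs microseconds. The figures below are back-of-envelope values, ignoring routing and processing delays.

```python
# Back-of-envelope latency: why local storage beats a distant cloud.
SPEED_IN_FIBRE_KM_PER_S = 200_000  # ~2/3 the speed of light in vacuum

def round_trip_ms(distance_km):
    """Minimum round-trip time in milliseconds imposed by physics
    alone, ignoring routing, queuing and processing delays."""
    return 2 * distance_km / SPEED_IN_FIBRE_KM_PER_S * 1000

for name, km in [("same building", 0.1),
                 ("city edge node", 50),
                 ("national data centre", 1_000),
                 ("transatlantic cloud", 6_000)]:
    print(f"{name:22s} {round_trip_ms(km):8.3f} ms")
# same building: ~0.001 ms; transatlantic cloud: ~60 ms minimum
```

Sixty milliseconds sounds trivial, but a single user action can trigger thousands of such round trips; pushed to the fog at the edge, the physics tax all but disappears.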

There are hopeful straws in the wind here. MasterCard, not well known for their insurgent bolshevism, have donated a million dollars to MIT to help develop Solid. The present authors strongly suspect that Solid will be adopted by a relatively small number of tech-savvy citizens, who will begin to advocate it strongly; governments will then need to step in and make the commercial world tolerate it, or better versions of it, or rivals like it. Unlike those blue hyperlinks, this is a step forward which will happen only with state intervention.