9

THE DIGITAL REVOLUTION

The transformation of the field of trade publishing was a process driven above all by social and economic factors, by actors and organizations pursuing their aims, responding to changing circumstances and taking advantage of new opportunities in the competitive field of Anglo-American trade publishing. But interlaced with this transformation and contributing to it was a technological revolution that first began to make itself felt in the book publishing industry in the mid-1980s and became a source of increasing speculation and concern from the early 1990s on. By then the digital revolution had already convulsed the music industry and seemed set to cause similar disruption in other sectors of the creative industries. The rapid growth of the internet from the mid-1990s on served only to heighten speculation. By the late 1990s many publishers were pouring millions of dollars into electronic publishing projects of various kinds and venture capitalists were launching new companies aimed at digitizing book content and making it available in a variety of formats. ‘The digital future’ became the theme of countless conferences, the subject of innumerable articles and a key topic of conversation in the boardrooms of the publishing houses themselves. Bower and Christensen’s 1995 article on ‘Disruptive Technologies’1 had warned bluntly of the dangers faced by leading companies if they resisted new technologies on the grounds that mainstream customers didn’t want them and projected profit margins weren’t big enough: by failing to act they would run the risk of leaving the field to smaller companies who would be able to create a market by experimenting with new products and put themselves in pole position if and when the new products eventually took off. The message was not lost on corporate chiefs. The large publishing houses were scrambling to be at the forefront of a technological revolution that seemed to many to be inevitable.

The conviction that the publishing industry was on the edge of fundamental change was strengthened by the reports of management consultancy firms in the late 1990s and early 2000s, many of which were predicting that ebooks would quickly become a substantial and growing part of the market. Among the most frequently cited was a report published in 2000 by PricewaterhouseCoopers which forecast an explosion of consumer spending on electronic books, estimating that by 2004 consumer spending on ebooks would reach $5.4 billion and would comprise 17 per cent of the market. A study by Arthur Andersen, commissioned by the Association of American Publishers and published in the same year, predicted that the ebook market would be anywhere from $2.3 billion to $3.4 billion by 2005 and would represent up to 10 per cent of the consumer book market. Expectations were also raised by the startling success of one of Stephen King’s early experiments with electronic publishing. In March 2000 he published his 66-page novella Riding the Bullet electronically, available only as a digital file that could be downloaded for $2.50: there was an overwhelming response, resulting in around 400,000 downloads in the first 24 hours and 600,000 in the first two weeks.

The downs and ups of ebooks

Notwithstanding Stephen King’s good fortune, the predictions made by PricewaterhouseCoopers and others turned out to be overly optimistic, at least in terms of the timescale. Those publishers who were actively experimenting with ebooks invariably found that the levels of uptake in the early 2000s were much lower than the consultants and many others had been projecting. Sales of individual ebooks numbered in the tens, in some cases the hundreds, but were nowhere near the hundreds of thousands, let alone millions, of copies that many had expected. Moreover, the bursting of the dot-com bubble in 2000 brought a new mood of scepticism about the internet economy and its capacity to transform traditional business models. From 2001 on, a period of retrenchment set in, and both publishing firms and private investors began to curtail their investments and lower their expectations. Many of the new electronic publishing divisions and ebook programmes – some of which had been launched with much fanfare and at considerable expense only a year or two earlier – were either closed down or radically scaled back. The ebook revolution had stalled and no one knew when it would get underway again, or if indeed it ever would.

The mood of uncertainty continued through 2006 and 2007, and the question of ebooks and their role in trade publishing remained a hotly contested issue. Actual sales of ebooks for all the major trade publishers remained very low, both in terms of units and in terms of revenue, and they were showing no signs of significant growth. Based on figures I received from trade houses at the time, I estimate that ebook sales represented around 0.1 per cent of their overall sales in 2006. By the end of 2007 the share had risen to perhaps 0.5 per cent – still an insignificant number, ‘statistically irrelevant’, as a senior manager at one of the large houses put it. Against the background of persistently sluggish sales, opinions were sharply divided about how the ebook revolution, if indeed it was a revolution, was likely to pan out. On one side were the digital advocates who, unfazed by the disappointments of the previous decade, remained firmly convinced that the ebook revolution would happen eventually. On the other side were the digital sceptics who remained attached to the traditional print-on-paper book, valued its materiality and meaningfulness as an object and doubted whether – in the world of trade publishing at any rate – it would ever be replaced in any substantial way by electronic files read on screens. And somewhere in between were the digital agnostics – those who didn’t profess to know one way or the other how the future would unfold and were happy or resigned to adopt a wait-and-see approach, continuing to do what they did pretty much as they always had while the technological revolution took its own course. The differences between these positions were differences in disposition and inclination as much as anything, since there was no factual evidence at that time to settle the matter one way or the other.

The digital advocates could offer half a dozen reasons why ebooks had not yet made much headway: the reading devices were not good enough, the publishers had not made enough of their content available, the prices were too high, etc. But as the technology improves and the prices come down, we will see, they confidently predicted, a dramatic increase in ebook sales. The early proponents of the ebook revolution were right when they said the future was digital; they were just wrong about the timescale. Digital advocates are fond of repeating the technophile’s maxim that people tend to overestimate the impact of technological change in the short term and underestimate it in the long term. Just wait, they say. A new generation of ‘digital natives’ is growing up with computers and iPods and mobile phones and by the time they start reading they will feel perfectly comfortable reading books on screen (if they read books at all). They will not have the same attachment to the print-on-paper book as their parents and grandparents did. The publishing industry is still waiting for its iPod moment but when it comes, everything will change very quickly. Just look at what happened in the music industry: for the digital advocates, the music industry is the future of publishing foretold.2

In the eyes of the digital sceptic, the arguments of the ebook champions were full of holes. They just don’t seem to appreciate, said the sceptics, that the print-on-paper book has certain qualities that are valued by readers and that the ebook can never capture or reproduce. The book is an aesthetically pleasing form, a work of art in its own right with a stylish cover and attractive design which is gratifying to hold, to open and to own. It is also exceptionally user friendly: nothing is easier than turning the pages of a book and reading clear text on white paper. The eyes are not strained and you can move back and forth with ease. It never runs out of batteries, it never freezes up and it doesn’t break if you drop it. A book, moreover, is a social object: it can be shared with others, borrowed and returned, added to a collection, displayed on a shelf, cherished as something valued by its owner and taken as a sign of who they are and what matters to them, a token of their identity. None of this, say the digital sceptics, is offered by an ebook. The ebook is pure content and, as such, it can never reproduce the materiality of the book. In the print-on-paper book the content and the form are inseparable, and it is this unique combination, say the sceptics, that is precisely what is valued by readers.

While digital advocates are inclined to see what happened in the music industry as the writing on the wall for the publishing industry, the digital sceptic can give you a handful of reasons why music is a poor analogy for the book. Most consumers want to listen to short, two- or three-minute songs rather than whole albums, and most albums are simply collections of songs, including many songs that listeners would prefer to skip anyway – not true with the book. The quality of the listening experience with digital music is generally higher than it is with traditional analogue forms of musical reproduction – not true with reading on screen or with any digital reading device, where the quality of the reading experience, even with e-ink technology, is worse. There are real advantages to being able to carry a compact musical library around with you so that you can listen to whatever song or album happens to suit your mood whenever you want, and this simply wasn’t possible in the days of the old vinyl LP, or even the days of the cassette tape – once again, the analogy with the book is poor, since the book is eminently portable, and most people, unless they happen to be busy executives who spend a lot of time on airplanes or people who take inordinately long holidays, have no need to carry a virtual library around with them. For most people most of the time, one book will do.

One senior figure at a large technology firm who works closely with trade publishers, and who has been heavily involved with ebooks and other forms of electronic content delivery for many years, summed up the sceptic’s case in 2009 with elegance and force:

The utility of having a print book in digital format for trade publishing is probably one on a scale from one to five, because all you’re doing is replicating the narrative experience of page turning and linear reading in a digital form. Are you improving the experience for most users? Probably not – you’re probably actually degrading the experience for most users in terms of the resolution, the convenience and everything else. Sure, you can carry 80 books around on this $400 reader, but the number of people I know who require 80 books to be carried around at one time is very small. For narrative, immersive reading, digital readers are a complete waste of everyone’s time. It will never be a big business on the trade side, in my opinion. And if I’m wrong in 20 years, then I’m wrong, but I don’t think I will be. This is not the music business – these aren’t two- or three-minute songs. It’s not the newspaper business – they don’t have a shelf life of a day or an hour. It’s not the magazine business – narrative content you can get through on a subway ride. None of that is the book business on the trade side. That’s why I think print will continue to have a life for the great majority of sales on the trade side of the business.

Faced with these sharply opposing views and the absence of clear-cut evidence to help one decide between them, many people in the industry were happy to take a back seat and wait to see what happened. They may have had their own personal inclinations and leant mildly in one direction or the other, but when asked about the future they were content to remain agnostic. They admitted that they really didn’t know what the future held in store; they may have felt some unease, trepidation even, about a future where so much seemed uncertain, but they were happy to leave others – better placed and better informed, they hoped – to grapple with the issues. Some senior managers on the verge of retirement expressed a mild sense of relief that they would not have to deal with the challenges that would face their successors – ‘I’ll be watching from the sidelines,’ said one CEO who was soon to retire. This was an industry that found itself in the midst of a technological storm, keenly aware that change was taking place all around it but unsure about what its implications were likely to be for its own ways of doing business and its future.

By the autumn of 2009 the decade of confused anticipation was over: when the change began, it happened more quickly and decisively than most commentators had expected. The sales of ebooks had risen sharply, so much so that the digital advocates could feel – now with some empirical backing – that their optimism had been warranted, while the sceptics were quietly eating their words and the digital agnostics were struggling to adjust to an unfamiliar world. What had changed? One word: Kindle. In fact, the change had begun earlier, almost imperceptibly, with the launch of the Sony ebook reader in the US in 2006. This was followed by a small but significant upturn in the sales of ebooks in late 2006 and 2007. When we spoke in early 2008, the manager of the digital division of one major US trade house said that their ebook sales in 2007 had grown by more than 50 per cent over the previous year, and that the increase was based primarily on sales for the Sony reader. However, it was the launch of the Amazon Kindle on 19 November 2007 that really changed the situation. Like the Sony reader, the Kindle used e-ink technology rather than backlit screens, which simulates traditional ink on paper and minimizes battery use; it also used wireless connectivity, free for the user, to enable readers to download ebooks and other content directly from Amazon, making it very easy to buy ebooks. The release of the Kindle was immediately followed by a surge in ebook sales: the same trade house that had seen ebook sales grow by 50 per cent in 2007 now saw its ebook sales leap by 400 per cent in 2008. This was a sudden and dramatic change. Although ebook sales still represented a very small proportion of overall revenues for most New York trade publishers at the end of 2008 – probably just under 1 per cent – the numbers were much higher than they’d ever been and they were growing fast.

The upward surge in ebook sales both continued and accelerated throughout 2009 and 2010. In the view of the manager of the digital division mentioned above, the introduction of the Kindle represented a kind of watershed, ‘a tipping point’, as he put it. ‘In 2006 the question, “Who wants to read a book like this?” was a very open question. I think that question has now been answered.’ He foresaw a snowball effect where there would be more and more devices coming into play in the ebook market and more retailers wanting to sell ebooks, and in fact this is exactly what happened. In November 2009 Barnes & Noble launched the Nook, with e-ink technology and free access via a wireless connection to the Barnes & Noble store, followed a year later by the Nook Color equipped with a 7-inch full-colour LCD screen. In April 2010 Apple launched the first version of its slick, stylish iPad, selling 3 million devices in the first 80 days; by the time the iPad 2 was launched in March 2011, more than 15 million iPads had been sold worldwide. Unlike the Kindle or the Nook, the iPad was a true multipurpose device with a full colour screen which could be used to do most of the things one can do with a laptop or desktop computer; reading books is but one of its many functions. The iPad was soon followed by a plethora of other tablet devices launched by manufacturers seeking to mimic the features of the iPad while undercutting it on price.

The appearance of a new generation of devices that were much more stylish and user friendly than the ebook readers of the early 2000s, coupled with the aggressive promotion of ebooks by major booksellers with large and established clienteles, was the critical convergence of factors that underpinned the dramatic upsurge in ebook sales from 2008 on. The remarkable pattern of growth can be seen from the data collected by the Association of American Publishers and the International Digital Publishing Forum. Together they have gathered data on ebook sales in the US from 12 to 15 trade publishers from 2002 to the present – the results, presented as quarterly totals up to the second quarter of 2011, are shown in table 15 and figure 12.

Table 15   US trade wholesale ebook sales, 2002–2011

The data show that ebook sales were very low and largely static up to the end of 2005, generally hovering around $2 million. They doubled in the first and second quarters of 2006, and by mid-2007 they had quadrupled. Revenues for 2007 as a whole were up 60 per cent on the previous year. The growth in 2008 and 2009 was even more dramatic. Sales doubled between the fourth quarter of 2007 and the fourth quarter of 2008, and revenues for 2008 as a whole were up by nearly 70 per cent on the previous year (less than the 400 per cent reported by the publisher above but impressive nonetheless). The rapid growth continued through 2009 and 2010, with sales reaching nearly $230 million by the first quarter of 2011 and over $450 million in the first half of 2011 – two and a half times the sales recorded in the first half of the previous year and 20 times the sales recorded in the first half of 2008. This is fierce growth.

Figure 12   US trade wholesale ebook sales, 2002–2011

For the large trade publishers in the US, the surge in ebook sales has meant that a growing proportion of their revenues is being accounted for by ebooks rather than traditional print books, whether hardcover or paperback. Although the precise figures vary from house to house, the overall pattern of growth of ebook sales as a percentage of overall revenue looks roughly like figure 13. For most large US trade publishers, ebooks accounted for around 0.1 per cent of overall revenue in 2006 and 0.5 per cent in 2007; in 2008 this grew to around 1 per cent; in 2009 this was up to about 3 per cent; by 2010 it had risen to around 8 per cent and in 2011 the figure is likely to be between 18 and 22 per cent (possibly even higher for some houses). The surge was particularly dramatic in the period around Christmas 2010. ‘The week before Christmas it was 12 per cent of our business, the week after Christmas it was 26 per cent,’ explained the CEO of one large trade house. It had more than doubled in one day – no doubt thanks to the fact that many people had received reading devices as Christmas gifts and wanted to load them up with books.

Figure 13   Ebook sales as a percentage of total revenues of major US trade publishers

The steep rise in the overall proportion of revenues accounted for by ebook sales is striking and significant in itself, but it masks important variations in terms of different categories of books – and, indeed, in terms of different authors and different books. While many commentators had expected the ebook revolution to be driven by businessmen who wanted to carry business books with them and read while travelling, in fact the biggest shift for mainstream trade publishers has not been in business books but in commercial fiction, especially genre fiction like romance, science fiction, mystery and thriller. ‘It became clear early on that these categories were just perfect for the ebook format,’ explained Sally, the head of the digital publishing group in one of the large corporations. ‘These are categories where people consume a lot of books and they are always waiting for the next one.’ For fiction as a whole, ebooks were accounting for around 40 per cent of overall sales in this corporation by mid-2011, but in some categories of genre fiction and for some authors the percentages were even higher – 60 per cent for some categories like romance, and as high as 80 per cent for some authors. The big bestsellers by brand-name authors have also seen a big shift to ebook sales but the percentages tend to be lower. Sally put this down to their increased visibility in the marketplace: ‘The bestsellers have tremendous ebook sales but still overall more print percentage because they get more shelf space, more visibility. I do expect the bestsellers always to have more p-ratio just because of exposure in the marketplace. The books that don’t get the huge shelf space exposure have the higher e-percentages, especially in those categories where people just consume a lot.’

While the shifts are strongest in commercial fiction and especially genre fiction, there has been a strong shift in literary fiction too. Jonathan Franzen’s Freedom – one of the literary highlights of 2010 – sold around 750,000 copies in hardcover and 250,000 ebooks in the US during its first year, so ebooks comprised around 25 per cent of its frontlist sales. While literary fiction lags behind commercial fiction and genre fiction in terms of the shift to digital, the percentages are still high – and much higher in 2011 than they were only a year earlier.

Non-fiction has also seen a significant shift to digital but it generally lags behind fiction, both commercial and literary. ‘While some people expected to see very rapid adoption in business books, we haven’t seen that,’ explained Sally. ‘The consumption patterns are different. These books aren’t being read voraciously on the go, they’re being used more like reference works where it’s more convenient to have them on your shelf. So they’re lagging behind but they’re trending up.’ As with fiction, the non-fiction books that have seen the biggest shift to digital are those with a simple narrative structure that you read from beginning to end, especially at the more commercial end of the market – celebrity biography, popular narrative history, big ideas books, etc. Sally put the ebook percentage for narrative non-fiction in her company at around 20 per cent in mid-2011. Anything more complicated – books that use colour, like art books or children’s books, or books that are used more like a reference work that the reader wants to read slowly over time and dip in and out of – has lagged far behind.

So how are these trends likely to evolve in the coming years? Like all of the large publishing houses, Sally’s company is constantly trying to figure it out, modelling and remodelling the possible scenarios on the basis of the latest sales figures and the breakdowns between print and digital for different categories of books. ‘We just went through this exercise last week,’ explained Sally, ‘looking at the major categories and what we think will happen. Fiction will be high – I think we put in something like 75 per cent’, with even higher percentages for genre fiction. Print will remain an important format in fiction but it will become the junior partner in the print–digital split – at least in the view of Sally and her fellow modellers. ‘I think there will always be physical formats but of fewer titles. The physical book format is here to stay for quite a while, just because of the demographic,’ she continued, though with an emphasis on ‘quite’ that suggested she wasn’t entirely sure in her own mind how long this would last. ‘I think there are always going to be some people who prefer the physical book.’ But Sally and her colleagues are already envisaging a time in the not too distant future when they will be publishing some books in ebook formats only, with no print edition.

Of course, Sally and her colleagues may be wrong about all this: this is an industry in a state of flux, the numbers changing from month to month, and no one – not even the industry insiders – can predict with any confidence how things are likely to evolve in the coming years. What the proportions are likely to be in one or two years’ time, let alone five or ten years’ time, is anyone’s guess. Will ebooks become 30 per cent, 50 per cent, even 90 per cent of publishers’ total sales in the next few years? The truth is, no one knows. Most people have an opinion but no one knows a thing. ‘I wish I could give you wisdom,’ said one CEO in 2011, speaking with unusual frankness, ‘but I have no idea. The consumer will act to define this – it won’t be defined by Amazon or Barnes & Noble or Apple or us. Maybe only 26 per cent of people in America will want to read on devices and maybe we never go up another percentage point. On the other hand, maybe 100 per cent in the next three years switch over to reading devices. We just don’t know. Anyone who tells you anything else is telling you a complete load of shit.’ By its very nature this is an unpredictable trend, dependent on a host of incalculable factors from as yet unknown innovations to the habits and tastes of readers, and there is no way that anyone can know for sure whether the trend will continue to arch upwards or will level off or even decline at some point. Trying to predict the pattern of ebook sales over the next two to three years is like trying to predict the weather in six months’ time.

Regardless of what the exact figures turn out to be, there can be no doubt that we are witnessing major changes that could have profound consequences for the industry as a whole. It is simply too early to say what these consequences will be, but already many are beginning to wonder if the traditional revenue models of trade publishing – which have relied on market segmentation and the temporal phasing or ‘windowing’ of editions that are differentially priced – can be sustained in the face of the ebook surge. If an ebook is available at the same time as the trade hardcover is published and at a significantly lower price, then what consequences will this have, not only for the sales of the hardcover but also for the sales of any subsequent paperback editions? Will this make it harder to publish in trade hardcover in the first instance, cannibalizing the sales of the hardcover to the point where it ceases to be economic, at least for some books and some authors? Will it undermine the sales of a subsequent trade paperback edition, which, apart from appearing a year later than the ebook, might be a dollar or two more expensive? Will it kill off the mass market paperback, the sales of which had already been declining for some time? And what consequences will these changes have, if indeed they occur, for the revenue streams and profitability of the publishing houses themselves? No one knows the answers to these questions, but many in the industry are wondering anxiously about them. They are watching, experimenting, carefully scrutinizing the figures day by day and month by month as if they were tea leaves, searching for clues about their future.

It is also too early to say whether the dramatic developments in the US market are the harbinger of things to come elsewhere or will turn out to be another instance of American exceptionalism. Outside of North America, the UK market is probably the most developed in terms of ebook sales, but it still lags well behind the US. The sales manager at one of the large UK trade publishers reported that ebooks represented around 2 per cent of their business in 2010, which is about a quarter of the equivalent US figure. He expected this to rise to about 8 per cent by the end of 2011 and to continue to grow after that, but always lagging behind the US. In other markets and other languages, where reading devices are less prevalent and more expensive, where the infrastructure for the purchase of ebooks – free wireless connectivity to a trusted and well-stocked ebook store – is less developed and where attitudes towards books may be different, one might expect to see a slower and less decisive shift. Whether this is a temporary lag or a sign that there will be different patterns in different languages and countries is unclear – again, there are many opinions but no one knows.

Whatever happens in the coming years, the developments since 2008 have made it perfectly clear to everyone that the publishing industry will not remain untouched by the technological revolution that is sweeping through other sectors of the creative industries. It may not be affected in the same way as the music industry or the newspaper industry, but affected it will be. The surge in ebook sales has made this palpable, but in fact the digital revolution in the publishing industry has been underway for many years. Ebooks are part of a deeper transformation that dates back to the 1980s and that goes to the very heart of the publishing business – what I call ‘the hidden revolution’.3 This is not so much a revolution in the product as a revolution in the process. Regardless of what the final product looks like, the process by which it is produced is completely different. Thanks to the hidden revolution, the book had become a digital file by the early 2000s. It was ready to be delivered in whatever format the market demanded, whether this was the traditional print-on-paper book or an ebook to be read on some as-yet uninvented reading device.

The hidden revolution

So what is the hidden revolution in publishing? In order to make sense of this, we need to see that the digital revolution has affected the publishing business in many different ways – we can distinguish four different levels: (1) operating systems; (2) content management and the digital workflow; (3) sales and marketing; and (4) content delivery.

1    Operating systems    The most immediate respect in which digitization has affected the publishing industry is in terms of operating systems and information flows. Like many sectors of industry today, the management systems in all major publishing houses are now thoroughly computerized and management information is compiled and circulated in digital forms. Since the mid-1980s most publishing firms have been engaged in a continuous process of investing in the computerization and digitization of their offices and operating systems. They have built or installed back-office publishing systems which store bibliographical and other data on each title and can be accessed by anyone in the organization who can log onto the network. Financial data, sales data, production details and other information are held in dedicated IT systems, and a high proportion of the communication within and between organizations now takes place electronically. Email has become the communication medium of choice. Documents – including proposals and manuscripts – are commonly circulated as electronic files rather than printed texts, a development that has facilitated multiple submissions and auctions.

The development of IT systems has undoubtedly generated efficiencies and has enabled some costs to be eliminated or reduced, but the level of investment required to build and maintain state-of-the-art IT infrastructures is very substantial indeed. While this factor alone has not driven consolidation in the publishing industry, it is not irrelevant to it, because this is one area where small houses can be disadvantaged and large corporations can achieve significant economies of scale.

It is not just the working practices and information flows within firms that have changed: the digital revolution has also led to the digitization of the supply chain. The systems for managing stock and transferring it between different organizations within the supply chain have been computerized. More and more bookstores introduced electronic point-of-sale (EPOS) facilities to track the sales of individual titles and manage stock flows, and the ordering of stock from wholesalers and publishers was increasingly handled through dedicated electronic ordering services like PubNet, TeleOrdering and First Edition. The use of increasingly automated EDI (Electronic Data Interchange) systems has become a pervasive feature of the supply chain. The development of sophisticated computerized systems for stock management and control has also become an important source of competitive advantage for the large retail chains and for the large wholesalers, like Ingram and Baker & Taylor in the US and Bertrams and Gardners in the UK. By expanding their inventories and developing computer-based information systems that enable them to fulfil orders in one or two days, the large wholesalers can provide a highly efficient, one-stop service to bookstores, which can reorder stock on a daily basis in the light of their computerized point-of-sale data.

2    Content management and the digital workflow    The computerization of operating systems is not of course unique to the publishing industry – most sectors of industry have experienced similar transformations during the last two decades. But digitization has the potential to transform the publishing industry much more profoundly than this, precisely because the publishing industry – like many sectors of the creative industries – is concerned fundamentally with symbolic content that can be codified in digital form. Hence the whole process of creating, managing, developing and transforming this content is a process that can in principle be handled in a digital form – from the moment when an author composes a text by typing on the keys of a computer to the final creation of a file in a format that can be used by a printer. A central part of the history of the publishing business since the early 1980s has been the progressive application of the digital revolution to the various stages of the production process, leading to the gradual rise of what we could call ‘the digital workflow’. From the viewpoint of the production process, the book itself has been reconstituted as a digital file – that is, a database. To the production manager, that’s all it is: a file of information that has been manipulated, coded and tagged in certain ways. The reconstitution of the book as a digital file is a crucial part of the hidden revolution.

This process did not take place effortlessly – on the contrary, it was a long, arduous transformation, still continuing today, in which many of the traditional practices of the publishing industry were eclipsed by new ways of doing things. It didn’t simplify things but in practice made them more complex, partly because new procedures had to be invented and partly because the digital world, with its plethora of file types and formats, programming languages, hardware, software and constant upgrades, is in many ways more complicated than the old analogue world of print. Typesetting was one of the first areas to be affected. The old linotype machines, which were the standard means of typesetting in the 1970s and before, were replaced in the 1980s by big IBM mainframe typesetting machines and then, in the 1990s, by desktop publishing. Typesetting costs plummeted: in the 1970s it typically cost $10 a page to get a book typeset from manuscript, whereas by 2000 it was costing between $4 and $5 a page despite the decrease in the value of the dollar produced by two decades of inflation. But for those who lived through the changes, this was a difficult and confusing time. The job of the typesetter was being redefined and lines of responsibility were being blurred. Many of the tasks formerly carried out by typesetters were being thrown back on production staff who were, at the same time, trying to use and adapt to new technologies that were in constant flux.

By the mid-1990s, many of the technical aspects of book production, including typesetting and page design, had been transformed by the application of digital technologies, but there were two areas of the workflow where progress was more erratic: editing and printing. When an editor receives a manuscript from an author, he or she will read the text and comment on it, often suggesting revisions that may range from small stylistic alterations to major structural changes. This process of editorial revision may happen more than once, as the editor works with the author to try to improve the text. Once the manuscript has been accepted by the editor, it will go through another process of editing, commonly known as copy-editing, in which a specialized copy-editor, often working freelance for the publisher, will line-edit the text, correct grammatical or stylistic errors, query anything that is unclear, eliminate repetition, make sure that the bibliography and any other technical aspects are in order and mark up the text for the typesetter. Both the editor and the copy-editor have traditionally worked with a typescript or printed manuscript. There are advantages in working with a printed text – it’s easier to find your way around, to move back and forth between pages and keep track of the changes you’ve made. Many editors and copy-editors would find it difficult to work on screen, especially if a text needs a lot of structural or developmental editing.

There are other problems in trying to work with the electronic files supplied by authors. Often the electronic files contain numerous detailed errors and inconsistencies – for example, not differentiating between the letter O and the digit zero (0), between hyphens and en-dashes, and so on. Errors of this kind have to be picked up and corrected. If the copy-editor (or assistant editor) also has to transfer corrections made on paper to the electronic file sent by the author, the costs involved in doing this begin to outweigh any advantages there might be in working with the author’s disk. ‘I can send the manuscript with mark-up to a compositor in Asia who could double-key it to 99 per cent accuracy and add the tags for page layout for half the price it would cost me to hunt and peck the corrections in Word. So do I pay someone $30 an hour to hunt and peck corrections or do I have it re-keyed, double-keyed with additional functionality, for half the price? Well, throw the disk out and have it re-keyed,’ explained one production manager at a large trade house. So although the author’s keystrokes are in principle the point at which the digital workflow could begin, in practice, at least in trade publishing, the digital workflow usually begins when the copy-edited manuscript is re-keyed by the compositor. In some fields of publishing – in some university presses, for example – procedures have been introduced to clean up authors’ files (‘initial prep’, as some production managers call it) and copy-editors have been encouraged to edit onscreen using the tracking and comment features of Word, but these are not standard practices throughout the industry. Many trade publishers continue to pencil edit on paper and then have the manuscript re-keyed by the compositor, who supplies the publisher with a clean electronic file incorporating the tags for page layout.
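A small script can illustrate the kind of automated ‘initial prep’ described above. The substitution rules here are hypothetical examples chosen for illustration, not a standard industry toolchain:

```python
import re

def clean_manuscript(text):
    """Illustrative first-pass clean-up of an author's electronic file.
    Each rule is a hypothetical example of the inconsistencies a prep
    script or copy-editor would look for, not an industry standard."""
    # A letter O typed in place of a zero inside a run of digits, e.g. '2O01'
    text = re.sub(r"(?<=\d)O(?=\d)", "0", text)
    # A spaced hyphen between words where an en-dash is conventional
    text = re.sub(r"(?<=\w) - (?=\w)", " \u2013 ", text)
    # Runs of multiple spaces collapsed to one
    text = re.sub(r"  +", " ", text)
    return text

print(clean_manuscript("In 2O01 the market - by most estimates - grew."))
```

In practice, as the production manager quoted above suggests, it was often cheaper to have the whole manuscript double-keyed offshore than to chase corrections of this kind by hand.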

The second area where progress has been more erratic is printing. Until the late 1990s, most publishers used traditional offset printing for all of their books. Offset has many advantages: print quality is high, illustrations can be reproduced to a high standard and there are significant economies of scale – the more you print, the lower the unit cost. But there are disadvantages too: most notably, there are significant set-up costs, so it is uneconomic to print small quantities. It was difficult to print fewer than 500 copies, since the unit costs were too high to be practicable. So backlist titles that were selling a few hundred copies or less per year were commonly put out of print by many publishers, and the large trade houses often drew the line much higher. It simply wasn’t economic for them to keep these books in print, taking up space in the warehouse and reprinting in small quantities if and when the stock ran out.

The advent of digital printing changed all that. The basic technology for digital printing has been around since the late 1970s, but it was not until the 1990s that the technology was developed in ways that would enable it to become a serious alternative to the traditional offset presses. With the appearance of Xerox’s DocuTech printer and similar machines from other manufacturers in the early 1990s, the quality, speed and cost per page of digital printers were beginning to reach levels at which they could compete with offset printing on short print runs. The quality was not as high and the reproduction of halftones remained distinctly inferior, but with improvements in technology the quality was getting better and better. By the end of the decade, the reproduction quality for straight text was, to the untrained eye, indistinguishable from offset printing, although there was still a discernible difference in the quality of halftones.

As the technology improved, a number of players entered the field in the late 1990s and early 2000s offering a range of digital printing services to publishers. Of particular significance in this context were two services: short-run digital printing (SRDP) and print on demand (POD). SRDP is simply the use of digital printers to produce small quantities of books – anything from 10 or 20 copies to 300 or 400 copies. It works with the same distribution model as books printed by traditional means: the publisher orders books to be printed, the books are shipped to the publisher’s warehouse where they are held in stock and copies are then distributed through the normal channels in the normal way. The only difference is that the books are printed digitally and the quantities are smaller than they would be if the books were printed by traditional offset. Since the unit cost for digital printing is essentially flat regardless of run length, there is a point – currently around 400 copies for a normal book – at which the unit costs switch over: below that point it will be cheaper to print digitally, and above that point it will be cheaper to print using a traditional offset press. Thanks to SRDP, publishers now had the full range of print quantities available to them. They could continue to reprint books, and keep them in print and in stock in perpetuity, simply by switching over from offset to digital printing.
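The crossover logic is simple arithmetic: offset carries a set-up cost that is amortized over the run, while digital is flat per copy. The figures in this sketch are hypothetical, chosen only so that the break-even point falls at the roughly 400 copies mentioned above:

```python
def unit_cost_offset(run_length, setup_cost=2000.0, per_copy=1.50):
    """Offset: set-up cost spread over the run plus a low marginal cost per copy."""
    return setup_cost / run_length + per_copy

def unit_cost_digital(run_length, per_copy=6.50):
    """Digital: unit cost is essentially flat whatever the run length."""
    return per_copy

for run in (100, 400, 1000):
    offset, digital = unit_cost_offset(run), unit_cost_digital(run)
    if digital < offset:
        verdict = "digital cheaper"
    elif digital > offset:
        verdict = "offset cheaper"
    else:
        verdict = "break-even"
    print(f"{run:>5} copies: offset ${offset:.2f}/copy, digital ${digital:.2f}/copy ({verdict})")
```

With these illustrative numbers, 100 copies favour digital, 1,000 copies favour offset, and the curves cross at exactly 400.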

SRDP is just a different method of printing, but POD is more than that: it is a fulfilment system that uses digital printing to fulfil a specific order. True POD is the printing of a text in response to the demand of a particular end user. As one of the senior managers involved in setting up Lightning Source – one of the pioneers of POD and still a leader in the field – put it, ‘It used to be print book, sell book. We say no, no. Sell book, print book.’ So, for example, if an individual customer places an order for a book, whether through a traditional bookseller or through an online retailer like Amazon, then the order can be transmitted to the POD supplier, who will print the book, ship it out and, depending on the service offered and which party collects the money, either pay or invoice the publisher. The POD supplier holds a file of the book on its server but does not hold any stock of the book, since the book is printed to meet the demand. Physical stock is replaced by a ‘virtual warehouse’.

By the early 2000s, many publishers in the English-speaking world were using some version of digital printing for their slower-moving backlist titles, whether SRDP or true POD. Those in the fields of academic and professional publishing were among the first to take advantage of these new technologies: many of their books were specialized works that sold in small quantities at high prices, and were therefore well suited to digital printing. Many trade publishers were accustomed to dealing in the larger print quantities for which offset printing is ideal, but they eventually came to realize – in some cases with the help of the long-tail thesis first put forward by Chris Anderson in 20044 – that there was value locked up in some older backlist titles that could be captured by using digital print technology. Publishers – academic, professional and trade – began to mine their backlists, looking for older titles for which they still held the copyright, scanning them, turning them into PDFs and re-releasing them as digitally printed books. Books that had been left to die many years before were suddenly brought back to life. It is one of the ironies of the digital revolution that, so far from ushering in the death of the book, one of its most important consequences has been to give the printed book a new lease of life, allowing it to live well beyond the age at which it would have died in the pre-digital world and, indeed, rendering it potentially immortal.

3    Sales and marketing    The digital revolution has also had, and continues to have, a profound impact in the areas of sales and marketing. The dramatic rise of Amazon is only the most obvious respect in which the retail environment of book publishing has been transformed by the internet. The significance of Amazon is not to be measured only in terms of its market share as a retailer (itself substantial, and still growing). It also stems from the fact that Amazon and other online booksellers use a retail model that is fundamentally different from that of the traditional bricks-and-mortar bookstore. In the Amazon model, the availability of books to the consumer is no longer tied to the physical availability of the book in the bookstore (or even in the retailer’s warehouse); availability is virtual, not physical, and hence it is not dependent on the prior decision of a buyer to stock the book in the store. Of course, the Amazon model does not entirely eliminate the gatekeeper role of the buyer, since Amazon does hold stock of some books in its warehouses and whether it does or does not hold stock has a big impact on fulfilment times. Nor does it eliminate the role of marketing money in determining the visibility of particular titles, since Amazon has its own forms of co-op advertising and paid-for marketing campaigns. 
Nevertheless, the Amazon model does introduce something new into the retail space: a consumer offer in which the visibility, availability and sales of a book are less dependent – not wholly independent, but certainly less dependent – on the decisions and interactions of intermediaries in the bookselling chain, in particular on the decisions of sales reps and others in the publishing houses about which books to prioritize when they call on buyers, and on the decisions of buyers about which books to stock and in what quantities, as well as an array of related decisions about how and where to display the books, how much co-op money to put behind them and so on. The weakening of this link is precisely why Amazon has become a mechanism used by buyers and other managers in bricks-and-mortar bookstores to check on the soundness or otherwise of their initial purchasing and stocking decisions.

The digital revolution has also done something else equally important and in some ways more far-reaching in terms of the way the publishing business works: it has, as we’ve seen, turned sales figures into publicly available forms of knowledge. The gathering of accurate sales figures and sales histories, based on point-of-sale data and made available (at a price) to all players in the field, has changed the rules of the game. It is no longer possible to hide, to pretend, to make out that an author’s previous books have been a tremendous success when, in fact, they have not, as anyone can now check the sales figures (within certain limits). Hype still has a crucial role to play in the publishing game, but hype based on inflated figures is largely a thing of the past. The digital revolution has created a kind of transparency in terms of sales figures that simply wasn’t there before. It has also given rise to a kind of tyranny of numbers whose consequences may be less benign (a point to which we shall return).

Just as the digital revolution is transforming the sales environment, so too it is having a profound impact on marketing and the multiple ways in which publishers seek to generate awareness of their books among readers and consumers. The e-marketing revolution in publishing is only just beginning. Many initiatives are already underway and many more are in the pipeline or at various stages of conceptualization and development. In an earlier chapter we saw how some of the large trade publishers are shifting more and more of their marketing resources away from traditional print media and into online marketing activities of various kinds: this trend is set to continue and to take ever more elaborate and varied forms. Increasingly publishers will use the online environment to build direct connections with consumers and to facilitate online interactions between writers and readers – this is happening already, as we saw in chapter 7, and there is every reason to believe that it will continue. For every major trade publisher today, expanding their e-marketing activities and trying to understand how best to use the online environment to reach out to their readers and grow their readership are among their most urgent priorities.

Another aspect of marketing and promotion which has become very important for publishers is what could be called ‘digital sampling’. Of course, it has always been possible to browse books before buying them. The traditional way of doing this was to go into a bookstore and page through the book, perhaps sitting in a corner somewhere and reading a few pages, before deciding whether to buy it. But the online environment makes it possible to dissociate browsing from the turning of printed pages in a bricks-and-mortar bookstore. Publishers can now allow readers to browse a book online – they can see the table of contents, read the blurb on the cover, dip into the text and perhaps even read a chapter or two. Both Amazon and Google provide publishers with programmes of this kind – Amazon’s ‘Search Inside the Book’ and Google’s ‘Book Search’. These programmes are viewed by most publishers as an online shop window, an additional marketing tool that will enable readers to discover their books, find out more about them and encourage them to buy. If you are browsing a book online you are only a click away from buying it: a panel on the screen makes it easy for viewers to buy the book from Amazon and other bookstores or directly from the publisher.

The key issue with digital sampling is where you draw the line between sampling and consumption. You want to allow the reader to have enough exposure to the text to enable them to get a clear sense of the content and, hopefully, decide to buy the book, but you don’t want them to be able to read so much of the text that the decision to purchase becomes redundant. Where do you draw the line? Different publishers draw the line in different places – some say 5 per cent of the book, some say 10 per cent, some say 20 per cent, some say you can read two pages forward and two pages back and then you hit a wall and the rest of the text is blanked out. Both Amazon’s Search Inside the Book and Google’s Book Search use restricted access models of this kind that are agreed contractually with the publisher. Other publishers prefer to vary the sampling model depending on the type of book – for example, in a work of non-fiction or a reference work, episodic sampling may be the best model, whereas in a work of fiction it might be better simply to allow the consumer to read the first chapter or two and nothing else. ‘So for fiction I tend to be even more expansive in allowing for a continuous read of the beginning of the book because I think that’s how you get the best flavour of whatever you’re potentially buying,’ explained a senior manager at one of the large trade houses. ‘With non-fiction and especially reference-oriented non-fiction you have to be a little more careful because people may only want individual pages at individual times. You’ve got to make sure that the models of sampling fit with the content of the book so that you’re delivering enough sampling for someone to know that they want it but not so much that they get everything they wanted without actually having to pay for it.’
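The various policies this manager describes can be sketched as variations on a single gating function. The policy shapes and numbers below are illustrative assumptions, not the actual rules used by Amazon’s Search Inside the Book or Google’s Book Search:

```python
def viewable_pages(total_pages, policy):
    """Return the set of page numbers a browsing reader may see under a
    sampling policy. Policy shapes are illustrative, not any retailer's
    actual contractual rules."""
    if policy["type"] == "fraction":
        # e.g. the first 10 per cent of the book
        limit = max(1, int(total_pages * policy["fraction"]))
        return set(range(1, limit + 1))
    if policy["type"] == "window":
        # two pages forward and two pages back from a landing page
        centre = policy["landing_page"]
        return {p for p in range(centre - 2, centre + 3) if 1 <= p <= total_pages}
    if policy["type"] == "opening":
        # a continuous read of the opening chapter(s), given as a last page
        return set(range(1, min(policy["last_page"], total_pages) + 1))
    raise ValueError("unknown sampling policy")

print(sorted(viewable_pages(300, {"type": "window", "landing_page": 50})))
```

A retailer would apply a gate of this kind server-side before rendering each requested page, with the parameters agreed contractually with the publisher.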

4    Content delivery    So even before one broaches the issue of content delivery, it is clear that the digital revolution has had, and is continuing to have, a profound impact on the book publishing industry. Anyone who suggests otherwise is simply ill-informed or misinformed. But there is no doubt that it is the fourth level – that of content delivery – where the potential impact of the digital revolution is most profound. There is one basic characteristic of the book that makes this fourth level possible: the content of the book is separable from the form. This is a characteristic that the book shares with other products of the media and creative industries – films, music, newspapers, etc. – and is the reason why the impact of the digital revolution in these industries is potentially so much more disruptive than it is in, say, the refrigerator business. In essence, the digitization of content dissociates content and form. It captures content in a way that separates the content from the particular form in which it is, or typically has been, realized; it also captures content in a way that is sufficiently flexible to enable it to be realized, at least in principle, in a multiplicity of other forms. The physical book – the print and paper and binding and glue, the material object of a certain shape and size and weight – is a particular vehicle or form in which this content has been customarily realized for some 500 years, but it is not the only form in which it has been realized in the past, nor is it the only form in which it could be realized in the future. The digitization of content simply highlights a characteristic that was always part of the book but was obscured by the elegant union of content and form in a particular physical object. 
It brings out, more clearly than was previously the case, that the real value of the book lies in the content that is embedded in the physical form of the book, rather than in the physical form as such – hence the oft-repeated slogan associated with the digital revolution, ‘content is king’.

It is not difficult to see that if book content is delivered to end users in an electronic form rather than in the form of the physical book, it transforms the supply chain and turns the traditional financial model of book publishing on its head. It is no longer necessary to lock up resources in physical books (with the attendant costs of paper, printing and binding), store them in warehouses, ship them to bookstores and wholesalers, accept them as returns if they are not sold and ultimately write them down and pulp them if they turn out to be surplus to requirements. In a world where content was delivered entirely electronically, publishers could bypass most if not all of the intermediaries in the traditional book supply chain and supply content either directly to the end user through their own website or via online intermediaries like Amazon. The costs associated with producing, storing and shipping physical books would be eliminated and the problem of returns would vanish in a click. It’s a seductive vision. It’s hardly surprising that ebooks have had, and continue to have, many champions, including many who work in the publishing industry. So why did it take so long for this vision to gain real traction in the marketplace? Why did the ebook revolution, which sounds so sensible when described from a purely operational point of view, make such slow and erratic progress in the first decade of the twenty-first century?

Among those who work in the ebook business, four reasons are commonly put forward – we alluded to some of these earlier but let’s now examine them in more detail. First and perhaps most importantly, there was the problem of hardware: the early reading devices were expensive, clunky and awkward to use. The screens were small and the resolutions were poor, and many people were reluctant to spend several hundred dollars on dedicated reading devices which might turn out to have a short life. Amazon’s Kindle – with its sleek appearance, its use of e-ink technology to mimic the effect of print on paper and the ability to download books directly via a wireless network without having to use a computer or go online – clearly marked a quantum leap forward in terms of technical design and ease of use, and the new generation of tablet computers, led by the iPad, has integrated the reading of books into the array of activities supported by a stylish, multipurpose device. The initial prices of dedicated reading devices may have put off some potential users – the Kindle was priced at $399 when it was first released in November 2007; but the prices came down quickly and it wasn’t long before the devices began to look very affordable (the third generation of the Kindle, launched in 2010, was being sold for $139). With the launch of Barnes & Noble’s Nook in 2009, the availability of the Sony ebook reader, the Kobo and other reading devices and the flooding of the marketplace with iPads and other tablet computers, the hardware problem was no longer the issue it had been in the early 2000s.

Second, there is the problem of formats: the early days of ebooks were characterized by a bewildering array of proprietary formats which were not interchangeable across different reading devices. This was confusing and off-putting for consumers, who were disinclined to invest in devices that could be rapidly eclipsed by technological change. The problem is comparable to that faced by other technologies in the early stages of development, like the format war between Betamax and VHS in the early development of the VCR, or the format war between Sony’s Blu-ray and Toshiba’s HD-DVD in the development of high definition DVD. This remains a problem in the ebook marketplace, as the most popular reading devices use proprietary file formats that are specific to the device. There have been attempts to create a standardized format for ebooks – first the Open eBook standard, followed by the ePub format that was launched by the International Digital Publishing Forum in 2007; but the ePub format is not supported by all devices (most notably, not as yet by the Kindle).

Third, there is the problem of rights: there was considerable confusion about who owns the rights to exploit the content of a particular book in an electronic format. Is it the publisher? The author? And what should be paid to whom? Most older contracts between authors and publishers were drawn up at a time when the idea of exploiting content electronically was not even imagined, so there is no provision in the contract to indicate who controls the rights and how revenue is to be distributed. Can publishers assume that they control the electronic rights for earlier books they have published, even though the contract does not explicitly grant them these rights? Or do these rights remain with authors and their agents – and if so, are they free to assign the rights to someone else or even publish the works themselves in a digital format? The issues are rendered even more complex by the problem of embedded rights – that is, the copyright on material that is embedded in the text, such as quotations or illustrations. The publisher may have cleared permission to use this material for the print edition of the book, but can it be assumed that this permission can be transferred to an electronic edition? Or does the publisher have to go back to all of the original rights holders, assuming they can be tracked down, and clear the rights again for the ebook? No one really knew the answers to these questions as they simply hadn’t arisen before, and many publishers were inclined to wait and see what norms might emerge before moving ahead on their own and risking copyright infringement.

Finally, there is the question of price: on the whole, publishers and retailers chose to price ebooks at levels that were roughly comparable to the price of print books, or at most 20 per cent below the price of the prevailing print edition. This was partly experimental, and partly an acknowledgement of the fact that, while there were some savings involved in delivering book content in electronic formats, these savings were not as great as many people assumed – all the acquisition, editorial and development costs were still there, the books still had to be designed and typeset, royalties still had to be paid (and probably at higher levels), there were still the marketing costs and the publisher’s overheads, and then there were the additional costs involved in building and maintaining the IT infrastructure to support ebooks. The costs associated with the production of the physical book – print, paper and binding – are in fact a relatively small proportion of publishers’ costs (though there are of course additional costs associated with warehousing and shipment, as well as the cost of returns). However, this does not go down especially well with consumers, for whom the perceived value of an ebook is significantly lower than that of a print book simply because it lacks the physical traits of the printed book. So pegging ebook prices at the same level (or slightly below) the prices of physical books deterred many consumers from making the transition – or at least this is the view of some.
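A rough breakdown of a $25 hardcover makes the point: stripping out print, paper and binding (PPB) removes only a small slice of the publisher’s costs. None of these figures come from the text; they are entirely hypothetical, used only to illustrate the shape of the argument:

```python
LIST_PRICE = 25.00

# Hypothetical per-copy cost breakdown for a $25 hardcover (illustrative only)
costs = {
    "retailer/wholesaler discount": 12.50,   # assumed ~50% of list price
    "author royalty": 3.75,                  # assumed ~15% of list price
    "print, paper and binding (PPB)": 2.00,
    "warehousing, shipping, returns": 1.50,
    "editorial, design, typesetting": 2.25,
    "marketing and overheads": 2.25,
}

margin = LIST_PRICE - sum(costs.values())
ppb_share = costs["print, paper and binding (PPB)"] / LIST_PRICE
print(f"publisher margin: ${margin:.2f}; PPB is only {ppb_share:.0%} of list price")
```

On assumptions of this kind, eliminating PPB alone would justify only a modest discount on the print price, which is why publishers’ ebook prices fell so far short of what many consumers expected.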

For many who work in the industry, these four factors go a long way to explaining why ebooks failed to take off as quickly as many had expected; they also explain why, once a variety of high-quality devices were available and prices had come down, the surge in ebook sales was both sudden and strong. There is, however, an important element missing from this account – namely, the crucial role of the trusted intermediary in the supply chain. The fact that, from November 2007 on, ebooks were being actively promoted by Amazon was enormously significant, since millions of readers had grown accustomed to buying their books from Amazon, had given them their credit card details and had come to trust them as a reliable retailer of books. This was just enough to lower the threshold of anxiety that had disinclined so many readers in the past from switching over to ebooks – how do I know that this device will be any good, that it will be worth all the money, that I’ll be able to get the books I want and won’t have to worry about complicated things like formats and compatibility, that the reading experience will be agreeable and that the device and all the books I’ve bought won’t be superseded by something else in six months’ time? Now, with Amazon providing its own devices, with prices falling and more and more ebooks being made available by one of the most trusted retailers of books in the business, the time for many readers to experiment had come.

It is therefore not surprising that Amazon was able very quickly to establish a dominant position in the emerging market for ebooks. Their aggressive pricing policy undoubtedly helped (more on this below) but their reputation among readers as a trusted retailer of books was probably also a vital factor. This also helps to explain why Barnes & Noble quickly moved into second place when it launched the Nook – contrary to the expectations of many in the industry, who thought the iPad would steal the show. Like Amazon, Barnes & Noble is a trusted retailer of books with a large and well-established clientele, and it was able to leverage its reputation among readers when it launched its reading device. It was also able to showcase and hand-sell its device at the front of stores across the country, which undoubtedly helped its campaign as well. Although estimates of market share in the rapidly changing ebook marketplace are notoriously unreliable, it seems likely that Amazon had around 90 per cent of the ebook market in the US at the beginning of 2010, but that this had fallen to 55–60 per cent by the middle of 2011. Barnes & Noble experienced strong growth through 2010 and probably had around 20–25 per cent of the ebook market by the middle of 2011 – ‘definitely the strongest number two’, as one industry insider put it. Apple was in third place with around 10–12 per cent of the market and the remaining 8–10 per cent was shared between Sony, Kobo and other smaller players.

Up till now we’ve been analysing the reasons for the erratic progress of ebooks but we have done so in a rather undifferentiated way. We haven’t considered why some forms of content might lend themselves to being made available electronically more readily than others, and why some might be embraced more quickly by users. In the following two sections I shall try to shed some light on these issues by distinguishing more carefully between different forms of content and examining the ways in which new technologies can enable content providers to add real value to it. This will enable us to see that the impact of new technologies on the modes of content delivery is likely to vary from one field of publishing to another depending on a variety of factors, including the nature of the content, how people use it, who is paying for it and how much they are paying, the kind of value that can be added by delivering it electronically and the extent to which this added value is appreciated or valued by users. The idea that there will be an inevitable one-way migration from print to online dissemination is too simple. It may be a compelling idea for those who are inclined to believe that technology is the pacemaker of social change, but the world is often much more complicated than the technological determinist would like us to think.5

Technologies and added value

There are at least nine respects in which new technologies can enable content providers to add real value to their content: (1) ease of access; (2) updatability; (3) scale; (4) searchability; (5) portability; (6) flexibility; (7) affordability; (8) intertextuality; and (9) multimedia. These features are not unique to the online environment (they also apply in varying ways to other forms of electronic storage) and using new technologies to add value to content is not something that applies only to publishers: publishers are just one class of content providers among many others, and the types of content they provide may be less amenable to the value-adding features of new technologies than other types of content (such as recorded music). But here I’ll examine these value-adding features in relation to the forms of content handled by publishers and with a particular focus on the delivery of content online or via wireless connectivity.

1 Ease of access    One of the great advantages of delivering content electronically is ease of access. In traditional systems of content provision, access to content is generally governed by certain spatial and temporal constraints – libraries and bookstores, for instance, are located in specific places and are open for certain hours of the day. But content delivered electronically is no longer governed by these constraints: in principle it is available 24/7 to anyone who has a suitable connection and the right of access. There is no need to go to a library to track down a book or a journal article if the content can be accessed from one’s office or home. The personal computer or handheld device, located in a place that is convenient for the user or carried with them, becomes the gateway to a potentially vast body of content which can be accessed easily, quickly and at any time of the day or night. Moreover, unlike the traditional printed text, an electronic text available online can in principle be accessed by many users simultaneously (even if, in practice, access may be restricted to one user at a time).

2 Updatability    Another feature of content delivered electronically is that it can be updated quickly, frequently and relatively cheaply. In the case of content delivered in the form of the traditional printed text, making changes or corrections is a laborious process. Changes can be made to the text at any point up to the typesetting stage, but once the text is typeset it is costly to alter. Printed texts cannot be changed: once they are printed the content is fixed. But content delivered electronically is not fixed in a printed text, and hence it can be altered and updated relatively easily and cheaply. This is a particularly important feature in cases where the content is dealing with material that is in a state of continuous flux, such as financial data. But there are many other contexts where the capacity to update content quickly, frequently and cheaply is an important trait and can add real value.

3 Scale    Undoubtedly one of the most important features of online content delivery is scale: the capacity to provide access to large quantities of material. The internet economy is an economy of scale – it offers the possibility of providing access to collections of material that are extensive and comprehensive, of providing a range of choice and depth which is simply not possible in most physical collections. It is this scalability of the internet economy which has driven the online aggregation business. Numerous intermediaries have emerged with the aim of aggregating large quantities of content and selling this on to libraries and other institutions. Part of what is attractive to libraries is the ability to gain access to large quantities of content at relatively low costs, or at costs that are significantly lower than they would have to pay if they were to acquire the same content in a piecemeal fashion. Part of what is attractive to end users is the knowledge that they can find what they are looking for at a single site – a one-stop shop. By providing scale, that is, access to very large quantities of data or content, the content providers (or intermediaries) can add real value. In the internet economy, the whole is more than the sum of its parts, precisely because the comprehensiveness of a collection is valued for its own sake. However, it is important to stress that it is not quantity alone which is valued, but rather quantity which is perceived as relevant to what the user wants and needs – that is, pertinent scale. Intermediaries that set about aggregating large quantities of content while paying little attention to what end users actually wanted soon found that quantity alone does not suffice. What end users generally want is pertinent scale, not scale for the sake of it.

4 Searchability    A fourth feature of content delivered electronically is enhanced search capacity. The traditional printed text offers its own means for searching content – the table of contents provides a guide to the content of a book, and the index is effectively a search mechanism for the printed text. But the capacity to search a digitized corpus of material using a search engine based on keywords or names is infinitely quicker and more powerful than the traditional search mechanisms employed in printed texts, and the search capacity can be extended to much larger quantities of content. Search capacity can be provided both within a corpus and across corpora, thus providing the end user with a powerful means of searching for and accessing relevant content. This way of adding value is complementary to, and in some ways required by, the provision of scale. For the end user, there is not much value in having scale, even pertinent scale, unless you have an effective means of locating the content that interests you. The greater the scale, the more valuable – indeed essential – it is to have an effective means of searching the database.

5 Portability    Alongside scale, content stored and transmitted electronically also offers the possibility of increased portability for the end user. In a digitized format, content is not tied to any particular medium of delivery, like the print-on-paper book. It is versatile, transferable, liberated from a specific material substratum and capable of being stored on any number of devices, provided it has not been locked into a particular device and the format is not proprietary. Moreover, the compression that can be achieved in a digital format enables a large amount of content to be held on a very small device. The traditional printed book is also a very portable object – that is part of its appeal. But the volume of content that can be made easily portable in a digital format is far greater than anything that could be achieved in the medium of print. Large amounts of content downloaded online or via a wireless network can be stored on a computer or some other much smaller device, such as an iPod or cellphone or dedicated reading device, and carried easily from one place to another, without the weight and sheer inconvenience that would be involved in carrying large numbers of books.

6 Flexibility    Content delivered electronically offers the possibility of greater flexibility for the end user, depending on the functionality that has been built in by the content provider or device manufacturer. In the case of music, for example, the listener can skip over tracks on an album or create their own playlists by selecting their preferred songs. Most ebook reading devices allow users to vary the size of the typeface – a valued feature for many older readers with failing eyesight – and to look up words in a built-in dictionary, among other things. On the other hand, there are forms of flexibility that could be added to digital content but are not, for reasons that are economic and/or legal rather than technical – such as the ability to share the content with others. And there are other forms of flexibility, like the ability to flick back and forth between pages, that may be lost or diminished in the transition from traditional physical products to content delivered electronically.

7 Affordability    Delivering content digitally also enables it to be delivered more cheaply. Savings can be made by eliminating the manufacture of physical products like the print-on-paper book – although, as noted earlier, these costs are much lower as a proportion of overall costs than most people assume. Further savings can be made by eliminating many of the stages and players in the traditional supply chains for physical products – the physical warehouses, the packaging, the transport costs and the cost of returns. Taken together, these amount to real savings for content providers, and there is an expectation on the part of consumers that at least some of these savings will be passed on to them – a kind of ‘digital dividend’. Exactly what this amounts to will vary greatly from one form of content to another and even one supplier to another, but there can be no doubt that cost reduction and lower prices are major factors driving the shift to digital content delivery.

8 Intertextuality    Another feature of the online environment is that it is able to give a dynamic character to what we could describe as the referential function of texts. In the traditional medium of the printed text, the capacity to refer to other material is realized through conventional literary devices such as references, footnotes and bibliographies: these are mechanisms for referring the reader on to other texts upon which the author has drawn or which the author regards as important, interesting and/or worthwhile. In the online environment, the referential function of the text can be made much more dynamic by using hot links to enable the reader to move to other pages and other sites. These links can be of various kinds – links to other pages, to other texts, to other sites and resources of various kinds, to bibliographies, biographies and online bookstores. Through the use of hypertext links, the content provider can enable the end user to access referred-to texts quickly and easily, without having to locate the text physically. And whereas references in printed texts can be updated only when there is a new edition of the work, hot links can be updated incrementally and at any time.

9 Multimedia    The delivery of content electronically also enables the content provider to use a variety of media and to supplement text with content delivered in other forms, including visual images, streaming video and sound. There are contexts in which this can enable content providers to add real value – for example, where it enables them to use colour illustrations that would be too costly to reproduce on the printed page, or to use streaming video to reproduce a speech or to illustrate a complex process. Of course, there are costs associated with the provision of multimedia content – it may be costly to produce and permission fees may be high. It may also be difficult to use, in the sense that the files may be large and slow to download. But it does at least offer the possibility of adding a kind of value to content which would not be possible in traditional print formats.

Technologies and fields of publishing

If we want to understand why the digital revolution has affected different sectors of the publishing industry in different ways in terms of content delivery, then a careful analysis of the kinds of value that can be added to content in an online environment is a good place to begin but it is not enough. We also have to see that different publishing fields operate in different ways, that different types of content lend themselves to online delivery more readily than others and that end users will have their own views about whether the value added by new technologies is, in any particular case, valuable to them – or at least sufficiently valuable to them to induce them to access content in this way and pay for it, if indeed the content is being delivered in such a way that the content provider is expecting the end user to pay for it. In other words, technologies always exist in specific social contexts and their usefulness or otherwise is shaped by a variety of contextual factors. Added value, like beauty, is in the eye of the beholder. It is a contested social phenomenon. What end users regard as valuable, and how valuable they regard it, may not be the same as what content providers regard as valuable, and how these different valuations play out will vary from field to field and from one form of content to another.

Consider some examples. The field of scientific and scholarly journal publishing experienced a rapid and decisive shift from print to online delivery in the late 1990s and early 2000s. This dramatic transformation was driven forward by the large scientific journal publishers like Elsevier and Springer, who invested heavily in the development of online platforms and in creating fully fledged digital workflows for journal production, but it was also actively encouraged and supported by librarians in the research libraries that comprised the primary market for journals. There are several reasons why the transition to online delivery was so rapid and decisive in the field of scientific and scholarly journal publishing:

•   The market for scientific and scholarly journals was an institutional market: these journals were not bought by individuals paying out of their own pocket but by institutional gatekeepers – librarians – who had access to annual budgets that had to be spent on content acquisition.

•   The subscription model for journals already existed and it was easy to adapt this model to a site licence for journals delivered online. Librarians were familiar with the business model and were keen to be able to provide an additional service to their users.

•   The move into an online environment provided new opportunities for the aggregation of content and for generating the kind of pertinent scale that can add value, and this feature was valued by both librarians and end users. Journal publishers with large numbers of journals could offer special deals to libraries and library consortia to gain access to the whole corpus (e.g. Elsevier’s ScienceDirect or Thomson’s Web of Science), and third-party aggregators (e.g. Gale) could offer packages of content assembled from a variety of different publishers.

•   The nature of journal content also lent itself to online dissemination. Journal articles are generally very short – as short as two to three pages, but usually no more than 20 pages – and hence they can be either read on screen or printed out with ease. Particular journal issues are for the most part arbitrary collections of articles which bear no intrinsic connection to one another (with the exception of themed issues), and hence the end user is likely to read only a specific article rather than a series of articles. Since users are generally looking for specific articles on specific topics, the capacity to search for relevant material across a large corpus of material is a valued feature. Moreover, scientists and academics are accustomed to working on their computers, and having journals available online, so that they can access them anytime from their desktops and offices without having to visit a physical library, is a feature that is highly valued by them.

These and other factors go a long way to explaining why scientific and scholarly journal publishing was one of the first areas to move rapidly and decisively into an online environment. This is a form of publishing that exists within a specific institutional space (publishing organizations selling content on a subscription basis to libraries with acquisitions budgets), where the nature of the content (short articles on discrete topics, easily read on screen or printed out) is amenable to online dissemination and where there are clear gains to be achieved in terms of added value (including scale, searchability and ease of access), gains that are valued both by the institutional intermediaries paying for the content and by the end users. But even in the sphere of scientific and scholarly journal publishing, the migration to online dissemination has not – or at least not yet – been total. While librarians are keen to offer their users the ability to access journals online, many remain concerned about the ‘archiving problem’ – that is, how they can ensure perpetual access to journal content for which they have already paid. What happens if at some future date they cancel the subscription, or if at some future date the journal folds or the publisher goes out of business? How then will they be able to ensure access to back issues unless those issues are sitting physically on their shelves? If the electronic content is to be held in escrow by a third party, who will bear the costs of doing so and manage the collection? Until librarians are convinced that the archiving problem has been satisfactorily solved, many will continue to insist on receiving hard copies of each issue as well as online access, and the economies that could be achieved by moving entirely into an online environment – both in terms of production costs and in terms of shelf space – will not be realized.

Another field of publishing where there has been a clear and irreversible shift to online dissemination is reference publishing. Large reference works, like comprehensive encyclopaedias and dictionaries, have migrated to online environments as their primary medium of content delivery, even if they continue to be made available in print as well and to offer a range of smaller, print-based spin-off products. Again, there are several reasons why this happened:

•   Large multivolume encyclopaedias like Encyclopaedia Britannica faced intense competition in the early 1990s from electronic encyclopaedias like Microsoft’s Encarta, which were compiled on CD-ROM and given away as promotional extras with the sale of new computers. The proliferation of free encyclopaedias, even if they were less comprehensive and inferior in quality, undermined the market for the large, expensive, multivolume work. Sales of Britannica and other traditional encyclopaedias collapsed in the 1990s. If they wanted to survive, they had to reinvent themselves as branded resources that were primarily electronic, and ultimately web-based, in character. They could continue to produce a range of print-based products but their primary mode of content delivery would have to become electronic.

•   Encyclopaedias, dictionaries and other reference works lend themselves to online dissemination for several reasons. They are accumulations of discrete pieces of knowledge – ‘bitty’ forms of content. Users do not generally read a reference work from cover to cover but rather consult it in order to answer a specific question or gain a specific piece of knowledge; the capacity to search quickly for a specific piece of information is therefore a highly valued feature. Similarly, cross-referencing to related material is valued by users.

•   Scale is also vital for many reference works: the more comprehensive they are, the more useful they are likely to be for the user. Of course, this is not always the case: sometimes a concise reference work, like a dictionary, a travel guide or a book of recipes, may be quite adequate for the purpose at hand. But there are many purposes for which the scale and comprehensiveness of the reference work will be regarded by users as a positive asset. In the medium of print, large, comprehensive reference works of this kind are very costly to produce and very costly to buy. They are also clumsy and unwieldy and can take up a great deal of space. Making them available as online databases overcomes these disadvantages precisely because the online economy is an economy of scale.

•   Reference works need to be regularly updated. It is very costly to do this in the print medium, but a reference work can be continuously updated and expanded, relatively easily and cheaply, in an online environment. This feature has, of course, been incorporated into the very raison d’être of Wikipedia, which actively encourages users to become content creators and contribute to the continuous process of updating, amending and expanding the content that comprises the online encyclopaedia.

•   For some reference works like encyclopaedias, the use of multimedia, including colour illustrations, sound and streaming video, can also add a kind of value that is appreciated by end users.

For these and other reasons, the online environment provided a natural home for some large reference works: given the nature of the content and the way it was typically used, the kind of value that could be added in the online environment was the kind of value that was appreciated by many users. The problem for content providers was twofold: how to develop a business model that would enable them to charge for their content, and how to position themselves in this new environment in a way that would enable them to ward off the threat posed by much cheaper alternatives (or even free alternatives, as in the case of Wikipedia and other online resources).

For large reference works like the Oxford English Dictionary or multivolume encyclopaedias (especially those of a more specialist kind, like the International Encyclopedia of the Social and Behavioral Sciences), suitable business models were readily at hand: since their primary market for the complete set was mainly an institutional market, that is, libraries, schools, universities and other institutions of this kind, they could replace the one-off transaction fee for the complete set in print either by a one-off transaction fee for the work made available on CD-ROM or DVD, or by a site licence to an online resource for which the institution would pay an annual subscription fee based on the size of the institution and the number of users. Since the main market for these works was institutional, the fee – whether transaction or subscription – could be set at a high level, as the institutions were already accustomed to paying substantial fees for these works.

What was much less clear was how content providers could charge for reference content that was traditionally packaged in much smaller formats, like small dictionaries or one-volume encyclopaedias on more specialized topics. Many of these smaller works were sold to individual consumers on a transactional basis – one book, one sale, end of transaction. It was not at all clear that they would want to buy this content in electronic formats. For many purposes, having a reference work available in a concise printed format is more convenient than using a reading device or checking online, even if the electronic resource is much more comprehensive – it’s much easier, for example, to open a cookbook to a favourite recipe and lay it on the kitchen counter than it would be to look up a recipe online and print it out. Nor was it clear that institutions would want to purchase this content in an electronic format, precisely because it lacks the one thing that is of particular value to institutions – scale. Smaller reference works have therefore continued to survive and, indeed, flourish in traditional printed formats while their larger brethren migrated online.

So there are good reasons why the field of scientific and scholarly journal publishing and the field of reference publishing have experienced a migration – partial in some cases, but clear and irreversible nonetheless – from print to online content delivery. However, in other publishing fields the situation is much less clear. In the field of scholarly book publishing, for example, there has been a great deal of experimentation with electronic content delivery since the late 1990s. This is partly because the field of scholarly book publishing had been experiencing serious difficulties for many years, stemming largely from the steep decline in the sales of scholarly monographs, and there were many who believed or hoped that the digital revolution would provide a solution to the problems of scholarly monograph publishing. Many different initiatives were launched, some funded by philanthropic organizations with an interest in scholarly communication, such as the Mellon Foundation, some funded by third parties and venture capital seeking to develop commercially viable businesses, and some funded by publishers themselves. I have examined a variety of these initiatives elsewhere and will not repeat the analysis here.6 Suffice it to say that, despite the hopes of many in the scholarly publishing community, there is no obvious electronic solution to the problems of scholarly monograph publishing. Just as the digital revolution was not the origin of the problems faced by scholarly book publishers, so too it is unlikely to be their salvation. The reasons for this are not technical, but primarily economic and cultural. Academic publishers and others have made scholarly books available online, either as individual ebooks or as part of a database that is offered to customers on a transactional or subscription basis, but the take-up of these offerings has been modest to date and the amount of revenue publishers have been able to generate has been small. Up till now, academic publishers have benefited much more from what I’ve called the hidden revolution – especially the ability to print small quantities using digital technologies, to reduce print runs and to keep books in print long beyond their natural life cycle in the pre-digital age – than they have from making scholarly book content available online.

It also seems clear that, to the extent that there is a market for scholarly book content delivered online, it is more likely to be an institutional rather than an individual market, at least for the near future. The electronic initiatives that have been most successful to date in the field of scholarly book publishing are those which have clearly targeted the institutional market – above all, the research libraries. Whether scholarly books are sold to libraries on a title by title basis (as with NetLibrary, dawsonera and various library suppliers) or sold as part of a scholarly corpus of books on a subscription basis (as with publisher-based initiatives like Oxford Scholarship Online or with third-party ventures like the Humanities E-Book project), it is libraries that provide the most robust market for scholarly books in electronic formats. The reasons are not difficult to see: librarians have budgets to spend on content acquisition; they are accustomed to acquiring digital products and predisposed to do so, especially if they think this will reduce pressure on shelf space and provide extra functionality for library users (such as access from their desktops); and the business models, whether transactional or subscription based, are familiar, well tested and easy to administer. There is every reason to believe that in the coming years we will see a slow but steady increase in the sale of scholarly book content into research libraries in electronic formats. However, this is not to say that it will necessarily be at the expense of printed books, which will continue to be purchased by many libraries, either instead of or in addition to electronic versions, and will in all likelihood remain for some while to come the preferred medium for individuals who wish to purchase scholarly books for their own use.

If this analysis is correct, then what we’re likely to see in the field of scholarly book publishing is not a wholesale migration from print to electronic dissemination but rather the development of mixed models of revenue generation. The kind of migration that has occurred in scientific and scholarly journal publishing may not be a good model for what will happen in the field of scholarly book publishing. A scholarly book is a different kind of object from a scientific journal and it is used in different ways. It is fine to browse a book online or to search a text to find what you need, but if you want to read all or most of a scholarly book, and be able to move back and forth in the book and study it carefully, most readers prefer to have a printed version. Hence it seems likely that, at least for the foreseeable future, the making of scholarly book content available online will take place alongside the continued publication of books in printed formats. Academic publishers may gradually move away from the traditional model of revenue generation, which depended almost exclusively on the sale of printed books, to more mixed models that still depend largely, perhaps even overwhelmingly, on print sales while also seeking, at the same time, to diversify revenue streams – for example, by generating a proportion of their revenue from the sale of site licences, from the sale of ebooks and/or from the licensing of content. But for the foreseeable future scholarly book publishers are going to continue to rely heavily on revenue from print sales. The income they generate from electronic sales will, in all likelihood, be incremental additions, and quite modest ones at that.

If scholarly book publishing presents a mixed picture, so too does trade publishing, although in this case the channels to market for content delivered electronically are rather different. Whereas scholarly book content delivered electronically has found a market primarily among institutions rather than individuals, the ebooks published by trade publishers are being bought primarily by individuals equipped with reading devices. The individuals who buy ebooks value above all their affordability, readability (easy on the eye, adjustable font size), ease of access (can be bought easily and quickly) and portability – in the surveys carried out by the Book Industry Study Group in 2010 and 2011, these are the features that tended to be regarded as most important by customers.7 The BISG surveys also showed that the kinds of books consumers most preferred to read as ebooks tended to be straight narrative fiction (genre, commercial and literary fiction), which mirrors the experiences of trade publishers who have seen the most dramatic increases in these categories. The kinds of non-fiction that were most popular in ebook format were those with strong narrative elements, like biography and autobiography; other kinds of non-fiction, including professional and academic books, lagged well behind.8

While it is clear that a growing number of readers are happy to read books on reading devices, especially when the books take the form of straight narrative text, what is not clear at this stage is exactly how far the shift to digital will go in the different categories of books, each of which has its own distinctive characteristics. Readers of genre fiction and commercial fiction may value above all the ability to get new books quickly and cheaply, so that they can read them as soon as they are available; storing the book on their shelves as a keepsake may be of little value to them. On the other hand, for a new novel by a great writer or a serious work of non-fiction, there will probably always be some readers who will prefer to have the physical book. Price, ease of access and portability may not be the most important considerations for them; other things may matter more. They might simply prefer to read the text on the printed page, which is gentle on the eyes and enables them to move back and forth with ease. They might value the ability to share the text with others, lend it to others or borrow it from them, or perhaps give it as a gift. They might value the object itself as a cultural form, a material object, attractively designed and produced, durable and displayable, in which you invest a good deal of time and effort, from which you derive pleasure and satisfaction and which, having read and enjoyed it, you might wish to own, to keep on a bookshelf, to return to at some later date and dip into, consult or reread. Indeed, it’s even possible that the shift to digital in certain categories of books (what one publisher described as ‘disposable books’ – read them, delete them, get the next) could be accompanied by a revaluation of printed books in other categories, with some readers placing greater value on printed books, especially beautifully produced hardcover editions, for those books and authors they love and treasure – who knows. Moreover, while it is relatively easy to make straight narrative text available electronically, it is far more complicated and costly to produce other types of books in electronic formats, especially heavily illustrated books like cookbooks, art books and children’s books, and it’s not clear at this stage whether consumers would in any case wish to buy these books in these formats. Nor is it clear whether they would wish to buy books where a great deal of multimedia content had been added – it is simply too early to say.

So how should publishers prepare themselves for a world which is changing fast but where there is still so much uncertainty about which formats will prove popular for different kinds of content and which will endure? Fortunately for publishers, they don’t actually need to know which forms of content delivery will prove most popular for which types of content. While the world around them is swirling with speculation and change, they can afford to remain agnostic on the question of content delivery. For they are in the position of a water company that owns and controls the water supply but doesn’t own the pipes that deliver the water to consumers. If some consumers decide that they would prefer to receive their water through a different kind of pipe, then the water company needs to be in a position to supply water through that pipe. They don’t necessarily need to build the new pipe themselves – they can let others do it and take the risks. But they do need to make sure that their water can be pumped through the new pipe and that those who own and control the pipe don’t have a stranglehold on the supply chain. And that means that what they need to do in the first instance is to build a digital archive.

Building the digital archive

Steve is the head of a division called Media Asset Development at one of the large trade houses in New York. He joined the company in 1995, after having worked on the digital end of the music industry for several years. He arrived at a time when the debate about digitization in the publishing industry was just beginning to be taken seriously by senior managers. Most book production was still being done in the traditional way, and then a small number of books were picked out to be produced ‘digitally’. ‘So there was publishing and then there was digital publishing, there was production and there was digital production. The fight since I arrived has been to get the company to stop thinking that way altogether. There is no production without files, there is no mechanism by which you can go to press anymore without it being digital. So let’s stop saying “digital production” and start calling the production area “the production area”.’ There were people in the company who thought (and some still do) that you could hire someone in the production department who didn’t understand file management – ‘Huge mistake,’ says Steve. ‘I think we’re getting that clear in the company, that if you don’t understand file management, you can’t be in production; it just doesn’t make sense anymore. It’s like saying “I understand horses and I want to work in a car manufacturer.” It’s like the time has passed.’

The company also came to realize that they needed to take ownership of their files in a way they had never done before. It happened around 1998–9. What triggered it was the decision to start making some of the older backlist titles available as print on demand. ‘As a company we decided that print on demand is going to be something we definitely have to guarantee. No book is ever going to go out of print again. Print on demand requires a file, right, so this is the first time we had an actual requirement. Ebooks were sort of a nice toy, but a requirement came up with print on demand. This is a stake in the ground: the company is saying, “We need files.” But the problem was that the files were not easy to get. The files were with the printers and compositors,’ explained Steve. ‘All of our printers and compositors said, “We have your files, we’re storing them for you, we can give them back to you for 200 bucks or some price.” We did an analysis of our title list and we figured there was somewhere around 15,000–20,000 books we wanted, so we said, “Great, give them back.” The reply from our printers and compositors was, basically, “You know, we can’t actually do this because we’re not sure which is the last version. We’ve got them stripped on tapes and we’ve got 5,000 tapes and we’d probably have to run them all to figure out the versioning.” And out of the 15,000 we asked for, 300 were available.’ It quickly became clear that the printers and compositors weren’t really archiving the publisher’s files. They were backing up their work in progress but they weren’t archiving, and there’s a big difference between backing up and archiving. ‘Archiving really means the book is ready to go for print at its most recent iteration in a complete file set today. Backing up is just, “Well, if we go down, can we restore?” They don’t ever keep our most recent version, they just back up their work in progress. So there’s no archive.’

At that point it was clear that the publisher had to create its own archive and its own archiving procedures. ‘We were very clear that owning the process is what we had to do. That was a huge eye-opener – that no one else out there cares about our files but us, nobody’s going to take care of them, nobody’s going to version them, nobody’s going to do any quality control.’

Once the company had decided that it needed to create its own digital archive, it then had to establish the archiving procedures and populate the archive with the company’s digital assets. This was much more complicated than it might at first seem. First, there was the sheer quantity – with a large house like this which includes many imprints with their own long publishing histories, there were as many as 40,000–50,000 titles still in print. Then there are the different purposes for which the digital content could be used. ‘Are we talking about print on demand and ebooks? Because those can be different animals. Are you talking about capturing them just to capture them, or do you really want to capture things that have immediate value? Do you want a return on investment on what you capture instantly or is it long-term investment? You start having to have those conversations. But the place we should have started, and it’s the philosophy that we try to implement here, is that a book is not a book. Books are categories. Books are types. Books are different styles of things. So you can’t just say, “Go capture 20,000 books.”’ Steve elaborates:

If you keep thinking of books as generic objects, you’re thinking about them the way that they are in paper. In paper they’re all the same – it’s a book, you’re delivering tree. So if your delivering mechanism is delivering tree, you’re done. If it’s delivering digital goods, they are differentiated in multiple facets. So when you start the backlist discussion, you start saying, ‘What is your target? What’s the specification? What are you thinking about and for what kind of book?’ It gets much more complicated. The thing people always hoped was that the digital world would get simpler and it’s actually a whole lot more complicated because your end result isn’t the same. The end result is a database, the end result is a PDF, it’s an image-based PDF, it’s an XML file, it’s an ad-based, Google-search-engine toolset – we’re going to have many more properties digitally than we possibly could have physically. We have seven physical properties now: large print, mass market, hardcover, paperback and you have some weird digest editions and stuff like that. Online we have ad-based, widget-based, ebook-based, subscription-based, chunked content – there are hundreds of formats and types and styles. So it’s a much bigger world digitally than in print. So when you say, ‘Oh, let’s go capture all the backlist,’ it’s like wait a minute, do you want us to capture cookbooks? Do you want us to capture things that are out of stock today? Do you want us to capture out of print? Do you want us to capture only the 10,000 top sellers? So we had those discussions. What we came around to was the 10,000 top-selling books of all time from that year back – let’s get those and decide how to make them available. That was the backlist adventure.

Each of these 10,000 top-selling books was sent out to be scanned and turned into an XML file using OCR (Optical Character Recognition) software. It was more expensive than simply producing a PDF – it cost around $200 to turn an average-sized book into an XML file, whereas you can produce a PDF with a simple scan for around $50. PDF is fine for print on demand but it is less adaptable, cannot be used for ebook outputs and is a much larger file, and hence more expensive to store.
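The trade-off between the two capture routes can be made concrete with a rough calculation at the per-book rates quoted above ($200 for OCR-to-XML, $50 for a simple scan-to-PDF), applied to the 10,000-title backlist. A minimal sketch, illustrative only:

```python
# Rough cost comparison for digitizing a 10,000-title backlist,
# at the per-book rates quoted above. Illustrative figures only.

TITLES = 10_000
XML_COST_PER_BOOK = 200   # scan + OCR + conversion to an XML file
PDF_COST_PER_BOOK = 50    # simple scan to PDF

xml_total = TITLES * XML_COST_PER_BOOK
pdf_total = TITLES * PDF_COST_PER_BOOK

print(f"XML capture: ${xml_total:,}")                   # $2,000,000
print(f"PDF capture: ${pdf_total:,}")                   # $500,000
print(f"Premium paid for XML: ${xml_total - pdf_total:,}")  # $1,500,000
```

The premium buys adaptability: the XML files can feed ebook outputs as well as print on demand, whereas the cheaper PDFs serve print on demand only.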

To the 10,000 backlist titles, they added the 4,000 or so new titles that were being produced every year, so that by 2008 they had a digital archive containing some 40,000 titles. In addition to populating the archive, they needed to develop procedures for handling the content. These involved three distinct processes – ‘digital asset management’, ‘digital transformation’ and ‘digital distribution’. Steve pulled out a blank sheet of paper and drew a sketch (see figure 14). On the left-hand side of figure 14 are the different production departments of the company that are producing new books and delivering them as digital files to the Digital Asset Management (DAM) system, which is the company archive. What happens in the production departments depends on their workflow processes and the degree of sophistication they want to build into the files – if they want to do XML tagging, for example, it happens there. When the files are complete they are dumped into the DAM, the archive, which stores different kinds of files – Quark files, InDesign files, PDF files, XML files, etc. – for each book. The files for each book are stored under one ISBN, usually the hardcover ISBN, so you have four or five different files for exactly the same book. This immediately creates a level of complexity that has to be carefully managed. So, for example, if you want to make a correction on a particular page of a particular book, you have to make sure you have procedures in place to ensure that the correction is made before the book is reprinted, whichever edition is being reprinted. ‘Maintaining that print file correctly is the key,’ explains Steve. ‘If the archive has one thing that it has to do without fail, it’s have the right version of the file for print, at all times. That’s the golden rule that can’t be broken.’
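The archive structure Steve describes – several file formats per title, all filed under one ISBN, with the print-ready file as the inviolable master – can be sketched as a simple data structure. This is a toy model under stated assumptions, not the publisher's actual system; the class and field names are invented for illustration:

```python
# Minimal sketch of a digital asset store keyed by ISBN, holding
# several file formats per title. The 'golden rule' described above:
# the archive must always hold the current print-ready file.

from dataclasses import dataclass, field

@dataclass
class TitleAssets:
    isbn: str                                   # usually the hardcover ISBN
    files: dict = field(default_factory=dict)   # format -> (version, path)

    def store(self, fmt: str, version: int, path: str) -> None:
        # Keep only the most recent version of each format.
        current = self.files.get(fmt)
        if current is None or version > current[0]:
            self.files[fmt] = (version, path)

    def print_master(self) -> str:
        # The print-ready file must be present and current at all times.
        if "print" not in self.files:
            raise LookupError(f"No print-ready file archived for {self.isbn}")
        return self.files["print"][1]

archive: dict = {}
title = TitleAssets(isbn="9780000000000")
title.store("print", 1, "dam/9780000000000/print_v1.pdf")
title.store("epub", 1, "dam/9780000000000/book_v1.epub")
title.store("print", 2, "dam/9780000000000/print_v2.pdf")  # corrected reprint file
archive[title.isbn] = title

print(archive["9780000000000"].print_master())  # dam/9780000000000/print_v2.pdf
```

Keying every format to a single ISBN is what creates the complexity Steve mentions: a correction made to one file must be propagated to every sibling file for the same title.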

Figure 14   The digital archive

This may sound simple enough but it’s easy to screw it up. For example, suppose one of the imprints has a book that’s selling well and they need to reprint quickly, but they find that they’ve got a typeface corruption on page 90. The printer says he can probably fix it so they tell him to go ahead and fix it because they need the books as soon as possible. ‘They get a call back saying they fixed it, that’s great, we’re rolling and you say, “Thank God.” But that file is no longer in sync with my digital files. Did we ask for the corrected file back? No. Are we just waiting to have the same problem happen again when we do a reprint? A production error trumps everything else, period. If something’s down or broken or in press and is wrong, it’s got to be fixed, instantly; there’s no question about it. The problem is the recovery of the file – we need the file back. It just tends to be forgotten.’ So now the file at the printers is out of sync with the file in the digital archive and the publisher doesn’t know how the file has been changed to correct the error. The problem could recur the next time the book is reprinted. The integrity of the files in the archive has been compromised by the need to solve the production problem and get the book reprinted quickly.
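One mechanical way to catch this kind of drift – purely an illustration, not a procedure the source describes – is to fingerprint the file the printer is holding and compare it against the archived version; a mismatch signals that a fix was made at the printer and never recovered:

```python
# Detecting the out-of-sync problem described above by comparing
# checksums of the printer's working file and the archived file.
# The byte strings stand in for real file contents.

import hashlib

def fingerprint(data: bytes) -> str:
    """Return a SHA-256 hex digest identifying this exact file content."""
    return hashlib.sha256(data).hexdigest()

archive_file = b"...page 90 with the corrupted typeface..."
printer_file = b"...page 90 as fixed at the printer..."

if fingerprint(archive_file) != fingerprint(printer_file):
    print("Archive out of sync: recover the corrected file from the printer")
```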

Getting people in the various publishing divisions to think differently about these issues is not easy because they are not rewarded for helping to maintain the integrity of a database; they’re rewarded for selling books, and making sure that a book which is selling well is reprinted promptly and available to meet current demand is a vital part of this. But the problem is that, in focusing on their short-term problem and not thinking about the long term, they’re only storing up problems for others further down the line. When the mass-market people pick up the book in a year or 18 months’ time, they’re picking up your problem. To avoid that, a determined effort has to be made to ‘incentivize care’, as Steve puts it – that is, to get people throughout the organization to see that maintaining the integrity of files is in the interests of everyone. ‘The goal of publishing isn’t to trip and stumble out the door and manage to toss the book in the printers and be done and say, “Thank God.” The goal is to maintain appropriate and up-to-date content for the lifespan of that work.’

Apart from managing the asset store and ensuring that the most recent, corrected or updated files are being held in the system, you also have to be able to deliver content in appropriate formats to various external clients. Quark files are sent directly to the printer from the digital archive, but other files may need to be converted to other formats before they can be used by clients and vendors. So, for example, if an ebook is stored in ePub format, it may need to be converted to an ebook vendor’s proprietary format, if an audio file is stored in the WAV format, it may need to be converted to an AIFF file for a particular audiobook vendor, and so on. In these cases, a transformation tool will convert the file before it’s passed on to a distribution tool that sends it out to the appropriate client or vendor. Of course, it would be much easier if all these clients and vendors used the same file formats but they don’t, so the transformation and distribution tools have to be geared to providing each client and vendor with files in the particular formats they require.

In addition to supplying files to vendors, some publishers will also store files in certain formats in a digital repository that is accessible by others, with rules governing what can be accessed by whom and under what conditions. In some cases the repository is hosted on the publisher’s own server; in other cases it is outsourced to a third party, like LibreDigital. The purpose of the repository is to provide the publisher with a digital face to the outside world, to enable some of its content to be seen in the online world and, indeed, actively to project some of its content into this world, while at the same time retaining control over it. So, for example, the content held in the repository can be made available to Amazon and Google for their book search programmes through a dynamic web call (more on this below).

Building all these systems is complicated and expensive, and companies like Steve’s had to do it when they were seeing very little if any return in terms of revenue. They were building their digital archives when the overwhelming share of their revenue was still being generated by the sale of the various print editions – hardcover, trade paperback, mass market, etc. ‘I’d say it’s somewhere around 97, 95 per cent,’ says Steve, clutching for some rough percentages (this was 2008). ‘Ebooks do make some money, audio does make some money, licensed deals do make some. So there’s a bunch of little money ways in there but the vast, vast majority comes from the print file.’

In fact, of these alternative sources of non-print revenue (and leaving aside rights income, which is another issue altogether), audio was the main revenue generator during the time when the investment in the digital archive was being made. The sales of audiobooks represent a small proportion of the overall revenues of the major trade houses – probably around 5 per cent; but they were still generating far more revenue than ebooks in the period up to 2008–9. The general rule of thumb in the audiobook business is that if you have a big book that is suitable for audio, you can generate audio sales of up to 10 per cent of the hardcover print sale. So if you sell 100,000 hardcovers, you can sell up to 10,000 audios, but never more than that. And of course there are many books that are not suitable for audio and are never turned into audiobooks – given the 10 per cent rule, the audio division of a large trade house normally doesn’t even consider a book unless there is a clear expectation that it will sell a minimum of 50,000 hardcovers. And then they have to consider whether it’s going to work in audio – ‘You know, cookbooks don’t work in audio except in rare cases; diet books don’t work in audio,’ explained the manager of one audio division. ‘And there are particular categories that can work extremely well in audio – memoirs, for example, books that have a personality-driven narrator. The commercial fiction, plot-driven fiction also works very well in audio. Literary fiction, intricate, more character-driven fiction, does not work as well in audio because your mind just can’t track, the narrative has sort of gone on and you’re about to try and figure out what that sentence meant.’
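The audio rules of thumb quoted above – sales of up to 10 per cent of the hardcover print sale, and a consideration threshold of roughly 50,000 expected hardcovers – amount to a quick calculation. A sketch, assuming those two figures:

```python
# The audiobook rules of thumb described above, as arithmetic:
# audio sales run up to 10 per cent of the hardcover sale, and a
# title isn't normally considered below ~50,000 expected hardcovers.

AUDIO_RATIO = 0.10
HARDCOVER_THRESHOLD = 50_000

def audio_ceiling(expected_hardcovers: int) -> int:
    if expected_hardcovers < HARDCOVER_THRESHOLD:
        return 0   # below the bar: the audio division won't consider it
    return int(expected_hardcovers * AUDIO_RATIO)

print(audio_ceiling(100_000))  # 10000 -- up to 10,000 audiobooks
print(audio_ceiling(30_000))   # 0
```

The ceiling is an upper bound, not a forecast: suitability for audio (narrative, personality-driven content) still decides whether a qualifying title is recorded at all.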

These principles haven’t changed much since audiobooks first began to appear in the 1980s, but the way audiobooks are delivered to consumers has evolved from one format to another – ‘Much of the story of audio has been about this transition, the march through the different formats.’ The early audiobooks were sold as tape cassettes, but the cassette was phased out and replaced by the CD, which is now the dominant medium for the sale of audiobooks. The transition from tape to CD was a transition from analogue to digital, but the digital product was still being delivered in a physical form – the CD, often a set of five or six depending on the length of the recording, commonly sold as a boxed set. However, digital downloads are now accounting for a growing share of the market. In 2005, CDs accounted for around 90 per cent of the revenue from audio sales for the audio division of one large trade house, and digital downloads accounted for around 10 per cent; by 2007, the proportions were 85 per cent CD, 15 per cent digital download. ‘I think there’s going to be a point that we’re probably approaching where things start to accelerate in terms of CDs dropping off and downloads accelerating and that has to do with just the penetration of the iPod,’ explained the manager. However, given the importance of the iPod as an audio device, the extent to which the digital download market expands, and the rate at which it expands, are likely to depend on which retailers can sell the download that is iPod-compatible. ‘Up till now, you’ve had only Audible and Apple who are able to sell iPod-compatible download files with DRM [digital rights management] protection,’ she continues. ‘And so as powerful a retailer as iTunes is, I think that there’s probably been some kind of suppression of the growth of digital downloads because of only having those two retailers sell. If you went to a situation tomorrow where every retailer who sells CDs was also selling digital downloads, I think the growth rate would probably be much faster.’

Steve, for his part, was well aware that audio was the main revenue generator among the non-traditional, non-print products sold by the company and that the revenue generated by ebooks remained insignificant, but he was convinced that ebooks would grow. In his view, the print-on-paper book is just a tool like any other, a piece of technology which has some strengths and some weaknesses – ‘It’s a good tool, it doesn’t require batteries, it’s easy to use, you know, it’s all that stuff. But as soon as we have a better tool, I think books are just gone. They’re expensive, heavy, can’t search them, there’s all these things against them.’ He has limited sympathy for the idea that a book might be more than that for many readers – a cultural object that might have aesthetic and emotional value for them, something they might value as they value a work of art. Sure, there are some books for which this may be true, but they’re a minority – ‘20 per cent maybe.’ Books will not cease to be printed, in Steve’s view, any more than the invention of the television killed the radio or the invention of the DVD killed television. But the ‘triage’ will be different:

Books aren’t the same thing every time. Triage them better. Would you rather be handed the phone book or Google? Well, so don’t print the phone book anymore, that’s not a work of art, right? Who cares about the phone book? Most of the paperbacks you buy in the airport, wouldn’t you rather just walk over to the kiosk and have it beam them into something you can take on the plane and read if it was cheaper and easier and faster? But then you say, ‘Well, I want a coffee table book,’ well, that’s always going to exist, you’re not going to replace that with digital. There are books that are objects of art and that are going to endure history and go down through the ages and you definitely want to print those. But crappy romance novels? You really want to kill a tree for them?

Of course, Steve could be wrong (though with the benefit of hindsight, his comments, made in 2008, seem remarkably prescient). But the beauty of Steve’s position is that he doesn’t have to be right. He just has to be ready. ‘Publishers should focus on making sure they get content that sells. They don’t really care how it sells, they just want to sell it. The fact that print is the medium they sell it in and they’ve got to make their numbers this year, that’s not my responsibility. My responsibility is to make sure that we’re positioned so that when something changes, it’s seamless here. So when they say, “We’re finding that 10 per cent of our market is now electronic,” that’s an opportunity, not a problem. So I would like us to be agnostic to whatever succeeds. Do I care whether or not people buy the book on a particular medium? They shouldn’t care either, whether it’s audio, ebook, electronic, subscription – if we sell it, that’s all that we really care about.’ Provided that the digital archive is built and functioning well, and provided that the company’s content is stored in appropriate digital formats and properly maintained, then the publisher is in a position to respond to changes in the market. They are able and ready to deliver the content down a different pipeline – if, in fact, consumers demonstrate through their purchasing decisions that they prefer to use a different pipe.

So are there dangers in this situation for a trade publisher? What do Steve and other managers in his and similar companies worry about? They worry about many things but there are two that preoccupy them most: piracy and price.

The threat of piracy

There is nothing new about the unauthorized reproduction of books and parts of books: it has long been a feature of the world of print, exacerbated by the photocopying machine but by no means invented by it. However, with the conversion of the book into a digital file, the risks of unauthorized reproduction and circulation of book content are raised to an entirely new level. Once content is in a digital form and provided it is unsecured, it is quick, easy and cheap to produce multiple copies and to share it with others – a PDF can easily be sent to any number of recipients, or made available online for others to view or download. And all of this could be done without permission or remuneration, infringing a publisher’s copyright and depriving them and the author of revenue. One need look no further than the music industry to see the havoc that can be wreaked in a creative industry by rampant file sharing facilitated by peer-to-peer distribution systems like Napster. Publishers knew they couldn’t ignore the dangers. So how are they trying to deal with this threat? Essentially in three ways: security, policing and proactively supplying the market.

Security is a matter of taking care to retain control over one’s digital assets and protect them against unauthorized reproduction. These issues, generally referred to as digital rights management or DRM, are an important topic of discussion and policy within most publishing companies today. Each company must form a view about what digital content it is going to make available to whom and in what form. ‘What is our philosophy of distribution? Are we distributing generic files or encrypted files? Are they trusted partners or not? Are we distributing letters to go in their encrypted envelopes or are we securing the letters before we distribute them? These are another set of questions that we as a publisher need to answer,’ says Steve. At his company, they’ve decided to distribute unencrypted files to their principal retail customers like Amazon and let them create the lock and the keys – the DRM envelope, as it were – which will be added to the file before it is sold on to a customer as an ebook. They’re willing to do this because they have a clear contractual agreement with their customers that stipulates the conditions under which they can sell their ebooks, just as they have a clear agreement about the conditions under which they can sell or return their physical books: ‘We’re contracted with them and we assume that they won’t abuse this relationship. It’s the same in a way as if we send you a block of 10,000 books: we trust that you will sell them, report the sales and then return only the ones that haven’t sold.’ The auditing of ebook sales raises fresh issues, however, simply because you don’t have the same physical calculus as you have in the world of physical books, where copies shipped out less copies returned = copies sold. This reaffirms the need to be extra vigilant when choosing your retail customers.
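The ‘physical calculus’ invoked here is simple enough to state directly – copies shipped less copies returned equals copies sold – and it is precisely this check that has no equivalent on the ebook side, where the publisher must rely on the retailer’s reporting. A one-line sketch of the print-side reconciliation:

```python
# The print-side reconciliation described above: copies shipped
# minus copies returned equals copies sold. Ebook sales have no
# such physical check, hence the reliance on retailer reporting.

def copies_sold(shipped: int, returned: int) -> int:
    return shipped - returned

print(copies_sold(10_000, 3_200))  # 6800
```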

However, when it comes to digital sampling, Steve’s company takes a more cautious view. They want to participate in Amazon’s Search Inside the Book and Google Book Search but they’re wary of handing over their digital content to third parties and allowing them to hold it on their servers – especially very powerful third parties like Amazon and Google. Partly it’s a matter of trust, or rather the lack of it: while publishers know that their own fate has become inextricably interwoven with powerful web-based companies like Amazon and Google, they also know that their interests don’t entirely coincide and they worry about ceding control of their most important asset – their content – to them. ‘Many publishers in this building just like elsewhere are still not totally comfortable about giving their files to Amazon and Google,’ explained one of Steve’s colleagues. ‘Partly it’s because we’re unsure what they’ll do with it’ – they may be cooperative now, but as they became larger and more powerful they might simply disregard the concerns and requests of publishers. There were also quality issues and more straightforward practical reasons. The publisher would not be controlling quality if it handed books over to Amazon and Google to scan, and it could not vary the rules that governed the amount of text that users could view. If the publisher held on to its content then it could devise its own access rules, stipulating exactly how much of each book can be viewed and how it can be viewed – whether 10 per cent, 20 per cent, one chapter, first chapter only, etc. It could also take down or replace an old version of a book quickly, rather than waiting the eight weeks that Amazon takes to remove a book from its programme. So Steve’s company chose to build its own repository to hold digital files that could be accessed by third parties via a dynamic web call. For the consumer, the Search Inside the Book experience looks exactly the same as any other Amazon search-inside experience – you don’t leave the Amazon environment. But when Amazon calls up the sample pages, it calls them up from the publisher’s server rather than from Amazon’s server. This enables the publisher to keep control of its own content and decide exactly what to make available and how, while at the same time benefiting from online browsing schemes like Amazon’s Search Inside the Book.
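The publisher-side access rules described here – so much of this title viewable, first chapter only for that one – can be sketched as a small rule table consulted when the retailer’s page calls the publisher’s server. The ISBNs and rule values below are invented for illustration:

```python
# Toy sketch of publisher-controlled access rules for online sampling:
# the retailer's 'search inside' page calls the publisher's server,
# which decides how much of each title may be shown.

ACCESS_RULES = {
    "9780000000001": {"viewable_pct": 10},          # 10% of pages
    "9780000000002": {"viewable_pct": 20},          # 20% of pages
    "9780000000003": {"first_chapter_only": True},  # chapter one only
}

def pages_viewable(isbn: str, total_pages: int, chapter_one_pages: int) -> int:
    rule = ACCESS_RULES.get(isbn, {"viewable_pct": 0})  # unknown title: show nothing
    if rule.get("first_chapter_only"):
        return chapter_one_pages
    return total_pages * rule.get("viewable_pct", 0) // 100

print(pages_viewable("9780000000001", 320, 25))  # 32
print(pages_viewable("9780000000003", 320, 25))  # 25
```

Because the rules live on the publisher’s server, they can be varied per title, or a title withdrawn, without waiting for the retailer to act.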

While introducing measures to safeguard their content, many publishers and agents constantly monitor the internet and search for unauthorized content, and they are willing to take action against those who are deemed to be infringing their copyright. ‘Harry Potter is rigorously controlled,’ explained an agent who worked at the agency that managed J. K. Rowling’s rights. ‘We have agencies who spend all their time on the internet, surfing the internet, constantly looking out for illegal content.’ And when they find it? ‘Then you do your best to track down the perpetrator and either warn them off or slap a lawsuit on them.’ Other agencies and publishers do similar things. Most of the large publishers either have people in-house working on this or employ outside firms, looking for pirated material online, serving notices on sites to take down unauthorized content and taking them to court if they fail to comply, all the time seeking to make it harder and harder for sites to make pirated material available while recognizing that it will be a constant struggle. ‘We can wrap the stuff all we want’, observed a senior manager in one of the large houses, ‘but anybody can buy a book, take it home, put it through their scanner and post it on the internet – it’s pretty fucking easy. You can do it in an hour and a half. Who are we kidding with our iron-clad DRM?’ Security and policing are important but at the end of the day the crucial thing is to create an environment in which consumers can acquire the content, and are inclined to do so, through legitimate channels and at reasonable prices, a point to which we shall return.

Publishers as well as authors’ associations have also been willing to take collective action against what they see as the illegal infringement of copyright – the most significant example being the class action launched against Google in 2005. The source of concern for publishers, agents, authors’ associations and others in the publishing world was the Google Library Project. This was one part of an ambitious project developed by Google that was aimed at strengthening its position in the search engine wars – that is, in the struggle for market share between Google and its main rivals, Yahoo and MSN. In the early 2000s Google took the view that one way it could increase its market share vis-à-vis its competitors was to look for ways to ensure that more high-quality content turned up in search results. Rather than relying only on information retrieved from the web by its crawlers, they wanted to add more high-quality content to their database so that searches would have a richer body of material to draw on. Scanning books and adding them to the database was one way to do this. Google therefore launched two programmes, the Partner Program and the Library Project, in order to add book content to their database. The Partner Program involved persuading publishers to give Google permission to scan their books; in response to a search query, a user would get a link to relevant text in the book and would be able to view a limited number of pages. The benefit to the publisher was that the book would be called to the attention of the user, who would be able to browse a few pages in Book Search and click on a link to Amazon, to the publisher’s website or to another retailer to buy the book – it was, in effect, a free form of online marketing. Since the publisher had a contract with Google that regulated the conditions under which the text could be viewed and enabled the publisher to remove any title at any time, this programme was not a source of concern for most publishers.

The Library Project was another matter entirely. Google also entered into agreements with several libraries – Harvard, Stanford, the Bodleian at Oxford, the University of Michigan and New York Public Library – to scan their materials and add them to its database.9 In response to search requests, users would be able to browse the full text of public domain materials but only a few sentences of text – what Google calls a ‘snippet’ – in books still under copyright. Each library would receive in return a digital copy of the scanned books in its collection. Google took the view that displaying snippets fell within the fair use provisions of US copyright law. It also announced that it would enable copyright holders to opt out of the Library Project by providing Google with a list of titles that they wished to exclude. For many copyright holders, however, Google’s opt-out turned the basic principle of copyright on its head. Rather than requiring a user to seek and be granted permission to use copyrighted material, Google was requiring the copyright holder to inform Google if it didn’t want its copyrighted material to be used.

On 20 September 2005 the Authors Guild and several authors launched a class action against Google for copyright infringement, and a month later five publishers – McGraw-Hill, Pearson, Penguin, Simon & Schuster and John Wiley & Sons – filed a suit against Google. After many months of negotiations, the plaintiffs and Google announced a settlement on 28 October 2008.10 In essence, the settlement proposed to create a mechanism – the Books Rights Registry or BRR – for Google to pay rights-holders for the right to display books. Google would make an upfront payment of at least $45 million to the BRR for distribution to rights-holders whose books had been scanned. Google would also be able to generate revenue by selling the ability to see full text and print out books, at prices that could be set by the rights-holder (failing which Google would set the price using a pricing algorithm); any revenues generated in this way would be split 37:63 between Google and the BRR, which would distribute its share among the rights-holders. The settlement distinguished between three categories of books – in-copyright and commercially available (meaning roughly in print or available through print on demand), in-copyright and not commercially available, and public domain – and it established default rules for what Google could do with the two categories of in-copyright books. Google estimated that the majority of published works fall into the category of in-copyright and not commercially available – as much as 70 per cent, compared to 20 per cent in the public domain and 10 per cent in copyright and commercially available. Since rights-holders could remove specific books from Google’s database, vary the default rules or opt out altogether, the category that would probably be most affected by the settlement was that of ‘orphan works’ – that is, works that are in copyright but are not claimed by any rights-holder.
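The revenue mechanics proposed in the settlement are simple enough to set out explicitly. The sketch below (in Python) uses only the figures cited above – the 37:63 split and Google’s estimated category shares; the $100 of revenue is a purely hypothetical illustration:

```python
# Sketch of the proposed settlement's revenue split. The 37:63 ratio and
# the category percentages come from the settlement as described above;
# the revenue figure itself is hypothetical.

GOOGLE_SHARE = 0.37  # Google's share of revenue from selling access
BRR_SHARE = 0.63     # Books Rights Registry's share, for rights-holders

def split_revenue(revenue):
    """Divide revenue from sales of full-text access between Google
    and the BRR, per the proposed 37:63 split."""
    return revenue * GOOGLE_SHARE, revenue * BRR_SHARE

google_cut, brr_cut = split_revenue(100.00)  # e.g. $100 of sales
print(f"Google: ${google_cut:.2f}, BRR: ${brr_cut:.2f}")

# Google's estimate of how published works break down by category
categories = {
    'in-copyright, not commercially available': 0.70,
    'public domain': 0.20,
    'in-copyright, commercially available': 0.10,
}
```

On this split, every $100 Google earned from selling access would return $63 to the Registry for distribution among rights-holders.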

The proposed settlement has been the subject of a great deal of criticism, both from within the US and from abroad, and it has had a rough ride in the US justice system. In September 2009 the US Department of Justice raised objections to the settlement, prompting the parties to withdraw the original agreement and submit a revised version, which they did on 13 November 2009. The revisions dealt primarily with the mechanisms for handling orphan works and with the restriction of the settlement to books published in the US, UK, Australia or Canada. The latter restriction was intended to meet objections from the French and German governments, which argued that the settlement did not abide by copyright laws in their countries; since a large proportion of the books in the libraries partnering with Google are not in English (perhaps as much as 50 per cent), this represented a significant reduction in the scope of the settlement.11 The revised settlement was subject to approval by the US District Court for the Southern District of New York, and on 22 March 2011 US Circuit Judge Denny Chin announced that he was rejecting the settlement on the grounds that it ‘is not fair, adequate or reasonable’. By placing the onus on copyright owners to come forward to protect their rights, the settlement was, argued Chin, inconsistent with the basic principles of copyright law – and in this respect he was affirming what many publishers had always thought. Chin also contended that the settlement would give Google ‘a de facto monopoly over unclaimed works’, rewarding it for engaging in the unauthorized copying of books and giving it a significant advantage over any potential competitor. While Chin’s judgment was undoubtedly a serious blow to those who had worked out the settlement, he did leave the door ajar, noting that some of the objections could be met if the settlement were converted from an opt-out to an opt-in agreement. In Chin’s view, the status of orphan works should be dealt with separately, by Congressional legislation rather than by an agreement among private, self-interested parties.12

How the parties proceed from this point on is unclear. They could revise the settlement in a way that seeks to meet the objections raised by Judge Chin or they could abandon the settlement, continue the litigation and allow the matter to be settled in the courts – it’s too early to say. But whatever the eventual outcome, the dispute illustrates all too well the way in which publishers and others in the publishing industry have found themselves caught up in developments that are not of their own making, where the pace of change is being set by players much larger than themselves who are fighting different battles and pursuing different goals. No one in the industry has a very clear sense of where all this is heading and where it will end, nor could they be expected to – there are simply too many imponderables. There are many in the industry who would be happy to see some version of the settlement approved: they see it as a victory of sorts and they are undoubtedly right to do so, since it formally recognizes the rights of copyright holders, obliges Google to provide financial compensation for those whose books have already been scanned and places clear restrictions on what Google can do with in-copyright material. It also establishes some standards in what was otherwise completely uncharted territory, so that neither Google nor any other player can start digitizing libraries and think they can do as they wish with the content. On the other hand, there are some who fear, not unreasonably, that any settlement of this kind would put Google in an even more powerful position in the new information economy, making it effectively unassailable as the largest repository of digitized book content, a monopoly in all but name, and who argue that the cultural heritage represented by the vast numbers of books previously published – both public domain and orphan works – is simply too important to be left in the hands of a private corporation whose future direction and priorities will be decided by shareholders rather than by the public interest.13

The third way that publishers can respond to the threat of piracy and the infringement – actual or alleged – of copyright is to be proactive about supplying the market with content in suitable electronic formats. Many publishers take the view that nothing would do more to stimulate the illegal trade in electronic files than an inability or unwillingness of the copyright holders to meet a genuine demand for content when a reading device appears that is widely adopted by users. ‘We just want to make sure, when that happens, that the industry is there to support the right kind of sell through to that device so that we don’t end up with a piracy-dominated industry, as opposed to a legitimately sold industry,’ explained one senior executive in a large trade house. Hence the amount of time, effort and cost that is being invested by most large houses in ensuring that their content is in appropriate digital formats and their digital archives are in good order. Like Prohibition, the non-availability of desirable content through legitimate channels is likely only to stimulate the illegal trade in contraband goods.

While issues of piracy and copyright infringement are of real concern to those involved in digital content distribution, there is another issue that is a source of growing anxiety among trade publishers. I was interviewing the senior executive just quoted in November 2007, on the very day that Amazon launched the Kindle in the US, and he, like everyone else in the publishing industry, was taken completely by surprise when Amazon announced that they were going to sell New York Times bestsellers and new releases for $9.99 on Kindle. ‘Do you know where they got that price?’ he asked me, in a tone suggesting he was still reeling from the shock. ‘Didn’t get it from us. As a matter of fact, they’re losing money on most of the books they sell. What are they thinking?’

The spectre of price deflation

In the period up to 2008 all of the major trade houses had their own policies on ebook prices – policies that, in some cases, fluctuated rather confusingly over the months and years. In the early days of ebooks, many took the view that the list prices of ebooks should be lower than the list prices of the physical books, since there are some real savings to be achieved by distributing content digitally rather than as a bound and printed book (though less than most people think, as we noted earlier). To reflect this, some publishers decided to price their ebooks at 20 per cent less than the hardcover or paperback price, whichever edition was current; so a new book selling at, say, $24.99 in hardcover would be sold at $19.99 as an ebook. Others decided to sell their ebooks at a set price – say $16.99 – regardless of the price of the print version. While some price reduction for ebooks was common practice among trade houses, there were some that decided not to reduce the price at all and to sell ebooks at the same price as the prevailing print edition, on the grounds that the savings were minimal and the primary value of the book was its content, not the particular medium in which it was delivered to the consumer.

Whatever pricing policy they adopted for their ebooks, publishers would give their normal discount to their retail customers – say 48 per cent off the publisher’s list price – and the retailer would be free to discount from that list price, just as they do with printed books. So even though Penguin was selling its ebooks at the same price as its printed books, Sony was selling them at 20 per cent off – in this case the discount was Sony’s, not the publisher’s. Most publishers expected Amazon to adopt a similar strategy, discounting off the publisher’s ebook list price. What they didn’t expect at all was for Amazon to announce a fixed price of $9.99 for all New York Times bestsellers and new releases.

The figures simply didn’t add up. If a new hardcover was selling for a list price of $25 and the publisher was setting the ebook price at 20 per cent off, then the ebook list price would be $20. With a discount of 48 per cent to the retailer, the cost to Amazon would be $10.40. For Amazon to sell these ebooks at $9.99 meant that it was losing 41 cents on each copy it sold, before making any margin at all to cover its costs. And if the new hardcover was selling for more than this – say it was Alan Greenspan’s The Age of Turbulence, selling at a list price of $35 – and if the publisher was not offering a discounted price for the ebook (as was the case with Penguin, who published Greenspan), then Amazon’s loss on every copy of the Kindle edition it sold would be in the region of $8.20. It didn’t make sense.
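The arithmetic can be laid out step by step. The sketch below (in Python) uses the figures from the text; the 48 per cent trade discount is the illustrative rate assumed above, not a universal industry figure:

```python
# Amazon's per-copy margin on a $9.99 Kindle sale under the traditional
# wholesale model. The 48 per cent trade discount and the list prices
# are the illustrative figures used in the text.

TRADE_DISCOUNT = 0.48  # publisher's discount to the retailer
KINDLE_PRICE = 9.99    # Amazon's fixed price for bestsellers and new releases

def amazon_margin(ebook_list_price):
    """Margin (negative = loss) per copy sold at the fixed $9.99 price."""
    cost_to_amazon = ebook_list_price * (1 - TRADE_DISCOUNT)
    return KINDLE_PRICE - cost_to_amazon

# Case 1: $25 hardcover, ebook listed 20 per cent lower at $20
print(round(amazon_margin(25.00 * 0.80), 2))  # -0.41: a 41-cent loss

# Case 2: a $35 hardcover with no ebook discount (the Greenspan example)
print(round(amazon_margin(35.00), 2))         # -8.21: roughly $8.20 lost
```

In both cases Amazon’s acquisition cost exceeds its $9.99 selling price, which is why the losses had to be subsidized out of Amazon’s own pocket.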

Of course, from Amazon’s point of view there was a rationale. It wanted to make a statement: buy the Kindle (selling at $399 when it was launched) and all New York Times bestsellers and new releases will cost you only $9.99 – much less than the $25 or $26 you would have to pay, possibly discounted to $17 or $18, if you were to buy the hardcover edition. It was setting the price of a new book at just below the symbolic threshold of $10. Like Apple and iTunes, it was using book content as a lever to drive the sales of its hardware. It would make its money from the sale of the hardware; it would devalue content to $9.99 and, at least for the time being, subsidize any losses incurred, in the hope that this would enable it to sell enough hardware devices to establish a dominant position in the market.

So why were publishers troubled by this? Two reasons. First, it devalues the book and creates the impression in the minds of consumers that a new book is ‘worth’ $9.99. But this is an illusion, created by the fact that a particular, powerful player in the field has decided, for reasons largely unconnected to costs, to fix the price at a low and symbolically significant level. ‘The danger with digital goods’, explained one manager in a large trade house, ‘is the danger that happened in the music industry. Why are songs 99 cents? Because Apple said so. Can the music industry make money at 99 cents? No. But now what does everyone think that a song should be worth? 99 cents. If books come down to $9.99, that’s not realistic for us. It would kill us. We can’t make any money on that price level.’

Of course, the low price was being subsidized by Amazon, which was willing to accept losses in the short term in order to establish its market position. ‘But the worry’, continued this manager, ‘is that they’re going to get people in the mindset that this is what the value is and then they’re going to come back to us and say, “Everybody wants this and these other publishers are doing it and we don’t want you to sell it to us for $10 anymore, we want you to sell it to us for $5.”’ So the second reason to be concerned is that if Amazon succeeds in establishing a dominant position in the ebook marketplace, it will use its muscle to put pressure on publishers to reduce their ebook prices and/or increase their discounts, so that it can continue to sell frontlist bestsellers and new releases for $9.99 without making a loss.

The more powerful Amazon’s position is in the ebook marketplace, the greater the danger to the publisher. ‘There will be a monopoly, just like Apple with the iPod is a closed loop. Amazon is going to be a closed loop with the Kindle and they’re going to say, “In this closed loop world, this is the pricing.”’ Since only Amazon can sell content onto the Kindle, the consumer has to buy Kindle ebook content through Amazon. Amazon would thus have the same kind of monopoly on book content for the Kindle that Apple has on DRM-protected audio for the iPod. ‘If that’s the case,’ said another manager in the same publishing house, ‘what does it do to your negotiations with that retailer?’

So they then want to force you, the publisher, to offer your content to them at cheaper and cheaper and cheaper prices, so eventually they say, ‘Look, we’ve grown a market here, we’ve taken it on the chin for a number of years because we have made no money on the content that we’re selling, whereas we’ve been paying you, the publisher, the money you asked for. But now there’s a big market there and we can’t afford to do this anymore. So if you want to keep selling content onto the Kindle, now you need to give us a 75 per cent discount or you need to reduce your prices to $5.’ And, you know, where do you go from there?

Of course, if the publisher agreed to increase the discount to 75 per cent, then it would have to give the same discount to anyone in the same sales channel and in the same ebook format – the Robinson-Patman Act would require this. ‘But if there’s nobody else really in the game, then it doesn’t matter.’

So how does a publisher respond to this threat that is looming on the digital horizon? In the view of many publishers, the great danger is that Amazon’s aggressive pricing strategy will create the impression in the minds of consumers that most of the value of a new book priced at $25 is accounted for by the paper and the print, that is, by the physical container, and that the value of the content is only worth $9.99, just as Apple created the impression that a song is worth only 99 cents. The more widespread this impression becomes, the greater the risk that this devaluation will lead to a haemorrhaging of value in the publishing industry – a draining of value out of the industry that would be greater than the savings that could be achieved by moving into a world of electronic content delivery. So the key issue for publishers is to get clear in their own minds about what the value of their content is and then do what they can to stand by their convictions when negotiating with powerful players in the ebook marketplace. One publisher put it like this:

As the publisher we have to say with clear conviction that the value of the book is the content in it and that value is $15, $20, whatever we determine it is – and by the way the value may be different for different books depending on who the author is, what the length is, what the topic is. We have to have the courage of our conviction and maintain the pricing level that we want, and then enter the negotiations with the retailer saying, ‘Here is the discount we’re going to offer you.’ In the digital world, maybe the discount shouldn’t be 50 per cent – there’s no inventory, you don’t have to run a distribution centre, you don’t have to maintain a physical bookstore. Maybe we only need to give you a 25 per cent discount and then we kick in extra money for marketing. But we control the purse strings. So I think that it’s very important right from the outset to be very firm in terms of our resolve to keep the content as valued as we need it to be. Amazon is in a period right now where they need publishers’ cooperation in terms of enrolling titles in their program, and so it’s not as though we don’t have any cards to play.

Like most publishers, she wants to see Amazon succeed with the Kindle but she doesn’t want them to be too successful. Publishers want to see a diversified ebook marketplace with other hardware suppliers and retailers flourishing alongside Amazon. They want to see Barnes & Noble and Sony succeed as well as Amazon, and they would like to see Apple and Google, among others, become significant players. ‘If Amazon has 35 per cent of the physical sales channel and 90 per cent of the digital channel then we’re all screwed,’ said one CEO. The closed loop is the publisher’s nightmare scenario – and all the more so if the player who controls this loop also happens to be the dominant player on the physical side of the book retail business.

Given the sensitivity of the issues surrounding price, most trade publishers have proceeded with caution when it comes to supplying content for the ebook market. They are perfectly happy to sell their books in electronic rather than traditional printed formats and to see ebooks grow, but only if this is done under conditions that will not, as one senior executive put it, ‘undercut the very lifeblood of the industry’. He continued:

I don’t think the authors or the publishers are in any mad rush to generate what would ultimately be a cannibalistic, or at least partially cannibalistic, phenomenon of having the material bought digitally rather than in physical form. We’re happy for it to be bought digitally as long as it doesn’t create a kind of cataclysmic decline in the industry’s revenue. And so there’s no reason to rush and underprice things to create such a decline. We’ll be happy for this to develop in its own appropriate way so long as – and this is a big caveat – another industry doesn’t grow up underneath that is wholly illegal, to furnish the same product to people who actually like some sort of experience but don’t want to pay for it. So what we’re trying to do is develop an electronic ebook industry that delivers appropriate values from attractive reading platforms at a price that seems, both to the consumer and to the author, to be an appropriate price for the value that’s being delivered. I don’t think we can forget that this is a very cheap form of entertainment relative to any other form of entertainment, for what it is. When you compare it to movies or games or newspapers or anything else, you look at hours of enjoyment let’s say or edification or anything else that’s delivered per dollar, this is a very competitive industry with the pricing today. You don’t need to go to a tenth of today’s pricing to deliver that kind of value, nor do I think we can say that by going to prices that are a tenth of what we have today, we will see the volume increase by tenfold. That’s an impossibility, because with the demands on people’s time they just won’t have ten times as much time to read as they have today and they won’t read ten times more just because something is cheaper.

So from this publisher’s point of view, the key challenge is twofold: first, to try to keep prices of electronic content at levels that reflect their assessment of the real value of that content, that maintain the health of the industry and enable publishers to continue to reward authors, while at the same time not setting the prices so high that people feel they’re being scammed; and second, to ensure that, when there are devices out there that people actually want to use, the content is readily available for those devices in appropriate digital formats so that people won’t be tempted to share files illegally, as they did with music. So providing content at prices that are appropriate for the value delivered and making sure that the industry can support whatever devices turn out to become reading devices of choice for consumers: ‘This is the fundamental issue for publishers to navigate over time.’

There is no need to try to speed things up – ‘You’re only cutting off your nose to spite your face and you’ll probably be unsuccessful because consumers will come when they come.’ But you don’t want to slow it down either, since ‘artificially slowing it down by not providing the product or having pricing that’s way off the map is equally deleterious to your interests because people will find another way of getting the contents.’ Provided the publisher has created a digital workflow that outputs digital files, provided they have created a robust digital archive and populated it with content in suitable digital formats, and provided they can maintain their pricing and discount structures in a way that will enable them to get the same economic benefit out of the sale of a digital edition or a print edition, then the publisher can remain indifferent about whether ebook sales become 10 per cent or 20 per cent or 50 per cent of their revenue; they can remain indifferent about whether ebook sales cannibalize print sales (as they undoubtedly would to some extent); they can remain indifferent too about the speed with which the migration to ebooks happens in those categories where it does. In other words, under these conditions the publisher can remain agnostic on the question of whether the future is digital: their house is in order and they are prepared for any number of possible future scenarios. But whether, as ebook sales grow, these conditions would actually hold – whether, in particular, publishers could hold the line on prices and discounts in the face of determined pressure from powerful retailers like Amazon – is, of course, another matter.

It was March 2009 and a year had passed since I listened to senior executives in the big trade houses in New York stressing the need for publishers to have the courage of their convictions and stand by the value of their content, maintaining prices at the levels they believe their content is worth. I was interviewing the CEO of a large US trade house and I’d barely had a chance to sit down when he started to tell me what was foremost on his mind:

The biggest thing that’s happened since the Kindle came out has been Amazon’s decision to price these books at no higher than $9.99 and then my cowardly peers in the business going along with dramatically reducing the cost of an ebook, even at a time when the author advances hadn’t gone down and there isn’t a publisher anywhere that can disregard the revenue generated from the hardcover sales of the book. The major publishers went up and down with their prices; they were all over the place with not a lot of discernible rhyme or reason. Amazon didn’t put any pressure on anybody. They had announced this price and they were sort of dancing around the issue of whether they were going to keep this or whether it was an introductory price. And the other publishers decided they were going to set a value to their ebooks that was dramatically lower, in most cases it was at least $10 lower, than the hardcover.

He was angry. He was upset. He was annoyed with his colleagues in other trade houses. Publishers speak fine words about having the courage of their convictions and holding the line on price but as soon as they’re faced with a major retailer taking an aggressive position in the market they collapse. ‘It drives me crazy. It’s like all the things publishers have screamed about for years and years and years, that archaic distribution model we have all inherited, are finally going away and they say, you know what, let’s take $5 or $6 and just throw it away. I can’t understand it.’

The confusion over pricing continued throughout 2009. Publishers experimented with different ways of pricing and publishing ebooks – some were releasing ebooks at the same time and the same list price as the hardcover edition and letting Amazon discount as they wished, others were windowing ebooks, that is, delaying the release of the ebook for five or six months to try to protect hardcover sales, and some were doing both. At the same time, there was growing concern among publishers about the potentially deleterious consequences of Amazon’s pricing strategy. Their concerns were amplified by the price war that broke out between Amazon, Wal-Mart and Target in October 2009, which saw prices on some new hardcover bestsellers falling to under $10.

The situation came to a head in early 2010. The new ingredient in the mix that proved to be a catalyst for change was Apple. In late 2009 Apple began talking with the big trade publishers about acquiring content for iBooks – an ebook store that it was developing for the iPad, which it was planning to launch in April 2010. It quickly emerged that Apple would prefer to use an agency model – the same model it used for music – rather than the wholesale or discount model that was traditional in the physical book trade. In the agency model, the publisher sets the price and the retailers act as the publisher’s agents, taking a commission – in this case 30 per cent – on sales. In January 2010 John Sargent, CEO of Macmillan, the group of US companies owned by Holtzbrinck, flew out to Seattle to propose new terms of trade to Amazon that would be based on the agency model. Amazon rejected the proposal and retaliated by removing the buy buttons from all of Macmillan’s books, both print and Kindle editions, on the Amazon site – exactly the kind of aggressive action by Amazon that many publishers had long feared. Over a weekend at the end of January 2010 many in the publishing industry were riveted to their computer screens, watching in astonishment as one of the first great conflicts of the new digital age unfolded before them. After several days of tense stand-off, Amazon backed down. It reluctantly agreed to accept the agency model, which meant that Macmillan would control the price of its ebooks and its frontlist titles could no longer be priced at $9.99. Amazon’s reputation took a serious battering. ‘It was appalling what they did,’ commented the CEO of one large house who watched the events unfolding from the sidelines, ‘and they were humiliated very quickly into switching the books back on. They had been winning the PR battle quite successfully with a number of agents until that point, and then when the agents saw the belly of the beast it was not something they liked. It was strategically a very poor move on their part.’ It was, by contrast, a bold move by Macmillan in the new price wars that were emerging around ebooks, and it soon became clear that other major trade publishers would be following suit. By the summer of 2010 Hachette, HarperCollins, Simon & Schuster and Penguin had all moved over to the agency model. Of the big six only Random House held out, but in March 2011 it too moved over to the agency model.
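The difference between the two trading models can be made concrete with a little arithmetic. The sketch below (in Python) is illustrative only: the 48 per cent wholesale discount is the rate assumed earlier in the chapter, the 30 per cent commission is the agency rate cited above, and the $12.99 agency price is a hypothetical figure:

```python
# Publisher and retailer economics under the wholesale and agency models.
# The discount, commission and prices are the illustrative figures
# discussed in the text, not actual contract terms.

def wholesale(list_price, retail_price, discount=0.48):
    """Wholesale model: the retailer buys at a discount off the
    publisher's list price and sets its own retail price."""
    publisher_receives = list_price * (1 - discount)
    retailer_margin = retail_price - publisher_receives
    return publisher_receives, retailer_margin

def agency(consumer_price, commission=0.30):
    """Agency model: the publisher sets the consumer price and the
    retailer takes a fixed commission on each sale."""
    retailer_margin = consumer_price * commission
    return consumer_price - retailer_margin, retailer_margin

# Wholesale: $20 ebook list price, retailed by Amazon at $9.99
pub, ret = wholesale(20.00, 9.99)
print(f"wholesale: publisher ${pub:.2f}, retailer ${ret:.2f}")

# Agency: publisher sets a $12.99 consumer price
pub, ret = agency(12.99)
print(f"agency: publisher ${pub:.2f}, retailer ${ret:.2f}")
```

Note the trade-off this implies: under agency the retailer is guaranteed a positive margin, while the publisher may actually receive less per copy than under wholesale (about $9.09 rather than $10.40 in this illustration) – but it gains control over the consumer price, which is precisely what was at stake in the Macmillan–Amazon stand-off.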

While the adoption of the agency model by the big six trade publishers may have averted a major deterioration of prices, it’s too early to say whether this is anything more than temporary. There are critics who view the agency model as a case of price-fixing and argue that it is a breach of competition rules, and it is being probed by antitrust investigators in the US, the UK and Europe. In June 2010 Texas Attorney General Greg Abbott launched a preliminary investigation, and in August a similar investigation was announced by Connecticut Attorney General Richard Blumenthal; these investigations mirror similar inquiries into Apple’s business practices that are being conducted by the Federal Trade Commission and the Department of Justice. In the UK, the Office of Fair Trading began an investigation of ebook pricing in January 2011, and in March the European Commission launched morning raids on several publishing houses suspected of fixing the prices of ebooks. Amazon may have lost its battle with Macmillan and the other big trade houses in early 2010 but there are many in the business who are under no illusions about Amazon’s willingness to renew the struggle. ‘There’s a perception in the world that there was a seismic shift in the industry: the content guys said “fuck you” and they came out ok,’ reflected one CEO who had been through the switch-over from the wholesale to the agency model but suspected that the battle was far from over. ‘Round one. It’s like a 30-round fight that’s going to go on for 10 years. I have no way of knowing that the agency model will continue to work. It could all start again.’

There is much more at stake in this debate than what might seem to the outside observer to be a fine point about comparative pricing. For one of the greatest threats facing the creative industries today is, as one perceptive retailer put it, ‘the increasing commoditization of content by non-content players, which is driving down the value of intellectual property’. On the positive side, the delivery of content in digital formats could, at least in principle, enable the creative industries to eliminate or reduce some of the long-standing inefficiencies associated with traditional supply chains. But at the same time, it carries the risk – by no means hypothetical, as the music industry shows – that content becomes cannon fodder for large and powerful technology companies and retailers that use content to drive the sales of their devices and services and increase their market share, thereby devaluing intellectual property and sucking value out of the content creation process. Some would undoubtedly benefit from this; others would lose. But however this plays out in terms of the reconfiguration of the creative industries, a major devaluing of intellectual property, and a constant driving down of the price of content, is unlikely to lead to an overall increase in the quality of content over time.

1 Joseph L. Bower and Clayton M. Christensen, ‘Disruptive Technologies: Catching the Wave’, Harvard Business Review (Jan.–Feb. 1995), pp. 43–53.

2 The line of thinking developed by the digital advocates tends to be well represented in the press, since journalists and other commentators love to write about technologies that seem capable of ushering in radical change. There are also countless websites and blogs speculating about it, think tanks dedicated to it (see, for example, the Institute for the Future of the Book, at www.futureofthebook.org) and even a minor subgenre of literature – for the most part published, as it happens, as old-fashioned print-on-paper books, an irony not lost on some of the authors – that heralds the imminent demise of the book and either mourns or celebrates its passing, from Sven Birkerts’ elegant lament in The Gutenberg Elegies: The Fate of Reading in an Electronic Age (London: Faber & Faber, 1994) to Jeff Gomez’s brash, no-doubts, no-regrets manifesto, Print is Dead: Books in Our Digital Age (New York: Macmillan, 2008).

3 See Thompson, Books in the Digital Age, ch. 15, on which some of the following analysis is based.

4 See Chris Anderson, The Long Tail: Why the Future of Business is Selling Less of More (New York: Hyperion, 2006).

5 The following analysis draws on the framework developed in Thompson, Books in the Digital Age, pp. 318–29.

6 See Thompson, Books in the Digital Age, ch. 13.

7 Consumer Attitudes toward E-Book Reading, Report 2 of 3 (New York: Book Industry Study Group, March 2010), p. 13; Consumer Attitudes toward E-Book Reading, vol. 2, Report 2 of 4 (New York: Book Industry Study Group, April 2011), p. 14.

8 Consumer Attitudes toward E-Book Reading, Report 2 of 3, p. 12.

9 For a more detailed account of the issues at stake, see Jonathan Band, ‘The Google Library Project: The Copyright Debate’ (American Library Association, Office for Information Technology Policy, Jan. 2006), at www.policybandwidth.com/doc/googlepaper.pdf.

10 The full text of the settlement can be found at www.googlebooksettlement.com/agreement.html. For a helpful summary see Jonathan Band, ‘A Guide for the Perplexed: Libraries and the Google Library Project Settlement’ (American Library Association and Association of Research Libraries, 13 Nov. 2008), at www.arl.org/bm~doc/google-settlement-13nov08.pdf.

11 For a more detailed account of the main changes in the amended settlement agreement, see Jonathan Band, ‘A Guide for the Perplexed Part III: The Amended Settlement Agreement’ (American Library Association and Association of Research Libraries, 23 Nov. 2009), at www.arl.org/bm~doc/guide_for_the_perplexed_part3.pdf.

12 The full judgment can be found at: ‘The Authors Guild et al. against Google Inc.: Opinion’, at www.nysd.uscourts.gov/cases/show.php?db=special&id=115.

13 Robert Darnton has been the most vocal and thoughtful critic of the Google settlement and has developed a forceful argument along these lines. See Robert Darnton, ‘Google and the Future of Books’, New York Review of Books, vol. 56, no. 2 (12 Feb. 2009), reprinted in Darnton’s The Case for Books: Past, Present, and Future (New York: Public Affairs, 2009), pp. 3–20; and Robert Darnton, ‘Six Reasons Google Books Failed’, New York Review of Books (28 Mar. 2011).