In 1998, a pianist called Christopher Donison wrote that ‘one can divide the world into roughly two constituencies’: those with larger hands, and those with smaller hands. Donison was writing as a male pianist who, due to his smaller-than-average hands, had struggled for years with traditional keyboards, but he could equally have been writing as a woman. There is plenty of data showing that women have, on average, smaller hands than men,1 and yet we continue to design equipment around the average male hand as if one-size-fits-men is the same as one-size-fits-all.
This one-size-fits-men approach to supposedly gender-neutral products is disadvantaging women. The average female handspan is between seven and eight inches,2 which makes the standard forty-eight-inch keyboard something of a challenge. Octaves on a standard keyboard are 7.4 inches wide, and one study found that this keyboard disadvantages 87% of adult female pianists.3 Meanwhile, a 2015 study which compared the handspan of 473 adult pianists to their ‘level of acclaim’ found that all twelve of the pianists considered to be of international renown had spans of 8.8 inches or above.4 Of the two women who made it into this exalted group, one had a handspan of nine inches and the other had a handspan of 9.5 inches.
The standard piano keyboard doesn’t just make it harder for female pianists to match the level of acclaim reached by their male colleagues: it also affects their health. A range of studies carried out on instrumentalists during the 1980s and 90s found that female musicians suffered ‘disproportionately’ from work-related injuries, and that keyboard players were among those ‘most at risk’. Several studies have found that female pianists run an approximately 50% higher risk of pain and injury than male pianists; in one study, 78% of women had developed RSI, compared with 47% of men.5
It seems likely that this is related to hand size: a 1984 study, which included only male pianists, identified twenty-six ‘successful performers’, defined as ‘well-known soloists and winners of international competitions’, and ten ‘problem cases’: those who had ‘struggled with technical or injury problems over a long period’.6 The former group’s average handspan was 9.2 inches compared to the problem cases’ 8.7 inches – which is still substantially larger than the average female handspan.
It was while Christopher Donison was practising the coda of the G minor Chopin Ballade on his Steinway concert grand ‘for about the thousandth time’, that he had the thought that led to his designing a new keyboard for people with smaller hands. What if it wasn’t that his hands were too small, but that the standard keyboard was too large? The result of this thought was the 7/8 DS keyboard, which, Donison claimed, transformed his playing. ‘I could finally use the correct fingerings. Broken-chord formations could be played on one hand position, instead of two. [. . .] Wide, sweeping, left-hand arpeggiated figures so prevalent in Romantic music become possible, and I could actually get on with the business of cultivating the right sound, rather than repeatedly practicing the same passage.’7 Donison’s experience is backed up by numerous studies which have also found that a 7/8 keyboard removes the professional and health disadvantages imposed by the conventional keyboard.8 And yet there remains a strange (that is, if you don’t accept that sexism is at play here) reluctance in the piano world to adapt.
The reluctance to abandon design that suits only the largest male hands seems endemic. I remember a time back in the early 2000s when it was the smallest handsets that were winning phone-measuring contests. That all changed with the advent of the iPhone and its pretenders. Suddenly it was all about the size of your screen, and bigger was definitely better. The average smartphone screen is now 5.5 inches,9 and while we’re admittedly all extremely impressed by the size of your screen, it’s a slightly different matter when it comes to fitting into half the population’s hands (not to mention minuscule or non-existent pockets). The average man can fairly comfortably use his device one-handed – but the average woman’s hand is not much bigger than the handset itself.
This is obviously annoying – and foolish for a company like Apple, given that research shows women are more likely to own an iPhone than men.10 But don’t expect to uncover a method to their madness any time soon, because it’s extraordinarily difficult to get any smartphone company to comment on their massive-screen fixation. In desperation for answers I turned to the Guardian’s tech reporter Alex Hern. But he couldn’t help me either. ‘It’s a noted issue,’ he confirmed, but ‘one I’ve never got a straight answer on.’ Speaking to people informally, he said, the ‘standard response’ was that phones were no longer designed for one-handed use. He’s also been told that actually many women opt for larger phones, a trend that was ‘usually attributed to handbags’. And look, handbags are all well and good, but one of the reasons women carry them in the first place is because our clothes lack adequate pockets. So designing phones to be handbag-friendly rather than pocket-friendly feels like adding injury (more on this later) to insult. In any case, it’s rather odd to claim that phones are designed for women to carry in their handbags when so many passive-tracking apps clearly assume your phone will be either in your hands or in your pockets at all times, rather than sitting in your handbag on your office desk.
I next turned to award-winning tech journalist and author James Ball, who has another theory for why the big-screen fixation persists: because the received wisdom is that men drive high-end smartphone purchases, women in fact don’t figure in the equation at all. If this is true, it’s certainly an odd approach for Apple to take given the research about women being more likely to own iPhones. But I have another, more fundamental complaint with this analysis, because it again suggests that the problem is with women, rather than male-biased design. In other words: if women aren’t driving high-end smartphone purchases is it because women aren’t interested in smartphones, or could it be because smartphones are designed without women in mind? On the bright side, however, Ball reassured me that screens probably wouldn’t be getting any bigger because ‘they’ve hit the limit of men’s hand size’.
Good news for men, then. But tough breaks for women like my friend Liz who owns a third-generation Motorola Moto G. In response to one of my regular rants about handset sizes she replied that she’d just been ‘complaining to a friend about how difficult it was to zoom on my phone camera. He said it was easy on his. Turns out we have the same phone. I wondered if it was a hand-size thing’.
Almost certainly, it was. When Zeynep Tufekci, a researcher at the University of North Carolina, was trying to document tear-gas use in the Gezi Park protests in Turkey in 2013, the size of her Google Nexus got in the way.11 It was the evening of 9 June. Gezi Park was crowded. Parents were there with their children. And then the canisters were fired. Because officials ‘often claimed that tear gas was used only on vandals and violent protesters’, Tufekci wanted to document what was happening. So she pulled out her phone. ‘And as my lungs, eyes and nose burned with the pain of the lachrymatory agent released from multiple capsules that had fallen around me, I started cursing.’ Her phone was too big. She could not take a picture one-handed – ‘something I had seen countless men with larger hands do all the time’. All Tufekci’s photos from the event were unusable, she wrote, and ‘for one simple reason: good smartphones are designed for male hands’.
Like the standard keyboard, smartphones designed for male hands may also be affecting women’s health. It is a relatively new field of study, but the research that does exist on the health impact of smartphones is not positive.12 But although women’s hand size is demonstrably smaller than men’s, and although women have been found to have a higher prevalence of musculoskeletal symptoms and disorders,13 research into the impact of large smartphones on hands and arms does not buck the gender data gap trend. In the studies I found, women were significantly under-represented as subjects,14 and the vast majority of studies did not sex-disaggregate their data15 – including those that did manage to adequately represent women.16 This is unfortunate because the few that did sex-disaggregate their data reported a statistically significant gender difference in the impact of phone size on women’s hand and arm health.17
The answer to the problem of smartphones that are too big for women’s hands seems obvious: design smaller handsets. And there are of course some smaller handsets on the market, notably Apple’s iPhone SE. But the SE wasn’t updated for two years and so was an inferior product to the standard iPhone range (which offers only huge or huger as size options). And it’s now been discontinued anyway. In China, women and men with smaller hands can buy the Keecoo K1 which, with its hexagonal design, is trying to account for women’s hand size: good.18 But it has less processing power and comes with in-built air-brushing: bad. Very bad.
Voice recognition has also been suggested as a solution to smartphone-associated RSI,19 but this actually isn’t much of a solution for women, because voice-recognition software is often hopelessly male-biased. In 2016, Rachael Tatman, a research fellow in linguistics at the University of Washington, found that Google’s speech-recognition software was 70% more likely to accurately recognise male speech than female speech20 – and it’s currently the best on the market.21
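Tatman’s comparison boils down to measuring word error rates (WER) separately for male and female speakers and seeing how far apart they are. Below is a minimal sketch of the shape of such an audit in Python, using the jiwer library; the transcript pairs are invented placeholders, not her data.

```python
import jiwer

# (reference transcript, recogniser output) pairs grouped by
# speaker sex -- invented placeholder sentences, not Tatman's data.
samples = {
    "male": [
        ("turn the volume down", "turn the volume down"),
        ("call me back later", "call me back later"),
    ],
    "female": [
        ("turn the volume down", "turn the volume town"),
        ("call me back later", "tall me back play her"),
    ],
}

for sex, pairs in samples.items():
    references = [ref for ref, _ in pairs]
    hypotheses = [hyp for _, hyp in pairs]
    # jiwer.wer computes the word error rate over all pairs at once.
    print(f"{sex}: WER = {jiwer.wer(references, hypotheses):.2f}")
```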
Clearly, it is unfair for women to pay the same price as men for products that deliver an inferior service to them. But there can also be serious safety implications. Voice-recognition software in cars, for example, is meant to decrease distractions and make driving safer. But it can have the opposite effect if it doesn’t work – and often it doesn’t work, at least for women. An article on car website Autoblog quoted a woman who had bought a 2012 Ford Focus, only to find that its voice-command system only listened to her husband, even though he was in the passenger seat.22 Another woman called the manufacturer for help when her Buick’s voice-activated phone system wouldn’t listen to her: ‘The guy told me point-blank it wasn’t ever going to work for me. They told me to get a man to set it up.’ Immediately after writing these pages I was with my mother in her Volvo Cross Country, watching her try and fail to get the voice-recognition system to call her sister. After five failed attempts I suggested she try lowering the pitch of her voice. It worked first time.
As voice-recognition software has become more sophisticated, its use has branched out to numerous fields, including medicine, where errors can be just as grave. A 2016 paper analysed a random sample of a hundred notes dictated by attending emergency physicians using speech-recognition software, and found that 15% of the errors were critical, ‘potentially leading to miscommunication that could affect patient care’.23 Unfortunately these authors did not sex-disaggregate their data, but papers that have done so report significantly higher transcription error rates for women than men.24 Dr Syed Ali, the lead author of one of the medical dictation studies, observed that his study’s ‘immediate impact’ was that women ‘may have to work somewhat harder’ than men ‘to make the [voice recognition] system successful’.25 Rachael Tatman agrees: ‘The fact that men enjoy better performance than women with these technologies means that it’s harder for women to do their jobs. Even if it only takes a second to correct an error, those seconds add up over the days and weeks to a major time sink, time your male colleagues aren’t wasting messing with technology.’
Thankfully for frustrated women around the world, Tom Schalk, the vice president of voice technology at car navigation system supplier ATX, has come up with a novel solution to fix the ‘many issues with women’s voices’.26 What women need, he said, was ‘lengthy training’ – if only women ‘were willing’ to submit to it. Which, sighs Schalk, they just aren’t. Just like the wilful women buying the wrong stoves in Bangladesh, women buying cars are unreasonably expecting voice-recognition software developers to design a product that works for them when it’s obvious that the problem needing fixing is the women themselves. Why can’t a woman be more like a man?
Rachael Tatman rubbishes the suggestion that the problem lies in women’s voices rather than the technology that doesn’t recognise them: studies have found that women have ‘significantly higher speech intelligibility’,27 perhaps because women tend to produce longer vowel sounds28 and tend to speak slightly more slowly than men.29 Meanwhile, men have ‘higher rates of disfluency, produce words with slightly shorter durations, and use more alternate (‘sloppy’) pronunciations’.30 With all this in mind, voice-recognition technology should, if anything, find it easier to recognise female rather than male voices – and indeed, Tatman writes that she has ‘trained classifiers on speech data from women and they worked just fine, thank you very much’.
Of course, the problem isn’t women’s voices. It’s our old friend, the gender data gap. Speech-recognition technology is trained on large databases of voice recordings, called corpora. And these corpora are dominated by recordings of male voices. As far as we can tell, anyway: most don’t provide a sex breakdown of the voices contained in their corpus, which is of course a data gap in itself.31 When Tatman looked into the sex ratio of speech corpora, only TIMIT (‘the single most popular speech corpus in the Linguistic Data Consortium’) provided data broken down by sex. It was 69% male. But contrary to what these findings imply, it is in fact possible to find recordings of women speaking: according to the data on its website, the British National Corpus (BNC)32 is sex-balanced.33
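Auditing a corpus for this particular gap is trivial when the metadata exists – which is rather the point. Here is a minimal Python sketch, assuming a hypothetical speakers.csv file with one row per speaker and a ‘sex’ column of the kind TIMIT-style corpora distribute; the filename and schema are illustrative, not any real corpus’s format.

```python
import csv
from collections import Counter

def speaker_sex_ratio(metadata_path: str) -> dict:
    """Tally speakers by recorded sex in a corpus metadata file.

    Assumes a CSV with one row per speaker and a 'sex' column
    (e.g. 'M'/'F'); the schema is hypothetical, for illustration.
    """
    counts = Counter()
    with open(metadata_path, newline="") as f:
        for row in csv.DictReader(f):
            counts[row["sex"].strip().upper()] += 1
    total = sum(counts.values())
    return {sex: n / total for sex, n in counts.items()}

# For a TIMIT-like corpus this would print something close to
# {'M': 0.69, 'F': 0.31}.
print(speaker_sex_ratio("speakers.csv"))
```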
Voice corpora are not the only male-biased databases we’re using to produce what turn out to be male-biased algorithms. Text corpora (made up of a wide variety of texts, from novels to newspaper articles to legal textbooks) are used to train translation software, CV-scanning software, and web search algorithms. And they are riddled with gendered data gaps. Searching the BNC34 (100 million words from a wide range of late twentieth-century texts) I found that female pronouns consistently appeared at around half the rate of male pronouns.35 The 520-million-word Corpus of Contemporary American English (COCA) also has a 2:1 male to female pronoun ratio despite including texts as recent as 2015.36 Algorithms trained on these gap-ridden corpora are being left with the impression that the world actually is dominated by men.
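The pronoun imbalance is easy to check for yourself on any plain-text sample. A minimal sketch follows; the pronoun lists and the corpus.txt filename are my own illustrative choices, not the BNC’s or COCA’s query interfaces, and the count is deliberately crude (‘her’, for instance, covers both the possessive and the object form).

```python
import re
from collections import Counter

MALE = {"he", "him", "his", "himself"}
FEMALE = {"she", "her", "hers", "herself"}

def pronoun_ratio(path: str) -> float:
    """Return the male:female pronoun ratio for a plain-text file."""
    counts = Counter()
    with open(path, encoding="utf-8") as f:
        for line in f:
            for token in re.findall(r"[a-z']+", line.lower()):
                if token in MALE:
                    counts["male"] += 1
                elif token in FEMALE:
                    counts["female"] += 1
    # Guard against a file with no female pronouns at all.
    return counts["male"] / max(counts["female"], 1)

# A result near 2.0 would mirror the roughly 2:1 imbalance
# reported above for the BNC and COCA.
print(pronoun_ratio("corpus.txt"))
```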
Image datasets also seem to have a gender data gap problem: a 2017 analysis of two commonly used datasets containing ‘more than 100,000 images of complex scenes drawn from the web, labeled with descriptions’ found that images of men greatly outnumber images of women.37 A University of Washington study similarly found that women were under-represented on Google Images across the forty-five professions they tested, with CEO being the most divergent result: 27% of CEOs in the US are female, but women made up only 11% of the Google Image search results.38 Searching for ‘author’ also delivered an imbalanced result, with only 25% of the Google Image results for the term being female compared to 56% of actual US authors, and the study also found that, at least in the short term, this discrepancy did affect people’s views of a field’s gender proportions. For algorithms, of course, the impact will be more long term.
As well as under-representing women, these datasets are misrepresenting them. A 2017 analysis of common text corpora found that female names and words (‘woman’, ‘girl’, etc.) were more associated with family than career; it was the opposite for men.39 A 2016 analysis of a popular publicly available dataset based on Google News found that the top occupation linked to women was ‘homemaker’ and the top occupation linked to men was ‘maestro’.40 Also included in the top ten gender-linked occupations were philosopher, socialite, captain, receptionist, architect and nanny – I’ll leave it to you to guess which were male and which were female. The 2017 image dataset analysis also found that the activities and objects included in the images showed a ‘significant’ gender bias.41 One of the researchers, Mark Yatskar, saw a future where a robot trained on these datasets, unsure of what someone is doing in the kitchen, ‘offers a man a beer and a woman help washing dishes’.42
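Anyone with the pretrained Google News vectors the 2016 paper analysed can probe these associations directly. Here is a sketch using the gensim library; the local filename is an assumption, and the exact neighbours you get will depend on the model file.

```python
from gensim.models import KeyedVectors

# Load the pretrained Google News vectors analysed in the 2016
# paper; the local filename here is an assumption.
vectors = KeyedVectors.load_word2vec_format(
    "GoogleNews-vectors-negative300.bin", binary=True
)

# Probe 'man is to doctor as woman is to ...?' via the vector
# arithmetic doctor - man + woman.
for word, score in vectors.most_similar(
    positive=["woman", "doctor"], negative=["man"], topn=3
):
    print(f"{word}: {score:.3f}")
# Stereotyped completions such as 'nurse' are the kind of result
# the paper reports from exactly this sort of query.
```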
These cultural stereotypes can be found in artificial intelligence (AI) technologies already in widespread use. For example, when Londa Schiebinger, a professor at Stanford University, used translation software to translate a newspaper interview with her from Spanish into English, both Google Translate and Systran repeatedly used male pronouns to refer to her, despite the presence of clearly gendered terms like ‘profesora’ (female professor).43 Google Translate will also convert Turkish sentences with gender-neutral pronouns into English stereotypes. ‘O bir doktor’, which means ‘S/he is a doctor’, is translated into English as ‘He is a doctor’, while ‘O bir hemşire’ (which means ‘S/he is a nurse’) is rendered ‘She is a nurse’. Researchers have found the same behaviour for translations into English from Finnish, Estonian, Hungarian and Persian.
The good news is that we now have this data – but whether or not coders will use it to fix their male-biased algorithms remains to be seen. We have to hope that they will, because machines aren’t just reflecting our biases. Sometimes they are amplifying them – and by a significant amount. In the 2017 images study, pictures of cooking were over 33% more likely to involve women than men, but algorithms trained on this dataset connected pictures of kitchens with women 68% of the time. The paper also found that the higher the original bias, the stronger the amplification effect, which perhaps explains how the algorithm came to label a photo of a portly balding man standing in front of a stove as female. Kitchen > male pattern baldness.
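The amplification claim has a simple arithmetic shape: compare how often an activity co-occurs with women in the training labels against how often the trained model predicts that pairing. The sketch below uses placeholder counts chosen to echo the figures above, not the paper’s raw data.

```python
def gender_bias(woman_count: int, man_count: int) -> float:
    """Fraction of an activity's instances labelled 'woman'."""
    return woman_count / (woman_count + man_count)

# Placeholder counts for a 'cooking' activity -- illustrative
# numbers only: women appear a third more often than men in the
# training labels, yet the model predicts 'woman' 68% of the time.
training_bias = gender_bias(woman_count=400, man_count=300)   # ~0.57
predicted_bias = gender_bias(woman_count=680, man_count=320)  # 0.68

# Amplification = how far the model's bias exceeds the data's.
print(f"training bias:  {training_bias:.2f}")
print(f"predicted bias: {predicted_bias:.2f}")
print(f"amplification:  {predicted_bias - training_bias:+.2f}")
```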
James Zou, assistant professor of biomedical science at Stanford, explains why this matters. He gives the example of someone searching for ‘computer programmer’ on a program trained on a dataset that associates that term more closely with a man than a woman.44 The algorithm could deem a male programmer’s website more relevant than a female programmer’s – ‘even if the two websites are identical except for the names and gender pronouns’. So a male-biased algorithm trained on corpora marked by a gender data gap could literally do a woman out of a job.
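Zou’s point can be made concrete with a toy example: score two documents, identical except for their pronouns, against the query ‘programmer’, using hand-made word vectors in which ‘programmer’ has been pulled towards ‘he’ – mimicking what a corpus with a 2:1 pronoun gap teaches a real embedding. The vectors below are fabricated for illustration; a real system would use learned ones.

```python
import numpy as np

# Hand-made 2-D toy embedding in which 'programmer' leans towards
# 'he' -- fabricated for illustration, not learned from data.
emb = {
    "programmer": np.array([1.0, 0.2]),
    "he":         np.array([1.0, 0.0]),
    "she":        np.array([0.0, 1.0]),
    "writes":     np.array([0.5, 0.5]),
    "code":       np.array([0.9, 0.3]),
}

def doc_vector(words):
    """Represent a document as the mean of its word vectors."""
    return np.mean([emb[w] for w in words], axis=0)

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

query = emb["programmer"]
male_doc = ["he", "writes", "code"]
female_doc = ["she", "writes", "code"]   # identical except the pronoun

print(cosine(query, doc_vector(male_doc)))    # ~0.99: ranked higher
print(cosine(query, doc_vector(female_doc)))  # ~0.76: ranked lower
```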
But web search is only scraping the surface of how algorithms are already guiding decision-making. According to the Guardian, 72% of US CVs never reach human eyes,45 and robots are already involved in the interview process with their algorithms trained on the posture, facial expressions and vocal tone of ‘top-performing employees’.46 Which sounds great – until you start thinking about the potential data gaps: did the coders ensure that these top-performing employees were diverse in gender and ethnicity and, if not, does the algorithm account for this? Has the algorithm been trained to account for socialised gender differences in tone and facial expression? We simply don’t know, because the companies developing these products don’t share their algorithms – but let’s face it, based on the available evidence, it seems unlikely.
AI systems have been introduced to the medical world as well, to guide diagnoses – and while this could ultimately be a boon to healthcare, it currently feels like hubris.47 The introduction of AI to diagnostics seems to be accompanied by little to no acknowledgement of the well-documented and chronic gaps in medical data when it comes to women.48 And this could be a disaster. It could, in fact, be fatal – particularly given what we know about machine learning amplifying already-existing biases. With our body of medical knowledge being so heavily skewed towards the male body, AIs could make diagnosis for women worse, rather than better.
And, at the moment, barely anyone is even aware that we have a major problem brewing here. The authors of the 2016 Google News study pointed out that not a single one of the ‘hundreds of papers’ about the applications for word-association software recognised how ‘blatantly sexist’ the datasets are. The authors of the image-labelling paper similarly noted that they were ‘the first to demonstrate structured prediction models amplify bias and the first to propose methods for reducing this effect’.
Our current approach to product design is disadvantaging women. It’s affecting our ability to do our jobs effectively – and sometimes to even get jobs in the first place. It’s affecting our health, and it’s affecting our safety. And perhaps worst of all, the evidence suggests that when it comes to algorithm-driven products, it’s making our world even more unequal. There are solutions to these problems if we choose to acknowledge them, however. The authors of the women = homemaker paper devised a new algorithm that reduced gender stereotyping (e.g. ‘he is to doctor as she is to nurse’) by over two-thirds, while leaving gender-appropriate word associations (e.g. ‘he is to prostate cancer as she is to ovarian cancer’) intact.49 And the authors of the 2017 study on image interpretation devised a new algorithm that decreased bias amplification by 47.5%.
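For a flavour of how the women = homemaker authors’ fix works, here is a simplified sketch of its central ‘neutralize’ step: project a gender-neutral word’s vector onto a gender direction and subtract that component, leaving legitimately gendered words (like ‘prostate’) alone. The toy vectors and the single she − he direction are simplifications; the paper estimates the gender subspace from many word pairs and adds an ‘equalize’ step on top.

```python
import numpy as np

def neutralize(vec: np.ndarray, gender_dir: np.ndarray) -> np.ndarray:
    """Remove the gender component from a word vector.

    Projects vec onto the (unit) gender direction and subtracts
    that projection, so 'doctor' ends up no more 'he' than 'she'.
    """
    g = gender_dir / np.linalg.norm(gender_dir)
    return vec - (vec @ g) * g

# Toy 2-D vectors; the paper estimates the direction from many
# she/he-style pairs, not just one.
he, she = np.array([1.0, 0.0]), np.array([0.0, 1.0])
doctor = np.array([0.8, 0.3])          # leans 'male' in this toy space

gender_direction = she - he
doctor_debiased = neutralize(doctor, gender_direction)

# Both dot products are now 0.55: the stereotype is gone.
print(doctor_debiased @ he, doctor_debiased @ she)
```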