Our goal is not to treat existing classifications as “ground truth” labels and build machine learning tools to mimic them, but rather to use computation to better quantify the variability and uncertainty of those classifications.
—Peter M. Broadwell, David Mimno, and Timothy R. Tangherlini, “The Tell-Tale Hat: Surfacing the Uncertainty in Folklore Classification,” 20171
In reality, the whole point of numbers is to handle questions of degree that don’t admit simple yes or no answers. Statistical models are especially well adapted to represent fuzzy boundaries—for instance, by characterizing an instance’s mixed membership in multiple classes, or by predicting its likelihood of membership in a single class as a continuous variable. One important reason to map genre algorithmically is that it allows us to handle these fuzzy boundaries at scale.
—Ted Underwood, Understanding Genre in a Collection of a Million Volumes, 20142
The conventions for data representation covered in the previous chapter are constraints of computational thinking. If we want to use digital computers to represent and then analyze any cultural phenomenon as data—be it a collection of video recordings of all performances at the Eurovision Song Contest (1956–), a list of all exhibitions in the MoMA during its history (1929–)3, or the experiences of all visitors to a particular exhibition—we need first to translate this phenomenon into a medium of data that algorithms can work on. This translation is not a mirror of the phenomenon. Only some characteristics of the artifacts, users’ behaviors, and their sensorial, emotional, and cognitive experiences can be captured and encoded as data.
Data is a medium. Like photography, cinema, or music, it has both affordances and restrictions. It allows us to represent many kinds of things in many ways, but it also imposes limits on what we can represent and express and how we think about it. In particular, it dictates that we choose one of the available data types to represent any characteristic of the phenomenon. There are a number of ways to categorize available data types. Three schemes are commonly used—we already talked about the first in the previous chapter, but not the other two.
In the first scheme, data types refer to the types of phenomena and media being represented. In this scheme, we have geospatial data (which can be further broken into spatial coordinates, trajectories, shapes, etc.), 3-D data (polygonal, voxels, point clouds), 2-D image data (raster and vector), temporal data, network data, sound data, text data, and so on.
The second scheme distinguishes between categorical and quantitative data. Quantitative data can be further divided into discrete and continuous.4 The same phenomenon often can be represented using any of these data types, and the choice of representation strongly influences how we can think about it, imagine it, and analyze it.
For example, we can represent cultural time using discrete temporal categories, such as centuries or periods, like Renaissance and Baroque. Such representation may encourage us to think about each period as one entity, and to start comparing such entities. (The foundational book of modern art history—1915’s The Principles of Art History by Heinrich Wölfflin—is a perfect example of such an approach.) But if we represent time in culture as continuous or discrete data—for example, as years—this more neutral and detailed scale makes it easier to conceive of culture in terms of gradually changing values. Suddenly, labels such as Renaissance and Baroque, modernism and postmodernism, or Russian avant-garde and socialist realism fade away, allowing us to see continuities and gradual evolution.
Let’s look at another example that does not involve time. We can represent colors in an image using the terms provided by natural languages or as sets of numbers using RGB, HSL, HSV, or other color models. For example, the X11 color system supported by browsers has names for nine red colors—LightCoral, IndianRed, FireBrick, and so on. These same colors can be specified as RGB values: (240, 128, 128), (205, 92, 92), and (178, 34, 34). But numerical representation has a key advantage over natural language. Using three numbers between 0 and 255, we can represent 16,777,216 different color values; no human language can come close to this. This example illustrates how a choice of one data type over another allows us to represent some phenomena with more fidelity—which will then shape how a phenomenon can be analyzed. If we compare mature paintings by Piet Mondrian, a few color categories will be sufficient, but for paintings of Giorgio Morandi, which use very desaturated colors that are close to each other in brightness, numerical representation is much better.
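To make this concrete, here is a minimal sketch in Python contrasting the two representations. The X11 names and RGB triplets are the standard web color values mentioned above; the distance function is just one simple (not perceptually exact) way of comparing colors numerically.

```python
# A few of the X11 "red" color names and their standard RGB values.
x11_reds = {
    "LightCoral": (240, 128, 128),
    "IndianRed": (205, 92, 92),
    "FireBrick": (178, 34, 34),
}

# With 8 bits per channel, the RGB model encodes 256**3 distinct colors.
print(256 ** 3)  # 16777216 -- far more than any natural-language color vocabulary

# Numbers also let us compare colors quantitatively, e.g., by Euclidean
# distance in RGB space (a rough, not perceptually uniform, measure).
def rgb_distance(c1, c2):
    return sum((a - b) ** 2 for a, b in zip(c1, c2)) ** 0.5

print(rgb_distance(x11_reds["LightCoral"], x11_reds["IndianRed"]))
```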
The third scheme distinguishes among structured, unstructured, and semi-structured data. Structured data is organized according to a predefined model. For example, to analyze and visualize patterns in a large collection of art images, we create a table containing information about each image. Each column contains only one type of information stored in a particular format: for example, image filename, creation date, title given by the author, and some extracted visual features. Such a table is ready for statistical analysis and visualization. Other examples of structured data include web logs, point-of-sale data, stock trading data, and sensor data.
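As an illustration, here is a minimal sketch of such a table using pandas; the column names and values are invented placeholders rather than data from an actual collection.

```python
# A small structured table: one row per image, one fixed-format column per attribute.
import pandas as pd

images = pd.DataFrame(
    {
        "filename": ["im_001.jpg", "im_002.jpg", "im_003.jpg"],
        "creation_year": [1912, 1924, 1931],
        "title": ["Untitled I", "Composition", "Study in Grey"],
        "mean_brightness": [0.62, 0.48, 0.71],  # extracted visual feature
        "mean_saturation": [0.35, 0.52, 0.18],  # extracted visual feature
    }
)

# Because the data is structured, statistical summaries are immediately available.
print(images.describe())
```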
Examples of unstructured data include text, music, digital images, and videos. Historically, computational data processing dealt only with structured data, so everything that does not fit into a table or database format came to be called unstructured data.
Semi-structured data falls in between the structured and unstructured types. An example of such data is an email. An email has a body of text (an unstructured part) and a number of elements stored in fixed formats, such as the sender and recipient email addresses and the date and time the email was sent.
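A minimal sketch of this idea, with invented addresses and text: the first four fields are machine-readable in fixed formats, while the body is free text.

```python
# Semi-structured data: fixed-format header fields plus an unstructured body.
email = {
    "from": "curator@example.org",       # structured field
    "to": "researcher@example.org",      # structured field
    "date": "2019-03-14T09:26:53Z",      # structured field (ISO 8601 timestamp)
    "subject": "Exhibition checklist",   # short structured field
    "body": "Hi, attached is a draft checklist for the spring show ...",  # unstructured text
}

print(email["date"])   # easy to parse, sort, and filter
print(email["body"])   # requires text analysis methods
```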
The choice of a particular data representation type may strongly influence our understanding of the phenomenon we want to study. Once we encode it in a particular way, it becomes more difficult to imagine alternatives. For example, if we represent time using regular intervals, this mechanical view of time prevents other views of time, such as cyclical.5 If we represent a space as a set of discrete points defined by numerical coordinates (i.e., the Cartesian coordinate system introduced by Descartes in 1637), it is more difficult to envision it in different ways. The discrete geometric representation will be suitable for the geometric abstractions of Kazimir Malevich or Bridget Riley, but not for those of Francis Bacon or Park Seo-Bo, which “modulate” a surface. (The two types of representation correspond to the difference in computer graphics between vector and raster images or between polygons and voxels.)
The choice of data type imposes additional constraints on the representations, but it can also open up unique possibilities. Some methods become hard to think about, while others become easy to work with. In fact, in statistics some methods only make sense for continuous variables, while others only work for categorical data. For example, calculating a mean or standard deviation is only appropriate for quantitative data.
The adoption of the data science paradigm by many new professional and research fields and industries, seen in the first two decades of the twenty-first century, is likely to continue to grow. This affects how contemporary society represents knowledge and information. Until the twenty-first century, only selected knowledge areas (i.e., natural sciences, biological sciences, and quantitative social sciences) and business relied on quantitative data. But the “datafication” of many other fields and areas of human life means that they also now use quantitative representations. And this makes them subject to the same techniques of data mining, statistical modeling, and predictive analytics. These techniques now form the “senses” and the “mind” of contemporary society: how it perceives itself and makes endless decisions.
This adoption of data science may lead to a more nuanced view of phenomena as we shift from categorical to continuous data. However, systems of categories defined using words from natural languages (e.g., color terms), which traditionally have been humans’ preferred way of representation, did not go away. They remain central to political, social, and cultural life. In the second part of this chapter, I will look at the contemporary society of categories and at some ways to analyze systems of cultural categories.
How do we create data? One common method, used especially in the sciences, is measuring a phenomenon. Regardless of what we measure, we need to use some system to encode these measurements. In 1946, psychologist Stanley Stevens defined scales of measurement. Although alternative scales have been proposed by others, Stevens’s system is the most widely used. The system contains four scales. The nominal scale describes qualitative measurements (i.e., categories); the other three scales—ordinal, interval, and ratio—describe quantitative measurements.6
Stevens’s theory of scales is very important. It specifies how phenomena can be represented as different data types and what kinds of descriptive statistics can be used with each type. This applies equally to physical, psychological, social, and cultural phenomena. For instance, in the psychology of art and the sociology of culture, researchers study audience perceptions of artifacts by representing these perceptions using one of these scales. Perceptions can be measured by asking people to fill out questionnaires; by analyzing videos of their faces during the events to automatically measure the types, levels of expressiveness, and valence of their emotions; or by other methods.7
Depending on the measurement scale, different statistical techniques can be used to summarize numerical data. For example, the mode is used to represent the central tendency of nominal data: it is the most commonly occurring value. Another summary representation is the median—the value that divides the ordered data into two equal halves. Both mode and median can be used for ordinal data. With interval and ratio data, we are allowed to use mode, median, and mean, and also measures of dispersion: standard deviation and range.
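Here is a minimal sketch of these correspondences using Python’s standard statistics module; the genre labels, Likert responses, and brightness values are invented for illustration.

```python
import statistics

# Nominal data (categories with no order): only the mode is meaningful.
genres = ["portrait", "landscape", "portrait", "still life", "portrait"]
print(statistics.mode(genres))            # 'portrait'

# Ordinal data (ordered categories): mode and median are meaningful.
likert = [1, 2, 2, 3, 4, 4, 5]            # 1 = strongly disagree ... 5 = strongly agree
print(statistics.median(likert))          # 3

# Interval and ratio data: mean and measures of dispersion also apply.
brightness = [0.41, 0.55, 0.62, 0.48, 0.71]
print(statistics.mean(brightness))
print(statistics.stdev(brightness))
print(max(brightness) - min(brightness))  # range
```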
Let’s continue learning about measurement scales and how they apply to cultural analytics research. We will start with nominal data. One example of such data is parts of speech: verbs, nouns, adjectives, and so on. Nominal values have no order and no hierarchy. Many categorical systems are like this: all categories are equally important and there is no inherent order among them. However, things are different in social and cultural realms. Some of the most important systems of categories used by many societies—such as sex, race, gender, and ethnicity—are hierarchical. The struggles against these hierarchies have been central to the political, social, and intellectual agendas of many countries for a number of decades, while in other countries this started only more recently.
Standard accounts of culture today are also deeply hierarchical. When we start analyzing some cultural data, its metadata often has categories that encode such historical or contemporary hierarchies. This means that category members are implicitly or explicitly ranked according to perceived importance, prestige, and value. European academic art of the seventeenth and eighteenth centuries had a strict hierarchy of genres of paintings, with historical paintings considered to be the most important and still lifes the least important. The culture of the twentieth century was based on a number of binary categories, with one category considered more valuable than another: fine arts versus decorative arts (or design), high culture versus popular culture (or mass culture), avant-garde versus kitsch.8 Our social, lifestyle, and technological systems have changed significantly even just in the last two decades, but such cultural dichotomies established in the nineteenth and twentieth centuries continue to structure how we think about culture. In fact, the dominant view of culture today as used by museums, foundations, and government agencies has not changed from the one expressed by British critic Matthew Arnold in his 1875 book—“culture being a pursuit of our total perfection by means of getting to know, on all the matters which most concern us, the best which has been thought and said in the world . . .” This formulation establishes hierarchical categories. Culture refers to “the best”; everything else is not “culture.” The endless world outside this “best” now has some labels for its different vast continents—culture industries, creative industries, user-generated content, mass media, and so on—but it can’t enter the art museums that carefully guard Arnold’s hierarchy. So you are unlikely to see any respectable art museum presenting an exhibition of great advertising designs, interface designs, Instagram photography, fashion, or any other vibrant area of contemporary culture. This is only possible in design or applied arts museums, such as the London Design Museum, the Victoria and Albert Museum, the Museum of Applied Arts in Vienna, and the Cooper Hewitt Smithsonian Design Museum, or an art museum that has a design or fashion department (the MoMA and the Metropolitan Museum in New York).
It is relevant to recall here that the modern Western opposition between high and low cultures was not universally accepted in the twentieth century. In communist countries—Russia after 1917, and Eastern Europe, China, Cuba, and a number of other countries later—this opposition was not part of the intellectual discourse. Instead, a different opposition was at work—between professional creators and everybody else. The professionals were educated in state schools and belonged to creative unions run by the state. They also had access to resources and were given state commissions. The Western opposition between art and mass media was also not meaningful because the system of professional education, membership in creative unions, and access to resources was the same for painters, composers, film directors, actors, and architects.
We already saw how oppositions between high culture and mass culture and between professional and amateur creators structure the work with big cultural data today. Researchers in computer science and social sciences study social networks, media sharing networks, online forums, blogs, discussion sites (e.g., Reddit and Quora), recommendation sites (e.g., Yelp), and professional networks (e.g., Behance). For me, this is contemporary culture, and its scale, diversity, and global reach are what make it so interesting to study. But for many academic humanists and people who work professionally in high culture, such as art curators, critics, or festival organizers, social media is mass culture (or pop culture) and should not enter museums and galleries, or be the subject of academic studies. From this perspective, an artist is only somebody who has a degree from an accredited art program and an exhibition career that includes commercial galleries, art centers, and art museums.
As an example, in 2017, after I delivered a lecture about our lab projects in a seminar organized by PhD students in art history at one of the most prestigious private US universities, one student asked me: “Why study Instagram at all?” I answered that for me it is a unique window into contemporary photography and global visual imagination. The student responded: “Instagram is a company, and like every company, its goal is to make money. Therefore, it is irrelevant for us art historians.” Unfortunately, I have encountered many people in the academy and the arts with this attitude. Of course, professional artists, collectors, galleries, and art fairs are also concerned with making money, so what is the real difference? I think that academics and high culture professionals do not consider people sharing their creations on social networks or sites such as DeviantArt to be professional artists because they don’t have art degrees or exhibition histories. And therefore they all are automatically dismissed as not worth paying attention to.
The examples I just discussed show how deeply hierarchical categories are ingrained even in the communities (such as particular academic fields) engaged in thinking critically about culture or society. More generally, many social and cultural categories may appear to not have any order, but in reality they do. When you are using such already existing categories as part of your dataset metadata, you may unintentionally reproduce their hierarchy in your analysis.
You probably noticed that many of my examples of such hierarchical categories are binary: male/female, high culture/mass culture, art/design, and so on. Although Stevens’s scheme does not separate them into a special type, such binary categories are extremely important in human history and culture. Structuralism in particular emphasized the role of binary oppositions in language, literature, and myths. According to the anthropologist Claude Lévi-Strauss, who was the most influential structuralist, binary oppositions are essential to human thinking.
Another analysis of binary oppositions was developed by linguists Roman Jakobson and Nikolai Trubetzkoy in the 1930s. It is called the theory of markedness. In many linguistic phenomena, one term dominates another term. This first, unmarked term is seen as more common or as requiring less mental effort; the second, marked term is defined in opposition to it. For example, in the opposition honest/dishonest, the first term is unmarked and the second one is marked. Another example is old/young: in English, a question about somebody’s age usually uses the unmarked term (How old are you?). In his 1972 article, Jakobson suggested that “every single constituent of a linguistic system is built on an opposition of two logical contradictories: the presence of an attribute (‘markedness’) in contraposition to its absence (‘unmarkedness’).”9 The theory of markedness was adopted by social sciences and humanities to critique how oppositions such as gender function in society and language. For example, as one researcher pointed out, “In English order matters. Therefore, what comes first is seen as first in the metaphorical sense—higher ranked. Thus, in the phrase ‘men and women’ women do indeed come second.”10
Let’s now look at the other measurement scales in Stevens’s scheme. Data that uses an ordinal scale has an explicit order. For example, questionnaires often ask people to choose one option using a scale that has the following five choices: strongly agree, agree, neutral, disagree, and strongly disagree. This type of ordinal scale was introduced by American social psychologist Rensis Likert in his 1932 PhD thesis.11 Social sciences, marketing research, and opinion and attitude measurements are among the fields that use questionnaires organized according to Likert’s scale. The other two important characteristics of ordinal data are that it has no absolute zero and that the distances between the points cannot be quantitatively defined.
For data that has an interval scale, we can quantify a degree of difference, but there is no absolute zero. An example of interval data is spatial coordinates. Every point on Earth’s surface can be defined via its longitude and latitude. The choice of zero for longitude and latitude is an accepted convention that allows us to quantify distances and positions of points and regions on the Earth’s surface. Spatial coordinates are often used in cultural analytics when we compare cultural characteristics of different geographic areas, study geographic diffusion of new inventions or cultural phenomena, analyze spatial distributions of social media posts, or visualize movements of cultural creators (for an excellent use of this idea, see “A Network Framework of Cultural History”12).
Finally, we come to the ratio scale. Data that uses a ratio scale can both be quantitatively compared and be measured in relation to zero. Examples of ratio data include weight, length, and angle. Zero weight, zero length, and zero angle are all meaningful concepts corresponding to phenomena in physical reality. As I will discuss in the next section, using ratio data whenever possible, as opposed to only nominal and ordinal data (i.e., categories), is a key part of the cultural analytics methodology. Rather than only relying on discrete categories that natural languages provide for describing analog dimensions of culture, we can use numerical measurements that better represent values on these dimensions (such as brightness or saturation of a color in an image, or speed and trajectory of movement in a dance).
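As a minimal sketch of what such a numerical measurement looks like, the following lines convert one arbitrary RGB color into the HSV model, where saturation and brightness (value) become continuous quantities on a 0–1 scale.

```python
import colorsys

# An arbitrary muted red, specified as RGB values on a 0-255 scale.
r, g, b = 205, 92, 92

# colorsys expects channel values in the 0-1 range.
h, s, v = colorsys.rgb_to_hsv(r / 255, g / 255, b / 255)

print(round(s, 3))  # saturation as a continuous value between 0 and 1
print(round(v, 3))  # brightness (value) as a continuous value between 0 and 1
```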
Most representations of physical, biological, and cultural phenomena constructed by artists, scholars, and engineers so far only capture some characteristics of these phenomena. A linear perspective represents the world as seen from a human-like viewpoint, but it distorts the real proportions and positions of objects in space. A contemporary one-hundred-megapixel photograph made with a professional camera captures details of human skin and separate hairs—but not what is inside the body under the skin.
If the artifacts are synthetic, sometimes it is easy to represent them more precisely. Engineering drawings, algorithms, and manufacturing details used to construct such artifacts are already their representations. However, nature’s engineering can be so complex that even all representational technologies at our disposal can barely capture a minuscule proportion of information. For example, currently the best fMRI machines can capture a brain scan at a resolution of 1 mm. This may seem like a small enough area—yet it contains millions of neurons and tens of billions of synapses. The most detailed map of the universe produced in 2018 by Gaia (the European Space Agency craft) shows 1.7 billion stars—but according to estimates, our own galaxy alone contains hundreds of billions of stars.
And even when we consider a single cultural artifact created by humans and existing on a human scale—a photograph you took, a mobile phone you used to take it with, or your outfit made of items you purchased at Zara or COS—data representations of these artifacts often can only capture some of their characteristics. In the case of a digital photograph, we have access to all the pixels it contains. This artifact consists of 100 percent machine data. However, these pixels will look a bit different to us from one display to the next, depending on its brightness, contrast, and color temperature settings and its technology. Moreover, what we can do with this image is only partially determined by its data. In my 2011 article “There Is Only Software,” I argued that “depending on the software I am using, the ‘properties’ of a media object can change dramatically. Exactly the same file with the same contents can take on a variety of identities depending on the software being used.”13
A digital pixel image is a synthetic artifact fully defined by only one type of data in a format ready for machine processing (e.g., an array of numbers defining pixel values). But what about physical artifacts, such as fashion designs that may use fabrics with all kinds of nonstandard finishes; combine multiple materials, textures, and fabrics; or create unusual volumes? This applies to many collections created by fashion designers such as Rei Kawakubo, Dries Van Noten, Maison Margiela, Raf Simons, and Issey Miyake, among others. How do we translate such clothes into data? The geometries of pattern pieces will not tell us about the visual impressions of these clothes or the experience of wearing them. Such garments may have unique two-dimensional and three-dimensional textures, use ornaments, play with degrees of transparency, and so on. And many fashion designs are only fully “realized” when you wear them, with the garment taking on a particular shape and volume as you walk.
The challenge of representing the experience of material artifacts as data is not unlike calculating an average for a set of numbers. While we can always mechanically calculate an average, this average does not capture the shape of the distribution, and sometimes it is simply meaningless.14 In a normal distribution, most data lie close to the average, but in a bimodal distribution, most data lie away from it, so the average does not tell us much.
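A minimal numerical sketch of this point, with invented values: two datasets can have the same mean even though in one of them no observation comes anywhere near that mean.

```python
import statistics

normal_like = [4.8, 5.0, 5.1, 4.9, 5.2, 5.0]   # values cluster around the mean
bimodal = [1.0, 1.1, 0.9, 9.0, 9.1, 8.9]       # two clusters, nothing near the mean

print(statistics.mean(normal_like))  # ~5.0, typical of the data
print(statistics.mean(bimodal))      # ~5.0, yet no observation is close to it
```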
Similarly, when we try to capture our sensorial, cognitive, and emotional experience of looking at or wearing a fashion garment, all methods we have available—recording a heartbeat, eye movements, brain activity, and other physiological, cognitive, and affective processes, or asking a person to describe their subjective experience and fill out a questionnaire—can only represent some aspects of this experience.
But this does not mean that any data encoding automatically loses information or that our intellectual machines (i.e., digital computers) are by default inferior to human machines (i.e. our senses and cognition). For example, let’s say I am writing about artworks exhibited in a large art fair that features hundreds of works shown by hundreds of galleries. What I can say depends on what I was able to see during my visit and what I remembered—and therefore is constrained by the limitations of my senses, cognition, memory, and body, as well as by the language (English, Russian, etc.) in which I write.
In the humanities, the common method of describing artifacts and experiences was to observe one’s own reaction as filtered by one’s academic training and use natural language for describing and theorizing these experiences. In the social sciences and practical fields concerned with measuring people’s attitudes, tastes, and opinions, researchers used questionnaires, group observations, and ethnographies, and these methods remain very valuable today. Meanwhile, since the 1940s engineers and scientists working with digital computers have been gradually developing a very different paradigm—describing media artifacts such as text, shapes, audio, and images via numerical features. (These descriptions use the ratio scale in Stevens’s scheme; alternatively, they fall into the continuous data category if we use a continuous vs. discrete data scheme.) Cultural analytics adopts the same paradigm, and it is crucial to understand why it is such a good idea. My explanation is summarized in the next paragraph, and the rest of the chapter expands it.
Numerical measurements of cultural artifacts, experiences, and processes give us a new language to describe and discuss culture. This language is closer to how the senses represent analog information. The senses translate their inputs into values on quantitative scales, and this is what allows us to differentiate among many more sounds, colors, movements, shapes, and textures than natural languages can describe. So when we represent analog characteristics of artifacts, interactions, and behaviors as data using numbers, we get the same advantages. This is why a language of numbers is a better fit than human languages for describing analog aspects of culture. (For examples of our visualizations that use computational measurements of images, see plates 3–8, 10 and 16, and figures I.2 and 10.1.)
Natural language was the only mechanism in the humanities for describing all aspects of culture until the recent emergence of digital humanities. Natural or ordinary language refers to a language that evolved in human evolution without planning. Although the origins of natural languages are debated by scientists, many suggest that they developed somewhere between two hundred thousand and fifty thousand years ago. Natural languages cannot represent small differences on the analog dimensions that define aesthetic artifacts and experiences, such as color, texture, transparency, types of surfaces and finishes, visual and temporal rhythms, movement, speed, touch, sound, taste, and so on. In contrast, our senses capture such differences quite well.
The aesthetic artifacts and experiences the human species created over many thousands of years of its cultural history exploit these abilities. In the modern period, the arts started to systematically develop a new aesthetics that strives to fill every possible “cell” of a large multidimensional space of all possible values on all sense dimensions, taking advantage of the very high fidelity and resolution of our senses. Dance innovators from Loie Fuller and Martha Graham to Pina Bausch, William Forsythe, and Cloud Gate group defined new body movements, body positions, compositions, and dynamics created by groups of dancers or by parts of a body, such as fingers, or by speeds and types of transitions. Such dance systems are only possible because of our eyes’ and brains’ abilities to register tiny differences in shapes, silhouettes, and movements.
In the visual arts, many modern painters developed variations of the monochrome painting—works that feature only one field of a single color or a few shapes in the same color that differ only slightly in brightness, saturation, or texture. These include works by Kazimir Malevich (Suprematist Composition: White on White, 1918), Ad Reinhardt (his “black paintings”), Agnes Martin, Brice Marden, Lucio Fontana, Yves Klein, members of the Dansaekhwa movement in South Korea, and many others.
In the twenty-first century, works by contemporary product designers often continue the explorations that preoccupied so many twentieth-century artists. For example, in the second half of the 2010s, top companies making phones—Huawei, Xiaomi, Samsung, Apple—became obsessed with the sensory effects of their designs. Designers of phones started to develop unique surface materials, colors, levels of glossiness of a finish, and surface roughness and waviness. As the phone moves closer and closer to becoming a pure screen or transparent surface, this obsession with sensualizing the remaining material part may be the last stage of phone design before the phone completely turns into a screen—although we may also get different form factors in the future, in which the small material parts become even more aestheticized.15
For instance, for its P20 phone (2017), Huawei created unique finishes, each combining a range of colors. Huawei named them Morpho Aurora, Pearl White, Twilight, and Pink Gold. When you look at the back of the phone from different angles, different colors appear.16 The company proudly described the technologies used to create these finishes on its website: “The Twilight and Midnight Blue HUAWEI P20 has a high-gloss finish made via a ‘high-hardness’ vacuum protective coating and nano-vacuum optical gradient coating.”17 (The Huawei Mate 20 Pro I used during 2019 had such a finish.)
What about the minimalism that became the most frequently used aesthetic in the design of spaces in the early twenty-first century, exemplified by all-white or raw concrete spaces with black elements or other contrasting details? From the moment such spaces started to appear in the West in the second half of the 1990s, I have been seeking them out so I can work in them—hotel areas, cafes, lounges. Today you can find such places everywhere, but in the late 1990s they were quite rare. In my book The Language of New Media, completed at the end of 1999, I thanked two such hotels because large parts of that book were written in their public spaces—the Standard and the Mondrian, both in Los Angeles. While not strictly minimalist in a classical way (they were not all white), the careful choice of textures and materials and the elimination of unnecessary details were certainly minimalist in their thinking. Later, in 2006–2007, I spent summers in Shanghai working on a new book and moving between a few large minimalist cafes there; at that point, Shanghai had more of them than Los Angeles.
At first glance, such spatial minimalism seems to be about overwhelming our perception—asking us to stretch its limits, so to speak, in order to take in simultaneously black and white, big and tiny, irregular and smooth. I am thinking of the famous Japanese rock gardens in Kyoto (created between 1450 and 1500), an example of kare-sansui (“dry landscape”): large black rocks placed in a space of tiny grey pebbles. In 1996, a store for Calvin Klein designed by London architect John Pawson opened in New York on Madison Avenue around Sixtieth Street, and it became very influential in the minimalist movement. Pawson was influenced by Japanese Zen Buddhism, and an article in the New York Times about his store used the phrase “less is less.”18 The photographs of the store show a large open white space with contrasting dark wood benches.19 So what is going on with these examples?
I think that minimalist design uses both sensory extremes for aesthetic and spatial effect, and small subtle differences that our senses are so good at registering. The strong contrast between black and white (or smooth and textured, wood and concrete, etc.) helps us to better notice the variations in the latter—the differences in shapes of tiny pebbles in Kyoto Garden or the all-white parts of the 1996 Calvin Klein store space, which all have different orientations to the light coming in from very large windows.
The most famous early twenty-first-century examples of minimalist design are the all-white or silver-grey Apple products designed by Jonathan Ive in the 2000s. The first in this series was the iPod in 2001, followed by the PowerBook G4 in 2003, iMac G5 in 2004, and iPhone in 2007. In his article “How Steve Jobs’ Love of Simplicity Fueled a Design Revolution,” Walter Isaacson quotes Jobs talking about his Zen influence: “‘I have always found Buddhism—Japanese Zen Buddhism in particular—to be aesthetically sublime,’ he told me. ‘The most sublime thing I’ve ever seen are the gardens around Kyoto.’”20 In the most famous Kyoto garden, which I was lucky to visit, the monochrome surface made from small pebbles contrasts with a few large black rocks. In Apple products of the 2000s, the contrast between the all-white object and the dark, almost black screen when the device is turned off, made from different materials, works similarly. It makes us more attentive to the roundness of the corners, the shadows from the keys, and other gradations and variations in the gray tones and shape of the device.
In general, minimalism is anything but minimal. It would be more precise to call it maximalism. It takes small areas on sensory scales and expands them. It makes you see that between two grey values there are in fact many more variations than you knew (I call this aesthetic, common in South Korea today, “50 shades of grey”): that light can fall on a raw concrete surface in endless ways; that the edge of a sheet of textured paper cut in two by hand contains fascinating lines, volumes, and densities. Our senses delight in these discoveries. And this is likely one of the key functions of aesthetics in human cultures from prehistory to today—giving our senses endless exercises to register small differences, as well as bold contrasts. Minimalism clears our visual, spatial, and sonic environment of everything else, so we can attend to these differences between a few remaining elements. To enjoy that less is less.
For thousands of years, art and design have thrived on human abilities to discriminate between very small differences on analog dimensions of artifacts and performances and to derive both pleasure and meaning from this. But natural languages do not contain mechanisms to represent such nuances and differences. Why? Here is my hypothesis. Natural languages emerged much later in evolution than the senses, to compensate for what the latter cannot do—represent the experience of the world as categories. In other words, human senses and natural languages are complementary systems. Senses allow us to register tiny differences in the environment, as well as nuances of human facial expressions and body movements, whereas languages allow us to place what we perceive into categories, to reason about these categories and communicate using them.
Evolution had no reason to duplicate already available functions, and that is why each system is great at one thing and very poor at another. Our senses developed and continued to evolve for billions of years—for instance, the first eyes developed around five hundred million years ago during the Cambrian explosion. In comparison, the rise of human languages and their categorization capacities is a very recent development.
When we use a natural language as a metalanguage to describe and reason about an analog cultural experience, we are doing something strange: forcing it into a small number of categories that were not designed to describe it. In fact, if we can accurately and exhaustively put an aesthetic experience into words, it is likely that this experience is an inferior one. In contrast, using numerical features instead of linguistic categories allows us to much better represent the nuances of an analog experience. (In Stevens’s scheme, numerical means data that uses ordinal, interval, and ratio scales. Or, if we use a simpler scheme of continuous versus discrete data, numerical means continuous.)
Our sensors and digital computers can measure analog values with even greater precision than our sense organs. You may not be able to perceive a 1 percent difference in brightness between two image areas or 1 percent difference in the degree of smile between two people in a photo, but computers are able to measure these differences. For example, for Selfiecity we used computer vision software that measured the degree of smile in each photo on a 0–100 scale. I doubt that you will be able to differentiate between smiles on such a fine scale.
Consider another example—representation of colors. In the 1990s and 2000s, digital images often used twenty-four bits for each pixel. In such a format, each of the three color channels is encoded on a 0–255 scale. This representation supports sixteen million different colors—while human eyes can only discriminate among approximately ten million colors. Today many imaging systems and image editing software programs use thirty, thirty-six, or forty-eight bits per pixel. With thirty bits per pixel, more than one billion different colors can be encoded. Such precision means that if we want to compare the color palettes of different painters, cinematographers, or fashion designers using digital images of their works, we can do so with more than sufficient accuracy. Certainly, this precision goes well beyond what we can do with the small number of terms for colors available in natural languages.21 Some natural languages have more terms for different colors than other languages, but no language can represent as many colors as digital image formats.
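The arithmetic behind these figures is simple; as a quick check:

```python
# 24 bits per pixel = 8 bits (256 levels) per channel for R, G, and B.
print(2 ** 24)  # 16,777,216 representable colors
# 30 bits per pixel = 10 bits (1,024 levels) per channel.
print(2 ** 30)  # 1,073,741,824 representable colors
```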
In summary, a data representation of a cultural artifact or experience that uses numerical values or features computed from these values can capture analog dimensions of artifacts and experiences with more precision than a linguistic description. However, remember that a natural language also has many additional representation devices besides single words and their combinations. They include the use of metaphors, rhythm, meter, intonation, plot, and other devices that allow us to represent experiences, perceptions, and psychological states in ways that single words and phrases can’t. So though natural languages are categorical systems, they also offer rich tools to go beyond the categories. Throughout human history, poets, writers, and performers have used these tools, and the best hip-hop artists today, such as Oxxymiron and Tatarka, create exceptional works by employing them as well.
Not everybody can invent great metaphors. Numerical features allow us to measure analog properties on a scale of arbitrary precision and they can be extracted automatically from any number of aesthetic artifacts using computers. But this does not mean that data representations of artifacts, processes, and performances that use numbers can easily capture everything that matters.
In the beginning of the twentieth century, modern art rejected figuration and narration and decided instead to focus on sensorial communication—what Marcel Duchamp referred to as “retinal art.” But over the course of the twentieth century, as more possibilities were fully explored and later became new conventions, artists started to create works that are harder and harder to describe using any external code, be it language or data. For example, today we can easily represent the flat geometric abstractions of Sonia Delaunay, František Kupka, and Kazimir Malevich as data about shapes and colors and sizes of paintings and drawings—and we can even encode details of every visible brushstroke in these paintings. (Computer scientists have published many papers that describe algorithmic methods to authenticate the authorship of paintings by analyzing their brushstrokes.) But this becomes more difficult with new types of art made in the 1960s and 1970s: light installations by James Turrell, acrylic 3-D shapes by Robert Irwin, earth body performances by Ana Mendieta, and happenings by Allan Kaprow (to mention only the most canonical examples), as well as the works of thousands of other artists in other countries, such as the Движение (Dvizhenie) art movement in the USSR. The works of the latter included Cybertheatre, staged in 1967 and published in the Leonardo journal in 1969.22 The only actors in this theatre performance were fifteen to eighteen working models of cybernetic devices (referred to as cybers) capable of making complex movements, changing their interior lighting, making sounds, and emitting colored smoke. For something less technological, consider Imponderabilia by Marina Abramović and Ulay (1977): for one hour, members of the public were invited to pass through the narrow “door” made by the naked bodies of the two performers.
The experience of watching the documentation left after an art performance is different from being present at this performance. What can we measure if an artwork is designed to deteriorate over time or quickly self-destructs, like Jean Tinguely’s Homage to New York (1960)? Similarly, while the first abstract films by Viking Eggeling, Hans Richter, and Man Ray made in the early 1920s can be captured as numerical data as easily as geometric abstract paintings by adding time information, how do we represent Andy Warhol’s Empire (1964), which contains a single view of the Empire State Building projected for eight hours? We certainly can encode information about every frame of a film, but what is crucial is the physical duration of the film, its difference from the actual time during shooting, and the very gradual changes in the building’s appearance during this time. The film was recorded at twenty-four frames per second and projected at sixteen frames per second, thus turning a physical six and a half hours into eight hours and five minutes of screen time. (Very few viewers were able to watch it from beginning to end, and Andy Warhol refused to show it in any other way.)
Modern art of the last sixty years is one of the cultural areas that presents a challenge for cultural analytics methods. What about other cultural expressions that exploit three-dimensional textures, transparency, volumes, and our senses of touch, taste and smell—for example, fashion, perfume, food, architecture, and space and object design? I have already discussed the challenges of capturing in numbers the experiences of seeing or wearing many fashion items that work with dimensions that can’t be easily read from fashion photographs—volumes, structures, differences in material surfaces. This is how Hadley Feingold describes a famous dress from Hussein Chalayan’s 2000 spring/summer collection in her post “Sculptural Fashion”: “‘Remote Control’ (often referred to as the Airplane dress) . . . is made of fiberglass and resin composite and has flaps that open via remote control. What is revealed beneath is a soft mass of tulle. What is truly striking here is the modular fabrication, at once sleek and impersonal, that opens up to a soft, more human-like interior that is at the same time no less structured.”23
How can we capture as data at least some of the unique characteristics and subtle differences in numerous designed products and experiences? Think of descriptions of new perfumes, cars, or drinks offered to consumers: here writers employ adjectives, historical references, and metaphors. Now think of the product development, marketing, advertising, and other departments of companies that make and offer these products in many markets. They perform user studies employing questionnaires, focus groups, in-depth interviews, ethnographic methods, and self-reports; they analyze people’s behaviors and interactions with brands and ads on social networks using interaction data; they capture biometric and brain activities using a variety of techniques ranging from heart rate measures to fMRI.
Many techniques used in consumer marketing research, advertising research, and brand management are the same as in sociology, anthropology, human-computer interaction, political science, and especially psychology. In all these areas, methods such as surveys, questionnaires, and physiological recordings are used to quantify aspects of human experiences and human understanding of themselves, other people, products, and situations. These methods compensate for our inability to directly measure human cognitive and emotional processes and states. For example, rather than using some objective measurement of pain, a doctor asks you to characterize your experience of pain on a 1–10 scale. (In 2013, researchers were able to successfully measure levels of pain using fMRI, but given its cost and the equipment required, you will not see this method in a regular doctor’s office today.24) Over time, new technologies and improvements in existing ones gradually improve our ability to directly measure such states, but this is a slow process.
What does this mean for cultural analytics methods? Instead of measuring cultural artifacts, we can measure human perceptions and experiences of these artifacts and our interactions with them. In this paradigm, human experience becomes the common denominator that allows us to bypass the challenge of measuring multisensory or ephemeral offerings. To do this, we can rely on methods used in HCI,25 marketing research, attitude and opinion measurements, and experimental psychology, as well as draw on theories of cultural reception in the humanities. So if we feel that important dimensions of certain types of cultural objects such as fashion, food, designed spaces, and music can’t be captured directly, we can instead measure people’s perceptions of these objects and experiences with them.
But will such a paradigm shift—from measuring artifacts and communication (i.e., extracting features from texts, music, images, video, 3-D designs, etc.) to measuring sensations, perceptions, emotions, feelings, meanings, and attitudes—fully solve the problem? Focusing on a human receiver rather than on artifacts and messages (e.g., blog posts) brings its own challenges.
If we measure body or brain activities using technologies such as eye tracking, EEG, or fMRI, the output is numerical metrics. This data can then be analyzed algorithmically to create more compact numerical representations or mapped into categories. For example, Emotiv makes consumer and professional headsets that record EEG readings and translate them into measurements of levels of interest, excitement, relaxation, focus, and stress. The measurements are presented as values on a 0–100 scale.26 Another company, Affectiva, infers emotional and cognitive states using video and audio of a person. One of its products is the Automotive AI, which measures degrees of alertness, excitement, and engagement, levels of drowsiness, joy, anger, surprise, and laughter, and the overall positivity or negativity of mood. Such monitoring can be used for alerts and recommendations when the driver is engaged, and also can automatically hand control from a driver to a car in semiautonomous vehicles.27
At present the theories behind such measurements do not differentiate among many kinds of stimuli that cause the same kinds and levels of emotions, focus, or relaxation. They can’t help us understand how interactions between a person and an interactive art installation are different from interactions between a driver and a car—or distinguish among the interactive computer installations of Myron Krueger, David Rokeby, Jeffrey Shaw, Masaki Fujihata, or Char Davies (to use examples of important artists who pioneered this genre);28 our responses to fashion collections by Rei Kawakubo and Rick Owens; or buildings by Zaha Hadid and MAD. The same goes for popular sentiment measurements of texts: they can’t distinguish between you reading a political speech and reading Tolstoy’s Anna Karenina.
In general, our technological measurements and data representations of human emotional and cognitive states, memories, imagination, or creativity are less precise and detailed than our measurements of artifacts. While computer vision algorithms can extract hundreds of different features from a single image, EEG and fMRI at present measure fewer dimensions of human cognitive processes. The gradual progress of technology will allow for more precision and specificity—but it can take decades. In contrast, measures of artifacts’ properties are very easy to obtain on a large scale, which is important for cultural analytics. Think of the histograms of three color channels (R, G, and B) that your camera or editing software constructs for any photo. Now imagine similar histograms for line orientations, texture, shapes, faces and objects, and dozens of other dimensions computed over millions of photographs. And now compare this to the measurements of seven “universal human emotions” or levels of excitement and engagement popular today. The difference in fidelity is obvious. Perhaps most importantly, I can carry out image measurements of these millions of photographs on my computer—as opposed to recruiting humans to participate in research that uses EEG, fMRI, and other such technologies29 or getting their permission to use data captured by personal devices such as fitness trackers.
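To make the artifact side of this comparison concrete, here is a minimal sketch of computing per-channel color histograms for a single image, assuming the Pillow and NumPy libraries are installed; "photo.jpg" is a placeholder filename. The same measurement can be repeated over millions of images and extended to other feature dimensions.

```python
import numpy as np
from PIL import Image

# Load one image as an array of shape (height, width, 3).
img = np.asarray(Image.open("photo.jpg").convert("RGB"))

# One 256-bin histogram per channel, like the ones camera software displays.
histograms = [
    np.histogram(img[..., channel], bins=256, range=(0, 256))[0]
    for channel in range(3)
]

for name, counts in zip("RGB", histograms):
    print(name, counts[:5], "...")  # first few bins of each channel
```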
The challenge of precisely characterizing human responses to cultural objects and situations, and doing it at scale, can’t be solved in a simple way. This is why in our own lab we have been privileging the analysis of objects rather than reactions—although, given my background in experimental psychology, I could also have pursued the latter. However, I think that quantitative analysis of responses, participation, and interaction will eventually become a popular or even the most important method for studying culture. In academic disciplines concerned with culture—from literary studies, architecture theory, and visual culture to media studies, urban anthropology, and internet studies—researchers so far have not adopted the techniques for cognitive and emotional measurement widely used in interaction design or marketing research (this field is called neuromarketing). So if you want to get two degrees—one in the humanities, social sciences, or design, and the second in neuroscience—you will be equipped to participate in the future theoretical and practical shift from studying objects to studying perceptions and experiences. For example, imagine a film review that does not talk about the film’s story and characters but instead analyzes data on viewers’ perception, cognition, and emotions.
I have argued that extracting numerical features is often a better way to represent cultural phenomena than using linguistic categories. However, would you want to visit a museum of contemporary art that does not use any categories at all and instead organizes all its objects, regardless of their origins, only by numerical measurements, such as image size, proportions, colors, grayscale histogram, or line curvatures? Actually, I would. To suspend all categories in this way would be a refreshing experience. However, given the role played by languages in how we understand the world and communicate with others and the fact that natural languages developed evolutionarily to categorize, cultural categories are not going to disappear tomorrow.
Cultural categories are instruments of power, used to include, exclude, dominate, and liberate. The evolution of human culture includes changes in categorical systems and “wars of categories.” For example, during the Cold War (1947–1991), countries were categorized as belonging to the first world (capitalist countries), the second world (developed communist countries), or the third world (developing nonaligned countries). After the collapse of communist governments in 1989–1991, the third world eventually came to be called the Global South—as opposed to the Global North. This new term “emerged in part to aid countries in the southern hemisphere to work in collaboration on political, economic, social, environmental, cultural, and technical issues.”30
As a part of this reconfiguration of conceptual geography, the distinction between the first and second world faded away. Was it appropriate to dissolve their differences? How do we account for the hysteresis effect of the communist past of the former second world today? Recently, the term Global East has been used to refer to these countries.31 Like Global South, the new term aims to increase visibility and to compensate for “a double exclusion in analysis, whereby post-socialist cities are neither at the centre nor the periphery, neither mainstream nor part of the critique.”32
Categories structure our views of social, economic, and cultural phenomena. They can reshape both how we view the past and the futures we construct—by channeling our energy and resources in particular directions. The goal of cultural analytics is not to abandon all historical, genre, media, and other cultural categories. Instead, we want to examine systems of cultural categories that are taken for granted. This means asking a number of questions:
Given that all cultural institutions still use categorical systems to structure their own and our understandings of cultural artifacts, as well as their production, exhibition, and archiving, examination of these categories is an important part of the cultural analytics program. For example, Wikipedia’s “List of Subcultures” article has 130 entries,34 while Japan’s manga industry classifies manga using four categories of readers defined by age and gender, and further by a few dozen genres. Whenever you are working with existing or creating new cultural datasets, remember that you can modify existing categories for research purposes or define new ones. You should also consider how the existing categories are organized (e.g., as a hierarchy or as a flat system), how they developed and changed over time, the differences in categories used in different geographic regions and different institutions, and the relations between categories used by professionals, academics, and general audiences.
Similar questions are often asked in the humanities and social sciences, so is there something unique that cultural analytics brings here? I think so. In their influential 1999 book Sorting Things Out: Classification and Its Consequences, Geoffrey Bowker and Susan Leigh Star write: “A classification is a spatial, temporal or spatio-temporal segmentation of the world. A classification system is a set of boxes (metaphorical or literal) into which things can be put in order to then do some kind of work—bureaucratic or knowledge production.”35 The idea of segmentation is very relevant. Cultural phenomena and their particular dimensions are often continuous, but the representations of these phenomena by cultural and academic institutions and discourses segment them into discrete categories. However, because we can now measure continuous dimensions with algorithms and represent them as numbers with arbitrary precision, discrete categories are no longer the only choice. Instead, we can represent cultural phenomena as distributions of continuous features. These distributions can then be compared with existing discrete categories for the same phenomena.
One of the key ideas of cultural analytics is to combine two directions of analysis: top-down analysis using existing categories and bottom-up analysis using extracted continuous features. Bottom-up here means extracting features and then visualizing the phenomenon using these features. We can visualize distributions of single features (histograms), two features together (scatter plots or heatmaps), or multiple features (pair-wise scatter plots; parallel coordinates; scatter plots that use MDS, PCA, t-SNE, UMAP, or other dimension-reduction techniques). Top-down here means superimposing existing cultural categories for our data onto these visualizations (using color or another technique). In addition to visualization, we have a number of unsupervised machine learning methods for examining the structure of feature space, such as cluster analysis and dimension reduction. These methods are part of the cultural analytics toolkit. We used them in many of our lab projects (e.g., see the visualization in plate 3). They are covered in numerous data science classes, textbooks, and tutorials, so you can learn them on your own. The section called “Analysis Examples” later in this chapter illustrates using top-down and bottom-up analysis together.
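As a minimal sketch of combining the two directions—using PCA from scikit-learn and matplotlib, with randomly generated features and category labels standing in for real measurements and metadata—the bottom-up step reduces the feature space to two dimensions, and the top-down step colors the resulting points by their existing categories.

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
features = rng.normal(size=(300, 20))  # e.g., visual features extracted per artifact
categories = rng.choice(["genre A", "genre B", "genre C"], size=300)  # existing metadata

# Bottom-up: examine the structure of the feature space via dimension reduction.
coords = PCA(n_components=2).fit_transform(features)

# Top-down: superimpose the existing categories onto the visualization as color.
for cat in np.unique(categories):
    mask = categories == cat
    plt.scatter(coords[mask, 0], coords[mask, 1], s=10, label=cat)
plt.legend()
plt.xlabel("PC 1")
plt.ylabel("PC 2")
plt.show()
```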
I believe that the majority of cultural artifacts and phenomena in human history have continuous dimensions best represented by numerical features. In contrast, the majority of our conceptual landscapes of cultural fields today and in the past consist of discrete categories. By superimposing these discrete categories on continuous distributions of features, we can better see how the categories divide the phenomena and if they are adequate. For example, the categories may correspond to breaks in features’ distributions, which means that they do capture real divisions. Or they may arbitrarily divide a continuous distribution, leading us to think that there are distinct classes when in reality they don’t exist. Or the distributions may have distinct breaks, but they are not reflected in the categories (see plate 9).
We can now add two more questions to the three listed earlier:
Only after we ask these questions can we with some confidence decide what is more appropriate for describing a given phenomenon: categories or numerical features.
Let’s say you receive a cultural dataset from a museum, a public depository on the web, or any other source. The dataset lists some artifacts and includes some categories. Will you focus on analysis of data using categories, without having to extract features? Or is it better to extract various features right away? I believe that you can begin with one of the following two hypotheses as a starting point.
Hypothesis 1: If the cultural artifacts were created using prescriptive aesthetics, the existing categories are likely to be meaningful. This is often the case with historical cultural phenomena before the twentieth century. However, you may still want to extract numerical features from your dataset to examine the variations within each category and to check how good the fit is between categories and features. Following this analysis, the categories may need to be revised.
Hypothesis 2: If the cultural artifacts were created by authors who did not follow any prescriptive aesthetics (i.e., there were no explicit rules) and we can easily observe significant variability in the dataset, representing these artifacts using continuous features is appropriate. This is often the case for the modernist period after 1870, when the goal of many artists was to keep inventing the new. (However, many others continued to practice prescriptive aesthetics.)
As an example of prescriptive aesthetics, consider the architectural orders of ancient Greek and Roman civilizations. Originally the Greeks used three orders—the Doric, Ionic, and Corinthian. The Romans added two more: the Tuscan and the Composite. The orders define the details of building columns, including their proportions, decoration, and profiles, as well as other building elements. Like Instagram filters today, choosing a particular order does not dictate the complete design of the building—but it does give it a particular “look.” In European art, prescriptive aesthetic systems were particularly important during the seventeenth and eighteenth centuries. For example, in theatre, French playwrights followed a system of rules that called for unity of time, unity of place, and unity of action: the action had to unfold within a single twenty-four-hour period, in a single location, and follow a single plot line.
Modernist artists revolted against prescriptive aesthetics and conventions. But this revolt took a strange direction: groups of artists aided by art theorists or journalists started to create their own new prescriptive aesthetics, and each group claimed that their aesthetics was the only true modern art. These movements included futurism, fauvism, cubism, orphism, rayonism, expressionism, vorticism, constructivism, surrealism, and others.
A few of these styles, such as cubism, constructivism, and surrealism, became popular and were adopted by other artists. Still, all the twentieth-century artworks that follow various isms account for only a tiny part of the professional art produced during the century. The much larger remaining part is omitted from the standard art historical narrative of modern art presented as a progression through isms. These other artists did not write manifestoes and did not create brands. We have only a single category for these millions of artworks: figurative art (or realist art). Obviously, this one category can’t capture all the different visual languages, types of content, feelings, and sentiments seen in figurative artworks created in different countries during the century. Creating large datasets of these artworks, extracting numerical features, and then visualizing and clustering them using these features may allow us to develop more inclusive maps of modern art.
Modernism also gave rise to many prescriptive aesthetic systems. The twelve-tone technique in music (1921–), neoplasticism, the writing of the Oulipo group (1960–), the Dogme 95 cinema movement (1995–), Apple’s Human Interface Guidelines (1987–), and the flat design movement in UI (2006–) are some relevant examples. It is important to remember that a prescriptive aesthetic, design, or communication system does not lock down all elements of a work; it only limits variation along some dimensions, or even a single dimension, such as the original 140-character limit on Twitter or the original square image format on Instagram. Extracting features and using numerical data representation allows us to capture variations across many works that follow some prescriptive aesthetics.
However, introducing subcategories inside existing categories or adding new categories to an existing system can also be a radical move. If we have a system of five categories describing some cultural field and we enrich it by adding fifteen more, this increases the resolution of the map these categories provide. Sometimes we may need five hundred categories, and other times five hundred thousand. Today the processes for creating such large categorical systems are algorithmic (e.g., cluster analysis), so we don’t need to decide a priori how many categories we will have or fix the criteria in advance.
Thus, categorical systems function in new ways in the data science era—they can be generated on the fly, changed at any time, and have as many members as necessary. Instead of always being rigid and constraining, categories acquire dynamism, flexibility, and plasticity. It is therefore wrong to assume that cultural analytics always aims to replace existing cultural categories with continuous numerical features. Quantifying a phenomenon and using data science methods to establish a more detailed categorical system can be just as productive. In this respect, cultural analytics is the opposite of the movement in structuralism to reduce the variety of cultural phenomena to a small number of fundamental structures and binary oppositions (de Saussure, Lévi-Strauss, Greimas). Instead of such reduction, cultural analytics wants to multiply and diversify categories, replace rigid categories with fuzzy ones, and recategorize phenomena using computational methods.
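As a sketch of what “generated on the fly” can mean in practice, the following Python fragment clusters artifacts on their extracted features and lets a simple quality score suggest how many categories to use. The file name and the choice of k-means with a silhouette score are my assumptions, not a method prescribed in this chapter.

```python
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler

df = pd.read_csv("features.csv")                     # hypothetical file name
X = StandardScaler().fit_transform(df.select_dtypes("number"))

# Try a range of category counts and keep the one with the best silhouette
# score, rather than fixing the number of categories a priori.
best_k, best_score = 2, -1.0
for k in range(2, 21):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    score = silhouette_score(X, labels)
    if score > best_score:
        best_k, best_score = k, score

df["algorithmic_category"] = KMeans(n_clusters=best_k, n_init=10,
                                    random_state=0).fit_predict(X)
print(f"chosen number of categories: {best_k} (silhouette = {best_score:.2f})")
```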
For a good example of such algorithmic recategorization, consider maps of science—network graphs showing connections among many publications in academic fields.36 One well-known map created in 2007 used 7.2 million papers published in sixteen thousand academic journals and indexed in Elsevier’s Scopus (2001–2005) and Thomson Reuters’s Web of Science indexes for science, social science, and arts and humanities (2001–2004).37 The map shows connections among research paradigms, with colors indicating larger divisions, such as social sciences, humanities, brain research, and so on. Rather than using standard lists of academic disciplines, the authors proposed a new method for algorithmically discovering science paradigms. In the words of the researchers: “The problem is simple: disciplines don’t capture the unique multidisciplinary activities of sets of researchers. Researchers at a university or located in a region (state or nation) tend to self-organize around sets of multi-disciplinary research problems.”38 Therefore, rather than taking the system of academic disciplines for granted, the authors clustered individual articles using joint citations. The clustering produced 554 research paradigms. A comparison of the maps of science generated by the new and old methods showed that the new method better captures the research strengths of a single university or a country. One of our 2008 designs for a hypothetical cultural analytics interface uses a map of science layout, with clusters of works or types of aesthetics acting as the equivalent of academic research paradigms (see plate 1).
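The following toy sketch illustrates the general idea of grouping papers by shared citations rather than by discipline labels. It is not the algorithm used by the authors of the map of science, and the citation data is invented for the example.

```python
import itertools
import networkx as nx
from networkx.algorithms.community import greedy_modularity_communities

# paper -> set of cited references (toy, invented data)
citations = {
    "paper_1": {"ref_a", "ref_b", "ref_c"},
    "paper_2": {"ref_b", "ref_c"},
    "paper_3": {"ref_x", "ref_y"},
    "paper_4": {"ref_x", "ref_y", "ref_z"},
}

# Connect papers that cite the same references;
# edge weight = number of shared references.
G = nx.Graph()
for a, b in itertools.combinations(citations, 2):
    shared = len(citations[a] & citations[b])
    if shared:
        G.add_edge(a, b, weight=shared)

# Each detected community plays the role of a "research paradigm".
paradigms = greedy_modularity_communities(G, weight="weight")
print([sorted(c) for c in paradigms])
```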
Today we encounter computationally generated categories in our everyday digital lives. The following example illustrates this using the targeting options in Twitter Ads—a Twitter service available to all its users. Targeting means selecting a particular audience for advertising messages. In Twitter’s case, this may mean showing my tweets to Twitter users beyond my followers, who may see these tweets anyway. The standard targeting method is to select the audience using explicit categories: for example, I want my tweets to be shown to people of both genders, age 25 to 34, in particular countries. But Twitter also offers a newer algorithmic method (2014–) to select “follower look-alikes.” (Facebook offers a similar “lookalike audiences” option.) To use this method, I first need to specify some users—for example, by uploading a list of particular accounts. The algorithm then automatically finds a new audience with characteristics similar to the users on the list. Alternatively, I can ask Twitter to build a new audience of users with characteristics similar to my existing followers.
Importantly, the category of follower look-alikes is not defined explicitly; that is, I don’t need to select any values on any parameters. Twitter’s algorithms compute features of my followers automatically and find users with similar characteristics. And if this system uses supervised machine learning, it’s likely that nobody can tell what these features are and how they are combined because this information is distributed over the millions of connections of a neural network. Here we have a new type of category: it is not defined by a human and can change at any moment. According to many reports, this method works better than traditional audience selection, so it’s used by millions of people and businesses advertising on social networks every day.
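We do not know the platforms’ actual models, but the general look-alike idea can be sketched as a nearest-neighbor search over user feature vectors: describe the seed users (e.g., my followers) as vectors and retrieve the most similar users from the whole user base. All data below is randomly generated for illustration.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

rng = np.random.default_rng(0)
all_users = rng.normal(size=(10_000, 32))   # invented feature vectors for a user base
seed_users = all_users[:50]                 # stand-in for "my existing followers"

# For every seed user, retrieve the most similar users in the whole base.
nn = NearestNeighbors(n_neighbors=20, metric="cosine").fit(all_users)
_, neighbor_idx = nn.kneighbors(seed_users)

# The union of the neighbors, minus the seeds themselves, plays the role
# of the "look-alike audience".
lookalikes = set(neighbor_idx.ravel().tolist()) - set(range(50))
print(len(lookalikes), "look-alike users selected")
```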
Imagine that one million people are using the look-alike method today, so Twitter’s algorithms build one million categories. We don’t know how they are defined exactly, but they perform. This is quite a radical departure from traditional category systems.
Humanists like to refer to a quote from a 1942 essay by Jorge Luis Borges entitled “The Analytical Language of John Wilkins” that became famous after Foucault used it in his book The Order of Things: An Archaeology of the Human Sciences (1966).39 The essay says that “according to some Chinese encyclopedia . . . animals are divided into: (a) belonging to the Emperor, (b) embalmed, (c) tame, (d) suckling pigs, (e) sirens, (f) fabulous, (g) stray dogs, (h) included in the present classification, (i) frenzied, (j) innumerable, (k) drawn with a very fine camelhair brush, (l) et cetera, (m) having just broken the water pitcher, (n) that from a long way off look like flies.” This quote is often invoked to support the idea that categories can be arbitrary and nonsystematic and can differ from culture to culture. This is all true—but to me it is more interesting to ask how categories function in our data society and how this differs from earlier periods. Algorithms that can process very large datasets almost instantly allow the generation of dynamic categories, categories that are not defined explicitly but instead follow patterns that computers detect in the data, and systems with many categories rather than only a few.
Having introduced the idea that top-down and bottom-up analysis can be combined, I will now illustrate this with two examples from our lab projects. The first example uses a dataset of 776 digital images of paintings Vincent van Gogh created between 1881 and 1890. The images were collected by students in my classes in 2010 from public websites. We included this dataset in the free distribution of our ImagePlot visualization software (2011–).40 Along with metadata for each painting, such as the title, I added a number of visual features extracted from each image: the mean, median, and standard deviation of brightness, saturation, and hue; the number of distinct shapes; and the average shape size. (A shape here is any area in the image that is perceived as distinct because it has a different color or brightness from other shapes.) We developed software that uses MATLAB and OpenCV to extract hundreds of other visual features, but since the van Gogh dataset was meant for learning ImagePlot, I wanted to keep the list of included features short.
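For readers who want to compute comparable features themselves, here is a simplified Pillow/NumPy sketch of the brightness, saturation, and hue statistics described above; it is not the lab’s original MATLAB/OpenCV pipeline, and it omits the shape features.

```python
import numpy as np
from PIL import Image

def color_features(path):
    """Mean, median, and standard deviation of hue, saturation, and brightness."""
    hsv = np.asarray(Image.open(path).convert("RGB").convert("HSV"), dtype=float)
    h, s, v = hsv[..., 0], hsv[..., 1], hsv[..., 2]   # each on a 0-255 scale
    features = {}
    for name, channel in [("hue", h), ("saturation", s), ("brightness", v)]:
        features[f"{name}_mean"] = float(channel.mean())
        features[f"{name}_median"] = float(np.median(channel))
        features[f"{name}_std"] = float(channel.std())
    return features

# Usage with a hypothetical file name:
# print(color_features("vangogh_1888_example.jpg"))
```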
I decided to use digital images of van Gogh’s paintings for a few reasons. With many artists, we don’t know exactly when their works were created. But for van Gogh, we know the year, the month, and often even the week for most paintings because he described his new paintings in over seven hundred letters to his brother, who supported the artist financially. (All letters with links to the images of the paintings they describe are available from the excellent vangoghletters.org website.) Because we know the month when each painting was created, these 776 paintings covering only ten years (1881–1890) are perfect for studying gradual changes in the artist’s visual and semantic language.
The second reason is the existence of well-established categories for understanding and presenting van Gogh’s art. These categories are the places where he lived: Belgium and the Netherlands (1880–1886), Paris (March 1886–January 1888), Arles (February 1888–April 1889), Saint-Rémy (May 1889–May 1890), and Auvers-sur-Oise (May–July 1890). Many art historical and popular accounts divide van Gogh’s artistic biography into style periods corresponding to these places. Thus, geographic categories are used to rationalize stylistic categories.
For instance, this is how the Vincent van Gogh Museum in Amsterdam describes the changes in van Gogh’s style after he moves to Paris in 1886: “Soon after arriving in Paris, Van Gogh senses how outmoded his dark-hued palette has become. . . . His palette gradually lightens, and his sensitivity to color in the landscape intensifies . . . his brushwork [becomes] more broken.”41 And this is a description of the artist’s works that he created after he moved to Arles in the South of France in 1888: “Inspired by the bright colors and strong light of Provence, Van Gogh executes painting after painting in his own powerful language. Whereas in Paris his works covered a broad range of subjects and techniques, the Arles paintings are consistent in approach, fusing painterly drawing with intensely saturated color.”42
Are the descriptions of the differences between these periods accurate? Do changes in the artist’s style indeed perfectly correspond to his moves from one place to another? And is it appropriate to think of his (or any other artist’s) development as a succession of distinct periods? If art historians acknowledge that some changes are gradual (“his palette gradually lightens”), can we make these statements more precise and quantify such changes? These are all good questions for the combined top-down and bottom-up analysis, which I will illustrate with three visualizations.
In the first visualization, in plate 6, we see images of 776 van Gogh paintings positioned according to their average brightness (y-axis) and their dates, represented by year and month (x-axis). Here I use the median as the measure of average brightness. Although we are only considering a single visual dimension, it already becomes clear that the well-established view—that the artist’s style systematically changes as he moves from place to place—does not hold. During van Gogh’s time in Paris, Arles, and Saint-Rémy, he continues to create some very dark paintings typical of his earlier years. Moreover, even within short periods of time, the range of brightness in the paintings the artist makes is significant. To me this suggests two ideas. One is that we should not think of a style as a narrow line that moves through time. Instead, it is more like a wide river that does change direction, but only gradually. The second is that the new visual inventions van Gogh makes in each place where he comes to live do not apply to all the works created there. Instead, the new coexists with the old (e.g., the very dark paintings that we still find after van Gogh moves to France).
The second visualization, shown in plate 7, allows us to compare all 776 images using two visual features together: median brightness (x-axis) and median saturation (y-axis). On the brightness axis, the earlier paintings created from 1881 to 1885 are mostly on the left; the paintings created from 1885 to 1890 occupy the center and the right part. On the saturation dimension, his mature paintings occupy the lower part; that is, their average saturation is not high. Even his famous Arles paintings still rarely fall into the upper part. This is a surprising finding given the museum characterization of these paintings as having “intensely saturated color.” The statement is not wrong, but it is imprecise. Some of the Arles paintings are more saturated than the Paris ones, but not all of them.
To investigate further the differences between the Paris and Arles periods, I created a third visualization, which compares all Paris and all Arles paintings in our dataset side by side (see plate 8). We use the same features as in the previous visualization: median brightness (x-axis) and median saturation (y-axis). Notice how the brightness and saturation values of the Paris and Arles paintings significantly overlap. This strengthens what the first visualization already suggested: the commonly accepted division of van Gogh’s works into stylistic periods based on places where he was living may need to be reconsidered.
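A plot of this kind can be produced with a few lines of code, assuming a hypothetical table of per-painting features with a “period” column; the file and column names are my assumptions.

```python
import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv("vangogh_features.csv")      # hypothetical file and column names
subset = df[df["period"].isin(["Paris", "Arles"])]

# Each painting becomes a point: median brightness (x) vs. median saturation (y),
# colored by the place-based period label.
for period, group in subset.groupby("period"):
    plt.scatter(group["brightness_median"], group["saturation_median"],
                s=10, alpha=0.6, label=period)
plt.xlabel("median brightness (0-255)")
plt.ylabel("median saturation (0-255)")
plt.legend()
plt.show()
```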
One of the reasons for the conventional opinion that van Gogh’s style was changing radically is that we are used to looking at a small number of works. In general, when we think about a particular artist, we often consider only their most famous works. Such works may exaggerate the differences among periods. The formation of van Gogh’s canon—his most often reproduced works—has a long and complex history, but it is possible that one of the reasons certain works were selected over others is that they emphasize these differences. When we systematically compare most of the paintings created in Paris and Arles by visualizing them using extracted features, we can see that the differences between the two sets are smaller than we would expect from looking only at the famous works.
We can also now better understand the nature of these differences. First, van Gogh’s paintings created in Paris have significantly more variability in both brightness and saturation values than the paintings created in Arles. Second, the center of the “cloud” formed by the Arles paintings in plate 8 is shifted toward higher values on both axes. In other words, the Arles paintings are overall both lighter and more saturated than the Paris paintings. However, this difference is smaller than the usual narrative about van Gogh’s art may suggest. Calculating the averages and standard deviations of brightness and saturation for the paintings from the two periods quantifies these observations. The brightness averages of all paintings created in the two cities are 129.83 (Paris) and 158.51 (Arles); the averages of saturation are 95.70 and 109.28, respectively (on a 0–255 scale). The standard deviations of brightness values are 51.65 (Paris) and 34.71 (Arles); the standard deviations of saturation values are 40.59 and 36.30, respectively.
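The same kind of summary can be computed with a simple group-by over the per-painting features (again assuming hypothetical file and column names):

```python
import pandas as pd

df = pd.read_csv("vangogh_features.csv")      # hypothetical file and column names
stats = (df[df["period"].isin(["Paris", "Arles"])]
         .groupby("period")[["brightness_median", "saturation_median"]]
         .agg(["mean", "std"]))                # per-period averages and spreads
print(stats.round(2))
```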
The measurements of the spread of brightness and saturation values support one of the statements on the museum site: “Whereas in Paris his works covered a broad range of subjects and techniques, the Arles paintings are consistent in approach, fusing painterly drawing with intensely saturated color.”43 Indeed, the standard deviations of the brightness and saturation averages across all Arles paintings are smaller than those across all Paris paintings. We also now understand that the intensity of many Arles paintings is achieved by changes in two visual characteristics working together, as opposed to changes in saturation alone. In other words, van Gogh simultaneously increases both the saturation and the brightness of his colors. In fact, the average change in brightness is larger than the change in saturation: 18 percent versus 12.4 percent.
Of course, these two features do not cover all aspects of van Gogh’s paintings; if we want to characterize his visual languages more fully, we will need to create a number of such representations using different combinations of features.44 (Three different visualization programs we wrote in the lab support the rapid creation of such multiple visualizations.) Alternatively, we can use unsupervised machine learning methods that project many features onto a lower-dimensional 2-D or 3-D space that we can see directly. Together these methods are referred to as dimension reduction. For example, the visualization in plate 3 uses a method called principal component analysis (PCA), applied to two hundred features we extracted from images of impressionist paintings. In contrast to plots that sort images according to two features (e.g., average brightness and saturation), PCA and other dimension-reduction methods can group images according to many features at the same time.
Let’s now look at the second example of using existing cultural categories and feature extraction together. One prominent example of cultural categories is genres. In digital humanities, the quantitative analysis of literary genres in history has led to some of the most interesting work in the field. This work includes Franco Moretti’s investigation of patterns in the rise and fall of forty-four genres of British novels from 1740 to 1990;45 Ted Underwood and Jordan Sellers’s tracing of the gradual separation of the literary languages of novels, poetry, and drama from ordinary everyday language between 1700 and 1900; and other projects.46
But genres are also important for contemporary culture. We can look at genre categories established in the culture industry, using as a starting point many theories of genres proposed by theorists of popular music, cinema, and other fields. We can also apply computational text analysis to discussions in online fan communities and media sharing sites to study how audiences understand and use genre categories. In all these cases, applying top-down and bottom-up analysis allows us to compare two maps of a cultural field: one with discrete categories and another showing continuities using extracted features. I will describe our lab project One Million Manga Pages (2010–2011) to illustrate how these ideas work in practice.47
The project analyzes 883 manga publications, using data and over one million images downloaded from the fan site OneManga (onemanga.com) in 2009.48 At the time, this was the most popular site for scanlation—manga publications scanned by fans, with the text translated into different languages. The manga publications are structured as series of chapters that may appear over periods ranging from a few months to many years. The metadata for each series we downloaded included names of authors and artists, publication periods, the intended audience, and tags describing their genres.
The Japanese manga publishing industry divides the market for manga into four categories based on age and gender: teenage girls, teenage boys, young women, and young men (shoujo, shōnen, josei, and seinen). Each manga publication series was assigned one of these categories. The fans used thirty-five genre categories to tag manga series on the site. Many series had multiple tags. For example, the very popular series Naruto was tagged as action, adventure, anime, comedy, drama, and fantasy; Bleach had all these tags, plus supernatural; Nana was tagged as anime, drama, live action, and romance. I found that the average number of genre tags was 3.17 for shoujo manga and 3.47 for shōnen manga, indicating more genre diversity in the tags used for titles intended for male teenagers.
Figure 7.1 shows a network graph representing connections among all genre categories. Line thickness indicates the strength of a connection—that is, how often two tags are used together. The size and brightness of a genre name indicate the frequency of that tag: more frequent tags are bigger and darker. We can see that some genres are connected to many others, while others are connected to only a few. Thus, this graph maps what we can think of as genre affordances: which combinations are very popular, which combinations are possible although not popular, and which combinations are impossible.
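The graph itself can be assembled directly from the per-series tag lists by counting how often each pair of tags co-occurs. The sketch below uses invented toy data in the format described above (the real dataset has 883 series).

```python
import itertools
from collections import Counter
import networkx as nx

# Toy tag data; series names and tags follow the examples given in the text.
series_tags = {
    "Naruto": ["action", "adventure", "anime", "comedy", "drama", "fantasy"],
    "Bleach": ["action", "adventure", "anime", "comedy", "drama", "fantasy",
               "supernatural"],
    "Nana":   ["anime", "drama", "live action", "romance"],
}

# Count how often each pair of tags appears on the same series.
pair_counts = Counter()
for tags in series_tags.values():
    pair_counts.update(itertools.combinations(sorted(set(tags)), 2))

G = nx.Graph()
for (a, b), weight in pair_counts.items():
    G.add_edge(a, b, weight=weight)     # line thickness in the figure ~ this weight

# Tag frequency (node size/darkness in the figure).
tag_counts = Counter(tag for tags in series_tags.values() for tag in tags)
print(tag_counts.most_common(5))
```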
Figure 7.2 is a graph showing connections between genre categories and audience categories. To make the graph easier to read, I included only the twelve most frequent tags. Some connections are predictable (male teenagers and young men prefer action), but others are less so. For example, both male and female teenagers turn out to prefer comedy, but young men do not. The connection between the romance and drama genres is stronger than that between romance and comedy; most romances are set in schools (tagged as school life). Interestingly, none of the top genres is exclusive; all of them are connected to other genres. Serious manga fans and industry professionals may be aware of such patterns—but likely not all of them, since some patterns appear only when we map hundreds of titles together.
In addition to the metadata for 883 manga publications, we also used for analysis the images of all pages of these publications available on the site—1,074,790 unique pages in total. Each manga page contains a few frames featuring grayscale drawings. In figure I.1, you can see an image plot of all pages organized by two visual features. A number of pages make up one chapter, and a number of chapters together make up one publication. Each publication is drawn by a single author (sometimes with the help of assistants) and has a consistent visual style. We wanted to see if there are connections among these styles, the four audience segments of the manga industry, and the forty-one genres tagged by fans. In other words, are there systematic differences in some aspects of the style of manga publications authored for different audiences? And are there also systematic differences among manga in different genres?
Defining and measuring aspects of visual style is a challenging problem, and in the One Million Manga Pages project we were only able to start exploring this for manga. However, these initial explorations led to interesting results. We extracted eight grayscale features from each page in our dataset and compared their summary statistics for publications designed for female and male audiences. The average brightness of all manga drawings in our dataset from shoujo publications (for teenage girls) was 203.17 (measured on a 0–255 scale). For manga drawings from shōnen publications (for teenage boys), it was significantly darker: 184.19. The difference was even stronger between manga aimed at young women and young men (josei and seinen manga): 205.45 versus 184.44. Comparing the average values for all the other visual features confirmed the presence of statistically significant differences.
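The chapter does not specify which statistical test was used; as one common choice, a two-sample (Welch’s) t-test on per-page brightness for two audience groups could look like this, with the file and column names as assumptions:

```python
import pandas as pd
from scipy import stats

pages = pd.read_csv("manga_page_features.csv")           # hypothetical file and columns
shoujo = pages.loc[pages["audience"] == "shoujo", "brightness_mean"]
shounen = pages.loc[pages["audience"] == "shounen", "brightness_mean"]

# Group means (the text reports ~203 vs. ~184) and a significance test.
print(round(shoujo.mean(), 2), round(shounen.mean(), 2))
t, p = stats.ttest_ind(shoujo, shounen, equal_var=False)  # Welch's t-test
print(f"t = {t:.2f}, p = {p:.3g}")
```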
This analysis shows that visual style in manga is used to construct gender differences. However, the gender spaces defined by style are not completely separate; instead, they overlap strongly. For me this was the most interesting result because it reveals a more complex picture of gender/style spaces, a picture that becomes visible when we plot the distributions of features. Plate 9 shows histograms of the average grayscale values for shoujo and shōnen manga pages and a scatter plot in which each manga drawing is represented by a point. The standard deviation of a drawing’s grayscale values is mapped to the x-axis; the entropy of these values is mapped to the y-axis. As we can see, the distributions of shoujo and shōnen features overlap. (The two types of manga are marked using blue and pink colors in the visualizations.) Not all shoujo manga is lighter than shōnen manga; some titles are rather dark in tone. Similarly, not all shōnen manga is darker; some titles are relatively light. Each distribution has a bell-like shape. Using numerical features allows us to plot these distributions and see how the discrete categories (manga audiences) and continuous features relate to each other.
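The two measures plotted in plate 9 are straightforward to compute per page. Here is a Pillow/NumPy sketch (not the project’s original code) of the grayscale standard deviation and entropy.

```python
import numpy as np
from PIL import Image

def grayscale_std_entropy(path):
    """Standard deviation and entropy of a page's grayscale values."""
    values = np.asarray(Image.open(path).convert("L"), dtype=float).ravel()
    hist, _ = np.histogram(values, bins=256, range=(0, 255))
    p = hist / hist.sum()
    p = p[p > 0]                              # drop empty bins before taking logs
    entropy = float(-(p * np.log2(p)).sum())
    return float(values.std()), entropy

# Usage with a hypothetical file name:
# print(grayscale_std_entropy("manga_page_0001.png"))
```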
I have shown how using categories in the data together with extracted numerical features allows us to make interesting discoveries, drawing on the examples of one artist’s career (van Gogh) and a popular cultural field (manga). Let’s now look at two other strategies for investigating cultural categorical systems.
One strategy is to examine how such systems change over time. Does the number of artifacts belonging to each category shrink or grow? Were new categories added at particular moments? As an example, we can look at the growth in the types and numbers of programs in creative fields at universities and specialized art schools and academies around the world. Consider Nanjing University of the Arts, the first modern art academy in China, established in 1918. In 2018, it was offering twenty-five undergraduate disciplines and seventy postgraduate disciplines, ranging from critical history of design and new media to three specializations in traditional Chinese painting (mountains and rivers, human figures, and flowers and birds).49 In the same year, the New School in New York City, which includes the famous Parsons School of Design (established in 1896), was offering forty-five undergraduate majors and many minors, ranging from food studies and creative coding to sustainable cities—in addition to older disciplines such as fashion design and photography.50 When was each of these creative disciplines and specializations first offered, and what are their patterns of growth over time and in space (i.e., when do they appear in different cities and countries)? Collecting, analyzing, and visualizing such data would be a very interesting project.
While analyzing one million artworks shared on deviantart.com from 2001 to 2010, we looked at categories that were gradually introduced by contributors and administrators to organize the artworks.51 Since the launch of DeviantArt in 2000, the number of top-level categories and subcategories systematically grew to accommodate the variety of techniques and subjects in artworks submitted by contributors. The category system is organized as a tree, with a number of top-level categories containing further subcategories. By 2011, many branches of this tree had up to six levels of subcategories under them, and the total number of all subcategories was over 1,700.
For the temporal analysis, we zoomed in on the two top-level categories: “traditional art” and “digital art.” We then compared the patterns of growth of their subcategories and the numbers of artworks shared by artists over the ten-year period studied. Because subcategories can describe content, medium, or technique, analysis of the development of subcategories and the numbers of shared artworks in each subcategory allows us to better understand the effects of digital tools on art. Many media theorists, critics, and artists have written about this topic, but our study was the first to examine such effects quantitatively using large samples of digital artworks created over time.
Although both categories had a similar growth rate, the number of digital art subcategories was always approximately two times larger than the number of traditional art subcategories. By the end of 2001, traditional art had ten subcategories, while digital art had twenty-two; in 2005, these numbers were eighty-one and 162; and in 2010, they were 113 and 216. One possible explanation for this difference is that digital art has more categories that describe specific digital techniques (vector graphics, pixel art, 3-D art, fractals, etc.) and artistic scenes corresponding to these authoring techniques and specific software tools or applications. By scenes I mean groups of nonprofessional and semiprofessional artists who are passionate about particular digital techniques and authoring applications and who exchange information and learn from each other using publications, local interest groups, and online networks such as DeviantArt. While the tools of traditional art did not change for decades, the digital tools kept changing during our analysis period, leading to the formation of such scenes around new tools and new techniques.
My inspiration for quantitative analysis of systems of cultural categories comes from the fields of bibliometrics and scientometrics that developed in the 1960s.52 They offer many methods, tools, and research examples that can be carried over from their original context—analysis and measurement of academic publications, patterns and relationships among scientific fields, and the growth of innovation—to many areas of culture. For example, many scientometrics researchers have quantitatively analyzed the growth of science as a whole and of its particular disciplines. In one study, the authors used 38,508,986 publications from 1980 to 2012, looking at the numbers published each year, and also at 256,164,353 cited references covering the years 1650–2012. The analysis identified three stages in the growth of science, with annual growth rates of less than 1 percent until the middle of the eighteenth century, 2 to 3 percent until World War II, and 8 to 9 percent from then to 2012. The authors also found that the growth in the number of publications was very similar across disciplines.
The key challenge for using this paradigm in cultural analytics is the lack of formal methods for citation in culture that would be equivalent to those for science. In design, architecture, fashion, cinema, literature, or visual arts, authors borrow visual ideas and elements from other works, but this is not documented. However, in some fields, such as pop music, numerous “citations” are explicit because the authors and publishers have to get the rights to any samples or whole works they want to use. In pop music, the use of electronic and later digital recording media for all published works made sampling central to its functioning since the early 1980s, when hip hop producers such as Grandmaster Flash started using sampled breaks. The organizations that do music rights management include ASCAP (ten million works by 680,000 songwriters, composers, and publishers), BMI (twelve million works by 750,000 members), Sony/ATV (four million works), and Universal Music Publishing Group (UMPG; 3.2 million).53 Such a structured practice of rights for samples does not exist in other cultural areas, but we can invent ways to get around this limitation. In the landmark project Culturegraphy (2014), visualization designer Kim Albrecht used references between films recorded in the Internet Movie Database (IMDB). As the project’s designer explains, IMDB contains “nine different reference types (alternate language version of, edited from, features, follows, references, remake of, spin off from, spoofs & version of),” with “119,135 such connections from 42,571 movies.”54 One of the project’s fascinating results was the first quantitative and graphic view of the “rise of the post-modern cinema”—specifically, the relative quantities of references in later films to earlier films.55 This view revealed that the quantities of such references quickly grew after 1980. This trend was noticed earlier by film scholars, but Albrecht’s project for the first time demonstrated that it was not limited to particular movies. You should look at this and other detailed visualizations in the project yourself; they “connect the macro view with the micro view and show the references of each movie that give rise to the larger pattern of the graphic.”56
By considering the categories used in institutional collections and the number of items in each category, we can make visible the “shapes” of these collections. In 2013, the Museum of Modern Art in New York gave my lab access to approximately twenty thousand digitized photos from its photography collection, covering the history of photography from the 1840s to the present. One of the many visualizations we created shows all photos sorted by the year of creation as recorded by MoMA (see figure 7.3). Although we may expect that a major institutional art or design collection, including that of MoMA, is more representative of some periods and some types of images than others, the extreme unevenness of this coverage, revealed by our visualization, is rather striking. The period between the First and Second World Wars dominates all other periods—and within this period, modernist artistic photography dominates, with other types such as photojournalism, industrial photography, and amateur photography practically absent.57
What would be a more representative picture of photography’s history if we were to start a new photography museum? We may want each temporal unit of equal size—for example, five- or ten-year intervals—to contain approximately equal numbers of artifacts. An alternative strategy is to progressively increase the number of artifacts in the collection to mirror the growth of the field of professional photography. Or we can employ another sampling strategy. The key thing is to have some strategy in order to construct a more representative history of a cultural field (photography in this case).
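As an illustration of the first strategy, here is a sketch that draws an equal quota of artifacts from each ten-year interval of a collection; the file name, column names, and quota are my assumptions.

```python
import pandas as pd

photos = pd.read_csv("collection_metadata.csv")   # hypothetical file with a "year" column
photos["decade"] = (photos["year"] // 10) * 10
target_per_decade = 200                           # arbitrary quota for the example

# Sample up to the quota from each decade to flatten the collection's "shape".
balanced = (photos.groupby("decade", group_keys=False)
            .apply(lambda g: g.sample(min(len(g), target_per_decade), random_state=0)))
print(balanced["decade"].value_counts().sort_index())
```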
Categorical data is central to modern society’s representation and “processing” of its people, cultures, and social interactions. It became even more important in the 2010s, when the success of supervised machine learning and neural networks for automatic classification led to the use of such classification in new areas and on a new scale. Companies, nonprofits, and academic researchers use categorical data and the analytical statistical techniques developed for this data to capture and analyze what people think and believe, how they understand social and economic phenomena, their perceptions of products and brands, and their interactions with each other. For example, ordinal scales such as the five-point Likert scale for questionnaires have been widely employed to analyze attitudes and opinions for many decades. Today, companies and nonprofits such as Nielsen, Gallup, the Pew Research Center, and numerous others continue to use such questionnaires in their surveys.
The use of polls to determine public opinion was popularized by Gallup. The company was established in New Jersey in 1935; in 1939 it started to conduct market research for the advertising and film industries. Nielsen was founded in Chicago in 1923; it started to measure the radio industry in the 1930s. The Gallup Global Well-Being survey asks one thousand randomly selected individuals from every country included in the survey to rate aspects of their lives on a ten-point scale. The description of the method Gallup publishes demonstrates the kinds of decisions and choices being made in such measurements—which, of course, affect the published results: “Gallup measures life satisfaction by asking respondents to rate their present and future lives on a ‘ladder’ scale with steps numbered from 0 to 10, where ‘0’ indicates the worst possible life and ‘10’ the best possible life. Individuals who rate their current lives a ‘7’ or higher and their future an ‘8’ or higher are considered thriving. Individuals are suffering if they report their current and future lives as a ‘4’ or lower. All other individuals are considered struggling.”58 The people being polled are asked to use an interval scale: numbers from zero to ten. Their responses are then mapped onto an ordinal scale that has only three categories: thriving, struggling, and suffering. This is a good example of how measurement scales are used to collect and represent information and how the final reported information is the result of a number of mappings from scale to scale. If the thresholds used for mapping from the interval to the ordinal scale were set differently, the proportions of people reported in the three categories for each country would be different. Here are examples of results from the 2010 survey for three countries, with the percentages of people in the thriving, struggling, and suffering categories: Costa Rica (63, 35, and 2), the United States (57, 40, and 3), and Cuba (24, 66, and 11).
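The published mapping from ladder ratings to the three reported categories can be written down literally as a small function, which makes the role of the thresholds explicit (the function name is mine):

```python
def wellbeing_category(current: int, future: int) -> str:
    """Map 0-10 ladder ratings to the three categories Gallup reports."""
    if current >= 7 and future >= 8:
        return "thriving"
    if current <= 4 and future <= 4:
        return "suffering"
    return "struggling"

print(wellbeing_category(7, 8))   # thriving
print(wellbeing_category(3, 4))   # suffering
print(wellbeing_category(6, 9))   # struggling
```

Changing the two threshold constants in this function is exactly the kind of decision that would shift the reported proportions of thriving, struggling, and suffering for each country.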
In the twentieth century, many statistical methods were invented to analyze categorical data obtained with questionnaires and surveys. For example, all later works by famous French sociologist Pierre Bourdieu used the technique of correspondence analysis developed by statistician Jean-Paul Benzécri.59 These techniques are also central to the most influential book in the sociology of culture—Bourdieu’s Distinction: A Social Critique of the Judgement of Taste (1979). The book’s empirical analysis uses the results of two surveys of the tastes of the French public carried out in the 1960s.
Contemporary digital social networks are engineered to have their hundreds of millions of users create discrete signs of attention and interest. Favorites, likes, and shares are signs that are already quantified a priori by the networks’ interfaces. In other words, the networks ask people to translate their feelings into categorical data. On Facebook, we can see counts of likes and shares for every post. Weibo shows the numbers of fans, discussions, and reads for every topic. (For example, in the second week of August 2016, the most popular post of the week on Weibo was shared 1,155,243 times and liked 575,389 times.60) These counts, along with the usernames of people who added their likes or shared posts, are available to network and third-party algorithms that drive social media monitoring and publishing dashboards, contextual advertising, and other media analytics applications. (They are also often available to researchers via social networks’ APIs.) To use data science terms, the networks convert users’ attention, interests, social connections, and tastes into structured data, which is easier to analyze than unstructured data, such as the texts of social media and blog posts, the images and videos people share, or even video interviews conducted in a research study. Certainly, these unstructured forms of human media encode our feelings, thoughts, attitudes, desires, and imaginations, but both human receivers and algorithms often struggle to decode them. This ambiguity of human expression and communication is delightful, desirable, and satisfying for us in many situations—but not for the industry and its algorithmic decision-making systems controlling marketing, advertising, logistics, prices, and other elements of business.
In fact, the world of social media networks can be compared to a massive global marketing study in which people are presented with numerous cultural objects—products, songs, movies, images, influencers, normal users, and everything everybody posts—and they have to choose the ones they prefer, expressing their preferences in a binary way: I like this, I share that. But social networks and normal marketing research also differ in important ways. Historically, the networks developed to host user-generated content, and commercial content came later. In their present state, user-generated content and commercial content appear next to each other on users’ walls and in their streams.
As it became clear that few people click on ads presented next to personal content in social networks and apps (this measure is called the click rate), advertisers started to further blur the boundary between these categories. One method for doing so is native advertising, in which ads match the normal formats and styles of the platforms on which they appear.61 Types of native advertising include ads shown in search results, recommendation widgets (“you may also like”), and stories written as though they come from the platform’s editors. Another method is paying social media “stars” and “influencers” (people who have many followers) to feature particular products in their regular posts, an update of the older product placement method. (In 2016, a person who had over ten thousand followers on Instagram could already be compensated to promote products.62)
Shall we conclude that by forcing people to use the same mechanisms of appreciation, such as ratings, likes, and shares, on all posted content, the networks are “commodifying” human relations? Or maybe it is the opposite: products are humanized and “emotionalized” because they are appreciated and liked in exactly the same way as photos of your friends or posts about their important life events? Or does this humanize business objects and simultaneously dehumanize expressions of individuals? Regardless, there is an uncanny symmetry today between the networks collecting and making available perfectly organized structured data of users’ likes and shares and the computational analysis of this data by companies, advertisers, marketers, nonprofits, and political parties. The formats in which opinions and interests are collected match statistical methods and algorithms developed much earlier.
It is tempting to conclude that all these formats were coldly engineered so that companies can extract our preferences and interests—but this is not correct. For example, the original motivation for the web hyperlink comes from early hypertext and UI research in the 1960s, decades before the web emerged and started to be used commercially. But even something that may look like it was designed only to collect data—the Like button—was not. Facebook engineers started to prototype the button in the summer of 2007. The idea was to create a design element that would allow users to express that they liked a post. After seeing a prototype, people from platform marketing, the feed team, and the ads team got interested and began imagining how they could use such a feature for their own purposes. But it took two years before the button was launched in early 2009.63 Some of the initial ideas for its use did not work. Only after the design team was able to show that the presence of the button did not reduce the number of comments on posts was the decision made to implement it for all users.
The preceding reflections add a new angle to popular discussions of artificial intelligence and its future. AI can anticipate our behaviors only when we are acting in predictable ways. Predictable here means having consistent routines, opinions, and behavioral and shopping patterns. Companies using AI and predictive analytics may prefer us to be completely consistent, but we are also expected to be spontaneous and to periodically change, so that we can discover and adopt new brands and offerings.
But we do not always behave or think in predictable ways. For instance, the 2015 study “Navigating by the Stars: Investigating the Actual and Perceived Validity of Online User Ratings” analyzed user ratings for 1,272 products across 120 product categories. The authors found that “average user ratings lack convergence with Consumer Reports scores, the most commonly used measure of objective quality in the consumer behavior literature.”64
In the nineteenth and twentieth centuries, there was no social media with photos and posts about products and companies that people could rate, like, or share. Organizations and companies had to administer questionnaires or have people do product comparisons in formal settings. (Today such methods are still widely used because they have a number of advantages over social media data, as I already mentioned.) This is one of the reasons that statisticians developed methods and concepts for collecting and analyzing different kinds of categorical data. So although basic statistics textbooks and most introductory classes today focus on the analysis of quantitative (numerical) data, this is only one part of modern statistics. This primacy of quantitative data analysis reflects the development of statistical methods in the second half of the nineteenth century and the first third of the twentieth century in the context of numerical measurements—human physical characteristics (Quetelet), educational tests (Spearman), and agricultural experiments (Fisher).
Cultural analytics certainly can use both quantitative and categorical statistics. However, representing large-scale cultural phenomena or processes as either categorical or quantitative data is not always necessary for their computational analysis. As I will show in part III, visualization methods allow us to explore large collections of visual cultural artifacts or samples from a cultural process without measuring them.
In other words, we do not have to use either numbers or categories. The ability to explore collections of cultural data and information, see patterns at different scales, confront our stereotypes, and make discoveries—without necessarily having to quantify them—is the reason that visualization is as important for cultural analytics as statistics and data science. Far from being only one of the tools of quantitative cultural analysis, visualization is an alternative analytical paradigm. It allows us to see patterns that we cannot observe by reading, viewing, or interacting with individual cultural artifacts one at a time, or by quantifying collections of artifacts and analyzing them with statistical and computational methods.