11   Phillip Isola’s Pix2Pix: Filling in the Picture

Pix2Pix allows mixing of science and art together, offering a means to show data in a way that’s provocative, emotional, and compelling.

—Phillip Isola97

“Translating one image into another is like translating between languages, like between English and French. They are two different representations of the same world,” says Phillip Isola.98

Isola and his coworkers built Pix2Pix on a variation of GANs known as conditional generative adversarial networks (cGANs). They are conditional because instead of starting the generator network (G) from noise, from nothing, they condition it on an actual image. Rather than feeding the discriminator network (D) huge caches of individual images, they train it on pairs of images, such as a black-and-white image of a scene and the same scene in color. Then they input a new black-and-white scene into the generator. G produces a colorized version, and D judges whether the result looks like a genuine pair; as training proceeds, G learns to produce colorizations that D can no longer reject. In other words, the output is conditioned by the input. As a result, Pix2Pix, as Isola calls his system, requires a much smaller set of training data than many other supervised learning algorithms.
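The pairing scheme can be sketched numerically. The toy illustration below is my own, not Isola's code: the arrays are random stand-ins for an input scene, its real colorized target, and the generator's output, and `discriminator` is a hypothetical stand-in for the real network, which scores (input, output) pairs rather than lone images. The weight of 100 on the L1 term is the value reported in the Pix2Pix paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-ins for one training example (hypothetical data):
# x: input image (black-and-white scene), y: real target (same scene in
# color), g_of_x: the generator's colorized output for x.
x = rng.random((32, 32, 1))
y = rng.random((32, 32, 3))
g_of_x = rng.random((32, 32, 3))

def discriminator(inp, out):
    """Hypothetical stand-in: a real discriminator network scores
    (input, output) pairs; here we just return a probability in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-(inp.mean() + out.mean())))

# Conditional GAN loss: D sees pairs, so G is rewarded only when its
# output is plausible given the input it was conditioned on.
d_real = discriminator(x, y)        # D's score for a genuine pair
d_fake = discriminator(x, g_of_x)   # D's score for a generated pair
loss_d = -np.log(d_real) - np.log(1.0 - d_fake)

# Pix2Pix adds an L1 term pulling G's output toward the paired target.
lam = 100.0  # weight used in the Pix2Pix paper
loss_g = -np.log(d_fake) + lam * np.abs(y - g_of_x).mean()
```

The key point is that D never sees an output alone: it judges whether an output is plausible given its input, which is what "conditioning" means here.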

Thus Isola discovered how to translate an image of one sort into another sort: Pix2Pix, pixels to pixels.99 As he puts it, all those “little problems in computer vision were just mapping of pixels to pixels.”100 While style transfer transfers the style of one image onto another, creating an image “in the style of” a painting by Picasso, for example, Pix2Pix goes further. Like Leon Gatys, who invented style transfer, Isola is interested in perception, how we see.

As a graduate student, Isola studied cognitive science, a highly interdisciplinary subject that includes computer science, psychology, and neuroscience. He concluded that a good way to study the human mind would be to build machines that “think kind of like we do.”101 This took him back to computer science, which he had studied as an undergraduate.

Pix2Pix seems almost like magic. “Yes,” agrees Isola enthusiastically. In fact, he continues, all this is more than just a “math thing, it offers a new space to explore. Artists like that.”102 But to make Pix2Pix accessible to artists entailed structuring an interface so that it could be used right out of the box, directly from the GitHub online repository.

Isola was particularly impressed with the work of Chris Hesse, who fed one thousand pairs of images into the discriminator network.103 Each consisted of a photograph of a cat and the outline of the same photograph. Then he constructed an interface that allowed the user to insert a rough sketch of a cat. The network filled in the rough sketch, creating what looks like a photograph of a cat-like creature (figure 11.1).

Figure 11.1

Chris Hesse, edges2cats, 2017.

Hesse called his creation edges2cats. The interface is essentially a black box: the user does not need to know about the mathematics of Pix2Pix, just as a writer does not need to know how a ballpoint pen works to use it.

Hesse’s work went viral. edges2cats enabled users to make a drawing and see it change in real time. Hesse also used databases of fifty thousand shoes and 137,000 handbags, all in outline only (figure 11.2). Freehand sketches generated weird, photograph-like images.104

Figure 11.2

Chris Hesse, edges2handbags, 2017.

“Pix2Pix empowers people who may not have the requisite motor skills and technical skills to express their creativity,” says Isola. “It allows mixing of science and art together, offering a means to show data in a way that’s provocative, emotional, and compelling.” He sees Pix2Pix as a first step to developing a user-friendly interface for artists. In this way, Pix2Pix has opened new vistas in art, with artists using Pix2Pix “in ways we had not imagined.”105

Isola and his colleague Jun-Yan Zhu are optimistic that machines may one day be creative, but they feel we are a long way from this goal. They are excited about AlphaZero, the newest version of AlphaGo, which has been given only the rules of Go without any samples of games played previously. This they consider a “whole other level of machine intelligence.”106 Jun-Yan adds, “There’s a long way to go, but in the history of the universe it’s happening right now. This is a moment in history.”107

Mario Klingemann Changes Faces with Pix2Pix

I really got hooked with Pix2Pix.

—Mario Klingemann108

Pix2Pix came along at just the right time for Mario Klingemann, the artist who invented X Degrees of Separation. He was looking for new fields in which, he says, “I can have my solitude for a while.”109 He was convinced that the material the machine was trained on was the key to its creativity, and set about exploring how to clarify or enhance an image. Using Pix2Pix, he fed his machine with thousands of pairs of images in which one is a blurred version of the other. Then he put in an arbitrary blurred image. Pix2Pix sharpened it up (figures 11.3 and 11.4).
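Training pairs of this kind are easy to manufacture: blur each sharp image and store the two versions together, so the network can learn the inverse mapping, blurred to sharp. A minimal sketch, assuming a simple box blur (the text does not say which blur Klingemann actually used):

```python
import numpy as np

def box_blur(img, k=5):
    """Crude box blur via a k-by-k moving average; a stand-in for
    whatever blur was actually applied."""
    pad = k // 2
    padded = np.pad(img, pad, mode="edge")
    out = np.zeros_like(img, dtype=float)
    h, w = img.shape
    for i in range(h):
        for j in range(w):
            out[i, j] = padded[i:i + k, j:j + k].mean()
    return out

# Build (blurred, sharp) training pairs from a stack of images; random
# arrays stand in for real photographs here.
rng = np.random.default_rng(1)
sharp_images = [rng.random((16, 16)) for _ in range(3)]
pairs = [(box_blur(img), img) for img in sharp_images]
```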

Figure 11.3

Mario Klingemann, Transhancement Sketch, 2017.

Figure 11.4

Mario Klingemann, Transhancement Sketch, 2017.

The “machine had to get creative in order to restore lost information,” he says. The result was somewhat “creepy, crazy.”110 These were faces never seen before. And they emerge from a creative source that happens to be a machine. Klingemann calls this process transhancement: “new types of artefacts were generated lying somewhere on the spectrum between the painterly and the digital.”111

Klingemann built another use of Pix2Pix around facial markers. Facial markers, or facial landmarks, are a system of sixty-eight localized points delineating key facial structures that, when mapped onto a photograph, enable a computer to recognize and identify a particular face. The markers include the corners of the eyes, the outline of the face, the nose, the jaw, the mouth, and so on, and all of them can be generated with an algorithm. For data, Klingemann fed in photographic portraits alongside the same images reduced to facial markers.
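The sixty-eight points correspond to the landmark scheme popularized by the dlib library. A short sketch of its standard index layout (the region boundaries below are the commonly published ranges, not taken from Klingemann's code):

```python
# The standard 68-point facial-landmark scheme, as used by dlib's
# shape predictor: each region of the face owns a run of indices.
LANDMARK_REGIONS = {
    "jaw":           range(0, 17),
    "right_eyebrow": range(17, 22),
    "left_eyebrow":  range(22, 27),
    "nose":          range(27, 36),
    "right_eye":     range(36, 42),
    "left_eye":      range(42, 48),
    "mouth":         range(48, 68),
}

# The regions partition all 68 indices with no gaps or overlaps.
total_points = sum(len(r) for r in LANDMARK_REGIONS.values())
```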

Next he fed in the facial markers of his own face and ran the output image back through the network a couple more times (figure 11.5). “There is not much left of me,” he reported.

Figure 11.5

Mario Klingemann, Neurographic Self-Portrait, 2017.

Klingemann’s pièce de résistance for Pix2Pix is his disquieting video of the French singer Françoise Hardy, in which her face morphs into that of Kellyanne Conway during her first television interview as counselor to President Donald Trump. Conway’s voice emerges from Hardy’s mouth, asserting that Sean Spicer, Trump’s then press secretary, who defended Trump’s lie about the size of the crowd at his inauguration, gave “alternative facts.”

To construct the video, Klingemann downloaded 1960s’ music videos of Hardy from YouTube. Then he used face-marker detection to create pairs of face markers and fed these along with the original video frames into the network. Next, he extracted Conway’s face markers from her interview footage and fed them into the trained model, melding the two faces. Klingemann calls this piece Alternative Face—the epitome of “fake news.”112 It is a concrete example of post-truth, with Conway’s voice emerging from Hardy’s mouth.

In 2018 Klingemann was the seventh recipient of the Lumen Prize Gold Award for Art and Technology. It was the first time a work created using AI won gold.

On the subject of creativity, Klingemann asserts that we humans are incapable of it because we only build on what we have learned and what others have done while machines can create from scratch—a surprising and fascinating statement. Machines, he says, will one day liberate us. “I hope machines will have a rather different sort of creativity and open up different doors.” He sees his artwork as a step in this direction.

Anna Ridler’s Fall of the House of Usher

For me, input becomes the creative act.

—Anna Ridler113

Artist Anna Ridler’s work is all about memory, narrative, and performance. She was fascinated by the possibilities of Pix2Pix, among them the fact that it could be trained using only a few images. She was concerned about the vast number of images from sources such as ImageNet that usually have to be fed into a machine to train it. She felt she was not in complete control of the material and was also concerned about the gender and racial biases that might be lurking in it. She wanted to use her own datasets—and Pix2Pix enabled her to do so.

She decided to make a piece exploring the 1928 silent film Fall of the House of Usher, adapted from Edgar Allan Poe’s short story, using her own images. This is a macabre tale about decay and destruction, life and death, and includes Poe’s favorite plot device: a person entombed alive. “It’s a horror story and there’s a lot of talk about AI and horror,” she tells me.114

She felt machine learning might be able to enhance the message of the film. What emerges from Pix2Pix can be entirely unexpected, such as a disintegrated version of a photograph.

Ridler began by taking two hundred frames from the first four minutes of the film and redrawing each in pen and ink. Her rationale was that digital was pure and clear, whereas ink was very difficult to control: “it had entropy.”115 In physics, entropy is a measure of disorder and decay. Similarly, ink is fragile, prone to blot and run.

She trained the discriminator on pairs made up of her drawings and actual stills from the movie. Then she fed every frame from the entire movie into Pix2Pix. What emerged was a somewhat abstract film that the network had generated from her art, based on what it had learned each frame should look like. Next she took each of these generated frames and redrew it in ink to be inserted into the next cycle of Pix2Pix, giving rise to even more abstract images. The machine began to misremember, and the frames became blobs of black, white, and grey. Entropy was high; almost complete decay had set in, just as in Poe’s original story. The result is a highly original take on Poe, a ghostly succession of increasingly abstract images, orchestrated with eerie background music.116 It’s also a meditation on memory and misremembering.
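Ridler's cycle of generation and redrawing can be outlined schematically. In this sketch the names are entirely hypothetical: `translate` plays the trained Pix2Pix model and `redraw_in_ink` the manual pen-and-ink step, each modeled as adding a little noise, standing in for the information loss, the "entropy," that accumulates over cycles:

```python
import numpy as np

rng = np.random.default_rng(2)

def translate(frame):
    """Stand-in for the trained Pix2Pix model: a lossy translation."""
    return np.clip(frame + rng.normal(0, 0.1, frame.shape), 0, 1)

def redraw_in_ink(frame):
    """Stand-in for the manual redrawing step, also lossy."""
    return np.clip(frame + rng.normal(0, 0.1, frame.shape), 0, 1)

def degrade(frames, cycles=2):
    """Each cycle: run every frame through the model, then redraw the
    outputs so they become the next cycle's input."""
    for _ in range(cycles):
        frames = [redraw_in_ink(translate(f)) for f in frames]
    return frames

# One uniform grey "frame" drifts further from its origin each cycle.
frames = [np.full((8, 8), 0.5)]
out = degrade(frames, cycles=3)
```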

Ridler studied English language and literature at Oxford, then turned to computers and completed an MA in information experience and design at the Royal College of Art. She has brought her love of art, literature, and technology to bear on her work. Within AI, all three are fused. “I think of AI as trying to give you something. You use your own drawings and creations as input, purity of line, and allow it to suggest something.”117

Ridler argues that machines cannot be artists. “Would you say that my paintbrush is an artist? They cannot be creative.”118

Her great inspiration is Jorge Luis Borges’s flight into highly speculative fiction, “Tlön, Uqbar, Orbis Tertius.” In this story, Borges challenges the boundary between fiction and nonfiction and questions what is real and what is not. In her work, Ridler makes us ask: Which is the real film? The original frames or those produced by the generator of Pix2Pix on its journeys through latent space?

No doubt Borges would have enjoyed wandering the multidimensional latent space, the space of our imaginations.

Notes