9   What Came after DeepDream?

Damien Henry and a Machine That Dreams a Landscape

It will be hard for the art world to ignore what machine learning can do and what artists can do with machine learning.

—Damien Henry44

Damien Henry’s computer-generated video Music for 18 Musicians—Steve Reich, based on train journeys he took across France, provides another thought-provoking demonstration of how a neural net awakens.

Henry was always amazed at the fact that deep neural networks can learn without explicit instructions. He wanted to find a way to demonstrate this and decided to feed a deep neural network with videos taken from a moving train, giving the network many examples so that it could “figure out how to do things.”45

Then he chose a single still image from the sequence and fed it into the machine. He used an algorithm that predicted the next frame in the sequence as a likely outcome of the preceding one, applying that prediction again and again, more than one hundred thousand times.
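
In outline, and only as a minimal sketch, since Henry has not published his code, the generation loop looks something like this. The frame size, the 30-frames-per-second assumption, and the stand-in predictor are all placeholders.

```python
# A minimal sketch of iterative next-frame prediction, assuming a network has already
# been trained on videos shot from a train window. predict_next_frame is a stand-in
# for that trained model; the frame size and the scrolling behavior are placeholders.
import numpy as np

def predict_next_frame(frame: np.ndarray) -> np.ndarray:
    """Stand-in for the trained predictor: it merely scrolls the image one pixel
    sideways so the loop runs; the real model is a learned frame-to-frame predictor."""
    return np.roll(frame, shift=-1, axis=1)

frame = np.random.rand(64, 64, 3).astype(np.float32)  # the single still image fed in first
for _ in range(100_000):                               # "again and again," over 100,000 times
    frame = predict_next_frame(frame)
    # in practice each predicted frame would be written out to the video here
# At 30 frames per second (an assumption), 100,000 frames is roughly 56 minutes of video.
```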

To his amazement, he found that the algorithm captured phenomena he “would not have thought about to capture himself.” The distant blue sky hardly moves, while the foreground rushes past.46 The result is a mesmerizing fifty-six minutes of an extraordinary landscape eternally passing by. “At some point,” he recalls, “I found Steve Reich’s music, Eighteen Musicians, which I completely fell in love with.” Like the train sequence, it was repetitive, and it made a perfect match. Reich’s music spurred Henry to enlarge his initial premise. One new idea was to use several videos for training, including trips through countryside and cities. “The result was more complex,” he says.

Damien Henry has a goatee, warm smile, and Gallic sense of humor. He is the technical program manager at the Google Cultural Institute in Paris, where he kindly invited me to spend some time. He became interested in coding, he tells me, at the age of eight or ten. But his interest was not in video games. Wasn’t there something else coding was good for? He pondered this, but put these thoughts aside because he felt they were nothing more than “procrastination,” which he prefers to avoid.47 At university, instead of studying computer science, he opted for aeronautical engineering. But he couldn’t stay away from coding, and when he resumed his explorations, he concluded that his previous efforts should be called “creative coding.”

He posted a version of his train journey music video, almost an hour long, on YouTube, in which a blurry landscape unfolds from an impressionistic haze, accompanied by Steve Reich’s hypnotic music. It is not a “real landscape,” but the world as seen by a machine. The machine makes choices and employs probability, but that does not devalue the final result. We too reason probabilistically. We take in data, figure out ways to deal with it, and choose the option most likely to succeed.

Henry firmly believes that machines cannot be creative by themselves. “The more complex the tool, the more creative we can be,” he says. The idea of totally removing people from the equation makes no sense to him.

Mario Klingemann and His X Degrees of Separation

I try to get away from the “default” setting, the out-of-the-box setting, tweak it and add my own touch to it.

—Mario Klingemann48

People “still have problems accepting that you can do art using machines,” says Mario Klingemann.49 He is well aware that he is part of a new avant-garde and feels that “this is where the really interesting work is.”

In 2015, he began a large-scale project to classify the million or more out-of-copyright digital images from 1500 to 1899 that have been put online by the British Library. To do this, he trained a machine that organized them into classes—portraits, horses, dogs, men, women, rocks, ancient and modern tools, and so on. These groupings existed in a highly multidimensional space—a latent space made up of the encodings of the numerous pixels that make up the artworks. He then projected them into two dimensions. Eventually he tagged about four hundred thousand. He put some of his results on Twitter.
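
As a rough illustration of that kind of pipeline, and not Klingemann’s own code, one could embed each scan with a pretrained network and project the embeddings down to two dimensions. The choice of network (ResNet-18), the projection method (t-SNE), and the folder name below are all assumptions made for the sketch.

```python
# A rough sketch of organizing images in a latent space and projecting it to 2D.
# Not Klingemann's actual pipeline: the pretrained network, the t-SNE projection,
# and the folder of scans are illustrative assumptions.
import glob
import torch
from PIL import Image
from torchvision import models, transforms
from sklearn.manifold import TSNE

model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT).eval()
model.fc = torch.nn.Identity()  # drop the classifier head; keep the 512-d embedding

preprocess = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
])

embeddings = []
paths = sorted(glob.glob("british_library_scans/*.jpg"))  # placeholder folder name
with torch.no_grad():
    for path in paths:
        img = preprocess(Image.open(path).convert("RGB")).unsqueeze(0)
        embeddings.append(model(img).squeeze(0))

# Project the high-dimensional embeddings onto a flat map where similar images cluster.
coords_2d = TSNE(n_components=2).fit_transform(torch.stack(embeddings).numpy())
```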

Shortly afterward, the phone rang. It was Damien Henry from Google—the call independent machine artists dream of. “We have this huge amount of cultural data and see that you do interesting stuff. Why not drop by?” Henry said.50 “I would never pass a Google engineer interview because I have only certain knowledge,” Klingemann tells me, alluding to Google’s legendarily rigorous interviews. But Henry sensed something interesting and extraordinary, and Klingemann became an artist in residence.

Mario Klingemann is another artist who showed his work at AMI. Always smiling, always brimming with ideas, he started coding at the age of twelve but still cut and pasted graphics for his school newspaper by hand. He didn’t go to art school or have any higher education. Painting and drawing held no interest for him, nor did informatics. Initially he worked in advertising on projects that combined code and art and kept his eyes open for developments. But he felt that he lacked the seal of approval as an artist. “I’m trying to make up for that by doing interesting things in fields not yet taught in art schools,” he says.51

He now calls himself an artist—a code artist.

At Google, Klingemann was excited about the data available and the machines he had access to for his “experiments,” as he calls his artwork. Along with another computer artist, Simon Doury, Klingemann was asked to create a way of making connections, two at a time, between the millions of artworks available from the British Library, based on “six degrees of separation,” the idea that everything in the world can be connected to everything else in a maximum of six steps. Doury and Klingemann’s X Degrees of Separation is a readily available website.52 On it, you pick any two artworks from the million in the Google cache. Using machine learning, the computer quickly finds a pathway connecting them through a chain of five, six, or seven artworks, based on visual clues. Thus a four-thousand-year-old clay figure can be linked to van Gogh’s The Starry Night through seven intermediary works.

Klingemann and Doury based their website on the fact that artificial neural networks transform (encode) the pixels in the chosen image into numbers. The machine scans the works available, seeking similarities in their encodings. These translate into the visual similarities that we see in the successive images, similarities that we might otherwise overlook between, for example, the four-thousand-year-old clay figure and van Gogh’s painting, as well as those in the intermediary works.53 The degrees of separation vary with the images chosen. Sometimes there are five, sometimes eight.
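
The exact algorithm behind the website has not been published, but the idea can be sketched, under the assumption that every artwork has already been encoded as an embedding vector, by stepping between the two endpoint encodings and picking the nearest artwork at each step. The interpolation strategy and all names below are illustrative.

```python
# A plausible sketch of finding a chain of visually related artworks between two
# endpoints in embedding space. Not Google's published method; purely an illustration.
import numpy as np

def connecting_chain(start_vec, end_vec, all_embeddings, steps=6):
    """Return indices of artworks forming a visual path from one encoding to the other."""
    chain = []
    for t in np.linspace(0.0, 1.0, steps + 2)[1:-1]:  # waypoints strictly between the endpoints
        waypoint = (1 - t) * start_vec + t * end_vec
        distances = np.linalg.norm(all_embeddings - waypoint, axis=1)
        chain.append(int(np.argmin(distances)))       # the artwork nearest that waypoint
    return chain

# Random stand-in embeddings (64-dimensional) just so the function can be exercised.
rng = np.random.default_rng(0)
embeddings = rng.normal(size=(100_000, 64)).astype(np.float32)
path = connecting_chain(embeddings[0], embeddings[42], embeddings, steps=6)
```

In the real system the length of the chain varies with how far apart the two chosen images lie in this space, which is why the degrees of separation range from about five to eight.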

In his time off, Klingemann also explored DeepDream, his “trigger into this neural network field.”54 He studied the online code and read what others were doing. “It was all puppyslugs and puppy dogs and psychedelic images. I wanted to do something else,” he tells me. He asked himself: How can I use the same methods and get different results? Instead of feeding his machine with images from ImageNet, as other artists had, he fed in decorative letters of the alphabet, such as we see in woodcuts and antique books. To check that the machine had correctly learned what he wanted it to, he fed in decorative images from old books. The machine was able to pick out a particular letter that was largely hidden. In one image, the machine indicated that there was a B present. Klingemann thought it had made a mistake until he looked closer. Sure enough, there was indeed a B, though it was virtually invisible.

Having equipped his machine with a huge number of stylized letters and the code for DeepDream, he fed in the Mona Lisa. Out came a Mona Lisa transformed into an alphabetized linocut (figure 9.1).

Figure 9.1

Mario Klingemann, Mona Lisa transformed by DeepDream, 2016.

Given that he had fed the machine only with wood block prints of letters, it looked for these in everything it saw.

Angelo Semeraro’s Recognition: Intertwining Past and Present

Maybe what happens right now has already happened in other forms a long time ago.

—Angelo Semeraro55

Recognition is the work of a team at the Fabrica Research Centre in Treviso, Italy. Fabrica is sometimes called the “Italian ideas factory” in reference to its highly creative output in design, visual communication, photography, and journalism.56 The team is made up of Angelo Semeraro, who focuses on technological input; Coralie Gourguechon, who focuses on product design; and project manager Monica Lanaro. In 2016, they won the IK Prize sponsored by Tate Britain, awarded to groups or individuals who propose the best original idea based on digital technology for exploring the art on display at Tate and on its website. That year, Tate partnered with Microsoft. The challenge was to use AI for this purpose.

Semeraro discovered computers as a teenager. He loved the guitar, and thought of combining technology and music; he soon extended this to include art. He studied mathematics and computer science at the University of Bologna, then went to Fabrica, where he found people who “were interested in how a subjective field like art could be seen through the rational eye of AI.”57

The catalyst for their work was news images: “accidental Renaissances,” modern images composed by accident that bear striking similarities to Renaissance art. The term originates from photographs of drunken late-night revelers sprawled on the streets of Manchester, England, on New Year’s Eve in 2016, which look startlingly akin to a Renaissance painting.58 “This inspired us to show how humans can take two images from two completely different eras and then find comparisons between them,” Semeraro tells me.59 Perhaps, the Fabrica team thought, “Comparison between the images can be greater than the sum of the two, to create a new content. That was how the idea began to take shape.”

Winning the IK Prize gave them the opportunity to work with Tate’s vast collection of five hundred years of British art, with Microsoft, and with the Reuters news agency and its huge library of photographs. This, they felt, would enable them to show how the present and past intertwine. “Maybe,” Semeraro says, “what happens right now has already happened in the past.”60 Semeraro points out that we often say that the “past can help us to understand. But why cannot the present help us to understand Tate Britain’s five-hundred-year-old collection?”

It took the team two days to find a single good match between 1,250 news images and 30,000 artworks, while in three months the machine made 4,502 matches. The machine does not deal with pixels but with the numbers that encode each pixel. An artificial neural network compares the numbers encoding a news photograph with those characterizing the artworks and looks for similarities, as shown in the two images in figure 9.2.
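
To make the comparison step concrete, here is a simplified sketch, assuming each news photograph and each artwork has already been reduced to an embedding vector. The cosine-similarity measure and all names are illustrative, not the team’s actual implementation.

```python
# A simplified sketch of matching a news photograph to the most similar artwork by
# comparing their encodings. Embeddings here are random stand-ins; the similarity
# measure and the collection sizes are illustrative assumptions.
import numpy as np

def best_match(news_vec, artwork_vecs):
    """Return the index and score of the artwork whose encoding most resembles the photo's."""
    news_vec = news_vec / np.linalg.norm(news_vec)
    artwork_vecs = artwork_vecs / np.linalg.norm(artwork_vecs, axis=1, keepdims=True)
    similarities = artwork_vecs @ news_vec  # cosine similarity against every artwork
    return int(np.argmax(similarities)), float(similarities.max())

rng = np.random.default_rng(1)
artworks = rng.normal(size=(30_000, 512)).astype(np.float32)  # roughly the number of Tate works searched
news_photo = rng.normal(size=512).astype(np.float32)
index, score = best_match(news_photo, artworks)
```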

Figure 9.2

A pairing by Recognition of a 2016 photograph with a painting from 1660, 2017.

Sometimes the machine spotted surprising matches that at first sight made no sense, but on further reflection turned out to be very interesting. Some remained ambiguous. There were also mistaken matches—which shows, Semeraro muses, “that art is a nice playground.”61 The machine saw what Semeraro and his colleagues could not.

Leon Gatys’s Style Transfer: Photography “In the Style Of”

In a way, style transfer is also brain surgery.

—Leon Gatys62

Have you ever wondered what Picasso would have done with one of your favorite photographs, or how he would have portrayed you yourself? Leon Gatys and two colleagues at the University of Tübingen in Germany figured out a way to find out. They called their method style transfer,63 and it appeared two months after Mordvintsev, Tyka, and Olah launched DeepDream.

Gatys studied physics at University College London and was inspired by the deeply philosophical debates he read about on the interpretation of quantum physics—in particular, the ideas of Niels Bohr and Werner Heisenberg. At Tübingen, he explored how complex neural networks can help explain how we see the world around us. DeepDream had shown how machines can elucidate the complexity in the way we process incoming perceptions. At times we see things that aren’t really there. Perception occurs in the brain, and it is the same with machines.

From this Gatys evolved “a neural algorithm of artistic style.” If we can “see” a scene with our eyes but imagine it as something different, rendered by Rembrandt or Cézanne, why can’t a convolutional neural network do the same? Gatys puts great weight on the analogy, however loose, between ConvNets and human vision. “I want to have a machine that perceives the world in a similar way as we do, then to use that machine to create something that is exciting to us.”64

Style is a nebulous concept that art historians argue over. It takes years of training and experience for an art historian to be able to distinguish an artist’s work by its style. Could it really be something a machine could grasp? Could a neural network separate style from content and miraculously produce a work that looked as if it had been recovered from the studio of a long-dead master? It seems it could.

Style transfer works by recreating an arbitrary photograph in the style of a classic painting. In the 1980s, before artificial neural networks came into their own, this was a hugely difficult task. In those days, machines dealt with representations of the world using objects and symbols set into complex networks made up of statements and rules about things and their relationships. Computer scientists were essentially programming the world as we see and perceive it into their machines. Applying these networks to a painting was impossible.

But neural networks deal not with statements about a painting, but with numbers. To interchange the pixels in a painting with the pixels in a photograph would be extremely complex, if possible at all. But in deep neural networks, each pixel is replaced with the numbers that encode it, as are the pixels in the photograph to be altered. The question is how to mix the numbers in a photograph with those in a classic painting so as to create something new.

In style transfer, the style image is a photograph of a painting and the content image is an arbitrary photograph. Gatys begins by feeding the encodings for the style image and the content image into the layers of the network, then puts in an image made up of white noise. This will be the basis of the new image—the blank canvas. At this stage, the pixels in it are distributed at random, with no connections between them, in a dense conglomeration of black and white dots with no discernible pattern. This blank canvas is then propagated back and forth through the system so that it captures as much as possible of the style of the painting and the content of the photograph, mixing them both together until it finally reproduces the image in the photograph in the style of the painting. Even so, the process is rather mysterious and the results a little magical, as in figure 9.3, where style transfer takes a photograph of the Neckarfront at Tübingen and transforms it in the style of van Gogh’s The Starry Night to create a new, rather extraordinary, and exciting image.
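
A minimal sketch of this optimization, assuming PyTorch, a pretrained VGG19 as the feature extractor (the network Gatys used), and placeholder image files, might look like the code below. The layer indices follow Gatys’s paper, but the loss weights, image size, and step count are illustrative, and the usual ImageNet color normalization is omitted for brevity.

```python
# A minimal sketch of Gatys-style neural style transfer: optimize a white-noise image
# so that its deep features match the content photo and its Gram matrices match the
# style painting. File names, weights, and iteration count are placeholders.
import torch
import torch.nn.functional as F
from PIL import Image
from torchvision import models, transforms

device = "cuda" if torch.cuda.is_available() else "cpu"

def load(path, size=256):
    tf = transforms.Compose([transforms.Resize((size, size)), transforms.ToTensor()])
    return tf(Image.open(path).convert("RGB")).unsqueeze(0).to(device)

content = load("content.jpg")  # e.g. a photograph of the Neckarfront
style = load("style.jpg")      # e.g. The Starry Night

# A fixed, pretrained VGG19 serves as the feature extractor.
vgg = models.vgg19(weights=models.VGG19_Weights.DEFAULT).features.to(device).eval()
for p in vgg.parameters():
    p.requires_grad_(False)

STYLE_LAYERS = {0, 5, 10, 19, 28}  # conv1_1 ... conv5_1, used for style (Gram matrices)
CONTENT_LAYER = 21                 # conv4_2, used for content

def features(x):
    style_feats, content_feat = [], None
    for i, layer in enumerate(vgg):
        x = layer(x)
        if i in STYLE_LAYERS:
            style_feats.append(x)
        if i == CONTENT_LAYER:
            content_feat = x
    return style_feats, content_feat

def gram(f):
    b, c, h, w = f.shape
    f = f.view(c, h * w)
    return f @ f.t() / (c * h * w)  # correlations between feature maps capture "style"

with torch.no_grad():
    target_style = [gram(f) for f in features(style)[0]]
    target_content = features(content)[1]

# The "blank canvas": white noise that will gradually become the new image.
canvas = torch.rand_like(content, requires_grad=True)
opt = torch.optim.Adam([canvas], lr=0.02)

for step in range(500):
    opt.zero_grad()
    style_feats, content_feat = features(canvas)
    style_loss = sum(F.mse_loss(gram(f), t) for f, t in zip(style_feats, target_style))
    content_loss = F.mse_loss(content_feat, target_content)
    (1e6 * style_loss + content_loss).backward()  # style-versus-content weighting is illustrative
    opt.step()
```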

Figure 9.3

The content image, a photograph of the Neckarfront at Tübingen (upper left). The style image, van Gogh’s The Starry Night (lower left). The image on the right results from combining the style and content images to make The Neckarfront at Tübingen according to van Gogh, as van Gogh might have painted it. [See color plate 2.]

Style transfer proved instantly and deservedly popular, so Gatys, along with colleagues at Tübingen, set up a website (https://deepart.io/page/about/) where anyone can play with it. You insert a photograph together with a painting, by, for example, Picasso, click a button, and in a short time the machine creates an image in the style of your chosen painting—thus yielding, for example, a cubist image of yourself.

How does style transfer compare with DeepDream? Both use representations of the features at various depths in a deep neural network to change and synthesize images. But whereas DeepDream seeks the image (of a cat, for example) that maximally stimulates a particular neuron trained to identify it, style transfer extracts both style and content from images. The deeper the layers you sample from the style image’s journey through the network, the more closely you can match the structure of the content image, even though its appearance is completely changed.
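
To make the contrast concrete: a DeepDream-style update simply climbs the gradient that makes one chosen feature respond more strongly, instead of matching style and content targets. In the sketch below, the layer, channel, step size, and number of steps are arbitrary choices, and the same pretrained VGG19 stands in for the network.

```python
# A sketch of the DeepDream side of the comparison: gradient ascent on the input image
# to excite a single feature channel, rather than optimizing toward style and content
# targets. Layer 28 and channel 12 are arbitrary illustrative choices.
import torch
from torchvision import models

device = "cuda" if torch.cuda.is_available() else "cpu"
vgg = models.vgg19(weights=models.VGG19_Weights.DEFAULT).features.to(device).eval()
for p in vgg.parameters():
    p.requires_grad_(False)

img = torch.rand(1, 3, 256, 256, device=device, requires_grad=True)  # start from noise
LAYER, CHANNEL = 28, 12

for step in range(200):
    x = img
    for i, layer in enumerate(vgg):
        x = layer(x)
        if i == LAYER:
            break
    score = x[0, CHANNEL].mean()  # how strongly this channel "sees" its learned feature
    score.backward()
    with torch.no_grad():
        img += 0.05 * img.grad / (img.grad.norm() + 1e-8)  # normalized gradient ascent step
        img.grad.zero_()
```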

Working from these deep representations, says Gatys, is like having a vague image of what a forest was like the last time you saw one, then using that information to imagine a brand-new forest. He is optimistic that style transfer will offer new insights into the way people see and hopes that it is a first step toward understanding perception. “That’s something I’m very excited about.”65

One of the offshoots of style transfer is photographic style transfer, in which the style image is also a photograph.66 The issue here is how to control the way the style is carried over to the content image so that straight lines remain straight and curved lines remain curved, yielding an output photograph with no abstraction. The result will be akin to what a very advanced version of Adobe Photoshop could produce.

Other recent avenues of research for style transfer include “generation of audio in the style of another piece of audio or, perhaps, generating text,” says Gene Kogan, another prolific artist/technologist.67

Another is fashion. In this case, an artificial neural network is trained to distinguish between popular and less popular trends in clothing based on elements such as collars, pockets, sleeves, and patterns. The process here is to transfer the style of, say, one shirt to another, then keep reprocessing.68

As for Gatys’s views on creativity, he feels that we will need to understand human creativity better before we can evaluate our progress toward machine creativity.

Notes