Programming Should Be About Transforming Data

If you come from an object-oriented world, then you are used to thinking in terms of classes and their instances. A class defines behavior, and objects hold state. Developers spend time coming up with intricate hierarchies of classes that try to model their problem, much as Victorian scientists created taxonomies of butterflies.

When we code with objects, we’re thinking about state. Much of our time is spent calling methods on objects and passing them other objects. Based on these calls, objects update their own state, and possibly the state of other objects. In this world, the class is king: it defines what each instance can do, and it implicitly controls the state of the data its instances hold. Our goal is data-hiding.

But that’s not the real world. In the real world, we don’t want to model abstract hierarchies (because in reality there aren’t that many true hierarchies). We want to get things done, not maintain state.

Right now, for instance, I’m taking empty computer files and transforming them into files containing text. Soon I’ll transform those files into a format you can read. A web server somewhere will transform your request to download the book into an HTTP response containing the content.

I don’t want to hide data. I want to transform it.
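To make the contrast concrete, here is a small sketch. The `order` map and its fields are invented purely for illustration. Rather than asking an object to update its own state, a function takes a value and returns a new, transformed value, and the original is left untouched.

```elixir
# Hypothetical order data, used only for illustration.
order = %{item: "book", quantity: 2, unit_price: 25}

# A transformation: take an order, return a new map with a total added.
# Nothing is mutated; Map.put/3 returns a new map.
add_total = fn o -> Map.put(o, :total, o.quantity * o.unit_price) end

priced = add_total.(order)

IO.inspect(priced)   # the new map includes total: 50
IO.inspect(order)    # the original map still has no :total key
```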

Combine Transformations with Pipelines

Unix users are accustomed to the philosophy of small, focused command-line tools that can be combined in arbitrary ways. Each tool takes an input, transforms it, and writes the result in a format the next tool (or a human) can use.

This philosophy is incredibly flexible and leads to fantastic reuse. The Unix utilities can be combined in ways undreamed of by their authors. And each one multiplies the potential of the others.

It’s also highly reliable—each small program does one thing well, which makes it easier to test.

There’s another benefit. A command pipeline can operate in parallel. If I write

 $ grep Elixir *.pml | wc -l

the word-count program, wc, runs at the same time as the grep command. Because wc consumes grep’s output as it is produced, the answer is ready with virtually no delay once grep finishes.
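For comparison, here is one way the same `grep Elixir *.pml | wc -l` pipeline might be expressed in Elixir. It is only a sketch: the module name `LineCount` and its functions are invented for this example. Like `grep`, it keeps the lines that contain the pattern, and like `wc -l`, it counts them. Streams are lazy, so lines are read and filtered one at a time rather than loading whole files into memory first (this version does not run the steps in parallel).

```elixir
defmodule LineCount do
  # Count the lines in any enumerable of strings that contain `pattern`.
  def count_matching(lines, pattern) do
    lines
    |> Stream.filter(&String.contains?(&1, pattern))   # like grep
    |> Enum.count()                                    # like wc -l
  end

  # Apply the same transformation to every file matching a glob.
  def count_in_files(glob, pattern) do
    glob
    |> Path.wildcard()
    |> Stream.flat_map(&File.stream!/1)   # lazily read lines from each file
    |> count_matching(pattern)
  end
end
```

In a directory of `.pml` files, `LineCount.count_in_files("*.pml", "Elixir")` would play the role of the shell command. Because `count_matching/2` accepts any enumerable of lines, it can be reused on data that never came from a file.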

Just to give you a taste of this, here’s an Elixir function called pmap. It takes a collection and a function, and returns the list that results from applying that function to each element of the collection. But…it runs a separate process to do the conversion of each element. Don’t worry about the details for now.

spawn/pmap1.exs
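 defmodule Parallel do
   def pmap(collection, func) do
     collection
     # Start one Task (a separate process) per element...
     |> Enum.map(&(Task.async(fn -> func.(&1) end)))
     # ...then wait for each one, collecting the results in order.
     |> Enum.map(&Task.await/1)
   end
 end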

We could run this function to get the squares of the numbers from 1 to 1,000.

 result = Parallel.pmap 1..1000, &(&1 * &1)

And, yes, I just kicked off 1,000 background processes, and I used all the cores and processors on my machine.

The code may not make much sense, but by about halfway through the book, you’ll be writing this kind of thing for yourself.
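The `Parallel.pmap` above starts one process per element. As a point of comparison, Elixir's standard library also provides `Task.async_stream/3`, which maps a function over a collection concurrently while capping how many tasks run at once (by default, `System.schedulers_online/0`, which is typically the number of cores). The sketch below swaps that technique in; the module name `BoundedParallel` is made up for this example and is not part of the original code.

```elixir
defmodule BoundedParallel do
  # Map `func` over `collection` concurrently, with at most
  # System.schedulers_online() tasks running at once (the default).
  # Each result arrives as {:ok, value}, in the same order as the input.
  def pmap(collection, func) do
    collection
    |> Task.async_stream(func)
    |> Enum.map(fn {:ok, value} -> value end)
  end
end

squares = BoundedParallel.pmap(1..1000, &(&1 * &1))
IO.inspect(Enum.take(squares, 5))   # prints [1, 4, 9, 16, 25]
```

The trade-off is deliberate: one process per element is simple and, for cheap work like squaring, BEAM processes are light enough that it is fine, while a bounded stream keeps the number of in-flight tasks steady when the collection is very large.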

Functions Are Data Transformers

Elixir lets us solve the problem in the same way the Unix shell does. Rather than have command-line utilities, we have functions. And we can string them together as we please. The smaller—more focused—those functions, the more flexibility we have when combining them.
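As a small illustration, here are three single-purpose functions chained with the pipe operator `|>`, which passes the result of each step as the first argument to the next, much as the shell's `|` passes output to the next command. The module and function names (`Words`, `normalize`, `tokenize`, `frequencies`) are invented for this sketch.

```elixir
defmodule Words do
  # Each function does one small job, like a Unix tool.
  def normalize(text), do: String.downcase(text)

  # Split on runs of anything that is not a lowercase ASCII letter.
  def tokenize(text), do: String.split(text, ~r/[^a-z]+/, trim: true)

  def frequencies(words), do: Enum.frequencies(words)
end

counts =
  "The cat and the hat"
  |> Words.normalize()
  |> Words.tokenize()
  |> Words.frequencies()

IO.inspect(counts)
```

Because the steps are independent, they recombine freely: piping `Words.tokenize/1` into `Enum.count/1` instead of `Words.frequencies/1` gives a word count without touching the other functions.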

If we want, we can make these functions run in parallel—Elixir has a simple but powerful mechanism for passing messages between them. And these are not your father’s boring old processes or threads—we’re talking about the potential to run millions of them on a single machine and have hundreds of these machines interoperating. Bruce Tate commented on this paragraph with this thought: “Most programmers treat threads and processes as a necessary evil; Elixir developers feel they are an important simplification.” As we get deeper into the book, you’ll start to see what he means.
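Here is a minimal sketch of that message passing, using only the built-in `spawn/1`, `send/2`, and `receive`. The `Squarer` module and the `{:square, from, n}` message shape are arbitrary conventions chosen for this example.

```elixir
defmodule Squarer do
  # Serve {:square, from, n} messages forever, replying with n * n.
  def loop do
    receive do
      {:square, from, n} ->
        send(from, {:result, n * n})
        loop()   # wait for the next request
    end
  end
end

pid = spawn(&Squarer.loop/0)       # start a lightweight process
send(pid, {:square, self(), 12})   # ask it to square 12

result =
  receive do
    {:result, value} -> value
  after
    1_000 -> :timeout              # give up if no reply arrives within 1 second
  end

IO.inspect(result)   # prints 144
```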

This idea of transformation lies at the heart of functional programming: a function transforms its inputs into its output. The trigonometric function sin is an example—give it π/4, and you’ll get back 0.7071. An HTML templating system is a function; it takes a template containing placeholders and a list of named values, and produces a completed HTML document.
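Both examples can be tried directly. `:math.sin/1` comes from Erlang's standard `math` module, and EEx, which ships with Elixir, is one simple templating system. The template text and the `title` and `author` values are made up for illustration.

```elixir
# sin transforms a number into a number.
angle = :math.pi() / 4
IO.inspect(Float.round(:math.sin(angle), 4))   # prints 0.7071

# A template plus named values is transformed into a finished document.
template = "<h1><%= title %></h1><p>by <%= author %></p>"
html = EEx.eval_string(template, title: "Hello", author: "Ada")
IO.puts(html)   # prints <h1>Hello</h1><p>by Ada</p>
```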

But this power comes at a price. You’re going to have to unlearn a whole lot of what you know about programming. Many of your instincts will be wrong. And this will be frustrating, because you’re going to feel like a total n00b.

Personally, I feel that’s part of the fun. You didn’t learn, say, object-oriented programming overnight. You are unlikely to become a functional programming expert by breakfast, either.

But at some point things will click. You’ll start thinking about problems in a different way, and you’ll find yourself writing code that does amazing things with very little effort on your part. You’ll find yourself writing small chunks of code that can be used over and over, often in unexpected ways (just as wc and grep can be).

Your view of the world may even change a little as you stop thinking in terms of responsibilities and start thinking in terms of getting things done. And just about everyone can agree that will be fun.