Single-Quoted Strings—Lists of Character Codes

Single-quoted strings are represented as a list of integer values, each value corresponding to a codepoint in the string. For this reason, we refer to them as character lists (or char lists).

 iex> str = 'wombat'
 'wombat'
 iex> is_list str
 true
 iex> length str
 6
 iex> Enum.reverse str
 'tabmow'

This is confusing: IEx says it is a list, but it shows the value as a string. That’s because IEx prints a list of integers as a string if it believes each number in the list is a printable character. You can try this for yourself:

 iex> [ 67, 65, 84 ]
 'CAT'
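To check ahead of time whether a list of integers is likely to be shown in quoted form, one option is List.ascii_printable?/1. This function is not mentioned in the text above; the sketch assumes a reasonably recent Elixir version and only approximates the check IEx performs. The module name is purely illustrative.

```elixir
# Assumption: List.ascii_printable?/1 is available (recent Elixir versions).
defmodule PrintableCheck do
  # Returns true when every element is a printable ASCII character code.
  def printable?(list) when is_list(list) do
    List.ascii_printable?(list)
  end
end

IO.inspect PrintableCheck.printable?([67, 65, 84])     # codes for C, A, T
IO.inspect PrintableCheck.printable?([67, 65, 84, 0])  # 0 is not a printable code
```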

You can look at the internal representation in a number of ways:

 iex> str = 'wombat'
 'wombat'
 iex> :io.format "~w~n", [ str ]
 [119,111,109,98,97,116]
 :ok
 iex> List.to_tuple str
 {119, 111, 109, 98, 97, 116}
 iex> str ++ [0]
 [119, 111, 109, 98, 97, 116, 0]

The ~w in the format string forces str to be written as an Erlang term—the underlying list of integers. The ~n is a newline.

The last example creates a new character list with a zero (the null character) at the end. IEx no longer thinks all the values are printable characters, and so displays the underlying character codes.
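Another way to see the integers, without changing the list, is to ask inspect to skip the printable-text heuristic. This is a minimal sketch that assumes your Elixir version supports the :charlists inspect option; the module and function names are illustrative.

```elixir
# Assumption: the :charlists inspect option is supported (recent Elixir versions).
defmodule ShowCodes do
  # Renders a character list as its underlying integers, printable or not.
  def codes(charlist) when is_list(charlist) do
    inspect(charlist, charlists: :as_lists)
  end
end

IO.puts ShowCodes.codes('wombat')
```

Unlike appending a 0, this leaves the list itself untouched; only the rendering changes.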

If a character list contains characters Erlang considers nonprintable, you’ll see the list representation.

 iex> '∂x/∂y'
 [8706, 120, 47, 8706, 121]
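Even when IEx falls back to the list form, the integers are still ordinary codepoints. The sketch below converts between the two forms using List.to_string/1 and String.to_charlist/1, standard library functions that are not covered in the text above (String.to_charlist/1 assumes a recent Elixir version). The module name is illustrative.

```elixir
defmodule Codepoints do
  # Character list -> double-quoted (binary) string.
  def to_text(charlist), do: List.to_string(charlist)

  # Double-quoted string -> list of integer codepoints.
  def to_codes(text), do: String.to_charlist(text)
end

IO.puts Codepoints.to_text([8706, 120, 47, 8706, 121])
IO.inspect Codepoints.to_codes("∂x/∂y")
```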

Because a character list is a list, we can use the usual pattern matching and List functions.

 iex> 'pole' ++ 'vault'
 'polevault'
 iex> 'pole' -- 'vault'
 'poe'
 iex> List.zip [ 'abc', '123' ]
 [{97, 49}, {98, 50}, {99, 51}]
 iex> [ head | tail ] = 'cat'
 'cat'
 iex> head
 99
 iex> tail
 'at'
 iex> [ head | tail ]
 'cat'

Why is the head of 'cat' 99 and not c? Remember that a char list is just a list of integer character codes, so each individual entry is a number. It happens that 99 is the code for a lowercase c.
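If you want the character itself rather than its code, the integer can be turned back into a one-character string. The following is a small sketch rather than anything the text prescribes: the module name is made up, and the <<code::utf8>> binary syntax is an addition not discussed above.

```elixir
defmodule HeadDemo do
  # Splits a character list into its first code and the remaining list.
  def split([ head | tail ]), do: {head, tail}

  # Turns one integer codepoint back into a one-character string.
  def as_text(code) when is_integer(code), do: <<code::utf8>>
end

{head, tail} = HeadDemo.split('cat')
IO.inspect head                 # the integer code of the first character
IO.puts HeadDemo.as_text(head)  # the same character as a string
IO.inspect tail == 'at'
```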

In fact, the notation ?c returns the integer code for the character c. This is often useful when employing patterns to extract information from character lists. Here’s a simple module that parses the character-list representation of an optionally signed decimal number.

strings/parse.exs
 defmodule Parse do

   # A leading sign is consumed here; the digits are handled below.
   def number([ ?- | tail ]), do: _number_digits(tail, 0) * -1
   def number([ ?+ | tail ]), do: _number_digits(tail, 0)
   def number(str),           do: _number_digits(str, 0)

   # value accumulates the result as each digit is consumed.
   defp _number_digits([], value), do: value
   defp _number_digits([ digit | tail ], value)
   when digit in '0123456789' do
     _number_digits(tail, value*10 + digit - ?0)
   end
   defp _number_digits([ non_digit | _ ], _) do
     raise "Invalid digit '#{[non_digit]}'"
   end
 end
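The arithmetic in _number_digits can be traced in isolation. In this sketch (the module and function names are illustrative and not part of parse.exs), Enum.reduce stands in for the explicit recursion. Subtracting ?0 turns a character code into its digit value, and multiplying the running total by 10 shifts the earlier digits one place to the left.

```elixir
defmodule DigitTrace do
  # Same accumulator step as _number_digits, but with no validation:
  # the input is assumed to contain only the characters 0 through 9.
  def value(digits) do
    Enum.reduce(digits, 0, fn digit, value ->
      value * 10 + digit - ?0
    end)
  end
end

# '123' is [49, 50, 51] and ?0 is 48, so the total goes 0 -> 1 -> 12 -> 123.
IO.inspect DigitTrace.value('123')
```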

Let’s try it in IEx.

 iex> c("parse.exs")
 [Parse]
 iex> Parse.number('123')
 123
 iex> Parse.number('-123')
 -123
 iex> Parse.number('+123')
 123
 iex> Parse.number('+9')
 9
 iex> Parse.number('+a')
 ** (RuntimeError) Invalid digit 'a'
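For comparison, the standard library provides List.to_integer/1 for this kind of conversion. It is not used in the text above, so treat the following as a sketch under that assumption. The wrapper module is hypothetical and exists only to show that invalid input surfaces as an ArgumentError rather than the RuntimeError raised by Parse.

```elixir
defmodule BuiltinParse do
  # Returns {:ok, integer} for valid input, or :error instead of raising.
  def number(charlist) do
    {:ok, List.to_integer(charlist)}
  rescue
    ArgumentError -> :error
  end
end

IO.inspect BuiltinParse.number('-123')
IO.inspect BuiltinParse.number('+a')
```

Writing the parser by hand, as Parse does, is still worthwhile here because it shows how pattern matching on character codes works.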