Single-quoted strings are represented as a list of integer values, each value corresponding to a codepoint in the string. For this reason, we refer to them as character lists (or char lists).
| iex> str = 'wombat' |
| 'wombat' |
| iex> is_list str |
| true |
| iex> length str |
| 6 |
| iex> Enum.reverse str |
| 'tabmow' |
This is confusing: IEx says it is a list, but it shows the value as a string. That’s because IEx prints a list of integers as a string if it believes each number in the list is a printable character. You can try this for yourself:
| iex> [ 67, 65, 84 ] |
| 'CAT' |
You can look at the internal representation in a number of ways:
| iex> str = 'wombat' |
| 'wombat' |
| iex> :io.format "~w~n", [ str ] |
| [119,111,109,98,97,116] |
| :ok |
| iex> List.to_tuple str |
| {119, 111, 109, 98, 97, 116} |
| iex> str ++ [0] |
| [119, 111, 109, 98, 97, 116, 0] |
The ~w in the format string forces str to be written as an Erlang term—the underlying list of integers. The ~n is a newline.
The last example creates a new character list with a null byte at the end. IEx no longer considers every element of the list printable, so it displays the underlying character codes instead.
If a character list contains characters Erlang considers nonprintable, you’ll see the list representation.
| iex> '∂x/∂y' |
| [8706, 120, 47, 8706, 121] |
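If you would rather not depend on IEx's guess, you can ask for the integer form directly. The following is a minimal sketch, assuming an Elixir version whose inspect accepts the charlists option and that provides List.ascii_printable?/1. Treat that function as an approximation of the printability test described above, not necessarily the exact check IEx applies.

```elixir
# Force the list-of-integers form, even though every element is printable
IO.puts(inspect('CAT', charlists: :as_lists))       # => [67, 65, 84]

# A printability check on the examples used in this section
IO.inspect(List.ascii_printable?('wombat'))         # => true
IO.inspect(List.ascii_printable?('wombat' ++ [0]))  # => false
IO.inspect(List.ascii_printable?('∂x/∂y'))          # => false
```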
Because a character list is a list, we can use the usual pattern matching and List functions.
| iex> 'pole' ++ 'vault' |
| 'polevault' |
| iex> 'pole' -- 'vault' |
| 'poe' |
| iex> List.zip [ 'abc', '123' ] |
| [{97, 49}, {98, 50}, {99, 51}] |
| iex> [ head | tail ] = 'cat' |
| 'cat' |
| iex> head |
| 99 |
| iex> tail |
| 'at' |
| iex> [ head | tail ] |
| 'cat' |
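The ++ operator may also appear in a pattern when its left operand is a literal list, so the concatenation shown above can be run in reverse to peel a known prefix off a character list. Here is a small sketch; the Prefix module and its strip function are made-up names used only for illustration.

```elixir
defmodule Prefix do
  # Matches only when the argument starts with the codes for 'pole';
  # rest is bound to whatever follows the prefix.
  def strip('pole' ++ rest), do: rest

  # Anything else is returned unchanged.
  def strip(other), do: other
end

IO.inspect(Prefix.strip('polevault') == 'vault')  # => true
IO.inspect(Prefix.strip('cat') == 'cat')          # => true
```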
Why is the head of 'cat' 99 and not c? Remember that a char list is just a list of integer character codes, so each individual entry is a number. It happens that 99 is the code for a lowercase c.
In fact, the notation ?c returns the integer code for the character c. This is often useful when employing patterns to extract information from character lists. Here’s a simple module that parses the character-list representation of an optionally signed decimal number.
| defmodule Parse do |
| |
|   def number([ ?- | tail ]), do: _number_digits(tail, 0) * -1 |
|   def number([ ?+ | tail ]), do: _number_digits(tail, 0) |
|   def number(str), do: _number_digits(str, 0) |
| |
|   defp _number_digits([], value), do: value |
|   defp _number_digits([ digit | tail ], value) |
|   when digit in '0123456789' do |
|     _number_digits(tail, value*10 + digit - ?0) |
|   end |
|   defp _number_digits([ non_digit | _ ], _) do |
|     raise "Invalid digit '#{[non_digit]}'" |
|   end |
| end |
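The arithmetic in the digit-handling clause is worth a second look. Subtracting ?0 turns a character code into the digit it stands for, and multiplying the running value by 10 shifts the earlier digits one place to the left. The sketch below replays that accumulation as a fold over the list; Enum.reduce stands in for the recursion purely for illustration.

```elixir
# ?0 is 48, so subtracting it maps the codes for '0'..'9' onto 0..9
IO.inspect(?7 - ?0)   # => 7

# The same accumulation the recursive clause performs:
#   0*10 + 1 = 1,  1*10 + 2 = 12,  12*10 + 3 = 123
value = Enum.reduce('123', 0, fn digit, acc -> acc * 10 + digit - ?0 end)
IO.inspect(value)     # => 123
```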
Let’s try it in IEx.
| iex> c("parse.exs") |
| [Parse] |
| iex> Parse.number('123') |
| 123 |
| iex> Parse.number('-123') |
| -123 |
| iex> Parse.number('+123') |
| 123 |
| iex> Parse.number('+9') |
| 9 |
| iex> Parse.number('+a') |
| ** (RuntimeError) Invalid digit 'a' |
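Raising an exception is one way to report a bad digit. If the caller would rather pattern match on the outcome, the same clauses can return tagged tuples instead. This is a hypothetical variant, not part of the Parse module above: the SafeParse name, the shape of the error tuple, and the use of a ?0..?9 range in the guard are all choices made for this sketch.

```elixir
defmodule SafeParse do
  # Same sign handling as Parse.number, but results are tagged tuples
  def number([?- | tail]), do: negate(digits(tail, 0))
  def number([?+ | tail]), do: digits(tail, 0)
  def number(str), do: digits(str, 0)

  # Negate only a successful result; pass an error through untouched
  defp negate({:ok, value}), do: {:ok, -value}
  defp negate(error), do: error

  defp digits([], value), do: {:ok, value}

  defp digits([digit | tail], value) when digit in ?0..?9 do
    digits(tail, value * 10 + digit - ?0)
  end

  # Report the offending character code instead of raising
  defp digits([non_digit | _], _), do: {:error, {:invalid_digit, non_digit}}
end

IO.inspect(SafeParse.number('-123'))  # => {:ok, -123}
IO.inspect(SafeParse.number('+a'))    # => {:error, {:invalid_digit, 97}}
```

Like the original, this sketch accepts an empty list of digits and yields 0 for it; a stricter parser would need an extra clause to reject that case.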