Elixir has two kinds of string: single-quoted and double-quoted. They differ significantly in their internal representation. But they also have many things in common.
Strings can hold characters in UTF-8 encoding.
They may contain escape sequences:
\a | BEL (0x07) | \b | BS (0x08) | \d | DEL (0x7f) |
\e | ESC (0x1b) | \f | FF (0x0c) | \n | NL (0x0a) |
\r | CR (0x0d) | \s | SP (0x20) | \t | TAB (0x09) |
\v | VT (0x0b) | \uhhhh | 4 hex digits (or \u{h…} with 1–6 hex digits) | \xhh | 2 hex digits |
They allow interpolation of Elixir expressions using the syntax #{...}:
| iex> name = "dave" |
| "dave" |
| iex> "Hello, #{String.capitalize name}!" |
| "Hello, Dave!" |
Characters that would otherwise have special meaning can be escaped with a backslash.
They support heredocs.
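The shared features listed above can be tried out with a short script. This is a minimal sketch: the variable names are made up for illustration, and recent Elixir releases favor the ~c sigil over single quotes for character lists, so the single-quoted line may produce a warning there.

```elixir
name = "dave"

# Escape sequences are translated when the literal is read:
# \x41 is "A", \u0042 is "B", and \s is a space.
escaped = "\x41\u0042\s|"

# Interpolation evaluates the expression inside #{...}.
greeting = "Hello, #{String.capitalize(name)}!"

# A backslash removes the special meaning of #{ and of the quote character.
literal = "\#{name} said \"hi\""

# Single-quoted strings accept the same escapes and interpolation.
charlist = 'Hello, #{name}'

IO.puts(escaped)   # AB |
IO.puts(greeting)  # Hello, Dave!
IO.puts(literal)   # #{name} said "hi"
IO.puts(charlist)  # Hello, dave
```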
Any string can span several lines. To illustrate this, we’ll use both IO.puts and IO.write. We use write for the multiline string because puts always appends a newline, and we want to see the contents without this.
| IO.puts "start" |
| IO.write " |
|     my |
|     string |
| " |
| IO.puts "end" |
produces
| start |
| |
|     my |
|     string |
| end |
Notice how the multiline string retains the leading and trailing newlines and the leading spaces on the intermediate lines.
The heredoc notation fixes this. Triple the string delimiter (''' or """) and indent the trailing delimiter to the same margin as your string contents, and you get this:
| IO.puts "start" |
| IO.write """ |
|     my |
|     string |
|     """ |
| IO.puts "end" |
which produces
| start |
| my |
| string |
| end |
Heredocs are used extensively to add documentation to functions and modules.
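As a rough illustration of both points, the sketch below first checks what a heredoc evaluates to and then uses heredocs for documentation. The Greeter module and its hello function are hypothetical names invented for this example.

```elixir
# The margin set by the closing delimiter is stripped from every line,
# and the newline after the opening delimiter is dropped.
text = """
    my
    string
    """

IO.inspect(text == "my\nstring\n")  # true

defmodule Greeter do
  @moduledoc """
  A hypothetical module, used only to show heredoc documentation.
  """

  @doc """
  Returns a greeting for `name`.
  """
  def hello(name) do
    "Hello, #{String.capitalize(name)}!"
  end
end

IO.puts(Greeter.hello("dave"))  # Hello, Dave!
```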
Like Ruby, Elixir has an alternative syntax for some literals. We’ve already seen it with regular expressions, where we wrote ~r{...}. In Elixir, these ~-style literals are called sigils (symbols with magical powers).
A sigil starts with a tilde, followed by an upper- or lowercase letter, some delimited content, and perhaps some options. The delimiters can be <…>, {…}, […], (…), |…|, /…/, "…", and '…'.
The letter determines the sigil’s type:
~C | A character list with no escaping or interpolation |
~c | A character list, escaped and interpolated just like a single-quoted string |
~D | A Date in the format yyyy-mm-dd |
~N | A naive (raw) DateTime in the format yyyy-mm-dd hh:mm:ss[.ddd] |
~R | A regular expression with no escaping or interpolation |
~r | A regular expression, escaped and interpolated |
~S | A string with no escaping or interpolation |
~s | A string, escaped and interpolated just like a double-quoted string |
~T | A Time in the format hh:mm:ss[.dddd] |
~W | A list of whitespace-delimited words, with no escaping or interpolation |
~w | A list of whitespace-delimited words, with escaping and interpolation |
Here are some examples of sigils, using a variety of delimiters:
| iex> ~C[1\n2#{1+2}] |
| '1\\n2\#{1+2}' |
| iex> ~c"1\n2#{1+2}" |
| '1\n23' |
| iex> ~S[1\n2#{1+2}] |
| "1\\n2\#{1+2}" |
| iex> ~s/1\n2#{1+2}/ |
| "1\n23" |
| iex> ~W[the c#{'a'}t sat on the mat] |
| ["the", "c\#{'a'}t", "sat", "on", "the", "mat"] |
| iex> ~w[the c#{'a'}t sat on the mat] |
| ["the", "cat", "sat", "on", "the", "mat"] |
| iex> ~D<1999-12-31> |
| ~D[1999-12-31] |
| iex> ~T[12:34:56] |
| ~T[12:34:56] |
| iex> ~N{1999-12-31 23:59:59} |
| ~N[1999-12-31 23:59:59] |
The ~W and ~w sigils take an optional type specifier, a, c, or s, which determines whether they return a list of atoms, character lists, or strings. (We’ve already seen the ~r options.)
| iex> ~w[the c#{'a'}t sat on the mat]a |
| [:the, :cat, :sat, :on, :the, :mat] |
| iex> ~w[the c#{'a'}t sat on the mat]c |
| ['the', 'cat', 'sat', 'on', 'the', 'mat'] |
| iex> ~w[the c#{'a'}t sat on the mat]s |
| ["the", "cat", "sat", "on", "the", "mat"] |
The delimiter can be any nonword character. If it is (, [, {, or <, then the terminating delimiter is the corresponding closing character. Otherwise the terminating delimiter is the next nonescaped occurrence of the opening delimiter.
Elixir does not check the nesting of delimiters, so the sigil ~s{a{b} is the three-character string a{b.
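The delimiter rules can be checked with a few one-liners. This is only a sketch with made-up variable names; it sticks to the ~s sigil so that each result is an ordinary string that is easy to compare.

```elixir
# Paired delimiters end at the corresponding closing character.
parens = ~s(hello)
angles = ~s<hello>

# Any other delimiter ends at its next unescaped occurrence.
bars = ~s|a/b|

# Picking a delimiter that does not appear in the content avoids escaping.
quoted = ~s(He said "hi")

# Nesting is not tracked: the first } ends the sigil.
unbalanced = ~s{a{b}

IO.inspect(parens == angles)      # true
IO.inspect(bars)                  # "a/b"
IO.puts(quoted)                   # He said "hi"
IO.inspect(unbalanced == "a{b")   # true
```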
If the opening delimiter is three single or three double quotes, the sigil is treated as a heredoc.
| iex> ~w""" |
| ...> the |
| ...> cat |
| ...> sat |
| ...> """ |
| ["the", "cat", "sat"] |
If you want to specify modifiers with heredoc sigils (most commonly you’d do this with ~r), add them after the trailing delimiter.
| iex> ~r""" |
| ...> hello |
| ...> """i |
| ~r/hello\n/i |
One of the interesting things about sigils is that you can define your own. We talk about this in Part III.