Whereas single-quoted strings are stored as char lists, the contents of a double-quoted string (dqs) are stored as a consecutive sequence of bytes in UTF-8 encoding. Clearly this is more efficient in terms of memory and certain forms of access, but it does have two implications.
First, because UTF-8 characters can take more than a single byte to represent, the size of the binary is not necessarily the length of the string.
| iex> dqs = "∂x/∂y" |
| "∂x/∂y" |
| iex> String.length dqs |
| 5 |
| iex> byte_size dqs |
| 9 |
| iex> String.at(dqs, 0) |
| "∂" |
| iex> String.codepoints(dqs) |
| ["∂", "x", "/", "∂", "y"] |
| iex> String.split(dqs, "/") |
| ["∂x", "∂y"] |
Second, because you’re no longer using lists, you need to learn and work with the binary syntax alongside the list syntax in your code.
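To give a feel for what that binary syntax looks like, here is a minimal sketch that pulls the first UTF-8 codepoint off the front of the string from the earlier example. The variable names first and rest are only illustrative.

```elixir
dqs = "∂x/∂y"

# Match one UTF-8 encoded codepoint, then bind whatever bytes are left.
<<first::utf8, rest::binary>> = dqs

IO.inspect(first)            # 8706, the integer codepoint for ∂
IO.inspect(<<first::utf8>>)  # "∂", the codepoint re-encoded as a string
IO.inspect(rest)             # "x/∂y"
IO.inspect(byte_size(rest))  # 6, because ∂ occupies three bytes
```

The utf8 modifier is what lets the match consume three bytes for ∂ rather than one.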
When Elixir library documentation uses the word string (and most of the time it uses the word binary), it means double-quoted strings.
The String module defines functions that work with double-quoted strings.
at(str, offset)
Returns the grapheme at the given offset (starting at 0). Negative offsets count from the end of the string.
| iex> String.at("∂og", 0) |
| "∂" |
| iex> String.at("∂og", -1) |
| "g" |
capitalize(str)
Converts str to lowercase, and then capitalizes the first character.
| iex> String.capitalize "école" |
| "École" |
| iex> String.capitalize "ÎÎÎÎÎ" |
| "Îîîîî" |
codepoints(str)
Returns the codepoints in str.
| iex> String.codepoints("José's ∂øg") |
| ["J", "o", "s", "é", "'", "s", " ", "∂", "ø", "g"] |
downcase(str)
Converts str to lowercase.
| iex> String.downcase "ØRSteD" |
| "ørsted" |
duplicate(str, n)
Returns a string containing n copies of str.
| iex> String.duplicate "Ho! ", 3 |
| "Ho! Ho! Ho! " |
ends_with?(str, suffixes)
Returns true if str ends with any of the given suffixes.
| iex> String.ends_with? "string", ["elix", "stri", "ring"] |
| true |
first(str)
Returns the first grapheme from str.
| iex> String.first "∂og" |
| "∂" |
graphemes(str)
Returns the graphemes in the string. This is different from the codepoints function, which lists combining characters separately. The following example uses a combining diaeresis along with the letter e to represent ë. (It might not display properly on your ereader.)
| iex> String.codepoints "noe\u0308l" |
| ["n", "o", "e", "̈", "l"] |
| iex> String.graphemes "noe\u0308l" |
| ["n", "o", "ë", "l"] |
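The same string gives three different sizes depending on whether you count bytes, codepoints, or graphemes. The sketch below shows all three, and then uses String.normalize and String.equivalent? (neither is described in this list, but both are part of the String module) to compare the decomposed form with its composed equivalent.

```elixir
decomposed = "noe\u0308l"

IO.inspect(byte_size(decomposed))                  # 6 bytes
IO.inspect(length(String.codepoints(decomposed)))  # 5 codepoints
IO.inspect(String.length(decomposed))              # 4 graphemes

# NFC normalization folds e + combining diaeresis into a single codepoint.
composed = String.normalize(decomposed, :nfc)

IO.inspect(length(String.codepoints(composed)))       # 4
IO.inspect(decomposed == composed)                    # false: the bytes differ
IO.inspect(String.equivalent?(decomposed, composed))  # true: canonically equal
```

If you compare user-supplied text with ==, normalizing both sides first is one way to avoid this kind of surprise.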
jaro_distance(str1, str2)
Returns a float between 0 and 1 indicating the likely similarity of two strings.
| iex> String.jaro_distance("jonathan", "jonathon") |
| 0.9166666666666666 |
| iex> String.jaro_distance("josé", "john") |
| 0.6666666666666666 |
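One possible use for this is a "did you mean?" suggestion. The following is a rough sketch only: the Suggest module and its closest function are hypothetical names made up for this example, not part of Elixir.

```elixir
defmodule Suggest do
  # Hypothetical helper: returns the candidate that scores highest
  # against input according to String.jaro_distance.
  def closest(input, candidates) do
    Enum.max_by(candidates, fn candidate ->
      String.jaro_distance(input, candidate)
    end)
  end
end

IO.inspect(Suggest.closest("jonathon", ["john", "josé", "jonathan"]))
```

A real implementation would probably also want a minimum score below which it makes no suggestion at all.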
last(str)
Returns the last grapheme from str.
| iex> String.last "∂og" |
| "g" |
length(str)
Returns the number of graphemes in str.
| iex> String.length "∂x/∂y" |
| 5 |
myers_difference(str1, str2)
Returns the list of transformations needed to convert one string to another.
| iex> String.myers_difference("banana", "panama") |
| [del: "b", ins: "p", eq: "ana", del: "n", ins: "m", eq: "a"] |
next_codepoint(str)
Splits str into its leading codepoint and the rest, returning nil if str is empty. This may be used as the basis of an iterator.
| defmodule MyString do |
| def each(str, func), do: _each(String.next_codepoint(str), func) |
| |
| defp _each({codepoint, rest}, func) do |
| func.(codepoint) |
| _each(String.next_codepoint(rest), func) |
| end |
| |
| defp _each(nil, _), do: [] |
| end |
| |
| MyString.each "∂og", fn c -> IO.puts c end |
produces
| ∂ |
| o |
| g |
next_grapheme(str)
Same as next_codepoint, but returns graphemes (nil on completion).
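Here is a sketch of the grapheme version of the iterator, assuming next_grapheme returns nil once the string is exhausted, as current Elixir releases do. The MyGraphemes module is a hypothetical name used only for illustration.

```elixir
defmodule MyGraphemes do
  # Hypothetical helper: builds a list by calling next_grapheme repeatedly.
  def to_list(str), do: _collect(String.next_grapheme(str), [])

  # A {grapheme, rest} tuple means there is more to read.
  defp _collect({grapheme, rest}, acc) do
    _collect(String.next_grapheme(rest), [grapheme | acc])
  end

  # nil (assumed end marker) means the string is exhausted.
  defp _collect(nil, acc), do: Enum.reverse(acc)
end

# The combining diaeresis stays attached to its e, so this has four elements.
IO.inspect(MyGraphemes.to_list("noe\u0308l"))
```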
pad_leading(str, new_length, padding)
Returns a new string, at least new_length characters long, containing str right-justified and padded with padding.
| iex> String.pad_leading("cat", 5, ">") |
| ">>cat" |
pad_trailing(str, new_length, padding)
Returns a new string, at least new_length characters long, containing str left-justified and padded with padding. If padding is omitted, spaces are used.
| iex> String.pad_trailing("cat", 5) |
| "cat  " |
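The two padding functions combine naturally when you want to line up columns. The sketch below uses made-up data and a "." as the padding so the fill is visible. It also shows what "at least" means: a string that is already longer than new_length comes back unchanged, not truncated.

```elixir
rows = [{"apples", 3}, {"kiwi", 12}]

lines =
  for {name, qty} <- rows do
    # Left-justify the name in 8 columns, right-justify the quantity in 4.
    String.pad_trailing(name, 8, ".") <>
      String.pad_leading(Integer.to_string(qty), 4, ".")
  end

IO.inspect(lines)
# ["apples.....3", "kiwi......12"]

IO.inspect(String.pad_leading("elephant", 5, "."))
# "elephant"
```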
printable?(str)
Returns true if str contains only printable characters.
| iex> String.printable? "José" |
| true |
| iex> String.printable? "\x00 a null" |
| false |
replace(str, pattern, replacement, options)
Replaces pattern with replacement in str under control of options.
If the :global option is true (the default), all occurrences of the pattern are replaced; otherwise only the first is replaced.
If :insert_replaced is a number, the text that was matched is inserted into the replacement at that offset. If the option is a list, the matched text is inserted at each of the offsets.
| iex> String.replace "the cat on the mat", "at", "AT" |
| "the cAT on the mAT" |
| iex> String.replace "the cat on the mat", "at", "AT", global: false |
| "the cAT on the mat" |
| iex> String.replace "the cat on the mat", "at", "AT", insert_replaced: 0 |
| "the catAT on the matAT" |
| iex> String.replace "the cat on the mat", "at", "AT", insert_replaced: [0,2] |
| "the catATat on the matATat" |
reverse(str)
Reverses the graphemes in a string.
| iex> String.reverse "pupils" |
| "slipup" |
| iex> String.reverse "∑ƒ÷∂" |
| "∂÷ƒ∑" |
slice(str, offset, len)
Returns a len-character substring starting at offset (measured from the end of str if negative).
| iex> String.slice "the cat on the mat", 4, 3 |
| "cat" |
| iex> String.slice "the cat on the mat", -3, 3 |
| "mat" |
split(str, pattern, options)
Splits str into substrings delimited by pattern. The :parts option limits the number of substrings returned, so parts: 2 performs only one split. pattern can be a string, a list of strings, or a regular expression. If you omit pattern, the string is split on whitespace.
| iex> String.split " the cat on the mat " |
| ["the", "cat", "on", "the", "mat"] |
| iex> String.split "the cat on the mat", "t" |
| ["", "he ca", " on ", "he ma", ""] |
| iex> String.split "the cat on the mat", ~r{[ae]} |
| ["th", " c", "t on th", " m", "t"] |
| iex> String.split "the cat on the mat", ~r{[ae]}, parts: 2 |
| ["th", " cat on the mat"] |
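Two more variations are worth a quick sketch: the :trim option, which drops the empty strings that appear when delimiters sit next to each other or at the ends, and a list of delimiters as the pattern. The input strings here are invented for the example.

```elixir
csv = "a,b,,c,"

IO.inspect(String.split(csv, ","))
# ["a", "b", "", "c", ""]

# trim: true removes the empty strings from the result.
IO.inspect(String.split(csv, ",", trim: true))
# ["a", "b", "c"]

# A list of strings splits on any of the listed delimiters.
IO.inspect(String.split("k=v; x=y", ["=", "; "]))
# ["k", "v", "x", "y"]
```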
starts_with?(str, prefixes)
Returns true if str starts with any of the given prefixes.
| iex> String.starts_with? "string", ["elix", "stri", "ring"] |
| true |
trim(str)
Trims leading and trailing whitespace from str.
| iex> String.trim "\t Hello \r\n" |
| "Hello" |
trim(str, character)
Trims leading and trailing instances of character from str.
| iex> String.trim "!!!SALE!!!", "!" |
| "SALE" |
trim_leading(str)
Trims leading whitespace from str.
| iex> String.trim_leading "\t\f Hello\t\n" |
| "Hello\t\n" |
trim_leading(str, character)
Trims leading copies of character (given as a string) from str.
| iex> String.trim_leading "!!!SALE!!!", "!" |
| "SALE!!!" |
trim_trailing(str)
Trims trailing whitespace from str.
| iex> String.trim_trailing(" line \r\n") |
| " line" |
trim_trailing(str, character)
Trims trailing occurrences of character from str.
| iex> String.trim_trailing "!!!SALE!!!", "!" |
| "!!!SALE" |
upcase(str)
Converts str to uppercase.
| iex> String.upcase "José Ørstüd" |
| "JOSÉ ØRSTÜD" |
valid?(str)
Returns true if str is a string containing only valid codepoints.
| iex> String.valid? "∂" |
| true |
| iex> String.valid? "∂og" |
| true |
| iex> String.valid? << 0x80, 0x81 >> |
| false |