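The first rule of binaries is “if in doubt, specify the type of each field.” Available types are binary, bits, bitstring, bytes, float, integer, utf8, utf16, and utf32. You can also add qualifiers: size(n) for the size of the field, signed or unsigned for integer fields, and big, little, or native for the endianness.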
Use hyphens to separate multiple attributes for a field:
    << length::unsigned-integer-size(12), flags::bitstring-size(4) >> = data
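To make the match above concrete, here is a minimal sketch that builds a 16-bit value and takes it apart again. The module name HeaderDemo and the sample values are assumptions made up for illustration; only the pattern itself comes from the text.

```elixir
defmodule HeaderDemo do
  # Hypothetical helper: splits a 16-bit binary into a 12-bit unsigned
  # length and a 4-bit flags field, using hyphen-separated attributes.
  def parse(<<length::unsigned-integer-size(12), flags::bitstring-size(4)>>) do
    {length, flags}
  end
end

# Sample data (an assumption): 0xABC in 12 bits followed by 0b1010 in 4 bits.
data = <<0xABC::size(12), 0b1010::size(4)>>
{len, flags} = HeaderDemo.parse(data)
IO.inspect(len)    # 2748
IO.inspect(flags)  # <<10::size(4)>>

# The signed/unsigned and endianness qualifiers change how the same bits are read.
<<s::signed-integer-size(8)>> = <<255>>
<<u::unsigned-integer-size(8)>> = <<255>>
IO.inspect({s, u})  # {-1, 255}

<<b::big-integer-size(16)>> = <<1, 0>>
<<l::little-integer-size(16)>> = <<1, 0>>
IO.inspect({b, l})  # {256, 1}
```

The match only succeeds when the sizes add up to the size of the data. Here that is 12 + 4 = 16 bits, so a binary of any other length would raise an error instead of matching.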
However, unless you’re doing a lot of work with binary file or protocol formats, the most common use of all this scary stuff is to process UTF-8 strings.
When we process lists, we use patterns that split the head from the rest of the list. With binaries that hold strings, we can do the same kind of trick. We have to specify the type of the head (UTF-8), and make sure the tail remains a binary.
    defmodule Utf8 do
      def each(str, func) when is_binary(str), do: _each(str, func)

      # Peel one UTF-8 codepoint off the front; the rest stays a binary.
      defp _each(<< head :: utf8, tail :: binary >>, func) do
        func.(head)
        _each(tail, func)
      end

      # An empty binary ends the recursion.
      defp _each(<<>>, _func), do: []
    end

    Utf8.each "∂og", fn char -> IO.puts char end
produces
    8706
    111
    103
The parallels with list processing are clear, but the differences are significant. Rather than use [ head | tail ], we use << head::utf8, tail::binary >>. And rather than terminate when we reach the empty list, [], we look for an empty binary, <<>>.
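To put the two shapes side by side, the following sketch collects codepoints from a binary and doubles the elements of a list using the same recursive structure. The module Utf8Demo and both of its functions are hypothetical names introduced here for illustration; they are not part of the original listing.

```elixir
defmodule Utf8Demo do
  # Hypothetical companion to Utf8.each/2: returns the codepoints of a
  # UTF-8 binary as a list instead of calling a function on each one.
  def codepoints(str) when is_binary(str), do: do_codepoints(str)

  defp do_codepoints(<<head::utf8, tail::binary>>), do: [head | do_codepoints(tail)]
  defp do_codepoints(<<>>), do: []

  # The same recursion over a list, for comparison.
  def double_all([head | tail]), do: [head * 2 | double_all(tail)]
  def double_all([]), do: []
end

# charlists: :as_lists forces the result to print as a list of integers.
IO.inspect(Utf8Demo.codepoints("∂og"), charlists: :as_lists)  # [8706, 111, 103]
IO.inspect(Utf8Demo.double_all([1, 2, 3]))                    # [2, 4, 6]

# One codepoint is not always one byte: "∂" takes three bytes in UTF-8.
IO.inspect(byte_size("∂og"))      # 5
IO.inspect(String.length("∂og"))  # 3

# A codepoint can be turned back into a one-character string.
IO.puts(<<8706::utf8>>)           # ∂
```

One difference worth noting: a list always matches either [head | tail] or [], but a binary that is not valid UTF-8 matches neither clause here, so calling codepoints/1 on such input raises a FunctionClauseError.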