Binaries and Pattern Matching

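The first rule of binaries is “if in doubt, specify the type of each field.” Available types are binary, bits, bitstring, bytes, float, integer, utf8, utf16, and utf32. You can also add qualifiers: size(n) sets the size of the field (counted in bits for integer, float, and bitstring fields, and in bytes for binary fields), signed or unsigned says how an integer field is interpreted, and big, little, or native sets the endianness.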

Use hyphens to separate multiple attributes for a field:

 << length::unsigned-integer-size(12), flags::bitstring-size(4) >> = data
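To see these attributes at work, here is a minimal sketch that packs and then unpacks a two-byte header. The HeaderSketch module, its pack and unpack functions, and the layout itself (a 12-bit length followed by 4 flag bits) are assumptions made up for illustration; they do not describe any real format.

```elixir
defmodule HeaderSketch do
  # Hypothetical layout: a 12-bit unsigned length followed by 4 flag bits.
  def pack(length, flags) do
    << length::unsigned-integer-size(12), flags::unsigned-integer-size(4) >>
  end

  # Same pattern as in the text: an integer field and a 4-bit bitstring field.
  def unpack(data) do
    << length::unsigned-integer-size(12), flags::bitstring-size(4) >> = data
    {length, flags}
  end
end

data = HeaderSketch.pack(300, 0b1010)
IO.inspect data                        # <<18, 202>>
IO.inspect HeaderSketch.unpack(data)   # {300, <<10::size(4)>>}
```

Because flags is matched as a bitstring, it comes back as a four-bit bitstring rather than as the integer 10. Matching it as integer-size(4) would return the number instead.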

However, unless you’re doing a lot of work with binary file or protocol formats, the most common use of all this scary stuff is to process UTF-8 strings.

String Processing with Binaries

When we process lists, we use patterns that split the head from the rest of the list. With binaries that hold strings, we can do the same kind of trick. We have to specify the type of the head (UTF-8), and make sure the tail remains a binary.

strings/utf-iterate.ex
 defmodule Utf8 do
   def each(str, func) when is_binary(str), do: _each(str, func)
 
   defp _each(<< head :: utf8, tail :: binary >>, func) do
     func.(head)
     _each(tail, func)
   end
 
   defp _each(<<>>, _func), do: []
 end
 
 Utf8.each "∂og", fn char -> IO.puts char end

produces

 8706
 111
 103

The parallels with list processing are clear, but the differences are significant. Rather than use [ head | tail ], we use << head::utf8, tail::binary >>. And rather than terminate when we reach the empty list, [], we look for an empty binary, <<>>.
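One variation, sketched here as an illustration and not taken from the original listing, gathers the code points into a list instead of calling a function on each one. The module name Utf8Sketch and its codepoints function are hypothetical.

```elixir
defmodule Utf8Sketch do
  # Hypothetical helper: return the code points of a UTF-8 string as a list.
  def codepoints(str) when is_binary(str), do: _codepoints(str, [])

  # Peel one UTF-8 code point off the front; the tail stays a binary.
  defp _codepoints(<< head::utf8, tail::binary >>, acc) do
    _codepoints(tail, [head | acc])
  end

  # Empty binary: we are done, so restore the original order.
  defp _codepoints(<<>>, acc), do: Enum.reverse(acc)
end

IO.inspect Utf8Sketch.codepoints("∂og")   # [8706, 111, 103]
IO.inspect byte_size("∂og")               # 5
```

The string holds three code points in five bytes, which is why the head is matched as utf8 and not as a single byte: the ∂ character alone takes three bytes.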