Sets in Go are usually implemented as a map[T]bool
, where T
is the
element type. A set represented by a map is very flexible
but, for certain problems, a specialized representation may outperform it. For
example, in domains such as dataflow analysis where set
elements are small non-negative integers, sets have many elements, and
set operations like union and intersection are common, a bit vector
is ideal.
A bit vector uses a slice of unsigned integer values or “words,” each bit of which represents a possible element of the set. The set contains i if the i-th bit is set. The following program demonstrates a simple bit vector type with three methods:
// An IntSet is a set of small non-negative integers. // Its zero value represents the empty set. type IntSet struct { words []uint64 } // Has reports whether the set contains the non-negative value x. func (s *IntSet) Has(x int) bool { word, bit := x/64, uint(x%64) return word < len(s.words) && s.words[word]&(1<<bit) != 0 } // Add adds the non-negative value x to the set. func (s *IntSet) Add(x int) { word, bit := x/64, uint(x%64) for word >= len(s.words) { s.words = append(s.words, 0) } s.words[word] |= 1 << bit } // UnionWith sets s to the union of s and t. func (s *IntSet) UnionWith(t *IntSet) { for i, tword := range t.words { if i < len(s.words) { s.words[i] |= tword } else { s.words = append(s.words, tword) } } }
Since each word has 64 bits, to locate the bit for x
, we use the
quotient x/64
as the word index and the remainder x%64
as the bit
index within that word. The UnionWith
operation uses the bitwise
OR operator |
to compute the union 64 elements at a time.
(We’ll revisit the choice of 64-bit words in Exercise 6.5.)
This implementation lacks many desirable features, some of
which are posed as exercises below, but one is hard to live without:
way to print an IntSet
as a string.
Let’s give it a String
method as we did with Celsius
in
Section 2.5:
// String returns the set as a string of the form "{1 2 3}". func (s *IntSet) String() string { var buf bytes.Buffer buf.WriteByte('{') for i, word := range s.words { if word == 0 { continue } for j := 0; j < 64; j++ { if word&(1<<uint(j)) != 0 { if buf.Len() > len("{") { buf.WriteByte(' ') } fmt.Fprintf(&buf, "%d", 64*i+j) } } } buf.WriteByte('}') return buf.String() }
Notice the similarity of the String
method above with
intsToString
in Section 3.5.4;
bytes.Buffer
is often used this way in String
methods.
The fmt
package treats types with a String
method specially
so that values of complicated types can display themselves in a
user-friendly manner.
Instead of printing the raw representation of the value (a struct
in this case), fmt
calls the String
method.
The mechanism relies on interfaces and type assertions, which we’ll
explain in Chapter 7.
We can now demonstrate IntSet
in action:
var x, y IntSet x.Add(1) x.Add(144) x.Add(9) fmt.Println(x.String()) // "{1 9 144}" y.Add(9) y.Add(42) fmt.Println(y.String()) // "{9 42}" x.UnionWith(&y) fmt.Println(x.String()) // "{1 9 42 144}" fmt.Println(x.Has(9), x.Has(123)) // "true false"
A word of caution: we declared String
and Has
as methods of
the pointer type *IntSet
not out of necessity, but for consistency
with the other two methods, which need a pointer receiver because
they assign to s.words
.
Consequently, an IntSet
value does not
have a String
method, occasionally leading to surprises like
this:
fmt.Println(&x) // "{1 9 42 144}" fmt.Println(x.String()) // "{1 9 42 144}" fmt.Println(x) // "{[4398046511618 0 65536]}"
In the first case, we print an *IntSet
pointer, which does have a
String
method.
In the second case, we call String()
on an
IntSet
variable; the compiler inserts the implicit &
operation,
giving us a pointer, which has the String
method.
But in the third case, because the IntSet
value does not have
a String
method, fmt.Println
prints the representation
of the struct instead. It’s important not to forget the &
operator.
Making String
a method of IntSet
, not *IntSet
,
might be a good idea, but this is a case-by-case judgment.
Exercise 6.1: Implement these additional methods:
func (*IntSet) Len() int // return the number of elements func (*IntSet) Remove(x int) // remove x from the set func (*IntSet) Clear() // remove all elements from the set func (*IntSet) Copy() *IntSet // return a copy of the set
Exercise 6.2:
Define a variadic (*IntSet).AddAll(...int)
method that allows a list
of values to be added, such as s.AddAll(1, 2, 3)
.
Exercise 6.3:
(*IntSet).UnionWith
computes the union of two sets using |
, the
word-parallel bitwise OR operator.
Implement methods for IntersectWith
, DifferenceWith
, and
SymmetricDifference
for the corresponding set operations.
(The symmetric difference of two sets contains the elements present
in one set or the other but not both.)
Exercise 6.4:
Add a method Elems
that returns a slice containing the elements
of the set, suitable for iterating over with a range
loop.
Exercise 6.5:
The type of each word used by IntSet
is uint64
,
but 64-bit arithmetic may
be inefficient on a 32-bit platform. Modify the program to use the
uint
type, which is the most efficient unsigned integer type for
the platform.
Instead of dividing by 64, define a constant holding the effective
size of uint
in bits, 32 or 64.
You can use the perhaps too-clever expression
32 << (^uint(0) >> 63)
for this purpose.