Named functions can be declared only at the package level, but we can
use a function literal to denote a function value within any
expression.
A function literal is written like a function declaration, but without
a name following the func
keyword.
It is an expression, and its value is called an
anonymous function.
Function literals let us define a function at its point of use.
As an example, the
earlier call to strings.Map
can be rewritten as
strings.Map(func(r rune) rune { return r + 1 }, "HAL-9000")
More importantly, functions defined in this way have access to the entire lexical environment, so the inner function can refer to variables from the enclosing function, as this example shows:
// squares returns a function that returns // the next square number each time it is called. func squares() func() int { var x int return func() int { x++ return x * x } } func main() { f := squares() fmt.Println(f()) // "1" fmt.Println(f()) // "4" fmt.Println(f()) // "9" fmt.Println(f()) // "16" }
The function squares
returns another function,
of type func() int
.
A call to squares
creates a local variable x
and
returns an anonymous function that, each time it is called, increments
x
and returns its square.
A second call to squares
would create a second variable
x
and return a new anonymous function which increments that
variable.
The squares
example demonstrates that function values are not
just code but can have state.
The anonymous inner function can access
and update the local variables of the enclosing function squares
.
These hidden variable references are why we classify functions as
reference types and why function values are not comparable.
Function values like these are implemented using a technique called
closures, and Go programmers often use this term for function
values.
Here again we
see an example where the lifetime of a variable is not determined by its
scope: the variable x
exists after squares
has returned
within main
, even though x
is hidden inside f
.
As a somewhat academic example of anonymous functions, consider the problem of
computing a sequence of computer science courses that
satisfies the prerequisite requirements of each one. The
prerequisites are given in the prereqs
table below, which is a mapping
from each course to the list of courses that must be completed
before it.
// prereqs maps computer science courses to their prerequisites. var prereqs = map[string][]string{ "algorithms": {"data structures"}, "calculus": {"linear algebra"}, "compilers": { "data structures", "formal languages", "computer organization", }, "data structures": {"discrete math"}, "databases": {"data structures"}, "discrete math": {"intro to programming"}, "formal languages": {"discrete math"}, "networks": {"operating systems"}, "operating systems": {"data structures", "computer organization"}, "programming languages": {"data structures", "computer organization"}, }
This kind of problem is known as topological sorting. Conceptually, the prerequisite information forms a directed graph with a node for each course and edges from each course to the courses that it depends on. The graph is acyclic: there is no path from a course that leads back to itself. We can compute a valid sequence using depth-first search through the graph with the code below:
func main() { for i, course := range topoSort(prereqs) { fmt.Printf("%d:\t%s\n", i+1, course) } } func topoSort(m map[string][]string) []string { var order []string seen := make(map[string]bool) var visitAll func(items []string) visitAll = func(items []string) { for _, item := range items { if !seen[item] { seen[item] = true visitAll(m[item]) order = append(order, item) } } } var keys []string for key := range m { keys = append(keys, key) } sort.Strings(keys) visitAll(keys) return order }
When an anonymous function requires recursion, as in this example, we
must first declare a variable, and then assign the anonymous function
to that variable. Had these two steps been combined in the
declaration, the function literal would not be within the scope of
the variable visitAll
so it would have no way to call itself recursively:
visitAll := func(items []string) { // ... visitAll(m[item]) // compile error: undefined: visitAll // ... }
The output of the toposort
program is shown below.
It is deterministic, an often-desirable property that doesn’t always
come for free.
Here, the values of the prereqs
map are slices, not more maps,
so their iteration order is deterministic, and we sorted the keys of
prereqs
before making the initial calls to visitAll
.
1: intro to programming 2: discrete math 3: data structures 4: algorithms 5: linear algebra 6: calculus 7: formal languages 8: computer organization 9: compilers 10: databases 11: operating systems 12: networks 13: programming languages
Let’s return to our findLinks
example.
We’ve moved the link-extraction function links.Extract
to its own package, since we’ll use it again in
Chapter 8.
We replaced the visit
function with an anonymous function
that appends to the links
slice directly, and used
forEachNode
to handle the traversal.
Since Extract
needs only the pre
function, it passes
nil
for the post
argument.
// Package links provides a link-extraction function. package links import ( "fmt" "net/http" "golang.org/x/net/html" ) // Extract makes an HTTP GET request to the specified URL, parses // the response as HTML, and returns the links in the HTML document. func Extract(url string) ([]string, error) { resp, err := http.Get(url) if err != nil { return nil, err } if resp.StatusCode != http.StatusOK { resp.Body.Close() return nil, fmt.Errorf("getting %s: %s", url, resp.Status) } doc, err := html.Parse(resp.Body) resp.Body.Close() if err != nil { return nil, fmt.Errorf("parsing %s as HTML: %v", url, err) } var links []string visitNode := func(n *html.Node) { if n.Type == html.ElementNode && n.Data == "a" { for _, a := range n.Attr { if a.Key != "href" { continue } link, err := resp.Request.URL.Parse(a.Val) if err != nil { continue // ignore bad URLs } links = append(links, link.String()) } } } forEachNode(doc, visitNode, nil) return links, nil }
Instead of appending the raw href
attribute value to the
links
slice, this version parses it as a URL relative to the
base URL of the document, resp.Request.URL
.
The resulting link
is in absolute form, suitable for use in a
call to http.Get
.
Crawling the web is, at its heart, a problem of graph traversal.
The topoSort
example showed a depth-first traversal;
for our web crawler, we’ll use breadth-first traversal, at least initially.
In Chapter 8, we’ll explore concurrent traversal.
The function below encapsulates the essence of a breadth-first
traversal.
The caller provides an initial list worklist
of items to visit
and a function value f
to call for each item.
Each item is identified by a string.
The function f
returns a list of new items to append to the
worklist.
The breadthFirst
function returns when all items have been visited.
It maintains a set of strings to ensure that no item is visited twice.
// breadthFirst calls f for each item in the worklist. // Any items returned by f are added to the worklist. // f is called at most once for each item. func breadthFirst(f func(item string) []string, worklist []string) { seen := make(map[string]bool) for len(worklist) > 0 { items := worklist worklist = nil for _, item := range items { if !seen[item] { seen[item] = true worklist = append(worklist, f(item)...) } } } }
As we explained in passing in Chapter 3,
the argument “f(item)...
” causes all the items in the list returned
by f
to be appended to the worklist.
In our crawler, items are URLs.
The crawl
function we’ll supply to breadthFirst
prints the URL,
extracts its links, and returns them so that they too are visited.
func crawl(url string) []string { fmt.Println(url) list, err := links.Extract(url) if err != nil { log.Print(err) } return list }
To start the crawler off, we’ll use the command-line arguments as the initial URLs.
func main() { // Crawl the web breadth-first, // starting from the command-line arguments. breadthFirst(crawl, os.Args[1:]) }
Let’s crawl the web starting from https://golang.org
.
Here are some of the resulting links:
$ go build gopl.io/ch5/findlinks3 $ ./findlinks3 https://golang.org https://golang.org/ https://golang.org/doc/ https://golang.org/pkg/ https://golang.org/project/ https://code.google.com/p/go-tour/ https://golang.org/doc/code.html https://www.youtube.com/watch?v=XCsL89YtqCs http://research.swtch.com/gotour https://vimeo.com/53221560 ...
The process ends when all reachable web pages have been crawled or the memory of the computer is exhausted.
Exercise 5.10:
Rewrite topoSort
to use maps instead of slices and eliminate the
initial sort.
Verify that the results, though nondeterministic, are valid
topological orderings.
Exercise 5.11:
The instructor of the linear algebra course decides that calculus is now a
prerequisite. Extend the topoSort
function to report cycles.
Exercise 5.12:
The startElement
and endElement
functions in
gopl.io/ch5/outline2
(§5.5)
share a global variable, depth
.
Turn them into anonymous functions that share a variable local to the
outline
function.
Exercise 5.13:
Modify crawl
to make local copies of the pages it finds,
creating directories as necessary.
Don’t make copies of pages that come from a different domain. For
example, if the original page comes from golang.org
, save all files from
there, but exclude ones from vimeo.com
.
Exercise 5.14:
Use the breadthFirst
function to explore a different structure.
For example, you could use the course dependencies from the
topoSort
example (a directed graph), the file system hierarchy
on your computer (a tree), or a list of bus or subway routes
downloaded from your city government’s web site (an undirected graph).
In this section, we’ll look at a pitfall of Go’s lexical scope rules that can cause surprising results. We urge you to understand the problem before proceeding, because the trap can ensnare even experienced programmers.
Consider a program that must create a set of directories and later remove them. We can use a slice of function values to hold the clean-up operations. (For brevity, we have omitted all error handling in this example.)
var rmdirs []func() for _, d := range tempDirs() { dir := d // NOTE: necessary! os.MkdirAll(dir, 0755) // creates parent directories too rmdirs = append(rmdirs, func() { os.RemoveAll(dir) }) } // ...do some work... for _, rmdir := range rmdirs { rmdir() // clean up }
You may be wondering why we assigned the loop variable d
to a new
local variable dir
within the loop body, instead of just naming the
loop variable dir
as in this subtly incorrect variant:
var rmdirs []func() for _, dir := range tempDirs() { os.MkdirAll(dir, 0755) rmdirs = append(rmdirs, func() { os.RemoveAll(dir) // NOTE: incorrect! }) }
The reason is a consequence of the
scope rules for loop variables.
In the program immediately above, the for
loop introduces a new lexical block
in which the variable dir
is
declared.
All
function values created by this loop “capture” and share the same
variable—an addressable storage location, not its value at that
particular moment.
The value of dir
is updated in successive iterations,
so by the time the cleanup functions are called, the
dir
variable has been updated several times by the now-completed
for
loop. Thus dir
holds the value from the final iteration, and
consequently all calls to os.RemoveAll
will attempt to remove the
same directory.
Frequently, the inner variable introduced to work around this
problem—dir
in our example—is given the exact same name as the
outer variable of which it is a copy, leading to odd-looking but
crucial variable declarations like this:
for _, dir := range tempDirs() { dir := dir // declares inner dir, initialized to outer dir // ... }
The risk is not unique to range
-based for
loops.
The loop in the example below suffers from the same problem due to
unintended capture of the index variable i
.
var rmdirs []func() dirs := tempDirs() for i := 0; i < len(dirs); i++ { os.MkdirAll(dirs[i], 0755) // OK rmdirs = append(rmdirs, func() { os.RemoveAll(dirs[i]) // NOTE: incorrect! }) }
The problem of iteration variable capture is most often encountered
when using the go
statement (Chapter 8) or with defer
(which we
will see in a moment) since both may delay the execution of a
function value until after the loop has finished.
But the problem is not inherent to go
or defer
.