Let’s see where we are. We’ve implemented four GenServers and two supervisors. When the application starts, it will start the top-level supervisor, which in turn starts Results, PathFinder, WorkerSupervisor, and Gatherer.
When Gatherer starts (and it will start last), it tells the worker supervisor to start a number of workers. When each worker starts, it gets a path to process from PathFinder, hashes the corresponding file, and passes the result to Gatherer, which stores the path and the hash in the Results server. When there are no more files to process, each worker sends a :done message to the gatherer. When the last worker is done, the gatherer reports the results.
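Stripped of all the processes, the job the servers share comes down to hashing each file and grouping paths by hash. Here is a minimal sequential sketch of that idea. It is not the Duper code: the module name DupSketch and its functions are hypothetical, and :erlang.md5/1 is used only because it needs no extra setup, not because Duper is known to use MD5.

```elixir
defmodule DupSketch do
  # Hash a file's entire contents. Reading the whole file into memory is
  # fine for a sketch, but would be wasteful for very large files.
  def hash_file(path) do
    path
    |> File.read!()
    |> :erlang.md5()
  end

  # Group paths by content hash, then keep only the groups that contain
  # more than one path. Those groups are the duplicates.
  def duplicates(paths) do
    paths
    |> Enum.group_by(&hash_file/1)
    |> Map.values()
    |> Enum.filter(fn group -> length(group) > 1 end)
  end
end
```

Calling DupSketch.duplicates/1 with a list of paths returns a list of lists, one per set of identical files. The concurrent version spreads the hash_file step across the workers and leaves the grouping to Results.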
Everything seems to be wired up. Let’s try it:
| $ mix run |
| Compiling 7 files (.ex) |
| Generated duper app |
| $ |
Hmm…that’s strange. No output.
The first time this happened to me, I wasted most of a day working it out. And the problem is obvious once you know what’s happening.
The mix run command runs your application. Once it has it running, mix exits: mission accomplished.
But your application never finished; it just got started and mix went away. We have to tell mix not to exit.
| $ mix run --no-halt |
| Results: |
| |
| ["./_build/dev/lib/dir_walker/.compile.elixir_scm", |
| "./_build/test/lib/dir_walker/.compile.elixir_scm"] |
| ["./_build/dev/lib/dir_walker/.compile.elixir", |
| "./_build/test/lib/dir_walker/.compile.elixir"] |
| ["./_build/dev/lib/dir_walker/.compile.xref", |
| "./_build/dev/lib/duper/.compile.xref", |
| "./_build/test/lib/dir_walker/.compile.xref"] |
| ["./deps/dir_walker/.fetch", |
| "./_build/dev/lib/dir_walker/.compile.lock", |
| "./_build/dev/lib/dir_walker/.compile.fetch", |
| "./_build/test/lib/dir_walker/.compile.lock", |
| "./_build/test/lib/dir_walker/.compile.fetch"] |
| ["./_build/dev/lib/dir_walker/ebin/dir_walker.app", |
| "./_build/test/lib/dir_walker/ebin/dir_walker.app"] |
| $ |
Much better. Even inside our Elixir project we have duplicated files, mostly between the test and dev environments.
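You can see the same effect in miniature in a standalone script. The following is a small sketch, not part of Duper: it starts some work in a separate process and then waits for the answer. Take out the receive and the script reaches its last line straight away, so the runtime shuts down before the spawned process has said anything.

```elixir
# Remember who we are, so the spawned process can report back.
parent = self()

# Start some work in a separate process. spawn/1 returns immediately;
# the work itself happens later, in the background.
spawn(fn ->
  Process.sleep(100)
  send(parent, {:result, :done})
end)

# Wait for the answer. Without this receive, the script would end here
# and the background process would be killed along with the runtime.
result =
  receive do
    {:result, value} -> value
  after
    1_000 -> :timeout
  end

IO.inspect(result, label: "result")
```

Our application has no such waiting point in the code that mix runs, which is why we ask mix itself to stay alive.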
Our lib/duper/application.ex file contains parameters that tell the app where to search and how many workers to use when searching. (We’ll see in the next chapter how to move those values out of code and onto the command line.)
Let’s change these parameters. My ~/Pictures folder used 30 GB to store about 6,000 old pictures from when I used iPhoto. Let’s look for duplicates in that folder with one worker, two workers, and so on, recording elapsed time.
Here are the parameters for using a single worker:
| children = [ |
| Duper.Results, |
| { Duper.PathFinder, "/Users/dave/Pictures" }, |
| Duper.WorkerSupervisor, |
| { Duper.Gatherer, 1 }, |
| ] |
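If you plan to try several worker counts, one option is to build the list from two values rather than editing it in place each time. This is only a sketch: the module ChildrenSketch and its function are hypothetical names, and nothing here is part of the project as written. The four child entries are the ones shown in the listing.

```elixir
defmodule ChildrenSketch do
  # Build the child list for the top-level supervisor.
  # root is the directory to search, worker_count the number of workers
  # the gatherer should ask for.
  def children(root, worker_count) do
    [
      Duper.Results,
      {Duper.PathFinder, root},
      Duper.WorkerSupervisor,
      {Duper.Gatherer, worker_count}
    ]
  end
end

# The single-worker configuration for the same folder:
IO.inspect(ChildrenSketch.children("/Users/dave/Pictures", 1))
```

The order of the entries still matters here, because Gatherer has to start last.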
Run it:
| $ time mix run --no-halt >dups |
| 87.57 real 58.81 user 23.44 sys |
| |
| $ wc -l dups |
| 1869 dups |
We found nearly 1,900 duplicated photos in about 88 seconds. The Elixir runtime used about 98% of one of my cores during this process.
Let’s try with two workers. Alter application.ex, and run this:
| $ time mix run --no-halt >dups |
| 48.58 real 58.33 user 17.98 sys |
Nice! It ran almost twice as fast. It means that I’m successfully overlapping the IO and the hashing.
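To put a number on “almost twice as fast,” we can divide the two elapsed times, and divide CPU time by elapsed time to see how many cores were kept busy on average. The figures below are the ones time reported for the two runs; the module TimingSketch is a hypothetical helper written just for this arithmetic.

```elixir
defmodule TimingSketch do
  # Ratio of the elapsed ("real") times of two runs.
  def speedup(real_before, real_after), do: real_before / real_after

  # Average number of cores kept busy: CPU time (user + sys)
  # divided by elapsed time.
  def cores_busy(%{real: real, user: user, sys: sys}), do: (user + sys) / real
end

# Timings copied from the two runs shown in the text.
one_worker = %{real: 87.57, user: 58.81, sys: 23.44}
two_workers = %{real: 48.58, user: 58.33, sys: 17.98}

speedup = TimingSketch.speedup(one_worker.real, two_workers.real)

IO.puts("speedup: #{Float.round(speedup, 2)}")
IO.puts("cores busy, one worker:  #{Float.round(TimingSketch.cores_busy(one_worker), 2)}")
IO.puts("cores busy, two workers: #{Float.round(TimingSketch.cores_busy(two_workers), 2)}")
```

The speedup works out to about 1.8, with roughly one and a half cores busy on average in the two-worker run.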
To cut a long story short, here are the results for 1..5, 10, and 50 workers.
As my machine has only two physical cores (it reports four, but two of those are just hyperthreading), that’s about as good as I could expect.