So far, you’ve seen that Scala Native can be easy to use. In many cases, it works as a drop-in replacement for mainstream JVM Scala. But what can it do for us that JVM Scala can’t?
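To make the "drop-in" claim concrete, here is a minimal Hello, World sketch. It uses only the standard library, so the same source can compile for the JVM and, once the Scala Native sbt plugin is enabled, to a native binary. The object name `Main` is illustrative and is not taken from this chapter's project.

```scala
// A minimal Hello, World. Nothing here is specific to Scala Native:
// the same source compiles for the JVM and, with the Scala Native
// sbt plugin enabled, to a native executable.
object Main {
  def main(args: Array[String]): Unit =
    println("hello, world!")
}
```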
To see what sets Scala Native apart, run sbt nativeLink. You should see output like this:
```
$ sbt nativeLink
[info] Loading project definition from ...
[info] Set current project to sn-word-sorter (in build file:...
[info] Compiling 1 Scala source to ...
[info] Discovered 1279 classes and 9445 methods
[info] Optimizing (4418 ms)
[info] Generating intermediate code (961 ms)
[info] Produced 37 files
[info] Compiling to native code (2072 ms)
[info] Linking native code (308 ms)
[success] Total time: 17 s, completed Jan 21, 2018 4:25:58 PM
```
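The nativeLink task comes from Scala Native's sbt plugin. The following is a minimal sketch of the build settings for a project like this one, assuming a Scala Native 0.3.x release (consistent with the Scala 2.11 output paths in this section). The plugin version, Scala version, and project name below are assumptions; pin whatever your project already uses.

```scala
// project/plugins.sbt
// Plugin version is an assumption: a 0.3.x release, contemporary with the output above.
addSbtPlugin("org.scala-native" % "sbt-scala-native" % "0.3.6")

// build.sbt
enablePlugins(ScalaNativePlugin)

// Assumed Scala 2.11.x release; Scala Native 0.3.x builds into target/scala-2.11.
scalaVersion := "2.11.12"

// Hypothetical project name, chosen to match the hello-minimal-out binary.
name := "hello-minimal"
```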
If you look in your build directory at target/scala-2.11, you’ll see a 4.2MB executable file called hello-minimal-out. That’s our program! It's a native binary: it consists of immediately executable CPU instructions plus headers, symbol tables, and other metadata that allow your operating system to load it into memory and run it without a virtual machine or interpreter. You can run it on its own just by typing ./target/scala-2.11/hello-minimal-out. You can also move it around and, in many cases, copy it to another computer intact.
In contrast, if you were to package up a standard JVM Scala Hello, World project, the output would be a 5.5MB file called hello-minimal-assembly-0.1-SNAPSHOT.jar. Unlike our native binary, a .jar file can't be executed directly; it must be run by a Java Virtual Machine (JVM). Together, a JVM and an application JAR often come close to 100MB for a small app, and the total can grow rapidly for larger projects with complex dependencies.
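For comparison, the JAR name above matches the default naming of the sbt-assembly plugin (`<name>-assembly-<version>.jar`, where `0.1-SNAPSHOT` is sbt's default version). A minimal sketch of that setup, assuming a plain JVM project without the Scala Native plugin; the plugin version, Scala version, and project name are assumptions:

```scala
// project/plugins.sbt (plugin version is an assumption)
addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.14.6")

// build.sbt: a plain JVM project, no Scala Native plugin.
name := "hello-minimal"    // hypothetical, chosen to match the JAR name above
scalaVersion := "2.11.12"  // assumed 2.11.x release
```

Running `sbt assembly` bundles the compiled classes together with the Scala library into a single JAR under target/scala-2.11, which you then launch with `java -jar`.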
There’s another, even more important difference. If we time the execution of both versions of our program, we’ll see this:
```
$ time java -jar ./target/scala-2.11/hello_jvm-assembly.jar
hello, world!

real 0m0.350s
user 0m0.368s
sys 0m0.038s

$ time ./target/scala-2.11/hello_out
hello, world!

real 0m0.024s
user 0m0.021s
sys 0m0.003s
```
This is already an exciting result! Scala Native runs our Hello, World program in about 24 milliseconds, while our JVM program takes roughly fifteen times longer, about a third of a second, just to print a string to the console. Here we’re seeing the cost of the JVM itself. A Java Virtual Machine is a large, complex program that takes time to start up and shut down, and we pay that cost every time our tiny Scala program runs. In contrast, our native binary is a file containing machine code. Our OS can just load it into memory, point the CPU at the program's entry point, and let it run.
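A single `time` run is noisy, so if you want to reproduce this comparison, it helps to repeat each run and take the median. The following is a minimal sketch of such a harness; `StartupTimer` and `medianMillis` are hypothetical names, and the paths are copied from the timing transcript. Because it launches each program as a child process, both measurements include process-creation overhead, so treat the ratio as more meaningful than the absolute numbers.

```scala
import scala.sys.process._

// Hypothetical helper (not part of Scala Native or sbt): runs a command
// several times and reports the median wall-clock time in milliseconds.
object StartupTimer {
  def medianMillis(cmd: Seq[String], runs: Int): Double = {
    require(runs > 0, "runs must be positive")
    val samples = (1 to runs).map { _ =>
      val start = System.nanoTime()
      // `!!` runs the process, waits for it to exit, and returns its stdout;
      // it throws if the exit code is non-zero.
      cmd.!!
      (System.nanoTime() - start) / 1e6
    }.sorted
    samples(runs / 2) // median (upper median when runs is even)
  }

  def main(args: Array[String]): Unit = {
    // Paths taken from the transcript; adjust them to your own build output.
    val jvm = medianMillis(Seq("java", "-jar", "./target/scala-2.11/hello_jvm-assembly.jar"), 10)
    val native = medianMillis(Seq("./target/scala-2.11/hello_out"), 10)
    println(f"JVM: $jvm%.1f ms, native: $native%.1f ms, ratio: ${jvm / native}%.1fx")
  }
}
```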
Before we go any further, though, it’s worth taking a step back and asking, When does performance matter?
For a command-line tool that a developer runs a few times an hour, a difference in startup time is a nice quality-of-life improvement. But when you’re dealing with big data, high-throughput networking, or heavy-duty I/O, efficiency is critical; improving performance or reducing resource usage can save serious amounts of money and make it possible to tackle new, harder problem domains. That level of performance isn’t always necessary; there are plenty of problems that are easily solved by higher-level programming languages. But throughout this book, we’re going to keep our focus on areas where this kind of performance can make a difference. As a result, we’re going to rapidly move from Hello, World to seriously big data.
Let’s dive in by exploring the foundations of systems programming, starting with input and output.