In Linux and other UNIX-like operating systems, a small family of closely related functions is responsible for loading and executing programs; the family is collectively referred to as exec. However, there isn’t actually a function named exec. Instead, there are variants for different combinations of arguments. The one we’ll be using is called execve, and it has the following signature:
| def execve(filename:CString, args:Ptr[CString], env:Ptr[CString]):Int |
execve takes all of its arguments as strings or arrays of strings. The first, filename (called path in some versions of the manual), is a string containing the path to the executable, either absolute or relative to the current working directory; execve does not search the PATH. The second, args (argv in C, short for argument vector), is a Ptr[CString], which here points to an array of 0-terminated strings containing the arguments to the executable. The third, env (envp in C, short for environment pointer), is also a Ptr[CString], again pointing to an array of 0-terminated strings; however, in this case, each of the strings is of the form key=value. These strings conceptually form a Map[String, String], called the environment, which is typically used for passing named parameters and configuration to the program.
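To make the key=value convention concrete, here is a minimal sketch in plain Scala, with no Scala Native types involved. The object and helper names (EnvStrings, formatEnv, parseEnv) are hypothetical and exist only for illustration; the sketch assumes keys never contain an = character.

```scala
// Hypothetical helpers, not part of any library: they only illustrate
// the key=value convention used for environment strings.
object EnvStrings {
  // Render each entry as "key=value", the form execve expects.
  def formatEnv(env: Map[String, String]): Seq[String] =
    env.toSeq.map { case (k, v) => s"$k=$v" }

  // Recover the conceptual Map. Split on the first '=' only, because
  // values may themselves contain '='. Entries without '=' are skipped.
  def parseEnv(entries: Seq[String]): Map[String, String] =
    entries.flatMap { entry =>
      val i = entry.indexOf('=')
      if (i < 0) None
      else Some(entry.substring(0, i) -> entry.substring(i + 1))
    }.toMap
}
```

Splitting on the first = rather than on every = matters for values such as command-line options that contain = themselves.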
These arrays can be a bit tricky to construct, especially from an idiomatic Scala Seq[String], so let’s write a utility function to help us out. Two details can trip us up. The first is that the resulting array needs to have space for one more pointer than the number of strings in the Seq, which we’ll fill with a null pointer; that’s how execve will know it’s the end of the array. The second is that the first item in the argument array needs to be the path to the executable, again. But as long as we keep those constraints in mind, it won’t be too hard:
| def makeStringArray(args:Seq[String]):Ptr[CString] = { |
| // One pointer-sized slot per string, plus one for the terminating null |
| val size = sizeof[CString] * (args.size + 1) |
| val dest_array = stdlib.malloc(size).asInstanceOf[Ptr[CString]] |
| val count = args.size |
| Zone { implicit z => |
| for ( (arg,i) <- args.zipWithIndex) { |
| val string_ptr = toCString(arg) |
| val string_len = string.strlen(string_ptr) |
| val dest_str = stdlib.malloc(string_len + 1).asInstanceOf[Ptr[Byte]] |
| // Copy by byte length: arg.size counts UTF-16 chars, not encoded bytes |
| string.strncpy(dest_str, string_ptr, string_len + 1) |
| dest_str(string_len) = 0 |
| dest_array(i) = dest_str |
| () |
| } |
| () |
| } |
| // execve finds the end of the array by looking for this null pointer |
| dest_array(count) = null |
| dest_array |
| } |
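The layout this function has to produce can be modeled in plain Scala before any pointers are involved. The following is only a sketch: ArgvLayout, layout, and pointerArrayBytes are hypothetical names, and the 8-byte pointer size used in the note afterward is an assumption about a typical 64-bit platform.

```scala
// Hypothetical model of the array layout, using ordinary JVM-style arrays
// instead of malloc'd memory.
object ArgvLayout {
  // One slot per string, plus a trailing null that marks the end.
  def layout(args: Seq[String]): Array[String] = {
    val slots = new Array[String](args.size + 1) // every slot starts as null
    args.copyToArray(slots)                      // fills the first args.size slots
    slots
  }

  // Bytes needed for the pointer array, given a pointer size in bytes.
  def pointerArrayBytes(count: Int, pointerSize: Int): Int =
    pointerSize * (count + 1)
}
```

For Seq("/bin/ls", "-l", "."), layout yields four slots with null in the last one, and pointerArrayBytes(3, 8) is 32. The parentheses matter: 8 * 3 + 1 would give 25 bytes, which is too small to hold the terminating null pointer.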
With that taken care of, we can wrap it in a more idiomatic Scala Native style by accepting the arguments and environment as a Seq and a Map, respectively. Then we just need to be sure to format the environment variables like KEY=VALUE, and we’re good:
| def runCommand(args:Seq[String], env:Map[String,String] = Map.empty):Int = { |
| if (args.size == 0) { |
| throw new Exception("bad arguments of length 0") |
| } |
| Zone { implicit z => |
| val fname = toCString(args.head) |
| val arg_array = makeStringArray(args) |
| val env_strings = env.map { case (k,v) => s"$k=$v" } |
| val env_array = makeStringArray(env_strings.toSeq) |
| |
| val r = execve(fname, arg_array, env_array) |
| if (r != 0) { |
| val err = errno.errno |
| stdio.printf(c"error: %d %s\n", err, string.strerror(err)) |
| throw new Exception(s"bad execve: returned $r") |
| } |
| } |
| ??? // This will never be reached. |
| } |
There is a catch, though, which we’ll see if we write a simple program to try it out. We’ll simply run ls -l on the current directory:
| def main(args:Array[String]):Unit = { |
| println("about to exec") |
| runCommand(Seq("/bin/ls", "-l", ".")) |
| println("exec returned, we're done!") |
| } |
If we run this program, we see the following:
| $ ./target/scala-2.11/badexec-out |
| about to exec |
| total 16 |
| -rw-r--r-- 1 rwhaling staff 2623 Jan 20 10:44 bad_exec.scala |
| -rw-r--r-- 1 rwhaling staff 156 Jan 20 10:09 build.sbt |
| drwxr-xr-x 6 rwhaling staff 192 Jan 20 10:30 project |
| drwxr-xr-x 5 rwhaling staff 160 Jan 20 10:36 target |
Our program appears to successfully run the ls command, but the output of the final println, "exec returned, we’re done!", never appeared.
What could have gone wrong? The answer is a subtlety of the exec functions, and it’s best to reproduce the note from the manual in full:
“execve() does not return on success, and the text, initialized data, uninitialized data (bss), and stack of the calling process are overwritten according to the contents of the newly loaded program.”
What this means is that if execve executes successfully, the state of our program is completely obliterated and replaced with the new program. execve never returns, because once the new program begins running, there’s nowhere for it to return to. Our calling program is gone.