Modern Systems Programming with Scala Native

Introducing exec

In Linux and other UNIX-like OSs, a small family of system calls is responsible for loading and executing programs; collectively, these calls are referred to as exec. However, there isn’t actually a function named exec. Instead, there are variants for different combinations of arguments. The one we’ll be using is called execve, and it has the following signature:

def execve(path:CString, argv:Ptr[CString], envp:Ptr[CString]):Int

execve takes all of its arguments as strings or arrays of strings. The first, path, is a string containing the path to the executable, either absolute or relative to the current working directory; execve does not search the PATH for it. The second, argv, short for argument vector, is a Ptr[CString], which is actually an array of strings containing the arguments to the executable as 0-terminated strings. The third, envp, short for environment pointer, is also a Ptr[CString], again pointing to an array of 0-terminated strings; however, in this case, each of the strings is of the form key=value. These strings conceptually form a Map[String, String], called the environment, which is typically used for passing named parameters and configuration to the program.
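To make the key=value convention concrete, here is a minimal sketch in plain Scala, with no native interop, that converts between a Map[String, String] and the flat list of strings that envp carries. The names EnvStrings, toEnvStrings, and parseEnvStrings are hypothetical and chosen only for illustration:

```scala
object EnvStrings {
  // Render each entry in the KEY=VALUE form used by the environment array.
  def toEnvStrings(env: Map[String, String]): Seq[String] =
    env.toSeq.map { case (k, v) => s"$k=$v" }

  // Recover the map. Split on the first '=' only, since values may contain '='.
  def parseEnvStrings(entries: Seq[String]): Map[String, String] =
    entries.flatMap { entry =>
      entry.indexOf('=') match {
        case -1 => None // not in KEY=VALUE form, so skip it
        case i  => Some(entry.substring(0, i) -> entry.substring(i + 1))
      }
    }.toMap
}
```

Splitting on the first = matters because only the key is guaranteed to be free of that character; a value such as a URL with query parameters can contain several.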

These arrays can be a bit tricky to construct, especially from an idiomatic Scala Seq[String], so let’s write a utility function to help us out. Two details can trip us up. The first is that the resulting array needs to have space for one more pointer than the number of strings in the Seq, which we’ll fill with a null pointer; that’s how execve will know it’s the end of the array. The second is that, by convention, the first item in the argument array needs to be the path to the executable, again. But as long as we keep those constraints in mind, it won’t be too hard:

ForkWaitShell/nativeFork/nativeFork.scala

	def makeStringArray(args:Seq[String]):Ptr[CString] = {
	  val count = args.size
	  // Reserve one extra pointer-sized slot for the terminating null pointer.
	  val size = sizeof[CString] * (count + 1)
	  val dest_array = stdlib.malloc(size).asInstanceOf[Ptr[CString]]
	  Zone { implicit z =>
	    for ( (arg,i) <- args.zipWithIndex) {
	      // toCString allocates inside the Zone, so copy each string into
	      // malloc'd memory that outlives the Zone.
	      val string_ptr = toCString(arg)
	      val string_len = string.strlen(string_ptr)
	      val dest_str = stdlib.malloc(string_len + 1).asInstanceOf[Ptr[Byte]]
	      // Copy by byte length, not by the Scala string's character count.
	      string.strncpy(dest_str, string_ptr, string_len + 1)
	      dest_str(string_len) = 0
	      dest_array(i) = dest_str
	      ()
	    }
	    ()
	  }
	  // execve finds the end of the array by looking for this null pointer.
	  dest_array(count) = null
	  dest_array
	}
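The two sizing rules behind makeStringArray can be checked without any native code. The sketch below, in plain Scala, models the pointer array as an Array[String] with a trailing null and computes the byte counts by hand. ArgvLayout, slots, arrayBytes, and cStringBytes are hypothetical names; the 8-byte pointer size and the UTF-8 encoding are assumptions that hold on typical 64-bit systems but are not guaranteed everywhere:

```scala
import java.nio.charset.StandardCharsets

object ArgvLayout {
  // Assumption: 8-byte pointers, as on typical 64-bit platforms.
  val PointerBytes = 8

  // One slot per argument plus a final null slot marking the end of the array.
  def slots(args: Seq[String]): Array[String] = {
    val out = new Array[String](args.size + 1) // every slot starts as null
    args.copyToArray(out)                      // fills slots 0 until args.size
    out
  }

  // Bytes needed for the pointer array: (n + 1) pointers,
  // not n pointers plus one byte.
  def arrayBytes(args: Seq[String]): Int =
    PointerBytes * (args.size + 1)

  // Bytes needed for one C string: its encoded bytes plus the 0 terminator.
  // This counts bytes, which can exceed String.length for non-ASCII text.
  def cStringBytes(s: String): Int =
    s.getBytes(StandardCharsets.UTF_8).length + 1
}
```

With the three arguments /bin/ls, -l, and ., arrayBytes gives 32 bytes under these assumptions. Writing 8 * 3 + 1 instead would give 25, which leaves no room for the terminating null pointer.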

With that taken care of, we can wrap it in a more idiomatic Scala Native style by accepting the arguments and environment as a Seq and a Map, respectively. Then we just need to be sure to format the environment variables like KEY=VALUE, and we’re good:

ForkWaitShell/nativeFork/nativeFork.scala

	def runCommand(args:Seq[String], env:Map[String,String] = Map.empty):Int = {
	  if (args.size == 0) {
	    throw new Exception("bad arguments of length 0")
	  }
	  Zone { implicit z =>
	    val fname = toCString(args.head)
	    val arg_array = makeStringArray(args)
	    val env_strings = env.map { case (k,v) => s"$k=$v" }
	    val env_array = makeStringArray(env_strings.toSeq)

	    val r = execve(fname, arg_array, env_array)
	    if (r != 0) {
	      val err = errno.errno
	      // strerror returns a C string, so it is printed with %s.
	      stdio.printf(c"error: %d %s\n", err, string.strerror(err) )
	      throw new Exception(s"bad execve: returned $r")
	    }
	  }
	  ??? // This will never be reached.
	}

There’s a catch, though, which you’ll see if we write a simple program to try it out. We’ll simply run ls -l on the current directory:

ForkWaitShell/bad_exec/bad_exec.scala

	def main(args:Array[String]):Unit = {
	  println("about to exec")
	  runCommand(Seq("/bin/ls", "-l", "."))
	  println("exec returned, we're done!")
	}

If we run this program, we see the following:

	$ ./target/scala-2.11/badexec-out
	about to exec
	total 16
	-rw-r--r-- 1 rwhaling staff 2623 Jan 20 10:44 bad_exec.scala
	-rw-r--r-- 1 rwhaling staff 156 Jan 20 10:09 build.sbt
	drwxr-xr-x 6 rwhaling staff 192 Jan 20 10:30 project
	drwxr-xr-x 5 rwhaling staff 160 Jan 20 10:36 target

Our program appears to successfully run the ls command, but the output of the final println, "exec returned, we're done!", never appeared.

What could have gone wrong? The answer is a subtlety of the exec functions, and it’s best to reproduce the note from the manual in full:

“execve() does not return on success, and the text, initialized data, uninitialized data (bss), and stack of the calling process are overwritten according to the contents of the newly loaded program.”

What this means is that if execve executes successfully, the state of our program is completely obliterated and replaced with the new program. execve never returns, because once the new program begins running, there’s nowhere for it to return to. Our calling program is gone.