A Discursion into Recursion

This program needs to navigate down the entire subdirectory tree to any number of levels. The natural way to do this is with recursion. Put simply, a recursive method is one that calls itself. If you aren’t familiar with recursive programming, see Digging Deeper.
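
As a minimal illustration of a method that calls itself, here is a sketch that has nothing to do with files. The method name countdown is made up for this example and does not appear in file_info.rb.

```ruby
# Hypothetical example: build the list of numbers from n down to 1.
def countdown( n )
    return [] if n <= 0                 # base case: nothing left, stop recursing
    return [n] + countdown( n - 1 )     # recursive case: the method calls itself
end

p( countdown( 3 ) )    # prints [3, 2, 1]
```

The base case matters: without it, the method would keep calling itself until Ruby ran out of stack space.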

In the program file_info.rb, the processfiles method is recursive:

file_info.rb

def processfiles( aDir )
    totalbytes = 0                      # bytes in the files directly inside aDir
    Dir.foreach( aDir ){ |f|
        mypath = "#{aDir}\\#{f}"        # Windows-style path separator
        if File.directory?(mypath) then
            if f != '.' and f != '..' then
                bytes_in_dir = processfiles(mypath)     # <==== recurse!
                puts( "<DIR> ---> #{mypath} contains [#{bytes_in_dir/1024}] KB" )
            end
        else
            filesize = File.size(mypath)
            totalbytes += filesize
            puts( "#{mypath} : #{filesize/1024}K" )
        end
    }
    $dirsize += totalbytes              # global running total, declared outside the method
    return totalbytes
end

You will see that when the method is first called, toward the bottom of the source code, it is passed the name of a directory in the variable dirname:

processfiles( dirname )

I’ve already assigned to dirname the parent of the current directory, which is written as two dots:

dirname = ".."

If you are running this program in its original location (that is, the location to which it is extracted from this book’s source code archive), this will reference the directory containing the subdirectories of all the sample code files. Alternatively, you could assign the name of some directory on your hard disk to the variable, dirname. If you do this, don’t specify a directory containing huge numbers of files and directories (on Windows, C:\Program Files would not be a good choice, and C:\ would be even worse!) because the program would then take quite some time to execute.
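
If you are unsure which directory ".." will refer to when you run the program, one way to check is to expand it to an absolute path first. This is a small sketch using File.expand_path from Ruby's standard File class; it is not part of file_info.rb.

```ruby
dirname = ".."

# Show the absolute path that the relative name refers to,
# based on the directory the program is run from.
puts( File.expand_path( dirname ) )
```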

Let’s take a closer look at the code in the processfiles method. Once again, I use Dir.foreach to find all the files in the current directory and pass each file, f, one at a time, to be handled by the code in a block between curly brackets. If f is a directory and is not the current one ('.') or its parent directory ('..'), then I pass the full path of the directory back to the processfiles method:

if File.directory?(mypath) then
    if f != '.' and f != '..' then
       bytes_in_dir = processfiles(mypath)
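
The test for '.' and '..' is there because Dir.foreach yields those two entries along with the real files and subdirectories. Without it, the method would recurse into the same directory over and over. The following sketch shows the raw entries; the helper name raw_entries and the temporary directory are assumptions made for this example only.

```ruby
require 'tmpdir'

# Hypothetical helper: list every entry that Dir.foreach yields for aDir.
def raw_entries( aDir )
    names = []
    Dir.foreach( aDir ){ |f| names << f }
    return names.sort
end

Dir.mktmpdir{ |tmp|
    Dir.mkdir( File.join( tmp, "sub" ) )
    p( raw_entries( tmp ) )    # includes "." and ".." as well as "sub"
}
```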

If f is not a directory but just an ordinary data file, I find its size in bytes with File.size and assign this to the variable filesize:

filesize = File.size(mypath)

As each successive file, f, is processed by the block of code, its size is calculated, and this value is added to the variable totalbytes:

totalbytes += filesize

Once every file in the current directory has been passed into the block, totalbytes will be equal to the total size of all the files in the directory.
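
To see the accumulation on its own, here is a non-recursive sketch that adds up only the files directly inside one directory. The helper name bytes_in_files and the two sample files are invented for illustration, and File.join is used in place of the "\\" separator so that the example runs on any operating system.

```ruby
require 'tmpdir'

# Hypothetical helper: total size of the ordinary files directly inside aDir.
def bytes_in_files( aDir )
    totalbytes = 0
    Dir.foreach( aDir ){ |f|
        mypath = File.join( aDir, f )
        totalbytes += File.size( mypath ) if File.file?( mypath )
    }
    return totalbytes
end

Dir.mktmpdir{ |tmp|
    File.write( File.join( tmp, "a.txt" ), "12345" )      # 5 bytes
    File.write( File.join( tmp, "b.txt" ), "1234567" )    # 7 bytes
    puts( bytes_in_files( tmp ) )                         # prints 12
}
```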

However, I need to calculate the bytes in all the subdirectories too. Because the method is recursive, this is done automatically. Remember that when the code between curly brackets in the processfiles method determines that the current file, f, is a directory, it passes this directory name back to itself—the processfiles method.

Let’s imagine that you first call processfiles with the C:\test directory. At some point, the variable f is assigned the name of one of its subdirectories, say, dir_a, so mypath becomes C:\test\dir_a. This path is now passed back to processfiles. No further directories are found in C:\test\dir_a, so processfiles simply adds up the sizes of all the files in this subdirectory. When it has finished, the method comes to an end and returns the number of bytes in that directory, totalbytes, to whichever bit of code called the method in the first place:

return totalbytes

In this case, it was this bit of code inside the processfiles method that recursively called the processfiles method:

bytes_in_dir = processfiles(mypath)

So, when processfiles finishes processing the files in the subdirectory, C:\test\dir_a, it returns the total size of all the files found there, and this is assigned to the bytes_in_dir variable. The processfiles method now carries on where it left off (that is, it continues from the point at which it called itself to deal with the subdirectory) by processing the files in the original directory, C:\test.
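
The order of events described above can be made visible with a tracing sketch. This is a cut-down variant written for illustration, not the author's method: the name trace_dirs, the log array, and the depth parameter are all assumptions, and it records directory names rather than file sizes.

```ruby
require 'tmpdir'

# Hypothetical variant of processfiles that logs when each directory
# is entered and left. depth is used only to indent the log lines.
def trace_dirs( aDir, log, depth = 0 )
    log << "#{'  ' * depth}enter #{File.basename( aDir )}"
    Dir.foreach( aDir ){ |f|
        next if f == '.' || f == '..'
        mypath = File.join( aDir, f )
        trace_dirs( mypath, log, depth + 1 ) if File.directory?( mypath )
    }
    log << "#{'  ' * depth}leave #{File.basename( aDir )}"
    return log
end

Dir.mktmpdir{ |tmp|
    top = File.join( tmp, "test" )
    Dir.mkdir( top )
    Dir.mkdir( File.join( top, "dir_a" ) )
    puts( trace_dirs( top, [] ) )
}
```

The log shows "enter test", then "enter dir_a" and "leave dir_a" indented beneath it, and finally "leave test": the outer call is suspended while the inner call runs and then resumes where it left off.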

No matter how many levels of subdirectories this method encounters, the fact that it calls itself whenever it finds a directory ensures that it automatically travels down every directory pathway it finds, calculating the total bytes in each.

One final thing to note is that each call to processfiles has its own copy of the variables declared inside the method, so when one level of recursion completes, the calling level carries on with its own values intact. The totalbytes variable will therefore hold first the size of the files in C:\test\test_a\test_b, then of those in C:\test\test_a, and finally of those in C:\test. To keep a running total of the combined sizes of all the directories, you need to assign values to a variable declared outside the method. Here I use the global variable $dirsize for this purpose, adding to it the value of totalbytes calculated for each subdirectory processed:

$dirsize += totalbytes
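
The difference between a local variable and a global one under recursion can be seen in a tiny sketch. The names nest, local, and $calls are invented for this example and are not taken from file_info.rb.

```ruby
$calls = 0    # global: one variable shared by every call

# Hypothetical method that recurses until level reaches 3.
def nest( level )
    local = level                     # local: each call has its own copy
    $calls += 1                       # every call updates the same global
    nest( level + 1 ) if level < 3
    return local                      # unchanged by the deeper calls
end

puts( nest( 1 ) )    # prints 1
puts( $calls )       # prints 3
```

The outermost call still returns 1 even though deeper calls set their own local to 2 and 3, whereas $calls has counted all three calls.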

Incidentally, although a byte may be a convenient unit of measurement for very small files, it is generally better to describe larger files in kilobyte sizes and very large files or directories in megabytes. To change bytes to kilobytes or to change kilobytes to megabytes, you need to divide by 1,024. To change bytes to megabytes, divide by 1,048,576. The last line of code in my program does these calculations and displays the results in a formatted string using Ruby’s printf method:

printf( "Size of this directory and subdirectories is #{$dirsize} bytes, #{$dirsize/1024}K, %0.02fMB",
    "#{$dirsize/1048576.0}" )

Notice that I have embedded the formatting placeholder "%0.02fMB" in the first string, and I have added a second string following a comma: "#{$dirsize/1048576.0}". The second string holds the directory size in megabytes, and this value is then substituted for the placeholder in the first string. The placeholder’s formatting option "%0.02f" ensures that the megabyte value is shown as a floating-point number, "f", with two decimal places, ".02".
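
The conversions and the placeholder can be tried out in isolation. In this sketch the method name size_summary and the sample byte count are made up, and the numbers are passed to the format string directly rather than being wrapped in a second string; Ruby's format method (which builds the same string that printf would print) accepts either.

```ruby
# Hypothetical helper: describe a byte count in bytes, kilobytes, and megabytes.
def size_summary( bytes )
    kb = bytes / 1024          # integer division discards the remainder
    mb = bytes / 1048576.0     # dividing by a float keeps the fraction
    return format( "%d bytes, %dK, %0.02fMB", bytes, kb, mb )
end

puts( size_summary( 3_500_000 ) )    # prints 3500000 bytes, 3417K, 3.34MB
```

Dividing by 1048576.0 rather than 1048576 is what makes the megabyte figure a floating-point value; with two integers, Ruby would discard the fractional part before the placeholder ever saw it.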