Chapter 4

Programming Tutorial

Principles and Best Practices

Pascal Wallisch

Abstract

Unlike many other sections in MATLAB® for Neuroscientists, the focus here is not on learning techniques in MATLAB, but on how to use those techniques better. The sections that follow introduce guidelines for code organization in small and large projects, defect (bug) control, and testing strategies in an attempt to communicate strategies for managing the complexity that comes with larger programming efforts.

Keywords

code organization; code maintenance; variables; global functions; mnemonics; scope; parameters; commenting code; objects

4.1 Goals of this Chapter

Unlike most other sections in MATLAB® for Neuroscientists, the focus here is not on learning new techniques in MATLAB, but on how to use those techniques better. The sections that follow introduce guidelines for code organization in small and large projects, defect (bug) control, and testing strategies in an attempt to communicate strategies for managing the complexity that comes with larger programming efforts.

In order to benefit maximally, basic proficiency with MATLAB coding is necessary. Working through the MATLAB tutorial should be adequate preparation; however, progressing through a few sections beyond the tutorial is an even better preparation. The additional experience with MATLAB will provide a stronger foundation for understanding the rationale for the suggestions that follow.

4.2 Organizing Code

4.2.2 Variables and How to Name Them

Simply put, a variable denotes a storage location. That location can hold a number, a function, a matrix, or even more complex entities, such as cell arrays or MATLAB objects. In the context of MATLAB code, a variable is referenced by name and scope. This section will discuss variable naming strategies. Variable scope is equally important, but it will be discussed in the next section.

Variable names can be any contiguous set of alphanumeric (i.e., 0–9 and a–z) characters plus the underline character, and they begin with a nonnumeric character. For example, this_is_a_variable and th1s_1s_4ls0_4_v4r14bl3. With such flexibility in name choice, it is surprising how often poor names are chosen.

Here is a simple function written twice to demonstrate the impact of good variable name choices. First, a version of the function with poorly chosen variable names.

function out = align_waveforms(x)

  % Determines alignments for a set of waves relative

  % to the initial waveform using

  % cross correlation.

  % Input parameters

  % x:    MxN matrix of waveforms, where x(m,:) is the nth waveform

  % Output parameters

  % out:   vector of length M, where out(m) is the offset relative

  %     to the first wave

 n = size(x);

 n = n(2);

 out = [];

 for w = 1:n

      c = xcorr(x(:, 1), x(:, w));

      s = find(c == max(c));

      d = s - length(c)/2;

      out = [out d];

  end

end

The same function with better variable names follows.

function offsets = align_waveforms(waves)

  % Determines alignments for a set of waves relative

  % to the initial waveform using

  % cross correlation.

  % Input parameters

  % waves:  MxN matrix of waveforms, where x(m,:) is the nth waveform

  % Output parameters

  % offsets: vector of length M, where out(m) is the offset relative

  %     to the first wave

 wave_count = size(waves);

 wave_count = wave_count(2);

 offsets = [];

 for wave = 1:wave_count

      c = xcorr(waves(:, 1), waves(:, wave));

      max_c_index = find(c == max(c));

      offset = max_c_index - length(c)/2;

      offsets = [offsets offset];

  end

end

Clearly, variable name choice has impact on readability, even in short functions. Here are a few simple guidelines for naming variables that promote readability and maintainability.

Avoid the names of global functions. When MATLAB encounters a sequence of characters that forms a valid name, the variables in the current workspace are checked first for possible matches. MATLAB searches for functions, scripts, or classes only if an identifier fails to match any existing variable names. Choosing a name synonymous with a MATLAB function, or even a user-defined function, hides that function in the current scope. In the example code ahead, assigning the value 5 to a variable named “factorial” causes the subsequent attempt to call the factorial function to fail. Because MATLAB recognizes factorial as a variable, the interpreter attempts to resolve (4) an index into a vector. Since the variable factorial is a scalar, the index request fails and yields the error.

>> factorial(4)

ans =

  24

>> factorial = 5;

>> factorial(4)

??? Index exceeds matrix dimensions

Especially inappropriate choices can even cause difficult to identify errors. Another example ahead shows how setting gamma to a vector creates a situation in which an integer argument to the gamma function is misinterpreted as an index into the vector and yields the wrong value.

>> gamma(4) % the correct value for gamma(4) is 6

ans =

  6

>> gamma = [0 1 2 3 4 5];

>> gamma(4)

ans =

  3

While the MATLAB interpreter is able to evaluate the request for element 4 of the vector gamma without an explicit error, this is highly confusing to anyone familiar to the gamma function. If the intent of gamma(4) was actually to invoke the gamma function, the expression returns the wrong result silently. Such code needlessly complicates later maintenance and readability.

The which command is especially useful for determining if a variable might override an existing command. Typing which followed by a potential name for a variable displays information about the identity of that name. The clear command will remove a variable from the current workspace, which is quite useful when inadvertently overriding an important command. Note how clear alters how MATLAB resolves the identity of gamma in the following example.

>> gamma = [0 1 2 3 4 5];

>> gamma(4)

ans =

  3

>> which gamma

gamma is a variable.

>> clear gamma

>> gamma(4)

ans =

  6

>> which gamma

built-in (/sw/matlab-7.11/toolbox/matlab/specfun/@double/gamma) % double method

Pick a mnemonic name. A name that reflects the purpose of a variable improves readability significantly. Although it’s quite tempting to choose short, one-character variable names such as n or x, variable names should reflect the variable’s use or contents whenever possible. A common MATLAB task involves writing mathematical formulae as executable MATLAB code. When writing such code, the use of exceptionally short variable names is especially tempting, since variables used in mathematical notation are quite often single letters. In the simplest of functions, this is reasonable, especially if the function operates uniformly on all inputs, i.e., there is no specific meaning ascribable to the variable. This is fairly rare, however. Aside from simple mathematical functions or variables used as indices in for loops, single-letter variables should be avoided.

A mnemonic name is not an invitation to a stream of consciousness description of the code, however. An excessively long name might be a subtle clue of an overly broad or imprecise use. For example, a variable named indicates_yes_response_or_viable_value should probably be broken into two separate variables for simplicity. This guideline is particularly true for any variable expected to be used interactively. No one wants to type indicates_yes_response_or_viable_value over and over in an interactive session unless absolutely necessary.

Retire variables after their specific purpose. Repurposing variables can make code difficult to follow and maintain, particularly in a long or complicated sequence of code. Usually choosing mnemonic variable names automatically avoids this problem.

Exercise 4.1

Review the following code and rename variables that could be better named. Use the comments as a guide to the intended functionality.

function psth,bins = bin_for_psth(rd, ...

                 sampling_rate_in_samples_per_second, ...

                 t, ...

                 q, ...

                 q2, ...

                 size_of_each_psth_bin_in_seconds)

  % Locates events above threshold in raw data and generates PSTH from

  % multi-trial recording. Trials should be contiguous.

  %

  % Input  parameters

  %   rd:  raw input data

  %   sampling_rate_in_samples_per_second: sampling rate in Hz

  %   t: threshold for events, in same units as raw data

  %   q: number of contiguous trials

  %   q2: length of each trial, in seconds

  %   size_of_each_psth_bin_in_seconds: size of each PSTH bin, in seconds

  %

  % Output parameters

  %   psth:       count in each bin

  %   bins:       center position of each bin, in seconds relative to

  %            trial start

  % first, threshold signal

  vnts = rd > t;

  % only positive threshold crossings (not sustained activity above threshold)

  vnts = diff(events) == 1;

  vnts = [0 events];

  % now, split into trials

  vnts = reshape(vnts, q, ...

            q2 * sampling_rate_in_samples_per_second);

  % vnts should be MxN, where M = trial and N = sample

  sum = sum(vnts);

  % sum should be the sum of events at each sample relative to

  % the start of the trial

  max_vnt_count = max(sum);

  for count = 0:max_event_count-1

     above_count = find(sum > count);

     vnt_offsets = [vnt_offsets above_count];

  end

  % vnt_offsets should be the offset in sample counts where events occur

  vnt_ts = vnt_offsets / sampling_rate_in_samples_per_second;

  [psth, bins] = hist(vnt_ts, q2/size_of_each_psth_bin_in_seconds);

end

4.2.3 Understanding Scope

Scope refers to the extent of a variable within the code. During execution, variables move in and out of scope. For example, under normal circumstances, a variable created within a MATLAB function ceases to exist once the function ends. One of the most important aspects of scope is that scope together with name uniquely denotes a variable. Two or more variables with identical names can coexist separately in different scopes.

Related to the idea of scope is the MATLAB workspace, which acts as a container for variables within a specific scope. In a sense, workspaces implement scope. Unlike the abstract concept of scope, workspaces in MATLAB are entities that can be viewed and interacted with. The most visible workspace is the workspace associated with the command line, which is visible in the workspace window during interactive sessions, but other workspaces are created, suspended, and destroyed as necessary to implement other scopes during execution.

MATLAB recognizes three basic scopes: the command-line scope, global scope, and function-level scope. Each of these has one or more corresponding workspaces during execution. With a few exceptions, when the MATLAB interpreter encounters a legal variable name, the current workspace is checked for a match. Which workspace is current changes throughout execution. If a match is located, the identifier is treated as the corresponding variable. We will now discuss each of these types of scope (command line, global, and function level) and their corresponding workspaces.

The command-line scope consists of all variables created interactively at the command line or in a script file. Unlike functions, scripts operate under the command-line scope. Thus, scripts have access to all variables present in the command-line scope and their values. This makes scripts especially useful for executing sets of commands that one would normally type at the command line, such as commands to set up an environment for a specific type of analysis or visualization. It also means that scripts can easily overwrite variables in the command line scope.

Function-level scope occurs at the entry of a function (i.e., the beginning of execution, at a function call) and continues until the function ends, usually at the return. Variables within the function-level scope do not persist after the execution of a function call. Cases such as the code ahead may appear to counter this assertion, but a careful analysis demonstrates that this is not the case.

----- in square.m-----

function x = square(x)

  x = x * 2;

------at command line-----

>> x = 6;

>> x = square(x);

>> x

x =

  12

At the command line, it may appear that the value of x persists beyond the call to square() or within the call to square(). However, this is not the case. Initially, there is a variable x declared at the command line. This variable contains the value 6. Then, the assignment statement x=square(x) is executed, and the function square() is called, with the parameter x. During the execution of the call to square, all the expressions in the parameter list are evaluated prior to the call. In this case, the variable x is evaluated, and it refers to the variable x at command line scope. So the value is 6, and square() is called.

When square() is called with the parameter 6, the value 6 is bound to the function-level scope variable x during the execution of square(). It is crucial to note that this x at the function-level scope has no relationship with the variable of the same name at the command line scope aside from the confusing nature of their identical names. During the execution of square(), the function-level variable x is set to 2 times itself, or 12, and the function returns. At the return of the function, the value of function-level x is obtained as the return value of the function (this return value is 12) and bound to the command-line scope variable x as the final part of the assignment statement. Thus, the command-line scope variable x will then contain the value 12.

As demonstrated above, when examining functions and function calls, it is important to remember that all parameters in a function call are evaluated before the call and then bound to variables in the function-level scope. In other words, variable names in a function call have no direct relationship with variable names in the function’s code. So, in the previous example, the command-line variables could have been named x2, y, or even not_the_x_in_square, and the example would have produced the same result. Formally, parameters in the function’s code are termed formal parameters to distinguish them from parameters in the calling of a function, which are called actual parameters.

Function-level scope can become especially complex with recursive functions. Recursive functions invoke themselves, albeit (usually) with different parameters each time. Here is an example function that implements a factorial, N!. The function is called factorial2 to avoid conflict with the MATLAB built-in function factorial.

function f = factorial2(g)

if g == 1

 f = 1;

 return

end

f = g * factorial2(g-1);

end

So, calling factorial2 with a value of 3 causes it to call factorial2 with a value 2, which calls factorial2 with a value of 1. The innermost factorial2 call which was passed a value 1 terminates, returning 1. Then, the factorial2(2) call resumes, calculating 2*1 and returning the result 2. Finally, the original call factorial2(3) resumes, calculating 3*2 (2 being the result from factorial2(2)) and returning 6. This sequence of calls and the associated creation, suspension, and deletion of scopes is illustrated in Figure 4.1.

It is important to note that each invocation of factorial2 creates its own scope. Consequently, each variable g in an invocation is different from the variables named g in other invocations. As each execution of factorial2 invokes factorial2 with a slightly smaller input parameter, the existing scope is suspended for the execution of the inner call and resumed when the call returns.

In the context of execution, each scope created by the invocation of a function call is sometimes called a stack frame. The collection of all the frames existing at any point in the execution, suspended or live, is sometimes called the execution stack or call stack. factorial2() can be modified to make the stack frame a little more visible during execution by displaying input parameter g and the current stack using dbstack:

function f = factorial2(g)

g

dbstack

if g == 1

 f = 1;

 return

end

f = g * factorial2(g-1);

end

Invoking factorial2() on the input 3 at the command line involves 3 calls to factorial2.

>> factorial2(3)

g =

  3

In factorial2 at 3

g =

  2

In factorial2 at 3

In factorial2 at 8

g =

  1

In factorial2 at 3

In factorial2 at 8

In factorial2 at 8

ans =

  6

Global scope has characteristics of both command-line and function-level scopes. Like command-line scope, variables in global scope persist for the lifetime of the MATLAB interpreter. Like function-level scope, variables in global scope are accessible during function execution. One substantial difference between global scope and either of the two other scopes is the necessity of explicitly specifying global scope for those variables which need it. This is done with the global keyword. Here is an example showing two functions referencing the same variable in global scope.

function x = increment()

global increment_value

x = increment_value + x

end

function set_increment(y)

global increment_value

increment_value = y

end

Without the statement global increment_value in either one of the functions, subsequent statements would attempt to reference a variable of the same name (increment_value) with function-level scope. As one might expect from the scope of global variables (i.e., available wherever global appears), all global variables share a workspace separate from command-line and function-level workspaces.

4.3 Organizing More Code: Bigger Projects

4.3.2 Coupling and Cohesion

Coupling describes the flow of information to and, to some degree, the degree of dependency between two or more logical units. Functions requiring a greater quantity of structured information for use, such as parameters or global variables, are described as having high or strong coupling. Functions requiring minimal amounts of parameters or global variables are likewise described as having low or weak coupling.

function s = square(m)

 s = m .^2

end

function s = square_field(m)

s = m.matrix_data.^2

global global_sum

global_sum = global_sum + sum(s)

end

In the function square, there is a small degree of coupling between square and a calling function. The function square expects a single input parameter and returns a single output parameter. The function square_field is far more coupled to its caller. Like square, square_field expects a single input parameter and returns a single output parameter. However, square_field expects the input parameter to be a MATLAB object with a field named matrix_data. Moreover, square_field uses a global variable to track the global sum of squares. These two aspects of square_field’s functionality must be understood and managed by any function invoking square_field.

High coupling marks a more complex relationship between two or more functions. Usually, a more complex relationship is more difficult to manage in the event of modification. In the example above, if the name of the field matrix_data needed to be changed, every caller of square_field would need to be changed in addition to square_field itself.

Low coupling has a number of advantages for reuse. With fewer parameters to construct and provide, simpler interfaces are easier to integrate into existing code. The simpler relationships of weaker coupling also mean less to understand when reading or maintaining the code in the future.

Cohesion describes the degree to which a unit’s functionality achieves a single purpose. Functions in which the functionality of each constituent part implements some necessary aspect of a clear purpose have high cohesion. The example functions above differ in cohesion as well as coupling. The function square has high cohesion. Its single statement implements the clear functionality of the unit, which is an element-wise squaring of the input. The function square_field, on the other hand, could be described as less cohesive. The rationale for tracking the global sum of elements is not clear from the function’s name or other code. As such, one could argue that the statements dealing with the global variable global_sum are not aligned with the primary purpose of the unit, which is to calculate the square of the input variable’s field matrix_data.

Functions with lower cohesion are often more difficult to reuse. Such functions often have additional input/output variables or even global variables that must be accounted for in the calling code. Accounting for unnecessary aspects of the function takes additional time and effort that, themselves, should not be necessary. Code focused on a single purpose is usually simpler, which is often easier to read, maintain, and debug.

4.3.3 Separation of Concerns

Separation of concerns is a useful guiding principle in organizing a larger project. Under separation of concerns, all code related to providing a distinct feature is grouped together in a separate logical unit. This logical unit could be a single-named function for simpler features or a set of functions or objects for more complex features. Thus, if systems have overlapping common features (i.e., concerns), then separation argues for separating out those common concerns into a new logical unit.

This is quite similar to, but distinct from, high cohesion. High cohesion demands conformity of purpose within a logical unit. Separation of concerns seeks to collect similar functionality across a system within a single logical unit. Often, maintaining high cohesion will result in a very natural separation of concerns among logical units.

Separation of concerns provides the substantial benefit that different areas of the project can be tested and modified independently of other functional units. Let’s take a hypothetical example of an application that presents auditory stimuli and records EEG data during the stimulus presentation. In such an application, there are many concerns, including randomization of stimuli, playback of stimuli through the speakers, displaying EEG data in realtime, collecting EEG data through hardware, and storing the EEG data. This list is only an example. Different scenarios might lend themselves to a different set of concerns. Under a separation of concerns model, all the code dealing with one of these concerns, for example, realtime display, would be confined to one or more functions that would be limited as much as possible to realizing realtime display functionality.

4.3.4 Limiting Side Effects, or the Perils of Global State

Using nonlocal variables, either through script files or through variables declared as global, strongly limits reuse. Imagine a function that counts spikes in a recording and uses a global variable to track the total number of spikes over all recordings:

function interval_count = count_spikes(r, threshold)

  global global_count;

  % above_threshold will be for every sample above threshold

  above_threshold = r > threshold;

  % counting only points where diff(above_threshold) > 0

  % counts the number of contiguous blocks of samples above threshold

  interval_count = diff(above_threshold) > 0;

  global_count = global_count + interval_count;

end

This may be a convenient way of tracking the overall count, but this mechanism imposes significant constraints on how count_spikes could be used. Now imagine a set of extracellular recordings over time intervals for multiple sites, made simultaneously, stored in an interleaved fashion (i.e., site 1 for interval 1, site 2 for interval 1, site 3 for interval 1, site 1 for interval 2, site 2 for interval 2, site 3 for interval 2, etc.). If the intervals are processed in order, the global count will include spikes above thresholds at all three sites instead of counting the total spikes at each site separately.

Outside of limitations imposed by complex usage patterns, the use of global variable global_count could also limit how other functions are used in the same project. Since all global variables share a workspace, any other function that uses a global variable named global_count could disrupt the accumulation of results in global_count.

Modifying global variables in a function is a specific case of what is termed a side effect. A side effect is any change in the run time state outside the scope of a function. Sometimes, side effects are absolutely necessary. For example, printing text to the screen and writing to a file would both be considered side effects; such actions change the console (printing) or file system (writing to a file). Often, side effects are unnecessary, as in the previous example. Tracking all spikes at a given site is best left to the caller of count_spikes, as no information about the site context is provided to count_spikes.

4.3.5 Objects

Object-oriented programming (OOP) has been a popular programming paradigm for more than a decade. The fundamental kernel of object-oriented programming is the capability to package together cohesive units of code and data as “objects,” entities that can be manipulated programmatically. Objects can refer to physical real world objects or can be highly abstract concepts. Most programming languages in common use support some mechanism for object-oriented programming, and MATLAB is no exception.

Object-oriented programming provides a mechanism for separating concerns, reducing coupling, and increasing cohesion. First, the object-oriented paradigm allows otherwise difficult to extract bits of code from multiple routines to be grouped together logically. Secondly, large, relatively inflexible function parameter sequences can be replaced by one or more flexible objects. In this case, the same data is passed between functions, but the semantics of objects allow many types of changes to objects without requiring alterations to those objects’ users, which reduces coupling. Thirdly, by grouping related bits of code from throughout the system into logical units, the cohesion of the routines from where the bits of code originated improves.

Beyond the capacity to create and manipulate objects, there is no strict set of features that compose the object-oriented programming paradigm. Programming languages differ significantly on the functionality that their respective object models provide. Even MATLAB supports two different object models, with varying functionality. Within that variety of models, the following features are strongly associated with the object-oriented programming paradigm, and many object models support a majority if not all of them:

Encapsulation: the grouping together of data and relevant code in cohesive units

Data hiding: limiting access to data or executable routines related to functionality internal to the object

Inheritance: allowing “descent of objects”; objects can be defined as descendants of other objects, gaining their data and executable code

Subtype polymorphism: functioning as objects of a parent type in places where an object of the parent type are expected

Dynamic dispatch: the capability to differentiate among multiple implementations of a routine at runtime depending on the identity of an object

As mentioned earlier, MATLAB has supported two separate object models. The more recently introduced object model, available in MATLAB R2008a and later, provides for all the features listed above. Only this object model will be discussed here.

4.3.5.1 Creating Objects

Under the MATLAB object model, one specifies the data held by an object and associated routines in a class definition. These data are called member variables or properties, and the executable routines are called methods. This terminology is fairly standard among object models. Once a class is defined, any number of objects (limited by memory, of course) can be created from that class through a process called instantiation. Each object can hold its own copy of the member variables and operate on them independently of objects of the same class.

To illustrate the benefits of object-oriented programming, we will create a set of objects that will present a unified model for locating auditory recordings, regardless of the underlying format representation. At this time, we anticipate that other code that analyzes the auditory recordings will require a sampling rate and a time stamp for the start of the file in addition to the raw data of the recording.

The initial class we define will provide a basic interface for all recordings for obtaining raw data, sampling rate, and time of recording. Here is code for this initial class.

classdef recording

 properties

  filename

 end

 methods

   function obj = recording(filename)

        obj.filename = filename;

   end

   function t = timestamp(obj)

        % get the time stamp by getting the date from dir

        d = dir(obj.filename);

        t = d.date;

   end

   function r = sample_rate(obj)

        r = -1;

   end

   function d = raw_data(obj)

        d = [];

   end

 end

end

The class definition begins with the reserved word classdef. Like most statements that introduce blocks in MATLAB, classdef has a matching end. Within classdef, there are properties and methods sections. We’ll discuss the methods section in a moment. Names for data managed by the class are specified in the properties section. In this case, a property called “filename” is defined.

The methods section contains the executable routines specific to the class. Object-oriented programming has a number of terms to describe the subtly different invocation of functions in the context of objects. Functions bound to objects and operating on them are called methods. Outside of MATLAB, data held by objects are often called members or member fields (MATLAB calls them properties).

Examining the methods, one will quickly discover that all but the first method have an initial parameter, obj. This initial parameter is the object being referenced. This should be fairly clear in the implementation of timestamp(). The code in timestamp() obtains the name of the file through the filename property of the referenced object, which is then used to locate the date through the dir() function. With the exception of this initial parameter, methods operate similarly to normal functions.

Now we look at the first method. This method has the same name as the class as specified after the classdef statement. That marks this method as a special method, called a constructor. A constructor is eponymous with the class and includes initialization code to be executed when the object is constructed. The parameter list of the constructor lists all the parameters required to create an instance object of the class. Unlike the other methods, there no is referenced object as a first parameter, since the object has not been created yet. Instead, the constructor has the output parameter that appears to go unassigned. It is this variable that holds the newly created object during the invocation of the constructor. To initialize our recording object, the value of the filename variable passed into the constructor must be copied into the property of the same name in obj. This may seem unnecessary, but the two variables named filename are entirely distinct and live in entirely different scopes, one within the class recording, and one as a local variable in the constructor for recording.

To create a recording object, type the following:

>> r = recording(‘test.wav’);

Look at the time stamp:

>> r.timestamp()

Note that the methods to load the data and return the sample rate are unimplemented:

>> r.sample_rate()

>> r.raw_data()

Creating implementations for these methods are the focus of the next section.

4.3.5.2 Inheritance

At this stage, it would be helpful to be able to load a sound file. The example below shows code for a wav_recording class, which loads WAV files. The code for the wav_recording class is fairly similar to the recording class, with a few differences.

classdef wav_recording < recording

 methods

  function obj = wav_recording(filename)

        obj = obj@recording(filename);

  end

  function r = sample_rate(obj)

        [data, r] = wavread(obj.filename);

  end

  function data = raw_data(obj)

        [data, r] = wavread(obj.filename);

  end

 end

end

The most noticeable difference is the less than sign and “recording” at the beginning of the class definition, immediately after the class name. These denote that the wav_recording class should inherit from recording. This inheritance means that objects of the wav_recording class have their own copies of the properties and methods in recording. Through inheritance, all objects of wav_recording are also objects of recording. Note the constructor. The unusual function call in the constructor references the constructor of the parent class, recording. Calling the constructor in recording ensures that the filename input parameter in the wav_recording constructor will be copied to the filename property during the execution of the constructor of recording. Also, note the absence of the timestamp method here in wav_recording. Since the functionality provided in the parent class is sufficient, there is no need to override it here.

>> r = wav_recording(‘test.wav’)

>> r.timestamp()

The inheritance is also clear in the implementations of sample_rate() and raw_data(). In these methods, the filename property of the object is referenced, and this relies upon the definition of the parent class recording.

Try obtaining the sample rate:

>> r.sample_rate()

>> plot(r.raw_data())

The previous functionality is still available, simply by instantiating a recording object:

>> r = recording(‘test.wav’)

>> r.sample_rate()

>> plot(r.raw_data())

The capability of the MATLAB interpreter to choose the proper method based on the class identity of the object is called dynamic dispatch. For dynamic dispatch to work properly, the method name and input parameter lists must be the same throughout the class hierarchy.

Now, we will add support for PCM audio files. PCM (pulse code modulation) is a simple file format that stores digitized samples as 16 bit integers. Unlike WAV files, PCM files include only data, and the sample rate must be stored elsewhere (e.g., in experimental notes or in a separate file). Because of this, we will include the sample rate as a parameter on the constructor. Our PCM reading–recording class will also require PCM-specific implementations of sample_rate() and raw_data(), as did the WAV reading class. Here is code for a PCM reading class:

classdef pcm_recording < recording

 methods

   function obj = pcm_recording(filename, sample_rate)

        obj = obj@recording(filename);

        obj.sample_rate = sample_rate;

    end

   function r = sample_rate(obj)

        return obj.sample_rate;

    end

   function data = raw_data(obj)

        fid = fopen(obj.filename, ‘r’);

        if fid == -1

           error(‘Unable to open file’ + obj.filename);

           data = [];

           return

        else

           data = fread(fid, inf, ‘uint16=>double’, 0, ‘l’);

           fclose(fid);

        end

   end

 end

end

Note that the sample_rate() method does nothing with the file, but it returns the sample rate specified through the constructor.

Try the following:

>> r = pcm_recording(‘test.pcm’, 20000)

>> r.sample_rate()

>> r = pcm_recording(‘test.pcm’, 40000)

>> r.sample_rate() % note 40 kHz rate now

One of the major benefits of working under an object-oriented paradigm is the ability to write code that works under a variety of cases and that, at the same time, isolates those cases in separate pieces of code. This strongly promotes both high cohesion and lower coupling. The example ahead shows a set of functions that scan for events above a threshold and report on their threshold crossing times.

function dates = threshold_crossings(wav_filename, start_time)

  % Input parameters

  % wav_filename : filename for WAV file

  % start_time : start time of the WAV recording

  raw_data, sampling_rate = wavread(wav_filename);

  above = raw_data > threshold;

  threshold_crossings = diff(above) == 1;

  threshold_times = find(threshold_crossings);

  % threshold_times is the sample count since start_time

  % for each threshold crossing

  threshold_sec = threshold_times / sampling_rate;

  dates = zeros((length(threshold_sec), 1));

  for ii = 1:length(threshold_sec)

   dates[ii] = addtodate(datenum(start_time, threshold_sec, ‘second’));

  end

end

At the moment, this function works only for WAV files. We can change threshold_crossings() to work with our wav_recording objects with a small amount of work. The next example shows threshold_crossings() modified to use recording objects. In making the change, we reduce coupling somewhat, as threshold_crossings now only requires a single input parameter, even though new constraints are placed on that parameter (it must support raw_data, sampling_rate, and start_time methods). Cohesion is improved as well, as the WAV loading code is no longer in threshold_crossings().

function dates = threshold_crossings(rec)

  % Input parameters

  % rec : audio recording object

  raw_data = rec.raw_data();

  sampling_rate = rec.sampling_rate();

  start_time = rec.start_time();

  above = raw_data > threshold;

  threshold_crossings = diff(above) == 1;

  threshold_times = find(threshold_crossings);

  % threshold_times is the sample count since start_time

  % for each threshold crossing

  threshold_sec = threshold_times / sampling_rate;

  dates = zeros((length(threshold_sec), 1));

  for ii = 1:length(threshold_sec)

    dates[ii] = addtodate(datenum(start_time, threshold_sec, ‘second’));

  end

end

One substantial benefit in making the change is near effortless support of PCM files obtained through data hiding and encapsulation. Since all the code specific to PCM loading is isolated in the PCM class, we can safely make changes in threshold_crossings() to support generic recordings without worrying about PCM support within threshold_crossings().

Exercise 4.3

Write play_recording from Exercise 4.2 as a method on the recording class.

Exercise 4.5

Generalize the class written for Exercise 4.4 to work with any HDF5 file in which a dataset containing a vector of values and an associated attribute with sampling rate will work, regardless of the dataset location within the HDF5 file or the name of the sampling rate attribute. (Hint: The names of these two keys will need to be specified at object instantiation, in the constructor!) Try your solution on test_audio3.hdf5, which has audio data at /audio/recording2 and sampling information at “samples_per_sec” on /audio/recording2.

4.3.5.3 Passing Objects Around: The Handle Class

Much like other types of variables, MATLAB objects are copied in and copied out during function calls. Here is an example that demonstrates this phenomenon for a vector:

function x = no_change(in_vector, index, new_value)

  in_vector(index) = new_value;

end

Type the following:

>> x = 1:5;

>> no_change(x, 4, 2)

>> x

x =

  1 2 3 4 5

For the change to be permanent, the input parameter must be moved to the output parameter, as in the following:

function x = change(in_vector, index, new_value)

  in_vector(index) = new_value;

  x = in_vector;

end

>> x = 1:5;

>> y = change(x, 4, 2);

>> x

x =

  1 2 3 4 5

>> y

y =

  1 2 3 2 5

While the original input x is still unchanged, the modification does leave the function as an output, which is copied into the variable y. Objects operate much the same. This is even the case for methods that modify properties. To preserve the change, the modified object must be copied out:

classdef  example

  properties

    name

 end

 methods

   function change_name1(obj, new_name)

        obj.name = new_name

   end

   function change_name2(obj, new_name)

        obj.name = new_name;

   end

 end

end

>> r = example;

>> r.change_name1(‘new name’);

>> r

r =

example

Properties:

name: []

Methods

>> r.change_name2(‘new name’);

>> r

r =

example

Properties:

 name: []

 Methods

>> r = r.change_name2(‘new name’);

>> r

r =

 example

 Properties:

 name: ‘new name’

 Methods

>> r2 = r

r2 =

 example

 Properties:

  name: ‘new name’

 Methods

>> r2.name = ‘testing’;

>> r2

r2 =

 example

 Properties:

  name: ‘testing’

 Methods

>> r

r =

 example

 Properties:

  name: ‘new name’

 Methods

Only passing the modified object out and storing the result in allows for the method to change the property value. Also note that assigning r to a new variable r2 causes the object held by r to be copied. The result is a new object in r2 distinct from that held by r.

The situation may arise where this copying of objects is undesirable. This often is the case when objects are used to manipulate the state of a non-duplicating resource, such as a file reference or a graphical object. Another case in which these semantics are undesirable occurs when creating objects where the internal state of the object could change without the awareness of the invoking code. The WAV reading class written earlier illustrates an example of this later use.

Because the MATLAB function wavread() always reads the whole WAV file, even if only the sampling rate is needed, invoking wav_recording.sample_rate() will still read the entire WAV file into memory. Moreover, invoking wav_recording.raw_data() immediately afterwards reads the WAV file into memory again. Ideally, the contents of a file could be stored away as a property when read. Whenever the data was requested through the raw_data() method, the previously read contents of the file could be returned if available, saving the overhead of an extra read. Unfortunately, this would require the parent class’ raw_data() to return the current object as a parameter in addition to the read data. This also makes for a messier call, since the call would be something like

[data, r] = r.raw_data()

Fortunately, MATLAB provides an alternate method of passing objects. Under this alternate object passing mechanism, variables refer not to objects directly, but to handles of objects. With this paradigm, multiple variables can “point” to the same object, and the normal copying of variables during function calling only copies a handle. To specify that a class use this alternate passing mechanism, classes must inherit from handle. Open up the example class above, and add “< handle” to the end of the first line. Change the class name to example2, and save as example2. The first two lines of example2 should be:

classdef example2 < handle

  properties

.

.

.

Then, type

>> r = example2;

>> r.change_name1(‘new name’);

>> r

r =

 example2 handle

 Properties:

  name: ‘new name’

 Methods, Events, Superclasses

>> r2 = r

r2 =

 example2 handle

 Properties:

  name: ‘new name’

 Methods, Events, Superclasses

>> r2.name = ‘testing’

r2 =

 example2 handle

 Properties:

  name: ‘testing’

 Methods, Events, Superclasses

>> r

r =

 example2 handle

 Properties:

  name: ‘testing’

 Methods, Events, Superclasses

Invoking change_name1 on r modified r itself. Likewise, r and r2 refer to the same object, so changes made using r are visible when examining the object through r2.

For completeness, here is a class wav_recording2 that implements the caching discussed earlier. A given file is only read once, regardless of how many times sample_rate() or raw_data() is invoked. The parent class is recording2, which is identical to recording except that it inherits from handle. A listing of the first few lines follows the listing for wav_recording2. If a class inherits from handle, then all parent classes must as well.

classdef wav_recording2 < recording2

  properties

    stored_data = [];

    stored_rate = -1;

  end

  methods

    function obj = wav_recording2(filename)

           obj = obj@recording2(filename);

   end

   function r = sample_rate(obj)

        if obj.stored_rate >= 0

           r = obj.stored_rate;

         else

           [data, r] = wavread(obj.filename);

           obj.stored_data = data;

           obj.stored_rate = r;

         end

  end

  function data = raw_data(obj)

        if ~isempty(obj.stored_data)

           data = obj.stored_data;

        else

           [data, r] = wavread(obj.filename);

           obj.stored_data = data;

           obj.stored_rate = r;

        end

  end

 end

end

classdef recording2 < handle

 properties

  filename

 end

.

.

.

One last note about objects descending from handle: Creating objects derived from handle causes allocation of memory that is not automatically cleaned up when the variable is cleared or goes out of scope. Clearing a variable holding an object derived from handle only removes the reference to the object. To remove the object itself, delete must be used. Since delete removes the object itself from memory, other references to the same object automatically become invalid after deletion.

>> r = recording2(‘test.wav’);

>> r2 = r;

>> delete(r)

>> r.filename

??? Invalid or deleted object.

>> r2.filename

??? Invalid or deleted object.

>> r2

r2 =

deleted recording2 handle

Methods, Events, Superclasses

Unfortunately, in addition to deleting handle-derived objects, delete can also be used to remove files when used as a command. To avoid inadvertently deleting files, always be sure to use the functional form of delete (i.e., with parentheses).

4.3.5.4 Summary

MATLAB’s object model supports much, much more than what is touched on here, including access control, events, and complex inheritance patterns. For simple data analysis, simple representation of data as matrices is suitable. Object-oriented programming provides a tool for organizing larger efforts that promotes maintainability and minimizes code duplication. Object-oriented programming is particularly suited to GUI programming, where the programmatic objects are a natural analog of the controls and other visual entities on the screen such as the mouse cursor or menus.

4.4 Taming Errors

4.4.1 An Introduction to the Debugger

At some point, despite great care in design and implementation, errors will rear their ugly heads. One of the greatest tools for eradicating errors is a debugger.

A debugger allows for running MATLAB code in an environment where the program state (e.g., variable value, interpreter location, etc.) can be explicitly controlled. The following code shows a naïve implementation of a factorial function.

function f = factorial2(g)

if g == 1

 f = 1;

 return

end

f = g * factorial2(g-1);

end

Typing factorial2(5) yields the expected 120. Try typing factorial2(5.1). Clearly, this behavior is not desirable. In this simple example, finding the error by inspecting the code alone is entirely plausible. That same simplicity also argues for this as a good example for demonstrating the debugger.

The easiest way to invoke the debugger on a function like factorial2 is to open the function in the editor and add a breakpoint in the editor. A breakpoint denotes a location in the code where the MATLAB interpreter will always stop. Not all lines can support a breakpoint. In the editor, lines where a breakpoint can be placed will have a horizontal line to the left of the code. Clicking on that horizontal line to the left of the text places a breakpoint at that line. Clicking again removes the breakpoint. For this example, place a breakpoint on the line “if g==1.” Figure 4.2 illustrates the editor/debugger window with a breakpoint set.

With breakpoint in place, type factorial2(5.1) again. MATLAB should reposition the editor/debugger window to the front and place an arrow at the breakpoint. The command line prompt should also change. The MATLAB interpreter is now inside the function.

The current workspace is not the command line (Base), but the current factorial scope. Because the interpreter is in the scope of the current invocation of the function, variable values can be inspected. Typing “g” shows the value of g in the current scope (5.1). One can also observe the value of g in the workspace window. In addition to the g in the current scope, the workspace window allows viewing variables in other scopes as well. Selecting a different scope in the scope dropdown will cause all the variables in that scope to be displayed.

Selecting “Continue” from the Debug menu in the editor/debugger window will resume execution until encountering the next breakpoint. Note the contents of the stack list box after continuing. The stack dropdown allows selecting a specific scope (i.e., stack frame) in which to operate (see Figure 4.3). In the command window, type

>> g

After observing the value of g, select a different frame from the dropdown and inspect the value of that frame’s g value by typing

>> g

Are they the same?

After continuing four more times (a total of five), inspect the value of “g.”

The variable “g” should be 0.1. Continuing again and inspecting g will reveal the error; g becomes negative. From this, it should be clear that the termination condition for the factorial2 function in the if statement is not specific enough. Checking whether the input is exactly 1 will miss any non-integer argument. At this point, the error could be addressed in a number of fashions, depending on the desired functionality when non-integers are specified as the initial argument.

Change the value of “g” to 1 from−0.9. This can be done by typing g=1 at the command line or by clicking on “g” in the workspace window. After setting g to 1, continue executing the function by selecting “Continue” from the debug menu in the editor/debugger window. From this point forward in the execution, the value of g at that scope will be 1. Since the next line after the breakpoint is the if statement comparing the value of g to 1, the invocation of factorial2 at the level where g was modified quickly terminates and returns a value of 1 to the previous calling level.

Again, type factorial2(5.1), and continue until g is negative. Before continuing again, set the value of g to 1. Before continuing, place a breakpoint on the last end of the function. Now, the MATLAB interpreter will stop after calculating each part of the factorial. After continuing, examine the value of g. Is it what you expect? It is important to remember that the variable “g” at each scope is a distinct and separate variable.

In addition to continuing after a breakpoint, the debugger allows stepping through code line by line. Type factorial2(3). When the interpreter reaches the first breakpoint, step one line by selecting “step” from the debug menu. Continue stepping through the function until the final answer is calculated. Stepping through line by line demonstrates how the calculation invokes a series of calls that only end once termination condition is reached. Normally, step does not enter called functions. To enter a called function, step in can be used instead of step. Likewise, when inside a called function, step out will return to the calling function.

Finally, when finished, clear the breakpoints, either by clicking on the breakpoints in the editor or by selecting “clear all” from the debug menu in the editor window.

4.4.2 Logging

In larger programs, running under the debugger may not be feasible. Larger programs have more complex states, and it may not be possible to duplicate the bug within the debugger. For such cases, logging may be an appropriate methodology for tracking down bugs. Logging is simply printing out the internal status of the program. Usually, the log will be written to an external file to preserve the record for later debugging.

function f = factorial2(g, log_file)

fprintf(log_file, ‘Entering factorial2(), g=%d’, g)

if g == 1

 f = 1;

 return

end

f = g * factorial2(g-1);

end

This example has been simplified to demonstrate logging. This example requires the log file to remain open for the running of the program. Ideally, one would want the file only open when the log was being updated. To realize this, the file name could be passed instead of an open file, but that would require error-handling code in every function using the log. A better approach would be to create a logging object, capable of maintaining a file name and encapsulating all the file handling code. This approach improves the cohesion of the factorial2() function, as only the portion of the logging relevant to supplying factorial2’s behavior would remain in factorial2(). The code ahead demonstrates such a class and its usage (post-R2008a semantics).

classdef logger

  % Provides simple logging functionality.

  % To create, use the constructor with a filename.

  % Log entries are appended to the end of the file, with each

  % entry added on its own line.

  properties

    filename;

  end

  methods

    function obj = logger(filename)

          obj.filename = filename;

    end

    function message(obj, msg_str)

          % Outputs a message to the logging file.

          % Messages should be a string.

          % Usage:

          % log.message(‘Within function message()’)

          fid = fopen(obj.filename, ’a+’);

          if fid == -1

            warning(‘Unable to open log file’)

            return

          end

          fprintf(fid, ‘%s\n’, msg_str);

          fclose(fid);

    end

 end

end

function f = factorial2(g, log)

 log.message(sprintf(‘Entering factorial2(), g=%d’, g));

 if g == 1

  f = 1;

  return

 end

 f = g * factorial2(g-1);

end

>> log = logger(‘factorial-log.log’);

>> f = factorial2(g, log);

4.4.3 Edge Cases and Unit Testing

Greater modularity and tighter cohesion lend themselves to simpler testing. With modular code, small portions of a larger program can be isolated and tested in a rigorous fashion. Such automated testing of smaller logical components through isolating them from the other parts of the program is called unit testing.

When unit testing, one wants to verify that all possible inputs have an expected result. Since testing all possible inputs is not feasible (for a single two element vector alone, this is 2128 different possibilities!), unit testing focuses on what are termed edge cases: those states or sets of parameters on the edges of different parameter spaces. Because these edge cases are usually the boundary between qualitatively different types of functional behavior, these types of inputs are often likely to evoke erroneous behavior because these types of values are often unplanned for. To illustrate the selection of edge cases, we will use the factorial2() function from previous sections:

function f = factorial2(g)

if g <= 1

 f = 1;

 return

end

f = g * factorial2(g-1);

end

If the input is limited to scalars, the behavior is likely to differ for integers (positive and negative), reals (positive and negative), 0, and 1. Thus, good edge cases would include−1, 0, 1, positive values and negative distant from 0, and small real values. Each test case should test a single edge case. Test cases should invoke the function with known inputs and compare the result to the expected output. Errors are acceptable as long as the error is the expected output for the function, given the input.

Here is a script with test cases for a−1, 0, 1, and a sequence of positive integer inputs.

% test 1

if factorial2(-1) != -1

  error(‘incorrect output for test 1: negative numbers’);

end

% test 2

if factorial2(0) != 1

  error(‘incorrect output for test 2: 0’);

end

% test 3

if factorial2(1) != 1

  error(‘incorrect output for test 3: 1’);

end

% test 4

a = [2 3 4 5];

a = factorial(a);

a2 = [2 3 4 5];

for ii = 1:length(a)

 a2(ii) = factorial2(a2(ii));

end

if a2 ~= a

 error(‘incorrect output for test 4: positive value check’);

end

Exercise 4.6

As coded previously, factorial2() does not pass the test cases. Modify factorial2() to pass the test cases.

Exercise 4.7

Add test cases to the test script for other edge cases (reals, negative integers).

Even though the test script demonstrated clear deficiencies in the existing factorial2() function, one of the benefits of a set of unit tests is quickly capturing bugs observed after the initial design or implementation. Once an aspect of a unit’s functionality is captured in a test script, regressions in that aspect can be spotted quickly. For example, if someone later modifying the code changed the termination condition from g <=1 to g==1, executing the unit test script would identify the error immediately. Unit tests capture the expected behavior of a function and allow divergences to be identified much more quickly than embedded in a large program.

Exercise 4.8

The following code thresholds a raw recording, separates the event set into trials, and sorts the events to generate data for a PSTH (peri-stimulus time histogram).

function psth,bins = bin_for_psth(raw_data, sampling_rate, threshold, trial_count, trial_length, bin_size)

 % Locates events above threshold in raw data and generates PSTH from

 % multi-trial recording. Trials should be contiguous.

 %

 % Input parameters

 %  raw_data:     raw input data

 %  sampling_rate:  sampling rate in Hz

 %  threshold:     threshold for events, in same units as raw data

 %  trial_count:    number of contiguous trials

 %  trial_length:   length of each trial, in seconds

 %  bin_size:     size of each PSTH bin, in seconds

 %

 % Output parameters

 %  psth:       count in each bin

 %  bins:       center position of each bin, in seconds relative to

 %    trial start

 % first, threshold signal

 events = raw_data > threshold;

 % only positive threshold crossings (not sustained activity above threshold)

 events = diff(events) == 1;

 events = [0 events];

 % now, split into trials

 events = reshape(events, trial_count, trial_length*sampling_rate);

 % events should be MxN, where M = trial and N = sample

 summed_events = sum(events);

 % summed_events should be the sum of events at each sample relative to

 % the start of the trial

 max_event_count = max(summed_events);

 for count = 0:max_event_count-1

  above_count = find(summed_events > count);

  event_offsets = [event_offsets above_count];

 end

 % event_offsets should be the offset in sample counts where events occur

 event_times = event_offsets / sampling_rate;

 [psth, bins] = hist(event_times, trial_length/bin_size);

End

Determine edge cases for testing the input parameters of bin_for_psth, above.

4.4.4 A Few Words about Precision

Like most quantitative software, MATLAB does not represent most real numbers exactly. Where this representation fails to capture values exactly is often a source of bugs in quantitative code. This section discusses how MATLAB represents real numbers so that such problems can be diagnosed and avoided.

MATLAB labels the specific representation of every variable, and this is visible in the workspace. You may have noticed that nearly all variables have the label “double.” This representation is the default representation type for values in MATLAB. Double here is in deference to single precision floating point, a lesser used floating point representation that consumes half the memory. Floating point representations are so called because the representation does not fix the number of digits on either side of the decimal point. Since the standards body IEEE (Institute of Electrical and Electronics Engineers) oversees the specification of this format, it is commonly known as IEEE 754 or 64 bit IEEE floating point. Similarly to MATLAB, most quantitative software uses this format for representing real numbers (Figure 4.4).

The sign bit denotes whether the number as a whole is positive or negative. The exponent is a base 2 number biased by 210−1, or 1023. The representation of exponents are 1023 plus the exponent’s value. This system allows for exponents in the range −1023 to 1023.

The representation of the mantissa is the most complex portion of this standard. The digits in the mantissa represent a binary fraction, where each successive digit represents a successive fractional power of 2. Additionally, the mantissa is the fractional part of the number; there is a 1 implicit in the number not represented in the format.

Here is an example to illustrate how a decimal floating point number is represented internally by MATLAB.

Take 15.1875.

In base 2, 15 is 11112. As a binary fraction, 0.1875 is 0.00112. (0.1875 is 3/16, or 1/8 + 1/16, or 0*1/2+0*1/4+1*1/8+1*1/16.) In total, 15.1875 is 1111.00112. This must be converted to binary exponential notation: 1.11100112×23. To save space, the IEEE format assumes a leading one, so we must do the same. To store this as in double precision format, we need to discard the initial 1 from the mantissa and bias the exponent (Figure 4.5).

Why is this important? MATLAB can only represent a small subset of real numbers with absolute precision. These numbers are those whose fractional part is a sum of fractional powers of 2. For example, 7/16 can be perfectly represented (1/4 + 1/8 + 1/16), but 7/17 cannot. Understanding the limitations of floating point representation can also help to diagnose difficult to see errors.

One such error is testing for equality with floating point values. This often occurs when testing that a variable is zero or one valued. For example, the test below is attempting to verify that the variable x is 0:

if x == 0

However, this will often not work if x is the result of extensive calculations. When testing for zero, it is usually best to check a range around zero because the value is likely some extremely small floating point number rather than exactly zero:

if abs(x) < 1.0e-6

This scenario most often happens when checking for equality with zero, but any comparison involving floating point values and an arbitrary value, such as 0 or 1, should use a small interval instead.

Another such error occurs with operations on two values of extremely different magnitudes. The mantissa portion of the IEEE format has only 52 bits of precision. Thus, values differing by more than 253 cannot be reliably added. This example demonstrates the problems inherent with sums of large and small magnitude numbers.

>> format compact

>> format long

>> 2^52

ans =

4.503599627370496e+15

Note that this is exactly 252; there are no values hidden from view. To demonstrate, we can add one:

>> 2^52 + 1

ans =

4.503599627370497e+15

This cannot be done for 253:

>> 2^53

ans =

9.007199254740992e+15

>> 2^53+1

ans =

9.007199254740992e+15

The result is identical to the original value, 253. It is important to note that this is not the result of the unit’s place being hidden from view. The following demonstrates that, under MATLAB, 1+253 is equal to 253. The equivalent example with 252 is shown for comparison.

>> 2^53 == 2^53 + 1

ans =

1

>> 2^52 == 2^52 + 1

ans =

0

Similar problems occur when multiplying or dividing exceedingly small numbers. For example:

>> 10^(−200) * 10^(−100) * 10^(−100)

ans =

0

>> 10^(−200) * 10^(−100) * 10^(−100) * 10^(300) * 10^(300)

ans =

0

Rearranging the terms yields the correct answer:

>> 10^(−200) * 10^(300) * 10^(−100) * 10^(−100) * 10^(300)

ans =

1.000000000000000e+200

Such problems can be avoided by carefully considering the magnitudes of the values in the calculation. When adding or subtracting terms of varying magnitudes, keeping the large and small values separated as distinct terms as long as possible often avoids this problem. Moving the calculation to logarithms is particularly effective for problematic multiplication or division.

4.4.5 Suggestions for Optimization

Occasionally, the situation arises where code does produce the expected result, but the code does so too slowly. In such cases, the code can be optimized. Optimization here means the rewriting of a portion of the code to improve the performance of the overall program. Since optimization involves scrutinizing working code, it is best to limit substantial optimization efforts to known hot paths—places in a program where the MATLAB interpreter spends a substantial proportion of the execution time.

Identifying hot paths from source code is difficult and quite error prone for larger pieces of code. Before engaging in a substantial optimization effort at a poorly performant site in the code, it is best to verify that the site is actually the cause of the perceived performance problem. This can be done by timing experiments with tic/toc or using the MATLAB profiler. Once identified, addressing efficiencies in the hot paths of a program can yield substantial returns.

4.4.5.1 Vectorizing Matrix Operations

MATLAB is particularly efficient in executing matrix operations relative to the same operations. Taking full advantage of matrix operations in code often doesn’t occur when first learning MATLAB, as the syntax is not as straightforward. This and the following sections offer suggestions for moving common non-matrix operations to matrix form. Code transformations of this type are called vectorization, as the type of matrix operations MATLAB offers are termed vector operations. (The use of vector to characterize MATLAB matrix operations indicates multi-valued operations, in deference to scalar (single-valued) operations, and does not refer to the mathematical objects operated on.)

A primary benefit of vectorizing code is a potential speed up in execution with a substantial change to the larger algorithm in the code. Here is an example contrasting two different approaches to adding matrices.

A = ones(4, 4); * 3; % matrix of threes

B = ones(4, 4); * 6; % matrix of sixes

C = zeros(4, 4);

for ii = 1:4

 for jj = 1:4

  C(ii, jj) = A(ii, jj) + B(ii, jj);

 end

end

or

A = ones(4, 4); * 3; % matrix of threes

B = ones(4, 4); * 6; % matrix of sixes

C = zeros(4, 4);

C = A + B;

While both pieces of code accomplish the same task, the second executes measurably faster. Note that the second snippet avoids the nested for loops.

Understanding why these two bits of code execute so differently requires a brief explanation of how MATLAB evaluates code. Individual operations in MATLAB execute as compiled machine code, at high speed. For example, the matrix addition in the second code section executes in this manner.

However, in the case of the first example, evaluation of the inner statement alone requires evaluating each of the two index variables, three matrix lookups, a scalar addition, and storing the scalar result. In between operations, the interpreter must be constantly consulted to determine the next step.

MATLAB Functions, Commands, and Operators Covered in This Chapter

which

global

classdef

delete

clear

dbstack