Overview of Object-Oriented Programming in R

Object-oriented programming is not the same thing as programming with objects. R is a very object-centric language; everything in R is an object. However, there is more to OOP than just objects. Here’s a short description of what object-oriented programming means.

As an example of how object-oriented programming is used in R, we’ll consider time series.[28] A time series is a sequence of measurements of a quantity over time. Measurements are taken at equally spaced intervals. Time series have some properties associated with them: a start time, an end time, a number of measurements, a frequency, and so forth.

In OOP, we would create a “time series” class to capture information about time series. A class is a formal definition for an object. Each individual time series object is called an instance of the class. A function that operates on a specific class of objects is called a method.

As a user of time series, you probably don’t care too much about how time series are implemented. All you care about is that you know how to create a time series object and manipulate the object through methods. The time series could be stored as a data frame, a vector, or even a long text field. The process of separating the interface from the implementation is called encapsulation.

Suppose that we wanted to track the weight history of people over time. For this application, we’d like to keep all the same information as a time series, plus some additional information on individual people. It would be nice to be able to reuse the code for our time series class for objects in the weight history class. In OOP, it is possible to base one class on another and just specify what is different about the new class. This is called inheritance. We would say that the weight history class inherits from the time series class. We might also say that the time series class is a superclass of the weight history class and that the weight history class is a subclass of the time series class.

Suppose that you wanted to ask a question like “What is the period of the measurements in this object?” Ideally, it would be nice to have a single function name for finding this information, maybe called “period.” In OOP, allowing the same method name to be used for objects of different classes is called polymorphism.

Finally, suppose that we implemented the weight history class by creating classes for each of its pieces: time series, personal attributes, and so on. The process of creating a new class from a set of other classes is called composition. In some languages (like R), a class can inherit methods from more than one other class. This is called multiple inheritance.

If you’re familiar with object-oriented programming in other languages (like Java), you’ll find that most of the familiar concepts are included in R. However, the syntax and structure in R are different. In particular, you define a class with a call to a function (setClass) and define a method with a call to another function (setMethod). Before we describe R’s implementation of object-oriented programming in depth, let’s look at a quick example.

Let’s implement a class representing a time series. We’ll want to define a new object that captures the measured data, a start time, an end time, and properties such as the units, frequency, and period of the series.

Clearly, some of this information is redundant; given many of the attributes of a time series, we can calculate the remaining attributes. Let’s start by defining a new class called “TimeSeries.” We’ll represent a time series by a numeric vector containing the data, a start time, and an end time. We can calculate units, frequency, and period from the start time, end time, and the length of the data vector. As a user of the class, it shouldn’t matter how we represent this information, but it does matter to the implementer.

In R, the places where information is stored in an object are called slots. We’ll name the slots data, start, and end. To create a class, we’ll use the setClass function:

> setClass("TimeSeries",
+   representation(
+     data="numeric",
+     start="POSIXct",
+     end="POSIXct"
+   )
+ )

The representation explains the class of the object contained in each slot. To create a new TimeSeries object, we will use the new function. (The new function is a generic constructor method for S4 objects.) The first argument specifies the class name; other arguments specify values for slots:

> my.TimeSeries <- new("TimeSeries",
+    data=c(1, 2, 3, 4, 5, 6),
+    start=as.POSIXct("07/01/2009 0:00:00", tz="GMT",
+                     format="%m/%d/%Y %H:%M:%S"),
+    end=as.POSIXct("07/01/2009 0:05:00", tz="GMT",
+                     format="%m/%d/%Y %H:%M:%S")
+ )

There is a generic print method for new S4 classes in R that displays the slot names and the contents of each slot:

> my.TimeSeries
An object of class "TimeSeries"
Slot "data":
[1] 1 2 3 4 5 6

Slot "start":
[1] "2009-07-01 GMT"

Slot "end":
[1] "2009-07-01 00:05:00 GMT"
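Slots can be read back individually, and the class definition itself can be inspected. The following is a minimal sketch rather than part of the original example: it repeats the class definition so that it runs on its own, and the names ts1, slot_classes, first_three, end_time, and is_ts are illustrative only.

```r
library(methods)  # attached by default in interactive sessions; harmless to repeat

# Same class as in the text, repeated so this sketch is self-contained.
setClass("TimeSeries",
  representation(
    data  = "numeric",
    start = "POSIXct",
    end   = "POSIXct"
  )
)

# ts1 is an illustrative object name, not one used elsewhere in the chapter.
ts1 <- new("TimeSeries",
  data  = c(1, 2, 3, 4, 5, 6),
  start = as.POSIXct("2009-07-01 00:00:00", tz = "GMT"),
  end   = as.POSIXct("2009-07-01 00:05:00", tz = "GMT")
)

slot_classes <- getSlots("TimeSeries")  # named character vector: slot name -> class
first_three  <- ts1@data[1:3]           # direct slot access with the @ operator
end_time     <- slot(ts1, "end")        # equivalent access by slot name
is_ts        <- is(ts1, "TimeSeries")   # class membership test
```

Reading slots directly with @ is convenient for the implementer, but users of a class are usually better served by accessor functions, which keep the interface separate from the representation.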

Not all possible slot values are valid. We want to make sure that end does not occur before start and that the lengths of start and end are both exactly 1. R allows you to specify a function that will be used to validate objects of a specific class. We can specify this with the setValidity function:

> setValidity("TimeSeries",
+    function(object) {
+      length(object@start) == 1 &&
+      length(object@end) == 1 &&
+      object@start <= object@end
+    }
+  )
Class "TimeSeries" [in ".GlobalEnv"]

Slots:

Name:     data   start     end
Class: numeric POSIXct POSIXct

You can now check that a TimeSeries object is valid with the validObject function:

> validObject(my.TimeSeries)
[1] TRUE

When we try to create a new TimeSeries object, R will check the validity of the new object and reject bad objects:

> good.TimeSeries <- new("TimeSeries",
+     data=c(7, 8, 9, 10, 11, 12),
+     start=as.POSIXct("07/01/2009 0:06:00", tz="GMT",
+                      format="%m/%d/%Y %H:%M:%S"),
+     end=as.POSIXct("07/01/2009 0:11:00", tz="GMT",
+                      format="%m/%d/%Y %H:%M:%S")
+  )
> bad.TimeSeries <- new("TimeSeries",
+     data=c(7, 8, 9, 10, 11, 12),
+     start=as.POSIXct("07/01/2009 0:06:00", tz="GMT",
+                      format="%m/%d/%Y %H:%M:%S"),
+     end=as.POSIXct("07/01/1999 0:11:00", tz="GMT",
+                      format="%m/%d/%Y %H:%M:%S")
+  )
Error in validObject(.Object) : invalid class "TimeSeries" object: FALSE

(You can also specify the validity method at the time you are creating a class; see the full definition of setClass for more information.)
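As a sketch of that alternative, the validity function can be passed through the validity argument of setClass. A validity function may also return a character vector describing the problems instead of FALSE, which produces a more helpful error message. The class name CheckedTimeSeries and the message wording below are assumptions made for this illustration.

```r
library(methods)

# "CheckedTimeSeries" is a hypothetical class name used only for this sketch.
setClass("CheckedTimeSeries",
  representation(data = "numeric", start = "POSIXct", end = "POSIXct"),
  validity = function(object) {
    problems <- character()
    if (length(object@start) != 1)
      problems <- c(problems, "start must have length 1")
    if (length(object@end) != 1)
      problems <- c(problems, "end must have length 1")
    # Only compare the times once both are known to be single values.
    if (length(problems) == 0 && object@start > object@end)
      problems <- c(problems, "start must not be later than end")
    # Return TRUE for a valid object, otherwise the collected messages.
    if (length(problems) == 0) TRUE else problems
  }
)

# Try to build an object whose end precedes its start and keep the error text.
result <- tryCatch(
  new("CheckedTimeSeries",
      data  = c(7, 8, 9),
      start = as.POSIXct("2009-07-01 00:06:00", tz = "GMT"),
      end   = as.POSIXct("1999-07-01 00:11:00", tz = "GMT")),
  error = function(e) conditionMessage(e)
)
```

Here result holds the error message, which includes the text returned by the validity function rather than a bare FALSE.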

Now that we have defined the class, let’s create some methods that use the class. One property of a time series is its period. We can create a method for extracting the period from the time series. This method will calculate the duration between observations based on the length of the vector in the data slot, the start time, and the end time:

> period.TimeSeries <- function(object) {
+   if (length(object@data) > 1) {
+     (object@end - object@start) / (length(object@data) - 1)
+   } else {
+     Inf
+   }
+ }

Suppose that you wanted to create a set of functions to derive the data series from other objects (when appropriate), regardless of the type of object (i.e., polymorphism). R provides a mechanism called generic functions for doing this.[29] You can define a generic name for a set of functions (like “series”). When you call “series” on an object, R will find the correct method to execute based on the class of the object. Let’s create a function for extracting the data series from a generic object:

> series <- function(object) {object@data}
> setGeneric("series")
[1] "series"
> series(my.TimeSeries)
[1] 1 2 3 4 5 6

The call to setGeneric redefined series as a generic function whose default method is the old body for series:

> series
standardGeneric for "series" defined from package ".GlobalEnv"

function (object)
standardGeneric("series")
<environment: 0x19ac4f4>
Methods may be defined for arguments: object
Use  showMethods("series")  for currently available ones.
> showMethods("series")
Function: series (package .GlobalEnv)
object="ANY"
object="TimeSeries"
    (inherited from: object="ANY")
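Another common pattern is to declare the generic explicitly and then register a method for each class, instead of converting an existing function. The sketch below assumes that pattern; it repeats the class definition so that it is self-contained, and ts1, values, and has_explicit are illustrative names.

```r
library(methods)

setClass("TimeSeries",
  representation(data = "numeric", start = "POSIXct", end = "POSIXct"))

# Declare the generic explicitly; standardGeneric() performs the dispatch.
setGeneric("series", function(object) standardGeneric("series"))

# Register a method that applies to TimeSeries objects only.
setMethod("series", signature("TimeSeries"), function(object) object@data)

ts1 <- new("TimeSeries",
  data  = c(1, 2, 3),
  start = as.POSIXct("2009-07-01 00:00:00", tz = "GMT"),
  end   = as.POSIXct("2009-07-01 00:02:00", tz = "GMT")
)

values       <- series(ts1)                          # dispatches on class(ts1)
has_explicit <- existsMethod("series", "TimeSeries") # TRUE: defined directly
```

With this approach there is no default method, so calling series on an object of an unrelated class signals an error instead of silently trying to read a data slot.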

As a further example, suppose we wanted to create a new generic function called “period” for extracting a period from an object and wanted to specify that the function period.TimeSeries should be used for TimeSeries objects, but the default method should be used for other objects. We could do this with the following commands:

> period <- function(object) {object@period}
> setGeneric("period")
[1] "period"
> setMethod(period, signature=c("TimeSeries"), definition=period.TimeSeries)
[1] "period"
attr(,"package")
[1] ".GlobalEnv"
> showMethods("period")
Function: period (package .GlobalEnv)
object="ANY"
object="TimeSeries"

Now we can calculate the period of a TimeSeries object by just calling the generic function period:

> period(my.TimeSeries)
Time difference of 1 mins
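The value returned here is a difftime object, and R chooses its units automatically. If you need the period as a plain number in known units, one option is to convert it explicitly. The following sketch assumes an explicitly declared generic and uses the illustrative names ts1, p, and p_secs.

```r
library(methods)

setClass("TimeSeries",
  representation(data = "numeric", start = "POSIXct", end = "POSIXct"))

setGeneric("period", function(object) standardGeneric("period"))

# Same calculation as period.TimeSeries in the text.
setMethod("period", "TimeSeries", function(object) {
  if (length(object@data) > 1) {
    (object@end - object@start) / (length(object@data) - 1)
  } else {
    Inf
  }
})

# Six observations spanning five minutes: five intervals of one minute each.
ts1 <- new("TimeSeries",
  data  = c(1, 2, 3, 4, 5, 6),
  start = as.POSIXct("2009-07-01 00:00:00", tz = "GMT"),
  end   = as.POSIXct("2009-07-01 00:05:00", tz = "GMT")
)

p      <- period(ts1)                 # a difftime object
p_secs <- as.numeric(p, units = "secs")  # the same duration in seconds
```

Converting with an explicit units argument avoids surprises when the same code is applied to series whose periods are naturally expressed in hours or days.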

It is also possible to define your own methods for existing generic functions, such as summary. Let’s define a summary method for our new class:

> setMethod("summary",
+   signature="TimeSeries",
+   definition=function(object) {
+     print(paste(object@start,
+                 " to ",
+                 object@end,
+                 sep="", collapse=""))
+     print(paste(object@data, sep="", collapse=","))
+   }
+ )
Creating a new generic function for "summary" in ".GlobalEnv"
[1] "summary"
> summary(my.TimeSeries)
[1] "2009-07-01 to 2009-07-01 00:05:00"
[1] "1,2,3,4,5,6"

You can even define a new method for an existing operator:

> setMethod("[",
+   signature=c("TimeSeries"),
+   definition=function(x, i, j, ...,drop) {
+     x@data[i]
+   }
+ )
[1] "["
> my.TimeSeries[3]
[1] 3

(As a quick side note, this works only for some built-in functions. For example, you can’t define a new print method this way. See the help file for S4groupGeneric for a list of generic functions that you can redefine this way, and “Old-School OOP in R: S3” for an explanation of why this doesn’t always work.)
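To control how an S4 object is displayed, the usual route is to define a method for the show generic, which R calls when an S4 object is printed at the console. The sketch below is one possible layout; the output format and the names ts1 and printed are assumptions made for illustration.

```r
library(methods)

setClass("TimeSeries",
  representation(data = "numeric", start = "POSIXct", end = "POSIXct"))

# show() is the S4 generic used when an S4 object is auto-printed.
setMethod("show", "TimeSeries", function(object) {
  cat("TimeSeries with", length(object@data), "observations\n")
  cat("  from", format(object@start), "to", format(object@end), "\n")
})

ts1 <- new("TimeSeries",
  data  = c(1, 2, 3),
  start = as.POSIXct("2009-07-01 00:00:00", tz = "GMT"),
  end   = as.POSIXct("2009-07-01 00:02:00", tz = "GMT")
)

# Capture what show() writes so the result can be inspected programmatically.
printed <- capture.output(show(ts1))
```

After this definition, typing ts1 at the prompt displays the two-line description instead of the default slot-by-slot listing.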

Now let’s show how to implement a WeightHistory class based on the TimeSeries class. One way to do this is to create a WeightHistory class that inherits from the TimeSeries class but adds extra fields to represent a person’s name and height. We can do this with the setClass command by stating that the new class inherits from the TimeSeries class and specifying the extra slots in the WeightHistory class:

> setClass(
+   "WeightHistory",
+   representation(
+     height = "numeric",
+     name = "character"
+   ),
+   contains = "TimeSeries"
+ )

Now we can create a WeightHistory object, populating slots named in TimeSeries and the new slots for WeightHistory:

> john.doe <- new("WeightHistory",
+   data=c(170, 169, 171, 168, 170, 169),
+   start=as.POSIXct("02/14/2009 0:00:00", tz="GMT",
+     format="%m/%d/%Y %H:%M:%S"),
+   end=as.POSIXct("03/28/2009 0:00:00", tz="GMT",
+     format="%m/%d/%Y %H:%M:%S"),
+   height=72,
+   name="John Doe")
> john.doe
An object of class "WeightHistory"
Slot "height":
[1] 72

Slot "name":
[1] "John Doe"

Slot "data":
[1] 170 169 171 168 170 169

Slot "start":
[1] "2009-02-14 GMT"

Slot "end":
[1] "2009-03-28 GMT"

R will check that the TimeSeries portion of the new WeightHistory object is valid. (You can test this yourself.)
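One way to run that test is sketched below. It repeats the class and validity definitions so that it stands alone, then tries to create a WeightHistory object whose end time precedes its start time. The names ok, attempt, and rejected are illustrative.

```r
library(methods)

setClass("TimeSeries",
  representation(data = "numeric", start = "POSIXct", end = "POSIXct"))

setValidity("TimeSeries", function(object) {
  length(object@start) == 1 &&
  length(object@end) == 1 &&
  object@start <= object@end
})

setClass("WeightHistory",
  representation(height = "numeric", name = "character"),
  contains = "TimeSeries")

# A well-formed object is accepted and is also a TimeSeries.
ok <- new("WeightHistory",
  data   = c(170, 169, 171),
  start  = as.POSIXct("2009-02-14 00:00:00", tz = "GMT"),
  end    = as.POSIXct("2009-03-28 00:00:00", tz = "GMT"),
  height = 72,
  name   = "John Doe")

# Swapping start and end should trigger the inherited validity check.
attempt <- tryCatch(
  new("WeightHistory",
    data   = c(170, 169, 171),
    start  = as.POSIXct("2009-03-28 00:00:00", tz = "GMT"),
    end    = as.POSIXct("2009-02-14 00:00:00", tz = "GMT"),
    height = 72,
    name   = "John Doe"),
  error = function(e) e)

rejected <- inherits(attempt, "error")
```

The subclass defines no validity function of its own; the rejection comes from the check registered for TimeSeries.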

Let’s consider an alternative way to construct a weight history. Suppose that we had created a Person class containing a person’s name and height:

> setClass(
+   "Person",
+   representation(
+     height = "numeric",
+     name = "character"
+   )
+ )

Now we can create an alternative weight history that inherits from both a TimeSeries object and a Person object:

> setClass(
+   "AltWeightHistory",
+   contains = c("TimeSeries", "Person")
+ )

This alternative works identically to the original implementation but is slightly cleaner, and it inherits methods from both the TimeSeries and the Person classes.
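You can confirm what a class built this way contains by inspecting it. The following sketch repeats the three class definitions so that it runs on its own; alt_slots, is_person, and is_series are illustrative names.

```r
library(methods)

setClass("TimeSeries",
  representation(data = "numeric", start = "POSIXct", end = "POSIXct"))

setClass("Person",
  representation(height = "numeric", name = "character"))

# No slots of its own: everything comes from the two superclasses.
setClass("AltWeightHistory",
  contains = c("TimeSeries", "Person"))

alt_slots <- names(getSlots("AltWeightHistory"))      # slots gathered from both parents
is_person <- extends("AltWeightHistory", "Person")     # TRUE if Person is a superclass
is_series <- extends("AltWeightHistory", "TimeSeries") # TRUE if TimeSeries is a superclass
```

Because both parents are superclasses, any method written for TimeSeries or for Person can be applied to an AltWeightHistory object.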

Suppose that we also had created a class to represent cats:

> setClass(
+   "Cat",
+   representation(
+     breed = "character",
+     name = "character"
+   )
+ )

Notice that both Person and Cat objects contain a name attribute. Suppose that we wanted to create a method for both classes that checked if the name was “Fluffy.” An efficient way to do this in R is to create a virtual class that is a superclass of both the Person and the Cat classes and then write an is.fluffy method for the superclass. (You can write methods for a virtual class but can’t create objects from that class because the representation of those objects is ambiguous.)

> setClassUnion(
+   "NamedThing",
+   c("Person", "Cat")
+ )

We could then create an is.fluffy method for the NamedThing class that would apply to both Person and Cat objects. (Note that if we were to define a method of is.fluffy for the Person class, this would override the method from the parent class.) An added benefit is that we could now check to see if an object was a NamedThing:

> jane.doe <- new("AltWeightHistory",
+   data=c(130, 129, 131, 128, 130, 129),
+   start=as.POSIXct("02/14/2009 0:00:00", tz="GMT",
+      format="%m/%d/%Y %H:%M:%S"),
+   end=as.POSIXct("03/28/2009 0:00:00", tz="GMT",
+      format="%m/%d/%Y %H:%M:%S"),
+   height=67,
+   name="Jane Doe")
> is(jane.doe,"NamedThing")
[1] TRUE
> is(john.doe,"TimeSeries")
[1] TRUE
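The is.fluffy method described above is not shown in the text, so here is a minimal sketch of how it might look. It assumes an explicitly declared generic and compares the name slot with the string “Fluffy”; the objects fluffy and jane are made up for the example.

```r
library(methods)

setClass("Person",
  representation(height = "numeric", name = "character"))

setClass("Cat",
  representation(breed = "character", name = "character"))

# Virtual superclass of both Person and Cat.
setClassUnion("NamedThing", c("Person", "Cat"))

setGeneric("is.fluffy", function(object) standardGeneric("is.fluffy"))

# One method, written for the union, serves both member classes.
setMethod("is.fluffy", "NamedThing", function(object) {
  identical(object@name, "Fluffy")
})

# Illustrative objects only.
fluffy <- new("Cat", breed = "Persian", name = "Fluffy")
jane   <- new("Person", height = 67, name = "Jane Doe")

cat_result    <- is.fluffy(fluffy)  # dispatches through NamedThing
person_result <- is.fluffy(jane)    # same method, different class
```

Using identical rather than == means that an object with an empty name slot yields FALSE instead of a zero-length result.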


[28] You may have noticed that I picked an example of a class that is already implemented in R. Time series objects are implemented by the ts class in the stats package. (I introduced ts objects in Time Series.) The implementation in the stats package is an example of an S3 class. We’ll talk more about what that means, and how to use S3 and S4 classes together, next.

[29] In object-oriented programming terms, this is called overloading a function.