Object-oriented programming is not the same thing as programming with objects. R is a very object-centric language; everything in R is an object. However, there is more to OOP than just objects. Here’s a short description of what object-oriented programming means.
As an example of how object-oriented programming is used in R, we’ll consider time series.[28] A time series is a sequence of measurements of a quantity over time. Measurements are taken at equally spaced intervals. Time series have some properties associated with them: a start time, an end time, a number of measurements, a frequency, and so forth.
In OOP, we would create a “time series” class to capture information about time series. A class is a formal definition for an object. Each individual time series object is called an instance of the class. A function that operates on a specific class of objects is called a method.
As a user of time series, you probably don’t care too much about how time series are implemented. All you care about is that you know how to create a time series object and manipulate the object through methods. The time series could be stored as a data frame, a vector, or even a long text field. The process of separating the interface from the implementation is called encapsulation.
Suppose that we wanted to track the weight history of people over time. For this application, we’d like to keep all the same information as a time series, plus some additional information on individual people. It would be nice to be able to reuse the code for our time series class for objects in the weight history class. In OOP, it is possible to base one class on another and just specify what is different about the new class. This is called inheritance. We would say that the weight history class inherits from the time series class. We might also say that the time series class is a superclass of the weight history class and that the weight history class is a subclass of the time series class.
Suppose that you wanted to ask a question like “What is the period of the measurements in the class?” Ideally, it would be nice to have a single function name for finding this information, maybe called “period.” In OOP, allowing the same method name to be used for different objects is called polymorphism.
Finally, suppose that we implemented the weight history class by creating classes for each of its pieces: time series, personal attributes, and so on. The process of creating a new class from a set of other classes is called composition. In some languages (like R), a class can inherit methods from more than one other class. This is called multiple inheritance.
If you’re familiar with object-oriented programming in other languages (like Java), you’ll find that most of the familiar concepts are included in R. However, the syntax and structure in R are different. In particular, you define a class with a call to a function (setClass) and define a method with a call to another function (setMethod). Before we describe R’s implementation of object-oriented programming in depth, let’s look at a quick example.
Let’s implement a class representing a time series. We’ll want to define a new object that contains the following information:
A set of data values, sampled at periodic intervals over time
A start time
An end time
The period of the time series
Clearly, some of this information is redundant; given many of the attributes of a time series, we can calculate the remaining attributes. Let’s start by defining a new class called “TimeSeries.” We’ll represent a time series by a numeric vector containing the data, a start time, and an end time. We can calculate units, frequency, and period from the start time, end time, and the length of the data vector. As a user of the class, it shouldn’t matter how we represent this information, but it does matter to the implementer.
In R, the places where information is stored in an object are called slots. We’ll name the slots data, start, and end. To create a class, we’ll use the setClass function:
> setClass("TimeSeries",
+   representation(
+     data="numeric",
+     start="POSIXct",
+     end="POSIXct"
+   )
+ )
The representation explains the class of the object contained in each slot. To create a new TimeSeries object, we will use the new function. (The new function is a generic constructor method for S4 objects.) The first argument specifies the class name; other arguments specify values for slots:
> my.TimeSeries <- new("TimeSeries",
+   data=c(1, 2, 3, 4, 5, 6),
+   start=as.POSIXct("07/01/2009 0:00:00", tz="GMT",
+     format="%m/%d/%Y %H:%M:%S"),
+   end=as.POSIXct("07/01/2009 0:05:00", tz="GMT",
+     format="%m/%d/%Y %H:%M:%S")
+ )
There is a generic print method for new S4 classes in R that displays the slot names and the contents of each slot:
> my.TimeSeries
An object of class "TimeSeries"
Slot "data":
[1] 1 2 3 4 5 6
Slot "start":
[1] "2009-07-01 GMT"
Slot "end":
[1] "2009-07-01 00:05:00 GMT"
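Once an object exists, its slots can be inspected individually. The following is a minimal sketch that repeats the class definition above so that it runs on its own; the helper variable fmt is introduced only for this illustration. It uses slotNames to list the slots of a class, the @ operator and the slot function to read a slot, and is to test class membership.

```r
library(methods)

# Same class definition as in the text.
setClass("TimeSeries",
  representation(
    data  = "numeric",
    start = "POSIXct",
    end   = "POSIXct"
  )
)

# fmt is a helper introduced for this sketch only.
fmt <- "%m/%d/%Y %H:%M:%S"
my.TimeSeries <- new("TimeSeries",
  data  = c(1, 2, 3, 4, 5, 6),
  start = as.POSIXct("07/01/2009 0:00:00", tz = "GMT", format = fmt),
  end   = as.POSIXct("07/01/2009 0:05:00", tz = "GMT", format = fmt)
)

slotNames("TimeSeries")          # names of the slots defined for the class
is(my.TimeSeries, "TimeSeries")  # TRUE: the object is an instance of the class
my.TimeSeries@data               # read a slot with the @ operator
slot(my.TimeSeries, "start")     # read a slot by name with slot()
```

The @ operator is the form used in the method definitions later in this section.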
Not all possible slot values are valid. We want to make sure that end occurs after start and that the lengths of start and end are both exactly 1. We can write a function to check the validity of a TimeSeries object. R allows you to specify a function that will be used to validate a specific class. We can specify this with the setValidity function:
> setValidity("TimeSeries",
+   function(object) {
+     # Check the lengths first so that the comparison below
+     # always involves exactly one start and one end value
+     length(object@start) == 1 &&
+       length(object@end) == 1 &&
+       object@start <= object@end
+   }
+ )
Class "TimeSeries" [in ".GlobalEnv"]
Slots:
Name:     data   start     end
Class: numeric POSIXct POSIXct
You can now check that a TimeSeries object is valid with the validObject function:
> validObject(my.TimeSeries)
[1] TRUE
When we try to create a new TimeSeries object, R will check the validity of the new object and reject bad objects:
> good.TimeSeries <- new("TimeSeries",
+   data=c(7, 8, 9, 10, 11, 12),
+   start=as.POSIXct("07/01/2009 0:06:00", tz="GMT",
+     format="%m/%d/%Y %H:%M:%S"),
+   end=as.POSIXct("07/01/2009 0:11:00", tz="GMT",
+     format="%m/%d/%Y %H:%M:%S")
+ )
> bad.TimeSeries <- new("TimeSeries",
+   data=c(7, 8, 9, 10, 11, 12),
+   start=as.POSIXct("07/01/2009 0:06:00", tz="GMT",
+     format="%m/%d/%Y %H:%M:%S"),
+   end=as.POSIXct("07/01/1999 0:11:00", tz="GMT",
+     format="%m/%d/%Y %H:%M:%S")
+ )
Error in validObject(.Object) :
  invalid class "TimeSeries" object: FALSE
(You can also specify the validity method at the time you are creating a class; see the full definition of setClass for more information.)
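As a sketch of that option, here is how a validity function might be passed directly to setClass through its validity argument. The class name CheckedTimeSeries, the variable names, and the message strings are made up for this illustration. Returning a character vector of problems, rather than FALSE, makes the resulting error message more descriptive.

```r
library(methods)

# "CheckedTimeSeries" is a hypothetical name, chosen so that this sketch
# does not overwrite the TimeSeries class defined in the text.
setClass("CheckedTimeSeries",
  representation(
    data  = "numeric",
    start = "POSIXct",
    end   = "POSIXct"
  ),
  validity = function(object) {
    problems <- character(0)
    if (length(object@start) != 1) {
      problems <- c(problems, "start must have length 1")
    }
    if (length(object@end) != 1) {
      problems <- c(problems, "end must have length 1")
    }
    # Compare the times only when both have length 1.
    if (length(problems) == 0 && object@start > object@end) {
      problems <- c(problems, "start must not be later than end")
    }
    # Return TRUE when valid, otherwise the descriptions of the problems.
    if (length(problems) == 0) TRUE else problems
  }
)

fmt <- "%m/%d/%Y %H:%M:%S"
# end is ten years before start, so new() should signal an error;
# tryCatch captures the error message instead of stopping.
result <- tryCatch(
  new("CheckedTimeSeries",
    data  = c(7, 8, 9),
    start = as.POSIXct("07/01/2009 0:06:00", tz = "GMT", format = fmt),
    end   = as.POSIXct("07/01/1999 0:11:00", tz = "GMT", format = fmt)
  ),
  error = function(e) conditionMessage(e)
)
print(result)
```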
Now that we have defined the class, let’s create some methods that use the class. One property of a time series is its period. We can create a method for extracting the period from the time series. This method will calculate the duration between observations based on the length of the vector in the data slot, the start time, and the end time:
> period.TimeSeries <- function(object) {
+   if (length(object@data) > 1) {
+     (object@end - object@start) / (length(object@data) - 1)
+   } else {
+     Inf
+   }
+ }
Suppose that you wanted to create a set of functions to derive the data series from other objects (when appropriate), regardless of the type of object (i.e., polymorphism). R provides a mechanism called generic functions for doing this.[29] You can define a generic name for a set of functions (like “series”). When you call “series” on an object, R will find the correct method to execute based on the class of the object. Let’s create a function for extracting the data series from a generic object:
> series <- function(object) {object@data}
> setGeneric("series")
[1] "series"
> series(my.TimeSeries)
[1] 1 2 3 4 5 6
The call to setGeneric redefined series as a generic function whose default method is the old body for series:
> series
standardGeneric for "series" defined from package ".GlobalEnv"
function (object)
standardGeneric("series")
<environment: 0x19ac4f4>
Methods may be defined for arguments: object
Use showMethods("series") for currently available ones.
> showMethods("series")
Function: series (package .GlobalEnv)
object="ANY"
object="TimeSeries"
    (inherited from: object="ANY")
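To see dispatch choose between methods, we can add a second method to the series generic. The sketch below is self-contained, so it declares the generic explicitly and registers the default as a method for "ANY" instead of converting an existing function. The method for plain numeric vectors is a hypothetical addition, not something defined in the text.

```r
library(methods)

setClass("TimeSeries",
  representation(data = "numeric", start = "POSIXct", end = "POSIXct"))

# Declare the generic, then register the default behaviour from the text
# (read the data slot) as the method for any class.
setGeneric("series", function(object) standardGeneric("series"))
setMethod("series", signature = "ANY",
  definition = function(object) { object@data })

# Hypothetical extra method: a bare numeric vector is its own data series.
setMethod("series", signature = "numeric",
  definition = function(object) { object })

fmt <- "%m/%d/%Y %H:%M:%S"
ts1 <- new("TimeSeries",
  data  = c(1, 2, 3),
  start = as.POSIXct("07/01/2009 0:00:00", tz = "GMT", format = fmt),
  end   = as.POSIXct("07/01/2009 0:02:00", tz = "GMT", format = fmt)
)

series(ts1)         # no TimeSeries method, so the "ANY" method is used
series(c(10, 20))   # the "numeric" method is used
```

Without the numeric method, the second call would fall through to the default and fail, because a plain numeric vector has no data slot.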
As a further example, suppose we wanted to create a new generic function called “period” for extracting a period from an object and wanted to specify that the function period.TimeSeries should be used for TimeSeries objects, but the generic method should be used for other objects. We could do this with the following commands:
> period <- function(object) {object@period}
> setGeneric("period")
[1] "period"
> setMethod(period, signature=c("TimeSeries"), definition=period.TimeSeries)
[1] "period"
attr(,"package")
[1] ".GlobalEnv"
> showMethods("period")
Function: period (package .GlobalEnv)
object="ANY"
object="TimeSeries"
Now we can calculate the period of a TimeSeries object by just calling the generic function period:
> period(my.TimeSeries)
Time difference of 1 mins
It is also possible to define your own methods for existing generic functions, such as summary. Let’s define a summary method for our new class:
> setMethod("summary",
+   signature="TimeSeries",
+   definition=function(object) {
+     print(paste(object@start,
+                 " to ",
+                 object@end,
+                 sep="", collapse=""))
+     print(paste(object@data, sep="", collapse=","))
+   }
+ )
Creating a new generic function for "summary" in ".GlobalEnv"
[1] "summary"
> summary(my.TimeSeries)
[1] "2009-07-01 to 2009-07-01 00:05:00"
[1] "1,2,3,4,5,6"
You can even define a new method for an existing operator:
> setMethod("[",
+   signature=c("TimeSeries"),
+   definition=function(x, i, j, ...,drop) {
+     x@data[i]
+   }
+ )
[1] "["
> my.TimeSeries[3]
[1] 3
(As a quick side note, this works for only some built-in functions. For example, you can’t define a new print method this way. See the help file for S4groupGeneric for a list of generic functions that you can redefine this way, and “Old-School OOP in R: S3” for an explanation of why this doesn’t always work.)
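For controlling how an S4 object is displayed, the generic to target is show, which R calls when an S4 object is printed automatically at the console. The following self-contained sketch defines a show method for TimeSeries; the layout of the output is an arbitrary choice made for this illustration.

```r
library(methods)

setClass("TimeSeries",
  representation(data = "numeric", start = "POSIXct", end = "POSIXct"))

# show() is the S4 generic used when an object is auto-printed.
setMethod("show", signature = "TimeSeries",
  definition = function(object) {
    cat("TimeSeries with", length(object@data), "observations\n")
    cat("  from", format(object@start), "to", format(object@end), "\n")
  }
)

fmt <- "%m/%d/%Y %H:%M:%S"
my.TimeSeries <- new("TimeSeries",
  data  = c(1, 2, 3, 4, 5, 6),
  start = as.POSIXct("07/01/2009 0:00:00", tz = "GMT", format = fmt),
  end   = as.POSIXct("07/01/2009 0:05:00", tz = "GMT", format = fmt)
)

# Typing the object's name at the console now uses the method above
# instead of the default slot-by-slot display.
my.TimeSeries
```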
Now let’s show how to implement a WeightHistory class based on the TimeSeries class. One way to do this is to create a WeightHistory class that inherits from the TimeSeries class but adds extra fields to represent a person’s name and height. We can do this with the setClass command by stating that the new class inherits from the TimeSeries class and specifying the extra slots in the WeightHistory class:
> setClass(
+   "WeightHistory",
+   representation(
+     height = "numeric",
+     name = "character"
+   ),
+   contains = "TimeSeries"
+ )
Now we can create a WeightHistory object, populating slots named in TimeSeries and the new slots for WeightHistory:
> john.doe <- new("WeightHistory",
+   data=c(170, 169, 171, 168, 170, 169),
+   start=as.POSIXct("02/14/2009 0:00:00", tz="GMT",
+     format="%m/%d/%Y %H:%M:%S"),
+   end=as.POSIXct("03/28/2009 0:00:00", tz="GMT",
+     format="%m/%d/%Y %H:%M:%S"),
+   height=72,
+   name="John Doe")
> john.doe
An object of class "WeightHistory"
Slot "height":
[1] 72
Slot "name":
[1] "John Doe"
Slot "data":
[1] 170 169 171 168 170 169
Slot "start":
[1] "2009-02-14 GMT"
Slot "end":
[1] "2009-03-28 GMT"
R will validate that the new TimeSeries object contained within WeightHistory is valid. (You can test this yourself.)
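One way to run that test is sketched below. The block redefines both classes so that it is self-contained, then tries to create a WeightHistory whose end time precedes its start time. The variable names and the "accepted" and "rejected" labels are introduced only for this illustration.

```r
library(methods)

setClass("TimeSeries",
  representation(data = "numeric", start = "POSIXct", end = "POSIXct"))
setValidity("TimeSeries", function(object) {
  length(object@start) == 1 &&
    length(object@end) == 1 &&
    object@start <= object@end
})
setClass("WeightHistory",
  representation(height = "numeric", name = "character"),
  contains = "TimeSeries")

fmt <- "%m/%d/%Y %H:%M:%S"
feb <- as.POSIXct("02/14/2009 0:00:00", tz = "GMT", format = fmt)
mar <- as.POSIXct("03/28/2009 0:00:00", tz = "GMT", format = fmt)

# start is later than end, so the validity function inherited from
# TimeSeries should cause new() to signal an error.
outcome <- tryCatch({
  new("WeightHistory", data = c(170, 169),
      start = mar, end = feb, height = 72, name = "John Doe")
  "accepted"
}, error = function(e) "rejected")
outcome

# With the times in the right order, the object is created normally.
valid.history <- new("WeightHistory", data = c(170, 169),
  start = feb, end = mar, height = 72, name = "John Doe")
is(valid.history, "TimeSeries")
```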
Let’s consider an alternative way to construct a weight history. Suppose that we had created a Person class containing a person’s name and height:
> setClass(
+   "Person",
+   representation(
+     height = "numeric",
+     name = "character"
+   )
+ )
Now we can create an alternative weight history that inherits from both a TimeSeries object and a Person object:
> setClass(
+   "AltWeightHistory",
+   contains = c("TimeSeries", "Person")
+ )
This alternative implementation works identically to the original implementation, but the new implementation is slightly cleaner. This implementation inherits methods from both the TimeSeries and the Person classes.
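A small sketch can make the inheritance from both parents visible. It defines one method on each parent class and calls both on an AltWeightHistory object. The generic person.name is a hypothetical name introduced for this illustration, and series is declared explicitly here so that the block runs on its own.

```r
library(methods)

setClass("TimeSeries",
  representation(data = "numeric", start = "POSIXct", end = "POSIXct"))
setClass("Person",
  representation(height = "numeric", name = "character"))
setClass("AltWeightHistory", contains = c("TimeSeries", "Person"))

# A method defined for the TimeSeries parent.
setGeneric("series", function(object) standardGeneric("series"))
setMethod("series", signature = "TimeSeries",
  definition = function(object) { object@data })

# person.name is a hypothetical generic with a method for the Person parent.
setGeneric("person.name", function(object) standardGeneric("person.name"))
setMethod("person.name", signature = "Person",
  definition = function(object) { object@name })

fmt <- "%m/%d/%Y %H:%M:%S"
jane.doe <- new("AltWeightHistory",
  data   = c(130, 129, 131),
  start  = as.POSIXct("02/14/2009 0:00:00", tz = "GMT", format = fmt),
  end    = as.POSIXct("03/28/2009 0:00:00", tz = "GMT", format = fmt),
  height = 67,
  name   = "Jane Doe"
)

series(jane.doe)        # method inherited from TimeSeries
person.name(jane.doe)   # method inherited from Person
```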
Suppose that we also had created a class to represent cats:
> setClass(
+   "Cat",
+   representation(
+     breed = "character",
+     name = "character"
+   )
+ )
Notice that both Person and Cat objects contain a name attribute. Suppose that we wanted to create a method for both classes that checked if the name was “Fluffy.” An efficient way to do this in R is to create a virtual class that is a superclass of both the Person and the Cat classes and then write an is.fluffy method for the superclass. (You can write methods for a virtual class but can’t create objects from that class because the representation of those objects is ambiguous.)
> setClassUnion(
+   "NamedThing",
+   c("Person", "Cat")
+ )
We could then create an is.fluffy method for the NamedThing class that would apply to both Person and Cat objects. (Note that if we were to define a method of is.fluffy for the Person class, this would override the method from the parent class.) An added benefit is that we could now check to see if an object was a NamedThing:
> jane.doe <- new("AltWeightHistory",
+   data=c(130, 129, 131, 128, 130, 129),
+   start=as.POSIXct("02/14/2009 0:00:00", tz="GMT",
+     format="%m/%d/%Y %H:%M:%S"),
+   end=as.POSIXct("03/28/2009 0:00:00", tz="GMT",
+     format="%m/%d/%Y %H:%M:%S"),
+   height=67,
+   name="Jane Doe")
> is(jane.doe,"NamedThing")
[1] TRUE
> is(john.doe,"TimeSeries")
[1] TRUE
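The is.fluffy method itself is described above but not shown. One possible implementation is sketched below as a self-contained block: is.fluffy is the name suggested in the text, while the method body and the example objects fluffy and jane are assumptions made for this illustration.

```r
library(methods)

setClass("Person",
  representation(height = "numeric", name = "character"))
setClass("Cat",
  representation(breed = "character", name = "character"))
setClassUnion("NamedThing", c("Person", "Cat"))

# Declare the generic, then define a single method on the virtual
# superclass; it applies to every member of the class union.
setGeneric("is.fluffy", function(object) standardGeneric("is.fluffy"))
setMethod("is.fluffy", signature = "NamedThing",
  definition = function(object) { identical(object@name, "Fluffy") })

# Example objects, made up for this sketch.
fluffy <- new("Cat", breed = "Persian", name = "Fluffy")
jane   <- new("Person", height = 67, name = "Jane Doe")

is.fluffy(fluffy)              # TRUE
is.fluffy(jane)                # FALSE
isVirtualClass("NamedThing")   # TRUE: no objects can be created from it
```

Defining the method once on NamedThing avoids writing the same body separately for Person and for Cat.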
[28] You may have noticed that I picked an example of a class that is already implemented in R. Time series objects are implemented by the ts class in the stats package. (I introduced ts objects in “Time Series.”) The implementation in the stats package is an example of an S3 class. We’ll talk more about what that means, and how to use S3 and S4 classes together, next.
[29] In object-oriented programming terms, this is called overloading a function.