Adding a dataset to a package

Packages are also used to deliver research and research results in a reproducible form, so it is often necessary to include a dataset in an R package. Many packages also use bundled data in their demo code, so that users can run the package's functions immediately without first importing their own data.

R provides several options for including data. Which one to choose depends on the kind of data we want to attach to the package and what it will be used for.

The most common way is to put the data in the package's data subdirectory. This is the usual choice when the dataset is used in example code. Another way is to store it in the R/sysdata.rda file. We can use this option if we do not want the package's users to have full access to these datasets.
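The two locations can be sketched as follows. This is a minimal illustration using base R only; the package name demoPkg, the temporary directory, and the objects demo_sales and unit_lookup are hypothetical and chosen just for this example.

```r
# Hypothetical package root, created in a temporary directory for illustration
pkg_root <- file.path(tempdir(), "demoPkg")
dir.create(file.path(pkg_root, "data"), recursive = TRUE, showWarnings = FALSE)
dir.create(file.path(pkg_root, "R"), recursive = TRUE, showWarnings = FALSE)

# Dataset meant for users and example code: goes into data/
demo_sales <- data.frame(month = month.abb[1:3], units = c(10, 12, 9))
save(demo_sales, file = file.path(pkg_root, "data", "demo_sales.rda"))

# Internal helper data that should not be exported: goes into R/sysdata.rda
unit_lookup <- c(kg = 1, g = 0.001)
save(unit_lookup, file = file.path(pkg_root, "R", "sysdata.rda"))

# Show the resulting layout
list.files(pkg_root, recursive = TRUE)
```

Objects stored in R/sysdata.rda are available to the package's own functions but are not exported to users the way datasets in data/ are.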

The data files we can include in the package can be in three formats:

R code
Tables (.txt or .csv* files)
save() images (.RData or .rda files)

Tip

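Please note that the .csv files in this context are not standard CSV files as described at http://tools.ietf.org/html/rfc4180. They have to be in a special format to be included this way: R reads them as semicolon-separated tables with a header line. We can find more information about this format on the help page of the data() function (?data).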
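As a rough sketch of what such a file looks like: according to the help page of data(), .csv files in the data directory are read with read.table(..., header = TRUE, sep = ";"). The following example writes a small table in that layout and reads it back the same way. The file name demo_table.csv and the data frame are hypothetical.

```r
# Hypothetical example table
demo_table <- data.frame(id = 1:3, score = c(2.5, 3.1, 4.8))

# Write a semicolon-separated file with a header line
csv_path <- file.path(tempdir(), "demo_table.csv")
write.table(demo_table, file = csv_path, sep = ";",
            row.names = FALSE, quote = FALSE)

# Inspect the raw lines, then read the file back with matching settings
readLines(csv_path)
restored <- read.table(csv_path, header = TRUE, sep = ";")
restored
```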

Creating .rda files

To create an .rda file, we either create the data in R or load it into R, and then call the save() function, which writes the data to an .rda file.

The following code shows how to create such a data file:

# Create a small example data frame and save it as an .rda file
df <- data.frame(matrix(rnorm(10), nrow = 5))
save(df, file = "dataFile.Rda")

This code creates the file dataFile.Rda in the current working directory, which is normally the root directory of our project.

These .Rda files can be loaded into the working environment simply by clicking on them in RStudio. This opens a dialog in which we confirm that the RData file should be loaded.

After loading the file, its objects appear in the Environment panel, and we can work with them as usual.
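The same result can be reached from code with the load() function, which restores the saved objects under their original names. The sketch below writes the file to a temporary directory instead of the project directory so that it is self-contained; that path is an assumption made for this example.

```r
# Save a small data frame to a temporary .Rda file
rda_path <- file.path(tempdir(), "dataFile.Rda")
df <- data.frame(matrix(rnorm(10), nrow = 5))
save(df, file = rda_path)

# Remove the object, then restore it from the file
rm(df)
loaded_names <- load(rda_path)

# load() returns the names of the restored objects
loaded_names
str(df)
```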

Because save() compresses its output by default, saving to .rda is also the best option when we want to ship very large datasets with our package.
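To get a feeling for the effect of compression, we can save the same object with and without it and compare the file sizes. This is only a sketch: the data frame big is a hypothetical, highly repetitive dataset, and the gain on real data depends on its content. The compress argument of save() accepts, among others, FALSE and "xz".

```r
# Hypothetical, highly repetitive dataset
big <- data.frame(
  value = rep(c(1.5, 2.5), length.out = 1e5),
  group = rep(c("a", "b"), length.out = 1e5)
)

plain_path <- file.path(tempdir(), "big_plain.rda")
xz_path <- file.path(tempdir(), "big_xz.rda")

# Save once without compression and once with xz compression
save(big, file = plain_path, compress = FALSE)
save(big, file = xz_path, compress = "xz")

# Compare the sizes in bytes
file.size(plain_path)
file.size(xz_path)
```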

Using LazyData with a package

R has to load a dataset into memory before it can be used, so it is important, especially with larger datasets, to use LazyData in our package. We can activate it by adding the following line to the DESCRIPTION file:

LazyData: true

With this setting, our datasets are not loaded into memory until we actually use them. This often saves a lot of memory, so we should use LazyData in all packages that include data files.
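To check whether the field is set, we can read the DESCRIPTION file with read.dcf(). The following sketch first writes a minimal DESCRIPTION file to a temporary directory; the package name demoPkg and the version number are hypothetical, and a real DESCRIPTION file contains more fields.

```r
# Write a minimal, hypothetical DESCRIPTION file for illustration
desc_path <- file.path(tempdir(), "DESCRIPTION")
desc <- cbind(Package = "demoPkg", Version = "0.1.0", LazyData = "true")
write.dcf(desc, file = desc_path)

# Read only the LazyData field back
lazy <- read.dcf(desc_path, fields = "LazyData")
lazy[1, "LazyData"]
```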