Working with RStudio and projects

Before RStudio existed, managing larger R projects from the plain R console was hard, because you had to create and maintain all the folder structures yourself.

When you create or open a project, RStudio immediately takes several actions: it starts a new, clean R session, sources the .Rprofile file in the project's main directory (if there is one), and sets the current working directory to the project directory. Each project therefore gets its own complete working environment. RStudio even restores its own settings, such as active tabs and splitter positions, to where they were when the project was last closed.
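A project-level .Rprofile is a convenient place for settings that should apply only to one project. The following is a minimal sketch of such a file; the option chosen and the message are purely illustrative assumptions, not something this project requires.

```r
# Hypothetical contents of <project>/.Rprofile.
# R sources this file at startup because RStudio has already made the
# project directory the working directory.
options(digits = 4)  # print numbers with about 4 significant digits in this project
message("Project .Rprofile loaded from ", getwd())
```

Keep in mind that R reads only one user profile: if a .Rprofile exists in the working directory, the one in your home directory is not read unless you source it yourself, for example with `if (file.exists("~/.Rprofile")) source("~/.Rprofile")` at the top of the project file.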

However, the fact that projects are easy to create does not mean that you need a new project every time you write R code. If you just want to do a small analysis, we recommend creating a single project in which you collect all your smaller scripts.

RStudio offers you an easy way to create projects. Just navigate to File | New Project and you will see a popup window with the following options:

Creating a project with RStudio

These options let you decide where your project comes from: you can start from scratch in a new directory, associate the project with an existing directory, or create it from a version control repository. For now, we will focus on creating a new directory.

After you choose New Directory, RStudio asks which type of project you want to create: an Empty Project, an R Package, or a Shiny Web Application.

We will look at the R Package and Shiny Web Application categories later in this book; for now, we will concentrate on the Empty Project option.

An important question to ask yourself when creating a new project is where to save it. There are several options, and several details to pay attention to, especially when different people collaborate on the same project.

You can save your project locally, in cloud storage, or with the help of a version control system such as Git.

An easy way to store your project and access it from anywhere is to use a cloud storage provider such as Dropbox. Its free account offers 2 GB of storage, which should be enough for your first project.

To begin your first project, choose the New Directory option described above and create an empty project. Then, choose a name for the directory and the location where you want to save it. We suggest creating a projects folder in your Dropbox for this purpose.
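Inside the new project directory, it helps to separate data, scripts, and outputs. The following is a minimal sketch that creates such subfolders from the R console; the folder names are assumptions, except for plots, which this chapter uses later for exported images.

```r
# Run this from the project directory (RStudio sets it as the working directory).
# Folder names are illustrative assumptions; only "plots" is used later in the chapter.
project_folders <- c("data", "scripts", "plots", "reports")
for (folder in project_folders) {
  # showWarnings = FALSE keeps dir.create() quiet if the folder already exists
  dir.create(folder, showWarnings = FALSE)
}
list.dirs(".", recursive = FALSE)
```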

The first project will be a small data analysis based on a dataset that was extracted from the 1974 Motor Trend US magazine. It comprises fuel consumption and ten aspects of automobile design and performance, such as the weight or the number of cylinders, for 32 automobiles. The dataset, mtcars, is part of R's datasets package, which ships with every R installation and is attached automatically when you start R, so we do not have to install a separate package to work with it.
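A quick way to confirm what the dataset contains is to inspect its structure. The sketch below also stores a copy in the project's data folder; the folder and the file name cars_data.csv are assumptions chosen to match the cars_data object used later in this chapter.

```r
# mtcars ships with R in the datasets package: no installation needed
str(mtcars)
dim(mtcars)  # 32 cars (rows), 11 variables (columns)

# Hypothetical step: keep a copy of the data inside the project as a CSV file.
# write.csv() stores the car names (row names) as the first column.
dir.create("data", showWarnings = FALSE)
write.csv(mtcars, file = file.path("data", "cars_data.csv"))
```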

Creating your first project

As you can see, we left the Use packrat with this project option unchecked. Packrat is a dependency management tool that makes your R code more isolated, portable, and reproducible by giving your project its own privately managed package library. This is especially important when you create projects in an organizational context, where the code has to run on various computer systems and be usable by many different people. This first project will just run locally and does not depend on a specific combination of package versions.
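To see what "its own package library" means in practice, you can ask R which libraries it searches for packages. The sketch below only inspects the current session; it does not use packrat itself.

```r
# Package libraries searched by this R session, in order of priority
lib_paths <- .libPaths()
print(lib_paths)

# Where an installed package (here the base package stats) is loaded from
stats_path <- find.package("stats")
print(stats_path)
```

In a packrat-enabled project, we would expect the first entry of `.libPaths()` to point to a library inside the project directory rather than to your user or system library.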

If you want to create a report from your R script, you have to specify the relative path to the data file starting with two dots (..), which refer to the parent directory.
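A minimal sketch of such a relative path, assuming (hypothetically) that the data lives in data/cars_data.csv and the analysis script in scripts/. When a report is compiled from the script, its folder becomes the working directory, so the path has to climb one level up first. A temporary directory stands in for the project folder here so that the sketch runs anywhere.

```r
# Hypothetical layout: <project>/data/cars_data.csv and <project>/scripts/
project_dir <- file.path(tempdir(), "first_project")
dir.create(file.path(project_dir, "data"), recursive = TRUE, showWarnings = FALSE)
dir.create(file.path(project_dir, "scripts"), showWarnings = FALSE)
write.csv(mtcars, file.path(project_dir, "data", "cars_data.csv"))

# Simulate compiling a report from a script in scripts/:
# the working directory is then the scripts folder.
old_wd <- setwd(file.path(project_dir, "scripts"))
# ".." is the parent folder, i.e. the project directory;
# row.names = 1 restores the car names written by write.csv()
cars_data <- read.csv("../data/cars_data.csv", row.names = 1)
setwd(old_wd)

head(cars_data)
```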

Next, we can look at the different variables and check whether any correlations stand out at first glance. A pairs matrix, which plots every variable against every other variable, gives a quick overview.
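A minimal sketch of this step with base R's pairs() function. Here cars_data is simply a copy of mtcars, standing in for the data read from the CSV file; the correlation matrix is an added numeric companion to the plot, not part of the original listing.

```r
# Stand-in for the data read from the CSV file
cars_data <- mtcars

# Scatterplot matrix: one panel for every pair of variables
pairs(cars_data)

# Pairwise correlations, rounded for readability
cor_matrix <- round(cor(cars_data), 2)
cor_matrix[c("mpg", "wt", "cyl", "hp"), c("mpg", "wt", "cyl", "hp")]
```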

We can then save the matrix with the Export menu of the Plots pane, choosing Save as Image and storing the file in the plots folder.
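If you prefer to save the image from code instead of the Export menu, base R's png() graphics device does the same job. A minimal sketch; the file name pairs_matrix.png and the image size are assumptions.

```r
cars_data <- mtcars  # stand-in for the data read from the CSV file

# Open a PNG file as the graphics device, draw the plot, then close the device
dir.create("plots", showWarnings = FALSE)
png(filename = file.path("plots", "pairs_matrix.png"), width = 1000, height = 1000)
pairs(cars_data)
invisible(dev.off())  # writes the file to disk
```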

Correcting the path for report exporting

As you can see, many variable combinations could be strongly correlated. The most obvious pair is the weight of the car (wt) and miles per gallon (mpg): a heavier car seems to need more fuel per mile than a lighter one.

We can now test this hypothesis by calculating the correlation and plotting a scatterplot of these two variables. In addition, we can fit a linear regression model and see how well it performs.
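A minimal sketch of these three steps with base R. The name model1 is a hypothetical choice, and cars_data again stands in for the data read from the CSV file.

```r
cars_data <- mtcars  # stand-in for the data read from the CSV file

# Strength and direction of the linear relationship between weight and mileage
wt_mpg_cor <- cor(cars_data$wt, cars_data$mpg)
print(wt_mpg_cor)

# Scatterplot: weight (in 1000 lbs) on the x axis, mileage on the y axis
plot(mpg ~ wt, data = cars_data,
     xlab = "Weight (1000 lbs)", ylab = "Miles per gallon")

# Simple linear regression of mpg on wt, drawn as a line on the scatterplot
model1 <- lm(mpg ~ wt, data = cars_data)
abline(model1, col = "red")
```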

We can see more details in the model summary.
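In R, the details of a fitted linear model come from summary(). A minimal sketch, refitting the hypothetical model1 so that the block runs on its own:

```r
cars_data <- mtcars  # stand-in for the data read from the CSV file
model1 <- lm(mpg ~ wt, data = cars_data)

# Coefficients, standard errors, p-values, and goodness-of-fit statistics
model1_summary <- summary(model1)
print(model1_summary)

# Share of the variance in mpg explained by the model
model1_summary$r.squared
```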

We can see that the correlation between these two variables is strongly negative (about -0.87), and the first model fits the data fairly well, with an R-squared value of 0.75: weight alone explains about three quarters of the variance in mpg.

However, other combinations of variables might explain mileage even better, so we also have to test them. Essentially, we compare models built from different combinations of variables and keep the best one.

We will not explain the statistical methods behind this approach, as that is beyond the scope of this chapter.
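The selection code itself is not shown here. One common way to compare such models is stepwise selection by AIC with step() from the stats package; the following sketch assumes that method, so the original analysis may have used a different one.

```r
cars_data <- mtcars  # stand-in for the data read from the CSV file

# Start from the model with all ten predictors ...
full_model <- lm(mpg ~ ., data = cars_data)

# ... and let step() drop (or re-add) variables as long as the AIC improves.
# trace = 0 suppresses the step-by-step printout.
best_model <- step(full_model, direction = "both", trace = 0)
formula(best_model)
```

On mtcars, this search should end with the predictors wt, qsec, and am.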

The best model has the formula mpg ~ wt + qsec + am.

We therefore fit a final model with this formula and see how it performs.
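A minimal sketch of the final model; the name final_model is a hypothetical choice.

```r
cars_data <- mtcars  # stand-in for the data read from the CSV file

final_model <- lm(mpg ~ wt + qsec + am, data = cars_data)
summary(final_model)

# am is coded 0 = automatic, 1 = manual, so its coefficient is the estimated
# mpg difference of manual cars, holding wt and qsec constant
coef(final_model)["am"]
```

Compared with the weight-only model, the R-squared value rises from about 0.75 to about 0.85.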

As we can see, besides the weight (wt), the final model includes qsec, the time the car needs to cover a quarter mile, and am, the type of transmission (0 = automatic, 1 = manual).

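We can also see that the transmission type plays a significant role when it comes to mileage: with weight and quarter-mile time held constant, the model estimates that cars with a manual transmission travel about 2.9 more miles per gallon than automatic ones.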

After you run the analysis script, all your results remain available in RStudio, which is a big advantage over working in the plain R console.

For example, you can browse through all the graphs you produced with the arrow buttons of the Plots pane.

You can also check which variables are defined in the Environment pane: all the models you fitted in this analysis, as well as your initial dataset.
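The same information is available from the console. A minimal sketch that lists the objects in the current environment together with their classes; the objects are recreated here, under hypothetical names, so that the block runs on its own.

```r
cars_data <- mtcars
model1 <- lm(mpg ~ wt, data = cars_data)
final_model <- lm(mpg ~ wt + qsec + am, data = cars_data)

# Console counterpart of the Environment pane: object names and their classes
env_objects <- ls()
object_classes <- sapply(env_objects, function(name) class(get(name))[1])
print(object_classes)
```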

You can click the table icon next to cars_data in the Environment pane to open the data frame in the Source pane.
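Clicking the table icon is equivalent to calling View() on the data frame. The sketch below guards the call with interactive(), because a data viewer is only available in an interactive session; otherwise it prints the first rows instead.

```r
cars_data <- mtcars  # stand-in for the data read from the CSV file

if (interactive()) {
  # In RStudio, this opens the data viewer in the Source pane
  View(cars_data)
} else {
  # Non-interactive fallback: print the first six rows
  print(head(cars_data))
}
```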
