Summarizing multivariate data in a single heat map

In the preceding couple of recipes, we looked at representing a matrix of data along two axes on a heat map. In this recipe, we will learn how to summarize multivariate data using a heat map.

The plotting itself uses only the base graphics functions, so just open up the R prompt and type in the following code. We will use the nba.csv example dataset for this recipe, so let's first load it:
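A minimal sketch of the loading step, assuming nba.csv sits in the current working directory. The small stand-in data frame used when the file is absent is purely illustrative: its values are made up, and the column names Name, G, MIN, and PTS are assumptions about the file's layout.

```r
# Read the example dataset; assumes nba.csv is in the working directory.
if (file.exists("nba.csv")) {
  nba <- read.csv("nba.csv")
} else {
  # Illustrative stand-in with made-up values so the sketch runs anywhere.
  # The column names are assumptions about the layout of the real file.
  nba <- data.frame(
    Name = c("Player A", "Player B", "Player C"),
    G    = c(79, 81, 82),
    MIN  = c(38.6, 37.7, 36.2),
    PTS  = c(30.2, 28.4, 26.8)
  )
}

# Inspect the structure before plotting
str(nba)
```

Checking the structure with str() first confirms that the first column holds the player names and the remaining columns are numeric, which the later steps rely on.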

This example dataset, which shows some statistics on the top scorers in NBA basketball games, has been taken from a blog post on FlowingData (see http://flowingdata.com/2010/01/21/how-to-make-a-heatmap-a-quick-and-easy-solution/ for details). The original data is from the databaseBasketball.com website (http://databasebasketball.com/). We will use our own code to create a similar heat map showing player statistics.

We will use the RColorBrewer library for a nice color palette, so let's load it:
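A minimal sketch of the loading step. RColorBrewer is an add-on package rather than part of base R, so the guard below (an addition for this sketch, as is the variable name brewer_available) checks that it is installed before attaching it.

```r
# RColorBrewer is not part of base R; check that it is installed first
brewer_available <- requireNamespace("RColorBrewer", quietly = TRUE)

if (brewer_available) {
  library(RColorBrewer)
} else {
  message("RColorBrewer is not installed; run install.packages(\"RColorBrewer\")")
}
```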

We are going to summarize a number of NBA player statistics in the same heat map using the image() function:
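The following is a sketch of how such a heat map could be drawn, not a definitive listing. It runs on a small stand-in data frame with made-up values; the column names, the shortened statnames vector, the margin sizes, and the label offset are all assumptions chosen for this illustration and would need adjusting for the full nba.csv dataset.

```r
# Stand-in data (made-up values); replace with nba <- read.csv("nba.csv")
nba <- data.frame(
  Name = c("Player A", "Player B", "Player C", "Player D"),
  G    = c(79, 81, 82, 75),
  MIN  = c(38.6, 37.7, 36.2, 39.0),
  PTS  = c(30.2, 28.4, 26.8, 25.9)
)
statnames <- c("Games Played", "Minutes Played", "Points")

# Players become row names; scale() standardizes each statistic (column)
rownames(nba) <- nba$Name
data_matrix <- t(scale(data.matrix(nba[, -1])))  # rows: stats, cols: players

# Blue palette: RColorBrewer if installed, otherwise a base R ramp
pal <- if (requireNamespace("RColorBrewer", quietly = TRUE)) {
  RColorBrewer::brewer.pal(6, "Blues")
} else {
  colorRampPalette(c("#EFF3FF", "#08519C"))(6)
}

# Wide left and top margins leave room for the labels
par(mar = c(3, 10, 10, 2))

# Heat map without the default axes
image(x = 1:nrow(data_matrix), y = 1:ncol(data_matrix), z = data_matrix,
      xlab = "", ylab = "", col = pal, axes = FALSE)

# X axis labels, rotated and drawn just above the plot region
text(1:nrow(data_matrix), par("usr")[4] + 0.2, labels = statnames,
     srt = 45, adj = 0, xpd = TRUE, cex = 0.85)

# Y axis labels: one player name per row of cells
axis(side = 2, at = 1:ncol(data_matrix), labels = colnames(data_matrix),
     col = "white", las = 1, cex.axis = 0.85)
```

The matrix is transposed so that statistics run along the x axis and players along the y axis. Note that image() draws the first player at the bottom of the plot, because the y axis increases upward.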

How it works...

Once again, as in the preceding couple of recipes, we first formatted the dataset with the appropriate row names (in this case, names of players) and cast it as a matrix. We did one additional thing: we scaled the values in the matrix using the scale() function. By default, scale() centers each column by subtracting its mean and then divides it by its standard deviation, so statistics measured in very different units can be shown as relative values on the same color scale.
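To see what scale() does to each column, here is a small sketch on made-up numbers (the matrix m and its column names are illustrative only). After scaling, every column has mean 0 and standard deviation 1, regardless of its original units.

```r
# Made-up values for three statistics on very different scales
m <- cbind(G   = c(79, 81, 82, 75),
           MIN = c(38.6, 37.7, 36.2, 39.0),
           PTS = c(30.2, 28.4, 26.8, 25.9))

# (value - column mean) / column standard deviation, column by column
scaled <- scale(m)

# Every column now has mean 0 (up to rounding) and standard deviation 1
print(round(colMeans(scaled), 10))
print(apply(scaled, 2, sd))

# Equivalent manual computation for one column
manual_pts <- (m[, "PTS"] - mean(m[, "PTS"])) / sd(m[, "PTS"])
print(all.equal(unname(scaled[, "PTS"]), manual_pts))
```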

We chose a blue color palette from the RColorBrewer library. We also created a vector with the descriptive names of the player statistics to use as labels for the x axis.
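A sketch of these two preparatory steps. The choice of six colors, the fallback ramp for when RColorBrewer is not installed, and the named lookup vector stat_labels with its three abbreviations are assumptions made for this illustration; the real dataset has more columns.

```r
# Six-step blue palette: RColorBrewer if installed, else a base R ramp
pal <- if (requireNamespace("RColorBrewer", quietly = TRUE)) {
  RColorBrewer::brewer.pal(6, "Blues")
} else {
  colorRampPalette(c("#EFF3FF", "#08519C"))(6)
}

# Hypothetical lookup from column abbreviations to descriptive labels
stat_labels <- c(G = "Games Played", MIN = "Minutes Played", PTS = "Points")

# Columns in the order they appear in the (assumed) data
stat_columns <- c("G", "MIN", "PTS")
statnames <- unname(stat_labels[stat_columns])
print(statnames)
```

Building the labels from a named lookup, rather than typing them in positional order, keeps each label tied to its column if the columns are later reordered or subset.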

The code for the heat map itself and the axis labels is very similar to the previous recipe. We used the image() function with data_matrix as the z argument and suppressed the default axes. Then, we used text() to add the x axis labels and axis() to add the y axis labels. We also used text() rather than title() for the graph title, so that it is left-aligned with the y axis labels instead of with the heat map.

As shown in the FlowingData blog post, we can order the data in the matrix by the values in any one column. By default, the data is in ascending order of total points scored by each player (as can be seen from the light-to-dark blue progression in the Total Points column). To order the players by their scores from highest to lowest, we need to run the following code after reading the CSV file:
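A minimal sketch of the reordering step using order(). The stand-in data frame holds made-up values, and the column name PTS for points is an assumption about the file's layout.

```r
# Stand-in data (made-up values); in practice nba comes from read.csv()
nba <- data.frame(
  Name = c("Player A", "Player B", "Player C", "Player D"),
  PTS  = c(26.8, 30.2, 25.9, 28.4)
)

# Highest to lowest: order() returns the row indices in sorted order
nba_desc <- nba[order(nba$PTS, decreasing = TRUE), ]
print(nba_desc)

# Lowest to highest, for comparison
nba_asc <- nba[order(nba$PTS), ]
print(nba_asc)
```

Because image() draws the first row of the data at the bottom of the plot, the row order in the data frame appears reversed when read from top to bottom in the heat map; pick the sort direction with that in mind.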

Then, we can run the rest of the code to make the following graph:

There's more