
How to Make a Scatter Plot in R with ggplot2

Last updated on October 17, 2019


In this post, we will learn how to make scatter plots using R and the package ggplot2.

More specifically, we will learn how to make scatter plots and how to change the size of the dots, the markers, the colors, and the number of ticks.

Furthermore, we will learn how to plot a trend line, add text, and plot a distribution on a scatter plot, among other things. In the final section of this tutorial, we will learn how to save plots in high resolution.

What is a Scatter Plot?

Before continuing this scatter plot in R tutorial, we will briefly discuss what a scatter plot is. A scatter plot is a two-dimensional (bivariate) data visualization that uses dots to represent the values collected, or measured, for two different variables.

That is, one of the variables is plotted along the x-axis and the other plotted along the y-axis. For example, the scatter plot below, created in R, shows the relationship between speed and stopping distance of cars.

Note that a trend line and the correlation between the two variables have been added to this scatter plot.
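The original figure is not reproduced here, but a plot like it can be sketched with the built-in cars dataset (speed in mph, stopping distance in feet). This is a minimal sketch under our own assumptions: the names r and p_cars and the label position are arbitrary choices, not the code behind the original figure.

```r
library(ggplot2)

# Pearson correlation between speed and stopping distance
r <- cor(cars$speed, cars$dist)

p_cars <- ggplot(aes(x = speed, y = dist), data = cars) +
  geom_point() +
  # Linear trend line without the confidence band
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE) +
  # Correlation printed in the upper-left part of the plot
  annotate("text", x = 8, y = 110, label = paste0("r = ", round(r, 2))) +
  labs(x = "Speed (mph)", y = "Stopping distance (ft)")

print(p_cars)
```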

Required R Packages

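In this scatter plot tutorial, we are going to use a number of different R packages, so we need to have them installed before continuing. The easiest way to get most of them is to install the tidyverse, which includes ggplot2, dplyr, tidyr, purrr, magrittr, and broom. GGally and carData (which provides the Burt and Salaries datasets used below) have to be installed separately.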

How to Install R Packages

Here’s how to install the tidyverse, GGally, and carData packages from the R console using the install.packages() function:

install.packages(c("tidyverse", "GGally", "carData"))

If we only want to install the packages used in this scatter plot tutorial, this is, of course, also possible:

to.install <- c("magrittr", "purrr", "tidyr",
  "ggplot2", "dplyr", "broom", "GGally", "carData")
install.packages(to.install)
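If some of these packages are already installed, a small variation installs only the missing ones. This sketch keeps its own copy of the package list used in this tutorial; is.installed and not.installed are just illustrative names.

```r
# Packages used in this tutorial
to.install <- c("magrittr", "purrr", "tidyr", "ggplot2",
                "dplyr", "broom", "GGally", "carData")

# TRUE for packages whose namespace can already be loaded
is.installed <- vapply(to.install, requireNamespace, logical(1),
                       quietly = TRUE)

not.installed <- to.install[!is.installed]

# Install only what is missing
if (length(not.installed) > 0) {
  install.packages(not.installed)
}
```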

How to Make a Scatter Plot in R

In this section, we will learn how to create a scatter plot using the R statistical programming environment. In the first code chunk, below, we load ggplot2 and print the first rows of the dataset we start with: mtcars.

require(ggplot2)

head(mtcars)

In most of the examples in this scatter plot tutorial, we are going to use datasets that are available in R. In practice, however, we will usually work with our own data, stored in Excel, CSV, SPSS, or other formats. How to read xlsx files in R is covered in a separate tutorial.
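As a minimal sketch of loading your own data, here is base R's read.csv() on a CSV file. So that the example runs without an external file, we first write mtcars to a temporary file; with real data, you would pass your own file path instead. The names csv_path and my_data are illustrative.

```r
# Stand-in for your own data: write mtcars to a temporary CSV file
csv_path <- tempfile(fileext = ".csv")
write.csv(mtcars, csv_path)

# Read it back; row.names = 1 uses the first column (car names) as row names
my_data <- read.csv(csv_path, row.names = 1)

head(my_data)
```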

How to Use ggplot2 to Produce Scatter Plots in R

In this section, we will learn how to make scatter plots in R using ggplot2.

How to Make a Scatter Plot in R

In the first ggplot2 scatter plot example, below, we will plot the variables wt (x-axis) and mpg (y-axis). This will give us a simple scatter plot showing the relationship between these two variables.

Before creating the first scatter plot in R, we will briefly cover ggplot2 and the plot functions we are going to use. First, we use ggplot() to create a plot object.

Inside the ggplot() function, we call the aes() function, which describes how variables in our data are mapped to visual properties. In this simple scatter plot example, we only use the x and y arguments, which tell ggplot2 to put our variable wt on the x-axis and mpg on the y-axis.

Finally, still in the ggplot() function, we tell ggplot2 to use the mtcars data. Next, we use geom_point() to add a layer. This function draws the dots and, thus, creates our scatter plot.
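The code in this tutorial uses the argument order ggplot(aes(...), data = ...). An equivalent and commonly used order passes the data first and the aes() mapping second; this short sketch (p is an illustrative name) should give the same plot:

```r
library(ggplot2)

# Data first, mapping second: equivalent to ggplot(aes(...), data = mtcars)
p <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point()

print(p)
```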


gp <- ggplot(aes(x = wt, y = mpg), 
             data = mtcars)

gp + geom_point()

How to Change the Size of the Dots in a Scatter Plot

We can set the size of the dots to a fixed value, or let the size represent one of the variables. So, how do you change the size of the dots in a ggplot2 plot? In the next example, we set a fixed size for all dots using the size argument.

gp <- ggplot(aes(x = wt, y = mpg), 
             data = mtcars)

gp + geom_point(size = 4)

In the scatter plot example above, we added the size argument directly to geom_point(), outside of aes(), because we want the same size for every dot. We can also change the size of the dots based on the values in one of our columns. In the next example, we use the wt variable for the dot size; because the size now depends on the data, it goes inside aes():

gp <- ggplot(aes(x = wt, y = mpg), 
             data = mtcars)

gp + geom_point(aes(size = wt))
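To make the difference between setting and mapping a size explicit, here is a short sketch. It maps hp rather than wt simply to show another variable, and uses scale_size() to control the range of dot sizes; the range c(1, 6) and the names p_fixed and p_mapped are arbitrary choices.

```r
library(ggplot2)

gp <- ggplot(aes(x = wt, y = mpg), data = mtcars)

# Setting: the same size for every dot (outside aes(), no legend)
p_fixed <- gp + geom_point(size = 4)

# Mapping: dot size depends on hp (inside aes(), adds a legend);
# scale_size() sets the smallest and largest dot sizes
p_mapped <- gp + geom_point(aes(size = hp)) +
  scale_size(range = c(1, 6))

print(p_fixed)
print(p_mapped)
```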

How to Change the Number of Ticks Using ggplot2

In the next scatter plot in R example, we are going to learn how to change the ticks on the x-axis and y-axis. That is, we are going to change the range and the number of ticks on each axis. This is done by adding two new layers to our plot.

More specifically, to change the x-axis we use the function scale_x_continuous(), and to change the y-axis we use scale_y_continuous(). First, we use the limits argument, which takes a vector with the lower and upper limit of the axis. Note that data points outside these limits are removed from the plot.

gp <- ggplot(aes(x = wt, y = mpg), 
             data = mtcars) + 
    geom_point()

gp + scale_y_continuous(limits=c(1, 40)) +
   scale_x_continuous(limits=c(0, 6))
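Because scale limits remove data, values outside them are set to NA and are not drawn; they are also excluded from any statistic computed later, such as a trend line. If we only want to zoom in, coord_cartesian() keeps all the data. The sketch below uses the illustrative limits c(15, 25) and counts the points turned into NA with layer_data():

```r
library(ggplot2)

gp <- ggplot(aes(x = wt, y = mpg), data = mtcars) + geom_point()

# Scale limits: mpg values outside 15-25 are replaced by NA (and not drawn)
p_scale <- gp + scale_y_continuous(limits = c(15, 25))

# Coordinate limits: all 32 points are kept; the view is zoomed in
p_zoom <- gp + coord_cartesian(ylim = c(15, 25))

# Count the points turned into NA in each version
sum(is.na(layer_data(p_scale)$y))
sum(is.na(layer_data(p_zoom)$y))  # 0
```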

In the next scatter plot example, we are going to change the number of ticks on the x- and y-axis. To accomplish this, we add the breaks argument to the above functions and use the seq() function to create a numeric vector with the tick positions.

gp + scale_y_continuous(limits = c(1, 35),
                        breaks = seq(1, 35, 5)) +
   scale_x_continuous(limits = c(1.5, 5.5),
                      breaks = seq(1.5, 5.5, 1))
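The seq() calls above produce the break vectors shown in the comments below. As an alternative, ggplot2 3.3.0 and later accept an n.breaks argument in the continuous scale functions, which asks for approximately that many "nice" breaks; the values 8 and 6 (and the name p_nbreaks) are arbitrary.

```r
library(ggplot2)

# The break vectors created by seq() in the code above
seq(1, 35, 5)     # 1 6 11 16 21 26 31
seq(1.5, 5.5, 1)  # 1.5 2.5 3.5 4.5 5.5

# ggplot2 >= 3.3.0: ask for roughly n "nice" breaks instead of listing them
p_nbreaks <- ggplot(aes(x = wt, y = mpg), data = mtcars) +
  geom_point() +
  scale_y_continuous(n.breaks = 8) +
  scale_x_continuous(n.breaks = 6)

print(p_nbreaks)
```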

Grouped Scatter Plot in R

If we have a categorical variable (i.e., a factor) and want to group the dots in the scatter plot, we map it to the color argument inside aes(). Note that we use the factor() function to change the variable vs to a factor.

gp <- ggplot(aes(x=wt, y=mpg, color=factor(vs)), 
             data=mtcars)
gp + geom_point()

Alternatively, we can change the vs variable to a factor before creating the scatter plot. This is done using the as.factor() function, and has the advantage that the legend title will only say “vs” (instead of “factor(vs)”). Here’s how to change a column to a factor in an R dataframe:

mtcars$vs <- as.factor(mtcars$vs)
gp <- ggplot(aes(x=wt, y=mpg, color=vs), 
             data=mtcars)
gp + geom_point()
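A further option, sketched below, is to create a new factor with descriptive labels. According to the mtcars documentation, vs is coded 0 = V-shaped and 1 = straight engine. The names cars_df, engine, and p_engine are our own, and labs() sets the legend title. The code works whether or not mtcars$vs has already been converted to a factor.

```r
library(ggplot2)

# vs is coded 0 = V-shaped, 1 = straight (see ?mtcars).
# A labelled copy of the column gives a self-explanatory legend.
cars_df <- transform(mtcars,
                     engine = factor(vs, levels = c(0, 1),
                                     labels = c("V-shaped", "straight")))

p_engine <- ggplot(aes(x = wt, y = mpg, color = engine), data = cars_df) +
  geom_point() +
  labs(color = "Engine")

print(p_engine)
```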

Changing the Markers (the dots)

One way to change the look of the markers is to use the shape argument. In the scatter plot example below, we are using a different dataset: the data() function loads the Burt dataset from the carData package.

In the next lines of code, we change the class variable to a factor. Note that we add the aes() function inside geom_point(). In aes(), we map the class column (the categorical variable) to both the color and the shape arguments. This way, our scatter plot is grouped by class in both the shape and the color of the markers.

data(Burt, package = 'carData')


Burt$class <- as.factor(Burt$class)
gp <- ggplot(aes(x = IQbio, y = IQfoster), data = Burt)
gp + geom_point(aes(color = class, 
             shape = class))
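If we want specific markers rather than the defaults, scale_shape_manual() and scale_color_manual() let us choose them. In this sketch, the shape codes (15, 16, and 17 are a filled square, circle, and triangle) and the color names are arbitrary choices, matched to the levels of class by name; p_burt is an illustrative name.

```r
library(ggplot2)
data(Burt, package = "carData")

p_burt <- ggplot(aes(x = IQbio, y = IQfoster), data = Burt) +
  geom_point(aes(color = class, shape = class), size = 3) +
  # Named values are matched to the levels of class
  scale_shape_manual(values = c(high = 15, low = 16, medium = 17)) +
  scale_color_manual(values = c(high = "darkred", low = "steelblue",
                                medium = "darkgreen"))

print(p_burt)
```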

How to Add a Trend Line to a Scatter Plot in R

In many cases, we are interested in the linear relationship between the two variables. For instance, we may continue by carrying out a regression analysis and want to illustrate the trend line on our scatter plot.

Luckily, this is quite easy using ggplot2; we just use the geom_smooth() function with the method “lm” (a linear model). Finally, we set the parameter se to FALSE to hide the confidence band around the line.

gp <- ggplot(aes(x = IQbio, y = IQfoster), data = Burt)
gp + geom_point(aes(color = class, 
             shape = class)) +
      geom_smooth(method = "lm", se = FALSE)

In the next scatter plot example, we are going to add a separate regression line for each factor level (category). To do this, we also map class to the color argument inside geom_smooth():

gp + geom_point(aes(color = class, 
             shape = class)) +
      geom_smooth(aes(color = class), method = "lm", se = FALSE)
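Since the straight line from method = "lm" is an ordinary least-squares fit, the same intercept and slope can be obtained with lm(). This is a minimal sketch for the overall line and for each class; fit and coefs_by_class are illustrative names.

```r
data(Burt, package = "carData")

# Overall least-squares line: intercept and slope
fit <- lm(IQfoster ~ IQbio, data = Burt)
coef(fit)

# One fit per class, matching the per-group lines (one column per class)
coefs_by_class <- sapply(split(Burt, Burt$class),
                         function(d) coef(lm(IQfoster ~ IQbio, data = d)))
coefs_by_class
```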

Bivariate Distribution on a Scatter Plot

In the next scatter plot in R example, we are going to overlay a bivariate distribution on the plot, drawn as density contour lines. To accomplish this, we add a layer using the geom_density2d() function.

gp <- ggplot(aes(x=wt, y=mpg), 
             data=mtcars)
gp + geom_point() + geom_density2d()
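To see what the contours represent: as far as we know, ggplot2's stat_density_2d() (which geom_density2d() uses) estimates the density with MASS::kde2d(). The sketch below calls kde2d() directly on a 50 x 50 grid; ggplot2's default grid size and bandwidth may differ, so the numbers are only comparable in spirit. The names dens and peak are illustrative.

```r
# Two-dimensional kernel density estimate of (wt, mpg) on a 50 x 50 grid
dens <- MASS::kde2d(mtcars$wt, mtcars$mpg, n = 50)

dim(dens$z)  # 50 50

# Grid location with the highest estimated density
# (rows of z correspond to x values, columns to y values)
peak <- which(dens$z == max(dens$z), arr.ind = TRUE)[1, ]
c(wt = dens$x[peak[1]], mpg = dens$y[peak[2]])
```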

How to Add Text to Scatter Plot in R

In this section, we are going to carry out a correlation analysis using R, extract the r- and p-values, and then learn how to add them as text to our scatter plot.

Here, we will use three additional packages, although you can, of course, carry out your correlation analysis in R without them: magrittr (which provides the %$% pipe), dplyr, and broom.

require(magrittr)
require(dplyr)
require(broom)

corr <- mtcars  %$%
             cor.test(mpg, wt) %>%
             tidy %>%
             mutate_if(is.numeric, round, 4)

corr

In the code chunk above, the %$% pipe exposes the columns of mtcars to cor.test(), which carries out the correlation analysis between mpg and wt. The %>% pipe then passes the result to tidy(), which converts it into a table format.

Finally, in the pipeline, we use mutate_if() with the is.numeric() and round() functions. The is.numeric() function makes sure that round() is only applied to numeric columns.

The resulting table has the values we need, as well as the confidence interval, the t-value (statistic), the method used, and whether the test was two-sided or one-sided.
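As mentioned, dplyr and broom are not required. In this minimal base R sketch, cor.test() returns an htest object (a list), so the r- and p-values can be extracted with $; ct, r, and p are illustrative names.

```r
# cor.test() returns a list of class "htest"
ct <- cor.test(mtcars$mpg, mtcars$wt)

r <- unname(ct$estimate)  # Pearson's r
p <- ct$p.value           # two-sided p-value

round(c(r = r, p = p), 4)
```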

Now that we have our correlation results, we can extract the r- and p-values and create a character vector. In the next code chunk, we use the paste0() function to do this. Furthermore, we use the ifelse() function to print the full p-value only if it is 0.01 or larger; smaller p-values are reported as “p < 0.01”.

text <- paste0('r = ', corr$estimate, ', ',
               ifelse(corr$p.value < 0.01,
                      'p < 0.01',
                      paste0('p = ', corr$p.value))
               )

text
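Another way to build the label, sketched here, is sprintf(), which controls the number of decimals directly (%.2f rounds to two). The 0.01 threshold follows the text above; ct, r, p, and label are illustrative names.

```r
ct <- cor.test(mtcars$mpg, mtcars$wt)
r <- unname(ct$estimate)
p <- ct$p.value

# %.2f prints r with two decimals; small p-values get a threshold label
label <- sprintf("r = %.2f, %s", r,
                 if (p < 0.01) "p < 0.01" else sprintf("p = %.3f", p))
label  # "r = -0.87, p < 0.01"
```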

Adding Text to a Plot in R

It’s time to put everything together. In this scatter plot example, we are going to use the annotate() function. Its x and y parameters set the position of the text, and the label parameter takes our character vector, text. Put simply, we add a new layer with our text to the ggplot2 plot.

gp <- ggplot(aes(x = wt, y = mpg), 
             data = mtcars)
gp + geom_point() + geom_smooth(method = "lm", se = FALSE) +
    annotate('text',  x = 4.5, y = 35, label=text)
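The position x = 4.5, y = 35 above is hard-coded. As a sketch of one alternative, the label can be anchored to the data range instead, with hjust and vjust aligning the text to the top-right corner. The label string here is only an example value, and p_annot is an illustrative name.

```r
library(ggplot2)

label <- "r = -0.87, p < 0.01"  # example label, e.g. built as above

p_annot <- ggplot(aes(x = wt, y = mpg), data = mtcars) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE) +
  # Anchor the text at the largest wt and mpg values; hjust = 1 and
  # vjust = 1 make the text end there instead of starting there
  annotate("text", x = max(mtcars$wt), y = max(mtcars$mpg),
           label = label, hjust = 1, vjust = 1)

print(p_annot)
```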

Now, what if we want to plot correlations by group on a scatter plot? In the next code chunk, we are going to use the tidyr and purrr packages as well.

As this example is somewhat more complex than the previous one, we will not go into detail about everything that is happening. However, we use the pipe, %>%, again. Combined with group_by(), the nest() function creates smaller dataframes (one per class) within our dataframe.

After applying nest(), we use mutate() to create new columns in the nested dataframe. We use the map() function to carry out the correlation analysis on each of the smaller dataframes (i.e., for each class). Furthermore, we use map_dbl() twice to extract the p- and r-values. Finally, mutate_if() is, again, used to round the numeric values, and select() keeps the columns we want.

require(tidyr)
require(purrr)

data(Burt, package = 'carData')


corr <- Burt %>% group_by(class) %>%
  nest() %>% 
  mutate(Cor = map(data, ~ cor.test(.$IQbio, .$IQfoster)),
         p   = map_dbl(Cor, 'p.value'),
         est = map_dbl(Cor, 'estimate')
             ) %>%
  mutate_if(is.numeric, round, 4) %>%
  select(class, p, est, Cor)

text <- corr %>%
  mutate(
        text = paste0('r = ', est, ', ',
             ifelse(p < 0.01,
                    'p < 0.01',
                    paste0('p = ', p))))
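For comparison, the same per-class correlations can be computed in base R with split() and sapply(). This is a sketch, not a drop-in replacement for the nested table above; by_class and corr_base are illustrative names.

```r
data(Burt, package = "carData")

# One data frame per class
by_class <- split(Burt, Burt$class)

# cor.test() on each piece; t() gives one row per class
corr_base <- t(sapply(by_class, function(d) {
  ct <- cor.test(d$IQbio, d$IQfoster)
  c(est = unname(ct$estimate), p = ct$p.value)
}))

round(corr_base, 4)
```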

Note that the text (character vector) is, as in the previous example, created using paste0() and ifelse(). In the scatter plot example below, we use the geom_text() function to add the text.

Burt$class <- as.factor(Burt$class)

gp <- ggplot(aes(x = IQbio, y = IQfoster), 
             data = Burt) + 
    geom_point(aes(color = class, 
             shape = class))


corrp <- gp +
      geom_smooth(aes(color = class), method = "lm", se = FALSE) +
      geom_text(aes(x = 120, y = 137, color = "high", 
                    label = subset(text, class == "high")$text)) +
      geom_text(aes(x = 118, y = 109, color = "medium", 
                    label = subset(text, class == "medium")$text)) +
      geom_text(aes(x = 124, y = 103, color = "low", 
                    label = subset(text, class == "low")$text))

corrp

In the code chunk above, we use the aes() function inside geom_text(). Here, we use the x and y arguments for the coordinates, color (set to each class), and label to set the text. Note that we use the subset() function to select the row of the text table for each class, and we extract the label with the $ operator and the column name (text). The resulting scatter plot looks like this:
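Because each geom_text() layer above inherits the 27 rows of Burt, each label is drawn once per row, stacked on top of itself. A minimal alternative sketch (our own, not the original code) puts the label positions in a small data frame, here called pos (a hypothetical name), and passes it to geom_text() with the data argument. It reuses the coordinates from above, but shows only r, computed with base R.

```r
library(ggplot2)
data(Burt, package = "carData")

# Pearson's r for each class, named by class
r_by_class <- sapply(split(Burt, Burt$class),
                     function(d) cor(d$IQbio, d$IQfoster))

# One row per label: class, position (from the code above) and text
pos <- data.frame(class = c("high", "medium", "low"),
                  x = c(120, 118, 124),
                  y = c(137, 109, 103))
pos$label <- paste0("r = ", round(r_by_class[pos$class], 2))
pos$class <- factor(pos$class, levels = levels(Burt$class))

p_labels <- ggplot(aes(x = IQbio, y = IQfoster), data = Burt) +
  geom_point(aes(color = class, shape = class)) +
  geom_smooth(aes(color = class), method = "lm", formula = y ~ x,
              se = FALSE) +
  # geom_text() gets its own three-row data, so each label is drawn once
  geom_text(aes(x = x, y = y, label = label, color = class),
            data = pos, show.legend = FALSE)

print(p_labels)
```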

How to Style a Scatter plot in R

In this section, we are going to learn how to change the grey background of the ggplot2 scatter plot to white. We are also going to learn how to add lines to the x- and y-axis, and how to remove the grid, the legend title, and the background of the legend keys.

To accomplish this, we add three more layers to the above plot. First, we use the function theme_bw() to get a black-and-white themed plot. After this, we make the scatter plot black and grey using the scale_colour_grey() function. Finally, we add a theme layer using the theme() function.

corrp + theme_bw() +  scale_colour_grey() +
  theme(axis.line = element_line(colour = "black")
        ,plot.background = element_blank()
        ,panel.grid.major = element_blank()
        ,panel.grid.minor = element_blank()
        ,strip.background = element_blank()
        ,panel.border = element_blank() 
        ,legend.title=element_blank()
        ,legend.key = element_blank())

A lot is going on in the theme() function, and it may be easiest to understand by removing the different elements one at a time. Note that element_blank() draws “nothing” for that particular element. For instance, plot.background = element_blank() removes the plot background, so it appears blank (white).
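If writing the theme elements one by one is more than we need, ggplot2's built-in theme_classic() gives a similar look (axis lines, no grid lines) in one call; it is not identical to the custom theme above. theme_set() applies a theme to all subsequent plots and returns the previous theme, so it can be restored. A short sketch, with illustrative names:

```r
library(ggplot2)

p_classic <- ggplot(aes(x = wt, y = mpg), data = mtcars) +
  geom_point() +
  theme_classic()

print(p_classic)

# Apply theme_classic() to every plot made afterwards, then restore
old_theme <- theme_set(theme_classic())
# ... create plots here ...
theme_set(old_theme)
```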

How to Rotate the Axis Labels Using ggplot2

In this section, we are going to create a scatter plot with R and rotate the x-axis labels.

data(Salaries, package = "carData")
Salaries$rank <- as.factor(Salaries$rank)

gp <- ggplot(aes(x = salary, y = yrs.since.phd), 
             data = Salaries) + 
    geom_point(aes(color = rank, 
             shape = rank)) +
    geom_smooth(method = "lm") +
   scale_y_continuous(limits = c(0, 60)) +
   scale_x_continuous(limits = c(50000, 240000), 
                      breaks = seq(50000, 240000, by = 10000))

Because we have set the x-axis breaks to every 10,000, the labels overlap and cannot be read. To fix this, we add a theme layer using the theme() function. Here, we use the axis.text.x argument and the element_text() function. Inside the latter, we set the angle argument to 90 to rotate the text 90 degrees, and hjust = 1 aligns the end of each label with its tick.

gp + theme(axis.text.x = 
           element_text(angle = 90, hjust = 1))
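Another option, sketched below, is to use fewer breaks and format the labels with scales::comma (the scales package is installed together with ggplot2), so the labels may fit without rotation. The break step of 50,000 and the name p_salary are arbitrary choices.

```r
library(ggplot2)
data(Salaries, package = "carData")

p_salary <- ggplot(aes(x = salary, y = yrs.since.phd), data = Salaries) +
  geom_point(aes(color = rank, shape = rank)) +
  scale_x_continuous(limits = c(50000, 240000),
                     breaks = seq(50000, 240000, by = 50000),
                     # formats e.g. 100000 as "100,000"
                     labels = scales::comma)

print(p_salary)
```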

Pairplot in R: Scatterplot + Histogram

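In the last section before learning how to save high-resolution figures in R, we are going to create a pairplot using the GGally package. More specifically, we are going to create scatter plots for pairs of variables in the mtcars dataset, with a histogram of each variable on the diagonal and correlation coefficients in the upper panels. Because ggpairs() draws density plots on the diagonal by default, we request histograms with the diag argument ("barDiag").

require(GGally)

cols <- c('mpg', 'wt', 'hp', 'qsec')
ggpairs(mtcars, columns = cols,
        diag = list(continuous = "barDiag"))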

Saving a High Resolution Plot in R

Now that we know how to create scatter plots in R, we are going to learn how to save the plots in high resolution. In this section, we are going to learn how to save ggplot2 plots as PDF and TIFF files.

For instance, if we are planning to use the scatter plots we created in R in a publication, we need to save them to a high-resolution file.

First, we create a new scatter plot using most of the functions that we have used in the previous examples.

data(Salaries, package = "carData")


gp <- ggplot(aes(x=yrs.since.phd, y=salary), 
             data=Salaries) + 
    geom_point() +
    geom_smooth(method = "lm", se = FALSE, colour="gray") +
    theme_bw() +
    theme(axis.line = element_line(colour = "black")
        ,plot.background = element_blank()
        ,panel.grid.major = element_blank()
        ,panel.grid.minor = element_blank()
        ,strip.background = element_blank()
        ,panel.border = element_blank() 
        ,legend.title=element_blank()
        ,legend.key = element_blank())  +
    xlab('Years since Ph.D.') +
    ylab('Salary')

Second, we use the ggsave() function to save the scatter plot. Note that in both examples here we set the width and height in centimetres.

How to Save a Scatter Plot to PDF in R

Now we are ready to save the plot as a .pdf file. In the code chunk below, we set device to “pdf” and give the file a name ending with “.pdf”. Since PDF is a vector format, the dpi argument matters mainly for raster formats such as TIFF.

# plot = gp saves gp explicitly instead of relying on the last plot
ggsave("salaries_by_year_scatterplot.pdf", plot = gp, device = "pdf",
       width = 12, height = 8,
       units = "cm", dpi = 300)

How to Save a Scatter Plot to TIFF in R

In the final code chunk below, we again use the ggsave() function, but change the device to “tiff” and the file ending to “.tiff”.

ggsave("salaries_by_year_scatterplot.tiff", plot = gp, device = "tiff",
       width = 12, height = 8,
       units = "cm", dpi = 300)
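Uncompressed TIFF files at 300 dpi can be large. Arguments that ggsave() does not use itself are passed on to the graphics device, and grDevices::tiff() has a compression argument, so lossless LZW compression can be requested as below. This sketch saves a simple plot to a temporary directory; the file name and the names p and out_file are illustrative.

```r
library(ggplot2)

p <- ggplot(aes(x = wt, y = mpg), data = mtcars) + geom_point()

# compression = "lzw" is passed through ggsave() to grDevices::tiff()
out_file <- file.path(tempdir(), "scatterplot_lzw.tiff")
ggsave(out_file, plot = p, device = "tiff",
       width = 12, height = 8, units = "cm", dpi = 300,
       compression = "lzw")

file.exists(out_file)
```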

Reproducible Data Visualization

Before concluding this scatter plot in R tutorial, we will briefly touch on the topic of reproducible research. Research is considered reproducible when other researchers can produce the exact same results given access to the original data, software, and code. This, of course, also means that our plots need to be reproducible. Learn how to create a fully reproducible environment in the Binder and R for reproducible science tutorial.
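One small, concrete step toward reproducible figures, sketched here, is to record the R and package versions used to make them. The file name session_info.txt is our own choice.

```r
# Save R version, platform and attached package versions to a text file
writeLines(capture.output(sessionInfo()), "session_info.txt")

# Or check the version of a single package
packageVersion("ggplot2")
```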

Conclusion

In this post we have learned how to make scatter plots in R. Moreover, we have also learned how to:

  • change the colors, the number of ticks, and the markers of ggplot2 plots, and rotate the axis labels
  • save a high-resolution, print-ready image of a ggplot2 plot

Here’s a Jupyter notebook with the code used in this blog post, and here is the same notebook on nbviewer.
