How to Create a Violin plot in R with ggplot2 and Customize it

This data visualization tutorial will teach us how to make a violin plot in R using ggplot2. We can use several techniques to visualize our data in R (see, for example, the Python-related post “9 Data Visualization Techniques You Should Learn in Python”). Briefly described, a violin plot combines a box plot and a density plot (a smoothed histogram) in the same figure. In the next section, you will get a brief overview of the content of this blog post.

Outline

Before creating a violin plot in R, we will look at what you need to be able to follow this data visualization tutorial. Once we have what we need, we will answer some common questions (e.g., what a violin plot is). In the sections following this, we will get into the practical details. We will learn how to create violin plots in R using ggplot2. Furthermore, we will also learn how to customize the plots. For example, you will learn to show the plot horizontally, fill it with a color based on category, and add or change labels.

Requirements

First, you need to have an active installation of R. Second, to use ggplot2, you need to install the package. R packages can be installed with the install.packages() command:

install.packages("ggplot2")

Note that it is also a good idea to keep your environment up to date. You can easily check the R version in RStudio by typing R.version in the console. If needed, you can download and update R to a newer version. It is worth pointing out that ggplot2 is part of the Tidyverse collection of packages, so you can install Tidyverse to get ggplot2, among many other handy R packages. For example, you can use dplyr to rename a column in R, remove duplicates, and count the number of occurrences in a column. In the next section, we will get answers to some commonly asked questions.
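
The requirement checks described above can be sketched in a few lines. This is only a minimal example, assuming that a CRAN mirror is configured in case the installation step is needed; it prints the R version, installs ggplot2 only if it is missing, and then loads it:

```r
# Show which version of R is running
print(R.version.string)

# Install ggplot2 only when it is not already available
# (assumes a CRAN mirror has been configured)
if (!requireNamespace("ggplot2", quietly = TRUE)) {
  install.packages("ggplot2")
}

# Load the package and report its version
library(ggplot2)
print(packageVersion("ggplot2"))
```

Checking with requireNamespace() first avoids reinstalling the package every time the script is run.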

What is a Violin Plot?

As mentioned earlier in the post, a violin plot is a data visualization method that combines a box plot with a density plot (a smoothed histogram). This type of plot displays the distribution, median, and interquartile range (IQR) of the data. The IQR and median are the statistical information shown in the box plot, whereas the density outline displays the shape of the distribution.
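
To make these ingredients concrete, here is a small base R sketch. The response times are simulated for illustration (they are not the tutorial data), and the variable names are made up. The outline of a ggplot2 violin is drawn from a kernel density estimate, which is the smoothed counterpart of a histogram:

```r
set.seed(1)

# Simulated response times in milliseconds (hypothetical values)
rt <- rnorm(200, mean = 500, sd = 60)

# Box plot ingredients: the median and the interquartile range
rt_median <- median(rt)
rt_iqr <- IQR(rt)

# Violin outline ingredient: a kernel density estimate
rt_density <- density(rt)

rt_median
rt_iqr
head(rt_density$x)  # grid of values at which the density was estimated
```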

What does a Violin plot show?

A violin plot shows numerical data. Specifically, it reveals the shape of the distribution and summary statistics of the numerical data. It can be used to explore and compare data across different groups or variables in our datasets.

How do you make a violin plot in R?

You can use ggplot2 and the geom_violin() function to make a violin plot in R. For example, if we have the dataframe dataF and want to create a violin plot of the two groups’ response times, we can use the following code: ggplot(dataF, aes(Group, RT)) + geom_violin().

Example Data to Visualize

In this post, we will work with fake data from a psychology experiment. The dataset, which is read from the URL in the code below, is fake data of the kind that could be obtained using, e.g., the Flanker task created with OpenSesame. Here is how we can read the data into R using the read.csv() function:

# URL of the (fake) example data
data <- 'https://raw.githubusercontent.com/marsja/jupyter/master/flanks.csv'

# Read the CSV file into a dataframe and show the first six rows
df <- read.csv(data)
head(df)

Note that you can also import data from sources other than CSV files.

Here’s a quick overview of the dataframe in which we can see the first six rows of the columns:

First 6 rows of dataframe

If you already have your data in a list, you can convert the list to a dataframe in R. In the next code chunk, we will use some neat functions from the dplyr package, group_by() and summarise_all(), to calculate descriptive statistics in R:

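library(dplyr)

# Group by trial type, keep accuracy and response time,
# and calculate summary statistics for each group
df %>% group_by(TrialType) %>% 
  select(ACC, RT) %>% 
  summarise_all(list(mean = mean,
                     std = sd, 
                     min = min, 
                     max = max))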

In the code above, we first used dplyr’s group_by() to group the data by trial type (i.e., the column TrialType). Second, we used dplyr to select columns by name using the select() function. Finally, we used the summarise_all() function (also from dplyr) together with list(). Here we calculate the mean, standard deviation, min, and max.
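
In more recent versions of dplyr (1.0.0 or later), the same kind of grouped summary can also be written with summarise() and across(). The sketch below is an alternative, not the approach used in this post. It runs on a small simulated dataframe (fake_df, with made-up values and trial type names) so that it works without downloading anything:

```r
library(dplyr)

# Small simulated stand-in for the tutorial dataframe (values are made up)
set.seed(42)
fake_df <- data.frame(
  TrialType = rep(c("congruent", "incongruent"), each = 50),
  ACC = rbinom(100, size = 1, prob = 0.9),
  RT = c(rnorm(50, mean = 450, sd = 50), rnorm(50, mean = 520, sd = 60))
)

# One row per trial type, one column per variable/statistic combination
stats <- fake_df %>%
  group_by(TrialType) %>%
  summarise(across(c(ACC, RT),
                   list(mean = mean, std = sd, min = min, max = max)))

stats
```

With across(), the selected columns and the summary functions sit in a single call, so a separate select() step is not needed.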

In the next section, we will load the ggplot2 library and learn how to create a simple violin plot in R.

How to Make a Violin Plot in R with ggplot2

Here’s how to create a violin plot with the R package ggplot2:

library(ggplot2)

p <- ggplot(df, aes(TrialType, RT))
p + geom_violin()

In the code above, we first created a plot object with the ggplot() function. Here we passed the dataframe together with a call to the aes() function. In aes(), we used the grouping column (i.e., TrialType) as the first argument and the dependent variable (response time) as the second. On the next line, we used the geom_violin() function. This, in turn, creates the violin plot layer. Here’s the resulting violin plot that we created using R and ggplot2:

violin plot in R

Making the Violin Plot Horizontal in R

In the next example, we will use the coord_flip() function to create a horizontal violin plot:

p <- ggplot(df, aes(TrialType, RT))
p + geom_violin() + 
  coord_flip()

As you can see, in the code chunk above, we just added the coord_flip() function, and this will result in this plot:

violin plot created in R
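
As an alternative to coord_flip(), ggplot2 version 3.3.0 and later can usually work out the orientation by itself when the numeric variable is mapped to x and the grouping variable to y. Here is a minimal sketch, assuming a recent ggplot2 and using a simulated dataframe (fake_df, made-up values) instead of the tutorial data:

```r
library(ggplot2)

# Simulated stand-in data (hypothetical values and trial type names)
set.seed(7)
fake_df <- data.frame(
  TrialType = rep(c("congruent", "incongruent"), each = 50),
  RT = c(rnorm(50, mean = 450, sd = 50), rnorm(50, mean = 520, sd = 60))
)

# Response time on the x-axis and trial type on the y-axis:
# the violins are drawn horizontally without coord_flip()
p_h <- ggplot(fake_df, aes(RT, TrialType)) +
  geom_violin()
p_h
```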

In the next section, we will continue working with violin plots in R and ggplot2 by overlaying a boxplot.

How to Create a Violin Plot in R: interquartile range, median

Here is how we can display a violin plot in R and add interquartile range and median by overlaying a boxplot:

p <- ggplot(df, aes(TrialType, RT))
p + geom_violin() + geom_boxplot()

Violin Plot in R

As you can see, the only addition to the previous code is that we also use the geom_boxplot() function. However, the created violin plot (see image above) can be improved: by default, the boxplot is wide and covers much of the violin. If we use the width argument to make the boxplot narrower, we get a better violin plot:

p <- ggplot(df, aes(TrialType, RT))
p + geom_violin() + geom_boxplot(width = .2)

Width on boxplot changed

The next section will use the draw_quantiles argument in the geom_violin() function. We will see that we can use this to display, for example, the 25th, 50th, and 75th percentiles. In the examples after that, we will play around with the color and fill of the violin plots we created with R.

How to add Quantiles to the Violin Plot in R

Here’s how we use the draw_quantiles parameter to add quantiles to a violin plot:

p <- ggplot(df, aes(TrialType, RT))
p + geom_violin(draw_quantiles = c(.25, .50, .75))

As you can see, we get three lines in the violin plot now. In the next example, we will learn how to customize the violin plot we create in R using the color parameter.
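
As a rough cross-check of where the three quantile lines fall, we can calculate the sample quantiles for each group in base R. The sketch below uses a simulated dataframe (fake_df, made-up values). Note that, as far as the ggplot2 documentation describes it, draw_quantiles works on the density estimate, so the drawn lines may differ slightly from these sample quantiles:

```r
# Simulated stand-in data (hypothetical values and trial type names)
set.seed(3)
fake_df <- data.frame(
  TrialType = rep(c("congruent", "incongruent"), each = 50),
  RT = c(rnorm(50, mean = 450, sd = 50), rnorm(50, mean = 520, sd = 60))
)

# 25th, 50th, and 75th percentiles of response time for each trial type
q <- tapply(fake_df$RT, fake_df$TrialType, quantile,
            probs = c(.25, .50, .75))
q
```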

How to Change the Color of a Violin Plot in R

Here is how we can change the color of a violin plot in R:

p <- ggplot(df, aes(TrialType, RT, color = TrialType))
p + geom_violin() + geom_boxplot(width = .2)

In the code chunk above, we added one argument to the aes() function: the color argument. We can use this parameter if we want the lines of the violin plot to be different for the different groups (i.e., of different colors). Here is the resulting plot:


In the following example, we will also fill the violin plot. As you will see, this is easy, and we use the fill parameter.

How to Fill a Violin Plot in R

Here is how you can change the color (or fill) of a violin plot in R:

p <- ggplot(df, aes(TrialType, RT, fill = TrialType))
p + geom_violin() + geom_boxplot(width = .2)

In the code chunk above, we added a parameter: fill. Moreover, we used the TrialType (a categorical variable) column here to fill the violin plots and box plots based on the trial type they belong to. Here is the resulting plot:

change the filling of violin plot in R
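
If the default fill colors are not what we want, one option is to pick the colors ourselves with scale_fill_manual(). The following is a minimal sketch with a simulated dataframe (fake_df, made-up values); the two color names are arbitrary choices for illustration:

```r
library(ggplot2)

# Simulated stand-in data (hypothetical values and trial type names)
set.seed(5)
fake_df <- data.frame(
  TrialType = rep(c("congruent", "incongruent"), each = 50),
  RT = c(rnorm(50, mean = 450, sd = 50), rnorm(50, mean = 520, sd = 60))
)

# Fill by trial type, then assign one color to each of the two groups
p_fill <- ggplot(fake_df, aes(TrialType, RT, fill = TrialType)) +
  geom_violin() +
  geom_boxplot(width = .2) +
  scale_fill_manual(values = c("steelblue", "orange"))
p_fill
```

The colors are assigned to the groups in the order of the factor levels, so with more groups we would need one color per level.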

Changing the Labels on Violin Plots in R

Here’s how we can change the labels of the violin plot we have created in R:

p + geom_violin() + geom_boxplot(width = .2) +
  labs(
    title = "Comparison of Response Time by Trial Type",
    x = "Trial Type",
    y = "Response time (ms.)"
  )

In the code chunk above, we added the labs() function. In this function, we worked with a couple of parameters. First, we added a title by using the title parameter. In the next two rows, we changed the x- and y-titles. This may be useful when we want to present the data to other researchers (e.g., publishing our results) and our variables (in the dataset) have shortened names (such as RT). To learn more about customizing ggplot2 figures, see, e.g., the post How to Make a Scatter Plot in R with Ggplot2. In a recent post, you can learn how to make a Residual plot in R using ggplot2.
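
Coming back to labs(): the same function can also set the legend title, and the labels can be combined with one of the built-in ggplot2 themes. Here is a self-contained sketch using a simulated dataframe (fake_df, made-up values); the choice of theme_minimal() is just an example:

```r
library(ggplot2)

# Simulated stand-in data (hypothetical values and trial type names)
set.seed(11)
fake_df <- data.frame(
  TrialType = rep(c("congruent", "incongruent"), each = 50),
  RT = c(rnorm(50, mean = 450, sd = 50), rnorm(50, mean = 520, sd = 60))
)

p_labeled <- ggplot(fake_df, aes(TrialType, RT, fill = TrialType)) +
  geom_violin() +
  geom_boxplot(width = .2) +
  labs(
    title = "Comparison of Response Time by Trial Type",
    x = "Trial Type",
    y = "Response time (ms)",
    fill = "Trial Type"   # title of the fill legend
  ) +
  theme_minimal()         # built-in theme with a light background
p_labeled
```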


Alternative R Packages to Make a Violin Plot

Now, before concluding this post, it is worth mentioning that other packages (e.g., vioplot, violinplotter) can also be used to create violin plots in R. For example, here’s how to install and create a plot using the violinplotter package:

install.packages("violinplotter")
library(violinplotter)
violinplotter(RT ~ TrialType, data = df)

As you can see, in the code chunk above, we use a formula as the first parameter (the second is the dataframe). Here’s the resulting violin plot:

violin plot in R created with violinplotter package

In the image above, we get more information from the violin plot: the number of observations in each category, the standard deviation, standard error, and 95% confidence intervals. 
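
The vioplot package mentioned above can be used in a similar way. The sketch below assumes that vioplot is installed in a version that supports the formula interface (to my knowledge, 0.3.0 or later) and uses a simulated dataframe (fake_df, made-up values):

```r
# install.packages("vioplot")  # uncomment if the package is not installed
library(vioplot)

# Simulated stand-in data (hypothetical values and trial type names)
set.seed(9)
fake_df <- data.frame(
  TrialType = factor(rep(c("congruent", "incongruent"), each = 50)),
  RT = c(rnorm(50, mean = 450, sd = 50), rnorm(50, mean = 520, sd = 60))
)

# Base graphics violin plot: response time by trial type
vioplot(RT ~ TrialType, data = fake_df, col = "lightblue")
```

Unlike ggplot2, vioplot draws with base R graphics, so customization is done with base graphics arguments rather than by adding layers.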

Conclusion

In this post, you have learned how to make a violin plot in R. First, you learned what you need to create a violin plot. Second, you learned more about this data visualization technique. Third, you learned how to use ggplot2 to create a violin plot through a couple of examples. Specifically, you learned how to display the violin plot horizontally, add a boxplot to the violin plot, change the color and fill of the plot, and finally, change the plot’s labels. Hopefully, you have learned something. If you have any questions about the blog post, or suggestions on what I should cover on this blog, please comment below.

