In this post, we will learn how to carry out repeated measures Analysis of Variance (ANOVA) in R and Python. To be specific, we will use the R package afex and the Python package pingouin to carry out one-way and two-way ANOVA for within-subjects designs. The structure of this data analysis tutorial is as follows: a brief introduction to (repeated measures) ANOVA, followed by carrying out within-subjects ANOVA in R using afex and in Python using pingouin. At the end, there will be a comparison of the results and of the pros and cons of using R or Python for data analysis (i.e., ANOVA).
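To make the idea of a within-subjects design concrete, here is a minimal sketch of a one-way repeated measures ANOVA calculated by hand in plain Python. It is not the afex or pingouin API; the function name rm_anova_one_way and the toy scores are made up for illustration. The point it shows is that variance due to subjects is removed from the error term before the F-ratio is formed.

```python
from statistics import mean


def rm_anova_one_way(scores):
    """One-way repeated measures ANOVA computed by hand.

    `scores` holds one row per subject and one column per condition
    (hypothetical layout chosen for this sketch).
    """
    n_subjects = len(scores)
    n_conditions = len(scores[0])
    all_scores = [x for row in scores for x in row]
    grand_mean = mean(all_scores)

    # Means per condition (columns) and per subject (rows)
    condition_means = [mean(col) for col in zip(*scores)]
    subject_means = [mean(row) for row in scores]

    ss_total = sum((x - grand_mean) ** 2 for x in all_scores)
    ss_conditions = n_subjects * sum((m - grand_mean) ** 2 for m in condition_means)
    ss_subjects = n_conditions * sum((m - grand_mean) ** 2 for m in subject_means)
    # What is left after removing condition and subject variance is the error
    ss_error = ss_total - ss_conditions - ss_subjects

    df_conditions = n_conditions - 1
    df_error = (n_conditions - 1) * (n_subjects - 1)
    f_value = (ss_conditions / df_conditions) / (ss_error / df_error)

    return {
        "SS_conditions": ss_conditions,
        "SS_subjects": ss_subjects,
        "SS_error": ss_error,
        "df_conditions": df_conditions,
        "df_error": df_error,
        "F": f_value,
    }


# Toy data (invented): 3 subjects measured under 3 conditions
toy_scores = [
    [1, 2, 3],
    [2, 4, 6],
    [3, 6, 9],
]
rm_result = rm_anova_one_way(toy_scores)
print(rm_result)
```

The sketch assumes complete data (every subject measured in every condition) and does not compute a p-value or a sphericity correction; a dedicated package would be the better choice for real analyses.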
Learn about probabilistic programming in this guest post by Osvaldo Martin, a researcher at The National Scientific and Technical Research Council of Argentina (CONICET) and author of Bayesian Analysis with Python: Introduction to statistical modeling and probabilistic programming using PyMC3 and ArviZ, 2nd Edition.
This post is based on an excerpt from the second chapter of the book that I have slightly adapted so it’s easier to read without having read the first chapter.
In an earlier post I showed four different techniques that enable one-way analysis of variance (ANOVA) using Python. In this post we are going to learn how to do two-way ANOVA for independent measures using Python.
First, we are going to learn how to calculate the ANOVA table “by hand”. Second, we are going to use Statsmodels and, third, we will carry out the ANOVA in Python using pyvttbl. Finally, as a bonus, we will also use Pingouin Stats, a newer Python package.
An important advantage of the two-way ANOVA is that it is more efficient than the one-way ANOVA. There are two assignable sources of variation (supp and dose in our example), and accounting for both reduces the error variation, which makes this design more efficient.
Two-way ANOVA (factorial) can be used to, for instance, compare the means of populations that are different in two ways. It can also be used to analyse the mean responses in an experiment with two factors. Unlike One-Way ANOVA, it enables us to test the effect of two factors at the same time.
One can also test whether the factors are independent (i.e., whether they interact), provided there is more than one observation in each cell. The only restriction is that the number of observations in each cell has to be equal (there is no such restriction in the case of one-way ANOVA).
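The two sources of variation and the interaction term can be sketched by hand in plain Python. This is only an illustration under stated assumptions: the function name two_way_anova, the factor levels, and the scores are all invented, and the design must be balanced with at least two observations per cell, as described above.

```python
from statistics import mean


def two_way_anova(cells):
    """Balanced two-way between-subjects ANOVA computed by hand.

    `cells` maps (level_a, level_b) to the list of scores in that cell
    (hypothetical layout chosen for this sketch).
    """
    levels_a = sorted({a for a, _ in cells})
    levels_b = sorted({b for _, b in cells})
    n = len(next(iter(cells.values())))  # observations per cell

    crossed = len(cells) == len(levels_a) * len(levels_b)
    balanced = all(len(v) == n for v in cells.values())
    if not (crossed and balanced and n >= 2):
        raise ValueError("needs a balanced, fully crossed design with n >= 2 per cell")

    all_scores = [x for v in cells.values() for x in v]
    grand = mean(all_scores)

    # Marginal means for each level of the two factors
    mean_a = {a: mean([x for b in levels_b for x in cells[(a, b)]]) for a in levels_a}
    mean_b = {b: mean([x for a in levels_a for x in cells[(a, b)]]) for b in levels_b}

    ss_a = n * len(levels_b) * sum((mean_a[a] - grand) ** 2 for a in levels_a)
    ss_b = n * len(levels_a) * sum((mean_b[b] - grand) ** 2 for b in levels_b)
    ss_cells = n * sum((mean(v) - grand) ** 2 for v in cells.values())
    ss_ab = ss_cells - ss_a - ss_b  # interaction
    ss_error = sum((x - mean(v)) ** 2 for v in cells.values() for x in v)

    df_a = len(levels_a) - 1
    df_b = len(levels_b) - 1
    df_ab = df_a * df_b
    df_error = len(cells) * (n - 1)
    ms_error = ss_error / df_error

    return {
        "SS_A": ss_a, "SS_B": ss_b, "SS_AB": ss_ab, "SS_error": ss_error,
        "F_A": (ss_a / df_a) / ms_error,
        "F_B": (ss_b / df_b) / ms_error,
        "F_AB": (ss_ab / df_ab) / ms_error,
        "df_error": df_error,
    }


# Toy data (invented): factor A plays the role of supp, factor B of dose
toy_cells = {
    ("OJ", 0.5): [10, 12],
    ("OJ", 1.0): [20, 22],
    ("VC", 0.5): [8, 10],
    ("VC", 1.0): [14, 16],
}
two_way_result = two_way_anova(toy_cells)
print(two_way_result)
```

The formulas used here rely on equal cell sizes; with unequal cells the sums of squares no longer add up this simply, which is why the sketch raises an error in that case.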
The current post will focus on how to carry out between-subjects ANOVA in Python. As mentioned in an earlier post (Repeated measures ANOVA with Python), ANOVAs are commonly used in Psychology. We start with a brief introduction to the theory of ANOVA.
If you are more interested in the four methods to carry out one-way ANOVA with Python click here.
In this post we will learn how to carry out ANOVA using SciPy, calculating it “by hand” in Python, using Statsmodels, and Pyvttbl.
Update: the Python package Pyvttbl has not been maintained for a couple of years, but there’s a new package called Pingouin. As a bonus, instructions for using this package have been added at the end of the post.
Introduction to ANOVA
Before we learn how to do ANOVA in Python, we will briefly discuss what ANOVA is. ANOVA is a means of comparing the ratio of systematic variance to unsystematic variance in an experimental study. Variance in the ANOVA is partitioned into total variance, variance due to groups, and variance due to individual differences.
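The partitioning described above can be sketched in a few lines of plain Python. This is a minimal illustration, not the code from any of the packages mentioned; the function name one_way_anova and the toy scores are invented. The total sum of squares splits into a between-groups part (systematic) and a within-groups part (individual differences), and F is the ratio of the two mean squares.

```python
from statistics import mean


def one_way_anova(groups):
    """Partition variance for a one-way between-subjects ANOVA.

    `groups` is a list of lists, one list of scores per group
    (hypothetical layout chosen for this sketch).
    """
    all_scores = [x for g in groups for x in g]
    grand_mean = mean(all_scores)

    ss_total = sum((x - grand_mean) ** 2 for x in all_scores)
    # Systematic variance: how far each group mean is from the grand mean
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Unsystematic variance: how far each score is from its own group mean
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)

    df_between = len(groups) - 1
    df_within = len(all_scores) - len(groups)
    f_value = (ss_between / df_between) / (ss_within / df_within)

    return {
        "SS_total": ss_total,
        "SS_between": ss_between,
        "SS_within": ss_within,
        "df_between": df_between,
        "df_within": df_within,
        "F": f_value,
    }


# Toy data (invented): three groups of three scores each
toy_groups = [[4, 5, 6], [6, 7, 8], [8, 9, 10]]
one_way_result = one_way_anova(toy_groups)
print(one_way_result)
```

A quick sanity check on any such calculation is that SS_between and SS_within add up to SS_total.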
Good resources for learning R as a Psychologist can be hard to find. By that I mean that there are so many great sites and blogs for learning R around the internet that it may be hard to find learning resources that target Psychology researchers.
Recently I wrote about four good R books targeted at Psychology students and researchers (i.e., R books for Psychologists). There are, of course, other good resources for Psychological researchers who want to learn R programming.
Therefore, this post will list some of the best blogs and sites to learn R. The post is divided into two categories: general and Psychology-focused R sites and blogs. For those who are not familiar with R, I will start with a brief introduction to what R is (if you know R already, click here to skip to the links).
What is R?
If you are new to R, you might wonder what R is. R is a free and open-source programming language and environment. Data analysis in R is carried out by writing scripts and functions. Finally, R is a complete, interactive, and object-oriented language.
In the R statistical environment, you are able to carry out a variety of statistical and graphical techniques. For instance, linear and nonlinear modeling, classical statistical tests, time-series analysis, classification, and many more can be carried out using both frequentist and Bayesian paradigms.
If you are new to R, you may want to start with R Commander. This will provide you with a menu, making the process of learning R a bit easier at the beginning. R can be downloaded here: The Comprehensive R Archive Network.
One of the main things that I like about R is the broad and helpful community. This also means that there are many good resources for learning the language.
R is a free and open-source statistical programming environment. Being open-source and free, it has a large and helpful online community (for instance, see StackOverflow). When I went from carrying out analyses in SPSS to doing them in R, I searched for good books targeted at Psychologists. The following 4 R books are useful and good for Psychologists who want to learn R.
The first book, Discovering Statistics Using R, may be a really good start if you are an undergraduate and have no experience with programming or statistics. The next two books range from intermediate to advanced level. The last book is, at the moment, free and is also a great introduction to statistics.