In this brief Python data analysis tutorial, we will learn how to carry out a repeated measures ANOVA using Statsmodels. More specifically, we will learn how to use the AnovaRM class from the statsmodels.stats.anova module.
To follow this guide you will need to have Python, Statsmodels, Pandas, and their dependencies installed. One easy way to get these Python packages is to install a Python distribution such as Anaconda (see this YouTube Video on how to install Anaconda). However, if you already have Python installed you can of course use pip.
How to carry out repeated measures ANOVA using other Python packages:
- Repeated Measures ANOVA using Python
- Two-way ANOVA for repeated measures using Python
- Repeated Measures ANOVA in R and Python using afex & pingouin
This short tutorial is structured so that we first learn how to install Statsmodels and Pandas, and then how to carry out one-way and two-way repeated measures ANOVA using Statsmodels.
Finally, there is a YouTube video showing how to carry out repeated measures ANOVA using Statsmodels and R. This YouTube video will also show some of the differences between Statsmodels and the R package afex and its function aov_ez.
How to Install Statsmodels & Pandas
Statsmodels and Pandas can easily be installed using pip:

    pip install pandas statsmodels

Now, if there’s a newer version of pip available, and you want to get that version, here’s a blog post about how to upgrade pip.
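Once the installation has finished, a quick sanity check is to import both packages and print their versions. This is only a minimal sketch; the version numbers you see will depend on your own environment.

```python
# Confirm that both packages can be imported in the current environment.
import pandas as pd
import statsmodels

print("pandas:", pd.__version__)
print("statsmodels:", statsmodels.__version__)

# AnovaRM lives in statsmodels.stats.anova; importing it confirms that
# the class used in the rest of this tutorial is available.
from statsmodels.stats.anova import AnovaRM
```

If the last import fails, the installed Statsmodels version is probably too old to include AnovaRM.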
Note, if your data is clearly not normally distributed, you should consider transforming it so that it is closer to a normal distribution before running the ANOVA.
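As a rough illustration of what such a transformation can do, the sketch below generates made-up, right-skewed response times and compares the Shapiro-Wilk statistic before and after a log transform. The data, the variable names, and the choice of a log transform are assumptions for the sake of the example; which transformation suits your data (if any) is something you need to judge yourself.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

# Hypothetical right-skewed response times (log-normally distributed)
rt = rng.lognormal(mean=6.0, sigma=0.5, size=200)

# Shapiro-Wilk test: W close to 1 and a large p-value are consistent
# with normality, a small p-value suggests a departure from it
w_raw, p_raw = stats.shapiro(rt)

# Log-transform the data and test again
log_rt = np.log(rt)
w_log, p_log = stats.shapiro(log_rt)

print(f"raw: W = {w_raw:.3f}, p = {p_raw:.4f}")
print(f"log: W = {w_log:.3f}, p = {p_log:.4f}")
```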
How to Use AnovaRM to Carry Out a Repeated Measures ANOVA
In this section we are going to learn how to do a repeated measures ANOVA using Statsmodels. More specifically, we are going to learn how to carry out a one-way and a two-way repeated measures ANOVA in Python. We will use the following five arguments of the AnovaRM class:
- data: the first argument should be a dataframe object.
- depvar: the second argument should be the name of your dependent variable, given as a string (e.g., 'responsetime').
- subject: here you put in the name of your subject identifier column. Should also be a string (e.g., 'subject').
- within: the within-subject factors, given as a list of strings.
- aggregate_func: this is optional and should be used if the data contains more than one observation per participant and condition. Can be "mean" or a function. For instance, you can use the NumPy mean (i.e., np.mean).
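To make the list above concrete, here is a minimal sketch that passes each argument by keyword. The dataframe is made up for the example (8 hypothetical subjects, 3 conditions, invented response times), so only the shape of the call matters here, not the numbers.

```python
import numpy as np
import pandas as pd
from statsmodels.stats.anova import AnovaRM

rng = np.random.default_rng(0)

# Long format: one row per subject and condition (8 subjects x 3 conditions)
subjects = np.repeat(np.arange(1, 9), 3)
conditions = np.tile(["a", "b", "c"], 8)

# Invented response times, with condition "c" made slower on purpose
responsetime = rng.normal(500, 50, size=24) + np.where(conditions == "c", 40, 0)

data = pd.DataFrame({
    "subject": subjects,
    "condition": conditions,
    "responsetime": responsetime,
})

model = AnovaRM(
    data=data,                # the dataframe
    depvar="responsetime",    # name of the dependent variable
    subject="subject",        # name of the subject identifier column
    within=["condition"],     # list of within-subject factors
)
results = model.fit()
print(results)
```

Because every subject has exactly one row per condition here, aggregate_func is not needed.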
Note, if you only have two matched measurements per subject (i.e., only two levels of a factor) you can instead use Python to carry out the paired sample t-test.
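With only two levels, the two approaches are equivalent: the F value from the repeated measures ANOVA equals the squared t value from the paired t-test, and the p-values match. The sketch below checks this on made-up data using scipy.stats.ttest_rel; the column names mirror the ones used later in this post, but the values are invented.

```python
import numpy as np
import pandas as pd
from scipy import stats
from statsmodels.stats.anova import AnovaRM

rng = np.random.default_rng(1)
n = 10  # hypothetical number of subjects

# Invented paired measurements: each subject is measured in both conditions
quiet = rng.normal(500, 40, size=n)
noise = quiet + rng.normal(30, 20, size=n)

# Paired sample t-test on the two sets of matched values
t_stat, p_t = stats.ttest_rel(noise, quiet)

# The same data in long format for AnovaRM
long_df = pd.DataFrame({
    "Sub_id": np.tile(np.arange(n), 2),
    "cond": np.repeat(["noise", "quiet"], n),
    "rt": np.concatenate([noise, quiet]),
})
table = AnovaRM(long_df, "rt", "Sub_id", within=["cond"]).fit().anova_table
f_value = table["F Value"].iloc[0]
p_f = table["Pr > F"].iloc[0]

print("t squared:", t_stat ** 2, "F:", f_value)
print("p (t-test):", p_t, "p (ANOVA):", p_f)
```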
One-way ANOVA for Repeated Measures Using Statsmodels
First, we start with the one-way ANOVA. In the examples below, we are going to use Pandas and the AnovaRM class from statsmodels. In the first example, we use the Pandas read_csv function to load this data into a dataframe. See my Python Pandas Dataframe tutorial if you need to learn more about Pandas dataframes.

    import pandas as pd
    from statsmodels.stats.anova import AnovaRM

    df = pd.read_csv('rmAOV1way.csv')

We can use Pandas head() to have a look at the first five rows (i.e., df.head()):
As can be seen above we have the columns Sub_id, rt, and cond. These columns represent the subject identifier, the dependent variable, and the independent variable, respectively. Note, there are two levels of cond (using df.cond.unique() will show us noise and quiet).
Python One-way Repeated Measures ANOVA Example:
In the Statsmodels ANOVA example below we use our dataframe object, df, as the first argument, followed by our dependent variable ('rt'), the subject identifier ('Sub_id'), and a list containing the independent (within-subject) variable, 'cond'. On the second line we fit the model so that we can print the ANOVA table.

    aovrm = AnovaRM(df, 'rt', 'Sub_id', within=['cond'])
    res = aovrm.fit()
    print(res)

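Printing the result gives a formatted table, but the fitted result also exposes the numbers as a Pandas dataframe through its anova_table attribute, which is handy if you want to report them. The sketch below uses a made-up stand-in for the tutorial data (same column names, invented values, 12 hypothetical subjects), and the names df_demo and report are just for illustration.

```python
import numpy as np
import pandas as pd
from statsmodels.stats.anova import AnovaRM

rng = np.random.default_rng(2)
n_sub = 12  # hypothetical number of subjects

# Stand-in data: same column names as in the tutorial, invented values
df_demo = pd.DataFrame({
    "Sub_id": np.repeat(np.arange(1, n_sub + 1), 2),
    "cond": np.tile(["noise", "quiet"], n_sub),
})
df_demo["rt"] = (
    rng.normal(600, 60, size=len(df_demo))
    + np.where(df_demo["cond"] == "noise", 50, 0)
)

res_demo = AnovaRM(df_demo, "rt", "Sub_id", within=["cond"]).fit()

# anova_table is a dataframe with one row per effect
table = res_demo.anova_table
row = table.loc["cond"]

# Build a short report string from the row
report = (
    f"F({row['Num DF']:.0f}, {row['Den DF']:.0f}) = {row['F Value']:.2f}, "
    f"p = {row['Pr > F']:.3f}"
)
print(report)
```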
In the second example we will also use the parameter aggregate_func. Click here to download the dataset.

    flanks = pd.read_csv('flanks.csv')
    res = AnovaRM(flanks, 'RT', 'SubID', within=['TrialType'], aggregate_func='mean')
    print(res.fit())

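To see what aggregate_func is doing, we can average the trials ourselves with a Pandas groupby and compare the two results. The sketch below uses made-up trial-level data (6 hypothetical subjects, 2 trial types, 5 trials per cell) with the same column names as above; it is not the flanks dataset.

```python
import numpy as np
import pandas as pd
from statsmodels.stats.anova import AnovaRM

rng = np.random.default_rng(3)

# Invented trial-level data: several rows per subject and trial type
rows = [
    {"SubID": s, "TrialType": t, "RT": rng.normal(450 + 40 * (t == "incongruent"), 30)}
    for s in range(1, 7)
    for t in ("congruent", "incongruent")
    for _ in range(5)
]
trials = pd.DataFrame(rows)

# Let AnovaRM average the trials within each subject/condition cell
auto = AnovaRM(trials, "RT", "SubID", within=["TrialType"], aggregate_func="mean").fit()

# Do the same aggregation by hand, then run the ANOVA on one row per cell
cell_means = trials.groupby(["SubID", "TrialType"], as_index=False)["RT"].mean()
manual = AnovaRM(cell_means, "RT", "SubID", within=["TrialType"]).fit()

print(auto.anova_table)
print(manual.anova_table)
```

Both tables should be identical. If we leave out aggregate_func for data like trials, Statsmodels refuses to run and raises an error asking us to aggregate first.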
Two-way ANOVA for Repeated Measures Using Statsmodels
Next, we continue with the two-way ANOVA. In the example below we are also using Pandas and the AnovaRM class from statsmodels. The example data can be downloaded here.
Two-Way ANOVA Using Statsmodels Example:
Notice the difference between the one-way ANOVA and the two-way ANOVA; the list now contains 2 variables.

    df2way = pd.read_csv('rmAOV2way.csv')
    aovrm2way = AnovaRM(df2way, 'rt', 'Sub_id', within=['iv1', 'iv2'])
    res2way = aovrm2way.fit()
    print(res2way)

The ANOVA table when carrying out a two-way ANOVA using Statsmodels looks like this:
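If you do not have the example file at hand, the sketch below shows the general shape of a two-way table on made-up data: 10 hypothetical subjects, a factor iv1 with two levels, and a factor iv2 with three levels. The level names and values are invented, so the numbers will not match the tutorial's data. The table has three rows: one for each main effect and one for the interaction.

```python
import itertools

import numpy as np
import pandas as pd
from statsmodels.stats.anova import AnovaRM

rng = np.random.default_rng(4)

subjects = range(1, 11)
iv1_levels = ["a1", "a2"]        # hypothetical level names
iv2_levels = ["b1", "b2", "b3"]  # hypothetical level names

# One row for every subject x iv1 x iv2 combination (fully crossed design)
cells = list(itertools.product(subjects, iv1_levels, iv2_levels))
demo2way = pd.DataFrame(cells, columns=["Sub_id", "iv1", "iv2"])
demo2way["rt"] = rng.normal(500, 50, size=len(demo2way))

# AnovaRM needs balanced data: exactly one row per subject in each cell
counts = demo2way.groupby(["Sub_id", "iv1", "iv2"]).size()
balanced = bool((counts == 1).all())
print("balanced:", balanced)

table2way = AnovaRM(demo2way, "rt", "Sub_id", within=["iv1", "iv2"]).fit().anova_table
print(table2way)
```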
Note, if you only have two groups and your data is independent you can either carry out a two-sample t-test or a Mann-Whitney U test.
Repeated Measures ANOVA: R vs. Python (YouTube Video)
Finally, here’s the YouTube video covering how to carry out repeated measures ANOVA using Python and R. It will further show some of the differences between the function aov_ez and AnovaRM. Hint, there are more arguments available in aov_ez and it will calculate effect sizes, among other things.
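AnovaRM does not report effect sizes, but one common measure, partial eta squared, can be computed from the values in the ANOVA table as F × df_effect / (F × df_effect + df_error). The sketch below does this on made-up data; the column name partial_eta_sq is my own choice, and the result is not necessarily the same effect size measure that aov_ez reports.

```python
import numpy as np
import pandas as pd
from statsmodels.stats.anova import AnovaRM

rng = np.random.default_rng(5)
n_sub = 15  # hypothetical number of subjects

# Invented one-way data with three levels of cond
demo = pd.DataFrame({
    "Sub_id": np.repeat(np.arange(n_sub), 3),
    "cond": np.tile(["low", "mid", "high"], n_sub),
})
demo["rt"] = rng.normal(500, 40, size=len(demo))

table = AnovaRM(demo, "rt", "Sub_id", within=["cond"]).fit().anova_table.copy()

# Partial eta squared from F and the degrees of freedom:
# SS_effect / (SS_effect + SS_error) = F * df1 / (F * df1 + df2)
effect = table["F Value"] * table["Num DF"]
table["partial_eta_sq"] = effect / (effect + table["Den DF"])
print(table)
```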
That was it, now you know how to carry out one-way and two-way ANOVA for repeated measures using Python Statsmodels.
Hey, thanks for the awesome tutorials! They have been super helpful. Just wanted to let you know that your link for the github data under the header “Two-way ANOVA for Repeated Measures Using Statsmodels” is the same as the link for the one-way. I was able to find the data for that data set by simply changing the “rmAOV1way” to “rmAOV2way” in the URL, but others might not realize the data isn’t right for the 2 way model.
Hey Andrew! Glad you found them helpful! Thank you for your comment about the example data. I’ve updated the post to point at the right CSV file. Again, thanks!