Repeated Measures ANOVA in Python using Statsmodels


In this brief Python data analysis tutorial, we will learn how to carry out repeated measures ANOVA in Python using Statsmodels. More specifically, we will learn how to use the AnovaRM class from the statsmodels anova module.

Outline

The outline of the post is as follows. We will explore the methodology of conducting a Repeated Measures Analysis of Variance (ANOVA) using the AnovaRM class in Python. This guide will cover both one-way and two-way ANOVA for repeated measures, showcasing the versatility of Statsmodels in handling such analyses.

In the first section, we will implement one-way ANOVA for repeated measures using Statsmodels. Next, we will explore the application of two-way ANOVA for repeated measures in Python. By analyzing a dataset with multiple factors and repeated measures, we will showcase the power of Statsmodels in handling complex experimental designs.

To provide a comprehensive perspective, we will include a YouTube video comparing the process of conducting Repeated Measures ANOVA in both Python and R. This comparative analysis will shed light on the differences and similarities between the two popular programming languages.

By delving into the AnovaRM class, we will discover the insights it offers into within-subject variations, interactions, and overall statistical significance. This post aims to equip you with the knowledge and skills to perform repeated measures ANOVA in Python confidently and precisely.

Prerequisites

Before getting into Repeated Measures ANOVA using Statsmodels in Python, there are a few requirements to ensure a smooth learning experience. First and foremost, make sure that you have both Statsmodels and Pandas installed in your Python environment. One easy way to install these Python packages is to use a Python distribution such as Anaconda (see this YouTube Video on how to install Anaconda). However, if you already have Python installed, you can, of course, use Pip.

Before proceeding, ensure that you have the latest version of pip installed. Upgrading pip can be done by running the following command in your terminal or command prompt:

```bash
pip install --upgrade pip
```

How to Install Statsmodels & Pandas

Statsmodels and Pandas can easily be installed using pip:

```bash
pip install pandas statsmodels
```

Now, if a newer version of pip is available, and you want to get that version, here’s a blog post about how to upgrade pip. If you need specific versions of the packages, you can pin them with ==:

```bash
pip install statsmodels==0.12.2
pip install pandas==1.3.0
```
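To confirm that the installation worked, one option is to print the installed versions from Python. This is only a minimal sketch; the dictionary name versions is just an illustration.

```python
import pandas as pd
import statsmodels

# Collect the installed version strings of the two packages used in this post
versions = {'pandas': pd.__version__, 'statsmodels': statsmodels.__version__}

for name, version in versions.items():
    print(f'{name}: {version}')
```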

With these prerequisites in place, you will be well-equipped to learn how to do Repeated Measures ANOVA using Python and use the powerful classes of Statsmodels for advanced statistical analysis.

How to Use AnovaRM to Carry Out a Repeated Measures ANOVA

In this section, we will learn how to do repeated measures ANOVA using Statsmodels. More specifically, we will learn how to carry out one-way ANOVA and two-way ANOVA in Python. We will use the following five arguments of the AnovaRM class:

• data: the first argument should be a dataframe object.
• depvar: the second argument should be the name of your dependent variable. Should be a string (e.g., ‘responsetime’).
• subject: here you put in the name of your subject identifier. Should also be a string (e.g., ‘subject’).
• within: the within-subject factors in a list of strings.
• aggregate_func: this is optional and should be used if the data contains more than a single observation per participant and condition. Can be “mean” or a function. For instance, you can use Numpy mean (i.e., np.mean).
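As a rough sketch of how these arguments fit together, the example below builds a small made-up dataset in long format (one row per subject and condition) and passes it to AnovaRM using keyword arguments. The dataframe toy, its column names, and the random values are assumptions for illustration only and are not the data used later in this post.

```python
import numpy as np
import pandas as pd
from statsmodels.stats.anova import AnovaRM

# Hypothetical long-format data: 10 subjects x 3 conditions, one row per cell
rng = np.random.default_rng(42)
toy = pd.DataFrame({
    'subject': np.repeat(np.arange(1, 11), 3),
    'condition': np.tile(['a', 'b', 'c'], 10),
    'responsetime': rng.normal(loc=500, scale=50, size=30),
})

# data, depvar, subject, and within map directly onto the list above
model = AnovaRM(data=toy, depvar='responsetime', subject='subject',
                within=['condition'])
toy_res = model.fit()

# The fitted result exposes the ANOVA table as a Pandas dataframe
print(toy_res.anova_table)
```

With 10 subjects and 3 levels, the table should report 2 numerator and 18 denominator degrees of freedom for condition.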

Note, if you only have two matched measurements per subject (i.e., only two levels of a factor), you can instead use Python to carry out the paired sample t-test.
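To illustrate why the paired sample t-test is an alternative here, the sketch below runs both analyses on the same made-up data. With only two levels, the F value from the repeated measures ANOVA equals the squared t statistic. SciPy's ttest_rel is used for the t-test; the data, and the names wide and long_df, are assumptions for this example.

```python
import numpy as np
import pandas as pd
from scipy import stats
from statsmodels.stats.anova import AnovaRM

# Hypothetical data: 12 subjects measured once in each of two conditions
rng = np.random.default_rng(0)
n = 12
quiet = rng.normal(500, 40, n)
noise = quiet + rng.normal(30, 20, n)
wide = pd.DataFrame({'Sub_id': np.arange(n), 'quiet': quiet, 'noise': noise})

# Paired sample t-test on the two columns
t_stat, p_t = stats.ttest_rel(wide['noise'], wide['quiet'])

# AnovaRM expects long format: one row per subject and condition
long_df = wide.melt(id_vars='Sub_id', var_name='cond', value_name='rt')
table = AnovaRM(long_df, 'rt', 'Sub_id', within=['cond']).fit().anova_table
f_val = table.loc['cond', 'F Value']

# For a factor with two levels, F should equal t squared
print(t_stat ** 2, f_val)
```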

One-way ANOVA for Repeated Measures Using Statsmodels

First, we start with the one-way ANOVA. The examples below will use Pandas and the AnovaRM class from statsmodels. In the first example, we use the Pandas read_csv method to load the data into a dataframe. See my Python Pandas Dataframe tutorial to learn more about Pandas dataframes.

```python
import pandas as pd
from statsmodels.stats.anova import AnovaRM

df = pd.read_csv('rmAOV1way.csv')
```

We can use Pandas head() to have a look at the first five rows (i.e., df.head()):

The dataframe has the columns Sub_id, rt, and cond. These columns represent the subject identifier, the dependent variable, and the independent variable, respectively. Note there are two levels of cond (using df.cond.unique() will show us noise and quiet).
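Before fitting, it can help to check the factor levels and whether every subject has exactly one observation per condition, since AnovaRM expects a balanced design unless aggregate_func is used. The sketch below does this on a tiny made-up dataframe, demo, that only mimics the column layout described above; its values are not taken from rmAOV1way.csv.

```python
import pandas as pd

# Hypothetical stand-in with the same columns as the example data
demo = pd.DataFrame({
    'Sub_id': [1, 1, 2, 2, 3, 3],
    'cond': ['noise', 'quiet'] * 3,
    'rt': [650, 510, 700, 560, 620, 540],
})

# The levels of the within-subject factor
levels = demo['cond'].unique()
print(levels)

# Number of observations per subject (rows) and condition (columns)
cell_counts = demo.groupby(['Sub_id', 'cond']).size().unstack(fill_value=0)
print(cell_counts)

# True if every subject has exactly one observation in every condition
is_balanced = bool((cell_counts == 1).all().all())
print(is_balanced)
```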

Python One-way Repeated Measures ANOVA Example:

In the Statsmodels ANOVA example below, we use our dataframe object, df, as the first argument, followed by our dependent variable (‘rt’), the subject identifier (‘Sub_id’), and a list containing the within-subject factor (i.e., our independent variable), ‘cond’. In the second row, we fit the model, and then we print the ANOVA table.

```python
aovrm = AnovaRM(df, 'rt', 'Sub_id', within=['cond'])
res = aovrm.fit()

print(res)
```

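If the data contains more than one observation per subject and condition, we can use the aggregate_func argument to average the observations before the ANOVA is carried out:

```python
flanks = pd.read_csv('flanks.csv')

# Average the observations within each subject and trial type before fitting
aovrm_flanks = AnovaRM(flanks, 'RT', 'SubID', within=['TrialType'], aggregate_func='mean')

print(aovrm_flanks.fit())
```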
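To make the effect of aggregate_func more concrete, the following sketch simulates several trials per subject and trial type and compares two routes: letting AnovaRM aggregate with 'mean', and averaging manually with a Pandas groupby first. The simulated dataframe trials and its values are assumptions for illustration; only the column names are borrowed from the example above.

```python
import numpy as np
import pandas as pd
from statsmodels.stats.anova import AnovaRM

# Hypothetical data: 8 subjects x 3 trial types x 5 trials per cell
rng = np.random.default_rng(1)
rows = []
for sub in range(1, 9):
    for trial_type, shift in [('congruent', 0), ('incongruent', 40), ('neutral', 15)]:
        for _ in range(5):
            rows.append({'SubID': sub, 'TrialType': trial_type,
                         'RT': 450 + shift + rng.normal(0, 30)})
trials = pd.DataFrame(rows)

# Route 1: let AnovaRM average the trials within each cell
res_agg = AnovaRM(trials, 'RT', 'SubID', within=['TrialType'],
                  aggregate_func='mean').fit()

# Route 2: average manually, then fit on one row per subject and trial type
cell_means = trials.groupby(['SubID', 'TrialType'], as_index=False)['RT'].mean()
res_manual = AnovaRM(cell_means, 'RT', 'SubID', within=['TrialType']).fit()

print(res_agg.anova_table)
print(res_manual.anova_table)
```

Both routes should give the same ANOVA table, because the model is fitted on the same cell means in each case.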

If your data is not normally distributed, you should consider transforming it (e.g., using a log transformation) so that its distribution is closer to normal.
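As one possible illustration, the sketch below simulates right-skewed response times and compares the skewness before and after a log transformation. The log transformation is only one common choice and not necessarily the right one for your data; the simulated values and variable names are assumptions.

```python
import numpy as np
from scipy import stats

# Hypothetical right-skewed response times (log-normally distributed)
rng = np.random.default_rng(7)
rt_raw = rng.lognormal(mean=6.2, sigma=0.5, size=200)

# Log-transform the values
rt_log = np.log(rt_raw)

# Skewness close to 0 suggests a more symmetric distribution
skew_raw = stats.skew(rt_raw)
skew_log = stats.skew(rt_log)
print(f'skewness before: {skew_raw:.2f}, after: {skew_log:.2f}')
```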


Two-way ANOVA for Repeated Measures Using Statsmodels

Next, we continue with the two-way ANOVA. The example below uses Pandas and the AnovaRM class from statsmodels. The example data can be downloaded here.

Two-Way ANOVA Using Statsmodels Example:

Notice the difference between the one-way ANOVA and the two-way ANOVA; the list we pass to the within argument now contains two variables.

```python
df2way = pd.read_csv('rmAOV2way.csv')
aovrm2way = AnovaRM(df2way, 'rt', 'Sub_id', within=['iv1', 'iv2'])
res2way = aovrm2way.fit()

print(res2way)
```

The ANOVA table, when carrying out a two-way ANOVA using statsmodels, looks like this:
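The table contains one row for each main effect and one for the interaction, with the columns F Value, Num DF, Den DF, and Pr > F. As a sketch of how to inspect it, the example below fits a two-way model on simulated data and reads the table from the anova_table attribute. The simulated dataframe sim2way and its values are assumptions and do not reproduce the results for rmAOV2way.csv.

```python
import itertools

import numpy as np
import pandas as pd
from statsmodels.stats.anova import AnovaRM

# Hypothetical data: 10 subjects x 2 levels of iv1 x 3 levels of iv2
rng = np.random.default_rng(3)
rows = [
    {'Sub_id': sub, 'iv1': a, 'iv2': b, 'rt': 500 + rng.normal(0, 40)}
    for sub, a, b in itertools.product(range(1, 11), ['a1', 'a2'], ['b1', 'b2', 'b3'])
]
sim2way = pd.DataFrame(rows)

sim_res = AnovaRM(sim2way, 'rt', 'Sub_id', within=['iv1', 'iv2']).fit()

# The ANOVA table as a dataframe: rows are effects, columns are statistics
table2way = sim_res.anova_table
print(table2way)

# For example, the p-value of the interaction
print(table2way.loc['iv1:iv2', 'Pr > F'])
```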

Note, if you only have two groups and your data is independent, you can carry out a two-sample t-test or a Mann-Whitney U test.

Repeated Measures ANOVA: R vs. Python (YouTube Video)

Finally, here’s the YouTube video covering how to carry out repeated measures ANOVA using Python and R. It will further show some differences between the function aov_ez and AnovaRM. Hint, more arguments are available in aov_ez, and it will calculate effect sizes, among other things.
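Since the AnovaRM table does not include an effect size column, one possible workaround is to derive partial eta squared from the F value and the degrees of freedom, using the formula F × df1 / (F × df1 + df2). The helper partial_eta_squared below is hypothetical and not part of statsmodels, the numbers in the table are made up, and partial eta squared is not necessarily the same effect size that aov_ez reports.

```python
import pandas as pd


def partial_eta_squared(f_value, df_num, df_den):
    """Partial eta squared computed from an F value and its degrees of freedom."""
    return (f_value * df_num) / (f_value * df_num + df_den)


# Made-up table using the same column names as AnovaRM's anova_table
example_table = pd.DataFrame(
    {'F Value': [4.0, 9.0], 'Num DF': [2.0, 1.0], 'Den DF': [18.0, 9.0]},
    index=['iv2', 'iv1'],
)

# Add the effect size as a new column
example_table['eta_p2'] = partial_eta_squared(
    example_table['F Value'], example_table['Num DF'], example_table['Den DF']
)
print(example_table)
```

With a fitted AnovaRM result, the same function could be applied to the columns of its anova_table attribute.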

Conclusion

In conclusion, this post has provided a comprehensive guide to conducting Repeated Measures ANOVA in Python using the powerful Statsmodels library. We explored the implementation of one-way and two-way ANOVA for repeated measures, gaining valuable insights into the analysis of within-subjects designs.

With Python and Statsmodels, we can efficiently carry out Repeated Measures ANOVA, facilitating data-driven decision-making and enhancing the understanding of experiments with within-subjects designs. Leveraging the flexibility and capabilities of Python, users can perform sophisticated statistical analyses on complex datasets, leading to more informed conclusions and deeper insights into the factors influencing experimental outcomes.

The comparison of Repeated Measures ANOVA in Python and R through the provided YouTube video further enriches the post, highlighting each approach’s strengths and nuances and empowering users to make informed choices based on their preferences and requirements.

If you found this post insightful and helpful, please share it on social media. By spreading the knowledge, you contribute to the growth of the data science community. Finally, I welcome your feedback, suggestions, and comments below, as they help me improve the content and cater to my readers’ specific needs.



