
Tag: ANOVA

Python MANOVA Made Easy using Statsmodels

In previous posts, we learned how to use Python to detect group differences on a single dependent variable. However, there may be situations in which we are interested in several dependent variables. In these situations, the simple ANOVA model is inadequate.

One way to examine multiple dependent variables using Python would, of course, be to carry out multiple ANOVAs, that is, one ANOVA for each dependent variable. However, the more tests we conduct on the same data, the more we inflate the family-wise error rate (i.e., the greater the chance of making at least one Type I error).
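To make the inflation concrete, here is a minimal sketch of how the family-wise error rate grows with the number of tests. It assumes the tests are independent, which is a simplification (ANOVAs on correlated dependent variables are not), and the function name is made up for illustration.

```python
def familywise_error_rate(alpha: float, n_tests: int) -> float:
    """Chance of at least one Type I error across n_tests independent tests."""
    return 1 - (1 - alpha) ** n_tests


# One ANOVA per dependent variable, each tested at alpha = 0.05
for n_tests in (1, 2, 4, 10):
    rate = familywise_error_rate(0.05, n_tests)
    print(f"{n_tests:>2} tests -> family-wise error rate = {rate:.3f}")
```

With four dependent variables tested separately, the chance of at least one false positive is already well above the nominal 5%.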

This is where MANOVA comes in handy. MANOVA, or Multivariate Analysis of Variance, is an extension of Analysis of Variance (ANOVA) to two or more dependent variables.

MANOVA and ANOVA are similar when it comes to some of the assumptions. That is, MANOVA assumes:

  • normally distributed dependent variables
  • equal covariance matrices across groups

In this post we will learn how to carry out MANOVA using Python (i.e., we will use Pandas and Statsmodels). Here, we are going to use the Iris dataset, which can be downloaded here.
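As a rough sketch of what such an analysis can look like, the example below fits a MANOVA with the Statsmodels MANOVA class. To stay self-contained it does not load the Iris file; instead it simulates Iris-like data, so the column names (species, sepal_length, sepal_width), group means, and spreads are all assumptions made for illustration.

```python
import numpy as np
import pandas as pd
from statsmodels.multivariate.manova import MANOVA

# Simulated stand-in for the Iris data (values are made up, not the real dataset)
rng = np.random.default_rng(42)
species_means = {
    "setosa": (5.0, 3.4),
    "versicolor": (5.9, 2.8),
    "virginica": (6.6, 3.0),
}
rows = []
for species, (mean_length, mean_width) in species_means.items():
    for _ in range(30):
        rows.append({
            "species": species,
            "sepal_length": rng.normal(mean_length, 0.4),
            "sepal_width": rng.normal(mean_width, 0.3),
        })
df = pd.DataFrame(rows)

# Two dependent variables on the left-hand side, the grouping factor on the right
manova = MANOVA.from_formula("sepal_length + sepal_width ~ species", data=df)
result = manova.mv_test()
print(result)

# Multivariate test statistics (Wilks' lambda, Pillai's trace, ...) for the factor
species_stats = result.results["species"]["stat"]
```

With the real data, the DataFrame would come from pd.read_csv and the formula would use whatever column names the file actually has.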

Repeated Measures ANOVA in R and Python using afex & pingouin

In this post we will learn how to carry out repeated measures Analysis of Variance (ANOVA) in R and Python. To be specific, we will use the R package afex and the Python package pingouin to carry out one-way and two-way ANOVA for within-subjects designs. The structure of the following data analysis tutorial is as follows: a brief introduction to (repeated measures) ANOVA, then carrying out within-subjects ANOVA in R using afex and in Python using pingouin. In the end, there will be a comparison of the results and the pros and cons of using R or Python for data analysis (i.e., ANOVA).

Repeated Measures ANOVA in Python using Statsmodels

In this brief Python data analysis tutorial we will learn how to carry out a repeated measures ANOVA using Statsmodels. More specifically, we will learn how to use the AnovaRM class from the Statsmodels anova module.

To follow this guide you will need to have Python, Statsmodels, Pandas, and their dependencies installed. One easy way to get these Python packages installed is to install a Python distribution such as Anaconda (see this YouTube Video on how to install Anaconda). However, if you already have Python installed you can of course use Pip.
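Once the packages are installed, a one-way repeated measures ANOVA with AnovaRM might look like the sketch below. The data are simulated and in long format (one row per subject and condition); the column names subject, condition, and rt, as well as the effect sizes, are assumptions for illustration only.

```python
import numpy as np
import pandas as pd
from statsmodels.stats.anova import AnovaRM

# Simulated long-format data: 10 subjects, each measured in 3 conditions
rng = np.random.default_rng(0)
condition_effect = {"low": 0.0, "medium": 30.0, "high": 60.0}
rows = []
for subject in range(1, 11):
    baseline = rng.normal(500, 40)  # subject-specific baseline response time
    for condition, effect in condition_effect.items():
        rows.append({
            "subject": subject,
            "condition": condition,
            "rt": baseline + effect + rng.normal(0, 15),
        })
df = pd.DataFrame(rows)

# depvar: dependent variable, subject: subject identifier, within: within-subject factor(s)
aovrm = AnovaRM(df, depvar="rt", subject="subject", within=["condition"])
res = aovrm.fit()
print(res)

# The ANOVA table is also available as a DataFrame
table = res.anova_table
```

AnovaRM expects exactly one observation per subject and cell, so repeated trials need to be averaged first (or passed through its aggregate_func argument).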

Step-by-step guide for solving the Pyvttbl Float and NoneType error

In this short post I will show you a quick fix for the error “unsupported operand type(s) for +: ‘float’ and ‘NoneType’” with Pyvttbl. In earlier posts I have shown how to carry out ANOVA using Pyvttbl, among other packages (see posts 1, 2, 3, and 4 for ANOVA using Pyvttbl).

However, Pyvttbl is not compatible with NumPy versions greater than 1.11 (e.g., 1.12.0, which I am running). This may, of course, be because Pyvttbl has not been updated in quite some time.

My solution to this problem involves setting up a Python virtual environment (the setup of the virtual environment is based on The Hitchhiker's Guide to Python). You will learn how to set up the virtual environment in Linux and Windows.

Two-way ANOVA for repeated measures using Python

Previously I have shown how to analyze data collected using within-subjects designs using rpy2 (i.e., R from within Python) and Pyvttbl. In this post I will extend it into a factorial ANOVA using Python (i.e., Pyvttbl). In fact, we are going to carry out a two-way ANOVA, but the same method will enable you to analyze any factorial design. I start by importing the Python libraries that are going to be used.

Three ways to do a two-way ANOVA with Python

In an earlier post I showed four different techniques that enable one-way analysis of variance (ANOVA) using Python. In this post we are going to learn how to do two-way ANOVA for independent measures using Python.

First, we are going to learn how to calculate the ANOVA table “by hand”. Second, we are going to use Statsmodels and, third, we carry out the ANOVA in Python using pyvttbl. Finally, as a bonus, we will also use Pingouin, a newer Python package.

An important advantage of the two-way ANOVA is that it is more efficient compared to the one-way. There are two assignable sources of variation – supp and dose in our example – and this helps to reduce error variation thereby making this design more efficient.

Two-way ANOVA (factorial) can be used to, for instance, compare the means of populations that are different in two ways. It can also be used to analyse the mean responses in an experiment with two factors. Unlike One-Way ANOVA, it enables us to test the effect of two factors at the same time.

One can also test for independence of the factors, provided there is more than one observation in each cell. The only restriction is that the number of observations in each cell has to be equal (there is no such restriction in the case of one-way ANOVA).
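The "by hand" calculation for a balanced design can be sketched as follows. The numbers are hypothetical (they are not the post's dataset); only the factor names supp and dose are taken from the text, and the helper function is made up for illustration. The sketch relies on equal cell sizes, as described above.

```python
from statistics import mean

# Hypothetical balanced data: 2 levels of supp x 3 levels of dose, 3 observations per cell
cells = {
    ("OJ", 0.5): [13.0, 15.0, 14.0],
    ("OJ", 1.0): [22.0, 23.0, 24.0],
    ("OJ", 2.0): [26.0, 25.0, 27.0],
    ("VC", 0.5): [8.0, 7.0, 9.0],
    ("VC", 1.0): [16.0, 17.0, 18.0],
    ("VC", 2.0): [26.0, 27.0, 25.0],
}

supp_levels = sorted({supp for supp, _ in cells})
dose_levels = sorted({dose for _, dose in cells})
n = len(next(iter(cells.values())))  # observations per cell (equal in every cell)

all_obs = [x for obs in cells.values() for x in obs]
grand_mean = mean(all_obs)


def level_mean(position, level):
    """Mean of all observations whose cell key has `level` at `position`."""
    return mean(x for key, obs in cells.items() if key[position] == level for x in obs)


# Sums of squares for the two main effects, the interaction, and the error term
ss_supp = n * len(dose_levels) * sum((level_mean(0, s) - grand_mean) ** 2 for s in supp_levels)
ss_dose = n * len(supp_levels) * sum((level_mean(1, d) - grand_mean) ** 2 for d in dose_levels)
ss_cells = n * sum((mean(obs) - grand_mean) ** 2 for obs in cells.values())
ss_inter = ss_cells - ss_supp - ss_dose
ss_within = sum((x - mean(obs)) ** 2 for obs in cells.values() for x in obs)
ss_total = sum((x - grand_mean) ** 2 for x in all_obs)

# Degrees of freedom
df_supp = len(supp_levels) - 1
df_dose = len(dose_levels) - 1
df_inter = df_supp * df_dose
df_within = len(all_obs) - len(cells)

# F ratio for each effect: its mean square divided by the within-cell mean square
ms_within = ss_within / df_within
for name, ss, df in [("supp", ss_supp, df_supp), ("dose", ss_dose, df_dose),
                     ("supp:dose", ss_inter, df_inter)]:
    print(f"{name:<10} SS = {ss:8.2f}  df = {df}  F = {(ss / df) / ms_within:8.2f}")
```

The interaction row is the test of whether the two factors act independently; it can only be estimated because each cell holds more than one observation, which leaves within-cell variation for the error term.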

Four Ways to Conduct One-Way ANOVA with Python

The current post will focus on how to carry out between-subjects ANOVA in Python. As mentioned in an earlier post (Repeated measures ANOVA with Python), ANOVAs are commonly used in Psychology. We start with a brief introduction to the theory of ANOVA.

If you are more interested in the four methods to carry out one-way ANOVA with Python, click here.

In this post we will learn how to carry out ANOVA using SciPy, calculating it “by hand” in Python, using Statsmodels, and Pyvttbl. 

Update: the Python package Pyvttbl has not been maintained for a couple of years, but there is a new package called Pingouin. As a bonus, how to use this package is added at the end of the post.

Introduction to ANOVA

Before we learn how to do ANOVA in Python, we will briefly discuss what ANOVA is. ANOVA is a means of comparing the ratio of systematic variance to unsystematic variance in an experimental study. Variance in the ANOVA is partitioned into total variance, variance due to groups, and variance due to individual differences.

Figure: Partitioning of variance in the ANOVA. SS stands for Sum of Squares.
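The partitioning can be checked numerically with a small sketch. The three groups below are hypothetical values chosen for easy arithmetic; the point is only that the total sum of squares splits into a between-groups part and a within-groups part, and that the F ratio compares the two.

```python
from statistics import mean

# Hypothetical scores for three independent groups
groups = {
    "control": [4.0, 5.0, 6.0],
    "treatment_a": [6.0, 7.0, 8.0],
    "treatment_b": [9.0, 10.0, 11.0],
}

all_obs = [x for obs in groups.values() for x in obs]
grand_mean = mean(all_obs)

# Systematic variation: group means around the grand mean
ss_between = sum(len(obs) * (mean(obs) - grand_mean) ** 2 for obs in groups.values())
# Unsystematic variation: individual scores around their own group mean
ss_within = sum((x - mean(obs)) ** 2 for obs in groups.values() for x in obs)
# Total variation: individual scores around the grand mean
ss_total = sum((x - grand_mean) ** 2 for x in all_obs)

df_between = len(groups) - 1
df_within = len(all_obs) - len(groups)

# F is the ratio of the between-groups to the within-groups mean square
f_value = (ss_between / df_between) / (ss_within / df_within)

print(f"SS total = {ss_total:.2f}, SS between = {ss_between:.2f}, SS within = {ss_within:.2f}")
print(f"F({df_between}, {df_within}) = {f_value:.2f}")
```

Turning the F value into a p-value needs the F distribution, which is what packages such as SciPy and Statsmodels provide.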

Repeated measures ANOVA using Python

A common method in experimental psychology is the within-subjects design. One way to analyze the data collected using within-subjects designs is repeated measures ANOVA. I recently wrote a post on how to conduct a repeated measures ANOVA using Python and rpy2. I wrote that post since the great Python package statsmodels does not include repeated measures ANOVA. However, the approach using rpy2 requires the R statistical environment to be installed. Recently, I found a Python library called pyvttbl with which you can do within-subjects ANOVAs. Pyvttbl enables you to create multidimensional pivot tables, process data, and carry out statistical tests. Using the method anova on pyvttbl’s DataFrame, we can carry out repeated measures ANOVA using only Python.

Note: pyvttbl is no longer maintained, so see the newer post that uses Pingouin for carrying out repeated measures ANOVA: Repeated Measures ANOVA in R and Python using afex & pingouin