
Four ways to conduct one-way ANOVA with Python

The current post will focus on how to carry out between-subjects ANOVA using Python. As mentioned in an earlier post (Repeated measures ANOVA with Python), ANOVAs are commonly used in Psychology.

We start with a brief introduction to the theory of ANOVA. If you are more interested in the four methods for carrying out one-way ANOVA with Python, skip ahead to the section ANOVA using Python. ANOVA is a means of comparing the ratio of systematic variance to unsystematic variance in an experimental study. In the ANOVA, the total variance is partitioned into variance due to groups (between-groups variance) and variance due to individual differences (within-groups variance).

Figure: Partitioning of variance in the ANOVA (SS stands for Sum of Squares).
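The sketch below makes the partitioning concrete. It uses plain Python and a small hypothetical dataset (not PlantGrowth). It computes the three sums of squares from their definitions as squared deviations and shows that the total splits exactly into a between part and a within part:

```python
# Hypothetical toy data (not PlantGrowth): three groups of three observations
groups = {
    "a": [4.0, 5.0, 6.0],
    "b": [6.0, 7.0, 8.0],
    "c": [9.0, 10.0, 11.0],
}

scores = [y for ys in groups.values() for y in ys]
grand_mean = sum(scores) / len(scores)

# SStotal: squared deviations of every score from the grand mean
ss_total = sum((y - grand_mean) ** 2 for y in scores)

# SSbetween: squared deviations of each group mean from the grand mean, weighted by group size
ss_between = sum(len(ys) * (sum(ys) / len(ys) - grand_mean) ** 2 for ys in groups.values())

# SSwithin: squared deviations of each score from its own group mean
ss_within = sum((y - sum(ys) / len(ys)) ** 2 for ys in groups.values() for y in ys)

# The two numbers agree (up to floating-point rounding)
print(ss_total, ss_between + ss_within)
```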

The ratio obtained when doing this comparison is known as the F-ratio. A one-way ANOVA can be seen as a regression model with a single categorical predictor, which usually has two or more categories. A one-way ANOVA has a single factor with J levels, where each level corresponds to one of the groups in the independent-measures design.

The general form of the model, which is a regression model for a categorical factor with J levels, is:

y_i = b_0 + b_1 X_{1,i} + \dots + b_{J-1} X_{J-1,i} + e_i
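The dummy-coded regression above can be sketched with NumPy. This is a minimal illustration on hypothetical toy data, not the PlantGrowth set. By assumption, the first level is the reference category. The design matrix has a column of ones for b_0 plus J - 1 indicator columns. With this coding, b_0 estimates the reference group's mean, and each remaining b_j estimates the difference between group j's mean and the reference mean:

```python
import numpy as np

# Hypothetical toy data: J = 3 groups with three observations each
labels = np.array(["ctrl"] * 3 + ["trt1"] * 3 + ["trt2"] * 3)
y = np.array([4.0, 5.0, 6.0, 6.0, 7.0, 8.0, 9.0, 10.0, 11.0])

levels = ["ctrl", "trt1", "trt2"]  # assumption: the first level is the reference category

# Design matrix: intercept column (b_0) plus J - 1 dummy columns (X_1, ..., X_{J-1})
X = np.column_stack(
    [np.ones(len(y))] + [(labels == lvl).astype(float) for lvl in levels[1:]]
)

# Ordinary least squares estimates of b_0, b_1, ..., b_{J-1}
b, *_ = np.linalg.lstsq(X, y, rcond=None)
print(b)  # reference mean, then each group's difference from it
```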

There is a more elegant way to parameterize the model. Here, the group means are represented as deviations from the grand mean, with their coefficients grouped under a single term (τ_j). I will not go into detail on this equation:

y_{ij} = \mu_{grand} + \tau_j + \varepsilon_{ij}
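In this parameterization, each τ_j is simply the deviation of group j's mean from the grand mean. Below is a minimal sketch with hypothetical toy values. Weighted by group size, the effects sum to zero:

```python
# Hypothetical toy data for a one-way design with J = 3 groups
groups = {
    "ctrl": [4.0, 5.0, 6.0],
    "trt1": [6.0, 7.0, 8.0],
    "trt2": [9.0, 10.0, 11.0],
}

scores = [y for ys in groups.values() for y in ys]
mu_grand = sum(scores) / len(scores)

# tau_j: how far each group mean lies from the grand mean
tau = {g: sum(ys) / len(ys) - mu_grand for g, ys in groups.items()}

# Weighted by group size, the effects sum to (numerically) zero
weighted_sum = sum(len(groups[g]) * t for g, t in tau.items())
print(tau, weighted_sum)
```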

As for all parametric tests, the data need to be normally distributed (each group's data should be roughly normally distributed) for the F-statistic to be reliable. Each experimental condition should have roughly the same variance (i.e., homogeneity of variance), the observations should be independent (e.g., each participant belongs to only one group), and the dependent variable should be measured on at least an interval scale.
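The post does not prescribe specific tests for these assumptions. As one common choice, SciPy offers the Shapiro-Wilk test (stats.shapiro) for normality within each group and Levene's test (stats.levene) for homogeneity of variance. Below is a minimal sketch on hypothetical data. With the tutorial's data, you would pass the groups in d_data instead:

```python
from scipy import stats

# Hypothetical toy data; replace with the real groups (e.g., d_data from the PlantGrowth example)
groups = {
    "ctrl": [4.2, 5.1, 5.9, 4.8, 5.5],
    "trt1": [6.3, 6.9, 7.4, 7.0, 6.1],
    "trt2": [9.2, 10.4, 9.9, 10.8, 9.6],
}

# Normality: Shapiro-Wilk test within each group (a small p suggests non-normal data)
normality = {g: stats.shapiro(ys).pvalue for g, ys in groups.items()}

# Homogeneity of variance: Levene's test across all groups (a small p suggests unequal variances)
levene_p = stats.levene(*groups.values()).pvalue

print(normality, levene_p)
```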

ANOVA using Python

In the four examples in this tutorial we are going to use the dataset “PlantGrowth”, which was originally available in R but can be downloaded using this link: PlantGrowth. In the first three examples we use a Pandas DataFrame, loading the data from a CSV file with Pandas. Note, we can also use Pandas read_excel if we have our data in an Excel file (e.g., .xlsx).

import pandas as pd
datafile = "PlantGrowth.csv"
data = pd.read_csv(datafile)

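# Create a boxplot of weight by group
data.boxplot('weight', by='group', figsize=(12, 8))

# Weights in the control group
ctrl = data['weight'][data.group == 'ctrl']

# Dictionary mapping each group name to its weights
grps = pd.unique(data.group.values)
d_data = {grp: data['weight'][data.group == grp] for grp in grps}

k = len(pd.unique(data.group))  # number of conditions
N = len(data.values)  # conditions times participants
n = data.groupby('group').size().iloc[0]  # participants in each condition (assumes equal group sizes)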
Figure: Boxplot of the PlantGrowth data (dried weight by group).

Judging by the boxplot, there are differences in the dried weight between the two treatments. However, it is not easy to determine visually whether the treatments differ from the control group.

Using SciPy

We start by using SciPy and its function f_oneway from scipy.stats.

from scipy import stats

F, p = stats.f_oneway(d_data['ctrl'], d_data['trt1'], d_data['trt2'])
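Because d_data is a dictionary, the groups can also be unpacked rather than listed by name, which works for any number of groups. Below is a minimal sketch with hypothetical toy data standing in for PlantGrowth. The returned object exposes the F-value and p-value as .statistic and .pvalue:

```python
from scipy import stats

# Hypothetical toy data standing in for d_data (group name -> observations)
d_data = {
    "ctrl": [4.0, 5.0, 6.0],
    "trt1": [6.0, 7.0, 8.0],
    "trt2": [9.0, 10.0, 11.0],
}

# Unpack all groups at once instead of naming each one
result = stats.f_oneway(*d_data.values())
print(result.statistic, result.pvalue)
```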

One problem with using SciPy is that, following APA guidelines, we should also report the effect size (e.g., eta squared) as well as the degrees of freedom (DF). The DFs needed for the example data are easily obtained:

DFbetween = k - 1
DFwithin = N - k
DFtotal = N - 1

However, if we want to calculate eta-squared we need to do some more computations. Thus, the next section will deal with how to calculate a one-way ANOVA using the Pandas DataFrame and Python code.

Calculating using Python (i.e., pure Python ANOVA)

A one-way ANOVA is quite easy to calculate, so below I am going to show how to do it. First, we need to calculate the sum of squares between (SSbetween), sum of squares within (SSwithin), and sum of squares total (SStotal).

Sum of Squares Between

We start with calculating the Sum of Squares Between, which is the variability due to differences between the group means. It is sometimes known as the Sum of Squares of the Model.

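SSbetween = \frac{\sum(\sum a_i)^2}{n} - \frac{T^2}{N}

where \sum a_i is the sum of the scores in a group, n is the number of participants in each group, T is the grand total of all scores, and N is the total number of observations.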

# Squared group totals divided by n, minus the squared grand total divided by N
SSbetween = (sum(data.groupby('group')['weight'].sum()**2)/n) \
    - (data['weight'].sum()**2)/N

How to Calculate Sum of Squares Within

Sum of Squares Within is the variability in the data due to differences within the groups (i.e., individual differences). It can be calculated according to this formula:

SSwithin = \sum Y^2 - \frac{\sum (\sum a_i)^2}{n}

sum_y_squared = sum([value**2 for value in data['weight'].values])
SSwithin = sum_y_squared - sum(data.groupby('group').sum()['weight']**2)/n

Calculation of Sum of Squares Total

Sum of Squares Total will be needed to calculate eta-squared later. This is the total variability in the data.

SStotal = \sum Y^2 - \frac{T^2}{N}

SStotal = sum_y_squared - (data['weight'].sum()**2)/N
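The formulas above divide by a single n, so they assume equal group sizes. The sketch below follows Joel Verezhak's suggestion in the comments. It handles unequal group sizes by dividing each squared group total by its own n before summing. It is meant to be run on its own: the data are hypothetical toy values with unequal groups, not PlantGrowth.

```python
import pandas as pd

# Hypothetical toy data with unequal group sizes (not PlantGrowth)
data = pd.DataFrame({
    "group": ["ctrl"] * 4 + ["trt1"] * 3 + ["trt2"] * 2,
    "weight": [4.0, 5.0, 6.0, 5.0, 6.0, 7.0, 8.0, 10.0, 12.0],
})

N = len(data)
n = data.groupby("group").size().values                 # size of each group (sorted by group name)
group_sums = data.groupby("group")["weight"].sum().values  # total of each group (same order)
sum_y_squared = (data["weight"] ** 2).sum()
T = data["weight"].sum()                                 # grand total

# Divide each squared group total by its own n, element-wise, before summing
SSbetween = (group_sums ** 2 / n).sum() - T ** 2 / N
SSwithin = sum_y_squared - (group_sums ** 2 / n).sum()
SStotal = sum_y_squared - T ** 2 / N
print(SSbetween, SSwithin, SStotal)
```

With equal group sizes, this reduces to the formulas above.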

How to Calculate Mean Square Between

Mean Square Between is the sum of squares between divided by the degrees of freedom between.

MSbetween = SSbetween/DFbetween

Calculation of Mean Square Within

Mean Square Within is also an easy calculation: the sum of squares within divided by the degrees of freedom within.

MSwithin = SSwithin/DFwithin

Calculating the F-value

F = MSbetween/MSwithin

To reject the null hypothesis, we check whether the obtained F-value is above the critical value. We could look it up in an F-value table based on DFbetween and DFwithin. However, SciPy's F distribution (stats.f) has a survival function, sf, that gives us the p-value directly:

p = stats.f.sf(F, DFbetween, DFwithin)
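If you prefer the critical-value route, you can compute the critical value instead of looking it up in a table. The inverse survival function, stats.f.isf, returns the F-value with an upper-tail probability of alpha. Below is a minimal sketch, assuming alpha = .05 and the degrees of freedom and F-value from the PlantGrowth example:

```python
from scipy import stats

alpha = 0.05
DFbetween, DFwithin = 2, 27  # degrees of freedom from the PlantGrowth example

# Critical value: the F-value with an upper-tail probability of alpha
F_crit = stats.f.isf(alpha, DFbetween, DFwithin)

F = 4.846  # F-value reported for PlantGrowth in this post
print(F_crit, F > F_crit)  # reject the null hypothesis if F exceeds F_crit
```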

Finally, we are also going to calculate the effect size. We start with the commonly used eta squared (η²):

eta_sqrd = SSbetween/SStotal

However, eta squared is somewhat biased because it is based purely on sums of squares from the sample. No adjustment is made for the fact that what we are aiming to do is to estimate the effect size in the population. Thus, we can use the less biased effect size measure omega squared:

om_sqrd = (SSbetween - (DFbetween * MSwithin))/(SStotal + MSwithin)
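The steps above can be collected into one function. The sketch below assumes the data come as a dictionary mapping each group name to a list of observations; the function name one_way_anova and the toy data are hypothetical. It works from deviation scores rather than the computational formulas and obtains SSwithin as SStotal minus SSbetween, using the partitioning of variance. The result can be checked against SciPy's f_oneway.

```python
from scipy import stats

def one_way_anova(groups):
    """Return F, p, eta squared and omega squared for a dict of group -> observations."""
    scores = [y for ys in groups.values() for y in ys]
    N, k = len(scores), len(groups)
    grand_mean = sum(scores) / N

    # Sums of squares from deviation scores (valid for equal or unequal group sizes)
    SSbetween = sum(len(ys) * (sum(ys) / len(ys) - grand_mean) ** 2 for ys in groups.values())
    SStotal = sum((y - grand_mean) ** 2 for y in scores)
    SSwithin = SStotal - SSbetween

    DFbetween, DFwithin = k - 1, N - k
    MSbetween, MSwithin = SSbetween / DFbetween, SSwithin / DFwithin
    F = MSbetween / MSwithin
    p = stats.f.sf(F, DFbetween, DFwithin)

    eta_sqrd = SSbetween / SStotal
    om_sqrd = (SSbetween - DFbetween * MSwithin) / (SStotal + MSwithin)
    return F, p, eta_sqrd, om_sqrd

# Hypothetical toy data
toy = {"ctrl": [4.0, 5.0, 6.0], "trt1": [6.0, 7.0, 8.0], "trt2": [9.0, 10.0, 11.0]}
F, p, eta_sqrd, om_sqrd = one_way_anova(toy)
print(F, p, eta_sqrd, om_sqrd)
```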

The results we get from both SciPy and the method above can be reported according to APA style: F(2, 27) = 4.846, p = .016, η² = .264. If you want to report omega squared: ω² = .204.
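Building the APA string by hand is error-prone, mainly because of the dropped leading zero for p and η². The sketch below shows one way to automate it. The helper names no_leading_zero and apa_f are hypothetical. By assumption, it reports three decimals as in this post and switches to "p < .001" for very small p-values:

```python
def no_leading_zero(x, decimals=3):
    """Format x with a fixed number of decimals and drop the leading zero (e.g., 0.016 -> '.016')."""
    s = f"{x:.{decimals}f}"
    return s.replace("0.", ".", 1) if abs(x) < 1 else s

def apa_f(df_between, df_within, F, p, eta_sqrd):
    """Build an APA-style string for a one-way ANOVA result (hypothetical helper)."""
    p_str = "p < .001" if p < 0.001 else f"p = {no_leading_zero(p)}"
    return f"F({df_between}, {df_within}) = {F:.3f}, {p_str}, η² = {no_leading_zero(eta_sqrd)}"

# Values from the Statsmodels output for PlantGrowth in this post
report = apa_f(2, 27, 4.846088, 0.01591, 3.76634 / 14.25843)
print(report)
```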

Using Statsmodels

The third method, using Statsmodels, is also easy. We first fit an ordinary least squares (OLS) model and then pass it to the anova_lm method. If you are familiar with R syntax, you will find that Statsmodels has a formula API in which your model is formulated very intuitively. First, we import the API and the formula API. Second, we fit an ordinary least squares regression to our data. The object obtained is a fitted model, which we then use with the anova_lm method to obtain an ANOVA table.

import statsmodels.api as sm
from statsmodels.formula.api import ols

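# Fit an OLS model with group as the categorical predictor
mod = ols('weight ~ group', data=data).fit()

# Type 2 ANOVA table from the fitted model
aov_table = sm.stats.anova_lm(mod, typ=2)
print(aov_table)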

Output table:

            sum_sq  df         F   PR(>F)
group      3.76634   2  4.846088  0.01591
Residual  10.49209  27

Note that no effect sizes are calculated when we use Statsmodels. To calculate eta squared, we can use the sums of squares from the table:

# eta squared = SS of group / (SS of group + SS of residual)
esq_sm = aov_table['sum_sq'].iloc[0]/(aov_table['sum_sq'].iloc[0]+aov_table['sum_sq'].iloc[1])
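Omega squared can be added the same way. The sketch below appends mean squares, eta squared and omega squared as extra columns to a one-way table. It assumes the table looks like the anova_lm output above, with a 'Residual' row and 'sum_sq' and 'df' columns. The function name add_effect_sizes is hypothetical. So that the example runs without Statsmodels, it rebuilds the two rows of the PlantGrowth output shown above as a small Pandas DataFrame:

```python
import pandas as pd

def add_effect_sizes(aov):
    """Add mean squares, eta squared and omega squared to a one-way anova_lm-style table."""
    aov = aov.copy()
    aov["mean_sq"] = aov["sum_sq"] / aov["df"]
    ss_total = aov["sum_sq"].sum()                  # factor SS + residual SS
    ms_within = aov.loc["Residual", "mean_sq"]
    aov["eta_sq"] = aov["sum_sq"] / ss_total
    aov["omega_sq"] = (aov["sum_sq"] - aov["df"] * ms_within) / (ss_total + ms_within)
    aov.loc["Residual", ["eta_sq", "omega_sq"]] = float("nan")  # not meaningful for the residual row
    return aov

# The two rows of the Statsmodels output for PlantGrowth shown above
table = pd.DataFrame(
    {"sum_sq": [3.76634, 10.49209], "df": [2.0, 27.0]},
    index=["group", "Residual"],
)
result = add_effect_sizes(table)
print(result)
```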

Using pyvttbl anova1way

We can also use the method anova1way from the Python package pyvttbl. This package has its own DataFrame class, and we have to use it instead of a Pandas DataFrame to be able to carry out the one-way ANOVA. Note, pyvttbl is old and outdated. It requires NumPy to be at most version 1.1.x, or else you will run into an error (“unsupported operand type(s) for +: ‘float’ and ‘NoneType’”).

This can, of course, be solved by downgrading NumPy (see my solution using a virtual environment: Step-by-step guide for solving the Pyvttbl Float and NoneType error).

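from pyvttbl import DataFrame

# Load the CSV file into a pyvttbl DataFrame (not a Pandas DataFrame)
df = DataFrame()
df.read_tbl(datafile)

aov_pyvttbl = df.anova1way('weight', 'group')
print(aov_pyvttbl)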

Output anova1way

Anova: Single Factor on weight

Groups   Count    Sum     Average   Variance 
ctrl        10   50.320     5.032      0.340 
trt1        10   46.610     4.661      0.630 
trt2        10   55.260     5.526      0.196 

Source of Variation    SS     df    MS       F     P-value   eta^2   Obs. power 
Treatments            0.977    2   0.489   1.593     0.222   0.106        0.306 
Error                 8.281   27   0.307                                        
Total                 9.259   29                                                

Source of Variation     SS     df    MS       F     P-value   eta^2   Obs. power 
Treatments             3.766    2   1.883   4.846     0.016   0.264        0.661 
Error                 10.492   27   0.389                                        
Total                 14.258   29                                                


Tukey HSD: Table of q-statistics
       ctrl     trt1       trt2   
ctrl   0      1.882 ns   2.506 ns 
trt1          0          4.388 *  
trt2                     0        
  + p < .10 (q-critical[3, 27] = 3.0301664694)
  * p < .05 (q-critical[3, 27] = 3.50576984879)
 ** p < .01 (q-critical[3, 27] = 4.49413305084)

We get a lot more information using the anova1way method. Of particular interest here may be that we get results from a post-hoc test (i.e., Tukey HSD). Whereas the ANOVA only tells us that there was a significant effect of treatment, the post-hoc analysis reveals where this effect may be (i.e., between which groups).

That is it! In this tutorial you learned four methods that let you carry out one-way ANOVAs using Python. There are, of course, other ways to deal with the tests between the groups (e.g., the post-hoc analysis). One could carry out multiple comparisons (e.g., t-tests between each pair of groups; just remember to correct for the familywise error rate!) or planned contrasts. In conclusion, doing ANOVAs in Python is pretty simple.
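The sketch below illustrates the multiple-comparisons route mentioned above: independent-samples t-tests between every pair of groups, with a Bonferroni correction. Bonferroni is one common, conservative way to control the familywise error rate; it is my choice here, not something the post prescribes. The data are hypothetical. With the tutorial's data, you would pass d_data instead:

```python
from itertools import combinations
from scipy import stats

# Hypothetical toy data (group -> observations); in the tutorial this would be d_data
groups = {
    "ctrl": [4.2, 5.1, 5.9, 4.8, 5.5],
    "trt1": [6.3, 6.9, 7.4, 7.0, 6.1],
    "trt2": [9.2, 10.4, 9.9, 10.8, 9.6],
}

pairs = list(combinations(groups, 2))  # every pair of group names
m = len(pairs)                         # number of comparisons in the family

p_corrected = {}
for g1, g2 in pairs:
    res = stats.ttest_ind(groups[g1], groups[g2])
    # Bonferroni: multiply each p-value by the number of tests (capped at 1)
    p_corrected[(g1, g2)] = min(res.pvalue * m, 1.0)
    print(f"{g1} vs {g2}: t = {res.statistic:.3f}, corrected p = {p_corrected[(g1, g2)]:.4f}")
```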


  1. Bunny

    Heck of a job there, it absolutely helps me out.

  2. Umit

    Thank you for your effort, very clearly set.

    However, I am hitting a problem using ANOVA1Way, I wonder if you have any suggestions. When I make a copy of PlantGrowth.csv and type in new numbers for “weight” and then run your code, I get:

    Error: new-line character seen in unquoted field – do you need to open the file in universal-newline mode?

    Thanks and Regards

  3. Joel Verezhak

    Hi Erik,

    thanks for the great post. I wanted to offer an update to part 2 (python based ANOVA) for when the groups have different sample sizes.

    First, rewrite the calculation for n:
    n = data.groupby(var).size().values

    Then the calculation for SSbetween and SSwithin needs to be modified:
    SSbetween = (sum(data.groupby(var).sum()['LogSalePrice'].values**2/n)) - (data['LogSalePrice'].sum()**2)/N
    SSwithin = sum_y_squared - sum(data.groupby(var).sum()['LogSalePrice'].values**2/n)

    It just takes the division by n (element-wise) inside the outer sum in both cases.

    I tested this by comparing with the output from f_oneway and it seems to work. It should also generalize well to the case where n is the same for all groups.

    Thanks again for the write-up!

    • Hi Joel,

      thanks for your comment and thanks for the update! I’ll add this to the post (with a reference to your comment, of course).


  4. Andrés Vargas

    Thanks for your post… It was super useful for me

  5. Hi Erik!

    Thank you for the post. I’ve been working recently on a Python stats package that implements several ANOVA-related functions and post-hoc tests. Just thought I’d mention it in case it turns out to be useful to you or others:

    All the best,

    • Hey Raphael,

      This looks really interesting! Will install this later today and play around with it. I might just add it to one of my posts listing useful Python packages. We’ll see!

      Maybe I’ll also update this post (or write a new one). I’ll send you an email, if I do.

      Thanks for letting us know about the package,


