Within-subjects designs are common in experimental psychology. One way to analyze data collected with a within-subjects design is to use repeated measures ANOVA. I recently wrote a post on how to conduct a repeated measures ANOVA using Python and rpy2. I wrote that post because the great Python package statsmodels does not include repeated measures ANOVA. However, the rpy2 approach requires that the R statistical environment is installed. Recently, I found a Python library called pyvttbl with which you can do within-subjects ANOVAs. Pyvttbl enables you to create multidimensional pivot tables, process data, and carry out statistical tests. Using the anova method of pyvttbl’s DataFrame, we can carry out repeated measures ANOVA using only Python.

Note: pyvttbl is no longer maintained. See the newer post that uses Pingouin to carry out repeated measures ANOVA: Repeated Measures ANOVA in R and Python using afex & pingouin

Why within-subjects designs?

Within-subjects designs have at least two advantages. First, more information is obtained from each subject in a within-subjects design compared to a between-subjects design. Each subject is measured in all conditions, whereas in the between-subjects design, each subject is typically measured in one or more but not all conditions. A within-subjects design thus requires fewer subjects to obtain a certain level of statistical power. In situations where it is costly to find subjects, this kind of design is clearly better than a between-subjects design. Second, the variability due to individual differences between subjects is removed from the error term. That is, each subject serves as his or her own control, and extraneous error variance is reduced.
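The second advantage can be illustrated with a small simulation. This is a minimal sketch using only the standard library; the baseline speeds, the 100 ms effect, and the noise levels are made-up numbers chosen for illustration, not values from any study.

```python
import random
import statistics

random.seed(1)

n_subjects = 40
# Hypothetical numbers: each subject has a personal baseline speed,
# a fixed 100 ms condition effect, and a little measurement noise.
baselines = [random.gauss(700, 150) for _ in range(n_subjects)]
quiet = [b + random.gauss(0, 30) for b in baselines]
noise = [b + 100 + random.gauss(0, 30) for b in baselines]

# Between-subjects view: the spread of raw scores includes individual differences.
sd_raw = statistics.stdev(noise)

# Within-subjects view: differencing cancels each subject's own baseline.
diffs = [n - q for n, q in zip(noise, quiet)]
sd_diff = statistics.stdev(diffs)

print(round(sd_raw, 1), round(sd_diff, 1))
```

The standard deviation of the difference scores is much smaller than that of the raw scores, because the subject baselines drop out when each subject is compared with him- or herself.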

Now, if you’re working with data from a study with a between-subjects design and have only two groups, you can carry out a two-sample t-test with Python or a Mann-Whitney U test in Python.

Repeated measures ANOVA in Python

As mentioned above, in this tutorial we are going to use an older Python package called pyvttbl for the data analysis. First, we need to install it.

Installing pyvttbl

pyvttbl can be installed using pip:

pip install pyvttbl

If you are using Linux you may need to add ‘sudo’ before the pip command. This method installs pyvttbl and, hopefully, any missing dependencies. Note that if you decide to work with pyvttbl, you will also need to use pip to install specific versions of the dependencies.

Python script

I continue by simulating a response time data set. If you have your own data set that you want to analyze, you can use the method “read_tbl” to load your data from a CSV file.

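from collections import namedtuple

from numpy.random import normal
import pyvttbl as pt

N = 40                  # number of simulated subjects
P = ["noise", "quiet"]  # levels of the within-subjects factor
rts = [998, 511]        # mean response time for each level
mus = rts * N           # repeated list of means; mus[i] is the mean for P[i]

# One record per subject and condition (long format)
Sub = namedtuple('Sub', ['Sub_id', 'rt', 'condition'])
df = pt.DataFrame()

# range behaves the same here in Python 2 and Python 3
for subid in range(0, N):
    for i, condition in enumerate(P):
        df.insert(Sub(subid + 1,
                      normal(mus[i], scale=112., size=1)[0],
                      condition)._asdict())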
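If you would rather keep the data in a CSV file, the same long format (one row per subject and condition) can be written with the standard library. This is only a sketch: the file name rt_data.csv mentioned in the comment is hypothetical, and the exact arguments that read_tbl expects are not shown in this post.

```python
import csv
import io
import random

random.seed(0)

N = 40
conditions = ["noise", "quiet"]
means = {"noise": 998, "quiet": 511}

# One row per subject and condition (long format), same columns as the simulated data.
rows = [
    {"Sub_id": sub_id, "rt": random.gauss(means[cond], 112.0), "condition": cond}
    for sub_id in range(1, N + 1)
    for cond in conditions
]

# Written to an in-memory buffer here; for a real file, use something like
# open("rt_data.csv", "w", newline="") instead (hypothetical file name).
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["Sub_id", "rt", "condition"])
writer.writeheader()
writer.writerows(rows)

print(buffer.getvalue().splitlines()[0])
```

The header row names the subject ID column, the dependent variable, and the within-subjects factor, which are the three pieces of information the analysis needs.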

Conducting the repeated measures ANOVA with pyvttbl is pretty straightforward. You just take the pyvttbl DataFrame object and use the method anova. The first argument is your dependent variable (e.g., response time), and you specify the column that holds the subject IDs (e.g., sub='Sub_id'). Finally, you add your within-subjects factor(s) with wfactors, which takes a list of the column names containing your within-subjects factors. In my simulated data there is only one (i.e., 'condition'). Note, if your NumPy version is greater than 1.1.x you will have to install an older version. A good way to do this is to run pyvttbl within a virtual environment (see Step-by-step guide for solving the Pyvttbl Float and NoneType error for a detailed solution for both Linux and Windows users).

aov = df.anova('rt', sub='Sub_id', wfactors=['condition'])
print(aov)

Tests of Within-Subjects Effects

Measure: rt
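| Source | Correction | Type III Sum of Squares | ε | df | MS | F | Sig. | η²G | Obs. | SE of x̄ | ±95% CI | λ | Obs. Power |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| condition | Sphericity Assumed | 4209536.428 | | 1.000 | 4209536.428 | 309.093 | 0.000 | 4.165 | 40.000 | 19.042 | 37.323 | 317.019 | 1.000 |
| | Greenhouse-Geisser | 4209536.428 | 1.000 | 1.000 | 4209536.428 | 309.093 | 0.000 | 4.165 | 40.000 | 19.042 | 37.323 | 317.019 | 1.000 |
| | Huynh-Feldt | 4209536.428 | 1.000 | 1.000 | 4209536.428 | 309.093 | 0.000 | 4.165 | 40.000 | 19.042 | 37.323 | 317.019 | 1.000 |
| | Box | 4209536.428 | 1.000 | 1.000 | 4209536.428 | 309.093 | 0.000 | 4.165 | 40.000 | 19.042 | 37.323 | 317.019 | 1.000 |
| Error(condition) | Sphericity Assumed | 531140.646 | | 39.000 | 13618.991 | | | | | | | | |
| | Greenhouse-Geisser | 531140.646 | 1.000 | 39.000 | 13618.991 | | | | | | | | |
| | Huynh-Feldt | 531140.646 | 1.000 | 39.000 | 13618.991 | | | | | | | | |
| | Box | 531140.646 | 1.000 | 39.000 | 13618.991 | | | | | | | | |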

As can be seen in the output table, the Sum of Squares used is Type III, which is what common statistical software uses when calculating ANOVA (the F-statistic) (e.g., SPSS or R packages such as ‘afex’ or ‘ez’). The table further contains corrections in case our data violate the assumption of sphericity (which is nothing to worry about when the factor has only two levels, as in the simulated data). As you can see, we also get generalized eta squared as an effect size measure and 95% confidence intervals. The docstring for the class Anova states that standard errors and 95% confidence intervals are calculated according to Loftus and Masson (1994). Furthermore, generalized eta squared allows comparability across between-subjects and within-subjects designs (see Olejnik & Algina, 2003).
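To make the quantities in such a table more concrete, here is a minimal sketch of the sums-of-squares partition for a balanced design with one within-subjects factor, written with the standard library only. It is not pyvttbl's implementation, the data are freshly simulated (so the numbers will not match the table), and all variable names are my own. It also computes generalized eta squared for this design as the effect sum of squares divided by the sum of the effect, subject, and error sums of squares.

```python
import math
import random
import statistics

random.seed(42)

n = 40  # subjects
k = 2   # levels of the within-subjects factor (noise, quiet)
# Hypothetical data: data[s][j] is the response time of subject s in level j.
data = [[random.gauss(998, 112), random.gauss(511, 112)] for _ in range(n)]

grand_mean = sum(sum(row) for row in data) / (n * k)
level_means = [sum(row[j] for row in data) / n for j in range(k)]
subject_means = [sum(row) / k for row in data]

# Partition the total sum of squares into condition, subject, and error parts.
ss_condition = n * sum((m - grand_mean) ** 2 for m in level_means)
ss_subjects = k * sum((m - grand_mean) ** 2 for m in subject_means)
ss_total = sum((x - grand_mean) ** 2 for row in data for x in row)
ss_error = ss_total - ss_condition - ss_subjects  # subject-by-condition interaction

df_condition = k - 1
df_error = (k - 1) * (n - 1)
f_value = (ss_condition / df_condition) / (ss_error / df_error)

# Generalized eta squared for a design with a single within-subjects factor.
eta_g = ss_condition / (ss_condition + ss_subjects + ss_error)

# With two levels, F equals the squared paired t statistic.
diffs = [row[0] - row[1] for row in data]
t_value = statistics.mean(diffs) / (statistics.stdev(diffs) / math.sqrt(n))

print(df_condition, df_error, round(f_value, 2), round(eta_g, 3), round(t_value ** 2, 2))
```

Because the subject sum of squares is taken out before the error term is formed, the F-ratio is tested against the subject-by-condition interaction only. By this definition, generalized eta squared always lies between 0 and 1.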

Conveniently, if you ever want to transform your data, you can add the argument transform. There are several options here: log or log10, reciprocal or inverse, square-root or sqrt, arcsine or arcsin, and windsor10. For instance, if you want to use a log transformation, you just add the argument transform='log' (any of the previously mentioned options can be passed as a string):

aovlog = df.anova('rt', sub='Sub_id', wfactors=['condition'], transform='log')
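To get a feel for what such transformations do to response times, here is a rough sketch written with the standard library rather than pyvttbl's own code. The response times are made up, the dictionary of transforms is hypothetical, and the natural logarithm is used for "log" as an assumption, since the base pyvttbl applies is not stated in this post.

```python
import math

rts = [998.0, 511.0, 1204.5, 640.2]  # hypothetical response times

# Hypothetical mapping from option names to plain Python functions.
transforms = {
    "log": math.log,                  # natural log (assumed base)
    "log10": math.log10,
    "reciprocal": lambda x: 1.0 / x,
    "sqrt": math.sqrt,
}

transformed = {name: [f(x) for x in rts] for name, f in transforms.items()}
for name, values in transformed.items():
    print(name, [round(v, 4) for v in values])
```

The log and square-root transforms compress large values more than small ones, which is why they are often applied to right-skewed response time data before running the ANOVA.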

Using pyvttbl we can also analyse mixed-design/split-plot (within-between) data. Doing a split-plot analysis is easy; just add the argument bfactors with a list of your between-subjects factors. If you are interested in one-way ANOVA for independent measures, see my newer post: Four ways to conduct one-way ANOVAS with Python.

Finally, I created a function that extracts the F-statistic, mean square error, generalized eta squared, and the p-value from the results obtained with the anova method. It takes a factor as a string, an Anova object, and the values you want to extract. Keys for your different factors can be found using the keys method (e.g., aov.keys()).

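def extract_for_apa(factor, aov, values=('F', 'mse', 'eta', 'p')):
    """Return the requested statistics for one factor of an Anova object."""
    results = {}
    # Results are keyed by a tuple of factor names, e.g. ('condition',)
    for key, result in aov[(factor,)].items():
        if key in values:
            results[key] = result
    return results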
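Once the values are extracted, they can be turned into a string for a manuscript. The helper below, format_apa, is a hypothetical function of my own and not part of pyvttbl; it assumes a dictionary with the keys 'F', 'mse', and 'p', and that you supply the degrees of freedom yourself.

```python
def format_apa(results, df_effect, df_error):
    """Format a dict with 'F', 'mse', and 'p' entries as an APA-style string."""
    p = results['p']
    # APA style reports very small p-values as an inequality and drops the leading zero.
    if p < 0.001:
        p_text = "p < .001"
    else:
        p_text = "p = " + "{:.3f}".format(p).lstrip("0")
    return "F({}, {}) = {:.2f}, MSE = {:.2f}, {}".format(
        df_effect, df_error, results['F'], results['mse'], p_text)


# Values copied from the output table earlier in this post.
example = {'F': 309.093, 'mse': 13618.991, 'p': 0.000}
print(format_apa(example, 1, 39))
# prints: F(1, 39) = 309.09, MSE = 13618.99, p < .001
```

Keeping the formatting separate from the extraction means the same helper can be reused for every factor in a larger design.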

Note, the table with the results in this post was created with the private method _within_html. To create an HTML table you will have to import SimpleHTML:

import SimpleHTML

output = SimpleHTML.SimpleHTML('Title of your HTML-table')
aov._within_html(output)
output.write('results_aov.html')

That was all. There is at least one downside to using pyvttbl for within-subjects analysis (ANOVA) in Python: pyvttbl is not compatible with the Pandas DataFrame, which is commonly used. However, this may not be a problem since pyvttbl, as we have seen, has its own DataFrame class. There are also some ways to aggregate and visualize data using pyvttbl. Another downside is that pyvttbl no longer seems to be maintained.

References

Loftus, G. R., & Masson, M. E. (1994). Using confidence intervals in within-subjects designs. Psychonomic Bulletin & Review, 1(4), 476–490.
Olejnik, S., & Algina, J. (2003). Generalized eta and omega squared statistics: Measures of effect size for some common research designs. Psychological Methods, 8(4), 434–447. http://doi.org/10.1037/1082-989X.8.4.434
