Data Analysis

On this page, you will find the Python tutorials on statistical analysis. For instance, you will find guides on how to do ANOVA in Python, Bayesian analysis, and so on. Most of the posts on data analysis in Python use the packages statsmodels, NumPy, SciPy, and Pandas.

Checking the Assumptions

All of the statistical tests that you can carry out in Python come with a number of assumptions. For example, the t-test and analysis of variance (ANOVA) both assume that the data (strictly speaking, the residuals) are normally distributed. Furthermore, the independent samples t-test and the between-groups ANOVA also require the dependent variable to have equal variance across groups. You can use SciPy and Pingouin to run Levene’s test and Bartlett’s test in Python. These two tests let you check whether your data meet the homogeneity of variances assumption.
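As a minimal sketch of the checks mentioned above, the following uses SciPy only (Pingouin offers similar functions). The data are simulated and the variable names are hypothetical, chosen purely for illustration.

```python
import numpy as np
from scipy import stats

# Simulated data (hypothetical): two groups drawn with the same spread.
rng = np.random.default_rng(42)
group_a = rng.normal(loc=10.0, scale=2.0, size=50)
group_b = rng.normal(loc=10.5, scale=2.0, size=50)

# Levene's test (median-centered by default) is fairly robust to non-normal data.
levene_stat, levene_p = stats.levene(group_a, group_b)

# Bartlett's test is more sensitive to departures from normality.
bartlett_stat, bartlett_p = stats.bartlett(group_a, group_b)

# Shapiro-Wilk test of normality for a single group.
shapiro_stat, shapiro_p = stats.shapiro(group_a)

print(f"Levene:   statistic = {levene_stat:.3f}, p = {levene_p:.3f}")
print(f"Bartlett: statistic = {bartlett_stat:.3f}, p = {bartlett_p:.3f}")
print(f"Shapiro:  statistic = {shapiro_stat:.3f}, p = {shapiro_p:.3f}")
```

A small p-value from Levene’s or Bartlett’s test suggests that the group variances differ, in which case a test that does not assume equal variances may be more appropriate.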

Correlation Analysis

Correlation analysis is a statistical method that you can use to examine the strength of the relationship between two or more quantitative variables. A correlation coefficient close to 1 or -1 means that two variables have a strong linear relationship with each other. On the other hand, a weak correlation (close to 0) means that the variables are hardly linearly related.

Correlation Matrix in Python

Furthermore, correlation coefficients can be used as input to other methods you may want to carry out (e.g., factor analysis). If you’re a NumPy or Pandas user, it’s easy to compute a correlation matrix in Python, so make sure to check that post out.
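For illustration, here is a small sketch of computing a correlation matrix with both Pandas and NumPy. The data frame and its column names (x, y, z) are made up for this example.

```python
import numpy as np
import pandas as pd

# Simulated data (hypothetical): y depends on x, z is unrelated noise.
rng = np.random.default_rng(0)
x = rng.normal(size=200)
df = pd.DataFrame({
    "x": x,
    "y": 2 * x + rng.normal(scale=0.5, size=200),
    "z": rng.normal(size=200),
})

# Pandas: Pearson correlation between all pairs of columns (the default method).
corr_matrix = df.corr()

# NumPy: rowvar=False tells corrcoef that variables are stored in columns.
corr_numpy = np.corrcoef(df.to_numpy(), rowvar=False)

print(corr_matrix.round(2))
```

Both calls return the same values; the Pandas version keeps the column labels, which makes the matrix easier to read.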

T-test

There is, of course, a range of Python packages that enable us to carry out t-tests. In a recent post, you will find a guide on how to do a two-sample t-test in Python using SciPy, statsmodels, and Pingouin. Furthermore, you can have a look at the YouTube video about the independent (two) samples and paired samples t-tests. In a more recent post, you can find a more elaborate explanation of how to carry out the paired samples t-test in Python.
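The sketch below shows the two kinds of t-test using SciPy. The groups are simulated and the names (control, treatment, before, after) are hypothetical.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

# Independent samples: two separate (simulated) groups.
control = rng.normal(loc=50, scale=10, size=40)
treatment = rng.normal(loc=55, scale=10, size=40)

# Student's t-test assumes equal variances.
student = stats.ttest_ind(control, treatment)

# Welch's t-test drops the equal-variance assumption.
welch = stats.ttest_ind(control, treatment, equal_var=False)

# Paired samples: the same (simulated) participants measured twice.
before = rng.normal(loc=50, scale=10, size=30)
after = before + rng.normal(loc=2, scale=5, size=30)
paired = stats.ttest_rel(before, after)

print(f"Student: t = {student.statistic:.3f}, p = {student.pvalue:.3f}")
print(f"Welch:   t = {welch.statistic:.3f}, p = {welch.pvalue:.3f}")
print(f"Paired:  t = {paired.statistic:.3f}, p = {paired.pvalue:.3f}")
```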

Non-Parametric Tests

If your data are not, for example, normally distributed, you can use non-parametric tests. Right now, there’s a blog post in which you can learn how to carry out the Mann-Whitney U test in Python.
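As a rough illustration, the Mann-Whitney U test can be run with SciPy as follows. The skewed samples are simulated and their names are assumptions made for this example.

```python
import numpy as np
from scipy import stats

# Simulated, right-skewed data (hypothetical) for two independent groups.
rng = np.random.default_rng(2)
group_x = rng.exponential(scale=1.0, size=35)
group_y = rng.exponential(scale=1.5, size=35)

# Two-sided Mann-Whitney U test.
result = stats.mannwhitneyu(group_x, group_y, alternative="two-sided")
u_stat, p_value = result.statistic, result.pvalue

print(f"U = {u_stat:.1f}, p = {p_value:.3f}")
```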

Python ANOVA

In this section, you will find the tutorials on how to carry out ANOVA in Python.

Between Subjects ANOVA in Python

In the first post, you will learn four methods to do ANOVA in Python. Specifically, you will learn how to carry out one-way ANOVA in Python as well as some of the theory behind it. Furthermore, you will learn how to do post-hoc analysis, among other things.
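One possible way to run a one-way ANOVA followed by a post-hoc test is sketched below, using SciPy and statsmodels. This is just one of several approaches and not necessarily one of the four covered in the post; the three groups are simulated and their labels are hypothetical.

```python
import numpy as np
from scipy import stats
from statsmodels.stats.multicomp import pairwise_tukeyhsd

# Simulated data (hypothetical): three independent groups of 30 observations.
rng = np.random.default_rng(3)
low = rng.normal(loc=20, scale=4, size=30)
medium = rng.normal(loc=22, scale=4, size=30)
high = rng.normal(loc=26, scale=4, size=30)

# One-way between-subjects ANOVA.
f_stat, p_value = stats.f_oneway(low, medium, high)
print(f"F = {f_stat:.3f}, p = {p_value:.4f}")

# Tukey's HSD post-hoc test needs the scores in one array plus group labels.
scores = np.concatenate([low, medium, high])
labels = np.repeat(["low", "medium", "high"], 30)
tukey = pairwise_tukeyhsd(endog=scores, groups=labels, alpha=0.05)
print(tukey.summary())
```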

In the second post, you will learn three ways to carry out ANOVA in Python (see here). Specifically, you will learn how to carry out two-way ANOVA using Python.

Within Subjects (Repeated Measures) ANOVA in Python

In this section, you will find the blog posts about how to carry out repeated measures ANOVA in Python. This kind of ANOVA is simple to carry out using both pyvttbl and pingouin.

One-Way ANOVA for Repeated Measures

First, you will learn how to carry out within-subjects ANOVA in Python using the package rpy2. That is, you will learn how to use R packages from Python to do data analysis.

Second, you will learn about repeated measures ANOVA in Python using the packages pyvttbl, statsmodels, and pingouin. Note that the links in the previous sentence all lead to the posts about using these packages. However, I’d suggest that you focus on using statsmodels or pingouin because pyvttbl is no longer maintained.
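As a minimal sketch of the statsmodels route, a one-way repeated measures ANOVA can be run with AnovaRM. The long-format data below are simulated, and the column names (subject, condition, rt) are assumptions made for this example.

```python
import numpy as np
import pandas as pd
from statsmodels.stats.anova import AnovaRM

# Simulated long-format data (hypothetical): 10 subjects, each measured
# once in every one of three conditions.
rng = np.random.default_rng(5)
n_subjects = 10
conditions = ["A", "B", "C"]
df = pd.DataFrame({
    "subject": np.repeat(np.arange(1, n_subjects + 1), len(conditions)),
    "condition": np.tile(conditions, n_subjects),
})
df["rt"] = rng.normal(loc=500, scale=30, size=len(df))

# AnovaRM needs balanced data: exactly one observation per subject and cell.
rm_result = AnovaRM(data=df, depvar="rt", subject="subject",
                    within=["condition"]).fit()
print(rm_result.anova_table)
```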

Two-Way ANOVA for Repeated Measures

Third, you will learn how to carry out two-way ANOVA for repeated measures in Python.
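With statsmodels, a two-way repeated measures design can be handled by passing two within-subject factors to AnovaRM. The sketch below uses simulated data; the factor names (noise, task) and the outcome (score) are hypothetical.

```python
from itertools import product

import numpy as np
import pandas as pd
from statsmodels.stats.anova import AnovaRM

# Simulated long-format data (hypothetical): 8 subjects, each measured once
# in every combination of two within-subject factors.
rng = np.random.default_rng(6)
rows = list(product(range(1, 9), ["quiet", "noisy"], ["easy", "hard"]))
df = pd.DataFrame(rows, columns=["subject", "noise", "task"])
df["score"] = rng.normal(loc=100, scale=10, size=len(df))

# Two within-subject factors give two main effects and their interaction.
rm2_result = AnovaRM(data=df, depvar="score", subject="subject",
                     within=["noise", "task"]).fit()
print(rm2_result.anova_table)
```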

MANOVA in Python

There’s also a blog post about how to carry out Multivariate Analysis of Variance (MANOVA) in Python. In this post, you will learn how to use statsmodels to do a MANOVA in Python. This technique is useful when you have multivariate data (i.e., multiple dependent variables).
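A short sketch of a MANOVA with statsmodels might look like the following. The data are simulated, and the column names (group, y1, y2) are placeholders invented for this example.

```python
import numpy as np
import pandas as pd
from statsmodels.multivariate.manova import MANOVA

# Simulated data (hypothetical): three groups, two dependent variables.
rng = np.random.default_rng(7)
df = pd.DataFrame({
    "group": np.repeat(["g1", "g2", "g3"], 25),
    "y1": rng.normal(loc=10, scale=2, size=75),
    "y2": rng.normal(loc=20, scale=3, size=75),
})

# Dependent variables go on the left-hand side, joined by "+".
manova = MANOVA.from_formula("y1 + y2 ~ group", data=df)
mv_result = manova.mv_test()

# The printed summary includes Wilks' lambda, Pillai's trace, and related statistics.
print(mv_result)
```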

Probabilistic Programming in Python

In the post on probabilistic programming in Python, you will learn how to use PyMC3 and ArviZ to do Bayesian statistics in Python. Note that this post is based on an excerpt from the second chapter of a book (see the post). Specifically, you will learn how to install and use PyMC3 and ArviZ.