Press "Enter" to skip to content

Month: August 2019

Python MANOVA Made Easy using Statsmodels

In previous posts, we learned how to use Python to detect group differences on a single dependent variable. However, there may be situations in which we are interested in several dependent variables. In these situations, the simple ANOVA model is inadequate.

One way to examine multiple dependent variables using Python would, of course, be to carry out multiple ANOVA. That is, one ANOVA for each of these dependent variables. However, the more tests we conduct on the same data, the more we inflate the family-wise error rate (the greater chance of making a Type I error).

This is where MANOVA comes in handy. MANOVA, or Multivariate Analysis of Variance, is an extension of Analysis of Variance (ANOVA). However, when using MANOVA we have two, or more, dependent variables.

MANOVA and ANOVA is similar when it comes to some of the assumptions. That is, the data have to be:

  • normally distributed dependent variables
  • equal covariance matrices)

In this post will learn how carry out MANOVA using Python (i.e., we will use Pandas and Statsmodels). Here, we are going to use the Iris dataset which can be downloaded here.

The Easiest Data Cleaning Method using Python & Pandas

In this post, we are going to learn how to do simplify our data preprocessing work using the Python package Pyjanitor. More specifically, we are going to learn how to:

  • Add a column to a Pandas dataframe
  • Remove missing values
  • Remove an empty column
  • Cleaning up column names

That is, we are going to learn how clean Pandas dataframes using Pyjanitor. In all Python data manipulation examples, here we are also going to see how to carry out them using only Pandas functionality.

Repeated Measures ANOVA in R and Python using afex & pingouin

In this post, we will learn how to carry out repeated measures Analysis of Variance (ANOVA) in R and Python. To be specific, we will use the R package afex and the Python package pingouin to carry out one-way and two-way ANOVA for within-subject’s design. The structure of the following data analysis tutorial is as follows; a brief introduction to (repeated measures) ANOVA, carrying out within-subjects ANOVA in R using afex and in Python using pingouin. In the end, there will be a comparison of the results and the pros and cons of using R or Python for data analysis (i.e., ANOVA).

How to Make a Scatter Plot in Python using Seaborn

Data visualization is a big part of the process of data analysis. In this post, we will learn how to make a scatter plot using Python and the package Seaborn. In detail, we will learn how to use the Seaborn methods scatterplot, regplot, lmplot, and pairplot to create scatter plots in Python.

More specifically, we will learn how to make scatter plots, change the size of the dots, change the markers, the colors, and change the number of ticks.  Furthermore, we will learn how to plot a regression line, add text, plot a distribution on a scatter plot, among other things. Finally, we will also learn how to save Seaborn plots in high resolution. That is, we learn how to make print-ready plots.