In this post, we will learn how to carry out repeated measures Analysis of Variance (ANOVA) in R and Python. To be specific, we will use the R package afex and the Python package pingouin to carry out one-way and two-way ANOVA for within-subject’s design. The structure of the following data analysis tutorial is as follows; a brief introduction to (repeated measures) ANOVA, carrying out within-subjects ANOVA in R using afex and in Python using pingouin. In the end, there will be a comparison of the results and the pros and cons of using R or Python for data analysis (i.e., ANOVA).
Python programming related stuff
In the posts in this category you will find Python scripts. Python is said to be one of the easiest programming language to learn. Learning one language will also make it easier to learn another, much more advanced, one. As a Bachelor student in the cognitive science programme I got to take Python courses. However, it was not before I started my Ph.D years that I realized how much use I had because I knew some programming.
For a psychology researcher Python might be ideal since it is relatively easy to learn and there is a huge Python community to get help from. How to build experiments using free and open-source tools such as PsychoPy, OpenSesame, Expyriment, and PyEPL is, for instance, something you could find in this category.
Data visualization is a big part of the process of data analysis. In this post, we will learn how to make a scatter plot using Python and the package Seaborn. In detail, we will learn how to use the Seaborn methods scatterplot, regplot, lmplot, and pairplot to create scatter plots in Python.
- If you are interested in learning about more Python data visualization methods see the post “9 Data Visualization Techniques You Should Learn in Python“.
More specifically, we will learn how to make scatter plots, change the size of the dots, change the markers, the colors, and change the number of ticks. Furthermore, we will learn how to plot a regression line, add text, plot a distribution on a scatter plot, among other things. Finally, we will also learn how to save Seaborn plots in high resolution. That is, we learn how to make print-ready plots.
In this post, we will learn how to read and write JSON files using Python. In the first part, we are going to use the Python package json to create and read a JSON file as well as write a JSON file. After that, we are going to use Pandas read_json method to load JSON files into Pandas dataframe. Here, we will learn how to read from a JSON file locally and from an URL as well as how to read a nested JSON file using Pandas.
Finally, as a bonus, we will also learn how to manipulate data in Pandas dataframes, rename columns, and plot the data using Seaborn.
In this post, we will learn how to create a binder so that our data analysis, for instance, can be fully reproduced by other researchers. That is, here we will learn how to use a tool called Binder for reproducible research.
In previous posts, we have learned how to carry out data analysis (e.g., ANOVA) and data visualization (e.g., Raincloud plots) using Python. The code we have used have been uploaded in the forms of Jupyter Notebooks.
For users of R Statistical Environment;
Although this is great, we also need to make sure that we share our computational environment so that our code can be re-run and produce the same output. That is, to have a fully reproducible example, we need a way to capture the different versions of the Python packages we’re using.
With ever-increasing volume of data, it is impossible to tell stories without visualizations. Data visualization is an art of how to turn numbers into useful knowledge. Using Python we can learn how to create data visualizations and present data in Python using the Seaborn package.
In this post we are going to learn how to create the following 9 plots:
- Scatter Plot
- Bar Plot
- Time Series Plot
- Box Plot
- Heat Map
- Violin Plot
- Raincloud Plot
Python Data Visualization Tutorial: Seaborn
As previously mentioned in this Python Data Visualization tutorial we are mainly going to use Seaborn but also Pandas, and Numpy. However, to create the Raincloud Plot we are going to have to use the Python package ptitprince.
Learn about probabilistic programming in this guest post by Osvaldo Martin, a researcher at The National Scientific and Technical Research Council of Argentina (CONICET) and author of Bayesian Analysis with Python: Introduction to statistical modeling and probabilistic programming using PyMC3 and ArviZ, 2nd Edition.
This post is based on an excerpt from the second chapter of the book that I have slightly adapted so it’s easier to read without having read the first chapter.
In this Pandas groupby tutorial, we are going to learn how to organize Pandas dataframes by groups. More specifically, we are going to learn how to group by one and multiple columns.
Furthermore, we are going to learn how to calculate some basics summary statistics (e.g., mean, median), convert Pandas groupby to dataframe, calculate the percentage of observations in each group, and many more useful things.
- More about working with Pandas: Pandas Dataframe Tutorial
First of all we are going to import pandas as pd, and read a CSV file, using the read_csv method, to a dataframe. In the example below, we use index_col=0 because the first row in the dataset is the index column.
In this post we are going to learn how to explore data using Python, Pandas, and Seaborn. The data we are going to explore is data from a Wikipedia article. In this post we are actually going to learn how to parse data from a URL using Python Pandas. Furthermore, we are going to explore the scraped data by grouping it and by Python data visualization. More specifically, we will learn how to count missing values, group data to calculate the mean, and then visualize relationships between two variables, among other things.
In previous posts we have used Pandas to import data from Excel and CSV files. In this post, however, we are going to use Pandas read_html, because it has support for reading data from HTML from URLs (https or http). To read HTML Pandas use one of the Python libraries LXML, Html5Lib, or BeautifulSoup4. This means that you have to make sure that at least one of these libraries are installed. In the specific Pandas read_html example here, we use BeautifulSoup4 to parse the html tables from the Wikipedia article.
In the first section, we will go through, with examples, how to read a CSV file, how to read specific columns from a CSV, how to read multiple CSV files and combine them to one dataframe, and, finally, how to convert data according to specific datatypes (e.g., using Pandas read_csv dtypes).
In the last section, we will continue by learning how to use Pandas to write CSV files. That is, we will learn how to export dataframes to CSV files.
In this tutorial, we will learn how to use Pandas sample to randomly select rows and columns from a Pandas dataframe. There are some reasons for randomly sample our data; for instance, we may have a very large dataset and want to build our models on a smaller sample of the data. Other examples are when carrying out bootstrapping or cross-validation. Here we will learn how to; select rows at random, set a random seed, sample by group, using weights, and conditions, among other useful things.