# Erik Marsja

In a previous post, we learned how to use Binder and Python for reproducible research. Now we are going to learn how to create a Binder for our data analysis in R, so that it can be fully reproduced by other researchers. More specifically, in this post we will learn how to use Binder with R for reproducible research.

Many researchers upload their code for data analysis and visualization using Git (e.g., to GitHub or GitLab).

No doubt, uploading your R scripts is great. However, we also need to make sure that we share the complete computational environment so that our code can be re-run and others can reproduce the results. That is, to have a fully reproducible example, we need a way to capture the versions of the R packages we were using at that particular time.

In this post we are going to learn 1) how to read SPSS (.sav) files in Python, and 2) how to write to SPSS (.sav) files using Python.

Python is a great general-purpose language and works well for statistical analysis and data visualization. However, Python is not really user-friendly for data storage. Thus, our data will often be archived using Excel, SPSS, or similar software.

In previous posts, we learned how to use Python to detect group differences on a single dependent variable. However, there may be situations in which we are interested in several dependent variables. In these situations, the simple ANOVA model is inadequate.

One way to examine multiple dependent variables using Python would, of course, be to carry out multiple ANOVAs. That is, one ANOVA for each of these dependent variables. However, the more tests we conduct on the same data, the more we inflate the family-wise error rate (i.e., the greater the chance of making at least one Type I error).
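To make the inflation concrete, here is a minimal sketch of how the family-wise error rate grows with the number of tests. It assumes the tests are independent and uses the textbook formula 1 - (1 - alpha)^m, which is not taken from the post itself; the function name is made up for illustration.

```python
def family_wise_error_rate(alpha, n_tests):
    """Probability of at least one Type I error across n_tests tests.

    Assumption: the tests are independent, so the chance of making no
    Type I error at all is (1 - alpha) ** n_tests.
    """
    return 1 - (1 - alpha) ** n_tests


# Show how the error rate grows as we run more ANOVAs at alpha = 0.05.
for m in (1, 2, 3, 5, 10):
    print(f"{m} test(s): FWER = {family_wise_error_rate(0.05, m):.3f}")
```

With three dependent variables tested separately at alpha = 0.05, the rate is already about 0.14 under this independence assumption.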

This is where MANOVA comes in handy. MANOVA, or Multivariate Analysis of Variance, is an extension of Analysis of Variance (ANOVA). However, when using MANOVA we have two, or more, dependent variables.

MANOVA and ANOVA are similar when it comes to some of the assumptions. That is, the data have to meet the following:

• normally distributed dependent variables
• equal covariance matrices

In this post we will learn how to carry out MANOVA using Python (i.e., we will use Pandas and Statsmodels). Here, we are going to use the Iris dataset which can be downloaded here.

In this post we are going to learn how to simplify our data preprocessing work using the Python package Pyjanitor. More specifically, we are going to learn how to:

• Add a column to a Pandas dataframe
• Remove missing values
• Remove an empty column
• Clean up column names

That is, we are going to learn how to clean Pandas dataframes using Pyjanitor. For each Python data manipulation example, we are also going to see how to carry it out using only Pandas functionality.
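The four steps listed above can be sketched with Pandas alone. This is a minimal illustration and not the code from the post: the example data, the column names, and the `clean` helper are all hypothetical.

```python
import pandas as pd

# Hypothetical example data: messy column names, one missing value,
# and one column that is completely empty.
raw = pd.DataFrame({
    "Subject ID": [1, 2, 3],
    "Reaction Time ": [0.52, float("nan"), 0.61],
    "Empty Col": [float("nan")] * 3,
})


def clean(df):
    """Apply the four cleaning steps using only Pandas functionality."""
    out = df.copy()
    # Clean up column names: strip whitespace, lowercase, spaces to underscores.
    out.columns = (
        out.columns.str.strip().str.lower().str.replace(" ", "_", regex=False)
    )
    # Remove columns that contain no data at all.
    out = out.dropna(axis=1, how="all")
    # Remove rows that still have missing values.
    out = out.dropna(axis=0, how="any")
    # Add a column (here a constant label, purely for illustration).
    return out.assign(group="control")


cleaned = clean(raw)
print(cleaned)
```

The empty column is dropped before the rows with missing values; doing it the other way around would remove every row, because each row has a missing value in the empty column.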

In this post we will learn how to carry out repeated measures Analysis of Variance (ANOVA) in R and Python. To be specific, we will use the R package afex and the Python package pingouin to carry out one-way and two-way ANOVA for within-subjects designs. The structure of this data analysis tutorial is as follows: a brief introduction to (repeated measures) ANOVA, followed by within-subjects ANOVA in R using afex and in Python using pingouin. In the end, there will be a comparison of the results and of the pros and cons of using R or Python for data analysis (i.e., ANOVA).

Data visualization is a big part of the process of data analysis. In this post, we will learn how to make a scatter plot using Python and the package Seaborn. In detail, we will learn how to use the Seaborn functions scatterplot, regplot, lmplot, and pairplot to create scatter plots in Python.

More specifically, we will learn how to make scatter plots and change the size of the dots, the markers, the colors, and the number of ticks. Furthermore, we will learn how to plot a regression line, add text, and plot a distribution on a scatter plot, among other things. Finally, we will also learn how to save Seaborn plots in high resolution. That is, we will learn how to make print-ready plots.

In this post, we will learn how to read and write JSON files using Python. In the first part, we are going to use the Python package json to create a JSON file and write it to disk. After that, we are going to use Pandas to load JSON files into a Pandas dataframe. Here, we will learn how to read a JSON file locally and from a URL, as well as how to read a nested JSON file using Pandas.
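As a minimal sketch of the first part, the standard-library json module can write a Python dictionary to a file and read it back. The data, file name, and helper functions below are made up for illustration and are not taken from the post.

```python
import json
import tempfile
from pathlib import Path

# Hypothetical example data; names and values are for illustration only.
data = {"participants": [{"id": 1, "rt": 0.52}, {"id": 2, "rt": 0.61}]}


def write_json(obj, path):
    """Serialize obj to a JSON file at path."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(obj, f, indent=2)


def read_json(path):
    """Load a JSON file and return the corresponding Python object."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


# Write to a temporary directory so the sketch leaves no files behind.
with tempfile.TemporaryDirectory() as tmp_dir:
    json_path = Path(tmp_dir) / "example.json"
    write_json(data, json_path)
    loaded = read_json(json_path)

print(loaded)
```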

Finally, as a bonus, we will also learn how to manipulate data in Pandas dataframes, rename columns, and plot the data using Seaborn.

In this post we will learn how to create a Binder so that our data analysis, for instance, can be fully reproduced by other researchers. That is, in this post we will learn how to use Binder for reproducible research.

In previous posts, we have learned how to carry out data analysis (e.g., ANOVA) and visualization (e.g., Raincloud plots) using Python. The code we have used has been uploaded in the form of Jupyter Notebooks.

For users of R Statistical Environment;

Although this is great, we also need to make sure that we share our computational environment so our code can be re-run and produce the same output. That is, to have a fully reproducible example, we need a way to capture the different versions of the Python packages we’re using.
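One way to capture package versions is to record what is installed in the current environment. The following is a minimal sketch using only the standard library (Python 3.8+); the function name is hypothetical, and the output uses the pinned `name==version` style found in requirements.txt files, one of the configuration files Binder can build from. It is not necessarily the approach taken in the post.

```python
from importlib.metadata import distributions


def pinned_requirements():
    """Return 'name==version' lines for every installed distribution."""
    pins = set()
    for dist in distributions():
        name = dist.metadata.get("Name")
        # Skip entries with broken or missing metadata.
        if name:
            pins.add(f"{name}=={dist.version}")
    # Sort case-insensitively for a stable, readable listing.
    return sorted(pins, key=str.lower)


if __name__ == "__main__":
    # Print the pins; redirect to a file to keep them with the analysis code.
    print("\n".join(pinned_requirements()))
```

In practice it is usually better to pin only the packages the analysis actually imports instead of everything installed in the environment.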

With the ever-increasing volume of data, it is impossible to tell stories without visualizations. Data visualization is the art of turning numbers into useful knowledge. Here we will learn how to create data visualizations and present data in Python using the Seaborn package.

In this post we are going to learn how to use the following 9 plots:

1. Scatter Plot
2. Histogram
3. Bar Plot
4. Time Series Plot
5. Box Plot
6. Heat Map
7. Correlogram
8. Violin Plot
9. Raincloud Plot

## Python Data Visualization Tutorial: Seaborn

As previously mentioned, in this Python Data Visualization tutorial we are mainly going to use Seaborn, but also Pandas and NumPy. However, to create the Raincloud Plot we are going to have to use the Python package ptitprince.

Python Raincloud Plot using the ptitprince package

Learn about probabilistic programming in this guest post by Osvaldo Martin, a researcher at The National Scientific and Technical Research Council of Argentina (CONICET) and author of Bayesian Analysis with Python: Introduction to statistical modeling and probabilistic programming using PyMC3 and ArviZ, 2nd Edition.

This post is based on an excerpt from the second chapter of the book that I have slightly adapted so it’s easier to read without having read the first chapter.