In this short post, we are going to learn how to turn the code from blog posts into Jupyter notebooks.
Author: Erik Marsja
PhD Student in Psychology, Umeå University. Main interest is experimental and cognitive psychology. Enjoy programming in Python and R.
In this post, we will learn how to make scatter plots using R and the package ggplot2.
More specifically, we will learn how to make scatter plots, change the size of the dots, change the markers, the colors, and change the number of ticks.
Furthermore, we will learn how to plot a trend line, add text, plot a distribution on a scatter plot, among other things. In the final section of the scatter plot in R tutorial, we will learn how to save plots in high resolution.
In this post, we are going to learn how to read SAS (.sas7bdat) files in Python.
As previously described (in the read .sav files in Python post), Python is a general-purpose language that can also be used for data analysis and data visualization.
One potential downside, however, is that Python is not really user-friendly for data storage. This has, of course, led to our data often being stored using Excel, SPSS, SAS, or similar software. See, for instance, the posts about reading .sav and .xlsx files in Python:
In this tutorial, we will learn how to work with Excel files in the R statistical programming environment. It will provide an overview of how to use R to load xlsx files and write spreadsheets to Excel.
In this post, we are going to work with Pandas iloc and loc. More specifically, we are going to learn slicing and indexing through iloc and loc examples.
Once we have a dataset loaded as a Pandas dataframe, we often want to start accessing specific parts of the data based on some criteria. For instance, if our dataset contains the result of an experiment comparing different experimental groups, we may want to calculate descriptive statistics for each experimental group separately.
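As a minimal sketch of the idea, the following uses a small made-up dataframe (the column names `group` and `rt`, the subject labels, and all values are hypothetical) to show positional selection with iloc, label-based and boolean selection with loc, and descriptive statistics per experimental group.

```python
import pandas as pd

# Hypothetical experiment data: two groups, one response-time measure.
df = pd.DataFrame(
    {
        "group": ["control", "control", "treatment", "treatment"],
        "rt": [512.0, 498.0, 455.0, 471.0],
    },
    index=["s1", "s2", "s3", "s4"],
)

# iloc selects by integer position: first two rows, second column.
first_two_rt = df.iloc[:2, 1]

# loc selects by label or boolean mask: all "treatment" rows, "rt" column.
treatment_rt = df.loc[df["group"] == "treatment", "rt"]

# Label slices with loc include the end label, unlike positional slices.
s1_to_s3 = df.loc["s1":"s3", "rt"]

# Descriptive statistics for each experimental group separately.
group_means = df.groupby("group")["rt"].mean()

print(first_two_rt.tolist())
print(treatment_rt.tolist())
print(group_means.to_dict())
```

Note that `df.iloc[:2]` stops before position 2, whereas `df.loc["s1":"s3"]` includes the row labelled `s3`.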
In a previous post, we learned how to use Binder and Python for reproducible research. Now we are going to learn how to create a Binder for our data analysis in R, so it can be fully reproduced by other researchers. More specifically, in this post we will learn how to use Binder for reproducible research.
Many researchers upload their code for data analysis and visualization using git (e.g., to GitHub or GitLab).
No doubt, uploading your R scripts is great. However, we also need to make sure that we share the complete computational environment so that our code can be re-run and so that others can reproduce the results. That is, to have a fully reproducible example, we need a way to capture the different versions of the R packages we were using, at that particular time.
In this post we are going to learn 1) how to read SPSS (.sav) files in Python, and 2) how to write to SPSS (.sav) files using Python.
Python is a great general-purpose language that is also well suited to statistical analysis and data visualization. However, Python is not really user-friendly when it comes to data storage. Thus, our data will often be archived using Excel, SPSS, or similar software.
For example, learn how to import data from other file types, such as Excel and SPSS in the following two posts:
In previous posts, we learned how to use Python to detect group differences on a single dependent variable. However, there may be situations in which we are interested in several dependent variables. In these situations, the simple ANOVA model is inadequate.
One way to examine multiple dependent variables using Python would, of course, be to carry out multiple ANOVAs. That is, one ANOVA for each of these dependent variables. However, the more tests we conduct on the same data, the more we inflate the family-wise error rate (i.e., the greater the chance of making at least one Type I error).
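To make the inflation concrete, here is a small sketch that computes the family-wise error rate as 1 - (1 - alpha)^m for m tests. It assumes the tests are independent, which is a simplification (dependent variables measured on the same participants are usually correlated), and the function name is only illustrative.

```python
def family_wise_error_rate(alpha: float, n_tests: int) -> float:
    """Probability of at least one Type I error across independent tests."""
    return 1.0 - (1.0 - alpha) ** n_tests


# Compare one ANOVA with three and five separate ANOVAs at alpha = .05.
for n_tests in (1, 3, 5):
    print(n_tests, round(family_wise_error_rate(0.05, n_tests), 3))
```

Under this assumption, three separate tests at an alpha of .05 already give roughly a 14% chance of at least one false positive.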
This is where MANOVA comes in handy. MANOVA, or Multivariate Analysis of Variance, is an extension of Analysis of Variance (ANOVA). However, when using MANOVA we have two, or more, dependent variables.
MANOVA and ANOVA are similar when it comes to some of the assumptions. That is, the data must meet the following assumptions:
- normally distributed dependent variables
- equal covariance matrices
In this post we are going to learn how to simplify our data preprocessing work using the Python package Pyjanitor. More specifically, we are going to learn how to:
- Add a column to a Pandas dataframe
- Remove missing values
- Remove an empty column
- Clean up column names
That is, we are going to learn how to clean Pandas dataframes using Pyjanitor. For all the Python data manipulation examples, we are also going to see how to carry them out using only Pandas functionality.
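The four steps listed above can be sketched using only Pandas functionality, roughly as follows. The dataframe, its column names, and the derived `response_time_ms` column are made up for illustration, and this is one possible way to chain the steps rather than the approach taken in the post itself.

```python
import numpy as np
import pandas as pd

# Hypothetical raw data with messy column names and missing values.
df = pd.DataFrame(
    {
        "Subject ID": [1, 2, 3],
        "Response Time": [0.51, np.nan, 0.47],
        "Empty Col": [np.nan, np.nan, np.nan],
    }
)

cleaned = (
    df
    # Clean up column names: lowercase, spaces replaced by underscores.
    .rename(columns=lambda name: name.strip().lower().replace(" ", "_"))
    # Remove columns in which every value is missing.
    .dropna(axis="columns", how="all")
    # Remove rows that still contain a missing value.
    .dropna(axis="index", how="any")
    # Add a column (here: response time converted to milliseconds).
    .assign(response_time_ms=lambda d: d["response_time"] * 1000)
)

print(cleaned)
```

The empty column is dropped before the rows with missing values; in the reverse order, every row would be removed because the empty column is missing everywhere.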
In this post we will learn how to carry out repeated measures Analysis of Variance (ANOVA) in R and Python. To be specific, we will use the R package afex and the Python package pingouin to carry out one-way and two-way ANOVA for within-subjects designs. The structure of the following data analysis tutorial is as follows: a brief introduction to (repeated measures) ANOVA, followed by carrying out within-subjects ANOVA in R using afex and in Python using pingouin. In the end, there will be a comparison of the results and of the pros and cons of using R or Python for data analysis (i.e., ANOVA).