In a previous post, we learned how to use Binder and Python for reproducible research. Now we are going to learn how to create a Binder for our data analysis in R, so it can be fully reproduced by other researchers. More specifically, we will set up a GitHub repository that Binder can turn into an online RStudio session.
Many researchers upload their code for data analysis and visualization using git (e.g., to GitHub, GitLab).
No doubt, uploading your R scripts is great. However, we also need to make sure that we share the complete computational environment so that our code can be re-run and so that others can reproduce the results. That is, to have a fully reproducible example, we need a way to capture the different versions of the R packages we were using, at that particular time.
As previously mentioned, uploading R scripts and data, and sharing links to online repositories (e.g., GitHub or the Open Science Framework), is good. However, recreating the exact environment, including the package versions used, that is required to run another researcher’s analysis can take a lot of time.
Luckily, there are some alternatives to help us with creating reproducible environments. For example, Binder and Code Ocean can make a difference: they enable us to reproduce our exact R statistical programming environment with an online RStudio. This, in turn, will enable other researchers to reproduce our exact workflow, as well as change it.
What is Binder
In this post, the focus is on Binder and how to use Binder for reproducible research. That is, the post will not cover any technical details about Binder.
Binder makes it possible for us to create custom computing environments. Furthermore, these computing environments can be shared and used by many other researchers. For example, the exact data analysis can be run by another user by following a link to an RStudio session that runs in a web browser. Finally, Binder is powered by BinderHub, an open-source tool that deploys the Binder service to the cloud.
In this R and Binder tutorial, we will use mybinder.org. You may now ask yourself: how does Binder work? Binder pulls a repository that we set up on GitHub into a Docker container. Docker then packages our data, code, and all the dependencies, which are specified in a file, into that container. This is done to ensure that our scripts work seamlessly in any environment.
How to Use Binder for Reproducible Code
In this Binder tutorial, we will create a scatter plot on a map and use it when we create our Binder repository. Here, we use the R packages ggplot2, ggmap, and osmdata.
All packages above can be installed on our computers using the following R code:
# Packages used in this tutorial
list.of.packages <- c("ggplot2", "ggmap", "osmdata")
# Keep only the packages that are not installed yet
new.packages <- list.of.packages[!(list.of.packages %in% installed.packages()[,"Package"])]
# Install whatever is missing
if(length(new.packages)) install.packages(new.packages)
Note, the above R packages will be installed when we create the Binder but, typically, we also want to use them locally, on our computers.
How to Use Git
In this post on how to use Binder for reproducible code, we are going to use command-line git. Thus, we need to create a GitHub account and install Git. If you already know how to use Git, you can skip this part.
In the git example below, we are using Ubuntu 18.04 and a terminal prompt. If you are a Windows user, you can see how to use git and Git Bash in the Binder and Python tutorial.
Step 1: Install Git and Set it Up
Open up a terminal window and type:
sudo apt install git
Once git is installed, we need to configure it.
In the terminal window we type:
git config --global user.name "Your name here"
git config --global user.email "your_email@example.com"
Step 2: Create a GitHub Repository
Now that we have git installed, we are going to create a directory for our repository. Open up the terminal window, again, and type the following:
cd Documents
mkdir Binder
cd Binder
mkdir bindRtut
cd bindRtut
Now we can initialize a git repository:
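The command itself is not shown above. A minimal sketch, assuming we are still inside the bindRtut directory from the previous commands, uses the standard git init command:

```shell
# Run from inside the bindRtut directory created in the previous commands.
git init

# Optional check: the hidden .git directory should now exist.
ls -d .git
```

After this, the directory is a local git repository, but nothing has been committed or pushed yet.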
Step 3: Create the Environment and README Files
In the next step, we are going to create our runtime.txt and README.md files. The file runtime.txt is where we tell Binder which R environment we are going to use (the date we ran the script).
Creating the runtime.txt file:
First, we open up a text editor. In this case, we’ll use RStudio to create a new file (New File > Text File):
When this is done, we are going to save this file as “runtime.txt” in the directory we have already created.
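The content of runtime.txt is not shown above. As a rough sketch, Binder's documentation describes a single line of the form r-YYYY-MM-DD, which pins the R packages to a dated CRAN snapshot. The date below is a placeholder chosen for illustration, not the one used in this post, and the variable name SNAPSHOT_DATE is our own:

```shell
# Placeholder date (an assumption): replace it with the date you ran your analysis.
SNAPSHOT_DATE="2020-01-15"

# Write the single line Binder reads, e.g. "r-2020-01-15".
printf 'r-%s\n' "$SNAPSHOT_DATE" > runtime.txt

# Show the result.
cat runtime.txt
```

The same file can of course be typed by hand in RStudio; the shell version is only a convenient way to see exactly what ends up in it.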
Creating the install.R File
We also need to create the install.R file to tell Binder to install the R packages we use in our script. Again, we go to File > New File > R Script and save this file as “install.R” (with a capital R, which is the file name Binder looks for) in the folder we previously created. Add the command below to that file.
install.packages(c("ggplot2", "ggmap", "osmdata"))
Creating the README.md file:
In RStudio we create another file: File > New File > R Markdown, then select From Template and GitHub Document (Markdown). Finally, save this file as README.md in the same directory as runtime.txt, and add the following text:
## How to Use Binder for Reproducible Research
Example README.md file for the tutorial on how to use Binder for reproducible research in the R statistical environment.
Note, the above is an example README.md file for the tutorial on how to use Binder for reproducible research in R. Naturally, for a real science project we would have a richer description of the study, methods, and statistical analysis in the README.md.
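Before committing, it can help to confirm that the three files from this step are actually in the directory. The following is a minimal sketch, assuming the file names used in this tutorial; check_binder_files is a hypothetical helper we define ourselves, not a Binder or git command:

```shell
# check_binder_files is a hypothetical helper, not part of Binder or Git.
# It reports which of the tutorial's files exist in a directory (default: .)
# and returns a non-zero status if any of them is missing.
check_binder_files() {
  dir="${1:-.}"
  missing=0
  for f in runtime.txt install.R README.md; do
    if [ -f "$dir/$f" ]; then
      echo "found:   $f"
    else
      echo "missing: $f"
      missing=1
    fi
  done
  return "$missing"
}

# Check the current directory and print a reminder if something is missing.
check_binder_files . || echo "Create the missing files before committing."
```

Note that the check is case-sensitive on Linux, so a file saved as install.r would be reported as missing.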
Step 4: Add the Files and Create a Commit
Finally, we are ready to commit our files to GitHub. Open up a terminal window, and type:
git add .
git commit -m "Created the binder r environment, install.R, and README files"
Step 5: Create a new repository on GitHub
In the fifth step, we are going to create a new repository on GitHub. First, we go to the GitHub home page and press the green ‘+ New repository’ button:
After we have clicked the button, GitHub will ask us to name the repository. In this Binder example we leave out the description and give the repository the name “binderRtut”:
In this Binder tutorial, we only fill in the repository name. We can now press the green ‘Create repository’ button to make our new repository.
Note, we have already created a new repository locally and, thus, we want to push that onto GitHub. First, we open up the terminal window, again, and type the git commands below.
git remote add origin git@github.com:marsja/binderRtut.git
git push -u origin master
Note, remember to change the username (i.e., replace marsja with yours). For more guides on how to use GitHub see here.
How to Create a Reproducible R Environment
In the next code example, we are going to create a scatterplot of Google location history data (your data can be downloaded from your Google account but needs to be pre-processed before using the R script below).
R Code Example:
library(ggplot2)
library(ggmap)
library(osmdata)

# Getting data for Rotterdam
rotterdam <- getbb("Rotterdam")
df <- read.csv('rotter.csv')
head(df)
In R data can be imported from many formats. If the data is stored in Excel files, it is possible to use the readxl and xlsx packages to load the data into R dataframes. Learn more about working with Excel files in the recent R Excel tutorial.
Scatter Plot on a Map
In the next code example, we are creating a scatterplot using ggplot2 and ggmap:
rott <- get_map(rotterdam, zoom = 13, source = "osm")
ggm <- ggmap(rott, extent = "device", legend = "none")
ggm <- ggm + geom_point(aes(x = lon, y = lat), data = df)
print(ggm)
Now we have the code that Binder will run. Again, we need to push this to GitHub:
git add .
git commit -m 'Added data vis'
git push -u origin master
Connecting Everything to Binder
Finally, we are ready to create our Binder. This is quite easy: we open up a new tab in our browser, type “https://gke.mybinder.org/”, and hit enter.
How to Create a mybinder in 3 Simple Steps
Time needed: 1 minute.
How do you create a Binder using mybinder? Here are three simple steps:
- Paste URL to GitHub Repository
First, we paste the URL to the GitHub Repository we created
- Type “rstudio”
In the form, where it says “Path to a notebook file (optional)”, type rstudio
- Press “Launch”
Now we are ready to launch the Binder
Now we can sit back, have a beer, and wait for it to be done. The process of building everything may take a long time. After everything is done we’ll get a link that we can share with other researchers.
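The link we get follows a predictable pattern, so it can also be put together by hand. The sketch below assumes the username, repository name, and branch from the git push command earlier in this post; the variable names are our own, and the values should be replaced with yours:

```shell
GH_USER="marsja"       # replace with your GitHub username
GH_REPO="binderRtut"   # repository name from the git remote command
GH_REF="master"        # branch that was pushed

# mybinder.org link that opens the repository in RStudio.
BINDER_URL="https://mybinder.org/v2/gh/${GH_USER}/${GH_REPO}/${GH_REF}?urlpath=rstudio"
echo "$BINDER_URL"
```

This is the kind of link that can go in the README.md or be sent directly to other researchers.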
Finally, it’s important to remember that any changes made in the online RStudio on mybinder.org will not be saved. In fact, RStudio will be automatically shut down after 10 minutes of inactivity. Of course, the Binder can be updated: all we need to do is go back to our repository, make changes, commit, and push to GitHub.
Note, Windows users can have a look at the section on how to use Git in the tutorial on how to use Python and Binder. In Windows, we can use Git Bash instead of a Linux terminal window.
In this tutorial, we have learned how to set up git and use GitHub together with Binder for reproducible data visualization. Furthermore, we used ggplot2, ggmap, and osmdata for visualizing Google location data in RStudio. This online RStudio, along with the GitHub repository we created, can be found online here.