In this post, we are going to learn how to read Stata (.dta) files in the R statistical environment. Specifically, we will learn 1) how to read .dta files in R using Haven, and 2) how to write dataframes to .dta files.
Data Import in R: Reading Stata Files
R is a superb statistical programming environment. When it comes to storing data, we can use the native .rda format. However, if we have collaborators who use other statistical software (e.g., Stata) and store their data in other formats (e.g., .dta files), we need to be able to read those files as well.
This is where R shines: as R users, we can load data from a range of file formats, e.g., SAS (.sas7bdat), Stata (.dta), Excel (e.g., .xlsx), and CSV (.csv). On this site there are other tutorials on how to import data from some of these formats:
- How to Import SAS files in R
- Reading and writing SPSS files in R
- How to read, and write, Excel (.xlsx) files in R – e.g., multiple sheets
Before we go on and learn how to read Stata files in R, we will answer the question:
Can R Read Stata Files?
The answer is “yes!” R can read Stata (.dta) files. This is easy to do with the Haven package. First, load the package:
library(haven)
Second, use the read_dta() function to read the file.
How to Read a Stata (.dta) File in R
Now, we are almost ready to answer the question of how to open a Stata file in R by using easy-to-follow examples. In R, there are several useful packages that make it possible to open .dta files. In this tutorial, however, we are going to use the package Haven (which is part of the Tidyverse).
Haven can be installed separately or by installing the Tidyverse packages. First, if we only want to install haven, we open up R (or RStudio) and type:
install.packages("haven")
If we, on the other hand, want to install all Tidyverse packages, we change “haven” to “tidyverse”:
install.packages("tidyverse")
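If the installation command is part of a script that is run repeatedly, it can be convenient to install the package only when it is missing. The following is a minimal sketch of that idea; it assumes a CRAN mirror is configured and uses base R's requireNamespace() to check whether haven is available.

```r
# Install haven only if it is not already available (sketch)
if (!requireNamespace("haven", quietly = TRUE)) {
  install.packages("haven")
}

# Attach the package so that read_dta() and write_dta() can be called directly
library(haven)
```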
The Syntax of read_dta()
In this section, before learning the steps to reading a .dta file, we will have a quick look at the syntax of the read_dta() function. In its simplest form, here’s how to import data from a .dta file in R:
# import dta file in r
dataframe <- read_dta('PATH_OR_URL_TO_STATA_FILE')
Of course, the function comes with a few more arguments. In recent versions of Haven, its signature is:
read_dta(file, encoding = NULL, col_select = NULL, skip = 0, n_max = Inf, .name_repair = "unique")
As you can see, the first argument is file, which should be the path, or URL, to the file. This is evident from the syntax above as well. In this tutorial, we will have a look at the col_select argument.
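Besides file and col_select, read_dta() also accepts n_max and skip, which can be handy for previewing a large file. The sketch below is only an illustration: it first writes the built-in mtcars dataset to a temporary .dta file (an assumption made so that the example is self-contained) and then reads parts of it back.

```r
library(haven)

# Create a small example .dta file in a temporary location
# (mtcars ships with R; the temporary file is only for illustration)
tmp <- tempfile(fileext = ".dta")
write_dta(mtcars, tmp)

# Read only the first 5 rows
preview <- read_dta(tmp, n_max = 5)

# Skip the first 10 rows and read the remaining ones
rest <- read_dta(tmp, skip = 10)

dim(preview)
dim(rest)
```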
How to Read a Stata (.dta) file in R Step-By-Step
In this section, we are finally going to learn how to import .dta files in R. Here are the three simple steps to read a Stata file in R:
1) Load the haven Library:
First, we are going to load the Haven package:
library(haven)
Now that the functions of the Haven package are available in our session, we can proceed to step 2: finding the .dta file we want to read.
2) Find the .dta File
Second, before we can import the Stata file, we need to know where the file is located. We therefore create a character variable with the path to the file:
dtafile <- file.path(getwd(), "RScripts", "Data", "FifthDayData.dta")
Note that, in the example above, the data file is located two folders below the working directory: “Data” is a subfolder of “RScripts”, which in turn is a subfolder of the working directory. To elaborate, we used the getwd() function to get the current working directory (e.g., “C:/Users/Erik/Documents”). The next two character strings, “RScripts” and “Data”, indicate the folders where the file is stored and, finally, we have the file name. The file.path() function joins these parts into one path.
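Before reading the file, it may be worth checking that the path actually points to an existing file. The following is a small sketch using base R's file.exists(); the folder and file names are the ones used in this tutorial and will differ on your machine.

```r
# Build the path relative to the working directory
# (folder and file names are the ones used in this tutorial)
dtafile <- file.path(getwd(), "RScripts", "Data", "FifthDayData.dta")

# Check that the file exists before trying to read it
if (file.exists(dtafile)) {
  message("Found: ", dtafile)
} else {
  message("File not found: ", dtafile,
          "\nCheck the working directory with getwd().")
}
```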
3) Read the Stata (.dta) File into R using read_dta():
Now, we are ready to actually import the data, from our .dta file, into R. This is done using the read_dta() function, as previously mentioned. Here’s how to read a dta file in R:
# import dta file in r
fifthD.df <- read_dta(dtafile)
head(fifthD.df)
That was it; you have now read the .dta file into a dataframe. Next, you may want to carry out simple data manipulation, e.g., add an empty column to a dataframe in R.
How to Read a .dta File in R from a URL
In this section, we are going to learn how to import a Stata file (.dta) from a URL. This is as simple as loading the data from the hard drive; we only need to change the character variable so that it holds the URL and pass that variable to read_dta(). Here’s an example of how to read a dta file from a URL:
url <- "http://www.principlesofeconometrics.com/stata/broiler.dta"
data.df <- read_dta(url)
head(data.df)
How to Read Specific Columns from a Stata (.dta) file in R
In this section, we are going to learn how to use read_dta() to load specific columns. This may be useful when we only plan to analyze a few variables from a very large dataset.
Reading One Column from a dta File in R
First, we are going to read only one column. In the code chunk below, we are reading the “pbeef” column. To do so, we use the col_select argument with a character string specifying the column we want to read:
data.df <- read_dta(url, col_select = "pbeef")
head(data.df)
Reading Multiple Columns from a dta File in R
Now, if we want to read several columns from the .dta file, we’ll first create a character vector with the column names (here the vector holds a single name; further column names can be added to it):
cols <- c("pbeef")
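Finally, we are ready to read the columns. Note that here we use the all_of() function, which comes from the tidyselect package (and is also available when dplyr is loaded):
data.df <- read_dta(url, col_select = tidyselect::all_of(cols))
head(data.df)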
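To try column selection without downloading anything, here is a self-contained sketch. It assumes the built-in mtcars dataset as a stand-in for a real Stata file and writes it to a temporary .dta file first; the column names are therefore those of mtcars, not of the file used above.

```r
library(haven)

# Write a built-in dataset to a temporary .dta file (illustration only)
tmp <- tempfile(fileext = ".dta")
write_dta(mtcars, tmp)

# Select several columns by name with a character vector
cols <- c("mpg", "cyl", "hp")
subset.df <- read_dta(tmp, col_select = tidyselect::all_of(cols))
names(subset.df)

# Selection helpers work as well, e.g., all columns starting with "d"
d.df <- read_dta(tmp, col_select = tidyselect::starts_with("d"))
names(d.df)
```

Using all_of() makes the call fail loudly if one of the names in cols does not exist in the file, which helps catch typos early.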
We have now learned how to read a Stata file in R. The next step might be to inspect the dataframe, visualize the data, and, if we have categorical data, dummy code them. See the posts on how to create scatter plots in R with ggplot2 and how to create dummy variables in R.
How to Save a Stata file
In this section, we will learn how to write a dataframe to a Stata file. First, we will learn how to do some data manipulation on a .dta file we have loaded in R and save it as a new .dta file. Second, we are going to learn how to read CSV and Excel files in R and save them as Stata files.
Saving a dataframe as a Stata file using write_dta()
In the example below, we are first going to load a .dta file using read_dta(). Second, we are going to remove columns in R using the dplyr package. Finally, when we have deleted the columns we don’t want, we are going to save the dataframe as a .dta file.
library(haven)
library(dplyr)
## Dta file:
dtafile <- file.path(getwd(), "RScripts", "Data", "FifthDayData.dta")
dta.df <- read_dta(dtafile)
In the code chunk above, we did not do anything new (for this post). Now, in the next code chunk, we are deleting two columns (index and Day):
newdta.df <- select(dta.df, -c(index, Day))
Finally, we are ready to write the dataframe as a .dta file:
write_dta(newdta.df, file.path(getwd(), "RScripts", "Data", "NewFifthDayData.dta"))
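If the collaborator uses an older Stata release, the version argument of write_dta() can be used to choose the file format. The sketch below assumes the built-in mtcars dataset as a stand-in for dta.df and writes to a temporary file; reading the file back is a simple way to verify the result.

```r
library(haven)
library(dplyr)

# Drop two columns from a built-in dataset (stand-in for dta.df)
small.df <- select(mtcars, -c(vs, am))

# Write the file in the Stata 13 format so that older releases can open it
out <- tempfile(fileext = ".dta")
write_dta(small.df, out, version = 13)

# Read the file back to check that the columns were removed
check.df <- read_dta(out)
names(check.df)
```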
Save a CSV file as a Stata File
In this section, we are going to work with another package from the Tidyverse: readr. We are going to use its read_csv() function to read data from a CSV file. After we have imported the CSV file to a dataframe, we are going to save it as a .dta file using Haven’s write_dta() function:
library(readr)
csvfile <- file.path(getwd(), "RScripts", "Data", "FirstDayData.csv")
data.df <- read_csv(csvfile)
View(data.df)
## Saving it as a dta
write_dta(data.df, file.path(getwd(), "RScripts", "Data", "FirstDayData.dta"))
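One thing to watch out for when saving data from CSV files: Stata is stricter than R about variable names (for example, dots and spaces are not allowed), and write_dta() is expected to reject such names. The following is a hedged sketch of one way to clean the names first; the iris dataset and the extra column are assumptions made purely for illustration.

```r
library(haven)

# Example dataframe whose column names contain dots and a space
# (iris ships with R; the extra column is hypothetical)
data.df <- iris
data.df$`petal ratio` <- data.df$Petal.Length / data.df$Petal.Width

# Replace every character that is not a letter, digit, or underscore
names(data.df) <- gsub("[^A-Za-z0-9_]", "_", names(data.df))

# Now the dataframe can be written as a .dta file
out <- tempfile(fileext = ".dta")
write_dta(data.df, out)
names(read_dta(out))
```

Note that this simple replacement does not handle names that start with a digit or that are longer than Stata allows; those cases would need additional handling.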
Export an Excel file as a Stata File
library(readxl)
# Read the Excel file
xlfile <- file.path(getwd(), "RScripts", "Data", "example_concat.xlsx")
data.df <- read_excel(xlfile)
# Save it as a .dta file
write_dta(data.df, file.path(getwd(), "RScripts", "Data", "STATADATA.dta"))
Summary: Read Stata Files using R
In this post, we have learned how to read Stata files in R. Specifically, we’ve learned how to load .dta files using the Haven package. Furthermore, we have learned how to write R dataframes to Stata files, and how to load data from Excel and CSV files and save them as .dta files.