In this post, we are going to learn how to read Stata (.dta) files in the R statistical environment. Specifically, we will learn 1) how to import .dta files in R using Haven, and 2) how to write dataframes to a .dta file.
Data Import in R: Reading Stata Files
Now, R is, as we all know, a superb statistical programming environment. When it comes to importing and storing data, we can store our data in the native .rda format. However, we may have collaborators who use other statistical software (e.g., Stata) and who store their data in different formats (e.g., .dta files).
Now, this is when R shows us its brilliance: as R users, we can load data from a range of file formats, e.g., SAS (.sas7bdat), Stata (.dta), Excel (e.g., .xlsx), and CSV (.csv). On this site, there are other tutorials on how to import data from some of these formats:
- How to Import SAS files in R
- Reading and writing SPSS files in R
- How to read, and write, Excel (.xlsx) files in R – e.g., multiple sheets
Before we go on and learn how to read Stata files in R, we will answer the question:
Can R Read Stata Files?
The answer is “yes!” R can read Stata (.dta) files. This is easy to do with the Haven package. First, load the package:
library(haven). Second, use the read_dta() function to read the file.
How to Open Stata (.dta) Files in R
Now, we are soon ready to answer the question of how to open a Stata file in R, using easy-to-follow examples. In R, there are several packages that make it possible to open .dta files. In this tutorial, however, we are going to use the Haven package (which is part of the Tidyverse).
How to install Haven:
Now, Haven can be installed separately or by installing the Tidyverse packages. First, if we want to install only Haven, we open up R (or RStudio) and type
install.packages("haven"). If we, on the other hand, want to install all Tidyverse packages, we change “haven” to “tidyverse”:
install.packages("tidyverse")
How to Read a Stata (.dta) file in R Step-By-Step
In this section of the tutorial, we are finally going to learn how to import .dta files in R. Here are three simple steps to read a Stata file in R:
1) Load the haven Library:
First, we are going to load the Haven package:
library(haven)
2) Find the .dta File
Second, before we can import the Stata file, we need to know where the file is located. Thus, we create a character variable with the filename.
dtafile <- file.path(getwd(), "RScripts", "Data", "FifthDayData.dta")
Note, in the file.path() example above, the data file is located two subfolders below the working directory. To elaborate, we used the getwd() function to get the current working directory (e.g., “C:/Users/Erik/Documents”). The next two character strings, “RScripts” and “Data”, indicate that the .dta file is in the “Data” folder, which is a subfolder of the “RScripts” folder. Finally, we have the file name.
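Before importing, it can be helpful to check that the path actually points to a file. The following is a minimal sketch using only base R; it assumes the same folder layout as the example above, so the message it prints will depend on your own working directory.

```r
# Build the path the same way as in the tutorial
dtafile <- file.path(getwd(), "RScripts", "Data", "FifthDayData.dta")

# file.exists() returns TRUE only if something exists at that path
if (file.exists(dtafile)) {
  message("Found the Stata file: ", dtafile)
} else {
  message("No file at: ", dtafile,
          " (check getwd() and the folder names)")
}
```

If the file is not found, running getwd() and comparing the result with the folder where the data is stored is usually the quickest way to spot the problem.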
3) Load the Stata (.dta) File using read_dta():
Now, we are ready to actually import the data, from our .dta file, into R. This is done using the read_dta() function, as previously mentioned.
fifthD.df <- read_dta(dtafile)
head(fifthD.df)
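If the tutorial's data file is not at hand, the same steps can be tried on a small made-up dataset. The sketch below is only an illustration: the dataframe, its column names, and the temporary file are assumptions and not part of the original example.

```r
library(haven)

# A small, made-up dataframe (hypothetical data for illustration only)
example.df <- data.frame(id = 1:3, score = c(2.5, 3.0, 4.5))

# Write it to a temporary .dta file so nothing on disk is overwritten
tmp <- tempfile(fileext = ".dta")
write_dta(example.df, tmp)

# Read the file back in, exactly as in step 3 above
roundtrip.df <- read_dta(tmp)
head(roundtrip.df)
```

Note that read_dta() returns a tibble, which prints a little differently from a base R dataframe.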
How to Read .dta Files from a URL
In this section, we are going to learn how to import a Stata file from a URL. This is, of course, as simple as loading the data from the hard drive. Naturally, however, we need to change the character variable.
url <- "http://www.principlesofeconometrics.com/stata/broiler.dta"
data.df <- read_dta(url)
head(data.df)
How to Read Specific Columns from a Stata file
In this section, of the read Stata files in R tutorial, we are going to learn how to use read_dta() to load specific columns. This may be useful when we plan to analyze some specific variables from very large datasets.
Reading One Column
First, we are going to read only one column. In the code chunk below, we are reading the “pbeef” column. Thus, we use the col_select argument with a character string that specifies the column we want to read:
data.df <- read_dta(url, col_select="pbeef")
head(data.df)
Reading Multiple Columns
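Now, if we want to read many columns from the .dta file, we’ll first create a character vector with the column names (here the vector holds a single name; further column names can be added to it, separated by commas):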
cols <- c("pbeef")
Finally, we are ready to read the columns. Note, here we use the all_of() function (from the tidyselect package) to select the columns named in the vector:
data.df <- read_dta(url, col_select=all_of(cols))
head(data.df)
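To try column selection without downloading anything, here is a small self-contained sketch. It assumes R's built-in mtcars dataset as stand-in data and a temporary file, neither of which appears in the original example, and it calls all_of() with the explicit tidyselect:: prefix so that it does not depend on which packages are attached.

```r
library(haven)

# Write the built-in mtcars data to a temporary .dta file (stand-in data)
tmp <- tempfile(fileext = ".dta")
write_dta(mtcars, tmp)

# Character vector with the columns we want to read
cols <- c("mpg", "cyl")

# Read only those columns from the file
subset.df <- read_dta(tmp, col_select = tidyselect::all_of(cols))
names(subset.df)
```

Using all_of() also means that a misspelled column name raises an error rather than being silently ignored.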
We have now learned how to read a Stata file in R. The next step might be to inspect the dataframe and visualize the data. If we have categorical variables, we may also need to dummy code them. See the posts on how to create scatter plots in R with ggplot2 and how to create dummy variables in R.
How to Save a Stata file
In this section, we will learn how to write a dataframe to a Stata file. First, we will learn how to do some data manipulation on a .dta file we have loaded in R and save it as a new .dta file. Second, we are going to learn how to read CSV and Excel files in R and save them as Stata files.
Saving a dataframe as a Stata file using write_dta()
In the example below, we are first going to load a .dta file using read_dta(). Second, we are going to remove some of the columns in R using the select() function from dplyr. Finally, when we have deleted the columns we don’t want, we are going to save the dataframe as a .dta file.
library(haven)
library(dplyr)
## Dta file:
dtafile <- file.path(getwd(), "RScripts", "Data", "FifthDayData.dta")
dta.df <- read_dta(dtafile)
In the code chunk above, we did not do anything new (for this post). Now, in the next code chunk, we are deleting two columns (“index” and “Day”).
newdta.df <- select(dta.df, -c(index, Day))
Finally, we are ready to write the dataframe as a .dta file:
write_dta(newdta.df, file.path(getwd(), "RScripts", "Data", "NewFifthDayData.dta"))
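One thing to keep in mind when writing .dta files is that Stata variable names may contain only letters, digits, and underscores, whereas R column names often contain dots. The sketch below shows one possible way to tidy such names before saving; the use of the built-in iris data, the temporary file, and the choice to replace dots with underscores are all assumptions made for illustration.

```r
library(haven)

# Stand-in data: the numeric columns of iris have dots in their names
example.df <- head(iris[, 1:4])
names(example.df)

# Replace every dot with an underscore so the names are valid in Stata
clean.df <- example.df
names(clean.df) <- gsub(".", "_", names(clean.df), fixed = TRUE)

# Save the cleaned dataframe to a temporary .dta file
out <- tempfile(fileext = ".dta")
write_dta(clean.df, out)
```

If a collaborator uses an older Stata release, write_dta() also has a version argument that sets the file format version to write.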
Save a CSV file as a Stata File
In this section, we are going to work with another R package from the Tidyverse: readr. Now, we are going to use the read_csv() function to read data from a CSV file. After we have imported the CSV file to a dataframe, we are going to save it as a .dta file using Haven’s write_dta() function:
library(readr)
csvfile <- file.path(getwd(), "RScripts", "Data", "FirstDayData.csv")
data.df <- read_csv(csvfile)
View(data.df)
## Saving it as a dta
write_dta(data.df, file.path(getwd(), "RScripts", "Data", "FirstDayData.dta"))
Export an Excel file as a Stata File
library(readxl)
xlfile <- file.path(getwd(), "RScripts", "Data", "example_concat.xlsx")
data.df <- read_excel(xlfile)
write_dta(data.df, file.path(getwd(), "RScripts", "Data", "STATADATA.dta"))
Summary: Read Stata Files using R
In this post, we have learned how to read Stata files in R. Specifically, we’ve learned how to load .dta files using the Haven package. Furthermore, we have learned how to write R dataframes to Stata files, as well as how to load data from CSV and Excel files and save it as .dta files.