How to Read and Write Stata (.dta) Files in R with Haven

Last updated on February 2, 2020

In this post, we are going to learn how to read Stata (.dta) files in the R statistical environment. Specifically, we will learn 1) how to import .dta files in R using Haven, and 2) how to write dataframes to .dta files.

Data Import in R: Reading Stata Files

Now, R is, as we all know, a superb statistical programming environment. When it comes to importing and storing data, we can store our data in the native .rda format. However, we may have a collaborator who uses other statistical software (e.g., Stata) and stores their data in a different format (e.g., .dta files).

Now, this is when R shows us its brilliance: as R users, we can load data from a range of file formats, e.g., SAS (.sas7bdat), Stata (.dta), Excel (e.g., .xlsx), and CSV (.csv). On this site there are other tutorials on how to import data from some of these formats.

Before we go on and learn how to read Stata files in R, we will answer the question:

Can R Read Stata Files?

The answer is “yes!” R can read Stata (.dta) files. This is easy to do with the Haven package. First, load the package: library(haven). Second, use the read_dta() function.
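
As a quick, self-contained check of the two steps above, the sketch below writes a small made-up dataframe to a temporary .dta file and reads it back with read_dta(). The dataframe and the temporary file are assumptions for illustration only; they are not part of this tutorial's data.

```r
library(haven)

# Hypothetical example data (not from the tutorial's files)
example.df <- data.frame(id = 1:3, score = c(2.5, 3.0, 4.5))

# Write the data to a temporary .dta file, then read it back
tmpfile <- tempfile(fileext = ".dta")
write_dta(example.df, tmpfile)
roundtrip.df <- read_dta(tmpfile)

head(roundtrip.df)
```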

How to Open Stata (.dta) Files in R

Now, we are soon ready to answer the question of how to open a Stata file in R by using easy-to-follow examples. In R, there are many useful packages that make it possible for us to open .dta files. In this tutorial, however, we are going to use the Haven package (which is part of the Tidyverse).

How to install Haven:

Now, Haven can be installed separately or by installing the Tidyverse packages. First, if we only want to install Haven, we open up R (or RStudio) and type install.packages("haven"). If we, on the other hand, want to install all Tidyverse packages, we change “haven” to “tidyverse”: install.packages("tidyverse").
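
If we run the same script on several machines, we may not want to reinstall the package every time. A minimal sketch, assuming a CRAN mirror is configured, is to install Haven only when it is missing:

```r
# Install haven only when it is not already available
if (!requireNamespace("haven", quietly = TRUE)) {
  install.packages("haven")
}

library(haven)

# Show which version is installed
packageVersion("haven")
```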

[Figure: Three steps to read .dta files in R]

How to Read a Stata (.dta) file in R Step-By-Step

In this section of the tutorial, we are finally going to learn how to import .dta files in R.

1) Load the haven Package:

First, we are going to load the Haven package: library(haven)

2) Find the .dta File

Second, before we can import the Stata file, we need to know where the file is located. Thus, we create a character variable with the filename.

dtafile <- file.path(getwd(), "RScripts", 
                     "Data", "FifthDayData.dta")

Note, in the example above, the .dta file is located two subfolders below the working directory: “Data” is a subfolder of the “RScripts” folder. To elaborate, we used the getwd() function to get the current working directory (e.g., “C:/Users/Erik/Documents”). The next two character strings, “RScripts” and “Data”, indicate the folders in which the file is stored and, finally, we have the file name.
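
Before reading, it can help to confirm that the path we built really points to a file. The sketch below rebuilds the same folder layout under a temporary directory (an assumption made so that the example runs anywhere) and checks the path with file.exists():

```r
# Hypothetical folder layout under a temporary directory
basedir <- tempdir()
datadir <- file.path(basedir, "RScripts", "Data")
dir.create(datadir, recursive = TRUE, showWarnings = FALSE)

# Build the path the same way as in the tutorial
dtafile <- file.path(basedir, "RScripts", "Data", "FifthDayData.dta")

# Check that the file is there before trying to read it
if (!file.exists(dtafile)) {
  message("File not found: ", dtafile)
}
```

In a real project, we would replace basedir with getwd(), as in the example above.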

3) Load the Stata (.dta) File using read_dta():

Now, we are ready to actually import the data from our .dta file into R. This is done using the read_dta() function, as previously mentioned.

fifthD.df <- read_dta(dtafile)
head(fifthD.df)
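
When a Stata file is large, we may want to look at a few rows before importing everything. A minimal sketch, assuming a Haven version in which read_dta() has the n_max argument, and using the built-in mtcars data as a stand-in for the tutorial's file:

```r
library(haven)

# Stand-in for a large Stata file: mtcars written to a temporary .dta file
tmpfile <- tempfile(fileext = ".dta")
write_dta(mtcars, tmpfile)

# Read only the first five rows
preview.df <- read_dta(tmpfile, n_max = 5)
dim(preview.df)
```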

How to Read .dta Files from a URL

In this section, we are going to learn how to import a Stata file from a URL. This is, of course, as simple as loading the data from the hard drive. Naturally, however, we need to change the character variable.

url <- "http://www.principlesofeconometrics.com/stata/broiler.dta"

data.df <- read_dta(url)
head(data.df)
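
Reading straight from a URL downloads the file on every run. If we prefer to keep a local copy, one option is to download the file once and read it from disk afterwards. The helper read_dta_cached() below is a hypothetical name introduced for illustration; it is not part of Haven.

```r
library(haven)

# Hypothetical helper: download a .dta file once, then read the local copy
read_dta_cached <- function(url, destfile = tempfile(fileext = ".dta")) {
  if (!file.exists(destfile)) {
    # mode = "wb" keeps the binary file intact on Windows
    download.file(url, destfile, mode = "wb", quiet = TRUE)
  }
  read_dta(destfile)
}

# Usage (needs an internet connection):
# data.df <- read_dta_cached("http://www.principlesofeconometrics.com/stata/broiler.dta")
```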

How to Read Specific Columns from a Stata file

In this section of the tutorial, we are going to learn how to use read_dta() to load specific columns. This may be useful when we plan to analyze only a few variables from a very large dataset.

Reading One Column

First, we are going to read only one column. In the code chunk below, we are reading the “pbeef” column. Thus, we use the col_select argument with a character string that specifies the column we want to read:

data.df <- read_dta(url, col_select="pbeef")

head(data.df)

Reading Multiple Columns

Now, if we want to read many columns from the .dta file, we’ll first create a character vector with the column names (the vector below holds a single name; further names can be added, separated by commas):

cols <- c("pbeef")

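Finally, we are ready to read the columns. Note, here we use the all_of() function, which comes from the tidyselect package and is also available after loading dplyr or the tidyverse: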

data.df <- read_dta(url, col_select=all_of(cols))

head(data.df)
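
The col_select argument also accepts unquoted column names and tidyselect helpers. The sketch below uses the built-in mtcars data, written to a temporary .dta file, as a stand-in (it is not the broiler data), and calls the helper with the tidyselect:: prefix so that no extra package needs to be attached:

```r
library(haven)

# Stand-in data written to a temporary .dta file
tmpfile <- tempfile(fileext = ".dta")
write_dta(mtcars, tmpfile)

# Unquoted column names work in col_select
two.df <- read_dta(tmpfile, col_select = c(mpg, cyl))

# A tidyselect helper picks columns by pattern; here, names starting with "d"
d.df <- read_dta(tmpfile, col_select = tidyselect::starts_with("d"))

names(two.df)
names(d.df)
```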

How to Save a Stata file

In this section, we will learn how to write a dataframe to a Stata file. First, we will learn how to do some data manipulation on a .dta file we have loaded in R and save it as a new .dta file. Second, we are going to learn how to read CSV and Excel files in R and save them as Stata files.

Saving a dataframe as a Stata file using write_dta()

In the example below, we are first going to load a .dta file using read_dta(). Second, we are going to remove some of the columns in R using select() from the dplyr package. Finally, when we have deleted the columns we don’t want, we are going to save the dataframe as a .dta file.

library(haven)
library(dplyr)

## Dta file:
dtafile <- file.path(getwd(), "RScripts", 
                     "Data", "FifthDayData.dta")

dta.df <- read_dta(dtafile)

In the code chunk above, we did not do anything new (for this post). Now, in the next code chunk, we are deleting two columns.


newdta.df <- select(dta.df, -c(index, Day))

Finally, we are ready to write the dataframe as a .dta file:

write_dta(newdta.df, file.path(getwd(), "RScripts", 
                               "Data", "NewFifthDayData.dta"))
[Figure: New .dta file saved]
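
If a collaborator uses an older Stata release, the version argument of write_dta() sets the file format that is written. A minimal sketch with a made-up dataframe (the data and the choice of version 13 are assumptions for illustration):

```r
library(haven)

# Hypothetical example data (replace with your own dataframe)
example.df <- data.frame(id = 1:3,
                         group = c("a", "b", "c"),
                         stringsAsFactors = FALSE)

# Write in the Stata 13 file format
oldfile <- tempfile(fileext = ".dta")
write_dta(example.df, oldfile, version = 13)

# The file reads back in R as usual
old.df <- read_dta(oldfile)
head(old.df)
```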

Save a CSV file as a Stata File

In this section, we are going to work with another R package from the Tidyverse: readr. We are going to use the read_csv() function to read data from a CSV file. After we have imported the CSV file to a dataframe, we are going to save it as a .dta file using Haven’s write_dta() function:

library(readr)

csvfile <- file.path(getwd(), "RScripts", 
                      "Data", "FirstDayData.csv") 

data.df <- read_csv(csvfile)

View(data.df)

## Saving it as a .dta file
write_dta(data.df, file.path(getwd(), "RScripts", 
                               "Data", "FirstDayData.dta"))
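
One thing to watch for when saving CSV data: Stata variable names may only contain letters, digits, and underscores, and write_dta() rejects names that do not fit. CSV headers often contain spaces or dots, so we may need to clean them first. The helper clean_stata_names() below is a hypothetical name introduced for illustration, and the data is made up:

```r
library(haven)

# Hypothetical helper: turn arbitrary column names into Stata-friendly ones
clean_stata_names <- function(x) {
  # Replace every character that is not a letter, digit, or underscore
  x <- gsub("[^A-Za-z0-9_]", "_", x)
  # Names may not start with a digit, so prefix those with "v"
  x <- ifelse(grepl("^[0-9]", x), paste0("v", x), x)
  # Truncate to 32 characters, then make any duplicates distinct
  make.unique(substr(x, 1, 32), sep = "_")
}

# Made-up data with CSV-style headers
csv.df <- data.frame(a = 1:2, b = c(0.5, 1.5))
names(csv.df) <- c("Reaction Time", "1st.score")

names(csv.df) <- clean_stata_names(names(csv.df))

outfile <- tempfile(fileext = ".dta")
write_dta(csv.df, outfile)
```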

Export an Excel file as a Stata File

In the final example, we are going to use read_excel() (from the readxl package) to import an .xlsx file in R. After we have done that, we will save the data from this Excel file as a Stata file.

library(readxl)

xlfile <- file.path(getwd(), "RScripts", 
                     "Data", "example_concat.xlsx") 

data.df <- read_excel(xlfile)

write_dta(data.df, file.path(getwd(), "RScripts", 
                             "Data", "STATADATA.dta"))

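Excel workbooks often have more than one sheet. The sketch below uses the example workbook that ships with readxl (not the tutorial's file) to list the sheets, read one of them by name, and save it as a .dta file in a temporary location:

```r
library(readxl)
library(haven)

# Example workbook that is installed together with readxl
xlfile <- readxl_example("datasets.xlsx")
excel_sheets(xlfile)

# Read one named sheet rather than the default first sheet
cars.df <- read_excel(xlfile, sheet = "mtcars")

# Save the sheet as a Stata file
outfile <- tempfile(fileext = ".dta")
write_dta(cars.df, outfile)
```
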
Note, all the files we have read using read_dta, read_csv, and read_excel can be found here, and a Jupyter notebook can be found here.

Summary: Read Stata Files using R

In this post, we have learned how to read Stata files in R. Specifically, we’ve learned how to load .dta files using the Haven package. Furthermore, we have learned how to write R dataframes to Stata files, as well as loading data from Excel and CSV files to save them as .dta files.
