How to Read and Write Stata (.dta) Files in R with Haven

In this post, we will learn how to read Stata (.dta) files in the R statistical environment. Specifically, we will learn 1) how to read .dta files in R using Haven, and 2) how to write dataframes to .dta files.

Data Import in R: Reading Stata Files

How to Read dta File in R
- Install Haven:
- The Syntax of read_dta()
How to Read a dta File in R Step-By-Step
How to Save a Stata file
- Saving a dataframe as a Stata file using write_dta()
Save a CSV file as a Stata File
Export an Excel file as a Stata File

Summary: Read Stata Files using R

Data Import in R: Reading Stata Files

R is, as we all know, a superb statistical programming environment. When importing and storing data, we can store it in the native .rda format. However, we may have a collaborator who uses other statistical software (e.g., Stata) and stores their data in a different format (e.g., .dta files).

Now, this is where R shows its brilliance: as R users, we can load data from a range of file formats, e.g., SAS (.sas7bdat), Stata (.dta), Excel (e.g., .xlsx), and CSV (.csv). On this site, there are other tutorials on how to import data from some of these formats:

How to read, and write, Excel (.xlsx) files in R – e.g., multiple sheets

Before we go on and learn how to read Stata files in R, we will answer the questions:

Can R Read Stata .dta Files?

The answer is “yes!” R can read Stata (.dta) files, and this is easy to do with the Haven package. First, load the package: library(haven). Second, use the read_dta() function.

How do I open a Stata file in R

To open a Stata file in R, you can use the read_dta() function from the haven package. For example, study_df <- read_dta('study_data.dta') will open the Stata file called “study_data.dta” and create a data frame object.

How to Read dta File in R

Now, we are almost ready to answer the question “How do I open a Stata file in R?” with easy-to-follow examples. In R, there are several packages that make it possible to open .dta files. In this tutorial, however, we will use the Haven package, which is part of the tidyverse.

Install Haven:

First, the package needs to be installed. Haven can be installed on its own or together with the rest of the tidyverse. If we only want to install haven, we open R (or RStudio) and type install.packages("haven"). If, on the other hand, we want to install all tidyverse packages, we change “haven” to “tidyverse”: install.packages('tidyverse'). Note that installing the tidyverse installs haven, but library(tidyverse) does not load it, so we still need to run library(haven).
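If the same script runs on several computers, a common pattern (a small sketch, not something used elsewhere in this post) is to install haven only when it is missing and then load it:

```r
# Install haven only if it is not already installed, then load it
if (!requireNamespace("haven", quietly = TRUE)) {
  install.packages("haven")
}
library(haven)
```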

Figure: Three steps to read .dta files in R


The Syntax of read_dta()

In this section, before learning the steps to read a .dta file, we will look at the syntax of the read_dta() function. In its simplest form, here’s how to import data from a .dta file in R:

# import dta file in r (haven must be loaded first)
library(haven)
dataframe <- read_dta('PATH_OR_URL_TO_STATA_FILE')

Here is a breakdown of the above syntax:

dataframe: This is the name of the object that will store the data after it is read in. You can choose any name you like, as long as it follows R’s naming rules.
read_dta: This is a function in the haven package, which is a package for working with data from other statistical software programs, including Stata.

'PATH_OR_URL_TO_STATA_FILE': This is the path or URL to the Stata data file that you want to read. The path should be enclosed in quotes.

Of course, the function comes packed with a couple of arguments:

Figure: read_dta()’s arguments

As you can see, the first argument is the file, which should be the path, or URL, to the file. This is evident from the syntax above, as well. In this tutorial, we will have a look at the col_select argument.
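To see a few of the other arguments in action, here is a small self-contained sketch. It writes a hypothetical three-row data frame to a temporary .dta file (so no file on your disk is needed) and reads it back using n_max (the maximum number of rows to read), skip (the number of rows to skip before reading) and col_select. The data and the temporary file are made up for illustration only.

```r
library(haven)

# Hypothetical example data, saved to a temporary .dta file
example_df <- data.frame(
  id    = c(1, 2, 3),
  score = c(10.5, 12.0, 9.8),
  group = c("a", "b", "a"),
  stringsAsFactors = FALSE
)
tmp <- tempfile(fileext = ".dta")
write_dta(example_df, tmp)

# Read only the first two rows
first_two <- read_dta(tmp, n_max = 2)

# Skip the first row and read the rest
after_first <- read_dta(tmp, skip = 1)

# Read only the id and score columns (bare column names work in col_select)
id_score <- read_dta(tmp, col_select = c(id, score))

nrow(first_two)
names(id_score)
```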

How to Read a dta File in R Step-By-Step

In this section, we are finally going to learn how to import .dta files in R. Here are the three simple steps to read a Stata file in R:

1) Load the haven Library:

First, we are going to load the Haven package: library(haven). Now that the functions of the Haven package are available in our R session, we can proceed to step 2: finding the .dta file we want to read.

2) Find the .dta File

Second, we need to know where the file is located before we can import it. Therefore, we create a character variable with the path to the file:

dtafile <- file.path(getwd(), "RScripts", 
                     "Data", "FifthDayData.dta")

Note that, in the example above, the data file is stored in a subfolder called “Data”, which is inside the “RScripts” folder, which in turn is in the current working directory. To elaborate, we used the getwd() function to get the current working directory (e.g., “C:/Users/Erik/Documents”). The next two character vectors (“RScripts” and “Data”) are the folders leading to the file, and, finally, we have the file name. The file.path() function joins all of these parts into one path.
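To see what file.path() actually produces, here is a small sketch that uses a fixed example working directory instead of getwd() (the directory is only an example, taken from the text above). It also checks whether the file exists before we try to read it:

```r
# file.path() joins its arguments with "/" (also on Windows)
wd <- "C:/Users/Erik/Documents"   # example working directory, as in the text
dtafile <- file.path(wd, "RScripts", "Data", "FifthDayData.dta")
print(dtafile)

# Check that the file is really there before trying to read it
if (!file.exists(dtafile)) {
  message("File not found: ", dtafile)
}
```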

3) Read the File using read_dta():

Now, we are ready to actually import the data, from our .dta file, into R. This is done using the read_dta() function, as previously mentioned. Here’s how to read a dta file in R:

# import dta file in r
fifthD.df <- read_dta(dtafile)
head(fifthD.df)

Figure: Dataframe from the Stata file (.dta)

That’s it! You have now read the .dta file into a dataframe. Next, you may want to carry out some simple data manipulation, e.g., add an empty column to a dataframe in R.

How to Read a .dta File in R from a URL

In this section, we will learn how to import a Stata file (.dta) from a URL. This is as simple as loading the data from the hard drive: we only need to change the character variable so that it holds the URL instead of a local path. Here’s an example of how to read a .dta file from a URL:

url <- "http://www.principlesofeconometrics.com/stata/broiler.dta"

# read the .dta file directly from the URL
data.df <- read_dta(url)
head(data.df)

Figure: The .dta file imported into R


How to Read Specific Columns from a Stata (.dta) file in R

In this section of the tutorial, we are going to learn how to use read_dta() to load only specific columns. This may be useful when we plan to analyze a few specific variables from a very large dataset.

Reading One Column from a dta File in R

First, we are going to read only one column. In the code chunk below, we read the “pbeef” column by passing its name, as a character string, to the col_select argument:

data.df <- read_dta(url, col_select = "pbeef")

head(data.df)

In the code chunk above, we read a dta file in R from a URL and assign it to a data frame called “data.df”. We select only the column “pbeef” from the Stata file and store it in the data frame.

Reading Multiple Columns from a dta File in R

Now, if we want to read several columns from the .dta file, we first create a character vector with the column names:

# names of the columns to read; add more names, separated by commas
cols <- c("pbeef")

Finally, we are ready to read the columns. Note that here we wrap the vector in the all_of() selection helper:

library(dplyr)  # provides all_of() (re-exported from tidyselect)
data.df <- read_dta(url, col_select = all_of(cols))

head(data.df)
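Because col_select uses the same selection syntax as dplyr::select(), we are not limited to a character vector. The sketch below uses a hypothetical data frame saved to a temporary file (the column names are made up for illustration, not taken from the broiler data) and selects columns both with all_of() and with the starts_with() helper:

```r
library(haven)
library(dplyr)  # re-exports tidyselect helpers such as all_of() and starts_with()

# Hypothetical data with a few price columns, saved to a temporary file
prices <- data.frame(
  year   = c(2001, 2002),
  pbeef  = c(3.1, 3.3),
  pchick = c(1.9, 2.0),
  q      = c(40, 42)
)
tmp <- tempfile(fileext = ".dta")
write_dta(prices, tmp)

# 1) Several columns named in a character vector
cols <- c("year", "pbeef")
by_vector <- read_dta(tmp, col_select = all_of(cols))

# 2) All columns whose names start with "p"
by_prefix <- read_dta(tmp, col_select = starts_with("p"))

names(by_vector)
names(by_prefix)
```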

We have now learned how to read a Stata file in R. The next step might be to inspect the dataframe and visualize the data, and, if we have categorical variables, we may want to dummy code them. See the posts about how to create scatter plots in R with ggplot2 and how to create dummy variables in R.

How to Save a Stata file

This section will teach us how to write a dataframe to a Stata file. First, we will learn how to manipulate data from a .dta file we have loaded in R and save it as a new .dta file. Second, we will learn how to read CSV and Excel files in R and save them as Stata files.

Saving a dataframe as a Stata file using write_dta()

In the example below, we will first load a .dta file using read_dta(). Second, we are going to remove columns using the select() function from dplyr. Finally, when we have deleted the columns we don’t want, we will save the dataframe as a .dta file using write_dta().

library(haven)
library(dplyr)

## Dta file:
dtafile <- file.path(getwd(), "RScripts", 
                     "Data", "FifthDayData.dta")

dta.df <- read_dta(dtafile)

The code chunk above does not do anything new (for this post). In the next code chunk, we delete two columns:

# drop the index and Day columns
newdta.df <- select(dta.df, -c(index, Day))

Finally, we are ready to write the dataframe as a .dta file:

write_dta(newdta.df, file.path(getwd(), "RScripts", 
                               "Data", "NewFifthDayData.dta"))
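To check that the data survived the trip, we can write a dataframe to a .dta file and read it back. Here is a minimal sketch that uses a hypothetical data frame and a temporary file instead of the files from this post:

```r
library(haven)

# Hypothetical data frame, written to a temporary .dta file and read back
original <- data.frame(
  id   = c(1, 2, 3),
  name = c("Anna", "Ben", "Cleo"),
  stringsAsFactors = FALSE
)
tmp <- tempfile(fileext = ".dta")
write_dta(original, tmp)
round_trip <- read_dta(tmp)

# read_dta() returns a tibble and may attach Stata format attributes to the
# columns, so we compare the values rather than the whole objects
all(round_trip$id == original$id)
all(round_trip$name == original$name)
```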

Note that, before saving your .dta file, you might want to remove duplicate rows (or columns) from the dataframe. This can be done with the base R functions duplicated() and unique().
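A short sketch of both approaches, using a hypothetical data frame in which rows 1 and 3 are identical and one column is a copy of another:

```r
# Hypothetical data frame with a duplicated row (rows 1 and 3 are identical)
df <- data.frame(
  id    = c(1, 2, 1),
  score = c(10, 20, 10)
)

# Option 1: keep only rows that are not duplicates of an earlier row
dedup1 <- df[!duplicated(df), ]

# Option 2: unique() does the same for data frames
dedup2 <- unique(df)

# Duplicated columns: compare the columns by their values
df$score_copy <- df$score
dedup_cols <- df[, !duplicated(as.list(df))]
```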

Save a CSV file as a Stata File

In this section, we are going to work with another package from the tidyverse: readr. We will use read_csv() to read data from a CSV file. After we have imported the CSV file into a dataframe, we will save it as a .dta file using Haven’s write_dta() function:

library(readr)
library(haven)

csvfile <- file.path(getwd(), "RScripts",
                     "Data", "FirstDayData.csv")

data.df <- read_csv(csvfile)

# inspect the data (opens a data viewer in RStudio)
View(data.df)

## Saving it as a dta
write_dta(data.df, file.path(getwd(), "RScripts",
                             "Data", "FirstDayData.dta"))
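CSV and Excel files often have column names with spaces or dots, which are not valid Stata variable names, and write_dta() will then typically stop with an error. Below is a sketch of a hypothetical helper, make_stata_names() (not part of haven), that applies Stata’s basic naming rules before saving: only letters, digits and underscores, no leading digit, and at most 32 characters. It does not check Stata’s reserved words.

```r
# Hypothetical helper: turn arbitrary column names into names that follow
# Stata's basic rules. Stata's reserved words are not checked.
make_stata_names <- function(x) {
  x <- gsub("[^A-Za-z0-9_]", "_", x)                     # replace other characters
  x <- ifelse(grepl("^[A-Za-z_]", x), x, paste0("v", x)) # fix a leading digit
  x <- substr(x, 1, 32)                                  # enforce the length limit
  make.unique(x, sep = "_")                              # keep the names distinct
}

make_stata_names(c("First Name", "2nd.score", "ok_var"))

# Usage before saving (data.df as in the example above):
# names(data.df) <- make_stata_names(names(data.df))
# write_dta(data.df, "FirstDayData.dta")
```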

Export an Excel file as a Stata File

In the final example, we are going to use read_excel() (from the readxl package) to import a .xlsx file in R. After we have done that, we will save the data as a Stata file.

library(readxl)
library(haven)

xlfile <- file.path(getwd(), "RScripts",
                    "Data", "example_concat.xlsx")

data.df <- read_excel(xlfile)

write_dta(data.df, file.path(getwd(), "RScripts",
                             "Data", "STATADATA.dta"))

Note that all the files we have read using read_dta(), read_stata(), read_csv(), and read_excel() can be found here, and a Jupyter notebook can be found here.

Summary: Read Stata Files using R

In this blog post, you have learned how to read a .dta file in R using the haven library. Now you can load the library, find the Stata file, and read it into R using the read_dta() function. We have looked at examples of how this can be done from a local file and from a URL. Additionally, we had a look at how we can select specific columns from the Stata file to read into R.

We have also learned how to save a dataframe as a Stata file using write_dta(), and how to convert CSV and Excel files to Stata files by first reading them with read_csv() and read_excel().

By learning how to read and save Stata files in R, you can take advantage of R’s powerful data analysis and visualization capabilities. Share this post with anyone who wants to learn more about working with Stata files in R.