In this post, we will learn how to read Stata (.dta) files in the R statistical environment. Specifically, we will learn 1) how to read .dta files in R using Haven, and 2) how to write dataframes to a .dta file.
Data Import in R: Reading Stata Files
R is, as we all know, a superb statistical programming environment. When importing and storing data, we can store it in the native .rda format. However, a collaborator may use other statistical software (e.g., Stata) and store their data in a different format (e.g., .dta files), so we need to be able to read those files as well.
Now, this is when R shows us its brilliance; as R users, we can load data from a range of file formats, e.g., SAS (.sas7bdat), Stata (.dta), Excel (e.g., .xlsx), and CSV (.csv). On this site, there are other tutorials on how to import data from some of these formats:
- How to Import SAS files in R
- Reading and writing SPSS files in R
- How to read, and write, Excel (.xlsx) files in R – e.g., multiple sheets
Before we go on and learn how to read Stata files in R, we will answer the questions:
Can R Read Stata .dta Files?
The answer is “yes!” R can read Stata (.dta) files. This is easy to do with the Haven package. First, load the package: library(haven). Second, use the read_dta() function to read the file.
How do I open a Stata file in R?
To open a Stata file in R, you can use the read_dta() function from the library called haven. For example,
study_df <- read_dta('study_data.dta') will open the Stata file called “study_data.dta” and create a data frame object.
How to Read dta File in R
Now, we are almost ready to answer the question of how to open a Stata file in R by using easy-to-follow examples. In R, there are many useful packages that make it possible for us to open .dta files. In this tutorial, however, we will use the Haven package (part of the Tidyverse).
First, the package needs to be installed. Haven can be installed separately or by installing the Tidyverse packages. If we only want to install haven, we open up R (or RStudio) and type install.packages("haven"). If we, on the other hand, want to install all Tidyverse packages, we change “haven” to “tidyverse”: install.packages("tidyverse").
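As a small sketch of the installation step, we can make it conditional so that haven is only installed when it is missing. The CRAN mirror URL below is an assumption; any mirror will do.

```r
# Install haven only if it is not already available.
# The mirror URL is an assumption; replace it with your preferred CRAN mirror.
if (!requireNamespace("haven", quietly = TRUE)) {
  install.packages("haven", repos = "https://cloud.r-project.org")
}

# Load the package so that read_dta() and write_dta() are available.
library(haven)
```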
The Syntax of read_dta()
In this section, before learning the steps to reading a .dta file, we will look at the syntax of the read_dta() function. In its simplest form, here’s how to import data from a .dta file in R:
# Import a .dta file into R
dataframe <- read_dta('PATH_OR_URL_TO_STATA_FILE')
Here is a breakdown of the above syntax:
dataframe: This is the name of the object that will store the data after it’s read in. You can choose any name you like as long as it follows R’s naming rules.
read_dta: This is a function in the haven package, which is a package for working with data from other statistical software programs, including Stata.
'PATH_OR_URL_TO_STATA_FILE': This is the path or URL to the Stata data file that you want to read. The path should be enclosed in quotes.
Of course, the function comes with a few more arguments: file, encoding, col_select, skip, n_max, and .name_repair.
The first argument is the file, which should be the path, or URL, to the file. This is evident from the syntax above, as well. In this tutorial, we will have a look at the col_select argument.
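To make a couple of these arguments concrete, here is a minimal sketch that writes a small toy dataframe to a temporary .dta file and reads parts of it back with n_max and skip. The toy data and the object names are made up for illustration.

```r
library(haven)

# Toy data (made up for this example), saved to a temporary .dta file.
tmp <- tempfile(fileext = ".dta")
toy <- data.frame(id = c(1, 2, 3, 4, 5),
                  score = c(2.5, 3.0, 4.5, 1.0, 5.0))
write_dta(toy, tmp)

# n_max limits how many rows are read.
first_rows <- read_dta(tmp, n_max = 2)

# skip drops rows from the top before reading starts.
later_rows <- read_dta(tmp, skip = 3)

nrow(first_rows)
nrow(later_rows)
```

Both arguments can be handy when we only want a quick look at a very large Stata file.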
How to Read a dta File in R Step-By-Step
In this section, we are finally going to learn how to import .dta files in R. Here are the three simple steps to read a Stata file in R:
1) Load the haven Library:
First, we are going to load the Haven package: library(haven). Now that we have all the functions of the Haven package in the namespace, we can proceed to step 2: finding the .dta file we want to read.
2) Find the .dta File
Second, we need to know where the file is located before we can import the Stata file. In the next step, we, therefore, create a character variable with the path to the file.
dtafile <- file.path(getwd(), "RScripts", "Data", "FifthDayData.dta")
Note, in the file.path() example above, the data file is in a subfolder of the folder that holds the R script (i.e., “Data” is a subfolder of “RScripts”). To elaborate, we used the getwd() function to get the current working directory (e.g., “C:/Users/Erik/Documents”). The next two character strings, “RScripts” and “Data”, indicate the subfolders that lead to the file, and, finally, we have the file name.
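Before moving on to step 3, it can be worth checking that the path actually points to a file. The following is a minimal sketch that reuses the folder and file names from the example above; on your machine they will most likely differ.

```r
# Build the path piece by piece; file.path() inserts the separators.
dtafile <- file.path(getwd(), "RScripts", "Data", "FifthDayData.dta")

# Check that the file is where we think it is before trying to read it.
if (file.exists(dtafile)) {
  message("Found: ", dtafile)
} else {
  message("No file at: ", dtafile)
}
```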
3) Read the File using read_dta():
Now, we are ready to actually import the data, from our .dta file, into R. This is done using the read_dta() function, as previously mentioned. Here’s how to read a dta file in R:
# Import the .dta file into R
fifthD.df <- read_dta(dtafile)
head(fifthD.df)
That was it; you have now read the .dta file into a dataframe. Next, you may want to carry out simple data manipulation, e.g., add an empty column to the dataframe in R.
How to Read a .dta File in R from a URL
In this section, we will learn how to import a Stata file (.dta) from a URL. This is, of course, as simple as loading the data from the hard drive. Naturally, however, we need to change the character variable. Here’s an example of how to read a dta file from a URL:
url <- "http://www.principlesofeconometrics.com/stata/broiler.dta"
# Pass the URL (not the local path from the previous section) to read_dta()
data.df <- read_dta(url)
head(data.df)
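If we plan to reuse the file, or the connection is unreliable, one alternative is to download the file first and then read the local copy. The sketch below is only one way to do this; read_dta_via_download() is a hypothetical helper name and not part of haven.

```r
library(haven)

# Hypothetical helper: download a .dta file to a temporary file, then read it.
read_dta_via_download <- function(url) {
  tmp <- tempfile(fileext = ".dta")
  # Remove the temporary copy once the data has been read into memory.
  on.exit(unlink(tmp), add = TRUE)
  # mode = "wb" keeps the binary file intact (this matters on Windows).
  download.file(url, tmp, mode = "wb", quiet = TRUE)
  read_dta(tmp)
}

# Usage (requires an internet connection):
# data.df <- read_dta_via_download("http://www.principlesofeconometrics.com/stata/broiler.dta")
```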
How to Read Specific Columns from a Stata (.dta) file in R
In this section of the read Stata files in R tutorial, we are going to learn how to use read_dta() to load specific columns. This may be useful when we plan to analyze some specific variables from very large datasets.
Reading One Column from a dta File in R
First, we are going to read only one column. In the code chunk below, we are reading the “pbeef” column. Thus, we are using the col_select argument with a character string that specifies the column we want to read:
data.df <- read_dta(url, col_select = "pbeef")
head(data.df)
In the code chunk above, we read a dta file in R from a URL and assign it to a data frame called “data.df”. We select only the column “pbeef” from the Stata file and store it in the data frame.
Reading Multiple Columns from a dta File in R
Now, if we want to read many columns from the .dta file, we’ll first create a character vector with the column names:
cols <- c("pbeef")  # add more column names to this vector as needed
Finally, we are ready to read the columns. Note, here we use the all_of() function, which comes from tidyselect and is also available when dplyr or the Tidyverse is loaded:
data.df <- read_dta(url, col_select = all_of(cols))
head(data.df)
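To see the selection of several columns in a self-contained form, here is a small sketch that uses a toy dataset instead of the file from the URL. The column names (year, price_a, price_b) are made up for illustration, and all_of() is called with its tidyselect:: prefix so that no extra package has to be attached.

```r
library(haven)

# Toy data with made-up column names, saved to a temporary .dta file.
tmp <- tempfile(fileext = ".dta")
toy <- data.frame(year = c(2001, 2002, 2003),
                  price_a = c(1.5, 1.7, 1.6),
                  price_b = c(2.1, 2.0, 2.4))
write_dta(toy, tmp)

# A character vector with more than one column name.
cols <- c("price_a", "price_b")

# Only the columns named in cols are read; year is left out.
subset.df <- read_dta(tmp, col_select = tidyselect::all_of(cols))
names(subset.df)
```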
We have now learned how to read a Stata file in R. The next step might be to inspect the dataframe and visualize the data, and if we have categorical data, we may need to dummy code them. See the posts about how to create scatter plots in R with ggplot2 and how to create dummy variables in R.
How to Save a Stata file
This section will teach us how to write a dataframe to a Stata file. First, we will learn how to manipulate data from a .dta file we have loaded in R and save it as a new .dta file. Second, we will learn how to read a CSV file and an Excel file in R and save each of them as a Stata file.
Saving a dataframe as a Stata file using write_dta()
In the example below, we will first load a .dta file using read_dta(). Second, we are going to remove columns in R using dplyr. Finally, when we have deleted the columns we don’t want, we will save the dataframe as a .dta file.
library(haven)
library(dplyr)
# Path to the .dta file
dtafile <- file.path(getwd(), "RScripts", "Data", "FifthDayData.dta")
dta.df <- read_dta(dtafile)
In the code chunk, above, we did not do anything new (for this post). Now, in the next code chunk, we are deleting two columns.
newdta.df <- select(dta.df, -c(index, Day))
Finally, we are ready to write the dataframe as a .dta file:
write_dta(newdta.df, file.path(getwd(), "RScripts", "Data", "NewFifthDayData.dta"))
Note, before saving your dta file you might want to use R to remove duplicate rows and columns from the data frame. This can be done using the functions duplicated() or unique().
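As a minimal sketch of that last note, the example below removes duplicate rows with duplicated() before saving. The toy dataframe is made up for illustration, and the file is written to a temporary location.

```r
library(haven)

# Toy dataframe where the second and third rows are identical.
toy <- data.frame(id = c(1, 2, 2, 3), score = c(10, 20, 20, 30))

# duplicated() flags repeated rows; keep only the first occurrence of each.
deduped <- toy[!duplicated(toy), ]

# Save the cleaned dataframe to a temporary .dta file and read it back.
tmp <- tempfile(fileext = ".dta")
write_dta(deduped, tmp)
check.df <- read_dta(tmp)
nrow(check.df)
```

Here, unique(toy) would give the same rows; duplicated() is mainly useful when we also want to inspect which rows were flagged.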
Save a CSV file as a Stata File
In this section, we are going to work with another R package from the Tidyverse: readr. We will use the read_csv() function to read data from a CSV file. After we have imported the CSV file to a dataframe, we are going to save it as a .dta file using Haven’s write_dta() function:
library(readr)
csvfile <- file.path(getwd(), "RScripts", "Data", "FirstDayData.csv")
data.df <- read_csv(csvfile)
View(data.df)
# Save it as a .dta file
write_dta(data.df, file.path(getwd(), "RScripts", "Data", "FirstDayData.dta"))
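One thing to watch out for when saving CSV data as a Stata file is the column names. Stata is stricter than R here, and write_dta() will typically refuse names that contain, for instance, spaces or dots. The sketch below shows one possible way to clean the names first. The toy CSV is made up for illustration, and base R’s read.csv() is used as a stand-in for read_csv() to keep the example free of extra packages.

```r
library(haven)

# Write a toy CSV whose headers contain a space and a dot.
csvfile <- tempfile(fileext = ".csv")
writeLines(c("Day 1,score.total", "1,10", "2,20"), csvfile)

# Base R read.csv() is used here as a stand-in for readr's read_csv();
# check.names = FALSE keeps the headers exactly as they appear in the file.
data.df <- read.csv(csvfile, check.names = FALSE)

# Replace every character that is not a letter, digit, or underscore.
names(data.df) <- gsub("[^A-Za-z0-9_]", "_", names(data.df))

# With the cleaned names, the dataframe can be saved as a .dta file.
dtafile <- tempfile(fileext = ".dta")
write_dta(data.df, dtafile)
names(read_dta(dtafile))
```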
Export an Excel file as a Stata File
In the final example, we are going to use read_excel() (from the readxl package) to import an .xlsx file in R. After we have done that, we will save this Excel file as a Stata file.
library(readxl)
xlfile <- file.path(getwd(), "RScripts", "Data", "example_concat.xlsx")
data.df <- read_excel(xlfile)
write_dta(data.df, file.path(getwd(), "RScripts", "Data", "STATADATA.dta"))
Note, all the files we have read using read_dta, read_stata, read_csv, and read_excel can be found here and a Jupyter notebook can be found here.
Summary: Read Stata Files using R
In this blog post, you have learned how to read a dta file in R using the haven library. Now you can load the library, find the Stata file, and read it into R using the read_dta() function. We have looked at examples of how this can be done from a local file or from a URL. Additionally, we had a look at how we can select specific columns from the Stata file to read into R.
We have also learned how to save a dataframe as a Stata file, whether the data originally came from a .dta, CSV, or Excel file. This can be done using the write_dta() function.
By learning how to read and save Stata files in R, you can take advantage of R’s powerful data analysis and visualization capabilities. Share this post with anyone who wants to learn more about working with Stata files in R.