In this guide, you will learn how to concatenate two columns in R. You will learn how to merge multiple columns in R using base R (e.g., using the paste() function) and Tidyverse (e.g., using unite()). In the final section of this post, you will learn which function is the best to use when combining columns.
If you have some experience using dataframe (or, in this case, tibble) objects in R and you’re ready to learn how to combine data found in them, then this tutorial will help you do precisely that.
Knowing how to do this may prove useful when you have a dataframe containing information in two columns and you want to combine these two columns into one using R. For example, you might have one column containing first names and another containing last names. In this case, you may want to concatenate these two columns into one, e.g., called Names.
You can follow along with the examples in this tutorial using the interactive Jupyter Notebook found towards the end. Here’s the example data that we use to learn how to combine two, or more, columns into one variable.
In this post, you will learn, by example, how to concatenate two columns in R. As you will see, we will use R’s $ operator to select the columns we want to combine. The outline of the post is as follows. First, you will learn what you need to have to follow the tutorial. Second, you will get a quick answer on how to merge two columns. After this, you will work through a couple of examples using 1) paste(), 2) str_c(), and 3) unite(). In the final section of this tutorial, you will learn which method I prefer and why. That is, you will get my opinion on why I like the unite() function. In the next section, you will learn about the requirements of this post.
If you prefer to use base R, you don’t need more than a working R installation. However, if you are going to use either str_c() or unite(), you need to have at least one of the packages stringr or tidyr. It is worth pointing out, here, that both of these packages are part of the Tidyverse collection of packages. This collection contains multiple useful R packages that can be used for reading data, visualizing data (e.g., scatter plots with ggplot2), extracting year from date in R, and adding new columns, among other things. Installing an R package is simple. Here’s how you install Tidyverse:
install.packages("tidyverse")
Note, if you want to install stringr or tidyr just exchange “tidyverse” for e.g. “stringr”. In the next section, you will get a quick answer, without any details, on how to concatenate two columns in R.
How do I concatenate two columns in R?
To concatenate two columns you can use the paste() function. For example, if you want to combine the two columns A and B in the dataframe df you can use the following code: df['AB'] <- paste(df$A, df$B). Note, however, that using paste() will result in whitespace between the values in the new column.
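The quick answer above can be tried out with a small sketch. The names df, A, and B follow the text, but the values are made up for illustration:

```r
# Hypothetical two-column data frame; the values are placeholders
df <- data.frame(A = c("a", "b", "c"), B = c("x", "y", "z"))

# paste() joins the values element-wise, separated by a space by default
df["AB"] <- paste(df$A, df$B)

print(df$AB)
```

Each value in AB contains the value from A, a single space, and the value from B.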
Before we take a more detailed look at how to use paste() to combine two columns, we are going to load an example dataset.
Reading Example Data from a .xlsx File
Here’s how to read a .xlsx file in R using the readxl package:
# Importing Example Data:
library('readxl')
dataf <- read_excel("combine_columns_in_R.xlsx")
Now, we can have a look at the structure of the imported data. We will also have a quick look at the first five rows.
Now, in the images above, we can see that there are 5 variables and 7 observations. That is, there are 5 columns and 7 rows in the tibble. Moreover, we can see the types of the variables and, of course, the column names. In the next section, we are going to start by concatenating the month and year columns using the paste() function.
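If you don’t have the .xlsx file at hand, a stand-in data frame is enough to follow along. In the sketch below, the column names (Date, Month, Year, Snake, Size) are the ones used in this post, but the column order, the column types, and every value are assumptions made up for illustration. The calls to str() and head() are one common way to inspect the structure and the first rows:

```r
# Stand-in for the .xlsx file: column names come from this post,
# but the order, types, and all values below are made up.
dataf <- data.frame(
  Date  = c("01", "05", "10", "12", "17", "21", "28"),
  Month = c("01", "02", "09", "09", "10", "11", "12"),
  Year  = c("2019", "2019", "2021", "2021", "2021", "2022", "2022"),
  Snake = c("Adder", "Boa", "Cobra", "Mamba", "Python", "Viper", "Racer"),
  Size  = c("Small", "Large", "Medium", "Large", "Large", "Small", "Medium"),
  stringsAsFactors = FALSE
)

str(dataf)          # structure: 7 observations of 5 variables
head(dataf, n = 5)  # first five rows
```

Note that this is a plain data.frame and not a tibble, so the printed output will look slightly different from what read_excel() returns.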
Concatenate Two Columns in R
Here’s one of the simplest ways to combine two columns in R using the paste() function:
dataf$MY <- paste(dataf$Month, dataf$Year)
In the code above, we used $ in R to 1) create a new column and 2) select the two columns we wanted to combine into one. Here’s the tibble with the new column, named MY:
In the next example, we will merge two columns and add a hyphen (“-”) as well. A separate post covers how to remove a row in R using, e.g., dplyr. For more useful operators, and how to use them, see for example the post “How to use %in% in R: 7 Example Uses of the Operator”.
Concatenate Two Columns with – as a separator in R
Now, to add “-” (hyphen) between the values we want to combine, we add a third argument to the paste() function:
dataf$MY <- paste(dataf$Month, "-", dataf$Year)
In the code example above, we passed “-” as an additional value to paste, placed between the two columns. As you can see, in the image below, we have whitespaces around the hyphen, between the two values (i.e. “Month” and “Year”). This is because paste() separates everything it joins with a space by default.
Now, using R’s paste() function, we can add another parameter: the sep parameter. Here’s a code example combining the two columns, adding the “-” without the whitespaces:
dataf$MY <- paste(dataf$Month, dataf$Year, sep = "-")
Notice that instead of pasting the hyphen, we used it as a separator. Before moving on to the next example, it is worth pointing out that if we don’t want to add whitespace, we can use the paste0() function instead. This way, we don’t need the sep parameter. In the following example, we are going to have a look at how to combine multiple columns (i.e., three or more) in R.
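As a brief aside on the paste0() function mentioned above, here is a minimal sketch. The month and year vectors are made-up stand-ins for the columns in the example data:

```r
# Made-up month and year values, for illustration only
month <- c("01", "09", "12")
year  <- c("2019", "2021", "2022")

# paste0() behaves like paste() with sep = "", so nothing is placed between the values
no_space <- paste0(month, year)

# To get a hyphen with paste0(), pass it as one of the values to join
with_hyphen <- paste0(month, "-", year)

print(no_space)
print(with_hyphen)
```

The second call gives the same result as paste(month, year, sep = "-"), so which one to use is mostly a matter of taste.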
Combine Multiple Columns in R
As you may have understood, combining more than 2 columns is as simple as adding an argument to the paste() function. Here’s how we combine three columns in R:
dataf$DMY <- paste(dataf$Date, dataf$Month, dataf$Year)
That was also pretty simple. It is worth mentioning that if you use the sep parameter in a case like the one above, you will end up with whatever character you chose between each value from each column. For example, if we were to add the sep argument to the code above and use an underscore (“_”) as a separator, here is what the resulting tibble would look like:
Now, you may understand that using the sep parameter lets you use almost any character to separate your combined values. In the next section, we will have a look at the str_c() function from the stringr package.
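The underscore variant described above can be sketched as follows. The small data frame is a made-up stand-in for the example data, with only two rows:

```r
# Made-up stand-in for three of the columns in the example data
dataf <- data.frame(
  Date  = c("10", "21"),
  Month = c("09", "11"),
  Year  = c("2021", "2022"),
  stringsAsFactors = FALSE
)

# sep is inserted between every pair of neighbouring values
dataf$DMY <- paste(dataf$Date, dataf$Month, dataf$Year, sep = "_")

print(dataf$DMY)
```

The separator appears twice in each result, once between Date and Month and once between Month and Year.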
Concatenate Two Columns in R with the str_c() Function (stringr)
Combining two columns with the str_c() function is super simple. Here’s how to merge the columns “Snake” and “Size” using the str_c() function:
library(stringr)
dataf$SnakeNSize <- str_c(dataf$Snake, " ", dataf$Size)
Notice that we added something between the two columns we wanted to concatenate. When working with this function, we need to do this, or else we end up with nothing separating the two values we are combining. As previously mentioned, the stringr package is part of the Tidyverse, which also includes packages such as tidyr with its unite() function. In the next section, we are going to merge two columns in R using the unite() function as well.
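As a side note on str_c(): instead of passing the space as a value of its own, we can also use its sep argument. Here is a minimal sketch, assuming the stringr package is installed. The snake and size vectors are made-up stand-ins for the “Snake” and “Size” columns:

```r
library(stringr)

# Made-up values standing in for the Snake and Size columns
snake <- c("Adder", "Boa")
size  <- c("Small", "Large")

# Without a separator, str_c() joins the values directly
no_sep <- str_c(snake, size)

# With sep = " ", we get the same result as str_c(snake, " ", size)
with_space <- str_c(snake, size, sep = " ")

print(no_sep)
print(with_space)
```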
Merge Columns in R with the unite() Function (tidyr)
Here’s how we concatenate two, or more, columns using the unite() function:
library(tidyverse) # or library(tidyr)
dataf <- dataf %>% unite("DM", Date:Month)
Notice something in the code above. First, we used a new operator (i.e., %>%). This pipe operator passes dataf to unite() as its first argument, and unite() lets us refer to the columns by name, without the $ operator. As you can see, in the code example above, we used two parameters. First, we name the new column we want to add (“DM”). Second, we select all the columns from “Date” to “Month” and combine them into the new column. Here’s the resulting dataframe/tibble:
As you can see in the image above, both columns that we combined have disappeared. If we want to keep the original columns after we have concatenated them, we can set the remove parameter to FALSE. Here’s a code chunk that you can use instead to keep the columns:
dataf <- dataf %>% unite("DM", Date:Month, remove = FALSE)
Finally, did you notice how we have an underscore as a separator? If we want to change to another separator, we can use the sep parameter. This is exactly what we will do in the next example:
Concatenate two Columns in R using “-” as a separator
Here’s how we use the unite() function together with the sep parameter to change the separator to “-” (hyphen):
dataf <- dataf %>% unite("DM", Date:Month, sep = "-", remove = FALSE)
That was as simple as the previous example. In the next section, you will learn which function I prefer to use and why.
Which Function is Best for Concatenating Columns in R?
Naturally, this section will contain my opinion. I have not done any performance testing (e.g., I don’t know which function is the fastest when it comes to combining columns in R). That said, although all of the functions used in this post are simple to use, I prefer the unite() function. Why? Well, together with the piping operator I think it makes the code very readable. It is, as well, very handy to use unite() if you are going to concatenate multiple columns in R. As you may have noticed, in the examples above, we can use “:” when combining columns. This means that we can merge multiple columns from the first column (i.e., left of the colon) to the last column (i.e., right of the “:”). This is pretty neat and will definitely save some space in your code and make it easier to read!
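To illustrate the “:” selection with more than two columns, here is a minimal sketch, assuming the tidyr package is installed. The data frame is a made-up stand-in for the example data, and the column order (Date, Month, Year, Snake) is an assumption:

```r
library(tidyr)

# Made-up stand-in data; the column order is assumed
dataf <- data.frame(
  Date  = c("10", "21"),
  Month = c("09", "11"),
  Year  = c("2021", "2022"),
  Snake = c("Adder", "Boa"),
  stringsAsFactors = FALSE
)

# Date:Year selects Date, Month, and Year in one go
dataf_dmy <- dataf %>% unite("DMY", Date:Year, sep = "-")

print(dataf_dmy)
```

With paste(), the same result would require spelling out all three columns with the $ operator.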
Another neat thing is that we add the new column name as a parameter and automatically get rid of the combined columns (if we don’t need them later, of course). Finally, we can also set the na.rm parameter to TRUE if we want missing values to be removed before combining values. Here’s a Jupyter Notebook with all the code in this post.
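Here is a small sketch of what the na.rm parameter does, assuming the tidyr package is installed. The people data frame and its First and Last columns are hypothetical and only loosely based on the first name and last name example from the introduction:

```r
library(tidyr)

# Hypothetical names with some missing values
people <- data.frame(
  First = c("Ada", "Grace", NA),
  Last  = c("Lovelace", NA, "Turing"),
  stringsAsFactors = FALSE
)

# Default (na.rm = FALSE): missing values end up as the text "NA"
default <- people %>% unite("Names", First, Last, sep = " ")

# na.rm = TRUE: missing values are dropped before the values are combined
cleaned <- people %>% unite("Names", First, Last, sep = " ", na.rm = TRUE)

print(default$Names)
print(cleaned$Names)
```

When a value is dropped, its separator is dropped too, so there is no stray space left behind.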
In this post, you have learned how to concatenate two (or more) columns in R using three different functions. First, we used the paste() function from base R. Using this function, we combined two and three columns and changed the separator from whitespace to a hyphen (“-”). Second, we used the str_c() function to merge columns. Third, we used the unite() function. Of course, it is possible (we saw some examples of that) to change the separator using the last two functions as well. To conclude, the unite() function seems to be the handiest function to use to concatenate columns in R.
First, let me say that your examples are very good and the way you organize the content of your pages makes it easy to follow. I followed this code in your example: "dataf$DMY <- paste(dataf$Date, dataf$Month, dataf$Year)" to get a new column using the text in two other columns. It kind of worked: it created the column and used the categorical identifiers in the two columns, however it added a space in between the two. I need no space in between. How do I modify the new column to eliminate the space? Or how can I modify the code to get the new variable without the space?
Thank you for your kind comment. When merging the columns using paste(), you can add the sep argument. For example, dataf$DMY <- paste(dataf$Date, dataf$Month, dataf$Year, sep = "") will result in "10092021". Alternatively, you can use paste0(). Removing whitespaces can be done in many ways, of course. But you can use gsub(), for instance. Here's an example that should work: dataf$DMY <- gsub(pattern = "\\s", replacement = "", x = dataf$DMY) but I haven't tested it. Hope it helps,
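For anyone who wants to try the gsub() approach from the reply, here is a minimal sketch with made-up values. Note that the backslash has to be doubled inside an R string, so the pattern is written as "\\s":

```r
# Made-up values that already contain spaces, as produced by paste()
dmy <- c("10 09 2021", "21 11 2022")

# "\\s" matches any whitespace character; replace each match with nothing
dmy_clean <- gsub(pattern = "\\s", replacement = "", x = dmy)

print(dmy_clean)
```

If the column has not been created yet, using paste0() or sep = "" from the start avoids the extra step.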