In this R tutorial, you are going to learn how to add a column to a dataframe based on values in other columns. Specifically, you will learn to create a new column using the mutate() function from the package dplyr, along with some other useful functions.
Finally, we are also going to have a look on how to add the column, based on values in other columns, at a specific place in the dataframe. This will be done using the add_column() function from the Tibble package.
It is worth noting, that both tibble and dplyr are part of the Tidyverse package. Apart from adding columns to a dataframe, you can use dplyr to remove columns, with the select() function, for example.
In this post, we will first learn how to install the r-packages that we are going to use. Second, we are going to import example data that we can play around with and add columns based on conditions. After we have a dataframe, we will then go on and have a look at how to add a column to the dataframe with values depending on other columns. In these sections, we will use the mutate() and add_column() functions to accomplish the same task. That is, we will use these R functions to add a column based on conditions.
As this is an R tutorial, you will, of course, need to have R and, at least, the dplyr package installed. If you want to e.g. easily add a column, based on values in another column, at a specific position I would suggest that you install tibble. Furthermore, if you are going to read the example .xlsx file you will also need to install the readr package. Note, however, that if you install the tidyverse package you will get tibble, dplyr and readr, among a lot of other useful packages.
Installing Tidyverse enables you to easily calculate descriptive statistics, visualize data (e.g., scatter plots with ggplot2). Furthermore, there’s another useful package, that is part of the Tidyverse package, called lubridate. Lubridate is very handy if you are working with time-series data. For example, you can use the functions of this package to extract year from date in R as well as extracting day and extracting time. As usual, when installing r-packages we use the
install.packages(c('tibble', 'dplyr', 'readr'))
Note. if you want to install all packages available in the tidyverse package just exchange the character vector for ‘tidyverse’ (
install.packages('tidyverse')). Now that you should be set with these useful packages we can start reading the example Excel file.
Here’s how to read an xlsx file in R using
read_xlsx function from the readxl package:
library(readxl) # reading the xlsx file: depr_df <- read_excel('./SimData/add_column.xlsx')
In the code chunk above, we imported the Excel file that can be downloaded here. This file needs, furthermore, to be placed in the same directory as the R script (or change the path to the .xlsx file). Notice that we used the skip argument to skip the first two rows. Finally, we can have a glimpse of the data by using the head() function:
In the output, we can see that our dataset contains the following columns:
- ID – Subject ID
- Depr1 – First item on a depression scale
- Depr2 – Second item
- Depr3 – And so on…
- Depr4 – …
Note that all variables in this data set are made up and, thus, the data makes no sense. We are, of course, only going to use it so that we can practice adding new columns based on conditions on values in other columns. Now that we have our data we are jumping into the first example directly!
Add a Column to a Dataframe Based on Other Column
If we want to add a column based on the values in another column we can work with dplyr. Here’s how to append a column based on what the factor ends with in a column:
library(dplyr) # Adding column based on other column: depr_df %>% mutate(Status = case_when( endsWith(ID, "R") ~ "Recovered", endsWith(ID, "S") ~ "Sick" ))
As you can see, in the code chunk above, we used the
%>% operator and the
mutate() function together with the
endsWith() functions. Furthermore, we created the “Status” column (in mutate) and if the factor ended with R the value in the new column will be “Recovered”. On the other hand, if the factor is ending with S, the value in the new column will be “Sick”. Here’s the resulting dataframe to which we appended the new column:
%>% operator is very handy and, of course, there are more nice operators, as well as functions, in R statistical programming environment. See the following posts for more inspiration (or information):
- How to use %in% in R: 7 Example Uses of the Operator
- Learn How to Generate a Sequence of Numbers in R with :, seq() and rep()
- How to use the Repeat and Replicate functions in R
Add a Column to a Dataframe in R Based on Values in Other Columns
In the first example, we are going to add a new column based on whether the values in the columns “A” and “B” match. Here’s how to add a new column to the dataframe based on the condition that two values are equal:
# R adding a column to dataframe based on values in other columns: depr_df <- depr_df %>% mutate(C = if_else(A == B, A + B, A - B))
In the code example above, we added the column “C”. Here we used dplyr and the
mutate() function. As you can see, we also used the
if_else() function to check whether the values in column “A” and “B” were equal. If they were equal, we added the values together. If not, we subtracted the values. Here’s the resulting dataframe with the column added:
Notice how there was only one row in which the values matched and, in that column, our code added the values together. Of course, if we wanted to create e.g. groups based on whether the values in two columns are the same or not we can use change some things in the
if_else() function. For example, we can use this code:
# creating a column to dataframe based on values in other columns: depr_df <- depr_df %>% mutate(C = if_else(A == B, "Equal", "Not Equal"))
In the next code example, we are going to create a new column summarizing the values from five other columns. This can be useful, for instance, if we have collected data from e.g. a questionnaire measuring psychological constructs.
Create a New Column in an R dataframe Based on Values from other Columns
Here we are going to use the values in the columns named “Depr1” to “Depr5” and summarize them to create a new column called “DeprIndex”:
# Adding new column based on the sum of other columns: depr_df <- depr_df %>% rowwise() %>% mutate(DeprIndex = sum(c_across(Depr1:Depr5)))
To explain the code above, here we also used the
rowwise() function before the
mutate() function. As you may understand, we use the first function to perform row-wise operations. Furthermore, we used the
sum() function to summarize the columns we selected using the
In the final example, we are continuing working with these columns. However, we are going to add a new column based on different cutoff values. That is, we are going to create multiple groups out of the score summarized score we have created.
Adding a Column to a dataframe in R with Multiple Conditions
In this example, we are going to create a new column in the dataframe based on 4 conditions. That is, we are going to use the values in the “DeprIndex” column and create 3 different groups depending on the value in each row.
# Multiple conditions when adding new column to dataframe: depr_df %>% mutate(Group = case_when(DeprIndex <= 15 ~ "A", DeprIndex <= 20 ~ "B", DeprIndex >= 21 ~ "C") )
Again, we used mutate() together with case_when(). Here, in this example, we created a new column in the dataframe and added values based on whether “DeprIndex” was smaller or equal to 15, smaller or equal to 20, or larger or equal to 25.
This is cool! We’ve created another new column that categorizes each subject based on our arbitrary depression scale. We could now go on and calculate descriptive statistics in R, by this new group, if we want to. In the final example, we are going to use Tibble and the
add_column() function that we used to add an empty column to a dataframe in R.
Append a Column based on Conditions & at a Specific Spot in the Dataframe
In the final example, we are going to use add_column() to append a column, based on values in another column. Here’s how to append a column based on whether a value, in on columns, is larger than given value:
library(tibble) depr_df <- depr_df %>% add_column(Is_Depressed = if_else(.$DeprIndex < 18, TRUE, FALSE), .after="ID")
Notice how we now use tibble and the add_column() function. Again, we use the %>% operator and then in the function we are using if_else(). Here’s the trick we used “.$” to access the column “DeprIndex” and if the value is larger than 18 we add TRUE to the cell in the new column. Obviously, if it is smaller FALSE will be added. The new column that we have created is called “Is_Depressed” and is a boolean:
Importantly, to add the new column at a specific position we used the .after argument. As you can see, in the image above, we created the new column after the “ID” column. If we want to append our column before a specific column we can use the .before argument.
In this R tutorial, you have learned how to add a column to a dataframe based on conditions and/or values in other columns. First, we had a look at a simple example in which we created a new column based on the values in another column. Second, we appended a new column based on a condition. That is, we checked whether the values in the two columns were the same and created a new column based on this. In the third example, we had a look at more complex conditions (i.e., 3 conditions) and added a new variable with 3 different factor levels. Finally, we also had a look at how we could use <code>add_column()</code> to append the column where we wanted it in the dataframe.
Hope you found this post useful! If you did, make sure to share the post to show some love! Also, you can become a Patreon to support my work. Finally, make sure you leave a comment if you want something clarified or you found an error in the post!