In this remove a column in R tutorial, we are going to work with dplyr to delete columns. Specifically, we are going to learn how to remove columns in R, by name and by index, using the select() function. Note, as the name implies, this function can also be used to select certain columns in R.
Finally, we will also learn how to remove columns from R dataframes whose names start with a letter or a word, end with a letter or a word, or contain a character (like the underscore). Note, removing a column can be done either before or after you add an empty column to a dataframe in R.
To follow this R tutorial on how to delete columns in R, some basic knowledge of how to use R is needed. Furthermore, we need to have R and dplyr (or Tidyverse) installed. Make sure that the latest version of R is installed (it can be downloaded here).
Installing R-packages (i.e., dplyr or Tidyverse)
Installing R-packages is quite easy. To install dplyr, we can just type
install.packages("dplyr"). If we, on the other hand, want to install the whole Tidyverse package, we type
install.packages("tidyverse").
Next, we are going to learn how to load an R-package. Loading R-packages is also quite easy; we just type
library(dplyr) or
library(tidyverse), if we want to load dplyr or the entire Tidyverse package, respectively.
It may be worth mentioning here that, even though this tutorial is focused on how to use dplyr to remove columns, the Tidyverse package comes with a range of other good packages that can be used for other things. For example, you can use some of these packages to create dummy variables in R, extract year from datetime, extract day from datetime, and extract time from datetime.
How do I Delete a Column in Dplyr?
Deleting a column using dplyr is very easy using the select() function and the minus sign (-). For example, if you want to remove the columns “X” and “Y”, you’d do it like this:
select(Your_Dataframe, -c(X, Y)). Note, in that example, you removed multiple columns (i.e., 2). To remove a single column by name in R with dplyr, you’d just type:
select(Your_Dataframe, -X). Finally, if you want to delete a column by index, with dplyr and select, you change the name (e.g., “X”) to the index of the column:
select(Your_Dataframe, -1).
How do I Remove the First Column in R?
The absolutely simplest way to delete the first column in R is to use the brackets ([]) and assign
NULL to the first column (put “1” between the brackets!). It is also very easy to remove the first column using dplyr’s
select() function. Just add your dataframe as the first parameter and the number 1, with a minus sign in front of it (i.e., “-1”), as the second.
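To make the two approaches concrete, here is a minimal sketch. It assumes a small made-up dataframe called df (the column names id, x, and y are hypothetical and only used for illustration), so that the result is easy to inspect:

```r
library(dplyr)

# Hypothetical example data (not part of the tutorial's data set)
df <- data.frame(id = 1:3, x = c(2.5, 3.1, 4.8), y = c("a", "b", "c"))

# Base R: assign NULL to the first column of a copy
df_base <- df
df_base[1] <- NULL

# dplyr: drop the first column with a negative index
df_dplyr <- select(df, -1)

names(df_base)   # "x" "y"
names(df_dplyr)  # "x" "y"
```

Note that the bracket approach changes df_base itself, whereas select() returns a new dataframe and leaves df as it was.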
Now, before we start to use dplyr to remove columns, we need to load some data that we can practice deleting columns from. In this tutorial, we are going to drop columns from the Starwars data set that is available in the dplyr package:
# Loading example data:
data("starwars", package = "dplyr")
# Checking the first 6 rows of the dataset:
head(starwars)
Data can, of course, be imported from many different formats. In fact, when working with real data, it will typically not be stored in an R package.
Now that we have some example data we can go to the next section where we start to clean the dataframe from variables that we don’t really need. In the next section, we will use dplyr to remove a column by its name.
How to Remove a Column by Name in R using dplyr
In the first example, we are going to drop one column by its name. Deleting a column by its name is quite easy using dplyr and select. We use the select() function with the name of the dataframe, from which we want to delete a column, as the first argument. Here’s how to remove a column in R with the select() function:
# Dplyr remove a column by name:
select(starwars, -height)
As you can see, we used the name of the column (i.e., “height”) as the second argument. Here we used the “-” to tell the
select() function that this is the column we want to drop from the dataframe. Note, if you want the column to stay removed, you have to assign the result to a variable, because select() returns a new dataframe and leaves the original unchanged. In the next example, we will drop a column by its index.
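As a small sketch of what assigning the result looks like, we can store the output of select() under a new name and compare it with the original. The name starwars_no_height is just an assumption made for this illustration:

```r
library(dplyr)

# select() returns a new dataframe; starwars itself is unchanged
starwars_no_height <- select(starwars, -height)

"height" %in% names(starwars)            # TRUE: the original still has the column
"height" %in% names(starwars_no_height)  # FALSE: the new dataframe does not
```

If we would rather overwrite the original, we could assign back to the same name instead of creating a new one.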
How to Drop a Column by Index in R using dplyr
In the second example of how to delete a column in R, we are going to drop one column by its index. This is also very easy, and we are going to use dplyr and select again. Here’s how to remove a column in R if we know the index of that column:
# Dplyr remove column by index:
select(starwars, -1)
Notice how we, this time, removed the first column from the dataframe in R. That is, we did not delete the same column as in the example where we removed the column by name. Again, the “-” sign means that we want to drop the variable at this index (i.e., 1). In the following sections, we will see that the same general idea can be used to remove multiple columns with dplyr (i.e., with the select() function). Note, sometimes you have to clean your data in more ways. For example, you can also use R to remove duplicate rows and columns.
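Since an index says nothing about which variable it refers to, it can be a good idea to look up the column name first. The following is a minimal sketch of that check; the names first_col and by_index are assumptions made for this illustration:

```r
library(dplyr)

# Look up which column sits at position 1
first_col <- names(starwars)[1]
first_col

# Dropping by position removes exactly that column
by_index <- select(starwars, -1)
first_col %in% names(by_index)  # FALSE
```

This is especially useful when the column order may have changed earlier in a script.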
How to Remove the Last Column in R
Here’s how we can use select() and the helper function last_col() to delete the last column in R:
select(starwars, -last_col())
That was pretty simple, right? All we did was add the function, with a minus sign in front of it, as the second parameter, and we deleted the last column.
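Here is a short sketch that checks that the last column really is gone. It also uses the offset argument of last_col(), which goes slightly beyond the example above, to drop the second-to-last column instead. The object names are assumptions made for this illustration:

```r
library(dplyr)

# Name of the last column, looked up by position
last_name <- names(starwars)[ncol(starwars)]

no_last <- select(starwars, -last_col())
last_name %in% names(no_last)  # FALSE

# offset = 1 targets the second-to-last column instead
no_second_last <- select(starwars, -last_col(offset = 1))
```

Using last_col() means we do not have to know how many columns the dataframe has.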
How to Delete Columns by Names in R using dplyr
In this section, we are going to delete many columns in R. First, we are going to delete multiple columns from a dataframe by their names. To drop many columns by their names, we just use the
c() function to define a vector. In this vector, we add the name of each column we want to remove. Here’s how to use dplyr to remove columns by name:
# Dplyr remove multiple columns by name:
select(starwars, -c(name, height, mass))
Notice, again, that we used the “-” to remove the columns from the dataframe, much like when we removed one column by name in R. Remember, if you want the change to the dataframe to be permanent, you will have to assign the result to a variable. Note that we have now removed variables (columns), but we can, of course, also insert new variables. For example, with tibble we can add empty columns to the dataframe in R.
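Sometimes the names of the columns to drop are stored as strings, for example in a character vector created earlier in a script. One way to handle this, sketched below, is to wrap the vector in the helper all_of(). Note that cols_to_drop and trimmed are hypothetical names used only for this illustration:

```r
library(dplyr)

# Hypothetical character vector of column names to remove
cols_to_drop <- c("name", "height", "mass")

trimmed <- select(starwars, -all_of(cols_to_drop))

# Same columns remain as when spelling the names out inside c()
identical(names(trimmed), names(select(starwars, -c(name, height, mass))))
```

A nice side effect is that all_of() throws an error if one of the names is not found in the dataframe, which helps catch typos.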
Remove Columns by Index in R using select()
In the second example of how to remove multiple columns, we are going to drop columns from the dataframe, in R, by their indexes. Again, we use the
c() function and put in the indexes we want to remove from the dataframe.
# delete multiple columns by index using dplyr:
select(starwars, -c(1, 2, 3))
Note, the above code example drops the 1st, 2nd, and 3rd columns from the R dataframe. That is, the same columns we deleted using the variable names in the previous section. If we want to delete the 3rd, 4th, and 6th columns, for instance, we can change it to
-c(3, 4, 6). Furthermore, you can use both : and seq() to create a sequence of numbers in R. This means that if you want to remove many columns by their indexes, you can generate the indexes. For example, if we wanted to use dplyr to remove columns 1 to 6, we can use the following code:
select(starwars, -c(1:6))
# Alternative:
# select(starwars, -seq(1, 6))
Notice how there is one line of code commented out. This is because both of the above examples produce the same result: as previously mentioned, they both generate the numbers 1 to 6 in a sequence.
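If we want to convince ourselves that the two ways of generating the indexes are interchangeable, we can run both and compare the remaining column names. This is only a quick sketch, and the object names with_colon and with_seq are assumptions made for this illustration:

```r
library(dplyr)

with_colon <- select(starwars, -c(1:6))
with_seq   <- select(starwars, -seq(1, 6))

# Both calls leave the same columns behind
identical(names(with_colon), names(with_seq))  # TRUE
```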
How to Drop Columns using the starts_with() function
In this section, we are going to use the
starts_with() function to remove a column in R. For instance, if we want to remove the columns, from a dataframe, that start with the letter “f”, we use the following command:
# dplyr dropping columns starting with a letter:
select(starwars, -starts_with("f"))
As you can see, we removed the columns starting with a specific letter. Again, as in the previous examples, we used the “-” to tell select that we don’t want the columns starting with the letter “f”.
Removing Columns in R Starting with a Specific Letter
In this example, we are going to learn how to remove columns in R starting with a specific letter. In this case, we will remove all columns that start with the letter “s”. Note, however, that we could also remove all columns starting with a certain word, if our dataframe contained such variables. Now, to remove columns in R starting with a letter (i.e., “s”) we just do the following:
# deleting columns starting with the letter "s":
select(starwars, -starts_with("s"))
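To see exactly which columns were matched, we can compare the column names before and after. The sketch below uses base R’s setdiff() and startsWith() for the check; the name no_s is an assumption made for this illustration:

```r
library(dplyr)

no_s <- select(starwars, -starts_with("s"))

# Columns present in starwars but absent from the result
setdiff(names(starwars), names(no_s))

# No remaining column name begins with "s"
any(startsWith(names(no_s), "s"))  # FALSE
```

Keep in mind that starts_with() ignores case by default, so a column called “Species” would have been removed as well.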
Dropping a Column ending With a Character using the ends_with() function
Now we will continue by removing columns from a dataframe that end with a specific character. For instance, if we want to remove the columns ending with the letter “r”, we will use the
ends_with() function like this:
# Dropping columns ending with a letter:
select(starwars, -ends_with("r"))
In the code chunk above, we removed all columns that end with the letter “r”.
Now that we know how to use dplyr to drop a column ending with a letter, we will continue by applying the same method to drop variables ending with a word.
How to Remove Columns Ending with a Word in R
Now, we will continue using the
ends_with() function. In this case, however, we may use it in a more “real world” application. If we have multiple columns ending with a certain word, we can remove all of these columns from the R dataframe using
ends_with(). For example, if we want to remove the columns in R that end with the word “color”, we do as follows:
# removing multiple columns with dplyr, ending with a word:
select(starwars, -ends_with("color"))
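As a quick sketch, we can list the columns that ends_with("color") removed by comparing the names before and after. The name no_color is an assumption made for this illustration:

```r
library(dplyr)

no_color <- select(starwars, -ends_with("color"))

# Columns that were dropped
setdiff(names(starwars), names(no_color))
# "hair_color" "skin_color" "eye_color"
```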
Deleting a Column from an R dataframe using the contains() function
In the final example of how to remove columns from an R dataframe, we are going to use the contains() function. This is handy if we want to remove all columns containing a certain word or character. For instance, if we want to remove all columns containing the underscore (“_”), we just type the following:
# dplyr remove columns containing character in name:
select(starwars, -contains("_"))
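Note that contains() looks for a literal string. If we need a pattern instead, the related helper matches() takes a regular expression. The sketch below goes slightly beyond the example above and is only meant as an illustration; the object names are assumptions:

```r
library(dplyr)

no_underscore <- select(starwars, -contains("_"))

# No remaining column name contains an underscore
any(grepl("_", names(no_underscore), fixed = TRUE))  # FALSE

# matches() takes a regular expression instead of a literal string;
# here it drops columns whose names end in "color" or "year"
no_color_year <- select(starwars, -matches("(color|year)$"))
```

In the Starwars data, these two calls happen to drop the same columns, since the only names with an underscore are the three color columns and birth_year.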
Now that you’ve dropped columns, you can go ahead and do some other data manipulation tasks. For instance, if your dataset happens to contain dates, and you want to extract timestamps from datetime, you can now go ahead and do it.
Final Example on How to Remove a Column in R
Now, in this final example of how to delete a column in R, we are going to use the pipe, “%>%”, and save the result as a new dataframe.
# dplyr remove columns and saving it to new dataframe:
new_df <- starwars %>%
  select(-contains("_"))
head(new_df)
As can be seen, we have removed all columns, from the R dataframe, that contained the underscore. We also created a new dataframe, called new_df, and used the head() function to print the first 6 rows.
Now that we have dropped the columns we wanted to, we can carry on doing descriptive statistics in R and creating a scatter plot in R. Note, there may be more data manipulation that needs to be done before we do this and the next step (e.g., repeated measures ANOVA in R).
Conclusion: Dropping Columns from Dataframe in R
In conclusion, removing a column in R was pretty easy to do. In this tutorial, we have dropped one column by name and by index, and we have deleted multiple columns by names and by indexes. Furthermore, we have removed columns in R dataframes starting with, ending with, and containing letters, words, and characters.
As a final note, if we want to remove many columns, we can instead use select without the minus sign (“-”). This will select the specific columns that we want to keep.
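Here is a minimal sketch of that idea, keeping only three columns from the Starwars data. The name kept is an assumption made for this illustration:

```r
library(dplyr)

# Keep only the listed columns; everything else is dropped
kept <- select(starwars, name, height, mass)

names(kept)
# "name" "height" "mass"
```

When only a few columns are needed, this is often shorter than listing everything we want to remove.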