In this tutorial on removing a column in R, we are going to work with dplyr to delete columns. Specifically, we are going to learn how to remove columns in R using the select() function, both by name and by index.
Finally, we will also learn how to remove columns from R dataframes whose names start with a letter or word, end with a letter or word, or contain a character (like the underscore).
To follow this R tutorial on how to delete columns, some basic knowledge of R is needed. Furthermore, we need to have R and dplyr (or the Tidyverse) installed. Make sure that the latest version of R is installed (it can be downloaded from CRAN, the Comprehensive R Archive Network).
Installing R-packages (i.e., dplyr or Tidyverse)
Installing R packages is quite easy. To install dplyr, we can just type
install.packages("dplyr"). If we, on the other hand, want to install the whole Tidyverse, we type
install.packages("tidyverse").
In this section of the remove column in R tutorial, we are going to learn how to load an R package. Loading R packages is quite easy; we just type
library(dplyr) or library(tidyverse), if we want to load dplyr or the entire Tidyverse, respectively.
Example Data to Drop Columns from
Now, we need to load some data that we can practice removing columns from. In this tutorial, we are going to drop columns from the starwars dataset that is available in the dplyr package:
data("starwars", package = "dplyr")
head(starwars)
Data can, of course, be imported from many different formats. In fact, when working with real data, it will usually not be stored in an R package and has to be imported into R first.
How to Remove a Column by Name in R using dplyr
In the first example, we are going to drop one column by its name. Deleting a column by name is quite easy using dplyr and select(). We call the
select() function with the name of the dataframe as the first argument, followed by the name of the column we want to delete, prefixed with a minus sign. In this example, we are going to drop the height column, so the call is select(starwars, -height).
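The call described above can be sketched as follows. This is a minimal example that assumes dplyr is installed and uses the starwars data shipped with it; the object name no_height is only illustrative.

```r
library(dplyr)

# Drop the height column by name; the minus sign negates the selection
no_height <- select(starwars, -height)

# The remaining column names no longer include "height"
names(no_height)
```

Note that select() returns a new dataframe and leaves starwars itself unchanged, so the result has to be assigned if we want to keep it.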
How to Drop a Column by Index in R using dplyr
In the second example, we are going to drop one column by its index. This is also very easy, and we are going to use dplyr and select() again. If we know the index of the column, we put a minus sign in front of it; for instance, select(starwars, -1) removes the first column.
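As a hedged sketch of dropping by position (again assuming dplyr and its starwars data; no_first is a made-up name):

```r
library(dplyr)

# Drop the column at position 1 (the name column in starwars)
no_first <- select(starwars, -1)

# Compare the column names before and after
names(starwars)
names(no_first)
```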
Notice how, this time, we removed the first column from the dataframe. That is, we did not delete the same column as in the example where we used the column name.
How to Delete Columns by Names in R using dplyr
In this section, we are going to delete many columns in R. First, we are going to delete multiple columns from a dataframe using their names. To drop many columns by their names, we just use the
c() function to define a vector. In this vector, we put the names of the columns we want to remove:
select(starwars, -c(name, height, mass))
Remove Columns by Index in R using select()
In the second example of how to remove multiple columns, we are going to drop columns from the dataframe by their indexes. Again, we use the
c() function and put in the indexes of the columns we want to remove from the dataframe:
select(starwars, -c(1, 2, 3))
Note that the above code drops the 1st, 2nd, and 3rd columns from the R dataframe. That is, the same columns we deleted using the variable names in the previous section. If we want to delete the 3rd, 4th, and 6th columns, for instance, we can change it to
-c(3, 4, 6).
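To illustrate the non-adjacent case, here is a small sketch under the same assumptions (dplyr loaded, starwars data); dropped_346 is a hypothetical name:

```r
library(dplyr)

# Drop the 3rd, 4th, and 6th columns in one call
dropped_346 <- select(starwars, -c(3, 4, 6))

# Show which columns were at those positions, and what remains
names(starwars)[c(3, 4, 6)]
names(dropped_346)
```

Positions depend on the column order, so dropping by name is usually safer if the layout of the dataframe may change.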
How to Drop Columns Starting with a Letter using the starts_with() Function
In this section, we are going to use the
starts_with() function to remove a column in R. For instance, if we want to remove a column whose name starts with the letter “g” from a dataframe, we put -starts_with("g") inside select().
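A minimal sketch of this, assuming dplyr and the starwars data, where gender is the column beginning with “g” (no_g is an illustrative name):

```r
library(dplyr)

# Remove every column whose name starts with "g"
# (starts_with() ignores case by default)
no_g <- select(starwars, -starts_with("g"))

names(no_g)
```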
Removing Columns in R Starting with a Specific Letter
In this example, we are going to learn how to remove columns in R starting with a specific letter. In this case, we will remove all columns that start with the letter “s”. Note, however, that we could also remove all columns starting with a certain word, if our dataframe contained such variables. To remove columns starting with a letter (e.g., “s”), we use -starts_with("s") inside select().
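The same pattern applied to the letter “s” might look like this. It is only a sketch; which columns match depends on the version of the starwars data that comes with the installed dplyr, and no_s is a made-up name.

```r
library(dplyr)

# Remove all columns whose names start with "s"
no_s <- select(starwars, -starts_with("s"))

# List the columns that were dropped
setdiff(names(starwars), names(no_s))
```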
Dropping a Column ending With a Character using the ends_with() function
Now we will continue by removing a column whose name ends with a specific word. For instance, if we want to remove a column ending with the word “year”, we will use the
ends_with() function inside select(), that is, -ends_with("year").
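For the starwars data, this could be written as below (a sketch assuming dplyr is loaded; birth_year is the column ending in “year”, and no_year is an illustrative name):

```r
library(dplyr)

# Remove the column(s) whose names end with "year"
no_year <- select(starwars, -ends_with("year"))

names(no_year)
```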
How to Remove Columns Ending with a Word in R
Now, we will continue using the
ends_with() function. In this case, however, we use it in a more “real world” application. If we have multiple columns ending with a certain word, we can remove all of these columns from the R dataframe using
ends_with(). For example, if we want to remove the columns that end with the word “color”, we use -ends_with("color") inside select().
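Here is a short sketch of removing several columns at once with one helper, under the same assumptions as before (no_color is a hypothetical name):

```r
library(dplyr)

# Remove hair_color, skin_color, and eye_color in one go
no_color <- select(starwars, -ends_with("color"))

# The dropped columns
setdiff(names(starwars), names(no_color))
```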
Deleting a Column from an R dataframe using the contains() function
In the final example of how to remove columns from an R dataframe, we are going to use the contains() function. This is handy if we want to remove all columns containing a certain word or character. For instance, if we want to remove all columns containing the underscore (“_”), we use -contains("_") inside select().
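As a rough illustration, assuming dplyr and the starwars data (no_underscore is a made-up name). Note that contains() matches a literal string, not a regular expression:

```r
library(dplyr)

# Remove every column whose name contains an underscore
no_underscore <- select(starwars, -contains("_"))

names(no_underscore)
```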
Final Example on How to Remove a Column in R
Now, in this final example of how to delete a column in R, we are going to use the pipe, “%>%”, and save the result as a new dataframe:
new_df <- starwars %>% select(-contains("_"))
head(new_df)
As the output shows, we have removed all columns that contained an underscore from the R dataframe. We also created a new dataframe, called new_df, and used the head() function to print its first 6 rows (the default).
Now that we have dropped the columns we wanted to, we can carry on with descriptive statistics in R or creating a scatter plot in R. Note that more data manipulation may be needed before we do this and before the next step of the analysis (e.g., a repeated measures ANOVA in R).
Conclusion: Dropping Columns from Dataframe in R
In conclusion, removing a column in R is pretty easy to do. In this tutorial, we have dropped one column by name and by index, and we have deleted multiple columns by names and by indexes. Furthermore, we have removed columns in R dataframes starting with, ending with, and containing letters, words, and characters.
As a final note, if we want to remove many columns, it may be easier to use select() without the minus sign (“-”). This will select the specific columns that we want to keep and drop all the others.
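A brief sketch of keeping columns instead of dropping them, assuming dplyr and the starwars data (the object name kept is only illustrative):

```r
library(dplyr)

# Keep only name, height, and mass; all other columns are dropped
kept <- select(starwars, name, height, mass)

names(kept)
```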