In this R tutorial, you will learn how to select columns in a dataframe. First, we will use base R, in a number of examples, to choose certain columns. Second, we will use dplyr to get columns from the dataframe.

Outline

In the first section, we will answer some questions that might have brought you to this post. Second, we are going to have a look at what you need to follow this tutorial. Third, we are going to use base R to select certain columns from the dataframe. In this section, we are also going to use the great %in% operator in R to select specific columns. Fourth, we are going to use dplyr and the select() family of functions. For example, we will use select_if() to get all the numeric columns, as well as some helper functions. The helper functions enable us to select columns whose names start with, or end with, a certain word or a specific character, for instance.

Note that the select_if() function is also useful if you, for example, want to take the absolute value of the values in an R dataframe and therefore need to select only the numeric columns.

How do I select a column in R?

To select a column in R, you can use brackets: for example, YourDataFrame['Column'] will take the column named “Column”. Furthermore, we can use dplyr and the select() function to get columns by name or index. For instance, select(YourDataFrame, c('A', 'B')) will take the columns named “A” and “B” from the dataframe.
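
As a small sketch of the bracket syntax, the code below uses a made-up dataframe (the name df and its columns are assumptions for illustration, not the tutorial's data). It shows that single brackets return a dataframe with one column, while double brackets return the column as a plain vector.

```r
# Hypothetical stand-in data; the column names are made up for illustration
df <- data.frame(A = c(1, 2, 3), B = c(4, 5, 6), Cost = c(10, 20, 30))

one_col_df  <- df['A']    # single brackets: a dataframe with one column
one_col_vec <- df[['A']]  # double brackets: the column as a plain vector

class(one_col_df)
class(one_col_vec)
```

Which form to use depends on what comes next: functions that expect a dataframe usually want the first form, while functions such as mean() want the vector.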

How do I select a column in R with dplyr?

If you want to use dplyr to select a column in R you can use the select() function. For instance, select(Data, 'Column_to_Get') will get the column “Column_to_Get” from the dataframe “Data”.

In the next section, we are going to learn about the prerequisites of this post and how to install R packages such as dplyr (or Tidyverse).

Prerequisites

To follow this post you, obviously, need a working installation of R. Furthermore, we are going to read the example data from an Excel file using the readxl package. Moreover, if you want to use dplyr’s select() and the different helper functions (e.g., starts_with(), ends_with()), you also need to install dplyr. It may be worth pointing out that just by using the “-” character you can use select() (from dplyr) to drop columns in R.
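
To illustrate dropping columns with the “-” character, here is a minimal sketch. It assumes dplyr is installed and uses a made-up dataframe (df and its column names are hypothetical, not the tutorial's data).

```r
library(dplyr)

# Hypothetical stand-in data; the column names are made up for illustration
df <- data.frame(A = c(1, 2, 3), B = c(4, 5, 6), Cost = c(10, 20, 30))

# Keep everything except the column named "B"
no_b <- df %>% select(-B)

# Drop several columns at once by negating a vector of names
only_a <- df %>% select(-c(B, Cost))

names(no_b)
names(only_a)
```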

It may be worth pointing out that both readxl and dplyr are part of the tidyverse. The tidyverse comes with a number of great packages that are packed with useful functions. Besides selecting, or removing, columns with dplyr, you can extract the year from a date in R using the lubridate package, create scatter plots with ggplot2, and calculate descriptive statistics. That said, you can install one or more of these R packages, depending on what you need, using the install.packages() function. For example, installing dplyr and readxl is done by running this in R: install.packages(c('dplyr', 'readxl')).

Example Data

Before we continue and practice selecting columns in R, we will read data from a .xlsx file.

library(readxl)
# Read the example data from the Excel file and show the first rows
dataf <- read_excel("add_column.xlsx")
head(dataf)

This example dataset is the one we used in the tutorial in which we added a column based on other columns. We can see that it contains 9 different columns. If we want to, we can check the structure of the dataframe to see what kind of data we have.

str(dataf)

Now we see that there are also 20 rows and that all but one of the columns are numeric. In a more recent post, you can learn how to rename columns in R with dplyr. In the next section, we are going to learn how to select certain columns from this dataframe using base R.
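
If you only want the counts rather than the full str() output, a few base R functions give the same information. The sketch below uses a small made-up dataframe (df and its columns are assumptions for illustration), not the tutorial's Excel file.

```r
# Hypothetical stand-in data: one character column and two numeric columns
df <- data.frame(ID = c('a', 'b'), Cost = c(1.5, 2.5), Depr1 = c(3, 4),
                 stringsAsFactors = FALSE)

n_rows <- nrow(df)                        # number of rows
n_cols <- ncol(df)                        # number of columns
col_types <- sapply(df, class)            # class of each column, by name
n_numeric <- sum(sapply(df, is.numeric))  # how many columns are numeric

col_types
```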

How to Select Certain Columns using Base R

In this section, we are going to practice selecting columns using base R. First, we will use the column indexes and, second, we will use the column names.

Example 1: Selecting Columns by Index

Here’s one example of how to select columns by their indexes in R:

dataf[, c(1, 2, 3)]

As you can see, we selected the first three columns by using their indexes (1, 2, 3). Notice how we also used a “,” within the brackets. Placing the vector of indexes after the “,” selects columns; placing it before the “,” would subset rows instead. Before moving on to the next example, it may be worth knowing that the vector can contain a sequence. For instance, we can generate a sequence of numbers using the : operator, so replacing c(1, 2, 3) with c(1:3), or simply 1:3, gives us the same output as above. Sequences of numbers can also be generated in R with the seq() function. Naturally, we can also select, for example, the third, fifth, and sixth columns if we want to. In the next example, we are going to subset certain columns by their names.
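
The different ways of building the index vector can be sketched as follows. The dataframe df and its column names x1 to x6 are made up for illustration and stand in for the tutorial's data.

```r
# Hypothetical stand-in data with six columns
df <- data.frame(x1 = 1:3, x2 = 4:6, x3 = 7:9,
                 x4 = 10:12, x5 = 13:15, x6 = 16:18)

first_three_a <- df[, c(1, 2, 3)]       # indexes written out
first_three_b <- df[, 1:3]              # same columns using :
first_three_c <- df[, seq(1, 3)]        # same columns using seq()
odd_columns   <- df[, seq(1, 5, by = 2)] # columns 1, 3, and 5
picked        <- df[, c(3, 5, 6)]       # third, fifth, and sixth columns

names(odd_columns)
names(picked)
```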

Example 2: Selecting Specific Columns by their Names

Here’s how we can select columns in R by name:

dataf[, c('A', 'B', 'Cost')]

In the code chunk above, we basically did the same as in the first example. Notice, however, how we removed the numbers and added the column names. That is, the vector now contains the names of the columns we wanted to select. In the next example, we are going to learn a neat little trick by using the %in% operator when selecting columns by name.

Example 3: Using the %in% Operator

Here’s how we can make use of the %in% operator to get columns by name from the R dataframe:

head(dataf[, (colnames(dataf) %in% c('Depr1', 'Depr2', 'Depr4', 'Depr7'))])

In the code chunk above, we used the great %in% operator. Notice something different in the character vector? There’s a column name that doesn’t exist in the example data. The cool thing here is that, even so, the %in% operator lets us select the columns that actually exist in the dataframe without raising an error. In the next section, we are going to have a look at a couple of examples using dplyr’s select() and some of the great helper functions.
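
To see why %in% tolerates a missing name, it helps to compare it with indexing by the names directly. The sketch below uses a made-up dataframe in which "Depr7" is deliberately absent (df and wanted are hypothetical names, and the columns are assumptions rather than the tutorial's data).

```r
# Hypothetical stand-in data; "Depr7" is deliberately absent
df <- data.frame(Depr1 = 1:3, Depr2 = 4:6, Depr4 = 7:9, Cost = 10:12)
wanted <- c('Depr1', 'Depr2', 'Depr4', 'Depr7')

# %in% returns one TRUE/FALSE per column name, so a missing name is never matched
keep <- colnames(df) %in% wanted
subset_in <- df[, keep]

# Indexing by the names directly fails because "Depr7" does not exist
direct <- tryCatch(df[, wanted], error = function(e) "error")

# setdiff() lists the requested names that were not found
missing_cols <- setdiff(wanted, colnames(df))
```

Note that with %in% the columns come back in the order they have in the dataframe, not in the order of the character vector, and that a misspelled name is skipped silently. Checking with setdiff() first is one way to catch that.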

How to Select Columns in R with dplyr

In this section, we will start with the basic examples of selecting columns (e.g., by name and index). However, the focus will be on using the helper functions together with select(), and the select_if() function.

Example 4: Subsetting Columns by Index Using the select() Function

Here’s how we can get columns by index using the select() function:

library(dplyr)
# Select the second, fifth, and sixth columns by position
dataf %>% select(c(2, 5, 6))

Notice how we used another great operator: %>%. This is the pipe operator, which passes dataf as the first argument to the select() function that follows it. Again, as when selecting columns with base R, we added a vector with the indexes of the columns we want. In the next example, we will basically do the same but select by column names.
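
The following sketch shows that the piped call is the same as calling select() directly. It assumes dplyr is installed and uses a made-up dataframe (df and the column names x1 to x6 are hypothetical).

```r
library(dplyr)

# Hypothetical stand-in data with six columns
df <- data.frame(x1 = 1:3, x2 = 4:6, x3 = 7:9,
                 x4 = 10:12, x5 = 13:15, x6 = 16:18)

piped  <- df %>% select(c(2, 5, 6))  # with the pipe
nested <- select(df, c(2, 5, 6))     # same call without the pipe
no_c   <- df %>% select(2, 5, 6)     # positions can also be passed one by one

names(piped)
```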

Example 5: Getting Columns by Name with select()

Here’s how we use select() to get the columns we want by name:

library(dplyr)
# Select three columns by name
dataf %>% select(c('A', 'Cost', 'Depr1'))

In the code chunk above, we just added the names of the columns in the vector. Simple! In the next example, we are going to have a look at how to use select_if() to select columns containing data of a specific type.
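
Names can be passed to select() in a few ways. The sketch below, on a made-up dataframe (df, cols, and the column names are assumptions for illustration), compares quoted names, unquoted names, and a character vector wrapped in all_of(), which is handy when the names are stored in a variable.

```r
library(dplyr)

# Hypothetical stand-in data
df <- data.frame(A = 1:3, B = 4:6, Cost = 7:9, Depr1 = 10:12)

quoted   <- df %>% select(c('A', 'Cost', 'Depr1'))  # names as strings
unquoted <- df %>% select(A, Cost, Depr1)           # bare column names

cols <- c('A', 'Cost', 'Depr1')
from_vector <- df %>% select(all_of(cols))          # names stored in a variable

names(from_vector)
```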

Example 6: Selecting All Numeric Columns in R

Here’s how to select all the numeric columns in an R dataframe:

dataf %>% select_if(is.numeric)

Remember, all columns except for one are of numeric type. This means that we will get 8 out of 9 columns when running the above code. If we, on the other hand, used the is.character function, we would only select the first column. In the next example, we will learn how to get columns starting with a certain letter.
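
In dplyr 1.0.0 and later, the same selection can also be written with select() and where(), which the dplyr documentation recommends over select_if(). The sketch below uses a made-up dataframe (df and its columns are hypothetical) and also takes the absolute value of the numeric columns, as mentioned earlier in this post.

```r
library(dplyr)

# Hypothetical stand-in data: one character column and two numeric columns
df <- data.frame(ID = c('a', 'b', 'c'),
                 Change = c(-1.5, 2, -3),
                 Cost = c(10, -20, 30),
                 stringsAsFactors = FALSE)

numeric_old    <- df %>% select_if(is.numeric)        # as in the example above
numeric_new    <- df %>% select(where(is.numeric))    # newer equivalent
character_cols <- df %>% select(where(is.character))  # only the text column

# abs() works on a dataframe as long as every column is numeric
abs_values <- abs(numeric_new)
```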

Example 7: Select Columns Starting with a Certain Letter

Here’s how we use the starts_with() helper function and select() to get all columns starting with the letter “D”:

dataf %>% select(starts_with('D'))

Selecting columns with names starting with a certain letter was pretty easy. In the starts_with() helper function we just added the letter.

Example 8: Select Columns Ending with a Specific Letter

Here’s how we use the ends_with() helper function and select() to get all columns ending with the letter “D”:

dataf %>% select(ends_with('D'))

Note that in the example dataset there is only one column ending with the letter “D”. In fact, all column names end with unique characters. That is, here it would not make much sense to select columns using this method. It is worth noting that we can also use a word with both the starts_with() and ends_with() helper functions. Let’s have a look!
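
One detail worth knowing is that starts_with() and ends_with() ignore case by default. The sketch below shows the difference on a made-up dataframe (df and its column names are assumptions for illustration, not the tutorial's data).

```r
library(dplyr)

# Hypothetical stand-in data
df <- data.frame(ID = 1:3, Paid = 4:6, Cost = 7:9)

# Default: case is ignored, so both "ID" and "Paid" match
any_case <- df %>% select(ends_with('D'))

# With ignore.case = FALSE only the upper-case "D" matches
exact_case <- df %>% select(ends_with('D', ignore.case = FALSE))

# A longer suffix works the same way
suffix <- df %>% select(ends_with('ost'))
```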

Example 9: Select Columns Starting with a Certain Word

Here’s how we can select certain columns starting with a specific word:

dataf %>% select(starts_with('Depr'))

Of course, “Depr” is not really a word, and, yes, we get the exact same columns as in Example 7. However, you get the idea and should understand how to use this in your own application. One case where this makes sense is when multiple columns begin with the same letter but only some of them begin with the same word. In the final example, we are going to select columns whose names contain a certain string (or word).
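
Here is a small sketch of the case where a letter and a word give different results. The dataframe df and its column names, including the extra "Date" column, are made up for illustration.

```r
library(dplyr)

# Hypothetical stand-in data: three names start with "D", two with "Depr"
df <- data.frame(Date = 1:3, Depr1 = 4:6, Depr2 = 7:9, Cost = 10:12)

by_letter <- df %>% select(starts_with('D'))     # Date, Depr1, Depr2
by_word   <- df %>% select(starts_with('Depr'))  # Depr1, Depr2

names(by_letter)
names(by_word)
```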

Before moving on, it may be worth mentioning another great feature of the dplyr package: you can use dplyr to rename factor levels in R.

Example 10: Select Columns Containing a Certain String

Here’s how we can select certain columns containing a string:

dataf %>% select(contains('pr'))

Again, this particular example doesn’t make much sense on the example dataset. There’s a final helper function that is worth mentioning: matches(). This function can be used to select columns whose names match a pattern (regular expression), such as names containing digits. Now that you have selected the columns you need, you can continue manipulating your data and get it ready for data analysis. For example, you can now go ahead and create dummy variables in R or add a new column.
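
As a minimal sketch of matches(), the code below selects columns by regular expression. The dataframe df and its column names are made up for illustration and are not the tutorial's data.

```r
library(dplyr)

# Hypothetical stand-in data; three names contain a digit, two end with one
df <- data.frame(A = 1:3, Cost = 4:6, Depr1 = 7:9,
                 Depr2 = 10:12, Q3Sales = 13:15)

# Names containing a digit anywhere
with_digit <- df %>% select(matches('[0-9]'))

# Names ending with a digit
ends_digit <- df %>% select(matches('[0-9]$'))

names(with_digit)
names(ends_digit)
```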

Conclusion

In this post, you have learned how to select certain columns using base R and dplyr. Specifically, you have learned how to get columns, from the dataframe, based on their indexes or names. Furthermore, you have learned to select columns of a specific type. After this, you learned how to subset columns based on whether the column names started or ended with a letter. Finally, you have also learned how to select based on whether the columns contained a string or not. Hope you found this blog post useful. If you did, please share it on your social media accounts, add a link to the tutorial in your project reports and such, and leave a comment below.
