In this R tutorial, you will learn how to select columns in a dataframe. First, we will use base R, in several examples, to choose certain columns. Second, we will use dplyr to get columns from the dataframe.
Outline
In the first section, we will look at what you need to follow this tutorial. Second, we will answer some questions that might have brought you to this post. Third, we will use base R to select certain columns from the dataframe. In this section, we are also going to use the great %in% operator in R to select specific columns. Fourth, we will use dplyr and the select() family of functions. For example, we will use the select_if() function to get all the numeric columns. We will also use helper functions that let us select columns whose names start or end with a certain word or character, for instance.
![select columns in R with [] and dplyr](https://www.marsja.se/wp-content/uploads/2020/11/how_to_get_certain_columns_in_R_examples.jpg)
Note that the select_if() function is also useful if you, for example, want to take the absolute value in an R dataframe and only need the numeric columns.
How do I select a column in R?
To select a column in R, you can use brackets. For example, YourDataFrame['Column'] will take the column named “Column”. Furthermore, we can use dplyr and the select() function to get columns by name or index. For instance, select(YourDataFrame, c('A', 'B')) will take the columns named “A” and “B” from the dataframe.
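To make the bracket syntax a bit more concrete, here is a small sketch using a made-up data frame (toy, Column, and Other are hypothetical names). It shows that single brackets return a one-column data frame, whereas double brackets return the column itself as a vector:

```r
# Hypothetical toy data frame, used only for illustration
toy <- data.frame(Column = c(1, 2, 3), Other = c("a", "b", "c"))

one_col_df  <- toy["Column"]    # single brackets: a data frame with one column
one_col_vec <- toy[["Column"]]  # double brackets: the column itself as a vector

str(one_col_df)
str(one_col_vec)
```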
How do I select a column in R with dplyr?
If you want to use dplyr to select a column in R, you can use the select() function. For instance, select(Data, 'Column_to_Get') will get the column “Column_to_Get” from the dataframe “Data”.
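As a hedged illustration, the sketch below uses a tiny hypothetical Data frame. Note that select() always returns a data frame; if you need the column as a plain vector, dplyr's pull() function can be used instead:

```r
library(dplyr)

# Hypothetical example data; Data and Column_to_Get mirror the names used above
Data <- data.frame(Column_to_Get = c(10, 20, 30), Other = c("x", "y", "z"))

as_df  <- select(Data, Column_to_Get)  # a data frame with one column
as_vec <- pull(Data, Column_to_Get)    # the column as a plain vector
```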
In the next section, we are going to learn about the prerequisites of this post and how to install R packages such as dplyr (or Tidyverse).
Prerequisites
To follow this post, you, obviously, need a working installation of R. Furthermore, we are going to read the example data from an Excel file using the readxl package. Moreover, if you want to use dplyr’s select() and its helper functions (e.g., starts_with() and ends_with()), you also need to install dplyr. It may be worth pointing out that, just by putting the minus sign (“-”) in front of a column name, you can also use select() (from dplyr) to drop columns in R.
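Here is a minimal sketch of dropping columns with select() and the minus sign, assuming a small made-up data frame (dat, A, B, and Cost are illustrative names, not the tutorial’s data):

```r
library(dplyr)

# Hypothetical toy data frame
dat <- data.frame(A = 1:3, B = 4:6, Cost = c(9.5, 3.2, 7.1))

# A minus sign in front of a column name drops that column
without_b <- dat %>% select(-B)

# Several columns can be dropped at once with -c(...)
only_cost <- dat %>% select(-c(A, B))
```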

It may be worth pointing out that both readxl and dplyr are part of the tidyverse. The tidyverse comes with a number of packages that are packed with great functions. Besides selecting or removing columns with dplyr (part of the tidyverse), you can extract the year from a date in R using the lubridate package, create scatter plots with ggplot2, and calculate descriptive statistics. That said, you can install one of these R packages, depending on what you need, using the install.packages() function. For example, dplyr and readxl can be installed by running this in R: install.packages(c('dplyr', 'readxl')).
Example Data
Before we continue and practice selecting columns in R, we will read data from a .xlsx file.
library(readxl)
dataf <- read_excel("add_column.xlsx")
head(dataf)

This example dataset is one that we used in the tutorial, in which we added a column based on other columns. We can see that it contains nine different columns. If we want to, we can check the structure of the dataframe so that we can see what kind of data we have.
str(dataf)

Now, we see that there are 20 rows, as well, and that all but one column is numeric. In a more recent post, you can learn how to rename columns in R with dplyr. In the next section, we will learn how to select certain columns from this dataframe using base R.
How to Select Certain Columns using Base R
In this section, we are going to practice selecting columns using base R. First, we will use the column indexes, and second, we will use the column names.
Example 1: Selecting Columns by Index
Here’s one example of how to select columns by their indexes in R:
dataf[, c(1, 2, 3)]

As you can see, we selected the first three columns by using their indexes (1, 2, 3). Notice how we also used the “,” within the brackets. Whatever comes before the comma subsets rows, and whatever comes after it subsets columns, so placing the vector of indexes after the comma gives us columns rather than rows. Before moving on to the next example, it may be worth knowing that the vector can contain a sequence. For instance, we can generate a sequence of numbers using “:”: replacing c(1, 2, 3) with 1:3 would give us the same output as above. Sequences of numbers can also be generated in R with the seq() function. Naturally, we can also select, e.g., the third, fifth, and sixth columns if we want to. In the next example, we are going to subset certain columns by their names.
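The following sketch shows the different ways of writing the index vector, using a hypothetical six-column data frame rather than the example data. It also shows that a negative index drops a column instead of selecting it:

```r
# Hypothetical data frame with six columns (names chosen for illustration)
dat <- data.frame(A = 1:2, B = 3:4, C = 5:6, D = 7:8, E = 9:10, F = 11:12)

first_three  <- dat[, 1:3]                        # same as dat[, c(1, 2, 3)]
every_other  <- dat[, seq(1, ncol(dat), by = 2)]  # columns 1, 3, and 5
some_columns <- dat[, c(3, 5, 6)]                 # third, fifth, and sixth columns
drop_first   <- dat[, -1]                         # negative index: all but the first column
```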
Example 2: Selecting Specific Columns by their Names
Here’s how we can select columns in R by name:
dataf[, c('A', 'B', 'Cost')]

In the code chunk above, we did the same as in the first example, but we replaced the numbers with column names. That is, the vector now contains the names of the columns we wanted to select. In the next example, we will learn a neat little trick using the %in% operator when selecting columns by name.
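One thing to be aware of: with a base R data.frame, selecting a single column by name with [, 'Name'] simplifies the result to a vector. The sketch below (hypothetical data) shows how drop = FALSE keeps the data frame structure. Tibbles, such as the one returned by read_excel(), do not simplify in this way:

```r
# Hypothetical base R data frame (not a tibble)
dat <- data.frame(A = 1:3, B = 4:6, Cost = c(2.5, 3.5, 4.5))

two_cols <- dat[, c("A", "Cost")]        # several columns: still a data frame
one_vec  <- dat[, "Cost"]                # one column: simplified to a vector
one_df   <- dat[, "Cost", drop = FALSE]  # drop = FALSE keeps the data frame
```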
Example 3: Using the %in% Operator
Here’s how we can make use of the %in% operator to get columns by name from the R dataframe:
head(dataf[, (colnames(dataf) %in% c('Depr1', 'Depr2',
'Depr4', 'Depr7'))])

In the code chunk above, we used the great %in% operator. Notice something different in the character vector? It contains a column name that doesn’t exist in the example data. The cool thing here is that, when we use the %in% operator, only the columns that actually exist in the dataframe are selected, and no error is raised. In the next section, we are going to have a look at a couple of examples using dplyr’s select() and some of its great helper functions.
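To see why the %in% trick is handy, compare it with indexing by the names directly. In this sketch, the data frame is made up and Depr7 is deliberately missing from it; plain name indexing stops with an error, while the %in% version simply keeps the columns that exist:

```r
# Hypothetical data frame; "Depr7" is deliberately missing from it
dat <- data.frame(Depr1 = 1:2, Depr2 = 3:4, Depr4 = 5:6, Cost = 7:8)
wanted <- c("Depr1", "Depr2", "Depr4", "Depr7")

# Plain name indexing fails when a name does not exist; capture the error message
res <- tryCatch(dat[, wanted], error = function(e) conditionMessage(e))
print(res)

# The %in% version keeps only the wanted columns that exist
kept <- dat[, colnames(dat) %in% wanted]
```

Note that the %in% version returns the columns in the order they appear in the dataframe, not in the order of the character vector.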
How to Select Columns in R with dplyr
In this section, we will start with the basic examples of selecting columns (e.g., by name and index). However, the focus will be on using the helper functions together with select(), and the select_if() function.
Example 4: Subsetting Columns by Index Using the select() Function
Here’s how we can get columns by index using the select() function:
library(dplyr)
dataf %>%
select(c(2, 5, 6))

Notice how we used another great operator: %>%. This is the pipe operator, and following the pipe operator, we used the select() function. Again, as when selecting columns with base R, we added a vector with the indexes of the columns we wanted. In the next example, we will do the same but select by column names.
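If you are using R 4.1.0 or later, the native pipe |> can be used in place of %>% for calls like this one. A small sketch with a hypothetical data frame (the column names are made up for illustration):

```r
library(dplyr)

# Hypothetical toy data frame
dat <- data.frame(ID = c("a", "b"), A = 1:2, B = 3:4, Cost = 5:6,
                  Depr1 = 7:8, Depr2 = 9:10)

with_magrittr <- dat %>% select(c(2, 5, 6))
with_native   <- dat |> select(c(2, 5, 6))  # native pipe, requires R >= 4.1.0
```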
Example 5: Getting Columns by Name with select()
Here’s how we use select() to get the columns we want by name:
library(dplyr)
dataf %>%
select(c('A', 'Cost', 'Depr1'))
In the code chunk above, we just added the names of the columns to the vector. Simple! In the next example, we are going to have a look at how to use select_if() to select columns containing data of a specific data type.
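select() also accepts ranges of column names and, via the tidyselect helpers all_of() and any_of(), character vectors of names. The sketch below uses made-up data; any_of() behaves much like the %in% trick from Example 3 in that it skips names that do not exist:

```r
library(dplyr)

# Hypothetical data frame; "Depr7" below does not exist in it
dat <- data.frame(ID = c("a", "b"), A = 1:2, B = 3:4, Cost = 5:6, Depr1 = 7:8)
wanted <- c("A", "Cost", "Depr7")

range_cols <- dat %>% select(A:Cost)                  # every column from A through Cost
strict     <- dat %>% select(all_of(c("A", "Cost")))  # all_of() errors on any missing name
lenient    <- dat %>% select(any_of(wanted))          # any_of() silently skips "Depr7"
```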
Example 6: Selecting All Numeric Columns in R
Here’s how to select all the numeric columns in an R dataframe:
dataf %>%
select_if(is.numeric)

Remember, all columns except for one are of numeric type. This means we will get 8 out of 9 columns by running the above code. If we, on the other hand, used the is.character function instead, we would only select the first column. In the next example, we will learn how to get columns starting with a certain letter.
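In dplyr 1.0.0 and later, select_if() is superseded by select() combined with the where() helper. Here is a sketch on a small hypothetical data frame showing both forms:

```r
library(dplyr)

# Hypothetical data: one character column and two numeric columns
dat <- data.frame(ID = c("a", "b"), A = 1:2, Cost = c(1.5, 2.5))

numeric_cols   <- dat %>% select(where(is.numeric))    # dplyr >= 1.0.0 idiom
character_cols <- dat %>% select(where(is.character))  # only the text column
legacy         <- dat %>% select_if(is.numeric)        # superseded, but still works
```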
Example 7: Select Columns Starting with a Certain Letter
Here’s how we use the starts_with() helper function and select() to get all columns starting with the letter “D”:
dataf %>%
select(starts_with('D'))
Selecting columns with names starting with a certain letter was pretty easy. In the starts_with() helper function, we just added the letter.
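One detail worth knowing: starts_with() (like ends_with(), contains(), and matches()) ignores case by default. The sketch below, on a made-up data frame with a lowercase date column, shows the effect of ignore.case = FALSE:

```r
library(dplyr)

# Hypothetical data frame with a lowercase "date" column
dat <- data.frame(Depr1 = 1:2, Depr2 = 3:4, date = 5:6, Cost = 7:8)

default_case <- dat %>% select(starts_with("D"))                       # also matches "date"
exact_case   <- dat %>% select(starts_with("D", ignore.case = FALSE))  # only "Depr1" and "Depr2"
```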
Example 8: Select Columns Ending with a Specific Letter
Here’s how we use the ends_with() helper function and select() to get all columns ending with the letter “D”:
dataf %>%
select(ends_with('D'))
Note that in the example dataset there is only one column ending with the letter “D”. In fact, all column names end with unique characters, so here it would not make sense to select columns using this method. It is worth noting that we can also use a whole word with both the starts_with() and ends_with() helper functions. Let’s have a look!
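Both helpers also accept a character vector, in which case columns matching any of the given strings are selected. A small sketch with hypothetical column names:

```r
library(dplyr)

# Hypothetical data frame
dat <- data.frame(Depr1 = 1:2, Depr2 = 3:4, Depr3 = 5:6, Cost = 7:8)

# Columns whose names end with either "1" or "2"
ending_1_or_2 <- dat %>% select(ends_with(c("1", "2")))
```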
Example 9: Select Columns Starting with a Certain Word
Here’s how we can select certain columns starting with a specific word:
dataf %>%
select(starts_with('Depr'))
Of course, “Depr” is not really a word, and, yes, we get the same columns as in Example 7. However, you get the idea and should understand how to use this in your own application. Selecting by a word makes sense, for example, when several columns begin with the same letter but only some of them begin with the same word. Before moving on, it may be worth mentioning another great feature of the dplyr package: you can use dplyr to rename factor levels in R. In the final example, we are going to select columns whose names contain a certain string (or word).
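To illustrate that situation, here is a sketch with a made-up data frame in which several columns start with “D”, but only some of them start with “Depr”:

```r
library(dplyr)

# Hypothetical data: "Date" and "Dose" also start with "D"
dat <- data.frame(Depr1 = 1:2, Depr2 = 3:4, Date = 5:6, Dose = 7:8)

by_letter <- dat %>% select(starts_with("D"))     # all four columns
by_word   <- dat %>% select(starts_with("Depr"))  # only Depr1 and Depr2
```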
Example 10: Select Columns Containing a Certain String
Here’s how we can select certain columns whose names contain a string:
dataf %>%
select(contains('pr'))

Again, this particular example doesn’t make much sense with the example dataset. There’s a final helper function that is worth mentioning: matches(). This function selects columns whose names match a pattern (a regular expression), such as names containing digits. Now that you have selected the columns you need, you can continue manipulating your data and preparing it for data analysis. For example, you can now go ahead and create dummy variables in R or add a new column.
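Here is a sketch contrasting contains() and matches() on a hypothetical data frame. contains() looks for a literal string (ignoring case by default), whereas matches() takes a regular expression, here one matching names that end in a digit:

```r
library(dplyr)

# Hypothetical data frame
dat <- data.frame(Depr1 = 1:2, Depr2 = 3:4, Price = 5:6, Cost = 7:8)

has_pr     <- dat %>% select(contains("pr"))     # literal, case-insensitive: Depr1, Depr2, Price
ends_digit <- dat %>% select(matches("[0-9]$"))  # regular expression: names ending in a digit
```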
Conclusion
In this post, you have learned how to select certain columns using base R and dplyr. Specifically, you have learned how to get columns from the dataframe based on their indexes or names. Furthermore, you have learned to select columns of a specific type. After this, you learned how to subset columns based on whether the column names started or ended with a letter or word. Finally, you have also learned how to select columns based on whether their names contain a string. I hope you found this blog post helpful. If you did, please share it on your social media accounts, add a link to the tutorial in your project reports, and leave a comment below.