
In this tutorial, you will learn by example how to use the `%in%` operator in R. Specifically, you will learn 7 different uses of this great operator.

## Outline

Here’s the outline of this post, described in a bit more detail than the table of contents. First, we start out with a couple of simple examples of how to use the `%in%` operator. Specifically, we will have a look at how to use the operator to test whether two vectors containing sequences of numbers or letters overlap. As you may already have expected, the operator can be used in other, maybe more advanced, cases. In the following sections, therefore, we are going to have a look at how we can work with this operator and dataframes. For example, you will see that you can use the operator to create new variables, subset data, remove columns, and select columns.

## What does %in% Mean in R

The `%in%` operator in R can be used to identify if an element (e.g., a number) belongs to a vector or to a column of a dataframe. For example, it can be used to see if the number 1 is in the sequence of numbers 1 to 10.
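As a minimal sketch of this idea (the variable names are illustrative only), the following checks whether single values are in the sequence 1 to 10:

```r
# A sequence of numbers from 1 to 10
numbers <- 1:10

# Is the number 1 in the sequence?
one_in_numbers <- 1 %in% numbers
one_in_numbers
# [1] TRUE

# Is the number 11 in the sequence?
eleven_in_numbers <- 11 %in% numbers
eleven_in_numbers
# [1] FALSE
```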

## What is the Difference Between the == and %in% Operators in R

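The `%in%` operator is used for matching values. It is built on the `match()` function, which, according to the R documentation, “returns a vector of the positions of (first) matches of its first argument in its second”; `%in%` itself returns a logical vector indicating whether there is a match for each element of its left operand. On the other hand, the `==` operator is a comparison operator that tests, element by element, whether two values are exactly equal. Using the `%in%` operator you can compare vectors of different lengths to see if elements of one vector match at least one element in another. The length of the output will be equal to the length of the vector being compared (the first one). This does not work reliably with the `==` operator, which compares elements position by position and recycles the shorter vector when the lengths differ.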
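To make the difference concrete, here is a small sketch using two made-up vectors, `x` and `y`. For comparison, it also shows `match()`, the base R function that `%in%` is built on:

```r
x <- c(1, 5, 9)
y <- c(5, 9, 1, 7)

# %in% asks, for each element of x, whether it occurs anywhere in y
in_result <- x %in% y
in_result
# [1] TRUE TRUE TRUE

# match() returns the position in y of the first match for each element of x
match_result <- match(x, y)
match_result
# [1] 3 1 2

# == compares position by position; here x is compared with the
# first three values of y, and none of them line up
eq_result <- x == y[1:3]
eq_result
# [1] FALSE FALSE FALSE
```

Every value of `x` occurs somewhere in `y`, so `%in%` returns `TRUE` three times, while `==` returns `FALSE` three times because no value sits at the same position in both vectors.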

## What is the use of %in% in R?

The `%in%` operator is used to match values in, e.g., two different vectors, as already answered in the two previous questions. You can also use the operator to select certain columns in a dataframe or to subset the dataframe.

Now that you know what `%in%` is in R, and what the difference between this operator and `==` is, we can go on and have a look at the example usages.

## 7 Ways to Use %in% Operator in R

In this section, we are going through 7 examples of how to use `%in%` in R. As you already know, we will start by working with vectors. After that, we will have a look at how to use the operator when working with dataframes.

### 1: Using %in% to Compare two Sequences of Numbers (vectors)

In this example, we will use `%in%` to check if two vectors contain overlapping numbers. Specifically, we will get a logical value for each element of a shorter vector, telling us whether it is also present in a longer vector. Here’s the first example of an excellent usage of the operator:

```r
# sequence of numbers 1:
a <- seq(1, 5)
# sequence of numbers 2:
b <- seq(3, 12)

# using the %in% operator to check matching values in the vectors
a %in% b
```

In the code above, the output is as long as the first vector (i.e., `a`): one logical value for each element of `a`, telling us whether that element is also in `b`. We used the `seq()` function to create the two sequences of numbers. In a real-world example, our vectors might contain arbitrary numbers rather than sequences. If we, on the other hand, want to test which elements of a longer vector are in a shorter vector, we do as follows:

```r
# shorter vector:
a <- seq(12, 19)
# longer vector:
b <- seq(1, 16)

# test if elements in the longer vector are in the shorter:
b %in% a
```

As you can see, both methods above result in a logical vector. Additionally, if we use the `which()` function, we can get the indexes of the overlapping elements:

```r
# Using the operator together with the which() function
which(seq(1, 10) %in% seq(4, 12))
```
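The same approach works for arbitrary numbers, not only sequences. The following sketch uses two made-up vectors (`x` and `y` are illustrative names) and also shows how the logical vector can be used to pull out the overlapping values themselves:

```r
# Two vectors of arbitrary (non-sequential) numbers
x <- c(23, 7, 42, 15, 8)
y <- c(8, 99, 23, 4)

# Logical vector: which elements of x are also in y?
in_y <- x %in% y
in_y
# [1]  TRUE FALSE FALSE FALSE  TRUE

# Positions in x of the overlapping elements
positions <- which(in_y)
positions
# [1] 1 5

# The overlapping values themselves
overlap <- x[in_y]
overlap
# [1] 23  8
```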

In the next example, we will see that we can apply the same methods for letters, or factors, in R. That is, we will test if two vectors, containing letters, are overlapping.

### 2: Utilizing %in% to Compare two Vectors Containing Letters or Factors

In this example, we will use `%in%` to check if two vectors contain overlapping letters. Note, this can also be done for words (e.g., stored as character vectors or factors). First, we will check which letters in a longer vector are also in a shorter vector. Here’s how to compare two vectors containing letters:

```r
# Sequence of letters:
a <- LETTERS[1:10]

# Second sequence of letters:
b <- LETTERS[4:10]

# which elements of the longer vector are in the shorter one?
a %in% b
```

As you can see, and probably already figured out, we used the `%in%` operator in exactly the same way as for vectors containing sequences of numbers. Conversely, we can test which letters in the shorter vector are in the longer vector:

``b %in% a``

Naturally, as with the examples where we used sequences of numbers in R, the result when working with letters, words, or factors is a boolean vector. Furthermore, as in the first example, we can use the `which()` function to get indexes:

```r
g <- c("C", "D", "E")

h <- c("A", "E", "B", "C", "D", "E", "A", "B", "C", "D", "E")

which(h %in% g)
```

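Finally, here’s an example of why the `%in%` operator is a better choice than `==` for this task. If we use `which()` together with `==`, R compares the two vectors element by element, recycling the shorter one. As a result, we only get the three positions where the recycled values happen to line up (along with a warning, because the length of the longer vector is not a multiple of the length of the shorter one):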

```r
# %in% vs ==: the equality operator gives the wrong answer here!
which(g == h)
```
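As noted at the start of this example, the operator also works with factors. Here is a minimal sketch with made-up fruit names (`fruits` and `wanted` are illustrative names); the factor is compared by its labels, so it can be matched directly against a character vector:

```r
# A factor of fruit names (illustrative data)
fruits <- factor(c("apple", "banana", "cherry", "banana"))

# A character vector of values to look for
wanted <- c("banana", "kiwi")

# For each element of the factor: is it one of the wanted values?
fruit_match <- fruits %in% wanted
fruit_match
# [1] FALSE  TRUE FALSE  TRUE
```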

In the next example, we will work with a dataframe instead of vectors. First, however, we are going to use the readxl package to read an .xlsx file in R. Here’s how we get our dataframe to play around with:

```r
library(readxl)
library(httr)

# URL to Excel File:
xlsx_URL <- 'https://mathcs.org/statistics/datasets/titanic.xlsx'

# Get the .xlsx file as a temporary file
GET(xlsx_URL, write_disk(tf <- tempfile(fileext = ".xlsx")))

# Reading the temporary .xlsx file in R:
dataf <- read_excel(tf)

# Checking dataframe:
head(dataf)
```

A quick note, before going on to the third example: readxl and dplyr, a package that we will use later, are both part of the Tidyverse. If you install the Tidyverse, you will get a set of powerful tools for tasks such as working with dates, carrying out descriptive statistics, and visualizing data (e.g., scatter plots with ggplot2), to name a few.

### 3: How to use the %in% Operator in R to Test if Value is in Column

In this example, we will have a look at a very simple use of this operator. Namely, we are going to use `%in%` to check if a value is in one of the columns in a dataframe:

```r
# %in% column
2 %in% dataf$boat
```

Now, if you have read through the first 2 examples, you already know that the output is as long as the left-hand side. Because the left-hand side here is a single value, we get a single logical value back: `TRUE` if the value we sought is found anywhere in the column, and `FALSE` otherwise. Notice also how we used the `$` operator to select one of the columns.
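If we instead put the column on the left-hand side, we get one logical value per row, which can also be summed to count how many times the value occurs. The sketch below uses a small, hypothetical stand-in dataframe (`dataf_small`, with made-up values) rather than the Titanic data, so that it runs on its own:

```r
# Hypothetical stand-in for the boat column of the Titanic data
dataf_small <- data.frame(
  name = c("A", "B", "C", "D", "E"),
  boat = c("2", NA, "11", "2", NA),
  stringsAsFactors = FALSE
)

# Is the value present anywhere in the column? (a single TRUE/FALSE)
value_present <- "2" %in% dataf_small$boat

# Which rows contain the value? (one TRUE/FALSE per row)
row_has_value <- dataf_small$boat %in% "2"
row_has_value
# [1]  TRUE FALSE FALSE  TRUE FALSE

# How many times does the value occur?
n_matches <- sum(row_has_value)
n_matches
# [1] 2
```

Note that `%in%` returns `FALSE`, not `NA`, for the rows with missing values, which makes the result safe to use directly for counting or subsetting.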

### 4: Using %in% to Add a New Column to a Dataframe in R

Here’s how to use the `%in%` operator to create a new variable:

```r
# Creating a dataframe:
dataf2 <- data.frame(
  Type = c("Fruit", "Fruit", "Fruit", "Fruit", "Fruit",
           "Vegetable", "Vegetable", "Vegetable", "Vegetable", "Fruit"),
  Name = c("Red Apple", "Strawberries", "Orange", "Watermelon", "Papaya",
           "Carrot", "Tomato", "Chili", "Cucumber", "Green Apple"),
  Color = c(NA, "Red", "Orange", "Red", "Green",
            "Orange", "Red", "Red", "Green", "Green")
)

# Adding the new column Red_Fruit using %in%:
dataf2 <- within(dataf2, {
  Red_Fruit <- "No"
  Red_Fruit[Type %in% c("Fruit")] <- "No"
  Red_Fruit[Type %in% "Vegetable"] <- "No"
  Red_Fruit[Name %in% c("Red Apple", "Strawberries", "Watermelon", "Chili", "Tomato")] <- "Yes"
})
```

Notice how we used the operator to build the logical indexes that decide which rows get each value. The resulting dataframe has an added column, “Red_Fruit”.

In another post, you will learn how to use R to add a column to a dataframe based on conditions and/or values in other columns.
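As an alternative sketch (not the method used above), the same kind of column can be created in a single step by combining `%in%` with base R's `ifelse()`. The dataframe `produce` and the vector `red_names` are made up for illustration:

```r
# Small illustrative dataframe (hypothetical data)
produce <- data.frame(
  Name = c("Strawberries", "Orange", "Tomato", "Cucumber"),
  stringsAsFactors = FALSE
)

# Names that should be flagged as red
red_names <- c("Strawberries", "Tomato", "Chili")

# ifelse() returns "Yes" where the test is TRUE and "No" elsewhere
produce$Red <- ifelse(produce$Name %in% red_names, "Yes", "No")
produce$Red
# [1] "Yes" "No"  "Yes" "No"
```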

### 5: Utilizing the %in% Operator to Subset Data

In this example, we are going to use the `%in%` operator to subset the data:

```r
library(dplyr)

home.dests <- c("St Louis, MO", "New York, NY", "Hudson, NY")

# Subsetting using %in% in R:
dataf %>%
  filter(home.dest %in% home.dests)
```

Notice how we created a vector of the elements that we want to include in our new, subsetted dataframe. Furthermore, we used the `filter()` function from the dplyr package together with the `%in%` operator. The result is a dataframe containing only the rows whose `home.dest` matches one of the three destinations.
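The same kind of subsetting can also be sketched in base R, without dplyr. The example below uses a small, hypothetical dataframe (`passengers`, with made-up rows) so that it runs on its own, and it also shows how negating the test keeps the rows that are not in the vector:

```r
# Hypothetical passenger data with a home.dest column
passengers <- data.frame(
  name = c("Allen", "Brown", "Clark", "Davis"),
  home.dest = c("St Louis, MO", "Chicago, IL", "New York, NY", "Boston, MA"),
  stringsAsFactors = FALSE
)

home.dests <- c("St Louis, MO", "New York, NY", "Hudson, NY")

# Keep rows whose home.dest is in the vector
kept <- passengers[passengers$home.dest %in% home.dests, ]
kept$name
# [1] "Allen" "Clark"

# Keep rows whose home.dest is NOT in the vector
dropped <- passengers[!(passengers$home.dest %in% home.dests), ]
dropped$name
# [1] "Brown" "Davis"
```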

In the next section, we will have a look at another way we may use the %in% operator: namely, to drop columns from a dataframe.

### 6: Using %in% to Remove Columns from Dataframe

In this example, we are going to use `%in%` to drop columns from the dataframe:

```r
# Drop columns using %in% operator in R
dataf[, !(colnames(dataf) %in% c("pclass", "embarked", "boat"))]
```

In the code chunk above, we used the `!` operator to tell R that we do not want to select these columns. Running the code above will result in a new dataframe with the columns removed.

Note, it is also possible to use dplyr to remove columns in R. For example, using the select() function together with the pipe operator may result in a slightly more readable code.
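If you negate `%in%` often, one option is to wrap the negation in a small helper operator. The sketch below defines a hypothetical `%notin%` operator (the name is our own choice) with base R's `Negate()`, and applies it to a made-up dataframe:

```r
# Hypothetical helper: the logical negation of %in%
`%notin%` <- Negate(`%in%`)

# Small illustrative dataframe (made-up values)
df <- data.frame(
  pclass = 1:3,
  embarked = c("S", "C", "Q"),
  boat = c("2", NA, "5"),
  age = c(29, 40, 35)
)

# TRUE for the columns we want to keep
keep <- colnames(df) %notin% c("pclass", "embarked", "boat")

# drop = FALSE keeps the result a dataframe even if one column remains
remaining <- df[, keep, drop = FALSE]
colnames(remaining)
# [1] "age"
```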

In the next example, we are going to have a look at how we can use the `%in%` operator to do the opposite of dropping columns. That is, we are going to select columns, instead.

### 7: Make use of the %in% Operator to Select Columns

Let us use the `%in%` operator to select a number of variables from the dataframe:

```r
# Select columns using %in%:
dataf[, (colnames(dataf) %in% c("pclass", "embarked", "boat"))]
```

Note that we removed the `!` before the parenthesis, which tells R to select these columns (see example 6, above, for the opposite).

Selecting columns, instead of deleting them, might be a more efficient way to go if we have a lot of variables in our dataset and we want to create a new dataframe with only some of them. The dplyr package offers another nice function for this: `select_if()`, which can be used to select columns if they have certain names.
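Two details are worth keeping in mind when selecting columns this way in base R. The sketch below, which uses a made-up dataframe `df`, illustrates that names not present in the dataframe are silently ignored by `%in%`, and that `drop = FALSE` keeps the result a dataframe when only one column is selected:

```r
# Small illustrative dataframe (made-up values)
df <- data.frame(
  pclass = c(1, 2, 3),
  embarked = c("S", "C", "Q"),
  age = c(29, 40, 35)
)

# "cabin" is not a column of df; %in% simply returns FALSE for it
wanted <- c("pclass", "cabin")
selected <- df[, colnames(df) %in% wanted, drop = FALSE]

# drop = FALSE keeps the result a dataframe even with a single column
is.data.frame(selected)
# [1] TRUE
colnames(selected)
# [1] "pclass"
```

Without `drop = FALSE`, selecting a single column like this would return a plain vector instead of a dataframe.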

## Conclusion

In this R tutorial, you have learned 7 ways you can use the `%in%` operator in R. Specifically, you have learned how to compare vectors of numbers and letters (factors). You have also learned how to check if a value is in a column, how to add a new variable, how to subset data, how to remove columns, and how to select columns.