
In this tutorial, you will learn by example how to use the `%in%` operator in R. Specifically, you will learn 7 different uses of this great operator.

## Outline

Here’s the outline of this post, described in a bit more detail than the table of contents. First, we start out with a couple of simple examples of how to use the `%in%` operator. Specifically, we will have a look at how to use the operator to test whether two vectors, containing sequences of numbers or letters, share elements. As you may have expected, the operator can also be used in other, more advanced cases. In the following sections, therefore, we are going to have a look at how we can work with this operator and dataframes. For example, you will see that you can use the operator to create new variables, remove columns, and select columns.

## What does %in% Mean in R

The `%in%` operator in R can be used to identify if an element (e.g., a number) belongs to a vector or dataframe. For example, it can be used to see if the number 1 is in the sequence of numbers 1 to 10.
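As a minimal sketch of that example (the variable names `one_found` and `eleven_found` are only illustrative):

```r
# Is the number 1 among the numbers 1 to 10?
one_found <- 1 %in% 1:10       # TRUE

# Is the number 11 among them?
eleven_found <- 11 %in% 1:10   # FALSE
```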

## What is the Difference Between the == and %in% Operators in R

The `%in%` operator is used for matching values. It is built on the `match()` function, which “returns a vector of the positions of (first) matches of its first argument in its second”; `%in%` itself returns a logical vector indicating whether each element of its left operand has a match. The `==` operator, on the other hand, is a comparison operator that tests whether two elements are exactly equal, element by element. Using the `%in%` operator, you can compare vectors of different lengths to see if elements of one vector match at least one element in another. The length of the output will be equal to the length of the vector being compared (the first one). The `==` operator cannot do this: it compares elements position by position and recycles the shorter vector when the lengths differ.
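The difference is easiest to see side by side. The following is a small sketch with two made-up vectors, `x` and `y`, that are not part of the examples later in this post:

```r
x <- c(1, 2, 3, 4, 5, 6)
y <- c(2, 4)

# %in%: is each element of x found anywhere in y?
x_in_y <- x %in% y   # FALSE TRUE FALSE TRUE FALSE FALSE

# ==: position-by-position comparison; y is recycled to 2, 4, 2, 4, 2, 4
x_eq_y <- x == y     # FALSE FALSE FALSE TRUE FALSE FALSE
```

With `==`, the value 2 in `x` is missed because it happens to be compared with 4, not because it is absent from `y`.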

## What is the use of %in% in R?

The `%in%` operator is used to match values in, e.g., two different vectors, as already answered in the two previous questions. You can also use the operator to select certain columns in a dataframe or to subset the dataframe.

If you need to know what `$` is in R, another operator, check the linked post. Now that you know what `%in%` is in R and what the difference between this operator and `==` is, we can go on and have a look at the example usages.

## 7 Ways to Use the %in% Operator in R

In this section, we are going through 7 examples of how to use `%in%` in R. As you already know, we will start by working with vectors. After that, we will have a look at how to use the operator when working with dataframes.

### 1: Using %in% to Compare two Sequences of Numbers (vectors)

In this example, we will use `%in%` to check if two vectors contain overlapping numbers. Specifically, we will have a look at how we can get a logical value for each element of one vector, indicating whether it is also present in a longer vector. Here’s the first example of the operator in use:

```r
# sequence of numbers 1:
a <- seq(1, 5)
# sequence of numbers 2:
b <- seq(3, 12)

# using the %in% operator to check matching values in the vectors
a %in% b
```

In the code above, we get an output as long as the first vector (i.e., a), which here is the shorter one. Furthermore, we used the `seq()` function to create first one sequence of numbers in R and then another. In a real-world example, our vectors might not contain sequences but arbitrary numbers. If we, on the other hand, want to test which elements of a longer vector are in a shorter vector, we do as follows:

```r
# shorter vector:
a <- seq(12, 19)
# longer vector:
b <- seq(1, 16)

# test which elements of the longer vector are in the shorter:
b %in% a
```

As you can see, both methods above result in a logical vector. Additionally, if we use the `which()` function, we can get the indexes of the overlapping elements:

```r
# Using the operator together with the which() function
which(seq(1, 10) %in% seq(4, 12))
```
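The logical vector can also be used directly as an index, which returns the overlapping values themselves instead of their positions. Here is a minimal sketch with two made-up vectors of arbitrary numbers, `x` and `y`, which are not taken from the examples above:

```r
# Hypothetical vectors of arbitrary (non-sequential) numbers
x <- c(7, 2, 19, 4, 11)
y <- c(4, 7, 30)

# Logical vector: one value per element of x
in_y <- x %in% y

# Positions of the elements of x that also occur in y
idx <- which(in_y)   # 1 4

# The overlapping values themselves
overlap <- x[in_y]   # 7 4
```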


In the next example, we will see that we can apply the same methods for letters, or factors, in R. That is, we will test if two vectors, containing letters, are overlapping.

### 2: Utilizing %in% to Compare two Vectors Containing Letters or Factors

In this example, we will use `%in%` to check if two vectors contain overlapping letters. Note, this can also be done for words (e.g., factors). First, we will compare letters in a shorter vector and in a longer vector. Here’s how to compare two vectors containing letters:

```r
# Sequence of letters:
a <- LETTERS[1:10]

# Second sequence of letters
b <- LETTERS[4:10]

# longer in shorter
a %in% b
```

As you can see, and probably already figured out, we used the `%in%` operator in exactly the same way as for vectors containing sequences of numbers. We can also reverse the test and check which letters in the shorter vector are in the longer vector:

``b %in% a``

Naturally, as with the examples where we used sequences of numbers in R, the result when working with letters, words, or factors is a boolean vector. Furthermore, as in the first example, we can use the `which()` function to get indexes:

```r
g <- c("C", "D", "E")

h <- c("A", "E", "B", "C", "D", "E", "A", "B", "C", "D", "E")

which(h %in% g)
```

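Finally, here’s an example of why using the `%in%` operator is better than `==` for this task. If we use `which()` together with `==`, the shorter vector is recycled and compared position by position, so we get only three of the seven matching positions (4, 5, and 6), along with a warning that the longer object length is not a multiple of the shorter one: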

```r
# %in% vs. ==: using the equal operator here is wrong!
which(g == h)
```

In the next example, we will work with a dataframe instead of vectors. First, however, we are going to use the readxl package to read a .xlsx file in R. Here’s how we get our dataframe to play around with:

```r
library(readxl)
library(httr)

# URL to Excel file:
xlsx_URL <- 'https://mathcs.org/statistics/datasets/titanic.xlsx'

# Get the .xlsx file as a temporary file
GET(xlsx_URL, write_disk(tf <- tempfile(fileext = ".xlsx")))

# Reading the temporary .xlsx file in R (assuming readxl's read_excel()):
dataf <- read_excel(tf)

# Checking dataframe:
head(dataf)
```

A quick note before going on to the third example: readxl and dplyr, a package that we will use later, are both part of the tidyverse. If you install the tidyverse, you will get powerful tools to extract year from date in R, carry out descriptive statistics, and visualize data (e.g., scatter plots with ggplot2), to name a few.

### 3: How to use the %in% Operator in R to Test if Value is in Column

In this example, we will have a look at a very simple use of this operator. Namely, we are going to use `%in%` to check if a value is in one of the columns in a dataframe:

```r
# %in% column
2 %in% dataf$boat
```

Now, if you have read through the first 2 examples, you might expect a logical vector. Here, however, the left-hand side is a single value, so we get a single logical value: TRUE means that at least one cell in the column contained the value we sought. Notice also how we used the `$` operator to select one of the columns.
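Swapping the two sides of the operator gives one logical value per row instead, which makes it possible to see where the value occurs and to count how many times. A small sketch, using a made-up dataframe `toy` as a stand-in for the Titanic data (its contents are assumptions, not the real file):

```r
# Hypothetical stand-in for the Titanic data used in this post
toy <- data.frame(name = c("A", "B", "C", "D"),
                  boat = c("2", "11", NA, "2"))

# Is the value present anywhere in the column? (single TRUE/FALSE)
any_match <- "2" %in% toy$boat

# Which rows contain the value? (one TRUE/FALSE per row)
row_match <- toy$boat %in% "2"

# How many times does the value occur?
n_match <- sum(row_match)
```

Note that the missing value in `boat` simply yields FALSE here, so it does not disturb the count.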

### 4: Using %in% to Add a New Column to a Dataframe in R

Here’s how to use the `%in%` operator to create a new variable:

```r
# Creating a dataframe:
dataf2 <- data.frame(Type = c("Fruit", "Fruit", "Fruit", "Fruit", "Fruit",
                              "Vegetable", "Vegetable", "Vegetable", "Vegetable", "Fruit"),
                     Name = c("Red Apple", "Strawberries", "Orange", "Watermelon", "Papaya",
                              "Carrot", "Tomato", "Chili", "Cucumber", "Green Apple"),
                     Color = c(NA, "Red", "Orange", "Red", "Green",
                               "Orange", "Red", "Red", "Green", "Green"))

# Adding the new column Red_Fruit, recoded with %in%:
dataf2 <- within(dataf2, {
  Red_Fruit = "No"
  Red_Fruit[Type %in% c("Fruit")] = "No"
  Red_Fruit[Type %in% "Vegetable"] = "No"
  Red_Fruit[Name %in% c("Red Apple", "Strawberries", "Watermelon", "Chili", "Tomato")] = "Yes"
})
```

Notice how we make use of the operator to pick out the rows that should be recoded. The resulting dataframe has the added column “Red_Fruit”.

In another post, you will learn how to use R to add a column to a dataframe based on conditions and/or values in other columns.
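A more compact way to build such a Yes/No column is to combine `%in%` with base R's `ifelse()`. This is only a sketch on a small made-up dataframe (`fruits` and `red_names` are illustrative names), not a replacement for the `within()` approach shown in this example:

```r
# Small made-up dataframe with a few of the names used in this example
fruits <- data.frame(
  Name = c("Red Apple", "Orange", "Tomato", "Cucumber"),
  stringsAsFactors = FALSE
)

# Names that should be flagged
red_names <- c("Red Apple", "Strawberries", "Watermelon", "Chili", "Tomato")

# "Yes" where Name is one of red_names, otherwise "No"
fruits$Red_Fruit <- ifelse(fruits$Name %in% red_names, "Yes", "No")
```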

### 5: Utilizing the %in% Operator to Subset Data

In this example, we are going to use the `%in%` operator to subset the data:

```r
library(dplyr)

home.dests <- c("St Louis, MO", "New York, NY", "Hudson, NY")

# Subsetting using %in% in R:
dataf %>%
  filter(home.dest %in% home.dests)
```

Notice how we created a vector of the elements that we want to be included in our new, subsetted, dataframe. Furthermore, we also used the dplyr package and the `filter()` function together with the `%in%` operator. The result is a subsetted dataframe that contains only the rows with a matching home destination.
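The same kind of subset can be made in base R by using the logical vector as a row index. The sketch below uses a made-up dataframe, `passengers`, as a stand-in for the Titanic data; only the `home.dests` vector is taken from the example above:

```r
# Hypothetical stand-in for the Titanic dataframe
passengers <- data.frame(
  name = c("P1", "P2", "P3", "P4"),
  home.dest = c("St Louis, MO", "London", "Hudson, NY", NA),
  stringsAsFactors = FALSE
)

home.dests <- c("St Louis, MO", "New York, NY", "Hudson, NY")

# Keep only the rows whose home.dest is one of the wanted values
subsetted <- passengers[passengers$home.dest %in% home.dests, ]
```

Because `%in%` returns FALSE for the missing value, the row with `NA` is dropped rather than turned into a row of `NA`s.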

Note, dplyr comes with a lot of other handy functions such as the select-family. For example, you can use dplyr to select columns in R or to take the absolute value in R, using the function only on numerical columns. In the next section, we will have a look at another way we may use the %in% operator: namely, to drop columns from a dataframe.

### 6: Using %in% to Remove Columns from Dataframe

In this example, we are going to use `%in%` to drop columns from the dataframe:

```r
# Drop columns using %in% operator in R
dataf[, !(colnames(dataf) %in% c("pclass", "embarked", "boat"))]
```

In the code chunk above, we used the `!` operator to tell R that we do not want to select these columns. Running the code above will result in a new dataframe with the columns removed.

Note, it is also possible to use dplyr to remove columns in R. For example, using the `select()` function together with the pipe operator may result in slightly more readable code.

In the next example, we are going to have a look at how we can use the `%in%` operator to do the opposite of dropping columns. That is, we are going to select columns, instead.

### 7: Make use of the %in% Operator to Select Columns

Let us use the `%in%` operator to select a number of variables from the dataframe:

```r
# Select columns using %in%:
dataf[, (colnames(dataf) %in% c("pclass", "embarked", "boat"))]
```

Note that we removed the ! before the parentheses which will tell R to select these columns (see example 6, above, for the opposite).
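Examples 6 and 7 can be tried without downloading the Titanic file. The sketch below uses a made-up dataframe, `toy`, with a few of the same column names; `drop = FALSE` is added as a precaution, since a plain `data.frame` otherwise returns a vector when only one column is left:

```r
# Hypothetical dataframe standing in for the Titanic data
toy <- data.frame(pclass = c(1, 3), name = c("P1", "P2"),
                  embarked = c("S", "C"), boat = c("2", NA))

# TRUE for every column whose name is in the list
mask <- colnames(toy) %in% c("pclass", "embarked", "boat")

selected <- toy[, mask, drop = FALSE]    # keeps pclass, embarked, boat
dropped  <- toy[, !mask, drop = FALSE]   # keeps only name
```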

Selecting columns, instead of deleting them, might be a more efficient way to go if we have a lot of variables in our dataset and we want to create a new dataframe with only some of them. Another nice function for this is `select_if()` from the dplyr package, which accepts a logical vector and can therefore be combined with `%in%` when we want to select columns that have certain names.

In the final bonus section, we are going to see how we can negate the `%in%` operator. We are going to do this because there is no built-in “not in” operator in R.

## Bonus: Creating a not in operator in R

Here’s how we can create our own not in operator in R:

```r
# Creating a not in operator:
`%notin%` <- Negate(`%in%`)
```

Pretty simple. It is now possible to use this new R not in operator to check if e.g. a number is not in a vector:

```r
# Generating a sequence of numbers:
numbs <- rep(seq(3), 4)

# Using the not in operator:
4 %notin% numbs
# Output:  TRUE
```

As you can see in the example above, we can use the `%notin%` operator in the same way as we would use the `%in%` operator. Note that it is also possible to use both operators on lists. Finally, it is worth noting that there are some R packages that contain “not in” functions. For example, the package mefa4 has the `%notin%` function.
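Like `%in%`, the negated operator is vectorised over its left-hand side. The following sketch repeats the definition from above so that it runs on its own, and compares it with simply negating `%in%` (the names `res` and `res2` are illustrative):

```r
# Same definition as in the example above
`%notin%` <- Negate(`%in%`)

numbs <- rep(seq(3), 4)

# One TRUE/FALSE per element on the left-hand side
res <- c(1, 4, 2, 7) %notin% numbs      # FALSE TRUE FALSE TRUE

# Equivalent without defining a new operator
res2 <- !(c(1, 4, 2, 7) %in% numbs)
```

If a custom operator is not wanted, wrapping the `%in%` expression in `!( )` gives the same result.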

## Conclusion

In this R tutorial, you have learned 7 ways you can use the `%in%` operator in R. Specifically, you have learned how to compare vectors of numbers and letters (factors). You have also learned how to check if a value is in a column, how to add a new variable, how to subset data, and how to remove and select columns.
