The Unique Function in R: How to Use it in 4 Ways

In this post, we will explore how to utilize the unique function in R, a valuable tool for identifying and removing duplicate values across various data structures. We will start by examining the unique() function to learn how it works. Next, we will examine examples of how to apply it to vectors, matrices, and data frames, all common data structures in R. Furthermore, we will explore how to count the unique values in a dataset and compare two or more datasets for unique values.

Outline
What is the unique() function in R?
How to use the unique() function on vectors
How to use the unique() function on matrices
How to use R's unique() function on dataframes
- Subsetting using unique() and subset()
How to count the number of unique values in a data set using unique()

Summary
Resources

Outline

In this post, we will first explain what the unique() function in R is and how it works. Next, we will show you how to use the unique() function on vectors, which are one-dimensional data arrays. Then, we will demonstrate how to use the unique() function on matrices, which are two-dimensional data arrays. After that, we will illustrate how to use the unique() function on data frames, which are special types of data structures that can store different data types in each column. Furthermore, we will teach you how to count the unique values in a data set using the length() function. By the end of this post, you will have a solid understanding of R’s unique() function and how to apply it to different types of data structures.

What is the unique() function in R?

The unique function in R is a built-in function that returns a vector, matrix, or data frame with only the unique values from the original data. The syntax of the unique() function is as follows:

unique(x, incomparables = FALSE, fromLast = FALSE, nmax = NA)Code language: R (r)

The arguments of the unique() function are:

x: the data to be processed. It can be a vector, matrix, or data frame.

incomparables: a vector of values that are not to be compared. The default is FALSE, which means that all values are compared.
fromLast: a logical value that indicates whether to scan the data from the last element or the first element. The default is FALSE, meaning the data is scanned from the first element.
nmax: an integer that specifies the maximum number of unique values to be returned. The default is NA, which means that there is no limit.

The unique() function returns a vector, matrix, or data frame with the same attributes as the original data, but with only the unique values. The order of the values is preserved, unless the fromLast argument is set to TRUE. The unique() function also has a method for lists, which applies the function to each list element and returns a list of unique values.

How to use the unique() function on vectors

One of the simplest ways to use the unique() function in R is to apply it to a vector. A vector is a one-dimensional data array that can be numeric, character, logical, or complex in nature. For example, suppose we have a vector of numbers called x:

x <- c(1, 2, 3, 4, 5, 1, 2, 3, 4, 5)Code language: R (r)

Here is how the numeric vector, x, looks like:

We can see that this vector has 10 elements, but only five unique values. To get a vector with only the unique values, we can use the unique() function:

unique(x)

We can see that the unique() function returns a vector with only the unique values from x, in the same order as they appear in x. The length of the output vector is 5, which is the number of unique values in x. We can also use the length() function to check this:

We can also use the unique() function on character vectors, logical vectors, or complex vectors. For example, suppose we have a character vector of names called y:

y <- c("Erik", "Björn", "Sven", "Lars", "Anna", "Erik", "Björn", "Sven")Code language: R (r)

This vector has 8 elements, but only 5 unique values. To get a vector with only the unique values, we can use the unique() function:

unique(y)Code language: R (r)

The unique() function returns a character vector with only the unique values from y, in the same order as they appear in y. The length of the output vector is 5, which is the number of unique values in y.

finding the unique values in a vector in R

How to use the unique() function on matrices

Another way to use the unique() function in R is to apply it to a matrix. A matrix is a two-dimensional data array that can be numeric, character, logical, or complex. For example, suppose we have a matrix of numbers called z:

z <- matrix(c(1, 2, 3, 4, 5, 6, 1, 2, 3, 6, 5, 4), nrow = 3, ncol = 4)Code language: R (r)

We can see that this matrix has 12 elements, but only 6 unique values. To get a matrix with only the unique values, column-wise, we can use the unique() function and the MARGIN = 2 parameter:

unique(z, MARGIN = 2)

using the unique function on a matrix in R.

We can see that the unique() function returns a matrix with only the unique values from z, in the same order as in z. The dimensions of the output matrix are three rows and three columns, which is the number of unique values in z. We can also use the dim() function to check this:

We can also use the unique() function on character matrices, logical matrices, or complex matrices. For example, suppose we have a character matrix of names called w:

w <- matrix(c("Erik", "Björn", "Sven", "Lars", 
              "Anna", "Fredrik", "Erik", "Björn", 
              "Sven", "Fredrik", "Anna", "Lars"), nrow = 3, ncol = 4)Code language: R (r)

We can see that this matrix has 12 elements, but only six unique values. We can use the function to get a matrix with only the unique values. Here, with the t() function:

t(unique(t(w)))Code language: R (r)

In the code chunk above, t() is used to transpose a matrix in R. We convert the columns into rows and rows into columns. This is done because the unique() function operates on rows when applied to a matrix. Using t() will allow unique() to identify and remove any duplicated columns.

Then, we use t() again to transpose the matrix back to its original orientation. This second application of t() restores the original row-column structure, but now without the duplicated columns.

How to use R’s unique() function on dataframes

A third way to use the `unique()` function in R is to apply it to a data frame. A data frame is a special type of data structure that can store different data types in each column, such as numeric, character, logical, or factor. For example, suppose we have a data frame of students’ information called df:

df <- data.frame(name = c("Erik", "Björn", "Sven", "Lars", "Anna", "Erik", "Björn", "Sven"),
age = c(20, 21, 22, 23, 24, 20, 21, 22),
gender = c("M", "M", "M", "M", "F", "M", "M", "M"),
grade = c("A", "B", "C", "D", "E", "A", "B", "C"))Code language: R (r)

We can see that this dataframe has 8 rows and 4 columns, but only 5 unique rows. To get a dataframe with only the unique rows, we can use the unique() function:

unique(df)Code language: R (r)

using unique function in R on a dataframe to remove rows

We can see that the unique() function returns a data frame with only the unique rows from df, in the same order as they appear in df. The dimensions of the output data frame are 5 rows and 4 columns, which is the number of unique rows in df. We can also use the dim() function to check this:

Subsetting using unique() and subset()

The unique() function on dataframes compares the values in each column and returns only the rows with distinct values in all columns. If we want to compare the values in a specific column or a subset of columns, we can use the subset() function to select the columns we want to compare. For example, suppose we want to get the unique rows based only on the name column. We can use the subset() function to select the name column and then apply the unique() function:

unique(subset(df, select = name))Code language: R (r)

We can see that the unique() function returns a data frame with only the unique values in the name column, in the same order as they appear in df. The dimensions of the output data frame are 5 rows and 1 column, corresponding to the number of unique values in the name column. We can also use the dim() function to check this.

Moreover, we can use the subset() function to select more than one column to compare. For example, suppose we want to get the unique rows based on the name and gender columns. We can use the subset() function to select the name and gender columns and then apply the unique() function.

unique(subset(df, select = c(name, gender)))Code language: R (r)

We can see that the unique() function returns a dataframe with only the unique values in the name and gender columns, in the same order as they appear in df. The dimensions of the output dataframe are 5 rows and 2 columns, which is the number of unique values in the name and gender columns. Again, we can use the dim() function to check this.

How to count the number of unique values in a data set using unique()

Another valuable application of the unique() function in R is to count the number of unique values in a dataset. This can be done by applying the length() function to the output of the function, which returns the number of elements in a vector, matrix, or data frame. For example, suppose we have a dataframe called df (the same as above).

# Count the number of unique names
length(unique(df$name))
# Output: 5
# Count the number of unique ages
length(unique(df$age))
# Output: 5
# Count the number of unique genders
length(unique(df$gender))
# Output: 2Code language: PHP (php)

As you can see in the example above, the functions can provide a quick overview of the diversity of our dataset and help us identify any potential errors or outliers. For example, if we expected to have more than two genders in our dataset, we might want to check if there was any missing or incorrect data in the gender column. For more posts about counting:

R Count the Number of Occurrences in a Column using dplyr

Countif function in R with Base and dplyr

Summary

In this post, we have learned how to use the unique() function in R in five ways. The post focused on examples and code snippets for each case. First, we learned how the unique() function can return a vector, matrix, or dataframe with only the unique values from the original data. Here, we also learned how it can be used to count the number of unique values in a dataset. If you did, please share it with your friends and colleagues on social media, and leave your feedback and questions in the comments section below. I would love to hear from you and answer issues you may have.

Resources

Here are some more R tutorials you may find helpful:

Sum Across Columns in R – with dplyr & base Functions