This is my first post on R, and it describes how to reverse scores using R.
Reverse scoring in R
Many instruments (i.e., questionnaires) contain items that are phrased so that strong agreement indicates something negative (e.g., “When there is music in the room I find it hard to concentrate on reading”). These items need to be reverse scored so that a high score points in the same direction on every item before the data are used in statistical analysis.
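To make the idea concrete: on a scale that runs from a minimum to a maximum value, the reversed score is the maximum plus the minimum, minus the original score. The following is a minimal sketch, assuming a 1 to 5 response scale; the variable names are only illustrative.

```r
# Assumed 1-5 response scale; adjust these for your own instrument
scale_min <- 1
scale_max <- 5

# One response at each point of the scale
responses <- c(1, 2, 3, 4, 5)

# Reversed score = (maximum + minimum) - original score
reversed <- (scale_max + scale_min) - responses
print(reversed)
```

With this rule a 1 becomes a 5, a 2 becomes a 4, and the midpoint 3 stays where it is.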
For more information on reverse scoring, please see my earlier post: Reverse scoring in Python. Since I was more familiar with Python than with R, and I had no clue how to do this in SPSS, I wrote a Python script. The script used a function that operated on a Pandas DataFrame, and it reversed the scores nicely and quickly.
However, reverse scoring turned out to be pretty much as simple in R as in Python. In the following script, a data frame is created with six columns named 'Q1' to 'Q6' (stored in columnNames), and some data is generated using replicate and sample (100 responses to each of the 6 questions).
After that, you will find two methods that are pretty much the same. More specifically, the two ways to reverse code differ only in how the columns are selected. In one, the items are selected based on the column names (which might be preferable if you know the names of the columns but not their indices). In the other, the items are selected based on the index of the column.
Importing Data in R
In R, there are numerous ways to load data into a dataframe object. In the example below, we are creating a fake dataframe but, of course, most of the time we have the data stored on a computer hard drive.
- Read the blog post How to read an Excel (xlsx) file in R to learn about reading and writing .xlsx files in R.
If we collaborate with researchers who work with SPSS, we may want to read a .sav file in R. This can be done with the haven package.
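As a rough illustration of reading an SPSS file, here is a sketch that assumes the haven package is installed. To stay self-contained it first writes a small made-up data frame to a temporary .sav file; in practice you would pass the path of your own file to read_sav.

```r
# Assumes the haven package is installed: install.packages("haven")
if (requireNamespace("haven", quietly = TRUE)) {
  # Made-up example data, written to a temporary .sav file so that
  # there is something to read back in
  example <- data.frame(Q1 = c(1, 5, 3), Q2 = c(2, 4, 4))
  sav_path <- tempfile(fileext = ".sav")
  haven::write_sav(example, sav_path)

  # Read the SPSS file into R
  from_spss <- haven::read_sav(sav_path)
  print(from_spss)
}
```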
Sometimes, of course, we have a dataframe with a lot of variables and we may want to remove a column in R after we have imported the data.
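Removing a column can be done in base R without any extra packages. The sketch below uses a hypothetical data frame (survey) and shows two common ways of dropping a column by name.

```r
# Hypothetical data frame with one column we do not need
survey <- data.frame(id = 1:3, Q1 = c(1, 5, 3), notes = c("a", "b", "c"))

# Option 1: drop a column by assigning NULL to it
trimmed <- survey
trimmed$notes <- NULL

# Option 2: keep every column except the ones listed
trimmed2 <- survey[, !(names(survey) %in% c("notes")), drop = FALSE]

print(names(trimmed))
print(names(trimmed2))
```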
R Script for Reverse Scoring – Step by Step
How do you reverse scores in R? In the following section, the two steps for doing this are put forward.
# Generating data for the data frame: 100 responses to 6 questions, each scored 1 to 5
scores <- as.data.frame(replicate(6, replicate(100, sample(1:5, 1))))
columnNames <- c('Q1', 'Q2', 'Q3', 'Q4', 'Q5', 'Q6')
colnames(scores) <- columnNames
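Because the data are random, every run gives different scores. If you want the simulated responses to be repeatable, one option is to set a seed first. The sketch below is an alternative way of generating the same kind of data; the seed value and the names n_respondents, column_names, and scores_sim are arbitrary choices for this example.

```r
set.seed(123)  # arbitrary seed so that the simulated responses are repeatable

n_respondents <- 100
column_names <- paste0("Q", 1:6)

# Draw all responses at once and shape them into 100 rows x 6 columns
scores_sim <- as.data.frame(
  matrix(sample(1:5, n_respondents * length(column_names), replace = TRUE),
         nrow = n_respondents)
)
colnames(scores_sim) <- column_names

# Look at the first three respondents
print(head(scores_sim, 3))
```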
1. Create a Character Vector
First, create a vector in R containing the names of the columns that are to be reversed:
# Names of the columns (questions) whose scores are to be reversed
columnsToReverse <- c('Q2', 'Q3', 'Q4')
2. Reverse the Scores Using the Character Vector
In the second step, we use the column names (i.e., the character vector) to do the actual reversing of the scores. Because the responses range from 1 to 5, subtracting each score from 6 (the maximum plus the minimum) turns a 1 into a 5, a 2 into a 4, and so on.
# Start from a copy so that the original scores are kept
reversed.scores <- scores

# Reversing scores in columns 'Q2', 'Q3', and 'Q4'
reversed.scores[, columnsToReverse] <- 6 - scores[, columnsToReverse]

# Alternative if we want to use the column indices:
reversed.scores <- scores
reversed.scores[, c(2, 3, 4)] <- 6 - scores[, c(2, 3, 4)]
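If you reverse items in several data sets, it may be convenient to wrap the two steps in a small function. The following is only a sketch: reverse_items is a hypothetical helper, not part of any package, and it assumes that the columns are selected by name and that the scale endpoints (1 and 5 by default) are known.

```r
# Hypothetical helper; scale_min and scale_max describe the response scale
reverse_items <- function(data, columns, scale_min = 1, scale_max = 5) {
  # Stop early if a requested column name does not exist
  missing_cols <- setdiff(columns, names(data))
  if (length(missing_cols) > 0) {
    stop("Columns not found: ", paste(missing_cols, collapse = ", "))
  }
  data[, columns] <- (scale_max + scale_min) - data[, columns]
  data
}

example <- data.frame(Q1 = c(1, 2, 5), Q2 = c(5, 4, 1), Q3 = c(3, 3, 2))
reversed_example <- reverse_items(example, c("Q2", "Q3"))
print(reversed_example)

# Reversing the same columns twice gives back the original data
stopifnot(all(reverse_items(reversed_example, c("Q2", "Q3")) == example))
```

Returning a modified copy, rather than changing the data in place, keeps the original scores available for checking.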
Reverse Code in R using the Psych Package
In this section of the reverse coding tutorial, you will learn how to use the psych package to reverse the items.
Working with the reverse.code Function
First, you need to install the R package called “psych”; then you can use its reverse.code function to reverse the coding of selected items.
Reverse-Code Variables in R using reverse.code
Finally, you can use the psych package to reverse the items. First, you load the package, then you create a numeric vector (keys), and then you call the reverse.code function. The keys vector has one entry per column: -1 marks a column to reverse and 1 marks a column to leave unchanged, so here the 2nd, 3rd, and 4th columns are reversed.
library(psych)

# Reversing scores in columns 'Q2', 'Q3', and 'Q4' (-1 = reverse, 1 = keep)
keys <- c(1, -1, -1, -1, 1, 1)
new <- reverse.code(keys, scores)

# Compare the first three rows before and after reversing
scores[1:3, ]
new[1:3, ]
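Typing the keys by hand gets error-prone with many items, so one option is to build the vector from the column names. The sketch below does that with base R and then, if psych is installed, passes it to reverse.code. When mini and maxi are not given, reverse.code works out the minimum and maximum from the observed data, so the sketch states the 1 to 5 range explicitly. The names items, to_reverse, keys_by_name, and recoded are made up for this example.

```r
set.seed(1)  # arbitrary seed for the simulated responses

# Small simulated data set: 10 respondents, 6 items scored 1 to 5
items <- as.data.frame(matrix(sample(1:5, 60, replace = TRUE), nrow = 10))
colnames(items) <- paste0("Q", 1:6)
to_reverse <- c("Q2", "Q3", "Q4")

# -1 marks a column to reverse, 1 leaves it unchanged
keys_by_name <- ifelse(colnames(items) %in% to_reverse, -1, 1)
print(keys_by_name)

# Assumes psych is installed; mini and maxi give the scale range per item
if (requireNamespace("psych", quietly = TRUE)) {
  recoded <- psych::reverse.code(keys_by_name, items,
                                 mini = rep(1, ncol(items)),
                                 maxi = rep(5, ncol(items)))
  print(recoded[1:3, ])
}
```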
What to Do After Reverse Scoring
Now you have reversed your questionnaire data using R. The next step might be to clean the data. This may, in some cases, include removing unwanted data (see the blog post how to remove a column in R for more information). After the variables that you don’t need are removed, it may be time to carry out some descriptive statistics and create some scatter plots to examine the relationships between some of your variables.
R and Reproducible Code
Although the code in this post can be found in this R Jupyter Notebook, this may not be the best way to share an R script that includes analysis and visualization. This is because R and its packages get updated over time, a function may sometimes change its name, and so on. Thus, it may take time for another researcher to fully reproduce the R computational environment that we used in a study.
Luckily, there are tools such as Binder and Code Ocean to help us with this. See the tutorial on how to use Binder, R, and RStudio for reproducible research. That way, other researchers can run the R script, such as how we reversed the scores here, and (hopefully) get the exact same results.
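A lightweight complement to such tools is to record which versions of R and the attached packages were used, for example at the end of a script. A minimal sketch using base R only (the variable name info is arbitrary):

```r
# Print the R version used for the analysis
print(R.version.string)

# Collect and print details on the platform and the attached packages
info <- sessionInfo()
print(info)
```

This does not recreate the environment for anyone, but it does tell other researchers what they would need to install.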