R – Combining Data from Two Variables

rcragun

2 years ago

(NOTE: This was done in R 4.3.1 using RStudio 2023.03.0.)

Here’s the scenario: I have a variable (V1) from Country A with roughly 1,000 responses that measures religious affiliation with the following options:

Christian
Buddhist
Hindu
Muslim
Jewish
Other religion
No religion

To be clear, there are no responses to V1 from the participants in the other country (Country B), only from individuals from Country A. For the participants from Country B, the value in V1 is “NA.”

I have a second variable in the same dataset (V2) that measures religious affiliation in a different country, Country B (again, roughly 1,000 responses), with the following options:

Catholic
Protestant Christian
Spiritist
Indigenous Religion
Jewish
Other religion
No religion

What I want to do is combine the two variables. I, of course, need to keep all the information in both variables to the extent possible, though I have to do some recoding so I have universal categories for the two variables. I can collapse options 1 and 2 for V2 and put options 3 and 4 into the “other religion” category. That will give me the same categories for the two variables. First, then, I need to do the recode, for which I’ll use the “car” package:

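library(car)
# Collapse V2 into V1's coding: Catholic and Protestant -> 1 (Christian),
# Spiritist and Indigenous Religion -> 6 (Other religion); 5, 6, and 7 keep their codes.
# Store the result in DATASET so the ifelse() step can find it as DATASET$V2X.
DATASET$V2X <- car::recode(DATASET$V2, '1:2=1; 3:4=6; 5=5; 6=6; 7=7; else=NA')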

This gives me a new variable, V2X, which is a recoded version of V2, with options that align with those in V1.
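If the “car” package isn't available, the same mapping can be sketched in base R with a lookup vector. This is a swapped-in alternative, not what I ran above. It assumes V2 is stored as the numeric codes 1 through 7, and the toy DATASET and the name "lookup" are made up for illustration:

```r
# Toy data (hypothetical): V2 holds Country B's codes 1-7; NA marks Country A rows.
DATASET <- data.frame(V2 = c(1, 2, 3, 4, 5, 6, 7, NA))

# Position i of the lookup vector is the V1-style code for V2 code i:
# 1 Catholic -> 1, 2 Protestant -> 1, 3 Spiritist -> 6, 4 Indigenous -> 6,
# 5 Jewish -> 5, 6 Other religion -> 6, 7 No religion -> 7
lookup <- c(1, 1, 6, 6, 5, 6, 7)

# Indexing by V2 performs the recode; an NA in V2 stays NA in V2X.
DATASET$V2X <- lookup[DATASET$V2]

# Cross-tabulate old against new codes to check the mapping by eye.
print(table(DATASET$V2, DATASET$V2X, useNA = "ifany"))
```

Whichever way the recode is done, a quick cross-tab of the old variable against the new one is a cheap way to confirm that every category landed where it should.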

Now, the tricky part. How do I add the newly recoded values in V2X into V1? Actually, the V2X values should be combined with those from V1 into a new variable (e.g., V1X) so we retain the values in V1 in case we need them for something else. Enter an “ifelse” function. Here’s the function I used:

DATASET$V1X <- ifelse(!is.na(DATASET$V2X), DATASET$V2X, DATASET$V1)

Here’s what that command does. First, it is going to create a new variable in the DATASET, V1X. The values for that variable are contingent upon the next part of the command. The “ifelse” function tells R to do something if a condition is met and, if it is not, then do something else. The structure is:

ifelse("test condition", "do this if condition is met", "otherwise do this")
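To see those three pieces in action before applying them to the religion variables, here is a minimal sketch on a toy vector. The "scores" values and the pass/fail labels are made up for illustration and have nothing to do with my dataset:

```r
# Toy vector of scores (hypothetical values).
scores <- c(45, 80, 62, 91)

# test: scores >= 60; if met: "pass"; otherwise: "fail".
# ifelse() checks the condition for every element and returns
# a vector of the same length as the test.
result <- ifelse(scores >= 60, "pass", "fail")
print(result)
```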

So, the first part of my code is the test of the condition. “!is.na” is how you tell R “if the value is not NA.” So, the test condition code is telling R, “if the value in V2X is not NA (or missing), then do…”

What is it going to do? That’s the second part. In this case, it is going to copy the value in V2X into the newly created V1X.

The last part of that code tells R what to do if the condition is not met (the “else” portion of the “ifelse” function). In this case, I want R to simply copy the values from V1 into V1X.

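This line of code works through every row of the dataset (ifelse is vectorized, so no explicit loop is needed) and checks first whether there is a value in V2X. If there is, it copies that value into V1X. If there is not a value in V2X, it then copies the value from V1 into V1X. Effectively, this combines the values from two variables into a single variable, retaining the information in both variables. Note that if a row somehow had values in both V1 and V2X, the V2X value would win; in my data that can't happen because each participant answered only one of the two questions.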
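Here is a small sketch of that combining step on toy data, so the result can be checked by eye. The six rows below are made up for illustration, and the sketch assumes both variables are stored as plain numeric codes:

```r
# Toy data (hypothetical): the first three rows are Country A (V1 filled in),
# the last three are Country B (recoded V2X filled in).
DATASET <- data.frame(
  V1  = c(1, 4, 7, NA, NA, NA),
  V2X = c(NA, NA, NA, 1, 6, 5)
)

# Take V2X where it is present; otherwise fall back to V1.
DATASET$V1X <- ifelse(!is.na(DATASET$V2X), DATASET$V2X, DATASET$V1)
print(DATASET)

# Sanity check: V1X should hold as many non-missing values as V1 and V2X together.
# This only holds when no row has a value in both variables.
stopifnot(
  sum(!is.na(DATASET$V1X)) ==
    sum(!is.na(DATASET$V1)) + sum(!is.na(DATASET$V2X))
)
```

One caveat worth hedging: ifelse drops attributes, so if V1 is stored as a factor rather than as numbers, V1X will contain the underlying integer codes instead of the labels. In that case it is probably safest to convert to numeric codes (or to character) before combining and to re-apply the labels afterward.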

I have run into this issue before but struggle to solve it every time I have to deal with it, so I figured I’d create a post about it.