I ran into a situation where I needed to add a variable to a dataset. I knew that I was then going to modify some of the values in the variable, but most of the values were going to be zeros. So, I wanted to create a new variable and fill it with all zeros.
As with most of my R examples, I’m going to use the 2010 wave of the General Social Survey (R version here) to illustrate. You can open that file in R and follow along.
Here’s the code I used to create the variable:
GSS2010$TEMPANALYSIS <- replicate(2044, 0)
Here is what the code above does…
GSS2010 is the name of the dataset in which I wanted to create the variable. In this case, it is a copy of the 2010 wave of the GSS.
TEMPANALYSIS is what I called the variable. (The “$” tells R that it is a variable in the dataset.)
The “replicate” function tells R to replicate the second value in the parentheses (0) the number of times given as the first value in the parentheses (2044). I used 2,044 because that is how many cases there are in the dataset. Adjust the value to match the number of cases in your dataset/dataframe: if you have 320 cases, use 320. If you would rather not hard-code the number, nrow(GSS2010) returns the number of cases and can be used in its place.
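To illustrate avoiding the hard-coded count, here is a minimal sketch. It assumes a small made-up data frame called demo (a hypothetical stand-in for GSS2010, with invented ID and AGE columns) so that it runs without the GSS file.

```r
# Hypothetical stand-in for GSS2010: a small data frame with 5 cases
demo <- data.frame(ID = 1:5, AGE = c(23, 41, 35, 67, 52))

# nrow() returns the number of cases, so nothing is hard-coded
demo$TEMPANALYSIS <- replicate(nrow(demo), 0)

# Show the new variable: one zero per case
print(demo$TEMPANALYSIS)
```

With the real dataset, the same idea would be replicate(nrow(GSS2010), 0).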
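If the number you supply doesn’t match the number of cases (and doesn’t divide evenly into it, in which case R recycles the values to fill the rows), you’ll get an error like this: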
Error in `$<-.data.frame`(`*tmp*`, TEMPANALYSIS, value = c(0, 0, 0, 0, : replacement has 2042 rows, data has 2044
That error is saying that R needs a value for every one of the 2,044 rows in the dataset, but the replacement only supplied 2,042. Since it is 2 rows short, R can’t add the variable.
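Here is a small sketch of that behavior, again using a hypothetical five-row data frame called demo rather than the GSS file. The variable names TOO_SHORT and ALL_ZERO are made up for the illustration, and tryCatch is only there so the error message can be captured instead of stopping the script.

```r
# Hypothetical five-row data frame standing in for the real dataset
demo <- data.frame(ID = 1:5)

# Three values for five rows: R refuses and reports the row mismatch
result <- tryCatch({
  demo$TOO_SHORT <- rep(0, times = 3)
  "no error"
}, error = function(e) conditionMessage(e))
print(result)

# A single value divides evenly into any number of rows,
# so R recycles it to fill the whole variable
demo$ALL_ZERO <- 0
print(demo$ALL_ZERO)
```

The second assignment shows why a length of 1 is a special case: it is recycled to every row, so it never triggers the error.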
Of course, with R, there is always another way to do something. Here’s an alternative command that will do the same thing:
GSS2010$TEMPANALYSIS2 <- rep(0, times=2044)
I won’t repeat the description of the dataset and variable but will detail what the rest of the code is doing.
“rep” tells R to repeat the first value in the parentheses (0) the number of times specified as the second value in the parentheses (2044). Technically, the “times=” portion is not required.
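As a rough sketch of the whole workflow, the example below uses rep without “times=” and then changes a few of the zeros, which was the reason for creating the variable in the first place. The demo data frame, its AGE column, and the age cutoff of 50 are all invented for the illustration.

```r
# Hypothetical stand-in for GSS2010 with an invented AGE column
demo <- data.frame(ID = 1:5, AGE = c(23, 41, 35, 67, 52))

# Without "times=", the second argument is still treated as the count
demo$TEMPANALYSIS2 <- rep(0, nrow(demo))

# Then change only the rows of interest (here: cases aged 50 or older)
demo$TEMPANALYSIS2[demo$AGE >= 50] <- 1

print(demo$TEMPANALYSIS2)
```

Most of the values stay at zero, and only the cases that meet the condition are overwritten.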
Here’s a script file for these commands.