Converting Survey Variables with SPSS Syntax

With its emphasis on analyzing statistical data, SPSS is a powerful tool for manipulating survey data.

It is, however, an intricate process that demands a high level of expertise. In this article Joe Glines, an SPSS pro with years of experience in data analysis, offers his quick tricks on how to manipulate the SPSS data we receive from SurveyGizmo

String Variables to Numeric

While SurveyGizmo allows for changing of variable format inside the tool, occasionally I end up with string variables that really want to be numeric. As long as the values are numbers with spaces on either side, this is very easy to accomplish.

Let’s say you wish to convert the variable named “String_Zip” to a numeric variable entitled “Numeric_zip.” The following is the SPSS syntax that would create the new variable:

RECODE String_Zip (Convert) INTO Numeric_zip.
EXE.

This gets us what we want however, as can be seen in the screenshot below, the new “Numeric_Zip” variable is currently set as a “Scale” variable.

spss variables example 1

To help remind yourself not to take the “average” of zip code you might want to change the variable level to Nominal. This can be achieved with the following line of syntax:

Variable Level Numeric_Zip (Nominal).

Now you’ll see the Numeric_Zip variable is set to be Nominal:

spss variables example 2

As a reference, below are the other variable levels supported in SPSS

  • Variable Level Numeric_Zip (Ordinal)
  • Variable Level Numeric_Zip (Scale)

Numeric to String

If you need to convert a numeric value to a string there is one additional step: you must first declare the string variable. The following creates a string variable named String_Zip2 with a width of five characters:

STRING String_Zip2(A5)

Now that you’ve created the variable, you can convert the numeric value to its string counterpart.

COMPUTE String_zip2=STRING(Numeric_zip,N5).
EXE.

The below screenshot shows an example of what the three variables could look like.

spss variables example 3

A couple of things to point out from above, when you convert a string value to a numeric, for obvious reasons any leading zeros will no longer be kept.

In ninety-nine percent of cases this is what we want; however, the North East has zip codes that begin with zeros. Depending on what we plan to do with the data, this situation may be irrelevant. But if you’re going to join your data with external sources, you should first examine the zip code there and see what format it is in.

In the example I showed above where we converted the numeric zip code back to a string, we padded the left side of the number with up to five zeros. Also note, because the last value was more than five characters, the conversion did not work. You’ll want to trim the original values first if you have more than five characters.

The SurveyGizmo Developer Toolkit

Supercharge your surveys by unpacking our developer toolkit!

Get the Free Ebook

Dropping Unused Variables

Want to drop the old numeric variable from the data set? Simply use the match files command with the drop statement.

Match files file=* / DROP Numeric_Zip .
EXE.

Parsing Multiple Values Out to Their Own Columns

Sometimes we get data in “interesting” formats. Occasionally we’ll get data which has multiple responses within one column or field. We really want to have each value in its own column and remove all delimiters. Below is an example:

spss example 4

I used to run into this problem very frequently. Thankfully it isn’t nearly as common now, but it does still occur. I wrote the below macro which will parse out up to 25 columns. Explaining how to write a macro is beyond the scope of this post, but that’s okay because using macros is very easy.

*///////////////.
DEFINE !Parse (Var !TOKENS (1) / Stem !TOKENS (1) /Del !TOKENS (1))
STRING #(A1000).
VECTOR !Stem(25A500).
COMPUTE #=CONCAT(RTRIM(!Var),!Del).
COMPUTE #cnt=1.
LOOP IF INDEX(#,!Del)>0.
COMPUTE !Stem(#cnt)=SUBSTR(#,1,INDEX(#,!Del)-1).
COMPUTE #cnt=#cnt + 1.
COMPUTE #=SUBSTR(#,INDEX(#,!Del)+1).
Var width !Concat(!Stem,1) to !Concat(!Stem,25) (10).
END LOOP.
EXECUTE.
!ENDDEFINE.
*///////////////.

The first thing you’ll need to do is to define the macro. Everything between the two *///////////////. lines is the actual macro definition.
After running this you will not get any output as you have yet to call the macro.

To do so, you’ll need to pass parameters to help instruct SPSS on how what to use or create. Here are the three parameters this macro takes:

  1. Var=Zip_Codes: tells it what variable is the haystack it should search.
  2. Stem=zips: tells the macro what to begin the name of each new variable.
  3. Del=”;”: tells SPSS what to use as the delimiter.

To call the macro, you do so as follows:

!Parse Var=Zip_Codes Stem=zips Del=";".

This will generate the below variables in the data set:

spss variables example 5

If you have a different delimiter, you just change what you put between quotes in the third parameter. If this is confusing to you, you can watch a video demonstrating its usage.

Concatenating Strings

Occasionally we only want to merge (concatenate or append) two string values together. For example, let’s say you had first and last names in two variables, but you wanted them in one.

spss variable example 6

The below syntax would create new variable entitled First_Last however it would put them together without a space:

string First_Last (A30).
COMPUTE First_Last=concat(First,Last).
exe.

The following code would be a bit better as it inserts a space in between the first and last name.

string First_Last (A30).
COMPUTE First_Last=concat(First," ",Last).
exe.

spss variable example 7

Hint: if you needed to parse out the First_Last name you could re-use our above macro and parse out the First_Last variable by changing the delimiter to a space.

Replace or Remove Characters

If you want to replace or remove specific character(s) from a string value, SPSS has a replace command which is easy to use. The following line would replace “yes” with “y.” If you wanted to remove “yes” you would simply leave out the y between the second double quotes.

compute string_Var2= replace(string_Var,'yes','y').
exe.

Hint: Earlier when we converted strings to numbers, you might want to use the following to remove the comma delimiter on numeric values over 999.

compute string_Var= replace(string_Var,",","").

Automatically Recoding Strings

Did you know SPSS has a built in command to recode your string variables and generate the corresponding labels? The AutoRecode command can be extremely helpful when you have a data set that is standardized, i.e. string values are consistent across cells.

Let’s say you have a data set which has City as a string variable:

spss example 8

We could write a recode command to recode each city name to a number, and then assign the city name as a label. If our list of cities is large, this would take a considerable amount of time. Instead of manually doing this we can use AutoRecode to take care of the entire process.

The below syntax will tell SPSS to examine the City variable then recode it to a numeric variable entitled City_CD and show the syntax generated to do so (in case you wish to repeat it later).

AUTORECODE VARIABLES=City /INTO City_CD /PRINT.

Here is the output generated which helps you see how the cities were mapped:

spss variable example 9

Hint: notice the 1 with nothing visible? This indicates there were blank values. You might want to define these as missing in your data set.

When looking at the data set, you can see how the coding worked. Below are two screen shots side by side. The one on the left is with Value labels visible (notice how the City_CD variable is right justified; this is an indication it is a numeric value with labels). The screen shot on the right shows the data value labels disabled.

spss variable example 10

Why would you want to go through the trouble of recoding your string variables to numeric and applying labels? Here are five reasons to go through the trouble.

  1. Some analysis is only available if your independent variable is numeric.
  2. The size of your data set and the processing power needed to run analysis is greatly reduced when dealing with numeric variables over strings. In survey research this typically isn’t an issue however, as “Big Data” becomes the norm, it can be a lifesaver!
  3. Labels are much easier to control how they look/display in output. It’s much easier to control how labels look and display in output.
  4. You have full control to change the order of how your output is generated, meaning it doesn’t have to be alphabetical. Granted the AutoRecode tool cannot be used for this, but it still might be worth writing it by hand if it means you’ll be able to get the output in the preferred order once instead of adjusting it over and over in your output.
  5. If you’re going to pass your data to another tool, some work much better with numeric variables than strings yet they do not have an “auto recode” feature.

Do have you have burning questions about SPSS? Give us a shout in the comments and Joe might answer them in a future article!

Bio:
Joe Glines is the co-founder of the-Automator, a small company that specializes in automating reporting and daily tasks. He is an expert at SPSS as well as Market Research, and will be bringing his expertise to the SurveyGizmo blog on a regular basis.

Join the Conversation