There may be times when you receive a file in which many or all of the variables are defined as strings, that is, character variables.
The variables may contain numeric values, but if they are defined as type string, there is very little you can do to analyze the data: you cannot get means, run a regression, or fit an ANOVA. We will first address the case where the dataset contains numeric values that are stored as strings. Then we will address the case where the string variables actually contain strings, and the goal is to assign a numeric value to each distinct value the string takes on.
The example dataset, hsbs, is a subset of the High School and Beyond data file with all of the variables as string variables. As you see from the describe command below, the variables are all defined as string variables. Now that we know the variables are string variables, we can use the list command to see what the strings stored in these variables look like.
Although the variable science is defined as str2, you can see from the list below that it contains just numeric values. Even so, because the variable is defined as str2, Stata cannot perform any kind of numerical analysis of the variable science.
The same is true for the variable read. One method of converting numbers stored as strings into numeric variables is the string function real(), which translates a number stored as a string into a numeric value that Stata can recognize as such.
The first line of syntax reads in the dataset shown above. The function real(s) translates the values held as strings, where s is the variable containing strings.
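As a minimal sketch of the real() approach, the commands below build a tiny stand-in dataset rather than loading hsbs. The three rows and the new variable names science2 and read2 are made up for illustration; only the variable names science and read come from the text above.

```stata
* Hypothetical stand-in for hsbs: a few made-up rows, stored as strings
clear
input str2 science str2 read
"47" "57"
"63" "68"
"58" "44"
end

* real() parses each string and returns the corresponding number
generate science2 = real(science)
generate read2 = real(read)

* science and read are still str2; science2 and read2 are numeric
describe
summarize science2 read2
```

Because generate creates new variables, the original strings stay in the dataset and can be compared against the converted values.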
A second method of achieving the same result is the destring command. The first line of syntax loads the dataset again, so that we start over with a dataset containing only string variables. The second line of syntax runs the destring command. As you can see from the describe command below, the destring command converted all of the variables to numeric, except for race, gender and schtyp.
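The destring step can be sketched the same way. The data below are again a made-up stand-in for hsbs, with an X placed in race so that it behaves like the variable described in this FAQ; the replace option overwrites each string variable in place.

```stata
* Hypothetical stand-in for hsbs: made-up rows, all variables as strings
clear
input str2 science str2 read str1 race str6 gender str7 schtyp
"47" "57" "4" "male" "public"
"63" "68" "X" "female" "public"
"58" "44" "3" "male" "private"
end

* With no variable list, destring tries every variable in the dataset
destring, replace

* science and read are now numeric; race, gender and schtyp are unchanged
describe
```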
Since these variables had characters in them, the destring command left them alone. If there had been any numeric variables in the dataset, they would remain unchanged. Both of the techniques described above have attributes that are advantages in some situations and disadvantages in others. For example, real() returns a missing value for any string it cannot read as a number, whereas destring by default refuses to convert a variable that contains any non-numeric characters. To some extent destring can be made to behave similarly, but not identically.
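The difference can be seen on a single variable. The sketch below uses made-up values for race and assumes that destring's force option is one way to make it behave like real(); the names race_real and race_force are hypothetical.

```stata
* Made-up values: one observation holds the non-numeric string "X"
clear
input str1 race
"4"
"X"
"3"
end

* real() converts what it can and returns missing (.) for "X"
generate race_real = real(race)

* destring with force also converts "X" to missing instead of refusing
destring race, generate(race_force) force

list
```

Both approaches silently turn unreadable values into missing values, so it is worth listing or tabulating the original strings first to check that nothing meaningful is being discarded.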
To convert a string variable that contains non-numeric values with destring, you must list the characters that should be ignored. How do we convert gender and schtyp into numeric values? We can use the encode command as shown below. These commands create gender2 and schtyp2. Notice in the describe command below that gender2 and schtyp2 are numeric variables with value labels attached, and that the value labels are also named gender2 and schtyp2. If we list the data, gender2 and schtyp2 appear identical to gender and schtyp; however, they are really numeric, and what you are seeing are the value labels associated with the variables.
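A minimal sketch of the encode step, using made-up rows in place of hsbs. The variable names gender, schtyp, gender2 and schtyp2 follow the text; the data values are assumptions chosen for illustration.

```stata
* Made-up rows standing in for the string variables in hsbs
clear
input str6 gender str7 schtyp
"male" "public"
"female" "public"
"male" "private"
end

* encode numbers the distinct strings in alphabetical order and stores
* the original text as value labels on the new numeric variable
encode gender, generate(gender2)
encode schtyp, generate(schtyp2)

* describe shows the new variables as numeric with value labels attached
describe

* list displays the labels, so the new variables look like the old ones
list
```

With these rows, female is coded 1 and male 2, and private is coded 1 and public 2, because encode assigns codes in alphabetical order of the strings.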
Below we use the nolabel option and you see that gender2 and schtyp2 are really numeric. What about the variable race? It is still a character variable because our prior destring command saw the X in the data and did not attempt to convert a variable with non-numeric values. Below we convert it to numeric by including the ignore(X) option, which tells destring to strip the X characters before converting; a value that consists only of X is then empty and becomes a missing value.
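Both steps can be sketched on a small made-up dataset. The rows are hypothetical; the X in race mimics the value described above.

```stata
* Made-up rows: race holds numbers stored as strings plus one "X"
clear
input str1 race str6 gender
"4" "male"
"X" "female"
"3" "male"
end

* nolabel shows the numeric codes behind the value labels
encode gender, generate(gender2)
list gender gender2, nolabel

* ignore("X") removes every X before conversion; a value that was only
* "X" becomes missing
destring race, replace ignore("X")
list race
```

Note that ignore() removes the listed characters wherever they occur, so a value such as "1X" would be converted to 1 rather than to missing.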
You can see the results in the list command below. As you have seen, we can use destring to convert string variables that contain numbers into numeric variables, and it can handle situations where some values are stored as a character like the X we saw with race.
If you have a character variable whose values are all text, you can use encode to convert it to a numeric variable; encode creates value labels that hold the text originally stored in the character variable. For more information, see the help or reference manual entries for the destring and encode commands.