OCAIR

Overseas Chinese Association for Institutional Research
An AIR Affiliate That Supports IR Professionals Since 1996

How to create ONE variable "ethnicity" which contains 8 categories?


Question from Xiaomei Feng:


I have a multiple response question - say, race. Each category is an individual question. Thus, I have 8 items:

race1 (black) (0=no, 1=yes)
race2 (hispanic) (0=no, 1=yes)
race3 (Asian) (0=no, 1=yes)
.. ..
race8 (white) (0=no, 1=yes)

Students can select more than one categories as long as they think it is appropriate. How to create ONE variable "ethnicity" with the 8 categories?


SPSS Solution:

Shuqin:

DO IF (race1=1) .
RECODE
race1 (1=1) INTO ethnicity.
END IF .
EXECUTE .

DO IF (race2=1) .
RECODE
race2 (1=2) INTO ethnicity .
END IF .
EXECUTE .


Brian:
You may use "recode" function in the Transform menu. Create a new variable: Ethnicity. Then recode each individual variable into "Ethnicity". Set up the numerical codes for each race, e.g. Black=1, Asian=2.... decline to state =9 etc. Change old value in each race into the correspondence number,for example, Asian: old value = 1, new value =2. Once you have recode all individual variables into Ethnicity you will have each race represented by different numbers. But I don't know how to solve the problem if someone chose more than one categories. You may need to do a loop programming to work out a new category called mixed race. Anyone is good at SPSS programming may give a try. Good luck.


Jiali:
1) use the SPSS "Compute" to create a new variable:
race = race1*1 + race2*2 + race3*4 + race4*8 + race5*16 + race6*36 + race7*64 + race8*128

2)then recode the variable so that respondents will be in one category only:
1 = race1
2 = race2
4 = race3
8 = race4
16 = race5
36 = race6
64 = race7
128 = race8


Lillian's explanation for Jiali: 
Here is my understanding of Jiali's solution below. It's a very smart way of handling this problem and can be adapted to the situation as if the answer to each categories of a question is more than 2.

In XiaoMei's case, we have 8 type of race categories. For each race type, the answer is yes or no, two choices. The theoretic base of the solution is the binary system. The key is the uniqueness of each choice based on the exponent of 2. In this case, we make each race category into the format of exponent of 2. first, then sum up the chosen race type:

Race 1 = 2**0 = 1 (response to race1 as 'yes'will be coded as 1)
Race 2 = 2**1 = 2 ...
Race 3 = 2**2 = 4
Race 4 = 2**3 = 8
Race 5 = 2**4 = 16
Race 6 = 2**5 = 32
Race 7 = 2**6 = 64 ...
Race 8 = 2**7 = 128 (response to race8 as 'yes' will be coded as 128)

Anyone's race code will be the sum of each race s/he responded as 'yes'. For a student chose race2, race5, race7 as'yes', his/her race code will be (2**1)+(2**4)+(2**6) = 2+16+64 = 82. With 8 categories, we can be sure that nobody's race code exceeds 255 even if this person said 'yes' to all 8 categories as an extreme case (i.e. 1+2+4+...+128=255). On the other hand, if we know someone's race code is 148, we can figure out the race(s) s/he chose as race3, race5, and race8: 148 = (2**7)+(2**4) +(2**2) = 128+16+4.

The above example is just the explanation of how the solution works. Using formula on spreadsheet (MS Excel), you can get calculation done easily. If running data on SAS, there has to be another set of codes.


Chun-Mei:
You can first create a "count" variable to see how many race/ethnicities selected, and if it's larger than one, then it's grouped into a multiple race category. The code is below.

* Create COUNT of race/ethnicities selected.
VECTOR lip= latin to reothr1.
NUMERIC numrace(F8.0).
COMPUTE numrace=0.
LOOP #i = 1 to 6.
IF lip(#i)=1 numrace=numrace+1.
END LOOP.
EXE.
variable labels numrace "Count of race/ethnicities selected".
exe.
FREQUENCIES
VARIABLES= numrace
/ORDER ANALYSIS.

* MAKE MULTIPLE RACE VARIABLE.
* MAKE MISSING RESPONSES THAT SELECTED NO RACES OR ALL RACES.
NUMERIC multre(F8.0).
exe.
if (numrace=0) multre=999.
exe.
FREQUENCIES
VARIABLES= multre
/ORDER ANALYSIS.
if (numrace=1) multre=1.
exe.
FREQUENCIES
VARIABLES= multre
/ORDER ANALYSIS.
if (numrace>1) multre=2.
exe.
FREQUENCIES
VARIABLES= multre
/ORDER ANALYSIS.
***If a student checked all the race categories, make it missing.
if (numrace=6) multre=999.
exe.
FREQUENCIES
VARIABLES= multre
/ORDER ANALYSIS.

recode multre (999=sysmis) (1=1) (2=2).
exe.
variable labels multre "Multiple racial or ethnic identifications".
value labels multre
1 "One racial or ethnic identification checked"
2 "More than one racial or ethnic identification checked".
FREQUENCIES
VARIABLES= multre
/ORDER ANALYSIS.


SAS Solution:

Beiling: 
DATA DAT1;
INPUT ID $ RACE1 RACE2 RACE3 RACE4 RACE5 RACE6 RACE7 RACE8;
SUMRACE = RACE1+RACE2+RACE3+RACE4+RACE5+RACE6+RACE7+RACE8;
IF RACE1 = 1 THEN RACE = '1.BLANK';
IF RACE2 = 1 THEN RACE = '2.HISPA';
IF RACE3 = 1 THEN RACE = '3.AMIND';
IF RACE4 = 1 THEN RACE = '4.ASIAN';
IF RACE5 = 1 THEN RACE = '5.WHITE';
IF RACE6 = 1 THEN RACE = '6.RACEa';
IF RACE7 = 1 THEN RACE = '7.RACEb';
IF RACE8 = 1 THEN RACE = '8.RACEc';
IF SUMRACE > 1 THEN RACE = '9.OTHER';
DATALINES;
A 1 0 0 0 0 0 0 0
B 0 1 0 0 0 0 1 0
C 0 0 0 0 1 0 0 0
D 1 0 0 0 0 0 0 0
E 0 0 0 0 1 0 0 0
F 0 1 0 0 0 0 0 0
G 0 0 1 0 0 0 0 0
H 0 0 0 0 0 0 1 0
I 1 0 0 0 1 0 0 1
J 0 0 0 1 0 0 0 0
K 0 0 0 0 0 0 0 1
L 1 0 0 1 0 0 0 0
;
PROC PRINT;
RUN;
PROC FREQ;TABLES RACE;
TITLE '** INCLUDING CATEGORY **';
RUN;

DATA DAT2;
SET DAT1;
IF SUMRACE = 1;
PROC PRINT;
RUN;
PROC FREQ;TABLES RACE;
TITLE '** EXCLUDING CATEGORY **';
RUN;
/*


Robert: 

** SAS PROGRAM FOR COMPUTING FREQ OF RACE/ETHNICITY
**
** WHEN MULTIPLE RACE CATEGORIES ARE PRESENTED
**;

DATA DAT1;
INPUT ID $ RACE1 RACE2 RACE3 RACE4 RACE5 RACE6 RACE7
RACE8;

sumrace=sum(race1*1,race2*2,race3*4,race4*8,race5*16,race6*32,race7*64,race8*128);
select;
when (sumrace=1) race = '1.BLACK';
when (sumrace=2) race = '2.HISPA';
when (sumrace=4) race = '3.AMIND';
when (sumrace=8) race = '4.ASIAN';
when (sumrace=16) race = '5.WHITE';
when (sumrace=32) race = '6.RACEa';
when (sumrace=64) race = '7.RACEb';
when (sumrace=0) race= '10.UNKNWN'; /* IN
CASE sumrace=0 */
otherwise race = '9.OTHER'; /* ANY
COMBINATION */
end;

DATALINES;
A 1 0 0 0 0 0 0 0
B 0 1 0 0 0 0 1 0
C 0 0 0 0 1 0 0 0
D 1 0 0 0 0 0 0 0
E 0 0 0 0 1 0 0 0
F 0 1 0 0 0 0 0 0
G 0 0 1 0 0 0 0 0
H 0 0 0 0 0 0 1 0
I 1 0 0 0 1 0 0 1
J 0 0 0 1 0 0 0 0
K 0 0 0 0 0 0 0 1
L 1 0 0 1 0 0 0 0
M 0 0 0 0 0 0 0 0
;
PROC PRINT;
TITLE '** INCLUDING CATEGORY **';
RUN;
PROC FREQ;TABLES RACE;
TITLE '** INCLUDING CATEGORY **';
RUN;

DATA DAT2;
SET DAT1;
IF SUMRACE IN (1,2,4,8,16,32,64,128);
PROC PRINT;
TITLE '** EXCLUDING CATEGORY **';
RUN;
PROC FREQ;TABLES RACE;
TITLE '** EXCLUDING CATEGORY **';
RUN;


Other Solution:

Shiji: 
If you just want to have a frequency of the responses, you can use Multiple Response command. Go to Analysis - Mutiple Responses - Define Sets After you define your sets, then you can go to Analysis - Multiple Responses - Frequency.