How to create ONE variable "ethnicity" which contains 8 categories?
Question from Xiaomei Feng:
I have a multiple response question - say, race. Each category is an
individual question. Thus, I have 8 items:
race1 (black) (0=no, 1=yes)
race2 (hispanic) (0=no, 1=yes)
race3 (Asian) (0=no, 1=yes)
..
..
race8 (white) (0=no, 1=yes)
Students can select more than one categories as long as they think it
is appropriate. How to create ONE variable "ethnicity" with the 8
categories?
SPSS Solution:
Shuqin:
DO IF (race1=1) .
RECODE
race1
(1=1) INTO ethnicity.
END IF .
EXECUTE .
DO IF (race2=1) .
RECODE
race2
(1=2) INTO ethnicity .
END IF .
EXECUTE .
Brian:
You may use "recode" function in the Transform menu. Create a new
variable:
Ethnicity. Then recode each individual variable into "Ethnicity". Set
up
the numerical codes for each race, e.g. Black=1, Asian=2.... decline to
state =9 etc. Change old value in each race into the correspondence
number,for example, Asian: old value = 1, new value =2. Once you have
recode all individual variables into Ethnicity you will have each race
represented by different numbers. But I don't know how to solve the
problem if someone chose more than one categories. You may need to do a
loop programming to work out a new category called mixed race. Anyone is
good at SPSS programming may give a try. Good luck.
Jiali:
1) use the SPSS "Compute" to create a new variable:
race = race1*1 + race2*2 + race3*4 + race4*8 + race5*16 +
race6*36 + race7*64 + race8*128
2)then recode the variable so that respondents will be in one category
only:
1 = race1
2 = race2
4 = race3
8 = race4
16 = race5
36 = race6
64 = race7
128 = race8
Lillian's explanation for Jiali:
Here is my understanding of Jiali's solution
below. It's a very smart way of handling this problem and can be
adapted to
the situation as if the answer to each categories of a question is more
than 2.
In XiaoMei's case, we have 8 type of race categories. For each race
type,
the answer is yes or no, two choices. The theoretic base of the
solution
is the binary system. The key is the uniqueness of
each choice based on the exponent of 2. In this case, we make each race
category into
the format of exponent of 2. first, then sum up the chosen race type:
Race 1 = 2**0 = 1 (response to race1 as 'yes'will be coded as 1)
Race 2 = 2**1 = 2 ...
Race 3 = 2**2 = 4
Race 4 = 2**3 = 8
Race 5 = 2**4 = 16
Race 6 = 2**5 = 32
Race 7 = 2**6 = 64 ...
Race 8 = 2**7 = 128 (response to race8 as 'yes' will be coded as
128)
Anyone's race code will be the sum of each race s/he responded as 'yes'.
For a student chose race2, race5, race7 as'yes',
his/her race code will be (2**1)+(2**4)+(2**6) = 2+16+64 = 82. With 8
categories, we can be sure that nobody's race code exceeds 255 even if
this
person said 'yes' to all 8 categories as an extreme case (i.e.
1+2+4+...+128=255). On the other hand, if we know someone's race
code is 148, we can figure out the race(s) s/he chose as race3, race5,
and
race8:
148 = (2**7)+(2**4)
+(2**2) = 128+16+4.
The above example is just the explanation of how
the solution works. Using
formula on spreadsheet (MS Excel), you can get
calculation done easily. If
running data on SAS, there has to be another set of codes.
Chun-Mei:
You can first create a "count" variable to see how many race/ethnicities
selected, and if it's larger than one, then it's grouped into a multiple
race category. The code is below.
* Create COUNT of race/ethnicities selected.
VECTOR lip= latin to reothr1.
NUMERIC numrace(F8.0).
COMPUTE numrace=0.
LOOP #i = 1 to 6.
IF lip(#i)=1 numrace=numrace+1.
END LOOP.
EXE.
variable labels numrace "Count of race/ethnicities selected".
exe.
FREQUENCIES
VARIABLES= numrace
/ORDER ANALYSIS.
* MAKE MULTIPLE RACE VARIABLE.
* MAKE MISSING RESPONSES THAT SELECTED NO RACES OR ALL RACES.
NUMERIC multre(F8.0).
exe.
if (numrace=0) multre=999.
exe.
FREQUENCIES
VARIABLES= multre
/ORDER ANALYSIS.
if (numrace=1) multre=1.
exe.
FREQUENCIES
VARIABLES= multre
/ORDER ANALYSIS.
if (numrace>1) multre=2.
exe.
FREQUENCIES
VARIABLES= multre
/ORDER ANALYSIS.
***If a student checked all the race categories, make it missing.
if (numrace=6) multre=999.
exe.
FREQUENCIES
VARIABLES= multre
/ORDER ANALYSIS.
recode multre (999=sysmis) (1=1) (2=2).
exe.
variable labels multre "Multiple racial or ethnic identifications".
value labels multre
1 "One racial or ethnic identification checked"
2 "More than one racial or ethnic identification checked".
FREQUENCIES
VARIABLES= multre
/ORDER ANALYSIS.
SAS Solution:
Beiling:
DATA DAT1;
INPUT ID $ RACE1 RACE2 RACE3 RACE4 RACE5 RACE6 RACE7 RACE8;
SUMRACE = RACE1+RACE2+RACE3+RACE4+RACE5+RACE6+RACE7+RACE8;
IF RACE1 = 1 THEN RACE = '1.BLANK';
IF RACE2 = 1 THEN RACE = '2.HISPA';
IF RACE3 = 1 THEN RACE = '3.AMIND';
IF RACE4 = 1 THEN RACE = '4.ASIAN';
IF RACE5 = 1 THEN RACE = '5.WHITE';
IF RACE6 = 1 THEN RACE = '6.RACEa';
IF RACE7 = 1 THEN RACE = '7.RACEb';
IF RACE8 = 1 THEN RACE = '8.RACEc';
IF SUMRACE > 1 THEN RACE = '9.OTHER';
DATALINES;
A 1 0 0 0 0 0 0 0
B 0 1 0 0 0 0 1 0
C 0 0 0 0 1 0 0 0
D 1 0 0 0 0 0 0 0
E 0 0 0 0 1 0 0 0
F 0 1 0 0 0 0 0 0
G 0 0 1 0 0 0 0 0
H 0 0 0 0 0 0 1 0
I 1 0 0 0 1 0 0 1
J 0 0 0 1 0 0 0 0
K 0 0 0 0 0 0 0 1
L 1 0 0 1 0 0 0 0
;
PROC PRINT;
RUN;
PROC FREQ;TABLES RACE;
TITLE '** INCLUDING CATEGORY **';
RUN;
DATA DAT2;
SET DAT1;
IF SUMRACE = 1;
PROC PRINT;
RUN;
PROC FREQ;TABLES RACE;
TITLE '** EXCLUDING CATEGORY **';
RUN;
/*
Robert:
** SAS PROGRAM FOR COMPUTING FREQ OF RACE/ETHNICITY
**
** WHEN MULTIPLE RACE CATEGORIES ARE PRESENTED
**;
DATA DAT1;
INPUT ID $ RACE1 RACE2 RACE3 RACE4 RACE5 RACE6 RACE7
RACE8;
sumrace=sum(race1*1,race2*2,race3*4,race4*8,race5*16,race6*32,race7*64,race8*128);
select;
when (sumrace=1) race = '1.BLACK';
when (sumrace=2) race = '2.HISPA';
when (sumrace=4) race = '3.AMIND';
when (sumrace=8) race = '4.ASIAN';
when (sumrace=16) race = '5.WHITE';
when (sumrace=32) race = '6.RACEa';
when (sumrace=64) race = '7.RACEb';
when (sumrace=0) race= '10.UNKNWN'; /* IN
CASE sumrace=0 */
otherwise race = '9.OTHER'; /* ANY
COMBINATION */
end;
DATALINES;
A 1 0 0 0 0 0 0 0
B 0 1 0 0 0 0 1 0
C 0 0 0 0 1 0 0 0
D 1 0 0 0 0 0 0 0
E 0 0 0 0 1 0 0 0
F 0 1 0 0 0 0 0 0
G 0 0 1 0 0 0 0 0
H 0 0 0 0 0 0 1 0
I 1 0 0 0 1 0 0 1
J 0 0 0 1 0 0 0 0
K 0 0 0 0 0 0 0 1
L 1 0 0 1 0 0 0 0
M 0 0 0 0 0 0 0 0
;
PROC PRINT;
TITLE '** INCLUDING CATEGORY **';
RUN;
PROC FREQ;TABLES RACE;
TITLE '** INCLUDING CATEGORY **';
RUN;
DATA DAT2;
SET DAT1;
IF SUMRACE IN (1,2,4,8,16,32,64,128);
PROC PRINT;
TITLE '** EXCLUDING CATEGORY **';
RUN;
PROC FREQ;TABLES RACE;
TITLE '** EXCLUDING CATEGORY **';
RUN;
Other Solution:
Shiji:
If you just want to have a frequency of the responses, you can use Multiple
Response command. Go to Analysis - Mutiple Responses - Define Sets After
you define your sets, then you can go to Analysis - Multiple Responses
- Frequency.