OCAIR

Overseas Chinese Association for Institutional Research
An AIR Affiliate That Supports IR Professionals Since 1996

How to flag cases with duplicate ID numbers  and duplicate SUBJECT code?
         
Problem: Need to process the following sample data set...

ID  SUBJ
--- ----
1   ENGL
1   ENGL
1   ENGL
2   MATH
2   ENGL
3   MATH
3   MATH
 
... to look like the following resulting data file:
 
ID  SUBJ    FLAG
--- ----    ----
1   ENGL    S
1   ENGL    S
1   ENGL    S
2   MATH    .
2   ENGL    .
3   MATH    S
3   MATH    S
 
      
Meihua Zhai's SPSS syntaxes for the task:
 
SORT CASES BY ID (A) SUBJ (A).
If (subj ne lag(subj,1)) RECNUM=0.
COMPUTE RECNUM = RECNUM+1.
LEAVE RECNUM.
FORMAT RECNUM(F1).
EXE.
 
STRING FLAG (A1) /* creating a var to hold the flag.
DO IF (RECNUM = 1).
+ COMPUTE Flag = '.'.
ELSE.
+ COMPUTE FLAG = 'S'.
END IF.
SAVE OUTFILE 'C:\TEMP\LARRY2.SAV'
  /DROP = RECNUM.
  
SPSS Tech Support's version:
 
SORT CASES by id subject.
AGGREGATE outfile='aggr.sav' / presorted / break=id
subject
  /flag=n.
MATCH FILES file=* / table='aggr.sav' / by id subject.
RECODE flag (2 thru hi=1)(1=sysmis).
EXECUTE.
  
SAS Solutions to Flag Problem:
 
Create Data Set:     
data SASflag1;
  input id subj $;
   cards;
 1 ENGL
 1 ENGL
 1 ENGL
 2 ENGL
 2 MATH
 3 MATH
 3 MATH
   ;
 proc sort; by id subj;
 run;
 
Solution-1:
data SASflag2;
  set SASflag1;
   if id=lag(id) and subj=lag(subj) then flag='S';
else delete;
  data SASflag3;
   merge SASflag1 SASflag2; by id subj;
run;
 
Solution-2:
proc summary data=SASflag1 nway; class id subj;
  output out=SUM1(drop=_type_);
 data SUM2;
  set SUM1;
   if _freq_ > 1 then flag='S'; else delete;
 data SUM3;
  merge SASflag1 SUM2(drop=_freq_); by id subj;
 run;
 
Note: A line of PROC PRINT command can be added right before RUN if you want to check the result.