Malaise Precis: Convert pre-2002 CPS industry and occupation codes

The idea is straightforward. Most of the pre-2002 occupations have been redistributed into other occupations post-2002. In order to get to the new codes generate a random number over the entire data set. If the old occupation has a random number within the range defined by Census then we set this to be its new code.

The rules for the proportions of old occupations and industries to be redistributed to the new codes are available at the Census web site or at IPUMS. The spreadsheets containing the rules for industry were downloaded from here while the rules for the occupations were downloaded from here.

The caveats are that the porportions are assumed to remain constant over time and that the data set is large enough that how the random numbers are generated does not matter 'much'.

The SAS code first converts the occupation spreadsheet to SAS data to be written out to a text file:

PROC IMPORT OUT= WORK.temp 
            DATAFILE= "I:\occ_90-00.xls" 
            DBMS=EXCEL REPLACE;
     RANGE="Sheet1$"; 
     GETNAMES=YES;
     MIXED=YES;
     SCANTEXT=YES;
     USEDATE=YES;
     SCANTIME=YES;
RUN;

data temp1(drop = Table_2___1990_Census_Occupation f6);
  set temp(drop = f5 f7 f8);
  rename /* Table_2___1990_Census_Occupation = oldcode*/
    f2 = oldlabel f3 = newcode f4 = newlabel ;
  if _N_ < 5 then delete; 
  if index (Table_2___1990_Census_Occupation, 'categor')>0 then delete;
  oldcode = input (Table_2___1990_Census_Occupation,4.);
  if f6 ne 'NA' then percent = input(f6, 8.);
run;

data temp1(drop = oldcode oldlabel newcode);
  retain c clabel;
  length clabel $ 65;
  set temp1;
  if oldcode ne . then c = oldcode;
  if oldlabel ne ' ' then clabel = oldlabel;
  ncode = input(newcode, 4.);
  if percent = . then delete;
run;

data temp2;
  retain c ncode;
  set temp1(where = (ncode ne .)); by c;
  retain cumpct;
  if first.c then do; 
    cumpct = percent;
    lpercent = 0;
  end;
  else do;
    lpercent = cumpct;
    cumpct = cumpct + percent;
  end;
run;

data _null_;
  set temp2; by c;
  file "I:\CPS\SAS Programs\occ1990_2000_xwalk.sas";
  if first.c then do;
    if _N_ = 1 then put "if old_code = " c " then do;";
    else put "else if old_code = " c " then do;";
    put "  if " lpercent "< y <= " cumpct " then new_code = " ncode ";";
  end;
  else if not first.c then do;
    put "  if " lpercent "< y <= " cumpct " then new_code = " ncode ";";
  end;
  if last.c then do;
  put "end;";
  end;
run;

The following does the same for industry:

PROC IMPORT OUT= WORK.temp 
            DATAFILE= "I:\ind_90-00.xls" 
            DBMS=EXCEL REPLACE;
     RANGE="Sheet1$"; 
     GETNAMES=YES;
     MIXED=NO;
     SCANTEXT=YES;
     USEDATE=YES;
     SCANTIME=YES;
RUN;

data temp1;
  set temp(drop = f5 f7 f8);
  rename Table_1___1990_Census_Industry_C = oldcode
    f2 = oldlabel f3 = newcode f4 = newlabel f6 = percent;
  if _N_ < 5 then delete; 
  if index (Table_1___1990_Census_Industry_C, 'categor')>0 then delete;
run;

data temp1(drop = oldcode oldlabel newcode);
  retain c clabel;
  length clabel $ 65;
  set temp1;
  if oldcode ne ' ' then c = input(oldcode, 4.);
  if oldlabel ne ' ' then clabel = oldlabel;
  ncode = input(newcode, 4.);
  if percent = . then delete;
run;

data temp2;
  retain c ncode;
  set temp1; by c;
  retain cumpct;
  if first.c then do; 
    cumpct = percent;
    lpercent = 0;
  end;
  else do;
    lpercent = cumpct;
    cumpct = cumpct + percent;
  end;
run;

data _null_;
  set temp2; by c;
  file "I:\CPS\SAS Programs\ind1990_2000_xwalk.sas";
  if first.c then do;
    if _N_ = 1 then put "if old_code = " c " then do;";
    else put "else if old_code = " c " then do;";
    put "  if " lpercent "< x <= " cumpct " then new_code = " ncode ";";
  end;
  else if not first.c then do;
    put "  if " lpercent "< x <= " cumpct " then new_code = " ncode ";";
  end;
  if last.c then do;
  put "end;";
  end;
run;

Now there are two SAS programs to be included. First a search and replace is used to replace the "old_code" to PEIO1OCD (occupation) and PEIO1ICD (industry) and "new_code" to NEW_OCC and NEW_IND respectively (for example). It would have been simpler to have SAS write them out but I wanted to be able to use this if some other data set used a different variable name and I needed to remember which was old and new. The following is a snippet to do the actual recode:

  call streaminit (6875309&yr);
  x  = rand('uniform') * 100;
  y  = rand('uniform') * 100;
  
  %include "occ1990_2000_xwalk.sas";
  %include "ind1990_2000_xwalk.sas";

Here's a snippet of what the actual code will look like (for occupation):

if peio1ocd = 3  then do;
  if 0 < y <= 100  then peio1ocd_r = 3 ;
end;
else if peio1ocd = 4  then do;
  if 0 < y <= 77.143  then peio1ocd_r = 1 ;
  if 77.143 < y <= 100  then peio1ocd_r = 43 ;
end;
else if peio1ocd = 5  then do;
  if 0 < y <= 1.571  then peio1ocd_r = 1 ;
  if 1.571 < y <= 4.712  then peio1ocd_r = 2 ;
  if 4.712 < y <= 6.806  then peio1ocd_r = 10 ;
  if 6.806 < y <= 8.377  then peio1ocd_r = 11 ;
  if 8.377 < y <= 13.613  then peio1ocd_r = 12 ;
  if 13.613 < y <= 14.66  then peio1ocd_r = 13 ;
  if 14.66 < y <= 16.754  then peio1ocd_r = 15 ;
  if 16.754 < y <= 17.278  then peio1ocd_r = 16 ;
  if 17.278 < y <= 17.802  then peio1ocd_r = 22 ;
  if 17.802 < y <= 19.896  then peio1ocd_r = 23 ;
  if 19.896 < y <= 20.42  then peio1ocd_r = 30 ;
  if 20.42 < y <= 20.944  then peio1ocd_r = 36 ;
  if 20.944 < y <= 21.991  then peio1ocd_r = 41 ;
  if 21.991 < y <= 26.703  then peio1ocd_r = 42 ;
  if 26.703 < y <= 67.541  then peio1ocd_r = 43 ;
  if 67.541 < y <= 71.73  then peio1ocd_r = 54 ;
  if 71.73 < y <= 73.824  then peio1ocd_r = 62 ;
  if 73.824 < y <= 76.965  then peio1ocd_r = 81 ;
  if 76.965 < y <= 77.489  then peio1ocd_r = 84 ;
  if 77.489 < y <= 82.725  then peio1ocd_r = 93 ;
  if 82.725 < y <= 88.484  then peio1ocd_r = 122 ;
  if 88.484 < y <= 89.008  then peio1ocd_r = 164 ;
  if 89.008 < y <= 91.102  then peio1ocd_r = 202 ;
  if 91.102 < y <= 93.72  then peio1ocd_r = 211 ;
  if 93.72 < y <= 94.244  then peio1ocd_r = 215 ;
  if 94.244 < y <= 94.768  then peio1ocd_r = 354 ;
  if 94.768 < y <= 95.292  then peio1ocd_r = 382 ;
  if 95.292 < y <= 95.816  then peio1ocd_r = 395 ;
  if 95.816 < y <= 100.005  then peio1ocd_r = 525 ;
end;

Malaise Precis

Sunday, February 13, 2011

Convert pre-2002 CPS industry and occupation codes

No comments:

About Me