Tutorial: Constructing Comparable Samples across the NLSY79 and NLSY97

Example: Constructing work status at age 20 for both samples

Step 3: Create a tagset of variables to define work status at age 20 for the NLSY79

Now we'll show an example of looking at respondents in each cohort at a given age--age 20. By the 2006 interview, all NLSY97 respondents are over 20 years old, and by 1985 all NLSY79 respondents are over 20 years old.

Let's define the following variable for both cohorts: work status during the week that includes October 1st for the year the respondent turns 20. We'll need year of birth and weekly labor force status variables from the work history arrays for that particular week for both cohorts.

  1. In the NLSY79, respondents were born in the years 1957 through 1964. That means the year the respondents turn 20 ranges from 1977 to 1984. Note that the work history arrays begin on January 1, 1978, so we’ll exclude the 1957 birth year from our analysis.
  2. If we turn to Appendix 18 of the NLSY79 Codebook Supplement, we'll find a table that tells us the week numbers in the work history arrays that correspond to each date. Two numbers are given, the week of the year and the week of the array. For example, in 1979, the week of October 1st is week number 39 in 1979, but week number 92 in the work-history array (number of weeks since 1/1/78). Given the layout of the data, the latter number is what we need. (In the NLSY97 the opposite is true.) We want to find the week numbers that correspond to the week of October 1st for the years 1978-1984--the years our respondents turn age 20. Here's what we need: week 40 in 1978, 92 in 1979, 144 in 1980, 196 in 1981, 248 in 1982, 300 in 1983, and 353 in 1984.
  3. Searching on the "Work History-Weekly Labor Status" Area of Interest will give us the weekly labor force status arrays in the NLSY79. Tag the variables that correspond with the list above (W0065200, W0070400, W0110300, W0150200, W0190100, W0230000, W0270500).
  4. We'll also need variables for year of birth (R0000500), sample composition (R0173600), and, of course, respondent ID (R0000100). Note that respondent ID is preselected in the new Investigator.
  5. Now create an extract of your NLSY79 data set. The variables included are as follows:

Reference Number  Question Name      Question Title                      Year
R00001.00         CASEID             Identification Code                 1979
R00005.00         S01Q01A            Date of Birth - Year                1979
R01736.00         S24Q01             Sample Identification Code          1979
W00652.00         STATUS_WK_NUM0040  Labor Force Status (1978) Week 40   1979
W00704.00         STATUS_WK_NUM0092  Labor Force Status (1978) Week 92   1979
W01103.00         STATUS_WK_NUM0144  Labor Force Status (1978) Week 144  1980
W01502.00         STATUS_WK_NUM0196  Labor Force Status (1978) Week 196  1981
W01901.00         STATUS_WK_NUM0248  Labor Force Status (1978) Week 248  1982
W02300.00         STATUS_WK_NUM0300  Labor Force Status (1978) Week 300  1983
W02705.00         STATUS_WK_NUM0353  Labor Force Status (1978) Week 353  1984
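
The mapping from birth year to array week described in this step can be sketched in Python as a simple lookup. This is only an illustration, not the tutorial's SAS or Stata program: the names `OCT1_ARRAY_WEEK_79` and `array_week_at_20` are hypothetical, and the handling of two-digit birth years is an assumption about how the extract codes year of birth.

```python
# Array week (weeks counted from 1/1/1978) that contains October 1st,
# keyed by the calendar year in which the respondent turns 20.
# The week numbers are the ones listed in this step.
OCT1_ARRAY_WEEK_79 = {
    1978: 40,
    1979: 92,
    1980: 144,
    1981: 196,
    1982: 248,
    1983: 300,
    1984: 353,
}


def array_week_at_20(birth_year):
    """Return the array week for October 1st of the year a respondent turns 20.

    Assumption: a value below 100 is a two-digit year (58 means 1958).
    Returns None when the year falls outside the arrays, which covers the
    excluded 1957 birth year.
    """
    if birth_year < 100:
        birth_year += 1900
    return OCT1_ARRAY_WEEK_79.get(birth_year + 20)
```

For example, `array_week_at_20(1960)` looks up 1980 and returns 144, while a respondent born in 1957 gets `None` because 1977 precedes the start of the arrays.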

 

Step 4: Create a tagset of variables to define work status at age 20 for the NLSY97

Now we need to create a similar tagset for the NLSY97.

  1. In the NLSY97, respondents were born in the years 1980 through 1984. That means the year the respondents turn 20 ranges from 2000 to 2004.
  2. If we turn to Appendix 7 of the NLSY97 Codebook Supplement, we'll find Table 1, an Excel spreadsheet, which tells us the week numbers in the work history arrays that correspond to each date. Two weekly numbers are given, the week of the year and the week of the array. For example, in 2000, the week that includes October 1st is week number 41 in 2000, but week number 1084 in the work-history array (number of weeks since 1/1/80). Given the layout of the data, the former number is what we need. Here's what we need: week 41 in 2000, 40 in 2001, 40 in 2002, 40 in 2003, and 40 in 2004.
  3. Searching on the Survey Year = "XRND", Word in Title (enter search term) contains "Status", and Word in Title (enter search term) contains "Employment" gives us variables that include the weekly labor force status arrays in the NLSY97. Tag the variables that correspond with the list above (R8812500, R8908000, R9043500, R9048700, R9179400). Note these variables have convenient question names in the data set: EMP_STATUS_year.week number.
  4. We'll also need variables for year of birth (R0536402), sample composition (R1235800), and, of course, respondent ID (R0000100). Note that all three variables are preselected in the new Investigator.
  5. Now create an extract of your NLSY97 data set. The variables included are as follows:

Reference Number  Question Name       Question Title                                  Year
R00001.00         PUBID               PUBID, Youth Case Identification Code           1997
R05364.02         KEY!BDATE_Y         KEY!BDATE, Rs Birthdate Month/Year (Symbol)     1997
R12358.00         CV_SAMPLE_TYPE      Sample Type. Cross-Sectional or Oversample      1997
R88125.00         EMP_STATUS_2000.41  2000 Employment: Employment Status in Week 41   XRND
R89080.00         EMP_STATUS_2001.40  2001 Employment: Employment Status in Week 40   XRND
R90435.00         EMP_STATUS_2002.40  2002 Employment: Employment Status in Week 40   XRND
R90487.00         EMP_STATUS_2003.40  2003 Employment: Employment Status in Week 40   XRND
R91794.00         EMP_STATUS_2004.40  2004 Employment: Employment Status in Week 40   XRND
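
Because each NLSY97 birth year points to a different weekly variable, picking the age-20 status for a respondent amounts to a lookup on birth year plus 20. The Python sketch below illustrates that idea. It assumes each respondent is held as a dict keyed by reference number and that year of birth is stored as a four-digit year; `OCT1_STATUS_VAR_97` and `status_at_20` are hypothetical names introduced for illustration.

```python
# Reference number of the weekly status variable for the week that includes
# October 1st, keyed by the year the respondent turns 20. The reference
# numbers are the ones tagged in this step.
OCT1_STATUS_VAR_97 = {
    2000: "R8812500",
    2001: "R8908000",
    2002: "R9043500",
    2003: "R9048700",
    2004: "R9179400",
}


def status_at_20(record, birth_year_var="R0536402"):
    """Return the raw employment status for the October 1st week at age 20.

    `record` is a hypothetical dict mapping reference numbers to values for
    one respondent. Returns None if the birth year is outside 1980-1984 or
    the status variable is absent from the record.
    """
    var = OCT1_STATUS_VAR_97.get(record[birth_year_var] + 20)
    if var is None:
        return None
    return record.get(var)
```

The value returned is still the raw code from the data set; turning it into a working indicator is a separate step.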

 

Step 5: Construct work status variable for both samples

Now that we have our two data sets from Steps 3 and 4, we're ready to start programming our variables. The logic of this is as follows:

  1. We'll start by restricting our NLSY79 data to the cross-section and the oversamples of black and Hispanic individuals. These are the same sample types available in the NLSY97.
  2. Next, we'll want to look at the definitions of the employment status variables in both cohorts. We want to define an indicator variable equal to 1 if the respondent is working in a civilian or military job during the week that includes October 1st in the year he or she turns 20 and 0 if the respondent is not working.
  3. If we look at the codebook for the employment status variables in the NLSY79, we can see that a value of 100 or more means the respondent was working in a civilian job in that week. A value of 7 means they were in the military. The values 2, 4, and 5 correspond to not working in the particular week, and we will treat 0 and 3 as missing information.
  4. Similarly, the codebook for the employment status variables in the NLSY97 indicates that a value of 9701 or more means the respondent was working in a civilian job in that week. A value of 6 means they were in the military. The values 1, 2, 4, and 5 correspond to not working in the particular week, and again, we will treat 0 and 3 as missing information.

Sample programming code for these steps is available in SAS and Stata.
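
For readers who want to see the recoding logic in one place, here is a minimal Python sketch of the rules in items 3 and 4. It is not the tutorial's SAS or Stata program. The function names are hypothetical, and treating any code not mentioned in those items (negative codes, for instance) as missing is an assumption.

```python
def work_indicator_79(status):
    """NLSY79: 1 if working (civilian or military), 0 if not, None if missing."""
    if status is None:
        return None
    if status >= 100 or status == 7:   # civilian job, or in the military
        return 1
    if status in (2, 4, 5):            # not working in that week
        return 0
    return None                        # 0, 3, and (assumption) any other code


def work_indicator_97(status):
    """NLSY97: 1 if working (civilian or military), 0 if not, None if missing."""
    if status is None:
        return None
    if status >= 9701 or status == 6:  # civilian job, or in the military
        return 1
    if status in (1, 2, 4, 5):         # not working in that week
        return 0
    return None                        # 0, 3, and (assumption) any other code
```

Keeping one function per cohort makes the differences in coding explicit: the military code is 7 in the NLSY79 and 6 in the NLSY97, and the civilian-job threshold is 100 in one and 9701 in the other.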

Additional Information

Final statistics from program:
work79_20 (mean = .604, N = 8603)
work97_20 (mean = .665, N = 8435)
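
A mean and N of this kind can be computed from a 0/1 indicator once missing values are set aside. The Python sketch below shows an unweighted version. The `summarize` helper is hypothetical, and whether the figures above use sampling weights is not stated here, so this should be read as an illustration of the calculation only.

```python
def summarize(indicators):
    """Return (mean, N) over the non-missing values of a 0/1 indicator.

    Missing values are represented as None and are excluded from both
    the mean and the count.
    """
    valid = [x for x in indicators if x is not None]
    n = len(valid)
    mean = sum(valid) / n if n else None
    return mean, n
```

Under this definition N counts only respondents with a valid work status in the relevant week, so it can be smaller than the number of respondents in the extract.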

Next Step: This tutorial focuses on forming comparable variables that measure work status at a given age in the cross-sectional sample and oversamples of blacks and Hispanics in the NLSY79 and NLSY97. One could calculate additional comparable variables in the two surveys over many different domains across different ages and years, such as labor market experience and number of jobs held, marital status and transitions, and fertility.