National Longitudinal Survey of Youth 1997 (NLSY97)

Cross-Cohort Harmonization Dataset

This is the beta release of a dataset to harmonize NLS data across the various cohorts. For now, we have harmonized data available for the NLSY79 and NLSY97. We see three benefits in this harmonization project:

  • It will make it easier to perform cross-cohort comparisons.
  • Researchers doing more complex analyses can use data from this system as background independent variables with minimal added work. This will allow them to concentrate on the more complicated variables in their analysis. It will also ensure that variables are calculated in as close to the same fashion as possible across cohorts.
  • By providing standardized variables across cohorts, we hope to provide an entry point for novice NLS users. We hope this will get more NLS data into classrooms.

The idea is that in the longer run we may have a set of variables that are comparable across all the cohorts and could be simply selected into one dataset for analysis with a minimum of effort. Included would be documentation that would explain differences across surveys as well as across rounds within the same survey.

Methodology

This dataset contains a subset of all variables in the datasets: at present only around 15, although in the future there could be over 100. Many are created variables. The goal is to have variables that can be used in an analysis directly or with minimal manipulation. All variables report status as of the interview date. Analysis of data as of a different date (say, the 29th birthday or the date of first marriage) is beyond the scope of this dataset, and such values would have to be calculated by the researcher.

ID numbers have been standardized to 7 digits by adding a two-digit prefix that represents the survey. For example, ID 9534 in the NLSY79 is 7909534 in this dataset. This makes merging in data from the individual survey datasets fairly easy.
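
The ID standardization above can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, and the assumption that original IDs fit in five digits is inferred from the 7-digit total and the 9534 example rather than stated as a rule in the documentation.

```python
def to_harmonized_id(cohort: int, original_id: int) -> int:
    """Prefix an original survey ID with its two-digit cohort code.

    Assumes (from the example in the text) that the original ID is
    zero-padded to five digits, giving a 7-digit result.
    """
    if cohort not in (79, 97):
        raise ValueError("cohort must be 79 or 97")
    if not 0 < original_id < 100_000:
        raise ValueError("original ID must fit in five digits")
    return cohort * 100_000 + original_id


def split_harmonized_id(caseid: int) -> tuple:
    """Recover (cohort, original ID) from a harmonized 7-digit ID."""
    return divmod(caseid, 100_000)


print(to_harmonized_id(79, 9534))    # 7909534
print(split_harmonized_id(7909534))  # (79, 9534)
```

Splitting the harmonized ID back into its two parts gives the key needed to merge in additional variables from the individual survey datasets.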

Variables that are not one-time variables are indexed by year and age. Since it is possible for sample members to be interviewed twice in the same year or at the same age, year and age are standardized. Year is the year in which interviewing started for that round of data collection. Age is the year in which interviewing started minus (year of birth + 1), e.g., 1997 – (1984 + 1) = 12. Tables 1 (NLSY97) and 2 (NLSY79) show how this assumption works. For example, all NLSY97 respondents born in 1980 have their round 1 interview data under age 16 and year 1997, even though they may have turned 17 by the interview date or may have been interviewed in 1998. This age and year breakdown facilitates comparative work across cohorts in a particular year or at comparable ages.

Variable names are standardized across cohorts, with the ending indicating the year or the age from which the observation comes. For example, MARST99 is marital status in 1999, while MARSTA24 is marital status at age 24.
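
The assumed-age rule described above can be written as a one-line function. The sketch below is illustrative only; the function name is hypothetical, and the years used are taken from the round 1 row of Table 1.

```python
def assumed_age(round_start_year: int, birth_year: int) -> int:
    """Harmonized age: year interviewing started minus (birth year + 1)."""
    return round_start_year - (birth_year + 1)


# NLSY97 round 1 began in 1997; birth years run from 1980 to 1984.
for birth_year in range(1980, 1985):
    print(birth_year, assumed_age(1997, birth_year))
```

Because the rule depends only on the year interviewing started and the year of birth, every respondent in a birth-year group gets the same age in a given round, regardless of the actual interview date.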

Table 1. Assumed Age and Year in Harmonization: NLSY97
Interview  Survey  Year of Birth (cells show assumed age)
Round      Year    1980  1981  1982  1983  1984
1          1997    16    15    14    13    12
2          1998    17    16    15    14    13
3          1999    18    17    16    15    14
4          2000    19    18    17    16    15
5          2001    20    19    18    17    16
6          2002    21    20    19    18    17
7          2003    22    21    20    19    18
8          2004    23    22    21    20    19
9          2005    24    23    22    21    20
10         2006    25    24    23    22    21
11         2007    26    25    24    23    22
12         2008    27    26    25    24    23
13         2009    28    27    26    25    24
14         2010    29    28    27    26    25
15         2011    30    29    28    27    26
16         2013    32    31    30    29    28
17         2015    34    33    32    31    30

Table 2. Assumed Age and Year in Harmonization: NLSY79
Interview  Survey  Year of Birth (cells show assumed age)
Round      Year    1957  1958  1959  1960  1961  1962  1963  1964
1          1979    21    20    19    18    17    16    15    14
2          1980    22    21    20    19    18    17    16    15
3          1981    23    22    21    20    19    18    17    16
4          1982    24    23    22    21    20    19    18    17
5          1983    25    24    23    22    21    20    19    18
6          1984    26    25    24    23    22    21    20    19
7          1985    27    26    25    24    23    22    21    20
8          1986    28    27    26    25    24    23    22    21
9          1987    29    28    27    26    25    24    23    22
10         1988    30    29    28    27    26    25    24    23
11         1989    31    30    29    28    27    26    25    24
12         1990    32    31    30    29    28    27    26    25
13         1991    33    32    31    30    29    28    27    26
14         1992    34    33    32    31    30    29    28    27
15         1993    35    34    33    32    31    30    29    28
16         1994    36    35    34    33    32    31    30    29
17         1996    38    37    36    35    34    33    32    31
18         1998    40    39    38    37    36    35    34    33
19         2000    42    41    40    39    38    37    36    35
20         2002    44    43    42    41    40    39    38    37
21         2004    46    45    44    43    42    41    40    39
22         2006    48    47    46    45    44    43    42    41
23         2008    50    49    48    47    46    45    44    43
24         2010    52    51    50    49    48    47    46    45
25         2012    54    53    52    51    50    49    48    47
26         2014    56    55    54    53    52    51    50    49

Variables

Table 3 displays the background variables for the NLSY79 and NLSY97 harmonization beta release.

Table 3: Harmonization Variables, Fixed Background Variables
Variable       Description
CASEID         Cross-cohort identification code
COHORT         79 or 97
SAMPLE_SEX     From round 1
                 1 Male
                 2 Female
SAMPLE_RACE    From round 1 (the NLSY79 has no mixed-race category)
                 1 Black
                 2 Hispanic
                 3 Non-Black / Non-Hispanic
                 4 Mixed race (Non-Hispanic)
BIRTHDATE~M    Birth Date – month
BIRTHDATE~Y    Birth Date – year
AFQT           Armed Forces Qualification Test percentile score
               (variable AFQT_3 from the NLSY79 and ASVAB_MATH_VERBAL_SCORE_PCT from the NLSY97)

Table 4 shows the variables that vary by age and year for the NLSY79 and NLSY97 harmonization beta release.

Table 4: Harmonization Variables, by Age and Year
Variable    Age  Year  Description
INTDATE~M   X    X     Interview month
INTDATE~Y   X    X     Interview year
RFNI        X    X     Reason for non-interview
                         0 Interviewed
                         1 Refusal
                         2 Not able to locate
                         3 Deceased
                         4 Not fielded due to prior refusals
                         5 In sample that was dropped
                         6 Other
AGEMONTHS   X    X     Age in months at interview date. It is calculated by subtracting the month of birth from the month of the interview and adding 12 times the difference in year between birth and interview. To maintain confidentiality, the day of the month of birth or interview is not used.
MARSTAT     X    X     Marital Status at interview date
                         0 Never-married
                         1 Married
                         2 Separated
                         3 Divorced
                         4 Widowed
HIGRATT     X    X     Highest Grade attended at interview date
HIGRCOMP    X    X     Highest Grade completed at interview date. These are from self-reports, and are not the edited created variables for highest grade completed also available in both datasets.
EMPSTAT     X    X     Employment status at interview date
                         0 NO INFO REPORTED FOR WEEK
                         1 EMPLOYED
                         2 NOT WORKING (UNEMP V. OLF NOT DETERMINED)
                         3 ASSOC. WITH EMP, GAP DATES MISSING, ALL TIME NOT ACCTD FOR
                         4 UNEMPLOYED
                         5 OUT OF LABOR FORCE
                         7 ACTIVE MILITARY SERVICE
                       EMP_STAT variables are created using the work history arrays which are loaded with week-by-week records of the respondent’s labor force status.
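
The AGEMONTHS calculation described in Table 4 can be sketched as follows. The function name and the example respondent (born March 1982, interviewed November 1997) are hypothetical and are used only to illustrate the arithmetic.

```python
def age_in_months(birth_year: int, birth_month: int,
                  interview_year: int, interview_month: int) -> int:
    """Month of interview minus month of birth, plus 12 times the
    difference in years. Day of the month is deliberately ignored."""
    return (interview_month - birth_month) + 12 * (interview_year - birth_year)


# Hypothetical respondent: born March 1982, interviewed November 1997.
print(age_in_months(1982, 3, 1997, 11))  # (11 - 3) + 12 * 15 = 188
```

Because the day of the month is not used, the result can differ by one month from an age computed from exact dates.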

Important information: Employment status (EMPSTAT) for younger ages/early interview years

In the NLSY79, weekly employment history arrays mostly begin at age 16 (or on January 1, 1978, for respondents older than 16 in the first round). In the NLSY97, weekly employment history arrays mostly begin at age 14. The weekly arrays may contain some information for younger ages or earlier years, and that information is used to create EMPSTAT, but it should be used with caution because most respondents won’t have it for those young ages or early years.
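
One way to act on this caution is to flag EMPSTAT observations that fall below the age at which the weekly arrays mostly begin. The sketch below is a rough screen, not part of the dataset: the names are hypothetical, and it assumes the age being checked is the harmonized (assumed) age, which can be lower than the respondent's actual age at interview.

```python
# Ages at which the weekly employment arrays mostly begin, per the text above.
TYPICAL_ARRAY_START_AGE = {79: 16, 97: 14}


def empstat_needs_caution(cohort: int, age: int) -> bool:
    """Return True when EMPSTAT at this age predates the typical array start."""
    return age < TYPICAL_ARRAY_START_AGE[cohort]


print(empstat_needs_caution(97, 12))  # True
print(empstat_needs_caution(79, 16))  # False
```

Flagged observations could be dropped or examined separately, depending on the analysis.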