Sample Design & Screening Process

National Longitudinal Survey of Youth - 1997 Cohort

The following information provides details on NLSY97 sampling and screening procedures.

Sampling Procedures

Important Information

To correct for sample clustering, two survey design variables were added to the dataset:

R14897.00  [VSTRAT], VARIANCE STRATUM: Variable for use with variance PSU to correct for clustering in the sample design. The stratum reflects the first-stage units for the initial sampling of NLSY97 respondents. This variable should be used in conjunction with VPSU.

R14898.00  [VPSU], VARIANCE PSU: Variable for use with variance stratum to correct for clustering in the sample design. The VPSU reflects the second-stage units for the initial sampling of NLSY97 respondents. There are two second-stage units (VPSU) for each first-stage unit (VSTRAT). This variable should be used in conjunction with VSTRAT.
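As a rough illustration of how VSTRAT and VPSU are typically used together, the sketch below computes a weighted mean and a design-based standard error by Taylor linearization, treating PSUs as sampled with replacement within strata. The function name, the toy records, and the use of a sampling weight are assumptions for illustration only; this section does not specify an estimator or a weight variable, and dedicated survey software would normally be used instead.

```python
from collections import defaultdict


def weighted_mean_with_design_se(rows):
    """Return (weighted mean, linearized standard error).

    rows: iterable of (vstrat, vpsu, weight, value) tuples. The weight
    column is an assumption; it is not described in this section.
    """
    rows = list(rows)
    total_weight = sum(w for _, _, w, _ in rows)
    mean = sum(w * y for _, _, w, y in rows) / total_weight

    # Sum each record's linearized contribution up to its PSU within its stratum.
    psu_totals = defaultdict(lambda: defaultdict(float))
    for vstrat, vpsu, w, y in rows:
        psu_totals[vstrat][vpsu] += w * (y - mean) / total_weight

    # Between-PSU variance within each stratum, summed over strata.
    variance = 0.0
    for vstrat, psus in psu_totals.items():
        n = len(psus)
        if n < 2:
            raise ValueError(f"stratum {vstrat!r} needs at least two PSUs")
        center = sum(psus.values()) / n
        variance += n / (n - 1) * sum((z - center) ** 2 for z in psus.values())
    return mean, variance ** 0.5


# Toy data (hypothetical): two strata, two PSUs each, equal weights.
toy_rows = [
    (1, 1, 1.0, 10.0), (1, 1, 1.0, 12.0), (1, 2, 1.0, 14.0),
    (2, 1, 1.0, 20.0), (2, 2, 1.0, 24.0), (2, 2, 1.0, 22.0),
]
mean, se = weighted_mean_with_design_se(toy_rows)
print(mean, se)
```

With exactly two PSUs per stratum, as described for the NLSY97, each stratum's contribution reduces to the squared difference between its two PSU totals.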

The NLSY97 cohort comprises two independent probability samples: a cross-sectional sample and an oversample of black and/or Hispanic or Latino respondents. The cohort was selected using these two samples to meet the survey design requirement of providing sufficient numbers of black and Hispanic or Latino respondents for statistical analysis. 

The NLSY97 cohort was selected in two phases, as pictured in Figure 1. In the first phase, a list of housing units for the cross-sectional sample and the oversample was derived from two independently selected, stratified multistage area probability samples. This ensured an accurate representation of different sections of the population defined by race, income, region, and other factors. In the second phase, subsamples of the eligible persons identified in the first phase were selected for interview.

Figure 1. Selection of NLSY97 Respondents

The listing of eligible housing units comprised 96,512 households. A housing unit was defined as a single room or group of rooms intended as separate living quarters for a family, for a group of unrelated persons living together, or for a person living alone. The National Opinion Research Center (NORC) was contracted to manage the sampling process. The list of housing units for each sample was selected in three steps:

    1. For each sample, 100 primary sampling units (PSUs) were chosen from NORC's 1990 master probability sample of the United States. In the cross-sectional sample, each PSU represented either a metropolitan area or one or more non-metropolitan counties with a minimum of 2,000 housing units. The supplemental sample defined PSUs differently: counties containing large percentages of minorities were merged to create areas containing a minimum of 2,000 housing units.
    2. Regardless of sample, segments containing one or more adjoining blocks, and at least 75 housing units, were selected from each PSU.
    3. A subset of housing units within each segment comprised the listing of households eligible for interview.

Note: There are 100 PSUs in the cross-sectional sample and 100 PSUs in the oversample; however, some PSUs were selected in both samples. Thus, there are a total of 147 non-overlapping PSUs included in the NLSY97.

The second phase identified all NLSY97-eligible individuals born between 1980 and 1984 (age 12 to 16 as of December 31, 1996) in each household. NORC interviewers went to the households and administered a short interview called the simple screener, a portion of the Screener, Household Roster, and Nonresident Roster Questionnaire, which collected the age or date of birth of every person linked to a particular household. The survey collected these data for more than 150,000 people. In cross-sectional sampling units, if the household included one or more occupants in the eligible age range, interviewers asked those individuals to participate in the first NLSY97 interview. In supplemental sampling units, the interviewer continued with the extended screener, which established the race and ethnicity of household members. If a person of the correct age and of black or Hispanic or Latino race/ethnicity resided in the household, he or she was asked to participate in the survey. Any person in the above age range who completed the first round interview is considered a member of the NLSY97 cohort. Base-year interviews were conducted between January and early October 1997 and between March and May 1998 (see Interview Methods for details). Of the 9,907 individuals selected for interview during household screenings, a total of 8,984 (90.7 percent) were interviewed.

Table 1. NLSY97 Round 1 Interview Completion

Sample | Eligible for Interviewing | Interviewed Round 1 | Percent
Total Cohort | 9907 | 8984 | 90.7%
Cross-Sectional Sample | 7391 | 6748 | 91.3%
Supplemental Sample | 2516 | 2236 | 88.9%
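The percentages in Table 1 follow directly from the two count columns. The short sketch below recomputes them; the dictionary and function names are illustrative and not part of the NLSY97 documentation.

```python
# Counts copied from Table 1: (eligible for interviewing, interviewed in round 1).
TABLE_1 = {
    "Total Cohort": (9907, 8984),
    "Cross-Sectional Sample": (7391, 6748),
    "Supplemental Sample": (2516, 2236),
}


def completion_rate(eligible, interviewed):
    """Return the round 1 completion rate as a percentage with one decimal."""
    return round(100 * interviewed / eligible, 1)


rates = {name: completion_rate(*counts) for name, counts in TABLE_1.items()}
for name, rate in rates.items():
    print(f"{name}: {rate}%")
```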

During the NLSY97 screening process, two additional nationally representative samples were identified to participate in the administration of the CAT-ASVAB. The first group, the Student Testing Program (STP), consisted of students who expected to be in the 10th through 12th grades in the fall of 1997. Included were many respondents who also participated in the main NLSY97 survey, as well as youths who refused to participate in or were not eligible for the NLSY97. The second sample, the Enlistment Testing Program (ETP), was a nationally representative sample of youths 18 to 23 years old as of June 1, 1997. This group provided the normative information used by the Department of Defense to determine the score distribution of military-eligible youths and to help assess the impact of these tests on minority and female military eligibility.

Cross-Sectional Sample

For the cross-sectional sample, 54,179 screening interviews were carried out among 1,149 sample segments in 100 primary sampling units (PSUs), drawn from the NORC master probability sample of the United States. The cross-sectional screening established three samples:

    1. Main NLSY97 Sample: A cross-sectional sample designed to be representative of young people living in the United States during round 1 and born January 1, 1980, through December 31, 1984. This sample is designed to maximize the statistical efficiency of samples through the several stages of sample selection (counties, enumeration districts, blocks, sample listing units). Probabilities of selection are based upon total housing units in a geographic area. Following the initial screening process, 7,327 individuals from the cross-sectional sample were designated to be interviewed in the NLSY97 survey; of those, 92.1 percent, or 6,748 respondents, completed the round 1 interview.
    2. Department of Defense Student Testing Program (STP) Sample: A nationally representative sample of students living in the United States during round 1 and born June 2, 1973, through December 31, 1984, who were in grades 9-11 in the spring or summer of 1997, were not enrolled during the spring and summer but expected to be in grades 10-12 in the fall of 1997, or were enrolled in grades 10-12 during the fall of 1997. (See the Administration of the CAT-ASVAB section of this guide for more information.) Some NLSY97 respondents were also eligible for the STP sample.
    3. Department of Defense Enlistment Testing Program (ETP) Sample: A cross-sectional sample designed to be representative of the noninstitutionalized segment of young people living in the United States during round 1 and born June 2, 1973, through June 1, 1979. 
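The birth-date windows that separate the three samples can be sketched as a simple lookup. This is a minimal illustration, assuming inclusive date ranges as stated in the list above; the names are hypothetical, and the check deliberately ignores the other eligibility conditions (grade enrollment for the STP, residence and institutional status, and race or ethnicity in the supplemental sample).

```python
from datetime import date

# Birth-date windows as stated in the text, inclusive on both ends.
BIRTH_WINDOWS = {
    "NLSY97": (date(1980, 1, 1), date(1984, 12, 31)),
    "STP": (date(1973, 6, 2), date(1984, 12, 31)),
    "ETP": (date(1973, 6, 2), date(1979, 6, 1)),
}


def age_eligible_samples(birth_date):
    """List the samples whose birth-date window contains birth_date."""
    return [
        name
        for name, (start, end) in BIRTH_WINDOWS.items()
        if start <= birth_date <= end
    ]


print(age_eligible_samples(date(1982, 5, 1)))  # falls in the NLSY97 and STP windows
print(age_eligible_samples(date(1975, 1, 1)))  # falls in the STP and ETP windows
```

The overlap between the NLSY97 and STP windows is why some NLSY97 respondents were also eligible for the STP sample.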

Supplemental Sample

Statistically efficient samples of black and Hispanic or Latino respondents were created by oversampling these minorities in 100 PSUs in NORC's national sample. For the supplemental sample, 21,112 screening interviews were conducted in 599 sample segments. The supplemental screening produced three samples:

    1. NLSY97 Black and Hispanic or Latino Oversample: A supplemental sample designed to oversample Hispanic or Latino and black respondents living in the United States during round 1 and born January 1, 1980, through December 31, 1984. Stratification specifically relevant for Hispanics or Latinos and blacks was used. While the main NLSY97 sample was selected based on housing units, the black and Hispanic or Latino oversamples were based on the number of blacks and Hispanics or Latinos living in a particular geographic area. The larger the percentage of blacks and Hispanics or Latinos living in a particular area, the greater the chance the area was selected for the oversample. Once an area was selected for the oversample, the housing units in that area were screened to ensure only blacks or only Hispanics or Latinos were given the opportunity to complete the first-round interview. After screening, 2,479 individuals from the supplemental sample were designated for interview in the NLSY97, and of these, 90.2 percent, or 2,236 respondents, completed the round 1 interview.
    2. Department of Defense STP Sample: A nationally representative sample of students, selected regardless of race and/or ethnicity, living in the United States during round 1 and born June 2, 1973, through December 31, 1984. Members of this sample are those who, depending on the time of the household screening, were in grades 9-11 in the spring or summer of 1997, were not enrolled during the spring and summer but expected to be in grades 10-12 in the fall of 1997, or were enrolled in grades 10-12 during the fall of 1997.
    3. Department of Defense ETP Black and Hispanic or Latino Oversample: A sample of black and Hispanic or Latino youths living in the United States during round 1 and born June 2, 1973, through June 1, 1979.

Data Hint

Users can identify whether each respondent was a member of the cross-sectional or supplemental sample by referring to the sample type variable (CV_SAMPLE_TYPE, R12358.00).

Screening Procedures

The screening interview was completed in 75,291 housing units. These interviews occurred in 1,748 sample segments of 147 non-overlapping PSUs located in most of the fifty states and the District of Columbia. (There are 100 PSUs in the cross-sectional sample and 100 PSUs in the oversample; however, some PSUs were selected in both samples, leaving a total of 147 non-overlapping PSUs in the NLSY97.) The screening interview was designed to elicit information allowing identification of household occupants eligible for inclusion in the NLSY97 sample. The NLSY97 screening interviews were completed within 94.1 percent of the cross-sectional and 93.1 percent of the supplemental occupied housing units selected for screening. Table 1 presents a summary of completed interviews in round 1.
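The screening totals quoted here can be reconciled with the per-sample counts given earlier in this section. The sketch below does that arithmetic; the variable names are illustrative, and the count of PSUs shared by both samples is derived by inclusion-exclusion rather than stated in the source.

```python
# Per-sample counts quoted in the cross-sectional and supplemental descriptions.
cross_sectional = {"screening_interviews": 54179, "segments": 1149, "psus": 100}
supplemental = {"screening_interviews": 21112, "segments": 599, "psus": 100}

# Total number of non-overlapping PSUs stated in the text.
distinct_psus = 147

total_interviews = (
    cross_sectional["screening_interviews"] + supplemental["screening_interviews"]
)
total_segments = cross_sectional["segments"] + supplemental["segments"]

# Inclusion-exclusion: PSUs counted in both samples.
shared_psus = cross_sectional["psus"] + supplemental["psus"] - distinct_psus

print(total_interviews, total_segments, shared_psus)
```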

Sampling procedures were developed to establish links between housing units in the sample PSUs and individuals who might be temporarily absent. As part of the screening process, household informants were asked if there were any persons for whom the housing unit was the usual place of residence but who were away from the housing unit at the time of the survey. Included in this group were college students, persons in the military, and persons in prisons or other institutions. Sampling procedures were also established for those residing in a selected housing unit whose usual place of residence was elsewhere. Table 2 lists the NLSY97 status (e.g., included in the sample, excluded, or restricted) for youths not in their usual residence at the time of the survey.

Table 2. NLSY97 Sampling Status of Youths by Housing Arrangement

Housing arrangement | Status
Exchange students | Included if the youth lived in the sample housing unit for at least six months during 1997.
Youths whose temporary residence was a group quarters structure (e.g., prisons, boarding school, college dormitories) | Included if their usual place of residence was in a selected PSU. Excluded otherwise.
Youths whose usual place of residence was not in a selected PSU, but whose temporary residence was within a PSU | Excluded.
Youths in a foreign school | Included.
Youths linked to two or more housing units | If the respondent's mother is alive and her housing unit is in a sample housing unit, the youth is linked there. Otherwise, the youth is linked to the father's housing unit. If neither the mother nor the father is living in a sample housing unit, the youth is linked to one of the sample housing units at random.
Youths who cannot be linked to any other housing unit | Included if the youth is residing at a sample housing unit when the screening interview is conducted.
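The linking rule in Table 2 for youths connected to two or more housing units can be read as a short decision procedure. The sketch below is one interpretation, not the screening instrument's actual logic: the function and parameter names are hypothetical, and it assumes the father's unit is used only when it is itself a sample housing unit.

```python
import random


def link_housing_unit(candidate_units, mother_unit=None, mother_alive=False,
                      father_unit=None, rng=random):
    """Choose the sample housing unit for a youth linked to several units.

    candidate_units: sample housing units the youth is linked to.
    rng: any object with a choice() method, such as random.Random(seed).
    """
    candidates = list(candidate_units)
    if not candidates:
        raise ValueError("the youth must be linked to at least one sample unit")
    # Rule 1: the mother's unit, if she is alive and lives in a sample unit.
    if mother_alive and mother_unit in candidates:
        return mother_unit
    # Rule 2: otherwise the father's unit, if it is a sample unit.
    if father_unit in candidates:
        return father_unit
    # Rule 3: neither parent lives in a sample unit, so pick one at random.
    return rng.choice(candidates)


print(link_housing_unit(["A", "B"], mother_unit="A", mother_alive=True, father_unit="B"))
print(link_housing_unit(["A", "B"], mother_unit="A", mother_alive=False, father_unit="B"))
```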

Multiple Respondent Households

In the NLSY97 cohort, 8,984 respondents originated from 6,819 unique households. Because the sample design selected all household residents in the appropriate age range, 1,862 households included more than one NLSY97 respondent. Table 3 lists the numbers of respondents living in multiple respondent households during the initial survey round. The most common relationship between multiple respondents living in the same household during the first round was that of siblings.

Table 3. Round 1 Distribution of NLSY97 Respondents by Household Type

Type | Respondents | Households
1 Respondent | 4957 | 4957
Total Multiple Respondents | 4027 | 1862
   2 Respondents | 3192 | 1596
   3 Respondents | 705 | 235
   4 Respondents | 100 | 25
   5 Respondents | 30 | 6
Total | 8984 | 6819

Note: Table 3 is based on the household ID code (QNAME=SIDCODE, reference number=R11930.).
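A tabulation like Table 3 can be produced by counting respondents per household ID. The sketch below shows the idea on made-up IDs; the function name and the toy data are assumptions, and real use would read the SIDCODE value for each respondent from the data file.

```python
from collections import Counter


def household_distribution(household_ids):
    """Tabulate respondents and households by number of respondents per household.

    household_ids: one household ID per respondent (for example, SIDCODE values).
    """
    respondents_per_household = Counter(household_ids)
    households_by_size = Counter(respondents_per_household.values())
    return {
        size: {"respondents": size * count, "households": count}
        for size, count in sorted(households_by_size.items())
    }


# Toy data (hypothetical): seven respondents in four households.
toy_ids = ["H1", "H2", "H2", "H3", "H3", "H3", "H4"]
table = household_distribution(toy_ids)
for size, row in table.items():
    print(size, row["respondents"], row["households"])
```

In each row the respondent count equals the household size times the number of households, which is the relationship visible in Table 3 (for example, 1,596 two-respondent households account for 3,192 respondents).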

Siblings in the NLSY97

The sample design, which selected every eligible person connected to the housing unit, generated a sample of siblings living in the same housing unit and satisfying the NLSY97 age restrictions. However, the NLSY97 samples do not contain nationally representative samples of siblings of all ages and living arrangements. Care should be used in generalizing from the findings of sibling studies based on the NLSY97. Table 4 shows the numbers of sibling groups in the NLSY97.

Table 4. Round 1 Distribution of NLSY97 Sibling Groups

Type | Respondents
No Siblings | 5129
Total Multiple Siblings | 3855
   2 Siblings | 3134
   3 Siblings | 627
   4 Siblings | 84
   5 Siblings | 10
Total | 8984

Note: Table 4 is based on the household ID code (SIDCODE) and the relationship variables from the round 1 household roster (HHI2_RELx.xx). Siblings include biological, adoptive, half-, and step- relationships but not foster relationships.

Other technical information on the sample assignment process can be found in (1) the Field Interviewer Reference Manual, which includes a copy of the screening instrument, and (2) the Technical Sampling Report, which describes the NLSY97 sample selection procedures for both subsamples.