NLSY79

Standard Errors & Design Effects

This section contains information on standard errors and design effects for the NLSY79 sample, briefly discussing how to use these two statistical factors. It then includes tables for the first round and for 1996 through 2022. Users interested in the intervening years should review the Technical Sampling Report and Technical Sampling Report Addendum.

Standard errors have been explicitly computed for a number of statistics based upon the entire NLSY79 sample (total, civilian, and military) and a number of sex or race subclasses. Standard errors for other statistics (defined over the entire sample or the subclasses) may be approximated with use of the DEFT factors given in the linked tables. Users who examine the tables will note that CHRR has calculated standard errors for different variables over time.

Approximate standard errors: Percentages

The following formula approximates a standard error of a percentage:

se(P) ≈ DEFT × √[P(100 - P)] / √n

where
se(P) = the approximate standard error of the percentage P
P = the sample percentage (ranging from 0 to 100)
n = the actual unweighted sample size for the demographic subclass from which the percentage was developed
DEFT = the appropriate DEFT factor for the particular demographic subclass and sample type from which the percentage was developed

For example, for 1996 the appropriate DEFT factor for estimating the standard error of the percentage of Hispanic or Latino males who were high school dropouts is 1.17744 (see the proportions column, row seven of the table "Deft factors for round 17, 1996"). Assuming the sample percentage (P) equals 22.19 and the unweighted sample size (n) is 946, then:

se(P) ≈ 1.17744 × √[22.19 × (100 - 22.19)] / √946 ≈ 1.5907
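The percentage formula can be sketched in Python as follows. This is a minimal illustration, not CHRR's own code; the function name `approx_se_percentage` and its argument names are assumptions made for this example, and the inputs are the figures from the Hispanic or Latino male example.

```python
import math

def approx_se_percentage(p, n, deft):
    """Approximate standard error of a percentage p on the 0-100 scale.

    p    -- sample percentage (0 to 100)
    n    -- unweighted sample size of the demographic subclass
    deft -- DEFT factor taken from the proportions column of the tables
    """
    if not 0 <= p <= 100:
        raise ValueError("p must be a percentage between 0 and 100")
    if n <= 0:
        raise ValueError("n must be positive")
    # Simple-random-sample standard error, inflated by the DEFT factor.
    return deft * math.sqrt(p * (100 - p)) / math.sqrt(n)

# Figures from the 1996 example in the text.
se_p = approx_se_percentage(p=22.19, n=946, deft=1.17744)
print(round(se_p, 4))
```

The result is in percentage points, so it should be read against P = 22.19 rather than against a proportion of 0.2219.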

To approximate the standard error of the corresponding projected population total (NP/100), calculate:

se(NP/100) ≈ N × [se(P) / 100]

where
se(NP/100) = the approximate standard error of the projected population total corresponding to a percentage P within a particular demographic subclass and sample type
N = the appropriate projected total population base for the particular demographic subclass and sample type

For example, if the projected total population base for Hispanic or Latino males is 1,030,861, the projected number of civilian Hispanic or Latino male high school dropouts is equal to NP/100 or 1,030,861 * 22.19/100 = 228,748. Thus, the approximate standard error for the total number of Hispanic or Latino male high school dropouts is:

se(NP/100) ≈ 1,030,861 × (1.5907 / 100) ≈ 16,397.9

Note: 1.5907 came from the previous calculation.
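The population-total step can be sketched the same way. The function name `approx_se_population_total` and the variable names are hypothetical; the numbers are those of the example, with se(P) taken as the rounded value 1.5907.

```python
def approx_se_population_total(se_p, population_base):
    """Approximate standard error of the projected total N * P / 100.

    se_p            -- approximate standard error of the percentage P
    population_base -- projected total population base N for the subclass
    """
    return population_base * (se_p / 100.0)

# Figures from the example in the text.
N = 1_030_861   # projected population base, Hispanic or Latino males
P = 22.19       # sample percentage of high school dropouts

projected_total = N * P / 100
se_total = approx_se_population_total(se_p=1.5907, population_base=N)
print(round(projected_total), round(se_total, 1))
```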

Approximate standard errors: Means

One can compute approximate standard errors for means as follows:

se(X) ≈ DEFT × √(s² / n)

where
se(X) = the approximate standard error of the mean
DEFT = the appropriate DEFT factor for the particular demographic subclass and sample type from which the mean was developed
s² = the weighted element variance computed for the demographic subclass and sample type from which the mean was developed
n = the unweighted sample size for the particular mean

For example, for 1979 the DEFT factor for all Hispanics or Latinos is 1.45699 (see the means column, row four of the table "Deft factors for round 1, 1979"). To approximate the standard error of the mean number of years of education completed by this subclass, where the weighted element variance (s²) is 0.72955 and the sample size (n) is 77, compute:

se(X) ≈ 1.45699 × √(0.72955 / 77) ≈ 0.1418
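The formula for means can be sketched as follows. The function name `approx_se_mean` and its argument names are assumptions for illustration; the inputs are the figures from the 1979 example.

```python
import math

def approx_se_mean(variance, n, deft):
    """Approximate standard error of a mean.

    variance -- weighted element variance (s squared) for the subclass
    n        -- unweighted sample size for the mean
    deft     -- DEFT factor taken from the means column of the tables
    """
    if variance < 0:
        raise ValueError("variance must be non-negative")
    if n <= 0:
        raise ValueError("n must be positive")
    # Simple-random-sample standard error of the mean, inflated by DEFT.
    return deft * math.sqrt(variance / n)

# Figures from the 1979 example in the text.
se_mean = approx_se_mean(variance=0.72955, n=77, deft=1.45699)
print(round(se_mean, 4))
```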

Design effects

Because the samples are multi-stage, stratified random samples rather than simple random samples, respondents tend to come in geographic clusters, and persons within a cluster tend to be alike in a variety of ways for a variety of reasons. (For more information on the sampling and screening process, see the Sample Design & Screening Process section of this guide.) For example, there may be cultural differences by locality or ecological differences in labor market conditions. Depending upon the degree of this homogeneity, the conventionally computed standard errors for the variables, which assume a simple random sample, may be too small. However, by controlling the rate at which particular strata are sampled, multi-stage, stratified random samples can improve upon simple random samples. The ratio of the correct standard error to the standard error computed under the assumption of a simple random sample is known as the design effect; the DEFT factors used in the formulas above are ratios of this kind. The technical sampling report for the NLSY79 (Frankel, Williams, and Spencer 1983) and its addendum (CHRR) provide design effects for the various strata.

No single design effect can be constructed that applies broadly to regression analysis. To illustrate the approximate size of design effects in regression analysis, a regression of rate of pay for the CPS job in 1979 was estimated using race, sex, marital status, and education as explanatory variables. Assuming each of the roughly 200 PSUs has the same number of respondents in the sample of 5,724 persons with observed wages, the design effect was calculated to be 1.52; that is, the true standard errors were larger than the naively computed standard errors by a factor of 1.52. When this exercise was repeated for rate of pay on the CPS job in 1986, the design effect had fallen to 1.37.
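A rough sense of how clustering translates into a factor like 1.52 can be had from Kish's equal-cluster-size approximation, DEFF = 1 + (m - 1)ρ, where m is the number of respondents per cluster and ρ is the intra-cluster correlation. The sketch below is an illustration only: the text does not say which formula produced the 1.52 figure, and the function names, the treatment of 1.52 as a ratio of standard errors (so that DEFF is its square), and the use of exactly 200 PSUs are all assumptions.

```python
import math

def deft_from_icc(rho, cluster_size):
    """DEFT implied by intra-cluster correlation rho (Kish approximation)."""
    deff = 1.0 + (cluster_size - 1.0) * rho  # ratio of variances
    return math.sqrt(deff)                   # ratio of standard errors

def icc_from_deft(deft, cluster_size):
    """Invert the approximation: rho implied by a given DEFT."""
    return (deft ** 2 - 1.0) / (cluster_size - 1.0)

# Assumed inputs based on the 1979 example in the text: 5,724 respondents
# spread evenly over roughly 200 PSUs, and a factor of 1.52 on standard errors.
respondents_per_psu = 5724 / 200
implied_rho = icc_from_deft(1.52, respondents_per_psu)
print(round(implied_rho, 3))
```

Under these assumptions, a within-PSU correlation of roughly 0.05 in the regression errors is enough to inflate standard errors by about half.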

This reduction reflects the fact that mobility tends to mix the respondents more uniformly through the country, reducing the clustering of the sample. Many of the persons who started out in the same PSU will have moved to different areas and, hence, no longer share unobservable labor market conditions. These shared unobservable labor market conditions are likely responsible for the spatial correlation of the error terms, which generates design effects. Thus, another advantage of longitudinal data is the lessening of design effects over time.

By examining the Geocode data for the NLSY79, it is possible to control for some of the environmental factors generating design effects or, if desired, to compute design effects based upon county or metropolitan area clusters which continue to be present. To facilitate study of design effects, scrambled PSU codes from the 1979 survey are available to persons with authorized access to the NLSY79 Geocode data.

The Technical Sampling Report and Technical Sampling Report Addendum also provide information on design effects.

The DEFT and standard error tables follow.

Table. Deft factors for round 1, 1979

Demographic Group

Proportions Means

All Youth

1.72547 1.71282

Males

1.46605 1.56808

Females

1.58029 1.49720

Hispanics or Latinos

1.44342 1.45699

Blacks

1.35303 1.43730

Non-black/non-Hispanics

1.58686 1.56996

Hispanic or Latino Males

1.24321 1.22329

Hispanic or Latino Females

1.40353 1.25095

Black Males

1.19457 1.21378

Black Females

1.24877 1.25243

Non-black/non-Hispanic Males

1.33775 1.45962

Non-black/non-Hispanic Females

1.46889 1.37581
Table. Deft factors for round 17, 1996

Demographic Group

Proportions Means

All Youth

1.35848 1.967232

Males

1.28523 1.667333

Females

1.24536 1.621727

Hispanics or Latinos

1.28275 1.584298

Blacks

1.19735 1.423025

Non-black/non-Hispanics

1.19087 1.713184

Hispanic or Latino Males

1.17744 1.407125

Hispanic or Latino Females

1.13217 1.264911

Black Males

1.16541 1.174734

Black Females

1.13258 1.319091

Non-black/non-Hispanic Males

1.13217 1.456022

Non-black/non-Hispanic Females

1.09545 1.405347
Table. Deft factors for round 18, 1998

Demographic Group

Proportions Means

All Youth

1.38301 1.96469

Males

1.30836 1.66433

Females

1.28311 1.60000

Hispanics or Latinos

1.21917 1.52807

Blacks

1.19164 1.40890

Non-black/non-Hispanics

1.17937 1.67481

Hispanic or Latino Males

1.19248 1.37659

Hispanic or Latino Females

1.13418 1.25100

Black Males

1.14336 1.12694

Black Females

1.12088 1.31529

Non-black/non-Hispanic Males

1.18195 1.43353

Non-black/non-Hispanic Females

1.11028 1.37133
Table. Deft factors for round 19, 2000

Demographic Group

Proportions Means

All Youth

1.36423 1.90919

Males

1.26007 1.61864

Females

1.21244 1.58588

Hispanics or Latinos

1.24544 1.48492

Blacks

1.19954 1.42127

Non-black/non-Hispanics

1.20052 1.62327

Hispanic or Latino Males

1.19722 1.31909

Hispanic or Latino Females

1.09240 1.22474

Black Males

1.20277 1.18322

Black Females

1.08282 1.34907

Non-black/non-Hispanic Males

1.12750 1.39462

Non-black/non-Hispanic Females

1.13908 1.34907
Table. Deft factors for round 20, 2002

Demographic Group

Proportions Means

All Youth

1.34578 1.82757

Males

1.29701 1.58430

Females

1.18181 1.52807

Hispanics or Latinos

1.24097 1.47986

Blacks

1.20692 1.35647

Non-black/non-Hispanics

1.15085 1.56844

Hispanic or Latino Males

1.12450 1.28841

Hispanic or Latino Females

1.09479 1.21861

Black Males

1.20830 1.12694

Black Females

1.18743 1.33604

Non-black/non-Hispanic Males

1.20468 1.37659

Non-black/non-Hispanic Females

1.06829 1.30958

Important information: Deft tables for rounds 21 through the current public release

Users are cautioned that the figures in the proportions column for the last six categories are becoming much less relevant over time. The proportions DEFT column is based on education, training, marriage, and employment variables. Over time, narrow categories such as Black Females have come to include only a few respondents in school or training, which causes the DEFT factors to fluctuate from survey to survey. Broader categories, such as "All Youth," "Males," and "Females," give more reliable factors.

Table. Deft factors for round 21, 2004

Demographic Group

Proportions Means

All Youth

1.38789 1.83712

Males

1.27377 1.55563

Females

1.23592 1.55081

Hispanics or Latinos

1.30336 1.46969

Blacks

1.14782 1.35831

Non-black/non-Hispanics

1.18163 1.57003

Hispanic or Latino Males

1.27083 1.31149

Hispanic or Latino Females

1.12750 1.19164

Black Males

1.14455 1.10454

Black Females

1.02896 1.37113

Non-black/non-Hispanic Males

1.09373 1.35647

Non-black/non-Hispanic Females

1.08224 1.32098
Table. Deft factors for round 22, 2006

Demographic Group

Proportions Means

All Youth

1.35881 1.81246

Males

1.23472 1.55563

Females

1.25553 1.52315

Hispanics or Latinos

1.13710 1.48661

Blacks

1.15994 1.33041

Non-black/non-Hispanics

1.14455 1.53460

Hispanic or Latino Males

1.15195 1.31719

Hispanic or Latino Females

1.00995 1.23085

Black Males

1.15247 1.09772

Black Females

1.11221 1.35647

Non-black/non-Hispanic Males

1.09636 1.32288

Non-black/non-Hispanic Females

1.08082 1.30192
Table. Deft factors for round 23, 2008

Demographic Group

Proportions Means

All Youth

1.31106 1.83712

Males

1.25599 1.60468

Females

1.22474 1.52315

Hispanics or Latinos

1.13235 1.43353

Blacks

1.16726 1.38203

Non-black/non-Hispanics

1.10855 1.56365

Hispanic or Latino Males

1.14837 1.27083

Hispanic or Latino Females

1.03870 1.18322

Black Males

1.14182 1.12916

Black Females

1.11467 1.34907

Non-black/non-Hispanic Males

1.09030 1.38564

Non-black/non-Hispanic Females

1.09829 1.28841
Table. Deft factors for round 24, 2010

Demographic Group

Proportions Means

All Youth

1.34024 1.80278

Males

1.26293 1.58745

Females

1.23288 1.48829

Hispanics or Latinos

1.19284 1.46116

Blacks

1.21295 1.36015

Non-black/non-Hispanics

1.12639 1.54434

Hispanic or Latino Males

1.19284 1.28452

Hispanic or Latino Females

1.11867 1.20208

Black Males

1.16458 1.10905

Black Females

1.13137 1.34907

Non-black/non-Hispanic Males

1.07877 1.37659

Non-black/non-Hispanic Females

1.03983 1.26886
Table. Deft factors for round 25, 2012

Demographic Group

Proportions Means

All Youth

1.34604 1.77682

Males

1.26681 1.55921

Females

1.24255 1.48757

Hispanics or Latinos

1.21171 1.46095

Blacks

1.19992 1.35592

Non-black/non-Hispanics

1.17951 1.52438

Hispanic or Latino Males

1.16338 1.24213

Hispanic or Latino Females

1.05880 1.20750

Black Males

1.11229 1.16998

Black Females

1.15019 1.32479

Non-black/non-Hispanic Males

1.14991 1.36160

Non-black/non-Hispanic Females

1.12411 1.25952
Table. Deft factors for round 26, 2014

Demographic Group

Proportions Means

All Youth

1.33370 1.77496

Males

1.25238 1.56764

Females

1.19779 1.50041

Hispanics or Latinos

1.15607 1.41956

Blacks

1.13520 1.38628

Non-black/non-Hispanics

1.18624 1.50758

Hispanic or Latino Males

1.15649 1.25180

Hispanic or Latino Females

1.06414 1.20324

Black Males

1.12620 1.19193

Black Females

1.00051 1.34394

Non-black/non-Hispanic Males

1.15447 1.35138

Non-black/non-Hispanic Females

1.18466 1.26346
Table. Deft factors for round 27, 2016

Demographic Group

Proportions Means

All Youth

1.40369 1.73651

Males

1.36746 1.53267

Females

1.23931 1.47176

Hispanics or Latinos

1.28005 1.44627

Blacks

1.10852 1.34987

Non-black/non-Hispanics

1.26546 1.47732

Hispanic or Latino Males

1.19194 1.22472

Hispanic or Latino Females

1.16081 1.23085

Black Males

1.10918 1.15997

Black Females

1.04381 1.30468

Non-black/non-Hispanic Males

1.21767 1.32061

Non-black/non-Hispanic Females

1.17469 1.24867
Table. Deft factors for round 28, 2018

Demographic Group

Proportions Means

All Youth

1.36769 1.72280

Males

1.29963 1.57090

Females

1.18347 1.46229

Hispanics or Latinos

1.23085 1.43839

Blacks

1.06561 1.30877

Non-black/non-Hispanics

1.21787 1.46098

Hispanic or Latino Males

1.12575 1.25443

Hispanic or Latino Females

1.10262 1.19304

Black Males

1.05849 1.15098

Black Females

0.97723 1.31684

Non-black/non-Hispanic Males

1.12186 1.35481

Non-black/non-Hispanic Females

1.11219 1.22446
Table. Deft factors for round 29, 2020

Demographic Group

Proportions Means

All Youth

1.36387 1.72145

Males

1.35466 1.56630

Females

1.12285 1.12285

Hispanics or Latinos

1.15142 1.15142

Blacks

1.05324 1.28861

Non-black/non-Hispanics

1.22780 1.45744

Hispanic or Latino Males

1.00312 1.22750

Hispanic or Latino Females

1.02489 1.21003

Black Males

0.95852 1.09251

Black Females

0.96780 1.34382

Non-black/non-Hispanic Males

1.16393 1.36001

Non-black/non-Hispanic Females

1.06213 1.19797
Table. Deft factors for round 30, 2022

Demographic Group

Proportions Means

All Youth

1.11022 1.71275

Males

1.10061 1.57639

Females

0.93109 1.39730

Hispanics or Latinos

1.03198 1.42323

Blacks

0.94075 1.31054

Non-black/non-Hispanics

0.99426 1.44663

Hispanic or Latino Males

0.96821 1.25049

Hispanic or Latino Females

0.93765 1.18866

Black Males

0.97012 1.17667

Black Females

0.82893 1.31188

Non-black/non-Hispanic Males

1.01556 1.35862

Non-black/non-Hispanic Females

0.84741 1.17163


Table. Standard errors for round 1, 1979
Description | All | Male | Female | Hispanic or Latino | Black | Non-black, non-Hispanic | Male Hispanic or Latino | Female Hispanic or Latino | Male Black | Female Black | Male Non-black, non-Hispanic | Female Non-black, non-Hispanic

Proportion High School Dropouts

0.00471 0.00627 0.00545 0.01385 0.00835 0.00527 0.01744 0.01814 0.01232 0.00928 0.00710 0.00619

Proportion Attending High School

0.00735 0.00893 0.01006 0.01554 0.01151 0.00904 0.02176 0.02146 0.01460 0.01628 0.01085 0.01233

Proportion Attending College

0.00597 0.00729 0.00778 0.01037 0.00784 0.00710 0.01230 0.01460 0.00919 0.01119 0.00862 0.00947

Proportion High School Grad

0.00658 0.00776 0.00905 0.01277 0.01033 0.00785 0.01440 0.01957 0.01217 0.01448 0.00926 0.01094

Mean Years of School Completed

0.02900 0.04000 0.03800 0.08200 0.05700 0.03400 0.10000 0.10500 0.06100 0.07400 0.04600 0.04400

Mean Years of School Expected

0.04600 0.05900 0.04700 0.10800 0.06400 0.05500 0.12500 0.11700 0.07900 0.07900 0.07100 0.05500

Proportion Living in South

0.02286 0.02353 0.02324 0.05641 0.04264 0.02544 0.04973 0.06060 0.04555 0.04084 0.02610 0.02601

Mean Numbers of Children Expected

0.02400 0.02700 0.03200 0.05800 0.04600 0.02800 0.06500 0.07000 0.05600 0.05500 0.03100 0.03700

Proportion Married

0.00454 0.00365 0.00686 0.01023 0.00533 0.00570 0.00923 0.01646 0.00440 0.00884 0.00448 0.00855
Table. Standard errors for round 17, 1996
Description | All | Male | Female | Hispanic or Latino | Black | Non-black, non-Hispanic | Male Hispanic or Latino | Female Hispanic or Latino | Male Black | Female Black | Male Non-black, non-Hispanic | Female Non-black, non-Hispanic

Proportion Not on Active Duty

0.001 0.003 0.001 0.005 0.004 0.002 0.009 0.001 0.007 0.003 0.003 0.001

Proportion High School Dropouts

0.006 0.008 0.006 0.014 0.009 0.007 0.018 0.016 0.012 0.010 0.009 0.007

Proportion in High School or Less

0.000 0.001 0.001 0.002 0.001 0.001 0.002 0.002 0.001 0.002 0.001 0.000

Proportion Attending College

0.003 0.003 0.005 0.006 0.005 0.004 0.008 0.009 0.005 0.007 0.004 0.005

Proportion High School Grad

0.006 0.007 0.006 0.015 0.009 0.007 0.018 0.016 0.012 0.010 0.009 0.007

Proportion Living in South

0.034 0.034 0.036 0.052 0.046 0.039 0.049 0.059 0.046 0.048 0.038 0.041

Proportion Currently Married

0.007 0.010 0.010 0.016 0.013 0.008 0.020 0.021 0.018 0.017 0.011 0.011

Proportion Employed at Present

0.006 0.007 0.009 0.015 0.009 0.007 0.017 0.020 0.014 0.013 0.007 0.010

Proportion Unemployed

0.002 0.003 0.003 0.006 0.005 0.003 0.007 0.009 0.008 0.008 0.004 0.004

Proportion in Labor Force

0.005 0.005 0.008 0.013 0.008 0.006 0.015 0.018 0.012 0.012 0.006 0.010

Proportion Gov't Training

0.001 0.001 0.001 0.003 0.002 0.001 0.003 0.003 0.002 0.004 0.001 0.001

Average Number of Children

0.023 0.027 0.030 0.054 0.035 0.028 0.067 0.065 0.040 0.050 0.033 0.036

Average Highest Grade Completed

0.060 0.074 0.063 0.109 0.065 0.073 0.137 0.119 0.074 0.081 0.091 0.077

Proportion Currently Enrolled

0.003 0.004 0.005 0.006 0.005 0.004 0.008 0.008 0.005 0.007 0.004 0.006
Table. Standard errors for round 18, 1998
Description | All | Male | Female | Hispanic or Latino | Black | Non-black, non-Hispanic | Male Hispanic or Latino | Female Hispanic or Latino | Male Black | Female Black | Male Non-black, non-Hispanic | Female Non-black, non-Hispanic

Proportion Not on Active Duty

0.001 0.003 0.001 0.005 0.003 0.002 0.008 0.002 0.006 0.003 0.003 0.001

Proportion High School Dropouts

0.005 0.007 0.006 0.014 0.009 0.006 0.017 0.016 0.012 0.010 0.009 0.007

Proportion in High School or Less

0.000 0.000 0.001 0.000 0.001 0.000 0.000 0.001 0.001 0.001 0.000 0.001

Proportion Attending College

0.003 0.003 0.005 0.005 0.005 0.003 0.005 0.008 0.005 0.007 0.004 0.005

Proportion High School Grad

0.005 0.007 0.006 0.014 0.009 0.006 0.017 0.016 0.012 0.010 0.009 0.007

Proportion Living in South

0.035 0.034 0.037 0.051 0.045 0.039 0.047 0.058 0.044 0.047 0.039 0.041

Proportion Currently Married

0.008 0.010 0.011 0.015 0.012 0.008 0.021 0.021 0.018 0.016 0.011 0.010

Proportion Employed at Present

0.006 0.007 0.009 0.014 0.009 0.007 0.017 0.020 0.012 0.014 0.008 0.011

Proportion Unemployed

0.002 0.003 0.003 0.005 0.005 0.002 0.007 0.008 0.007 0.007 0.003 0.003

Proportion in Labor Force

0.005 0.006 0.009 0.013 0.008 0.006 0.016 0.019 0.011 0.011 0.006 0.011

Proportion Gov't Training

0.001 0.001 0.001 0.002 0.002 0.001 0.003 0.004 0.003 0.004 0.001 0.001

Average Number of Children

0.024 0.028 0.030 0.050 0.036 0.028 0.061 0.065 0.042 0.050 0.033 0.035

Average Highest Grade Completed

0.061 0.077 0.063 0.114 0.066 0.073 0.147 0.121 0.074 0.082 0.09. 0.074

Proportion Currently Enrolled

0.003 0.003 0.004 0.005 0.005 0.003 0.005 0.008 0.005 0.007 0.004 0.005
Table. Standard errors for round 19, 2000
Description | All | Male | Female | Hispanic or Latino | Black | Non-black, non-Hispanic | Male Hispanic or Latino | Female Hispanic or Latino | Male Black | Female Black | Male Non-black, non-Hispanic | Female Non-black, non-Hispanic

Proportion Not on Active Duty

0.001 0.002 0.000 0.003 0.003 0.001 0.006 0.001 0.005 0.002 0.003 0.000

Proportion High School Dropouts

0.005 0.007 0.006 0.014 0.009 0.006 0.017 0.015 0.013 0.010 0.009 0.006

Proportion in High School or Less

0.000 0.000 0.000 0.001 0.001 0.000 0.001 0.002 0.002 0.000 0.000 0.000

Proportion Attending College

0.003 0.003 0.004 0.006 0.004 0.003 0.008 0.009 0.004 0.007 0.003 0.005

Proportion High School Grad

0.005 0.007 0.006 0.014 0.009 0.006 0.017 0.015 0.013 0.010 0.009 0.006

Proportion Living in South

0.035 0.034 0.037 0.052 0.043 0.039 0.049 0.059 0.044 0.046 0.038 0.041

Proportion Currently Married

0.008 0.010 0.010 0.014 0.012 0.008 0.022 0.021 0.018 0.015 0.011 0.010

Proportion Employed at Present

0.006 0.006 0.009 0.012 0.009 0.007 0.014 0.018 0.014 0.012 0.007 0.010

Proportion Gov't Training

0.001 0.001 0.001 0.003 0.002 0.001 0.003 0.004 0.003 0.003 0.001 0.001

Average Number of Children

0.024 0.029 0.030 0.048 0.037 0.027 0.061 0.064 0.046 0.051 0.034 0.035

Average Highest Grade Completed

0.061 0.076 0.065 0.114 0.069 0.074 0.146 0.118 0.078 0.089 0.092 0.078

Proportion Currently Enrolled

0.003 0.003 0.004 0.006 0.004 0.003 0.008 0.009 0.005 0.007 0.003 0.005

R19 table note: Users are cautioned that by round 17 cohort changes have made some categories much less relevant. In particular, the extremely small subsample sizes for "Proportion government training participant" and "Proportion in high school or less" make these categories statistically suspect. They have been kept in the table for historical continuity.

Table. Standard errors for round 20, 2002
Description | All | Male | Female | Hispanic or Latino | Black | Non-black, non-Hispanic | Male Hispanic or Latino | Female Hispanic or Latino | Male Black | Female Black | Male Non-black, non-Hispanic | Female Non-black, non-Hispanic

Proportion Not on Active Duty

0.001 0.002 0.000 0.002 0.002 0.001 0.004 0.000 0.004 0.002 0.003 0.000

Proportion High School Dropouts

0.005 0.007 0.005 0.015 0.008 0.006 0.018 0.016 0.011 0.010 0.009 0.006

Proportion in High School or Less

0.000 0.000 0.000 0.000 0.001 0.000 0.000 0.000 0.001 0.001 0.001 0.000

Proportion Attending College

0.002 0.003 0.004 0.004 0.004 0.002 0.005 0.006 0.005 0.006 0.003 0.004

Proportion High School Grad

0.005 0.007 0.005 0.015 0.008 0.006 0.018 0.016 0.011 0.010 0.009 0.006

Proportion Living in South

0.035 0.034 0.036 0.053 0.042 0.039 0.050 0.060 0.043 0.045 0.039 0.041

Proportion Currently Married

0.009 0.010 0.011 0.015 0.013 0.009 0.023 0.022 0.018 0.015 0.011 0.012

Proportion Employed at Present

0.007 0.007 0.009 0.012 0.011 0.008 0.016 0.015 0.016 0.014 0.008 0.011

Proportion Gov't Training

0.002 0.002 0.002 0.004 0.004 0.002 0.006 0.006 0.006 0.006 0.002 0.002

Average Number of Children

0.023 0.028 0.028 0.051 0.037 0.026 0.062 0.067 0.048 0.053 0.034 0.034

Average Highest Grade Completed

0.061 0.077 0.065 0.120 0.066 0.074 0.150 0.125 0.073 0.091 0.094 0.078

Proportion Currently Enrolled

0.002 0.003 0.003 0.004 0.004 0.002 0.005 0.006 0.005 0.006 0.003 0.004

R20 table note: Users are cautioned that by round 17 cohort changes have made some categories much less relevant. In particular, the extremely small sample sizes for "Proportion government training participant" and "Proportion in high school or less" make these categories statistically suspect. They have been kept in the table for historical continuity.

Table. Standard errors for round 21, 2004
Description | All | Male | Female | Hispanic or Latino | Black | Non-black, non-Hispanic | Male Hispanic or Latino | Female Hispanic or Latino | Male Black | Female Black | Male Non-black, non-Hispanic | Female Non-black, non-Hispanic

Proportion Not on Active Duty

0.001 0.002 0.000 0.002 0.002 0.001 0.004 0.001 0.003 0.002 0.002 0.000

Proportion High School Dropouts

0.005 0.007 0.005 0.014 0.009 0.006 0.019 0.015 0.013 0.010 0.009 0.006

Proportion in High School or Less

0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000

Proportion Attending College

0.002 0.002 0.003 0.006 0.003 0.003 0.006 0.009 0.004 0.006 0.002 0.004

Proportion High School Grad

0.005 0.007 0.005 0.014 0.009 0.006 0.019 0.015 0.012 0.010 0.009 0.006

Proportion Living in South

0.034 0.034 0.036 0.053 0.044 0.039 0.051 0.059 0.044 0.045 0.039 0.041

Proportion Currently Married

0.008 0.010 0.011 0.014 0.012 0.008 0.021 0.020 0.018 0.014 0.010 0.012

Proportion Employed at Present

0.007 0.007 0.010 0.014 0.009 0.008 0.018 0.018 0.012 0.013 0.008 0.012

Proportion Gov't Training

0.001 0.002 0.002 0.003 0.003 0.001 0.003 0.006 0.004 0.003 0.002 0.002

Average Number of Children

0.024 0.029 0.031 0.053 0.037 0.028 0.069 0.065 0.049 0.051 0.035 0.036

Average Highest Grade Completed

0.061 0.076 0.065 0.115 0.069 0.074 0.149 0.119 0.074 0.096 0.093 0.077

Proportion Currently Enrolled

0.002 0.002 0.003 0.006 0.003 0.003 0.006 0.009 0.004 0.006 0.002 0.004

R21 table note: Users are cautioned that cohort changes over time have made some categories much less relevant. In particular, the extremely small sample sizes for education related variables such as "Proportion in high school or less," "Proportion government training participant," "Proportion currently enrolled," and "Proportion attending college" make these categories statistically suspect. They have been kept in the table for historical continuity.

Table. Standard errors for round 22, 2006
Description | All | Male | Female | Hispanic or Latino | Black | Non-black, non-Hispanic | Male Hispanic or Latino | Female Hispanic or Latino | Male Black | Female Black | Male Non-black, non-Hispanic | Female Non-black, non-Hispanic

Proportion Not on Active Duty

0.001 0.001 0.000 0.001 0.001 0.001 0.002 0.001 0.003 0.001 0.002 0.000

Proportion High School Dropouts

0.005 0.007 0.005 0.014 0.008 0.005 0.018 0.016 0.012 0.009 0.008 0.006

Proportion in High School or Less

0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000

Proportion Attending College

0.002 0.002 0.003 0.003 0.004 0.002 0.003 0.005 0.005 0.006 0.002 0.004

Proportion High School Grad

0.005 0.007 0.005 0.014 0.008 0.005 0.018 0.016 0.012 0.009 0.008 0.006

Proportion Living in South

0.034 0.034 0.036 0.052 0.043 0.039 0.048 0.059 0.043 0.046 0.039 0.041

Proportion Currently Married

0.009 0.010 0.012 0.014 0.012 0.009 0.022 0.018 0.016 0.015 0.011 0.012

Proportion Employed at Present

0.007 0.007 0.010 0.014 0.010 0.008 0.020 0.017 0.014 0.015 0.008 0.012

Proportion Gov't Training

0.001 0.002 0.002 0.002 0.003 0.001 0.002 0.004 0.004 0.005 0.002 0.002

Average Number of Children

0.023 0.029 0.030 0.055 0.037 0.027 0.069 0.068 0.048 0.052 0.034 0.035

Average Highest Grade Completed

0.061 0.076 0.065 0.114 0.067 0.074 0.145 0.126 0.072 0.096 0.093 0.078

Proportion Currently Enrolled

0.002 0.002 0.003 0.003 0.004 0.002 0.003 0.005 0.005 0.006 0.002 0.004

R22 table note: Users are cautioned that cohort changes over time have made some categories much less relevant. In particular, the extremely small sample sizes for education related variables such as "Proportion in high school or less," "Proportion government training participant," "Proportion currently enrolled," and "Proportion attending college" make these categories statistically suspect. They have been kept in the table for historical continuity.

Table. Standard errors for round 23, 2008
Description | All | Male | Female | Hispanic or Latino | Black | Non-black, non-Hispanic | Male Hispanic or Latino | Female Hispanic or Latino | Male Black | Female Black | Male Non-black, non-Hispanic | Female Non-black, non-Hispanic

Proportion Not on Active Duty

0.001 0.001 0.000 0.001 0.001 0.001 0.001 0.001 0.002 0.001 0.001 0.000

Proportion High School Dropouts

0.005 0.007 0.005 0.013 0.008 0.005 0.018 0.015 0.011 0.009 0.008 0.006

Proportion in High School or Less

0.000 0.000 0.000 0.001 0.000 0.000 0.000 0.002 0.000 0.000 0.000 0.001

Proportion Attending College

0.002 0.002 0.003 0.004 0.003 0.002 0.005 0.005 0.005 0.006 0.002 0.004

Proportion High School Grad

0.005 0.007 0.005 0.013 0.008 0.005 0.018 0.015 0.011 0.009 0.008 0.006

Proportion Living in South

0.032 0.031 0.034 0.050 0.043 0.035 0.046 0.058 0.042 0.046 0.034 0.038

Proportion Currently Married

0.009 0.010 0.011 0.015 0.012 0.008 0.022 0.020 0.017 0.015 0.011 0.012

Proportion Employed at Present

0.008 0.010 0.013 0.011 0.008 0.018 0.017 0.015 0.014 0.008 0.012

Proportion Gov't Training

0.001 0.002 0.002 0.002 0.003 0.001 0.003 0.004 0.003 0.004 0.002 0.002

Average Number of Children

0.023 0.030 0.030 0.054 0.038 0.027 0.068 0.067 0.049 0.052 0.036 0.035

Average Highest Grade Completed

0.062 0.078 0.066 0.109 0.070 0.075 0.141 0.117 0.076 0.094 0.096 0.079

Proportion Currently Enrolled

0.002 0.002 0.003 0.004 0.004 0.002 0.006 0.006 0.005 0.007 0.002 0.004

R23 table note: Users are cautioned that cohort changes over time have made some categories much less relevant. In particular, the extremely small sample sizes for education related variables such as "Proportion in high school or less," "Proportion government training participant," "Proportion currently enrolled," and "Proportion attending college" make these categories statistically suspect. They have been kept in the table for historical continuity.

Table. Standard errors for round 24, 2010
Description | All | Male | Female | Hispanic or Latino | Black | Non-black, non-Hispanic | Male Hispanic or Latino | Female Hispanic or Latino | Male Black | Female Black | Male Non-black, non-Hispanic | Female Non-black, non-Hispanic

Proportion Not on Active Duty

0.000 0.001 0.000 0.000 0.001 0.000 0.000 0.000 0.002 0.001 0.001 0.000

Proportion High School Dropouts

0.005 0.007 0.005 0.013 0.008 0.005 0.019 0.015 0.011 0.009 0.008 0.006

Proportion in High School or Less

0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000

Proportion Attending College

0.002 0.002 0.003 0.003 0.004 0.002 0.005 0.004 0.004 0.007 0.002 0.003

Proportion High School Grad

0.005 0.007 0.005 0.013 0.008 0.005 0.019 0.015 0.011 0.009 0.008 0.006

Proportion Living in South

0.034 0.033 0.037 0.051 0.042 0.039 0.047 0.058 0.042 0.044 0.038 0.041

Proportion Currently Married

0.009 0.010 0.011 0.016 0.012 0.008 0.021 0.023 0.017 0.016 0.010 0.012

Proportion Employed at Present

0.008 0.009 0.011 0.014 0.011 0.009 0.019 0.020 0.017 0.014 0.011 0.013

Proportion Gov't Training

0.001 0.002 0.002 0.003 0.003 0.002 0.004 0.005 0.004 0.004 0.002 0.002

Average Number of Children

0.024 0.030 0.030 0.057 0.037 0.027 0.072 0.068 0.049 0.053 0.036 0.035

Average Highest Grade Completed

0.062 0.079 0.064 0.112 0.072 0.075 0.140 0.125 0.077 0.098 0.096 0.077

Proportion Currently Enrolled

0.002 0.002 0.003 0.003 0.004 0.002 0.005 0.004 0.004 0.007 0.002 0.004

R24 table note: Users are cautioned that cohort changes over time have made some categories much less relevant. In particular, the extremely small sample sizes for education related variables such as "Proportion in high school or less," "Proportion government training participant," "Proportion currently enrolled," and "Proportion attending college" make these categories statistically suspect. They have been kept in the table for historical continuity.

Table. Standard errors for round 25, 2012
Description | All | Male | Female | Hispanic or Latino | Black | Non-black, non-Hispanic | Male Hispanic or Latino | Female Hispanic or Latino | Male Black | Female Black | Male Non-black, non-Hispanic | Female Non-black, non-Hispanic

Proportion Not on Active Duty

0.000 0.000 0.000 0.000 0.001 0.000 0.000 0.000 0.000 0.001 0.000 0.000

Proportion High School Dropouts

0.007 0.005 0.014 0.009 0.005 0.020 0.015 0.012 0.009 0.009 0.006

Proportion in High School or Less

NA NA NA NA NA NA NA NA NA NA NA NA

Proportion Attending College

0.002 0.003 0.003 0.004 0.005 0.003 0.003 0.007 0.004 0.008 0.004 0.004

Proportion High School Grad

0.005 0.007 0.005 0.014 0.008 0.006 0.020 0.015 0.012 0.008 0.008 0.006

Proportion Living in South

0.034 0.034 0.036 0.055 0.043 0.039 0.055 0.064 0.044 0.046 0.039 0.041

Proportion Currently Married

0.009 0.011 0.011 0.016 0.012 0.009 0.022 0.022 0.016 0.015 0.012 0.012

Proportion Employed at Present

0.008 0.010 0.011 0.015 0.011 0.009 0.020 0.018 0.016 0.015 0.010 0.013

Proportion Gov't Training

0.001 0.002 0.002 0.004 0.003 0.001 0.004 0.005 0.004 0.005 0.002 0.002

Average Number of Children

0.024 0.030 0.031 0.058 0.038 0.027 0.068 0.069 0.053 0.052 0.036 0.036

Average Highest Grade Completed

0.062 0.080 0.065 0.114 0.073 0.076 0.139 0.126 0.084 0.098 0.098 0.078

Proportion Currently Enrolled

0.002 0.003 0.004 0.004 0.005 0.003 0.003 0.007 0.004 0.008 0.004 0.004

R25 table note: Users are cautioned that cohort changes over time have made some categories much less relevant. In particular, the extremely small subsample sizes for education related variables such as "Proportion government training participant," "Proportion currently enrolled" and "Proportion attending college" make these categories statistically suspect. They have been kept in the table for historical continuity. In round 25 the variable "Proportion in high school or less" was labeled "NA" since no NLSY79 respondent was in this category.

Table. Standard errors for round 26, 2014
Description All Male Female Hispanic or Latino Black Non-black, non-Hispanic Male Hispanic or Latino Female Hispanic or Latino Male Black Female Black Male Non-black, non-Hispanic Female Non-black, non-Hispanic

Proportion High School Dropouts

0.005 0.007 0.006 0.014 0.009 0.006 0.021 0.016 0.012 0.010 0.009 0.007

Proportion Attending College

0.002 0.003 0.002 0.004 0.005 0.004 0.005 0.008 0.003 0.007 0.003 0.004

Proportion High School Grad

0.005 0.007 0.005 0.013 0.008 0.005 0.020 0.014 0.012 0.008 0.008 0.006

Proportion Living in South

0.034 0.033 0.036 0.056 0.042 0.038 0.059 0.061 0.044 0.046 0.038 0.041

Proportion Currently Married

0.009 0.011 0.012 0.016 0.012 0.009 0.022 0.021 0.017 0.016 0.012 0.012

Proportion Employed at Present

0.009 0.011 0.011 0.014 0.010 0.010 0.021 0.019 0.015 0.013 0.012 0.013

Proportion Gov't Training

0.001 0.001 0.002 0.002 0.003 0.001 0.003 0.003 0.004 0.003 0.001 0.002

Average Number of Children

0.024 0.029 0.032 0.055 0.039 0.027 0.066 0.070 0.054 0.054 0.035 0.037

Average Highest Grade Completed

0.064 0.084 0.067 0.114 0.077 0.078 0.145 0.129 0.088 0.100 0.102 0.080

Proportion Currently Enrolled

0.002 0.002 0.004 0.005 0.004 0.003 0.005 0.008 0.003 0.007 0.003 0.004

R26 table note: Users are cautioned that cohort changes over time have made some categories much less relevant. In particular, the extremely small subsample sizes for education related variables such as "Proportion government training participant," "Proportion currently enrolled" and "Proportion attending college" make these categories statistically suspect. They have been kept in the table for historical continuity. In round 25, the variable "Proportion in high school or less" was removed from the table since no NLSY79 respondent was in this category. In round 26, the variable "Proportion not on active duty" was removed from the table since no NLSY79 respondent remained in this category.

Table. Standard errors for round 27, 2016
Description All Male Female Hispanic or Latino Black Non-black, non-Hispanic Male Hispanic or Latino Female Hispanic or Latino Male Black Female Black Male Non-black, non-Hispanic Female Non-black, non-Hispanic

Proportion High School Dropouts

0.0048 0.007 0.005 0.014 0.008 0.006 0.018 0.018 0.012 0.009 0.008 0.006

Proportion Attending College

0.0022 0.003 0.003 0.004 0.003 0.003 0.002 0.009 0.002 0.006 0.003 0.004

Proportion High School Grad

0.0046 0.007 0.005 0.013 0.008 0.005 0.018 0.015 0.012 0.008 0.008 0.006

Proportion Living in South

0.0337 0.033 0.036 0.058 0.041 0.038 0.061 0.063 0.042 0.045 0.038 0.040

Proportion Currently Married

0.0093 0.011 0.011 0.016 0.012 0.009 0.023 0.021 0.017 0.016 0.012 0.011

Proportion Employed at Present

0.0084 0.010 0.011 0.015 0.011 0.009 0.023 0.018 0.016 0.015 0.011 0.013

Proportion Gov't Training

0.0012 0.001 0.001 0.004 0.002 0.001 0.007 0.004 0.004 0.004 0.002 0.002

Average Number of Children

0.0239 0.031 0.031 0.059 0.039 0.028 0.070 0.073 0.054 0.051 0.036 0.037

Average Highest Grade Completed

0.0624 0.080 0.067 0.118 0.075 0.076 0.142 0.134 0.085 0.103 0.098 0.080

Proportion Currently Enrolled

0.0022 0.003 0.003 0.004 0.003 0.003 0.002 0.009 0.003 0.006 0.003 0.004

R27 table note: Users are cautioned that cohort changes over time have made some categories much less relevant. In particular, the extremely small subsample sizes for education related variables such as "Proportion government training participant," "Proportion currently enrolled" and "Proportion attending college" make these categories statistically suspect. They have been kept in the table for historical continuity. In round 25 the variable "Proportion in high school or less" was removed from the table since no NLSY79 respondent was in this category. In round 26 the variable "Proportion not on active duty" was removed from the table since no NLSY79 respondent remained in this category.

Table. Standard errors for round 28, 2018
Description All Male Female Hispanic or Latino Black Non-black, non-Hispanic Male Hispanic or Latino Female Hispanic or Latino Male Black Female Black Male Non-black, non-Hispanic Female Non-black, non-Hispanic

Proportion High School Dropouts

0.0047 0.007 0.005 0.014 0.008 0.005 0.019 0.016 0.012 0.008 0.008 0.006

Proportion Attending College

0.0010 0.001 0.002 0.002 0.002 0.001 0.004 0.003 0.000 0.004 0.002 0.002

Proportion High School Grad

0.0047 0.007 0.005 0.014 0.008 0.005 0.019 0.015 0.012 0.008 0.008 0.006

Proportion Living in South

0.0334 0.033 0.036 0.058 0.042 0.038 0.060 0.063 0.043 0.045 0.038 0.041

Proportion Currently Married

0.0094 0.011 0.011 0.017 0.012 0.009 0.025 0.019 0.016 0.016 0.012 0.012

Proportion Employed at Present

0.0086 0.010 0.012 0.016 0.012 0.010 0.020 0.022 0.017 0.016 0.011 0.014

Proportion Gov't Training

0.0008 0.009 0.001 0.002 0.002 0.001 0.000 0.003 0.003 0.003 0.001 0.001

Average Number of Children

0.0248 0.033 0.032 0.057 0.038 0.029 0.067 0.070 0.054 0.053 0.039 0.037

Average Highest Grade Completed

0.0610 0.081 0.066 0.117 0.074 0.074 0.151 0.126 0.084 0.105 0.100 0.078

Proportion Currently Enrolled

0.0011 0.001 0.002 0.003 0.002 0.001 0.004 0.004 0.000 0.004 0.002 0.002

R28 table note: Users are cautioned that cohort changes over time have made some categories much less relevant. In particular, the extremely small subsample sizes for education related variables such as "Proportion government training participant," "Proportion currently enrolled" and "Proportion attending college" make these categories statistically suspect. They have been kept in the table for historical continuity. In round 25 the variable "Proportion in high school or less" was removed from the table since no NLSY79 respondent was in this category. In round 26 the variable "Proportion not on active duty" was removed from the table since no NLSY79 respondent remained in this category. Beginning in round 28, the "Average highest grade completed" was the highest grade completed as of the date of most recent interview, not as of May in the year previous to survey year.

Table. Standard errors for round 29, 2020
Description All Male Female Hispanic or Latino Black Non-black, non-Hispanic Male Hispanic or Latino Female Hispanic or Latino Male Black Female Black Male Non-black, non-Hispanic Female Non-black, non-Hispanic

Proportion High School Dropouts

0.0048 0.007 0.005 0.014 0.008 0.006 0.019 0.015 0.012 0.008 0.000 0.006

Proportion Attending College

0.0007 0.001 0.001 0.001 0.001 0.001 0.000 0.001 0.000 0.003 0.001 0.001

Proportion High School Grad

0.0048 0.007 0.005 0.014 0.008 0.006 0.019 0.015 0.012 0.008 0.009 0.006

Proportion Living in South

0.0332 0.034 0.035 0.058 0.042 0.038 0.062 0.062 0.044 0.044 0.039 0.040

Proportion Currently Married

0.0100 0.012 0.012 0.017 0.012 0.010 0.025 0.020 0.017 0.016 0.013 0.012

Proportion Employed at Present

0.0092 0.013 0.012 0.015 0.013 0.011 0.020 0.023 0.018 0.017 0.015 0.014

Proportion Gov't Training

0.0008 0.001 0.001 0.003 0.002 0.001 0.005 0.003 0.003 0.002 0.001 0.001

Average Number of Children

0.0250 0.034 0.032 0.055 0.039 0.029 0.068 0.071 0.055 0.057 0.040 0.037

Average Highest Grade Completed

0.0630 0.085 0.065 0.121 0.075 0.076 0.155 0.130 0.082 0.108 0.103 0.076

Proportion Currently Enrolled

0.0008 0.001 0.001 0.001 0.001 0.001 0.000 0.001 0.000 0.003 0.001 0.001

R29 table note: Users are cautioned that cohort changes over time have made some categories much less relevant. In particular, the extremely small subsample sizes for education related variables such as "Proportion government training participant," "Proportion currently enrolled" and "Proportion attending college" make these categories statistically suspect. They have been kept in the table for historical continuity. In round 25 the variable "Proportion in high school or less" was removed from the table since no NLSY79 respondent was in this category. In round 26 the variable "Proportion not on active duty" was removed from the table since no NLSY79 respondent remained in this category. Beginning in round 28, the "Average highest grade completed" was the highest grade completed as of the date of most recent interview, not as of May in the year previous to survey year.

Table. Standard errors for round 30, 2022
Description All Male Female Hispanic or Latino Black Non-black, non-Hispanic Male Hispanic or Latino Female Hispanic or Latino Male Black Female Black Male Non-black, non-Hispanic Female Non-black, non-Hispanic

Proportion High School Dropouts

0.0047 0.007 0.005 0.014 0.008 0.005 0.018 0.015 0.012 0.009 0.009 0.005

Proportion High School Grad

0.0047 0.007 0.005 0.014 0.008 0.007 0.018 0.015 0.012 0.009 0.009 0.005

Proportion Living in South

0.0328 0.034 0.034 0.062 0.042 0.037 0.068 0.065 0.044 0.044 0.039 0.038

Proportion Currently Married

0.0100 0.012 0.011 0.016 0.013 0.010 0.025 0.020 0.018 0.017 0.013 0.011

Proportion Employed at Present

0.0078 0.012 0.010 0.017 0.012 0.009 0.026 0.022 0.018 0.015 0.014 0.012

Proportion Gov't Training

0.0009 0.002 0.001 0.002 0.002 0.001 0.002 0.003 0.003 0.003 0.002 0.001

Average Number of Children

0.0257 0.034 0.032 0.059 0.039 0.030 0.073 0.073 0.056 0.053 0.040 0.037

Average Highest Grade Completed

0.0623 0.087 0.062 0.118 0.079 0.074 0.156 0.122 0.095 0.110 0.105 0.072

R30 table note: Users are cautioned that cohort changes over time have made some categories much less relevant. In particular, the extremely small subsample sizes for education related variables such as "Proportion government training participant," "Proportion currently enrolled" and "Proportion attending college" make these categories statistically suspect. They have been kept in the table for historical continuity. In round 25 the variable "Proportion in high school or less" was removed from the table since no NLSY79 respondent was in this category. In round 26 the variable "Proportion not on active duty" was removed from the table since no NLSY79 respondent remained in this category. Beginning in round 28, the "Average highest grade completed" was the highest grade completed as of the date of most recent interview, not as of May in the year previous to survey year. No educational updates were collected in round 30, eliminating the variables "Proportion attending college" and "Proportion currently enrolled."

Sample Weights & Clustering Adjustments

Sample weights

In each survey year a set of sampling weights is constructed. These weights provide the researcher with an estimate of how many individuals in the United States each respondent's answers represent. Weighting decisions for the NLSY79 are guided by the following principles:

  1. individual case weights are assigned for each year in such a way as to produce group population estimates when used in tabulations
  2. the assignment of individual respondent weights involves at least three types of adjustment, with additional considerations necessary for weighting of NLSY79 Child data

The interested user should consult the NLSY79 Technical Sampling Report (Frankel, Williams, and Spencer 1983) for a step-by-step description of the adjustment process. A cursory review of the process follows.

  • Adjustment One. The first weighting adjustment involves the reciprocal of the probability of selection at the first interview. Specifically, this probability of selection is a function of the probability of selection associated with the household in which the respondent was located, as well as the subsampling (if any) applied to individuals identified in screening.
  • Adjustment Two. This process adjusts for differential response (cooperation) rates in both the screening phase and subsequent interviews. Differential cooperation rates are computed (and adjusted) on the basis of geographic location and group membership, as well as within-group subclassification.
  • Adjustment Three. This weighting adjustment attempts to correct for certain types of random variation associated with sampling as well as sample "undercoverage." These ratio estimations are used to conform the sample to independently derived population totals.
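The three adjustments can be pictured as a chain of multiplications. The Python sketch below is only an illustration under simplifying assumptions: the function name sketch_weight and every number in it are hypothetical, and the actual procedure (documented in the Technical Sampling Report) works within geographic and demographic cells rather than on a single case.

```python
def sketch_weight(p_select, response_rate, sample_total, control_total):
    """Illustrative three-step weight. All inputs are hypothetical values."""
    # Adjustment One: reciprocal of the probability of selection.
    base = 1.0 / p_select
    # Adjustment Two: inflate for differential response in the case's cell.
    adjusted = base / response_rate
    # Adjustment Three: ratio that conforms the weighted sample total
    # to an independently derived population total.
    ratio = control_total / sample_total
    return adjusted * ratio


w = sketch_weight(p_select=0.001, response_rate=0.8,
                  sample_total=950_000.0, control_total=1_000_000.0)
print(round(w, 3))
```

Each step only rescales the previous result, which is why a lower selection probability, a lower response rate, or undercoverage all lead to a larger final weight.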

Sampling weight readjustments

Sampling weights for the main survey are readjusted to account for noninterviews each survey year. The readjustments are necessitated by differential nonresponse and use base year sample parameters for their creation, employing a procedure similar to that described above. The only exception occurs in the final stage of post-stratification. Post-stratification weights in survey rounds two and above have been recomputed on the basis of completed cases in that year's sample rather than the completed cases in the base year sample.

Custom weights

Users looking for a simple method to correct a single year's worth of raw data for the effects of oversampling, clustering, and differential base year participation should use the weights included with each round in the data release. Unfortunately, while each round of weights provides an accurate adjustment for any single year, none of the weights provides an accurate method of adjusting multiple years' worth of data. The NLS has a custom weighting program that creates a set of customized longitudinal weights. These weights improve a researcher's ability to accurately calculate summary statistics from multiple years of data.

The custom weighting program calculates its weights by first creating a new temporary list of individuals who meet all of a researcher's criteria. This list is then weighted as if the individuals had participated in a new survey round. The weights for this temporary list are the output of the custom weighting program.

There are two options for the custom weighting program on the Custom Weights for the NLSY79 page. The first option allows researchers to specify the particular rounds in which respondents participated. Researchers can select either "The respondents are in all of the selected years" or "The respondents are in any or all of the selected years." The second option allows users to input a list of respondent IDs to get the appropriate weights for just that list. For example, this second option allows researchers to weight only those people who ever reported smoking cigarettes in any survey, or only those people who needed extra time to graduate from college.
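The difference between the "all" and "any or all" choices can be sketched with set operations. This Python fragment is not the custom weighting program itself; the participation records and the function select_ids are hypothetical and only show which respondents each choice would place on the temporary list.

```python
# Hypothetical participation records: respondent ID -> survey years interviewed.
participation = {
    1: {1979, 1987, 1998},
    2: {1979, 1998},
    3: {1979, 1987},
    4: {1979},
}


def select_ids(records, years, mode):
    """Return sorted IDs interviewed in all, or in any, of the selected years."""
    wanted = set(years)
    if mode == "all":
        keep = [rid for rid, seen in records.items() if wanted <= seen]
    elif mode == "any":
        keep = [rid for rid, seen in records.items() if wanted & seen]
    else:
        raise ValueError("mode must be 'all' or 'any'")
    return sorted(keep)


in_all = select_ids(participation, [1987, 1998], "all")
in_any = select_ids(participation, [1987, 1998], "any")
print(in_all, in_any)
```

A list built either way could also be supplied through the second option, which accepts respondent IDs directly.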

Important information: Custom Weighting Program

  • If you select all survey rounds available and also pick "The respondents are in any or all of the selected years," the weights produced are identical to the round 1 survey weights. This result arises because the "any" selection combined with all survey rounds produces a list of every person who participated in the survey.
  • The output of the custom weight program has 2 implied decimal places just like the weights found in the data release. Dividing each custom weight output value by 100 results in the number of individuals the respondent represents.
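As a small illustration of the implied decimal places, the Python sketch below converts stored weight values to person counts. The three stored values are invented for the example, and persons_represented is a hypothetical helper name.

```python
def persons_represented(stored_weight):
    """Convert a weight stored with two implied decimal places to a person count."""
    return stored_weight / 100.0


# Hypothetical stored weights as they might appear in a data extract.
stored = [512345, 48760, 1203399]

counts = [persons_represented(w) for w in stored]
population = sum(counts)
print(counts, population)
```

The division matters for totals. A weighted mean or proportion is unchanged by it, because the factor of 100 cancels between numerator and denominator.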

Practical usage of weights

The application of sampling weights varies depending on the type of analysis being performed. If tabulating sample characteristics for a single interview year in order to describe the population being represented (that is, computing sample means, totals, or proportions), researchers should weight the observations using the weights provided. For example, to estimate the average hours worked in 1987 by persons born in 1957 through 1964, simply use the weighted average of hours worked, where the weight is the 1987 sample weight. These weights are approximately correct when used in this way, with item nonresponse possibly generating small errors. Other applications in which users may wish to apply weights, but in which weighting may not produce the intended result, include:

Samples generated by dropping observations with item nonresponses

Often users confine their analysis to subsamples for which respondents provided valid answers to certain questions. In this case, a weighted mean will not represent the entire population, but rather those persons in the population who would have given a valid response to the specified questions. Item nonresponse because of refusals, don't knows, or invalid skips is usually quite small, so the degree to which the weights are incorrect is probably quite small. In the event that item nonresponse constitutes only a small proportion of the data for variables under analysis, population estimates (that is, weighted sample means, medians, and proportions) would be reasonably accurate. However, population estimates based on data items that have relatively high nonresponse rates, such as family income, may not necessarily be representative of the underlying population of the cohort under analysis. For more information on item nonresponse in the NLSY79, see the Item Nonresponse section of this guide.
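A weighted mean that silently drops item nonresponses can be sketched as follows. The hours and weights are hypothetical, None stands in for a refusal, don't know, or invalid skip, and weighted_mean is an illustrative helper rather than part of any NLS tool.

```python
def weighted_mean(values, weights):
    """Weighted mean over cases that have a valid (non-None) value."""
    pairs = [(v, w) for v, w in zip(values, weights) if v is not None]
    total_weight = sum(w for _, w in pairs)
    if total_weight == 0:
        raise ValueError("no valid cases to average")
    return sum(v * w for v, w in pairs) / total_weight


# Hypothetical hours worked; None marks an item nonresponse.
hours = [40, None, 20, 35]
weights = [1000.0, 3000.0, 2000.0, 4000.0]

mean_valid = weighted_mean(hours, weights)
share_dropped = 3000.0 / sum(weights)
print(mean_valid, share_dropped)
```

In this made-up example the dropped case carries 30 percent of the total weight, so the result describes only the people who would have answered, which is the caveat raised above for items with high nonresponse.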

Data from multiple waves

Because the weights are specific to a single wave of the study, and because respondents occasionally miss an interview but are contacted in a subsequent wave, a problem similar to item nonresponse arises when the data are used longitudinally. In addition, occasionally the weights for a respondent in different years may be quite dissimilar, leaving the user uncertain as to which weight is appropriate. In principle, if a user wished to apply weights to multiple wave data, weights would have to be recomputed based upon the persons for whom complete data are available. In practice, if the sample is limited to respondents interviewed in a terminal or end point year, the weight for that year can be used (for more information on weighting see the section on Sample Weights & Clustering Adjustments).

Regression analysis

A common question is whether one should use the provided weights to perform weighted least squares when doing regression analysis. Such a course of action may not lead to correct estimates. If particular groups follow significantly different regression specifications, the preferred method of analysis is to estimate a separate regression for each group or to use dummy (or indicator) variables to specify group membership.

Users interested in calculating the population average effect of, for example, education upon earnings, should simply compute the weighted average of the regression coefficients obtained for each group, using the sum of the weights for the persons in each group as the weights to be applied to the coefficients. While least squares is an estimator that is linear in the dependent variable, it is nonlinear in explanatory variables, and so weighting the observations will generate different results than taking the weighted average of the regression coefficients for the groups. The process of stratifying the sample into groups thought to have different regression coefficients and then testing for equality of coefficients across groups using an F-test is described in most statistics texts.
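The weighted average of group coefficients can be sketched in a few lines of Python. The two groups, their data, and their weights are hypothetical, and a one-variable least squares slope stands in for a full regression.

```python
def ols_slope(x, y):
    """Least squares slope for a single explanatory variable."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return sxy / sxx


# Hypothetical groups: x = education, y = earnings, w = sampling weights.
groups = {
    "A": {"x": [10, 12, 14, 16], "y": [20, 24, 28, 32], "w": [100, 100, 100, 100]},
    "B": {"x": [10, 12, 14, 16], "y": [15, 16, 17, 18], "w": [300, 300, 300, 300]},
}


def population_average_slope(groups):
    """Average the group slopes, weighting each by its summed sampling weights."""
    numerator = 0.0
    denominator = 0.0
    for data in groups.values():
        weight_sum = sum(data["w"])
        numerator += weight_sum * ols_slope(data["x"], data["y"])
        denominator += weight_sum
    return numerator / denominator


avg_slope = population_average_slope(groups)
print(avg_slope)
```

Here group B has three times the weight of group A, so the population average effect sits closer to group B's slope.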

Users uncertain about the appropriate grouping should consult a statistician or other person knowledgeable about the data set before specifying the regression model. Note that if subgroups have different regression coefficients, a regression on a random sample of the population would not be properly specified.

Clustering adjustments

Researchers use NLSY79 data to estimate a variety of statistics. Because NLSY79 data come from a sample rather than from every age-appropriate individual in the U.S., the statistics produced are only estimates of the "true" national values. When researchers use a computer package to compute a statistic such as a mean or a regression coefficient, the program automatically provides a second set of statistics, such as the standard error, standard deviation, or t-statistic, which tell researchers how precisely the mean or coefficient is measured.

Details

Instead of randomly selecting individuals located anywhere in the U.S. during 1978, only a random selection of areas was chosen. By randomly selecting a fixed number of small areas, interviewers reduced the amount of time they spent traveling for each interview. In this way, costs were lowered and the survey was fielded faster, yielding data more quickly. Like all other national data sets that use clustering, the NLSY79 data contain many groups, or bunches, of respondents who share similar characteristics because they lived in the same neighborhood during 1978. This makes survey results appear more homogeneous, or similar, than is actually the case in the U.S.

Researchers can use two different approaches to correct this problem. The first approach uses the tables found in the NLSY79 Technical Sampling Report. For each survey round there is a table that lists the "Design Effects," or DEFT factors. These DEFTs give users a simple method for determining approximately how much they should increase their standard errors when trying to measure the precision of their estimates. However, because the tables are based on calculations from only a small subset of NLSY79 variables, they provide no guidance for users working with specialized subsamples or adjusting the standard errors of regression coefficients.
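The DEFT approach amounts to multiplying a simple-random-sample standard error by the DEFT factor. The Python sketch below applies the percentage formula from the Standard Errors & Design Effects section to that section's 1996 example (DEFT 1.17744, P = 22.19, n = 946); the function names are illustrative only.

```python
import math


def approx_se_percentage(p, n, deft):
    """se(P) ~ DEFT * sqrt(P * (100 - P)) / sqrt(n), with P on a 0-100 scale."""
    return deft * math.sqrt(p * (100.0 - p)) / math.sqrt(n)


def deft_adjusted_se(simple_random_se, deft):
    """Inflate a simple-random-sample standard error by a DEFT factor."""
    return deft * simple_random_se


se_unadjusted = approx_se_percentage(22.19, 946, 1.0)     # DEFT of 1 = no clustering
se_adjusted = approx_se_percentage(22.19, 946, 1.17744)   # 1996 example values
print(round(se_unadjusted, 4), round(se_adjusted, 4))
```

Setting the DEFT to 1.0 shows the standard error that a package assuming simple random sampling would report, so the gap between the two numbers is the clustering adjustment.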

The more general method is to correct for clustering by using a specialized software package. Two of the most widely used packages for adjusting surveys for clustering effects are Stata, sold by the Stata Corporation, and Sudaan, sold by RTI International. This section describes how to adjust for clustering using Sudaan, the package used to generate the DEFT factors found in the Technical Sampling Report.

Important information: Clustering

If you do not have access to the Geocode data set, you cannot use Sudaan or Stata to adjust for clustering. The Geocode data set can only be accessed by individuals approved by BLS. See Geographic Residence and Neighborhood Composition for information about using the restricted-use Geocode file.

Table 1. Effect of clustering correction on a mean value's standard error, 1998 data, example one

Variable        Mean Value   Uncorrected Std Error   Corrected Std Error
Net Worth       $128,068     $3,403                  $5,826
Family Income   $55,031      $536                    $1,137
BMI             26.7         0.06                    0.09
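One way to read Table 1 is to divide each corrected standard error by its uncorrected counterpart. The Python sketch below does this with the published values; treating the ratio as an "implied inflation factor" is an interpretation offered here, not a figure reported in the source.

```python
# (uncorrected std error, corrected std error) copied from Table 1.
TABLE1 = {
    "Net Worth": (3403.0, 5826.0),
    "Family Income": (536.0, 1137.0),
    "BMI": (0.06, 0.09),
}


def inflation_ratio(uncorrected, corrected):
    """How many times larger the corrected standard error is."""
    return corrected / uncorrected


ratios = {name: inflation_ratio(u, c) for name, (u, c) in TABLE1.items()}
for name, ratio in ratios.items():
    print(f"{name:14s} {ratio:.2f}")
```

All three ratios exceed 1, so ignoring clustering understates the standard error of each mean in this example.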

Table 2 shows how adjusting for clustering affects a simple regression. Using the same 1998 data, a simple unweighted least squares equation was run with both SAS and Sudaan using net worth as the dependent variable and six independent variables. Three of these independent variables (BMI, income and age) take a wide range of values, while the remaining three variables (black, Hispanic or Latino, and female) take the value of 1 if the respondent has the particular characteristic and 0 otherwise.

The table shows that adjusting for clustering changes many of the standard errors and associated t-values. The biggest effect is seen on the income line, where the standard error increases from an uncorrected 0.06 to a corrected 0.19, and the t-value falls from 44.37 to 13.87. Smaller changes are seen for the other variables. The intercept, age, and female standard errors all increase in size, while the BMI, black, and Hispanic or Latino variables all end up with smaller standard errors.
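The link between the standard errors and the t-values is simply t = coefficient / standard error. The Python sketch below recomputes both t-value columns from the rounded figures printed in Table 2; the dictionary layout and function names are illustrative.

```python
# (coefficient, uncorrected std error, corrected std error) copied from Table 2.
TABLE2 = {
    "Intercept": (186808.0, 43534.0, 52166.0),
    "BMI": (1091.0, 466.0, 457.0),
    "Income": (2.63, 0.06, 0.19),
    "Black": (40394.0, 5938.0, 4259.0),
    "Hispanic": (41382.0, 6617.0, 4554.0),
    "Age": (5285.0, 1086.0, 1252.0),
    "Female": (2814.0, 4891.0, 5064.0),
}


def t_value(coefficient, std_error):
    """t-statistic for the hypothesis that the coefficient is zero."""
    return coefficient / std_error


recomputed = {
    name: (t_value(coef, unc), t_value(coef, cor))
    for name, (coef, unc, cor) in TABLE2.items()
}
for name, (t_unc, t_cor) in recomputed.items():
    print(f"{name:10s} {t_unc:8.2f} {t_cor:8.2f}")
```

Most rows reproduce the published t-values to two decimals. The income row differs slightly because its coefficient and standard errors are printed with only two decimal places.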

Overall, both examples show that adjusting for clustering effects is important. The next subsection shows what variables are needed to adjust for clustering. The section ends with the specific Sudaan commands used to create the tables in this chapter.

Key variables needed for clustering correction

Two variables are needed to adjust the data set for clustering. Both variables are found only on the Geocode data set and are placed there because researchers can use these variables to determine where each civilian respondent lived in 1978.

Table 2. Effect of clustering correction on regression standard errors and t-values, 1998 data, example two

Variable    Coefficient Estimate   Uncorrected Std Error   Uncorrected t Value   Corrected Std Error   Corrected t Value
Intercept   186,808                43,534                  4.29                  52,166                3.58
BMI         1,091                  466                     2.34                  457                   2.39
Income      2.63                   0.06                    44.37                 0.19                  13.87
Black       40,394                 5,938                   6.80                  4,259                 9.48
Hispanic    41,382                 6,617                   6.25                  4,554                 9.09
Age         5,285                  1,086                   4.87                  1,252                 4.22
Female      2,814                  4,891                   0.58                  5,064                 0.56

As discussed above, the NLSY79 is a multi-stage clustered sample. The clusters were created by first dividing the entire U.S. into Primary Sampling Units, or PSUs. These PSUs were defined by NORC and were composed of Standard Metropolitan Statistical Areas (SMSAs), entire counties when the counties were small, parts of counties when the counties were large, and independent cities. NORC randomly selected two different sets of PSUs for inclusion in the study, each of which by itself randomly represents the U.S. This selection of two sets of PSUs means the NLSY79 is composed of two replicates, or strata. Within each is a random selection of PSUs. The replicate, or stratum, to which a respondent belongs is found in the Geocode data set only and is labeled variable R02191.46, entitled "Within Stratum Replicate Of Primary Sampling Unit." This variable takes the value 1 or 2, for the first or second replicate.

The variable containing the PSU is labeled R02191.45 and is entitled "Stratum Number For Primary Sampling Units." R02191.45 ranges in value from 1 to 120. Researchers who want to know which geographic areas correspond to particular values should look at Attachment 104 of the Geocode Codebook Supplement for the crosswalk table. Respondents with a PSU code of 52 to 70 are part of the military sample and do not have any known geographic location.
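The value ranges described above lend themselves to a simple sanity check before running any clustering correction. The Python sketch below assumes the two Geocode variables have already been read into plain integers; the function name describe_design_codes is hypothetical.

```python
def describe_design_codes(psu, replicate):
    """Validate the PSU (R02191.45) and replicate (R02191.46) codes.

    Ranges follow the text: PSU 1-120, replicate 1 or 2,
    and PSU codes 52-70 mark the military sample.
    """
    if not 1 <= psu <= 120:
        raise ValueError(f"PSU code out of range: {psu}")
    if replicate not in (1, 2):
        raise ValueError(f"replicate must be 1 or 2, got {replicate}")
    return {"psu": psu, "replicate": replicate, "military": 52 <= psu <= 70}


civilian = describe_design_codes(psu=15, replicate=1)
military = describe_design_codes(psu=60, replicate=2)
print(civilian, military)
```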

Important information: Clarification on variable labeling

The label for variable R02191.46 found in SAS and SPSS programs that is automatically produced by NLS Investigator is confusing. The label reads "PRIMARY SAMPLNG UNIT PSU SCRAMBLED 79". This variable contains the scrambled replicate, or stratum number, not the PSU. PSU information is found in R02191.45. Users should be careful when adjusting geographic variables using the clustering corrections. The complete title for variable R02191.46 is "Within Stratum Replicate Of Primary Sampling Unit (PSU) - Scrambled." Because this variable is randomly scrambled, doing clustering corrections on some geographic variables produces incorrect results. Scrambling has no effect on variables that are not geographic, such as education, income, or training.

Using the key variables in Sudaan

The specific steps used to generate the tables above are covered in this section. While the tables were produced using the Windows Version 8.0 Standalone package, the steps and commands are similar for other versions of Sudaan. To adjust summary statistics such as means or regressions with Sudaan, the researcher needs to create three files: one containing the data, one telling Sudaan how to read the data, and one containing the specific commands. Any computer package can be used to create the data file. Data can even be written directly from NLS Investigator to a file. Figure 1 has the relevant portion of the SAS program used to create the data file used in Tables 1 and 2 above.

Figure 1. SAS commands to create Sudaan data file

data obesity;
  /* SAS commands that generate variables like Age, Income, and BMI are placed here */
  PSU       = R0219145;
  REPLICATE = R0219146;
run;

/* Sort the data, since Sudaan cannot handle unsorted input */
proc sort data=obesity;
  by replicate psu;
run;

/* Write one fixed-width record per respondent */
data _null_;
  set obesity;
  file 'C:\DesignEffects\ObesitySudaanAdjustment.dbs';
  put ID        5.
      PSU       3.
      REPLICATE 2.
      WGHT      7.
      BLACK     2.
      HISPANIC  2.
      AGE       3.
      SEX       2.
      INCOME    9.
      BMI       4.1
      NETASSET  9.;
run;

One of the key things to note is that the data are sorted by the replicate and PSU variables before being written to the file. For most operations, Sudaan requires the data to be in this order before processing.
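For readers who prepare the data file outside SAS, the same sort-then-write step might look like the Python sketch below. The field widths are taken from Figure 1; the two sample rows, the FIELDS list, and the function names are hypothetical, and the sketch assumes each value fits within its column width.

```python
# (name, width, decimals) taken from the PUT statement in Figure 1.
FIELDS = [
    ("ID", 5, 0), ("PSU", 3, 0), ("REPLICATE", 2, 0), ("WGHT", 7, 0),
    ("BLACK", 2, 0), ("HISPANIC", 2, 0), ("AGE", 3, 0), ("SEX", 2, 0),
    ("INCOME", 9, 0), ("BMI", 4, 1), ("NETASSET", 9, 0),
]


def format_record(row):
    """Right-align each value in its fixed-width column, with no separators."""
    return "".join(f"{row[name]:{width}.{dec}f}" for name, width, dec in FIELDS)


def sorted_records(rows):
    """Sort by replicate, then PSU, and format one line per respondent."""
    ordered = sorted(rows, key=lambda r: (r["REPLICATE"], r["PSU"]))
    return [format_record(r) for r in ordered]


# Hypothetical respondents, deliberately out of order.
rows = [
    {"ID": 2, "PSU": 7, "REPLICATE": 2, "WGHT": 480000, "BLACK": 1, "HISPANIC": 0,
     "AGE": 36, "SEX": 0, "INCOME": 41000, "BMI": 24.3, "NETASSET": 90000},
    {"ID": 1, "PSU": 15, "REPLICATE": 1, "WGHT": 512345, "BLACK": 0, "HISPANIC": 1,
     "AGE": 37, "SEX": 1, "INCOME": 55031, "BMI": 26.7, "NETASSET": 128068},
]

lines = sorted_records(rows)
print("\n".join(lines))
```

Writing the lines to disk is then a matter of joining them with newlines and saving to the .dbs path used in the Sudaan commands.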

The second file is the "label" file. This file is used to read the data into Sudaan. The label file, called "ObesitySudaanAdjustment.lab," is shown in Figure 2. The label file has five parts. The first column on the left is the variable's name, followed by a letter which tells Sudaan if the variable contains numeric or character data. The third and fourth columns contain the number of bytes (characters) taken up by the variable and the number of decimal places in the number. The last column contains the label. Sudaan expects the label file to follow a precise format with columns starting and ending in very specific places.

Figure 2. Sudaan label file

ID        N  5  0  ID# (1-12686)
PSU       N  3  0  # OF PSU
REPLICAT  N  2  0  REPLICATE SCRAMBLED
WGHT      N  7  0  SAMPLING WEIGHT
BLACK     N  2  0  T/F BLACK
HISPANIC  N  2  0  T/F HISPANIC
AGE       N  3  0  AGE OF RESPONDENT
SEX       N  2  0  MALE 0 - FEMALE 1
TOTINC    N  9  0  TOTAL INCOME
BMI       N  4  1  BODY MASS
NETASS    N  9  0  TOTAL NET WORTH
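The widths in the label file describe how each record in the data file is laid out. As a rough check that a data file matches the label file, a record can be split by those widths, as in the Python sketch below. The sample record is invented, and the sketch assumes that a decimal count above zero means the decimal point is written explicitly in the data, as the BMI 4.1 format in Figure 1 suggests.

```python
# (name, width, decimals) taken from the label file in Figure 2.
LABELS = [
    ("ID", 5, 0), ("PSU", 3, 0), ("REPLICAT", 2, 0), ("WGHT", 7, 0),
    ("BLACK", 2, 0), ("HISPANIC", 2, 0), ("AGE", 3, 0), ("SEX", 2, 0),
    ("TOTINC", 9, 0), ("BMI", 4, 1), ("NETASS", 9, 0),
]
RECORD_LENGTH = sum(width for _, width, _ in LABELS)


def parse_record(line):
    """Split one fixed-width record into named numeric fields."""
    if len(line) != RECORD_LENGTH:
        raise ValueError(f"expected {RECORD_LENGTH} characters, got {len(line)}")
    values = {}
    position = 0
    for name, width, decimals in LABELS:
        text = line[position:position + width]
        values[name] = float(text) if decimals else int(text)
        position += width
    return values


# Hypothetical record assembled column by column.
sample = ("    1" + " 15" + " 1" + " 512345" + " 0" + " 1" +
          " 37" + " 1" + "    55031" + "26.7" + "   128068")
parsed = parse_record(sample)
print(parsed)
```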

The third file is the set of commands used to run Sudaan. Many versions of Sudaan allow commands to be typed directly into the program so researchers are not forced to create command files. Figures 3 and 4 provide the Sudaan commands that were used to create Tables 1 and 2 above. Figure 3 has three sections. The top section below the "Proc Descript" command tells Sudaan where to find the raw data and what variable contains the basic survey weights. The nest command defines which variables contain the replicate and PSU information. The middle section, beginning with "Var," tells Sudaan which variables will have descriptive statistics created. The final section, beginning with "Print," specifies the types of output that are shown.

The first section of Figure 4 is similar to the commands seen above in Proc Descript. The main difference is that the "weight" command has the reserved name "_ONE_" after it instead of the NLSY79 weight, "wght." Putting the "wght" variable after the weight command would cause Sudaan to run weighted least squares. By using "_ONE_" instead, Sudaan weights all observations with the same 1.0 value, resulting in Sudaan running unweighted least squares. The second part of the command, which begins with "Model," shows the exact regression to run.
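The effect of a unit weight can be seen in a small Python sketch: a weighted least squares slope computed with every weight equal to 1.0 is the ordinary unweighted slope, while unequal weights generally change the estimate. The data, weights, and function name below are hypothetical, and a single explanatory variable stands in for the full model.

```python
def weighted_slope(x, y, w):
    """Weighted least squares slope for one explanatory variable."""
    total = sum(w)
    mean_x = sum(wi * xi for wi, xi in zip(w, x)) / total
    mean_y = sum(wi * yi for wi, yi in zip(w, y)) / total
    sxx = sum(wi * (xi - mean_x) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - mean_x) * (yi - mean_y) for wi, xi, yi in zip(w, x, y))
    return sxy / sxx


x = [1.0, 2.0, 3.0, 4.0]
y = [2.0, 4.0, 5.0, 9.0]
sampling_weights = [1000.0, 1000.0, 1000.0, 5000.0]   # hypothetical

unit_weight_slope = weighted_slope(x, y, [1.0] * len(x))   # like "weight _ONE_"
sample_weight_slope = weighted_slope(x, y, sampling_weights)  # like "weight wght"
print(unit_weight_slope, sample_weight_slope)
```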

Figure 3. Sudaan commands used to create summary statistics in Table 1

Proc Descript
Data="C:\DesignEffects\ObesitySudaanAdjustment.dbs"
filetype=ascii design=wr mean DEFT1 est_no=12686;
weight wght;
nest REPLICAT PSU / MISSUNIT;
Var NETASS BMI TOTINC BLACK HISPANIC AGE SEX;
Print nsum="Sample Size" WSUM="Population Size" Mean
semean="Std. Err." DEFFMEAN="Design Effect" / style=nchs
nsumfmt=f6.0 wsumfmt=f10.0 deffmeanfmt=f6.2 semeanfmt=f11.2;


Figure 4. Sudaan commands used to create regression values in Table 2

Proc Regress
Data="C:\DesignEffects\ObesitySudaanAdjustment.dbs"
filetype=ascii design=wr DEFT1 est_no=12686;
weight _ONE_;
nest REPLICAT PSU / MISSUNIT;
Model NETASS = BMI TOTINC BLACK HISPANIC AGE SEX;
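
The effect of the "_ONE_" weight can be illustrated outside Sudaan. The following Python sketch is not a substitute for the Sudaan run: it ignores the REPLICAT and PSU design information, computes no design-corrected standard errors, and uses made-up toy values rather than NLSY79 data. The function name `weighted_fit` is hypothetical. It only shows that a weighted least squares fit with every weight set to 1.0 is the ordinary unweighted fit, while unequal weights generally change the coefficients.

```python
def weighted_fit(x, y, w):
    """Fit y = a + b*x by minimizing sum_i w_i * (y_i - a - b*x_i)**2.

    Returns the pair (intercept a, slope b).
    """
    total_w = sum(w)
    x_bar = sum(wi * xi for wi, xi in zip(w, x)) / total_w
    y_bar = sum(wi * yi for wi, yi in zip(w, y)) / total_w
    s_xy = sum(wi * (xi - x_bar) * (yi - y_bar) for wi, xi, yi in zip(w, x, y))
    s_xx = sum(wi * (xi - x_bar) ** 2 for wi, xi in zip(w, x))
    slope = s_xy / s_xx
    return y_bar - slope * x_bar, slope


# Toy data (hypothetical values, not taken from the NLSY79).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

# Analogue of "weight _ONE_": every observation counts equally.
unit_fit = weighted_fit(x, y, [1.0] * len(x))

# Analogue of "weight wght": unequal (made-up) weights shift the estimates.
survey_like_fit = weighted_fit(x, y, [5.0, 1.0, 1.0, 1.0, 5.0])

print("unit weights   :", unit_fit)
print("unequal weights:", survey_like_fit)
```

With unit weights the weighted means and sums of squares reduce to their unweighted counterparts, which is why the first call reproduces the usual least squares line.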

Related Variables: The 1979 Geocode data also contain the State, county, and metropolitan statistical area where the respondent lived in 1979.
Documentation: Additional information can be found in the Standard Errors and Design Effects section of this User's Guide, in the NLSY79 Technical Sampling Report, and in Attachment 104 of the Geocode Codebook Supplement.
Data Files: Data on clustering can be found only in the NLSY79 Geocode files under the "GEOCODE" 1979 area of interest.

Types of Variables

There are six types of variables present in the NLSY79 data. Some are the raw answers provided by the respondent, while others are constructed. Types of variables include:

  1. Direct (or raw) responses from a questionnaire or other survey instrument
  2. Edited variables constructed from raw data according to consistent and detailed sets of procedures, such as occupational codes, KEY variables, and so forth
  3. Constructed variables based on responses to more than one data item, either cross-sectionally or longitudinally, and edited for consistency where necessary, such as variables on the NLSY79 Supplemental Fertility File ("Fertility and Relationship History/Created" area of interest in NLS Investigator)
  4. Constructed variables from other sources, such as the County & City Data Book information present on the NLSY79 Geocode data files
  5. Variables provided by an outside organization based on sources not directly available to the user, such as the high school survey and transcript data, scores from the Armed Services Vocational Aptitude Battery, and so forth
  6. Data collected from or about one universe of respondents reconstructed with a second universe as the unit of observation, such as variables on the NLSY79 Child File

The type of variable impacts:

  • the title or variable description naming each variable,
  • physical placement of each variable within the codebook, and
  • location of a variable within a given area of interest.

Reference numbers

Every variable in the main NLSY79 data files has been assigned a reference number or identifier that determines its relative position within the data file and NLS documentation system. Persons contacting NLS User Services should be prepared to discuss their question or problem in relation to the reference number(s) of the variable(s) in question.

Important information: Data consistency processes

In general, the Center for Human Resource Research (CHRR) does not impute missing values or perform internal consistency checks across waves. Exceptions to this general rule occur when financial support is available, as is the case with the consistency edits performed since 1982 on the NLSY79 fertility data. When bounded interviewing methods are used, responses from the previous interview appear in the text of a question, both to verify that past information and as a point from which to update current information. Bounded interviewing techniques, using data from the Information Sheets or flap items, are intended to impose consistency across waves. Data quality checks most often occur in the process of constructing (1) cumulative and current status variables, such as 'Highest Grade Completed,' and (2) NLSY79 employment-related variables, such as 'Weeks Working in Past Calendar Year,' 'Total Tenure with Employer,' and so forth. More information on NLSY79 instruments can be found in the Survey Instruments section.

Once assigned to variables within the NLSY79 data files, reference numbers remain constant through subsequent revisions of the files. Reference numbers are assigned sequentially, with variables referring to the first survey year having a lower reference number than those variables specific to the second year and so forth.

Occasionally variables are created in a year later than that in which the data were actually collected. These variables are frequently given a reference number with a decimal value that reflects the year in which the actual data were gathered rather than the year the created variable was constructed, for example, R01461.01. Beginning with the 1993 survey, decimals are also used to indicate that more than one variable has been derived from a single question.

Important information: Reference numbers

Reference numbers in the main and Geocode data files have traditionally begun with the letter "R." Beginning with the 2000 data release, the work history variables are incorporated with the main data on the same data set. However, these work history variables are assigned reference numbers beginning with "W" for easy identification. Beginning in 2006, government program participation or recipiency variables are assigned reference numbers beginning with "G," health module variables are assigned reference numbers beginning with "H," and all other variables are assigned reference numbers beginning with "T."
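
The prefix and decimal conventions described above can be sketched as a small Python helper. This is an illustrative sketch only: the function name `parse_reference_number`, the returned field names, and the assumption that a reference number is one letter followed by digits and an optional decimal part are not taken from the NLS documentation beyond the examples given in the text.

```python
import re

# Prefix meanings as described in the text above.
PREFIXES = {
    "R": "main or Geocode variable (traditional prefix)",
    "W": "work history variable (2000 release onward)",
    "G": "government program participation or recipiency variable (2006 onward)",
    "H": "health module variable (2006 onward)",
    "T": "other variable (2006 onward)",
}

# Assumed shape: one letter, digits, and an optional decimal suffix.
_REFERENCE = re.compile(r"([A-Z])(\d+)(?:\.(\d+))?")


def parse_reference_number(ref):
    """Split a reference number such as 'R01461.01' into its parts."""
    match = _REFERENCE.fullmatch(ref.strip().upper())
    if match is None or match.group(1) not in PREFIXES:
        raise ValueError(f"unrecognized reference number: {ref!r}")
    prefix, number, decimal = match.groups()
    return {
        "prefix": prefix,
        "category": PREFIXES[prefix],
        "number": number,      # kept as text to preserve leading zeros
        "decimal": decimal,    # None when there is no decimal part
    }


print(parse_reference_number("R01461.01"))
```

The numeric part is kept as text because leading zeros, as in R01461.01, would be lost in an integer.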

Variable descriptions or variable titles

Each variable within the NLSY79 main data files has been assigned an 80-character summary title that serves as the verbal representation of that variable throughout the documentation.

Variable titles are assigned by CHRR archivists who endeavor, within the limitations described below, to capture the core "content" of the variable and to incorporate within the title:

  1. "NLS Investigator areas of interest" that facilitate easy identification of related variables,
  2. "Universe identifiers" that specify the subset of respondents for which each variable is relevant, and
  3. "Reference periods" that indicate the specific period of time (e.g., survey year, calendar year) to which the data pertain for some variables. Universe identifiers and reference periods are discussed below.

Universe identifiers

If two ostensibly identical variables differ only in that they refer to different universes, the variable title will include a reference to the applicable universe by either appending in parentheses to each title the appropriate universe (Example 1) or by identifying the universe before the variable title (Example 2).

  • Example 1: 'Did R Have Any Job since Last Int? (Unemployed or OLF) (1994)'
  • Example 2: 'Female - Number of Children R Has Had since Last Interview'

Reference periods

Variable descriptions may include a phrase indicating the time period to which the data refer. When a date follows a verbal description of a variable and is preceded by the prepositional phrase "in 19XX," the date identifies the calendar year for which the relevant information was collected.

  • Example: 'Received Income from Child Support in 1991?' This 1992 survey question refers to child support payments received in calendar year 1991.
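
The "in 19XX" convention can be checked mechanically when scanning many variable titles. The sketch below is a hypothetical helper, not part of any NLS tool. It also accepts years beginning with 20, which is an assumption, since the text above mentions only "in 19XX".

```python
import re

# Matches the phrase "in YYYY". Accepting 20XX years is an assumption.
_IN_YEAR = re.compile(r"\bin ((?:19|20)\d{2})\b")


def reference_year(title):
    """Return the calendar year named by an 'in YYYY' phrase, or None."""
    match = _IN_YEAR.search(title)
    return int(match.group(1)) if match else None


print(reference_year("Received Income from Child Support in 1991?"))
print(reference_year("Did R Have Any Job since Last Int? (Unemployed or OLF) (1994)"))
```

A year that appears only in parentheses, as in the second title, is not preceded by "in" and is therefore not reported as a reference period by this helper.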

Important information: Verifying variable details

Do not presume that two variables with the same or similar titles necessarily have the same (1) universe of respondents, (2) coding categories, or (3) time reference period. While the universe identifier conventions discussed above have been utilized, users are urged to consult the questionnaires for skip patterns and exact time periods for a given variable and to factor in the relevant fielding period(s) for the cohort.

Variables with similar content, such as information on respondents' labor force status, may have completely different titles, depending on the type of variable (raw versus created). In addition, such variables may be located within different NLSY79 areas of interest.

  • Example 1: 'Employment Status Recode' (ESR), in 1979-98 and 2006, is the created or reconstructed version of the 'Activity Most of Survey Week' raw variable. The 'Activity' variable is derived from the first question of the full series of questions used by the Department of Labor (DOL) to obtain employment status; the title reflects questionnaire content. ESR, on the other hand, reflects the procedure used to recode the 'Activity' variable. This produces a constructed variable for all respondents based upon responses to the 'Activity' question and all other questions used by the DOL to obtain employment status. These other questions serve to qualify and refine employment status beyond the answer to the initial 'Activity' question.
  • Example 2: NLSY79 raw fertility variables appear within the various "Children," "Birth Record," or "Birth Record xxxx" areas of interest while edited and constructed versions of these variables appear within the "Fertility and Relationship History/Created" area of interest.

Finally, different archivists, for a period of more than 20 years, have performed the task of assigning variable descriptions to data. While every effort has been made to maintain consistency, users may find some differences in variable title and area of interest assignment.

New variables created by researchers

Researchers sometimes use the NLS public datasets to generate a new variable to use in their research. In some cases, researchers like to make that new variable publicly available (through their own data repository) so that it can be easily accessed for follow-up studies. This is permissible as long as researchers use public (rather than restricted) NLS data and make it clear that they, rather than the NLS team, are the authors of the variable.

Survey Instruments

The primary variables found within the main data set are derived directly from survey instruments, such as questionnaires, household interview forms, and so forth. This section describes each of the NLSY79 instruments in the order that they appear in the following list.

Types of NLSY79 survey instruments and user aids

This section also explains the conventions used in the NLSY79 documentation system to identify questionnaire items from some of the primary survey instruments. An additional document, the interviewer reference manual, provides background information on specific survey instruments.

Important information: Instrument terminology

Questionnaire Item or Question Number. This generic term refers the user to the printed source of data for a given variable. A questionnaire item may be a question, a check item, or an interviewer's reference item that appears within one of the survey instruments. Each questionnaire item has been assigned a number or a combination of numbers and letters within the NLSY79 documentation system to assist the user in linking each variable to its location in a survey instrument. NLSY79 questionnaire item assignment is complex and varies across survey years and instruments. For some years, NLSY79 questionnaire item identification is dependent upon various combinations of the deck and column numbers used in data entry that are printed to the right of the answer categories on the survey instrument. In other years, designation is made by section and question numbers. Specific information on the conventions used appears below, after each relevant instrument, under the subheadings "Question Numbering."

A unique set of survey instruments has been used during each survey year to collect information from respondents. The term "survey instrument" is used to refer to:

  1. the questionnaires that serve as the primary source of information on a given respondent
  2. questionnaire supplements fielded during select survey years that contain additional sets of questions
  3. documents such as the household interview forms or household record cards that collect information on members of each respondent's household

Users should be aware that, while the source of the majority of variables in the main NLSY79 data files is the questionnaire or one of the other survey instruments, certain NLSY79 variables are created either from other NLSY79 variables or from information found in an external data source (see Types of Variables).

Household information

Each NLSY79 interview includes the collection of information on the members of each respondent's household. For NLSY79 respondents, such household data are collected prior to the administration of the main questionnaire and for many years used separate survey instruments called the Household Interview Forms. Both the instruments used for the yearly household data collection and the household screening instruments that were used to draw the samples of respondents are described below.

NLSY79 1978 Household Screener and Interviewer's Reference Manual

This document (fully titled NLSY-National Longitudinal Survey of Labor Force Behavior Interviewer's Manual-Household Screening, NORC 1978) contains detailed information on the 1978 screening of households conducted by NORC from which the civilian youth samples (the cross-sectional and supplemental samples) were drawn. It provides a copy of the short 25-question screener, question-by-question specifications for administering the form, and a sample completed screener. Most of the information collected on each respondent during the screening is presented within the data set. The screener is the source for important data such as the sex and race or ethnicity variables that were used to assign each respondent to a specific NLSY79 subsample, as well as the relationship codes (for example, brother, sister, husband, wife) that allow researchers to identify related NLSY79 respondents who shared a household at the time of the screening.

Question numbering

Question numbers for the 1978 screener were arbitrarily assigned by NORC using an artificial questionnaire section number that followed the last section of the 1979 questionnaire ("Section 25" for all screener variables) even though the actual administration of the screener preceded that of the 1979 questionnaire.

Users should note that screener questions are identified within the documentation as 1979 variables even though these data were collected during 1978. Most variables from the screener use the phrase HOUSEHOLD SCREENER at the beginning of the variable title, appear physically within the codebook after the 1979 household record series, and have been placed within the "1978 Screener" and "Household Record" areas of interest.

Household interview forms

Yearly household information for the NLSY79 is collected from either the respondent or the head of household prior to the administration of the main questionnaire. NLSY79 Household Interview Forms are used to:

  1. enumerate all persons currently living in the respondent's household
  2. record information about each person's age, highest grade completed, work experience in the past year, and relationship to the respondent
  3. collect, during the 1979-1986 surveys, certain family income information

Information on household members is collected using the questions on the Household Interview Forms; however, much of the information is actually recorded on the "Household Enumeration" section of the Face Sheet discussed below.

During the 1979-1986 interviews, different versions of the Household Interview Forms were administered depending upon the type of residence of the respondent. Version A was used if the respondent was living with his or her parents (or in-laws), in which case the interview was conducted with the respondent's parents (or in-laws) in order to gather information on household income sources. Version B was used if the respondent was living in group quarters, such as a dormitory or the military, or in temporary facilities, such as a hospital or prison, and was administered to the respondent. If the respondent had a permanent residence elsewhere, the household interview gathered information about that household. Version C was administered to the respondent if he or she was living in his or her own dwelling unit, military family housing, an orphanage, a religious institution, or other individual quarters or was the head of a family unit. Table 1 in the Household Composition section depicts, by survey year, the universe and residential unit(s) specific to each form.

During the first eight survey rounds, many respondents were younger than 18 and living with their parents; thus, Version A was frequently used. Beginning with the 1987 survey, all respondents were 21 or older and living predominantly on their own; consequently, the household interview forms were consolidated into a single version. For 1979-1986, these forms appear as separate documents. Beginning with the 1987 interview, household interview questions were incorporated within each year's questionnaire. Some variation in administration of these forms has occurred over survey years. Users should refer to each survey year's Interviewer's Reference Manual for more information.

Interviewing aids

Certain instruments used during fielding of the NLSY79 provide researchers with interview- and respondent-specific information that appears as variables within the NLSY79 data files.

Face Sheet

Immediately prior to fielding, a Face Sheet is computer-generated for each respondent and forwarded to the interviewer assigned to that case. The Face Sheet contains:

  1. various items of respondent-specific information (name, address, phone number)
  2. information about each member of the household or family unit as of the last interview (full name, sex, relationship to youth, education, and whether the household member worked during the year), generated from the most recent administration of the Household Interview Forms
  3. a historical overview of previous interview rounds (whether the respondent refused to be interviewed, the respondent was interviewed after initially refusing, the interview was complete or incomplete, and so forth)
  4. for the 1980-1986 survey years, information on the version of the Household Interview Form that was used in the previous interview

This information is used to alert the interviewer and field manager to potential problems, assist them in preparing a successful location and fielding strategy, and provide details necessary to conduct an efficient interview, such as a listing of previous employers. Information about the respondent's household and family unit from each survey year's Face Sheet can be found by searching the "Household Record" area of interest with NLS Investigator. Sample Face Sheets for most survey years can be found in the various Interviewer Reference Manuals.

Information Sheet

This document contains data on the respondent from the previous interview that will be referred to and used to update information during the interviewing process. Items found on this document include marital status, high school completion status, university last attended, names of previous employers, training program enrollment, and pregnancy status. This information enables the interviewer to accurately route the respondent through the relevant sections of the questionnaire and provides on-the-spot reconciliation of earlier errors. Information Sheet items appear within the NLSY79 data set ("Last Interview Information" area of interest in NLS Investigator). Beginning with the 1993 interviews, the information sheet is incorporated into the CAPI instrument. Sample Information Sheets can be found in the Interviewer Reference Manuals. In CAPI surveys, information sheet data are stored electronically on the interviewer's laptop and accessed by the survey program during the interview; no paper information sheet is used.

Children's Record Forms (CRF) (1985-1992)

This interviewing aid containing information on biological (collected each survey) and nonbiological (that is, adopted or step-; collected biennially) children was used in the 1985-1992 surveys to:

  1. provide identification numbers, names, dates of birth, sex, and deceased/adopted status for each child
  2. identify special sections of the main questionnaire (such as immunization, feeding, and so forth) that needed to be administered for particular children

Sample Children's Record Forms can be found in the Interviewer's Reference Manuals. Beginning with the 1993 interviews, this form is incorporated into the CAPI instrument. As with information sheets, these data are automatically accessed by the survey program during CAPI interviews, so the hard copy CRF is no longer needed.

Questionnaires

There are separate and distinctly different questionnaires for each survey year of the NLSY79. Each questionnaire is organized around a set of topical subjects, the titles of which usually appear on either the first page of each section of the questionnaire or as a header.

Important information: Questionnaire use

The questionnaires are critical elements of the NLSY79 documentation system and should be used by each researcher to ascertain the wording of questions, coding categories, and the universe of respondents asked to respond to a given question.

NLSY79 questionnaires record:

  1. interview dates
  2. responses to the topical survey questions (see discussion below)
  3. locating information which will assist NORC in finding the respondent for the next interview
  4. interviewer remarks on such topics as the race and sex of respondent, language in which the interview was conducted, interviewer's impressions, and so forth

Show cards

These are interviewing aids used in conjunction with the questionnaire and list the possible response categories for selected questions. Show cards help the respondent keep the more complicated response categories in mind.

NLSY79 questionnaires explore the following core topics:

  • current labor force status
  • jobs and employers
  • work experience and attitudes
  • training
  • assets and income
  • family background
  • marital history
  • fertility
  • regular schooling
  • military service
  • health

Additional sets of questions have been fielded during select survey years on such topics as:

  • childcare
  • alcohol use
  • drug use
  • job search methods
  • educational/occupational aspirations
  • school discipline
  • pre- and post-natal health behaviors
  • delinquency
  • childhood residences

During the 1979-1992 paper-and-pencil (PAPI) interviews, questionnaires and other survey instruments were preprinted paper products used during fielding. With the advent of computer-assisted interviewing (CAPI) in 1993, the "questionnaire" became a series of visual screens that not only told the interviewers what questions to ask but provided helpful instructions on how to administer the interview. Separate supplemental documents such as the job-specific Employer Supplements were integrated into the electronic main questionnaire. NLSY79 CAPI questionnaires incorporate some helpful elements of the traditional codebook, with reference numbers assigned to variables and greater specificity on coding and universes provided within each codeblock.

Question numbering

The conventions used to assign question numbers within the NLSY79 documentation system vary by survey year and are based on various combinations of the questionnaire section number, the question number, or the deck and column numbers (Table 1). Users can locate a variable within the codebook--which represents each question fielded in the same order as it appears within the questionnaire--by finding the question number which appears (in parentheses) to the right of each reference number.

Table 1. NLSY79 question numbering conventions
Survey Year | Designated By | Example
1979 | Section # (S) and Question # (Q) | S02Q01: Question 1 in Section 2
1980-1982 | Section # (S), Deck # (D), and Column # | S06D1314: Question appearing in Section 6, deck 13, column 14
1983-1987, 1989-1992 | Deck # and Column # | Q0413: Question appearing in deck 4, column 13
1988 | Section # and Question # (Q) | Q5.3: Question 3 in Section 5
1993-present | Section #, Question # (Q), and Loop # as applicable | Q5-26.3: Question 26 in Section 5, with the appended .3 representing the third loop
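
The conventions in Table 1 can be sketched as a year-aware parser. The Python below is a minimal illustration that handles only the example forms shown in the table. The function name `parse_question_number`, the regular expressions, and the two-digit deck and column widths are assumptions inferred from those examples, and real question numbers may include forms that this sketch rejects.

```python
import re

# (year test, pattern, field names), one entry per row of Table 1.
CONVENTIONS = [
    (lambda y: y == 1979, r"S(\d+)Q(\d+)", ("section", "question")),
    (lambda y: 1980 <= y <= 1982, r"S(\d{2})D(\d{2})(\d{2})",
     ("section", "deck", "column")),
    (lambda y: 1983 <= y <= 1992 and y != 1988, r"Q(\d{2})(\d{2})",
     ("deck", "column")),
    (lambda y: y == 1988, r"Q(\d+)\.(\d+)", ("section", "question")),
    (lambda y: y >= 1993, r"Q(\d+)-(\d+)(?:\.(\d+))?",
     ("section", "question", "loop")),
]


def parse_question_number(year, qname):
    """Interpret a question number using the convention for its survey year."""
    for applies, pattern, fields in CONVENTIONS:
        if applies(year):
            match = re.fullmatch(pattern, qname)
            if match is None:
                raise ValueError(f"{qname!r} does not fit the {year} convention")
            # Optional parts (such as the loop number) are omitted when absent.
            return {name: int(value)
                    for name, value in zip(fields, match.groups())
                    if value is not None}
    raise ValueError(f"no convention known for survey year {year}")


print(parse_question_number(1981, "S06D1314"))
print(parse_question_number(1996, "Q5-26.3"))
```

The survey year is a required argument because the same string can be read differently under different conventions: the year decides whether the digits after "Q" are a deck and column or a section and question.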

Deck and column numbers are vestigial items that were used to locate the data when they were input on punch cards. The deck numbers are printed at the upper right-hand corner of each page in the survey instruments and at the beginning point for each new deck for the 1980 through 1992 instruments. The column numbers are printed to the left of the response categories. If the variable contains more than one digit, the column reference is to the starting column for that variable.

Important information: Questionnaire content

Although NLSY79 questionnaires are to some extent topically arranged, the user should be aware that the absence of a section title on a given subject does not mean that no questions on that topic were fielded during that survey year. For example, the 1987 and 1989 NLSY79 questionnaires contain no section entitled "Childcare." However, a small number of childcare questions were asked in those years and appear within the "Fertility" section of the questionnaires.

Questionnaire supplements

Separate instruments called "supplements" have been used since the onset of the NLSY79 to administer distinct sets of questions. The NLSY79 has made extensive use of supplements for collecting information from separate universes such as schools or children or for administering confidential sets of questions on illegal activities or abortion. The following section describes each supplemental instrument used for the NLSY79. The use of such separate supplements has diminished with CAPI-administered interviews. In the main youth and young adult instruments, all supplements are now incorporated as electronic modules in a questionnaire. Children still use multiple supplements: one self-report, one interviewer-administered, and one completed by the mother.

Illegal Activities Form J (1980)

This confidential questionnaire supplement, administered during the 1980 survey, contains a series of questions designed to collect information on the extent of respondents' participation in various delinquent and criminal activities such as:

  • skipping school
  • alcohol/marijuana use
  • vandalism
  • shoplifting
  • drug dealing
  • robbery

This series supplements those on reported contacts with the criminal justice system collected within the main questionnaire.

Employer Supplement

Information about each employer for whom an NLSY79 respondent has worked since the last interview has been collected since 1980. One Employer Supplement is administered for each employer and contains questions about gaps when the respondent was not working, the number of hours worked, the type of work done, and the wages earned at that job. Note: Comparable information for the 1979 survey can be found in the "On Jobs" section of the main questionnaire and within the separate single sheet 1979 Employer Flap. Beginning with the 1993 CAPI interviews, all employer supplement questions appear within the body of the main questionnaire.

Question numbering

Five numbering systems have been used to identify questionnaire items within the Employer Supplement (Table 2). Although data from up to 10 jobs are collected, the main data set includes information on only the first five jobs since few individuals work at more than five jobs between interviews. Data on all ten jobs are used to construct a series of summary variables for hours and weeks worked; see the Labor Force Status, Time & Tenure with Employers, and Work Experience sections for more information.

Table 2. Employer Supplement question numbering conventions: 1980-present
Survey Years | Question Numbering Description
1980-1987, 1989-1991 | A supplement identifier, i.e., the letter B, representing the first supplement, through F, the fifth supplement, is combined with the deck and column numbers preprinted in the instrument. The deck numbers for the first Employer Supplement would be B1, B2, B3, and B4 while the second supplement would use C with each deck and column number. The question number QB140 thus refers to B (the first supplement), 1 (deck 1), 40 (column 40), while QC166 refers to Employer Supplement C, deck 1, column 66.
1988 | Letter designations, i.e., ESB, ESC, ESD, ESE, ESF, continue to identify the specific supplement in use; however, deck and column numbers are not used. Appended to the supplement identifier is the actual question number as printed in the supplement. For example, ESB.1 refers to the first supplement, question 1.
1992 | A series of supplemental deck numbers are attached to the column numbers preprinted in the supplement. Question numbers 7439-7831 refer to information collected in the first supplement, 7939-8331 to the second supplement, 8439-8831 to the third supplement, 8939-9331 to the fourth supplement, and 9439-9831 to the fifth supplement.
1993-1996 | The designation QES and a number, e.g., QES5, indicates that this series of questions collected information about the fifth employer. Hyphenated numbers attached to the QES5, e.g., QES5-26, QES5-27, etc. indicate the specific question number within the series, while a decimal number following a question number, QES5-26.3, reflects the third repetition of that question for that employer.
1998-present | Beginning in 1998, the number identifying the employer was moved to a decimal after the question number. The question previously labeled QES5-26.3, for example, was now designated as QES-26.05.03. The decimal number ".05" indicates this information was collected about the fifth employer. Again, ".03" represents the third repetition of question 26 for the fifth employer.
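
Several of the conventions in Table 2 lend themselves to a short worked example. The Python sketch below covers three of them: the lettered deck and column form, the 1992 number ranges, and the relabeling from the 1993-1996 style to the 1998-present style. All function names are hypothetical. The single-digit deck, the two-digit zero padding, and the treatment of question numbers without a repetition suffix are assumptions based only on the examples in the table.

```python
import re

# Letters B through F identify the first through fifth supplements.
SUPPLEMENT_LETTERS = "BCDEF"

# 1992 question number ranges for the first through fifth supplements.
RANGES_1992 = [(7439, 7831), (7939, 8331), (8439, 8831),
               (8939, 9331), (9439, 9831)]


def parse_deck_column(qname):
    """Interpret a 1980-1987 or 1989-1991 number such as 'QB140'."""
    match = re.fullmatch(r"Q([B-F])(\d)(\d{2})", qname)
    if match is None:
        raise ValueError(f"unrecognized question number: {qname!r}")
    letter, deck, column = match.groups()
    return {"supplement": SUPPLEMENT_LETTERS.index(letter) + 1,
            "deck": int(deck), "column": int(column)}


def supplement_for_1992(question_number):
    """Return which supplement (1 to 5) a 1992 question number belongs to."""
    for supplement, (low, high) in enumerate(RANGES_1992, start=1):
        if low <= question_number <= high:
            return supplement
    raise ValueError(f"{question_number} is outside the 1992 supplement ranges")


def to_1998_style(qname):
    """Relabel a 1993-1996 number such as 'QES5-26.3' as 'QES-26.05.03'."""
    match = re.fullmatch(r"QES(\d+)-(\d+)(?:\.(\d+))?", qname)
    if match is None:
        raise ValueError(f"unrecognized question number: {qname!r}")
    employer, question, repetition = match.groups()
    relabeled = f"QES-{question}.{int(employer):02d}"
    if repetition is not None:
        relabeled += f".{int(repetition):02d}"
    return relabeled


print(parse_deck_column("QC166"))
print(supplement_for_1992(8000))
print(to_1998_style("QES5-26.3"))
```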

Fertility Supplement (1983)

Respondents (both male and female) who were not interviewed during 1982 were administered a special set of supplementary fertility questions during the 1983 survey. The Fertility Supplement was designed to collect complete fertility data, including all live births for males and females, and all pregnancy losses and contraception between pregnancies for females. For those not interviewed in 1982, these questions replaced the fertility questions found in Section 10 of the 1983 questionnaire.

Confidential Abortion Forms

Biennially beginning in 1984, female NLSY79 respondents have completed a short confidential abortion form which elicited information on the number and dates of each abortion. Copies of these supplementary questions are provided within the survey instrument sets. The 1984 form also collected information on the dates that respondents left school prior to 1979 if leaving school was associated with early childbearing. Beginning in 2002, the abortion form was included in the main instrument. 

Drug Use Supplement (1988, 1992, 1994, and 1998)

The 1988 supplement contains the confidential set of drug use questions which were, through a random assignment process, self-administered by the respondent in half of the cases and administered by the interviewer in the other half. Questions were asked on age at first use of marijuana and cocaine, extent of lifetime and most recent use, and method(s) practiced in using cocaine. The 1992 and 1994 supplements contain the confidential set of questions on respondents' use of cigarettes, alcohol, marijuana, cocaine, or other drugs. Users should note that while the 1988 and 1992 supplements are bound as separate booklets, the 1994 and 1998 supplements are bound with the main questionnaire.

Childhood Residence Calendar (1988)

The 1988 questionnaire contained a special section detailing the living arrangements of respondents from birth through age 18. The Childhood Residence Calendar, the interviewing aid used to collect these data, depicts for each year of life the type of parent (biological-, adoptive-, or step-) with whom each respondent lived for at least four months and, for those ages when he or she was not living with a parent, in what other arrangements the respondent resided, such as with grandparents, foster parents, or friends, or in a children's home, detention center, or other institution.

Supplemental data collections

High School Survey (1980)

A supplemental survey of the last secondary school attended by civilian NLSY79 respondents was conducted in 1980. This survey gathered information on each school's grading system, course offerings, dropout rate, student body composition, and faculty characteristics, as well as respondent scores from a variety of intelligence and aptitude tests. Copies of the high school survey instruments, the "School Questionnaire" and the "Student's School Record Information" form, are included within the documentation item called the NLSY High School Transcript Survey: Overview and Documentation.

Transcript Surveys (1980-1983)

Transcript information on up to 64 courses was collected from high school records for civilian NLSY79 respondents who were expected to complete high school within the United States. A copy of the instrument used to collect transcript information, called the "Transcript Coding Sheet," is included within the NLSY High School Transcript Survey: Overview and Documentation (see School & Transcript Surveys Documentation in the NLSY79 Codebook Supplement).

ASVAB

The Armed Services Vocational Aptitude Battery (ASVAB) was administered to most NLSY79 respondents in 1980 as part of a Department of Defense effort to renorm this military enlistment test. The scores from this supplemental data collection are included in the NLSY79 data file. For details, see the Aptitude, Achievement & Intelligence Scores section.

Interviewer's Reference Manual (Question-by-Question [Q by Q] specifications)

Each questionnaire or set of survey instruments is accompanied by an Interviewer's Reference Manual. This document provides NORC interviewers with background information on the NLSY79 and detailed question-by-question instructions for administering and coding the questionnaire, Employer Supplement, Household Interview Forms, and other survey supplements. Separate Q by Q's exist for each survey year. Printed copies of the CAPI help screen information, which each interviewer could access during the course of the interview, replace the traditional interviewer's manual instrument beginning with the 1993 release.

Environmental Variables


Table 1. NLSY79: Residence variables by year

Variable 1979 1980 1981 1982 1983 1984 1985 1986 1987 1988 1989 1990 1991 1992 1993 1994 1996 1998 2000 2002 2004 2006 2008 2010 2012 2014 2016 2018 2020 2022
Region of residence * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
Current residence urban or rural * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
Current residence in metropolitan statistical area * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
Changes in residence since January 1, 1978, or date of last interview (collected as a history) * *   *                             * * * * * * * * * * * *

Human Capital and Other Socioeconomic Variables


Table 1. NLSY79: Early formative influences and parental status variables by year

Variable 1979 1980 1981 1982 1983 1984 1985 1986 1987 1988 1989 1990 1991 1992 1993 1994 1996 1998 2000 2002 2004 2006 2008 2010 2012 2014 2016 2018 2020 2022
Nationality and birthplace *       *                                                  
Birth date *   *                                                      
Ethnic self-identification (revised 2002) *                                     *                    
Year foreign-born R entered the United States         *             *                                    
Month and year R entered the United States to live for at least 6 months *                     *                                    
Immigration or visa status                       *                                    
Religious affiliation, frequency of attendance *     *                             *                      
Periods lived away from parents (birth to age 18) *                 *                                        
Non-English language spoken when R was a child *                                                          
Were magazines, newspapers, or library cards available in home when R was age 14 *                                                          
Person(s) R lived with at age 14 *                     *                                    
Occupations of primary adults when R was 14 *                                                          
Birthplace of parents: State or country *                                                          
Highest grade completed by father and mother *                                                          
Employment status of father and mother in past year * *                                                        
Are R's parents living * *                               * * * * * * * * * * * * *
R's biological parents: life status, health, cause of death (40+/50+/60+ health modules)                                   * * * * * * * * * * * * *

Table 2. NLSY79: Education variables by year

Variable 1979 1980 1981 1982 1983 1984 1985 1986 1987 1988 1989 1990 1991 1992 1993 1994 1996 1998 2000 2002 2004 2006 2008 2010 2012 2014 2016 2018 2020 2022
Current enrollment status, date of last enrollment * * * * * * * * * * * * * * * * * * * * * * * * * * * * *  
Highest grade completed * * * * * * * * * * * * * * * * * * * * * * * * * * * * *  
Reason stopped attending school * * * * * * * * * * * * * * * * * * * * * * * * * * * * *  
Highest degree and date received                   * * * * * * * * * * * * * * * * * * * *  
Is or was school public or private *                                                          
High school curriculum * * * * * * *                                              
Comparison of high school courses to skills training                             *                              
College degree received * * * * * *       * * * * * * * * * * * * * * * * * * * *  
Type of college attending (2- or 4-year) * * * * * * * *   * * *   * * * * * * * * * * * * * * * *  
Field of study or specialization in college * * * * * * * *   * * *   * * * * * * * * * * * * * * *    
College tuition *                                                          
Educational loans or financial aid in college * * * * * * * *   * * *   * * * * * * * * * * * * * * *    
Attitude toward selected aspects of high school *                                                          
Courses taken during last year of high school *                                                          
Ever suspended or expelled from school; date   *                                                        

Table 3. NLSY79: Vocational training outside regular school variables by year

Variable 1979 1980 1981 1982 1983 1984 1985 1986 1987 1988 1989 1990 1991 1992 1993 1994 1996 1998 2000 2002 2004 2006 2008 2010 2012 2014 2016 2018 2020 2022
Type(s) of training * * * * * * * *   * * * * * * * * * * * * * * * * * * * * *
Number of weeks, hours per week in training * * * * * * * *   * * * * * * * * * * * * * * * * * * * * *
Was training completed * * * * * * * *   * * * * * * * * * * * * * * * * * * * * *
Was degree, certificate, or journeyman's card obtained * *                                                        
Was training related to specific job or employer       * * * * *   * * * * * * * * * * * * * * * * * * * * *
Was training related to a promotion                       * * * * *                            
Reason for training       * * *             * * * * * * * * * * * * * * * *    
Method of financing training       * * *       * * * * * * * * * * * * * * * * * * * * *
Informal job learning activities (questions vary)                             * * * * * * * * * * * * * * * *

Table 4. NLSY79: Government jobs and training programs variables by year

Variable 1979 1980 1981 1982 1983 1984 1985 1986 1987 1988 1989 1990 1991 1992 1993 1994 1996 1998 2000 2002 2004 2006 2008 2010 2012 2014 2016 2018 2020 2022
Participation in programs * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
Type of program * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
Satisfaction with program * * * * * * * * *                                          
Did program help on subsequent jobs * * * * * * * * *   * * * * * * * *                        
Services provided by program * * * * * * * * *                                          
Length of participation in program * * * * * * * *   * * * * * * * * * * * * * * * * * * * * *
Hours per week and per day spent in program * * * * * * * *   * * * * * * * * * * * * * * * * * * * * *
Amount of income from participating in program * * * * * * * *                                            
Aspects liked most and least about programs *                                                          
Reasons for entering and leaving programs * * * * * * * * *                                          

Table 5. NLSY79: Health variables by year

Variable 1979 1980 1981 1982 1983 1984 1985 1986 1987 1988 1989 1990 1991 1992 1993 1994 1996 1998 2000 2002 2004 2006 2008 2010 2012 2014 2016 2018 2020 2022
Does health limit work, duration of limitation * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
Type of health problem (ICD-9 code) * * * *                                                    
Work-related injury or illness (ICD-9 code)                   * * *   * * * * * *                      
Height     * *     *                             * * * * * * *    
Weight     * *     * *   * * * * * * * * * * * * * * * * * * * * *
Health insurance coverage: R, spouse, children                     * *   * * * * * * * * * * * * * * * * *
Frequency and intensity of R's physical activity                                   * * * * * * * * * * * * *
R's general health behaviors                                       * * * * * * * * * * *
General perception of health (40+/50+/60+ health modules)                                   * * * * * * * * * * * * *
Does health interfere with daily activities (40+/50+/60+ health modules)                                   * * * * * * * * * * * * *
Emotional health in past 4 weeks (40+/50+/60+ health modules)                                   * * * * * * * * * * * * *
CES-Depression Scale                           *   *   * * * * * * * * * * * * *

Loneliness                                                           *
R's various health problems (heart problems, cancer, diabetes, poor eyesight or hearing, and so forth) (40+/50+/60+ health modules)                                   * * * * * * * * * * * * *
Time spent on healthcare activities (40+/50+ health modules)                                   * * * * * * * * * *      
Diagnosed with asthma (40+/50+ modules)                                   * * * * * * * * * *      
Diagnosed with Alzheimer's/dementia (60+ health module)                                                       * * *
Satisfaction With Life Scale/SWLS (60+ health module)                                                       * * *
General Anxiety Disorder/GAD scale (60+ health module)                                                       * * *
Brief Resilience Scale/BRS (60+ health module)                                                       * (Note E.1)
Cognition                                           * * * * * * * *  
National Death Index data                                                         *  

Note E.1: In 2018, four items from the Brief Resilience Scale (BRS) were included in the 60+ Health module. The BRS questions were discontinued partway through the 2020 survey round, but a portion of the 2020 BRS data collected is available in the public release.

All spouse items also refer to partners beginning in 1994.

Table 6. NLSY79: Marital and spouse characteristics variables by year

Variable 1979 1980 1981 1982 1983 1984 1985 1986 1987 1988 1989 1990 1991 1992 1993 1994 1996 1998 2000 2002 2004 2006 2008 2010 2012 2014 2016 2018 2020 2022
Dating behaviors and attitudes (unmarried females)                   *       *   * * * * * * * * * * *        
Marital status * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
Changes in marital status since 1/1/1978 or previous interview; number and duration of marriages * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
Month, year R and partner began living together                       * * * * * * * * * * * * * * * * * * *
Did R and spouse live together continuously before marriage (or R and partner continuously until now)                       * * * * * * * * * * * * * * * * * * *
Changes in cohabitation with partner since last interview                                       * * * * * * * * * * *
Occupation of spouse * * * * * * * * * * * * * * * * * * * * * * * * * * * *    
Race of spouse                                             * * * * * * * *
Extent spouse worked in previous calendar year * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
Current labor force status, reason not employed for spouse                   * * * * * * * * * * * * * * * * * * * * *
Shift worked by spouse       *           * * * * * * * * * * * * * * * * *        
Rate of pay, hourly rate of pay of spouse                       * * * * * * * * * * * * * * * * *    
Spouse/partner's religious affiliation and attendance       *                             * * * * * * * * * * * *
Number of spouse's marriages, details       *                           * * * * * * * * * * * * *
Effect of spouse's health on R's work       *                                                    
Quality of R's relationship (14 items) (mothers in 1988; females all other years)                   *       *   * * * * * * * * * * * *      
Age at which R expects to marry *                                                          

Table 7. NLSY79: Household and children variables by year

Variable 1979 1980 1981 1982 1983 1984 1985 1986 1987 1988 1989 1990 1991 1992 1993 1994 1996 1998 2000 2002 2004 2006 2008 2010 2012 2014 2016 2018 2020 2022
Relationship of household or family members to R * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
Household or family members' demographics (sex, age, highest grade completed, work status in past year) * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
Number of dependents or exemptions * * * * * * * * * * * * * * * * * * * * * * * * * * * *    
Number and ages of R's children living in household * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
Expected number of children *     * * * * *   *   *   *   * * * * * * * * * *          
Number of children R considers ideal *     *                                                    
Healthcare during pregnancy (females)         * * * *   *   *   *   * * * * * * * * * * * * *    
Postnatal infant healthcare and feeding (females)         * * * *   *   *   *   * * * * * * * * * * * * * * *
Father's relationship with children (males)                                   * *                      
Fertility history * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
Use of birth control methods       *   * * *   *   *   *   * * * * * * * * * * * * * * *
Pregnancies not resulting in live births (includes how ended through 1990)       * * * * *   *   *   *   * * * * * * * * * * * *      
Characteristics of children with asthma                                         * * * * * *        

Asked of female respondents only in even years after 1986.

Table 8. NLSY79: Childcare variables by year

Variable 1979 1980 1981 1982 1983 1984 1985 1986 1987 1988 1989 1990 1991 1992 1993 1994 1996 1998 2000 2002 2004 2006 2008 2010 2012 2014 2016 2018 2020 2022
Current childcare arrangements       * * * * * * *                                        
Childcare during first 3 years of life               *   *       *   * * * * * * * * * * *        
Cost per week       *     * *   *                                        
Number of hours per week       * * * * *   *                                        
Is childcare a hindrance to R's work, school, or training       * * *       * *                                      
Extent of various neighborhood problems                           *   * * * *                      

All spouse items also refer to partners beginning in 1994.

Table 9. NLSY79: Financial characteristics and program participation variables by year

Variable 1979 1980 1981 1982 1983 1984 1985 1986 1987 1988 1989 1990 1991 1992 1993 1994 1996 1998 2000 2002 2004 2006 2008 2010 2012 2014 2016 2018 2020 2022
Total family income in previous calendar year * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
Income of R and spouse in previous calendar year from: Farm or own business * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
Income of R and spouse in previous calendar year from: Wages or salary * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
Income of R and spouse in previous calendar year from: Business or Professional Practice Investment or Ownership                                         *   *   *   * * * *
Unemployment compensation * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
Public assistance * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
Food Stamps * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
Targeted cash or noncash benefits                                   * *                      
Pensions/Social Security * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
Military service * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
Veterans' benefits, workers' compensation, other disability (collected separately beginning in 2002) * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
Other sources * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
R receives government rent subsidy or public housing * * * * * * * * * * * * * * * * * * * * * * *   *   *   *  
Income from child support * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
Child support expected vs. received                             * * * * *                      
Rights to estate or trust; income from inheritances (since last interview)                                     * * * * * * * * * * * *
R claimed Earned Income Tax Credit (EITC) on previous tax return, amount                                     * * * * * * * *        
Possession of various assets (R and spouse)   * * * * * * * * * * *   * * * * * *   *   *   *   *   *  
Asset market value (R and spouse)             * * * * * *   * * * * * *   *   *   *   *   *  
Amount of debt             * * * * * *   * * * * * *   *   * * * * * * *  
Amount spent on food, other than Food Stamps                       * * * * *           *                
Effect of 1996 welfare reform on R (shorter in 2000)                                   * *         *            
R receives targeted benefits from public assistance program (gas vouchers, childcare, and so forth)                                     *                      
R ever declare bankruptcy                                         *   * * * * *   *  
Home foreclosure                                               * * * * * * *
R has a will                                                 * * * *   *
Financial literacy                                                 * * * *    
Educational expenditures                                                   *        
Effects of Coronavirus outbreak on earnings                                                         * *
R and spouse receive Coronavirus stimulus check                                                         * *

Table 10. NLSY79: Military service variables by year

Variable 1979 1980 1981 1982 1983 1984 1985 1986 1987 1988 1989 1990 1991 1992 1993 1994 1996 1998 2000 2002 2004 2006 2008 2010 2012 2014 2016 2018 2020 2022
Branch of Armed Forces * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
Months spent in Armed Forces * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
Military occupation(s) * * * * * * *                                              
ROTC or officer training *                                                          
Reserve or guard activities * * * * * * *                                              
Pay grade and income * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
Type and amount of military training * * * * * * *                                              
Does R use military skills on civilian job * * * * * * * * * * * * * * * * * * * * * * * * * * * *    
Does R do same kind of work in recent civilian job as in military job       * * * * * * * * * * * * * * * * * * * * * * * * * * *
Formal education received while in service * * * * * * *                                              
Family members who have served on active duty         *                                                  
Participation in Veterans' Educational Assistance Program (VEAP) (after 1985, with GI bill) * * * * * * * * * * * * * * * * * * * * * * * * * * *      
Attitude toward military service * * * * * * *                                              
Future military plans * * * * * * *                                              
Reason for entering and leaving military   * * * * * *                                              
Contact with military recruiters * * * * * * *                                              
Type of discharge   * *                                                      
Enlistment or reenlistment bonuses received * * * * * * *                                              
Civilian job offer at time of discharge   * * * * * *                                              
Return to same employer after active duty   * * * * * *                                              

Table 11. NLSY79: Educational and occupational aspirations and expectations variables by year

Variable 1979 1980 1981 1982 1983 1984 1985 1986 1987 1988 1989 1990 1991 1992 1993 1994 1996 1998 2000 2002 2004 2006 2008 2010 2012 2014 2016 2018 2020 2022
Would R like more education or training; type *                                                          
How much education desired and actually attained *   * *                                                    
Kind of work R would like to be doing at age 35 *     *                                                    
Expectation of achieving occupational goal *     *                                                    

Table 12. NLSY79: Attitudes variables by year

Variable 1979 1980 1981 1982 1983 1984 1985 1986 1987 1988 1989 1990 1991 1992 1993 1994 1996 1998 2000 2002 2004 2006 2008 2010 2012 2014 2016 2018 2020 2022
Knowledge of World of Work score *                                                          
Would R work if had enough money to live on *                                                          
Characteristics of job R is willing to take (R unemployed or out of labor force) * * * * * * * *                                            
Reaction to hypothetical job offers *                                                          
Internal-External Locus of Control Scale (Rotter) *                                                 * * *    
Mastery Scale (Pearlin)                           *                                
Attitude toward women working *     *         *                       *                  
Self-Esteem Scale (Rosenberg) (10 items)   *             *                         *                
CES-Depression Scale                           *   *   * * * * * * * * * * * * *
Person having most influence on R, his or her responses to various situations *                                                          
Retirement expectations                                         * * * * * * * * * *
R risk aversion questions                                               * * *        
Ten-Item Personality Inventory (TIPI)                                                   * * *    
Life satisfaction                                                   * * * * *
Computer and internet access                                     * * * * * * * *     * *

Table 13. NLSY79: Retrospective evaluation of labor market experience variables by year

Variable 1979 1980 1981 1982 1983 1984 1985 1986 1987 1988 1989 1990 1991 1992 1993 1994 1996 1998 2000 2002 2004 2006 2008 2010 2012 2014 2016 2018 2020 2022
Perception of age, race, and sex discrimination *     *                                                    
Reason for problems in obtaining employment *     *                                                    

Table 14. NLSY79: Delinquency, drugs, and alcohol use variables by year

Variable 1979 1980 1981 1982 1983 1984 1985 1986 1987 1988 1989 1990 1991 1992 1993 1994 1996 1998 2000 2002 2004 2006 2008 2010 2012 2014 2016 2018 2020 2022
Activities within last year (20 items)   *                                                        
Income from illegal activities within last year   *                                                        
Alcohol consumption in last week or month       * * * *     * *     *   *       *   * * * * *   * * *
Extent of cigarette use           *               *   *   *         * * * *   * * *
Age R first smoked and stopped smoking cigarettes           * (first smoked only)               *   *   *                        
Extent of marijuana use   *       *       *       *   *   *                        
Age R first used marijuana           *       *       *   *   *                        
Extent of cocaine use, age R first used           *       *       *   *   *                        
Extent of "crack" cocaine use, age R first used                           *   *   *                        
Ever used sedatives, barbiturates, and so forth           *               *   *   *                        
Cigarette and alcohol use during pregnancy         * * * *   *   *   *   * * * * * * * * * * * * *    
Marijuana and cocaine use during pregnancy                   *   *   *   * * * * * * * * * * * * *    

Table 15. NLSY79: Reported police contacts variables by year

Variable 1979 1980 1981 1982 1983 1984 1985 1986 1987 1988 1989 1990 1991 1992 1993 1994 1996 1998 2000 2002 2004 2006 2008 2010 2012 2014 2016 2018 2020 2022
Number of times stopped by police   *                                                        
Number of times booked or arrested   *                                                        
Number of convictions, charges   *                                                        
Number of times incarcerated; date of release   *                                                        

Table 16. NLSY79: Time use variables by year

Variable 1979 1980 1981 1982 1983 1984 1985 1986 1987 1988 1989 1990 1991 1992 1993 1994 1996 1998 2000 2002 2004 2006 2008 2010 2012 2014 2016 2018 2020 2022
Use of time at various activities (school, work, watching TV, household chores, and so forth)     *                                                      
Volunteerism/Philanthropy                                           *   * * *        

Confidentiality & Informed Consent

The NLS program has established set procedures for ensuring respondent confidentiality and obtaining informed consent. These procedures comply with Federal law and the policies and guidelines of the U.S. Office of Management and Budget (OMB) and the U.S. Bureau of Labor Statistics.

OMB procedures and federal laws

OMB procedures

The Office of Management and Budget (OMB) is responsible for setting overall statistical policy among Federal agencies. For example, OMB has established standards on collecting information about race and ethnicity, industry, occupation, and geographic location. OMB also has established standards on the manner and timing of data releases for such principal economic indicators as the gross domestic product, the national unemployment rate, and the Consumer Price Index. In addition, OMB sets standards on whether and how much respondents to Federal surveys can be paid for their participation, an issue of particular concern in the NLS program.

Another of OMB's responsibilities is to review the procedures and questionnaires that Federal agencies use in collecting information from 10 or more respondents. Federal data collections reviewed by OMB include administrative data, such as the tax forms that the Internal Revenue Service requires individuals and corporations to complete. OMB also reviews all censuses and surveys that Federal agencies conduct, either directly or through contracts.

Surveys that are funded through Federal grants to universities and other organizations generally do not have to undergo this OMB review process unless the grantee in turn contracts with a Federal statistical agency such as the Census Bureau to collect the data. In place of OMB review, surveys funded through grants typically must undergo a competitive peer-review process established by the agency administering the grant, and that review process examines the procedures for maintaining respondent confidentiality and obtaining the informed consent of the participants. In addition, such surveys also typically are scrutinized by an institutional review board established at the grantee's institution.

OMB examines a variety of issues during these reviews, such as the:

  • amount of time (and money, if any) that the agency collecting the information estimates respondents will spend to provide the requested information
  • agency's efforts to reduce the burden on respondents of providing the information
  • purpose and necessity of the data collection, including whether it duplicates the objectives of other Federal data collections
  • ways in which the agency obtains informed consent from potential respondents to participate in the data collection
  • policies and procedures that the agency has established to ensure respondent confidentiality
  • statistical methods used to select representative samples, maximize response rates, and account for nonresponse
  • payment of money or the giving of gifts to respondents
  • questionnaire itself, including the quality of its design and whether it includes questions that respondents may regard as sensitive

These OMB reviews are very thorough. From the time an agency prepares an OMB information collection request until the time OMB approves the data collection, the process typically takes 7 months or more and includes multiple layers of review within the agency and at OMB. These reviews are helpful in improving survey quality and ensuring that agencies treat respondents properly, both in terms of providing them with information about the data collection and its uses and protecting respondent confidentiality.

The review process also provides the general public with two opportunities to submit written comments about the proposed data collection. The agency conducting the data collection publishes a notice in the Federal Register describing the data collection and inviting the public to request copies of the information collection request, questionnaires, and other materials that the agency eventually will submit to OMB. The public is invited to submit written comments to the agency sponsoring the data collection within 60 days from the time the Federal Register notice is published. In the history of the NLS program, the public very rarely has submitted comments to BLS, but when comments are received, they are summarized in the information collection request that ultimately is submitted to OMB.

After the request has been submitted to OMB, the agency sponsoring the data collection then publishes a second notice in the Federal Register and invites the public to submit comments directly to OMB within 30 days. Again, in the history of the NLS program, the public very rarely, if ever, has submitted comments to OMB. Once OMB has received the information collection request, it has 60 days to review the package, ask follow-up questions, suggest changes (or, occasionally, insist upon changes) to the survey questionnaire or procedures, and ultimately grant approval.

Respondents' advance letter

After OMB grants approval, the sponsoring agency can begin contacting potential respondents and collecting information from them. The process of contacting potential NLS respondents begins with sending them an advance letter several weeks before interviews are scheduled to begin. The advance letter serves several purposes. The obvious purpose is to inform respondents that an interviewer will be contacting them soon, but BLS and the organizations that conduct the surveys for BLS also use the letter to thank respondents for their previous participation and to encourage them to participate in the upcoming round. Another important objective of the advance letter is to remind respondents that their participation is voluntary and to tell them how much time the interview is expected to take. The letter also explains to respondents how the data will be used and how respondents' confidentiality will be protected by BLS and the organizations that conduct the surveys for BLS. An example of an advance letter, along with the confidentiality statement that appears on the back of the letter, is shown in Figure 1.

Figure 1. NLSY79 round 30 advance letter

Dear [Respondent Name],
For more than 40 years, the NLSY79 has provided vital information about the lives of ordinary Americans. Few surveys can match the NLSY79 in helping us understand who we are as a nation. And for that, we thank you.

Your continued participation in this study has impacted how our country understands important economic, educational, and labor market issues. And as you near retirement age and potentially leave the paid labor force, the NLSY79 will permit researchers to study key questions about retirement and the causes and consequences of age-related health issues.

We follow the federal laws that govern the confidentiality of survey respondents, as well as additional policies and procedures that ensure your answers are safeguarded. Please see the back of this letter for more information about privacy and confidentiality.

The average interview lasts about 69 minutes and you can schedule your appointment online as well as get extra cash with our Early Bird program! (See enclosed card for details.) To receive your gift faster, we offer electronic payment options through online or mobile banking and PayPal.

We appreciate your time and willingness to thoughtfully answer our questions. Few people have the opportunity to make such a lasting contribution. Thank You!

Sincerely,

Keenan Dworak-Fisher
Director, National Longitudinal Surveys
U.S. Bureau of Labor Statistics


WHY IS THIS STUDY IMPORTANT? Thanks to your help, policymakers and researchers will have a better understanding of the work experiences, family characteristics, health, financial status, and other important information about the lives of people in your generation. This is a voluntary study, and there are no penalties for not participating or not answering all the questions. However, missing responses make it more difficult to understand the issues that concern people in your community and across the country. Your answers represent the experiences of hundreds of other people your age. We hope we can count on your participation again this time.

WHO AUTHORIZES THIS STUDY? The sponsor of the study is the U.S. Department of Labor, Bureau of Labor Statistics. The study is authorized under Title 29, Section 2, of the United States Code. The CHRR at The Ohio State University and NORC at the University of Chicago conduct this study under a contract with the Department of Labor. The U.S. Office of Management and Budget (OMB) has approved the questionnaire and has assigned 1220-0109 as the study’s control number. This control number expires on ##/##, 20##. Without OMB approval and this number, we would not be able to conduct this study.

HOW MUCH TIME WILL THE INTERVIEW TAKE? Based on preliminary tests, we expect the average interview to take about 69 minutes. Your interview may be somewhat shorter or longer depending on your circumstances. If you have any comments regarding this study or recommendations for reducing its length, send them to the Bureau of Labor Statistics, National Longitudinal Surveys, 2 Massachusetts Avenue, N.E., Washington, DC 20212.

WHO SEES MY ANSWERS? We want to reassure you that your confidentiality is protected by law. In accordance with the Confidential Information Protection and Statistical Efficiency Act, the Privacy Act, and other applicable Federal laws, the Bureau of Labor Statistics, its employees and agents, will, to the full extent permitted by law, use the information you provide for statistical purposes only, will hold your responses in confidence, and will not disclose them in identifiable form without your informed consent. All the employees who work on the survey at the Bureau of Labor Statistics and its contractors must sign a document agreeing to protect the confidentiality of your data. In fact, only a few people have access to information about your identity because they need that information to carry out their job duties.

Some of your answers will be made available to researchers at the Bureau of Labor Statistics and other government agencies, universities, and private research organizations through publicly available data files. These publicly available files contain no personal identifiers, such as names, addresses, Social Security numbers, and places of work, and exclude any information about the states, counties, metropolitan areas, and other, more detailed geographic locations in which survey participants live, making it much more difficult to figure out the identities of participants. Some researchers are granted special access to data files that include geographic information, but only after those researchers go through a thorough application process at the Bureau of Labor Statistics. Those authorized researchers must sign a written agreement making them official agents of the Bureau of Labor Statistics and requiring them to protect the confidentiality of survey participants. Those researchers are never provided with the personal identities of participants. The National Archives and Records Administration and the General Services Administration may receive copies of survey data and materials because those agencies are responsible for storing the Nation’s historical documents.

WHERE CAN I FIND MORE INFORMATION? To learn more about the survey, visit www.bls.gov/nls. To search for articles, reports, and other research based on the National Longitudinal Surveys, visit www.nlsbibliography.org.

Institutional review boards

In addition to OMB review, the NLSY79 is reviewed and approved by an institutional review board (IRB) at each of the institutions that manage and conduct the surveys under contract with BLS. Those institutions are The Ohio State University and NORC at the University of Chicago. BLS and OMB do not require these reviews; rather, the reviews are required under the policies of the universities. Obtaining approval from the IRBs involves completing a form signed by the Principal Investigator, providing a summary of the research project, and submitting a description of the consent procedures and forms used in the survey. Additional documentation includes a copy of any materials used to recruit respondents, a detailed summary of the survey questionnaire, and any other information regarding the risks to humans of participating in the survey. OMB must review all data collections for the NLSY79.

The NLSY79 project staff at The Ohio State University Center for Human Resource Research (CHRR) and at NORC obtain approval from their respective IRBs prior to the start of each round of data collection. Because each survey includes only an interview and no invasive medical procedures, the IRBs typically focus on respondent compensation, consent procedures, and confidentiality protections for special populations, such as incarcerated or disabled respondents. Prisons, schools, and other institutions in which NLSY79 sample members may reside often request the IRB approval statement and application as evidence that appropriate procedures are being followed and to judge whether to permit NLSY79 interviewers to have access to individuals for whom the institutions are responsible.

Federal laws

Two Federal laws govern policies and procedures for protecting respondent confidentiality and obtaining informed consent in the NLSY79 program: the Privacy Act of 1974 and the Confidential Information Protection and Statistical Efficiency Act (CIPSEA) of 2002.

The Privacy Act and CIPSEA

These two acts protect the confidentiality of participants in the NLSY79 and its associated Child and Young Adult surveys. CIPSEA protects the confidentiality of participants by ensuring that individuals who provide information to BLS under a pledge of confidentiality for statistical purposes will not have that information disclosed in identifiable form to anyone not authorized to have it.

In addition, CIPSEA ensures that the information respondents provide will be used only for statistical purposes. While it always has been the BLS policy to protect respondent data from disclosure through the Privacy Act and by claiming exemptions to the Freedom of Information Act, CIPSEA is important because it specifically protects data collected from respondents for statistical purposes under a pledge of confidentiality.

This law strengthens the ability of BLS to assure respondents that, when they supply information to BLS, their information will be protected. In addition, CIPSEA includes fines and penalties for any knowing and willful disclosure of specific information to unauthorized persons by any officer, employee, or agent of BLS. Since the enactment of the Trade Secrets Act and the Privacy Act, BLS officers, employees, and agents have been subject to criminal penalties for the mishandling of confidential data, and the fines and penalties under CIPSEA are consistent with those prior laws. CIPSEA now makes such fines and penalties uniform across all Federal agencies that collect data for exclusively statistical purposes under a pledge of confidentiality.

Survey interviewers are trained how to answer questions from respondents about how their privacy will be protected. Interviewers explain to potential respondents that all the employees who work on the surveys at BLS, NORC, and CHRR are required to sign a document stating that they will not disclose the identities of survey respondents to anyone who does not work on the NLS program and is therefore not legally authorized to have such information. In fact, no one at BLS has access to information about respondents' identities, and only a few staff members at NORC and CHRR who need such information to carry out their job duties have access to information about respondents' identities.

Interviewers also explain that the answers respondents provide will be made available to researchers at BLS and other government agencies, universities, and private research organizations, but only after all personal identifiers--such as names, addresses, Social Security numbers, and places of work--have been removed. In addition, the publicly available data files exclude any information about the States, counties, metropolitan statistical areas, and other, more detailed geographic locations in which respondents live, making it much more difficult to infer the identities of respondents.
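The removal of identifiers described above can be pictured with a minimal sketch. It assumes a respondent record is held as a simple dictionary; the field names (`name`, `ssn`, `state`, and so on) are hypothetical illustrations and do not reflect the actual layout of NLS data files or the procedures used by BLS, NORC, or CHRR.

```python
# Hypothetical field names, chosen only to mirror the categories named in the text.
DIRECT_IDENTIFIERS = {"name", "address", "ssn", "place_of_work"}
GEOGRAPHIC_FIELDS = {"state", "county", "metro_area"}


def public_use_record(record: dict) -> dict:
    """Return a copy of the record without personal identifiers or geography."""
    excluded = DIRECT_IDENTIFIERS | GEOGRAPHIC_FIELDS
    return {key: value for key, value in record.items() if key not in excluded}


# Example with made-up values.
raw = {
    "name": "Jane Doe",
    "ssn": "000-00-0000",
    "state": "OH",
    "weeks_worked": 48,
    "highest_grade": 12,
}
public = public_use_record(raw)
```

In this sketch only the substantive survey answers (`weeks_worked`, `highest_grade`) remain in `public`.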

Respondents are told that some researchers are granted special access to data files that include geographic information, but only after those researchers undergo a thorough application process at BLS and sign a written agreement making them official agents of BLS and requiring them to protect the confidentiality of respondents. In no case are researchers provided with information on the personal identities of respondents.

Finally, the reference in the questions and answers to the National Archives and Records Administration and the General Services Administration may be confusing to some potential respondents, because those Federal agencies are not involved in the administration of the surveys. Interviewers explain to respondents that NLS data and materials will be made available to those agencies because they are responsible for storing the Nation's historical documents. The information provided to those agencies does not include respondents' personal identities, however.

The organizations involved in the NLS program continuously monitor their security procedures and improve them when necessary. Protecting the privacy of NLS respondents entails considerable responsibilities for BLS, the organizations that conduct the surveys for BLS, and the researchers who use the data. Indeed, researchers in particular may become frustrated that they cannot obtain access to all the data that they want or that they must undergo a long review process at BLS to obtain some types of data. It is important to remember, however, that protecting respondent confidentiality must remain paramount. Any action that might jeopardize respondent confidentiality and erode the confidence of respondents could harm response rates in the NLS program and in other government or academic surveys. Thus, without the safeguards in place to protect respondent confidentiality, researchers would have far less data available to work with than they currently enjoy.

Contractors' role in maintaining respondent confidentiality

BLS, NORC, and CHRR are responsible for following the Federal requirements and maintaining their own security procedures. As mentioned earlier, all officers, employees, and agents of BLS are required to sign agreements stating that they will not disclose the identities of survey respondents to anyone who does not work on the NLS program and is therefore not legally authorized to have such information. Each contractor has in place procedures to ensure that the data are secure at each point in the survey process. (See the Data Handling section for more information.)

Survey procedures

Like all contractor staff, field interviewers are agents of BLS and are required to sign the BLS agent agreement before working on the NLSY79. All interviewers also must undergo a background check when they are hired. Confidentiality is stressed during training and enforced at all times. Field interviewers receive specific instructions in their reference manuals to remind them of the appropriate procedures when locating or interacting with respondents or contacts.

At the end of each interview, interviewers ask respondents to provide information on family members, friends, or neighbors who can be contacted if the interviewers are unable to locate the sample member in a subsequent round of interviews. The interviewers then use those contacts to help in locating sample members who have moved. When contacting a sample member's relatives, friends, or neighbors about the sample member's whereabouts, interviewers never disclose the name of the survey they are conducting. They are instructed to maintain the confidentiality of any relative, friend, or neighbor who provides information about the sample member's whereabouts.

Answering machines can pose problems when interviewers are contacting sample members because it is difficult to confirm that the interviewer is calling a sample member's correct telephone number or that other household members will not hear the message. For those reasons, interviewers are instructed not to leave messages on answering machines.

When interviewers contact the appropriate household, they ask to speak with the sample member. Interviewers introduce themselves and state the purpose of the call by saying that they are from the National Opinion Research Center at the University of Chicago and are calling concerning a national survey. The name of the survey is not disclosed to anyone but the sample member.

Special situations

The NLSY79 is a general population survey and includes a variety of sample members with special circumstances, such as incarcerated individuals, respondents in the military, other institutionalized persons, disabled persons, those with limited English proficiency, and so forth.

Incarcerated respondents

Incarcerated respondents constitute the largest group requiring special accommodations. The first challenge with incarcerated respondents is contacting them to schedule an interview. NLS interviewers must contact the prison administration to arrange for an interview, but the interviewers cannot legally reveal to the prison administration that the prisoner previously had participated in the survey without first obtaining the written, informed consent of the prisoner to reveal that information. (Note: Data were incomplete for 2004 because of confidentiality concerns regarding inmates' participation in the NLSY79. A protocol was established for round 22 of the NLSY79.)

The following steps are used for obtaining prisoners' consent:

  1. Prisoners are first sent a letter reminding them about their previous participation in a NORC survey, but, in case the mail is monitored by prison staff, the letter does not name the survey or BLS so as not to reveal the prisoner's participation. The letter encourages the prisoner to participate in the upcoming round of the survey. It explains that NORC staff needs to set up an interview through the prison administration but that NORC cannot tell the prison administration about the prisoner's participation without the prisoner's informed consent. The letter then asks the prisoner to request a consent form by signing and dating an enclosed form letter and mailing it to NORC in a pre-addressed, postage-paid envelope. The letter reminds the prisoner that the mail at the institution may be monitored and explains that the consent form that NORC will send the prisoner will state the prisoner's name and the name of the survey. The letter emphasizes that, if the prisoner returns the enclosed form letter, prison management or staff may learn that the prisoner is a participant in the survey.
  2. If the prisoner chooses to send the form letter to NORC, NORC then sends the prisoner a cover letter and a consent form that names the specific survey. The prisoner is asked to sign the consent form and mail it to NORC in a pre-addressed, postage-paid envelope. Once NORC has received the signed consent form, NORC staff can contact the prison to request permission to interview the prisoner and learn about any restrictions that the prison administration may impose.
  3. If the prison administration permits an interview and a date and time have been scheduled for the interview, NORC mails another letter to the prisoner. This letter serves two purposes. First, it tells the prisoner when the interview will take place. Second, it informs the prisoner that the interview very likely will be monitored by prison staff; it is important to tell the prisoner this in writing.

Once all of these steps are complete, the prisoner finally can be interviewed, but the NLS program takes additional steps to minimize the risk that prisoners might reveal illegal or illicit behavior in the presence of prison staff during the course of the interview.

As described later in this chapter, such sensitive questions are asked in the self-administered portions of the NLSY79. During these portions of the survey, the typical protocol for a respondent who is not incarcerated involves the interviewer turning the laptop computer around to enable the respondent to read the questions to himself or herself and enter the answers directly into the laptop computer without the interviewer knowing the responses. (In fact, the interviewer does not even know which questions the respondent answered.) In some relatively low-security correctional facilities, such as some county jails and halfway houses, this protocol still would be possible. In higher security facilities, the prison administrators would not permit the prisoner to touch the computer, so the questions either would have to be read to the respondent or skipped altogether.

NLS program staff have identified the questions that could be considered even moderately sensitive or risky for the prisoner to answer out loud. Given this examination of these questions, the NLS program has adopted the following protocol for administering sensitive questions to prisoners:

  1. At the very beginning of the interview, the interviewer will indicate in the survey instrument whether a respondent is in a correctional facility of any kind and, if so, whether the facility permits the prisoner to touch the laptop and enter responses to the self-administered questions. For Federal prisons, the interviewer assumes that the prisoner is not permitted to touch the laptop.
  2. If the facility permits the prisoner to enter responses to the self-administered questions directly into the laptop, then the full set of questions, including all of the sensitive questions, would be administered.
  3. If the facility does not permit the prisoner to enter responses directly into the laptop, or if the interview is conducted over the telephone rather than in person, all survey questions will be asked orally by the interviewer, but the instrument is programmed to skip sensitive questions in which the prisoner might be asked about illegal or illicit behavior.
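The three-step protocol above amounts to a small decision rule, sketched below for illustration. The class, field names, and return labels are hypothetical and are not taken from the actual survey instrument; the sketch encodes only the conditions stated in the list.

```python
from dataclasses import dataclass


@dataclass
class InterviewSetting:
    # All names here are illustrative assumptions, not instrument variables.
    in_correctional_facility: bool
    is_federal_prison: bool = False
    facility_allows_laptop: bool = False
    by_telephone: bool = False


def sensitive_question_mode(setting: InterviewSetting) -> str:
    """Return how sensitive questions are handled for this interview."""
    if not setting.in_correctional_facility:
        # The special protocol applies only to respondents in correctional facilities.
        return "standard_protocol"
    # Step 1: for Federal prisons, assume the prisoner may not touch the laptop.
    laptop_ok = setting.facility_allows_laptop and not setting.is_federal_prison
    # Step 2: self-administered entry allowed, so the full question set is given.
    if laptop_ok and not setting.by_telephone:
        return "self_administered_full_set"
    # Step 3: questions are asked orally and sensitive items are skipped.
    return "oral_skip_sensitive"


county_jail = InterviewSetting(True, facility_allows_laptop=True)
federal = InterviewSetting(True, is_federal_prison=True, facility_allows_laptop=True)
phone = InterviewSetting(True, facility_allows_laptop=True, by_telephone=True)
```

Here `county_jail` would receive the full self-administered set, while `federal` and `phone` would both have the sensitive questions skipped.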

Military respondents

NLSY79 respondents who are in the military tend to be very cooperative and willing to participate in the surveys, but it sometimes can be difficult to locate and contact them, particularly if they are stationed outside the United States. It sometimes is necessary to seek the help of military or civilian staff in the Department of Defense to locate and contact military respondents, but NLS program staff first must obtain the military member's written, informed consent to reveal to Department of Defense staff that he or she previously had participated in the survey and is willing to be contacted to participate in future rounds of the survey.

Respondents with limited English proficiency

Some respondents lack fluency in English and are more comfortable using another language. It is not possible to accommodate all of the different languages other than English that respondents might speak, but the NLSY79 historically has made special arrangements for respondents and their parents who speak Spanish, the most commonly spoken language other than English among respondents. NORC staff members translate advance letters and other informational materials into Spanish to enable respondents and the parents of minor respondents to provide their informed consent based on information that is written in the language that they understand best. Survey questionnaires also have been translated into Spanish to ensure that the surveys are administered consistently, an alternative much preferable to having Spanish-speaking interviewers translate the English-language questionnaire during the interview. The first 20 rounds of the NLSY79 included a Spanish version of the questionnaire, but, because the number of respondents who speak only Spanish has continued to decline, it no longer is cost-effective to continue programming a computerized Spanish questionnaire. For that reason, Spanish questionnaires are not used starting with round 21 (2004) of the NLSY79. Advance letters and other informational materials still are available in Spanish, however.

Sensitive topics

The NLSY79 has included questions on income and assets, religion, relationships with parents and other family members, sexual experiences, abortion, drug and alcohol use, criminal activities, homelessness, runaway episodes, and other topics that are potentially sensitive for respondents to discuss. Respondents are advised at the start of the interview that they can choose not to answer any questions that they prefer not to answer. During training, interviewers undergo exercises to teach them how to allay the concerns of respondents about answering sensitive questions and encourage them to respond.  Interviewers are instructed not to coerce respondents into answering questions that they prefer not to answer, however.

All questions in the NLSY79 are read to the respondent by an interviewer. The respondent then provides an answer, and the interviewer records that answer on a laptop computer. For especially sensitive questions, some respondents might be reluctant to answer truthfully--or at all--if they have to tell an interviewer their answers, even though interviewers can face criminal and civil penalties if they disclose the respondents' identities or answers to anyone not authorized to receive that information.

Guidelines for emailing sample members

At the end of each interview, respondents are asked to provide information that will help interviewers contact them during subsequent rounds of the surveys. In addition to the information collected about relatives, friends, or neighbors, interviewers also obtain the email addresses of sample members who have them. During round 20 of the NLSY79 (conducted during 2002), the NLS contractors began using email as a means to contact a small number of sample members who were hard to reach by other means. The following guidelines were enacted to ensure confidentiality:

  1. The name of the survey is not contained in the subject line or text of the email message. Some respondents may share the use of an email address with other household members, so the survey name is omitted from the message to prevent other household members from learning the specific name of the survey.
  2. Email is sent from one main address. Field interviewers are not permitted to use their individual email accounts to contact respondents.

Respondents knowing respondents

One feature of the sample design in the NLSY79 is that there often are multiple respondents within the same original household, either siblings or, occasionally, spouses. It obviously is not possible in these cases to prevent family members from knowing that a relative is in the survey sample, but interviewers take steps to ensure that each respondent's answers remain private and are not revealed to other family members.

Consent from NLSY79 respondents

Respondents are able to review the confidentiality and consent information presented in the advance letter. The respondent gives verbal consent to participate at the beginning of the interview.

Data Handling

An important part of maintaining respondent confidentiality is the careful handling and storage of data. Steps taken by BLS, CHRR, and NORC to ensure the confidentiality of all respondents to the NLSY79 include maintaining secure networks, restricting access to geographic variables, and topcoding income and asset values.

Network security

Data stored and handled at each NLSY79 organization's site are kept under maximum security. During data collection, transmission, and storage, password protection and encryption are used to secure the data. Standard protocols for network security are followed at each organization's site. Detailed information about these arrangements is not provided to the public to prevent anyone from circumventing these safeguards.

Restricting access to geographic information

Geographic information about NLSY79 respondents is available only to researchers who are designated agents of BLS. These researchers must agree in writing to adhere to the BLS confidentiality policy, and their projects must further the mission of BLS and the NLSY79 program to conduct sound, legitimate research in the social sciences. Applicants must provide a clear statement of their research methodology and objectives and explain how the geographic variables are necessary to meet those objectives. More information about applying to use the restricted-use Geocode data is available on the BLS Restricted Data Access page.

Topcoding of income and asset variables

Another step taken to ensure the confidentiality of NLSY79 respondents who have unusually high income and asset values is to "topcode" those values in NLSY79 data sets. Values that exceed a certain level are recoded so that they do not exceed the specified level. In each survey round, income and asset variables that include high values are identified for topcoding. For example, the wage and salary income variable usually is topcoded, but variables indicating the amount received from public assistance programs are not. Notes in the codebooks for topcoded income and asset variables provide more information about the exact calculations used to topcode each variable. For more information see the NLSY79 Documentation section.
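As a rough illustration of the idea, the sketch below caps values at a threshold, which is the simplest reading of "recoded so that they do not exceed the specified level." The function name and the threshold of 100,000 are hypothetical; the exact calculations differ by variable and survey round and are documented in the codebook notes.

```python
def topcode(values, cap):
    """Recode any value above `cap` to `cap`; leave missing values (None) untouched."""
    return [None if v is None else min(v, cap) for v in values]


# Made-up wage and salary amounts with a hypothetical cap of 100,000.
reported = [25_000, 80_000, None, 500_000]
released = topcode(reported, 100_000)
```

In this example only the last value is altered; amounts at or below the cap pass through unchanged.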

References

Center for Human Resource Research. "Technical Sampling Report Addendum: Standard Errors and Deft Factors for Rounds IV through XIV." Columbus, OH: CHRR, The Ohio State University, 1994.

Frankel, M.R.; Williams, H.A.; and Spencer, B.D. Technical Sampling Report, National Longitudinal Survey of Labor Force Behavior. Chicago: NORC, University of Chicago, 1983.

Baker, Paula C.; Mott, Frank L.; Keck, Canada K.; and Quinlan, Stephen V. NLSY79 Child Handbook: A Guide to the 1986-1990 NLSY79 Child Data. Columbus, OH:  CHRR, The Ohio State University, 1993.

NORC. NLSY-National Longitudinal Survey of Labor Force Behavior Interviewer's Manual-Household Screening. Chicago: NORC, University of Chicago, 1978.

Olsen, Randall J. "The Effects of Computer Assisted Interviewing on Data Quality." Columbus, OH: CHRR, The Ohio State University, 1991.
