NLSY79

NLS Investigator

NLSY79 variables (as well as the variables from other NLS cohorts) are accessed using NLS Investigator, which is available as a Web application. NLS Investigator is used mainly to identify, select, and extract NLS variables and to run frequencies or cross-tabulations on them. This interface allows the researcher to connect to a database and perform variable extractions without installing any software on a local computer. Through a personal online account, a researcher's selected variable tag sets, frequencies, and extracts are available for a specified period of time from any computer location with Web access. Because there is one central data source for all users, researchers have the assurance that they are always working with the most up-to-date data, and that any necessary corrections will be immediate and universal.

Need help with NLS Investigator?

  1. Access NLSY79 variables by connecting to NLS Investigator.
  2. Get help using NLS Investigator through the NLS Investigator User Guide.
  3. Learn how to perform efficient NLS Investigator searches with the tutorial, Variable Search in the NLS Investigator.

Item Nonresponse

This section examines and quantifies the extent of missing data, formally called item nonresponse, in the NLSY79. To provide readers with a detailed view of this problem, six surveys are analyzed. Nonresponse rates are examined first in the 1979 survey and then in surveys at roughly five-year intervals (1984, 1989, 1994, 1998, and 2004). These years were chosen to capture the major changes in the NLSY79. The 1979 survey shows the initial levels of nonresponse. The 1984 survey shows the amount of nonresponse just before one part of the respondent pool was dropped. The 1989 data show nonresponse after the first set of NLSY79 respondents was dropped. The 1994 data show what occurred after respondents and interviewers were switched from paper-and-pencil interviewing (PAPI) to computer-assisted personal interviewing (CAPI). While no major survey changes occurred during the 1998 and 2004 surveys, these surveys show nonresponse rates after many respondents had participated around 20 times.

This section focuses on the three types of missing data: refusals, invalid skips, and don't knows. Overall, the section shows that in these six rounds of the NLSY79, 20 million questions were asked. Of these, about 1.5 percent do not have valid answers and are missing data. About half the missing data are don't knows and about half are invalid skips; refusals make up a much smaller share. Because the vast majority of invalid skips occur in the paper-and-pencil years, the percentage of problems attributed to this category has been steadily falling as more computer survey rounds are fielded.

Introduction

Missing data, or nonresponse, happens in a number of ways in the NLSY79. First, a number of respondents do not participate at all, causing all information in that particular survey to be missing. Participation rates and reasons for noninterview in each survey round are discussed in the section on Retention & Reasons for Noninterview.

A second reason missing data occurs is that respondents do not provide a valid answer to a question. When this happens, interviewers make a determination about whether to mark the answer as a refusal or don't know value. Users should be cautioned that the assignment of refusals and don't knows is likely to vary across interviewers. Moreover, some respondents may believe it is impolite to refuse a question and decline to answer by saying they do not know. Hence, whether a question is marked either a refusal or a don't know is somewhat arbitrary. Note: Financial questions may often elicit the "refusal" or "don't know" responses. For more information about nonresponse to financial questions, see Appendix 26: Non-Response to Financial Questions and Entry Points.

The last major way missing data can occur is when the interviewer incorrectly follows the survey instrument's flow. Incorrect flows result in some respondents being skipped over a set of questions that should be answered while others answer questions that they should not have been asked. Data archivists have removed most of the extraneous question responses from the data. While extra information can be removed, missing data are not imputed in the NLSY79. Data missing for this reason are flagged with a special "invalid skip" code. The number of invalid skips drops precipitously beginning in 1993 with the introduction of CAPI. Nevertheless, invalid skips are still possible in CAPI data. If the CAPI survey contains a programming mistake, the instrument could incorrectly sequence a respondent. When these errors are found, the CAPI survey is patched in the field to prevent further invalid skips, but the incorrectly sequenced respondents are not asked the questions again.

All missing data are clearly flagged in the NLSY79 data set. Five negative numbers are used to indicate to users that the variable does not contain useful information. The five values are (-1) refusal, (-2) don't know, (-3) invalid skip, (-4) valid skip, and (-5) noninterview. These five numbers are reserved as missing value flags and, with a few exceptions (see Appendix 5: Supplemental Fertility and Relationship Variables), are rarely used in the NLSY79 for valid data values.
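As a minimal sketch of how these flags might be handled once variables have been extracted, the Python below maps the five reserved codes to labels and separates item nonresponse (refusal, don't know, invalid skip) from answers that were never expected (valid skip, noninterview). The function names and the sample responses are hypothetical, and the sketch assumes the variable in question does not use -1 through -5 as valid data values.

```python
# Reserved NLSY79 missing-value flags, as listed in the text above.
MISSING_FLAGS = {
    -1: "refusal",
    -2: "don't know",
    -3: "invalid skip",
    -4: "valid skip",
    -5: "noninterview",
}

# Item nonresponse covers only the first three flags. Valid skips and
# noninterviews mean the question was never supposed to be answered.
ITEM_NONRESPONSE = {-1, -2, -3}


def classify(value):
    """Return the flag label for a reserved code, or 'valid' otherwise.

    Assumes the variable does not use -1..-5 as genuine data values.
    """
    return MISSING_FLAGS.get(value, "valid")


def is_item_nonresponse(value):
    """True when the value is a refusal, don't know, or invalid skip."""
    return value in ITEM_NONRESPONSE


# Hypothetical responses to one question from six respondents.
responses = [25000, -1, -2, -4, 0, -3]
labels = [classify(v) for v in responses]
n_missing = sum(is_item_nonresponse(v) for v in responses)
```

For the few variables that do use these negative numbers as valid values, a blanket recode like this would misclassify real answers, so the check would need to be applied variable by variable.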

In the tables that follow, every attempt has been made to look only at variables in a given survey year that were filled in by either a respondent or an interviewer. The goal was to eliminate from the analysis all created variables, machine checks, date and time stamps, and variables generated in data post-processing. Because there is no automatic way to check whether every question meets these criteria, the number of questions analyzed in the tables below overstates the number of questions actually filled in by the respondent or interviewer. The overstatement occurs because some questions with meaningful titles are actually hidden machine checks. While every effort was made to eliminate these questions, it is impossible to eliminate all of them.

This section is not the only research on the extent of missing data in the NLS. Olsen (1992) investigated the effect of switching from PAPI to CAPI interviewing. His research shows that fewer interviewer errors occur from navigating the instrument, and that there are fewer don't knows, in the CAPI survey. More importantly, CAPI respondents appeared more willing to reveal sensitive material in the alcohol use section. Mott (1985, 1984, and 1983) examines the NLSY79's fertility data. In these reports, he examines the 1982 and 1983 surveys and finds very low refusal rates for the data in general. However, by shifting to a confidential abortion reporting method, the willingness to respond greatly increases. Mott (1998) examines the amount of missing data about the children of NLSY79 females. He finds that Hispanics or Latinos and, to a smaller extent, blacks have a much higher probability of not finishing the child assessments after starting the interview.

Additional nonresponse information

The Item Nonresponse by Section examines which sections of the NLSY79 have high nonresponse rates; the Item Nonresponse by Respondents examines how many times individuals do not respond to questions; and the Item Nonresponse within Problem Sections examines which particular questions in sections with high nonresponse rates are causing problems.

This section examines and quantifies the extent of missing data, formally called item nonresponse, in each section of the NLSY79. The six tables below show which areas of the NLSY79 respondents are least likely to answer by tracking the total number and percentage of questions that have missing data in each survey section. To provide readers with a detailed view of this problem, six surveys are analyzed. Nonresponse rates are examined first in the 1979 survey and then in surveys at roughly five-year intervals (1984, 1989, 1994, 1998, and 2004). These years were chosen to capture the major changes in the NLSY79. The 1979 survey shows the initial levels of nonresponse. The 1984 survey shows the amount of nonresponse just before one part of the respondent pool was dropped. The 1989 data show nonresponse after the first set of NLSY79 respondents was dropped. The 1994 data show what occurred after respondents and interviewers were switched from paper-and-pencil interviewing (PAPI) to computer-assisted personal interviewing (CAPI). While no major survey changes occurred during the 1998 and 2004 surveys, these surveys show nonresponse rates after many respondents had participated around 20 times.

The first column of the tables contains the section names within the survey. The second column shows the total number of questions that all respondents and all interviewers should have answered in that section. This number is determined by first calculating, within each section, the number of questions each respondent should answer. A question is considered answerable if it does not have a valid skip (-4) or noninterview (-5) as its answer. A total for the section is obtained by summing these per-respondent counts across all NLSY79 respondents.

The third (don't know), fourth (refusal), and fifth (invalid skip) columns show the total number of nonresponses found in each section. Columns six, seven, and eight show the same information except in percentage form. The ninth column shows the total percentage of questions missed and is the sum of the previous three percentages. The last column, labeled rank, shows which sections have the most (closer to 1) and least (further from 1) amount of nonresponse.
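The column definitions above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the procedure actually used to build the tables: the function names, the flat list of coded answers, and the two made-up sections are all hypothetical, and ties in rank are broken arbitrarily by sort order.

```python
def section_row(answers):
    """Summarize one survey section from a flat list of coded answers.

    `answers` holds every answer recorded in the section across all
    respondents, using the reserved flags: -1 refusal, -2 don't know,
    -3 invalid skip, -4 valid skip, -5 noninterview.
    """
    # A question counts as asked unless it is a valid skip or a noninterview.
    asked = sum(1 for a in answers if a not in (-4, -5))
    counts = {
        "dont_know": answers.count(-2),
        "refused": answers.count(-1),
        "invalid_skip": answers.count(-3),
    }
    row = {"asked": asked, **counts}
    for name, n in counts.items():
        row["pct_" + name] = 100.0 * n / asked if asked else 0.0
    # Total percent missed is the sum of the three percentages.
    row["pct_missed"] = sum(row["pct_" + name] for name in counts)
    return row


def rank_sections(rows):
    """Rank sections so that 1 marks the highest total percent missed."""
    ordered = sorted(rows, key=lambda s: rows[s]["pct_missed"], reverse=True)
    return {section: i + 1 for i, section in enumerate(ordered)}


# Hypothetical coded answers for two made-up sections.
sections = {
    "Section A": [1, 2, -2, -4, -3, 5],
    "Section B": [1, -5, 3, 4, -1],
}
rows = {name: section_row(answers) for name, answers in sections.items()}
ranks = rank_sections(rows)
```

In this toy input, "Section A" has five answerable questions (the valid skip is excluded) and two of them are missing, so it ranks ahead of "Section B".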

The bottom row of each table combines the information and shows totals. For example, the bottom of the "Number Questions Asked" column in the 1979 survey shows that almost four million questions (3,975,146) were expected to be filled in by respondents or interviewers. While the 1979 survey contains many questions, other years are not far behind. The 1984 survey had 3 million questions, 1989 had 1.8 million, 1994 had 3.7 million, 1998 had 4.1 million, and 2004 had 3.7 million. Readers are cautioned that each year of NLSY79 data contains far more data points, since the tables exclude questions obviously labeled as machine checks, date and time stamps, and questions with valid skip or noninterview data flags.

The six tables show that the overall rate of missing data fell steadily from 1979 through 1994 and then rose again. In 1979, 2.7 percent of the questions in the survey were not answered. This number drops to 1.9 percent in 1984, falls to 0.9 percent in 1989, and reaches a low point of 0.7 percent in 1994. After 1994 the number rises again, to 0.92 percent in 1998 and 1.42 percent in 2004. Hence, nonresponse problems are of slightly less concern after the initial round of surveying.

Combining the data from all sections in all the tables shows that the majority of nonresponse is caused by don't knows and invalid skips. The surveys examined asked a total of 20 million questions. Of these questions, more than 140,000, or 0.7 percent, were don't knows and slightly more than 127,000, or 0.6 percent, were invalid skips. The last category, refusal, contains about 26,000 questions, which is roughly 0.1 percent of all questions asked.
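The combined figures can be reproduced from the "Total" rows of Tables 1.1 through 1.6. The short Python sketch below does that arithmetic; the variable names are illustrative, and the numbers are copied from the bottom row of each table.

```python
# Bottom-row totals from Tables 1.1 through 1.6:
# year -> (questions asked, don't knows, refusals, invalid skips)
TOTALS = {
    1979: (3975146, 22140, 596, 83073),
    1984: (3067473, 18209, 2030, 36918),
    1989: (1793721, 9062, 1883, 4426),
    1994: (3669260, 19670, 6036, 57),
    1998: (4078125, 29968, 7548, 75),
    2004: (3716570, 42718, 7494, 2705),
}

# Sum each column across the six survey years.
asked, dont_know, refused, invalid = (sum(col) for col in zip(*TOTALS.values()))


def pct(n):
    """Share of all questions asked, in percent, rounded to one decimal."""
    return round(100.0 * n / asked, 1)


summary = {
    "asked": asked,
    "dont_know": (dont_know, pct(dont_know)),
    "refused": (refused, pct(refused)),
    "invalid_skip": (invalid, pct(invalid)),
    "all_missing": (dont_know + refused + invalid,
                    pct(dont_know + refused + invalid)),
}
```

The sums come to 20,300,295 questions asked, 141,767 don't knows, 25,587 refusals, and 127,254 invalid skips, which round to the percentages quoted in the text.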

Examining the tables over time shows a steady decrease in the amount of data missing due to invalid skips. In 1979, invalid skips accounted for 2.1 percent of the questions asked. This number dropped sharply to 1.2 percent by 1984 and then down to 0.25 percent by 1989. Analysis indicated that CAPI dramatically lowered the problem of invalid skips with only 57 questions out of almost 3.7 million incorrectly skipped in 1994 and 75 questions out of 4 million in 1998.

While invalid skips fall over time, the percentage of refusals has increased slightly. Refusals accounted for 0.01 percent in 1979, 0.07 percent in 1984, 0.10 percent in 1989, 0.16 percent in 1994, 0.19 percent in 1998, and 0.20 percent in 2004. Although refusals increase steadily over time, in absolute terms the numbers are still quite small.

While invalid skips fall and refusals are rising over time, the trend in don't knows is more complex. Don't knows accounted for 0.6 percent in 1979, 0.6 percent in 1984, 0.5 percent in 1989, 0.5 percent in 1994, 0.7 percent in 1998, and 1.1 percent in 2004. These figures suggest that don't knows are making a U-shaped pattern over time.

The last column, labeled rank, shows that missing data are not confined to a single section or area of the survey. Table 1.1 shows that in 1979 the work experience section, with 14.5 percent of the questions missing valid data, had the most problems. Fourteen percent of all questions asked in this section are labeled as invalid skips and only 0.5 percent of the questions were either refusals or don't knows. Military experience, the second most problematic section had almost half the rate of missing data (7.8 percent) as work experience. The table shows the problem of invalid skips is not related to subject matter since the section (rank 21 out of 21) with the least problems, titled "On Jobs," also focuses on labor market issues, like work experience.

While the "On Jobs" section of the survey consistently has the least problems in these surveys, the section with the most problems changes. Table 1.2, which examines the 1984 survey, shows the most problems in the "Fertility" section. Of the almost half-million questions asked in the fertility section, 5.6 percent contain missing data. While the majority of problems (3.4 percent) were due to invalid skips, a surprisingly large 2 percent of the missing responses are don't knows. The second most problematic section in the 1984 survey was "Drug Use", where 2.7 percent of the questions have missing data. Like "Fertility," the major portion of the problem is invalid skips (1.8 percent), but don't knows (0.8 percent) also account for a significant share. Interestingly, refusals account for only 0.1 percent, a relatively small proportion for a sensitive topic, suggesting that some of the don't knows were hidden refusals.

Table 1.1. Extent of refusals, don't knows, and invalid skips in 1979

| Section Name | Number Questions Asked | Number Don't Knows | Number Refused | Number Invalid Skipped | % Don't Knows | % Refused | % Invalid Skipped | Total % Missed | Rank |
|---|---|---|---|---|---|---|---|---|---|
| Family Background | 660803 | 6196 | 90 | 12292 | 0.94% | 0.01% | 1.86% | 2.81% | 7 |
| Marital Status | 32995 | 131 | 25 | 467 | 0.40% | 0.08% | 1.42% | 1.89% | 14 |
| Fertility | 82141 | 679 | 23 | 624 | 0.83% | 0.03% | 0.76% | 1.61% | 17 |
| Schooling | 402134 | 994 | 14 | 5592 | 0.25% | 0.00% | 1.39% | 1.64% | 16 |
| Pay | 211504 | 22 | 0 | 3482 | 0.01% | 0.00% | 1.65% | 1.66% | 15 |
| World of Work | 220185 | 2220 | 31 | 2883 | 1.01% | 0.01% | 1.31% | 2.33% | 10 |
| Military | 145619 | 491 | 24 | 10885 | 0.34% | 0.02% | 7.47% | 7.83% | 2 |
| CPS | 396697 | 862 | 8 | 10969 | 0.22% | 0.00% | 2.77% | 2.98% | 5 |
| On Jobs | 230982 | 135 | 2 | 903 | 0.06% | 0.00% | 0.39% | 0.45% | 21 |
| Employer Supplement | 291836 | 2009 | 69 | 3575 | 0.69% | 0.02% | 1.23% | 1.94% | 13 |
| Last Job | 44504 | 31 | 0 | 261 | 0.07% | 0.00% | 0.59% | 0.66% | 20 |
| Work Experience | 67695 | 288 | 15 | 9476 | 0.43% | 0.02% | 14.00% | 14.45% | 1 |
| Gov't Training | 36728 | 62 | 28 | 2124 | 0.17% | 0.08% | 5.78% | 6.03% | 3 |
| Other Training | 103662 | 52 | 0 | 2936 | 0.05% | 0.00% | 2.83% | 2.88% | 6 |
| Not at Work | 90768 | 79 | 7 | 5019 | 0.09% | 0.01% | 5.53% | 5.62% | 4 |
| Health | 67869 | 358 | 2 | 545 | 0.53% | 0.00% | 0.80% | 1.33% | 18 |
| Significant Others | 58816 | 669 | 0 | 585 | 1.14% | 0.00% | 0.99% | 2.13% | 12 |
| Residences | 52845 | 94 | 7 | 1029 | 0.18% | 0.01% | 1.95% | 2.14% | 11 |
| Rotter Scale | 202976 | 1277 | 15 | 521 | 0.63% | 0.01% | 0.26% | 0.89% | 19 |
| Income & Assets | 321685 | 1667 | 216 | 6813 | 0.52% | 0.07% | 2.12% | 2.70% | 8 |
| Expectations | 252702 | 3824 | 20 | 2092 | 1.51% | 0.01% | 0.83% | 2.35% | 9 |
| Total | 3975146 | 22140 | 596 | 83073 | 0.56% | 0.01% | 2.09% | 2.66% | - |

Table 1.2. Extent of refusals, don't knows, and invalid skips in 1984

| Section Name | Number Questions Asked | Number Don't Knows | Number Refused | Number Invalid Skipped | % Don't Knows | % Refused | % Invalid Skipped | Total % Missed | Rank |
|---|---|---|---|---|---|---|---|---|---|
| Calendar | 88462 | 8 | 0 | 4 | 0.01% | 0.00% | 0.00% | 0.01% | 15 |
| Marital Status | 50206 | 273 | 18 | 561 | 0.54% | 0.04% | 1.12% | 1.70% | 4 |
| Schooling | 324139 | 1031 | 469 | 2164 | 0.32% | 0.14% | 0.67% | 1.13% | 9 |
| Military | 123126 | 337 | 41 | 1352 | 0.27% | 0.03% | 1.10% | 1.41% | 7 |
| CPS | 333267 | 467 | 5 | 4270 | 0.14% | 0.00% | 1.28% | 1.42% | 6 |
| On Jobs | 140382 | 0 | 0 | 17 | 0.00% | 0.00% | 0.01% | 0.01% | 16 |
| Gaps in Jobs | 120601 | 15 | 0 | 175 | 0.01% | 0.00% | 0.15% | 0.16% | 13 |
| Gov't Training | 31226 | 38 | 0 | 59 | 0.12% | 0.00% | 0.19% | 0.31% | 12 |
| Other Training | 45002 | 7 | 0 | 736 | 0.02% | 0.00% | 1.64% | 1.65% | 5 |
| Fertility | 462288 | 9141 | 891 | 15739 | 1.98% | 0.19% | 3.40% | 5.57% | 1 |
| Child Care | 114317 | 201 | 13 | 1157 | 0.18% | 0.01% | 1.01% | 1.20% | 8 |
| Health | 52866 | 35 | 3 | 29 | 0.07% | 0.01% | 0.05% | 0.13% | 14 |
| Alcohol | 314511 | 33 | 47 | 2234 | 0.01% | 0.01% | 0.71% | 0.74% | 11 |
| Drug Use | 414007 | 3464 | 300 | 7454 | 0.84% | 0.07% | 1.80% | 2.71% | 2 |
| Income & Assets | 439646 | 2945 | 241 | 938 | 0.67% | 0.05% | 0.21% | 0.94% | 10 |
| Attitudes | 13427 | 214 | 2 | 29 | 1.59% | 0.01% | 0.22% | 1.82% | 3 |
| Total | 3067473 | 18209 | 2030 | 36918 | 0.59% | 0.07% | 1.20% | 1.86% | - |

Table 1.3 shows the amount of nonresponse in the 1989 survey. The most problematic section is "Income," which is missing data in 1.3 percent of its questions, with the CPS section a close second at 1.2 percent. Unlike earlier years, the major missing data problem in both the "Income" (1 percent) and CPS (0.8 percent) sections is don't knows, not invalid skips (0.1 percent for "Income" and 0.4 percent for CPS).

Table 1.3. Extent of refusals, don't knows, and invalid skips in 1989

| Section Name | Number Questions Asked | Number Don't Knows | Number Refused | Number Invalid Skipped | % Don't Knows | % Refused | % Invalid Skipped | Total % Missed | Rank |
|---|---|---|---|---|---|---|---|---|---|
| Intro | 14647 | 20 | 1 | 41 | 0.14% | 0.01% | 0.28% | 0.42% | 7 |
| Marital Status | 86563 | 372 | 121 | 450 | 0.43% | 0.14% | 0.52% | 1.09% | 3 |
| Schooling | 76999 | 179 | 39 | 217 | 0.23% | 0.05% | 0.28% | 0.56% | 6 |
| Military | 33579 | 1 | 1 | 40 | 0.00% | 0.00% | 0.12% | 0.13% | 10 |
| CPS | 406265 | 3320 | 52 | 1650 | 0.82% | 0.01% | 0.41% | 1.24% | 2 |
| On Jobs | 39749 | 0 | 0 | 1 | 0.00% | 0.00% | 0.00% | 0.00% | 12 |
| Gaps in Jobs | 91565 | 91 | 1 | 894 | 0.10% | 0.00% | 0.98% | 1.08% | 4 |
| Gov't Training | 49657 | 118 | 35 | 233 | 0.24% | 0.07% | 0.47% | 0.78% | 5 |
| Fertility | 152546 | 6 | 35 | 92 | 0.00% | 0.02% | 0.06% | 0.09% | 11 |
| Health | 154024 | 120 | 74 | 168 | 0.08% | 0.05% | 0.11% | 0.24% | 9 |
| Alcohol | 217441 | 74 | 400 | 201 | 0.03% | 0.18% | 0.09% | 0.31% | 8 |
| Income | 470686 | 4761 | 1124 | 439 | 1.01% | 0.24% | 0.09% | 1.34% | 1 |
| Total | 1793721 | 9062 | 1883 | 4426 | 0.51% | 0.10% | 0.25% | 0.86% | - |

Table 1.4 shows that the most problematic area in the 1994 survey includes the asset questions, which are missing 2.5 percent of their answers (75 percent of those missing being don't knows). The second most problematic area includes income questions, which are missing 1.3 percent of their answers. While in the three previous surveys refusal rates were not an issue, the 1994 survey shows refusals are becoming significant. Slightly more than half a percent (0.6 percent) of the "Asset" section questions and more than one fifth of a percent (0.2 percent) of the "Income" section questions were refused.

Table 1.4. Extent of refusals, don't knows, and invalid skips in 1994

| Section Name | Number Questions Asked | Number Don't Knows | Number Refused | Number Invalid Skipped | % Don't Knows | % Refused | % Invalid Skipped | Total % Missed | Rank |
|---|---|---|---|---|---|---|---|---|---|
| Intro | 36251 | 62 | 14 | 0 | 0.17% | 0.04% | 0.00% | 0.21% | 12 |
| Marital Status | 137540 | 1522 | 193 | 0 | 1.11% | 0.14% | 0.00% | 1.25% | 3 |
| School | 60166 | 302 | 2 | 0 | 0.50% | 0.00% | 0.00% | 0.51% | 7 |
| Military | 27372 | 6 | 1 | 0 | 0.02% | 0.00% | 0.00% | 0.03% | 15 |
| CPS | 269452 | 28 | 9 | 0 | 0.01% | 0.00% | 0.00% | 0.01% | 17 |
| On Jobs | 79567 | 6 | 7 | 0 | 0.01% | 0.01% | 0.00% | 0.02% | 16 |
| Employer Supplement | 1060679 | 7092 | 1342 | 8 | 0.67% | 0.13% | 0.00% | 0.80% | 5 |
| Training | 194147 | 246 | 29 | 47 | 0.13% | 0.01% | 0.02% | 0.17% | 13 |
| Fertility | 450871 | 1859 | 763 | 0 | 0.41% | 0.17% | 0.00% | 0.58% | 6 |
| Child Care | 26453 | 109 | 12 | 0 | 0.41% | 0.05% | 0.00% | 0.46% | 9 |
| Relationship | 81477 | 285 | 113 | 0 | 0.35% | 0.14% | 0.00% | 0.49% | 8 |
| Health | 282702 | 623 | 199 | 0 | 0.22% | 0.07% | 0.00% | 0.29% | 11 |
| Alcohol | 164663 | 46 | 61 | 0 | 0.03% | 0.04% | 0.00% | 0.06% | 14 |
| Income | 305693 | 3176 | 672 | 1 | 1.04% | 0.22% | 0.00% | 1.26% | 2 |
| Program Participation | 118305 | 297 | 63 | 0 | 0.25% | 0.05% | 0.00% | 0.30% | 10 |
| Assets | 169301 | 3239 | 930 | 1 | 1.91% | 0.55% | 0.00% | 2.46% | 1 |
| Drugs | 204621 | 772 | 1626 | 0 | 0.38% | 0.79% | 0.00% | 1.17% | 4 |
| Total | 3669260 | 19670 | 6036 | 57 | 0.54% | 0.16% | 0.00% | 0.70% | - |

Table 1.5 examines the 1998 survey. Because the survey was fielded every other year in the late 1990s, there is no 1999 interview, which would exactly continue the five-year pattern. The 1998 survey is used as the closest substitute. This table, like the one for 1994, shows that the most problematic area is again the asset questions, which are missing 3.6 percent of their answers (75 percent of those missing being don't knows). The second most problematic area is the marital history questions, which added a new section that asked detailed questions about the work history and past life of the respondent's spouse. This expanded section is missing 1.8 percent of its answers. In the 1998 survey only two sections have relatively high refusal rates: assets (0.90 percent in Table 1.5) and drug use (0.68 percent).

Table 1.5. Extent of refusals, don't knows, and invalid skips in 1998

| Section Name | Number Questions Asked | Number Don't Knows | Number Refused | Number Invalid Skipped | % Don't Knows | % Refused | % Invalid Skipped | Total % Missed | Rank |
|---|---|---|---|---|---|---|---|---|---|
| Intro | 10060 | 6 | 4 | 0 | 0.06% | 0.04% | 0.00% | 0.10% | 12 |
| Marital Status | 207805 | 3296 | 520 | 1 | 1.59% | 0.25% | 0.00% | 1.84% | 2 |
| School | 53928 | 197 | 45 | 0 | 0.37% | 0.08% | 0.00% | 0.56% | 10 |
| Military | 25691 | 0 | 0 | 0 | 0.00% | 0.00% | 0.00% | 0.00% | 15 |
| CPS | 301160 | 44 | 12 | 0 | 0.01% | 0.00% | 0.00% | 0.02% | 13 |
| On Jobs | 117144 | 2 | 0 | 1 | 0.00% | 0.00% | 0.00% | 0.00% | 14 |
| Employer Supplement | 1081493 | 10265 | 1441 | 1 | 0.95% | 0.13% | 0.00% | 1.08% | 3 |
| Training | 241013 | 1559 | 143 | 1 | 0.65% | 0.06% | 0.00% | 0.71% | 7 |
| Fertility | 578831 | 3180 | 1097 | 50 | 0.55% | 0.19% | 0.01% | 0.75% | 6 |
| Child Care | 23241 | 57 | 11 | 1 | 0.25% | 0.05% | 0.00% | 0.30% | 11 |
| Relationship | 86632 | 371 | 154 | 0 | 0.43% | 0.18% | 0.00% | 0.61% | 9 |
| Health | 350533 | 2460 | 223 | 0 | 0.70% | 0.06% | 0.00% | 0.77% | 5 |
| Income | 608849 | 3410 | 847 | 10 | 0.56% | 0.14% | 0.00% | 0.70% | 8 |
| Assets | 174570 | 4702 | 1566 | 10 | 2.69% | 0.90% | 0.01% | 3.60% | 1 |
| Drugs | 217175 | 419 | 1485 | 0 | 0.19% | 0.68% | 0.00% | 0.88% | 4 |
| Total | 4078125 | 29968 | 7548 | 75 | 0.73% | 0.19% | 0.00% | 0.92% | - |

Table 1.6 examines the 2004 survey. This survey has two new sections that are not seen in the previous tables. The first is found in the employer supplement and asks the respondent detailed questions about the pensions available from their employer and the respondent's participation in these pensions. This new section is ranked first in problems and has missing responses to 2.5 percent of all questions. The second new section is the over-40 health module. The goal of this section is to provide researchers with a baseline health measure that will be updated at ten-year intervals. The health section is ranked 8th out of 13 sections and has a nonresponse rate of slightly more than three-quarters of one percent.

Table 1.6. Extent of refusals, don't knows, and invalid skips in 2004

| Section Name | Number Questions Asked | Number Don't Knows | Number Refused | Number Invalid Skipped | % Don't Knows | % Refused | % Invalid Skipped | Total % Missed | Rank |
|---|---|---|---|---|---|---|---|---|---|
| Intro | 91277 | 39 | 16 | 4 | 0.04% | 0.02% | 0.00% | 0.06% | 12 |
| Marital Status | 77954 | 371 | 66 | 106 | 0.48% | 0.08% | 0.14% | 0.70% | 9 |
| School | 56716 | 554 | 39 | 4 | 0.98% | 0.07% | 0.01% | 1.05% | 7 |
| Military | 39772 | 20 | 5 | 0 | 0.05% | 0.01% | 0.00% | 0.06% | 13 |
| Employer Supplement | 734366 | 7729 | 1001 | 275 | 1.05% | 0.15% | 0.04% | 1.23% | 6 |
| Pensions | 189861 | 3753 | 508 | 485 | 1.98% | 0.27% | 0.26% | 2.50% | 1 |
| Training | 307708 | 2943 | 887 | 322 | 0.96% | 0.29% | 0.10% | 1.35% | 5 |
| Fertility | 521658 | 5801 | 733 | 1216 | 1.11% | 0.14% | 0.23% | 1.49% | 3 |
| Child Care | 34561 | 12 | 4 | 7 | 0.03% | 0.01% | 0.02% | 0.07% | 11 |
| Relationship | 1004 | 2 | 0 | 0 | 0.20% | 0.00% | 0.00% | 0.20% | 10 |
| Over 40 Health | 622644 | 4386 | 402 | 14 | 0.70% | 0.06% | 0.00% | 0.77% | 8 |
| Income | 412656 | 4382 | 1199 | 39 | 1.06% | 0.29% | 0.01% | 1.36% | 4 |
| Assets | 626393 | 12726 | 2634 | 233 | 2.03% | 0.42% | 0.04% | 2.49% | 2 |
| Total | 3716570 | 42718 | 7494 | 2705 | 1.15% | 0.20% | 0.07% | 1.42% | - |

This section provides details on the amount of missing data associated with each respondent. Each table in this section shows the number of respondents who are missing data in one of the surveys. The tables are split into two parts. The left-hand part, columns one to four, shows the total number of questions that have missing data for each group of respondents. The right-hand part, columns five to nine, shows the percentage of questions that have missing data.

The top line of Table 2.1.1 shows that in the 1979 survey, 12,527 respondents never refused to answer questions. While refusals are quite rare in this survey round, don't knows and incorrect skips are quite frequent. The top line shows that only 5,084 respondents had zero don't know responses and only 2,347 respondents were sent through the entire questionnaire without any sequencing errors. Subtracting these numbers from the 12,686 total respondents means that 60 percent, or 7,602 respondents, stated they did not know the answer to at least one question and 81.5 percent, or 10,339 respondents, were incorrectly skipped somewhere in that questionnaire.
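A count-based table of this kind can be built from one number per respondent: how many questions carried a given flag. The sketch below is illustrative only. The function name and the eight sample respondents are hypothetical, and the row layout (0 through 10, then "> 10") simply mirrors the tables in this section.

```python
from collections import Counter


def tabulate_by_count(per_respondent_counts, cap=10):
    """Count respondents by number of flagged questions: 0..cap, then '> cap'."""
    top_label = f"> {cap}"
    table = Counter()
    for n in per_respondent_counts:
        table[n if n <= cap else top_label] += 1
    # Emit every row in order, including rows with no respondents.
    keys = list(range(cap + 1)) + [top_label]
    return [(key, table.get(key, 0)) for key in keys]


# Hypothetical refusal counts for eight respondents.
refusals = [0, 0, 0, 1, 0, 2, 14, 0]
rows = tabulate_by_count(refusals)

# Respondents with at least one refusal = all respondents minus the "0" row,
# the same subtraction the text applies to the 12,686 respondents in 1979.
at_least_one = len(refusals) - dict(rows)[0]
```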

The top line of Table 2.1.2, which examines the percentage of questions missing data, shows a similar picture. Refusal rates are relatively low. There are 12,620 respondents who refused less than one percent of their questions, which means only 66 respondents refused one percent or more of the questions they were expected to answer. About 65 percent, or 8,185 respondents, answered don't know to less than one percent of their questions, leaving about 35 percent who answered don't know to one percent or more. Again, the largest group was respondents who were incorrectly skipped over questions. Only 4,313 respondents were incorrectly skipped over less than one percent of the questions, but 8,373 of the respondents were incorrectly skipped over one percent or more of their questions and 227 were skipped over more than 10 percent.

Refusal rates have increased steadily over time even though the more difficult respondents have presumably left the survey. Tables 2.2.1 and 2.2.2, which examine the 1984 survey, show an increase over the 1979 refusal rates. While the number of respondents answering the survey is shrinking, the number refusing to answer questions is increasing. For example, while in 1979 only 10 respondents refused to answer more than 10 questions, in 1984 there were 41 respondents. This pattern of increase is evident in Tables 2.3.1 and 2.3.2, which examine 1989, through to Tables 2.6.1 and 2.6.2, which examine 2004. By 2004, there were 185 respondents who refused to answer more than 10 questions.

Increasing refusal rates are also seen in the percentage side of the table. In 1979, only 66 respondents refused to answer one percent or more of the questions they were asked. This increased in subsequent surveys to 320 respondents in 1984, 355 respondents in 1989, 480 respondents in 1994, 549 respondents in 1998, and 655 respondents in 2004.
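These counts follow from the "0%" rows of the percentage tables and the noninterview notes beneath them. The Python sketch below redoes the subtraction for each year. It assumes the 12,686 respondents cited for 1979 as the full sample and zero noninterviews in 1979 (that table carries no note); the variable names are illustrative.

```python
ORIGINAL_SAMPLE = 12686  # total respondents cited for the 1979 survey

# Noninterviews from the note under each table (1979 assumed to be zero).
NONINTERVIEWS = {1979: 0, 1984: 617, 1989: 2081,
                 1994: 3797, 1998: 4287, 2004: 5025}

# "0%" row of the Refused column in Tables 2.1.2 through 2.6.2:
# respondents who refused less than one percent of their questions.
UNDER_ONE_PERCENT = {1979: 12620, 1984: 11749, 1989: 10250,
                     1994: 8409, 1998: 7850, 2004: 7006}

interviewed = {y: ORIGINAL_SAMPLE - n for y, n in NONINTERVIEWS.items()}

# Respondents who refused one percent or more of their questions.
one_percent_or_more = {y: interviewed[y] - UNDER_ONE_PERCENT[y]
                       for y in interviewed}

# The same group as a share of respondents interviewed that year, in percent.
share = {y: 100.0 * one_percent_or_more[y] / interviewed[y]
         for y in interviewed}
```

Because the number of interviewed respondents shrinks over time, the share of respondents in this group grows faster than the raw counts do.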

"Don't know" rates have also risen over time. In the 1979 survey, 8,185 respondents had less than one percent of their questions labeled as don't knows. This number drops to 7,003 respondents in 1984, 6,423 in 1989, 5,942 in 1994, 4,741 in 1998, and 3,185 in 2004. While rates have risen, relatively few individuals have high levels of don't knows. In 1979, only 68 respondents didn't know the answer to more than five percent of the questions they were asked. This number falls to 19 respondents in 1984, rises to 66 in 1989, falls back to 46 in 1994, returns to 66 in 1998, and ends at 149 in 2004.

While don't know and refusal rates have risen, incorrect skip problems have clearly shrunk over time. In 1979, there were only 2,347 respondents who were correctly sequenced through the entire survey. In 1984, this number rises to 7,802 respondents, followed by a rise to 9,334 respondents in 1989. In 1994 and 1998 almost every respondent was correctly sequenced. Only 57 and 46 respondents were incorrectly skipped through part of the survey in each year respectively. Moreover, most of the respondents were only incorrectly skipped in a single question. In 2004 there were 349 respondents who were incorrectly skipped through one percent of their questions and 22 who were incorrectly skipped through 2 percent or more.

Nonresponse by Respondents in 1979 survey

Table 2.1.1 Number of Respondents with missing data by Number of Questions in 1979 Survey
  Number of Respondents
Number of Questions Refused Didn't Know Was Incorrectly Skipped Over
0 12527 5084 2347
1 91 2974 1897
2 26 1723 1393
3 13 1016 1158
4 5 629 838
5 2 376 596
6 1 228 489
7 3 173 502
8 3 131 420
9 1 84 340
10 4 57 308
> 10 10 211 2398
Table 2.1.2 Number of Respondents with missing data by Percent of Questions in 1979 Survey
  Number of Respondents
Percent of Questions Refused Didn't Know Was Incorrectly Skipped Over
0% 12620 8185 4313
1% 43 3247 3421
2% 7 773 1733
3% 5 264 989
4% 5 101 621
5% 0 48 397
6% 2 27 312
7% 1 18 278
8% 1 6 206
9% 0 7 118
10% 0 2 71
> 10% 2 8 227

Nonresponse by Respondents in 1984 survey

Table 2.2.1 Number of Respondents with missing data by Number of Questions in 1984 Survey
  Number of Respondents
Number of Questions Refused Didn't Know Was Incorrectly Skipped Over
0 11222 4549 7802
1 610 3012 1289
2 73 1901 622
3 44 1136 413
4 38 668 252
5 13 345 369
6 6 177 174
7 1 108 93
8 7 63 115
9 4 38 73
10 10 28 64
> 10 41 44 803

Note: Not included in this table are 617 respondents who did not answer the survey.

Table 2.2.2 Number of Respondents with missing data by Percent of Questions in 1984 Survey
  Number of Respondents
Percent of Questions Refused Didn't Know Was Incorrectly Skipped Over
0% 11749 7003 8956
1% 207 3807 1267
2% 44 944 674
3% 13 213 284
4% 15 62 133
5% 13 21 84
6% 10 11 139
7% 4 2 137
8% 5 3 107
9% 2 0 68
10% 2 3 36
> 10% 5 0 184

Note: Not included in this table are 617 respondents who did not answer the survey.

Nonresponse by Respondents in 1989 survey

Table 2.3.1 Number of Respondents with missing data by Number of Questions in 1989 Survey
  Number of Respondents
Number of Questions Refused Didn't Know Was Incorrectly Skipped Over
0 10221 6135 9334
1 171 2517 781
2 59 1036 189
3 37 395 35
4 20 194 20
5 21 131 16
6 7 75 7
7 10 34 125
8 10 24 18
9 4 10 9
10 7 6 3
> 10 38 48 68

Note: Not included in this table are 2,081 respondents who did not answer the survey.

Table 2.3.2 Number of Respondents with missing data by Percent of Questions in 1989 Survey
  Number of Respondents
Percent of Questions Refused Didn't Know Was Incorrectly Skipped Over
0% 10250 6423 9461
1% 193 3221 843
2% 58 561 51
3% 35 219 69
4% 13 76 86
5% 10 39 24
6% 4 24 10
7% 4 17 10
8% 3 1 5
9% 3 3 9
10% 3 8 3
> 10% 29 13 34

Note: Not included in this table are 2,081 respondents who did not answer the survey.

Nonresponse by Respondents in 1994 survey

Table 2.4.1 Number of Respondents with missing data by Number of Questions in 1994 Survey
  Number of Respondents
Number of Questions Refused Didn't Know Was Incorrectly Skipped Over
0 7168 3559 8832
1 1129 1780 57
2 191 1082 0
3 87 693 0
4 41 443 0
5 28 334 0
6 29 232 0
7 22 171 0
8 21 115 0
9 17 105 0
10 18 72 0
> 10 138 303 0

Note: Not included in this table are 3,797 respondents who did not answer the survey.

Table 2.4.2 Number of Respondents with missing data by Percent of Questions in 1994 Survey
  Number of Respondents
Percent of Questions Refused Didn't Know Was Incorrectly Skipped Over
0% 8409 5942 8889
1% 246 2060 0
2% 81 558 0
3% 41 165 0
4% 31 79 0
5% 20 39 0
6% 19 16 0
7% 6 15 0
8% 10 4 0
9% 9 2 0
10% 4 2 0
> 10% 13 7 0

Note: Not included in this table are 3,797 respondents who did not answer the survey.

Nonresponse by Respondents in 1998 survey

Table 2.5.1 Number of Respondents with missing data by Number of Questions in 1998 Survey
  Number of Respondents
Number of Questions Refused Didn't Know Was Incorrectly Skipped Over
0 7248 2497 8353
1 473 1355 21
2 162 1020 23
3 83 729 0
4 60 589 2
5 42 447 0
6 35 343 0
7 26 277 0
8 19 201 0
9 23 169 0
10 12 120 0
> 10 216 652 0

Note: Not included in this table are 4,287 respondents who did not answer the survey.

Table 2.5.2 Number of Respondents with missing data by Percent of Questions in 1998 Survey
  Number of Respondents
Percent of Questions Refused Didn't Know Was Incorrectly Skipped Over
0% 7850 4741 8385
1% 254 2441 13
2% 86 712 0
3% 58 283 1
4% 54 110 0
5% 27 46 0
6% 30 25 0
7% 14 11 0
8% 4 7 0
9% 8 9 0
10% 2 5 0
> 10% 12 9 0

Note: Not included in this table are 4,287 respondents who did not answer the survey.

Nonresponse by Respondents in 2004 survey

Table 2.6.1 Number of Respondents with missing data by Number of Questions in 2004 Survey
  Number of Respondents
Number of Questions Refused Didn't Know Was Incorrectly Skipped Over
0 6531 1524 6539
1 298 993 440
2 194 755 334
3 171 624 145
4 78 592 42
5 45 486 98
6 51 387 29
7 45 360 13
8 29 314 3
9 23 235 5
10 11 178 7
> 10 185 1213 6

Note: Not included in this table are 5,025 respondents who did not answer the survey.

Table 2.6.2 Number of Respondents with missing data by Percent of Questions in 2004 Survey
  Number of Respondents
Percent of Questions Refused Didn't Know Was Incorrectly Skipped Over
0% 7006 3185 7290
1% 384 2399 349
2% 106 1122 18
3% 48 477 2
4% 40 226 1
5% 18 103 0
6% 16 68 0
7% 10 29 0
8% 8 14 0
9% 8 17 0
10% 3 6 1
> 10% 14 15 0

Note: Not included in this table are 5,025 respondents who did not answer the survey.
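
The respondent-level tables above can be summarized by computing, for each survey year, the share of interviewed respondents who gave at least one "don't know." The sketch below is only an illustration. The counts are copied from the zero rows of Tables 2.3.1, 2.4.1, 2.5.1, and 2.6.1 and from the notes under them. The total of 12,686 is an assumption inferred from the tables (each column sum plus the not-interviewed count gives that figure), and the names `ZERO_DONT_KNOW`, `NOT_INTERVIEWED`, and `share_with_any` are hypothetical.

```python
# Respondents with zero "don't know" answers (row 0 of Tables 2.3.1-2.6.1).
ZERO_DONT_KNOW = {1989: 6135, 1994: 3559, 1998: 2497, 2004: 1524}

# Respondents who did not answer the survey (note under each table).
NOT_INTERVIEWED = {1989: 2081, 1994: 3797, 1998: 4287, 2004: 5025}

# Assumed full sample size: each table's column sum plus its
# not-interviewed count equals this figure.
TOTAL_SAMPLE = 12686


def share_with_any(year: int) -> float:
    """Share of interviewed respondents with at least one "don't know"."""
    interviewed = TOTAL_SAMPLE - NOT_INTERVIEWED[year]
    return 1 - ZERO_DONT_KNOW[year] / interviewed


for year in sorted(ZERO_DONT_KNOW):
    print(f"{year}: {share_with_any(year):.1%}")
```

Under these counts the share rises from roughly 42 percent in 1989 to roughly 80 percent in 2004. The same function could be pointed at the "Refused" column by swapping in that column's zero-row counts.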

How much missing data are associated with particular questions? This section provides readers with an in-depth view of the questions within survey sections that have a high amount of missing data. Like the previous parts, this section provides tables for each of the selected survey years. The first table (Table 3.1) examines questions from the 1979 survey's "Work Experience" section, which has more missing data (14.5 percent) than any other 1979 survey section. The second set of tables (Tables 3.2 through 3.6) examines the most problematic section of the 1984 survey, "Fertility and Abortion." The third set of tables (Tables 3.7 and 3.8) examines the most problematic 1989 survey section, "Income and Assets." Because the "Income and Assets" section again ranked first in missing data in 1994, the next set of tables (Tables 3.9 and 3.10) instead examines the "Drug and Alcohol Use Supplements," given the high degree of research interest in understanding nonresponse in these sections. Table 3.11 highlights nonresponse in the 1998 "Marital History" section. Table 3.12 tracks nonresponse problems in the 2004 health section, including the over-40 health module.

To keep the sets of tables from being overwhelming, all sections that could be naturally divided are split (Fertility, for instance). Additionally, only the most important questions or those with high rates of nonresponse are shown. Table 3.1, which examines the amount of missing data in the 1979 survey, shows that the most missing data are associated with a pair of retrospective questions that asked respondents to remember what happened two years earlier. Interviewers incorrectly skipped slightly fewer than 1,750 respondents over each of R01150., weeks worked in 1977, and R01153., hours worked per week in 1977 (1,735 and 1,749 invalid skips, respectively). Examining the 1979 questionnaire shows that these questions appear at the bottom of a page. Before these questions is a fairly complicated half page of instructions and questions that the interviewer must read, understand, and partially speak. It seems likely that many interviewers did not understand the instructions and skipped to the next page.

Table 3.1. Amount of missing data per question in the Work Experience section in 1979 Survey

Reference Number | Variable Title | Invalid | Don't Know | Refusal
R01150. | Weeks Work in 1977 | 1735 | 11 | 1
R01151. | Weeks Work in 1976 | 418 | 18 | 1
R01152. | Weeks Work in 1975 | 240 | 11 | 0
R01153. | Hours/Week Work in 1977 | 1749 | 13 | 0
R01154. | Hours/Week Work in 1976 | 459 | 16 | 0
R01165. | Industry of 1st Job after School | 628 | 4 | 1
R01166. | Occupation at 1st Job after School | 627 | 3 | 1
R01167. | Hours/Week Work at 1st Job after School | 631 | 6 | 1
R01168. | Hours/Day at 1st Job after School | 632 | 6 | 1
R01169. | Rate of Pay at 1st Job after School | 632 | 32 | 2
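
The ranking described for Table 3.1 can be reproduced with a few lines of code. The sketch below is illustrative only: the counts are copied from Table 3.1, while the names `TABLE_3_1` and `total_missing` are hypothetical and not part of any NLS tool.

```python
# (reference number, invalid skips, don't knows, refusals), from Table 3.1.
TABLE_3_1 = [
    ("R01150.", 1735, 11, 1),
    ("R01151.", 418, 18, 1),
    ("R01152.", 240, 11, 0),
    ("R01153.", 1749, 13, 0),
    ("R01154.", 459, 16, 0),
    ("R01165.", 628, 4, 1),
    ("R01166.", 627, 3, 1),
    ("R01167.", 631, 6, 1),
    ("R01168.", 632, 6, 1),
    ("R01169.", 632, 32, 2),
]


def total_missing(row):
    """Sum the three kinds of missing data for one question."""
    _, invalid, dont_know, refusal = row
    return invalid + dont_know + refusal


# Rank questions from most to least missing data.
ranked = sorted(TABLE_3_1, key=total_missing, reverse=True)

for ref, invalid, dont_know, refusal in ranked[:3]:
    total = invalid + dont_know + refusal
    print(f"{ref} total={total} invalid share={invalid / total:.1%}")
```

The two 1977 retrospective questions come out on top, and nearly all of their missing data are invalid skips rather than don't knows or refusals.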

Tables 3.2-3.6, which examine the "Fertility" section, show a much lower number of invalid skips in all parts except the abortion questions. While invalid skips do not reach the level seen in Table 3.1, on average 190 female respondents were not asked each abortion question (190 is an average across all abortion questions, not just those shown in the tables). The tables also show a number of other trends. First, the more precise the question, the more don't know answers respondents give. For example, in Table 3.2, when males were asked the date of birth of their first child, only one did not know the year, three did not know the month, and 10 did not know the day. This phenomenon is most clearly seen in Table 3.5, which shows the year and month of the respondent's first sexual encounter. Only 43 respondents did not know the year, but 1,410 respondents did not know the month. This problem with dates is also seen in the abortion data, where only four respondents did not know the year when they had their first abortion, but 13 did not know the month.

Refusal rates in the "Fertility" section are quite low except for a number of key questions. The question asking how many times respondents had sex in the last month elicited high rates of refusal from both males and females: 167 male and 135 female refusals. Interestingly, most individuals were willing to say whether they had ever had sex; only 45 males and 54 females refused to answer this question. Birth control questions did not have exceptionally high rates of refusal. Seventeen female respondents and no males refused to answer the birth control questions. Table 3.6 shows that 28 females refused to answer whether they had ever had an abortion, and 28 more refused to state whether they dropped out of school before they terminated the pregnancy.

Table 3.2. Amount of missing data per question in Male Fertility section in 1984 Survey

Reference Number | Variable Title | Invalid | Don't Know | Refusal
R13017. | Ever Had Any Children | 0 | 3 | 0
R13019. | Month Birth Child#1 Born | 41 | 3 | 0
R13020. | Day Birth Child#1 Born | 45 | 10 | 0
R13021. | Year Birth Child#1 Born | 39 | 1 | 0
R13022. | Sex of Child#1 Born | 3 | 0 | 0
R13115. | Total #Children Expect to Have | 12 | 45 | 3
R13117. | #Years Expect Have 1st/Next Child | 22 | 120 | 0
R13118. | Had Any Children/Expecting | 0 | 7 | 0
R13119. | Current Pregnancy Planned | 131 | 0 | 0
R13121. | Ever Had Sexual Intercourse | 12 | 0 | 45
R13122. | Age @First Sexual Intercourse | 28 | 19 | 23
R13123. | #Times Sexual Intercourse Past Month | 11 | 68 | 167
R13124. | Is Partner Now Pregnant | 0 | 1 | 0
R13125. | Use Any Birth Control During Last Month | 15 | 2 | 0
R13126. | #Times Try Prevent Pregnancy | 65 | 0 | 0
R13127.-R13141. | Method of Birth Control | 16 | 0 | 0
R13142. | Ever Have a Sex Education Course | 10 | 0 | 12
R13148. | Month Took Sex-Ed Course | 73 | 564 | 0
R13149. | Year Took Sex-Ed Course | 36 | 58 | 0
R13150. | Time When Pregnancy Most Likely | 19 | 1480 | 20

Table 3.3. Amount of missing data per question in Female Fertility section in 1984 Survey

Reference Number | Variable Title | Invalid | Don't Know | Refusal
R13191. | #Pregnancies | 8 | 0 | 0
R13251. | Use Any Birth Control before Preg#1 | 18 | 0 | 1
R13254. | Want Be Pregnant before Preg#1 | 20 | 0 | 0
R13255. | Husband/Partner Want Preg#1 | 19 | 20 | 0
R13283. | Get Prenatal Care Preg#1 | 57 | 0 | 0
R13286. | Frequency Alcohol Use Preg#1 | 58 | 0 | 0
R13288. | #Cigarettes Smoked Preg#1 | 56 | 0 | 0
R13297. | X-Rays Taken Preg#1 | 57 | 0 | 0
R13302. | Sonogram Preg#1 | 57 | 6 | 0
R13358. | Amniocentesis Preg#1 | 57 | 0 | 0
R13411. | Took Vitamins Preg#1 | 57 | 0 | 0
R13443. | C-Section Child#1 Born | 52 | 0 | 0
R13445. | Weight at Delivery, Preg#1 | 53 | 5 | 1
R13446. | Weight before Preg#1 | 51 | 5 | 1
R13449. | Length Child#1 Born at Birth | 53 | 20 | 0
R13667. | Weight of Child#1 @Birth Lbs | 25 | 6 | 0

Table 3.4. Amount of missing data per question in Feeding Part of Fertility section in 1984 Survey

Reference Number | Variable Title | Invalid | Don't Know | Refusal
R13670. | Child#1 Breastfed | 27 | 0 | 0
R13672. | Month Age Child#1 Breast Fed Ended | 27 | 1 | 0
R13674. | Month Age Child#1 Formula Fed | 38 | 3 | 0
R13693. | Wk Age Child#1 Formula Fed Ended | 57 | 0 | 0
R13694. | Month Age Child#1 Formula Fed Ended | 57 | 6 | 0
R13696. | Months Age Child#1 - Cow's Milk | 81 | 10 | 0
R13698. | Months Age Child#1 - Solid Food | 86 | 10 | 0

Table 3.5. Amount of missing data per question in Child Part of Fertility section in 1984 Survey

Reference Number | Variable Title | Invalid | Don't Know | Refusal
R13791. | Age Had 1st Menstrual Period | 8 | 14 | 22
R13792. | Year 1st Menstrual Period | 0 | 7 | 0
R13793. | Month Had 1st Menstrual Period | 17 | 2207 | 1
R13794. | R Ever Been Pregnant | 0 | 1 | 0
R13795. | Ever Had Sexual Intercourse | 4 | 0 | 54
R13796. | Age First Sexual Intercourse | 5 | 26 | 78
R13797. | Year 1st Sexual Intercourse | 0 | 43 | 66
R13798. | Month Sexual Intercourse 1st Time | 19 | 1410 | 75
R13799. | #Times Sexual Intercourse Past Month | 9 | 104 | 135
R13802. | #Times Try Prevent Pregnant Past Month | 17 | 0 | 2

Table 3.6. Amount of missing data per question in Abortion Questions of Fertility section in 1984 Survey

Reference Number | Variable Title | Invalid | Don't Know | Refusal
R13827. | Ever Had An Abortion | 135 | 0 | 28
R13828. | # of Abortions | 143 | 0 | 0
R13830. | Year of 1st Reported Abortion | 196 | 4 | 0
R13837. | Drop out School #1 Pregnant | 155 | 0 | 28
R13839. | Year Left School 1st Time Pregnant | 164 | 0 | 0
R13841. | Year Return School Time#1 after Pregnant | 258 | 0 | 0

Tables 3.7 and 3.8 examine the "Income and Assets" section of the 1989 survey. While invalid skips are relatively rare in this section, refusals and don't know answers are fairly prevalent. The question with the highest amount of missing income data is R29822., which asks how much income was earned by other adults living in the household who were related to the respondent. While the previous questions showed that most respondents knew the type of income received by these family members, 958 could not come up with a specific amount. The second most problematic question, with 11 invalid skips, 155 don't knows, and 113 refusals, was R29714., which asked respondents how much they earned from wages, salary, and tips.

Other questions with high numbers of don't knows are R29813., which asks about the amount of money received from other sources like interest and dividends; R29825., which asks about a partner's income; and R29828., which asks the number of exemptions used when filing a Federal tax return.

The asset table (Table 3.8) also shows that invalid skips are rare but don't know and refusal rates are not. Surprisingly, one of the questions with the highest amount of missing data (315 missing answers) asks how much the respondent's car is worth (R29852.). Another question missing many observations asks the amount of the respondent's savings (R29835.). While the car worth question primarily elicits don't knows, the savings question resulted in 160 refusals. Three other questions elicited high numbers of don't knows: value of stocks and bonds (R29837.), 219 don't knows; amount taken out of savings last year (R29842.), 222 don't knows; and the market value of other items such as jewelry (R29854.), 151 don't knows.

Table 3.7. Amount of missing data per question in Income section in 1989 Survey

Reference Number | Variable Title | Invalid | Don't Know | Refusal
R29714. | Amount Rec from Wages/Salary/Tips | 11 | 155 | 113
R29715. | In 1988 Receive Income from Own Business | 1 | 0 | 11
R29717. | How Much Did R Receive after Expenses | 6 | 49 | 23
R29732. | Amount Rec'd Per Week from Unemployment | 0 | 5 | 1
R29736. | Amount Sp Rec'd 1988 from Wages | 16 | 17 | 70
R29754. | How Much Did Sp Receive from Unemployment | 8 | 12 | 0
R29758. | R/Spouse Rec'd Money for Child Support | 1 | 1 | 10
R29759. | Amount R/Spouse Rec'd Child Support | 2 | 14 | 2
R29760. | R/Spouse Rec'd AFDC Payments | 0 | 4 | 9
R29774. | R/Spouse Rec'd Food Stamps | 0 | 2 | 10
R29788. | R/Spouse Rec'd SSI/Public Assistance | 0 | 4 | 9
R29808. | Rec'd Veteran Benefits | 1 | 1 | 10
R29812. | R/Spouse Rec'd Money from Oth So | 0 | 2 | 16
R29822. | Income Rec'd by Adults Related To R | 7 | 958 | 8
R29825. | Total Income Rec'd before Deduct | 2 | 200 | 4
R29826. | Sp File Federal Income Tax R | 0 | 2 | 13
R29827. | R'S Filing Status on Federal Ret | 11 | 8 | 2
R29828. | Exemptions Filed on 1988 Federal Tax | 62 | 92 | 3

Table 3.8. Amount of missing data per question in Asset section in 1989 Survey

Reference Number | Variable Title | Invalid | Don't Know | Refusal
R29831. | Amount Property Selling for on Today | 5 | 53 | 10
R29832. | Amount R Owes on Property | 4 | 85 | 25
R29833. | Amount Other Debt R Owes on Property | 12 | 26 | 27
R29835. | Amount of Savings | 7 | 166 | 160
R29837. | Current Market Value of Stocks | 2 | 219 | 23
R29838. | R/Spouse Have Rights to Estate | 2 | 3 | 18
R29839. | Total Value of Estate | 3 | 90 | 6
R29840. | Put Money in/out of Savings | 1 | 3 | 28
R29841. | How Much More Money Put in | 6 | 110 | 53
R29842. | How Much More Money Take out | 5 | 222 | 21
R29843. | R Have Business Investment | 0 | 1 | 12
R29844. | R Have Investment in a Farm | 4 | 0 | 0
R29847. | Total Market Value of Business | 4 | 75 | 10
R29848. | Total Amount of Business Debt | 1 | 55 | 8
R29851. | How Much Does R Owe on Vehicle | 0 | 56 | 17
R29852. | Amount Vehicle Sells for Today | 11 | 293 | 11
R29854. | Market Value of Other Items | 5 | 151 | 25
R29856. | Total Amount R Owes | 1 | 73 | 13

Tables 3.9 and 3.10 examine the drug and alcohol use supplements in the 1994 survey. In these CAPI modules, there are no invalid skips. Interestingly, there are extremely low refusal and don't know rates within the "Alcohol" section (Table 3.9). The question with the most refusals (nine respondents) asks if the individual had a drink since the 1989 interview. The typical question in the "Alcohol" section received only two refusals. Don't know rates are also low. The largest number of don't knows, nine, occurs in R49803., which asks if the respondent needs to drink more alcohol now in order to get drunk. On average, the "Alcohol" section records only 1.5 don't knows per question.

Table 3.9. Amount of missing data per question in Alcohol Use section in 1994 Survey

Reference Number | Variable Title | Invalid | Don't Know | Refusal
R49791. | R Had Drink of Alcohol since 1989 | 0 | 3 | 9
R49792. | Had Alcoholic Beverage in Last 30 | 0 | 0 | 5
R49793. | Times Had 6/More Drinks Last | 0 | 0 | 1
R49794. | How Many of Last 30 Days Drank A | 0 | 6 | 2
R49795. | No. of Drinks on Avg. Day When R | 0 | 8 | 3
R49803. | Need More to Get Drunk Than Before | 0 | 9 | 0
R49808. | Arrested, in Police Trouble | 0 | 0 | 3
R49809. | Drink More Than Before | 0 | 4 | 3

These low numbers of refusals and don't knows are not seen in Table 3.10, which examines the "Drug Use" section. On average, the typical question in this supplement elicited 23 don't knows and 48 refusals. Readers should understand that this supplement was generally filled in directly by the respondent, not by the interviewer. To give respondents practice using a computer, the questionnaire asked them two practice questions not related to drug use. Refusal rates are high even for these two practice questions, which ask how many more children the respondent expects to have and what type of entertainment, such as movies, concerts, or plays, the respondent went to last year.

The highest number of refusals (119) occurs in R50532., which asks the age at which the respondent first used marijuana. The second largest number of refusals (103) occurs in a similar question, R50536., which asks the age of first cocaine use. These same questions also have very high numbers of don't know responses (113 for marijuana and 48 for cocaine). One other question with a very high don't know rate is R50525., which asks if the respondent ever smoked cigarettes daily. Almost 80 individuals did not know the answer to this question. Given that the question wording is straightforward, it is likely that a number of respondents are using don't know as a polite way of refusing to answer the question.

Table 3.10. Amount of missing data per question in Drug Use section in 1994 Survey

Reference Number | Variable Title | Invalid | Don't Know | Refusal
R50524. | R Smoked at Least 100 Cigrtts in Life? | 0 | 24 | 38
R50525. | R Ever Smoked Daily? | 0 | 79 | 49
R50526. | Age When R 1st Started Smoking Daily? | 0 | 33 | 12
R50531. | Total Occasion R Use Marijuana | 0 | 33 | 89
R50532. | Age 1st Time Used Marijuana | 0 | 113 | 119
R50533. | Most Recent Time Used Marijuana | 0 | 35 | 89
R50535. | How Many Occasions Used Cocaine | 0 | 19 | 86
R50536. | Age 1st Time Used Cocaine | 0 | 48 | 103
R50537. | Most Recent Time Used Cocaine | 0 | 15 | 78
R50539. | How Many Occasions Used Crack | 0 | 15 | 77
R50540. | Age 1st Time Used Crack | 0 | 33 | 82
R50541. | Most Recent Time Used Crack | 0 | 16 | 74
R50553. | R Used Heroin w/o Doctor's Instr | 0 | 9 | 53

The top ten questions, shown in Table 3.11, indicate that a large number of respondents (from 119 to 181, depending on the question) have difficulty with questions about their spouse's rate and amount of pay, hours worked, and weeks worked. In addition, questions that ask for details about a spouse's previous marriage are also quite difficult for many respondents to answer.

Table 3.11. Amount of missing data per question in Marital History section in 1998 Survey

Reference Number | Variable Title | Invalid | Don't Know | Refusal
R58067. | Rate of Pay for Spouse Main Job (Time Unit) | 0 | 181 | 49
R58204. | Age of Spouse at 1st Marriage | 0 | 213 | 2
R58125. | Spouse's Weekly Earnings at Main Job | 0 | 159 | 29
R58068. | Spouse Receive Overtime at Main Job | 0 | 151 | 26
R58127. | Estimate Spouse's Weekly Earning Main Job | 0 | 149 | 26
R58178. | Hours Spouse Works Per Week Usually | 0 | 170 | 1
R58177. | Number of Weeks Worked by Spouse in Last Year | 0 | 140 | 24
R58179. | Number Weeks Not Working by Spouse Last Year | 0 | 130 | 24
R58176. | Spouse Hourly Rate of Pay | 0 | 119 | 28
R58208. | Duration of Spouse's Previous Marriage? | 0 | 109 | 16

Table 3.12 examines the top questions with missing data problems from the health section in 2004. In this table, reference numbers starting with "R" are for questions asked of all respondents in the survey, while reference numbers starting with "H" represent questions in the "over 40 health module." This module was designed to provide researchers with more information about the health of the respondent when they turned 40 years old and is asked of respondents in the first interview after they turn 40.

While other data from the survey show that many people know if they are covered by health insurance, Table 3.12 reveals that many do not know details about this coverage. For example, one question with a large number of don't knows is R83036., which asks if the respondent's health insurance plan is an HMO, a preferred provider plan (PPO) or a network of affiliated doctors. This question had 428 missing responses out of 6,175 total responses (a 7% missing response rate). Other questions with high don't know rates ask if the respondent's children are covered by health insurance. The health question with the highest refusal rate asks the respondent how much they weigh, with 114 people refusing to divulge the number. Finally, in the 40+ health module a number of NLSY79 respondents have difficulty answering questions about the health and life status of their biological father. This is not surprising given a small but significant number of respondents stated in the past that they have never met their biological father.
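
The 7 percent figure quoted above follows from simple arithmetic on the Table 3.12 counts for R83036. The sketch below is a minimal illustration; the function name `missing_rate` is hypothetical.

```python
def missing_rate(invalid, dont_know, refusal, total_responses):
    """Return (count of missing responses, share of all responses)."""
    missing = invalid + dont_know + refusal
    return missing, missing / total_responses


# R83036.: 0 invalid skips, 426 don't knows, 2 refusals, 6,175 responses.
missing, rate = missing_rate(0, 426, 2, 6175)
print(missing, f"{rate:.1%}")
```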

Table 3.12. Amount of missing data per question in Health section in 2004 Survey

Reference Number | Variable Title | Invalid | Don't Know | Refusal
R83036. | Primary Insurance Plan HMO, Network, PPO | 0 | 426 | 2
R83037. | Is Primary Plan a PPO? | 0 | 388 | 2
R83070. | Children Have Health/Hospitalization Plan? | 0 | 328 | 15
R83038. | R's Primary Plan Need Authorization? | 0 | 301 | 0
H00015. | Date Most Recent General Physical Exam | 0 | 189 | 0
R82983. | How Much Does R Weigh? | 0 | 50 | 114
H00014. | Ever Had A General Physical Exam? | 0 | 147 | 2
H00017. | Cause Of Biological Dads Death | 0 | 133 | 10
H00019. | Bio Dad Have Major Health Problems? | 0 | 134 | 8
R82982. | Since What Date R Had This Health Limit | 0 | 120 | 0
R82992. | Length Light Moderate Activities 10 Min | 0 | 105 | 5
H00047. | Date Hypertension Diagnosed | 0 | 91 | 0
H00016. | Is R's Biological Dad Living? | 0 | 83 | 4
R82989. | Frequency of Light Mod Exercise 10 > Min | 0 | 75 | 6
H00018. | Age Of Biological Dad At Death | 0 | 68 | 1
H02445. | Date Most Recent Visit to Health Professional | 0 | 52 | 11
H00012. | R Ever Visit Health Care Professional? | 0 | 58 | 0
R83042. | Spouse Have Health/Hospital Plan | 0 | 32 | 24
R83048. | Spouse Employer Pay All Health Plan Cost? | 0 | 49 | 2

Note: Reference numbers that begin with the letter H are variables that are combined from different years of the over-40 health module. Researchers wanting to see the results from just the 2004 survey should use variable H00002.00, which is titled "Source Year for 40+ Health Module Data." Use this variable to select just those cases which answered the questions in 2004.
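
The selection described in the note might look like the sketch below once the variables have been extracted. Everything about the data layout here is an assumption: the column names `H0000200` and `H0001400`, the coding of the source year as a four-digit year, and the sample rows are all made up for illustration and should be checked against the codebook for an actual extract.

```python
# Assumed column name for H00002.00 ("Source Year for 40+ Health Module Data").
SOURCE_YEAR = "H0000200"

# Made-up rows standing in for an extract; these are not real NLSY79 data.
extract = [
    {"CASEID": 1, SOURCE_YEAR: 1998, "H0001400": 1},
    {"CASEID": 2, SOURCE_YEAR: 2004, "H0001400": 0},
    {"CASEID": 3, SOURCE_YEAR: 2004, "H0001400": 1},
    {"CASEID": 4, SOURCE_YEAR: 2002, "H0001400": 1},
]


def cases_from_year(rows, year):
    """Keep only cases whose 40+ health module data came from `year`."""
    return [row for row in rows if row[SOURCE_YEAR] == year]


cases_2004 = cases_from_year(extract, 2004)
print([row["CASEID"] for row in cases_2004])
```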

References

Mott, Frank L. "Patterning of Child Assessment Completion Rates in the NLSY: 1986-1996." CHRR, The Ohio State University, 1998.

Mott, Frank L. "Evaluation of Fertility Data and Preliminary Analytical Results from the 1983 (5th round) Survey of the National Longitudinal Survey of Work Experience of Youth." CHRR, The Ohio State University, 1985.

Mott, Frank L. "The Patterning of Female Teenage Sexual Behaviors and Attitudes." CHRR, The Ohio State University, 1994.

Mott, Frank L. "Fertility-Related Data in the 1982 National Longitudinal Surveys of Work Experience of Youth: An Evaluation of Data Quality and Some Preliminary Analytical Results." CHRR, The Ohio State University, 1983.

Olsen, Randall J. "The Effects of Computer Assisted Interviewing on Data Quality." CHRR, The Ohio State University, 1992.

Interviewer Remarks

Each NLSY79 questionnaire includes an interviewer remarks section that interviewers complete after finishing the interview with the respondent. Some of the information is objective (the presence of another person during an in-person survey, for instance) while other information is subjective on the part of the interviewer (such as rating how cooperative the respondent was).

Special circumstances. All survey rounds feature a series of questions about special circumstances that might have affected the quality of the data. The interviewers were asked to assess whether the respondent was hard of hearing, unable to see well, unable to read, lacking in basic social skills, mentally handicapped or retarded, physically handicapped, or ill/injured, or had a poor command of English.

Respondent's general demeanor and responsiveness. In all survey rounds, interviewers rated how informative and cooperative a respondent was during the interview. In addition, the interviewers assessed the respondent's overall understanding (good, fair, poor) of the questions.

Presence of others during interview. All survey rounds include information about whether others were present (listening and/or participating) during in-person interviews and who the person or persons were (infant child, family member, etc.). Interviewers attempt to secure a private environment for all interviews, so the presence of another individual (other than a small child) is an exception and can be considered a disruption to the interview. 

Interviewer characteristics. Interviewers provide information on their own ethnicity, age, gender, highest grade completed, and how much experience (measured in years) they had as an interviewer.

Interview methodology. Interviewers record whether any portion of the interview took place on the phone and indicate if the interview was in Spanish or English.

Interviewer retention. Interviewers indicate each survey round whether they had interviewed that respondent the previous survey year.

Standard Errors & Design Effects

This section contains information on standard errors and design effects for the NLSY79 sample, briefly discussing how to use these two statistical factors. It then includes tables for the first round and for 1996 through 2020. Users interested in the intervening years should review the Technical Sampling Report and Technical Sampling Report Addendum.

Standard errors have been explicitly computed for a number of statistics based upon the entire NLSY79 sample (total, civilian, and military) and a number of sex or race subclasses. Standard errors for other statistics (defined over the entire sample or the subclasses) may be approximated with use of the DEFT factors given in the linked tables. Users who examine the tables will note that CHRR has calculated standard errors for different variables over time. The R program that computed the Standard Errors and Design Effects for survey year 2020 can be accessed in the document NLSY79 Design Effects R Program.docx.

Approximate standard errors: Percentages

The following formula approximates a standard error of a percentage:

$$\mathrm{se}(P) \approx \mathrm{DEFT} \times \frac{\sqrt{P(100-P)}}{\sqrt{n}}$$

where
se(P) = the approximate standard error for the percentage of P
P = the sample percentage (ranging from 0 to 100)
n = the actual unweighted sample size for the demographic subclass from which the percentage was developed
DEFT = the appropriate DEFT factor for the particular demographic subclass and sample type from which the percentage was developed

For example, for 1996 the appropriate DEFT factor for estimating a standard error of the percentage of Hispanic or Latino males who were high school dropouts is 1.17744 (see proportion column, row seven of Table 2. Deft factors for round 17, 1996). Assuming the calculated sample percentage (P) equals 22.19 percent and the unweighted sample size (n) is 946, then:

$$\mathrm{se}(P) \approx 1.17744 \times \frac{\sqrt{22.19(100-22.19)}}{\sqrt{946}} \approx 1.5907$$
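
The worked example above can be checked numerically. The sketch below is a minimal illustration of the percentage formula; the function name `se_percentage` is hypothetical, and the inputs are the figures quoted in the example.

```python
import math


def se_percentage(p, n, deft):
    """Approximate standard error of a percentage p (on a 0-100 scale).

    p    -- sample percentage
    n    -- unweighted sample size of the subclass
    deft -- DEFT factor for the subclass and sample type
    """
    if not 0 <= p <= 100:
        raise ValueError("p must be between 0 and 100")
    if n <= 0:
        raise ValueError("n must be positive")
    return deft * math.sqrt(p * (100 - p)) / math.sqrt(n)


# Hispanic or Latino male high school dropouts, 1996.
se_p = se_percentage(p=22.19, n=946, deft=1.17744)
print(round(se_p, 4))
```

The result agrees with the value of 1.5907 that the text carries into the population-total calculation.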

To approximate the standard error of the corresponding projected population total (NP/100), calculate:

$$\mathrm{se}\left(\frac{NP}{100}\right) \approx N\left[\frac{\mathrm{se}(P)}{100}\right]$$

where
se(NP/100) = the approximate standard error of the projected population total corresponding to a percentage P within a particular demographic subclass and sample type
N = the appropriate projected total population base for the particular demographic subclass and sample type

For example, if the projected total population base for Hispanic or Latino males is 1,030,861, the projected number of civilian Hispanic or Latino male high school dropouts is equal to NP/100 or 1,030,861 * 22.19/100 = 228,748. Thus, the approximate standard error for the total number of Hispanic or Latino male high school dropouts is:

$$\mathrm{se}\left(\frac{NP}{100}\right) \approx 1{,}030{,}861 \times \frac{1.5907}{100} \approx 16{,}397.9$$

Note: 1.5907 came from the previous calculation.
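
The population-total calculation can be sketched the same way. This is an illustration only; the function name `se_population_total` is hypothetical, and the inputs are the figures quoted in the example above.

```python
def se_population_total(population_base, se_p):
    """Approximate standard error of a projected total NP/100.

    population_base -- projected total population base N
    se_p            -- standard error of the percentage P (0-100 scale)
    """
    return population_base * se_p / 100


N = 1_030_861   # projected population base, Hispanic or Latino males
P = 22.19       # sample percentage of high school dropouts
SE_P = 1.5907   # standard error of P from the previous calculation

projected_total = N * P / 100
se_total = se_population_total(N, SE_P)

print(round(projected_total), round(se_total, 1))
```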

Approximate standard errors: Means

One can compute approximate standard errors for means as follows:

$$\mathrm{se}(X) \approx \mathrm{DEFT} \times \sqrt{\frac{s^2}{n}}$$

where
se(X) = the approximate standard error of the mean
DEFT = the appropriate DEFT factor for the particular demographic subclass and sample type from which the mean was developed
$s^2$ = the weighted element variance computed for the demographic subclass and sample type from which the mean was developed
n = the unweighted sample size for the particular mean

For example, for 1979 the DEFT factor for all Hispanics or Latinos is 1.45699 (see means column, row four of Table 1. Deft factors for round 1, 1979). To approximate the standard error of the mean number of years of education completed by this subclass, where the weighted element variance is .72955 and the sample size is 77, compute:

$$\mathrm{se}(X) \approx 1.45699 \times \sqrt{\frac{0.72955}{77}} \approx 0.1418$$
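
A short numerical check of the formula for means follows. It is a minimal sketch; the function name `se_mean` is hypothetical, and the inputs are the figures quoted in the example above.

```python
import math


def se_mean(variance, n, deft):
    """Approximate standard error of a mean.

    variance -- weighted element variance for the subclass
    n        -- unweighted sample size for the mean
    deft     -- DEFT factor for the subclass and sample type
    """
    if n <= 0:
        raise ValueError("n must be positive")
    if variance < 0:
        raise ValueError("variance must be non-negative")
    return deft * math.sqrt(variance / n)


# Years of education completed, all Hispanics or Latinos, 1979.
se_x = se_mean(variance=0.72955, n=77, deft=1.45699)
print(round(se_x, 4))
```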

Design effects

Because the samples are multi-stage, stratified random samples instead of simple random samples, respondents tend to come in geographic clusters, and clusters of persons tend to be alike in a variety of ways for a variety of reasons. (For more information on the sampling and screening process, users are referred to the section on Sample Design & Screening Process in this guide.) For example, there may be cultural differences by locality or ecological differences in labor market conditions. Depending upon the degree of this homogeneity, the conventionally computed standard deviations for the variables, which assume a simple random sample, may be too small. However, by controlling the rate at which particular strata are sampled, multi-stage, stratified random samples can improve upon simple random samples. The ratio of the correct standard error to the standard error computed under the assumption of a simple random sample is known as the design effect. The technical sampling report for the NLSY79 (Frankel, Williams, and Spencer 1983) and its addendum (CHRR) provide design effects for the various strata.

A single design effect that can be broadly applied to regression analysis cannot be constructed. To illustrate the approximate size of design effects in regression analysis, a regression of rate of pay for the CPS job in 1979 was estimated using race, sex, marital status, and education as explanatory variables. Assuming each of the roughly 200 PSUs has the same number of respondents in the sample of 5,724 persons with observed wages, the design effect was calculated to be 1.52; that is, the true standard errors were larger than the naively computed standard errors by a factor of 1.52. When this exercise was repeated for rate of pay on the CPS job in 1986, the design effect had fallen to 1.37.

This reduction reflects the fact that mobility tends to mix the respondents more uniformly through the country, reducing the clustering of the sample. Many of the persons who started out in the same PSU will have moved to different areas and, hence, no longer share unobservable labor market conditions. These shared unobservable labor market conditions are likely responsible for the spatial correlation of the error terms that generates design effects. Thus, another advantage of longitudinal data is the lessening of design effects over time.
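
To make the regression example concrete, the sketch below inflates a naively computed standard error by the two design effects reported above (1.52 for 1979 and 1.37 for 1986), treating the design effect as a ratio of standard errors as this section defines it. The naive standard error of 0.10 is a made-up value for illustration, the 1.96 multiplier assumes an approximate 95 percent normal interval, and the function name `corrected_se` is hypothetical.

```python
def corrected_se(naive_se, design_effect):
    """Scale a simple-random-sample standard error by a design effect
    expressed as a ratio of standard errors."""
    return naive_se * design_effect


NAIVE_SE = 0.10  # hypothetical naive standard error of a coefficient

# Design effects reported in the text for the two wage regressions.
DESIGN_EFFECTS = {1979: 1.52, 1986: 1.37}

for year, factor in DESIGN_EFFECTS.items():
    se = corrected_se(NAIVE_SE, factor)
    half_width = 1.96 * se  # approximate 95 percent interval half-width
    print(f"{year}: corrected se={se:.3f}, half-width={half_width:.3f}")
```

Because the factor fell between 1979 and 1986, the same naive standard error would imply a narrower corrected interval in the later year.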

By examining the Geocode data for the NLSY79, it is possible to control for some of the environmental factors generating design effects or, if desired, to compute design effects based upon county or metropolitan area clusters which continue to be present. To facilitate study of design effects, scrambled PSU codes from the 1979 survey are available to persons with authorized access to the NLSY79 Geocode data.

The Technical Sampling Report and Technical Sampling Report Addendum also provide information on design effects.

The DEFT and standard errors tables are provided below.

Table 1. Deft factors for round 1, 1979

Demographic Group | Proportions | Means
All Youth | 1.72547 | 1.71282
Males | 1.46605 | 1.56808
Females | 1.58029 | 1.49720
Hispanics or Latinos | 1.44342 | 1.45699
Blacks | 1.35303 | 1.43730
Non-black/non-Hispanics | 1.58686 | 1.56996
Hispanic or Latino Males | 1.24321 | 1.22329
Hispanic or Latino Females | 1.40353 | 1.25095
Black Males | 1.19457 | 1.21378
Black Females | 1.24877 | 1.25243
Non-black/non-Hispanic Males | 1.33775 | 1.45962
Non-black/non-Hispanic Females | 1.46889 | 1.37581

Table 2. Deft factors for round 17, 1996

Demographic Group | Proportions | Means
All Youth | 1.35848 | 1.967232
Males | 1.28523 | 1.667333
Females | 1.24536 | 1.621727
Hispanics or Latinos | 1.28275 | 1.584298
Blacks | 1.19735 | 1.423025
Non-black/non-Hispanics | 1.19087 | 1.713184
Hispanic or Latino Males | 1.17744 | 1.407125
Hispanic or Latino Females | 1.13217 | 1.264911
Black Males | 1.16541 | 1.174734
Black Females | 1.13258 | 1.319091
Non-black/non-Hispanic Males | 1.13217 | 1.456022
Non-black/non-Hispanic Females | 1.09545 | 1.405347

Table 3. Deft factors for round 18, 1998

Demographic Group | Proportions | Means
All Youth | 1.38301 | 1.96469
Males | 1.30836 | 1.66433
Females | 1.28311 | 1.60000
Hispanics or Latinos | 1.21917 | 1.52807
Blacks | 1.19164 | 1.40890
Non-black/non-Hispanics | 1.17937 | 1.67481
Hispanic or Latino Males | 1.19248 | 1.37659
Hispanic or Latino Females | 1.13418 | 1.25100
Black Males | 1.14336 | 1.12694
Black Females | 1.12088 | 1.31529
Non-black/non-Hispanic Males | 1.18195 | 1.43353
Non-black/non-Hispanic Females | 1.11028 | 1.37133

Table 4. Deft factors for round 19, 2000

Demographic Group | Proportions | Means
All Youth | 1.36423 | 1.90919
Males | 1.26007 | 1.61864
Females | 1.21244 | 1.58588
Hispanics or Latinos | 1.24544 | 1.48492
Blacks | 1.19954 | 1.42127
Non-black/non-Hispanics | 1.20052 | 1.62327
Hispanic or Latino Males | 1.19722 | 1.31909
Hispanic or Latino Females | 1.09240 | 1.22474
Black Males | 1.20277 | 1.18322
Black Females | 1.08282 | 1.34907
Non-black/non-Hispanic Males | 1.12750 | 1.39462
Non-black/non-Hispanic Females | 1.13908 | 1.34907

Table 5. Deft factors for round 20, 2002

Demographic Group | Proportions | Means
All Youth | 1.34578 | 1.82757
Males | 1.29701 | 1.58430
Females | 1.18181 | 1.52807
Hispanics or Latinos | 1.24097 | 1.47986
Blacks | 1.20692 | 1.35647
Non-black/non-Hispanics | 1.15085 | 1.56844
Hispanic or Latino Males | 1.12450 | 1.28841
Hispanic or Latino Females | 1.09479 | 1.21861
Black Males | 1.20830 | 1.12694
Black Females | 1.18743 | 1.33604
Non-black/non-Hispanic Males | 1.20468 | 1.37659
Non-black/non-Hispanic Females | 1.06829 | 1.30958

User note: Tables 6-14

Users are cautioned that the figures in the proportion column for the last six categories are becoming much less relevant over time. The proportion DEFT column is based on education, training, marriage, and employment variables. Over time, narrow categories such as black females have come to include only a few respondents in school or training, which causes the DEFT factors to fluctuate from survey to survey. Broader categories, such as "All Youth," "Males," and "Females," are more reliable to use.

Table 6. Deft factors for round 21, 2004

Demographic Group

Proportions Means

All Youth

1.38789 1.83712

Males

1.27377 1.55563

Females

1.23592 1.55081

Hispanics or Latinos

1.30336 1.46969

Blacks

1.14782 1.35831

Non-black/non-Hispanics

1.18163 1.57003

Hispanic or Latino Males

1.27083 1.31149

Hispanic or Latino Females

1.12750 1.19164

Black Males

1.14455 1.10454

Black Females

1.02896 1.37113

Non-black/non-Hispanic Males

1.09373 1.35647

Non-black/non-Hispanic Females

1.08224 1.32098
Table 7. Deft factors for round 22, 2006

Demographic Group

Proportions Means

All Youth

1.35881 1.81246

Males

1.23472 1.55563

Females

1.25553 1.52315

Hispanics or Latinos

1.13710 1.48661

Blacks

1.15994 1.33041

Non-black/non-Hispanics

1.14455 1.53460

Hispanic or Latino Males

1.15195 1.31719

Hispanic or Latino Females

1.00995 1.23085

Black Males

1.15247 1.09772

Black Females

1.11221 1.35647

Non-black/non-Hispanic Males

1.09636 1.32288

Non-black/non-Hispanic Females

1.08082 1.30192
Table 8. Deft factors for round 23, 2008

Demographic Group

Proportions Means

All Youth

1.31106 1.83712

Males

1.25599 1.60468

Females

1.22474 1.52315

Hispanics or Latinos

1.13235 1.43353

Blacks

1.16726 1.38203

Non-black/non-Hispanics

1.10855 1.56365

Hispanic or Latino Males

1.14837 1.27083

Hispanic or Latino Females

1.03870 1.18322

Black Males

1.14182 1.12916

Black Females

1.11467 1.34907

Non-black/non-Hispanic Males

1.09030 1.38564

Non-black/non-Hispanic Females

1.09829 1.28841
Table 9. Deft factors for round 24, 2010

Demographic Group

Proportions Means

All Youth

1.34024 1.80278

Males

1.26293 1.58745

Females

1.23288 1.48829

Hispanics or Latinos

1.19284 1.46116

Blacks

1.21295 1.36015

Non-black/non-Hispanics

1.12639 1.54434

Hispanic or Latino Males

1.19284 1.28452

Hispanic or Latino Females

1.11867 1.20208

Black Males

1.16458 1.10905

Black Females

1.13137 1.34907

Non-black/non-Hispanic Males

1.07877 1.37659

Non-black/non-Hispanic Females

1.03983 1.26886
Table 10. Deft factors for round 25, 2012

Demographic Group

Proportions Means

All Youth

1.34604 1.77682

Males

1.26681 1.55921

Females

1.24255 1.48757

Hispanics or Latinos

1.21171 1.46095

Blacks

1.19992 1.35592

Non-black/non-Hispanics

1.17951 1.52438

Hispanic or Latino Males

1.16338 1.24213

Hispanic or Latino Females

1.05880 1.20750

Black Males

1.11229 1.16998

Black Females

1.15019 1.32479

Non-black/non-Hispanic Males

1.14991 1.36160

Non-black/non-Hispanic Females

1.12411 1.25952
Table 11. Deft factors for round 26, 2014

Demographic Group

Proportions Means

All Youth

1.33370 1.77496

Males

1.25238 1.56764

Females

1.19779 1.50041

Hispanics or Latinos

1.15607 1.41956

Blacks

1.13520 1.38628

Non-black/non-Hispanics

1.18624 1.50758

Hispanic or Latino Males

1.15649 1.25180

Hispanic or Latino Females

1.06414 1.20324

Black Males

1.12620 1.19193

Black Females

1.00051 1.34394

Non-black/non-Hispanic Males

1.15447 1.35138

Non-black/non-Hispanic Females

1.18466 1.26346
Table 12. Deft factors for round 27, 2016

Demographic Group

Proportions Means

All Youth

1.40369 1.73651

Males

1.36746 1.53267

Females

1.23931 1.47176

Hispanics or Latinos

1.28005 1.44627

Blacks

1.10852 1.34987

Non-black/non-Hispanics

1.26546 1.47732

Hispanic or Latino Males

1.19194 1.22472

Hispanic or Latino Females

1.16081 1.23085

Black Males

1.10918 1.15997

Black Females

1.04381 1.30468

Non-black/non-Hispanic Males

1.21767 1.32061

Non-black/non-Hispanic Females

1.17469 1.24867
Table 13. Deft factors for round 28, 2018

Demographic Group

Proportions Means

All Youth

1.36769 1.72280

Males

1.29963 1.57090

Females

1.18347 1.46229

Hispanics or Latinos

1.23085 1.43839

Blacks

1.06561 1.30877

Non-black/non-Hispanics

1.21787 1.46098

Hispanic or Latino Males

1.12575 1.25443

Hispanic or Latino Females

1.10262 1.19304

Black Males

1.05849 1.15098

Black Females

0.97723 1.31684

Non-black/non-Hispanic Males

1.12186 1.35481

Non-black/non-Hispanic Females

1.11219 1.22446
Table 14. Deft factors for round 29, 2020

Demographic Group

Proportions Means

All Youth

1.36387 1.72145

Males

1.35466 1.56630

Females

1.12285 1.12285

Hispanics or Latinos

1.15142 1.15142

Blacks

1.05324 1.28861

Non-black/non-Hispanics

1.22780 1.45744

Hispanic or Latino Males

1.00312 1.22750

Hispanic or Latino Females

1.02489 1.21003

Black Males

0.95852 1.09251

Black Females

0.96780 1.34382

Non-black/non-Hispanic Males

1.16393 1.36001

Non-black/non-Hispanic Females

1.06213 1.19797


Table 15. Standard errors for round 1, 1979
Columns (left to right): All; Male; Female; Hispanic or Latino; Black; Non-black, non-Hispanic; Male Hispanic or Latino; Female Hispanic or Latino; Male Black; Female Black; Male Non-black, non-Hispanic; Female Non-black, non-Hispanic

Proportion High School Dropouts

0.00471 0.00627 0.00545 0.01385 0.00835 0.00527 0.01744 0.01814 0.01232 0.00928 0.00710 0.00619

Proportion Attending High School

0.00735 0.00893 0.01006 0.01554 0.01151 0.00904 0.02176 0.02146 0.01460 0.01628 0.01085 0.01233

Proportion Attending College

0.00597 0.00729 0.00778 0.01037 0.00784 0.00710 0.01230 0.01460 0.00919 0.01119 0.00862 0.00947

Proportion High School Grad

0.00658 0.00776 0.00905 0.01277 0.01033 0.00785 0.01440 0.01957 0.01217 0.01448 0.00926 0.01094

Mean Years of School Completed

0.02900 0.04000 0.03800 0.08200 0.05700 0.03400 0.10000 0.10500 0.06100 0.07400 0.04600 0.04400

Mean Years of School Expected

0.04600 0.05900 0.04700 0.10800 0.06400 0.05500 0.12500 0.11700 0.07900 0.07900 0.07100 0.05500

Proportion Living in South

0.02286 0.02353 0.02324 0.05641 0.04264 0.02544 0.04973 0.06060 0.04555 0.04084 0.02610 0.02601

Mean Numbers of Children Expected

0.02400 0.02700 0.03200 0.05800 0.04600 0.02800 0.06500 0.07000 0.05600 0.05500 0.03100 0.03700

Proportion Married

0.00454 0.00365 0.00686 0.01023 0.00533 0.00570 0.00923 0.01646 0.00440 0.00884 0.00448 0.00855
Table 16. Standard errors for round 17, 1996
Columns (left to right): All; Male; Female; Hispanic or Latino; Black; Non-black, non-Hispanic; Male Hispanic or Latino; Female Hispanic or Latino; Male Black; Female Black; Male Non-black, non-Hispanic; Female Non-black, non-Hispanic

Proportion Not on Active Duty

0.001 0.003 0.001 0.005 0.004 0.002 0.009 0.001 0.007 0.003 0.003 0.001

Proportion High School Dropouts

0.006 0.008 0.006 0.014 0.009 0.007 0.018 0.016 0.012 0.010 0.009 0.007

Proportion in High School or Less

0.000 0.001 0.001 0.002 0.001 0.001 0.002 0.002 0.001 0.002 0.001 0.000

Proportion Attending College

0.003 0.003 0.005 0.006 0.005 0.004 0.008 0.009 0.005 0.007 0.004 0.005

Proportion High School Grad

0.006 0.007 0.006 0.015 0.009 0.007 0.018 0.016 0.012 0.010 0.009 0.007

Proportion Living in South

0.034 0.034 0.036 0.052 0.046 0.039 0.049 0.059 0.046 0.048 0.038 0.041

Proportion Currently Married

0.007 0.010 0.010 0.016 0.013 0.008 0.020 0.021 0.018 0.017 0.011 0.011

Proportion Employed at Present

0.006 0.007 0.009 0.015 0.009 0.007 0.017 0.020 0.014 0.013 0.007 0.010

Proportion Unemployed

0.002 0.003 0.003 0.006 0.005 0.003 0.007 0.009 0.008 0.008 0.004 0.004

Proportion in Labor Force

0.005 0.005 0.008 0.013 0.008 0.006 0.015 0.018 0.012 0.012 0.006 0.010

Proportion Gov't Training

0.001 0.001 0.001 0.003 0.002 0.001 0.003 0.003 0.002 0.004 0.001 0.001

Average Number of Children

0.023 0.027 0.030 0.054 0.035 0.028 0.067 0.065 0.040 0.050 0.033 0.036

Average Highest Grade Completed

0.060 0.074 0.063 0.109 0.065 0.073 0.137 0.119 0.074 0.081 0.091 0.077

Proportion Currently Enrolled

0.003 0.004 0.005 0.006 0.005 0.004 0.008 0.008 0.005 0.007 0.004 0.006
Table 17. Standard errors for round 18, 1998
Columns (left to right): All; Male; Female; Hispanic or Latino; Black; Non-black, non-Hispanic; Male Hispanic or Latino; Female Hispanic or Latino; Male Black; Female Black; Male Non-black, non-Hispanic; Female Non-black, non-Hispanic

Proportion Not on Active Duty

0.001 0.003 0.001 0.005 0.003 0.002 0.008 0.002 0.006 0.003 0.003 0.001

Proportion High School Dropouts

0.005 0.007 0.006 0.014 0.009 0.006 0.017 0.016 0.012 0.010 0.009 0.007

Proportion in High School or Less

0.000 0.000 0.001 0.000 0.001 0.000 0.000 0.001 0.001 0.001 0.000 0.001

Proportion Attending College

0.003 0.003 0.005 0.005 0.005 0.003 0.005 0.008 0.005 0.007 0.004 0.005

Proportion High School Grad

0.005 0.007 0.006 0.014 0.009 0.006 0.017 0.016 0.012 0.010 0.009 0.007

Proportion Living in South

0.035 0.034 0.037 0.051 0.045 0.039 0.047 0.058 0.044 0.047 0.039 0.041

Proportion Currently Married

0.008 0.010 0.011 0.015 0.012 0.008 0.021 0.021 0.018 0.016 0.011 0.010

Proportion Employed at Present

0.006 0.007 0.009 0.014 0.009 0.007 0.017 0.020 0.012 0.014 0.008 0.011

Proportion Unemployed

0.002 0.003 0.003 0.005 0.005 0.002 0.007 0.008 0.007 0.007 0.003 0.003

Proportion in Labor Force

0.005 0.006 0.009 0.013 0.008 0.006 0.016 0.019 0.011 0.011 0.006 0.011

Proportion Gov't Training

0.001 0.001 0.001 0.002 0.002 0.001 0.003 0.004 0.003 0.004 0.001 0.001

Average Number of Children

0.024 0.028 0.030 0.050 0.036 0.028 0.061 0.065 0.042 0.050 0.033 0.035

Average Highest Grade Completed

0.061 0.077 0.063 0.114 0.066 0.073 0.147 0.121 0.074 0.082 0.09. 0.074

Proportion Currently Enrolled

0.003 0.003 0.004 0.005 0.005 0.003 0.005 0.008 0.005 0.007 0.004 0.005
Table 18. Standard errors for round 19, 2000
Columns (left to right): All; Male; Female; Hispanic or Latino; Black; Non-black, non-Hispanic; Male Hispanic or Latino; Female Hispanic or Latino; Male Black; Female Black; Male Non-black, non-Hispanic; Female Non-black, non-Hispanic

Proportion Not on Active Duty

0.001 0.002 0.000 0.003 0.003 0.001 0.006 0.001 0.005 0.002 0.003 0.000

Proportion High School Dropouts

0.005 0.007 0.006 0.014 0.009 0.006 0.017 0.015 0.013 0.010 0.009 0.006

Proportion in High School or Less

0.000 0.000 0.000 0.001 0.001 0.000 0.001 0.002 0.002 0.000 0.000 0.000

Proportion Attending College

0.003 0.003 0.004 0.006 0.004 0.003 0.008 0.009 0.004 0.007 0.003 0.005

Proportion High School Grad

0.005 0.007 0.006 0.014 0.009 0.006 0.017 0.015 0.013 0.010 0.009 0.006

Proportion Living in South

0.035 0.034 0.037 0.052 0.043 0.039 0.049 0.059 0.044 0.046 0.038 0.041

Proportion Currently Married

0.008 0.010 0.010 0.014 0.012 0.008 0.022 0.021 0.018 0.015 0.011 0.010

Proportion Employed at Present

0.006 0.006 0.009 0.012 0.009 0.007 0.014 0.018 0.014 0.012 0.007 0.010

Proportion Gov't Training

0.001 0.001 0.001 0.003 0.002 0.001 0.003 0.004 0.003 0.003 0.001 0.001

Average Number of Children

0.024 0.029 0.030 0.048 0.037 0.027 0.061 0.064 0.046 0.051 0.034 0.035

Average Highest Grade Completed

0.061 0.076 0.065 0.114 0.069 0.074 0.146 0.118 0.078 0.089 0.092 0.078

Proportion Currently Enrolled

0.003 0.003 0.004 0.006 0.004 0.003 0.008 0.009 0.005 0.007 0.003 0.005

Table 18 note: Users are cautioned that by round 17 cohort changes have made some categories much less relevant. In particular, the extremely small subsample sizes for "Proportion government training participant" and "Proportion in high school or less" make these categories statistically suspect. They have been kept in the table for historical continuity.

Table 19. Standard errors for round 20, 2002
Columns (left to right): All; Male; Female; Hispanic or Latino; Black; Non-black, non-Hispanic; Male Hispanic or Latino; Female Hispanic or Latino; Male Black; Female Black; Male Non-black, non-Hispanic; Female Non-black, non-Hispanic

Proportion Not on Active Duty

0.001 0.002 0.000 0.002 0.002 0.001 0.004 0.000 0.004 0.002 0.003 0.000

Proportion High School Dropouts

0.005 0.007 0.005 0.015 0.008 0.006 0.018 0.016 0.011 0.010 0.009 0.006

Proportion in High School or Less

0.000 0.000 0.000 0.000 0.001 0.000 0.000 0.000 0.001 0.001 0.001 0.000

Proportion Attending College

0.002 0.003 0.004 0.004 0.004 0.002 0.005 0.006 0.005 0.006 0.003 0.004

Proportion High School Grad

0.005 0.007 0.005 0.015 0.008 0.006 0.018 0.016 0.011 0.010 0.009 0.006

Proportion Living in South

0.035 0.034 0.036 0.053 0.042 0.039 0.050 0.060 0.043 0.045 0.039 0.041

Proportion Currently Married

0.009 0.010 0.011 0.015 0.013 0.009 0.023 0.022 0.018 0.015 0.011 0.012

Proportion Employed at Present

0.007 0.007 0.009 0.012 0.011 0.008 0.016 0.015 0.016 0.014 0.008 0.011

Proportion Gov't Training

0.002 0.002 0.002 0.004 0.004 0.002 0.006 0.006 0.006 0.006 0.002 0.002

Average Number of Children

0.023 0.028 0.028 0.051 0.037 0.026 0.062 0.067 0.048 0.053 0.034 0.034

Average Highest Grade Completed

0.061 0.077 0.065 0.120 0.066 0.074 0.150 0.125 0.073 0.091 0.094 0.078

Proportion Currently Enrolled

0.002 0.003 0.003 0.004 0.004 0.002 0.005 0.006 0.005 0.006 0.003 0.004

Table 19 note: Users are cautioned that by round 17 cohort changes have made some categories much less relevant. In particular, the extremely small sample sizes for "Proportion government training participant" and "Proportion in high school or less" make these categories statistically suspect. They have been kept in the table for historical continuity.

Table 20. Standard errors for round 21, 2004
Columns (left to right): All; Male; Female; Hispanic or Latino; Black; Non-black, non-Hispanic; Male Hispanic or Latino; Female Hispanic or Latino; Male Black; Female Black; Male Non-black, non-Hispanic; Female Non-black, non-Hispanic

Proportion Not on Active Duty

0.001 0.002 0.000 0.002 0.002 0.001 0.004 0.001 0.003 0.002 0.002 0.000

Proportion High School Dropouts

0.005 0.007 0.005 0.014 0.009 0.006 0.019 0.015 0.013 0.010 0.009 0.006

Proportion in High School or Less

0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000

Proportion Attending College

0.002 0.002 0.003 0.006 0.003 0.003 0.006 0.009 0.004 0.006 0.002 0.004

Proportion High School Grad

0.005 0.007 0.005 0.014 0.009 0.006 0.019 0.015 0.012 0.010 0.009 0.006

Proportion Living in South

0.034 0.034 0.036 0.053 0.044 0.039 0.051 0.059 0.044 0.045 0.039 0.041

Proportion Currently Married

0.008 0.010 0.011 0.014 0.012 0.008 0.021 0.020 0.018 0.014 0.010 0.012

Proportion Employed at Present

0.007 0.007 0.010 0.014 0.009 0.008 0.018 0.018 0.012 0.013 0.008 0.012

Proportion Gov't Training

0.001 0.002 0.002 0.003 0.003 0.001 0.003 0.006 0.004 0.003 0.002 0.002

Average Number of Children

0.024 0.029 0.031 0.053 0.037 0.028 0.069 0.065 0.049 0.051 0.035 0.036

Average Highest Grade Completed

0.061 0.076 0.065 0.115 0.069 0.074 0.149 0.119 0.074 0.096 0.093 0.077

Proportion Currently Enrolled

0.002 0.002 0.003 0.006 0.003 0.003 0.006 0.009 0.004 0.006 0.002 0.004

Table 20 note: Users are cautioned that cohort changes over time have made some categories much less relevant. In particular, the extremely small sample sizes for education related variables such as "Proportion in high school or less," "Proportion government training participant," "Proportion currently enrolled," and "Proportion attending college" make these categories statistically suspect. They have been kept in the table for historical continuity.

Table 21. Standard errors for round 22, 2006
Columns (left to right): All; Male; Female; Hispanic or Latino; Black; Non-black, non-Hispanic; Male Hispanic or Latino; Female Hispanic or Latino; Male Black; Female Black; Male Non-black, non-Hispanic; Female Non-black, non-Hispanic

Proportion Not on Active Duty

0.001 0.001 0.000 0.001 0.001 0.001 0.002 0.001 0.003 0.001 0.002 0.000

Proportion High School Dropouts

0.005 0.007 0.005 0.014 0.008 0.005 0.018 0.016 0.012 0.009 0.008 0.006

Proportion in High School or Less

0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000

Proportion Attending College

0.002 0.002 0.003 0.003 0.004 0.002 0.003 0.005 0.005 0.006 0.002 0.004

Proportion High School Grad

0.005 0.007 0.005 0.014 0.008 0.005 0.018 0.016 0.012 0.009 0.008 0.006

Proportion Living in South

0.034 0.034 0.036 0.052 0.043 0.039 0.048 0.059 0.043 0.046 0.039 0.041

Proportion Currently Married

0.009 0.010 0.012 0.014 0.012 0.009 0.022 0.018 0.016 0.015 0.011 0.012

Proportion Employed at Present

0.007 0.007 0.010 0.014 0.010 0.008 0.020 0.017 0.014 0.015 0.008 0.012

Proportion Gov't Training

0.001 0.002 0.002 0.002 0.003 0.001 0.002 0.004 0.004 0.005 0.002 0.002

Average Number of Children

0.023 0.029 0.030 0.055 0.037 0.027 0.069 0.068 0.048 0.052 0.034 0.035

Average Highest Grade Completed

0.061 0.076 0.065 0.114 0.067 0.074 0.145 0.126 0.072 0.096 0.093 0.078

Proportion Currently Enrolled

0.002 0.002 0.003 0.003 0.004 0.002 0.003 0.005 0.005 0.006 0.002 0.004

Table 21 note: Users are cautioned that cohort changes over time have made some categories much less relevant. In particular, the extremely small sample sizes for education related variables such as "Proportion in high school or less," "Proportion government training participant," "Proportion currently enrolled," and "Proportion attending college" make these categories statistically suspect. They have been kept in the table for historical continuity.

Table 22. Standard errors for round 23, 2008
Columns (left to right): All; Male; Female; Hispanic or Latino; Black; Non-black, non-Hispanic; Male Hispanic or Latino; Female Hispanic or Latino; Male Black; Female Black; Male Non-black, non-Hispanic; Female Non-black, non-Hispanic

Proportion Not on Active Duty

0.001 0.001 0.000 0.001 0.001 0.001 0.001 0.001 0.002 0.001 0.001 0.000

Proportion High School Dropouts

0.005 0.007 0.005 0.013 0.008 0.005 0.018 0.015 0.011 0.009 0.008 0.006

Proportion in High School or Less

0.000 0.000 0.000 0.001 0.000 0.000 0.000 0.002 0.000 0.000 0.000 0.001

Proportion Attending College

0.002 0.002 0.003 0.004 0.003 0.002 0.005 0.005 0.005 0.006 0.002 0.004

Proportion High School Grad

0.005 0.007 0.005 0.013 0.008 0.005 0.018 0.015 0.011 0.009 0.008 0.006

Proportion Living in South

0.032 0.031 0.034 0.050 0.043 0.035 0.046 0.058 0.042 0.046 0.034 0.038

Proportion Currently Married

0.009 0.010 0.011 0.015 0.012 0.008 0.022 0.020 0.017 0.015 0.011 0.012

Proportion Employed at Present

0.008 0.010 0.013 0.011 0.008 0.018 0.017 0.015 0.014 0.008 0.012

Proportion Gov't Training

0.001 0.002 0.002 0.002 0.003 0.001 0.003 0.004 0.003 0.004 0.002 0.002

Average Number of Children

0.023 0.030 0.030 0.054 0.038 0.027 0.068 0.067 0.049 0.052 0.036 0.035

Average Highest Grade Completed

0.062 0.078 0.066 0.109 0.070 0.075 0.141 0.117 0.076 0.094 0.096 0.079

Proportion Currently Enrolled

0.002 0.002 0.003 0.004 0.004 0.002 0.006 0.006 0.005 0.007 0.002 0.004

Table 22 note: Users are cautioned that cohort changes over time have made some categories much less relevant. In particular, the extremely small sample sizes for education related variables such as "Proportion in high school or less," "Proportion government training participant," "Proportion currently enrolled," and "Proportion attending college" make these categories statistically suspect. They have been kept in the table for historical continuity.

Table 23. Standard errors for round 24, 2010
Columns (left to right): All; Male; Female; Hispanic or Latino; Black; Non-black, non-Hispanic; Male Hispanic or Latino; Female Hispanic or Latino; Male Black; Female Black; Male Non-black, non-Hispanic; Female Non-black, non-Hispanic

Proportion Not on Active Duty

0.000 0.001 0.000 0.000 0.001 0.000 0.000 0.000 0.002 0.001 0.001 0.000

Proportion High School Dropouts

0.005 0.007 0.005 0.013 0.008 0.005 0.019 0.015 0.011 0.009 0.008 0.006

Proportion in High School or Less

0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000

Proportion Attending College

0.002 0.002 0.003 0.003 0.004 0.002 0.005 0.004 0.004 0.007 0.002 0.003

Proportion High School Grad

0.005 0.007 0.005 0.013 0.008 0.005 0.019 0.015 0.011 0.009 0.008 0.006

Proportion Living in South

0.034 0.033 0.037 0.051 0.042 0.039 0.047 0.058 0.042 0.044 0.038 0.041

Proportion Currently Married

0.009 0.010 0.011 0.016 0.012 0.008 0.021 0.023 0.017 0.016 0.010 0.012

Proportion Employed at Present

0.008 0.009 0.011 0.014 0.011 0.009 0.019 0.020 0.017 0.014 0.011 0.013

Proportion Gov't Training

0.001 0.002 0.002 0.003 0.003 0.002 0.004 0.005 0.004 0.004 0.002 0.002

Average Number of Children

0.024 0.030 0.030 0.057 0.037 0.027 0.072 0.068 0.049 0.053 0.036 0.035

Average Highest Grade Completed

0.062 0.079 0.064 0.112 0.072 0.075 0.140 0.125 0.077 0.098 0.096 0.077

Proportion Currently Enrolled

0.002 0.002 0.003 0.003 0.004 0.002 0.005 0.004 0.004 0.007 0.002 0.004

Table 23 note: Users are cautioned that cohort changes over time have made some categories much less relevant. In particular, the extremely small sample sizes for education related variables such as "Proportion in high school or less," "Proportion government training participant," "Proportion currently enrolled," and "Proportion attending college" make these categories statistically suspect. They have been kept in the table for historical continuity.

Table 24. Standard errors for round 25, 2012
Columns (left to right): All; Male; Female; Hispanic or Latino; Black; Non-black, non-Hispanic; Male Hispanic or Latino; Female Hispanic or Latino; Male Black; Female Black; Male Non-black, non-Hispanic; Female Non-black, non-Hispanic

Proportion Not on Active Duty

0.000 0.000 0.000 0.000 0.001 0.000 0.000 0.000 0.000 0.001 0.000 0.000

Proportion High School Dropouts

0.007 0.005 0.014 0.009 0.005 0.020 0.015 0.012 0.009 0.009 0.006

Proportion in High School or Less

NA NA NA NA NA NA NA NA NA NA NA NA

Proportion Attending College

0.002 0.003 0.003 0.004 0.005 0.003 0.003 0.007 0.004 0.008 0.004 0.004

Proportion High School Grad

0.005 0.007 0.005 0.014 0.008 0.006 0.020 0.015 0.012 0.008 0.008 0.006

Proportion Living in South

0.034 0.034 0.036 0.055 0.043 0.039 0.055 0.064 0.044 0.046 0.039 0.041

Proportion Currently Married

0.009 0.011 0.011 0.016 0.012 0.009 0.022 0.022 0.016 0.015 0.012 0.012

Proportion Employed at Present

0.008 0.010 0.011 0.015 0.011 0.009 0.020 0.018 0.016 0.015 0.010 0.013

Proportion Gov't Training

0.001 0.002 0.002 0.004 0.003 0.001 0.004 0.005 0.004 0.005 0.002 0.002

Average Number of Children

0.024 0.030 0.031 0.058 0.038 0.027 0.068 0.069 0.053 0.052 0.036 0.036

Average Highest Grade Completed

0.062 0.080 0.065 0.114 0.073 0.076 0.139 0.126 0.084 0.098 0.098 0.078

Proportion Currently Enrolled

0.002 0.003 0.004 0.004 0.005 0.003 0.003 0.007 0.004 0.008 0.004 0.004

Table 24 note: Users are cautioned that cohort changes over time have made some categories much less relevant. In particular, the extremely small subsample sizes for education related variables such as "Proportion government training participant," "Proportion currently enrolled" and "Proportion attending college" make these categories statistically suspect. They have been kept in the table for historical continuity. In round 25 the variable "Proportion in high school or less" was labeled "NA" since no NLSY79 respondent was in this category.

Table 25. Standard errors for round 26, 2014
Columns (left to right): All; Male; Female; Hispanic or Latino; Black; Non-black, non-Hispanic; Male Hispanic or Latino; Female Hispanic or Latino; Male Black; Female Black; Male Non-black, non-Hispanic; Female Non-black, non-Hispanic

Proportion High School Dropouts

0.005 0.007 0.006 0.014 0.009 0.006 0.021 0.016 0.012 0.010 0.009 0.007

Proportion Attending College

0.002 0.003 0.002 0.004 0.005 0.004 0.005 0.008 0.003 0.007 0.003 0.004

Proportion High School Grad

0.005 0.007 0.005 0.013 0.008 0.005 0.020 0.014 0.012 0.008 0.008 0.006

Proportion Living in South

0.034 0.033 0.036 0.056 0.042 0.038 0.059 0.061 0.044 0.046 0.038 0.041

Proportion Currently Married

0.009 0.011 0.012 0.016 0.012 0.009 0.022 0.021 0.017 0.016 0.012 0.012

Proportion Employed at Present

0.009 0.011 0.011 0.014 0.010 0.010 0.021 0.019 0.015 0.013 0.012 0.013

Proportion Gov't Training

0.001 0.001 0.002 0.002 0.003 0.001 0.003 0.003 0.004 0.003 0.001 0.002

Average Number of Children

0.024 0.029 0.032 0.055 0.039 0.027 0.066 0.070 0.054 0.054 0.035 0.037

Average Highest Grade Completed

0.064 0.084 0.067 0.114 0.077 0.078 0.145 0.129 0.088 0.100 0.102 0.080

Proportion Currently Enrolled

0.002 0.002 0.004 0.005 0.004 0.003 0.005 0.008 0.003 0.007 0.003 0.004

Table 25 note: Users are cautioned that cohort changes over time have made some categories much less relevant. In particular, the extremely small subsample sizes for education related variables such as "Proportion government training participant," "Proportion currently enrolled" and "Proportion attending college" make these categories statistically suspect. They have been kept in the table for historical continuity. In round 25, the variable "Proportion in high school or less" was removed from the table since no NLSY79 respondent was in this category. In round 26, the variable "Proportion not on active duty" was removed from the table since no NLSY79 respondent remained in this category.

Table 26. Standard errors for round 27, 2016
Columns (left to right): All; Male; Female; Hispanic or Latino; Black; Non-black, non-Hispanic; Male Hispanic or Latino; Female Hispanic or Latino; Male Black; Female Black; Male Non-black, non-Hispanic; Female Non-black, non-Hispanic

Proportion High School Dropouts

0.0048 0.007 0.005 0.014 0.008 0.006 0.018 0.018 0.012 0.009 0.008 0.006

Proportion Attending College

0.0022 0.003 0.003 0.004 0.003 0.003 0.002 0.009 0.002 0.006 0.003 0.004

Proportion High School Grads

0.0046 0.007 0.005 0.013 0.008 0.005 0.018 0.015 0.012 0.008 0.008 0.006

Proportion Living in South

0.0337 0.033 0.036 0.058 0.041 0.038 0.061 0.063 0.042 0.045 0.038 0.040

Proportion Currently Married

0.0093 0.011 0.011 0.016 0.012 0.009 0.023 0.021 0.017 0.016 0.012 0.011

Proportion Employed at Present

0.0084 0.010 0.011 0.015 0.011 0.009 0.023 0.018 0.016 0.015 0.011 0.013

Proportion Gov't Training

0.0012 0.001 0.001 0.004 0.002 0.001 0.007 0.004 0.004 0.004 0.002 0.002

Average Number of Children

0.0239 0.031 0.031 0.059 0.039 0.028 0.070 0.073 0.054 0.051 0.036 0.037

Average Highest Grade Completed

0.0624 0.080 0.067 0.118 0.075 0.076 0.142 0.134 0.085 0.103 0.098 0.080

Proportion Currently Enrolled

0.0022 0.003 0.003 0.004 0.003 0.003 0.002 0.009 0.003 0.006 0.003 0.004

Table 26 note: Users are cautioned that cohort changes over time have made some categories much less relevant. In particular, the extremely small subsample sizes for education related variables such as "Proportion government training participant," "Proportion currently enrolled" and "Proportion attending college" make these categories statistically suspect. They have been kept in the table for historical continuity. In round 25 the variable "Proportion in high school or less" was removed from the table since no NLSY79 respondent was in this category. In round 26 the variable "Proportion not on active duty" was removed from the table since no NLSY79 respondent remained in this category.

Table 27. Standard errors for round 28, 2018
Columns (left to right): All; Male; Female; Hispanic or Latino; Black; Non-black, non-Hispanic; Male Hispanic or Latino; Female Hispanic or Latino; Male Black; Female Black; Male Non-black, non-Hispanic; Female Non-black, non-Hispanic

Proportion High School Dropouts

0.0047 0.007 0.005 0.014 0.008 0.005 0.019 0.016 0.012 0.008 0.008 0.006

Proportion Attending College

0.0010 0.001 0.002 0.002 0.002 0.001 0.004 0.003 0.000 0.004 0.002 0.002

Proportion High School Grad

0.0047 0.007 0.005 0.014 0.008 0.005 0.019 0.015 0.012 0.008 0.008 0.006

Proportion Living in South

0.0334 0.033 0.036 0.058 0.042 0.038 0.060 0.063 0.043 0.045 0.038 0.041

Proportion Currently Married

0.0094 0.011 0.011 0.017 0.012 0.009 0.025 0.019 0.016 0.016 0.012 0.012

Proportion Employed at Present

0.0086 0.010 0.012 0.016 0.012 0.010 0.020 0.022 0.017 0.016 0.011 0.014

Proportion Gov't Training

0.0008 0.009 0.001 0.002 0.002 0.001 0.000 0.003 0.003 0.003 0.001 0.001

Average Number of Children

0.0248 0.033 0.032 0.057 0.038 0.029 0.067 0.070 0.054 0.053 0.039 0.037

Average Highest Grade Completed

0.0610 0.081 0.066 0.117 0.074 0.074 0.151 0.126 0.084 0.105 0.100 0.078

Proportion Currently Enrolled

0.0011 0.001 0.002 0.003 0.002 0.001 0.004 0.004 0.000 0.004 0.002 0.002

Table 27 note: Users are cautioned that cohort changes over time have made some categories much less relevant. In particular, the extremely small subsample sizes for education-related variables such as "Proportion government training participant," "Proportion currently enrolled," and "Proportion attending college" make these categories statistically suspect. They have been kept in the table for historical continuity. In round 25 the variable "Proportion in high school or less" was removed from the table since no NLSY79 respondent was in this category. In round 26 the variable "Proportion not on active duty" was removed from the table since no NLSY79 respondent remained in this category. Beginning in round 28, "Average highest grade completed" is the highest grade completed as of the date of the most recent interview, not as of May of the year preceding the survey year.

Table 28. Standard errors for round 29, 2020
Columns (left to right): All; Male; Female; Hispanic or Latino; Black; Non-black, non-Hispanic; Male Hispanic or Latino; Female Hispanic or Latino; Male Black; Female Black; Male Non-black, non-Hispanic; Female Non-black, non-Hispanic

Proportion High School Dropouts

0.0048 0.007 0.005 0.014 0.008 0.006 0.019 0.015 0.012 0.008 0.000 0.006

Proportion Attending College

0.0007 0.001 0.001 0.001 0.001 0.001 0.000 0.001 0.000 0.003 0.001 0.001

Proportion High School Grad

0.0048 0.007 0.005 0.014 0.008 0.006 0.019 0.015 0.012 0.008 0.009 0.006

Proportion Living in South

0.0332 0.034 0.035 0.058 0.042 0.038 0.062 0.062 0.044 0.044 0.039 0.040

Proportion Currently Married

0.0100 0.012 0.012 0.017 0.012 0.010 0.025 0.020 0.017 0.016 0.013 0.012

Proportion Employed at Present

0.0092 0.013 0.012 0.015 0.013 0.011 0.020 0.023 0.018 0.017 0.015 0.014

Proportion Gov't Training

0.0008 0.001 0.001 0.003 0.002 0.001 0.005 0.003 0.003 0.002 0.001 0.001

Average Number of Children

0.0250 0.034 0.032 0.055 0.039 0.029 0.068 0.071 0.055 0.057 0.040 0.037

Average Highest Grade Completed

0.0630 0.085 0.065 0.121 0.075 0.076 0.155 0.130 0.082 0.108 0.103 0.076

Proportion Currently Enrolled

0.0008 0.001 0.001 0.001 0.001 0.001 0.000 0.001 0.000 0.003 0.001 0.001

Table 28 note: Users are cautioned that cohort changes over time have made some categories much less relevant. In particular, the extremely small subsample sizes for education related variables such as "Proportion government training participant," "Proportion currently enrolled" and "Proportion attending college" make these categories statistically suspect. They have been kept in the table for historical continuity. In round 25 the variable "Proportion in high school or less" was removed from the table since no NLSY79 respondent was in this category. In round 26 the variable "Proportion not on active duty" was removed from the table since no NLSY79 respondent remained in this category. Beginning in round 28, the "Average highest grade completed" was the highest grade completed as of the date of most recent interview, not as of May in the year previous to survey year.

Sample Weights & Clustering Adjustments

Sample weights

In each survey year a set of sampling weights is constructed. These weights provide the researcher with an estimate of how many individuals in the United States each respondent's answers represent. Weighting decisions for the NLSY79 are guided by the following principles:

  1. individual case weights are assigned for each year in such a way as to produce group population estimates when used in tabulations
  2. the assignment of individual respondent weights involves at least three types of adjustment, with additional considerations necessary for weighting of NLSY79 Child data

The interested user should consult the NLSY79 Technical Sampling Report (Frankel, Williams, and Spencer 1983) for a step-by-step description of the adjustment process. A cursory review of the process follows.

  • Adjustment One. The first weighting adjustment involves the reciprocal of the probability of selection at the first interview. Specifically, this probability of selection is a function of the probability of selection associated with the household in which the respondent was located, as well as the subsampling (if any) applied to individuals identified in screening.
  • Adjustment Two. This process adjusts for differential response (cooperation) rates in both the screening phase and subsequent interviews. Differential cooperation rates are computed (and adjusted) on the basis of geographic location and group membership, as well as within-group subclassification.
  • Adjustment Three. This weighting adjustment attempts to correct for certain types of random variation associated with sampling as well as sample "undercoverage." These ratio estimations are used to conform the sample to independently derived population totals.
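The three adjustments can be pictured with a small Python sketch. This is only an illustration of the general idea (inverse-probability base weights, a cell-level nonresponse adjustment, and a ratio adjustment to population totals), not the procedure documented in the Technical Sampling Report. The function names, the "cell" and "group" fields, and all numbers are hypothetical.

```python
from collections import defaultdict

def base_weight(selection_probability):
    """Adjustment One (sketch): reciprocal of the selection probability."""
    return 1.0 / selection_probability

def nonresponse_adjust(cases):
    """Adjustment Two (sketch): within each cell, respondents absorb the
    weight of that cell's nonrespondents."""
    total = defaultdict(float)
    responded = defaultdict(float)
    for c in cases:
        total[c["cell"]] += c["weight"]
        if c["responded"]:
            responded[c["cell"]] += c["weight"]
    return [
        {**c, "weight": c["weight"] * total[c["cell"]] / responded[c["cell"]]}
        for c in cases
        if c["responded"]
    ]

def ratio_adjust(cases, population_totals):
    """Adjustment Three (sketch): scale weights so each group sums to an
    independently derived population total."""
    sums = defaultdict(float)
    for c in cases:
        sums[c["group"]] += c["weight"]
    return [
        {**c, "weight": c["weight"] * population_totals[c["group"]] / sums[c["group"]]}
        for c in cases
    ]

# Hypothetical toy sample: three screened persons, one of whom did not respond.
cases = [
    {"id": 1, "cell": "A", "group": "g1", "responded": True, "weight": base_weight(0.001)},
    {"id": 2, "cell": "A", "group": "g1", "responded": False, "weight": base_weight(0.001)},
    {"id": 3, "cell": "B", "group": "g1", "responded": True, "weight": base_weight(0.002)},
]
adjusted = ratio_adjust(nonresponse_adjust(cases), {"g1": 3000.0})
final_weights = {c["id"]: c["weight"] for c in adjusted}
```

In this toy case the two remaining respondents end up with weights that sum to the assumed population total of 3,000.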

Sampling Weight Readjustments. Sampling weights for the main survey are readjusted to account for noninterviews each survey year. The readjustments are necessitated by differential nonresponse and use base year sample parameters for their creation, employing a procedure similar to that described above. The only exception occurs in the final stage of post-stratification. Post-stratification weights in survey rounds two and above have been recomputed on the basis of completed cases in that year's sample rather than the completed cases in the base year sample.

Custom weights

Users looking for a simple method to correct a single year's worth of raw data for the effects of over-sampling, clustering and differential base year participation should use the weights included in each round of the data release. Unfortunately, while each round of weights provides an accurate adjustment for any single year, none of the weights provides an accurate method of adjusting multiple years' worth of data. The NLS has a custom weighting program that creates a set of customized longitudinal weights. These weights improve a researcher's ability to accurately calculate summary statistics from multiple years of data.

The custom weighting program calculates its weights by first creating a new temporary list of individuals who meet all of a researcher's criteria. This list is then weighted as if the individuals had participated in a new survey round. The weights for this temporary list are the output of the custom weighting program.

There are two options for the custom weighting program on the Custom Weights for the NLSY79 page. The first option allows researchers to specify the particular rounds in which respondents participated. Researchers can select either "The respondents are in all of the selected years" or "The respondents are in any or all of the selected years." The second option allows users to input a list of respondent IDs to get the appropriate weights for just that list. For example, this second option allows a researcher to weight only those people who ever reported smoking cigarettes in any survey, or only those people who needed extra time to graduate from college.
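The difference between the "all" and "any or all" selections can be sketched in Python. The interview records, the function name, and the mode labels are hypothetical and only illustrate how the list of respondents to be weighted would differ between the two choices. They are not the custom weighting program itself.

```python
# Hypothetical records: respondent ID -> survey years with a completed interview.
interviews = {
    1: {1979, 1984, 1989},
    2: {1979, 1989},
    3: {1984},
    4: {1979},
}

def select_respondents(interviews, years, mode="all"):
    """Return the IDs that satisfy the chosen selection rule."""
    years = set(years)
    if mode == "all":   # interviewed in every selected year
        return sorted(r for r, seen in interviews.items() if years <= seen)
    if mode == "any":   # interviewed in at least one selected year
        return sorted(r for r, seen in interviews.items() if years & seen)
    raise ValueError("mode must be 'all' or 'any'")

in_all = select_respondents(interviews, [1984, 1989], "all")   # [1]
in_any = select_respondents(interviews, [1984, 1989], "any")   # [1, 2, 3]

# Selecting every round with "any" returns everyone who was ever interviewed.
everyone = select_respondents(interviews, [1979, 1984, 1989], "any")  # [1, 2, 3, 4]
```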

Important information on the Custom Weighting Program

  • If you select all available survey rounds and also pick "The respondents are in any or all of the selected years," the weights produced are identical to the round 1 survey weights. This result arises because the "any" selection combined with all survey rounds produces a list of every person who participated in the survey.
  • The output of the custom weight program has 2 implied decimal places just like the weights found in the data release. Dividing each custom weight output value by 100 results in the number of individuals the respondent represents.
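As a small illustration of the implied decimal places, the Python sketch below converts raw weight values to the number of people represented. The raw values are made up for the example.

```python
def persons_represented(raw_weight):
    """Convert a weight stored with 2 implied decimal places to a head count."""
    return raw_weight / 100

raw_weights = [512345, 78910, 1200050]  # hypothetical output values
represented = [persons_represented(w) for w in raw_weights]
total_represented = sum(represented)    # about 17,913 people in this toy example
```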

Practical usage of weights

The application of sampling weights varies depending on the type of analysis being performed. If tabulating sample characteristics for a single interview year in order to describe the population being represented (that is, compute sample means, totals, or proportions), researchers should weight the observations using the weights provided. For example, to estimate the average hours worked in 1987 by persons born in 1957 through 1964, simply use the weighted average of hours worked, where the weight is the 1987 sample weight. These weights are approximately correct when used in this way, with item nonresponse possibly generating small errors. Other applications for which users may wish to apply weighting, but for which the application of weights may not correspond to the intended result, include:

Samples Generated by Dropping Observations with Item Nonresponses. Often users confine their analysis to subsamples for which respondents provided valid answers to certain questions. In this case, a weighted mean will not represent the entire population, but rather those persons in the population who would have given a valid response to the specified questions. Item nonresponse because of refusals, don't knows, or invalid skips is usually quite small, so the degree to which the weights are incorrect is probably quite small. In the event that item nonresponse constitutes only a small proportion of the data for variables under analysis, population estimates (that is, weighted sample means, medians, and proportions) would be reasonably accurate. However, population estimates based on data items that have relatively high nonresponse rates, such as family income, may not necessarily be representative of the underlying population of the cohort under analysis. For more information on item nonresponse in the NLSY79, see the Item Nonresponse section of this guide.
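The following Python sketch shows a weighted mean computed only over valid responses, together with the share of total weight that those valid responses carry. It assumes that nonresponse is stored as negative codes, as is the convention in NLS data files. The function name and the data are hypothetical.

```python
def weighted_mean_valid(values, weights):
    """Weighted mean over valid (non-negative) responses, plus the share of
    total weight those responses carry."""
    pairs = [(v, w) for v, w in zip(values, weights) if v >= 0]
    kept_weight = sum(w for _, w in pairs)
    mean = sum(v * w for v, w in pairs) / kept_weight
    coverage = kept_weight / sum(weights)
    return mean, coverage

# Hypothetical hours worked; negative values stand for refusals or don't knows.
hours = [40, 20, -1, 35, -2]
weights = [1000.0, 3000.0, 2000.0, 1000.0, 3000.0]
mean_hours, coverage = weighted_mean_valid(hours, weights)
```

A low coverage value is a signal that the weighted mean describes only the part of the population that would have given a valid answer, which is the caution raised above for items such as family income.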

Data from Multiple Waves. Because the weights are specific to a single wave of the study, and because respondents occasionally miss an interview but are contacted in a subsequent wave, a problem similar to item nonresponse arises when the data are used longitudinally. In addition, occasionally the weights for a respondent in different years may be quite dissimilar, leaving the user uncertain as to which weight is appropriate. In principle, if a user wished to apply weights to multiple wave data, weights would have to be recomputed based upon the persons for whom complete data are available. In practice, if the sample is limited to respondents interviewed in a terminal or end point year, the weight for that year can be used (for more information on weighting see the section on Sample Weights & Clustering Adjustments).

Regression Analysis. A common question is whether one should use the provided weights to perform weighted least squares when doing regression analysis. Such a course of action may not lead to correct estimates. If particular groups follow significantly different regression specifications, the preferred method of analysis is to estimate a separate regression for each group or to use dummy (or indicator) variables to specify group membership.

Users interested in calculating the population average effect of, for example, education upon earnings, should simply compute the weighted average of the regression coefficients obtained for each group, using the sum of the weights for the persons in each group as the weights to be applied to the coefficients. While least squares is an estimator that is linear in the dependent variable, it is nonlinear in explanatory variables, and so weighting the observations will generate different results than taking the weighted average of the regression coefficients for the groups. The process of stratifying the sample into groups thought to have different regression coefficients and then testing for equality of coefficients across groups using an F-test is described in most statistics texts.

Users uncertain about the appropriate grouping should consult a statistician or other person knowledgeable about the data set before specifying the regression model. Note that if subgroups have different regression coefficients, a regression on a random sample of the population would not be properly specified.
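The weighted average of group-specific coefficients described above can be sketched in Python as follows. The sketch uses a single explanatory variable and invented data for two hypothetical groups. The helper `ols_slope` is an illustrative name, not part of any NLS software.

```python
def ols_slope(x, y):
    """Ordinary least squares slope for a single explanatory variable."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return sxy / sxx

# Hypothetical groups with different education-earnings relationships.
groups = {
    "group_a": {"educ": [10, 12, 14, 16], "earn": [20, 24, 28, 32],
                "weights": [100, 100, 100, 100]},
    "group_b": {"educ": [10, 12, 14, 16], "earn": [15, 16, 17, 18],
                "weights": [300, 300, 300, 300]},
}

# Estimate a separate regression for each group.
slopes = {name: ols_slope(g["educ"], g["earn"]) for name, g in groups.items()}

# Weight each group's coefficient by the sum of its members' sampling weights.
weight_sums = {name: sum(g["weights"]) for name, g in groups.items()}
population_average_effect = (
    sum(slopes[name] * weight_sums[name] for name in groups)
    / sum(weight_sums.values())
)
```

Here the group slopes are 2.0 and 0.5, and the larger weight total of the second group pulls the population average effect to 0.875.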

Clustering adjustments

Researchers use NLSY79 data to estimate a variety of statistics. Because NLSY79 data come from a sample rather than from every age-appropriate individual in the U.S., the statistics produced are only estimates of the "true" national values. When researchers use a computer package to compute a statistic such as a mean or a regression coefficient, the program automatically provides a second set of statistics, such as the standard error, standard deviation, or t-statistic, which tell researchers how precisely the mean or coefficient is measured.

Details. Instead of randomly selecting individuals located anywhere in the U.S. during 1978, the survey randomly selected a fixed number of small areas. This design reduced the amount of time interviewers spent traveling for each interview, which lowered costs and allowed the survey to be fielded faster, yielding data more quickly. Like all other national data sets that use clustering, NLSY79 data have many groups or bunches of respondents who share similar characteristics because they lived in the same neighborhood during 1978. This makes survey results appear more homogeneous, or similar, than is actually the case in the U.S.

Researchers can use two different approaches to correct this problem. The first approach uses the tables found in the NLSY79 Technical Sampling Report. For each survey round there is a table that lists the "Design Effects" or DEFT factors. These DEFTs give users a simple method for determining approximately how much they should increase their standard errors when trying to measure the precision of their estimates. However, because the tables are based on calculations from only a small subset of NLSY79 variables, they provide no guidance on how to adjust regression results or estimates from specialized subsamples.

The more general method is to correct for clustering by using a specialized software package. Two of the most widely used packages to adjust surveys for clustering effects are Stata, sold by the Stata Corporation, and Sudaan, sold by RTI International. This section describes how to adjust for clustering using Sudaan. Sudaan is used to generate the DEFT factors found in the Technical Sampling Report.

Important information about clustering

If you do not have access to the Geocode data set, you cannot use Sudaan or Stata to adjust for clustering. The Geocode data set can only be accessed by individuals approved by BLS. See Geographic Residence and Neighborhood Composition for information on obtaining the Geocode data CD.

Table 1. Effect of clustering correction on a mean value's standard error, 1998 data, example one

Variable | Mean Value | Uncorrected Std Error | Corrected Std Error
Net Worth | $128,068 | $3,403 | $5,826
Family Income | $55,031 | $536 | $1,137
BMI | 26.7 | 0.06 | 0.09
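To see how large the correction is, the short Python sketch below divides each corrected standard error in Table 1 by its uncorrected counterpart. The resulting ratio is comparable in spirit to a DEFT-style multiplier, although it is computed here only from the rounded values printed in the table.

```python
# (uncorrected standard error, corrected standard error) from Table 1
table1 = {
    "Net Worth": (3403, 5826),
    "Family Income": (536, 1137),
    "BMI": (0.06, 0.09),
}

# Ratio of corrected to uncorrected standard error for each variable.
ratios = {
    name: corrected / uncorrected
    for name, (uncorrected, corrected) in table1.items()
}
rounded_ratios = {name: round(r, 2) for name, r in ratios.items()}
```

The ratios come out at roughly 1.7 for net worth, 2.1 for family income, and 1.5 for BMI, so ignoring clustering would overstate precision for all three means.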

Table 2 shows how adjusting for clustering affects a simple regression. Using the same 1998 data, a simple unweighted least squares equation was run with both SAS and Sudaan using net worth as the dependent variable and six independent variables. Three of these independent variables (BMI, income and age) take a wide range of values, while the remaining three variables (black, Hispanic or Latino, and female) take the value of 1 if the respondent has the particular characteristic and 0 otherwise.

The table shows that adjusting for clustering changes many of the standard errors and associated t-values. The biggest effect is seen on the income line, where the standard error increases from 0.06 (uncorrected) to 0.19 (corrected), causing the t-value to fall from 44.37 to 13.87. Smaller changes are seen for the other variables. The intercept, age, and female standard errors all increase in size, while the BMI, black, and Hispanic or Latino variables all end up with slightly smaller standard errors.

Overall, both examples show that adjusting for clustering effects is important. The next subsection shows what variables are needed to adjust for clustering. The section ends with the specific Sudaan commands used to create the tables in this chapter.

Key Variables Needed For Clustering Correction. Two variables are needed to adjust the data set for clustering. Both variables are found only on the Geocode data set and are placed there because researchers can use these variables to determine where each civilian respondent lived in 1978.

Table 2. Effect of clustering correction on a mean value's standard error, 1998 data, example two

Variable | Coefficient Estimate | Uncorrected Std Error | Uncorrected t Value | Corrected Std Error | Corrected t Value
Intercept | 186,808 | 43,534 | 4.29 | 52,166 | 3.58
BMI | 1,091 | 466 | 2.34 | 457 | 2.39
Income | 2.63 | 0.06 | 44.37 | 0.19 | 13.87
Black | 40,394 | 5,938 | 6.80 | 4,259 | 9.48
Hispanic | 41,382 | 6,617 | 6.25 | 4,554 | 9.09
Age | 5,285 | 1,086 | 4.87 | 1,252 | 4.22
Female | 2,814 | 4,891 | 0.58 | 5,064 | 0.56
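The t-values in Table 2 are the coefficient divided by the standard error, which the Python sketch below recomputes from the printed values. It is only an arithmetic check, not a rerun of the regression.

```python
# (coefficient, uncorrected std error, corrected std error) from Table 2
table2 = {
    "Intercept": (186808, 43534, 52166),
    "BMI": (1091, 466, 457),
    "Income": (2.63, 0.06, 0.19),
    "Black": (40394, 5938, 4259),
    "Hispanic": (41382, 6617, 4554),
    "Age": (5285, 1086, 1252),
    "Female": (2814, 4891, 5064),
}

def t_values(coefficient, se_uncorrected, se_corrected):
    """Return (uncorrected t, corrected t), each rounded to two decimals."""
    return (round(coefficient / se_uncorrected, 2),
            round(coefficient / se_corrected, 2))

recomputed = {name: t_values(*row) for name, row in table2.items()}
```

Every row except Income reproduces the published t-values to two decimals. The Income row comes out slightly different (about 43.83 and 13.84), presumably because its coefficient and standard errors are printed with only two decimal places.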

As discussed above, the NLSY79 is a multi-stage clustered sample. The clusters were created by first dividing the entire U.S. into Primary Sampling Units, or PSUs. These PSUs were defined by NORC and were composed of Standard Metropolitan Statistical Areas (SMSAs), entire counties when the counties were small, parts of counties when the counties were large, and independent cities. NORC randomly selected two different sets of PSUs for inclusion in the study, each of which by itself randomly represents the U.S. This selection of two sets of PSUs means the NLSY79 is composed of two replicates or strata. Within each is a random selection of PSUs. The replicate or stratum to which a respondent belongs is found only in the Geocode data set, in variable R02191.46, entitled "Within Stratum Replicate Of Primary Sampling Unit." This variable takes either the value 1 or 2, for either the first or second replicate.

The variable containing the PSU is labeled R02191.45 and is entitled "Stratum Number For Primary Sampling Units." R02191.45 ranges in value from 1 to 120. Researchers who want to know which geographic areas correspond to particular values should look at Attachment 104 of the Geocode Codebook Supplement for the crosswalk table. Respondents with a PSU code of 52 to 70 are part of the military sample and do not have any known geographic location.
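A simple range check on these two variables can catch data-handling mistakes before any clustering correction is run. The Python sketch below is a hypothetical helper that uses only the value ranges stated above. Labeling the non-military codes as "civilian" is an assumption made for the example.

```python
def classify_design(replicate, psu):
    """Validate the replicate (R02191.46) and PSU (R02191.45) codes and
    flag the military sample, using the ranges given in the text."""
    if replicate not in (1, 2):
        raise ValueError(f"replicate must be 1 or 2, got {replicate}")
    if not 1 <= psu <= 120:
        raise ValueError(f"PSU must be between 1 and 120, got {psu}")
    # PSU codes 52 to 70 belong to the military sample (no known location).
    sample = "military" if 52 <= psu <= 70 else "civilian"
    return {"replicate": replicate, "psu": psu, "sample": sample}

example_civilian = classify_design(1, 17)
example_military = classify_design(2, 60)
```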

Important information: Clarification on variable labeling

The label for variable R02191.46 found in SAS and SPSS programs that is automatically produced by NLS Investigator is confusing. The label reads "PRIMARY SAMPLNG UNIT PSU SCRAMBLED 79". This variable contains the scrambled replicate, or stratum number, not the PSU. PSU information is found in R02191.45. Users should be careful when adjusting geographic variables using the clustering corrections. The complete title for variable R02191.46 is "Within Stratum Replicate Of Primary Sampling Unit (PSU) - Scrambled." Because this variable is randomly scrambled, doing clustering corrections on some geographic variables produces incorrect results. Scrambling has no effect on variables that are not geographic, such as education, income, or training.

Using the Key Variables In Sudaan. The specific steps used to generate the tables above are covered in this section. While the tables were produced using the Windows Version 8.0 Standalone package, the steps and commands are similar for other versions of Sudaan. To adjust summary statistics such as means or regressions with Sudaan, the researcher needs to create three files: one containing the data, one telling Sudaan how to read the data, and one containing the specific commands. Any computer package can be used to create the data file. Data can even be written directly from NLS Investigator to a file. Figure 1 has the relevant portion of the SAS program used to create the data file used in Tables 1 and 2 above.

Figure 1. SAS commands to create Sudaan data file

data obesity;
  /* SAS commands that generate variables like Age, Income, and BMI are placed here */
  PSU = R0219145;        /* PSU code from the Geocode file */
  REPLICATE = R0219146;  /* replicate (1 or 2) from the Geocode file */
run;

proc sort data=obesity; /* Sort the data since Sudaan cannot handle unsorted data */
  by replicate psu;
run;

data _null_;            /* write the text file without creating a new data set */
  set obesity;
  file 'C:\DesignEffects\ObesitySudaanAdjustment.dbs';
  put ID        5.
      PSU       3.
      REPLICATE 2.
      WGHT      7.
      BLACK     2.
      HISPANIC  2.
      AGE       3.
      SEX       2.
      INCOME    9.
      BMI       4.1
      NETASSET  9.;
run;

One of the key things to note is that the data are sorted by the replicate and PSU variables before being written to the file. For most operations, Sudaan requires the data to be in this order before processing.
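Because any package can be used to create the data file, the same sort-then-write step can also be sketched in Python. The field widths below are copied from the put statement in Figure 1. The two records and their values are hypothetical, and the sketch writes to an in-memory buffer rather than to the file path used above.

```python
import io

# (name, width, decimals) taken from the put statement in Figure 1
LAYOUT = [
    ("ID", 5, 0), ("PSU", 3, 0), ("REPLICATE", 2, 0), ("WGHT", 7, 0),
    ("BLACK", 2, 0), ("HISPANIC", 2, 0), ("AGE", 3, 0), ("SEX", 2, 0),
    ("INCOME", 9, 0), ("BMI", 4, 1), ("NETASSET", 9, 0),
]

def format_record(row):
    """Format one respondent as a fixed-width, right-justified text record."""
    return "".join(
        f"{row[name]:{width}.{decimals}f}" for name, width, decimals in LAYOUT
    )

# Hypothetical respondents.
rows = [
    {"ID": 2, "PSU": 17, "REPLICATE": 2, "WGHT": 498765, "BLACK": 1,
     "HISPANIC": 0, "AGE": 35, "SEX": 0, "INCOME": 41000, "BMI": 24.3,
     "NETASSET": 90000},
    {"ID": 1, "PSU": 40, "REPLICATE": 1, "WGHT": 512345, "BLACK": 0,
     "HISPANIC": 1, "AGE": 37, "SEX": 1, "INCOME": 55000, "BMI": 26.7,
     "NETASSET": 128000},
]

# Sort by replicate, then PSU, before writing, as in the SAS program.
rows.sort(key=lambda r: (r["REPLICATE"], r["PSU"]))

buffer = io.StringIO()
for row in rows:
    buffer.write(format_record(row) + "\n")
output = buffer.getvalue()
```

Each record is 48 characters wide, which is the sum of the field widths in the layout.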

The second file is the "label" file. This file is used to read the data into Sudaan. The label file, called "ObesitySudaanAdjustment.lab," is shown in Figure 2. The label file has five parts. The first column on the left is the variable's name, followed by a letter which tells Sudaan if the variable contains numeric or character data. The third and fourth columns contain the number of bytes (characters) taken up by the variable and the number of decimal places in the number. The last column contains the label. Sudaan expects the label file to follow a precise format with columns starting and ending in very specific places.

Figure 2. Sudaan label file

ID       N 5 0 ID# (1-12686)
PSU      N 3 0 # OF PSU
REPLICAT N 2 0 REPLICATE SCRAMBLED
WGHT     N 7 0 SAMPLING WEIGHT
BLACK    N 2 0 T/F BLACK
HISPANIC N 2 0 T/F HISPANIC
AGE      N 3 0 AGE OF RESPONDENT
SEX      N 2 0 MALE 0 - FEMALE 1
TOTINC   N 9 0 TOTAL INCOME
BMI      N 4 1 BODY MASS
NETASS   N 9 0 TOTAL NET WORTH

The third file is the set of commands used to run Sudaan. Many versions of Sudaan allow commands to be typed directly into the program so researchers are not forced to create command files. Figures 3 and 4 provide the Sudaan commands that were used to create Tables 1 and 2 above. Figure 3 has three sections. The top section below the "Proc Descript" command tells Sudaan where to find the raw data and what variable contains the basic survey weights. The nest command defines which variables contain the replicate and PSU information. The middle section, beginning with "Var," tells Sudaan which variables will have descriptive statistics created. The final section, beginning with "Print," specifies the types of output that are shown.

The first section of Figure 4 is similar to commands seen above in Proc Descript. The large difference is that the "weight" command has the reserved name "_ONE_" after it instead of the NLSY79 weight, "wght." Putting the "wght" variable after the weight command would cause Sudaan to run weighted least squares. By using "_ONE_" instead, Sudaan weights all variables with the same 1.0 value, resulting in Sudaan running unweighted least squares. The second part of the command, which begins with "Model," shows the exact regression to run.

Figure 3. Sudaan commands used to create summary statistics in Table 1

Proc Descript
Data="C:\DesignEffects\ObesitySudaanAdjustment.dbs"
filetype=ascii design=wr mean DEFT1 est_no=12686;
weight wght;
nest REPLICAT PSU / MISSUNIT;
Var NETASS BMI TOTINC BLACK HISPANIC AGE SEX;
Print nsum="Sample Size" WSUM="Population Size" Mean
semean="Std. Err." DEFFMEAN="Design Effect" / style=nchs
nsumfmt=f6.0 wsumfmt=f10.0 deffmeanfmt=f6.2 semeanfmt=f11.2;


Figure 4. Sudaan commands used to create regression values in Table 2

Proc Regress
Data="C:\DesignEffects\ObesitySudaanAdjustment.dbs"
filetype=ascii design=wr DEFT1 est_no=12686;
weight _ONE_;
nest REPLICAT PSU / MISSUNIT;
Model NETASS = BMI TOTINC BLACK HISPANIC AGE SEX;

Related Variables: The 1979 Geocode data also contain the State, county, and metropolitan statistical area where the respondent lived in 1979.
Documentation: Additional information can be found in the Standard Errors and Design Effects section of this User's Guide, in the NLSY79 Technical Sampling Report, and in Attachment 104 of the Geocode Codebook Supplement.
Data Files: Data on clustering can be found only in the NLSY79 Geocode files under the "GEOCODE" 1979 area of interest.

Types of Variables

There are six types of variables present in the NLSY79 data. Some are the raw answers provided by the respondent, while others are constructed. Types of variables include:

  1. Direct (or raw) responses from a questionnaire or other survey instrument
  2. Edited variables constructed from raw data according to consistent and detailed sets of procedures, such as occupational codes, KEY variables, and so forth
  3. Constructed variables based on responses to more than one data item, either cross-sectionally or longitudinally, and edited for consistency where necessary, such as variables on the NLSY79 Supplemental Fertility File ("Fertility and Relationship History/Created" area of interest in NLS Investigator)
  4. Constructed variables from other sources, such as the County & City Data Book information present on the NLSY79 Geocode data files
  5. Variables provided by an outside organization based on sources not directly available to the user, such as the high school survey and transcript data, scores from the Armed Services Vocational Aptitude Battery, and so forth
  6. Data collected from or about one universe of respondents reconstructed with a second universe as the unit of observation, such as variables on the NLSY79 Child File

The type of variable impacts:

  • the title or variable description naming each variable,
  • physical placement of each variable within the codebook, and
  • location of a variable within a given area of interest.

Reference numbers

Every variable in the main NLSY79 data files has been assigned a reference number or identifier that determines its relative position within the data file and NLS documentation system. Persons contacting NLS User Services should be prepared to discuss their question or problem in relationship to the reference number(s) of the variable(s) in question.

Important information about data consistency processes

In general, the Center for Human Resource Research (CHRR) does not impute missing values or perform internal consistency checks across waves. Exceptions to this general rule occur when financial support is available, as is the case with the consistency edits performed since 1982 on the NLSY79 fertility data. When bounded interviewing methods are used, responses from the previous interview appear in the text of a question, both to verify that past information and as a point from which to update current information. Bounded interviewing techniques, using data from the Information Sheets or flap items, are intended to impose consistency across waves. Data quality checks most often occur in the process of constructing (1) cumulative and current status variables, such as 'Highest Grade Completed,' and (2) NLSY79 employment-related variables, such as 'Weeks Working in Past Calendar Year,' 'Total Tenure with Employer,' and so forth. More information on NLSY79 instruments can be found in the Survey Instruments section.

Once assigned to variables within the NLSY79 data files, reference numbers remain constant through subsequent revisions of the files. Reference numbers are assigned sequentially, with variables referring to the first survey year having a lower reference number than those variables specific to the second year and so forth.

Occasionally variables are created in a year later than that in which the data were actually collected. These variables are frequently given a reference number with a decimal value that reflects the year in which the actual data were gathered rather than the year the created variable was constructed, for example, R01461.01. Beginning with the 1993 survey, decimals are also used to indicate that more than one variable has been derived from a single question.

Important information about reference numbers

Reference numbers in the main and Geocode data files have traditionally begun with the letter "R." Beginning with the 2000 data release, the work history variables are incorporated with the main data on the same data set. However, these work history variables are assigned reference numbers beginning with "W" for easy identification. Beginning in 2006, government program participation or recipiency variables are assigned reference numbers beginning with "G," health module variables are assigned reference numbers beginning with "H," and all other variables are assigned reference numbers beginning with "T."
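Researchers who handle many variables programmatically may find it useful to split a reference number into its parts. The Python sketch below assumes the pattern seen in the examples in this guide (one letter, five digits, a period, and two digits) and builds the program-style name by dropping the period, as in R0219145 in Figure 1. The function name and category wording are illustrative only.

```python
import re

# Prefix letters and their meanings as described in the text above.
PREFIXES = {
    "R": "main or Geocode variable",
    "W": "work history variable",
    "G": "government program participation or recipiency variable",
    "H": "health module variable",
    "T": "other variable (2006 onward)",
}

# Assumed pattern, inferred from examples such as R02191.46 and R01461.01.
REFERENCE_PATTERN = re.compile(r"^([RWGHT])(\d{5})\.(\d{2})$")

def parse_reference_number(reference):
    """Split a reference number into prefix, category, and program-style name."""
    match = REFERENCE_PATTERN.match(reference)
    if match is None:
        raise ValueError(f"unrecognized reference number: {reference!r}")
    prefix, digits, decimal = match.groups()
    return {
        "prefix": prefix,
        "category": PREFIXES[prefix],
        "program_name": prefix + digits + decimal,  # e.g. R02191.46 -> R0219146
    }

parsed = parse_reference_number("R02191.46")
```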

Variable descriptions or variable titles

Each variable within the NLSY79 main data files has been assigned an 80-character summary title that serves as the verbal representation of that variable throughout the documentation.

Variable titles are assigned by CHRR archivists who endeavor, within the limitations described below, to capture the core "content" of the variable and to incorporate within the title:

  1. "NLS Investigator areas of interest" that facilitate easy identification of related variables,
  2. "Universe identifiers" that specify the subset of respondents for which each variable is relevant, and
  3. "Reference periods" that indicate the specific period of time (e.g., survey year, calendar year) to which the data pertain for some variables. Universe identifiers and reference periods are discussed below.

Universe Identifiers. If two ostensibly identical variables differ only in that they refer to different universes, the variable title will include a reference to the applicable universe by either appending in parentheses to each title the appropriate universe (Example 1) or by identifying the universe before the variable title (Example 2).

  • Example 1: 'Did R Have Any Job since Last Int? (Unemployed or OLF) (1994)'
  • Example 2: 'Female - Number of Children R Has Had since Last Interview'

Reference Periods. Variable descriptions may include a phrase indicating the time period to which the data refer. When a date follows a verbal description of a variable and is preceded by the prepositional phrase "in 19XX," the date identifies the calendar year for which the relevant information was collected.

  • Example: 'Received Income from Child Support in 1991?' This 1992 survey question refers to child support payments received in calendar year 1991.

Important information about verifying variable details

Do not presume that two variables with the same or similar titles necessarily have the same (1) universe of respondents, (2) coding categories, or (3) time reference period. While the universe identifier conventions discussed above have been utilized, users are urged to consult the questionnaires for skip patterns and exact time periods for a given variable and to factor in the relevant fielding period(s) for the cohort.

Variables with similar content, such as information on respondents' labor force status, may have completely different titles, depending on the type of variable (raw versus created). In addition, such variables may be located within different NLSY79 areas of interest.

  • Example 1: 'Employment Status Recode' (ESR), in 1979-98 and 2006, is the created or reconstructed version of the 'Activity Most of Survey Week' raw variable. The 'Activity' variable is derived from the first question of the full series of questions used by the Department of Labor (DOL) to obtain employment status; the title reflects questionnaire content. ESR, on the other hand, reflects the procedure used to recode the 'Activity' variable. This produces a constructed variable for all respondents based upon responses to the 'Activity' question and all other questions used by the DOL to obtain employment status. These other questions serve to qualify and refine employment status beyond the answer to the initial 'Activity' question.
  • Example 2: NLSY79 raw fertility variables appear within the various "Children," "Birth Record," or "Birth Record xxxx" areas of interest while edited and constructed versions of these variables appear within the "Fertility and Relationship History/Created" area of interest.

Finally, different archivists, for a period of more than 20 years, have performed the task of assigning variable descriptions to data. While every effort has been made to maintain consistency, users may find some differences in variable title and area of interest assignment.

New variables created by researchers

Researchers sometimes use the NLS public datasets to generate a new variable to use in their research. In some cases, researchers like to make that new variable publicly available (through their own data repository) so that it can be easily accessed for follow-up studies. This is permissible as long as researchers use public (rather than restricted) NLS data and make it clear that they, rather than the NLS team, are the authors of the variable.

Survey Instruments

The primary variables found within the main data set are derived directly from survey instruments, such as questionnaires, household interview forms, and so forth. This section describes each of the NLSY79 instruments in the order that they appear in the following list.

Types of NLSY79 survey instruments & user aids

This section also explains the conventions used in the NLSY79 documentation system to identify questionnaire items from some of the primary survey instruments. An additional document, the interviewer reference manual, provides background information on specific survey instruments.

Important information on instrument terminology

Questionnaire Item or Question Number. This generic term refers the user to the printed source of data for a given variable. A questionnaire item may be a question, a check item, or an interviewer's reference item that appears within one of the survey instruments. Each questionnaire item has been assigned a number or a combination of numbers and letters within the NLSY79 documentation system to assist the user in linking each variable to its location in a survey instrument. NLSY79 questionnaire item assignment is complex and varies across survey years and instruments. For some years, NLSY79 questionnaire item identification is dependent upon various combinations of the deck and column numbers used in data entry that are printed to the right of the answer categories on the survey instrument. In other years, designation is made by section and question numbers. Specific information on the conventions used appears below, after each relevant instrument, under the subheadings "Question Numbering."

A unique set of survey instruments has been used during each survey year to collect information from respondents. The term "survey instrument" is used to refer to:

  1. the questionnaires that serve as the primary source of information on a given respondent
  2. questionnaire supplements fielded during select survey years that contain additional sets of questions
  3. documents such as the household interview forms or household record cards that collect information on members of each respondent's household

Users should be aware that, while the source of the majority of variables in the main NLSY79 data files is the questionnaire or one of the other survey instruments, certain NLSY79 variables are created either from other NLSY79 variables or from information found in an external data source (see Types of Variables).

Household information

Each NLSY79 interview includes the collection of information on the members of each respondent's household. For NLSY79 respondents, such household data are collected prior to the administration of the main questionnaire and, for many years, were collected using separate survey instruments called the Household Interview Forms. Both the instruments used for the yearly household data collection and the household screening instruments that were used to draw the samples of respondents are described below.

NLSY79 1978 Household Screener and Interviewer's Reference Manual. This document (fully titled NLSY-National Longitudinal Survey of Labor Force Behavior Interviewer's Manual-Household Screening, NORC 1978) contains detailed information on the 1978 screening of households conducted by NORC from which the civilian youth samples (the cross-sectional and supplemental samples) were drawn. It provides a copy of the short 25-question screener, question-by-question specifications for administering the form, and a sample completed screener. Most of the information collected on each respondent during the screening is presented within the data set. The screener is the source for important data such as the sex and race or ethnicity variables that were used to assign each respondent to a specific NLSY79 subsample, as well as the relationship codes (for example, brother, sister, husband, wife) that allow researchers to identify related NLSY79 respondents who shared a household at the time of the screening.

Question Numbering. Question numbers for the 1978 screener were arbitrarily assigned by NORC using an artificial questionnaire section number that followed the last section of the 1979 questionnaire ("Section 25" for all screener variables) even though the actual administration of the screener preceded that of the 1979 questionnaire.

Users should note that screener questions are identified within the documentation as 1979 variables even though these data were collected during 1978. Most variables from the screener use the phrase HOUSEHOLD SCREENER at the beginning of the variable title, appear physically within the codebook after the 1979 household record series, and have been placed within the "Misc. 1979" area of interest.

Household Interview Forms. Yearly household information for the NLSY79 is collected from either the respondent or the head of household prior to the administration of the main questionnaire. NLSY79 Household Interview Forms are used to:

  1. enumerate all persons currently living in the respondent's household
  2. record information about each person's age, highest grade completed, work experience in the past year, and relationship to the respondent
  3. collect, during the 1979-86 surveys, certain family income information

Information on household members is collected using the questions on the Household Interview Forms; however, much of the information is actually recorded on the "Household Enumeration" section of the Face Sheet discussed below.

During the 1979-86 interviews, different versions of the Household Interview Forms were administered depending upon the type of residence of the respondent. Version A was used if the respondent was living with his or her parents (or in-laws), in which case the interview was conducted with the respondent's parents (or in-laws) in order to gather information on household income sources. Version B was used if the respondent was living in group quarters, such as a dormitory or the military, or in temporary facilities, such as a hospital or prison, and was administered to the respondent. If the respondent had a permanent residence elsewhere, the household interview gathered information about that household. Version C was administered to the respondent if he or she was living in his or her own dwelling unit, military family housing, an orphanage, a religious institution, or other individual quarters or was the head of a family unit. Table 1 in the Household Composition section depicts, by survey year, the universe and residential unit(s) specific to each form.

During the first eight survey rounds, many respondents were younger than 18 and living with their parents; thus, Version A was frequently used. Beginning with the 1987 survey, all respondents were 21 or older and living predominantly on their own; consequently, the household interview forms were consolidated into a single version. For 1979-86, these forms appear as separate documents. Beginning with the 1987 interview, household interview questions were incorporated within each year's questionnaire. Some variation in administration of these forms has occurred over survey years. Users should refer to each survey year's Interviewer's Reference Manual for more information.

Interviewing aids

Certain instruments used during fielding of the NLSY79 provide researchers with interview- and respondent-specific information that appears as variables within the NLSY79 data files.

Face Sheet. Immediately prior to fielding, a Face Sheet is computer-generated for each respondent and forwarded to the interviewer assigned to that case. The Face Sheet contains:

  1. various items of respondent-specific information (name, address, phone number)
  2. information about each member of the household or family unit as of the last interview (full name, sex, relationship to youth, education, and whether the household member worked during the year), generated from the most recent administration of the Household Interview Forms
  3. a historical overview of previous interview rounds (whether the respondent refused to be interviewed, the respondent was interviewed after initially refusing, the interview was complete or incomplete, and so forth)
  4. for the 1980-86 survey years, information on the version of the Household Interview Form that was used in the previous interview

This information is used to alert the interviewer and field manager to potential problems, assist them in preparing a successful location and fielding strategy, and provide details necessary to conduct an efficient interview, such as a listing of previous employers. Information about the respondent's household and family unit from each survey year's Face Sheet can be found by searching the "Household Record" area of interest with NLS Investigator. Sample Face Sheets for most survey years can be found in the various Interviewer Reference Manuals.

Information Sheet. This document contains data on the respondent from the previous interview that will be referred to and used to update information during the interviewing process. Items found on this document include marital status, high school completion status, university last attended, names of previous employers, training program enrollment, and pregnancy status. This information enables the interviewer to accurately route the respondent through the relevant sections of the questionnaire and provides on-the-spot reconciliation of earlier errors. Information Sheet items appear within the NLSY79 data set ("Last Interview Information" area of interest in NLS Investigator). Beginning with the 1993 interviews, the information sheet is incorporated into the CAPI instrument. Sample Information Sheets can be found in the Interviewer Reference Manuals. In CAPI surveys, information sheet data are stored electronically on the interviewer's laptop and accessed by the survey program during the interview; no paper information sheet is used.

Children's Record Forms (CRF) (1985-92). This interviewing aid, which contained information on biological children (collected each survey) and nonbiological children (that is, adopted children or stepchildren; collected biennially), was used in the 1985-92 surveys to:

  1. provide identification numbers, names, dates of birth, sex, and deceased/adopted status for each child
  2. identify special sections of the main questionnaire (such as immunization, feeding, and so forth) that needed to be administered for particular children

Sample Children's Record Forms can be found in the Interviewer's Reference Manuals. Beginning with the 1993 interviews, this form is incorporated into the CAPI instrument. As with information sheets, these data are automatically accessed by the survey program during CAPI interviews, so the hard copy CRF is no longer needed.

Questionnaires

There are separate and distinctly different questionnaires for each survey year of the NLSY79. Each questionnaire is organized around a set of topical subjects, the titles of which usually appear on either the first page of each section of the questionnaire or as a header.

Important information on questionnaire use

The questionnaires are critical elements of the NLSY79 documentation system and should be used by each researcher to ascertain the wording of questions, coding categories, and the universe of respondents asked to respond to a given question.

NLSY79 questionnaires record:

  1. interview dates
  2. responses to the topical survey questions (see discussion below)
  3. locating information which will assist NORC in finding the respondent for the next interview
  4. interviewer remarks on such topics as the race and sex of respondent, language in which the interview was conducted, interviewer's impressions, and so forth

Show Cards. These are interviewing aids used in conjunction with the questionnaire and list the possible response categories for selected questions. Show cards help the respondent keep the more complicated response categories in mind.

NLSY79 questionnaires explore the following core topics:

  • current labor force status
  • jobs and employers
  • work experience and attitudes
  • training
  • assets and income
  • family background
  • marital history
  • fertility
  • regular schooling
  • military service
  • health

Additional sets of questions have been fielded during select survey years on such topics as:

  • childcare
  • alcohol use
  • drug use
  • job search methods
  • educational/occupational aspirations
  • school discipline
  • pre- and post-natal health behaviors
  • delinquency
  • childhood residences

During the 1979-92 paper-and-pencil (PAPI) interviews, questionnaires and other survey instruments were preprinted paper products used during fielding. With the advent of computer-assisted personal interviewing (CAPI) in 1993, the "questionnaire" became a series of visual screens that not only told the interviewers what questions to ask but provided helpful instructions on how to administer the interview. Separate supplemental documents such as the job-specific Employer Supplements were integrated into the electronic main questionnaire. NLSY79 CAPI questionnaires incorporate some helpful elements of the traditional codebook, with reference numbers assigned to variables and greater specificity on coding and universes provided within each codeblock.

Question Numbering. The conventions used to assign question numbers within the NLSY79 documentation system vary by survey year and are based on various combinations of the questionnaire section number, the question number, or the deck and column numbers (Table 1). Users can locate a variable within the codebook--which represents each question fielded in the same order as it appears within the questionnaire--by finding the question number which appears (in parentheses) to the right of each reference number.

Table 1. NLSY79 question numbering conventions

Survey Year | Designated By | Example
1979 | Section # (S) and Question # (Q) | S02Q01: Question 1 in Section 2
1980-82 | Section # (S), Deck # (D), and Column # | S06D1314: Question appearing in Section 6, deck 13, column 14
1983-87, 1989-92 | Deck # and Column # | Q0413: Question appearing in deck 4, column 13
1988 | Section # and Question # (Q) | Q5.3: Question 3 in Section 5
1993-present | Section #, Question # (Q) and Loop # as applicable | Q5-26.3: Question 26 in Section 5, with the appended .03 representing the third loop

Deck and column numbers are vestigial items that were used to locate the data when they were entered on punch cards. The deck numbers are printed at the upper right-hand corner of each page in the survey instruments and at the beginning point for each new deck for the 1980 through 1992 instruments. The column numbers are printed to the left of the response categories. If the variable contains more than one digit, the column reference is to the starting column for that variable.
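
As a rough illustration of how the conventions in Table 1 can be applied, the following Python sketch splits a question number into its components for a given survey year. The function name parse_question_number and the regular expressions are hypothetical and cover only the example shapes shown in Table 1, including the assumption that deck and column numbers are each two digits. Actual question numbers may carry letter suffixes or other variations that this sketch does not handle.

```python
import re

# Each entry: (first year, last year or None for "present", pattern, field names).
# The patterns are assumptions based solely on the examples in Table 1.
_CONVENTIONS = [
    (1979, 1979, r"S(\d+)Q(\d+)", ("section", "question")),
    (1980, 1982, r"S(\d+)D(\d{2})(\d{2})", ("section", "deck", "column")),
    (1983, 1987, r"Q(\d{2})(\d{2})", ("deck", "column")),
    (1988, 1988, r"Q(\d+)\.(\d+)", ("section", "question")),
    (1989, 1992, r"Q(\d{2})(\d{2})", ("deck", "column")),
    (1993, None, r"Q(\d+)-(\d+)(?:\.(\d+))?", ("section", "question", "loop")),
]


def parse_question_number(survey_year, qnum):
    """Split a question number into named integer parts for the given year."""
    for first, last, pattern, fields in _CONVENTIONS:
        if survey_year >= first and (last is None or survey_year <= last):
            match = re.fullmatch(pattern, qnum)
            if match is None:
                raise ValueError(
                    f"{qnum!r} does not fit the assumed {survey_year} pattern"
                )
            # Optional parts (such as the loop number) are omitted when absent.
            return {
                name: int(value)
                for name, value in zip(fields, match.groups())
                if value is not None
            }
    raise ValueError(f"no convention listed for survey year {survey_year}")


print(parse_question_number(1981, "S06D1314"))
# {'section': 6, 'deck': 13, 'column': 14}
print(parse_question_number(2004, "Q5-26.3"))
# {'section': 5, 'question': 26, 'loop': 3}
```

The survey year is passed explicitly because the same "Q" prefix is used under several conventions, so the year, not the shape of the string alone, decides how the digits are read.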

Important information on questionnaire content

Although NLSY79 questionnaires are to some extent topically arranged, the user should be aware that the absence of a section title on a given subject does not mean that no questions on that topic were fielded during that survey year. For example, the 1987 and 1989 NLSY79 questionnaires contain no section entitled "Childcare." However, a small number of childcare questions were asked in those years and appear within the "Fertility" section of the questionnaires.

Questionnaire supplements

Separate instruments called "supplements" have been used since the onset of the NLSY79 to administer distinct sets of questions. The NLSY79 has made extensive use of supplements for collecting information from separate universes such as schools or children or for administering confidential sets of questions on illegal activities or abortion. The following section describes each supplemental instrument used for the NLSY79. The use of such separate supplements has diminished with CAPI-administered interviews. In the main youth and young adult instruments, all supplements are now incorporated as electronic modules in a questionnaire. Children still use multiple supplements: one self-report, one interviewer-administered, and one completed by the mother.

Illegal Activities Form J (1980). This confidential questionnaire supplement, administered during the 1980 survey, contains a series of questions designed to collect information on the extent of respondents' participation in various delinquent and criminal activities such as:

  • skipping school
  • alcohol/marijuana use
  • vandalism
  • shoplifting
  • drug dealing
  • robbery

This series supplements those on reported contacts with the criminal justice system collected within the main questionnaire.

Employer Supplement. Information about each employer for whom an NLSY79 respondent has worked since the last interview has been collected since 1980. One Employer Supplement is administered for each employer and contains questions about gaps when the respondent was not working, the number of hours worked, the type of work done, and the wages earned at that job. Note: Comparable information for the 1979 survey can be found in the "On Jobs" section of the main questionnaire and within the separate single sheet 1979 Employer Flap. Beginning with the 1993 CAPI interviews, all employer supplement questions appear within the body of the main questionnaire.

Question Numbering. Five numbering systems have been used to identify questionnaire items within the Employer Supplement (Table 2). Although data from up to 10 jobs are collected, the main data set includes information on only the first five jobs since few individuals work at more than five jobs between interviews. Data on all ten jobs are used to construct a series of summary variables for hours and weeks worked; see the Labor Force Status, Time & Tenure with Employers, and Work Experience sections for more information.

Table 2. Employer Supplement question numbering conventions: 1980-present

Survey Years | Question Numbering Description
1980-87, 1989-91 | A supplement identifier, i.e., the letter B, representing the first supplement, through F, the fifth supplement, is combined with the deck and column numbers preprinted in the instrument. The deck numbers for the first Employer Supplement would be B1, B2, B3, and B4 while the second supplement would use C with each deck and column number. The question number QB140 thus refers to B (the first supplement), 1 (deck 1), 40 (column 40), while QC166 refers to Employer Supplement C, deck 1, column 66.
1988 | Letter designations, i.e., ESB, ESC, ESD, ESE, ESF, continue to identify the specific supplement in use; however, deck and column numbers are not used. Appended to the supplement identifier is the actual question number as printed in the supplement. For example, ESB.1 refers to the first supplement, question 1.
1992 | A series of supplemental deck numbers are attached to the column numbers preprinted in the supplement. Question numbers 7439-7831 refer to information collected in the first supplement, 7939-8331 to the second supplement, 8439-8831 to the third supplement, 8939-9331 to the fourth supplement, and 9439-9831 to the fifth supplement.
1993-1996 | The designation QES and a number, e.g., QES5, indicates that this series of questions collected information about the fifth employer. Hyphenated numbers attached to the QES5, e.g., QES5-26, QES5-27, etc. indicate the specific question number within the series, while a decimal number following a question number, QES5-26.3, reflects the third repetition of that question for that employer.
1998-present | Beginning in 1998, the number identifying the employer was moved to a decimal after the question number. The question previously labeled QES5-26.3, for example, was now designated as QES-26.05.03. The decimal number ".05" indicates this information was collected about the fifth employer. Again, ".03" represents the third repetition of question 26 for the fifth employer.
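
The five Employer Supplement conventions in Table 2 can be sketched in Python as follows. This is a minimal illustration, not an official parser: the function name parse_es_question, the dictionary keys, and the regular expressions are assumptions based only on the examples in Table 2 (for instance, a one-digit deck followed by a two-digit column for 1980-87 and 1989-91, and a bare numeric string for 1992). Real question numbers may include letter suffixes or other forms not covered here.

```python
import re

_LETTERS = "BCDEF"  # B identifies the first supplement, F the fifth

# 1992 question-number ranges for supplements 1 through 5 (from Table 2).
_RANGES_1992 = [(7439, 7831), (7939, 8331), (8439, 8831),
                (8939, 9331), (9439, 9831)]


def _ints(match, names):
    """Return the named groups as integers, skipping absent optional groups."""
    return {n: int(v) for n, v in zip(names, match.groups()) if v is not None}


def parse_es_question(survey_year, qnum):
    """Identify the employer (supplement) and item for an ES question number."""
    if 1980 <= survey_year <= 1987 or 1989 <= survey_year <= 1991:
        m = re.fullmatch(r"Q([B-F])(\d)(\d{2})", qnum)
        if m:
            return {"employer": _LETTERS.index(m[1]) + 1,
                    "deck": int(m[2]), "column": int(m[3])}
    elif survey_year == 1988:
        m = re.fullmatch(r"ES([B-F])\.(\d+)", qnum)
        if m:
            return {"employer": _LETTERS.index(m[1]) + 1,
                    "question": int(m[2])}
    elif survey_year == 1992:
        if qnum.isdigit():
            number = int(qnum)
            for employer, (low, high) in enumerate(_RANGES_1992, start=1):
                if low <= number <= high:
                    return {"employer": employer, "number": number}
    elif 1993 <= survey_year <= 1996:
        m = re.fullmatch(r"QES(\d+)-(\d+)(?:\.(\d+))?", qnum)
        if m:
            return _ints(m, ("employer", "question", "repetition"))
    elif survey_year >= 1998:
        m = re.fullmatch(r"QES-(\d+)\.(\d+)(?:\.(\d+))?", qnum)
        if m:
            return _ints(m, ("question", "employer", "repetition"))
    raise ValueError(f"{qnum!r} does not fit the assumed {survey_year} pattern")


print(parse_es_question(1985, "QC166"))
# {'employer': 2, 'deck': 1, 'column': 66}
print(parse_es_question(2004, "QES-26.05.03"))
# {'question': 26, 'employer': 5, 'repetition': 3}
```

Returning the employer as a number from 1 to 5 in every case makes it easier to compare items about the same job across survey years, even though the printed identifiers changed from letters to deck ranges to numeric suffixes.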

Fertility Supplement (1983). Respondents (both male and female) who were not interviewed during 1982 were administered a special set of supplementary fertility questions during the 1983 survey. The Fertility Supplement was designed to collect complete fertility data, including all live births for males and females, and all pregnancy losses and contraception between pregnancies for females. For those not interviewed in 1982, these questions replaced the fertility questions found in Section 10 of the 1983 questionnaire.

Confidential Abortion Forms. Biennially beginning in 1984, female NLSY79 respondents have completed a short confidential abortion form which elicited information on the number and dates of each abortion. Copies of these supplementary questions are provided within the survey instrument sets. The 1984 form also collected information on the dates that respondents left school prior to 1979 if leaving school was associated with early childbearing. Beginning in 2002, the abortion form was included in the main instrument. 

Drug Use Supplement (1988, 1992, 1994, and 1998). The 1988 supplement contains the confidential set of drug use questions which were, through a random assignment process, self-administered by the respondent in half of the cases and administered by the interviewer in the other half. Questions were asked on age at first use of marijuana and cocaine, extent of lifetime and most recent use, and method(s) practiced in using cocaine. The 1992 and 1994 supplements contain the confidential set of questions on respondents' use of cigarettes, alcohol, marijuana, cocaine, or other drugs. Users should note that while the 1988 and 1992 supplements are bound as separate booklets, the 1994 and 1998 supplements are bound with the main questionnaire.

Childhood Residence Calendar (1988). The 1988 questionnaire contained a special section detailing the living arrangements of respondents from birth through age 18. The Childhood Residence Calendar, the interviewing aid used to collect these data, depicts for each year of life the type of parent (biological-, adoptive-, or step-) with whom each respondent lived for at least four months and, for those ages when he or she was not living with a parent, in what other arrangements the respondent resided, such as with grandparents, foster parents, or friends, or in a children's home, detention center, or other institution.

Supplemental data collections

High School Survey (1980). A supplemental survey of the last secondary school attended by civilian NLSY79 respondents was conducted in 1980. This survey gathered information on each school's grading system, course offerings, dropout rate, student body composition, and faculty characteristics, as well as respondent scores from a variety of intelligence and aptitude tests. Copies of the high school survey instruments, the "School Questionnaire" and the "Student's School Record Information" form, are included within the documentation item called the NLSY High School Transcript Survey: Overview and Documentation.

Transcript Surveys (1980-83). Transcript information on up to 64 courses was collected from high school records for civilian NLSY79 respondents who were expected to complete high school within the United States. A copy of the instrument used to collect transcript information, called the "Transcript Coding Sheet," is included within the NLSY High School Transcript Survey: Overview and Documentation.

ASVAB. The Armed Services Vocational Aptitude Battery (ASVAB) was administered to most NLSY79 respondents in 1980 as part of a Department of Defense effort to renorm this military enlistment test. The scores from this supplemental data collection are included in the NLSY79 data file. For details, see the Aptitude, Achievement & Intelligence Scores section.

Interviewer's Reference Manual (Question-by-Question [Q by Q] specifications)

Each questionnaire or set of survey instruments is accompanied by an Interviewer's Reference Manual. This document provides NORC interviewers with background information on the NLSY79 and detailed question-by-question instructions for administering and coding the questionnaire, Employer Supplement, Household Interview Forms, and other survey supplements. Separate Q by Q's exist for each survey year. Printed copies of the CAPI help screen information, which each interviewer could access during the course of the interview, replace the traditional interviewer's manual instrument beginning with the 1993 release.

Environmental Variables

Variable

79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 96 98 00 02 04 06 08 10 12 14 16 18 20
Region of residence * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
Current residence urban or rural * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
Current residence in metropolitan statistical area * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
Changes in residence since January 1, 1978, or date of last interview (collected as a history) * *   *                             * * * * * * * * * * *
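
The column labels in the asterisk tables are two-digit survey years. As a small aid for working with them, the Python sketch below expands the labels into four-digit years. The names expand_year_label, SURVEY_YEARS, and was_fielded are hypothetical, and the century cutoff (labels 79 through 99 read as 1979-1999, all others as 2000 onward) is an assumption that holds for the labels shown here.

```python
# Column labels copied from the asterisk tables on this page.
HEADER = ("79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 "
          "96 98 00 02 04 06 08 10 12 14 16 18 20")


def expand_year_label(label):
    """Convert a two-digit column label such as '79' or '04' to a full year."""
    number = int(label)
    # Assumed cutoff: the NLSY79 began in 1979, so 79-99 belong to the 1900s.
    return 1900 + number if number >= 79 else 2000 + number


SURVEY_YEARS = [expand_year_label(label) for label in HEADER.split()]


def was_fielded(year):
    """Return True if a survey round appears in the table header for that year."""
    return year in SURVEY_YEARS


print(SURVEY_YEARS[0], SURVEY_YEARS[-1], len(SURVEY_YEARS))
# 1979 2020 29
print(was_fielded(1995), was_fielded(1996))
# False True
```

The resulting list reflects the pattern visible in the header: yearly columns through 1994 and every other year from 1996 onward.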

Human Capital and Other Socioeconomic Variables

Variable

79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 96 98 00 02 04 06 08 10 12 14 16 18 20
Nationality and birthplace *       *                                                
Birth date *   *                                                    
Ethnic self-identification (revised 2002) *                                     *                  
Year foreign-born R entered the United States         *             *                                  
Month and year R entered the United States to live for at least 6 months *                     *                                  
Immigration or visa status                       *                                  
Religious affiliation, frequency of attendance *     *                             *                    
Periods lived away from parents (birth to age 18) *                 *                                      
Non-English language spoken when R was a child *                                                        
Were magazines, newspapers, or library cards available in home when R was age 14 *                                                        
Person(s) R lived with at age 14 *                     *                                  
Occupations of primary adults when R was 14 *                                                        
Birthplace of parents: State or country *                                                        
Highest grade completed by father and mother *                                                        
Employment status of father and mother in past year * *                                                      
Are R's parents living * *                               * * * * * * * * * * * *
R's biological parents---life status, health, cause of death (40+/50+/60+ health modules)                                   * * * * * * * * * * * *

Variable

79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 96 98 00 02 04 06 08 10 12 14 16 18 20
Current enrollment status, date of last enrollment * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
Highest grade completed * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
Reason stopped attending school * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
Highest degree and date received                   * * * * * * * * * * * * * * * * * * * *
Is or was school public or private *                                                        
High school curriculum * * * * * * *                                            
Comparison of high school courses to skills training                             *                            
College degree received * * * * * *       * * * * * * * * * * * * * * * * * * * *
Type of college attending (2- or 4-year) * * * * * * * *   * * *   * * * * * * * * * * * * * * * *
Field of study or specialization in college * * * * * * * *   * * *   * * * * * * * * * * * * * * *  
College tuition *                                                        
Educational loans or financial aid in college * * * * * * * *   * * *   * * * * * * * * * * * * * * * *
Attitude toward selected aspects of high school *                                                        
Courses taken during last year of high school *                                                        
Ever suspended or expelled from school; date   *                                                      

Variable

79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 96 98 00 02 04 06 08 10 12 14 16 18 20
Type(s) of training * * * * * * * *   * * * * * * * * * * * * * * * * * * * *
Number of weeks, hours per week in training * * * * * * * *   * * * * * * * * * * * * * * * * * * * *
Was training completed * * * * * * * *   * * * * * * * * * * * * * * * * * * * *
Was degree, certificate, or journeyman's card obtained * *                                                      
Was training related to specific job or employer       * * * * *   * * * * * * * * * * * * * * * * * * * *
Was training related to a promotion                       * * * * *                          
Reason for training       * * *             * * * * * * * * * * * * * * * *  
Method of financing training       * * *       * * * * * * * * * * * * * * * * * * * *
Informal job learning activities (questions vary)                             * * * * * * * * * * * * * * *

Variable

79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 96 98 00 02 04 06 08 10 12 14 16 18 20
Participation in programs * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
Type of program * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
Satisfaction with program * * * * * * * * *                                        
Did program help on subsequent jobs * * * * * * * * *   * * * * * * * *                      
Services provided by program * * * * * * * * *                                        
Length of participation in program * * * * * * * *   * * * * * * * * * * * * * * * * * * * *
Hours per week and per day spent in program * * * * * * * *   * * * * * * * * * * * * * * * * * * * *
Amount of income from participating in program * * * * * * * *                                          
Aspects liked most and least about programs *                                                        
Reasons for entering and leaving programs * * * * * * * * *                                        

Variable

79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 96 98 00 02 04 06 08 10 12 14 16 18 20
Does health limit work, duration of limitation * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
Type of health problem (ICD-9 code) * * * *                                                  
Work-related injury or illness (ICD-9 code)                   * * *   * * * * * *                    
Height     * *     *                             * * * * * * *  
Weight     * *     * *   * * * * * * * * * * * * * * * * * * * *
Health insurance coverage---R, spouse, children                     * *   * * * * * * * * * * * * * * * *
Frequency and intensity of R's physical activity                                   * * * * * * * * * * * *
R's general health behaviors                                       * * * * * * * * * *
General perception of health (40+/50+/60+ health modules)                                   * * * * * * * * * * * *
Does health interfere with daily activities (40+/50+/60+ health modules)                                   * * * * * * * * * * * *
Emotional health in past 4 weeks (40+/50+/60+ health modules)                                   * * * * * * * * * * * *
CES-Depression Scale                           *   *   * * * * * * * * * * * *
R's various health problems (heart problems, cancer, diabetes, poor eyesight or hearing, and so forth) (40+/50+/60+ health modules)                                   * * * * * * * * * * * *
Time spent on healthcare activities (40+/50+ health modules)                                   * * * * * * * * * *    
Diagnosed with asthma (40+/50+ modules)                                   * * * * * * * * * *    
Diagnosed with Alzheimer's/dementia (60+ health module)                                                       * *
Satisfaction With Life Scale/SWLS (60+ health module)                                                       * *
General Anxiety Disorder/GAD scale (60+ health module)                                                       * *
Brief Resilience Scale/BRS (60+ health module)                                                       *  
Cognition                                           * * * * * * * *
National Death Index data                                                         *

All spouse items also refer to partners beginning in 1994.

Variable

79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 96 98 00 02 04 06 08 10 12 14 16 18 20
Dating behaviors and attitudes (unmarried females)                   *       *   * * * * * * * * * * *      
Marital status * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
Changes in marital status since 1/1/1978 or previous interview; number and duration of marriages * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
Month, year R and partner began living together                       * * * * * * * * * * * * * * * * * *
Did R and spouse live together continuously before marriage (or R and partner continuously until now)                       * * * * * * * * * * * * * * * * * *
Changes in cohabitation with partner since last interview                                       * * * * * * * * * *
Occupation of spouse * * * * * * * * * * * * * * * * * * * * * * * * * * * *  
Race of spouse                                             * * * * * * *
Extent spouse worked in previous calendar year * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
Current labor force status, reason not employed for spouse                   * * * * * * * * * * * * * * * * * * * *
Shift worked by spouse       *           * * * * * * * * * * * * * * * * * * *  
Rate of pay, hourly rate of pay of spouse                       * * * * * * * * * * * * * * * * *  
Spouse/partner's religious affiliation and attendance       *                             * * * * * * * * * * *
Number of spouse's marriages, details       *                           * * * * * * * * * * * *
Effect of spouse's health on R's work       *                                                  
Quality of R's relationship (14 items) (mothers in 1988; females all other years)                   *       *   * * * * * * * * * * * *    
Age at which R expects to marry *                                                        

Variable

79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 96 98 00 02 04 06 08 10 12 14 16 18 20
Relationship of household or family members to R * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
Household or family members' demographics (sex, age, highest grade completed, work status in past year) * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
Number of dependents or exemptions * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
Number and ages of R's children living in household * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
Expected number of children *     * * * * *   *   *   *   * * * * * * * * * * *      
Number of children R considers ideal *     *                                                  
Healthcare during pregnancy (females)         * * * *   *   *   *   * * * * * * * * * * * * *  
Postnatal infant healthcare and feeding (females)         * * * *   *   *   *   * * * * * * * * * * * * * *
Father's relationship with children (males)                                   * *                    
Fertility history * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
Use of birth control methods       *   * * *   *   *   *   * * * * * * * * * * * *    
Pregnancies not resulting in live births (includes how ended through 1990)       * * * * *   *   *   *   * * * * * * * * * * * * * *
Characteristics of children with asthma                                         * * * * * *      

Asked of female respondents only in even years after 1986.

Variable

79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 96 98 00 02 04 06 08 10 12 14 16 18 20
Current childcare arrangements       * * * * * * *                                      
Childcare during first 3 years of life               *   *       *   * * * * * * * * * * *      
Cost per week       *     * *   *                                      
Number of hours per week       * * * * *   *                                      
Is childcare a hindrance to R's work, school, or training       * * *       * *                                    
Extent of various neighborhood problems                           *   * * * *                    

All spouse items also refer to partners beginning in 1994.

Variable

79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 96 98 00 02 04 06 08 10 12 14 16 18 20
Total family income in previous calendar year * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
Income of R and spouse in previous calendar year from: Farm or own business * * * * * * * * * * * * * * * * * * *   * * * * *   * * *
Income of R and spouse in previous calendar year from: Wages or salary * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
Income of R and spouse in previous calendar year from: Business or Professional Practice Investment or Ownership                                         *   *   *   * * *
Unemployment compensation * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
Public assistance * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
Food Stamps * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
Targeted cash or noncash benefits                                   * *                    
Pensions/Social Security * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
Military service * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
Veterans' benefits, workers' compensation, other disability (collected separately beginning in 2002) * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
Other sources * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
R receives government rent subsidy or public housing * * * * * * * * * * * * * * * * * * * * * * *   *   *   *
Income from child support * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
Child support expected vs. received                             * * * * *                    
Rights to estate or trust; income from inheritances (since last interview)                                     * * * * * * * * * * *
R claimed Earned Income Tax Credit (EITC) on previous tax return, amount                                     * * * * * * * * * * *
Possession of various assets (R and spouse)   * * * * * * * * * * *   * * * * * *   *   *   *   *   *
Asset market value (R and spouse)             * * * * * *   * * * * * *   *   *   *   *   *
Amount of debt             * * * * * *   * * * * * *   *   * * * * * * *
Amount spent on food, other than Food Stamps                       * * * * *           *              
Effect of 1996 welfare reform on R (shorter in 2000)                                   * *         *          
R receives targeted benefits from public assistance program (gas vouchers, childcare, and so forth)                                     *                    
R ever declare bankruptcy                                         *   * * * * *   *
Home foreclosure                                               * * * * *  
R has a will                                                 * * * *  
Financial literacy                                                 * * * *  
Educational expenditures                                                   *      
Effects of Coronavirus outbreak on earnings                                                         *
R and spouse receive Coronavirus stimulus check                                                         *

Variable

79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 96 98 00 02 04 06 08 10 12 14 16 18 20
Branch of Armed Forces * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
Months spent in Armed Forces * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
Military occupation(s) * * * * * * *                                            
ROTC or officer training *                                                        
Reserve or guard activities * * * * * * *                                            
Pay grade and income * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
Type and amount of military training * * * * * * *                                            
Does R use military skills on civilian job * * * * * * * * * * * * * * * * * * * * * * * * * * * *  
Formal education received while in service * * * * * * *                                            
Family members who have served on active duty         *                                                
Participation in Veterans' Educational Assistance Program (VEAP) (after 1985, with GI Bill) * * * * * * * * * * * * * * * * * * * * * * * * * * *    
Attitude toward military service * * * * * * *                                            
Future military plans * * * * * * *                                            
Reason for entering and leaving military   * * * * * *                                            
Contact with military recruiters * * * * * * *                                            
Type of discharge   * *                                                    
Enlistment or reenlistment bonuses received * * * * * * *                                            
Civilian job offer at time of discharge   * * * * * *                                            
Return to same employer after active duty   * * * * * *                                            

Variable

79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 96 98 00 02 04 06 08 10 12 14 16 18 20
Would R like more education or training; type *                                                        
How much education desired and actually attained *   * *                                                  
Kind of work R would like to be doing at age 35 *     *                                                  
Expectation of achieving occupational goal *     *                                                  

Variable

79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 96 98 00 02 04 06 08 10 12 14 16 18 20
Knowledge of World of Work score *                                                        
Would R work if had enough money to live on *                                                        
Characteristics of job R is willing to take (R unemployed or out of labor force) * * * * * * * *                                          
Reaction to hypothetical job offers *                                                        
Internal-External Locus of Control Scale (Rotter) *                                                 * *    
Mastery Scale (Pearlin)                           *                              
Attitude toward women working *     *         *                       *                
Self-Esteem Scale (Rosenberg) (10 items)   *             *                         *              
CES-Depression Scale                           *   *   * * * * * * * * * * * *
Person having most influence on R, his or her responses to various situations *                                                        
Retirement expectations                                         * * * * * * * *  
R risk aversion questions                                               * * *      
Ten-Item Personality Inventory (TIPI)                                                   * * *  
Life satisfaction                                                   * * * *

Variable

79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 96 98 00 02 04 06 08 10 12 14 16 18 20
Perception of age, race, and sex discrimination *     *                                                  
Reason for problems in obtaining employment *     *                                                  

Variable

79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 96 98 00 02 04 06 08 10 12 14 16 18 20
Activities within last year (20 items)   *                                                      
Income from illegal activities within last year   *                                                      
Alcohol consumption in last week or month       * * * *     * *     *   *       *   * * * * *   * *
Extent of cigarette use           *               *   *   *         * * * *   * *
Age R first smoked and stopped smoking cigarettes           * (first smoked only)               *   *   *                      
Extent of marijuana use   *       *       *       *   *   *                      
Age R first used marijuana           *       *       *   *   *                      
Extent of cocaine use, age R first used           *       *       *   *   *                      
Extent of "crack" cocaine use, age R first used                           *   *   *                      
Ever used sedatives, barbiturates, and so forth           *               *   *   *                      
Cigarette and alcohol use during pregnancy         * * * *   *   *   *   * * * * * * * * * * * * *  
Marijuana and cocaine use during pregnancy                   *   *   *   * * * * * * * * * * * * *  

Variable

79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 96 98 00 02 04 06 08 10 12 14 16 18 20
Number of times stopped by police   *                                                      
Number of times booked or arrested   *                                                      
Number of convictions, charges   *                                                      
Number of times incarcerated; date of release   *                                                      

Variable

79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 96 98 00 02 04 06 08 10 12 14 16 18 20
Use of time at various activities (school, work, watching TV, household chores, and so forth)     *                                                    
Volunteerism/Philanthropy                                           *   * * *      

Labor Market Experience Variables

Beginning in 1994, characteristics of the current or most recent job were collected in the first Employer Supplement loop, rather than in the CPS section. To maintain consistency, these questions are still included in this section.

Variable

79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 96 98 00 02 04 06 08 10 12 14 16 18 20
Survey week labor force and employment status * * * * * * * * * * * * * * * * * *       *              
Occupation (DOT code) *                                                        
Hours worked in survey week * * * * * * * * * * * * * * * * * *       *              
Hours per week usually worked * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
Shift worked * * * * * * *     * * * * * * * * * * * * * * * * * * * *
Promotion (varies with year)           *       * * *         * * * * * * * * * * *    
Commuting time to current job * * *             *         * *                          
Availability of benefits (beginning in 1994 for multiple jobs) * *   * * * * * * * * * * * * * * * * * * * * * * * * * *
Global job satisfaction item * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
Job satisfaction scale * * * *           *                                      
Job characteristics inventory *     *                                                  
Size of employer * *           * * * * * * * * * * * * * * * * * * * * * *
Minority status of coworkers (1980, 1982), supervisor   *   *                                                  
Time R expects to stay at job * * * *                                                  
Participation in work-study program * * * * * * * *                                          

Work experience since Jan. 1, 1978, or previous survey, or in past calendar year.

Variable

79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 96 98 00 02 04 06 08 10 12 14 16 18 20
Weeks worked * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
Hours usually worked per week * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
Number of weeks, spells of unemployment * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
Weeks out of labor force * * * * * * * * * * * * * * * * * * * * * * * * * * * * *

  • Characteristics of jobs since Jan. 1, 1978, or last survey, including current or most recent job of more than 10 to 20 hours per week and more than 9 weeks in duration if not a CPS job.
  • Beginning in 2002, the questionnaire includes separate sets of questions for self-employed respondents and respondents with nontraditional employment arrangements. The information collected is very similar to the regular employment questions, but wordings may vary to accommodate different situations. The three types of employer questions are not represented separately in the table.

Variable

79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 96 98 00 02 04 06 08 10 12 14 16 18 20
Occupation and industry (Census code) * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
Class of worker * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
Start date and stop date * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
Hours usually worked at home                   * * * * * * * * * * * * * * * * * * * *
Shift worked                               * * * * * * * * * * * * * *
Rate of pay, hourly rate of pay * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
Covered by collective bargaining * * * * * * * * * * * * * * *                            
Is R union member *                 * * * * * * * * * * * * * * * * * * * *
Reason for leaving job * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
Severance pay received                               * * * * * * * * * * * * * *
Availability of benefits (CPS job) (all jobs since 1994)             * * * * * * * * * * * * * * * * * * * * * * *
Characteristics of employer's pension plan                               * * * * * * * * * * * * * *
Is employer exempt from Social Security; does another plan replace it                                       * * * * * * * * * *
Global job satisfaction item                               * * * * * * * * * * * * * *
Promotion and promotion potential with employer           *       * * *         * * * * * * * * * * *    
Size of employer                                 * * * * * * * * * * * * *
Sex of supervisor and coworkers   *   *                         * *                      
Is R a temporary or contractual worker                               * * *         * * * * * * *
Effects of COVID-19 on job                                                         *

Variable

79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 96 98 00 02 04 06 08 10 12 14 16 18 20
Job search activities and (some years) intentions * * * * * * * * * * * * * * * * * * *     *              
R looking for work or employed when found current or most recent job       *                       * * * *                    
Methods of job search       *   * * * * * * * * * * * * * *     *              
Job offers rejected (while looking for each job)       *       * * *           * * * *                    
Desired characteristics of job sought       *   * * *                                          