Health conditions related to work limitations (1979-2000)
The health sections of the NLSY79 surveys for most interview years between 1979 and 2000 elicited reports of health problems that limited the amount or kind of work the respondent could do. These health problems are coded using a modified version of the International Classification of Diseases, Ninth Revision (ICD-9) codes, taken from World Health Organization, International Classification of Diseases, Ninth Revision, 2 vols., WHO, Geneva, 1977 (vol. 1) and 1978 (vol. 2).
The health consolidation codes present in 1979, 1981 and 1982 give the most complete possible description of the respondent's main cause of work limitation, based on a reading of the entire health section. They were also coded with the modified ICD-9 codes, after the rest of the health section had been coded. The health consolidation codes thus describe the proximal cause of the respondent's main work limitation (the current complaint), not a remote underlying cause that preceded it.
As originally coded, the untruncated NLSY79 health codes corresponded to the ICD-9 codes with the decimal point deleted, except for the supplemental V and E ICD-9 codes. Because alphabetic characters are not used as variable values in the NLSY79 data, the supplemental codes were modified by dropping the letter and adding the remaining numeric portion to a fixed base number: V codes = 10000+ and E codes = 11000+. (Zero (0) means no health problem.) The resulting codes were then collapsed further by truncating them by one digit, that is, by dropping the final digit.
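The conversion steps above can be sketched in Python. This is a minimal illustration, not the procedure NORC or the NLS staff actually ran: the function name `nlsy79_health_code` is hypothetical, and the rule that only the first three digits after the V or E are kept (V01.0 becomes 010, E800 becomes 800) is an assumption inferred from the code ranges in Tables 2 and 3, not something the documentation states.

```python
def nlsy79_health_code(icd9: str) -> int:
    """Sketch: convert an ICD-9 code string to a truncated NLSY79-style code.

    Steps follow the text: delete the decimal point; for supplemental codes,
    drop the letter and add 10000 (V) or 11000 (E); then drop the final digit.

    ASSUMPTION: for V and E codes only the first three digits after the letter
    are kept (V01.0 -> 010, E800 -> 800), inferred from Tables 2 and 3.
    """
    code = icd9.strip().upper()
    if not code:
        return 0  # hypothetical convention here: no code means "no health problem" (0)
    if code[0] in ("V", "E"):
        base = 10000 if code[0] == "V" else 11000
        digits = code[1:].replace(".", "")[:3]  # assumed three-digit numeric portion
        untruncated = base + int(digits)
    else:
        untruncated = int(code.replace(".", ""))  # decimal point deleted
    return untruncated // 10  # collapse by dropping the last digit


print(nlsy79_health_code("V01.0"))  # 1001
print(nlsy79_health_code("E800"))   # 1180
print(nlsy79_health_code("401.9"))  # 401
```

Note that truncation shortens every code by one digit, so ordinary ICD-9 codes of different lengths end up with different numbers of digits in the data.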
A link to the ICD-9 is included below. The NLSY79 modified codes can be tracked back to general disease classifications in the ICD-9 scheme. The supplemental E and V codes and the medical terms associated with them appear in Tables 2-3 below. For more detail, see the International Classification of Diseases, Ninth Revision.
40+, 50+ and 60+ biological parent and respondent health conditions (1998-2022)
With the addition of the 40+ Health Module in 1998, questions collecting health conditions for the biological parents of respondents have been asked of the appropriate cohort each round. This has continued with the subsequent 50+ and 60+ Health Modules. In the 50+ and 60+ Health Modules, respondents have also had the opportunity to report any other information they wished about their own health.
The major health conditions of respondents’ biological parents collected in the 40+, 50+ and 60+ Health Modules are coded using the CDC, National Vital Statistics System 113 List, contained in Table 1 below. These are taken from the National Vital Statistics Reports, Volume 65, no 2 (2/16/2016), and can be found online on the CDC website.
Parts of body affected by work-limiting health conditions and reported cancer (1979-2022)
A set of numeric codes for parts of the body was developed by NORC for the 1979-1981 health sections. These codes are contained in Table 4. They have been used to code the parts of the body affected by the work-limiting health conditions described above. In addition, the 40+, 50+ and 60+ Health Modules have asked respondents to report cancer diagnoses and the parts of the body affected by these cancers, and the same codes have been applied to those body parts.
Table 1. CDC – National Vital Statistics System 113 List
A crosswalk for converting ICD-10 health codes to National Vital Statistics System 113 List codes can be accessed in the file ICD-10 to National Vital Statistics 113 List Crosswalk (XLSX).
Code
Description
1
Salmonella infections
2
Shigellosis and amebiasis
3
Certain other intestinal infections
4
Tuberculosis: Respiratory tuberculosis
5
Tuberculosis: Other tuberculosis
6
Whooping cough
7
Scarlet fever and erysipelas
8
Meningococcal infection
9
Septicemia
10
Syphilis
11
Acute poliomyelitis
12
Arthropod-borne viral encephalitis
13
Measles
14
Viral hepatitis
15
Human immunodeficiency virus (HIV) disease
16
Malaria
17
Other and unspecified infectious and parasitic diseases and their sequelae
18
Malignant neoplasms: Malignant neoplasms of lip, oral cavity and pharynx
19
Malignant neoplasms: Malignant neoplasm of esophagus
20
Malignant neoplasms: Malignant neoplasm of stomach
21
Malignant neoplasms: Malignant neoplasms of colon, rectum and anus
22
Malignant neoplasms: Malignant neoplasms of liver and intrahepatic bile ducts
23
Malignant neoplasms: Malignant neoplasm of pancreas
24
Malignant neoplasms: Malignant neoplasm of larynx
25
Malignant neoplasms: Malignant neoplasms of trachea, bronchus and lung
26
Malignant neoplasms: Malignant melanoma of skin
27
Malignant neoplasms: Malignant neoplasm of breast
28
Malignant neoplasms: Malignant neoplasm of cervix uteri
29
Malignant neoplasms: Malignant neoplasms of corpus uteri and uterus, part unspecified
30
Malignant neoplasms: Malignant neoplasm of ovary
31
Malignant neoplasms: Malignant neoplasm of prostate
32
Malignant neoplasms: Malignant neoplasms of kidney and renal pelvis
33
Malignant neoplasms: Malignant neoplasm of bladder
34
Malignant neoplasms: Malignant neoplasms of meninges, brain and other parts of central nervous system
35
Malignant neoplasms: Malignant neoplasms of lymphoid, hematopoietic and related tissue: Hodgkin’s disease
36
Malignant neoplasms: Malignant neoplasms of lymphoid, hematopoietic and related tissue: Non-Hodgkin’s lymphoma
37
Malignant neoplasms: Malignant neoplasms of lymphoid, hematopoietic and related tissue: Leukemia
38
Malignant neoplasms: Malignant neoplasms of lymphoid, hematopoietic and related tissue: Multiple myeloma and immunoproliferative neoplasms
39
Malignant neoplasms: Malignant neoplasms of lymphoid, hematopoietic and related tissue: Other and unspecified malignant neoplasms of lymphoid, hematopoietic and related tissue
40
All other and unspecified malignant neoplasms
41
In situ neoplasms, benign neoplasms and neoplasms of uncertain or unknown behavior
42
Anemias
43
Diabetes mellitus
44
Nutritional deficiencies: Malnutrition
45
Nutritional deficiencies: Other nutritional deficiencies
46
Meningitis
47
Parkinson’s disease
48
Alzheimer’s disease
49
Major cardiovascular diseases: Diseases of heart: Acute rheumatic fever and chronic rheumatic heart disease
50
Major cardiovascular diseases: Diseases of heart: Hypertensive heart disease
51
Major cardiovascular diseases: Diseases of heart: Hypertensive heart and renal disease
52
Major cardiovascular diseases: Ischemic heart diseases: Acute myocardial infarction
53
Major cardiovascular diseases: Ischemic heart diseases: Other acute ischemic heart diseases
54
Major cardiovascular diseases: Ischemic heart diseases: Other forms of chronic ischemic heart diseases: Atherosclerotic cardiovascular disease, so described
55
Major cardiovascular diseases: Ischemic heart diseases: Other forms of chronic ischemic heart diseases: All other forms of chronic ischemic heart disease
56
Major cardiovascular diseases: Other heart diseases: Acute and subacute endocarditis
57
Major cardiovascular diseases: Other heart diseases: Diseases of pericardium and acute myocarditis
58
Major cardiovascular diseases: Other heart diseases: Heart failure
59
Major cardiovascular diseases: Other heart diseases: All other forms of heart disease
60
Essential hypertension and hypertensive renal disease
61
Cerebrovascular diseases
62
Atherosclerosis
63
Other diseases of circulatory system: Aortic aneurysm and dissection
64
Other diseases of circulatory system: Other diseases of arteries, arterioles and capillaries
65
Other disorders of circulatory system
66
Influenza and pneumonia: Influenza
67
Influenza and pneumonia: Pneumonia
68
Other acute lower respiratory infections: Acute bronchitis and bronchiolitis
69
Other acute lower respiratory infections: Other and unspecified acute lower respiratory infections
70
Chronic lower respiratory diseases: Bronchitis, chronic and unspecified
71
Chronic lower respiratory diseases: Emphysema
72
Chronic lower respiratory diseases: Asthma
73
Chronic lower respiratory diseases: Other chronic lower respiratory diseases
74
Pneumoconioses and chemical effects
75
Pneumonitis due to solids and liquids
76
Other diseases of respiratory system
77
Peptic ulcer
78
Diseases of appendix
79
Hernia
80
Chronic liver disease and cirrhosis: Alcoholic liver disease
81
Chronic liver disease and cirrhosis: Other chronic liver disease and cirrhosis
82
Cholelithiasis and other disorders of gallbladder
83
Nephritis, nephrotic syndrome and nephrosis: Acute and rapidly progressive nephritic and nephrotic syndrome
84
Nephritis, nephrotic syndrome and nephrosis: Chronic glomerulonephritis, nephritis and nephropathy not specified as acute or chronic and renal sclerosis unspecified
85
Nephritis, nephrotic syndrome and nephrosis: Renal failure
86
Nephritis, nephrotic syndrome and nephrosis: Other disorders of kidney
87
Infections of kidney
88
Hyperplasia of prostate
89
Inflammatory diseases of female pelvic organs
90
Pregnancy, childbirth and the puerperium: Pregnancy with abortive outcome
91
Pregnancy, childbirth and the puerperium: Other complications of pregnancy, childbirth and the puerperium
92
Certain conditions originating in the perinatal period
93
Congenital malformations, deformations and chromosomal abnormalities
94
Symptoms, signs and abnormal clinical and laboratory findings, not elsewhere classified
95
All other diseases
96
Accidents (unintentional injuries): Transport accidents: Motor vehicle accidents
97
Accidents (unintentional injuries): Transport accidents: Other land transport accidents
98
Accidents (unintentional injuries): Transport accidents: Water, air and space, and other and unspecified transport accidents and their sequelae
99
Accidents (unintentional injuries): Nontransport accidents: Falls
100
Accidents (unintentional injuries): Nontransport accidents: Accidental discharge of firearms
101
Accidents (unintentional injuries): Nontransport accidents: Accidental drowning and submersion
102
Accidents (unintentional injuries): Nontransport accidents: Accidental exposure to smoke, fire and flames
103
Accidents (unintentional injuries): Nontransport accidents: Accidental poisoning and exposure to noxious substances
104
Accidents (unintentional injuries): Nontransport accidents: Other and unspecified nontransport accidents and their sequelae
105
Intentional self-harm (suicide): Intentional self-harm (suicide) by discharge of firearms
106
Intentional self-harm (suicide): Intentional self-harm (suicide) by other and unspecified means and their sequelae
107
Assault (homicide): Assault (homicide) by discharge of firearms
108
Assault (homicide): Assault (homicide) by other and unspecified means and their sequelae
109
Legal Intervention
110
Events of undetermined intent: Discharge of firearms, undetermined intent
111
Events of undetermined intent: Other and unspecified events of undetermined intent and their sequelae
112
Operations of war and their sequelae
113
Complications of medical and surgical care
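Many Table 1 descriptions carry a heading prefix separated by colons (for example, "Malignant neoplasms: ..."), so an analyst may want to collapse underlying-cause codes into broader groups. The Python sketch below assumes that grouping by the text before the first colon is adequate for the purpose. It is an approximation, not an official NCHS grouping: the prefixes are not applied uniformly across the list. The dictionary holds only a few rows copied from Table 1, and the names `NVSS_113_EXCERPT`, `broad_group` and `group_counts` are illustrative.

```python
from collections import Counter

# A few rows copied from Table 1; a real analysis would load the full list.
NVSS_113_EXCERPT = {
    27: "Malignant neoplasms: Malignant neoplasm of breast",
    43: "Diabetes mellitus",
    52: "Major cardiovascular diseases: Ischemic heart diseases: Acute myocardial infarction",
    96: "Accidents (unintentional injuries): Transport accidents: Motor vehicle accidents",
}


def broad_group(code: int, table: dict[int, str] = NVSS_113_EXCERPT) -> str:
    """Return the text before the first colon, or the whole description if none."""
    return table[code].split(":", 1)[0].strip()


def group_counts(codes, table: dict[int, str] = NVSS_113_EXCERPT) -> Counter:
    """Tally underlying-cause codes by their broad (prefix) group."""
    return Counter(broad_group(code, table) for code in codes)


print(broad_group(52))  # Major cardiovascular diseases
print(broad_group(43))  # Diabetes mellitus
```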
Modified ICD-9 codes
The ICD-9 can be found on the CDC website as well as through many other online resources. With the exception of the E and V codes (see tables below), the truncated codes contained in the NLSY79 data for health conditions affecting ability to work map directly to general disease classifications in the ICD-9 codes.
Table 2. Modified V codes: Supplementary classification of factors influencing health status and contact with health services
Codes
Description
1001 - 1007
Persons with potential health hazards related to communicable diseases
1010 - 1019
Persons with potential health hazards related to personal and family history
1020 - 1028
Persons encountering health services in circumstances related to reproduction and development
1030 - 1039
Healthy liveborn infants according to birth type
1040 - 1049
Persons with a condition influencing their health status
1050 - 1059
Persons encountering health services for specific procedures and aftercare
1060 - 1068
Persons encountering health services in other circumstances
1070 - 1082
Persons without reported diagnosis encountered during examination and investigation of individuals and populations
Table 3. Modified E codes: Supplementary classification of external causes of injury and poisoning
Codes
Description
1180 - 1180
Railway accidents
1181 - 1181
Motor vehicle traffic accidents
1182 - 1182
Motor vehicle nontraffic accidents
1182 - 1182
Other road vehicle accidents
1183 - 1183
Water transport accidents
1184 - 1184
Air and space transport accidents
1184 - 1184
Vehicle accidents not elsewhere classifiable
1185 - 1185
Accidental poisoning by drugs, medicaments and biologicals
1186 - 1186
Accidental poisoning by other solid and liquid substances, gases and vapours
1187 - 1187
Misadventures to patients during surgical and medical care
1187 - 1187
Surgical and medical procedures as the cause of abnormal reaction of patient or later complication
1188 - 1188
Accidental falls
1189 - 1189
Accidents caused by fire and flames
1190 - 1190
Accidents due to natural and environmental factors
1191 - 1191
Accidents caused by submersion, suffocation and foreign bodies
1191 - 1192
Other accidents and late effects of accidental injury
1193 - 1194
Drugs, medicaments and biological substances causing adverse effects in therapeutic use
1195 - 1195
Suicide and self-inflicted injury
1196 - 1196
Homicide and injury purposely inflicted by other persons
1197 - 1197
Legal intervention
1198 - 1198
Injury undetermined whether accidentally or purposely inflicted
1199 - 1199
Injury resulting from operations of war
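Tables 2 and 3 can be turned into a range lookup for truncated supplemental codes. Because truncation merges neighboring ICD-9 categories, several codes (1182, 1184, 1187 and 1191) fall in more than one row, so this minimal Python sketch returns every matching description rather than a single one. The rows are copied from the tables; the names `SUPPLEMENTAL_RANGES` and `describe_supplemental` are hypothetical. The lookup only consults Tables 2 and 3 and does not check whether a given value actually came from a V or E code.

```python
# (low, high, description) rows copied from Table 2 (V codes) and Table 3 (E codes).
SUPPLEMENTAL_RANGES = [
    (1001, 1007, "Persons with potential health hazards related to communicable diseases"),
    (1010, 1019, "Persons with potential health hazards related to personal and family history"),
    (1020, 1028, "Persons encountering health services in circumstances related to reproduction and development"),
    (1030, 1039, "Healthy liveborn infants according to birth type"),
    (1040, 1049, "Persons with a condition influencing their health status"),
    (1050, 1059, "Persons encountering health services for specific procedures and aftercare"),
    (1060, 1068, "Persons encountering health services in other circumstances"),
    (1070, 1082, "Persons without reported diagnosis encountered during examination and investigation of individuals and populations"),
    (1180, 1180, "Railway accidents"),
    (1181, 1181, "Motor vehicle traffic accidents"),
    (1182, 1182, "Motor vehicle nontraffic accidents"),
    (1182, 1182, "Other road vehicle accidents"),
    (1183, 1183, "Water transport accidents"),
    (1184, 1184, "Air and space transport accidents"),
    (1184, 1184, "Vehicle accidents not elsewhere classifiable"),
    (1185, 1185, "Accidental poisoning by drugs, medicaments and biologicals"),
    (1186, 1186, "Accidental poisoning by other solid and liquid substances, gases and vapours"),
    (1187, 1187, "Misadventures to patients during surgical and medical care"),
    (1187, 1187, "Surgical and medical procedures as the cause of abnormal reaction of patient or later complication"),
    (1188, 1188, "Accidental falls"),
    (1189, 1189, "Accidents caused by fire and flames"),
    (1190, 1190, "Accidents due to natural and environmental factors"),
    (1191, 1191, "Accidents caused by submersion, suffocation and foreign bodies"),
    (1191, 1192, "Other accidents and late effects of accidental injury"),
    (1193, 1194, "Drugs, medicaments and biological substances causing adverse effects in therapeutic use"),
    (1195, 1195, "Suicide and self-inflicted injury"),
    (1196, 1196, "Homicide and injury purposely inflicted by other persons"),
    (1197, 1197, "Legal intervention"),
    (1198, 1198, "Injury undetermined whether accidentally or purposely inflicted"),
    (1199, 1199, "Injury resulting from operations of war"),
]


def describe_supplemental(code: int) -> list[str]:
    """Return every Table 2/3 description whose inclusive range contains `code`."""
    return [desc for low, high, desc in SUPPLEMENTAL_RANGES if low <= code <= high]


print(describe_supplemental(1182))
# ['Motor vehicle nontraffic accidents', 'Other road vehicle accidents']
```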
Table 4. Codes for parts of body in health section
Code
Description
01
Brain, CNS, spinal cord
02
Peripheral nervous system
03
Emotions, "nerves"
04
Heart
05
Blood, spleen
06
Vascular system
07
Lymphatic system, lymph glands
08
Pituitary gland
09
Thyroid gland
10
Adrenal gland
11
Other endocrine glands; endocrine system; pineal gland; parathyroid gland; thymus
12
Eye
13
Vision
14
Lacrimal gland and duct
15
Eyelid
16
Ear (inner and outer)
17
Hearing
18
Nose
19
Smell
20
Tonsils and adenoids
21
Sinus
22
Vocal cords, larynx
23
Speech
24
Throat, pharynx
25
Lung, trachea and bronchi
26
Breathing
27
Mouth and tongue
28
Gums
29
Teeth
30
Esophagus
31
Stomach
32
Upper digestive tract
33
Liver
34
Biliary tract
35
Gallbladder
36
Pancreas
37
Abdomen
38
Upper abdomen
39
Lower abdomen
40
Intestine and colon
41
Rectum
42
Anus
43
Lower digestive tract
44
Digestive system
45
Kidneys
46
Bladder
47
Prostate
48
Other genitourinary tract; urethra; ureter
49
Penis
50
Other male reproductive system; scrotum; vas deferens; testes
51
Breast, nipple
52
Vulva, clitoris
53
Vagina, cervix, uterus
54
Other female reproductive system; fallopian tubes; ovaries
55
Menstruation
56
Skin
57
Hair
58
Scalp
59
Nails
60
Head, skull
61
Face, forehead, lips
62
Jaw
63
Chin
64
Neck, cervical vertebrae
65
Back, dorsal spine
66
Low back, lumbar spine
67
Trunk
68
Chest
69
Chest wall, external chest; axilla
70
Collarbone
71
Ribs
72
Side, flank
73
Shoulder
74
Arm
75
Upper arm
76
Elbow
77
Lower arm
78
Wrist
79
Hand (palm)
81
Fingers
82
Pelvis
83
Groin
84
Buttocks
85
Hip
86
Leg
87
Upper leg, thigh
88
Knee, kneecap
89
Lower leg
90
Ankle
91
Foot
92
Toes
93
Muscles, tendons, ligaments NOS (not otherwise specified)
94
Bone(s) NOS (not otherwise specified)
95
Joints NOS (not otherwise specified)
96
"Entire body"
80
Other NOS (not otherwise specified)
National Death Index (NDI) data
The current 1979-2022 NLSY79 data release contains information on the cause, dates and location of death for deceased respondents for whom an NDI search returned a matching death certificate. Most of these variables are available only on the geocode and zipcode releases. Data for a subset of NLSY79 respondents were submitted for an NDI search. The subset included respondents identified as deceased during survey field periods and respondents who have proven difficult to locate or who have not been interviewed for some time and whose status has not been confirmed. To maximize the chance of an NDI death certificate match, multiple submissions were made for individual respondents whenever possible. These submissions could include elements such as maiden names, different married names, nicknames and the multiple ethnicities reported by the respondent, any of which might appear in various combinations on a death certificate. Individual respondent records, interviewer notes and administrative data were examined together with the NDI search results to determine which matches could be established as valid. Related variables are found in the NDI VERIFICATION area of interest. Table 1 depicts the coding scheme used for the underlying cause of death on the current data release. See also the Health section for information on NDI-related variables.
2000 Census 3-Digit Industry and Occupation Codes (PDF). The 2000 Census codes were used to code industry and occupation for all jobs in the 2002 NLSY79 survey. Census published slightly revised codes in 2002, and these revised codes were used to code all jobs in the 2004 survey. Census issued another revision in 2003, and these codes were used for the 2006 survey. This attachment lists the 2000 codes, followed by the 2002 and 2003 codes; the 2002 tables note the slight differences from the 2000 list.
This section describes the three primary components of the NLSY79 codebook system and discusses the important types of information found within each. An additional codebook supplement exists for the Geocode data file.
Codebooks
The codebook is the principal element of the NLSY79 documentation system and contains information intended to be complete and self-explanatory for each variable in a data file. The software accompanying the NLSY79 data sets allows easy access to each variable's codebook information and permits the user to print a codebook extract for preselected variables.
Every variable is presented within the NLSY79 documentation as a block of information called a "codeblock." Each codeblock entry depicts the following important information:
reference number
variable title
coding information
frequency distribution
location within the data file
reference to the questionnaire item or source of the variable
information on the derivation of created variables
Users will find that NLSY79 CAPI codeblocks present greater detail on each variable, including universe totals, universe skip patterns, and the range of acceptable values. Each of these terms is described more completely below. Codeblocks for many variables include special notes containing additional information designed to assist in the accurate use of data from that variable.
Codebooks are arranged in reference number order. As a general rule, raw questionnaire items appear first for a given survey year, followed by items from such instruments as the Information Sheet and Employer Supplement. Variables from the main body of the questionnaire are followed by created or constructed variables drawn from an external data source, such as the County & City Data Book.
Beginning with the 1993 CAPI surveys, questions relating to each job/employer, which were formerly located in the separate Employer Supplements, are merged with the main questionnaire items. Table 1 compares the reference number assignments used for the 1988 PAPI and 1993 CAPI variables and provides users with a sample set of reference numbers. Users should note that not all survey years are ordered in precisely this manner.
Table 1. NLSY79 1988 and 1993 reference number assignment
Description | 1988 PAPI Rnum | 1993 CAPI Rnum
All raw, edited and created variables | R25000.-R28927. | R41001.-R44308.
Questionnaire items | R25000.-R27467. | R41001.-R43988. (including the Employer Supplement series; see Note 1.1)
Note: PAPI refers to paper-and-pencil interviews which were conducted with the NLSY79 during 1979-92. CAPI or computer-assisted personal interviews began for the full NLSY79 cohort in 1993.
Note 1.1: Beginning in 1993, variables from the employer supplement series are included within the raw questionnaire items.
Note 1.2: The childhood residence retrospective was unique to 1988.
The following figures give users an example of codebook pages before (Figure 1) and after (Figure 2) CAPI implementation.
Figure 1. NLSY79 sample PAPI codeblock
Figure 2. NLSY79 sample CAPI codeblock
Coding information
Each codeblock entry presents the set of legitimate codes that a variable may assume along with a text entry describing the codes.
Dichotomous variables
Dichotomous or yes/no variables are uniformly coded "Yes" = 1, "No" = 0. Other dichotomous variables have frequently been reformulated to follow this convention.
Discrete variables
Discrete (categorical) variables take one of a fixed set of categories, as in the case of 'Activity Most of Survey Week CPS Item':
WORKING
WITH A JOB, NOT AT WORK
LOOKING FOR WORK
KEEPING HOUSE
GOING TO SCHOOL
UNABLE TO WORK
OTHER
Continuous variables
Continuous (quantitative) variables, such as the hourly rate of pay in the example above, contain continuous data but are presented in the codebook using a convenient frequency distribution. NLSY79 users will note that most valid data are positive numbers; special cases are flagged by negative numbers. See Appendix 13: Intro to CAPI Questionnaires and Codebooks in the NLSY79 Codebook Supplement for more detail on the handling of negative numbers in the data files. The following conventions are used throughout the data:
Noninterview -5
Valid Skip -4
Invalid Skip -3
Don't Know -2
Refusal -1
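When reading extracted data, the reserved negative codes listed above usually need to be separated from substantive values before computing statistics. The Python sketch below is one way to do that. It assumes that -1 through -5 are always reserved for the variable in question, which should be confirmed against the codeblock. The names `RESERVED_CODES` and `split_value`, and the sample values in `raw_wages`, are made up for illustration.

```python
from typing import Optional

# Reserved negative codes used throughout the NLSY79 data (from the list above).
RESERVED_CODES = {
    -5: "Noninterview",
    -4: "Valid Skip",
    -3: "Invalid Skip",
    -2: "Don't Know",
    -1: "Refusal",
}


def split_value(raw: int) -> tuple[Optional[int], Optional[str]]:
    """Return (value, None) for substantive data or (None, reason) for a reserved code.

    ASSUMPTION: -1 through -5 are reserved for this variable; check the codeblock
    before applying this to a variable that might hold genuine negative values.
    """
    if raw in RESERVED_CODES:
        return None, RESERVED_CODES[raw]
    return raw, None


raw_wages = [1250, -4, 980, -2, -5]               # illustrative values only
values = [split_value(r)[0] for r in raw_wages]   # [1250, None, 980, None, None]
reasons = [split_value(r)[1] for r in raw_wages]  # why each value is missing
```

Keeping the reason alongside the missing value preserves the distinction between, for example, a valid skip and a refusal, which matters for the universe totals described below.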
Important information: Coding information
Coding information for a given variable in the NLSY79 codeblock is:
not necessarily consistent with the codes found within the questionnaire, and
not necessarily consistent for the same variable across years. Use only the codebook coding information for analysis.
Frequency distribution
In the case of discrete (categorical) variables, frequency counts are normally shown in the first column to the left of the code categories. In the case of continuous (quantitative) variables, a distribution of the variable is presented using a convenient class interval. The format of these distributions varies.
Derivations
The decision rules employed in the creation of main file constructed variables have been included, whenever possible, in the codebook under the title "DERIVATIONS." This information enables researchers to determine whether available constructs are appropriate to their needs. In the case of the example NLSY79 variable in Figure 1, no derivation is shown because the variable is taken directly from the interview schedule. Certain variables contain a reference to an appendix describing the decision rules used in creating the variable.
Questionnaire item
"Questionnaire item" is a generic term identifying the printed source of data for a given variable. A questionnaire item may be a question, a check item, or an interviewer's reference item appearing within one of the survey instruments.
The questionnaire location for NLSY79 entries appears either in parentheses or brackets directly after the reference number, for example R04434. (SO6D1314). The five questionnaire item numbering conventions used in the codebook are described in the Survey Instruments section (see especially Table 2).
Before the adoption of CAPI, if an NLSY79 variable was not taken directly from one of the survey instruments, the questionnaire location contained an asterisk (*) in the codebook. The following categories of variables had no questionnaire numbers:
assigned identification numbers for the respondent, child, or family unit;
all derived or constructed variables;
variables from the following special surveys: Profiles (ASVAB), the School Survey, and the Transcript Survey;
variables found on constructed data files such as the Supplemental Fertility File (area of interest "Fertility and Relationship History/Created"); and
variables drawn from an external data source such as those found on the Geocode files.
In CAPI years, survey staff assign such variables a question name that does not appear in the questionnaire. Because this name remains the same in subsequent rounds, similar created variables can be easily located.
Section, deck, and question numbers have been somewhat arbitrarily assigned to the information and questions found in special survey instruments such as the Household Screener, Information Sheet, Children's Record Forms, Household Interview Forms, and the Employer Supplements. The section and deck numbers for these special survey items were numbered sequentially after the main survey items and their specific order varies each year. The exception to this is the assignment of the deck numbers for the Employer Supplements. Question numbering is discussed earlier in the Survey Instruments section (see especially Table 3).
Universe information
Universe information was attached to select 1979-92 variables. Beginning with the 1993 CAPI interviews, the amount of universe information was expanded to include:
Universe Totals: Two totals are presented:
the sum of the frequency counts for each coding category is presented below the individual codes; and
the sum of the valid responses plus missing response counts of "refusals," "don't knows," and "invalid skips" can be found in the TOTAL==========> field. The number of respondents who legitimately did not respond to a question, that is, "valid skips (-4)" and "noninterviews (-5)," is also depicted.
Universe Skip Patterns: The following detailed universe information will enable researchers to easily trace the flow of respondents both backward and forward through various parts of the CAPI questionnaire items included in the codebook:
"Go to Reference # XXXXX.," appended to certain coding categories, indicates that respondents selecting that answer category were routed to the next question specified.
"Lead In(s) Reference # XXXXX." identifies the question or questions immediately preceding the codeblock question through which the universe of respondents was routed. Each lead-in reference number is followed by the relevant response value indicators, (Default), (ALL), [1:1], [1:6], and so forth. For example:
R41000. (All) This means that all cases where R41000. is asked will branch to the current question. This does not imply all respondents are asked question R41000.
R41000. (Default) This means that the default path of control from question R41000. is to branch to the current question, but there may be conditions under which a different path would be taken.
R41000. [1:6] This means that whenever the response category for question R41000. takes on the values one to six inclusive, the next question is the current question record.
"Default Next Question" specifies the next question that all respondents of the current codeblock will be asked unless some other skip condition indicates otherwise.
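The two universe totals described above can be reproduced from a column of raw responses. This Python sketch assumes that nonnegative values are legitimate coding categories and that -1 through -5 carry the reserved meanings listed under Continuous variables; the function name `universe_totals` and its output keys are hypothetical.

```python
from collections import Counter


def universe_totals(responses):
    """Compute codebook-style totals from raw responses for one variable.

    ASSUMPTION: codes >= 0 are coding categories; -1 refusal, -2 don't know,
    -3 invalid skip, -4 valid skip, -5 noninterview.
    """
    counts = Counter(responses)
    category_sum = sum(n for code, n in counts.items() if code >= 0)
    missing = counts[-1] + counts[-2] + counts[-3]  # refusals, don't knows, invalid skips
    return {
        "category_sum": category_sum,     # sum shown below the individual codes
        "TOTAL": category_sum + missing,  # value in the TOTAL==========> field
        "valid_skip": counts[-4],         # legitimately not asked
        "noninterview": counts[-5],       # not interviewed this round
    }


print(universe_totals([1, 1, 2, -1, -2, -3, -4, -4, -5]))
# {'category_sum': 3, 'TOTAL': 6, 'valid_skip': 2, 'noninterview': 1}
```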
Valid values range
Depicted below the frequency distribution is information relating to the range of valid values for that particular distribution. "MINIMUM" indicates the smallest recorded value exclusive of "NA" and "DK." "MAXIMUM" indicates the largest recorded value. The computer-assisted interview contains internal range checks that limit responses to those between predesignated values, alert interviewers to verify unusual values, and bolster the information provided by the traditional minimum and maximum fields (see, for example, Figure 2 above).
Maximum and Minimum Fields. The MIN and MAX fields define the range, that is, the lower limit and the upper limit, of data values for a given question. A MAX of $156,359 on an income question, for example, means that this value was the highest value recorded.
Hardmax and Hardmin Fields. Hard Maximum and Hard Minimum fields denote the highest and lowest values that were accepted by the CAPI program. A Hardmax of 500,000 and a Hardmin of 0 on an income question indicate that no values above $500,000 or values lower than zero (no income) can be accepted. Dates, such as month/day/year of the respondent's last interview [lintdate] and current interview [curdate], are used as Hardmin and Hardmax values in order to restrict responses to certain questions to values within that range. Responses outside this range must be entered by the interviewer in the comment field.
Softmax and Softmin Fields. Softmax and Softmin fields cover ranges where an answer may exceed reasonable limits yet remain within the absolute limits and are acceptable after verification. A Softmax set to $80,000 on an income question will cause the machine to "beep" and a warning to appear on the screen. Interviewers are thus alerted that the value is unusual and the respondent's answer should be verified.
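The interplay of hard and soft limits can be sketched as a three-way check: values outside the hard limits are rejected, values outside the soft limits are flagged for verification, and everything else is accepted. This is an illustrative Python model, not the CAPI program itself. The `RangeSpec` and `check_entry` names are hypothetical, and combining the Hardmin, Hardmax and Softmax examples from the text into one income question is an assumption; the text gives no Softmin for income.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RangeSpec:
    """Hard and soft limits for one question (field names are hypothetical)."""
    hardmin: float
    hardmax: float
    softmin: Optional[float] = None
    softmax: Optional[float] = None


def check_entry(value: float, spec: RangeSpec) -> str:
    """Classify an entered value as 'reject', 'verify', or 'accept'."""
    if value < spec.hardmin or value > spec.hardmax:
        return "reject"  # outside the hard limits: the program will not accept it
    if spec.softmin is not None and value < spec.softmin:
        return "verify"  # unusually low: interviewer confirms with the respondent
    if spec.softmax is not None and value > spec.softmax:
        return "verify"  # unusually high: the "beep" and on-screen warning
    return "accept"


# Limits taken from the income examples in the text.
income_spec = RangeSpec(hardmin=0, hardmax=500_000, softmax=80_000)
print(check_entry(45_000, income_spec))   # accept
print(check_entry(95_000, income_spec))   # verify
print(check_entry(650_000, income_spec))  # reject
```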
Restricted Income Values. Confidentiality concerns restrict the release of income values. To ensure respondent confidentiality, the values of income variables that exceed particular limits are truncated and replaced with a set maximum value.
From 1979 through 1984, the upper limit on income variables was $75,000, and any amounts exceeding $75,000 were converted to $75,001
Beginning in 1985, the upper limit on income amounts was increased to $100,000 due to inflation and the advancing age of the cohort, and amounts exceeding $100,000 were converted to $100,001
Beginning in 1996, the values of the top two percent of respondents with valid values were averaged, and that average replaced all values in the top range
Users should be aware of these changes in the income ceiling if they are carrying out longitudinal analyses with these data. Upward trends in mean income statistics may reflect this change in the ceiling value. More information about truncation is available in the Income section.
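The post-1996 rule (average the top two percent of valid values and substitute that average) can be sketched in Python as follows. Several details are assumptions not stated in this guide: negative values are treated as reserved missing-data codes and left alone, the number of cases replaced is the ceiling of two percent of the valid cases, and ties at the cutoff are broken by position. The function name `topcode_top_share` is hypothetical.

```python
from statistics import mean


def topcode_top_share(values, percent=2):
    """Replace the top `percent` percent of valid values with their mean.

    ASSUMPTIONS (illustrative only): negative values are reserved missing-data
    codes and are left unchanged; the number of cases replaced is
    ceil(percent * n_valid / 100); ties at the cutoff are broken by position.
    """
    valid_idx = [i for i, v in enumerate(values) if v >= 0]
    k = (len(valid_idx) * percent + 99) // 100  # integer ceiling division
    if k == 0:
        return list(values)
    # sorted() is stable, so among equal values the earlier position ranks first
    top_idx = sorted(valid_idx, key=lambda i: values[i], reverse=True)[:k]
    top_mean = mean(values[i] for i in top_idx)
    out = list(values)
    for i in top_idx:
        out[i] = top_mean
    return out


incomes = list(range(100))  # illustrative values 0..99
capped = topcode_top_share(incomes)
print(capped[97:])  # [97, 98.5, 98.5]
```

Unlike the fixed $75,001 and $100,001 ceilings, this rule lets the top-coded value move from round to round, which is one reason mean income can trend upward across survey years.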
Restricted Asset Values. Confidentiality concerns also restrict the release of asset values. To ensure respondent confidentiality, the values of asset variables that exceed particular limits are truncated and replaced with a set maximum value. The asset amounts have different upper limits, and the types of variables and limits for those variables are as follows:
Starting in 1985 all mortgage, market value of residential property, debt on residential property, miscellaneous debt and total market value of assets worth more than $150,000 were converted to $150,001; the market value and debt on a farm or business and savings that was worth more than $500,000 was converted to $500,001; the market value and debt on vehicles that was more than $30,000 was converted to $30,001
Beginning in 1989, the amounts exceeding the upper limits mentioned above were assigned the average value of all values exceeding the limits, in an effort to more accurately reflect the true range of income and asset values
Beginning in 1996, the values of the top two percent of respondents with valid values were averaged, and that average replaced all values in the top range
Users should be aware of these changes in the asset ceiling if they are carrying out longitudinal analyses with these data. Upward trends in mean asset statistics may reflect this change in the ceiling value. More information about truncation is available in the "Assets" section of this guide.
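The 1985 and 1989 asset rules differ in what replaces an amount above the limit: a fixed value one dollar above the limit, or the average of all amounts above the limit. This Python sketch contrasts the two. The limits come from the list above, but the dictionary keys and the function names `cap_1985` and `cap_1989` are hypothetical, and the sketch ignores reserved negative codes for simplicity.

```python
# 1985 upper limits from the text; the grouping keys are hypothetical names.
ASSET_LIMITS_1985 = {
    "residential_and_misc": 150_000,   # mortgage, home value/debt, misc debt, total assets
    "farm_business_savings": 500_000,  # farm or business value/debt, savings
    "vehicles": 30_000,                # vehicle value/debt
}


def cap_1985(values, limit):
    """1985 rule: any amount above `limit` is converted to limit + 1."""
    return [limit + 1 if v > limit else v for v in values]


def cap_1989(values, limit):
    """1989 rule: amounts above `limit` are replaced by the average of those amounts."""
    above = [v for v in values if v > limit]
    if not above:
        return list(values)
    avg = sum(above) / len(above)
    return [avg if v > limit else v for v in values]


vehicle_values = [10, 40_000, 60_000]  # illustrative values only
print(cap_1985(vehicle_values, ASSET_LIMITS_1985["vehicles"]))  # [10, 30001, 30001]
print(cap_1989(vehicle_values, ASSET_LIMITS_1985["vehicles"]))  # [10, 50000.0, 50000.0]
```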
Verbatim
Generally, during the PAPI years, when an NLSY79 variable was taken directly from the questionnaire, the verbatim text of the question appeared beneath the variable title. If a question is the source for more than one variable, the first variable contains the verbatim text while subsequent variables prompt the user to refer back to the variable containing it. The following verbatim entries for reference numbers R03194. and R03195. demonstrate this convention.
R03194. 'In Which Months of 1979 Did You (or Your Husband/Wife) Receive Supplemental Security Income? January 80 INT'
R03195. 'See R (3194.) February'
Codebook supplements and other technical documentation
The Other Documentation section of the website includes several items that provide additional information about the NLSY79 survey. There are two NLSY79 codebook supplements. The first supplement, the NLSY79 Codebook Supplement, contains a series of attachments and appendices, variable creation procedures, supplementary coding categories, and derivations for selected variables on the main NLSY79 data files. Information provided within this document is not available in the NLSY79 codebooks, nor will it be found on the documentation files on the NLSY79 data sets. The other supplement contains comparable information specific to the NLSY79 Geocode data files. The Technical Sampling Report describes the selection of the NLSY79 sample and provides additional statistical information. Finally, the School & Transcript Surveys Documentation provides technical information about those special data collections.
Error updates
Prior to working with an NLSY79 data file, users should make every effort to acquire information on current data or documentation errors. A variety of methods are used to notify users of errors in the data files or documentation and to provide those persons who acquired an NLSY79 data set directly from the Center for Human Resource Research with corrected information.
When data errors are discovered within a data file, the correction is made and the data file is updated. These updated files then become the default files on NLS Investigator. NLSY79 Errata notices can be found in the "Other Documentation" section.
NLSY79 variables (as well as the variables from other NLS cohorts) are accessed using NLS Investigator, which is available as a Web application. The main use of NLS Investigator is to access NLS variables for the purposes of identifying, selecting, extracting, and/or running frequencies or cross-tabulations. This interface allows the researcher to connect to a database and perform variable extractions without installing any software on a local computer. Through a personal online account, a researcher's selected variable tag sets, frequencies, and extracts are available for a specified period of time from any computer location with Web access. Because there is one central data source for all users, researchers can be assured that they are always working with the most up-to-date data and that any necessary corrections will be immediate and universal.
This section examines and quantifies the extent of missing data, formally called item nonresponse, in the NLSY79. To provide readers with a detailed view of this problem, six surveys are analyzed. Nonresponse rates are examined first in the 1979 survey and then in the surveys that occur at roughly five-year intervals (1984, 1989, 1994, 1998, and 2004). These years were chosen to capture the major changes in the NLSY79. Examining the 1979 survey shows the initial levels of nonresponse. Examining the 1984 survey shows the amount of nonresponse in the survey just before one part of the respondent pool was dropped. The 1989 data show nonresponse after the first set of NLSY79 respondents was dropped. The 1994 data show what occurred after the survey switched from paper-and-pencil interviewing (PAPI) to computer-assisted personal interviewing (CAPI). While no major survey changes occurred during the 1998 and 2004 surveys, these surveys show nonresponse rates after many respondents had participated around 20 times.
This section focuses on three types of missing data: refusals, invalid skips, and don't knows. Overall, about 20 million questions were asked in these six rounds of the NLSY79. About 1.5 percent of them do not have valid answers and are missing data. Of the missing data, slightly less than half (48 percent) are don't knows, about 43 percent are invalid skips, and the remaining 9 percent are refusals. Because the vast majority of invalid skips occur in paper-and-pencil years, the share of problems attributed to this category has fallen sharply as more computer-assisted rounds have been fielded.
Introduction
Missing data, or nonresponse, happens in a number of ways in the NLSY79. First, a number of respondents do not participate at all, causing all information in that particular survey to be missing. Participation rates and reasons for noninterview in each survey round are discussed in the section on Retention & Reasons for Noninterview.
A second reason missing data occur is that respondents do not provide a valid answer to a question. When this happens, interviewers decide whether to mark the answer as a refusal or a don't know. Users should be cautioned that the assignment of refusals and don't knows is likely to vary across interviewers. Moreover, some respondents may believe it is impolite to refuse a question and instead decline to answer by saying they do not know. Hence, whether a nonresponse is recorded as a refusal or a don't know is somewhat arbitrary. Note: Financial questions often elicit "refusal" or "don't know" responses. For more information about nonresponse to financial questions, see Appendix 26: Non-Response to Financial Questions and Entry Points.
The last major way missing data can occur is when the interviewer incorrectly follows the survey instrument's flow. Incorrect flows result in some respondents skipping questions they should have answered, while others answer questions they should not have been asked. Data archivists have removed most of the extraneous responses from the data. While extra information can be removed, missing data are not imputed in the NLSY79; data missing for this reason are flagged with a special "invalid skip" code. The number of invalid skips drops precipitously beginning in 1993 with the introduction of CAPI. Nevertheless, invalid skips are still possible in CAPI data: if the CAPI instrument contains a programming mistake, it can sequence a respondent incorrectly. When such errors are found, the CAPI instrument is patched in the field to prevent further invalid skips, but the affected respondents are not asked the questions again.
All missing data are clearly flagged in the NLSY79 data set. Five negative numbers are used to indicate to users that the variable does not contain useful information. The five values are (-1) refusal, (-2) don't know, (-3) invalid skip, (-4) valid skip, and (-5) noninterview. These five numbers are reserved as missing value flags and, with a few exceptions (see Appendix 5: Supplemental Fertility and Relationship Variables), are rarely used in the NLSY79 for valid data values.
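When working with an extract, it can help to decode these five flags explicitly rather than treating every negative number as missing. A minimal Python sketch, assuming the extract has already been read into integer columns; the enum and function names are hypothetical and are not part of NLS Investigator.

```python
from collections import Counter
from enum import IntEnum


class NLSMissing(IntEnum):
    """Reserved NLSY79 missing-value flags, as listed in the guide."""
    REFUSAL = -1
    DONT_KNOW = -2
    INVALID_SKIP = -3
    VALID_SKIP = -4
    NONINTERVIEW = -5


def classify(value):
    """Return the missing-value category for `value`, or None if it is not a flag."""
    try:
        return NLSMissing(value)
    except ValueError:
        return None


def tally_missing(values):
    """Count how often each missing-value flag appears in one variable's column."""
    return Counter(c for c in map(classify, values) if c is not None)
```

Because a few variables use these numbers as real data values, a check like this should be applied only to variables documented not to do so.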
In the tables that follow, every attempt has been made to include only variables in a given survey year that were filled in by either a respondent or an interviewer. The goal was to exclude from the analysis all created variables, machine checks, date and time stamps, and variables generated in data post-processing. Because there is no automatic way to check every question against these criteria, the number of questions analyzed in the tables below overstates the number actually filled in by respondents or interviewers. The overstatement occurs because some questions with meaningful titles are actually hidden machine checks. While every effort was made to eliminate these questions, it is impossible to eliminate all of them.
This section is not the only research on the extent of missing data in the NLS. Olsen (1992) investigated the effect of switching from PAPI to CAPI interviewing. His research shows that the CAPI survey had fewer interviewer errors in navigating the instrument as well as fewer don't knows. More importantly, CAPI respondents appeared more willing to reveal sensitive material in the alcohol use section. Mott (1983, 1984, and 1985) examines the NLSY79's fertility data. In these reports, he examines the 1982 and 1983 surveys and finds very low refusal rates for the data in general. However, shifting to a confidential method of reporting abortions greatly increased respondents' willingness to respond. Mott (1998) examines the amount of missing data about the children of NLSY79 females. He finds that Hispanics or Latinos and, to a smaller extent, blacks have a much higher probability of not finishing the child assessments after starting the interview.
Additional nonresponse information
The Item Nonresponse by Section examines which sections of the NLSY79 have high nonresponse rates; the Item Nonresponse by Respondents examines how many times individuals do not respond to questions; and the Item Nonresponse within Problem Sections examines which particular questions in sections with high nonresponse rates are causing problems.
This section examines and quantifies the extent of missing data, formally called item nonresponse, in each section of the NLSY79. The six tables below show which areas of the NLSY79 respondents are least likely to answer by tracking the total number and percentage of questions with missing data in each section. To provide readers with a detailed view of this problem, six surveys are analyzed. Nonresponse rates are examined first in the 1979 survey and then in the surveys that occur at roughly five-year intervals (1984, 1989, 1994, 1998, and 2004). These years were chosen to capture the major changes in the NLSY79. Examining the 1979 survey shows the initial levels of nonresponse. Examining the 1984 survey shows the amount of nonresponse in the survey just before one part of the respondent pool was dropped. The 1989 data show nonresponse after the first set of NLSY79 respondents was dropped. The 1994 data show what occurred after the survey switched from paper-and-pencil interviewing (PAPI) to computer-assisted personal interviewing (CAPI). While no major survey changes occurred during the 1998 and 2004 surveys, these surveys show nonresponse rates after many respondents had participated around 20 times.
The first column of each table lists the section names within the survey. The second column shows the total number of questions that all respondents and interviewers should have answered in that section. This number is determined by first calculating, within each section, the number of questions each respondent should answer; a question is considered answerable if its value is not a valid skip (-4) or noninterview (-5). The section total is obtained by summing these counts across all NLSY79 respondents.
The third (don't know), fourth (refusal), and fifth (invalid skip) columns show the total number of nonresponses found in each section. Columns six, seven, and eight show the same information as percentages of the questions asked. The ninth column shows the total percentage of questions missed and is the sum of the previous three percentages. The last column, labeled rank, orders the sections by total percentage missed: rank 1 is the section with the most nonresponse, and the highest rank number marks the section with the least.
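The column definitions above amount to a simple per-section calculation. The Python sketch below assumes each section's answers are available as one list of integer codes per respondent (a hypothetical layout; the class and function names are also hypothetical), and it does not attempt the manual exclusion of created variables and machine checks described earlier.

```python
from dataclasses import dataclass

REFUSAL, DONT_KNOW, INVALID_SKIP, VALID_SKIP, NONINTERVIEW = -1, -2, -3, -4, -5


@dataclass
class SectionRow:
    name: str
    asked: int
    dont_know: int
    refused: int
    invalid_skip: int

    def pct(self, count):
        """Percentage of the questions asked (columns six to eight)."""
        return 100.0 * count / self.asked if self.asked else 0.0

    @property
    def total_pct(self):
        # Column nine: the sum of the three nonresponse percentages.
        return self.pct(self.dont_know) + self.pct(self.refused) + self.pct(self.invalid_skip)


def summarize_section(name, responses):
    """Build one table row from per-respondent answer lists for a section.

    A question counts as asked unless its value is a valid skip (-4) or a
    noninterview (-5); counts are summed over all respondents.
    """
    flat = [v for answers in responses for v in answers]
    asked = sum(1 for v in flat if v not in (VALID_SKIP, NONINTERVIEW))
    return SectionRow(name, asked, flat.count(DONT_KNOW), flat.count(REFUSAL),
                      flat.count(INVALID_SKIP))


def rank_sections(rows):
    """Rank 1 is the section with the highest total percent missed."""
    ordered = sorted(rows, key=lambda r: r.total_pct, reverse=True)
    return {r.name: i for i, r in enumerate(ordered, start=1)}
```

As a check against the published numbers, `SectionRow("Work Experience", 67695, 288, 15, 9476).total_pct` rounds to 14.45, the figure shown for that section in Table 1.1.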
The bottom row of each table combines the information and shows totals. For example, the bottom of the "Number Questions Asked" column in the 1979 table shows that almost four million questions (3,975,146) were expected to be filled in by respondents or interviewers. Other years are not far behind: 1984 had 3.1 million questions, 1989 had 1.8 million, 1994 had 3.7 million, 1998 had 4.1 million, and 2004 had 3.7 million. Readers are cautioned that each year of NLSY79 data contains far more data points, since the tables exclude questions obviously labeled as machine checks, date and time stamps, and questions with valid skip or noninterview flags.
The six tables show that the overall rate of missing data dropped steadily through 1994. In 1979, 2.7 percent of the questions in the survey were not answered. This figure drops to 1.9 percent in 1984, falls to 0.9 percent in 1989, and reaches a low of 0.7 percent in 1994. After 1994 it rises again, to 0.9 percent in 1998 and 1.4 percent in 2004. Hence, nonresponse is of somewhat less concern after the initial round of surveying.
Combining the data from all sections in all the tables shows that most nonresponse is caused by don't knows and invalid skips. The surveys examined asked a total of 20 million questions. Of these questions, more than 140,000, or 0.7 percent, were don't knows, and slightly more than 127,000, or 0.6 percent, were invalid skips. The last category, refusals, contains about 26,000 questions, roughly 0.1 percent of all questions asked.
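The combined figures can be checked directly from the bottom rows of Tables 1.1 through 1.6. This Python snippet simply sums those published totals and uses no data beyond the tables.

```python
# Bottom-row totals from Tables 1.1-1.6:
# (questions asked, don't knows, refusals, invalid skips)
TOTALS = {
    1979: (3975146, 22140, 596, 83073),
    1984: (3067473, 18209, 2030, 36918),
    1989: (1793721, 9062, 1883, 4426),
    1994: (3669260, 19670, 6036, 57),
    1998: (4078125, 29968, 7548, 75),
    2004: (3716570, 42718, 7494, 2705),
}

# Sum each column across the six survey years.
asked, dont_know, refused, invalid_skip = (sum(col) for col in zip(*TOTALS.values()))
missing = dont_know + refused + invalid_skip

print(f"questions asked: {asked:,}")
for label, n in [("don't know", dont_know), ("refused", refused),
                 ("invalid skip", invalid_skip), ("all missing", missing)]:
    # Share of all questions asked, and share of all missing data.
    print(f"{label:>12}: {n:>7,}  {100 * n / asked:.2f}% of questions  "
          f"{100 * n / missing:.0f}% of missing")
```

Run as written, it reports 20,300,295 questions asked and 294,608 missing answers (about 1.45 percent), of which roughly 48 percent are don't knows, 43 percent invalid skips, and 9 percent refusals.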
Examining the tables over time shows a sharp decrease in the amount of data missing due to invalid skips. In 1979, invalid skips accounted for 2.1 percent of the questions asked. This figure dropped to 1.2 percent by 1984 and to 0.25 percent by 1989. CAPI dramatically reduced the problem: only 57 of almost 3.7 million questions were incorrectly skipped in 1994, and 75 of 4.1 million in 1998. Invalid skips rose somewhat in 2004 (2,705 questions, or 0.07 percent) but remained far below PAPI-era levels.
While invalid skips fall over time, the percentage of refusals has increased slightly. Refusals accounted for 0.01 percent of questions in 1979, 0.07 percent in 1984, 0.10 percent in 1989, 0.16 percent in 1994, 0.19 percent in 1998, and 0.20 percent in 2004. Although the refusal rate rises steadily, it remains quite small.
While invalid skips fall and refusals rise over time, the trend in don't knows is more complex. Don't knows accounted for 0.6 percent of questions in 1979, 0.6 percent in 1984, 0.5 percent in 1989, 0.5 percent in 1994, 0.7 percent in 1998, and 1.1 percent in 2004. These figures suggest a U-shaped pattern over time.
The last column, labeled rank, shows that missing data are not confined to a single section or area of the survey. Table 1.1 shows that in 1979 the work experience section, with 14.5 percent of its questions missing valid data, had the most problems. Fourteen percent of all questions asked in this section are invalid skips, and only 0.5 percent are refusals or don't knows. Military experience, the second most problematic section, had slightly more than half the rate of missing data (7.8 percent) of work experience. The table suggests that the problem of invalid skips is not tied to subject matter, since the section with the fewest problems (rank 21 of 21), titled "On Jobs," also focuses on labor market issues, as does work experience.
While the "On Jobs" section has the fewest problems in the 1979, 1984, and 1989 surveys, the section with the most problems changes from year to year. Table 1.2, which examines the 1984 survey, shows the most problems in the "Fertility" section. Of the almost half-million questions asked in the fertility section, 5.6 percent contain missing data. While the majority of problems (3.4 percent of questions) were due to invalid skips, a surprisingly large 2 percent of the questions were answered don't know. The second most problematic section in the 1984 survey was "Drug Use," where 2.7 percent of the questions have missing data. As in "Fertility," the major portion of the problem is invalid skips (1.8 percent), but don't knows (0.8 percent) also account for a significant share. Interestingly, refusals account for only 0.1 percent, a relatively small proportion for a sensitive topic, suggesting that some of the don't knows were hidden refusals.
Table 1.1. Extent of refusals, don't knows, and invalid skips in 1979
Section Name | Number Questions Asked | Number Don't Knows | Number Refused | Number Invalid Skipped | Percent Don't Knows | Percent Refused | Percent Invalid Skipped | Total Percent Missed | Rank
Family Background | 660803 | 6196 | 90 | 12292 | 0.94% | 0.01% | 1.86% | 2.81% | 7
Marital Status | 32995 | 131 | 25 | 467 | 0.40% | 0.08% | 1.42% | 1.89% | 14
Fertility | 82141 | 679 | 23 | 624 | 0.83% | 0.03% | 0.76% | 1.61% | 17
Schooling | 402134 | 994 | 14 | 5592 | 0.25% | 0.00% | 1.39% | 1.64% | 16
Pay | 211504 | 22 | 0 | 3482 | 0.01% | 0.00% | 1.65% | 1.66% | 15
World of Work | 220185 | 2220 | 31 | 2883 | 1.01% | 0.01% | 1.31% | 2.33% | 10
Military | 145619 | 491 | 24 | 10885 | 0.34% | 0.02% | 7.47% | 7.83% | 2
CPS | 396697 | 862 | 8 | 10969 | 0.22% | 0.00% | 2.77% | 2.98% | 5
On Jobs | 230982 | 135 | 2 | 903 | 0.06% | 0.00% | 0.39% | 0.45% | 21
Employer Supplement | 291836 | 2009 | 69 | 3575 | 0.69% | 0.02% | 1.23% | 1.94% | 13
Last Job | 44504 | 31 | 0 | 261 | 0.07% | 0.00% | 0.59% | 0.66% | 20
Work Experience | 67695 | 288 | 15 | 9476 | 0.43% | 0.02% | 14.00% | 14.45% | 1
Gov't Training | 36728 | 62 | 28 | 2124 | 0.17% | 0.08% | 5.78% | 6.03% | 3
Other Training | 103662 | 52 | 0 | 2936 | 0.05% | 0.00% | 2.83% | 2.88% | 6
Not at Work | 90768 | 79 | 7 | 5019 | 0.09% | 0.01% | 5.53% | 5.62% | 4
Health | 67869 | 358 | 2 | 545 | 0.53% | 0.00% | 0.80% | 1.33% | 18
Significant Others | 58816 | 669 | 0 | 585 | 1.14% | 0.00% | 0.99% | 2.13% | 12
Residences | 52845 | 94 | 7 | 1029 | 0.18% | 0.01% | 1.95% | 2.14% | 11
Rotter Scale | 202976 | 1277 | 15 | 521 | 0.63% | 0.01% | 0.26% | 0.89% | 19
Income & Assets | 321685 | 1667 | 216 | 6813 | 0.52% | 0.07% | 2.12% | 2.70% | 8
Expectations | 252702 | 3824 | 20 | 2092 | 1.51% | 0.01% | 0.83% | 2.35% | 9
Total | 3975146 | 22140 | 596 | 83073 | 0.56% | 0.01% | 2.09% | 2.66% | -
Table 1.2. Extent of refusals, don't knows, and invalid skips in 1984
Section Name | Number Questions Asked | Number Don't Knows | Number Refused | Number Invalid Skipped | Percent Don't Knows | Percent Refused | Percent Invalid Skipped | Total Percent Missed | Rank
Calendar | 88462 | 8 | 0 | 4 | 0.01% | 0.00% | 0.00% | 0.01% | 15
Marital Status | 50206 | 273 | 18 | 561 | 0.54% | 0.04% | 1.12% | 1.70% | 4
Schooling | 324139 | 1031 | 469 | 2164 | 0.32% | 0.14% | 0.67% | 1.13% | 9
Military | 123126 | 337 | 41 | 1352 | 0.27% | 0.03% | 1.10% | 1.41% | 7
CPS | 333267 | 467 | 5 | 4270 | 0.14% | 0.00% | 1.28% | 1.42% | 6
On Jobs | 140382 | 0 | 0 | 17 | 0.00% | 0.00% | 0.01% | 0.01% | 16
Gaps in Jobs | 120601 | 15 | 0 | 175 | 0.01% | 0.00% | 0.15% | 0.16% | 13
Gov't Training | 31226 | 38 | 0 | 59 | 0.12% | 0.00% | 0.19% | 0.31% | 12
Other Training | 45002 | 7 | 0 | 736 | 0.02% | 0.00% | 1.64% | 1.65% | 5
Fertility | 462288 | 9141 | 891 | 15739 | 1.98% | 0.19% | 3.40% | 5.57% | 1
Child Care | 114317 | 201 | 13 | 1157 | 0.18% | 0.01% | 1.01% | 1.20% | 8
Health | 52866 | 35 | 3 | 29 | 0.07% | 0.01% | 0.05% | 0.13% | 14
Alcohol | 314511 | 33 | 47 | 2234 | 0.01% | 0.01% | 0.71% | 0.74% | 11
Drug Use | 414007 | 3464 | 300 | 7454 | 0.84% | 0.07% | 1.80% | 2.71% | 2
Income & Assets | 439646 | 2945 | 241 | 938 | 0.67% | 0.05% | 0.21% | 0.94% | 10
Attitudes | 13427 | 214 | 2 | 29 | 1.59% | 0.01% | 0.22% | 1.82% | 3
Total | 3067473 | 18209 | 2030 | 36918 | 0.59% | 0.07% | 1.20% | 1.86% | -
Table 1.3 shows the amount of nonresponse in the 1989 survey. The most problematic section is "Income," with missing data in 1.3 percent of its questions; the CPS section is a close second with 1.2 percent. Unlike earlier years, the major missing data problem in both the "Income" (1 percent) and CPS (0.8 percent) sections is don't knows, not invalid skips (0.1 percent for income and 0.4 percent for CPS).
Table 1.3. Extent of refusals, don't knows, and invalid skips in 1989
Section Name | Number Questions Asked | Number Don't Knows | Number Refused | Number Invalid Skipped | Percent Don't Knows | Percent Refused | Percent Invalid Skipped | Total Percent Missed | Rank
Intro | 14647 | 20 | 1 | 41 | 0.14% | 0.01% | 0.28% | 0.42% | 7
Marital Status | 86563 | 372 | 121 | 450 | 0.43% | 0.14% | 0.52% | 1.09% | 3
Schooling | 76999 | 179 | 39 | 217 | 0.23% | 0.05% | 0.28% | 0.56% | 6
Military | 33579 | 1 | 1 | 40 | 0.00% | 0.00% | 0.12% | 0.13% | 10
CPS | 406265 | 3320 | 52 | 1650 | 0.82% | 0.01% | 0.41% | 1.24% | 2
On Jobs | 39749 | 0 | 0 | 1 | 0.00% | 0.00% | 0.00% | 0.00% | 12
Gaps in Jobs | 91565 | 91 | 1 | 894 | 0.10% | 0.00% | 0.98% | 1.08% | 4
Gov't Training | 49657 | 118 | 35 | 233 | 0.24% | 0.07% | 0.47% | 0.78% | 5
Fertility | 152546 | 6 | 35 | 92 | 0.00% | 0.02% | 0.06% | 0.09% | 11
Health | 154024 | 120 | 74 | 168 | 0.08% | 0.05% | 0.11% | 0.24% | 9
Alcohol | 217441 | 74 | 400 | 201 | 0.03% | 0.18% | 0.09% | 0.31% | 8
Income | 470686 | 4761 | 1124 | 439 | 1.01% | 0.24% | 0.09% | 1.34% | 1
Total | 1793721 | 9062 | 1883 | 4426 | 0.51% | 0.10% | 0.25% | 0.86% | -
Table 1.4 shows that the most problematic area in the 1994 survey is the asset questions, which are missing 2.5 percent of their answers (about three-quarters of those missing being don't knows). The second most problematic area is the income questions, which are missing 1.3 percent of their answers. While refusal rates were not an issue in the three previous surveys, the 1994 survey shows refusals becoming significant. Slightly more than half a percent (0.55 percent) of the "Assets" section questions and more than one fifth of a percent (0.22 percent) of the "Income" section questions were refused, and the refusal rate is highest in the "Drugs" section (0.79 percent).
Table 1.4. Extent of refusals, don't knows, and invalid skips in 1994
Section Name | Number Questions Asked | Number Don't Knows | Number Refused | Number Invalid Skipped | Percent Don't Knows | Percent Refused | Percent Invalid Skipped | Total Percent Missed | Rank
Intro | 36251 | 62 | 14 | 0 | 0.17% | 0.04% | 0.00% | 0.21% | 12
Marital Status | 137540 | 1522 | 193 | 0 | 1.11% | 0.14% | 0.00% | 1.25% | 3
School | 60166 | 302 | 2 | 0 | 0.50% | 0.00% | 0.00% | 0.51% | 7
Military | 27372 | 6 | 1 | 0 | 0.02% | 0.00% | 0.00% | 0.03% | 15
CPS | 269452 | 28 | 9 | 0 | 0.01% | 0.00% | 0.00% | 0.01% | 17
On Jobs | 79567 | 6 | 7 | 0 | 0.01% | 0.01% | 0.00% | 0.02% | 16
Employer Supplement | 1060679 | 7092 | 1342 | 8 | 0.67% | 0.13% | 0.00% | 0.80% | 5
Training | 194147 | 246 | 29 | 47 | 0.13% | 0.01% | 0.02% | 0.17% | 13
Fertility | 450871 | 1859 | 763 | 0 | 0.41% | 0.17% | 0.00% | 0.58% | 6
Child Care | 26453 | 109 | 12 | 0 | 0.41% | 0.05% | 0.00% | 0.46% | 9
Relationship | 81477 | 285 | 113 | 0 | 0.35% | 0.14% | 0.00% | 0.49% | 8
Health | 282702 | 623 | 199 | 0 | 0.22% | 0.07% | 0.00% | 0.29% | 11
Alcohol | 164663 | 46 | 61 | 0 | 0.03% | 0.04% | 0.00% | 0.06% | 14
Income | 305693 | 3176 | 672 | 1 | 1.04% | 0.22% | 0.00% | 1.26% | 2
Program Participation | 118305 | 297 | 63 | 0 | 0.25% | 0.05% | 0.00% | 0.30% | 10
Assets | 169301 | 3239 | 930 | 1 | 1.91% | 0.55% | 0.00% | 2.46% | 1
Drugs | 204621 | 772 | 1626 | 0 | 0.38% | 0.79% | 0.00% | 1.17% | 4
Total | 3669260 | 19670 | 6036 | 57 | 0.54% | 0.16% | 0.00% | 0.70% | -
Table 1.5 examines the 1998 survey. Because the survey was fielded every other year in the late 1990s, there is no 1999 interview, which would have continued the five-year pattern exactly; the 1998 survey is used as the closest substitute. Like the 1994 table, this table shows that the most problematic area is again the asset questions, which are missing 3.6 percent of their answers (75 percent of those missing being don't knows). The second most problematic area is the marital history questions, which included a new section asking detailed questions about the work history and past life of the respondent's spouse. This expanded section is missing 1.8 percent of its answers. In the 1998 survey only two sections have relatively high refusal rates: assets (0.90 percent) and drug use (0.68 percent).
Table 1.5. Extent of refusals, don't knows, and invalid skips in 1998
Section Name | Number Questions Asked | Number Don't Knows | Number Refused | Number Invalid Skipped | Percent Don't Knows | Percent Refused | Percent Invalid Skipped | Total Percent Missed | Rank
Intro | 10060 | 6 | 4 | 0 | 0.06% | 0.04% | 0.00% | 0.10% | 12
Marital Status | 207805 | 3296 | 520 | 1 | 1.59% | 0.25% | 0.00% | 1.84% | 2
School | 53928 | 197 | 45 | 0 | 0.37% | 0.08% | 0.00% | 0.56% | 10
Military | 25691 | 0 | 0 | 0 | 0.00% | 0.00% | 0.00% | 0.00% | 15
CPS | 301160 | 44 | 12 | 0 | 0.01% | 0.00% | 0.00% | 0.02% | 13
On Jobs | 117144 | 2 | 0 | 1 | 0.00% | 0.00% | 0.00% | 0.00% | 14
Employer Supplement | 1081493 | 10265 | 1441 | 1 | 0.95% | 0.13% | 0.00% | 1.08% | 3
Training | 241013 | 1559 | 143 | 1 | 0.65% | 0.06% | 0.00% | 0.71% | 7
Fertility | 578831 | 3180 | 1097 | 50 | 0.55% | 0.19% | 0.01% | 0.75% | 6
Child Care | 23241 | 57 | 11 | 1 | 0.25% | 0.05% | 0.00% | 0.30% | 11
Relationship | 86632 | 371 | 154 | 0 | 0.43% | 0.18% | 0.00% | 0.61% | 9
Health | 350533 | 2460 | 223 | 0 | 0.70% | 0.06% | 0.00% | 0.77% | 5
Income | 608849 | 3410 | 847 | 10 | 0.56% | 0.14% | 0.00% | 0.70% | 8
Assets | 174570 | 4702 | 1566 | 10 | 2.69% | 0.90% | 0.01% | 3.60% | 1
Drugs | 217175 | 419 | 1485 | 0 | 0.19% | 0.68% | 0.00% | 0.88% | 4
Total | 4078125 | 29968 | 7548 | 75 | 0.73% | 0.19% | 0.00% | 0.92% | -
Table 1.6 examines the 2004 survey. This survey has two new sections that do not appear in the previous tables. The first, part of the employer supplement, asks respondents detailed questions about the pensions available from their employers and their participation in these pensions. This new section ranks first in problems, with missing responses to 2.5 percent of its questions. The second new section is the Over 40 Health module, which is intended to give researchers a baseline health measure that will be updated at ten-year intervals. This health section ranks 8th out of 13 sections, with a nonresponse rate slightly above three-quarters of one percent.
Table 1.6. Extent of refusals, don't knows, and invalid skips in 2004
Section Name | Number Questions Asked | Number Don't Knows | Number Refused | Number Invalid Skipped | Percent Don't Knows | Percent Refused | Percent Invalid Skipped | Total Percent Missed | Rank
Intro | 91277 | 39 | 16 | 4 | 0.04% | 0.02% | 0.00% | 0.06% | 12
Marital Status | 77954 | 371 | 66 | 106 | 0.48% | 0.08% | 0.14% | 0.70% | 9
School | 56716 | 554 | 39 | 4 | 0.98% | 0.07% | 0.01% | 1.05% | 7
Military | 39772 | 20 | 5 | 0 | 0.05% | 0.01% | 0.00% | 0.06% | 13
Employer Supplement | 734366 | 7729 | 1001 | 275 | 1.05% | 0.15% | 0.04% | 1.23% | 6
Pensions | 189861 | 3753 | 508 | 485 | 1.98% | 0.27% | 0.26% | 2.50% | 1
Training | 307708 | 2943 | 887 | 322 | 0.96% | 0.29% | 0.10% | 1.35% | 5
Fertility | 521658 | 5801 | 733 | 1216 | 1.11% | 0.14% | 0.23% | 1.49% | 3
Child Care | 34561 | 12 | 4 | 7 | 0.03% | 0.01% | 0.02% | 0.07% | 11
Relationship | 1004 | 2 | 0 | 0 | 0.20% | 0.00% | 0.00% | 0.20% | 10
Over 40 Health | 622644 | 4386 | 402 | 14 | 0.70% | 0.06% | 0.00% | 0.77% | 8
Income | 412656 | 4382 | 1199 | 39 | 1.06% | 0.29% | 0.01% | 1.36% | 4
Assets | 626393 | 12726 | 2634 | 233 | 2.03% | 0.42% | 0.04% | 2.49% | 2
Total | 3716570 | 42718 | 7494 | 2705 | 1.15% | 0.20% | 0.07% | 1.42% | -
This section details the amount of missing data associated with each respondent. For each survey year there are two tables. The first (for example, Table 2.1.1) shows how many respondents refused, answered don't know, or were incorrectly skipped over a given number of questions. The second (for example, Table 2.1.2) shows the same distribution by the percentage of each respondent's questions affected.
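Both kinds of respondent-level table can be produced from per-respondent tallies. In this Python sketch, each respondent contributes a pair (number of questions with a given kind of nonresponse, number of questions that respondent was asked). The function names are hypothetical, and the sketch assumes percentages are truncated to whole numbers, which matches reading the "0%" row as "less than one percent."

```python
from collections import Counter


def count_bin(n):
    """Row label for the number-of-questions tables: 0 to 10, or '> 10'."""
    return str(n) if n <= 10 else "> 10"


def percent_bin(missing, asked):
    """Row label for the percent tables.

    Assumption: the percentage is truncated (integer floor division), so
    '0%' means less than one percent of the respondent's questions.
    """
    pct = (100 * missing) // asked if asked else 0
    return f"{pct}%" if pct <= 10 else "> 10%"


def tabulate(respondents):
    """Tabulate one nonresponse type (e.g. refusals).

    `respondents` is a list of (missing_count, asked_count) pairs for the
    interviewed respondents; noninterviewed respondents are left out, as in
    the tables' notes.
    """
    by_count = Counter(count_bin(m) for m, _ in respondents)
    by_percent = Counter(percent_bin(m, a) for m, a in respondents)
    return by_count, by_percent
```

Truncation rather than rounding is the key design choice here; if the published tables were built by rounding, the "0%" and "1%" rows would shift slightly.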
The top line of Table 2.1.1 shows that, in the 1979 survey, 12,527 respondents never refused to answer a question. While refusals are quite rare in this survey round, don't knows and incorrect skips are frequent. Only 5,084 respondents gave no don't know responses, and only 2,347 respondents were sent through the entire questionnaire without any sequencing errors. Subtracting these numbers from the 12,686 total respondents shows that 60 percent, or 7,602 respondents, said they did not know the answer to at least one question, and 81.5 percent, or 10,339 respondents, were incorrectly skipped somewhere in the questionnaire.
The top line of Table 2.1.2, which examines the percentage of questions missing data, shows a similar picture. Refusal rates are low: 12,620 respondents refused less than one percent of their questions, so only 66 respondents refused one percent or more of the questions they were expected to answer. About 65 percent, or 8,185 respondents, answered don't know to less than one percent of their questions, leaving about 35 percent (4,501 respondents) at one percent or more. Again, incorrect skips affected the most respondents. Only 4,313 respondents were incorrectly skipped over less than one percent of their questions, while 8,373 were incorrectly skipped over one percent or more, and 227 were skipped over more than 10 percent.
Refusal rates have increased steadily over time, even though the more difficult respondents have presumably left the survey. Tables 2.2.1 and 2.2.2, which examine the 1984 survey, show an increase over the 1979 refusal rates. While the number of respondents answering the survey is shrinking, the number refusing to answer questions is increasing. For example, in 1979 only 10 respondents refused to answer more than 10 questions; in 1984 there were 41. The general upward pattern continues from Tables 2.3.1 and 2.3.2, which examine 1989, through Tables 2.6.1 and 2.6.2, which examine 2004: the number refusing more than 10 questions is 38 in 1989, 138 in 1994, and 216 in 1998, and by 2004 there were 185 such respondents.
Increasing refusal rates are also seen in the percentage side of the table. In 1979, only 66 respondents refused to answer one percent or more of the questions they were asked. This increased in subsequent surveys to 320 respondents in 1984, 355 respondents in 1989, 480 respondents in 1994, 549 respondents in 1998, and 655 respondents in 2004.
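These counts follow from each percent table by subtracting the "0%" row from the number of respondents interviewed that year. This Python check assumes that base is the 12,686 respondents of 1979 less the noninterviews given in each table's note; the 2004 inputs are left out of this sketch.

```python
BASE_SAMPLE = 12686  # total respondents in the 1979 survey

# year: (respondents not interviewed, respondents in the "0%" refusal row)
REFUSAL_ZERO_ROW = {
    1979: (0, 12620),
    1984: (617, 11749),
    1989: (2081, 10250),
    1994: (3797, 8409),
    1998: (4287, 7850),
}


def refused_one_percent_or_more(not_interviewed, zero_row):
    """Interviewed respondents minus those who refused under one percent."""
    interviewed = BASE_SAMPLE - not_interviewed
    return interviewed - zero_row


for year, (nonint, zero_row) in REFUSAL_ZERO_ROW.items():
    print(year, refused_one_percent_or_more(nonint, zero_row))
```

The printed values (66, 320, 355, 480, and 549) reproduce the figures quoted above for 1979 through 1998.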
"Don't know" rates have also risen over time. In the 1979 survey, 8,185 respondents had less than one percent of their questions coded as don't knows. This number drops to 7,003 in 1984, 6,423 in 1989, 5,942 in 1994, 4,741 in 1998, and 3,185 in 2004; part of this decline reflects the shrinking number of interviewed respondents (8,399 in 1998, compared with 12,686 in 1979). Although rates have risen, relatively few individuals have high levels of don't knows. In 1979, only 68 respondents did not know the answer to more than five percent of the questions they were asked. This number fell to 19 in 1984, rose to 66 in 1989, fell back to 46 in 1994, returned to 66 in 1998, and reached 149 in 2004.
While don't know and refusal rates have risen, incorrect skip problems have clearly shrunk over time. In 1979, only 2,347 respondents were correctly sequenced through the entire survey. This number rose to 7,802 in 1984 and to 9,334 in 1989. In 1994 and 1998 almost every respondent was correctly sequenced: only 57 and 46 respondents, respectively, were incorrectly skipped through any part of the survey. In 1994 each of these 57 respondents was incorrectly skipped over a single question, and in 1998 none was skipped over more than four questions. In 2004, 349 respondents were incorrectly skipped over one percent of their questions, and 22 were incorrectly skipped over 2 percent or more.
Nonresponse by Respondents in 1979 survey
Table 2.1.1 Number of respondents with missing data by number of questions in 1979 survey
Number of Questions | Number of Respondents: Refused | Number of Respondents: Didn't Know | Number of Respondents: Was Incorrectly Skipped Over
0 | 12527 | 5084 | 2347
1 | 91 | 2974 | 1897
2 | 26 | 1723 | 1393
3 | 13 | 1016 | 1158
4 | 5 | 629 | 838
5 | 2 | 376 | 596
6 | 1 | 228 | 489
7 | 3 | 173 | 502
8 | 3 | 131 | 420
9 | 1 | 84 | 340
10 | 4 | 57 | 308
> 10 | 10 | 211 | 2398
Table 2.1.2 Number of respondents with missing data by percent of questions in 1979 survey
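Percent of Questions | Number of Respondents: Refused | Number of Respondents: Didn't Know | Number of Respondents: Was Incorrectly Skipped Over
0% | 12620 | 8185 | 4313
1% | 43 | 3247 | 3421
2% | 7 | 773 | 1733
3% | 5 | 264 | 989
4% | 5 | 101 | 621
5% | 0 | 48 | 397
6% | 2 | 27 | 312
7% | 1 | 18 | 278
8% | 1 | 6 | 206
9% | 0 | 7 | 118
10% | 0 | 2 | 71
> 10% | 2 | 8 | 227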
Nonresponse by Respondents in 1984 survey
Table 2.2.1 Number of respondents with missing data by number of questions in 1984 survey
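Number of Questions | Number of Respondents: Refused | Number of Respondents: Didn't Know | Number of Respondents: Was Incorrectly Skipped Over
0 | 11222 | 4549 | 7802
1 | 610 | 3012 | 1289
2 | 73 | 1901 | 622
3 | 44 | 1136 | 413
4 | 38 | 668 | 252
5 | 13 | 345 | 369
6 | 6 | 177 | 174
7 | 1 | 108 | 93
8 | 7 | 63 | 115
9 | 4 | 38 | 73
10 | 10 | 28 | 64
> 10 | 41 | 44 | 803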
Note: Not included in this table are 617 respondents who did not answer the survey.
Table 2.2.2 Number of respondents with missing data by percent of questions in 1984 survey
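Percent of Questions | Number of Respondents: Refused | Number of Respondents: Didn't Know | Number of Respondents: Was Incorrectly Skipped Over
0% | 11749 | 7003 | 8956
1% | 207 | 3807 | 1267
2% | 44 | 944 | 674
3% | 13 | 213 | 284
4% | 15 | 62 | 133
5% | 13 | 21 | 84
6% | 10 | 11 | 139
7% | 4 | 2 | 137
8% | 5 | 3 | 107
9% | 2 | 0 | 68
10% | 2 | 3 | 36
> 10% | 5 | 0 | 184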
Note: Not included in this table are 617 respondents who did not answer the survey.
Nonresponse by Respondents in 1989 survey
Table 2.3.1 Number of respondents with missing data by number of questions in 1989 survey
Number of Questions | Number of Respondents: Refused | Number of Respondents: Didn't Know | Number of Respondents: Was Incorrectly Skipped Over
0 | 10221 | 6135 | 9334
1 | 171 | 2517 | 781
2 | 59 | 1036 | 189
3 | 37 | 395 | 35
4 | 20 | 194 | 20
5 | 21 | 131 | 16
6 | 7 | 75 | 7
7 | 10 | 34 | 125
8 | 10 | 24 | 18
9 | 4 | 10 | 9
10 | 7 | 6 | 3
> 10 | 38 | 48 | 68
Note: Not included in this table are 2,081 respondents who did not answer the survey.
Table 2.3.2 Number of respondents with missing data by percent of questions in 1989 survey
Percent of Questions | Number of Respondents: Refused | Number of Respondents: Didn't Know | Number of Respondents: Was Incorrectly Skipped Over
0% | 10250 | 6423 | 9461
1% | 193 | 3221 | 843
2% | 58 | 561 | 51
3% | 35 | 219 | 69
4% | 13 | 76 | 86
5% | 10 | 39 | 24
6% | 4 | 24 | 10
7% | 4 | 17 | 10
8% | 3 | 1 | 5
9% | 3 | 3 | 9
10% | 3 | 8 | 3
> 10% | 29 | 13 | 34
Note: Not included in this table are 2,081 respondents who did not answer the survey.
Nonresponse by Respondents in 1994 survey
Table 2.4.1 Number of respondents with missing data by number of questions in 1994 survey
Number of Questions | Refused | Didn't Know | Was Incorrectly Skipped Over
0 | 7168 | 3559 | 8832
1 | 1129 | 1780 | 57
2 | 191 | 1082 | 0
3 | 87 | 693 | 0
4 | 41 | 443 | 0
5 | 28 | 334 | 0
6 | 29 | 232 | 0
7 | 22 | 171 | 0
8 | 21 | 115 | 0
9 | 17 | 105 | 0
10 | 18 | 72 | 0
> 10 | 138 | 303 | 0
Note: Not included in this table are 3,797 respondents who did not answer the survey.
Table 2.4.2 Number of respondents with missing data by percent of questions in 1994 survey
Percent of Questions | Refused | Didn't Know | Was Incorrectly Skipped Over
0% | 8409 | 5942 | 8889
1% | 246 | 2060 | 0
2% | 81 | 558 | 0
3% | 41 | 165 | 0
4% | 31 | 79 | 0
5% | 20 | 39 | 0
6% | 19 | 16 | 0
7% | 6 | 15 | 0
8% | 10 | 4 | 0
9% | 9 | 2 | 0
10% | 4 | 2 | 0
> 10% | 13 | 7 | 0
Note: Not included in this table are 3,797 respondents who did not answer the survey.
Nonresponse by Respondents in 1998 survey
Table 2.5.1 Number of respondents with missing data by number of questions in 1998 survey
Number of Questions | Refused | Didn't Know | Was Incorrectly Skipped Over
0 | 7248 | 2497 | 8353
1 | 473 | 1355 | 21
2 | 162 | 1020 | 23
3 | 83 | 729 | 0
4 | 60 | 589 | 2
5 | 42 | 447 | 0
6 | 35 | 343 | 0
7 | 26 | 277 | 0
8 | 19 | 201 | 0
9 | 23 | 169 | 0
10 | 12 | 120 | 0
> 10 | 216 | 652 | 0
Note: Not included in this table are 4,287 respondents who did not answer the survey.
Table 2.5.2 Number of respondents with missing data by percent of questions in 1998 survey
Percent of Questions | Refused | Didn't Know | Was Incorrectly Skipped Over
0% | 7850 | 4741 | 8385
1% | 254 | 2441 | 13
2% | 86 | 712 | 0
3% | 58 | 283 | 1
4% | 54 | 110 | 0
5% | 27 | 46 | 0
6% | 30 | 25 | 0
7% | 14 | 11 | 0
8% | 4 | 7 | 0
9% | 8 | 9 | 0
10% | 2 | 5 | 0
> 10% | 12 | 9 | 0
Note: Not included in this table are 4,287 respondents who did not answer the survey.
Nonresponse by Respondents in 2004 survey
Table 2.6.1 Number of respondents with missing data by number of questions in 2004 survey
Number of Questions | Refused | Didn't Know | Was Incorrectly Skipped Over
0 | 6531 | 1524 | 6539
1 | 298 | 993 | 440
2 | 194 | 755 | 334
3 | 171 | 624 | 145
4 | 78 | 592 | 42
5 | 45 | 486 | 98
6 | 51 | 387 | 29
7 | 45 | 360 | 13
8 | 29 | 314 | 3
9 | 23 | 235 | 5
10 | 11 | 178 | 7
> 10 | 185 | 1213 | 6
Note: Not included in this table are 5,025 respondents who did not answer the survey.
Table 2.6.2 Number of respondents with missing data by percent of questions in 2004 survey
Percent of Questions | Refused | Didn't Know | Was Incorrectly Skipped Over
0% | 7006 | 3185 | 7290
1% | 384 | 2399 | 349
2% | 106 | 1122 | 18
3% | 48 | 477 | 2
4% | 40 | 226 | 1
5% | 18 | 103 | 0
6% | 16 | 68 | 0
7% | 10 | 29 | 0
8% | 8 | 14 | 0
9% | 8 | 17 | 0
10% | 3 | 6 | 1
> 10% | 14 | 15 | 0
Note: Not included in this table are 5,025 respondents who did not answer the survey.
How much missing data are associated with particular questions? This section gives an in-depth view of the questions within the survey sections that have the most missing data. As in the previous parts, it provides tables for each of the selected survey years. The first table (Table 3.1) examines questions from the "Work Experience" section of the 1979 survey, which has more missing data (14.5 percent) than any other 1979 survey section. The second set of tables (Tables 3.2 through 3.6) examines the most problematic section of the 1984 survey, "Fertility and Abortion." The third set (Tables 3.7 and 3.8) examines the most problematic 1989 survey section, "Income and Assets." Because the "Income and Assets" section again ranked first in missing data in 1994, the next set of tables (Tables 3.9 and 3.10) instead examines the "Drug and Alcohol Use Supplements," given the strong research interest in understanding nonresponse in these sections. Table 3.11 highlights nonresponse in the 1998 Marital History section, and Table 3.12 tracks nonresponse in the 2004 health section, including the over-40 health module.
To keep the tables manageable, sections that divide naturally (Fertility, for instance) are split, and only the most important questions with high rates of nonresponse are shown. Table 3.1, which examines missing data in the 1979 survey, shows that the most missing data are associated with a pair of retrospective questions that asked respondents to recall what happened two years earlier. Interviewers incorrectly skipped slightly fewer than 1,750 respondents past R01150. (weeks worked in 1977) and R01153. (hours worked per week in 1977). The 1979 questionnaire shows that these questions appear at the bottom of a page, preceded by a fairly complicated half page of instructions and questions that the interviewer must read, understand, and partly read aloud. It seems likely that many interviewers did not understand the instructions and skipped to the next page.
Table 3.1. Amount of missing data per question in the Work Experience section in 1979 survey
Reference Number | Variable Title | Invalid | Don't Know | Refusal
R01150. | Weeks Work in 1977 | 1735 | 11 | 1
R01151. | Weeks Work in 1976 | 418 | 18 | 1
R01152. | Weeks Work in 1975 | 240 | 11 | 0
R01153. | Hours/Week Work in 1977 | 1749 | 13 | 0
R01154. | Hours/Week Work in 1976 | 459 | 16 | 0
R01165. | Industry of 1st Job after School | 628 | 4 | 1
R01166. | Occupation at 1st Job after School | 627 | 3 | 1
R01167. | Hours/Week Work at 1st Job after School | 631 | 6 | 1
R01168. | Hours/Day at 1st Job after School | 632 | 6 | 1
R01169. | Rate of Pay at 1st Job after School | 632 | 32 | 2
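Per-question counts like those in Table 3.1 amount to tallying reserved missing-value codes for each variable. The sketch below assumes the negative codes commonly used in NLSY79 data releases (-1 refusal, -2 don't know, -3 invalid skip); this section does not itself state that coding, so treat it as an assumption and check it against the codebook. The function name and the sample values are hypothetical and are not actual responses to R01150.

```python
from collections import Counter

# Assumed NLSY79 reserved codes; verify against the codebook before use.
# Valid skips and noninterviews are deliberately not counted here.
MISSING_CODES = {-3: "Invalid", -2: "Don't Know", -1: "Refusal"}


def tally_missing(values):
    """Count invalid skips, don't knows, and refusals for one variable."""
    counts = Counter(values)
    # Counter returns 0 for codes that never occur.
    return {label: counts[code] for code, label in MISSING_CODES.items()}


# Hypothetical responses to a weeks-worked item; not real NLSY79 data.
weeks_worked = [52, -3, 40, -2, -3, 0, -1, 12]
print(tally_missing(weeks_worked))
```

Applying the same function to every variable in a section yields one row of a table like Table 3.1 per variable.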
Tables 3.2-3.6, which examine the "Fertility" section, show far fewer invalid skips in every part except the abortion questions. Although invalid skips there do not reach the level seen in Table 3.1, on average 190 female respondents were not asked each abortion question (190 is an average over all abortion questions, not just those shown in the tables). The tables also show several other trends. First, the more precise the question, the more don't know answers respondents give. For example, in Table 3.2, when males were asked the date of birth of their first child, only one did not know the year, three did not know the month, and 10 did not know the day. This phenomenon is most clearly seen in Table 3.5, which covers the year and month of the respondent's first sexual intercourse: only 43 respondents did not know the year, but 1,410 did not know the month. The same pattern appears in the abortion data, where only four respondents did not know the year of their first abortion, but 13 did not know the month.
Refusal rates in the "Fertility" section are quite low except for a few key questions. The question asking how many times respondents had sex in the past month drew high refusal rates from both males and females: 167 males and 135 females refused. Interestingly, most individuals were willing to say whether they had ever had sex; only 45 males and 54 females refused these questions. Birth control questions did not have exceptionally high refusal rates: seventeen female respondents and no males refused to answer them. Table 3.6 shows that 28 females refused to say whether they had ever had an abortion, and 28 refused to state whether they dropped out of school before they terminated the pregnancy.
Table 3.2. Amount of missing data per question in male Fertility section in 1984 survey
Reference Number | Variable Title | Invalid | Don't Know | Refusal
R13017. | Ever Had Any Children | 0 | 3 | 0
R13019. | Month Birth Child#1 Born | 41 | 3 | 0
R13020. | Day Birth Child #1 Born | 45 | 10 | 0
R13021. | Year Birth Child#1 Born | 39 | 1 | 0
R13022. | Sex of Child#1 Born | 3 | 0 | 0
R13115. | Total #Children Expect to Have | 12 | 45 | 3
R13117. | #Years Expect Have 1st/Next Child | 22 | 120 | 0
R13118. | Had Any Children/Expecting | 0 | 7 | 0
R13119. | Current Pregnancy Planned | 131 | 0 | 0
R13121. | Ever Had Sexual Intercourse | 12 | 0 | 45
R13122. | Age @First Sexual Intercourse | 28 | 19 | 23
R13123. | #Times Sexual Intercourse Past Month | 11 | 68 | 167
R13124. | Is Partner Now Pregnant | 0 | 1 | 0
R13125. | Use Any Birth Control During Last Month | 15 | 2 | 0
R13126. | #Times Try Prevent Pregnancy | 65 | 0 | 0
R13127.-R13141. | Method of Birth Control | 16 | 0 | 0
R13142. | Ever Have a Sex Education Course | 10 | 0 | 12
R13148. | Month Took Sex-Ed Course | 73 | 564 | 0
R13149. | Year Took Sex-Ed Course | 36 | 58 | 0
R13150. | Time When Pregnancy Most Likely | 19 | 1480 | 20
Table 3.3. Amount of missing data per question in female Fertility section in 1984 survey
Reference Number | Variable Title | Invalid | Don't Know | Refusal
R13191. | #Pregnancies | 8 | 0 | 0
R13251. | Use Any Birth Control before Preg#1 | 18 | 0 | 1
R13254. | Want Be Pregnant before Preg#1 | 20 | 0 | 0
R13255. | Husband/Partner Want Preg#1 | 19 | 20 | 0
R13283. | Get Prenatal Care Preg#1 | 57 | 0 | 0
R13286. | Frequency Alcohol Use Preg#1 | 58 | 0 | 0
R13288. | #Cigarettes Smoked Preg#1 | 56 | 0 | 0
R13297. | X-Rays Taken Preg#1 | 57 | 0 | 0
R13302. | Sonogram Preg#1 | 57 | 6 | 0
R13358. | Amniocentesis Preg#1 | 57 | 0 | 0
R13411. | Took Vitamins Preg#1 | 57 | 0 | 0
R13443. | C-Section Child#1 Born | 52 | 0 | 0
R13445. | Weight at Delivery, Preg#1 | 53 | 5 | 1
R13446. | Weight before Preg#1 | 51 | 5 | 1
R13449. | Length Child#1 Born at Birth | 53 | 20 | 0
R13667. | Weight of Child#1 @Birth Lbs | 25 | 6 | 0
Table 3.4. Amount of missing data per question in feeding part of Fertility section in 1984 survey
Reference Number | Variable Title | Invalid | Don't Know | Refusal
R13670. | Child#1 Breastfed | 27 | 0 | 0
R13672. | Month Age Child#1 Breast Fed Ended | 27 | 1 | 0
R13674. | Month Age Child#1 Formula Fed | 38 | 3 | 0
R13693. | Wk Age Child#1 Formula Fed Ended | 57 | 0 | 0
R13694. | Month Age Child#1 Formula Fed Ended | 57 | 6 | 0
R13696. | Months Age Child#1 - Cow's Milk | 81 | 10 | 0
R13698. | Months Age Child#1 - Solid Food | 86 | 10 | 0
Table 3.5. Amount of missing data per question in child part of Fertility section in 1984 survey
Reference Number | Variable Title | Invalid | Don't Know | Refusal
R13791. | Age Had 1st Menstrual Period | 8 | 14 | 22
R13792. | Year 1st Menstrual Period | 0 | 7 | 0
R13793. | Month Had 1st Menstrual Period | 17 | 2207 | 1
R13794. | R Ever Been Pregnant | 0 | 1 | 0
R13795. | Ever Had Sexual Intercourse | 4 | 0 | 54
R13796. | Age First Sexual Intercourse | 5 | 26 | 78
R13797. | Year 1st Sexual Intercourse | 0 | 43 | 66
R13798. | Month Sexual Intercourse 1st Time | 19 | 1410 | 75
R13799. | #Times Sexual Intercourse Past Month | 9 | 104 | 135
R13802. | #Times Try Prevent Pregnant Past Month | 17 | 0 | 2
Table 3.6. Amount of missing data per question in abortion questions of Fertility section in 1984 survey
Reference Number | Variable Title | Invalid | Don't Know | Refusal
R13827. | Ever Had An Abortion | 135 | 0 | 28
R13828. | # of Abortions | 143 | 0 | 0
R13830. | Year of 1st Reported Abortion | 196 | 4 | 0
R13837. | Drop out School #1 Pregnant | 155 | 0 | 28
R13839. | Year Left School 1st Time Pregnant | 164 | 0 | 0
R13841. | Year Return School Time#1 after Pregnant | 258 | 0 | 0
Tables 3.7 and 3.8 examine the "Income and Assets" section of the 1989 survey. Although invalid skips are relatively rare in this section, refusals and don't know answers are fairly common. The income question with the most missing data is R29822., which asks how much income was earned by other adults living in the household who were related to the respondent. Although the preceding questions showed that most respondents knew what types of income these family members received, 958 could not give a specific amount. The second most problematic income question, with 11 invalid skips, 155 don't knows, and 113 refusals, was R29714., which asked how much the respondent earned from wages, salary, and tips.
Other questions with high numbers of don't knows are R29813., which asks about the amount of money received from other sources such as interest and dividends; R29825., which asks about a partner's income; and R29828., which asks for the number of exemptions claimed on the 1988 Federal tax return.
The asset table (Table 3.8) also shows that invalid skips are rare but don't knows and refusals are not. Surprisingly, one of the questions with the most missing data (315 missing answers) asks how much the respondent's car is worth (R29852.). Another question missing many observations asks the amount of the respondent's savings (R29835.). While the car value question mainly elicits don't knows, the savings question drew 160 refusals. Three other questions elicited high numbers of don't knows: the value of stocks and bonds (R29837.), 219 don't knows; the amount taken out of savings last year (R29842.), 222 don't knows; and the market value of other items such as jewelry (R29854.), 151 don't knows.
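To compare these questions on one scale, the short sketch below ranks a handful of the income and asset questions discussed above by total missing answers (invalid + don't know + refusal) and reports what share of each total is refusals. The counts are copied from Tables 3.7 and 3.8; the choice of questions and the helper name are illustrative.

```python
# (title, invalid, don't know, refusal) copied from Tables 3.7 and 3.8.
QUESTIONS = {
    "R29714.": ("Amount Rec from Wages/Salary/Tips", 11, 155, 113),
    "R29822.": ("Income Rec'd by Adults Related To R", 7, 958, 8),
    "R29825.": ("Total Income Rec'd before Deduct", 2, 200, 4),
    "R29828.": ("Exemptions Filed on 1988 Federal Tax", 62, 92, 3),
    "R29835.": ("Amount of Savings", 7, 166, 160),
    "R29837.": ("Current Market Value of Stocks", 2, 219, 23),
    "R29842.": ("How Much More Money Take out", 5, 222, 21),
    "R29852.": ("Amount Vehicle Sells for Today", 11, 293, 11),
    "R29854.": ("Market Value of Other Items", 5, 151, 25),
}


def rank_by_missing(questions):
    """Return (ref, title, total missing, refusal share), largest total first."""
    rows = []
    for ref, (title, invalid, dont_know, refusal) in questions.items():
        total = invalid + dont_know + refusal
        rows.append((ref, title, total, refusal / total))
    rows.sort(key=lambda row: row[2], reverse=True)
    return rows


for ref, title, total, share in rank_by_missing(QUESTIONS):
    print(f"{ref:9} {title:38} {total:4d}  refusals: {share:.0%}")
```

With these counts, R29822. leads with 973 missing answers, followed by savings (333) and vehicle value (315). Nearly half of the missing savings answers are refusals, compared with about 3 percent for vehicle value. Among the income questions alone, R29714. (279) remains second.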
Table 3.7. Amount of missing data per question in Income section in 1989 survey
Reference Number | Variable Title | Invalid | Don't Know | Refusal
R29714. | Amount Rec from Wages/Salary/Tips | 11 | 155 | 113
R29715. | In 1988 Receive Income from Own Business | 1 | 0 | 11
R29717. | How Much Did R Receive after Expenses | 6 | 49 | 23
R29732. | Amount Rec'd Per Week from Unemployment | 0 | 5 | 1
R29736. | Amount Sp Rec'd 1988 from Wages | 16 | 17 | 70
R29754. | How Much Did Sp Receive from Unemployment | 8 | 12 | 0
R29758. | R/Spouse Rec'd Money for Child Support | 1 | 1 | 10
R29759. | Amount R/Spouse Rec'd Child Support | 2 | 14 | 2
R29760. | R/Spouse Rec'd AFDC Payments | 0 | 4 | 9
R29774. | R/Spouse Rec'd Food Stamps | 0 | 2 | 10
R29788. | R/Spouse Rec'd SSI/Public Assistance | 0 | 4 | 9
R29808. | Rec'd Veteran Benefits | 1 | 1 | 10
R29812. | R/Spouse Rec'd Money from Oth So | 0 | 2 | 16
R29822. | Income Rec'd by Adults Related To R | 7 | 958 | 8
R29825. | Total Income Rec'd before Deduct | 2 | 200 | 4
R29826. | Sp File Federal Income Tax R | 0 | 2 | 13
R29827. | R'S Filing Status on Federal Ret | 11 | 8 | 2
R29828. | Exemptions Filed on 1988 Federal Tax | 62 | 92 | 3
Table 3.8. Amount of missing data per question in Asset section in 1989 survey
Reference Number | Variable Title | Invalid | Don't Know | Refusal
R29831. | Amount Property Selling for on Today | 5 | 53 | 10
R29832. | Amount R Owes on Property | 4 | 85 | 25
R29833. | Amount Other Debt R Owes on Property | 12 | 26 | 27
R29835. | Amount of Savings | 7 | 166 | 160
R29837. | Current Market Value of Stocks | 2 | 219 | 23
R29838. | R/Spouse Have Rights to Estate | 2 | 3 | 18
R29839. | Total Value of Estate | 3 | 90 | 6
R29840. | Put Money in/out of Savings | 1 | 3 | 28
R29841. | How Much More Money Put in | 6 | 110 | 53
R29842. | How Much More Money Take out | 5 | 222 | 21
R29843. | R Have Business Investment | 0 | 1 | 12
R29844. | R Have Investment in a Farm | 4 | 0 | 0
R29847. | Total Market Value of Business | 4 | 75 | 10
R29848. | Total Amount of Business Debt | 1 | 55 | 8
R29851. | How Much Does R Owe on Vehicle | 0 | 56 | 17
R29852. | Amount Vehicle Sells for Today | 11 | 293 | 11
R29854. | Market Value of Other Items | 5 | 151 | 25
R29856. | Total Amount R Owes | 1 | 73 | 13
Tables 3.9 and 3.10 examine the drug and alcohol use supplements in the 1994 survey. These CAPI (computer-assisted personal interviewing) modules have no invalid skips. Interestingly, refusal and don't know rates in the "Alcohol" section (Table 3.9) are extremely low. The question with the most refusals (nine) asks whether the respondent had a drink since the 1989 interview; the typical question in the "Alcohol" section received only two refusals. Don't know rates are also low. The largest number of don't knows, nine, occurs in R49803., which asks whether the respondent now needs to drink more alcohol to get drunk. On average, the "Alcohol" section records only 1.5 don't knows per question.
Table 3.9. Amount of missing data per question in Alcohol Use section in 1994 survey
Reference Number | Variable Title | Invalid | Don't Know | Refusal
R49791. | R Had Drink of Alcohol since 1989 | 0 | 3 | 9
R49792. | Had Alcoholic Beverage in Last 30 | 0 | 0 | 5
R49793. | Times Had 6/More Drinks Last | 0 | 0 | 1
R49794. | How Many of Last 30 Days Drank A | 0 | 6 | 2
R49795. | No. of Drinks on Avg. Day When R | 0 | 8 | 3
R49803. | Need More to Get Drunk Than Before | 0 | 9 | 0
R49808. | Arrested, in Police Trouble | 0 | 0 | 3
R49809. | Drink More Than Before | 0 | 4 | 3
These low numbers of refusals and don't knows are not seen in Table 3.10, which examines the "Drug Use" section. On average, the typical question in this supplement elicited 23 don't knows and 48 refusals. Readers should note that respondents generally filled in this supplement themselves rather than having the interviewer record their answers. To give respondents practice using the computer, the questionnaire first asked two practice questions unrelated to drug use: how many more children the respondent expects to have, and what type of entertainment (such as movies, concerts, or plays) the respondent attended in the last year. Refusal rates are high even for these two practice questions.
The most refusals (119) occur in R50532., which asks the respondent's age at first marijuana use. The second largest number (103) occurs in a similar question, R50536., which asks the age of first cocaine use. These same questions also have very high don't know counts (113 for marijuana and 48 for cocaine). One other question with a very high don't know count is R50525., which asks whether the respondent ever smoked cigarettes daily; 79 respondents said they did not know. Given that the question wording is straightforward, it is likely that some respondents used don't know as a polite way of refusing to answer.
Table 3.10. Amount of missing data per question in Drug Use section in 1994 survey
Reference Number | Variable Title | Invalid | Don't Know | Refusal
R50524. | R Smoked at Least 100 Cigrtts in Life? | 0 | 24 | 38
R50525. | R Ever Smoked Daily? | 0 | 79 | 49
R50526. | Age When R 1st Started Smoking Daily? | 0 | 33 | 12
R50531. | Total Occasion R Use Marijuana | 0 | 33 | 89
R50532. | Age 1st Time Used Marijuana | 0 | 113 | 119
R50533. | Most Recent Time Used Marijuana | 0 | 35 | 89
R50535. | How Many Occasions Used Cocaine | 0 | 19 | 86
R50536. | Age 1st Time Used Cocaine | 0 | 48 | 103
R50537. | Most Recent Time Used Cocaine | 0 | 15 | 78
R50539. | How Many Occasions Used Crack | 0 | 15 | 77
R50540. | Age 1st Time Used Crack | 0 | 33 | 82
R50541. | Most Recent Time Used Crack | 0 | 16 | 74
R50553. | R Used Heroin w/o Doctor's Instr | 0 | 9 | 53
Table 3.11 lists the ten questions with the most missing data in the 1998 Marital History section. They show that a large number of respondents (119 to 181 don't know answers, depending on the question) have difficulty with questions about their spouse's rate and amount of pay, hours worked, and weeks worked. Questions about the details of a spouse's previous marriage are also difficult for many respondents to answer.
Table 3.11. Amount of missing data per question in Marital History section in 1998 survey
Reference Number | Variable Title | Invalid | Don't Know | Refusal
R58067. | Rate of Pay for Spouse Main Job (Time Unit) | 0 | 181 | 49
R58204. | Age of Spouse at 1st Marriage | 0 | 213 | 2
R58125. | Spouse's Weekly Earnings at Main Job | 0 | 159 | 29
R58068. | Spouse Receive Overtime at Main Job | 0 | 151 | 26
R58127. | Estimate Spouse's Weekly Earning Main Job | 0 | 149 | 26
R58178. | Hours Spouse Works Per Week Usually | 0 | 170 | 1
R58177. | Number of Weeks Worked by Spouse in Last Year | 0 | 140 | 24
R58179. | Number Weeks Not Working by Spouse Last Year | 0 | 130 | 24
R58176. | Spouse Hourly Rate of Pay | 0 | 119 | 28
R58208. | Duration of Spouse's Previous Marriage? | 0 | 109 | 16
Table 3.12 examines the questions with the most missing data in the 2004 health section. In this table, reference numbers starting with "R" are questions asked of all respondents in the survey, while reference numbers starting with "H" are questions in the "over-40 health module." This module was designed to give researchers more information about respondents' health at age 40, and it is asked of respondents in the first interview after they turn 40.
Although other survey data show that many people know whether they are covered by health insurance, Table 3.12 reveals that many do not know the details of this coverage. For example, R83036., which asks whether the respondent's health insurance plan is an HMO, a preferred provider plan (PPO), or a network of affiliated doctors, had 428 missing responses (426 don't knows and 2 refusals) out of 6,175 total responses, a missing response rate of about 7 percent. Other questions with high don't know rates ask whether the respondent's children are covered by health insurance. The health question with the highest refusal rate asks how much the respondent weighs; 114 people refused to give the number. Finally, in the 40+ health module a number of NLSY79 respondents have difficulty answering questions about the health and life status of their biological father. This is not surprising, given that a small but significant number of respondents have stated in the past that they never met their biological father.
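The 7 percent figure for R83036. is simply the missing count divided by total responses. A minimal sketch of that arithmetic, using the 6,175 denominator cited in the text; the function name is illustrative.

```python
def missing_rate(invalid, dont_know, refusal, total_responses):
    """Share of responses to one question that are missing."""
    return (invalid + dont_know + refusal) / total_responses


# R83036. in Table 3.12: 0 invalid, 426 don't know, 2 refusals,
# out of the 6,175 total responses cited in the text.
rate = missing_rate(invalid=0, dont_know=426, refusal=2, total_responses=6175)
print(f"{rate:.1%}")  # prints 6.9%
```

The unrounded rate is 6.9 percent, which the text reports as about 7 percent.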
Table 3.12. Amount of missing data per question in Health section in 2004 survey
Reference Number | Variable Title | Invalid | Don't Know | Refusal
R83036. | Primary Insurance Plan HMO, Network, PPO | 0 | 426 | 2
R83037. | Is Primary Plan a PPO? | 0 | 388 | 2
R83070. | Children Have Health/Hospitalization Plan? | 0 | 328 | 15
R83038. | R's Primary Plan Need Authorization? | 0 | 301 | 0
H00015. | Date Most Recent General Physical Exam | 0 | 189 | 0
R82983. | How Much Does R Weigh? | 0 | 50 | 114
H00014. | Ever Had A General Physical Exam? | 0 | 147 | 2
H00017. | Cause Of Biological Dads Death | 0 | 133 | 10
H00019. | Bio Dad Have Major Health Problems? | 0 | 134 | 8
R82982. | Since What Date R Had This Health Limit | 0 | 120 | 0
R82992. | Length Light Moderate Activities 10 Min | 0 | 105 | 5
H00047. | Date Hypertension Diagnosed | 0 | 91 | 0
H00016. | Is R's Biological Dad Living? | 0 | 83 | 4
R82989. | Frequency of Light Mod Exercise 10 > Min | 0 | 75 | 6
H00018. | Age Of Biological Dad At Death | 0 | 68 | 1
H02445. | Date Most Recent Visit to Health Professional | 0 | 52 | 11
H00012. | R Ever Visit Health Care Professional? | 0 | 58 | 0
R83042. | Spouse Have Health/Hospital Plan | 0 | 32 | 24
R83048. | Spouse Employer Pay All Health Plan Cost? | 0 | 49 | 2
Note: Reference numbers that begin with the letter H are variables that combine data from different years of the over-40 health module. Researchers who want results from the 2004 survey alone should use variable H00002.00, titled "Source Year for 40+ Health Module Data," to select only the cases that answered these questions in 2004.
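The selection described in the note can be sketched as a simple filter on H00002.00. This is a minimal sketch under stated assumptions: the record layout, the "case_id" key, the function name, and all values are hypothetical, and it assumes H00002.00 stores the four-digit source year. Check the codebook for how that variable is actually coded.

```python
# Hypothetical records: a case ID plus NLSY79 reference numbers as keys.
# All values are invented; coding H00002.00 as a four-digit year is an
# assumption to verify against the codebook.
records = [
    {"case_id": 1, "H00002.00": 2004, "H00014.": 1},
    {"case_id": 2, "H00002.00": 1998, "H00014.": 1},
    {"case_id": 3, "H00002.00": 2004, "H00014.": 0},
    {"case_id": 4, "H00002.00": 2002, "H00014.": 1},
]

SOURCE_YEAR = "H00002.00"  # "Source Year for 40+ Health Module Data"


def module_cases_from(records, year):
    """Keep only cases whose 40+ health module data come from `year`."""
    return [record for record in records if record.get(SOURCE_YEAR) == year]


cases_2004 = module_cases_from(records, 2004)
print([record["case_id"] for record in cases_2004])  # prints [1, 3]
```

Missing-data counts for the H variables computed on `cases_2004` would then reflect only the 2004 administration of the module.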
References
Mott, Frank L. "Patterning of Child Assessment Completion Rates in the NLSY: 1986-1996." CHRR, The Ohio State University, 1998.
Mott, Frank L. "The Patterning of Female Teenage Sexual Behaviors and Attitudes." CHRR, The Ohio State University, 1994.
Mott, Frank L. "Fertility-Related Data in the 1982 National Longitudinal Surveys of Work Experience of Youth: An Evaluation of Data Quality and Some Preliminary Analytical Results." CHRR, The Ohio State University, 1983.
Olsen, Randall J. "The Effects of Computer Assisted Interviewing on Data Quality." CHRR, The Ohio State University, 1992.
Each NLSY79 questionnaire includes an interviewer remarks section that interviewers complete after finishing the interview with the respondent. Some information is objective, such as the presence of another person during an in-person survey, while other details, such as rating how cooperative the respondent was, rely on the interviewer's subjective assessment.
Special circumstances
All survey rounds feature a series of questions about special circumstances that might have affected the quality of the data. Interviewers were asked to assess whether the respondent was hard of hearing, unable to see well, unable to read, lacking in basic social skills, mentally handicapped or retarded, physically handicapped, or ill/injured, or had a poor command of English.
Respondent's general demeanor and responsiveness
In all survey rounds, interviewers rated how informative and cooperative the respondent was during the interview. In addition, interviewers assessed the respondent's overall understanding of the questions (good, fair, or poor).
Presence of others during interview
All survey rounds include information about whether others were present (listening and/or participating) during in-person interviews and who the person or persons were (infant child, family member, etc.). Interviewers attempt to secure a private environment for all interviews, so the presence of another individual (other than a small child) is an exception and can be considered a disruption to the interview.
Interviewer characteristics
Interviewers provide information on their own ethnicity, age, sex, highest grade completed, and how much experience (measured in years) they had as an interviewer.
Interview methodology
Interviewers record whether any portion of the interview took place on the phone and indicate if the interview was in Spanish or English.
Interviewer retention
In each survey round, interviewers indicate whether they had interviewed that respondent in the previous survey year.