NLSY79

NLSY79 Attachment 6: Other Kinds of Training Codes

Page 135, Section 14, Q.06A, 1979


Codes 01-25

  • 01 Bookkeeping
  • 02 Housing Inspector
  • 03 Water Safety
  • 04 Language
  • 05 Computer Training
  • 06 Music Training
  • 07 Typing
  • 08 Broadcasting
  • 09 Riding Lessons (Horses)
  • 10 Printing
  • 11 Reading Improvement
  • 12 Sales and Boat Crew (On-the-Job-Training)
  • 13 WECEP
  • 14 Welding Training
  • 15 Teacher's Aide
  • 16 Electrician
  • 17 Field Counselor
  • 18 Flight School
  • 19 Regional Opportunity Program
  • 20 Travel Agent
  • 21 4H Club
  • 22 Junior Achievement Training
  • 23 First Aid Training
  • 24 Career Education
  • 25 Senior Life Saving

Codes 26-50

  • 26 Metal Fabrication Testing
  • 27 ROTC
  • 28 Carpentry
  • 29 CPR (Cardiac Pulmonary Resuscitation)
  • 30 Machine Operator
  • 31 Modelling School
  • 32 Tutoring
  • 33 Beauty School
  • 34 Cement Finishing
  • 35 Minority Engineers & Advancement
  • 36 Center for Leadership Development
  • 37 Auto Mechanics & Electronics
  • 38 BOEC: Business
  • 39 Ameslan (American Sign Language)
  • 40 Franchise Manager Training
  • 41 Real Estate
  • 42 Working with Retarded Children
  • 43 Fire Training
  • 44 Business Machine Orientation
  • 45 Photography
  • 46 Business Management
  • 47 Junior Achievement
  • 48 Dance Lessons
  • 49 CVAE--Combined Vocational-Agricultural Education
  • 50 Sheetmetal Apprenticeship Program

Codes 51-75

  • 51 Adult Education
  • 52 Technical Institute
  • 53 Food Sanitation
  • 54 Recreation Leader
  • 55 Upward Bound, Explorer's Program, etc.
  • 56 Certified Nurse's Aide
  • 57 Acting School
  • 58 Mining School
  • 59 Contract-writing
  • 60 Waste Water Treatment
  • 61 Bartending School
  • 62 Pre-Collegiate Program
  • 63 Tape-setting
  • 64 Bee-keeping
  • 65 Law Enforcement
  • 66 Clerical/General Office Work
  • 67 OEDE - Office Education and Distributive Education
  • 68 CCD - Christian Training
  • 69 Public Speaking
  • 70 Future Farmers of America Leadership Program
  • 71 Coastline Regional Occupation Program
  • 72 Guide to Better Living (given at reformatory)
  • 73 Carpet Installation
  • 74 Operation SER (English as a Second Language)
  • 75 Construction

Codes 76-96

  • 76 Seamanship
  • 77 Rehabilitation Program
  • 78 Careers Unlimited
  • 79 Keypunching
  • 80 Vocational School--English
  • 81 Dale Carnegie Course
  • 82 Machinist
  • 83 Social Work
  • 84 Cadet Teaching
  • 85 Working Reference Program
  • 86 Military Academy Training
  • 87 Lifesaving
  • 88 Parenting Classes
  • 89 Diesel Mechanic
  • 90 Farm Machinery Program
  • 91 Police Volunteer Trainee
  • 92 Human Resources Development Program
  • 93 Hospital Volunteer
  • 94 Alternative School
  • 95 Computer Duster
  • 96 OTHER

NLSY79 Attachment 4: Fields of Study in College

Codes for major fields of study and subspecialties


0100 Agriculture and Natural Resources

  • 0101 Agriculture, General
  • 0102 Agronomy
  • 0103 Soils Science
  • 0104 Animal Science
  • 0105 Dairy Science
  • 0106 Poultry Science
  • 0107 Fish, Game, and Wildlife Management
  • 0108 Horticulture
  • 0109 Ornamental Horticulture
  • 0110 Agricultural and Farm Management
  • 0111 Agricultural Economics
  • 0112 Agricultural Business
  • 0113 Food Science and Technology
  • 0114 Forestry
  • 0115 Natural Resources Management
  • 0116 Agriculture and Forestry Technologies
  • 0117 Range Management
  • 0118 Pest Control and Crop Protection
  • 0199 Other

0200 Architecture and Environmental Design

  • 0201 Environmental Design, General
  • 0202 Architecture
  • 0203 Interior Design
  • 0204 Landscape Architecture
  • 0205 Urban Architecture
  • 0206 City, Community, and Regional Planning
  • 0299 Other

0300 Area Studies

  • 0301 Asian Studies, General
  • 0302 East Asian Studies
  • 0303 South Asian (India, etc.) Studies
  • 0304 Southeast Asian Studies
  • 0305 African Studies
  • 0306 Islamic Studies
  • 0307 Russian and Slavic Studies
  • 0308 Latin American Studies
  • 0309 Middle Eastern Studies
  • 0310 European Studies, General
  • 0311 Eastern European Studies
  • 0312 West European Studies
  • 0313 American Studies
  • 0314 Pacific Area Studies
  • 0315 French Studies
  • 0399 Other

0400 Biological Sciences

  • 0401 Biology, General
  • 0402 Botany, General
  • 0403 Bacteriology
  • 0404 Plant Pathology
  • 0405 Plant Pharmacology
  • 0406 Plant Physiology
  • 0407 Zoology, General
  • 0408 Pathology, Human and Animal
  • 0409 Pharmacology, Human and Animal
  • 0410 Physiology, Human and Animal
  • 0411 Microbiology
  • 0412 Anatomy
  • 0413 Histology
  • 0414 Biochemistry
  • 0415 Biophysics
  • 0416 Molecular Biology
  • 0417 Cell Biology
  • 0418 Marine Biology
  • 0419 Biometrics and Biostatistics
  • 0420 Ecology
  • 0421 Entomology
  • 0422 Genetics
  • 0423 Radiobiology
  • 0424 Nutrition, Scientific
  • 0425 Neurosciences
  • 0426 Toxicology
  • 0427 Embryology
  • 0428 Pre-med
  • 0429 Pre-vet
  • 0430 Pre-dentistry
  • 0431 Immunology
  • 0499 Other

0500 Business and Management

  • 0501 Business and Commerce, General
  • 0502 Accounting
  • 0503 Business Statistics
  • 0504 Banking and Finance
  • 0505 Investments and Securities
  • 0506 Business Management and Administration
  • 0507 Operations Research
  • 0508 Hotel and Restaurant Management
  • 0509 Marketing and Purchasing
  • 0510 Transportation and Public Utilities
  • 0511 Real Estate
  • 0512 Insurance
  • 0513 International Business
  • 0514 Secretarial Studies
  • 0515 Personnel Management
  • 0516 Labor and Industrial Relations
  • 0517 Business Economics
  • 0518 Organizational Behavior
  • 0599 Other

0600 Communications

  • 0601 Communications, General
  • 0602 Journalism
  • 0603 Radio - Television
  • 0604 Advertising
  • 0605 Communication Media
  • 0606 Mass Communications
  • 0607 Public Relations
  • 0608 Group Communications
  • 0699 Other

0700 Computer and Information Sciences

  • 0701 Computer and Information Sciences, General
  • 0702 Information Sciences and Systems
  • 0703 Data Processing
  • 0704 Computer Programming
  • 0705 Systems Analysis
  • 0799 Other

0800 Education

  • 0801 Education, General
  • 0802 Elementary Education, General
  • 0803 Secondary Education, General
  • 0804 Junior High School Education
  • 0805 Higher Education, General
  • 0806 Junior and Community College Education
  • 0807 Adult and Continuing Education
  • 0808 Special Education, General
  • 0809 Administration of Special Education
  • 0810 Education of the Mentally Retarded
  • 0811 Education of the Gifted
  • 0812 Education of the Deaf
  • 0813 Education of the Culturally Disadvantaged
  • 0814 Education of the Visually Handicapped
  • 0815 Speech Correction and Communicative Disorders
  • 0816 Education of the Emotionally Disturbed
  • 0817 Remedial Education
  • 0818 Special Learning Disabilities
  • 0819 Education of the Physically Handicapped
  • 0820 Education of the Multiple Handicapped
  • 0821 Social Foundations
  • 0822 Educational Psychology
  • 0823 Pre-Elementary Education
  • 0824 Educational Statistics and Research
  • 0825 Educational Testing, Evaluation and Measurement
  • 0826 Student Personnel
  • 0827 Educational Administration
  • 0828 Educational Supervision
  • 0829 Curriculum and Instruction and Educational Media
  • 0830 Reading Education
  • 0831 Art Education
  • 0832 Music Education
  • 0833 Mathematics Education
  • 0834 Science Education
  • 0835 Physical Education
  • 0836 Driver and Safety Education
  • 0837 Health Education
  • 0838 Business, Commerce, and Distributive Education
  • 0839 Industrial Arts, Vocational & Technical Education
  • 0840 Guidance and Counseling
  • 0841 English Education
  • 0842 Foreign Languages Education
  • 0843 Social Studies Education
  • 0844 School Management
  • 0845 Speech and Drama Education
  • 0846 School Librarianship
  • 0847 Urban Education
  • 0848 Bilingual Education
  • 0849 Multicultural Education
  • 0850 Community Education
  • 0891 Agricultural Education
  • 0892 Education of Exceptional Children, Not Classified Above
  • 0893 Home Economics Education
  • 0894 Nursing Education
  • 0899 Other

0900 Engineering

  • 0901 Engineering, General
  • 0902 Aerospace, Aeronautical, Astronautical Engineer
  • 0903 Agricultural Engineering
  • 0904 Architectural Engineering
  • 0905 Bioengineering and Biomedical Engineering
  • 0906 Chemical Engineering
  • 0907 Petroleum Engineering
  • 0908 Civil, Construction & Transportation Engineering
  • 0909 Electrical, Electronics, Communications Engineering
  • 0910 Mechanical Engineering
  • 0911 Geological Engineering
  • 0912 Geophysical Engineering
  • 0913 Industrial and Management Engineering
  • 0914 Metallurgical Engineering
  • 0915 Materials Engineering
  • 0916 Ceramic Engineering
  • 0917 Textile Engineering
  • 0918 Mining and Mineral Engineering
  • 0919 Engineering Physics
  • 0920 Nuclear Engineering
  • 0921 Engineering Mechanics
  • 0922 Environmental and Sanitary Engineering
  • 0923 Naval Architecture and Marine Engineering
  • 0924 Ocean Engineering
  • 0925 Engineering Technologies
  • 0999 Other

1000 Fine and Applied Arts

  • 1001 Fine Arts, General
  • 1002 Art
  • 1003 Art History and Appreciation
  • 1004 Music (Performing, Composition, Theory)
  • 1005 Music (Liberal Arts Program)
  • 1006 Music History and Appreciation
  • 1007 Dramatic Arts
  • 1008 Dance
  • 1009 Applied Design and Graphic Design and Fashion Design
  • 1010 Cinematography
  • 1011 Photography
  • 1012 Applied Music
  • 1013 Studio Arts
  • 1014 Commercial Art
  • 1015 History of Architecture
  • 1099 Other

1100 Foreign Languages

  • 1101 Foreign Languages, General
  • 1102 French
  • 1103 German
  • 1104 Italian
  • 1105 Spanish
  • 1106 Russian
  • 1107 Chinese
  • 1108 Japanese
  • 1109 Latin
  • 1110 Greek, Classical
  • 1111 Hebrew
  • 1112 Arabic
  • 1113 Indian (Asiatic)
  • 1114 Scandinavian Languages
  • 1115 Slavic Languages (Other than Russian)
  • 1116 African Languages (Non-Semitic)
  • 1117 Portuguese
  • 1199 Other

1200 Health Professions

  • 1201 Health Professions, General
  • 1202 Hospital and Health Care Administration
  • 1203 Nursing
  • 1205 Dental Specialties
  • 1207 Medical Specialties
  • 1208 Occupational Therapy
  • 1209 Optometry
  • 1211 Pharmacy
  • 1212 Physical Therapy
  • 1213 Dental Hygiene
  • 1214 Public Health
  • 1215 Medical Record Librarianship
  • 1216 Podiatry or Podiatric Medicine
  • 1217 Biomedical Communication
  • 1219 Veterinary Medicine Specialties
  • 1220 Speech Pathology and Audiology
  • 1221 Chiropractic
  • 1222 Clinical Social Work
  • 1223 Medical Laboratory Technologies
  • 1224 Dental Technologies
  • 1225 Radiologic Technologies
  • 1226 Rehabilitation
  • 1227 Expressive Therapy(ies)
  • 1228 Allied Health
  • 1299 Other

1300 Home Economics

  • 1301 Home Economics, General
  • 1302 Home Decoration and Home Equipment
  • 1303 Clothing and Textiles
  • 1304 Consumer Economics and Home Management
  • 1305 Family Relations and Child Development
  • 1306 Foods and Nutrition
  • 1307 Institutional Management and Cafeteria Management
  • 1399 Other

1400 Law

  • 1401 Law, General
  • 1402 Pre-law
  • 1499 Other

1500 Letters

  • 1501 English, General
  • 1502 Literature, English
  • 1503 Comparative Literature
  • 1504 Classics
  • 1505 Linguistics
  • 1506 Speech, Debate, and Forensic Science
  • 1507 Creative Writing
  • 1508 Teaching of English as a Foreign Language
  • 1509 Philosophy
  • 1510 Religious Studies
  • 1511 Literature, General (except English)
  • 1599 Other

1600 Library Science

  • 1601 Library Science, General
  • 1699 Other

1700 Mathematics

  • 1701 Mathematics, General
  • 1702 Statistics, Mathematical and Theoretical
  • 1703 Applied Mathematics
  • 1799 Other

1800 Military Sciences

  • 1801 Military Science (Army)
  • 1802 Naval Science (Navy, Marines)
  • 1803 Aerospace Science (Air Force)
  • 1891 Merchant Marine
  • 1899 Other

1900 Physical Sciences

  • 1901 Physical Sciences, General
  • 1902 Physics, General
  • 1903 Molecular Physics
  • 1904 Nuclear Physics
  • 1905 Chemistry, General
  • 1906 Inorganic Chemistry
  • 1907 Organic Chemistry
  • 1908 Physical Chemistry
  • 1909 Analytical Chemistry
  • 1910 Pharmaceutical Chemistry
  • 1911 Astronomy
  • 1912 Astrophysics
  • 1913 Atmospheric Sciences and Meteorology
  • 1914 Geology
  • 1915 Geochemistry
  • 1916 Geophysics and Seismology
  • 1917 Earth Sciences, General
  • 1918 Paleontology
  • 1919 Oceanography
  • 1920 Metallurgy
  • 1921 Industrial Chemistry
  • 1991 Other Earth Sciences
  • 1992 Other Physical Sciences

2000 Psychology

  • 2001 Psychology, General
  • 2002 Experimental Psychology
  • 2003 Clinical Psychology
  • 2004 Psychology for Counseling
  • 2005 Social Psychology
  • 2006 Psychometrics
  • 2007 Statistics in Psychology
  • 2008 Industrial Psychology
  • 2009 Developmental Psychology
  • 2010 Physiological Psychology
  • 2011 Behavioral Science
  • 2012 Comparative Psychology
  • 2013 Rehabilitation Counseling
  • 2014 Animal Behavior
  • 2099 Other

2100 Public Affairs and Services

  • 2101 Community Services, General
  • 2102 Public Administration
  • 2103 Parks and Recreation Management
  • 2104 Social Work and Helping Services
  • 2105 Law Enforcement and Corrections and Criminology and Criminal Justice
  • 2106 International Public Service
  • 2107 Administration of Justice
  • 2199 Other

2200 Social Sciences

  • 2201 Social Sciences, General
  • 2202 Anthropology
  • 2203 Archaeology
  • 2204 Economics
  • 2205 History
  • 2206 Geography
  • 2207 Political Science and Government
  • 2208 Sociology
  • 2209 Criminology
  • 2210 International Relations
  • 2211 Afro-American (Black Culture) Studies
  • 2212 American Indian Cultural Studies
  • 2213 Mexican-American Cultural Studies
  • 2214 Urban Studies
  • 2215 Demography
  • 2216 Group Studies
  • 2299 Other

2300 Theology

  • 2301 Theological Professions, General
  • 2302 Religious Music
  • 2303 Biblical Languages
  • 2304 Religious Education
  • 2399 Other

4900 Interdisciplinary Studies

  • 4901 General Liberal Arts and Sciences
  • 4902 Biological and Physical Sciences
  • 4903 Humanities and Social Sciences
  • 4904 Engineering and Other Disciplines
  • 4999 Other
  • 9994 Recreation, Outdoor Recreation
  • 9995 Counseling, n.s.
  • 9996 Other
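
The four-digit codes above appear to follow a pattern in which the first two digits identify the major field and the last two the subspecialty (for example, 0704 is listed under 0700). The Python sketch below illustrates that reading. The function names `major_field` and `format_code`, and the treatment of 9994-9996 as standalone codes, are assumptions made for illustration and are not part of the NLSY79 documentation.

```python
# Hypothetical helpers: the digit-pattern reading below is inferred from the
# listing and is not stated as a rule in the attachment itself.
STANDALONE_CODES = {9994, 9995, 9996}  # codes with no xx00 heading of their own


def major_field(code: int) -> int:
    """Return the major-field code (e.g. 704 -> 700) for a subspecialty code."""
    if code in STANDALONE_CODES:
        return code  # no parent heading in the list; returned unchanged
    return (code // 100) * 100


def format_code(code: int) -> str:
    """Render a code with leading zeros, as printed in the attachment."""
    return f"{code:04d}"


print(format_code(major_field(704)))   # 0700
print(format_code(major_field(2211)))  # 2200
```

Storing the codes as integers drops the leading zero, so `format_code` restores it when a code has to be compared with the printed list.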

NLSY79 Attachment 3: Industrial and Occupational Classification Codes

This attachment contains the following PDF files of sets of Industry and Occupation code schemes:

1970 Census 3-Digit Industry and Occupation Codes (PDF). All occupations and industries (except military occupational specialties) are coded with 1970 codes. This is the main set of codes.

1980 Census 3-Digit Industry and Occupation Codes (PDF). Beginning with the 1982 survey the respondent's current or last job only is coded with the 1980 codes in addition to the 1970 codes.

2000 Census 3-Digit Industry and Occupation Codes (PDF). The 2000 Census codes were used to code industry and occupation for all jobs in the 2002 NLSY79 survey. Census published slightly revised codes in 2002, and these revised codes were used to code all jobs in the 2004 survey. Census issued another revision in 2003, and these codes were used for the 2006 survey. This attachment lists the 2000 codes, followed by the 2002 and 2003 codes; the 2002 tables note the slight differences from the 2000 list.

1977 Department of Defense 3-Digit Enlisted Occupational Classification System (PDF). All military occupational specialties collected in the military section are coded with this scheme.

NLSY79 Documentation

This section describes these three primary components of the NLSY79 codebook system and discusses the important types of information found within each. An additional codebook supplement exists for the Geocode data file.

Codebooks

The codebook is the principal element of the NLSY79 documentation system and contains information intended to be complete and self-explanatory for each variable in a data file. The software accompanying the NLSY79 data sets allows easy access to each variable's codebook information and permits the user to print a codebook extract for preselected variables.

Every variable is presented within the NLSY79 documentation as a block of information called a "codeblock." Each codeblock entry depicts the following important information:

  • reference number
  • variable title
  • coding information
  • frequency distribution
  • location within the data file
  • reference to the questionnaire item or source of the variable
  • information on the derivation of created variables

Users will find that NLSY79 CAPI codeblocks present greater detail on each variable, including universe totals, universe skip patterns, and range of acceptable values information. Each of these terms is described more completely below. Codeblocks for many variables include special notes containing additional information designed to assist in the accurate use of data from that variable.
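
For readers who process codebook extracts programmatically, the codeblock elements listed above can be pictured as a simple record. The following Python sketch is only one possible representation: the class name `Codeblock`, its field names, and the example values are all hypothetical and do not correspond to any file format distributed with the NLSY79.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Codeblock:
    """Hypothetical container for one codeblock entry."""
    reference_number: str                     # e.g. "R03194."
    title: str
    codes: Dict[int, str]                     # code value -> text description
    frequencies: Dict[int, int]               # code value -> number of respondents
    file_location: Optional[str] = None       # location within the data file
    questionnaire_item: Optional[str] = None  # source question, if any
    derivation: Optional[str] = None          # decision rules for created variables

    def labelled_frequencies(self) -> Dict[str, int]:
        """Pair each frequency count with its text label."""
        return {self.codes.get(v, str(v)): n for v, n in self.frequencies.items()}


# Illustrative values only; the reference number and counts are invented.
example = Codeblock(
    reference_number="R00000.",
    title="EXAMPLE YES/NO ITEM",
    codes={1: "Yes", 0: "No"},
    frequencies={1: 40, 0: 60},
)
print(example.labelled_frequencies())  # {'Yes': 40, 'No': 60}
```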

Codebooks are arranged in reference number order. As a general rule, raw questionnaire items appear first for a given survey year, followed by items from such instruments as the Information Sheet and Employer Supplement. Variables from the main body of the questionnaire are followed by created or constructed variables drawn from an external data source, such as the County & City Data Book.

Beginning with the 1993 CAPI surveys, questions relating to each job/employer, which were formerly located within the unique Employer Supplements, are merged with the main questionnaire items. A comparison of the reference number assignments used for the 1988 PAPI and 1993 CAPI variables appears in Table 1 and provides users with a sample set of reference numbers. Users should note that not all survey year assignments will be ordered in precisely this manner.

Table 1. NLSY79 1988 and 1993 reference number assignment

| Description | 1988 PAPI Rnum | 1993 CAPI Rnum |
| --- | --- | --- |
| All Raw, Edited and Created Variables | R25000.-R28927. | R41001.-R44308. |
| Questionnaire Items | R25000.-R27467. | R41001.-R43988. (including the Employer Supplement series; see Note 1.1) |
| Information Sheet Items | R27469.-R27501. | R43989.-R44036. |
| Household Record | R27506.-R27609. | R44037.-R44126. |
| Employer Supplement (ES) (see Note 1.1) | R27610.-R28254. | |
| Children's Record Form | R28255.-R28371. | R44127.-R44162. |
| Childhood Residence Calendar (see Note 1.2) | R28372.-R28690. | |
| Created Variables | R28704.-R28729. | R44163.-R44205. |
| Supplemental Fertility File Variables | R28735.-R28811. | |
| Geocode Variables | R28825.-R28927. | R44206.-R44308. |

Note: PAPI refers to paper-and-pencil interviews which were conducted with the NLSY79 during 1979-92. CAPI or computer-assisted personal interviews began for the full NLSY79 cohort in 1993.

Note 1.1: Beginning in 1993, variables from the employer supplement series are included within the raw questionnaire items.

Note 1.2: The childhood residence retrospective was unique to 1988.
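
Because codebooks are arranged in reference number order, the 1988 PAPI ranges in Table 1 can be used to work out which instrument a 1988 variable came from. The sketch below is a minimal illustration in Python. It assumes reference numbers are written in the `R` + digits + `.` form shown in Table 1, and the names `RANGES_1988`, `rnum_to_int`, and `instrument_1988` are hypothetical.

```python
from typing import Optional

# 1988 PAPI ranges copied from Table 1 (inclusive bounds).
RANGES_1988 = [
    ("Questionnaire Items", 25000, 27467),
    ("Information Sheet Items", 27469, 27501),
    ("Household Record", 27506, 27609),
    ("Employer Supplement", 27610, 28254),
    ("Children's Record Form", 28255, 28371),
    ("Childhood Residence Calendar", 28372, 28690),
    ("Created Variables", 28704, 28729),
    ("Supplemental Fertility File Variables", 28735, 28811),
    ("Geocode Variables", 28825, 28927),
]


def rnum_to_int(rnum: str) -> int:
    """'R27469.' -> 27469. Assumes the whole-number form used in Table 1."""
    return int(rnum.strip().lstrip("R").rstrip("."))


def instrument_1988(rnum: str) -> Optional[str]:
    """Return the Table 1 category for a 1988 reference number, if any."""
    number = rnum_to_int(rnum)
    for label, low, high in RANGES_1988:
        if low <= number <= high:
            return label
    return None  # falls in a gap between ranges or outside the 1988 block


print(instrument_1988("R27469."))  # Information Sheet Items
print(instrument_1988("R27468."))  # None
```

As the text notes, other survey years are not necessarily ordered the same way, so a table like `RANGES_1988` would have to be built separately for each year.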

The following figures give users an example of codebook pages before (Figure 1) and after (Figure 2) CAPI implementation.

Figure 1. NLSY79 sample PAPI codeblock

PAPI codebook diagram

Figure 2. NLSY79 sample CAPI codeblock

CAPI codebook diagram

Coding information

Each codeblock entry presents the set of legitimate codes that a variable may assume along with a text entry describing the codes.

Dichotomous variables

Dichotomous or yes/no variables are uniformly coded "Yes" = 1, "No" = 0. Other dichotomous variables have frequently been reformulated to permit this convention to be followed.

Discrete variables

Discrete (categorical) variables take one of a fixed set of categories, as in the case of 'Activity Most of Survey Week CPS Item':

  • WORKING
  • WITH A JOB, NOT AT WORK
  • LOOKING FOR WORK
  • KEEPING HOUSE
  • GOING TO SCHOOL
  • UNABLE TO WORK
  • OTHER

Continuous variables

Continuous (quantitative) variables, such as hourly rate of pay in the example above, have continuous data but are presented in the codebook using a convenient frequency distribution. NLSY79 users will note that most valid data are positive numbers. Special cases are flagged by negative numbers in the NLSY79. See Appendix 13: Intro to CAPI Questionnaires and Codebooks in the NLSY79 Codebook Supplement for more detail on the handling of negative numbers in the data files. The following conventions have been used throughout the data:

  • Noninterview -5
  • Valid Skip -4
  • Invalid Skip -3
  • Don't Know -2
  • Refusal -1
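
In practice, the negative codes above usually have to be separated from valid values before any statistics are computed. The following Python sketch shows one way to do that. The function name `split_valid_and_missing` and the sample values are hypothetical; only the five codes themselves come from the list above.

```python
from collections import Counter

# Missing-data conventions as listed in the codebook.
MISSING_CODES = {
    -1: "Refusal",
    -2: "Don't Know",
    -3: "Invalid Skip",
    -4: "Valid Skip",
    -5: "Noninterview",
}


def split_valid_and_missing(values):
    """Return (valid values, count of each missing-data reason)."""
    valid = [v for v in values if v not in MISSING_CODES]
    missing = Counter(MISSING_CODES[v] for v in values if v in MISSING_CODES)
    return valid, missing


# Invented responses for illustration.
valid, missing = split_valid_and_missing([450, -4, 725, -1, -4, -5])
print(valid)                  # [450, 725]
print(missing["Valid Skip"])  # 2
```

Only the five listed codes are treated as missing here, rather than every negative number, so that any variable with legitimately negative values is not silently discarded.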

Important information: Coding information

Coding information for a given variable in the NLSY79 codeblock is:

  1. not necessarily consistent with the codes found within the questionnaire, and
  2. not necessarily consistent for the same variable across years. Use only the codebook coding information for analysis.

Frequency distribution

In the case of discrete (categorical) variables, frequency counts are normally shown in the first column to the left of the code categories. In the case of continuous (quantitative) variables, a distribution of the variable is presented using a convenient class interval. The format of these distributions varies.
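
The two kinds of distribution described above can be reproduced from extracted data with a few lines of Python. This is a sketch under stated assumptions: the function names are hypothetical, the class-interval width is chosen by the user (the codebook's own intervals vary by variable), and the values are assumed to be non-negative integers with missing-data codes already removed.

```python
from collections import Counter


def discrete_frequencies(values):
    """Frequency count per code, ordered by code value."""
    return dict(sorted(Counter(values).items()))


def class_interval_frequencies(values, width):
    """Count integer values per interval [k*width, (k+1)*width - 1]."""
    counts = Counter((v // width) * width for v in values)
    return {(low, low + width - 1): n for low, n in sorted(counts.items())}


# Invented values for illustration.
print(discrete_frequencies([1, 0, 1, 1, 0]))
# {0: 2, 1: 3}
print(class_interval_frequencies([120, 450, 499, 500, 980], width=500))
# {(0, 499): 3, (500, 999): 2}
```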

Derivations

The decision rules employed in the creation of main file constructed variables have been included, whenever possible, in the codebook under the title "DERIVATIONS." This information enables researchers to determine whether available constructs are appropriate to their needs. In the case of the example NLSY79 variable in Figure 1, no derivation is shown because these variables are picked up directly from the interview schedule. Certain variables will contain a reference to an appendix for the decision rules that were used in creating the variable.

Questionnaire item

"Questionnaire item" is a generic term identifying the printed source of data for a given variable. A questionnaire item may be a question, a check item, or an interviewer's reference item appearing within one of the survey instruments.

The questionnaire location for NLSY79 entries appears either in parentheses or brackets directly after the reference number, for example R04434. (SO6D1314). The five questionnaire item numbering conventions used in the codebook are described in the Survey Instruments section (see especially Table 2).

Before the adoption of CAPI, if an NLSY79 variable was not taken directly from one of the survey instruments, the questionnaire location contained an asterisk (*) in the codebook. The following categories of variables had no questionnaire numbers:

  1. assigned identification numbers for the respondent, child, or family unit;
  2. all derived or constructed variables;
  3. variables from the following special surveys: Profiles (ASVAB), the School Survey, and the Transcript Survey;
  4. variables found on constructed data files such as the Supplemental Fertility File (area of interest "Fertility and Relationship History/Created"); and
  5. variables drawn from an external data source such as those found on the Geocode files.

In CAPI years, survey staff assign a question name that is not used in the questionnaire. This name remains the same in subsequent rounds, so similar created variables can be easily located.

Section, deck, and question numbers have been somewhat arbitrarily assigned to the information and questions found in special survey instruments such as the Household Screener, Information Sheet, Children's Record Forms, Household Interview Forms, and the Employer Supplements. The section and deck numbers for these special survey items were numbered sequentially after the main survey items and their specific order varies each year. The exception to this is the assignment of the deck numbers for the Employer Supplements. Question numbering is discussed earlier in the Survey Instruments section (see especially Table 3).

Universe information

Universe information was attached to select 1979-92 variables. Beginning with the 1993 CAPI interviews, the amount of universe information was expanded to include:

  1. Universe Totals: Two totals are presented:
    • the sum of the frequency counts for each coding category is presented below the individual codes; and
    • the sum of the valid responses plus missing response counts of "refusals," "don't knows," and "invalid skips" can be found in the TOTAL==========> field. The number of respondents who legitimately did not respond to a question, that is, "valid skips (-4)" and "noninterviews (-5)," are also depicted.
  2. Universe Skip Patterns: The following detailed universe information will enable researchers to easily trace the flow of respondents both backward and forward through various parts of the CAPI questionnaire items included in the codebook:

    "Go to Reference # XXXXX.," appended to certain coding categories, indicates that respondents selecting that answer category were routed to the next question specified.

    "Lead In(s) Reference # XXXXX." identifies the question or questions immediately preceding the codeblock question through which the universe of respondents was routed. Each lead-in reference number is followed by the relevant response value indicators, (Default), (ALL), [1:1], [1:6], and so forth. For example:

    • R41000. (All) This means that all cases where R41000. is asked will branch to the current question. This does not imply all respondents are asked question R41000.
    • R41000. (Default) This means that the default path of control from question R41000. is to branch to the current question, but there may be conditions under which a different path would be taken.
    • R41000. [1:6] This means that whenever the response category for question R41000. takes on the values one to six inclusive, the next question is the current question record.

    "Default Next Question" specifies the next question that all respondents of the current codeblock will be asked unless some other skip condition indicates otherwise.

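The lead-in notation described above is regular enough to be read by a small parser. The Python sketch below handles the three forms given in the examples, (All), (Default), and a [low:high] range. The function names and the regular expression are hypothetical and cover only those forms; any other indicator that appears in a codebook would raise an error.

```python
import re

# Matches e.g. "R41000. (All)", "R41000. (Default)", "R41000. [1:6]".
LEAD_IN = re.compile(
    r"^(?P<ref>R\d+\.)\s*"
    r"(?:\((?P<kw>All|Default)\)|\[(?P<lo>-?\d+):(?P<hi>-?\d+)\])$",
    re.IGNORECASE,
)


def parse_lead_in(text):
    """Return (reference number, kind, bounds) for one lead-in entry."""
    m = LEAD_IN.match(text.strip())
    if m is None:
        raise ValueError(f"unrecognized lead-in: {text!r}")
    if m.group("kw"):
        return m.group("ref"), m.group("kw").lower(), None
    return m.group("ref"), "range", (int(m.group("lo")), int(m.group("hi")))


def response_routes_here(text, response):
    """True/False if the routing is certain, None for a (Default) lead-in."""
    _, kind, bounds = parse_lead_in(text)
    if kind == "all":
        return True
    if kind == "range":
        low, high = bounds
        return low <= response <= high
    return None  # default path: other skip conditions may redirect the case


print(parse_lead_in("R41000. [1:6]"))            # ('R41000.', 'range', (1, 6))
print(response_routes_here("R41000. [1:6]", 7))  # False
```

Returning `None` for (Default) keeps the sketch honest: the codebook states that a different path may be taken under some conditions, and those conditions are not encoded in the lead-in text itself.
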
Valid values range

Depicted below the frequency distribution is information relating to the range of valid values for that particular distribution. "MINIMUM" indicates the smallest recorded value exclusive of "NA" and "DK." "MAXIMUM" indicates the largest recorded value. The computer-assisted interview contains internal range checks that limit responses to those between predesignated values, alert interviewers to verify unusual values, and bolster the information provided by the traditional minimum and maximum fields (see, for example, Figure 2 above).

  • Maximum and Minimum Fields. The MIN and MAX fields define the range, that is, the lower limit and the upper limit, of data values for a given question. A MAX of $156,359 on an income question, for example, means that this value was the highest value recorded.
  • Hardmax and Hardmin Fields. Hard Maximum and Hard Minimum fields denote the highest and lowest values that were accepted by the CAPI program. A Hardmax of 500,000 and a Hardmin of 0 on an income question indicate that no values above $500,000 or values lower than zero (no income) can be accepted. Dates, such as month/day/year of the respondent's last interview [lintdate] and current interview [curdate], are used as Hardmin and Hardmax values in order to restrict responses to certain questions to values within that range. Responses outside this range must be entered by the interviewer in the comment field.
  • Softmax and Softmin Fields. Softmax and Softmin fields cover ranges where an answer may exceed reasonable limits yet remain within the absolute limits and are acceptable after verification. A Softmax set to $80,000 on an income question will cause the machine to "beep" and a warning to appear on the screen. Interviewers are thus alerted that the value is unusual and the respondent's answer should be verified.
  • Restricted Income Values. Confidentiality issues restrict release of all income values. To ensure respondent confidentiality, the values of income variables exceeding particular limits are truncated and the upper limits converted to a set maximum value.
    1. From 1979 through 1984, the upper limit on income variables was $75,000, and any amounts exceeding $75,000 were converted to $75,001.
    2. Beginning in 1985, the upper limit on income amounts was increased to $100,000 due to inflation and the advancing age of the cohort, and amounts exceeding $100,000 were converted to $100,001.
    3. Beginning in 1996, the top two percent of respondents with valid values were averaged and that average value replaced all values in the top range.
  • Users should be aware of these changes in the income ceiling if they are carrying out longitudinal analyses with these data. Upward trends in mean income statistics may reflect this change in the ceiling value. More information about truncation is available in the Income section.
  • Restricted Asset Values. Confidentiality issues also restrict release of all asset values. To ensure respondent confidentiality, the values of asset variables exceeding particular limits are truncated and the upper limits converted to a set maximum value. The asset amounts have different upper limits, and the types of variables and limits for those variables are as follows:
    1. Starting in 1985, all mortgage, market value of residential property, debt on residential property, miscellaneous debt, and total market value of assets amounts above $150,000 were converted to $150,001; the market value of and debt on a farm or business, and savings, above $500,000 were converted to $500,001; the market value of and debt on vehicles above $30,000 were converted to $30,001.
    2. Beginning in 1989, the amounts exceeding the upper limits mentioned above were assigned the average value of all values exceeding the limits, in an effort to more accurately reflect the true range of income and asset values
    3. Beginning in 1996, the top two percent of respondents with valid values were averaged and that average value replaced all values in the top range
  • Users should be aware of these changes in the asset ceiling if they are carrying out longitudinal analyses with these data. Upward trends in mean asset statistics may reflect this change in the ceiling value. More information about truncation is available in the "Assets" section of this guide.
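
Two of the rules above, the hard/soft range checks and the fixed-ceiling truncation, can be expressed compactly in code. The following Python sketch is an illustration only: the function names are hypothetical, the return labels "reject", "verify", and "accept" are invented, and the example limits are the ones quoted in the text for an income question.

```python
def check_range(value, hardmin, hardmax, softmin=None, softmax=None):
    """Classify an entered value against hard and soft limits."""
    if value < hardmin or value > hardmax:
        return "reject"  # outside the absolute limits accepted by the program
    if softmin is not None and value < softmin:
        return "verify"  # unusual but allowed once the interviewer confirms
    if softmax is not None and value > softmax:
        return "verify"
    return "accept"


def topcode_fixed(value, limit):
    """Fixed-ceiling rule: amounts above the limit become limit + 1."""
    return limit + 1 if value > limit else value


# Limits quoted in the text: Hardmin 0, Hardmax 500,000, Softmax 80,000.
print(check_range(85_000, 0, 500_000, softmax=80_000))   # verify
print(check_range(600_000, 0, 500_000, softmax=80_000))  # reject
print(topcode_fixed(90_000, 75_000))                     # 75001
```

The averaging rules introduced in later years depend on the full distribution of valid values and are not covered by `topcode_fixed`.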

Verbatim

Generally during the PAPI years, when an NLSY79 variable was taken directly from the questionnaire, the verbatim of the question appeared beneath the variable title. If a question is the source for more than one variable, the first variable contains the verbatim while subsequent variables prompt the user to refer back to the variable containing the verbatim. The following verbatim entries appear for reference numbers R03194. and R03195. and demonstrate this convention.

  • R03194. 'In Which Months of 1979 Did You (or Your Husband/Wife) Receive Supplemental Security Income? January 80 INT'
  • R03195. 'See R (3194.) February'

Codebook supplements and other technical documentation

The Other Documentation section of the website includes several items that provide additional information about the NLSY79 survey. There are two NLSY79 codebook supplements. The first supplement, the NLSY79 Codebook Supplement, contains a series of attachments and appendices, variable creation procedures, supplementary coding categories, and derivations for selected variables on the main NLSY79 data files. Information provided within this document is not available in the NLSY79 codebooks, nor will it be found on the documentation files on the NLSY79 data sets. The other supplement contains comparable information specific to the NLSY79 Geocode data files. The Technical Sampling Report describes the selection of the NLSY79 sample and provides additional statistical information. Finally, the School & Transcript Surveys Documentation provides technical information about those special data collections.

Error updates

Prior to working with an NLSY79 data file, users should make every effort to acquire information on current data or documentation errors. A variety of methods are used to notify users of errors in the data files or documentation and to provide those persons who acquired an NLSY79 data set directly from the Center for Human Resource Research with corrected information.

When data errors are discovered within the data file, the correction is made and the data file is updated. These updated files then become the default files on NLS Investigator. NLSY79 Errata notices can be found in the "Other Documentation" section.

NLS Investigator

NLSY79 variables (as well as the variables from other NLS cohorts) are accessed using NLS Investigator, which is available as a Web application. The main application of NLS Investigator is to access NLS variables for the purposes of identifying, selecting, extracting, and/or running frequencies or cross-tabulations. This interface allows the researcher to connect to a database and perform variable extractions without installing any software on a local computer. Through a personal online account, a researcher's selected variable tag sets, frequencies, and extracts are available for a specified period of time from any computer location with Web access. Because there is one central data source for all users, researchers will have the assurance that they are always working with the most up-to-date data, and that any necessary corrections will be immediate and universal.

Need help with NLS Investigator?

  1. Access NLSY79 variables by connecting to NLS Investigator.
  2. Get help using NLS Investigator through the NLS Investigator User Guide.
  3. Learn how to perform efficient NLS Investigator searches with the tutorial, Variable Search in the NLS Investigator.

Item Nonresponse

This section examines and quantifies the extent of missing data, formally called item nonresponse, in the NLSY79. To provide readers with a detailed view of this problem, six surveys are analyzed. Nonresponse rates are examined first in the 1979 survey and then in the surveys that occur at roughly five-year intervals (1984, 1989, 1994, 1998, and 2004). These years were chosen to capture the major changes in the NLSY79. Examining the 1979 survey shows the initial levels of nonresponse. Examining the 1984 survey shows the amount of nonresponse in the survey just before one part of the respondent pool was dropped. The 1989 data show nonresponse after the first set of NLSY79 respondents was dropped. The 1994 data show what occurred after users and interviewers were switched from paper-and-pencil interviewing (PAPI) to computer-assisted personal interviewing (CAPI). While no major survey changes occurred during the 1998 and 2004 surveys, these surveys show nonresponse rates after many respondents had participated around 20 times.

This section focuses on the three types of missing data: refusals, invalid skips, and don't knows. Overall, the section shows that in these six rounds of the NLSY79, 20 million questions were asked. Out of all the questions asked to respondents, about 1.5 percent do not have valid answers and are missing data. Of the three missing data categories, about half the missing data are don't knows and about half are invalid skips. Given the vast majority of invalid skips occur in paper-and-pencil years, the percentage of problems attributed to this category has been steadily falling as more computer survey rounds are fielded.

Introduction

Missing data, or nonresponse, happens in a number of ways in the NLSY79. First, a number of respondents do not participate at all, causing all information in that particular survey to be missing. Participation rates and reasons for noninterview in each survey round are discussed in the section on Retention & Reasons for Noninterview.

A second reason missing data occurs is that respondents do not provide a valid answer to a question. When this happens, interviewers make a determination about whether to mark the answer as a refusal or don't know value. Users should be cautioned that the assignment of refusals and don't knows is likely to vary across interviewers. Moreover, some respondents may believe it is impolite to refuse a question and decline to answer by saying they do not know. Hence, whether a question is marked either a refusal or a don't know is somewhat arbitrary. Note: Financial questions may often elicit the "refusal" or "don't know" responses. For more information about nonresponse to financial questions, see Appendix 26: Non-Response to Financial Questions and Entry Points.

The last major way missing data can occur is when the interviewer incorrectly follows the survey instrument's flow. Incorrect flows result in some respondents being skipped over a set of questions that should be answered while others answer questions that they should not have been asked. Data archivists have removed most of the extraneous question responses from the data. While extra information can be removed, missing data are not imputed in the NLSY79. Missing data caused by incorrect flows are flagged with a special "invalid skip" code. The number of invalid skips drops precipitously beginning in 1993 with the introduction of CAPI. Nevertheless, invalid skips are still possible in CAPI data. If the CAPI survey contains a programming mistake, the instrument could incorrectly sequence a respondent. When these errors are found, the CAPI survey is patched in the field to prevent further invalid skips, but the incorrectly sequenced cases are not asked the questions again.

All missing data are clearly flagged in the NLSY79 data set. Five negative numbers are used to indicate to users that the variable does not contain useful information. The five values are (-1) refusal, (-2) don't know, (-3) invalid skip, (-4) valid skip, and (-5) noninterview. These five numbers are reserved as missing value flags and, with a few exceptions (see Appendix 5: Supplemental Fertility and Relationship Variables), are rarely used in the NLSY79 for valid data values.
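As a minimal sketch of how an analyst might handle these reserved codes after extracting data, the snippet below maps the five flags to names and separates them from valid values. The names `MissingFlag` and `classify` are hypothetical and are not part of any NLS tool. The sketch also assumes that the variable being processed is not one of the few exceptions where a negative number is a valid value.

```python
from enum import IntEnum


class MissingFlag(IntEnum):
    """The five reserved missing-value codes listed above."""
    REFUSAL = -1
    DONT_KNOW = -2
    INVALID_SKIP = -3
    VALID_SKIP = -4
    NONINTERVIEW = -5


def classify(value):
    """Return the MissingFlag for a reserved code, or None for any other value."""
    try:
        return MissingFlag(value)
    except ValueError:
        return None


# Hypothetical responses for one variable: keep only the non-flagged values.
responses = [12, -2, 40, -4, -1, 7]
valid_only = [v for v in responses if classify(v) is None]
```

Keeping the flags distinct, rather than collapsing them all into a single missing marker, preserves the difference between a question that was never supposed to be asked (valid skip, noninterview) and one that should have been answered but was not.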

In the tables that follow, every attempt has been made to look at only variables in a given survey year that were filled in by either a respondent or an interviewer. The goal was to eliminate all created, machine check, date and time stamp, and variables generated in data post-processing from the analysis. Given there is no automatic way to check every question to see if it meets these criteria, the number of questions analyzed in the tables below overstates the number of questions actually filled in by the respondent or interviewer. The overstatement occurs because some questions with meaningful titles are actually hidden machine checks. While every effort was made to eliminate these questions, it is impossible to eliminate all of them.

This section is not the only research on the extent of missing data in the NLS. Olsen (1992) investigated the effect of switching from PAPI to CAPI interviewing. His research shows fewer interviewer errors occur from navigating the instrument as well as fewer don't knows in the CAPI survey. More importantly, CAPI respondents appeared more willing to reveal sensitive material in the alcohol use section. Mott (1985, 1984, and 1983) examines the NLSY79's fertility data. In these reports, he examines the 1982 and 1983 surveys and finds very low refusal rates for the data in general. However, by shifting to a confidential abortion reporting method, the willingness to respond greatly increases. Mott (1998) examines the amount of missing data about the children of NLSY79 females. He finds that Hispanics or Latinos and, to a smaller extent blacks, have a much higher probability of not finishing the child assessments after starting the interview.

Additional nonresponse information

The Item Nonresponse by Section examines which sections of the NLSY79 have high nonresponse rates; the Item Nonresponse by Respondents examines how many times individuals do not respond to questions; and the Item Nonresponse within Problem Sections examines which particular questions in sections with high nonresponse rates are causing problems.


This section examines and quantifies the extent of missing data, formally called item nonresponse, in each section of the NLSY79. The six tables below show which areas of the NLSY79 respondents are least likely to answer by tracking the total number and percentage of questions that have missing data for each group of respondents. To provide readers with a detailed view of this problem, six surveys are analyzed. Nonresponse rates are examined first in the 1979 survey and then in the surveys that occur at roughly five-year intervals (1984, 1989, 1994, 1998, and 2004). These years were chosen to capture the major changes in the NLSY79. Examining the 1979 survey shows the initial levels of nonresponse. Examining the 1984 survey shows the amount of nonresponse in the survey just before one part of the respondent pool was dropped. The 1989 data show nonresponse after the first set of NLSY79 respondents was dropped. The 1994 data show what occurred after users and interviewers were switched from paper-and-pencil interviewing (PAPI) to computer-assisted personal interviewing (CAPI). While no major survey changes occurred during the 1998 and 2004 surveys, these surveys show nonresponse rates after many respondents had participated around 20 times.

The first column of the tables contains the section names within the survey. The second column shows the total number of questions that all respondents and all interviewers should have answered in that section. This number is determined by first calculating within each section the number of questions each respondent should answer. A question is considered answerable if it does not have a valid skip (-4) or noninterview (-5) as its answer. A total for the section is obtained by summing up the answers for all NLSY79 respondents.

The third (don't know), fourth (refusal), and fifth (invalid skip) columns show the total number of nonresponses found in each section. Columns six, seven, and eight show the same information except in percentage form. The ninth column shows the total percentage of questions missed and is the sum of the previous three percentages. The last column, labeled rank, shows which sections have the most (closer to 1) and least (further from 1) amount of nonresponse.
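The column definitions above can be sketched in a few lines of code. This is an illustration under stated assumptions, not the procedure actually used to build the tables: the function name, the dictionary layout, and the toy input are hypothetical, and each section's responses are assumed to be available as a flat list of coded answers pooled over all respondents.

```python
# Reserved codes from the documentation: -1 refusal, -2 don't know, -3 invalid skip.
MISSING = {"dont_know": -2, "refused": -1, "invalid_skip": -3}
# Valid skips (-4) and noninterviews (-5) are not counted as answerable questions.
NOT_ASKED = (-4, -5)


def summarize_sections(responses_by_section):
    """Build one summary row per section, then rank sections by percent missed."""
    rows = []
    for section, values in responses_by_section.items():
        asked = sum(1 for v in values if v not in NOT_ASKED)
        row = {"section": section, "asked": asked}
        total_pct = 0.0
        for name, code in MISSING.items():
            count = sum(1 for v in values if v == code)
            pct = 100.0 * count / asked if asked else 0.0
            row["n_" + name] = count
            row["pct_" + name] = pct
            total_pct += pct
        # Total percent missed is the sum of the three percentages.
        row["pct_missed"] = total_pct
        rows.append(row)
    # Rank 1 goes to the section with the highest total percent missed.
    ordered = sorted(rows, key=lambda r: r["pct_missed"], reverse=True)
    for rank, row in enumerate(ordered, start=1):
        row["rank"] = rank
    return rows


# Hypothetical toy input: two sections with a handful of coded answers each.
toy = {
    "A": [1, -2, -4, 3, -3],
    "B": [5, 6, -1, 7, -5, 8, 9, 10, 11, 12],
}
rows = summarize_sections(toy)
```

In the toy input, section A has four answerable questions (the valid skip is excluded) and two of them are missing, so it ranks ahead of section B.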

The bottom row of each table combines the information and shows totals. For example, the bottom of the "Number Questions Asked" column in the 1979 survey shows that almost four million questions (3,975,146) were expected to be filled in by respondents or interviewers. While the 1979 survey contains many questions, other years are not far behind. In 1984, there were 3 million questions, 1989 had 1.8 million, 1994 had 3.7 million questions, 1998 had 4.1 million questions, and 2004 had 3.7 million. Readers are cautioned that each year of NLSY79 data contains far more data points since the tables exclude questions obviously labeled as machine checks, date and time stamps, and questions with valid skip or noninterview data flags.

The six tables show that the overall rate of missing data for many years dropped steadily over time. In 1979, 2.7 percent of the questions in the survey were not answered. This number drops to 1.9 percent in 1984 and then falls to 0.9 percent in 1989 and reaches a low point of 0.7 percent in 1994. After 1994 the number rises again with 0.92 percent in 1998 and 1.42 percent in 2004. Hence, nonresponse problems are of slightly less concern after the initial round of surveying.

Combining the data from all sections in all the tables shows the majority of nonresponse is caused by don't knows and invalid skips. The surveys examined asked a total of 20 million questions. Of these questions, more than 140,000, or 0.7 percent, were don't knows and slightly more than 127,000, or 0.6 percent, were invalid skips. The last category, refusal, contains about 26,000 questions, which is roughly 0.1 percent of all questions asked.
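The combined figures can be checked by adding up the "Total" rows of Tables 1.1 through 1.6. The sketch below does only that arithmetic; the variable names are illustrative.

```python
# "Total" rows of Tables 1.1-1.6:
# year: (questions asked, don't knows, refused, invalid skipped)
totals = {
    1979: (3975146, 22140, 596, 83073),
    1984: (3067473, 18209, 2030, 36918),
    1989: (1793721, 9062, 1883, 4426),
    1994: (3669260, 19670, 6036, 57),
    1998: (4078125, 29968, 7548, 75),
    2004: (3716570, 42718, 7494, 2705),
}

asked = sum(t[0] for t in totals.values())
dont_know = sum(t[1] for t in totals.values())
refused = sum(t[2] for t in totals.values())
invalid_skip = sum(t[3] for t in totals.values())


def pct(count):
    """Share of all questions asked, in percent."""
    return 100.0 * count / asked


print(f"asked={asked:,}")
print(f"don't know={dont_know:,} ({pct(dont_know):.1f}%)")
print(f"invalid skip={invalid_skip:,} ({pct(invalid_skip):.1f}%)")
print(f"refused={refused:,} ({pct(refused):.1f}%)")
print(f"all missing={pct(dont_know + refused + invalid_skip):.2f}%")
```

The sums come to 20,300,295 questions asked, 141,767 don't knows, 127,254 invalid skips, and 25,587 refusals, which matches the rounded figures in the text and the overall missing rate of about 1.5 percent.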

Examining the tables over time shows a steady decrease in the amount of data missing due to invalid skips. In 1979, invalid skips accounted for 2.1 percent of the questions asked. This number dropped sharply to 1.2 percent by 1984 and then down to 0.25 percent by 1989. Analysis indicated that CAPI dramatically lowered the problem of invalid skips with only 57 questions out of almost 3.7 million incorrectly skipped in 1994 and 75 questions out of 4 million in 1998.

While invalid skips fall over time, the percentage of refusals has increased slightly. Refusals accounted for 0.01 percent in 1979, 0.07 percent in 1984, 0.10 percent in 1989, 0.16 percent in 1994, 0.19 percent in 1998, and 0.20 percent in 2004. Nevertheless, while refusals steadily increase over time, in absolute terms the numbers are still quite small.

While invalid skips fall and refusals are rising over time, the trend in don't knows is more complex. Don't knows accounted for 0.6 percent in 1979, 0.6 percent in 1984, 0.5 percent in 1989, 0.5 percent in 1994, 0.7 percent in 1998, and 1.1 percent in 2004. These figures suggest that don't knows are making a U-shaped pattern over time.

The last column, labeled rank, shows that missing data are not confined to a single section or area of the survey. Table 1.1 shows that in 1979 the work experience section, with 14.5 percent of the questions missing valid data, had the most problems. Fourteen percent of all questions asked in this section are labeled as invalid skips and only 0.5 percent of the questions were either refusals or don't knows. Military experience, the second most problematic section had almost half the rate of missing data (7.8 percent) as work experience. The table shows the problem of invalid skips is not related to subject matter since the section (rank 21 out of 21) with the least problems, titled "On Jobs," also focuses on labor market issues, like work experience.

While the "On Jobs" section of the survey consistently has the least problems in these surveys, the section with the most problems changes. Table 1.2, which examines the 1984 survey, shows the most problems in the "Fertility" section. Of the almost half-million questions asked in the fertility section, 5.6 percent contain missing data. While the majority of problems (3.4 percent) were due to invalid skips, a surprisingly large 2 percent of the missing responses are don't knows. The second most problematic section in the 1984 survey was "Drug Use", where 2.7 percent of the questions have missing data. Like "Fertility," the major portion of the problem is invalid skips (1.8 percent), but don't knows (0.8 percent) also account for a significant share. Interestingly, refusals account for only 0.1 percent, a relatively small proportion for a sensitive topic, suggesting that some of the don't knows were hidden refusals.


Table 1.1. Extent of refusals, don't knows, and invalid skips in 1979

Section Name | Number Questions Asked | Number Don't Knows | Number Refused | Number Invalid Skipped | Percent Don't Knows | Percent Refused | Percent Invalid Skipped | Total Percent Missed | Rank
Family Background | 660803 | 6196 | 90 | 12292 | 0.94% | 0.01% | 1.86% | 2.81% | 7
Marital Status | 32995 | 131 | 25 | 467 | 0.40% | 0.08% | 1.42% | 1.89% | 14
Fertility | 82141 | 679 | 23 | 624 | 0.83% | 0.03% | 0.76% | 1.61% | 17
Schooling | 402134 | 994 | 14 | 5592 | 0.25% | 0.00% | 1.39% | 1.64% | 16
Pay | 211504 | 22 | 0 | 3482 | 0.01% | 0.00% | 1.65% | 1.66% | 15
World of Work | 220185 | 2220 | 31 | 2883 | 1.01% | 0.01% | 1.31% | 2.33% | 10
Military | 145619 | 491 | 24 | 10885 | 0.34% | 0.02% | 7.47% | 7.83% | 2
CPS | 396697 | 862 | 8 | 10969 | 0.22% | 0.00% | 2.77% | 2.98% | 5
On Jobs | 230982 | 135 | 2 | 903 | 0.06% | 0.00% | 0.39% | 0.45% | 21
Employer Supplement | 291836 | 2009 | 69 | 3575 | 0.69% | 0.02% | 1.23% | 1.94% | 13
Last Job | 44504 | 31 | 0 | 261 | 0.07% | 0.00% | 0.59% | 0.66% | 20
Work Experience | 67695 | 288 | 15 | 9476 | 0.43% | 0.02% | 14.00% | 14.45% | 1
Gov't Training | 36728 | 62 | 28 | 2124 | 0.17% | 0.08% | 5.78% | 6.03% | 3
Other Training | 103662 | 52 | 0 | 2936 | 0.05% | 0.00% | 2.83% | 2.88% | 6
Not at Work | 90768 | 79 | 7 | 5019 | 0.09% | 0.01% | 5.53% | 5.62% | 4
Health | 67869 | 358 | 2 | 545 | 0.53% | 0.00% | 0.80% | 1.33% | 18
Significant Others | 58816 | 669 | 0 | 585 | 1.14% | 0.00% | 0.99% | 2.13% | 12
Residences | 52845 | 94 | 7 | 1029 | 0.18% | 0.01% | 1.95% | 2.14% | 11
Rotter Scale | 202976 | 1277 | 15 | 521 | 0.63% | 0.01% | 0.26% | 0.89% | 19
Income & Assets | 321685 | 1667 | 216 | 6813 | 0.52% | 0.07% | 2.12% | 2.70% | 8
Expectations | 252702 | 3824 | 20 | 2092 | 1.51% | 0.01% | 0.83% | 2.35% | 9
Total | 3975146 | 22140 | 596 | 83073 | 0.56% | 0.01% | 2.09% | 2.66% | -

Table 1.2. Extent of refusals, don't knows, and invalid skips in 1984

Section Name | Number Questions Asked | Number Don't Knows | Number Refused | Number Invalid Skipped | Percent Don't Knows | Percent Refused | Percent Invalid Skipped | Total Percent Missed | Rank
Calendar | 88462 | 8 | 0 | 4 | 0.01% | 0.00% | 0.00% | 0.01% | 15
Marital Status | 50206 | 273 | 18 | 561 | 0.54% | 0.04% | 1.12% | 1.70% | 4
Schooling | 324139 | 1031 | 469 | 2164 | 0.32% | 0.14% | 0.67% | 1.13% | 9
Military | 123126 | 337 | 41 | 1352 | 0.27% | 0.03% | 1.10% | 1.41% | 7
CPS | 333267 | 467 | 5 | 4270 | 0.14% | 0.00% | 1.28% | 1.42% | 6
On Jobs | 140382 | 0 | 0 | 17 | 0.00% | 0.00% | 0.01% | 0.01% | 16
Gaps in Jobs | 120601 | 15 | 0 | 175 | 0.01% | 0.00% | 0.15% | 0.16% | 13
Gov't Training | 31226 | 38 | 0 | 59 | 0.12% | 0.00% | 0.19% | 0.31% | 12
Other Training | 45002 | 7 | 0 | 736 | 0.02% | 0.00% | 1.64% | 1.65% | 5
Fertility | 462288 | 9141 | 891 | 15739 | 1.98% | 0.19% | 3.40% | 5.57% | 1
Child Care | 114317 | 201 | 13 | 1157 | 0.18% | 0.01% | 1.01% | 1.20% | 8
Health | 52866 | 35 | 3 | 29 | 0.07% | 0.01% | 0.05% | 0.13% | 14
Alcohol | 314511 | 33 | 47 | 2234 | 0.01% | 0.01% | 0.71% | 0.74% | 11
Drug Use | 414007 | 3464 | 300 | 7454 | 0.84% | 0.07% | 1.80% | 2.71% | 2
Income & Assets | 439646 | 2945 | 241 | 938 | 0.67% | 0.05% | 0.21% | 0.94% | 10
Attitudes | 13427 | 214 | 2 | 29 | 1.59% | 0.01% | 0.22% | 1.82% | 3
Total | 3067473 | 18209 | 2030 | 36918 | 0.59% | 0.07% | 1.20% | 1.86% | -

Table 1.3 shows the amount of nonresponse in the 1989 survey. The most problematic section is "Income," which is missing data in 1.3 percent of its questions, with the CPS section a close second at 1.2 percent. Unlike earlier years, the major missing data problem in both the "Income" (1 percent) and CPS (0.8 percent) sections is don't knows, not invalid skips (0.1 percent for "Income" and 0.4 percent for CPS).

Table 1.3. Extent of refusals, don't knows, and invalid skips in 1989

Section Name | Number Questions Asked | Number Don't Knows | Number Refused | Number Invalid Skipped | Percent Don't Knows | Percent Refused | Percent Invalid Skipped | Total Percent Missed | Rank
Intro | 14647 | 20 | 1 | 41 | 0.14% | 0.01% | 0.28% | 0.42% | 7
Marital Status | 86563 | 372 | 121 | 450 | 0.43% | 0.14% | 0.52% | 1.09% | 3
Schooling | 76999 | 179 | 39 | 217 | 0.23% | 0.05% | 0.28% | 0.56% | 6
Military | 33579 | 1 | 1 | 40 | 0.00% | 0.00% | 0.12% | 0.13% | 10
CPS | 406265 | 3320 | 52 | 1650 | 0.82% | 0.01% | 0.41% | 1.24% | 2
On Jobs | 39749 | 0 | 0 | 1 | 0.00% | 0.00% | 0.00% | 0.00% | 12
Gaps in Jobs | 91565 | 91 | 1 | 894 | 0.10% | 0.00% | 0.98% | 1.08% | 4
Gov't Training | 49657 | 118 | 35 | 233 | 0.24% | 0.07% | 0.47% | 0.78% | 5
Fertility | 152546 | 6 | 35 | 92 | 0.00% | 0.02% | 0.06% | 0.09% | 11
Health | 154024 | 120 | 74 | 168 | 0.08% | 0.05% | 0.11% | 0.24% | 9
Alcohol | 217441 | 74 | 400 | 201 | 0.03% | 0.18% | 0.09% | 0.31% | 8
Income | 470686 | 4761 | 1124 | 439 | 1.01% | 0.24% | 0.09% | 1.34% | 1
Total | 1793721 | 9062 | 1883 | 4426 | 0.51% | 0.10% | 0.25% | 0.86% | -

Table 1.4 shows that the most problematic area in the 1994 survey includes the asset questions, which are missing 2.5 percent of their answers (75 percent of those missing being don't knows). The second most problematic area includes income questions, which are missing 1.3 percent of their answers. While in the three previous surveys refusal rates were not an issue, the 1994 survey shows refusals are becoming significant. Slightly more than half a percent (0.6 percent) of the "Asset" section questions and more than one fifth of a percent (0.2 percent) of the "Income" section questions were refused.

Table 1.4. Extent of refusals, don't knows, and invalid skips in 1994

Section Name | Number Questions Asked | Number Don't Knows | Number Refused | Number Invalid Skipped | Percent Don't Knows | Percent Refused | Percent Invalid Skipped | Total Percent Missed | Rank
Intro | 36251 | 62 | 14 | 0 | 0.17% | 0.04% | 0.00% | 0.21% | 12
Marital Status | 137540 | 1522 | 193 | 0 | 1.11% | 0.14% | 0.00% | 1.25% | 3
School | 60166 | 302 | 2 | 0 | 0.50% | 0.00% | 0.00% | 0.51% | 7
Military | 27372 | 6 | 1 | 0 | 0.02% | 0.00% | 0.00% | 0.03% | 15
CPS | 269452 | 28 | 9 | 0 | 0.01% | 0.00% | 0.00% | 0.01% | 17
On Jobs | 79567 | 6 | 7 | 0 | 0.01% | 0.01% | 0.00% | 0.02% | 16
Employer Supplement | 1060679 | 7092 | 1342 | 8 | 0.67% | 0.13% | 0.00% | 0.80% | 5
Training | 194147 | 246 | 29 | 47 | 0.13% | 0.01% | 0.02% | 0.17% | 13
Fertility | 450871 | 1859 | 763 | 0 | 0.41% | 0.17% | 0.00% | 0.58% | 6
Child Care | 26453 | 109 | 12 | 0 | 0.41% | 0.05% | 0.00% | 0.46% | 9
Relationship | 81477 | 285 | 113 | 0 | 0.35% | 0.14% | 0.00% | 0.49% | 8
Health | 282702 | 623 | 199 | 0 | 0.22% | 0.07% | 0.00% | 0.29% | 11
Alcohol | 164663 | 46 | 61 | 0 | 0.03% | 0.04% | 0.00% | 0.06% | 14
Income | 305693 | 3176 | 672 | 1 | 1.04% | 0.22% | 0.00% | 1.26% | 2
Program Participation | 118305 | 297 | 63 | 0 | 0.25% | 0.05% | 0.00% | 0.30% | 10
Assets | 169301 | 3239 | 930 | 1 | 1.91% | 0.55% | 0.00% | 2.46% | 1
Drugs | 204621 | 772 | 1626 | 0 | 0.38% | 0.79% | 0.00% | 1.17% | 4
Total | 3669260 | 19670 | 6036 | 57 | 0.54% | 0.16% | 0.00% | 0.70% | -

Table 1.5 examines the 1998 survey. Because the survey was fielded every other year in the late 1990s, there is no 1999 interview, which would exactly continue the every-five-year pattern. The 1998 survey is used as the closest substitute. This table, like the one for 1994, shows that the most problematic area is again the asset questions, which are missing 3.6 percent of their answers (75 percent of those missing being don't knows). The second most problematic area is the marital history questions, which added a new section that asked detailed questions about the work history and past life of the respondent's spouse. This expanded section is missing 1.8 percent of its answers. In the 1998 survey only two sections have relatively high refusal rates: assets (0.90 percent) and drug use (0.68 percent).

Table 1.5. Extent of refusals, don't knows, and invalid skips in 1998

Section Name | Number Questions Asked | Number Don't Knows | Number Refused | Number Invalid Skipped | Percent Don't Knows | Percent Refused | Percent Invalid Skipped | Total Percent Missed | Rank
Intro | 10060 | 6 | 4 | 0 | 0.06% | 0.04% | 0.00% | 0.10% | 12
Marital Status | 207805 | 3296 | 520 | 1 | 1.59% | 0.25% | 0.00% | 1.84% | 2
School | 53928 | 197 | 45 | 0 | 0.37% | 0.08% | 0.00% | 0.56% | 10
Military | 25691 | 0 | 0 | 0 | 0.00% | 0.00% | 0.00% | 0.00% | 15
CPS | 301160 | 44 | 12 | 0 | 0.01% | 0.00% | 0.00% | 0.02% | 13
On Jobs | 117144 | 2 | 0 | 1 | 0.00% | 0.00% | 0.00% | 0.00% | 14
Employer Supplement | 1081493 | 10265 | 1441 | 1 | 0.95% | 0.13% | 0.00% | 1.08% | 3
Training | 241013 | 1559 | 143 | 1 | 0.65% | 0.06% | 0.00% | 0.71% | 7
Fertility | 578831 | 3180 | 1097 | 50 | 0.55% | 0.19% | 0.01% | 0.75% | 6
Child Care | 23241 | 57 | 11 | 1 | 0.25% | 0.05% | 0.00% | 0.30% | 11
Relationship | 86632 | 371 | 154 | 0 | 0.43% | 0.18% | 0.00% | 0.61% | 9
Health | 350533 | 2460 | 223 | 0 | 0.70% | 0.06% | 0.00% | 0.77% | 5
Income | 608849 | 3410 | 847 | 10 | 0.56% | 0.14% | 0.00% | 0.70% | 8
Assets | 174570 | 4702 | 1566 | 10 | 2.69% | 0.90% | 0.01% | 3.60% | 1
Drugs | 217175 | 419 | 1485 | 0 | 0.19% | 0.68% | 0.00% | 0.88% | 4
Total | 4078125 | 29968 | 7548 | 75 | 0.73% | 0.19% | 0.00% | 0.92% | -

Table 1.6 examines the 2004 survey. This survey has two new sections that are not seen in the previous tables. The first section is found in the employer supplement and asks the respondent detailed questions about the pensions available from their employer and the respondent's participation in these pensions. This new section is ranked first in problems and has missing responses to 2.5% of all questions. The second new section is the over 40 health module. The goal of this section is to provide researchers with a baseline health measure that will be updated at ten year intervals. The health section is ranked 8th out of 13 sections and has a nonresponse rate slightly more than three-quarters of one percent.

Table 1.6. Extent of refusals, don't knows, and invalid skips in 2004

Section Name | Number Questions Asked | Number Don't Knows | Number Refused | Number Invalid Skipped | Percent Don't Knows | Percent Refused | Percent Invalid Skipped | Total Percent Missed | Rank
Intro | 91277 | 39 | 16 | 4 | 0.04% | 0.02% | 0.00% | 0.06% | 12
Marital Status | 77954 | 371 | 66 | 106 | 0.48% | 0.08% | 0.14% | 0.70% | 9
School | 56716 | 554 | 39 | 4 | 0.98% | 0.07% | 0.01% | 1.05% | 7
Military | 39772 | 20 | 5 | 0 | 0.05% | 0.01% | 0.00% | 0.06% | 13
Employer Supplement | 734366 | 7729 | 1001 | 275 | 1.05% | 0.15% | 0.04% | 1.23% | 6
Pensions | 189861 | 3753 | 508 | 485 | 1.98% | 0.27% | 0.26% | 2.50% | 1
Training | 307708 | 2943 | 887 | 322 | 0.96% | 0.29% | 0.10% | 1.35% | 5
Fertility | 521658 | 5801 | 733 | 1216 | 1.11% | 0.14% | 0.23% | 1.49% | 3
Child Care | 34561 | 12 | 4 | 7 | 0.03% | 0.01% | 0.02% | 0.07% | 11
Relationship | 1004 | 2 | 0 | 0 | 0.20% | 0.00% | 0.00% | 0.20% | 10
Over 40 Health | 622644 | 4386 | 402 | 14 | 0.70% | 0.06% | 0.00% | 0.77% | 8
Income | 412656 | 4382 | 1199 | 39 | 1.06% | 0.29% | 0.01% | 1.36% | 4
Assets | 626393 | 12726 | 2634 | 233 | 2.03% | 0.42% | 0.04% | 2.49% | 2
Total | 3716570 | 42718 | 7494 | 2705 | 1.15% | 0.20% | 0.07% | 1.42% | -

This section provides details on the amount of missing data associated with each respondent. Each table in this section shows the number of respondents who are missing data in one of the surveys. The tables are split into two parts. The left-hand part, columns one to four, shows the total number of questions that have missing data for each group of respondents. The right-hand part, columns five to nine, shows the percentage of questions that have missing data.

The top line of Table 2.1.1 shows that in the 1979 survey, 12,527 respondents never refused to answer questions. While refusals are quite rare in this survey round, don't knows and incorrect skips are quite frequent. The top line shows that only 5,084 respondents had zero don't know responses and only 2,347 respondents were sent through the entire questionnaire without any sequencing errors. Subtracting these numbers from the 12,686 total respondents means that 60 percent, or 7,602 respondents, stated they did not know the answer to at least one question and 81.5 percent, or 10,339 respondents, were incorrectly skipped somewhere in that questionnaire.
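The subtraction described above can be reproduced directly from the three columns of Table 2.1.1. The following is a small arithmetic sketch; the variable and function names are illustrative.

```python
# Columns of Table 2.1.1 (rows: 0, 1, ..., 10, and more than 10 questions).
refused = [12527, 91, 26, 13, 5, 2, 1, 3, 3, 1, 4, 10]
dont_know = [5084, 2974, 1723, 1016, 629, 376, 228, 173, 131, 84, 57, 211]
invalid_skip = [2347, 1897, 1393, 1158, 838, 596, 489, 502, 420, 340, 308, 2398]

# Every column covers the same respondents, so each sums to the 1979 sample size.
total_respondents = sum(refused)


def at_least_one(column):
    """Respondents with one or more flagged questions: total minus the zero row."""
    return total_respondents - column[0]


for label, column in [("don't know", dont_know), ("invalid skip", invalid_skip)]:
    n = at_least_one(column)
    print(f"{label}: {n:,} respondents ({100.0 * n / total_respondents:.1f}%)")
```

Each column sums to 12,686, which gives 7,602 respondents (59.9 percent) with at least one don't know and 10,339 respondents (81.5 percent) with at least one invalid skip.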

The top line of Table 2.1.2, which examines the percentage of questions missing data, shows a similar picture. Refusal rates are relatively low. There are 12,620 respondents who refused less than one percent of their questions, which means only 66 respondents refused one percent or more of the questions they were expected to answer. Sixty-five percent, or 8,185 respondents, answered don't know to less than one percent of their questions. Again, the largest group was respondents who were incorrectly skipped over questions. Only 4,313 respondents were incorrectly skipped over less than one percent of the questions, but 8,373 of the respondents were incorrectly skipped over one percent or more of their questions and 227 were skipped over more than 10 percent.

Refusal rates have increased steadily over time even though the more difficult respondents have presumably left the survey. Tables 2.2.1 and 2.2.2, which examine the 1984 survey, show an increase over the 1979 refusal rates. While the number of respondents answering the survey is shrinking, the number refusing to answer questions is increasing. For example, while in 1979 only 10 respondents refused to answer more than 10 questions, in 1984 there were 41 respondents. This pattern of increase is evident in Tables 2.3.1 and 2.3.2, which examine 1989, through to Tables 2.6.1 and 2.6.2, which examine 2004. By 2004, there were 185 respondents who refused to answer more than 10 questions.

Increasing refusal rates are also seen in the percentage side of the table. In 1979, only 66 respondents refused to answer one percent or more of the questions they were asked. This increased in subsequent surveys to 320 respondents in 1984, 355 respondents in 1989, 480 respondents in 1994, 549 respondents in 1998, and 655 respondents in 2004.
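The respondent counts quoted above for 1979 through 1998 can be reproduced by summing the "Refused" column of the percent-based tables (Tables 2.1.2 through 2.5.2) over every row at or above one percent. The sketch below shows only that arithmetic; the dictionary name is illustrative.

```python
# "Refused" column of Tables 2.1.2-2.5.2 (rows: 0%, 1%, ..., 10%, and more than 10%).
refused_by_percent = {
    1979: [12620, 43, 7, 5, 5, 0, 2, 1, 1, 0, 0, 2],
    1984: [11749, 207, 44, 13, 15, 13, 10, 4, 5, 2, 2, 5],
    1989: [10250, 193, 58, 35, 13, 10, 4, 4, 3, 3, 3, 29],
    1994: [8409, 246, 81, 41, 31, 20, 19, 6, 10, 9, 4, 13],
    1998: [7850, 254, 86, 58, 54, 27, 30, 14, 4, 8, 2, 12],
}

# Skip the first row (less than one percent refused) and add up the rest.
one_percent_or_more = {
    year: sum(column[1:]) for year, column in refused_by_percent.items()
}

for year, count in one_percent_or_more.items():
    print(year, count)
```

The sums are 66, 320, 355, 480, and 549, in line with the figures given in the text.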

"Don't know" rates have also risen over time. In the 1979 survey, 8,185 respondents had less than one percent of their questions labeled as don't knows. This number drops in 1984 to 7,003 respondents and further drops to 6,423 in 1989 and 5,942 in 1994, 4,741 in 1998 and 3,185 in 2004. While rates have risen, relatively few individuals have high levels of don't knows. In 1979, only 68 respondents didn't know the answer to more than five percent of the questions they were asked. This number falls to 19 respondents in 1984 and then rises to 66 in 1989 before falling back to 46 respondents in 1994 and then jumps back to 66 in 1998, and ends with 149 in 2004.

While don't know and refusal rates have risen, incorrect skip problems have clearly shrunk over time. In 1979, there were only 2,347 respondents who were correctly sequenced through the entire survey. In 1984, this number rises to 7,802 respondents, followed by a rise to 9,334 respondents in 1989. In 1994 and 1998 almost every respondent was correctly sequenced. Only 57 and 46 respondents were incorrectly skipped through part of the survey in each year respectively. Moreover, most of the respondents were only incorrectly skipped in a single question. In 2004 there were 349 respondents who were incorrectly skipped through one percent of their questions and 22 who were incorrectly skipped through 2 percent or more.

Nonresponse by Respondents in 1979 survey

Table 2.1.1 Number of respondents with missing data by number of questions in 1979 survey
Number of Questions Number of Respondents
Refused Didn't Know Was Incorrectly Skipped Over
0 12527 5084 2347
1 91 2974 1897
2 26 1723 1393
3 13 1016 1158
4 5 629 838
5 2 376 596
6 1 228 489
7 3 173 502
8 3 131 420
9 1 84 340
10 4 57 308
> 10 10 211 2398
Table 2.1.2 Number of respondents with missing data by percent of questions in 1979 survey
Percent of Questions Number of Respondents
Refused Didn't Know Was Incorrectly Skipped Over
0% 12620 8185 4313
1% 43 3247 3421
2% 7 773 1733
3% 5 264 989
4% 5 101 621
5% 0 48 397
6% 2 27 312
7% 1 18 278
8% 1 6 206
9% 0 7 118
10% 0 2 71
> 10% 2 8 227

Nonresponse by Respondents in 1984 survey

Table 2.2.1 Number of respondents with missing data by number of questions in 1984 survey
Number of Questions Number of Respondents
Refused Didn't Know Was Incorrectly Skipped Over
0 11222 4549 7802
1 610 3012 1289
2 73 1901 622
3 44 1136 413
4 38 668 252
5 13 345 369
6 6 177 174
7 1 108 93
8 7 63 115
9 4 38 73
10 10 28 64
> 10 41 44 803

Note: Not included in this table are 617 respondents who did not answer the survey.

Table 2.2.2 Number of respondents with missing data by percent of questions in 1984 survey
Percent of Questions Number of Respondents
Refused Didn't Know Was Incorrectly Skipped Over
0% 11749 7003 8956
1% 207 3807 1267
2% 44 944 674
3% 13 213 284
4% 15 62 133
5% 13 21 84
6% 10 11 139
7% 4 2 137
8% 5 3 107
9% 2 0 68
10% 2 3 36
> 10% 5 0 184

Note: Not included in this table are 617 respondents who did not answer the survey.

Nonresponse by Respondents in 1989 survey

Table 2.3.1 Number of respondents with missing data by number of questions in 1989 survey
Number of Questions Number of Respondents
Refused Didn't Know Was Incorrectly Skipped Over
0 10221 6135 9334
1 171 2517 781
2 59 1036 189
3 37 395 35
4 20 194 20
5 21 131 16
6 7 75 7
7 10 34 125
8 10 24 18
9 4 10 9
10 7 6 3
> 10 38 48 68

Note: Not included in this table are 2,081 respondents who did not answer the survey.

Table 2.3.2 Number of respondents with missing data by percent of questions in 1989 survey
Percent of Questions Number of Respondents
Refused Didn't Know Was Incorrectly Skipped Over
0% 10250 6423 9461
1% 193 3221 843
2% 58 561 51
3% 35 219 69
4% 13 76 86
5% 10 39 24
6% 4 24 10
7% 4 17 10
8% 3 1 5
9% 3 3 9
10% 3 8 3
> 10% 29 13 34

Note: Not included in this table are 2,081 respondents who did not answer the survey.

Nonresponse by Respondents in 1994 survey

Table 2.4.1 Number of respondents with missing data by number of questions in 1994 survey
Number of Questions Number of Respondents
Refused Didn't Know Was Incorrectly Skipped Over
0 7168 3559 8832
1 1129 1780 57
2 191 1082 0
3 87 693 0
4 41 443 0
5 28 334 0
6 29 232 0
7 22 171 0
8 21 115 0
9 17 105 0
10 18 72 0
> 10 138 303 0

Note: Not included in this table are 3,797 respondents who did not answer the survey.

Table 2.4.2 Number of respondents with missing data by percent of questions in 1994 survey
Percent of Questions Number of Respondents
Refused Didn't Know Was Incorrectly Skipped Over
0% 8409 5942 8889
1% 246 2060 0
2% 81 558 0
3% 41 165 0
4% 31 79 0
5% 20 39 0
6% 19 16 0
7% 6 15 0
8% 10 4 0
9% 9 2 0
10% 4 2 0
> 10% 13 7 0

Note: Not included in this table are 3,797 respondents who did not answer the survey.
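
As a quick consistency check on these tables, each column should account for every respondent interviewed in that year, so the three columns should share one total. The sketch below, written in Python with counts transcribed by hand from Table 2.4.1, sums each column and then adds the 3,797 respondents noted as not answering the survey. The variable names are illustrative only.

```python
# Counts transcribed from Table 2.4.1 (1994 survey, by number of questions).
# Rows run from 0 questions through "> 10" questions.
refused = [7168, 1129, 191, 87, 41, 28, 29, 22, 21, 17, 18, 138]
dont_know = [3559, 1780, 1082, 693, 443, 334, 232, 171, 115, 105, 72, 303]
skipped = [8832, 57, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]

# Respondents who did not answer the 1994 survey (from the table note).
not_interviewed = 3797

column_totals = {
    "refused": sum(refused),
    "dont_know": sum(dont_know),
    "skipped": sum(skipped),
}

# Every column should describe the same set of interviewed respondents.
all_equal = len(set(column_totals.values())) == 1

# Interviewed respondents plus nonrespondents for the year.
cohort_total = column_totals["refused"] + not_interviewed

print(column_totals, all_equal, cohort_total)
```

If the columns of a table do not agree, a row has probably been dropped or misplaced during transcription.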

Nonresponse by Respondents in 1998 survey

Table 2.5.1 Number of respondents with missing data by number of questions in 1998 survey
Number of Questions Number of Respondents
Refused Didn't Know Was Incorrectly Skipped Over
0 7248 2497 8353
1 473 1355 21
2 162 1020 23
3 83 729 0
4 60 589 2
5 42 447 0
6 35 343 0
7 26 277 0
8 19 201 0
9 23 169 0
10 12 120 0
> 10 216 652 0

Note: Not included in this table are 4,287 respondents who did not answer the survey.

Table 2.5.2 Number of respondents with missing data by percent of questions in 1998 survey
Percent of Questions Number of Respondents
Refused Didn't Know Was Incorrectly Skipped Over
0% 7850 4741 8385
1% 254 2441 13
2% 86 712 0
3% 58 283 1
4% 54 110 0
5% 27 46 0
6% 30 25 0
7% 14 11 0
8% 4 7 0
9% 8 9 0
10% 2 5 0
> 10% 12 9 0

Note: Not included in this table are 4,287 respondents who did not answer the survey.

Nonresponse by Respondents in 2004 survey

Table 2.6.1 Number of respondents with missing data by number of questions in 2004 survey
Number of Questions Number of Respondents
Refused Didn't Know Was Incorrectly Skipped Over
0 6531 1524 6539
1 298 993 440
2 194 755 334
3 171 624 145
4 78 592 42
5 45 486 98
6 51 387 29
7 45 360 13
8 29 314 3
9 23 235 5
10 11 178 7
> 10 185 1213 6

Note: Not included in this table are 5,025 respondents who did not answer the survey.

Table 2.6.2 Number of respondents with missing data by percent of questions in 2004 survey
Percent of Questions Number of Respondents
Refused Didn't Know Was Incorrectly Skipped Over
0% 7006 3185 7290
1% 384 2399 349
2% 106 1122 18
3% 48 477 2
4% 40 226 1
5% 18 103 0
6% 16 68 0
7% 10 29 0
8% 8 14 0
9% 8 17 0
10% 3 6 1
> 10% 14 15 0

Note: Not included in this table are 5,025 respondents who did not answer the survey.

How much missing data are associated with particular questions? This section gives an in-depth view of the questions within the survey sections that have large amounts of missing data. Like the previous parts, it provides tables for each of the selected survey years. The first table (Table 3.1) examines questions from the 1979 survey's "Work Experience" section, which has more missing data (14.5 percent) than any other 1979 survey section. The second set of tables (Tables 3.2 through 3.6) examines the most problematic section of the 1984 survey, "Fertility and Abortion." The third set of tables (Tables 3.7 and 3.8) examines the most problematic 1989 survey section, "Income and Assets." Since the 1994 "Income and Assets" section again ranked first in missing data, the next set of tables (Tables 3.9 and 3.10) instead examines the "Drug and Alcohol Use Supplements," given the high degree of research interest in understanding nonresponse in these sections. Table 3.11 highlights nonresponse in the 1998 "Marital History" section. Table 3.12 tracks nonresponse problems in the 2004 health section, which includes the over-40 health module.

To keep the sets of tables from becoming overwhelming, all sections that could be naturally divided are split (Fertility, for instance). Additionally, only the most important questions or the questions with high rates of nonresponse are shown. Table 3.1, which examines the amount of missing data in the 1979 survey, shows that the largest amounts of missing data are associated with a pair of retrospective questions that asked respondents to remember what happened two years earlier. Interviewers incorrectly skipped slightly fewer than 1,750 respondents over R01150., weeks worked in 1977, and R01153., hours worked per week in 1977. Examining the 1979 questionnaire shows that these questions appear at the bottom of a page. They are preceded by a fairly complicated half page of instructions and questions that the interviewer must read, understand, and partially speak. It seems likely that many interviewers did not understand the instructions and skipped to the next page.

Table 3.1. Amount of missing data per question in the Work Experience section in 1979 survey

Reference Number | Variable Title | Invalid Skip | Don't Know | Refusal
R01150. | Weeks Work in 1977 | 1735 | 11 | 1
R01151. | Weeks Work in 1976 | 418 | 18 | 1
R01152. | Weeks Work in 1975 | 240 | 11 | 0
R01153. | Hours/Week Work in 1977 | 1749 | 13 | 0
R01154. | Hours/Week Work in 1976 | 459 | 16 | 0
R01165. | Industry of 1st Job after School | 628 | 4 | 1
R01166. | Occupation at 1st Job after School | 627 | 3 | 1
R01167. | Hours/Week Work at 1st Job after School | 631 | 6 | 1
R01168. | Hours/Day at 1st Job after School | 632 | 6 | 1
R01169. | Rate of Pay at 1st Job after School | 632 | 32 | 2

Tables 3.2-3.6, which examine the "Fertility" section, show a much lower number of invalid skips in all parts except the abortion questions. While invalid skips do not reach the level seen in Table 3.1, on average 190 female respondents were not asked each abortion question (190 is an average from all abortion questions, not just those shown in the tables). The tables also show a number of other trends. First, the more precise the question being asked, the more don't know answers respondents give. For example, in Table 3.2, when males were asked the date of birth of their first child, only one did not know the year, three did not know the month, and 10 did not know the day. This phenomenon is most clearly seen in Table 3.5, which shows the year and month of the respondent's first sexual encounter. Only 43 respondents did not know the year, but 1,410 respondents did not know the month. This problem with dates is also seen in the abortion data, where only four respondents did not know the year when they had their first abortion, but 13 did not know the month.

Refusal rates in the "Fertility" section are quite low except for a number of key questions. The question asking how many times respondents had sex in the last month elicited high rates of refusal from both males and females: 167 male and 135 female refusals. Interestingly, most individuals were willing to say whether they had ever had sex, since only 45 males and 54 females refused to answer that question. Birth control questions did not have exceptionally high rates of refusal. Seventeen female respondents and no males refused to answer the birth control questions. Table 3.6 shows that 28 females refused to answer if they ever had an abortion and 28 more refused to state if they dropped out of school before they terminated the pregnancy.

Table 3.2. Amount of missing data per question in male Fertility section in 1984 survey

Reference Number | Variable Title | Invalid Skip | Don't Know | Refusal
R13017. | Ever Had Any Children | 0 | 3 | 0
R13019. | Month Birth Child#1 Born | 41 | 3 | 0
R13020. | Day Birth Child#1 Born | 45 | 10 | 0
R13021. | Year Birth Child#1 Born | 39 | 1 | 0
R13022. | Sex of Child#1 Born | 3 | 0 | 0
R13115. | Total #Children Expect to Have | 12 | 45 | 3
R13117. | #Years Expect Have 1st/Next Child | 22 | 120 | 0
R13118. | Had Any Children/Expecting | 0 | 7 | 0
R13119. | Current Pregnancy Planned | 131 | 0 | 0
R13121. | Ever Had Sexual Intercourse | 12 | 0 | 45
R13122. | Age @First Sexual Intercourse | 28 | 19 | 23
R13123. | #Times Sexual Intercourse Past Month | 11 | 68 | 167
R13124. | Is Partner Now Pregnant | 0 | 1 | 0
R13125. | Use Any Birth Control During Last Month | 15 | 2 | 0
R13126. | #Times Try Prevent Pregnancy | 65 | 0 | 0
R13127.-R13141. | Method of Birth Control | 16 | 0 | 0
R13142. | Ever Have a Sex Education Course | 10 | 0 | 12
R13148. | Month Took Sex-Ed Course | 73 | 564 | 0
R13149. | Year Took Sex-Ed Course | 36 | 58 | 0
R13150. | Time When Pregnancy Most Likely | 19 | 1480 | 20

Table 3.3. Amount of missing data per question in female Fertility section in 1984 survey

Reference Number | Variable Title | Invalid Skip | Don't Know | Refusal
R13191. | #Pregnancies | 8 | 0 | 0
R13251. | Use Any Birth Control before Preg#1 | 18 | 0 | 1
R13254. | Want Be Pregnant before Preg#1 | 20 | 0 | 0
R13255. | Husband/Partner Want Preg#1 | 19 | 20 | 0
R13283. | Get Prenatal Care Preg#1 | 57 | 0 | 0
R13286. | Frequency Alcohol Use Preg#1 | 58 | 0 | 0
R13288. | #Cigarettes Smoked Preg#1 | 56 | 0 | 0
R13297. | X-Rays Taken Preg#1 | 57 | 0 | 0
R13302. | Sonogram Preg#1 | 57 | 6 | 0
R13358. | Amniocentesis Preg#1 | 57 | 0 | 0
R13411. | Took Vitamins Preg#1 | 57 | 0 | 0
R13443. | C-Section Child#1 Born | 52 | 0 | 0
R13445. | Weight at Delivery, Preg#1 | 53 | 5 | 1
R13446. | Weight before Preg#1 | 51 | 5 | 1
R13449. | Length Child#1 Born at Birth | 53 | 20 | 0
R13667. | Weight of Child#1 @Birth Lbs | 25 | 6 | 0

Table 3.4. Amount of missing data per question in feeding part of Fertility section in 1984 survey

Reference Number | Variable Title | Invalid Skip | Don't Know | Refusal
R13670. | Child#1 Breastfed | 27 | 0 | 0
R13672. | Month Age Child#1 Breast Fed Ended | 27 | 1 | 0
R13674. | Month Age Child#1 Formula Fed | 38 | 3 | 0
R13693. | Wk Age Child#1 Formula Fed Ended | 57 | 0 | 0
R13694. | Month Age Child#1 Formula Fed Ended | 57 | 6 | 0
R13696. | Months Age Child#1 - Cow's Milk | 81 | 10 | 0
R13698. | Months Age Child#1 - Solid Food | 86 | 10 | 0

Table 3.5. Amount of missing data per question in child part of Fertility section in 1984 survey

Reference Number | Variable Title | Invalid Skip | Don't Know | Refusal
R13791. | Age Had 1st Menstrual Period | 8 | 14 | 22
R13792. | Year 1st Menstrual Period | 0 | 7 | 0
R13793. | Month Had 1st Menstrual Period | 17 | 2207 | 1
R13794. | R Ever Been Pregnant | 0 | 1 | 0
R13795. | Ever Had Sexual Intercourse | 4 | 0 | 54
R13796. | Age First Sexual Intercourse | 5 | 26 | 78
R13797. | Year 1st Sexual Intercourse | 0 | 43 | 66
R13798. | Month Sexual Intercourse 1st Time | 19 | 1410 | 75
R13799. | #Times Sexual Intercourse Past Month | 9 | 104 | 135
R13802. | #Times Try Prevent Pregnant Past Month | 17 | 0 | 2

Table 3.6. Amount of missing data per question in abortion questions of Fertility section in 1984 survey

Reference Number | Variable Title | Invalid Skip | Don't Know | Refusal
R13827. | Ever Had An Abortion | 135 | 0 | 28
R13828. | # of Abortions | 143 | 0 | 0
R13830. | Year of 1st Reported Abortion | 196 | 4 | 0
R13837. | Drop out School #1 Pregnant | 155 | 0 | 28
R13839. | Year Left School 1st Time Pregnant | 164 | 0 | 0
R13841. | Year Return School Time#1 after Pregnant | 258 | 0 | 0

Tables 3.7 and 3.8 examine the "Income and Assets" section of the 1989 survey. While invalid skips are relatively rare in this section, refusals and don't know answers are fairly prevalent. The question with the highest amount of missing income data is R29822., which asks how much income was earned by other adults living in the household who were related to the respondent. While the previous questions showed that most respondents knew the type of income received by these family members, 958 could not come up with a specific amount. The second most problematic question, with 11 invalid skips, 155 don't knows, and 113 refusals, was R29714., which asked respondents how much they earned from wages, salary, and tips.

Other questions with high numbers of don't knows are R29813., which asks about the amount of money received from other sources like interest and dividends, R29825., which asks about a partner's income, and R29828., which asks the number of exemptions used when filing a Federal tax return.

The asset table (Table 3.8) also shows that invalid skips are rare but don't know and refusal rates are not. Surprisingly, one of the questions with the highest amount of missing data (315 missing answers) is R29852., which asks, "how much is your car worth?" Another question missing many observations asks the amount of the respondent's savings (R29835.). While the car worth question primarily elicits don't knows, the savings question resulted in 160 refusals. Three other questions elicited high numbers of don't knows: value of stocks and bonds (R29837.), 219 don't knows; amount taken out of savings last year (R29842.), 222 don't knows; and the market value of other items such as jewelry (R29854.), 151 don't knows.

Table 3.7. Amount of missing data per question in Income section in 1989 survey

Reference Number | Variable Title | Invalid Skip | Don't Know | Refusal
R29714. | Amount Rec from Wages/Salary/Tips | 11 | 155 | 113
R29715. | In 1988 Receive Income from Own Business | 1 | 0 | 11
R29717. | How Much Did R Receive after Expenses | 6 | 49 | 23
R29732. | Amount Rec'd Per Week from Unemployment | 0 | 5 | 1
R29736. | Amount Sp Rec'd 1988 from Wages | 16 | 17 | 70
R29754. | How Much Did Sp Receive from Unemployment | 8 | 12 | 0
R29758. | R/Spouse Rec'd Money for Child Support | 1 | 1 | 10
R29759. | Amount R/Spouse Rec'd Child Support | 2 | 14 | 2
R29760. | R/Spouse Rec'd AFDC Payments | 0 | 4 | 9
R29774. | R/Spouse Rec'd Food Stamps | 0 | 2 | 10
R29788. | R/Spouse Rec'd SSI/Public Assistance | 0 | 4 | 9
R29808. | Rec'd Veteran Benefits | 1 | 1 | 10
R29812. | R/Spouse Rec'd Money from Oth So | 0 | 2 | 16
R29822. | Income Rec'd by Adults Related To R | 7 | 958 | 8
R29825. | Total Income Rec'd before Deduct | 2 | 200 | 4
R29826. | Sp File Federal Income Tax R | 0 | 2 | 13
R29827. | R'S Filing Status on Federal Ret | 11 | 8 | 2
R29828. | Exemptions Filed on 1988 Federal Tax | 62 | 92 | 3

Table 3.8. Amount of missing data per question in Asset section in 1989 survey

Reference Number | Variable Title | Invalid Skip | Don't Know | Refusal
R29831. | Amount Property Selling for on Today | 5 | 53 | 10
R29832. | Amount R Owes on Property | 4 | 85 | 25
R29833. | Amount Other Debt R Owes on Property | 12 | 26 | 27
R29835. | Amount of Savings | 7 | 166 | 160
R29837. | Current Market Value of Stocks | 2 | 219 | 23
R29838. | R/Spouse Have Rights to Estate | 2 | 3 | 18
R29839. | Total Value of Estate | 3 | 90 | 6
R29840. | Put Money in/out of Savings | 1 | 3 | 28
R29841. | How Much More Money Put in | 6 | 110 | 53
R29842. | How Much More Money Take out | 5 | 222 | 21
R29843. | R Have Business Investment | 0 | 1 | 12
R29844. | R Have Investment in a Farm | 4 | 0 | 0
R29847. | Total Market Value of Business | 4 | 75 | 10
R29848. | Total Amount of Business Debt | 1 | 55 | 8
R29851. | How Much Does R Owe on Vehicle | 0 | 56 | 17
R29852. | Amount Vehicle Sells for Today | 11 | 293 | 11
R29854. | Market Value of Other Items | 5 | 151 | 25
R29856. | Total Amount R Owes | 1 | 73 | 13

Tables 3.9 and 3.10 examine the drug and alcohol use supplements in the 1994 survey. In these CAPI modules, there are no invalid skips. Interestingly, there are extremely low refusal and don't know rates within the "Alcohol" section (Table 3.9). The question with the highest number of refusals (nine respondents) asks if the individual had a drink since the 1989 interview. The typical question in the "Alcohol" section received only two refusals. Don't know rates are also low. The maximum number of don't knows, nine, occurs in R49803., which asks if the respondent needs to drink more alcohol now in order to get drunk. On average, the "Alcohol" section records only 1.5 don't knows per question.

Table 3.9. Amount of missing data per question in Alcohol Use section in 1994 survey

Reference Number | Variable Title | Invalid Skip | Don't Know | Refusal
R49791. | R Had Drink of Alcohol since 1989 | 0 | 3 | 9
R49792. | Had Alcoholic Beverage in Last 30 | 0 | 0 | 5
R49793. | Times Had 6/More Drinks Last | 0 | 0 | 1
R49794. | How Many of Last 30 Days Drank A | 0 | 6 | 2
R49795. | No. of Drinks on Avg. Day When R | 0 | 8 | 3
R49803. | Need More to Get Drunk Than Before | 0 | 9 | 0
R49808. | Arrested, in Police Trouble | 0 | 0 | 3
R49809. | Drink More Than Before | 0 | 4 | 3

These low numbers of refusals and don't knows are not seen in Table 3.10, which examines the "Drug Use" section. On average, the typical question in this supplement elicited 23 don't knows and 48 refusals. Readers should understand that this supplement was generally filled in directly by the respondent, not by the interviewer. To provide respondents with practice using a computer, the questionnaire asked them two practice questions not related to drug use. Refusal rates are high even for these two test questions, which ask how many more children the respondent expects to have and what type of entertainment, such as movies, concerts, or plays, the respondent went to last year.

The highest number of refusals (119) occurs in R50532., which asks the age the respondent first used marijuana. The second largest number of refusals occurs in a similar question, R50536., which asks the age of first cocaine use. These same questions have very high don't know responses (113 marijuana and 48 cocaine). One other question with a very high don't know rate is R50525., which asks if the respondent ever smoked cigarettes daily. Almost 80 individuals did not know the answer to this question. Given that the question wording is straightforward, it is likely a number of respondents are using don't know as a polite way of refusing to answer the question.

Table 3.10. Amount of missing data per question in Drug Use section in 1994 survey

Reference Number | Variable Title | Invalid Skip | Don't Know | Refusal
R50524. | R Smoked at Least 100 Cigrtts in Life? | 0 | 24 | 38
R50525. | R Ever Smoked Daily? | 0 | 79 | 49
R50526. | Age When R 1st Started Smoking Daily? | 0 | 33 | 12
R50531. | Total Occasion R Use Marijuana | 0 | 33 | 89
R50532. | Age 1st Time Used Marijuana | 0 | 113 | 119
R50533. | Most Recent Time Used Marijuana | 0 | 35 | 89
R50535. | How Many Occasions Used Cocaine | 0 | 19 | 86
R50536. | Age 1st Time Used Cocaine | 0 | 48 | 103
R50537. | Most Recent Time Used Cocaine | 0 | 15 | 78
R50539. | How Many Occasions Used Crack | 0 | 15 | 77
R50540. | Age 1st Time Used Crack | 0 | 33 | 82
R50541. | Most Recent Time Used Crack | 0 | 16 | 74
R50553. | R Used Heroin w/o Doctor's Instr | 0 | 9 | 53

The top ten questions in Table 3.11 show that a large number of respondents (ranging from 119 to 181 respondents, depending on the question) have difficulty with questions asking them about their spouse's rate and amount of pay, hours worked, and weeks worked. In addition, questions that ask for details about a spouse's previous marriage are also quite difficult for many respondents to answer.

Table 3.11. Amount of missing data per question in Marital History section in 1998 survey

Reference Number | Variable Title | Invalid Skip | Don't Know | Refusal
R58067. | Rate of Pay for Spouse Main Job (Time Unit) | 0 | 181 | 49
R58204. | Age of Spouse at 1st Marriage | 0 | 213 | 2
R58125. | Spouse's Weekly Earnings at Main Job | 0 | 159 | 29
R58068. | Spouse Receive Overtime at Main Job | 0 | 151 | 26
R58127. | Estimate Spouse's Weekly Earning Main Job | 0 | 149 | 26
R58178. | Hours Spouse Works Per Week Usually | 0 | 170 | 1
R58177. | Number of Weeks Worked by Spouse in Last Year | 0 | 140 | 24
R58179. | Number Weeks Not Working by Spouse Last Year | 0 | 130 | 24
R58176. | Spouse Hourly Rate of Pay | 0 | 119 | 28
R58208. | Duration of Spouse's Previous Marriage? | 0 | 109 | 16

Table 3.12 examines the top questions with missing data problems from the health section in 2004. In this table, reference numbers starting with "R" are for questions asked of all respondents in the survey, while reference numbers starting with "H" represent questions in the "over 40 health module." This module was designed to provide researchers with more information about the health of the respondent when they turned 40 years old and is asked of respondents in the first interview after they turn 40.

While other data from the survey show that many people know if they are covered by health insurance, Table 3.12 reveals that many do not know details about this coverage. For example, one question with a large number of don't knows is R83036., which asks if the respondent's health insurance plan is an HMO, a preferred provider plan (PPO) or a network of affiliated doctors. This question had 428 missing responses out of 6,175 total responses (a 7% missing response rate). Other questions with high don't know rates ask if the respondent's children are covered by health insurance. The health question with the highest refusal rate asks the respondent how much they weigh, with 114 people refusing to divulge the number. Finally, in the 40+ health module a number of NLSY79 respondents have difficulty answering questions about the health and life status of their biological father. This is not surprising given a small but significant number of respondents stated in the past that they have never met their biological father.
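
The missing response rate quoted above can be reproduced with a few lines of Python. This is only a sketch: the function name is made up for illustration, the three counts come from the R83036. row of Table 3.12, and the denominator of 6,175 is the figure given in the text.

```python
def missing_rate(invalid_skip, dont_know, refusal, total_responses):
    """Return (missing count, missing percent) for one question."""
    missing = invalid_skip + dont_know + refusal
    return missing, 100.0 * missing / total_responses

# R83036.: 0 invalid skips, 426 don't knows, 2 refusals, 6,175 total responses.
missing, percent = missing_rate(0, 426, 2, 6175)
print(missing, round(percent, 1))  # 428 6.9
```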

Table 3.12. Amount of missing data per question in Health section in 2004 survey

Reference Number | Variable Title | Invalid Skip | Don't Know | Refusal
R83036. | Primary Insurance Plan HMO, Network, PPO | 0 | 426 | 2
R83037. | Is Primary Plan a PPO? | 0 | 388 | 2
R83070. | Children Have Health/Hospitalization Plan? | 0 | 328 | 15
R83038. | R's Primary Plan Need Authorization? | 0 | 301 | 0
H00015. | Date Most Recent General Physical Exam | 0 | 189 | 0
R82983. | How Much Does R Weigh? | 0 | 50 | 114
H00014. | Ever Had A General Physical Exam? | 0 | 147 | 2
H00017. | Cause Of Biological Dads Death | 0 | 133 | 10
H00019. | Bio Dad Have Major Health Problems? | 0 | 134 | 8
R82982. | Since What Date R Had This Health Limit | 0 | 120 | 0
R82992. | Length Light Moderate Activities 10 Min | 0 | 105 | 5
H00047. | Date Hypertension Diagnosed | 0 | 91 | 0
H00016. | Is R's Biological Dad Living? | 0 | 83 | 4
R82989. | Frequency of Light Mod Exercise 10 > Min | 0 | 75 | 6
H00018. | Age Of Biological Dad At Death | 0 | 68 | 1
H02445. | Date Most Recent Visit to Health Professional | 0 | 52 | 11
H00012. | R Ever Visit Health Care Professional? | 0 | 58 | 0
R83042. | Spouse Have Health/Hospital Plan | 0 | 32 | 24
R83048. | Spouse Employer Pay All Health Plan Cost? | 0 | 49 | 2

Note: Reference numbers that begin with the letter H are variables that are combined from different years of the over-40 health module. Researchers wanting to see the results from just the 2004 survey should use variable H00002.00, which is titled "Source Year for 40+ Health Module Data." Use this variable to select just those cases which answered the questions in 2004.
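
The case selection described in the note might look like the following Python sketch. The column names are hypothetical stand-ins for the reference numbers, since naming depends on how the data were extracted. The records are made up for illustration, and the sketch assumes the source-year variable holds a four-digit survey year.

```python
# Hypothetical column names; adjust them to match your own extract.
SOURCE_YEAR = "H0000200"    # stands in for H00002.00, source year of 40+ health data
PHYSICAL_EXAM = "H0001400"  # stands in for H00014., ever had a general physical exam

def select_by_source_year(rows, year, key=SOURCE_YEAR):
    """Return only the rows whose source-year variable equals `year`."""
    return [row for row in rows if row.get(key) == year]

# Made-up records for illustration; these are not NLSY79 data.
rows = [
    {"CASEID": 1, SOURCE_YEAR: 1998, PHYSICAL_EXAM: 1},
    {"CASEID": 2, SOURCE_YEAR: 2004, PHYSICAL_EXAM: 0},
    {"CASEID": 3, SOURCE_YEAR: 2004, PHYSICAL_EXAM: 1},
    {"CASEID": 4, SOURCE_YEAR: 2002, PHYSICAL_EXAM: 1},
]

cases_2004 = select_by_source_year(rows, 2004)
print([row["CASEID"] for row in cases_2004])
```

Filtering first and tabulating afterwards keeps answers collected in other rounds of the module out of the 2004 counts.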

References

Mott, Frank L. "Patterning of Child Assessment Completion Rates in the NLSY: 1986-1996." CHRR, The Ohio State University, 1998.

Mott, Frank L. "Evaluation of Fertility Data and Preliminary Analytical Results from the 1983 (5th round) Survey of the National Longitudinal Survey of Work Experience of Youth." CHRR, The Ohio State University, 1985.

Mott, Frank L. "The Patterning of Female Teenage Sexual Behaviors and Attitudes." CHRR, The Ohio State University, 1994.

Mott, Frank L. "Fertility-Related Data in the 1982 National Longitudinal Surveys of Work Experience of Youth: An Evaluation of Data Quality and Some Preliminary Analytical Results." CHRR, The Ohio State University, 1983.

Olsen, Randall J. "The Effects of Computer Assisted Interviewing on Data Quality." CHRR, The Ohio State University, 1992.

Interviewer Remarks

Each NLSY79 questionnaire includes an interviewer remarks section that interviewers complete after finishing the interview with the respondent. Some information is objective, such as the presence of another person during an in-person survey, while other details, such as rating how cooperative the respondent was, rely on the interviewer's subjective assessment.

  • Special circumstances
    • All survey rounds feature a series of questions about special circumstances that might have affected the quality of the data. The interviewers were asked to assess whether the respondent was hard of hearing, unable to see well, unable to read, lacking in basic social skills, mentally handicapped or retarded, physically handicapped, ill or injured, or had a poor command of English.
  • Respondent's general demeanor and responsiveness
    • In all survey rounds, interviewers rated how informative and cooperative a respondent was during the interview. In addition, the interviewers assessed the respondent's overall understanding (good, fair, poor) of the questions.
  • Presence of others during interview
    • All survey rounds include information about whether others were present (listening and/or participating) during in-person interviews and who the person or persons were (infant child, family member, etc.). Interviewers attempt to secure a private environment for all interviews, so the presence of another individual (other than a small child) is an exception and can be considered a disruption to the interview. 
  • Interviewer characteristics
    • Interviewers provide information on their own ethnicity, age, sex, highest grade completed, and how much experience (measured in years) they had as an interviewer.
  • Interview methodology
    • Interviewers record whether any portion of the interview took place on the phone and indicate if the interview was in Spanish or English.
  • Interviewer retention
    • Interviewers indicate each survey round whether they had interviewed that respondent the previous survey year.

Standard Errors & Design Effects

This section contains information on standard errors and design effects for the NLSY79 sample, briefly discussing how to use these two statistical factors. It then includes tables for the first round and for 1996 through 2022. Users interested in the intervening years should review the Technical Sampling Report and Technical Sampling Report Addendum.

Standard errors have been explicitly computed for a number of statistics based upon the entire NLSY79 sample (total, civilian, and military) and a number of sex or race subclasses. Standard errors for other statistics (defined over the entire sample or the subclasses) may be approximated with use of the DEFT factors given in the linked tables. Users who examine the tables will note that CHRR has calculated standard errors for different variables over time.

Approximate standard errors: Percentages

The following formula approximates a standard error of a percentage:

$$\mathrm{se}(P) \approx \mathrm{DEFT} \times \frac{\sqrt{P(100 - P)}}{\sqrt{n}}$$

where
se(P) = the approximate standard error for the percentage of P
P = the sample percentage (ranging from 0 to 100)
n = the actual unweighted sample size for the demographic subclass from which the percentage was developed
DEFT = the appropriate DEFT factor for the particular demographic subclass and sample type from which the percentage was developed

For example, for 1996 the appropriate DEFT factor for estimating a standard error of the percentage of Hispanic or Latino males who were high school dropouts is 1.17744 (see proportion column, row seven of Table 2. Deft factors for round 17, 1996). Assuming the calculated sample percentage (P) equals 22.19 and the unweighted sample size (n) is 946, then:

$$\mathrm{se}(P) \approx 1.17744 \times \frac{\sqrt{22.19\,(100 - 22.19)}}{\sqrt{946}} \approx 1.5907$$

To approximate the standard error of the corresponding projected population total (NP/100), calculate:

$$\mathrm{se}\!\left(\frac{NP}{100}\right) \approx N \left[\frac{\mathrm{se}(P)}{100}\right]$$

where
se(NP/100) = the approximate standard error of the projected population total corresponding to a percentage P within a particular demographic subclass and sample type
N = the appropriate projected total population base for the particular demographic subclass and sample type

For example, if the projected total population base for Hispanic or Latino males is 1,030,861, the projected number of civilian Hispanic or Latino male high school dropouts is equal to NP/100 or 1,030,861 * 22.19/100 = 228,748. Thus, the approximate standard error for the total number of Hispanic or Latino male high school dropouts is:

$$\mathrm{se}\!\left(\frac{NP}{100}\right) \approx 1{,}030{,}861 \times \frac{1.5907}{100} \approx 16{,}397.9$$

Note: 1.5907 came from the previous calculation.
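
The two calculations above can be sketched in Python as follows. The function names are illustrative and not part of any NLSY79 software; the inputs are the figures from the worked example.

```python
import math

def se_percentage(p, n, deft):
    """Approximate standard error of a percentage p (on a 0-100 scale).

    n is the unweighted sample size and deft is the DEFT factor for the
    demographic subclass and sample type.
    """
    return deft * math.sqrt(p * (100.0 - p)) / math.sqrt(n)

def se_population_total(n_pop, se_p):
    """Approximate standard error of the projected total NP/100."""
    return n_pop * (se_p / 100.0)

# Hispanic or Latino male high school dropouts, 1996.
se_p = se_percentage(22.19, 946, 1.17744)
print(round(se_p, 4))  # 1.5907

# The text plugs the rounded value 1.5907 into the second formula. Carrying
# the unrounded se_p instead gives a result that differs only slightly.
se_total = se_population_total(1_030_861, se_p)
print(round(se_total, 1))
```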

Approximate standard errors: Means

One can compute approximate standard errors for means as follows:

$$\mathrm{se}(X) \approx \mathrm{DEFT} \times \sqrt{\frac{s^2}{n}}$$

where
se(X) = the approximate standard error of the mean
DEFT = the appropriate DEFT factor for the particular demographic subclass and sample type from which the mean was developed
s² = the weighted element variance computed for the demographic subclass and sample type from which the mean was developed
n = the unweighted sample size for the particular mean

For example, for 1979 the DEFT factor for all Hispanics or Latinos is 1.45699 (see means column, row four of Table 1. Deft factors for round 1, 1979). To approximate the standard error of the mean number of years of education completed by this subclass, where the weighted element variance is .72955 and the sample size is 77, compute:

$$\mathrm{se}(X) \approx 1.45699 \times \sqrt{\frac{0.72955}{77}} \approx 0.1418$$
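
The same approximation for a mean can be sketched in Python. The function name is illustrative, and the inputs are the figures from the example above.

```python
import math

def se_mean(variance, n, deft):
    """Approximate standard error of a mean.

    variance is the weighted element variance, n the unweighted sample
    size, and deft the DEFT factor for the subclass and sample type.
    """
    return deft * math.sqrt(variance / n)

# Years of education completed, all Hispanics or Latinos, 1979.
se_x = se_mean(0.72955, 77, 1.45699)
print(round(se_x, 4))  # 0.1418
```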

Design effects

Because the samples are multi-stage, stratified random samples instead of simple random samples, respondents tend to come in geographic clusters, and persons within a cluster tend to be alike in a variety of ways for a variety of reasons. (For more information on the sampling and screening process, see the section on Sample Design & Screening Process in this guide.) For example, there may be cultural differences by locality or ecological differences in labor market conditions. Depending upon the degree of this homogeneity, the conventionally computed standard errors for the variables, which assume a simple random sample, may be too small. However, by controlling the rate at which particular strata are sampled, multi-stage, stratified random samples can improve upon simple random samples. The ratio of the correct standard error to the standard error computed under the assumption of a simple random sample is known as the design effect. The technical sampling report for the NLSY79 (Frankel, Williams, and Spencer 1983) and its addendum (CHRR) provide design effects for the various strata.

A single design effect that can be broadly applied to regression analysis cannot be constructed. To illustrate the approximate size of design effects in regression analysis, a regression of rate of pay for the CPS job in 1979 was estimated using race, sex, marital status, and education as explanatory variables. Assuming each of the roughly 200 PSUs has the same number of respondents in the sample of 5,724 persons with observed wages, the design effect was calculated to be 1.52; that is, the true standard errors were larger than the naively computed standard errors by a factor of 1.52. When this exercise was repeated for rate of pay on the CPS job in 1986, the design effect had fallen to 1.37.

This reduction reflects the fact that mobility tends to mix the respondents more uniformly through the country, reducing the clustering of the sample. Many of the persons who started out in the same PSU will have moved to different areas and, hence, no longer share unobservable labor market conditions. These shared unobservable labor market conditions are likely responsible for the spatial correlation of the error terms which generate design effects. Thus, another advantage of longitudinal data is the lessening of design effects over time.

By examining the Geocode data for the NLSY79, it is possible to control for some of the environmental factors generating design effects or, if desired, to compute design effects based upon county or metropolitan area clusters which continue to be present. To facilitate study of design effects, scrambled PSU codes from the 1979 survey are available to persons with authorized access to the NLSY79 Geocode data.

The Technical Sampling Report and Technical Sampling Report Addendum also provide information on design effects.


Table. Deft factors for round 1, 1979

Demographic Group | Proportions | Means
All Youth | 1.72547 | 1.71282
Males | 1.46605 | 1.56808
Females | 1.58029 | 1.49720
Hispanics or Latinos | 1.44342 | 1.45699
Blacks | 1.35303 | 1.43730
Non-black/non-Hispanics | 1.58686 | 1.56996
Hispanic or Latino Males | 1.24321 | 1.22329
Hispanic or Latino Females | 1.40353 | 1.25095
Black Males | 1.19457 | 1.21378
Black Females | 1.24877 | 1.25243
Non-black/non-Hispanic Males | 1.33775 | 1.45962
Non-black/non-Hispanic Females | 1.46889 | 1.37581

Table. Deft factors for round 17, 1996

Demographic Group | Proportions | Means
All Youth | 1.35848 | 1.967232
Males | 1.28523 | 1.667333
Females | 1.24536 | 1.621727
Hispanics or Latinos | 1.28275 | 1.584298
Blacks | 1.19735 | 1.423025
Non-black/non-Hispanics | 1.19087 | 1.713184
Hispanic or Latino Males | 1.17744 | 1.407125
Hispanic or Latino Females | 1.13217 | 1.264911
Black Males | 1.16541 | 1.174734
Black Females | 1.13258 | 1.319091
Non-black/non-Hispanic Males | 1.13217 | 1.456022
Non-black/non-Hispanic Females | 1.09545 | 1.405347
Table. Deft factors for round 18, 1998

Demographic Group | Proportions | Means
All Youth | 1.38301 | 1.96469
Males | 1.30836 | 1.66433
Females | 1.28311 | 1.60000
Hispanics or Latinos | 1.21917 | 1.52807
Blacks | 1.19164 | 1.40890
Non-black/non-Hispanics | 1.17937 | 1.67481
Hispanic or Latino Males | 1.19248 | 1.37659
Hispanic or Latino Females | 1.13418 | 1.25100
Black Males | 1.14336 | 1.12694
Black Females | 1.12088 | 1.31529
Non-black/non-Hispanic Males | 1.18195 | 1.43353
Non-black/non-Hispanic Females | 1.11028 | 1.37133
Table. Deft factors for round 19, 2000

Demographic Group | Proportions | Means
All Youth | 1.36423 | 1.90919
Males | 1.26007 | 1.61864
Females | 1.21244 | 1.58588
Hispanics or Latinos | 1.24544 | 1.48492
Blacks | 1.19954 | 1.42127
Non-black/non-Hispanics | 1.20052 | 1.62327
Hispanic or Latino Males | 1.19722 | 1.31909
Hispanic or Latino Females | 1.09240 | 1.22474
Black Males | 1.20277 | 1.18322
Black Females | 1.08282 | 1.34907
Non-black/non-Hispanic Males | 1.12750 | 1.39462
Non-black/non-Hispanic Females | 1.13908 | 1.34907
Table. Deft factors for round 20, 2002

Demographic Group | Proportions | Means
All Youth | 1.34578 | 1.82757
Males | 1.29701 | 1.58430
Females | 1.18181 | 1.52807
Hispanics or Latinos | 1.24097 | 1.47986
Blacks | 1.20692 | 1.35647
Non-black/non-Hispanics | 1.15085 | 1.56844
Hispanic or Latino Males | 1.12450 | 1.28841
Hispanic or Latino Females | 1.09479 | 1.21861
Black Males | 1.20830 | 1.12694
Black Females | 1.18743 | 1.33604
Non-black/non-Hispanic Males | 1.20468 | 1.37659
Non-black/non-Hispanic Females | 1.06829 | 1.30958

Important information: Deft tables for rounds 21 through the current public release

Users are cautioned that the figures in the proportion column for the last six categories are becoming much less relevant over time. The proportion Deft column is based on education, training, marriage, and employment variables. Over time, categories such as black females have come to include only a few respondents in school or training, which causes the Deft factors to change from survey to survey. Broader categories, like "All Youth," "Males," and "Females," are more reliable to use.
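As an illustration of how a Deft factor is typically applied, the sketch below inflates the simple-random-sampling standard error of a proportion by the published factor and forms a normal-approximation confidence interval. The Deft value is the round 21 "All Youth" proportions figure from the table; the proportion and sample size are hypothetical and chosen only for the example, and the function names are illustrative.

```python
import math

def adjusted_se_proportion(p: float, n: int, deft: float) -> float:
    """Design-adjusted standard error: Deft times the naive SRS standard error."""
    naive_se = math.sqrt(p * (1.0 - p) / n)
    return deft * naive_se

def confidence_interval(estimate: float, se: float, z: float = 1.96):
    """Normal-approximation interval; z = 1.96 corresponds to roughly 95 percent."""
    return estimate - z * se, estimate + z * se

deft_all_youth = 1.38789   # round 21 (2004), All Youth, proportions column
p_hat = 0.25               # hypothetical estimated proportion
n_obs = 7000               # hypothetical number of respondents

se = adjusted_se_proportion(p_hat, n_obs, deft_all_youth)
low, high = confidence_interval(p_hat, se)
print(f"adjusted SE = {se:.5f}, interval = ({low:.4f}, {high:.4f})")
```

For a mean rather than a proportion, the same multiplication would be applied to the naive standard error of the mean using the factor from the means column.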

Table. Deft factors for round 21, 2004

Demographic Group | Proportions | Means
All Youth | 1.38789 | 1.83712
Males | 1.27377 | 1.55563
Females | 1.23592 | 1.55081
Hispanics or Latinos | 1.30336 | 1.46969
Blacks | 1.14782 | 1.35831
Non-black/non-Hispanics | 1.18163 | 1.57003
Hispanic or Latino Males | 1.27083 | 1.31149
Hispanic or Latino Females | 1.12750 | 1.19164
Black Males | 1.14455 | 1.10454
Black Females | 1.02896 | 1.37113
Non-black/non-Hispanic Males | 1.09373 | 1.35647
Non-black/non-Hispanic Females | 1.08224 | 1.32098
Table. Deft factors for round 22, 2006

Demographic Group | Proportions | Means
All Youth | 1.35881 | 1.81246
Males | 1.23472 | 1.55563
Females | 1.25553 | 1.52315
Hispanics or Latinos | 1.13710 | 1.48661
Blacks | 1.15994 | 1.33041
Non-black/non-Hispanics | 1.14455 | 1.53460
Hispanic or Latino Males | 1.15195 | 1.31719
Hispanic or Latino Females | 1.00995 | 1.23085
Black Males | 1.15247 | 1.09772
Black Females | 1.11221 | 1.35647
Non-black/non-Hispanic Males | 1.09636 | 1.32288
Non-black/non-Hispanic Females | 1.08082 | 1.30192
Table. Deft factors for round 23, 2008

Demographic Group | Proportions | Means
All Youth | 1.31106 | 1.83712
Males | 1.25599 | 1.60468
Females | 1.22474 | 1.52315
Hispanics or Latinos | 1.13235 | 1.43353
Blacks | 1.16726 | 1.38203
Non-black/non-Hispanics | 1.10855 | 1.56365
Hispanic or Latino Males | 1.14837 | 1.27083
Hispanic or Latino Females | 1.03870 | 1.18322
Black Males | 1.14182 | 1.12916
Black Females | 1.11467 | 1.34907
Non-black/non-Hispanic Males | 1.09030 | 1.38564
Non-black/non-Hispanic Females | 1.09829 | 1.28841
Table. Deft factors for round 24, 2010

Demographic Group | Proportions | Means
All Youth | 1.34024 | 1.80278
Males | 1.26293 | 1.58745
Females | 1.23288 | 1.48829
Hispanics or Latinos | 1.19284 | 1.46116
Blacks | 1.21295 | 1.36015
Non-black/non-Hispanics | 1.12639 | 1.54434
Hispanic or Latino Males | 1.19284 | 1.28452
Hispanic or Latino Females | 1.11867 | 1.20208
Black Males | 1.16458 | 1.10905
Black Females | 1.13137 | 1.34907
Non-black/non-Hispanic Males | 1.07877 | 1.37659
Non-black/non-Hispanic Females | 1.03983 | 1.26886
Table. Deft factors for round 25, 2012

Demographic Group | Proportions | Means
All Youth | 1.34604 | 1.77682
Males | 1.26681 | 1.55921
Females | 1.24255 | 1.48757
Hispanics or Latinos | 1.21171 | 1.46095
Blacks | 1.19992 | 1.35592
Non-black/non-Hispanics | 1.17951 | 1.52438
Hispanic or Latino Males | 1.16338 | 1.24213
Hispanic or Latino Females | 1.05880 | 1.20750
Black Males | 1.11229 | 1.16998
Black Females | 1.15019 | 1.32479
Non-black/non-Hispanic Males | 1.14991 | 1.36160
Non-black/non-Hispanic Females | 1.12411 | 1.25952
Table. Deft factors for round 26, 2014

Demographic Group | Proportions | Means
All Youth | 1.33370 | 1.77496
Males | 1.25238 | 1.56764
Females | 1.19779 | 1.50041
Hispanics or Latinos | 1.15607 | 1.41956
Blacks | 1.13520 | 1.38628
Non-black/non-Hispanics | 1.18624 | 1.50758
Hispanic or Latino Males | 1.15649 | 1.25180
Hispanic or Latino Females | 1.06414 | 1.20324
Black Males | 1.12620 | 1.19193
Black Females | 1.00051 | 1.34394
Non-black/non-Hispanic Males | 1.15447 | 1.35138
Non-black/non-Hispanic Females | 1.18466 | 1.26346
Table. Deft factors for round 27, 2016

Demographic Group | Proportions | Means
All Youth | 1.40369 | 1.73651
Males | 1.36746 | 1.53267
Females | 1.23931 | 1.47176
Hispanics or Latinos | 1.28005 | 1.44627
Blacks | 1.10852 | 1.34987
Non-black/non-Hispanics | 1.26546 | 1.47732
Hispanic or Latino Males | 1.19194 | 1.22472
Hispanic or Latino Females | 1.16081 | 1.23085
Black Males | 1.10918 | 1.15997
Black Females | 1.04381 | 1.30468
Non-black/non-Hispanic Males | 1.21767 | 1.32061
Non-black/non-Hispanic Females | 1.17469 | 1.24867
Table. Deft factors for round 28, 2018

Demographic Group | Proportions | Means
All Youth | 1.36769 | 1.72280
Males | 1.29963 | 1.57090
Females | 1.18347 | 1.46229
Hispanics or Latinos | 1.23085 | 1.43839
Blacks | 1.06561 | 1.30877
Non-black/non-Hispanics | 1.21787 | 1.46098
Hispanic or Latino Males | 1.12575 | 1.25443
Hispanic or Latino Females | 1.10262 | 1.19304
Black Males | 1.05849 | 1.15098
Black Females | 0.97723 | 1.31684
Non-black/non-Hispanic Males | 1.12186 | 1.35481
Non-black/non-Hispanic Females | 1.11219 | 1.22446
Table. Deft factors for round 29, 2020

Demographic Group | Proportions | Means
All Youth | 1.36387 | 1.72145
Males | 1.35466 | 1.56630
Females | 1.12285 | 1.12285
Hispanics or Latinos | 1.15142 | 1.15142
Blacks | 1.05324 | 1.28861
Non-black/non-Hispanics | 1.22780 | 1.45744
Hispanic or Latino Males | 1.00312 | 1.22750
Hispanic or Latino Females | 1.02489 | 1.21003
Black Males | 0.95852 | 1.09251
Black Females | 0.96780 | 1.34382
Non-black/non-Hispanic Males | 1.16393 | 1.36001
Non-black/non-Hispanic Females | 1.06213 | 1.19797
Table. Deft factors for round 30, 2022

Demographic Group | Proportions | Means
All Youth | 1.11022 | 1.71275
Males | 1.10061 | 1.57639
Females | 0.93109 | 1.39730
Hispanics or Latinos | 1.03198 | 1.42323
Blacks | 0.94075 | 1.31054
Non-black/non-Hispanics | 0.99426 | 1.44663
Hispanic or Latino Males | 0.96821 | 1.25049
Hispanic or Latino Females | 0.93765 | 1.18866
Black Males | 0.97012 | 1.17667
Black Females | 0.82893 | 1.31188
Non-black/non-Hispanic Males | 1.01556 | 1.35862
Non-black/non-Hispanic Females | 0.84741 | 1.17163


Table. Standard errors for round 1, 1979
Description All Male Female Hispanic or Latino Black Non-black, non-Hispanic Male Hispanic or Latino Female Hispanic or Latino Male Black Female Black Male Non-black, non-Hispanic Female Non-black, non-Hispanic

Proportion High School Dropouts

0.00471 0.00627 0.00545 0.01385 0.00835 0.00527 0.01744 0.01814 0.01232 0.00928 0.00710 0.00619

Proportion Attending High School

0.00735 0.00893 0.01006 0.01554 0.01151 0.00904 0.02176 0.02146 0.01460 0.01628 0.01085 0.01233

Proportion Attending College

0.00597 0.00729 0.00778 0.01037 0.00784 0.00710 0.01230 0.01460 0.00919 0.01119 0.00862 0.00947

Proportion High School Grad

0.00658 0.00776 0.00905 0.01277 0.01033 0.00785 0.01440 0.01957 0.01217 0.01448 0.00926 0.01094

Mean Years of School Completed

0.02900 0.04000 0.03800 0.08200 0.05700 0.03400 0.10000 0.10500 0.06100 0.07400 0.04600 0.04400

Mean Years of School Expected

0.04600 0.05900 0.04700 0.10800 0.06400 0.05500 0.12500 0.11700 0.07900 0.07900 0.07100 0.05500

Proportion Living in South

0.02286 0.02353 0.02324 0.05641 0.04264 0.02544 0.04973 0.06060 0.04555 0.04084 0.02610 0.02601

Mean Numbers of Children Expected

0.02400 0.02700 0.03200 0.05800 0.04600 0.02800 0.06500 0.07000 0.05600 0.05500 0.03100 0.03700

Proportion Married

0.00454 0.00365 0.00686 0.01023 0.00533 0.00570 0.00923 0.01646 0.00440 0.00884 0.00448 0.00855
Table. Standard errors for round 17, 1996
Description All Male Female Hispanic or Latino Black Non-black, non-Hispanic Male Hispanic or Latino Female Hispanic or Latino Male Black Female Black Male Non-black, non-Hispanic Female Non-black, non-Hispanic

Proportion Not on Active Duty

0.001 0.003 0.001 0.005 0.004 0.002 0.009 0.001 0.007 0.003 0.003 0.001

Proportion High School Dropouts

0.006 0.008 0.006 0.014 0.009 0.007 0.018 0.016 0.012 0.010 0.009 0.007

Proportion in High School or Less

0.000 0.001 0.001 0.002 0.001 0.001 0.002 0.002 0.001 0.002 0.001 0.000

Proportion Attending College

0.003 0.003 0.005 0.006 0.005 0.004 0.008 0.009 0.005 0.007 0.004 0.005

Proportion High School Grad

0.006 0.007 0.006 0.015 0.009 0.007 0.018 0.016 0.012 0.010 0.009 0.007

Proportion Living in South

0.034 0.034 0.036 0.052 0.046 0.039 0.049 0.059 0.046 0.048 0.038 0.041

Proportion Currently Married

0.007 0.010 0.010 0.016 0.013 0.008 0.020 0.021 0.018 0.017 0.011 0.011

Proportion Employed at Present

0.006 0.007 0.009 0.015 0.009 0.007 0.017 0.020 0.014 0.013 0.007 0.010

Proportion Unemployed

0.002 0.003 0.003 0.006 0.005 0.003 0.007 0.009 0.008 0.008 0.004 0.004

Proportion in Labor Force

0.005 0.005 0.008 0.013 0.008 0.006 0.015 0.018 0.012 0.012 0.006 0.010

Proportion Gov't Training

0.001 0.001 0.001 0.003 0.002 0.001 0.003 0.003 0.002 0.004 0.001 0.001

Average Number of Children

0.023 0.027 0.030 0.054 0.035 0.028 0.067 0.065 0.040 0.050 0.033 0.036

Average Highest Grade Completed

0.060 0.074 0.063 0.109 0.065 0.073 0.137 0.119 0.074 0.081 0.091 0.077

Proportion Currently Enrolled

0.003 0.004 0.005 0.006 0.005 0.004 0.008 0.008 0.005 0.007 0.004 0.006
Table. Standard errors for round 18, 1998
Description All Male Female Hispanic or Latino Black Non-black, non-Hispanic Male Hispanic or Latino Female Hispanic or Latino Male Black Female Black Male Non-black, non-Hispanic Female Non-black, non-Hispanic

Proportion Not on Active Duty

0.001 0.003 0.001 0.005 0.003 0.002 0.008 0.002 0.006 0.003 0.003 0.001

Proportion High School Dropouts

0.005 0.007 0.006 0.014 0.009 0.006 0.017 0.016 0.012 0.010 0.009 0.007

Proportion in High School or Less

0.000 0.000 0.001 0.000 0.001 0.000 0.000 0.001 0.001 0.001 0.000 0.001

Proportion Attending College

0.003 0.003 0.005 0.005 0.005 0.003 0.005 0.008 0.005 0.007 0.004 0.005

Proportion High School Grad

0.005 0.007 0.006 0.014 0.009 0.006 0.017 0.016 0.012 0.010 0.009 0.007

Proportion Living in South

0.035 0.034 0.037 0.051 0.045 0.039 0.047 0.058 0.044 0.047 0.039 0.041

Proportion Currently Married

0.008 0.010 0.011 0.015 0.012 0.008 0.021 0.021 0.018 0.016 0.011 0.010

Proportion Employed at Present

0.006 0.007 0.009 0.014 0.009 0.007 0.017 0.020 0.012 0.014 0.008 0.011

Proportion Unemployed

0.002 0.003 0.003 0.005 0.005 0.002 0.007 0.008 0.007 0.007 0.003 0.003

Proportion in Labor Force

0.005 0.006 0.009 0.013 0.008 0.006 0.016 0.019 0.011 0.011 0.006 0.011

Proportion Gov't Training

0.001 0.001 0.001 0.002 0.002 0.001 0.003 0.004 0.003 0.004 0.001 0.001

Average Number of Children

0.024 0.028 0.030 0.050 0.036 0.028 0.061 0.065 0.042 0.050 0.033 0.035

Average Highest Grade Completed

0.061 0.077 0.063 0.114 0.066 0.073 0.147 0.121 0.074 0.082 0.09. 0.074

Proportion Currently Enrolled

0.003 0.003 0.004 0.005 0.005 0.003 0.005 0.008 0.005 0.007 0.004 0.005
Table. Standard errors for round 19, 2000
Description All Male Female Hispanic or Latino Black Non-black, non-Hispanic Male Hispanic or Latino Female Hispanic or Latino Male Black Female Black Male Non-black, non-Hispanic Female Non-black, non-Hispanic

Proportion Not on Active Duty

0.001 0.002 0.000 0.003 0.003 0.001 0.006 0.001 0.005 0.002 0.003 0.000

Proportion High School Dropouts

0.005 0.007 0.006 0.014 0.009 0.006 0.017 0.015 0.013 0.010 0.009 0.006

Proportion in High School or Less

0.000 0.000 0.000 0.001 0.001 0.000 0.001 0.002 0.002 0.000 0.000 0.000

Proportion Attending College

0.003 0.003 0.004 0.006 0.004 0.003 0.008 0.009 0.004 0.007 0.003 0.005

Proportion High School Grad

0.005 0.007 0.006 0.014 0.009 0.006 0.017 0.015 0.013 0.010 0.009 0.006

Proportion Living in South

0.035 0.034 0.037 0.052 0.043 0.039 0.049 0.059 0.044 0.046 0.038 0.041

Proportion Currently Married

0.008 0.010 0.010 0.014 0.012 0.008 0.022 0.021 0.018 0.015 0.011 0.010

Proportion Employed at Present

0.006 0.006 0.009 0.012 0.009 0.007 0.014 0.018 0.014 0.012 0.007 0.010

Proportion Gov't Training

0.001 0.001 0.001 0.003 0.002 0.001 0.003 0.004 0.003 0.003 0.001 0.001

Average Number of Children

0.024 0.029 0.030 0.048 0.037 0.027 0.061 0.064 0.046 0.051 0.034 0.035

Average Highest Grade Completed

0.061 0.076 0.065 0.114 0.069 0.074 0.146 0.118 0.078 0.089 0.092 0.078

Proportion Currently Enrolled

0.003 0.003 0.004 0.006 0.004 0.003 0.008 0.009 0.005 0.007 0.003 0.005

R19 table note: Users are cautioned that by round 17 cohort changes have made some categories much less relevant. In particular, the extremely small subsample sizes for "Proportion government training participant" and "Proportion in high school or less" make these categories statistically suspect. They have been kept in the table for historical continuity.

Table. Standard errors for round 20, 2002
Description All Male Female Hispanic or Latino Black Non-black, non-Hispanic Male Hispanic or Latino Female Hispanic or Latino Male Black Female Black Male Non-black, non-Hispanic Female Non-black, non-Hispanic

Proportion Not on Active Duty

0.001 0.002 0.000 0.002 0.002 0.001 0.004 0.000 0.004 0.002 0.003 0.000

Proportion High School Dropouts

0.005 0.007 0.005 0.015 0.008 0.006 0.018 0.016 0.011 0.010 0.009 0.006

Proportion in High School or Less

0.000 0.000 0.000 0.000 0.001 0.000 0.000 0.000 0.001 0.001 0.001 0.000

Proportion Attending College

0.002 0.003 0.004 0.004 0.004 0.002 0.005 0.006 0.005 0.006 0.003 0.004

Proportion High School Grad

0.005 0.007 0.005 0.015 0.008 0.006 0.018 0.016 0.011 0.010 0.009 0.006

Proportion Living in South

0.035 0.034 0.036 0.053 0.042 0.039 0.050 0.060 0.043 0.045 0.039 0.041

Proportion Currently Married

0.009 0.010 0.011 0.015 0.013 0.009 0.023 0.022 0.018 0.015 0.011 0.012

Proportion Employed at Present

0.007 0.007 0.009 0.012 0.011 0.008 0.016 0.015 0.016 0.014 0.008 0.011

Proportion Gov't Training

0.002 0.002 0.002 0.004 0.004 0.002 0.006 0.006 0.006 0.006 0.002 0.002

Average Number of Children

0.023 0.028 0.028 0.051 0.037 0.026 0.062 0.067 0.048 0.053 0.034 0.034

Average Highest Grade Completed

0.061 0.077 0.065 0.120 0.066 0.074 0.150 0.125 0.073 0.091 0.094 0.078

Proportion Currently Enrolled

0.002 0.003 0.003 0.004 0.004 0.002 0.005 0.006 0.005 0.006 0.003 0.004

R20 table note: Users are cautioned that by round 17 cohort changes have made some categories much less relevant. In particular, the extremely small sample sizes for "Proportion government training participant" and "Proportion in high school or less" make these categories statistically suspect. They have been kept in the table for historical continuity.

Table. Standard errors for round 21, 2004
Description All Male Female Hispanic or Latino Black Non-black, non-Hispanic Male Hispanic or Latino Female Hispanic or Latino Male Black Female Black Male Non-black, non-Hispanic Female Non-black, non-Hispanic

Proportion Not on Active Duty

0.001 0.002 0.000 0.002 0.002 0.001 0.004 0.001 0.003 0.002 0.002 0.000

Proportion High School Dropouts

0.005 0.007 0.005 0.014 0.009 0.006 0.019 0.015 0.013 0.010 0.009 0.006

Proportion in High School or Less

0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000

Proportion Attending College

0.002 0.002 0.003 0.006 0.003 0.003 0.006 0.009 0.004 0.006 0.002 0.004

Proportion High School Grad

0.005 0.007 0.005 0.014 0.009 0.006 0.019 0.015 0.012 0.010 0.009 0.006

Proportion Living in South

0.034 0.034 0.036 0.053 0.044 0.039 0.051 0.059 0.044 0.045 0.039 0.041

Proportion Currently Married

0.008 0.010 0.011 0.014 0.012 0.008 0.021 0.020 0.018 0.014 0.010 0.012

Proportion Employed at Present

0.007 0.007 0.010 0.014 0.009 0.008 0.018 0.018 0.012 0.013 0.008 0.012

Proportion Gov't Training

0.001 0.002 0.002 0.003 0.003 0.001 0.003 0.006 0.004 0.003 0.002 0.002

Average Number of Children

0.024 0.029 0.031 0.053 0.037 0.028 0.069 0.065 0.049 0.051 0.035 0.036

Average Highest Grade Completed

0.061 0.076 0.065 0.115 0.069 0.074 0.149 0.119 0.074 0.096 0.093 0.077

Proportion Currently Enrolled

0.002 0.002 0.003 0.006 0.003 0.003 0.006 0.009 0.004 0.006 0.002 0.004

R21 table note: Users are cautioned that cohort changes over time have made some categories much less relevant. In particular, the extremely small sample sizes for education related variables such as "Proportion in high school or less," "Proportion government training participant," "Proportion currently enrolled," and "Proportion attending college" make these categories statistically suspect. They have been kept in the table for historical continuity.

Table. Standard errors for round 22, 2006
Description All Male Female Hispanic or Latino Black Non-black, non-Hispanic Male Hispanic or Latino Female Hispanic or Latino Male Black Female Black Male Non-black, non-Hispanic Female Non-black, non-Hispanic

Proportion Not on Active Duty

0.001 0.001 0.000 0.001 0.001 0.001 0.002 0.001 0.003 0.001 0.002 0.000

Proportion High School Dropouts

0.005 0.007 0.005 0.014 0.008 0.005 0.018 0.016 0.012 0.009 0.008 0.006

Proportion in High School or Less

0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000

Proportion Attending College

0.002 0.002 0.003 0.003 0.004 0.002 0.003 0.005 0.005 0.006 0.002 0.004

Proportion High School Grad

0.005 0.007 0.005 0.014 0.008 0.005 0.018 0.016 0.012 0.009 0.008 0.006

Proportion Living in South

0.034 0.034 0.036 0.052 0.043 0.039 0.048 0.059 0.043 0.046 0.039 0.041

Proportion Currently Married

0.009 0.010 0.012 0.014 0.012 0.009 0.022 0.018 0.016 0.015 0.011 0.012

Proportion Employed at Present

0.007 0.007 0.010 0.014 0.010 0.008 0.020 0.017 0.014 0.015 0.008 0.012

Proportion Gov't Training

0.001 0.002 0.002 0.002 0.003 0.001 0.002 0.004 0.004 0.005 0.002 0.002

Average Number of Children

0.023 0.029 0.030 0.055 0.037 0.027 0.069 0.068 0.048 0.052 0.034 0.035

Average Highest Grade Completed

0.061 0.076 0.065 0.114 0.067 0.074 0.145 0.126 0.072 0.096 0.093 0.078

Proportion Currently Enrolled

0.002 0.002 0.003 0.003 0.004 0.002 0.003 0.005 0.005 0.006 0.002 0.004

R22 table note: Users are cautioned that cohort changes over time have made some categories much less relevant. In particular, the extremely small sample sizes for education related variables such as "Proportion in high school or less," "Proportion government training participant," "Proportion currently enrolled," and "Proportion attending college" make these categories statistically suspect. They have been kept in the table for historical continuity.

Table. Standard errors for round 23, 2008
Description All Male Female Hispanic or Latino Black Non-black, non-Hispanic Male Hispanic or Latino Female Hispanic or Latino Male Black Female Black Male Non-black, non-Hispanic Female Non-black, non-Hispanic

Proportion Not on Active Duty

0.001 0.001 0.000 0.001 0.001 0.001 0.001 0.001 0.002 0.001 0.001 0.000

Proportion High School Dropouts

0.005 0.007 0.005 0.013 0.008 0.005 0.018 0.015 0.011 0.009 0.008 0.006

Proportion in High School or Less

0.000 0.000 0.000 0.001 0.000 0.000 0.000 0.002 0.000 0.000 0.000 0.001

Proportion Attending College

0.002 0.002 0.003 0.004 0.003 0.002 0.005 0.005 0.005 0.006 0.002 0.004

Proportion High School Grad

0.005 0.007 0.005 0.013 0.008 0.005 0.018 0.015 0.011 0.009 0.008 0.006

Proportion Living in South

0.032 0.031 0.034 0.050 0.043 0.035 0.046 0.058 0.042 0.046 0.034 0.038

Proportion Currently Married

0.009 0.010 0.011 0.015 0.012 0.008 0.022 0.020 0.017 0.015 0.011 0.012

Proportion Employed at Present

0.008 0.010 0.013 0.011 0.008 0.018 0.017 0.015 0.014 0.008 0.012

Proportion Gov't Training

0.001 0.002 0.002 0.002 0.003 0.001 0.003 0.004 0.003 0.004 0.002 0.002

Average Number of Children

0.023 0.030 0.030 0.054 0.038 0.027 0.068 0.067 0.049 0.052 0.036 0.035

Average Highest Grade Completed

0.062 0.078 0.066 0.109 0.070 0.075 0.141 0.117 0.076 0.094 0.096 0.079

Proportion Currently Enrolled

0.002 0.002 0.003 0.004 0.004 0.002 0.006 0.006 0.005 0.007 0.002 0.004

R23 table note: Users are cautioned that cohort changes over time have made some categories much less relevant. In particular, the extremely small sample sizes for education related variables such as "Proportion in high school or less," "Proportion government training participant," "Proportion currently enrolled," and "Proportion attending college" make these categories statistically suspect. They have been kept in the table for historical continuity.

Table. Standard errors for round 24, 2010
Description All Male Female Hispanic or Latino Black Non-black, non-Hispanic Male Hispanic or Latino Female Hispanic or Latino Male Black Female Black Male Non-black, non-Hispanic Female Non-black, non-Hispanic

Proportion Not on Active Duty

0.000 0.001 0.000 0.000 0.001 0.000 0.000 0.000 0.002 0.001 0.001 0.000

Proportion High School Dropouts

0.005 0.007 0.005 0.013 0.008 0.005 0.019 0.015 0.011 0.009 0.008 0.006

Proportion in High School or Less

0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000

Proportion Attending College

0.002 0.002 0.003 0.003 0.004 0.002 0.005 0.004 0.004 0.007 0.002 0.003

Proportion High School Grad

0.005 0.007 0.005 0.013 0.008 0.005 0.019 0.015 0.011 0.009 0.008 0.006

Proportion Living in South

0.034 0.033 0.037 0.051 0.042 0.039 0.047 0.058 0.042 0.044 0.038 0.041

Proportion Currently Married

0.009 0.010 0.011 0.016 0.012 0.008 0.021 0.023 0.017 0.016 0.010 0.012

Proportion Employed at Present

0.008 0.009 0.011 0.014 0.011 0.009 0.019 0.020 0.017 0.014 0.011 0.013

Proportion Gov't Training

0.001 0.002 0.002 0.003 0.003 0.002 0.004 0.005 0.004 0.004 0.002 0.002

Average Number of Children

0.024 0.030 0.030 0.057 0.037 0.027 0.072 0.068 0.049 0.053 0.036 0.035

Average Highest Grade Completed

0.062 0.079 0.064 0.112 0.072 0.075 0.140 0.125 0.077 0.098 0.096 0.077

Proportion Currently Enrolled

0.002 0.002 0.003 0.003 0.004 0.002 0.005 0.004 0.004 0.007 0.002 0.004

R24 table note: Users are cautioned that cohort changes over time have made some categories much less relevant. In particular, the extremely small sample sizes for education related variables such as "Proportion in high school or less," "Proportion government training participant," "Proportion currently enrolled," and "Proportion attending college" make these categories statistically suspect. They have been kept in the table for historical continuity.

Table. Standard errors for round 25, 2012
Description All Male Female Hispanic or Latino Black Non-black, non-Hispanic Male Hispanic or Latino Female Hispanic or Latino Male Black Female Black Male Non-black, non-Hispanic Female Non-black, non-Hispanic

Proportion Not on Active Duty

0.000 0.000 0.000 0.000 0.001 0.000 0.000 0.000 0.000 0.001 0.000 0.000

Proportion High School Dropouts

0.007 0.005 0.014 0.009 0.005 0.020 0.015 0.012 0.009 0.009 0.006

Proportion in High School or Less

NA NA NA NA NA NA NA NA NA NA NA NA

Proportion Attending College

0.002 0.003 0.003 0.004 0.005 0.003 0.003 0.007 0.004 0.008 0.004 0.004

Proportion High School Grad

0.005 0.007 0.005 0.014 0.008 0.006 0.020 0.015 0.012 0.008 0.008 0.006

Proportion Living in South

0.034 0.034 0.036 0.055 0.043 0.039 0.055 0.064 0.044 0.046 0.039 0.041

Proportion Currently Married

0.009 0.011 0.011 0.016 0.012 0.009 0.022 0.022 0.016 0.015 0.012 0.012

Proportion Employed at Present

0.008 0.010 0.011 0.015 0.011 0.009 0.020 0.018 0.016 0.015 0.010 0.013

Proportion Gov't Training

0.001 0.002 0.002 0.004 0.003 0.001 0.004 0.005 0.004 0.005 0.002 0.002

Average Number of Children

0.024 0.030 0.031 0.058 0.038 0.027 0.068 0.069 0.053 0.052 0.036 0.036

Average Highest Grade Completed

0.062 0.080 0.065 0.114 0.073 0.076 0.139 0.126 0.084 0.098 0.098 0.078

Proportion Currently Enrolled

0.002 0.003 0.004 0.004 0.005 0.003 0.003 0.007 0.004 0.008 0.004 0.004

R25 table note: Users are cautioned that cohort changes over time have made some categories much less relevant. In particular, the extremely small subsample sizes for education related variables such as "Proportion government training participant," "Proportion currently enrolled" and "Proportion attending college" make these categories statistically suspect. They have been kept in the table for historical continuity. In round 25 the variable "Proportion in high school or less" was labeled "NA" since no NLSY79 respondent was in this category.

Table. Standard errors for round 26, 2014
Description All Male Female Hispanic or Latino Black Non-black, non-Hispanic Male Hispanic or Latino Female Hispanic or Latino Male Black Female Black Male Non-black, non-Hispanic Female Non-black, non-Hispanic

Proportion High School Dropouts

0.005 0.007 0.006 0.014 0.009 0.006 0.021 0.016 0.012 0.010 0.009 0.007

Proportion Attending College

0.002 0.003 0.002 0.004 0.005 0.004 0.005 0.008 0.003 0.007 0.003 0.004

Proportion High School Grad

0.005 0.007 0.005 0.013 0.008 0.005 0.020 0.014 0.012 0.008 0.008 0.006

Proportion Living in South

0.034 0.033 0.036 0.056 0.042 0.038 0.059 0.061 0.044 0.046 0.038 0.041

Proportion Currently Married

0.009 0.011 0.012 0.016 0.012 0.009 0.022 0.021 0.017 0.016 0.012 0.012

Proportion Employed at Present

0.009 0.011 0.011 0.014 0.010 0.010 0.021 0.019 0.015 0.013 0.012 0.013

Proportion Gov't Training

0.001 0.001 0.002 0.002 0.003 0.001 0.003 0.003 0.004 0.003 0.001 0.002

Average Number of Children

0.024 0.029 0.032 0.055 0.039 0.027 0.066 0.070 0.054 0.054 0.035 0.037

Average Highest Grade Completed

0.064 0.084 0.067 0.114 0.077 0.078 0.145 0.129 0.088 0.100 0.102 0.080

Proportion Currently Enrolled

0.002 0.002 0.004 0.005 0.004 0.003 0.005 0.008 0.003 0.007 0.003 0.004

R26 table note: Users are cautioned that cohort changes over time have made some categories much less relevant. In particular, the extremely small subsample sizes for education related variables such as "Proportion government training participant," "Proportion currently enrolled" and "Proportion attending college" make these categories statistically suspect. They have been kept in the table for historical continuity. In round 25, the variable "Proportion in high school or less" was removed from the table since no NLSY79 respondent was in this category. In round 26, the variable "Proportion not on active duty" was removed from the table since no NLSY79 respondent remained in this category.

Table. Standard errors for round 27, 2016
Description All Male Female Hispanic or Latino Black Non-black, non-Hispanic Male Hispanic or Latino Female Hispanic or Latino Male Black Female Black Male Non-black, non-Hispanic Female Non-black, non-Hispanic

Proportion High School Dropouts

0.0048 0.007 0.005 0.014 0.008 0.006 0.018 0.018 0.012 0.009 0.008 0.006

Proportion Attending College

0.0022 0.003 0.003 0.004 0.003 0.003 0.002 0.009 0.002 0.006 0.003 0.004

Proportion High School Grad

0.0046 0.007 0.005 0.013 0.008 0.005 0.018 0.015 0.012 0.008 0.008 0.006

Proportion Living in South

0.0337 0.033 0.036 0.058 0.041 0.038 0.061 0.063 0.042 0.045 0.038 0.040

Proportion Currently Married

0.0093 0.011 0.011 0.016 0.012 0.009 0.023 0.021 0.017 0.016 0.012 0.011

Proportion Employed at Present

0.0084 0.010 0.011 0.015 0.011 0.009 0.023 0.018 0.016 0.015 0.011 0.013

Proportion Gov't Training

0.0012 0.001 0.001 0.004 0.002 0.001 0.007 0.004 0.004 0.004 0.002 0.002

Average Number of Children

0.0239 0.031 0.031 0.059 0.039 0.028 0.070 0.073 0.054 0.051 0.036 0.037

Average Highest Grade Completed

0.0624 0.080 0.067 0.118 0.075 0.076 0.142 0.134 0.085 0.103 0.098 0.080

Proportion Currently Enrolled

0.0022 0.003 0.003 0.004 0.003 0.003 0.002 0.009 0.003 0.006 0.003 0.004

R27 table note: Users are cautioned that cohort changes over time have made some categories much less relevant. In particular, the extremely small subsample sizes for education related variables such as "Proportion government training participant," "Proportion currently enrolled" and "Proportion attending college" make these categories statistically suspect. They have been kept in the table for historical continuity. In round 25 the variable "Proportion in high school or less" was removed from the table since no NLSY79 respondent was in this category. In round 26 the variable "Proportion not on active duty" was removed from the table since no NLSY79 respondent remained in this category.

Table. Standard errors for round 28, 2018
Description All Male Female Hispanic or Latino Black Non-black, non-Hispanic Male Hispanic or Latino Female Hispanic or Latino Male Black Female Black Male Non-black, non-Hispanic Female Non-black, non-Hispanic

Proportion High School Dropouts

0.0047 0.007 0.005 0.014 0.008 0.005 0.019 0.016 0.012 0.008 0.008 0.006

Proportion Attending College

0.0010 0.001 0.002 0.002 0.002 0.001 0.004 0.003 0.000 0.004 0.002 0.002

Proportion High School Grad

0.0047 0.007 0.005 0.014 0.008 0.005 0.019 0.015 0.012 0.008 0.008 0.006

Proportion Living in South

0.0334 0.033 0.036 0.058 0.042 0.038 0.060 0.063 0.043 0.045 0.038 0.041

Proportion Currently Married

0.0094 0.011 0.011 0.017 0.012 0.009 0.025 0.019 0.016 0.016 0.012 0.012

Proportion Employed at Present

0.0086 0.010 0.012 0.016 0.012 0.010 0.020 0.022 0.017 0.016 0.011 0.014

Proportion Gov't Training

0.0008 0.009 0.001 0.002 0.002 0.001 0.000 0.003 0.003 0.003 0.001 0.001

Average Number of Children

0.0248 0.033 0.032 0.057 0.038 0.029 0.067 0.070 0.054 0.053 0.039 0.037

Average Highest Grade Completed

0.0610 0.081 0.066 0.117 0.074 0.074 0.151 0.126 0.084 0.105 0.100 0.078

Proportion Currently Enrolled

0.0011 0.001 0.002 0.003 0.002 0.001 0.004 0.004 0.000 0.004 0.002 0.002

R28 table note: Users are cautioned that cohort changes over time have made some categories much less relevant. In particular, the extremely small subsample sizes for education related variables such as "Proportion government training participant," "Proportion currently enrolled" and "Proportion attending college" make these categories statistically suspect. They have been kept in the table for historical continuity. In round 25 the variable "Proportion in high school or less" was removed from the table since no NLSY79 respondent was in this category. In round 26 the variable "Proportion not on active duty" was removed from the table since no NLSY79 respondent remained in this category. Beginning in round 28, the "Average highest grade completed" was the highest grade completed as of the date of most recent interview, not as of May in the year previous to survey year.

Table. Standard errors for round 29, 2020
Description All Male Female Hispanic or Latino Black Non-black, non-Hispanic Male Hispanic or Latino Female Hispanic or Latino Male Black Female Black Male Non-black, non-Hispanic Female Non-black, non-Hispanic

Proportion High School Dropouts

0.0048 0.007 0.005 0.014 0.008 0.006 0.019 0.015 0.012 0.008 0.000 0.006

Proportion Attending College

0.0007 0.001 0.001 0.001 0.001 0.001 0.000 0.001 0.000 0.003 0.001 0.001

Proportion High School Grad

0.0048 0.007 0.005 0.014 0.008 0.006 0.019 0.015 0.012 0.008 0.009 0.006

Proportion Living in South

0.0332 0.034 0.035 0.058 0.042 0.038 0.062 0.062 0.044 0.044 0.039 0.040

Proportion Currently Married

0.0100 0.012 0.012 0.017 0.012 0.010 0.025 0.020 0.017 0.016 0.013 0.012

Proportion Employed at Present

0.0092 0.013 0.012 0.015 0.013 0.011 0.020 0.023 0.018 0.017 0.015 0.014

Proportion Gov't Training

0.0008 0.001 0.001 0.003 0.002 0.001 0.005 0.003 0.003 0.002 0.001 0.001

Average Number of Children

0.0250 0.034 0.032 0.055 0.039 0.029 0.068 0.071 0.055 0.057 0.040 0.037

Average Highest Grade Completed

0.0630 0.085 0.065 0.121 0.075 0.076 0.155 0.130 0.082 0.108 0.103 0.076

Proportion Currently Enrolled

0.0008 0.001 0.001 0.001 0.001 0.001 0.000 0.001 0.000 0.003 0.001 0.001

R29 table note: Users are cautioned that cohort changes over time have made some categories much less relevant. In particular, the extremely small subsample sizes for education related variables such as "Proportion government training participant," "Proportion currently enrolled" and "Proportion attending college" make these categories statistically suspect. They have been kept in the table for historical continuity. In round 25 the variable "Proportion in high school or less" was removed from the table since no NLSY79 respondent was in this category. In round 26 the variable "Proportion not on active duty" was removed from the table since no NLSY79 respondent remained in this category. Beginning in round 28, the "Average highest grade completed" was the highest grade completed as of the date of most recent interview, not as of May in the year previous to survey year.

Table. Standard errors for round 30, 2022
Description All Male Female Hispanic or Latino Black Non-black, non-Hispanic Male Hispanic or Latino Female Hispanic or Latino Male Black Female Black Male Non-black, non-Hispanic Female Non-black, non-Hispanic

Proportion High School Dropouts

0.0047 0.007 0.005 0.014 0.008 0.005 0.018 0.015 0.012 0.009 0.009 0.005

Proportion High School Grad

0.0047 0.007 0.005 0.014 0.008 0.007 0.018 0.015 0.012 0.009 0.009 0.005

Proportion Living in South

0.0328 0.034 0.034 0.062 0.042 0.037 0.068 0.065 0.044 0.044 0.039 0.038

Proportion Currently Married

0.0100 0.012 0.011 0.016 0.013 0.010 0.025 0.020 0.018 0.017 0.013 0.011

Proportion Employed at Present

0.0078 0.012 0.010 0.017 0.012 0.009 0.026 0.022 0.018 0.015 0.014 0.012

Proportion Gov't Training

0.0009 0.002 0.001 0.002 0.002 0.001 0.002 0.003 0.003 0.003 0.002 0.001

Average Number of Children

0.0257 0.034 0.032 0.059 0.039 0.030 0.073 0.073 0.056 0.053 0.040 0.037

Average Highest Grade Completed

0.0623 0.087 0.062 0.118 0.079 0.074 0.156 0.122 0.095 0.110 0.105 0.072

R30 table note: Users are cautioned that cohort changes over time have made some categories much less relevant. In particular, the extremely small subsample sizes for education related variables such as "Proportion government training participant," "Proportion currently enrolled" and "Proportion attending college" make these categories statistically suspect. They have been kept in the table for historical continuity. In round 25 the variable "Proportion in high school or less" was removed from the table since no NLSY79 respondent was in this category. In round 26 the variable "Proportion not on active duty" was removed from the table since no NLSY79 respondent remained in this category. Beginning in round 28, the "Average highest grade completed" was the highest grade completed as of the date of most recent interview, not as of May in the year previous to survey year. No educational updates were collected in round 30, eliminating variables depicting "Proportion attending college" and "Proportion currently enrolled."

Sample Weights & Clustering Adjustments

Sample weights

In each survey year a set of sampling weights is constructed. These weights provide the researcher with an estimate of how many individuals in the United States each respondent's answers represent. Weighting decisions for the NLSY79 are guided by the following principles:

  1. individual case weights are assigned for each year in such a way as to produce group population estimates when used in tabulations
  2. the assignment of individual respondent weights involves at least three types of adjustment, with additional considerations necessary for weighting of NLSY79 Child data

The interested user should consult the NLSY79 Technical Sampling Report (Frankel, Williams, and Spencer 1983) for a step-by-step description of the adjustment process. A cursory review of the process follows.

  • Adjustment One. The first weighting adjustment involves the reciprocal of the probability of selection at the first interview. Specifically, this probability of selection is a function of the probability of selection associated with the household in which the respondent was located, as well as the subsampling (if any) applied to individuals identified in screening.
  • Adjustment Two. This process adjusts for differential response (cooperation) rates in both the screening phase and subsequent interviews. Differential cooperation rates are computed (and adjusted) on the basis of geographic location and group membership, as well as within-group subclassification.
  • Adjustment Three. This weighting adjustment attempts to correct for certain types of random variation associated with sampling as well as sample "undercoverage." These ratio estimations are used to conform the sample to independently derived population totals.
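The three adjustments can be sketched in a few lines of Python. This is only an illustration of the general logic (inverse selection probability, division by a response rate, and a ratio adjustment to a control total); the function names, the single adjustment cell, and all numbers are hypothetical and are not taken from the Technical Sampling Report.

```python
def base_weights(selection_probs):
    """Adjustment One: reciprocal of each respondent's probability of selection."""
    return [1.0 / p for p in selection_probs]


def adjust_for_nonresponse(weights, response_rate):
    """Adjustment Two: inflate weights by the cooperation rate of the cell."""
    return [w / response_rate for w in weights]


def post_stratify(weights, population_total):
    """Adjustment Three: ratio-adjust so the weights sum to a control total."""
    factor = population_total / sum(weights)
    return [w * factor for w in weights]


# Hypothetical cell with three respondents (all values are made up).
probs = [0.001, 0.002, 0.004]
w1 = base_weights(probs)
w2 = adjust_for_nonresponse(w1, response_rate=0.8)
w3 = post_stratify(w2, population_total=2500.0)

for p, w in zip(probs, w3):
    print(f"selection probability {p}: final weight {w:.1f}")
```

After the last step the weights in the cell sum to the assumed population total, which is the sense in which ratio estimation conforms the sample to independently derived totals.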

Sampling weight readjustments

Sampling weights for the main survey are readjusted to account for noninterviews each survey year. The readjustments are necessitated by differential nonresponse and use base year sample parameters for their creation, employing a procedure similar to that described above. The only exception occurs in the final stage of post-stratification. Post-stratification weights in survey rounds two and above have been recomputed on the basis of completed cases in that year's sample rather than the completed cases in the base year sample.

Custom weights

Users looking for a simple method to correct a single year's worth of raw data for the effects of over-sampling, clustering and differential base year participation should use the weights included with each round on the data release. Unfortunately, while each round of weights provides an accurate adjustment for any single year, none of the weights provides an accurate method of adjusting multiple years' worth of data. The NLS has a custom weighting program that creates a set of customized longitudinal weights. These weights improve a researcher's ability to accurately calculate summary statistics from multiple years of data.

The custom weighting program calculates its weights by first creating a new temporary list of individuals who meet all of a researcher's criteria. This list is then weighted as if the individuals had participated in a new survey round. The weights for this temporary list are the output of the custom weighting program.

There are two options for the custom weighting program on the Custom Weights for the NLSY79 page. The first option allows researchers to specify the particular rounds in which respondents participated. Researchers can select either "The respondents are in all of the selected years" or "The respondents are in any or all of the selected years." The second option allows users to input a list of respondent IDs to get the appropriate weights for just that list. For example, this second option allows a researcher to weight only those people who ever reported smoking cigarettes in any survey, or only those people who needed extra time to graduate from college.
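The difference between the "all" and "any or all" selections can be illustrated with a short Python sketch. It does not reproduce the custom weighting program; it only shows which respondents would end up on the temporary list under each rule. The function name and the participation data are hypothetical.

```python
def select_respondents(participation, rounds, mode):
    """Return sorted respondent IDs matching the chosen rounds.

    participation: dict mapping respondent ID -> set of rounds interviewed
    mode: "all" (in every selected round) or "any" (in at least one)
    """
    wanted = set(rounds)
    if mode == "all":
        return sorted(r for r, seen in participation.items() if wanted <= seen)
    if mode == "any":
        return sorted(r for r, seen in participation.items() if wanted & seen)
    raise ValueError("mode must be 'all' or 'any'")


# Hypothetical participation histories for four respondents.
participation = {1: {1, 2, 3}, 2: {1, 3}, 3: {2}, 4: {1}}

in_all = select_respondents(participation, [1, 2], "all")
in_any = select_respondents(participation, [1, 2], "any")
print(in_all)  # respondents interviewed in both rounds 1 and 2
print(in_any)  # respondents interviewed in round 1, round 2, or both
```

Selecting every round together with the "any" rule returns every respondent who was ever interviewed, since each of them took part in at least one round.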

Important information: Custom Weighting Program

  • If you select all survey rounds available and also pick "The respondents are in any or all of the selected years," the weights produced are identical to the round 1 survey weights. This result arises because the "any" selection combined with all survey rounds produces a list of every person who participated in the survey.
  • The output of the custom weight program has 2 implied decimal places just like the weights found in the data release. Dividing each custom weight output value by 100 results in the number of individuals the respondent represents.
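As a small worked example of the implied decimal places, the following Python sketch converts raw weight values to the number of persons represented. The raw values are invented for illustration and do not come from any data release.

```python
def persons_represented(raw_weight):
    """Convert a weight stored with two implied decimal places to persons."""
    return raw_weight / 100.0


# Hypothetical raw weights as they might appear in a data file.
raw_weights = [536542, 120000, 87550]
converted = [persons_represented(w) for w in raw_weights]

for raw, persons in zip(raw_weights, converted):
    print(f"raw weight {raw} -> {persons:.2f} persons")
```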

Practical usage of weights

The application of sampling weights varies depending on the type of analysis being performed. If tabulating sample characteristics for a single interview year in order to describe the population being represented (that is, compute sample means, totals, or proportions), researchers should weight the observations using the weights provided. For example, to estimate the average hours worked in 1987 by persons born in 1957 through 1964, simply use the weighted average of hours worked, where weight is the 1987 sample weight. These weights are approximately correct when used in this way, with item nonresponse possibly generating small errors. Other applications for which users may wish to apply weighting, but for which the application of weights may not correspond to the intended result include:

Samples generated by dropping observations with item nonresponses

Often users confine their analysis to subsamples for which respondents provided valid answers to certain questions. In this case, a weighted mean will not represent the entire population, but rather those persons in the population who would have given a valid response to the specified questions. Item nonresponse because of refusals, don't knows, or invalid skips is usually quite small, so the degree to which the weights are incorrect is probably quite small. In the event that item nonresponse constitutes only a small proportion of the data for variables under analysis, population estimates (that is, weighted sample means, medians, and proportions) would be reasonably accurate. However, population estimates based on data items that have relatively high nonresponse rates, such as family income, may not necessarily be representative of the underlying population of the cohort under analysis. For more information on item nonresponse in the NLSY79, see the Item Nonresponse section of this guide.
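A minimal Python sketch of a weighted mean that drops item nonresponse is shown below. It also reports the share of total weight lost to nonresponse, which gives a rough sense of how far the estimate may be from representing the whole population. The function names, the hours values, and the weights are hypothetical.

```python
def weighted_mean(values, weights):
    """Weighted mean over cases with a valid (non-missing) value."""
    pairs = [(v, w) for v, w in zip(values, weights) if v is not None]
    total_weight = sum(w for _, w in pairs)
    if total_weight == 0:
        raise ValueError("no valid cases to average")
    return sum(v * w for v, w in pairs) / total_weight


def weight_share_missing(values, weights):
    """Share of total weight carried by cases with item nonresponse."""
    missing = sum(w for v, w in zip(values, weights) if v is None)
    return missing / sum(weights)


# Hypothetical hours worked (None marks item nonresponse) and weights.
hours = [40.0, None, 20.0, 35.0]
weights = [2.0, 5.0, 1.0, 1.0]

mean_hours = weighted_mean(hours, weights)
lost = weight_share_missing(hours, weights)
print(mean_hours, lost)
```

In this made-up case more than half of the weight belongs to the nonrespondent, so the weighted mean describes only those who would have answered the question.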

Data from multiple waves

Because the weights are specific to a single wave of the study, and because respondents occasionally miss an interview but are contacted in a subsequent wave, a problem similar to item nonresponse arises when the data are used longitudinally. In addition, occasionally the weights for a respondent in different years may be quite dissimilar, leaving the user uncertain as to which weight is appropriate. In principle, if a user wished to apply weights to multiple wave data, weights would have to be recomputed based upon the persons for whom complete data are available. In practice, if the sample is limited to respondents interviewed in a terminal or end point year, the weight for that year can be used (for more information on weighting see the section on Sample Weights & Clustering Adjustments).

Regression analysis

A common question is whether one should use the provided weights to perform weighted least squares when doing regression analysis. Such a course of action may not lead to correct estimates. If particular groups follow significantly different regression specifications, the preferred method of analysis is to estimate a separate regression for each group or to use dummy (or indicator) variables to specify group membership.

Users interested in calculating the population average effect of, for example, education upon earnings, should simply compute the weighted average of the regression coefficients obtained for each group, using the sum of the weights for the persons in each group as the weights to be applied to the coefficients. While least squares is an estimator that is linear in the dependent variable, it is nonlinear in explanatory variables, and so weighting the observations will generate different results than taking the weighted average of the regression coefficients for the groups. The process of stratifying the sample into groups thought to have different regression coefficients and then testing for equality of coefficients across groups using an F-test is described in most statistics texts.
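The weighted average of group coefficients described above can be sketched as follows. This is an illustrative Python example with a single explanatory variable and two hypothetical groups; the data, weights, and function names are invented, and a real analysis would use a full regression package.

```python
def ols_slope(x, y):
    """Slope from an unweighted least squares fit of y on x with an intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx


def population_average_slope(groups):
    """Average the group slopes, weighting each by the group's summed weights."""
    weighted_sum = 0.0
    total_weight = 0.0
    for x, y, w in groups:
        group_weight = sum(w)
        weighted_sum += group_weight * ols_slope(x, y)
        total_weight += group_weight
    return weighted_sum / total_weight


# Hypothetical groups: (education, earnings, sampling weights).
group_a = ([10, 12, 14, 16], [20, 24, 28, 32], [100, 100, 100, 100])
group_b = ([10, 12, 14, 16], [15, 16, 17, 18], [300, 300, 300, 300])

average_effect = population_average_slope([group_a, group_b])
print(average_effect)
```

Here group A has a slope of 2 and group B a slope of 0.5; because group B carries three times the weight, the population average effect is closer to group B's coefficient.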

Users uncertain about the appropriate grouping should consult a statistician or other person knowledgeable about the data set before specifying the regression model. Note that if subgroups have different regression coefficients, a regression on a random sample of the population would not be properly specified.

Clustering adjustments

Researchers use NLSY79 data to estimate a variety of statistics. Since NLSY79 data come from a sample rather than from every age-appropriate individual in the U.S., the statistics produced are only estimates of the "true" national values. When researchers use a computer package to compute a statistic such as a mean or a regression coefficient, the program automatically provides a second set of statistics, such as the standard error, standard deviation, or t-statistic, which tells researchers how precisely the mean or coefficient is measured.

Details

Instead of randomly selecting individuals located anywhere in the U.S. during 1978, the survey randomly selected only a set of areas. By randomly selecting a fixed number of small areas, interviewers reduced the amount of time they spent traveling for each interview. In this way, costs were lowered and the survey was fielded faster, yielding data more quickly. Like all other national data sets that use clustering, NLSY79 data contain many groups or bunches of respondents who share similar characteristics because they lived in the same neighborhood during 1978. This makes survey results appear more homogeneous, or similar, than the U.S. population actually is.

Researchers can use two different approaches to correct this problem. The first approach uses the tables found in the NLSY79 Technical Sampling Report. For each survey round there is a table that lists the "Design Effects" or DEFT factors. These DEFTs give users a simple way to determine approximately how much they should increase their standard errors when measuring the precision of their estimates. However, the tables have limits: they do not cover specialized subsamples, they provide no guidance on how to adjust regression results, and they are based on calculations from only a small subset of NLSY79 variables.

The more general method is to correct for clustering by using a specialized software package. Two of the most widely used packages to adjust surveys for clustering effects are Stata, sold by the Stata Corporation, and Sudaan, sold by RTI International. This section describes how to adjust for clustering using Sudaan, which is also used to generate the DEFT factors found in the Technical Sampling Report.

Important information: Clustering

If you do not have access to the Geocode data set, you cannot use Sudaan or Stata to adjust for clustering. The Geocode data set can only be accessed by individuals approved by BLS. See Geographic Residence and Neighborhood Composition for information about using the restricted-use Geocode file.

Table 1. Effect of clustering correction on a mean value's standard error, 1998 data, example one

Variable        Mean Value   Uncorrected Std Error   Corrected Std Error
Net Worth       $128,068     $3,403                  $5,826
Family Income   $55,031      $536                    $1,137
BMI             26.7         0.06                    0.09
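The size of the correction in Table 1 can be summarized as the ratio of the corrected to the uncorrected standard error. The Python sketch below computes that ratio from the table's published figures; the name "inflation factor" is used here only for illustration, and the ratios are not claimed to equal the DEFT factors published in the Technical Sampling Report.

```python
# (uncorrected standard error, corrected standard error) from Table 1
table1 = {
    "Net Worth": (3403.0, 5826.0),
    "Family Income": (536.0, 1137.0),
    "BMI": (0.06, 0.09),
}


def inflation_factor(uncorrected, corrected):
    """Ratio by which the clustering correction changes the standard error."""
    return corrected / uncorrected


factors = {name: inflation_factor(u, c) for name, (u, c) in table1.items()}

for name, factor in factors.items():
    print(f"{name}: standard error multiplied by {factor:.2f}")
```

For these three variables the corrected standard errors are roughly 1.5 to 2.1 times the uncorrected ones.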

Table 2 shows how adjusting for clustering affects a simple regression. Using the same 1998 data, a simple unweighted least squares equation was run with both SAS and Sudaan using net worth as the dependent variable and six independent variables. Three of these independent variables (BMI, income and age) take a wide range of values, while the remaining three variables (black, Hispanic or Latino, and female) take the value of 1 if the respondent has the particular characteristic and 0 otherwise.

The table shows that adjusting for clustering changes many of the standard errors and associated t-values. The biggest effect is seen on the income line, where the standard error increases from an uncorrected 0.06 to a corrected 0.19 and the t-value falls from 44.37 to 13.87. Smaller changes are seen for the other variables. The intercept, age, and female standard errors all increase in size, while the BMI, black, and Hispanic or Latino variables all end up with smaller standard errors.

Overall, both examples show that adjusting for clustering effects is important. The next subsection shows what variables are needed to adjust for clustering. The section ends with the specific Sudaan commands used to create the tables in this chapter.

Key variables needed for clustering correction

Two variables are needed to adjust the data set for clustering. Both variables are found only on the Geocode data set and are placed there because researchers can use these variables to determine where each civilian respondent lived in 1978.

Table 2. Effect of clustering correction on a mean value's standard error, 1998 data, example two

Variable    Coefficient Estimate   Uncorrected Std Error   Uncorrected t Value   Corrected Std Error   Corrected t Value
Intercept   186,808                43,534                  4.29                  52,166                3.58
BMI         1,091                  466                     2.34                  457                   2.39
Income      2.63                   0.06                    44.37                 0.19                  13.87
Black       40,394                 5,938                   6.80                  4,259                 9.48
Hispanic    41,382                 6,617                   6.25                  4,554                 9.09
Age         5,285                  1,086                   4.87                  1,252                 4.22
Female      2,814                  4,891                   0.58                  5,064                 0.56
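The t-values in Table 2 are the coefficient estimates divided by their standard errors, so the effect of the correction can be checked directly. The Python sketch below recomputes them from the published figures. The income row will not match the table exactly, presumably because the published coefficient and standard errors are rounded to two decimals.

```python
# (coefficient, uncorrected standard error, corrected standard error) from Table 2
table2 = {
    "Intercept": (186808.0, 43534.0, 52166.0),
    "BMI": (1091.0, 466.0, 457.0),
    "Income": (2.63, 0.06, 0.19),
    "Black": (40394.0, 5938.0, 4259.0),
    "Hispanic": (41382.0, 6617.0, 4554.0),
    "Age": (5285.0, 1086.0, 1252.0),
    "Female": (2814.0, 4891.0, 5064.0),
}


def t_value(coefficient, std_error):
    """t-statistic for the hypothesis that the coefficient is zero."""
    return coefficient / std_error


t_values = {
    name: (t_value(coef, se_unc), t_value(coef, se_cor))
    for name, (coef, se_unc, se_cor) in table2.items()
}

for name, (t_unc, t_cor) in t_values.items():
    print(f"{name}: uncorrected t = {t_unc:.2f}, corrected t = {t_cor:.2f}")
```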

As discussed above, the NLSY79 is a multi-stage clustered sample. The clusters were created by first dividing the entire U.S. into Primary Sampling Units, or PSUs. These PSUs were defined by NORC and were composed of Standard Metropolitan Statistical Areas (SMSAs), entire counties when the counties were small, parts of counties when the counties were large, and independent cities. NORC randomly selected two different sets of PSUs for inclusion in the study, each of which by itself randomly represents the U.S. This selection of two sets of PSUs means the NLSY79 is composed of two replicates or strata, each containing a random selection of PSUs. The replicate or stratum that a respondent belongs to is found in the Geocode data set only and is labeled variable R02191.46, entitled "Within Stratum Replicate Of Primary Sampling Unit." This variable takes either the value 1 or 2, for either the first or second replicate.

The variable containing the PSU is labeled R02191.45 and is entitled "Stratum Number For Primary Sampling Units." R02191.45 ranges in value from 1 to 120. Researchers who want to know which geographic areas correspond to particular values should look at Attachment 104 of the Geocode Codebook Supplement for the crosswalk table. Respondents with a PSU code of 52 to 70 are part of the military sample and do not have any known geographic location.

Important information: Clarification on variable labeling

The label for variable R02191.46 found in SAS and SPSS programs that is automatically produced by NLS Investigator is confusing. The label reads "PRIMARY SAMPLNG UNIT PSU SCRAMBLED 79". This variable contains the scrambled replicate, or stratum number, not the PSU. PSU information is found in R02191.45. Users should be careful when adjusting geographic variables using the clustering corrections. The complete title for variable R02191.46 is "Within Stratum Replicate Of Primary Sampling Unit (PSU) - Scrambled." Because this variable is randomly scrambled, doing clustering corrections on some geographic variables produces incorrect results. Scrambling has no effect on variables that are not geographic, such as education, income, or training.

Using the key variables in Sudaan

The specific steps used to generate the tables above are covered in this section. While the tables were produced using the Windows Version 8.0 Standalone package, the steps and commands are similar for other versions of Sudaan. To adjust summary statistics such as means or regressions with Sudaan, the researcher needs to create three files: one containing the data, one telling Sudaan how to read the data, and one containing the specific commands. Any computer package can be used to create the data file. Data can even be written directly from NLS Investigator to a file. Figure 1 has the relevant portion of the SAS program used to create the data file used in Tables 1 and 2 above.

Figure 1. SAS commands to create Sudaan data file

data obesity;
  /* SAS commands that generate variables like Age, Income, and BMI are placed here */
  PSU       = R0219145;
  REPLICATE = R0219146;
run;

proc sort data=obesity; /* Sort the data, since Sudaan cannot handle unsorted data */
  by replicate psu;
run;

data _null_; /* Write the fixed-width file only; no output data set is needed */
  set obesity;
  file 'C:\DesignEffects\ObesitySudaanAdjustment.dbs';
  put ID        5.
      PSU       3.
      REPLICATE 2.
      WGHT      7.
      BLACK     2.
      HISPANIC  2.
      AGE       3.
      SEX       2.
      INCOME    9.
      BMI       4.1
      NETASSET  9.;
run;

One of the key things to note is that the data are sorted by the PSU and replicate variables before being written to the file. For most operations, Sudaan requires the data to be in this order before processing.
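For readers who prepare the data file outside SAS, the sort-then-write step can be sketched in Python. The field widths and decimals below mirror the put statement in Figure 1; the record values and function names are hypothetical, and whether a given Sudaan version accepts a file produced this way is an assumption that should be checked.

```python
# (name, width, decimals) mirroring the put statement in Figure 1
FIELDS = [
    ("ID", 5, 0), ("PSU", 3, 0), ("REPLICATE", 2, 0), ("WGHT", 7, 0),
    ("BLACK", 2, 0), ("HISPANIC", 2, 0), ("AGE", 3, 0), ("SEX", 2, 0),
    ("INCOME", 9, 0), ("BMI", 4, 1), ("NETASSET", 9, 0),
]


def format_record(record):
    """Render one respondent as a right-aligned fixed-width line."""
    return "".join(f"{record[name]:{width}.{dec}f}" for name, width, dec in FIELDS)


def sorted_lines(records):
    """Sort by replicate and then PSU before formatting, as Sudaan expects."""
    ordered = sorted(records, key=lambda r: (r["REPLICATE"], r["PSU"]))
    return [format_record(r) for r in ordered]


# Hypothetical respondents (values are invented for illustration).
records = [
    {"ID": 1, "PSU": 17, "REPLICATE": 2, "WGHT": 536542, "BLACK": 0,
     "HISPANIC": 0, "AGE": 37, "SEX": 1, "INCOME": 55031, "BMI": 26.7,
     "NETASSET": 128068},
    {"ID": 2, "PSU": 40, "REPLICATE": 1, "WGHT": 120000, "BLACK": 1,
     "HISPANIC": 0, "AGE": 35, "SEX": 0, "INCOME": 42000, "BMI": 24.1,
     "NETASSET": 60000},
]

lines = sorted_lines(records)
for line in lines:
    print(line)
```

Each line is 48 characters wide, the sum of the field widths, and the respondent in replicate 1 is written first.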

The second file is the "label" file. This file is used to read the data into Sudaan. The label file, called "ObesitySudaanAdjustment.lab," is shown in Figure 2. The label file has five parts. The first column on the left is the variable's name, followed by a letter which tells Sudaan if the variable contains numeric or character data. The third and fourth columns contain the number of bytes (characters) taken up by the variable and the number of decimal places in the number. The last column contains the label. Sudaan expects the label file to follow a precise format with columns starting and ending in very specific places.

Figure 2. Sudaan label file

ID        N  5  0  ID# (1-12686)
PSU       N  3  0  # OF PSU
REPLICAT  N  2  0  REPLICATE SCRAMBLED
WGHT      N  7  0  SAMPLING WEIGHT
BLACK     N  2  0  T/F BLACK
HISPANIC  N  2  0  T/F HISPANIC
AGE       N  3  0  AGE OF RESPONDENT
SEX       N  2  0  MALE 0 - FEMALE 1
TOTINC    N  9  0  TOTAL INCOME
BMI       N  4  1  BODY MASS
NETASS    N  9  0  TOTAL NET WORTH

The third file is the set of commands used to run Sudaan. Many versions of Sudaan allow commands to be typed directly into the program so researchers are not forced to create command files. Figures 3 and 4 provide the Sudaan commands that were used to create Tables 1 and 2 above. Figure 3 has three sections. The top section below the "Proc Descript" command tells Sudaan where to find the raw data and what variable contains the basic survey weights. The nest command defines which variables contain the replicate and PSU information. The middle section, beginning with "Var," tells Sudaan which variables will have descriptive statistics created. The final section, beginning with "Print," specifies the types of output that are shown.

The first section of Figure 4 is similar to commands seen above in Proc Descript. The large difference is that the "weight" command has the reserved name "_ONE_" after it instead of the NLSY79 weight, "wght." Putting the "wght" variable after the weight command would cause Sudaan to run weighted least squares. By using "_ONE_" instead, Sudaan weights all variables with the same 1.0 value, resulting in Sudaan running unweighted least squares. The second part of the command, which begins with "Model," shows the exact regression to run.

Figure 3. Sudaan commands used to create summary statistics in Table 1

Proc Descript
Data="C:\DesignEffects\ObesitySudaanAdjustment.dbs"
filetype=ascii design=wr mean DEFT1 est_no=12686;
weight wght;
nest REPLICAT PSU / MISSUNIT;
Var NETASS BMI TOTINC BLACK HISPANIC AGE SEX;
Print nsum="Sample Size" WSUM="Population Size" Mean
semean="Std. Err." DEFFMEAN="Design Effect" / style=nchs
nsumfmt=f6.0 wsumfmt=f10.0 deffmeanfmt=f6.2 semeanfmt=f11.2;


Figure 4. Sudaan commands used to create regression values in Table 2

Proc Regress
Data="C:\DesignEffects\ObesitySudaanAdjustment.dbs"
filetype=ascii design=wr DEFT1 est_no=12686;
weight _ONE_;
nest REPLICAT PSU / MISSUNIT;
Model NETASS = BMI TOTINC BLACK HISPANIC AGE SEX;
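The role of the "_ONE_" weight can be illustrated outside Sudaan. The Python sketch below fits a one-variable weighted least squares line; with every weight equal to 1.0 it reproduces the unweighted fit, while unequal weights change the estimates. It is a simplified illustration with invented data and a hypothetical function name, and it does not compute clustering-corrected standard errors.

```python
def weighted_least_squares(x, y, w):
    """Intercept and slope minimizing the weighted sum of squared residuals."""
    total_w = sum(w)
    mean_x = sum(wi * xi for wi, xi in zip(w, x)) / total_w
    mean_y = sum(wi * yi for wi, yi in zip(w, y)) / total_w
    sxy = sum(wi * (xi - mean_x) * (yi - mean_y) for wi, xi, yi in zip(w, x, y))
    sxx = sum(wi * (xi - mean_x) ** 2 for wi, xi in zip(w, x))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Hypothetical data.
x = [1.0, 2.0, 3.0, 4.0]
y = [2.0, 4.1, 5.9, 8.2]

# Unit weights: equivalent to ordinary (unweighted) least squares.
unweighted = weighted_least_squares(x, y, [1.0] * len(x))

# Unequal weights: the last observation counts ten times as much.
weighted = weighted_least_squares(x, y, [1.0, 1.0, 1.0, 10.0])

print(unweighted, weighted)
```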

Related Variables: The 1979 Geocode data also contain the state, county, and metropolitan statistical area where the respondent lived in 1979.
Documentation: Additional information can be found in the Standard Errors and Design Effects section of this User's Guide, in the NLSY79 Technical Sampling Report, and in Attachment 104 of the Geocode Codebook Supplement.
Data Files: Data on clustering can be found only in the NLSY79 Geocode files under the "GEOCODE" 1979 area of interest.