
NLSW - Mature and Young Women

Young Women Variables by Survey Year: Respondents Ages 14 to 24 in 1968

The selected variables for the Young Women asterisk tables are grouped into three main categories:

  1. Labor market experience variables
  2. Human capital and other socioeconomic variables
  3. Environmental variables


I. Labor market experience variables

Variable

68 69 70 71 72 73 75 77 78 80 82 83 85 87 88 91 93 95 97 99 01 03
Survey week labor force and employment status * * * * * * * * * * * * * * * * * * * * * *
Hours worked in survey week * * * * * * * * * * * * * * * * * * * * * *
Weeks worked (time frames vary) * * * * * * * * * * * * * * * * * * * * * *
Usual hours worked during weeks worked * *         * *                   * * * * *
Weeks unemployed (time frames vary) * * * * * * * * * * * * * * * * * * * * * *
Spells of unemployment in past year * * * * * *     *     *           * * * * *
Weeks out of labor force (time frames vary) * * * * * * * * * * * * * * * * * * * * * *

Variable

68 69 70 71 72 73 75 77 78 80 82 83 85 87 88 91 93 95 97 99 01 03
Occupation, industry, class of worker * * * * * * * * * * * * * * * * * * * * * *
Start date and stop date * * * * * * * * * * * * * * * * * * * * * *
Hours per week usually worked * * * * * * * * * * * * * * * * * * * * * *
Work schedule (worked and preferred)                   *           *   *        
Shift worked         * *     *   * * * * * * * * * * * *
Fringe benefits available                 *     * * * * * * * * * * *
Detailed fringe benefit series                               *   * * * * *
Hourly rate of pay *   * * * * * * * * * * * * * * * * * * * *
Work at home for employer                             * * * * * * * *
Promotions (any, effects)                             * * * * * * * *
Firm size                             * * * * * * * *
Supervises others                             * * * * * * * *
Displaced worker                             * * * * * * * *
Commuting time, costs *         *     *     *           *        
Type of training for this job                     *                      
Covered by collective bargaining     * * * *   * * * * * * * * * * * * * * *
Is R union member     * * * *   * * * * * * * * * * * * * * *
Did R ever hold unionized job                           *                

Variable

68 69 70 71 72 73 75 77 78 80 82 83 85 87 88 91 93 95 97 99 01 03
Occupation and industry of job held during last year of high school *                                          
Occupation, industry, class of worker, start date, stop date, and reason for leaving first job after school *                                          

Variable

68 69 70 71 72 73 75 77 78 80 82 83 85 87 88 91 93 95 97 99 01 03
Interfirm mobility (details vary)   * * * * * * * * * * * *                  
Occupation, industry, class of worker, hours per week, start date, stop date, and reason for leaving intervening jobs (through 1983) or employers (beginning in 1988) (details vary)   * * * * *     *     *     * * * * * * * *

II. Human capital and other socioeconomic variables

Variable

68 69 70 71 72 73 75 77 78 80 82 83 85 87 88 91 93 95 97 99 01 03
Age or date of birth * * * * * * * * *   *       * *   * * * * *
Nationality or ethnicity *                               * * * *    
Type of residence at age 14 and age 18 *                                          
Person(s) R lived with at age 14 *       *                                  
Occupation of household head when R was 14 *                                          
Highest grade completed by father *               *                          
Highest grade completed by mother *                                          
Were magazines, newspapers, and library cards available in home at age 14 *                                          
Parental encouragement to continue education past high school       * *                                  

Variable

68 69 70 71 72 73 75 77 78 80 82 83 85 87 88 91 93 95 97 99 01 03
Years at current residence *                                          
Geographic mobility (details vary) * * * * * *     *     *     * * * * *      
Comparison of birthplace to current residence *                                          

Variable

68 69 70 71 72 73 75 77 78 80 82 83 85 87 88 91 93 95 97 99 01 03
Current enrollment status * * * * * * * * * * * * * * * * * * * * * *
Highest grade completed * * * * * * * * * * * * * * * * * * * * * *
Reason stopped attending high school * * * * * *                                
Is current school public * * * * * *     *                          
High school curriculum * * * * * *                                
High school subjects enjoyed most and least *                                          
High school activities *                                          
Index of high school quality *                                          
Index of college quality     *   *                                  
College attended, highest degree received, field of study * * * * * * * * * * * * * * * * * * * * * *
College tuition (full-time amount) * * * * * *     *                          
College financial aid types, amount * * * * * * * * *                          
Reason R left college   * * * * *                                
Reason R's college plans have changed   * * * * *                                
Math courses in high school                       *                    

Variable

68 69 70 71 72 73 75 77 78 80 82 83 85 87 88 91 93 95 97 99 01 03
Any training or educational program (did R take, did R complete, type, sponsor, reason took, duration, hours per week, reason not completed) * * * * * * * * *                 * * * * *
Other training or educational program (did R take, did R complete, type, apprenticeship program, sponsor, reason took, duration, hours per week)                   * * * * * * * * * * * * *
On-the-job training (did R take, did R complete, duration, hours per week attended)                   * * * * * * * * * * * * *
Program enrolled in at last interview (type, did R complete, duration)                     * * * * * * * * * * * *
Training used on current job (universes vary) * * * * * * * * * * * * * * * * * * * * * *

Variable

68 69 70 71 72 73 75 77 78 80 82 83 85 87 88 91 93 95 97 99 01 03
Comparison of R's condition with past     * *   *     *     *     * * * * * * * *
Does health limit work *   * *   * * * * * * * * * * * * * * * * *
Does health limit school activity *   * *   *                                
Does health limit housework *     *   *       * *   * *                
Duration of health limitations *   * *   * * * *     *     * * * * * * * *
Problematic activities (stooping, kneeling, and so forth)       *         *     *     * *   *       *
Problematic working conditions (noise, heat, and so forth)                 *     *     * *            
Accidents (on-the-job, how, when)                 *                          
Does health permit going outdoors, using public transportation, or personal care       *         *     *     *              
Does others' health limit R's work           *     *     *     *   *   * * * *
Insurance coverage of R and family members                             * * * * * * * *
Cigarette; alcohol use                               * * * * * * *
Height and weight (details vary)                               *   *        
Menopausal status and hormone use                                   * * * * *
Extent to which R drives an automobile                                   *        
Types of health conditions (cancer, diabetes, and so forth)                               *   * * * * *
Prescription drug expenses                                           *
Any medication for mental conditions                                           *
Any exercise during past month                                           *
Pet ownership                                           *

Variable

68 69 70 71 72 73 75 77 78 80 82 83 85 87 88 91 93 95 97 99 01 03
Marital status * * * * * * * * * * * * * * * * * * * * * *
Husband's attitude toward R's working *       *       *     *                    
Marital history           *     *   * * * * * * * * * * * *
New information or update on all children born or adopted           *     *     * * * * * * *       *
Number of dependents * * * * * *     *   * * * * * * *          
Parents (weeks worked, full-time, occupation) * * *   * *                 *              
Number and ages of children in household * * * * * * * * * * * * * * * * * * * * * *
Any children in college last 12 months; amount of support from R and spouse                               * * * *   * *
Childcare arrangements (type, cost) (universes and details vary) * * * * *   * * *     *     * * * *        
Number of children R expects and number R considers ideal       *   *     *   * * * * * *            
Family or household (starting in 1988) members: Relationship to R, sex, age, education, employment status * * * * * * * * * * * * * * * * * * * * * *
Unrelated household members: Relationship to R, age, sex                 *   * * * *                
Household activities: Responsibility, hours per week spent             *   *   * *   *                
Did R's husband ever have a unionized job                           *       *        
Did R's father ever have a unionized job                           *                
Responsibility for care of chronically ill or disabled                           *   * * * * * * *

Variable

68 69 70 71 72 73 75 77 78 80 82 83 85 87 88 91 93 95 97 99 01 03
Current labor force status                         * * * * * * * * * *
Usual weeks worked                                 *          
Firm size                                 *          
Covered by Social Security or Railroad Retirement                                 *         *
Covered by collective bargaining or union contract                                 * * * * * *
Is spouse or partner union member                                 * * * * * *
Job search activity in past month                                   * * * * *
Retirement plans, expectations, status                                 * * * * * *
Weeks and hours worked 1990-92                                 *          
Detailed information on employers since 1987 or since last interview (occupation, industry, class of worker, rate of pay, start and stop date, hours worked, shift worked)                                 * * * * * *
Unemployment of husband (weeks)                 * * * * * * * * * * * * * *
Husband's health limits work, limitations *   * *   *     *     *     *   * * * * * *

Variable

68 69 70 71 72 73 75 77 78 80 82 83 85 87 88 91 93 95 97 99 01 03
Husband's medical care in 12 months before death                                   * * * * *
How medical costs were paid                                   * * * * *
R's care of husband                                   * * * * *
Financial assistance to R from family members                                   * * * * *
Death benefits paid to R (amount, source, lump sum or periodic payment)                                   * * * * *

Variable

68 69 70 71 72 73 75 77 78 80 82 83 85 87 88 91 93 95 97 99 01 03
Total net family assets *     * * *     *     *     *   * * * * * *
Total family income * * * * * * * * * * * * * * * * * * * * * *
Income from farm or business * * * * * * * * * * * * * * * * * * * * * *
Wage or salary income * * * * * * * * * * * * * * * * * * * * * *
Unemployment compensation income * * * * * * * * * * * * * * * * * * * * * *
Supplemental unemployment benefits income                 *     *     * * * * * * * *
Disability income                 *     *     * * * * * * * *
Rental income                 *     *     * * * * * * * *
Interest or dividend income                 *     *     * * * * * * * *
Total market value of Food Stamps received                 * * * * * * * * * * * * * *
Income from AFDC/TANF                 *     *     * * * * * * * *
Income received from public assistance                 * * * * * * * * * * * * * *
Income from Social Security or Railroad Retirement                       *     * * * * * * * *
Pension income                                 * * * * * *
Alimony payments                   * * * * * * * * * * * * *
Child support payments                   * * * * * * * * * * * * *
Financial assistance received from others * * * * * *       * *   *       * * * * * *
Income from other sources * * * * * * * * * * * * * * * * * * * * * *

H = Respondent's husband

Variable

68 69 70 71 72 73 75 77 78 80 82 83 85 87 88 91 93 95 97 99 01 03
Life status of R's parents, age *                           * * *   *   * *
Cause of death of R's parents                               *            
Life status of H's parents, age *                               *   *   *  
Health status of R's and H's parents                                 *   *   *  
Do R's or H's parents live in nursing home                                 *   *   *  
Marital status of R's and H's parents                                 *   *   *  
Distance R's and H's parents live from R                                 *   *   *  
Yearly income of R's and H's parents                                 *   *      
Do R's and H's parents own home; value                                 *   *      
Amount of R's and H's parents' assets and debts                                 *   *   *  
Transfers of time to R's and H's parents                                 *   *   *  
Transfers of money to R's and H's parents                                 *   *   *  
Transfers of time from R's and H's parents                                         *  
Transfers of money from R's and H's parents                                         *  
Did R's parents have will                                     *   *  
Amount of parents' estate                                     *   * *
Sex, age and date of birth, highest grade completed of R's and H's children                                       *   *
Relationship of child(ren) to R                                       *   *
Residence of child(ren) and distance from R                                       *   *
Do child(ren) and child(ren)'s spouse own home; value                                       *   *
Amount of child(ren)'s assets and debts                                       *   *
Transfers of time to and from child(ren)                                       *   *
Transfers of money to and from child(ren)                                       *   *
Does R have will; who are beneficiaries                                       *   *
If R has a mother in the Mature Women cohort:

Mother's marital status

                                      *    

Amount of mother's and mother's husband assets and debts

                                      *    

Transfers of time to and from R and mother

                                      *    

Transfers of money to and from R and mother

                                      *    

Variable

68 69 70 71 72 73 75 77 78 80 82 83 85 87 88 91 93 95 97 99 01 03
How R feels about job * * * * * *     * * * * * * * * * * * * * *
What R likes best and least about job * * * * * *     * * * *     *              
Attitude toward homemaking                 * *   *     *              
Would R continue to work if had enough money to live on     *   *       *     *     *              
Which is more important: high wages or liking work *         *           *                    
Attitude toward women working *       *       *     *     *              
Facet-Specific Job Satisfaction Index                   *                        
Would R like more education or training * * * * * *     *                          
Educational goal * * * * * *     *                          
What would R like to be doing when 35 years old * * * * * * * * * * * * *                  
What would R like to be doing when 50 years old and 5 years from now                     * * * *                
Knowledge of World of Work score   *                                        
Rotter Internal-External Locus of Control score (shortened version in 2001)     *     *     *     *     *           *  
CES-Depression Scale                                 * * * * * *
Way feeling these days                   * *   * * * * * * * * * *
IQ score *                                          
Discrimination ever experienced, type (expanded in 1988)         *       * * * *     *     *     *  
Has R progressed, held own, or moved backward           *     *     *                    
Attitudes toward retirement                                   * * * * *
Opinions on hypothetical Social Security reform                                           *

Variable

68 69 70 71 72 73 75 77 78 80 82 83 85 87 88 91 93 95 97 99 01 03
Would R accept * * * * * * * * *     *     *              
Hours per week would work * * * * * * * * *     *     *              
Rate of pay, kind of work required * * * * * * * * *     *     *              

Variable

68 69 70 71 72 73 75 77 78 80 82 83 85 87 88 91 93 95 97 99 01 03
Did any unpaid volunteer work           *     *           * *           *
Hours per week worked, organization           *     *           * *           *
Why volunteered           *     *                         *

Variable

68 69 70 71 72 73 75 77 78 80 82 83 85 87 88 91 93 95 97 99 01 03
Expected age at retirement                                   *        
Characteristics of current employer's pension plan                               *   * * * * *
R's knowledge of employer's pension plan                                   * * * * *
Eligible for other pensions, type, number of years worked on jobs                                   *        
Eligible for spouse's benefits                                   * * * * *
Spouse eligible for other retirement benefits, type                                   *        
R and spouse have personal retirement plan                                   * * * * *
Sources of retirement income                                   *        
Retirement health insurance coverage                                   * * * * *
Detailed pension plan coverage                                   * * * * *

III. Environmental variables

Variable

68 69 70 71 72 73 75 77 78 80 82 83 85 87 88 91 93 95 97 99 01 03
Region of residence (South or non-South) * * * * * * * * * * * * * * * * * * * * * *
Does R live in SMSA * * * * * * * * * * * * * * *              
Mover or nonmover status * * * * * * * * * * * * * * * * * * * * * *
Comparison of State, county, SMSA * * * * * * * * * * * * * * *              
Comparison of State, county                               * * * * * * *

Variable

68 69 70 71 72 73 75 77 78 80 82 83 85 87 88 91 93 95 97 99 01 03
Size of local area labor force * * * * * * * * * * * * * * *              
Local area unemployment rate * * * * * * * * * * * * * * *              
Index of demand for female labor * * * * * *                                
Accredited college in local area * * *                                      

Appendix C: How to Unpack Multiple Entries

Responses to multiple entry questions in the early years of the surveys of the four Original Cohorts were coded in a geometric progression (1, 2, 4, 8, ...) to conserve space on the tape. Variables such as 'Method of Seeking Employment,' 'Method of Finding Current or Last Job,' 'Type of Financial Aid Received,' 'Type of Child Care Arrangement,' and numerous health-related questions were formatted in this way from the start of the surveys. Multiple entry items are identified by an asterisk under the source code box in the questionnaire and by a special detailed codeblock in the documentation. These responses must be "unpacked" before they can be used in analysis.

In later survey rounds, choose-all-that-apply items are coded as a series of yes-no variables in the data set. For example, although the respondent would see a list of possible fringe benefits and select all that were available to her, in the data this question is represented as a series of questions like "Did your employer make medical benefits available," "Did your employer make paid vacation available," and so on.

The examples below are applicable to all Original Cohort multiple entry variables.

Example: Fringe Benefits variable

Codes for the Mature Women's variable R03380., 'Fringe Benefits at Current Job 77,' range from 1 (the respondent reported only one benefit, "medical insurance") to 1023 (the respondent reported access to all of the benefits listed). An intermediate value such as 259 means that the respondent reported "medical insurance," "life insurance," and "paid sick leave" (1 + 2 + 256). There are several ways to determine which components each respondent reported; this appendix provides one example in SAS and one in SPSS.
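Because every benefit has its own power-of-2 code, a benefit is present exactly when its bit is set in the packed value, so a bitwise AND can stand in for the repeated subtraction used in the programs below. The short Python sketch that follows illustrates this for R0338000. The names FRINGE_77 and decode_fringe are hypothetical, the benefit names are paraphrased from the description above and from the labels in the SAS program in this appendix, and the sketch does not handle the missing-value codes.

```python
from typing import List

# Code -> benefit, paraphrased from the SAS labels (fringe01 = 1 ... fringe10 = 512).
FRINGE_77 = {
    1: "medical insurance",
    2: "life insurance",
    4: "retirement",
    8: "training/education",
    16: "profit sharing",
    32: "stock options",
    64: "free meals",
    128: "free merchandise",
    256: "paid sick leave",
    512: "paid vacation",
}


def decode_fringe(code: int) -> List[str]:
    """Return the benefits packed into a fringe-benefit code (no missing-value handling)."""
    if not 0 <= code <= sum(FRINGE_77):  # sum of the keys: 1023, all ten benefits
        raise ValueError(f"not a valid packed code: {code}")
    # A benefit is present when its bit is set in the packed code.
    return [name for bit, name in sorted(FRINGE_77.items()) if code & bit]


print(decode_fringe(259))       # ['medical insurance', 'life insurance', 'paid sick leave']
print(len(decode_fringe(1023))) # 10
```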

Program 1: Unpacking Mature Women cohort fringe benefits data in SAS

This SAS program unpacks fringe benefits from the variable R0338000. It creates 10 dichotomous dummy variables, FRINGE01 through FRINGE10, indicating the presence or absence of each of the 10 benefits. Each dummy is set to missing if R0338000 is missing (coded -998 or -999). Note that the program tests the codes in reverse order, from the largest (512) down to the smallest (1); each dummy still corresponds to one code in the codeblock, so MEDICAL is code 1 on the tape and FRINGE01 in the program. Users can modify the program to include the expanded set of fringe benefits available in later survey years, or to unpack other multiple entry variables, by extending the array and the series of tests so that the number of dummies matches the total number of responses listed in the codeblock.

Figure 1. SAS sample code

data benefits;
  /* Read the packed 1977 fringe benefits code; change the path to your own file */
  infile 'D:\documents\requests\unpack.dat' lrecl=4;
  input R0338000 4.;
  /* Recode the NLS missing-value codes to SAS missing */
  if R0338000 in (-998, -999) then R0338000 = .;
  label R0338000 = "FRINGE BNFTS CUR_JOB_77";

  /* One dummy per benefit: 0 unless its code is present; left missing if R0338000 is missing */
  array fringe{10} fringe01-fringe10;
  do i = 1 to 10;
    if R0338000 ne . then fringe{i} = 0;
  end;
  drop i;

  /* Peel off each code from the largest (512) to the smallest (1) */
  all = R0338000;
  if all ge 512 then do; fringe10=1; all=all-512; end;
  if all ge 256 then do; fringe09=1; all=all-256; end;
  if all ge 128 then do; fringe08=1; all=all-128; end;
  if all ge  64 then do; fringe07=1; all=all- 64; end;
  if all ge  32 then do; fringe06=1; all=all- 32; end;
  if all ge  16 then do; fringe05=1; all=all- 16; end;
  if all ge   8 then do; fringe04=1; all=all-  8; end;
  if all ge   4 then do; fringe03=1; all=all-  4; end;
  if all ge   2 then do; fringe02=1; all=all-  2; end;
  if all ge   1 then do; fringe01=1; all=all-  1; end;

  label fringe01='medical,surgi';
  label fringe02='life insuranc';
  label fringe03='a retirement ';
  label fringe04='training/educ';
  label fringe05='profit sharin';
  label fringe06='stock options';
  label fringe07='free meals';
  label fringe08='free mdse';
  label fringe09='paid sick lea';
  label fringe10='paid vacation';
run;
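For readers working outside SAS, the following Python sketch mirrors the same steps: recode -998 and -999 to missing, start each dummy at 0, then subtract the codes from the largest (512) to the smallest (1). The function unpack_dummies and its parameters are hypothetical; the first element of the returned list corresponds to FRINGE01 and the last to FRINGE10. Unlike the SAS program, it raises an error for any value that cannot be fully unpacked, such as some other negative code.

```python
from typing import List, Optional

# Missing-value codes that the SAS program recodes to missing for R0338000.
MISSING_CODES = {-998, -999}


def unpack_dummies(packed: Optional[int], n_items: int = 10) -> List[Optional[int]]:
    """Return [d1, ..., dn], where d_k = 1 if code 2**(k-1) is present, else 0.

    Follows the SAS program: codes are peeled off from the largest to the
    smallest, and every dummy is None (missing) when the packed code is missing.
    """
    if packed is None or packed in MISSING_CODES:
        return [None] * n_items
    dummies = [0] * n_items
    remaining = packed
    for k in range(n_items - 1, -1, -1):  # with 10 items: k = 9 tests 512, ..., k = 0 tests 1
        value = 2 ** k
        if remaining >= value:
            dummies[k] = 1
            remaining -= value
    # Anything left over (or a negative start) means the code cannot be unpacked.
    if remaining != 0:
        raise ValueError(f"{packed} is not a valid packed code for {n_items} items")
    return dummies


print(unpack_dummies(259))   # [1, 1, 0, 0, 0, 0, 0, 0, 1, 0]
print(unpack_dummies(-999))  # ten None values
```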

Program 2: Unpacking Young Men cohort fringe benefits data in SPSS

The SPSS program works in the same way as the SAS program, here applied to the 12 codes of the 1981 Young Men fringe benefits variable (FB81). SPSS users can follow this template.

Figure 2. SPSS sample code

* UNPACKING 1981 YOUNG MEN FRINGE BENEFITS (SPSS).
* FB81 holds the packed code and is reduced as each code is found.
* FB81a keeps an unmodified copy of the packed code.
* Each dummy starts at 0 (not reported) and is set to 1 when its code is present.
* If FB81 is negative (missing or skipped), every dummy is set to -4.
compute FB1=0.
variable labels FB1 '81 NONE'.
compute FB2=0.
variable labels FB2 '81 FLEX HRS'.
compute FB3=0.
variable labels FB3 '81 PAID VACATION'.
compute FB4=0.
variable labels FB4 '81 PD SICK'.
compute FB5=0.
variable labels FB5 '81 FR MERCH'.
compute FB6=0.
variable labels FB6 '81 FR MEALS'.
compute FB7=0.
variable labels FB7 '81 STOCK'.
compute FB8=0.
variable labels FB8 '81 PROFT'.
compute FB9=0.
variable labels FB9 '81 TRED'.
compute FB10=0.
variable labels FB10 '81 RETR'.
compute FB11=0.
variable labels FB11 '81 LIFE'.
compute FB12=0.
variable labels FB12 '81 HLTH'.
compute FB81a=FB81.
variable labels FB81a 'VARIABLE FOR NONE'.
do if (2048 le FB81).
compute FB1=1.
compute FB81=FB81-2048.
else if (FB81 lt 0).
compute FB1=-4.
end if.
do if (1024 le FB81).
compute FB2=1.
compute FB81=FB81-1024.
else if (FB81 lt 0).
compute FB2=-4.
end if.
do if (512 le FB81).
compute FB3=1.
compute FB81=FB81-512.
else if (FB81 lt 0).
compute FB3=-4.
end if.
do if (256 le FB81).
compute FB4=1.
compute FB81=FB81-256.
else if (FB81 lt 0).
compute FB4=-4.
end if.
do if (128 le FB81).
compute FB5=1.
compute FB81=FB81-128.
else if (FB81 lt 0).
compute FB5=-4.
end if.
do if (64 le FB81).
compute FB6=1.
compute FB81=FB81-64.
else if (FB81 lt 0).
compute FB6=-4.
end if.
do if (32 le FB81).
compute FB7=1.
compute FB81=FB81-32.
else if (FB81 lt 0).
compute FB7=-4.
end if.
do if (16 le FB81).
compute FB8=1.
compute FB81=FB81-16.
else if (FB81 lt 0).
compute FB8=-4.
end if.
do if (8 le FB81).
compute FB9=1.
compute FB81=FB81-8.
else if (FB81 lt 0).
compute FB9=-4.
end if.
do if (4 le FB81).
compute FB10=1.
compute FB81=FB81-4.
else if (FB81 lt 0).
compute FB10=-4.
end if.
do if (2 le FB81).
compute FB11=1.
compute FB81=FB81-2.
else if (FB81 lt 0).
compute FB11=-4.
end if.
do if (1 le FB81).
compute FB12=1.
compute FB81=FB81-1.
else if (FB81 lt 0).
compute FB12=-4.
end if.
execute.
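The twelve nearly identical DO IF blocks can also be driven from a single table of codes and labels, which leaves less room for copy-and-paste mistakes. The Python sketch below is one way to do this and is not part of the original program: YM81_ITEMS and unpack_items are hypothetical names, the labels are copied from the SPSS variable labels, and, as a design choice, a negative packed value is copied to every item rather than recoded.

```python
from typing import Dict, List, Tuple

# (code, label) pairs copied from the SPSS program: FB1 = 2048 ... FB12 = 1.
YM81_ITEMS: List[Tuple[int, str]] = [
    (2048, "81 NONE"), (1024, "81 FLEX HRS"), (512, "81 PAID VACATION"),
    (256, "81 PD SICK"), (128, "81 FR MERCH"), (64, "81 FR MEALS"),
    (32, "81 STOCK"), (16, "81 PROFT"), (8, "81 TRED"),
    (4, "81 RETR"), (2, "81 LIFE"), (1, "81 HLTH"),
]


def unpack_items(packed: int, items: List[Tuple[int, str]] = YM81_ITEMS) -> Dict[str, int]:
    """Map each label to 1 (reported) or 0 (not reported).

    A negative packed value (a missing or skip code) is copied to every item
    instead of being unpacked.
    """
    if packed < 0:
        return {label: packed for _, label in items}
    if packed > sum(code for code, _ in items):  # 4095 = every code present
        raise ValueError(f"{packed} exceeds the largest possible combination")
    # Each code is a distinct power of 2, so a bitwise AND isolates it.
    return {label: 1 if packed & code else 0 for code, label in items}


result = unpack_items(512 + 256 + 4)
print([label for label, flag in result.items() if flag])
# ['81 PAID VACATION', '81 PD SICK', '81 RETR']
```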

Mature and Young Women Errata

NLS Investigator contains the most recent release of each NLS cohort. Known problems are listed below. Corrections have been made to the items noted in the errata of prior releases. For further questions, please contact NLS User Services.

Young Women, incorrect codebook

The codebook description for Young Women variable R12196.00 (1988 survey, question 113) is incorrect. The data are correct and match the answer categories in the question text. This item will be corrected in the next release.

Mature and Young Women, incorrect title(s)

The title of one 2001 variable for the Mature and Young Women is incorrect:

  • R62437.0 TYPE OF DISCRIMINATION EXPERIENCED AT WORK SINCE LAST INT - RACE

The correct title is:

  • R62437.0 TYPE OF DISCRIMINATION EXPERIENCED AT WORK SINCE LAST INT - DISABILITY, 2001

The titles of two 1989 Mature Women variables are in error: they refer to R (the respondent), but the variables actually refer to the husband. The variable reference numbers and correct titles are:

  • R09864.00 MAIN PENSION, HUSBAND - REDUCED BENEFITS RECEIVED AFTER EARLY RETIREMENT, 89 (MONTHLY)
  • R09866.00 MAIN PENSION, HUSBAND - BASED ON YEARS OF SERVICE/BALANCE IN ACCOUNT? 89

Problematic codebook distributions and frequencies

There are some variables for which the continuous code distribution in the codebook shows more cases than a frequency count of the actual data. In each of these instances the data are correct. The error appears only in the codebook.

Undocumented codebook skip patterns

For a number of variables, the codebook does not document all possible skip patterns for the question. Similarly, information on the lead-in question is missing for some variables in the codebook. These codebook anomalies do not affect the data or the questionnaire; the correct skip patterns are present in the questionnaire, and the data reflect them. Users who suspect that a skip instruction is missing or incorrect for a variable are urged to check the questionnaire for details.

Implausible values

Some 1995 to 2001 data items may contain values that appear implausible or unreasonable. These values are not necessarily incorrect, but they are unusual. In the past, when a respondent had an unusual value for an item, the archivist could consult the respondent's paper questionnaire to determine whether the value resulted from a data entry error. This is no longer possible with a CAPI (computer-assisted personal interviewing) instrument. Instead, the archivist has retained these values rather than blanking or revising them, leaving researchers to decide how to treat them.

1989 Mature Women pension data file

A problem has been found and corrected with the Mature Women 1989 supplemental pension data file.

The pension data were originally created by the University of Michigan's Institute for Social Research (ISR) and are taken from the actual pension plan descriptions that all large companies and government agencies are required to file with the Department of Labor. Using the cross-walk information included in the main Mature Women data set and the supplemental pension file, it is possible to know the exact details of the pensions that cover many of the respondents.

The problem was that the data dictionary was incorrect for 26% of the cases. It appears that ISR wrote about three-quarters of the pension data using one output format and the remaining quarter using a slightly different one.

The problematic areas of these cases contain no real data. Each of the 815 pension descriptors is a very large (19,000-byte) record, and large portions of each record are empty; for example, defined benefit plans have no data in the areas reserved for defined contribution plans. All of the problems occurred in the middle of a section that, for those cases, was filled with zeros and periods ("."). Nevertheless, to avoid any confusion, we initialized the misaligned data and then wrote out a corrected data file that matches the data dictionary.

The corrected files are currently available on NLS Investigator and contain the word "fixed" in the file name.

Mature Women packed health variables

The Mature Women's packed health variables R0164700, R0164800, R0376900, and R0377700 are available in the next release.

Mature Women Codebook Supplement

The Mature Women's Codebook Supplement consists of a series of Appendices and Attachments.

Please contact NLS User Services to obtain copies of Appendices 4, 6-9, 11-12, 19-22, and 37-43.

Young Women Codebook Supplement

The Young Women's Codebook Supplement consists of a series of Appendices and Attachments.

Please contact NLS User Services to obtain copies of Appendices 4, 6-7, 9-12, 20-21, 23-24, 26, 30, 32-34, and 37-43.

NLSW Documentation

All variables present on a main file data set (accessed through NLS Investigator) are documented via: (1) a cohort-specific codebook and (2) an accompanying codebook supplement. This section describes these components and discusses the important types of information found within each.

Codebook

The codebook is the principal element of the documentation system and is intended to provide complete, self-explanatory information for each variable in a data file. In NLS Investigator, codebook information can be viewed by clicking a variable's reference number once a list of variables has been selected.

Every variable is presented within the documentation as a block of information called a "codeblock." Codeblock entries depict the following information: a reference number, variable title, coding information, frequency distribution, reference to the questionnaire item or source of the variable, and information on the derivation for created variables. The codeblocks of many variables include special notes containing additional information designed to assist in the accurate use of data from that variable.

Codebooks are arranged by reference number. Variables are first grouped according to survey year. Within each survey year, variables related to the interview (e.g., interview method, interview date, reason for noninterview, sampling weight) appear first, followed by variables taken directly from the questionnaire and Information Sheet. In general, created and edited variables appear last, although the created environmental variables are grouped with the interview-related variables in the early survey years.

Important information: Codebooks and questionnaires

NLS codebooks are not a substitute for the questionnaires. Although these two pieces of documentation contain similar information, the questionnaires should be used to determine precise universe information.

Coding information

Each codeblock entry presents the set of legitimate codes that a variable may assume along with a text entry describing the codes. Users should note that coding information for a given variable in the NLS codeblock is not necessarily consistent with the codes found within the questionnaire or for the same variable across years. Use only the codebook coding information for analysis. The following types of code entries occur in NLS codeblocks:

Dichotomous variables

Dichotomous (yes/no) variables are uniformly coded "Yes" = 1, "No" = 0. Other dichotomous variables have frequently been reformulated to follow this convention.

Discrete variables

Discrete (categorical) variables take one of a fixed set of coded values, as in the categories of 'Activity Most of Survey Week 93':

  • 1 = Working
  • 2 = With a job, not at work
  • 3 = Looking for work
  • 4 = Going to school
  • 5 = Keeping house
  • 6 = Unable to work
  • 7 = Retired
  • 8 = Other

Continuous variables

Continuous (quantitative) variables, such as 'Hourly Rate of Pay at Current or Last Job 83 *KEY*,' contain continuous data, but for ease of use the codebook presents their frequency distribution grouped into class intervals, as in the sample codeblocks above.

Combined quantitative-qualitative variables

Combined quantitative-qualitative variables are ostensibly quantitative but may have nonquantitative (categorical) responses. They use integers equal to the actual values for quantitative responses and 999 for the qualitative (categorical) response. For example, "YEAR STOPPED WORKING AT 1ST MOST RECENT JOB INTRVNG & LAST" is coded as follows:

  • 60 thru 73 = actual year
  • 999 = still working there
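Before computing anything numeric from such a variable, the categorical code has to be separated from the quantities; averaging the raw codes would treat 999 as a year. A minimal Python sketch for the example above, assuming the two-digit codes refer to years in the 1900s (an assumption, not stated in the codebook entry); the function name is hypothetical:

```python
from typing import Optional, Tuple

STILL_WORKING = 999  # categorical code from the codebook example


def decode_year_stopped(code: int) -> Tuple[Optional[int], bool]:
    """Split a combined quantitative-qualitative code into (year, still_working).

    Assumes codes 60 thru 73 are two-digit years in the 1900s.
    Negative codes are nonresponse and yield (None, False).
    """
    if code == STILL_WORKING:
        return None, True
    if code < 0:
        return None, False
    return 1900 + code, False


year, still_there = decode_year_stopped(73)  # (1973, False)
```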

Multiple responses

In the early years of the surveys, response categories to multiple entry questions found in certain job search, child care, discrimination, or health questions were coded in a geometric progression. For example, more than one response to the question "Method of seeking employment to be used in next year" was possible. The response categories to that question were each assigned a value as follows:

  • 1 = Checked with public employment agency
  • 2 = Checked with private employment agency
  • 4 = Checked with employer directly
  • 8 = Checked with friends or relatives
  • 16 = Placed or answered ads
  • 32 = Other method

Multiple responses were then coded for each respondent by adding the individual codes, which yields a unique value for each combination. Such multiple entry variables were identified by an asterisk (*) next to the answer categories in the questionnaire. If a multiple entry variable has only a few unique combinations, the codebook lists the exact combinations; variables with many combinations need to be unpacked. See Appendix C: How to Unpack Multiple Entries to learn more about this process. After the 1989 (Mature Women) and 1991 (Young Women) surveys, this multiple entry practice was discontinued and all responses were coded as yes/no.
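Because each category value is a distinct power of two, unpacking a stored sum amounts to testing which powers of two it contains. The following Python sketch is illustrative only: the function name `unpack_multiple_entry` is hypothetical, and Appendix C remains the authoritative description of the procedure. It uses the six categories from the example question above and returns one 1/0 flag per category, the same layout later used for the yes/no variables.

```python
# Category values for "Method of seeking employment to be used in next year".
METHODS = {
    1: "Checked with public employment agency",
    2: "Checked with private employment agency",
    4: "Checked with employer directly",
    8: "Checked with friends or relatives",
    16: "Placed or answered ads",
    32: "Other method",
}


def unpack_multiple_entry(code: int, categories: dict[int, str]) -> dict[str, int]:
    """Split a summed geometric-progression code into one 1/0 flag per category.

    Hypothetical helper. Negative (missing value) codes are copied to every
    category rather than unpacked.
    """
    if code < 0:
        return {label: code for label in categories.values()}
    if code > sum(categories):
        raise ValueError(f"{code} is larger than the sum of all category values")
    # A bitwise AND shows whether a category's power of two is in the sum.
    return {label: int((code & value) != 0) for value, label in categories.items()}


# 13 = 1 + 4 + 8: public agency, employer directly, friends or relatives.
flags = unpack_multiple_entry(13, METHODS)
```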

Important information: Geometric progression discontinued

After the 1989 (Mature Women) and 1991 (Young Women) surveys, the practice of coding multiple entry variables in a geometric progression was discontinued and all responses were coded as yes/no. Under this system, the question above would have six corresponding variables in the codebook, one for each response category, with codes of 1 and 0 indicating whether the respondent answered positively for each category. Respondents who do not know or refuse to answer the question receive the appropriate missing value for all the variables that correspond to that question; respondents who do not know or refuse to respond to just one category receive the appropriate missing value for the corresponding variable only.

The system for coding missing values in multiple response questions changed slightly in 1999. There are still separate variables for each response category, and respondents who do not know or refuse to respond to just one category are still coded with the correct missing value for the corresponding variable. The difference is that a new variable at the end of the series marks the final record for the series. In this variable, respondents who answered any or all of the category questions receive either a -8 or a 0 code, depending on the series, to indicate that they are done selecting response categories; respondents who replied "don't know" to the entire series are coded -2, and those who refused to answer the entire series are coded -1. For some series, this final variable may have other options in addition to those described above.

Missing responses

Negative numbers are used to indicate that a respondent does not have a valid value for a particular variable. Different numbers indicate different reasons for nonresponse:

  • "Refusal" indicates that the respondent refused to answer a given question. These respondents are assigned a value of -1. This code is used for all interviews of this cohort.
  • "Don't know" indicates that the respondent did not know the answer to a given question. These respondents are assigned a value of -2. This code is used for all interviews of this cohort.
  • "Invalid skip" indicates that the respondent was not asked a question that she should have answered, usually due to programming or interviewer error. These respondents are assigned a value of -3. This code is only used consistently for computer-assisted personal interviews (CAPI), which were used from 1995 to 2003.
  • "Valid skip" has slightly different meanings depending on survey year. In CAPI interviews (1995-2003), this code indicates that the respondent was intentionally skipped past the question because she was not in the universe of respondents to whom that question applied. These respondents are assigned a value of -4. In paper and pencil interviews (PAPI), which were used from 1967 to 1992, this code indicates either that the respondent was not in the applicable universe or that some other error resulted in a missing response (which generally would have produced an invalid skip code in a CAPI survey).
  • Finally, a "noninterview" value of -5 indicates that a respondent was not interviewed in that survey year. This code is used for all interviews of this cohort.

Important information: Missing values

The missing value codes described above are accurate for the 1999-2003 Mature Women and 1995-2003 Young Women data releases. In previous years, a more complicated system was used to indicate missing data in the PAPI interviews. Beginning in 1995, the missing values were reassigned using a standardized system that matches the Young Women's CAPI data as well as the other NLS cohorts. Beginning in 1999, the same process was applied to the Mature Women data. This standardization should make it easier to use the data in analysis. However, researchers using programs written for a previous release of the Mature and Young Women data may need to change the parts of their programming code related to missing values. Users who need more information about the codes previously used in order to make these adjustments should contact NLS User Services.

Three additional negative codes are used only with the Mature and Young Women cohorts for particular types of nonresponse.

  • In questions dealing with usual hours per week worked, if the respondent reported that her hours varied, she was assigned a code of -6.
  • Women who had been widowed since the last survey were asked a series of questions regarding their husband's care and their financial situation since his death. A code of -7 was assigned to women whom the interviewer judged to be emotionally unable to answer these questions.
  • Some variables in multiple response question series include codes of -8, indicating that the respondent was done with the series.
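When preparing data for analysis, it can help to keep all eight negative codes in one lookup table so that nonresponse is never averaged in as if it were data. The Python sketch below assumes the coding described for the 1999-2003 Mature Women and 1995-2003 Young Women releases; the labels and the helper names `classify` and `valid_only` are illustrative, not part of any NLS tool. `valid_only` drops every negative code, including -6 (hours varied); whether that is appropriate depends on the analysis.

```python
# Negative codes for the standardized releases, plus the three codes used
# only with the Mature and Young Women cohorts.
MISSING_CODES = {
    -1: "Refusal",
    -2: "Don't know",
    -3: "Invalid skip",
    -4: "Valid skip",
    -5: "Noninterview",
    -6: "Hours varied",
    -7: "Judged emotionally unable to answer",
    -8: "Done with multiple response series",
}


def classify(value: int) -> str:
    """Return 'valid' for a non-negative value, otherwise the code's label."""
    if value >= 0:
        return "valid"
    # Flag unexpected negative codes instead of silently accepting them.
    return MISSING_CODES.get(value, f"Unrecognized negative code {value}")


def valid_only(values: list[int]) -> list[int]:
    """Drop every negative code before computing statistics."""
    return [v for v in values if v >= 0]


hours = [40, -6, 35, -1, 0]
usable = valid_only(hours)  # [40, 35, 0]
```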

Important information: Valid and invalid skips

In computer-assisted surveys, respondents are initially assigned a default code of -4 (valid skip) for all questions in the interview. Then the -4 codes are replaced by valid data. The -3 (invalid skip) codes must be inserted into the data as hand-edits when data archivists uncover skip pattern errors during the data cleaning process. Therefore, some respondents classified as valid skips may actually have skipped a question incorrectly. If researchers need to know the exact reason a question was not answered, they can examine the skip patterns and universes in the questionnaire to determine whether any additional respondents should have been identified as invalid skips.

Derivations

The decision rules employed in the creation of constructed variables have been included, whenever possible, in the codebook under the title "DERIVATIONS." This information is designed to enable researchers to determine whether available constructs are appropriate for their needs. In the 'Hourly Rate of Pay at Current or Last Job 83 *KEY*' example, the derivation describes in detail the items of the interview schedule used to create the variable. If the derivation is too lengthy to include in the codebook, the codeblock will instead refer users to the supplemental documentation item that contains variable creation information.

Frequency distribution

In the case of discrete (categorical) variables, frequency counts are normally shown in the first column to the left of the code categories. In the case of continuous (quantitative) variables, a distribution of the variable is presented using a convenient class interval. The format of these distributions varies.

Questionnaire item

"Questionnaire item" is a generic term identifying the source of data for a given variable. A questionnaire item may be a question, a check item, or an interviewer's reference item appearing within one of the survey instruments. Questionnaire item identifications are located in the far right-hand column of the codebook. The question number, when available, is copied exactly from the questionnaire.

During PAPI interview years, all created variables have a question name of simply "CV." Created variables in CAPI survey years usually include the letters CV in the question name and usually have the word *KEY* in their title.

Valid values range

Below the frequency distribution are the MINIMUM and MAXIMUM fields, which define the range of valid values (the lower and upper limits) for a given variable. "MINIMUM" indicates the smallest recorded value exclusive of nonresponse codes; "MAXIMUM" indicates the largest recorded value. In the 'Hourly Rate of Pay' example, the maximum recorded value is 9815 with two implied decimal places, or $98.15.
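Values stored with implied decimal places must be rescaled before they are interpreted. A minimal Python sketch, using a hypothetical helper name, that converts the stored 9815 to 98.15 while refusing to rescale a negative nonresponse code:

```python
def from_implied_decimals(raw: int, places: int = 2) -> float:
    """Rescale a stored integer that has `places` implied decimal places."""
    if raw < 0:
        # Negative values are nonresponse codes, not amounts.
        raise ValueError(f"{raw} is a nonresponse code")
    return raw / 10 ** places


hourly_max = from_implied_decimals(9815)  # 98.15
```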

Topcoding income and asset values

Confidentiality concerns restrict the release of exact income and asset values. To protect respondent confidentiality, income values exceeding an upper limit are truncated each survey year: any value above the limit is replaced by a set maximum value. Both the upper limits and the set maximum values vary by year. From 1968 through 1971, income values exceeding the upper limit were set to 999999. From 1972 through 1980, they were set to a maximum value of 50000, and in 1982 and 1983 the set maximum value was 50001. Beginning in 1985, income amounts exceeding $100,000 were converted to a set maximum value of 100001.
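The income schedule above can be encoded as a small lookup so that topcoded cases are flagged rather than treated as exact amounts. This Python sketch covers only the years listed in the text and returns `None` for any other year (for example 1981 or 1984); the function names are hypothetical. A value equal to the set maximum should be read as "at or above the limit," and for 1972-1980 an exact report of 50000 cannot be distinguished from a truncated one.

```python
from typing import Optional


def income_set_maximum(year: int) -> Optional[int]:
    """Set maximum value for topcoded income in a survey year, per the stated schedule."""
    if 1968 <= year <= 1971:
        return 999999
    if 1972 <= year <= 1980:
        return 50000
    if year in (1982, 1983):
        return 50001
    if year >= 1985:
        return 100001
    return None  # year not covered by the stated schedule


def is_topcoded_income(value: int, year: int) -> bool:
    """True when a reported income equals that year's set maximum value."""
    cap = income_set_maximum(year)
    return cap is not None and value == cap
```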

From the cohort's inception, asset variables exceeding upper limits were truncated to 999999. Beginning in 1983, assets exceeding one million dollars were converted to a set maximum value of 999997. Starting in 1993, the Census Bureau also topcoded selected asset items if it judged that releasing the exact value might help identify a respondent. This topcoding was conducted on a case-by-case basis, with the mean of the top three values substituted for each respondent who reported such amounts.
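The Census Bureau's case-by-case procedure is not fully specified here. The Python sketch below assumes the simplest reading, that the three largest valid values of an item are each replaced by their mean; this preserves the item's total while hiding the individual amounts. The function name is hypothetical, and the Bureau's actual choice of which items and cases to topcode is not reproduced.

```python
def substitute_top_three_mean(values: list[float]) -> list[float]:
    """Replace the three largest valid values with their mean.

    Assumed reading of the Census Bureau rule; negative nonresponse codes are
    neither ranked nor changed.
    """
    valid = [i for i, v in enumerate(values) if v >= 0]
    if len(valid) < 3:
        return list(values)
    top_three = sorted(valid, key=lambda i: values[i], reverse=True)[:3]
    mean_top = sum(values[i] for i in top_three) / 3
    result = list(values)
    for i in top_three:
        result[i] = mean_top
    return result


assets = [100, 5000, -1, 2000, 9000, 50]  # hypothetical reported amounts
released = substitute_top_three_mean(assets)
```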

Codebook supplements

Variable creation procedures and supplemental coding information are provided within each cohort's Codebook Supplement. There are separate codebook supplements for the Mature Women and Young Women cohorts.

NLS Investigator

Mature and Young Women cohort variables (as well as the variables from the other NLS cohorts) are accessed using NLS Investigator, which is available as a Web application. The main application of NLS Investigator is to access NLS variables for the purposes of identifying, selecting, extracting, and/or running frequencies or cross-tabulations. This interface allows the researcher to connect to a database and perform variable extractions without installing any software on a local computer.

Through a personal online account, a researcher's selected variable tagsets, frequencies, and extracts are available for a specified period of time from any computer location with Web access. A tagset is a collection of specific variables saved by the user for use at a later date. Because there is one central data source for all users, researchers will have the assurance that they are always working with the most up-to-date data, and that any necessary corrections will be immediate and universal.

Need help with NLS Investigator?

  1. Access Mature and Young Women variables by connecting to NLS Investigator.
  2. Get help using NLS Investigator through the Investigator User Guide.
  3. Learn how to perform efficient NLS Investigator searches with the tutorial: Variable Search in the NLS Investigator.

Sample Weights

This section is divided into a description of the procedures used to develop sample weights and a discussion of the practical application of these weights. Before using NLS data in an analysis, the user should consult the practical usage discussion to determine when weighting of data is appropriate. Sample-based weights are designed to reflect the underlying population in the year in which the cohort was initially surveyed. Individual weights are assigned after each interview; when used in tabulations, these weights produce group estimates that are demographically representative of each cohort's base-year population. Sampling weights for each respondent can be found in the corresponding public data release. For the 2003 Young Women release, the cross-sectional weights were revised because some respondents originally coded as "can't locate" were later found to be deceased.

Important information: NLS Custom Weights

  1. Researchers should note that, like the cross-sectional weights in the data file, the longitudinal weights have two implied decimal places. Before using either type of weight, researchers should therefore divide the number by 100 to obtain the number of people each respondent represents.
  2. A custom weighting program is available for the Mature Women and Young Women cohorts. Users can create longitudinal weights across multiple survey rounds either by choosing survey years or by entering a list of respondent IDs.

Base-year sampling weights

Population data derived from the NLS are based on multi-stage ratio estimates. The first step was to assign each sample case a basic weight consisting of the reciprocal of the final probability of selection. This probability reflects the differential sampling by race within each stratum. The base-year weights for all those interviewed were adjusted to account for the overrepresentation of blacks in the sample as well as for persons selected after screening who were not interviewed in the initial survey. This adjustment was made separately for each of:

  • Mature Women. 16 groupings based on the four Census regions (Northeast, North Central, South, and West), race (non-black/black), and urban/rural residence.
  • Young Women. 24 groupings based on the four Census regions (Northeast, North Central, South, and West), race (non-black/black), and three place of residence groupings (urban, rural farm, and rural non-farm).
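The basic weight and the adjustment groupings can be written out directly. In this Python sketch the selection probability passed to `basic_weight` is a made-up illustration, not an NLS value, and the helper name is hypothetical; the cell lists confirm that the stated categories produce 16 Mature Women and 24 Young Women groupings.

```python
from itertools import product

REGIONS = ["Northeast", "North Central", "South", "West"]
RACE = ["non-black", "black"]
MATURE_RESIDENCE = ["urban", "rural"]
YOUNG_RESIDENCE = ["urban", "rural farm", "rural non-farm"]


def basic_weight(selection_probability: float) -> float:
    """Basic weight: the reciprocal of the final probability of selection."""
    if not 0 < selection_probability <= 1:
        raise ValueError("selection probability must be in (0, 1]")
    return 1 / selection_probability


# Every combination of region, race, and residence is one adjustment group.
mature_groups = list(product(REGIONS, RACE, MATURE_RESIDENCE))
young_groups = list(product(REGIONS, RACE, YOUNG_RESIDENCE))

# Illustrative only: a case selected with probability 1/2500 stands for about
# 2,500 people; a group sampled at a higher rate gets a smaller basic weight.
w = basic_weight(1 / 2500)
```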

The first stage of ratio weight adjustment took into account differences, as of the 1960 Census, between the race and residence distribution of the population estimated from the sample primary sampling units (PSUs) and that of the total population in each of the four major regions of the country. Using 1960 Census data, estimated population totals by race and residence for each region were computed by appropriately weighting the Census counts for PSUs in the sample. Ratios were then computed between these estimates (based on sample PSUs) and the actual population totals for the region as shown by the 1960 Census.

In the second stage ratio adjustment, sample proportions were adjusted to independent current estimates of the civilian noninstitutionalized population by age, sex, and race. These estimates were prepared by carrying forward the most recent Census data (1960) to take account of subsequent aging of the population, mortality, and migration between the United States and other countries (Census Bureau 1966). The adjustment was made by race within three Mature Women age groups and five Young Women age groups.
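Both stages are ratio adjustments: within each cell, weights are scaled by the ratio of an external population total to the sample-based estimate for that cell. The Python sketch below shows the generic operation as applied in the second stage, where the denominator is the weighted sample total in each age and race cell. In the first stage the denominator instead came from Census counts weighted over the sample PSUs, which this sketch does not model. The function name, cell labels, and totals are hypothetical.

```python
from collections import defaultdict


def ratio_adjust(weights: list[float], cells: list[str],
                 control_totals: dict[str, float]) -> list[float]:
    """Scale weights so each cell's weighted total equals its control total."""
    cell_sum: dict[str, float] = defaultdict(float)
    for w, cell in zip(weights, cells):
        cell_sum[cell] += w
    # Ratio of the independent estimate to the sample-based estimate, per cell.
    factor = {cell: control_totals[cell] / total for cell, total in cell_sum.items()}
    return [w * factor[cell] for w, cell in zip(weights, cells)]


# Hypothetical cells and population controls.
adjusted = ratio_adjust(
    weights=[100.0, 100.0, 200.0],
    cells=["cell_a", "cell_a", "cell_b"],
    control_totals={"cell_a": 300.0, "cell_b": 100.0},
)  # [150.0, 150.0, 100.0]
```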

Sampling weight nonresponse adjustment

Since the initial interview, reductions in sample size have occurred due to noninterviews. To compensate for these losses, the sampling weights of the individuals who were interviewed have been revised. The Mature and Young Women cohorts are panels to which no new individuals were added after the base year, so all reweighting after the initial survey was calibrated to base-year population parameters. This revision was done in two stages. First, out-of-scope noninterviews in each year were identified by the Census Bureau and eliminated from the sample of noninterviews. This group consisted of individuals who were institutionalized, had died, were members of the armed services, or had moved outside the United States, that is, individuals who were no longer members of the U.S. noninstitutionalized civilian population. (Note: In 2003, an attempt was made to interview some of the institutionalized respondents.)

The second stage in the adjustment acknowledges the possible nonrepresentative characteristics of the in-scope interviews. For each survey year, those who are eligible but not interviewed, as well as those who are interviewed, were distributed into:

  • Mature Women. 24 nonresponse adjustment cells based on race (non-black/black), length of residence in the United States at first interview (nine or fewer years, ten or more years, N/A), and education (N/A, eight or fewer years, nine to eleven years, twelve or more years) reported in 1967.
  • Young Women. 30 nonresponse adjustment cells based on race (non-black/black), length of residence in the United States at first interview (nine or fewer years, ten or more years, N/A) and father's occupation (white collar, service, blue collar, farm, N/A) reported in 1968.

Within each of the cells, the base-year sampling weights of those interviewed were increased by a factor equal to the reciprocal of the reinterview rate (using base-year weights) in that year.
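The within-cell adjustment can be sketched in a few lines of Python. It assumes out-of-scope noninterviews have already been removed, so every case passed in is in scope; the factor for each cell is the sum of base-year weights for all in-scope cases divided by the sum for those interviewed, which is the reciprocal of the weighted reinterview rate. Noninterviewed cases receive a weight of zero. Function and argument names are illustrative.

```python
from collections import defaultdict


def nonresponse_adjust(base_weights: list[float], cells: list[str],
                       interviewed: list[bool]) -> list[float]:
    """Inflate interviewed cases' base-year weights within each adjustment cell.

    All cases passed in are assumed to be in scope. Factor per cell:
    total base weight of in-scope cases / total base weight of interviewed cases.
    """
    in_scope_total: dict[str, float] = defaultdict(float)
    interviewed_total: dict[str, float] = defaultdict(float)
    for w, cell, done in zip(base_weights, cells, interviewed):
        in_scope_total[cell] += w
        if done:
            interviewed_total[cell] += w
    return [
        w * in_scope_total[cell] / interviewed_total[cell] if done else 0.0
        for w, cell, done in zip(base_weights, cells, interviewed)
    ]


adjusted = nonresponse_adjust(
    base_weights=[100.0, 100.0, 200.0, 200.0],
    cells=["cell 1", "cell 1", "cell 2", "cell 2"],
    interviewed=[True, False, True, True],
)  # [200.0, 0.0, 200.0, 200.0]
```

Because each cell's interviewed cases absorb the weight of its noninterviews, the adjusted weights still sum to the base-year total.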

In 1991, NLS staff began investigating the effects of differential nonresponse on sampling weights as then calculated. The original weighting routine was designed to minimize an increase in variance caused by large weights for individuals with certain characteristics. One effect of this original procedure was that certain subsegments of the sample were assigned identical sampling weights. NLS staff adjusted the weights to avoid this problem.

Practical usage

The Mature and Young Women cohorts were based upon stratified, multi-stage random samples with an oversample of blacks. Each case in each interview year was assigned a weight specific to that year. This weight can be interpreted as an estimate of the number of people in the corresponding population that the individual in the sample represents. This section discusses some ramifications of the weights when used for data analysis.

To tabulate characteristics of the sample (i.e., sample means, totals, or proportions) for a single interview year in order to describe the population being represented, it is necessary to weight the observations using the weights provided. For example, to estimate the average hours worked in 1987 by women age 14-24 as of December 31, 1967, researchers would simply use the weighted average of hours worked, where weight is the 1987 sample weight. These weights are approximately correct when used in this way, with item nonresponse possibly generating small errors. Other applications for which users may wish to apply weighting, but for which the application of weights may not produce the intended result, include:

Samples generated by dropping observations with item nonresponses

Users often confine their analysis to subsamples of respondents who provided valid answers to certain questions. In this case, a weighted mean will not represent the entire population, but rather those persons in the population who would have given a valid response to the specified questions. Item nonresponse because of refusals, don't knows, or invalid skips is usually quite small, so the error in the weights is probably also small, and population estimates (i.e., weighted sample means, medians, and proportions) should be reasonably accurate. However, population estimates based on data items with relatively high nonresponse rates, such as family income, may not be representative of the underlying population of the cohort.
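A weighted mean restricted to valid responses can be sketched as follows in Python; the function name and data are hypothetical. Because every observation's weight appears in both the numerator and the denominator, the two implied decimal places in the stored weights cancel, so the weights can be used as stored. As noted above, the result describes only the population that would have given a valid response.

```python
def weighted_mean(values: list[float], weights: list[float]) -> float:
    """Weighted mean over cases with a valid value and a positive weight.

    Negative values are NLS nonresponse codes and are excluded; noninterviews
    (weight 0) contribute nothing.
    """
    pairs = [(v, w) for v, w in zip(values, weights) if v >= 0 and w > 0]
    total_weight = sum(w for _, w in pairs)
    if total_weight == 0:
        raise ValueError("no valid, positively weighted observations")
    return sum(v * w for v, w in pairs) / total_weight


# Hypothetical hours worked (-4 = valid skip, -1 = refusal) and stored weights.
hours = [40, -4, 20, -1]
stored_weights = [150000, 90000, 50000, 120000]
mean_hours = weighted_mean(hours, stored_weights)  # 35.0
```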

Data from multiple waves

Because the weights are specific to a single wave of the study, and because respondents occasionally missed an interview but were contacted in a subsequent wave, a problem similar to item nonresponse arises when the data are used longitudinally. In addition, the weights for a respondent in different years may occasionally be quite dissimilar, leaving the user uncertain about which weight is appropriate. In principle, if a user wished to apply weights to multiple wave data, weights would have to be recomputed based upon the persons for whom complete data are available. If the sample is limited to respondents interviewed in a terminal or end point year, the weight for that year can be used. Users with a more complex sample selection often can obtain reasonably accurate results by using the base-year weights.

Regression analysis

A common question is whether one should use the provided weights to perform weighted least squares when doing regression analysis. Such a course of action may lead to incorrect estimates. If particular groups follow significantly different regression specifications, the preferred method of analysis is to estimate a separate regression for each group or to use dummy (or indicator) variables to specify group membership. If one wishes to compute the population average effect of, for example, education upon earnings, one may simply compute the weighted average of the regression coefficients obtained for each group, using the sum of the weights for the persons in each group as the weights to be applied to the coefficients. While least squares is an estimator that is linear in the dependent variable, it is nonlinear in explanatory variables, so weighting the observations will generate different results than taking the weighted average of the regression coefficients for the groups. The process of stratifying the sample into groups thought to have different regression coefficients and then testing for equality of coefficients across groups using an F-test is described in most statistics texts.

Researchers unsure of the appropriate grouping may wish to consult a statistician or other person knowledgeable about the data set before specifying the regression model. Note that if subgroups have different regression coefficients, a regression on a random sample of the population would be misspecified.
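The group-averaging approach described above can be illustrated in Python for a single explanatory variable, such as years of education. This is a sketch with made-up data and hypothetical function names: each group's slope is fitted by ordinary least squares without weights (an assumption; the text does not say whether within-group fits should be weighted), and the slopes are then averaged using each group's total sample weight. A real analysis would normally use a statistics package and include more regressors.

```python
def ols_slope(x: list[float], y: list[float]) -> float:
    """Slope of an ordinary least-squares fit of y on x with an intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx


def population_average_effect(
        groups: dict[str, tuple[list[float], list[float], list[float]]]) -> float:
    """Average per-group slopes, weighting each by the group's total sample weight."""
    numerator = 0.0
    denominator = 0.0
    for x, y, w in groups.values():
        group_weight = sum(w)
        numerator += group_weight * ols_slope(x, y)
        denominator += group_weight
    return numerator / denominator


# Made-up data: (education in years, hourly pay in dollars, sample weights).
groups = {
    "group A": ([8, 12, 16], [2.0, 4.0, 6.0], [100.0, 100.0, 100.0]),
    "group B": ([8, 12, 16], [1.0, 4.0, 7.0], [40.0, 30.0, 30.0]),
}
effect = population_average_effect(groups)  # (300 * 0.5 + 100 * 0.75) / 400 = 0.5625
```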

Custom weighting program

Every Mature and Young Women survey contains a created variable that is the respondent's cross-sectional weight. Using these weights provides a simple method for users to correct the raw data for the effects of the oversampling of blacks and the initial clustering of respondents at the survey's beginning. However, while each set of weights provides an accurate adjustment for any single year, none of them provides an accurate method of adjusting multiple years' worth of data. Users analyzing more than one year of Mature and Young Women data should use longitudinal weights, which improve a researcher's ability to accurately calculate summary statistics from multiple years of data.

Users can create longitudinal weights for the Mature and Young Women by going to the Custom Weighting page. To create a set of custom weights, users select the survey years corresponding to their research and click the "Download" button. The custom weighting program will generate a set of longitudinal weights and open a download dialog box so that users can save the weights to their computer. The resulting file contains two columns of data, separated by a blank space. The first column is the public identification (ID) number of each respondent; the second column is the weight. If the respondent did not participate in every survey selected, she is given a weight of zero; if she did participate, she is given a positive longitudinal weight.
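Reading the downloaded file is straightforward. The Python sketch below assumes the file has no header line and that each line holds a public ID and a weight separated by whitespace, as described above; it divides each weight by 100 for the two implied decimal places and drops respondents with a weight of zero. The function name and the sample text are hypothetical.

```python
def read_custom_weights(text: str) -> dict[int, float]:
    """Parse custom weighting output into {public ID: people represented}.

    Assumes no header line and two whitespace-separated columns per line.
    Zero weights (not interviewed in every selected year) are dropped, and
    the remaining weights are divided by 100 for the two implied decimals.
    """
    weights: dict[int, float] = {}
    for line in text.splitlines():
        if not line.strip():
            continue  # ignore blank lines
        public_id, raw_weight = line.split()
        weight = float(raw_weight)
        if weight > 0:
            weights[int(public_id)] = weight / 100
    return weights


sample = "1 0\n2 123456\n5 98700\n"
longitudinal = read_custom_weights(sample)  # {2: 1234.56, 5: 987.0}
```

In practice the text would come from reading the saved file, for example with `open(path).read()`.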

The custom weighting program is an Internet version of the program used to create the cross-sectional weights for the original cohorts since the 1990s. The primary difference between the cross-sectional and longitudinal weighting programs is in how the list of respondents is created. In the cross-sectional case the weighting program is given a list of all people who participated in a particular survey round. In the longitudinal case the weighting program creates a "dummy" survey round where the user specifies who participated and who did not. This "dummy" round is based on the set of surveys selected. It then calculates which respondents participated in every survey round chosen by the researcher and uses that list to generate weights.
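The participation step amounts to a set intersection across the selected rounds. In this Python sketch, `rounds` is a hypothetical mapping from survey year to the set of respondent IDs interviewed that year; the function (also hypothetical) returns the IDs treated as participants in the "dummy" round.

```python
def participated_in_all(rounds: dict[int, set[int]], selected_years: list[int]) -> set[int]:
    """IDs of respondents interviewed in every selected survey year."""
    if not selected_years:
        return set()
    return set.intersection(*(rounds[year] for year in selected_years))


# Hypothetical interview rosters keyed by survey year.
rounds = {1991: {1, 2, 3, 4}, 1993: {1, 2, 4}, 1995: {2, 4, 5}}
dummy_round = participated_in_all(rounds, [1991, 1993, 1995])  # {2, 4}
```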

For the original cohorts, weighting is derived from the base-year weights in a two-step process. First, all out-of-scope noninterviews (respondents who have died, been institutionalized, or moved outside the U.S.) are eliminated from the pool of respondents classified as noninterviews. Second, all in-scope respondents, whether or not they were interviewed, are distributed into 24 cells based on race (black/non-black), length of residence in the United States at the time of the first interview (nine or fewer years, ten or more years, or unknown), and education (eight or fewer years, nine to eleven years, twelve or more years, or unknown).

These cells are then examined to see whether any has too few respondents; a cell with too few respondents is collapsed with an adjoining cell. Once the optimal number of cells is created, the weights of the respondents in each cell are totaled, and the total for all in-scope respondents is divided by the total for those interviewed to create an adjustment factor. This adjustment factor is then multiplied by each respondent's base-year weight, which results in the custom longitudinal weight for that respondent.
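The text does not say how small a cell must be before it is collapsed or which neighbor counts as "adjoining." The Python sketch below assumes a hypothetical minimum size and a fixed ordering of cells: an undersized group keeps absorbing the next cell in order until it reaches the minimum, and a trailing undersized group joins the group before it. Each resulting group would then share one adjustment factor. The threshold, function name, and cell labels are all illustrative.

```python
MIN_CELL_SIZE = 20  # hypothetical threshold, not an NLS value


def collapse_cells(cell_counts: list[tuple[str, int]],
                   min_size: int = MIN_CELL_SIZE) -> list[list[str]]:
    """Group ordered (label, count) cells so every group reaches min_size.

    Cells are taken in the given order; an undersized group absorbs the next
    cell until it is large enough. A final undersized group is merged into
    the group before it.
    """
    groups: list[list[str]] = []
    sizes: list[int] = []
    for label, count in cell_counts:
        if groups and sizes[-1] < min_size:
            groups[-1].append(label)
            sizes[-1] += count
        else:
            groups.append([label])
            sizes.append(count)
    if len(groups) > 1 and sizes[-1] < min_size:
        last_group = groups.pop()
        last_size = sizes.pop()
        groups[-1].extend(last_group)
        sizes[-1] += last_size
    return groups


cells = [("a", 30), ("b", 5), ("c", 10), ("d", 40), ("e", 3)]
merged = collapse_cells(cells, min_size=20)  # [["a"], ["b", "c", "d", "e"]]
```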

Reference

Census Bureau. Current Population Reports. Series P-25, No. 352, November 18, 1966.
