
Custom Weighting Program Documentation

Overview

Every NLS data release includes a set of cross-sectional weights. These weights allow users to adjust the raw data for over-sampling, clustering, and differential base year participation. While these weights accurately adjust data for individual years, they do not provide an accurate method for adjusting data across multiple years.

The Custom Weighting Program allows researchers to create a set of customized longitudinal weights, which improves their ability to accurately calculate summary statistics across multiple years of data. In addition to creating customized weights by year, the program offers the option to retrieve weights for a specific set of respondent IDs.

After each field period, a set of round-specific survey weights is produced. The custom weighting program creates a temporary list of individuals who meet the criteria selected on each cohort's custom weighting page. This list is then weighted as if those individuals had been the participants in a survey round, and the resulting weights are the output of the custom weighting program.
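The selection step described above can be pictured with a short Python sketch. It is only an illustration, not the NLS implementation: the participation records, the `select_respondents` function, and the choice between "all selected years" and "any selected year" are hypothetical stand-ins for whatever criteria a cohort's custom weighting page actually offers.

```python
from typing import Dict, List, Set

# Hypothetical participation records: respondent ID -> survey years interviewed.
# Real NLS data would come from the cohort's interview-status variables.
participation: Dict[int, Set[int]] = {
    1: {1979, 1980, 1981},
    2: {1979, 1981},
    3: {1979, 1980},
    4: {1979, 1980, 1981},
}

def select_respondents(records: Dict[int, Set[int]], years: List[int],
                       require_all: bool = True) -> List[int]:
    """Return sorted IDs of respondents who took part in all (or any) selected years."""
    wanted = set(years)
    selected = []
    for rid, interviewed in records.items():
        if require_all:
            keep = wanted <= interviewed          # every selected year was completed
        else:
            keep = bool(wanted & interviewed)     # at least one selected year
        if keep:
            selected.append(rid)
    return sorted(selected)

# Temporary list: respondents interviewed in both 1980 and 1981.
print(select_respondents(participation, [1980, 1981]))  # [1, 4]
```

The resulting list plays the role of a round's participants; everyone else is treated like a nonparticipant when the weights are computed.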

Details

The weight calculation program is based on the existing weighting algorithms and data used to create the round-specific weights. Conceptually, weighting a specific list of individuals is identical to calculating weights for a non-base-year survey round: in such rounds some individuals participate and others do not, and the custom list simply takes the place of the round's participants.

The text below uses the NLSY79 as an example to describe how the custom weighting program works. Additional information about the construction of the base-year sample weights for each NLS cohort can be found in that cohort's section of the NLS website.

Adjustments to the base-year sampling weights for each cohort are then made in the fashion described below for the NLSY79. If additional information is needed about a particular cohort, please contact NLS User Services.

The NLSY79 technical sampling report describes the six-step process used for computing the survey's weights.

  1. Computation of a base weight, reflecting the case's selection probability for the screening sample
  2. Adjustment for nonresponse to the screener
  3. Adjustment of the weights resulting from the second step to reflect any subsampling following the screener, such as for race/ethnicity
  4. Development of a combination weight to allow the black and Hispanic cases from the cross-sectional sample to be merged with those from the supplemental sample
  5. Adjustment of the weights for nonresponse to the main interview(s)
  6. Post-stratification of the nonresponse-adjusted weights

Although these steps may seem complicated, weighting a particular round or a custom sample is relatively simple because the program performs only step 6. Steps 1 through 5 were completed when the base-year (1979) survey weights were calculated.

To carry out the sixth step, each NLSY79 respondent is given two weights: a target weight and a preliminary weight. The sum of the target weights for all people in a particular group (for instance, Hispanic males age 20 in 1978) is survey staff's best estimate of the size of that group in 1978. The preliminary weight is more complicated, but it is essentially a number that reflects all the adjustments made in steps 1 through 5.

To create a custom weight, the NLSY79 respondents are divided into fine groups that partition people by race, sample group, age, military service, and other factors. The goal is to form groups of at least 20 respondents for the civilian sample and at least 15 for the military sample. Much of the custom weighting program is devoted to automating the creation of these small groups, which are also called cells.
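One way to picture the cell-building step is a greedy merge of neighboring cells until each reaches the minimum size. The Python sketch below is purely illustrative: the documentation does not describe the program's actual collapsing rules, so `collapse_cells`, the ordering of cells, and the handling of a leftover small cell are all assumptions. Only the thresholds of 20 (civilian) and 15 (military) come from the text above.

```python
from typing import List, Tuple

# Minimum cell sizes stated in the documentation.
MIN_CIVILIAN = 20
MIN_MILITARY = 15

def collapse_cells(cells: List[Tuple[str, int]], min_size: int) -> List[Tuple[str, int]]:
    """Hypothetical greedy merge of adjacent cells until each reaches min_size.

    cells: (label, respondent count) pairs, assumed to be ordered so that
    neighbors are similar (e.g. adjacent ages within one race/sex group).
    A trailing undersized remainder is folded into the previous merged cell;
    if the whole group is below min_size, one undersized cell is returned.
    """
    merged: List[Tuple[str, int]] = []
    label_parts: List[str] = []
    count = 0
    for label, n in cells:
        label_parts.append(label)
        count += n
        if count >= min_size:
            merged.append(("+".join(label_parts), count))
            label_parts, count = [], 0
    if label_parts:  # leftover cells that never reached the minimum
        if merged:
            prev_label, prev_count = merged.pop()
            merged.append(("+".join([prev_label] + label_parts), prev_count + count))
        else:
            merged.append(("+".join(label_parts), count))
    return merged

cells = [("age14", 12), ("age15", 9), ("age16", 25), ("age17", 7), ("age18", 6)]
print(collapse_cells(cells, MIN_CIVILIAN))
# [('age14+age15', 21), ('age16+age17+age18', 38)]
```

Merging only neighbors keeps each combined cell as homogeneous as possible, which is the usual reason for ordering cells by a characteristic such as age before collapsing.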

Once the optimal number of cells is created, the target weights and the preliminary weights of the respondents in each cell are totaled separately. The target-weight total is then divided by the preliminary-weight total to give the cell's adjustment factor. This factor is multiplied by each respondent's preliminary weight from the base year, and the result is that respondent's custom weight.
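The ratio adjustment described above can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the program's code: the `Respondent` record and `custom_weights` function are hypothetical, and the sketch assumes the target total covers every base-year respondent in the cell while the preliminary total covers only those on the custom list, so that the listed respondents' weights add up to the cell's target total. Respondents not on the list receive 0.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple

class Respondent(NamedTuple):
    rid: int          # respondent ID
    cell: str         # post-stratification cell label
    target: float     # target weight
    prelim: float     # preliminary (base-year adjusted) weight
    selected: bool    # True if the respondent is on the custom list

def custom_weights(people: List[Respondent]) -> Dict[int, float]:
    """Ratio-adjust preliminary weights within each cell; unselected respondents get 0."""
    target_total: Dict[str, float] = defaultdict(float)
    prelim_total: Dict[str, float] = defaultdict(float)
    for p in people:
        target_total[p.cell] += p.target          # everyone in the cell (assumption)
        if p.selected:
            prelim_total[p.cell] += p.prelim      # only respondents on the list
    weights: Dict[int, float] = {}
    for p in people:
        if p.selected:
            factor = target_total[p.cell] / prelim_total[p.cell]  # adjustment factor
            weights[p.rid] = p.prelim * factor
        else:
            weights[p.rid] = 0.0
    return weights

people = [
    Respondent(1, "A", 100.0, 90.0, True),
    Respondent(2, "A", 100.0, 110.0, True),
    Respondent(3, "A", 100.0, 100.0, False),
    Respondent(4, "B", 50.0, 40.0, True),
]
w = custom_weights(people)
print(w)  # {1: 135.0, 2: 165.0, 3: 0.0, 4: 50.0}
```

In cell A the factor is 300 / 200 = 1.5, so the two listed respondents carry the whole cell's target total of 300.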

Running the custom weighting program creates an output file with two space-separated variables: the respondent's ID and the custom weight. Like the cross-sectional weights, all custom longitudinal weights have two implied decimal places. Hence, to find how many people a respondent represents, divide the weight by 100. A value of zero (0) means the respondent is not included, that is, is out of the survey.
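A minimal Python sketch of reading such an output file is shown below. The function name `read_custom_weights` and the sample contents are hypothetical; the sketch parses text from a string, and in practice you would pass it the contents of the downloaded file. It applies the two implied decimal places and drops respondents whose weight is zero.

```python
from typing import Dict

def read_custom_weights(text: str) -> Dict[int, float]:
    """Parse 'ID WEIGHT' lines and convert the implied two decimals into real weights."""
    weights: Dict[int, float] = {}
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        rid_str, weight_str = line.split()
        weights[int(rid_str)] = float(weight_str) / 100  # implied 2 decimal places
    return weights

sample = "1 123456\n2 0\n3 98765\n"  # hypothetical file contents
weights = read_custom_weights(sample)
included = {rid: w for rid, w in weights.items() if w > 0}  # 0 = not included
print(included)  # {1: 1234.56, 3: 987.65}
```

Here respondent 1 represents about 1,234.56 people, and respondent 2 is excluded because its weight is zero.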