Sample Design and Screening Process

Sample design
Screening process
Sampling process

Sample design

Each of the original NLS samples was designed to represent the civilian noninstitutionalized population of the United States at the time of the initial survey. The Older Men cohort includes individuals who were ages 45-59 as of March 31, 1966, and the Young Men cohort consists of respondents ages 14-24 as of the same date.

Each cohort is represented by a multi-stage probability sample drawn by the Bureau of the Census from 1,900 primary sampling units (PSUs) that had originally been selected from the nation's counties and cities for the experimental Monthly Labor Survey, conducted between early 1964 and late 1966. A primary sampling unit consists of Standard Metropolitan Statistical Areas (SMSAs), counties (or parishes in some states), parts of counties (parishes), and independent cities. A total of 235 sample areas comprising 485 counties and independent cities were chosen to represent every state and the District of Columbia.

From the sample areas, 235 strata were created, each consisting of one or more PSUs that were relatively homogeneous in their socioeconomic characteristics. Within each stratum, a single PSU was selected to represent the stratum. Finally, within each PSU, a probability sample of housing units was selected to represent the civilian noninstitutionalized population. Because the addresses for the sample frame came from the 1960 Census, respondents are covered by Title 13 confidentiality restrictions. Variables linked to geographic residence, including county and state, are available for use at Census Data Centers.
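The step of selecting one PSU per stratum can be sketched as follows. This is a minimal illustration, not the Census Bureau's actual procedure: the text above does not state the selection probabilities, so equal-probability selection is assumed here, and the function name, stratum labels, and PSU labels are all hypothetical.

```python
import random

def select_one_psu_per_stratum(strata, seed=None):
    """Pick a single PSU from each stratum.

    `strata` maps a stratum label to a list of PSU labels.
    ASSUMPTION: equal-probability selection within a stratum; the
    source does not say how the selection probabilities were set.
    """
    rng = random.Random(seed)
    return {stratum: rng.choice(psus) for stratum, psus in strata.items()}

# Hypothetical toy frame with three strata (the real design had 235).
toy_strata = {
    "stratum_A": ["psu_1", "psu_2", "psu_3"],
    "stratum_B": ["psu_4"],
    "stratum_C": ["psu_5", "psu_6"],
}
selected = select_one_psu_per_stratum(toy_strata, seed=0)
```

Each stratum contributes exactly one PSU to the result, so the number of selected PSUs always equals the number of strata.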

Restricted-use data

Information about access to restricted-use geographic and school survey data is available on the Accessing Data page.

Screening process

The initial sample of about 42,000 housing units for all four NLS Original Cohorts was selected, and screening interviews took place in March and April 1966. Of this number, about 7,500 units were found to be vacant, occupied by persons whose usual residence was elsewhere, changed from residential use, or demolished. On the other hand, about 900 additional units were found to have been created within existing living space or converted from what had been nonresidential space. A total of 35,360 housing units were available for interview, from which usable information was collected for 34,662 households, for a completion rate of 98.0 percent.
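The screening arithmetic can be checked with a short calculation. The sketch below uses only the figures quoted in the paragraph above; the variable names are illustrative. Because the first three counts are rounded ("about"), their net total only approximates the exact count of 35,360 available units.

```python
# Rounded counts from the screening description ("about ...").
initial_units = 42_000   # housing units initially selected
out_of_scope = 7_500     # vacant, usual residence elsewhere, converted, demolished
added_units = 900        # newly created or converted to residential use

# Net of the rounded figures; close to, but not equal to, the exact 35,360.
approx_available = initial_units - out_of_scope + added_units

# Exact counts reported for the screening interviews.
available = 35_360
completed = 34_662
completion_rate = round(100 * completed / available, 1)

print(approx_available, completion_rate)  # 35400 98.0
```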

The original plan called for using the initial screening to select all four Original Cohorts. However, after the sample members for the Older Men were chosen, the sample was rescreened in September 1966 before the initial interview of the Young Men. This decision was made because a seven-month delay between the screening and the first interview seemed excessive given the mobility of young men in their late teens and early twenties. To increase efficiency, the sample for the rescreening was stratified by the presence or absence of a 14- to 24-year-old male in the household. The probability was high that a household that contained a 14- to 24-year-old in March would also have such a member in September. However, to ensure that the sample also represented persons who had moved into sample households in the intervening period, a sample of addresses that previously had no 14- to 24-year-old males was also included in the screening operation. Since a telephone number had been recorded for most households at the time of the initial interview, every attempt was made to complete the short screening interview by telephone. The sample of households obtained through rescreening for young men was subsequently used to obtain the two samples of women ages 30-44, the Mature Women, and ages 14-24, the Young Women (Shea, Roderick, Zeller and Kohen 1971).

Important information: Screening process

During the screening process, a large number of multiple-respondent households were designated for interview; more than half of the respondents in the Mature Women, Young Women, and Young Men cohorts and one-third of the respondents in the Older Men cohort originated from multiple-respondent households (i.e., households with at least one other respondent). For more information on multiple-respondent households and on the types of relationships that existed between respondent pairs (e.g., spouse, sibling), see the Household Composition section.

Sampling process

The sample was designed to provide approximately 5,000 respondents (about 1,500 blacks and 3,500 non-blacks) for each of the men's cohorts. The men were sampled differentially within four strata: whites in predominantly white enumeration districts (EDs), blacks in predominantly black EDs, whites in predominantly black EDs, and blacks in predominantly white EDs. An enumeration district is a geographical area considered to be an appropriate size for an interviewer to complete all the necessary interviews within a prescribed time frame. The sampling rate of households in predominantly black EDs was between three and four times that for households in predominantly white EDs in order to meet the survey requirement of providing separate reliable statistics for black respondents; the sample design called for oversampling of blacks at twice the expected rate in the total population.
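One consequence of differential sampling is that households sampled at a higher rate carry proportionally less weight in population estimates. The sketch below illustrates that inverse relationship only; it is not the NLS weighting procedure. The function name and labels are hypothetical, and the default ratio of 3.5 is an illustrative value chosen from within the "three to four times" range given above, not a figure from the survey documentation.

```python
def relative_base_weight(ed_type, rate_ratio=3.5):
    """Relative design weight implied by differential sampling rates.

    Households in predominantly white EDs are the reference (weight 1.0).
    Households in predominantly black EDs were sampled `rate_ratio` times
    as often, so each one represents 1 / rate_ratio as many households.
    ASSUMPTION: rate_ratio=3.5 is illustrative; the source gives only a
    range of three to four.
    """
    if rate_ratio <= 0:
        raise ValueError("rate_ratio must be positive")
    if ed_type == "predominantly_white":
        return 1.0
    if ed_type == "predominantly_black":
        return 1.0 / rate_ratio
    raise ValueError(f"unknown ED type: {ed_type!r}")

# Relative weights at the two ends of the stated range.
weights_at_bounds = [relative_base_weight("predominantly_black", r) for r in (3.0, 4.0)]
```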

Following the initial household interview and screening operation, 5,518 men ages 45-59 as of March 31, 1966, were designated to be interviewed. After rescreening, 5,713 young men ages 14-24 as of March 31, 1966, were designated for interview. Initial interviews with both of the men's cohorts occurred in 1966. Among the individuals designated for interview, 5,020 (91.0 percent) of the Older Men and 5,225 (91.5 percent) of the Young Men were interviewed in 1966.
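The initial response rates follow directly from the designated and interviewed counts. The short check below uses only the figures in the paragraph above; the dictionary names are illustrative.

```python
# Counts of men designated for interview and interviewed in 1966.
designated = {"Older Men": 5_518, "Young Men": 5_713}
interviewed = {"Older Men": 5_020, "Young Men": 5_225}

# Response rate in percent, rounded to one decimal place.
rates = {
    cohort: round(100 * interviewed[cohort] / designated[cohort], 1)
    for cohort in designated
}

print(rates)  # {'Older Men': 91.0, 'Young Men': 91.5}
```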

Important information: Sampling process

Initially, 5,027 Older Men respondents were interviewed. When the data were reviewed, it was discovered that the data file contained 5,034 records and that 7 men each had a duplicate record. These men were dropped from the sample. Because of technical considerations related to the use of data tapes, survey staff did not remove their records from the data set but instead assigned a value of "not available" (-128 or -999) to all variables on all 14 records. Therefore, although the cohort contains 5,020 respondents, the data include 5,034 observations, 14 of which are dropped cases. The dropped cases have the following identification numbers (R00001.): 693, 694, 809, 810, 903, 904, 1146, 1147, 1237, 1238, 3436, 3437, 5010, and 5011. For every variable except R00001., these cases have a value of "not available."
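Analysts working with the Older Men file will usually want to exclude the 14 dropped records before computing statistics. The following is a minimal sketch assuming the data have been loaded as a list of dictionaries; the `id` key and the function name are hypothetical stand-ins for however the identification variable appears in a given extract.

```python
# Identification numbers of the 14 dropped records listed above.
DROPPED_IDS = frozenset({
    693, 694, 809, 810, 903, 904, 1146, 1147,
    1237, 1238, 3436, 3437, 5010, 5011,
})

def exclude_dropped_cases(records, id_key="id"):
    """Return only the records whose identification number was not dropped.

    ASSUMPTION: each record is a dict and `id_key` names the field that
    holds the identification number; adjust to match the actual extract.
    """
    return [rec for rec in records if rec[id_key] not in DROPPED_IDS]

# Synthetic stand-in for the 5,034-record file (IDs here are made up
# as 1..5034 purely for illustration).
synthetic_file = [{"id": i} for i in range(1, 5_035)]
valid_cases = exclude_dropped_cases(synthetic_file)
```

Filtering on the identification number is more direct than testing each variable for -128 or -999, since a "not available" code alone does not distinguish a dropped case from one with a single unavailable value.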