Types of Variables

Four types of variables are present in the Older Men and Young Men data files. A variable's type affects both its title (the variable description that names it) and its physical placement within the codebook. Types of variables include:

  1. Direct raw responses from a questionnaire or other survey instrument.
  2. Edited variables constructed from raw data according to consistent and detailed sets of procedures (e.g., occupational codings, *KEY* variables, etc.).
  3. Constructed variables based on responses to more than one data item either cross-sectionally or longitudinally and edited for consistency where necessary (e.g., highest grade completed). Note: In general, the NLS does not impute missing values or perform internal consistency checks across waves. Data quality checks most often occur in the process of constructing cumulative and current status variables.
  4. Variables provided by the Census Bureau or another outside organization based on sources not directly available to the user (e.g., characteristics of respondents' geographical areas).

This section describes the organization of variables within the data files and explains how to use reference numbers and variable titles while navigating the data set.

Reference Numbers

Every variable within the main NLS data set has been assigned an identifying number that determines its relative position within the data file and documentation system. Persons contacting NLS User Services should be prepared to identify the reference number(s) of the variable(s) that their question or problem concerns.

Reference numbers, once assigned, remain constant through subsequent revisions of the files. Reference numbers are assigned sequentially, with variables from the first survey year having lower reference numbers than variables specific to the second year, and so forth. Occasionally, variables are created some time after the year in which the data were collected. These variables are frequently given a reference number that reflects the year in which the data were gathered rather than the year in which the variable was constructed. Table 1 lists reference numbers for each survey year since 1966 for the Older Men and Young Men.

Table 1. Reference Numbers by Survey Year

Older Men                                                Young Men
Survey Year                         Reference Numbers    Survey Year  Reference Numbers
1966                                R00001.-R00585.      1966         R00001.-R00633.
1967                                R00635.-R01075.      1967         R00635.-R01149.
1968                                R01100.-R01147.      1968         R01150.-R01734.
1969                                R01155.-R01626.      1969         R01736.-R02312.01
1971                                R01629.-R02540.      1970         R02315.-R03018.
1973                                R02541.-R02688.75    1971         R03021.-R03914.
1975                                R02689.-R02850.25    1973         R03920.-R04115.
1976                                R02857.-R03714.      1975         R04126.-R04357.
1978                                R03726.-R04059.      1976         R04375.01-R05456.50
1980                                R04064.-R04462.      1978         R05468.10-R05918.
1983                                R05485.-R05994.      1980         R05955.-R06818.
1990 (Sample Person Questionnaire)  R06001.-R07098.      1981         R06820.-R08118.
1990 (Widow Questionnaire)          R07101.-R07871.
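
The ranges in Table 1 can be used to look up the survey year to which a reference number belongs. The following Python sketch is illustrative only and is not part of the NLS documentation system. It assumes that the left-hand pair of columns in Table 1 refers to the Older Men, and that the digits after the "R" (including any decimal portion such as ".75") sort numerically. The names OLDER_MEN_RANGES, refnum_key, and survey_for are hypothetical.

```python
from decimal import Decimal

# Older Men ranges transcribed from Table 1 as (survey, first, last).
# Assumption: the left-hand column pair of Table 1 is the Older Men cohort.
OLDER_MEN_RANGES = [
    ("1966", "R00001.", "R00585."),
    ("1967", "R00635.", "R01075."),
    ("1968", "R01100.", "R01147."),
    ("1969", "R01155.", "R01626."),
    ("1971", "R01629.", "R02540."),
    ("1973", "R02541.", "R02688.75"),
    ("1975", "R02689.", "R02850.25"),
    ("1976", "R02857.", "R03714."),
    ("1978", "R03726.", "R04059."),
    ("1980", "R04064.", "R04462."),
    ("1983", "R05485.", "R05994."),
    ("1990 (Sample Person Questionnaire)", "R06001.", "R07098."),
    ("1990 (Widow Questionnaire)", "R07101.", "R07871."),
]


def refnum_key(refnum: str) -> Decimal:
    """Convert a reference number such as 'R02688.75' to a sortable Decimal."""
    text = refnum.strip().upper()
    if not text.startswith("R"):
        raise ValueError(f"not a reference number: {refnum!r}")
    # Drop the leading 'R' and any trailing period ('R00585.' -> '00585').
    return Decimal(text[1:].rstrip(".") or "0")


def survey_for(refnum: str, ranges=OLDER_MEN_RANGES):
    """Return the survey whose range contains refnum, or None if it falls in a gap."""
    key = refnum_key(refnum)
    for survey, first, last in ranges:
        if refnum_key(first) <= key <= refnum_key(last):
            return survey
    return None


print(survey_for("R02688.75"))
print(survey_for("R00600."))
```

A result of None means only that the number lies between the listed ranges. Because created variables are frequently numbered according to the year in which the data were gathered, the result identifies the year of data collection rather than the year in which a variable was constructed.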

Variable Titles

Every variable within NLS main file data sets has been assigned a summary title that serves as the verbal representation of that variable throughout the hard copy and electronic documentation system. Variable titles were assigned by CHRR archivists who endeavored, within the limitations described below, to capture the core content of each variable and to incorporate within the title: (1) key words that facilitate easy identification of comparable variables; (2) universe identifiers that specify the subset of respondents for which each variable is relevant; and (3) for some variables, reference periods that indicate the period of time (e.g., survey year or calendar year) to which the data refer.

User Notes

In the 1990 Older Men survey, the original respondent is referred to as the sample person, to distinguish between the original male respondent and the widows who also responded to the survey. In the 1990 variable titles, this was originally abbreviated as "SP." However, this caused confusion, because in other NLS data sets and in common practice, SP is an abbreviation for spouse. For the final release of the Older Men data, survey staff changed the variable titles so that, instead of SP, they now use "R" for "respondent." Users should bear in mind that this refers to the original respondent and not necessarily to the person who actually answered the question.

Universe Identifiers: If two ostensibly identical variables differ only in that they refer to different universes, the variable title will include a reference to the applicable universe.

Example 1: Universe identifiers are particularly important in the 1990 survey of Older Men. In this survey, "R" is used to indicate the sample person (the original respondent) and "W" indicates that the widow answered the question. All widow questions are marked as such; if there is no identifier, then the question was addressed to the sample person.

'Year Started Working at Current or Last Job 90' contains the sample person's report about the start date of his most recent job.

'Year Started Working at Last Job R Held, 90 (W)' contains the widow's report about the start date of the sample person's last job before his death.

'Year Started Working at Current or Last Job 90 (W)' contains the widow's report about the start date of her most recent job.
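
As a rough illustration of the convention in Example 1, the Python sketch below classifies a 1990 Older Men variable title by who answered the question. It is an assumption-laden aid, not an NLS tool: the function name answered_by is hypothetical, and the pattern assumes that the "(W)" marker always appears at the end of the title, as it does in the titles quoted above.

```python
import re

# Assumption: widow items carry a trailing "(W)"; unmarked titles are
# sample person items, as described for the 1990 Older Men survey.
_WIDOW_MARKER = re.compile(r"\(W\)\s*$")


def answered_by(title: str) -> str:
    """Return 'widow' or 'sample person' for a 1990 Older Men variable title."""
    return "widow" if _WIDOW_MARKER.search(title) else "sample person"


for title in (
    "Year Started Working at Current or Last Job 90",
    "Year Started Working at Last Job R Held, 90 (W)",
    "Year Started Working at Current or Last Job 90 (W)",
):
    print(answered_by(title), "|", title)
```

The marker identifies who answered, not whom the answer is about. The second and third titles are both widow reports, but the first of these describes the sample person's job and the second describes the widow's own job.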

Reference Periods: Variable descriptions may include a phrase indicating the time period to which these data refer. The following general conventions apply:

Survey Year: When the variable title includes either the phrase "XX INT" (e.g., 81 INT) or a year (e.g., 76) that is not preceded by the preposition "IN," the year indicates the survey year in which the variable was measured, not necessarily the year to which it applies.

Example 2: 'Move to Current Residence - Year of (Last) Move, 81' (Young Men) refers to a residential move described during the 1981 interview.

Example 3: '# of Weeks Worked in Past Year, 76' (Older Men) refers to the weeks worked in the 12 month period preceding the 1976 survey.

Calendar Year: When a date follows a verbal description of a variable and is part of the prepositional phrase "in XX," the date identifies the calendar year for which the relevant information was collected.

Example 4: 'Household Record - Family Member # 2: Occupation in 66 (Age 14+) 67' (Young Men) reports the occupation of the family member during calendar year 1966 as reported during the 1967 interview.

Example 5: 'Income from Social Security in 70 - R' (Older Men) refers to payments the respondent received in calendar year 1970 and reported during the 1971 survey.
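
The two reference period conventions can be approximated with pattern matching. The Python sketch below is a heuristic written for the titles in Examples 2 through 5 and should not be read as the archivists' rules. The function name reference_periods is hypothetical, and the sketch assumes that a survey year, when present, is the last two-digit number in the title, optionally followed by "INT" or a universe marker such as "(W)".

```python
import re

# "in XX" marks a calendar year.
_CALENDAR = re.compile(r"\bin\s+(\d{2})\b", re.IGNORECASE)
# Assumption: a survey year sits at the end of the title, optionally
# followed by "INT" or a one-letter universe marker such as "(W)".
_SURVEY = re.compile(r"\b(\d{2})(?:\s+INT)?(?:\s*\([A-Z]\))?\s*$")


def reference_periods(title: str) -> dict:
    """Extract two-digit calendar and survey years from a variable title."""
    calendar = list(_CALENDAR.finditer(title))
    survey = _SURVEY.search(title)
    # A trailing year that belongs to an "in XX" phrase is a calendar year.
    if survey and any(m.start(1) == survey.start(1) for m in calendar):
        survey = None
    return {
        "calendar_year": calendar[0].group(1) if calendar else None,
        "survey_year": survey.group(1) if survey else None,
    }


for title in (
    "Move to Current Residence - Year of (Last) Move, 81",
    "# of Weeks Worked in Past Year, 76",
    "Household Record - Family Member # 2: Occupation in 66 (Age 14+) 67",
    "Income from Social Security in 70 - R",
):
    print(reference_periods(title), "|", title)
```

For the title in Example 5 the sketch reports a calendar year but no survey year, because the title itself does not state that the information was collected in 1971. In such cases the survey year has to come from the reference number or the questionnaire.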

User Notes

Searches for NLS variables are essentially searches for variable descriptions or titles. Electronic searches of NLS variables via NLS Investigator ultimately produce listings of variables by their reference number and variable description or title.

Flexibility in variable title assignment for raw data items is restricted by (1) the actual wording of the question as it appears within the survey instrument; (2) precedent, i.e., how that type of variable has been titled in previous survey years; and (3) in early years, a shorter allowable length for variable titles. An attempt is also made to include key phrases in variable titles so that large groups of variables with similar or related subject matter can be easily identified.

Users should not assume that two variables with the same or similar titles necessarily have the same (1) universe of respondents, (2) coding categories, or (3) reference period. Although the universe identifier and reference period conventions discussed above have been applied, users are urged to consult the questionnaires for the skip patterns and exact time periods that apply to a given variable and to take the relevant fielding period(s) into account.

Variables with similar content (e.g., information on respondents' labor force status) may have completely different titles, depending on the type of variable (raw versus created).

Example 6: 'Employment Status Recode' (ESR) is the created or reconstructed version of the 'Activity Most of Survey Week' raw variable. The 'Activity' variable is derived from the first item of the full series of questions used by the Department of Labor (DOL) to obtain employment status; the title reflects questionnaire content. ESR, on the other hand, reflects the procedure used to recode the 'Activity' variable. This produces a constructed variable for all NLS respondents based upon responses to the 'Activity' question and all other questions used by the DOL to obtain employment status. These other questions serve to qualify and refine employment status beyond the answer to the initial 'Activity' question.

Finally, different archivists assigned variable descriptions to data from the NLS cohorts over a period of three decades. While every effort has been made to maintain consistency, users may find some differences in variable titles. Two primary sources of variation exist in Original Cohort variable title assignment. The first is systematic: questions with identical wording across the four Original Cohorts may have slightly different variable titles, because the rule was to give title consistency within a cohort the highest priority. The second is attributable to spacing or punctuation errors. The sorting process that produces variable title listings usually places these variables near, if not next to, the series of interest.
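
Users who build their own title listings may wish to compare titles in a way that ignores spacing and punctuation differences. The Python sketch below shows one possible approach under the assumption that case, spacing, and punctuation carry no meaning for the comparison. The function name title_key is hypothetical, and the approach is not part of any NLS search software.

```python
import re


def title_key(title: str) -> str:
    """Build a comparison key that ignores case, spacing, and punctuation."""
    # Replace each run of characters other than letters, digits,
    # underscores, and '#' with a single space, then collapse whitespace.
    text = re.sub(r"[^\w#]+", " ", title)
    return " ".join(text.upper().split())


a = "Income from Social Security in 70 - R"
b = "Income  from Social Security in 70-R"
print(title_key(a) == title_key(b))
```

A key of this kind only groups titles that differ in form. It does not establish that the variables share a universe, coding categories, or reference period, and it will also merge titles that are distinguished only by a punctuated marker.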

How Mode of Interview Affects Question Documentation

There are important differences between the content of telephone and personal interviews. In the late 1960s and early 1970s, most of the interviews were conducted in person, usually at the respondent's home. There was one attempt at a mail survey in 1968 for the Older Men and the Mature Women; however, the low response rate led to dropping that type of contact. After the first five years, the decision was made to conduct a major survey every five years and two telephone surveys during the five-year span so that problems of recall could be avoided and contact could be maintained with the respondents.

Differences in what appear to be comparable variables reflect variations in the wording of the question or the fact that the reference period for an identically worded question may differ between a personal and a telephone interview. Questions that refer to the last five years were usually found in a personal (or five-year) interview. As a result, some questions were asked only in the five-year surveys and some were asked only in the telephone surveys. Users conducting longitudinal analysis need to adapt their variable creation procedures to account for the differences in data collection between the early years of uninterrupted personal interviews and subsequent survey years when telephone interviews were used.

When analyzing data, users should remember that not all surveys were conducted during the same season of each survey year. Responses to labor force status questions, for example, may differ significantly if fielding occurred during the summer versus winter months.