Types of Variables


Four types of variables are present in Young Women data files. The type of variable affects both the title (or variable description) that names each variable and the physical placement of the variable within the codebook. Types of variables include:

  1. Direct raw responses from a questionnaire or other survey instrument.
  2. Edited variables constructed from raw data according to consistent and detailed sets of procedures (e.g., occupational codings, *KEY* variables, etc.).
  3. Constructed variables based on responses to more than one data item either cross-sectionally or longitudinally and edited for consistency where necessary (e.g., highest grade completed). Note: In general, the NLS does not impute missing values or perform internal consistency checks across waves. Data quality checks most often occur in the process of constructing cumulative and current status variables.
  4. Variables provided by the Census Bureau or another outside organization based on sources not directly available to the user (e.g., characteristics of respondents' geographical areas).

Variable Documentation

Reference Numbers

Every variable within the main NLS data set has been assigned an identifying number that determines its relative position within the data file and documentation system. Persons contacting NLS User Services should be prepared to cite the reference number(s) of the variable(s) their question or problem concerns.

Reference numbers, once assigned, remain constant through subsequent revisions of the files. Reference numbers are assigned sequentially, with variables from the first survey year having a lower reference number than those variables specific to the second year, and so forth. Occasionally, variables are created sometime after the year in which the data were actually collected. These variables are frequently given a reference number that reflects the year in which the actual data were gathered rather than the year the created variable was constructed.

Table YW1 lists reference numbers for each survey year since 1968 for the Young Women.

Table YW1. Young Women Reference Numbers by Survey Year

Survey Year   Reference Numbers       Survey Year   Reference Numbers
1968          R00001.00-R00811.0      1983          R08019.00-R09452.00
1969          R00851.00-R01405.00     1985          R09461.00-R10609.00
1970          R01451.00-R02312.01     1987          R10616.00-R11069.00
1971          R02518.00-R03323.13     1988          R11080.00-R12313.00
1972          R03331.00-R04149.55     1991          R12315.00-R13629.00
1973          R04150.00-R05100.00     1993          R13640.00-R15804.00
1975          R05175.00-R05451.00     1995          R16007.00-R34923.00
1977          R05467.00-R05857.00     1997          R34950.00-R42503.00
1978          R05860.00-R07052.00     1999          R42527.00-R54394.00
1980          R07061.00-R07547.00     2001          R54400.00-R63391.00
1982          R07550.00-R08018.00     2003          R65000.00-R90510.00
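
As an illustration of how the ranges in Table YW1 can be used, the following Python sketch looks up the survey year whose range contains a given reference number. The function names (ref_key, survey_year_for) and the YW_RANGES list are hypothetical helpers written for this example and are not part of any NLS software. The ranges are transcribed as printed in the table. Because created variables may carry a reference number that reflects the year the data were gathered, and because some numbers fall between the listed ranges, the result should be treated as a first approximation only.

```python
from decimal import Decimal

# (survey year, first reference number, last reference number), transcribed
# from Table YW1 exactly as printed, including the 1968 upper bound "R00811.0".
YW_RANGES = [
    (1968, "R00001.00", "R00811.0"),
    (1969, "R00851.00", "R01405.00"),
    (1970, "R01451.00", "R02312.01"),
    (1971, "R02518.00", "R03323.13"),
    (1972, "R03331.00", "R04149.55"),
    (1973, "R04150.00", "R05100.00"),
    (1975, "R05175.00", "R05451.00"),
    (1977, "R05467.00", "R05857.00"),
    (1978, "R05860.00", "R07052.00"),
    (1980, "R07061.00", "R07547.00"),
    (1982, "R07550.00", "R08018.00"),
    (1983, "R08019.00", "R09452.00"),
    (1985, "R09461.00", "R10609.00"),
    (1987, "R10616.00", "R11069.00"),
    (1988, "R11080.00", "R12313.00"),
    (1991, "R12315.00", "R13629.00"),
    (1993, "R13640.00", "R15804.00"),
    (1995, "R16007.00", "R34923.00"),
    (1997, "R34950.00", "R42503.00"),
    (1999, "R42527.00", "R54394.00"),
    (2001, "R54400.00", "R63391.00"),
    (2003, "R65000.00", "R90510.00"),
]


def ref_key(ref):
    """Turn a reference number such as 'R02312.01' into a sortable Decimal."""
    ref = ref.strip()
    if not ref.startswith("R"):
        raise ValueError(f"not a reference number: {ref!r}")
    return Decimal(ref[1:])


def survey_year_for(ref):
    """Return the survey year whose Table YW1 range contains ref, else None."""
    key = ref_key(ref)
    for year, first, last in YW_RANGES:
        if ref_key(first) <= key <= ref_key(last):
            return year
    return None  # the number falls in a gap between the listed ranges


print(survey_year_for("R02312.01"))
print(survey_year_for("R00820.00"))
```

The first call falls inside the 1970 range; the second falls in the gap between the 1968 and 1969 ranges, so the function returns None rather than guessing.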

Variable Titles

Every variable within NLS main file data sets has been assigned a summary title that serves as the verbal representation of that variable throughout the hard copy and electronic documentation system. Variable titles are assigned by CHRR archivists who endeavor, within the limitations described below, to capture the core content of each variable and to incorporate within the title (1) common words that facilitate easy identification of comparable variables; (2) UNIVERSE IDENTIFIERS that specify the subset of respondents for which each variable is relevant; and (3) for some variables, REFERENCE PERIODS that indicate the period of time (e.g., survey year or calendar year) to which these data refer. Universe identifiers and reference periods are discussed below.

Universe Identifiers: If two ostensibly identical variables differ only in that they refer to different universes, the variable title will include a reference to the applicable universe.

Example 1: 'Rate of Pay Required To Accept a Job (Unemployed 68)'
                 'Rate of Pay Required To Accept a Job (OLF 68)'

Reference Periods: Variable descriptions may include a phrase indicating the time period to which these data refer. The following general conventions apply:

Survey Year: When the variable title includes either the phrase "XX INT" (e.g., 83 INT) or a year (e.g., 68) that is not preceded by the preposition "IN," the year indicates the survey year in which that variable was measured, not necessarily the year to which it applies.

Example 2: 'Move to Current Residence - Prior Region, 83 INT' refers to a residential move described during the 1983 interview.

Example 3: '# of Weeks Worked in Past Year, 68 *KEY*' refers to the weeks worked in the 12-month period preceding the 1968 survey.

Calendar Year: When a date follows a verbal description of a variable and is part of the prepositional phrase "in XX," the date identifies the calendar year for which the relevant information was collected. The title in Example 4 refers to occupation in 1968, with the data collected in the 1969 survey.

Example 4: 'Household Record - Family Member #1: Occupation in 68 (Age 14+) 69.'

Note that survey staff began using 4-digit years in question titles in 1995.
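
The survey year and calendar year conventions above can be sketched as a small Python heuristic. This is an illustration only, not an NLS tool: the function name reference_periods is hypothetical, two-digit years are assumed to belong to the 1900s (4-digit years were introduced in 1995), and the survey year is assumed to be the last year-like number in the title that does not follow "in". Other numbers in a title, such as ages, can mislead a heuristic like this, so results should be checked against the questionnaire.

```python
import re

_YEAR = r"(\d{4}|\d{2})"
# A year that is part of the prepositional phrase "in XX" (calendar year).
_CALENDAR = re.compile(r"\bin\s+" + _YEAR + r"\b", re.IGNORECASE)
# Any standalone two- or four-digit number.
_ANY_YEAR = re.compile(r"\b" + _YEAR + r"\b")


def _expand(year_text):
    """Assumption: two-digit years in titles refer to the 1900s."""
    return int(year_text) if len(year_text) == 4 else 1900 + int(year_text)


def reference_periods(title):
    """Guess the survey year and calendar year(s) named in a variable title."""
    # Positions of years that follow "in": these are calendar years.
    calendar_at = {m.start(1): m.group(1) for m in _CALENDAR.finditer(title)}
    calendar_years = [_expand(y) for y in calendar_at.values()]

    # Remaining year-like numbers; the last one is taken as the survey year.
    others = [
        m.group(1)
        for m in _ANY_YEAR.finditer(title)
        if m.start(1) not in calendar_at
    ]
    survey_year = _expand(others[-1]) if others else None
    return {"survey_year": survey_year, "calendar_years": calendar_years}


title = "Household Record - Family Member #1: Occupation in 68 (Age 14+) 69."
print(reference_periods(title))
```

For the title in Example 4, the sketch reads "in 68" as calendar year 1968 and the trailing "69" as survey year 1969; the "14" in "(Age 14+)" is ignored only because it is not the last number in the title.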

User Notes

Searches for NLS variables are essentially searches for variable descriptions or titles. Electronic searches of NLS variables via NLS Investigator ultimately produce listings of variables by their reference number and variable description or title.

Flexibility in variable title assignment for raw data items is restricted by (1) the actual wording of the question as it appears within the survey instrument; (2) precedent, i.e., how that type of variable has been titled in previous survey years; and (3) in early years, a shorter allowable length for variable titles. An attempt is also made to include key phrases in variable titles so that large groups of variables with similar or related subject matter can be easily identified.

Users should be careful not to assume that two variables with the same or similar titles necessarily have the same (1) universe of respondents, (2) coding categories, or (3) time reference period. While the universe identifier and reference period conventions discussed above have been utilized, users are urged to consult the questionnaires for skip patterns and exact time periods for a given variable and to factor in the relevant fielding period(s).

Variables with similar content (e.g., information on respondents' labor force status) may have completely different titles, depending on the type of variable (raw versus created).

Example 1: 'Employment Status Recode' (ESR) is the created or reconstructed version of the 'Activity Most of Survey Week' raw variable. The 'Activity' variable is derived from the first item of the full series of questions used by the Department of Labor (DOL) to obtain employment status; the title reflects questionnaire content. ESR, on the other hand, reflects the procedure used to recode the 'Activity' variable. This produces a constructed variable for all NLS respondents based upon responses to the 'Activity' question and all other questions used by the DOL to obtain employment status. These other questions serve to qualify and refine employment status beyond the answer to the initial 'Activity' question. (Note that ESR has been replaced by a similar variable, MLR, beginning in 1995; see the Labor Force Status section for details.)

Finally, different archivists over a period of three decades have performed the task of assigning variable descriptions to data from the NLS cohorts. While every effort has been made to maintain consistency, users may find some differences in variable titles. Two primary sources of variation exist in Original Cohort variable title assignment.

The first is systematic: questions with identical wording across the four Original Cohorts may have slightly different variable titles. The rule before 1995 was to make title consistency within a cohort the highest priority. Starting in 1995, joint fielding forced the archivist to choose one title and cross-reference the other cohort's title in the archivist notes. For this and other technical reasons associated with the switch from PAPI (paper and pencil interviews) to CAPI (computer-assisted personal interviews), the same questions from PAPI and CAPI years may not have identical titles; notes have been added to some codeblocks to assist users in finding these common questions.

The second is random: spacing or punctuation errors in individual titles. The sorting process that produces variable title listings usually places these variables near, if not next to, the series of interest.
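
Users who work with their own listings of variable titles may find it helpful to compare titles after removing spacing and punctuation differences. The Python sketch below shows one possible approach; the function names (normalize_title, group_similar) and the two sample titles are hypothetical and were made up for this illustration. It addresses only spacing and punctuation variation, not titles that are worded differently across cohorts, and a shared normalized title does not imply a shared universe, coding scheme, or reference period.

```python
import re
from collections import defaultdict


def normalize_title(title):
    """Uppercase a title and collapse punctuation and whitespace runs."""
    # Keep letters, digits, and '#'; treat everything else as a separator.
    cleaned = re.sub(r"[^A-Za-z0-9#]+", " ", title)
    return " ".join(cleaned.upper().split())


def group_similar(titles):
    """Group titles that are identical after normalization."""
    groups = defaultdict(list)
    for title in titles:
        groups[normalize_title(title)].append(title)
    # Only groups with more than one spelling are of interest here.
    return [group for group in groups.values() if len(group) > 1]


# Hypothetical spellings of one title, differing only in spacing/punctuation.
sample = [
    "Employment Status Recode, 68",
    "Employment Status  Recode 68",
    "Activity Most of Survey Week, 68",
]
print(group_similar(sample))
```

The two "Employment Status Recode" spellings end up in one group, while the unrelated third title is left out.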

How Mode of Interview Affects Question Documentation

There are important differences between the content of telephone and personal interviews. In the late 1960s and early 1970s, most of the interviews were conducted in person, usually at the respondent's home. After the first five years, the decision was made to conduct a major survey every five years and two telephone surveys during the five-year span so that problems of recall could be avoided and contact could be maintained with the respondents.

Differences in what appear to be comparable variables reflect variations in the wording of the question or the fact that the reference period for an identically worded question may be different in a personal versus a telephone interview. Questions that refer to the last five years were usually found in a personal (or five-year) interview. As a result, some questions were asked only in the five-year surveys and some were asked only in the telephone surveys. Users conducting longitudinal analysis need to adjust their variable creation procedures to account for the differences in data collection between the early years of uninterrupted personal interviews and subsequent survey years when telephone interviews were used.