There are six types of variables present in the NLSY79 data. Some are the raw answers provided by the respondent, while others are constructed. Types of variables include:
- Direct (or raw) responses from a questionnaire or other survey instrument
- Edited variables constructed from raw data according to consistent and detailed sets of procedures, such as occupational codes, KEY variables, and so forth
- Constructed variables based on responses to more than one data item, either cross-sectionally or longitudinally, and edited for consistency where necessary, such as variables on the NLSY79 Supplemental Fertility File ("Fertility and Relationship History/Created" area of interest in NLS Investigator)
- Constructed variables from other sources, such as the County & City Data Book information present on the NLSY79 Geocode data files
- Variables provided by an outside organization based on sources not directly available to the user, such as the high school survey and transcript data, scores from the Armed Services Vocational Aptitude Battery, and so forth
- Data collected from or about one universe of respondents reconstructed with a second universe as the unit of observation, such as variables on the NLSY79 Child File
The type of variable affects:
- the title or variable description naming each variable,
- the physical placement of each variable within the codebook, and
- the location of a variable within a given area of interest.
Reference numbers
Every variable in the main NLSY79 data files has been assigned a reference number or identifier that determines its relative position within the data file and NLS documentation system. Persons contacting NLS User Services should be prepared to discuss their question or problem in relation to the reference number(s) of the variable(s) in question.
Important information about data consistency processes
In general, the Center for Human Resource Research (CHRR) does not impute missing values or perform internal consistency checks across waves. Exceptions to this general rule occur when financial support is available, as is the case with the consistency edits performed since 1982 on the NLSY79 fertility data. When bounded interviewing methods are used, responses from the previous interview appear in the text of a question, both to verify that past information and as a point from which to update current information. Bounded interviewing techniques, using data from the Information Sheets or flap items, are intended to impose consistency across waves. Data quality checks most often occur in the process of constructing (1) cumulative and current status variables, such as 'Highest Grade Completed,' and (2) NLSY79 employment-related variables, such as 'Weeks Working in Past Calendar Year,' 'Total Tenure with Employer,' and so forth. More information on NLSY79 instruments can be found in the Survey Instruments section.
Once assigned to variables within the NLSY79 data files, reference numbers remain constant through subsequent revisions of the files. Reference numbers are assigned sequentially, with variables referring to the first survey year having a lower reference number than those variables specific to the second year and so forth.
Occasionally variables are created in a year later than that in which the data were actually collected. These variables are frequently given a reference number with a decimal value that reflects the year in which the actual data were gathered rather than the year the created variable was constructed, for example, R01461.01. Beginning with the 1993 survey, decimals are also used to indicate that more than one variable has been derived from a single question.
Important information about reference numbers
Reference numbers in the main and Geocode data files have traditionally begun with the letter "R." Beginning with the 2000 data release, the work history variables are included in the same data set as the main data. However, these work history variables are assigned reference numbers beginning with "W" for easy identification. Beginning in 2006, government program participation or recipiency variables are assigned reference numbers beginning with "G," health module variables are assigned reference numbers beginning with "H," and all other variables are assigned reference numbers beginning with "T."
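The prefix and decimal conventions described above can be sketched in a few lines of Python. This is only an illustration, not an NLS tool: the names `PREFIX_CATEGORIES`, `REFNUM_PATTERN`, and `describe_reference_number` are hypothetical, and the pattern assumes a reference number is one letter followed by digits and an optional decimal part, as in R01461.01.

```python
import re

# Prefix meanings as described in this section. The dictionary itself is a
# hypothetical helper, not part of any NLS software.
PREFIX_CATEGORIES = {
    "R": "main or Geocode variable (traditional prefix)",
    "W": "work history variable (2000 release onward)",
    "G": "government program participation or recipiency variable (2006 onward)",
    "H": "health module variable (2006 onward)",
    "T": "other variable (2006 onward)",
}

# Assumed format: one letter, then digits, then an optional decimal part.
REFNUM_PATTERN = re.compile(r"^([A-Z])(\d+)(?:\.(\d+))?$")


def describe_reference_number(refnum):
    """Split a reference number into its prefix, base number, and decimal part."""
    match = REFNUM_PATTERN.match(refnum.strip().upper())
    if match is None:
        raise ValueError(f"Unrecognized reference number format: {refnum!r}")
    prefix, digits, decimal = match.groups()
    return {
        "prefix": prefix,
        "category": PREFIX_CATEGORIES.get(prefix, "prefix not described in this section"),
        "base_number": digits,
        # A decimal part may mean the variable was created after the data were
        # collected or (from 1993 on) that several variables derive from one
        # question. This sketch does not try to tell the two cases apart.
        "decimal_suffix": decimal,
    }


info = describe_reference_number("R01461.01")
print(info["category"], info["decimal_suffix"])
```

A helper like this can be useful for grouping variables from an extract by type, but the codebook remains the authoritative description of any individual variable.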
Variable descriptions or variable titles
Each variable in the NLSY79 main data files has been assigned an 80-character summary title that serves as the verbal representation of that variable throughout the documentation.
Variable titles are assigned by CHRR archivists who endeavor, within the limitations described below, to capture the core "content" of the variable and to incorporate within the title:
- "NLS Investigator areas of interest" that facilitate easy identification of related variables,
- "Universe identifiers" that specify the subset of respondents for which each variable is relevant, and
- "Reference periods" that indicate the specific period of time (e.g., survey year, calendar year) to which the data pertain for some variables. Universe identifiers and reference periods are discussed below.
Universe Identifiers. If two ostensibly identical variables differ only in that they refer to different universes, the variable title will include a reference to the applicable universe, either by appending the appropriate universe in parentheses to each title (Example 1) or by identifying the universe before the variable title (Example 2).
- Example 1: 'Did R Have Any Job since Last Int? (Unemployed or OLF) (1994)'
- Example 2: 'Female - Number of Children R Has Had since Last Interview'
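As a rough illustration of the two universe conventions shown in these examples, the following Python sketch separates a title into a leading universe, the core text, and any trailing parentheticals. The function name `split_title` is hypothetical, and the sketch assumes that a leading universe is always separated by " - " and that parentheses are used only for qualifiers. Real titles may not follow these assumptions, so results should be checked against the codebook.

```python
import re

# Matches one parenthesized group without nested parentheses.
PAREN_GROUP = re.compile(r"\(([^()]*)\)")


def split_title(title):
    """Split a variable title into universe prefix, core text, and parentheticals.

    Assumes (hypothetically) that a universe named before the title is
    separated from it by " - ", as in Example 2.
    """
    prefix = None
    core = title.strip()
    if " - " in core:
        prefix, core = (part.strip() for part in core.split(" - ", 1))
    parentheticals = PAREN_GROUP.findall(core)
    core = PAREN_GROUP.sub("", core).strip()
    return {
        "leading_universe": prefix,
        "core": core,
        "parentheticals": parentheticals,
    }


print(split_title("Did R Have Any Job since Last Int? (Unemployed or OLF) (1994)"))
print(split_title("Female - Number of Children R Has Had since Last Interview"))
```

The sketch does not decide which parenthetical is a universe and which is a survey year; in Example 1 both appear, and telling them apart is left to the user.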
Reference Periods. Variable descriptions may include a phrase indicating the time period to which the data refer. When a date follows the verbal description of a variable as part of the prepositional phrase "in 19XX," the date identifies the calendar year for which the relevant information was collected.
- Example: 'Received Income from Child Support in 1991?' This 1992 survey question refers to child support payments received in calendar year 1991.
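The "in 19XX" convention lends itself to a simple pattern match. The sketch below is a hypothetical helper (`reference_year` is not an NLS function) that returns the calendar year named in such a phrase, or `None` when the title contains no such phrase. Accepting years beginning with "20" is an assumption added here; the text above describes only the "19XX" form.

```python
import re

# Matches the word "in" followed by a four-digit year. Allowing "20XX" years
# is an assumption beyond the "in 19XX" form described in the text.
YEAR_PHRASE = re.compile(r"\bin\s+((?:19|20)\d{2})\b", re.IGNORECASE)


def reference_year(title):
    """Return the calendar year from an 'in YYYY' phrase, or None if absent."""
    match = YEAR_PHRASE.search(title)
    return int(match.group(1)) if match else None


print(reference_year("Received Income from Child Support in 1991?"))
print(reference_year("Did R Have Any Job since Last Int? (Unemployed or OLF) (1994)"))
```

The first call finds the calendar year 1991. The second returns `None`, because a year in parentheses is not preceded by "in" and so is not treated as a calendar-year reference period by this sketch.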
Important information about verifying variable details
Do not presume that two variables with the same or similar titles necessarily have the same (1) universe of respondents, (2) coding categories, or (3) time reference period. While the universe identifier conventions discussed above have been utilized, users are urged to consult the questionnaires for skip patterns and exact time periods for a given variable and to factor in the relevant fielding period(s) for the cohort.
Variables with similar content, such as information on respondents' labor force status, may have completely different titles, depending on the type of variable (raw versus created). In addition, such variables may be located within different NLSY79 areas of interest.
- Example 1: 'Employment Status Recode' (ESR), in 1979-98 and 2006, is the created or reconstructed version of the 'Activity Most of Survey Week' raw variable. The 'Activity' variable is derived from the first question of the full series of questions used by the Department of Labor (DOL) to obtain employment status; the title reflects questionnaire content. ESR, on the other hand, reflects the procedure used to recode the 'Activity' variable. This produces a constructed variable for all respondents based upon responses to the 'Activity' question and all other questions used by the DOL to obtain employment status. These other questions serve to qualify and refine employment status beyond the answer to the initial 'Activity' question.
- Example 2: NLSY79 raw fertility variables appear within the various "Children," "Birth Record," or "Birth Record xxxx" areas of interest while edited and constructed versions of these variables appear within the "Fertility and Relationship History/Created" area of interest.
Finally, over a period of more than 20 years, different archivists have assigned variable descriptions to the data. While every effort has been made to maintain consistency, users may find some differences in variable title and area of interest assignment.
New variables created by researchers
Researchers sometimes use the NLS public datasets to generate a new variable to use in their research. In some cases, researchers like to make that new variable publicly available (through their own data repository) so that it can be easily accessed for follow-up studies. This is permissible as long as researchers use public (rather than restricted) NLS data and make it clear that they, rather than the NLS team, are the authors of the variable.