Types of Variables

Several types of variables are present in the NLSY97 data, including:

Direct (or raw) responses from a questionnaire or other survey instrument.
Symbols and roster items, which are used to guide the interview.
Created variables based on responses to more than one data item. These items are edited for consistency where necessary.
Created variables from data provided on a non-NLS data set.
Variables provided by NORC or an outside organization.

This section will help users understand:

Variable descriptions or titles
Symbols and roster items
Created variables

Important information: Missing values

Survey personnel do not, in general, impute missing values or perform internal consistency checks across waves. Exceptions will be noted.

Variable descriptions or variable titles

Each variable within NLSY97 main file data sets has been assigned an 80-character summary title that serves as the descriptive representation of that variable throughout the hard copy and electronic documentation system. Variable titles are assigned to capture the core content of the variable and to incorporate universe identifiers that specify the subset of respondents for which each variable is relevant within the limitations described below. Some titles indicate the reference periods (e.g., survey year or calendar year) of the variables as well.

Universe identifiers

If two ostensibly identical variables differ only in their respondent universes, the variable title will include a reference to the applicable universe. The appropriate universe will either be appended in parentheses or identified before the variable title.

Example 1: R00029. "R Do Any Work for Pay Last Week? (R Does Not Own Bus/Farm)"
R00030. "R Do Any Work for Pay or Profit Last Week? (R Owns Bus/Farm)"
Example 2: R01075. "Compensation Received (Start <16) EMP 01"
R01803. "Compensation Received (Start 16+) EMP 01"

Important information: Universe identifier conventions

Do not presume that two variables with the same or similar titles necessarily have the same (1) universe of respondents, (2) coding categories, or (3) time reference period. Although the universe identifier conventions discussed above have been used, users are urged to consult the questionnaires for skip patterns and exact time periods for a given variable and to factor in the relevant fielding period(s) for the cohort. In addition, variables with similar content may have completely different titles, depending on the type of variable (raw versus created).
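As a rough illustration of the parenthetical convention, the sketch below pulls the text inside parentheses out of a variable title. The function name is hypothetical, and the approach is only a heuristic: it assumes that a parenthetical is a universe identifier, which the caution above says cannot be taken for granted, and it does not detect universes that are identified before the title.

```python
import re

# Matches the text inside one pair of parentheses (no nesting).
PARENTHETICAL = re.compile(r"\(([^()]*)\)")


def parenthetical_qualifiers(title):
    """Return every parenthesized phrase in a variable title, in order.

    Hypothetical helper: a phrase returned here is only a candidate
    universe identifier and should be checked against the questionnaire.
    """
    return PARENTHETICAL.findall(title)


# Titles taken from Example 1 and Example 2 above.
titles = [
    "R Do Any Work for Pay Last Week? (R Does Not Own Bus/Farm)",
    "R Do Any Work for Pay or Profit Last Week? (R Owns Bus/Farm)",
    "Compensation Received (Start <16) EMP 01",
    "Compensation Received (Start 16+) EMP 01",
]

for title in titles:
    print(parenthetical_qualifiers(title), "<-", title)
```

A listing like this can help flag pairs of similar titles for closer inspection, but the questionnaire skip patterns remain the authoritative source for each variable's universe.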

Symbols and roster items

There are two main types of survey variables not necessarily represented by a single item in the questionnaire: symbols and roster items. These items are used by the CAPI system during the interview to organize, display, and store information collected during the interview; to determine which question paths the respondent should follow; and to fill in respondent-specific text in various questions. For example, rather than asking about a respondent's "current employer," the CAPI software fills in the actual employer name reported earlier in the interview. Many of these symbols and roster items are provided in the data set for user reference; researchers should be aware of the differences between the two types and the uses of each.

Symbols

Symbols are variables that are used by the NLSY97 CAPI software to determine the flow of the interview. Symbols may contain real-time information captured during the survey, or they may be created in advance of the interview by survey staff. For example, before the income section for rounds 1-5, the survey program created a symbol that states whether the respondent is independent (Y12!INDEPEN). This symbol is later used to determine whether the youth is asked certain income and asset questions. Similarly, before the survey round starts, survey staff create a symbol indicating the respondent's sex (SYMBOL!KEY!SEX), which is used throughout the interview to ensure that the respondent is asked appropriate questions about sex-specific topics such as pregnancy.

All symbol variables have "Symbols" as their variable type. In general, question names for round 1 symbol variables begin with "KEY!"; symbols in subsequent rounds generally have "SYMBOL!" or "SYMBOL_" to start their question names.
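The naming convention can be expressed as a simple prefix check, sketched below. The function name and the round 1 question name "KEY!EXAMPLE" are hypothetical. Because the convention holds only "in general," the check is a heuristic: the symbol Y12!INDEPEN mentioned above does not match it, so the "Symbols" variable type remains the reliable way to identify symbol variables.

```python
def matches_symbol_naming(question_name, survey_round):
    """Heuristic: does the question name follow the usual symbol prefix?

    Round 1 symbols generally begin with "KEY!"; later rounds generally
    begin with "SYMBOL!" or "SYMBOL_". Exceptions exist, so a False
    result does not prove that a variable is not a symbol.
    """
    if survey_round == 1:
        return question_name.startswith("KEY!")
    return question_name.startswith(("SYMBOL!", "SYMBOL_"))


# "KEY!EXAMPLE" is a made-up name used only for illustration.
print(matches_symbol_naming("KEY!EXAMPLE", 1))
print(matches_symbol_naming("SYMBOL!KEY!SEX", 2))
# Y12!INDEPEN is a symbol, yet it does not follow the general prefix rule.
print(matches_symbol_naming("Y12!INDEPEN", 3))
```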

Rosters

The NLSY97 uses rosters in various sections in which information is collected on a number of persons, schools, or employers. Rosters are an important part of the NLSY97 data set. These grids of information help researchers to analyze data in an efficient and accurate way. However, the structure and use of rosters may be somewhat confusing, so it is vital that researchers understand how they are constructed.

A detailed explanation of rosters can be found in Appendix 8: Instrument Rosters.

Important information: Rosters

In addition to the detailed roster discussion in the following paragraphs, another example of a roster can be found in Employment: An Introduction. Although that example pertains specifically to employers, the basic concepts apply to other NLSY97 rosters. Researchers using any roster data may find the example helpful. More information about using specific rosters is found in the various topical sections. Researchers may be particularly interested in:

Created variables

Created variables generally start with "CV_" or "CVC_" in the codebook. The "CV" variables are designated by survey year in the codebook, while the "CVC" variables are created as "cross round" (XRND) variables, meaning the information used came from the respondent's latest interview, regardless of the survey round in which that interview took place.

A few created variables have a prefix different from "CV" or "CVC." Sampling weight variables, for instance, have the variable names SAMPLING_WEIGHT and CS_SAMPLING_WEIGHT. Other exceptions to note include the validation variables for rounds 4 and 5, which have question names beginning with VALIDR_, and the timing variables (rounds 5 and up), which have question names R5_TIM, R6_TIM, R7_TIM, and so forth. In addition, the family process variables constructed by Child Trends (see Appendix 9: Family Process and Adolescent Outcome Measures) have question names beginning with "FP_" in the codebook. In the Event History data, all variables are created (reference numbers for event history variables begin with the letter "E").
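Researchers working with a list of question names may find it convenient to sort them by these prefixes. The sketch below is one way to do so. The function names, the labels, and the example question names ending in "EXAMPLE" are hypothetical, and the code assumes that each name listed above acts as a prefix (that is, that actual question names may carry additional characters after it).

```python
import re

# Prefix conventions described above, mapped to informal labels.
PREFIX_FAMILIES = [
    ("CVC_", "created, cross-round (XRND)"),
    ("CV_", "created, by survey year"),
    ("SAMPLING_WEIGHT", "sampling weight"),
    ("CS_SAMPLING_WEIGHT", "sampling weight"),
    ("VALIDR_", "validation (rounds 4 and 5)"),
    ("FP_", "family process (Child Trends)"),
]

# Timing question names look like R5_TIM, R6_TIM, R7_TIM, and so forth.
# The pattern does not enforce the round 5 starting point.
TIMING_NAME = re.compile(r"R\d+_TIM")


def created_variable_family(question_name):
    """Return an informal label for a created variable, or None."""
    for prefix, family in PREFIX_FAMILIES:
        if question_name.startswith(prefix):
            return family
    if TIMING_NAME.match(question_name):
        return "timing"
    return None


def is_event_history(reference_number):
    """Event history reference numbers begin with the letter "E"."""
    return reference_number.startswith("E")


for name in ["CV_EXAMPLE", "CVC_EXAMPLE", "R6_TIM", "FP_EXAMPLE", "OTHER_EXAMPLE"]:
    print(name, "->", created_variable_family(name))
```

A result of None means only that the name matches none of the listed prefixes; the variable type shown in the codebook should be treated as definitive.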

Beginning in round 5 (2001), timing variables were created to measure the length of time a respondent took to complete the entire interview, along with a breakdown of the amount of time taken to complete each main questionnaire section. Each timing variable is tabulated in seconds, with one implied decimal place. Timing data can be found under the "Timing" Area of Interest in the NLS Investigator. In round 7, timing variables were expanded to show the length of time it took to complete subsections. Because of confidentiality concerns with the Welfare Knowledge section, round 7 timings are available only through the geocode release.
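Because the decimal place is implied, a stored timing value must be divided by 10 to obtain seconds. The sketch below shows the conversion under that reading (for example, a stored value of 36005 is taken to mean 3,600.5 seconds). The function names are hypothetical, and the treatment of negative values as non-timing codes is an assumption that should be checked against the codebook.

```python
def timing_to_seconds(raw_value):
    """Convert a stored timing value to seconds.

    One implied decimal place: the stored integer is ten times the
    number of seconds. Negative values are assumed (not confirmed by
    this section) to be non-timing codes and are returned as None.
    """
    if raw_value < 0:
        return None
    return raw_value / 10


def timing_to_minutes(raw_value):
    """Convert a stored timing value to minutes, or None if not a timing."""
    seconds = timing_to_seconds(raw_value)
    return None if seconds is None else seconds / 60


print(timing_to_seconds(36005))  # 3600.5 seconds
print(timing_to_minutes(36000))  # 60.0 minutes
```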

In addition to the variables created by CHRR, Child Trends, Inc., an organization involved in the NLSY97 questionnaire design process, has created a number of scales and indexes from several groups of variables described in this section. These scales and indexes are intended to aid researchers in using the various data items relating to attitudes and behaviors. For these variable descriptions, see the Created Variables listings at the beginning of the following sections:

Although these Child Trends created variables are described only briefly in this guide, interested researchers may obtain a detailed discussion of the creation procedures in Appendix 9 of the NLSY97 Codebook Supplement. This document also summarizes statistical analyses of the scales and indexes, as well as related data items, performed by Child Trends researchers. These variables contain the prefix "FP_" in their question names (FP stands for Family Process Measures).

New variables created by researchers

Researchers sometimes use the NLS public datasets to generate a new variable for use in their research. In some cases, researchers wish to make that new variable publicly available (through their own data repository) so that it can be easily accessed for follow-up studies. This is permissible as long as researchers use public (rather than restricted) NLS data and make it clear that they, rather than the NLS team, are the authors of the variable.