Variable Search in the NLS Investigator

Tutorial objective and prerequisites

  1. Part 1: Common search problems and tips for resolving them
  2. Part 2: How to systematically look for information on a specific research topic in the NLS documentation, and then find these variables via the NLS Investigator


This tutorial has two parts. The first gives search hints for finding variables on a particular topic of interest in the NLS Investigator. The second shows how one can research information on particular topics using available NLS documentation and then locate the variables in the NLS Investigator.

Knowledge assumed

This tutorial assumes that you already know how to use the NLS Investigator to search for variables. If you need assistance with the NLS Investigator before starting this tutorial, please review the Investigator User Guide.

Background information about searching in the NLS Investigator:

Eight search methods are available in the NLS Investigator, which can be combined using and/or logic operators:

  • Area of Interest: the main research area of the variable; helps narrow a search to a particular topical area
  • Word in Title: any word that appears in the variable title
  • Question Text: any word that appears in the text of the question; you can also enter a portion of a word
  • Question Name: a code assigned to identify each question, which gives the location of the variable in the questionnaire or identifies it as a created variable. When possible, question names remain the same for the same question across survey years.
  • Reference Number: abbreviated RNUM, a letter and number combination uniquely assigned to each variable
  • Survey Year: restrict your search to a particular survey interview or group of survey interviews
  • Codebook: search for any text appearing anywhere on the codebook page
  • Variable Type: pick a class of variable such as created variables or roster variables (NLSY97 only)

Background information about key NLS documentation:

The NLS surveys are extensively documented, and each cohort has its own set of documents. The following three types of documentation are the most useful when trying to figure out what has been collected in the surveys.

  1. Cohort-specific User's Guides are always a great place to start, as they contain a topic-by-topic guide to the surveys, and describe the majority of topics covered in the survey.
  2. Asterisk Tables show topics covered and rounds in which the topics are covered.
  3. Questionnaires show the questions asked of the respondents. They include universe restrictions (which respondents are asked certain questions) and skip patterns (the flow of questions a given respondent receives during the survey, depending on their answers and characteristics).
    • Questionnaires can be accessed in the Other Documentation section of each cohort's User's Guide

Part 1: Common search problems and tips for resolving them

I can't seem to find variables on my topic of interest.

Try one or more of the following:

  • Use the "Word in Title" "pick from list" option rather than the "enter search term" option. The drop-down list includes every word that appears in a question title.
  • When searching using "Word in Title," use a related word for the same concept, like "drink" instead of "alcohol" or "smoking" instead of "cigarettes."
  • When searching using "Word in Title," enter just the first part of the word in the "enter search term" box. Entering "smok" will find variable titles that contain "smoke," "smoking," or "smoked."
  • Use the "Codebook Search" instead of just searching on "Word in Title." It looks for a word anywhere on the codebook page, including the question text and answer categories.
  • After finding one variable related to your research, use the "Reference Number" search option to see if related variables appear next to it in the data.
  • Note that some variables are not linked to a specific survey year, but instead are classified as "Survey Year" equals "XRND" (cross-round). XRND variables often present cumulative information for a respondent (such as highest grade completed in the NLSY97) or group age-related information (such as the series on health status at age 40 in the NLSY79).
  • Get familiar with "Areas of Interest," as shown in the NLS User's Guides and used as a search option in the NLS Investigator.
  • If the "Variable Preference Level" is set to "show primary only" (the default), change the setting to "show all."
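The partial-word tip above can be illustrated outside the Investigator. The following is a minimal Python sketch, not the Investigator's own search code: the variable titles are invented for illustration, and a simple case-insensitive substring match is assumed.

```python
# Hypothetical variable titles, invented for illustration only.
TITLES = [
    "R EVER SMOKE CIGARETTES",
    "AGE FIRST SMOKED",
    "DAYS SMOKING IN LAST 30 DAYS",
    "NUMBER OF ALCOHOLIC DRINKS PER DAY",
]


def titles_containing(term, titles):
    """Return titles that contain `term`, ignoring case (assumed matching rule)."""
    needle = term.lower()
    return [title for title in titles if needle in title.lower()]


# The partial term reaches every smoking-related title in the sample.
partial_matches = titles_containing("smok", TITLES)

# The full word only reaches titles that spell it out exactly.
whole_word_matches = titles_containing("smoking", TITLES)

print(partial_matches)
print(whole_word_matches)
```

In this sample, the partial term returns three titles while the full word returns only one, which is why a shorter search term is the more inclusive choice.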

My search returned too many variables. How can I get a smaller, more targeted list?

Try one or more of the following:

  • Limit your search to just one or two survey years. Variables often appear in multiple rounds of the surveys. If a variable is repeated, it will have the same question name (in the NLSY97) and variable title (in all NLS cohorts). Once you find the variables you need, you can search for the "Question Name" or a few unique words using the "Word in Title" search option to find the variable in more survey years.
  • Sort the search results by clicking on the heading at the top of each column. If you sort by question name or variable title, it will be easier to find sets of variables that repeat across survey years.
  • Limit your search to just the first loop. Use the search option "Question Name" "enter search term" "contains" ".01". Once you find the variables you need, you can use the question name to get the rest of the loops more quickly.
  • Use the "enter search term" option instead of "pick from list" in a particular search. Then you can enter a whole phrase (for example, "hours worked," "past calendar year") instead of entering one word at a time. It will limit your results to variables that have the exact phrase.
  • Make use of "Not" searches, which exclude variables that match a term instead of including them. Depending on the type of search, the operator might be "doesn't equal" or "doesn't contain." This lets you quickly exclude chunks of variables.
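Three of the narrowing techniques above (first loop only, exact phrase, and "Not" searches) amount to simple filters. The sketch below shows the logic in Python under stated assumptions: the question names and titles are made up, and case-insensitive substring matching is assumed rather than taken from the Investigator itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variable:
    question_name: str
    title: str


# Made-up question names and titles, for illustration only.
VARIABLES = [
    Variable("QA-10.01", "HOURS WORKED PER WEEK, JOB 01"),
    Variable("QA-10.02", "HOURS WORKED PER WEEK, JOB 02"),
    Variable("HRS-YR", "NUMBER OF HOURS WORKED IN PAST CALENDAR YEAR"),
    Variable("HRS-YR-SP", "NUMBER OF HOURS WORKED BY SPOUSE IN PAST CALENDAR YEAR"),
]


def contains(text, term):
    """Case-insensitive substring test (assumed matching rule)."""
    return term.lower() in text.lower()


# "Question Name" "contains" ".01": keep only the first loop.
first_loop = [v for v in VARIABLES if contains(v.question_name, ".01")]

# Exact phrase search: the whole phrase must appear in the title.
phrase = [v for v in VARIABLES if contains(v.title, "past calendar year")]

# "Not" search: same phrase, excluding titles that contain "spouse".
phrase_not_spouse = [v for v in phrase if not contains(v.title, "spouse")]

print([v.question_name for v in first_loop])
print([v.question_name for v in phrase])
print([v.question_name for v in phrase_not_spouse])
```

Each filter shrinks the list in a different way, so they can be applied one after another until the result is short enough to scan by eye.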

How can I be sure my topic is really NOT in the data set?

The best way to demonstrate is with a couple of examples:

Does the NLSY97 contain questions about respondents' participation in high school athletics?

  1. First look in the NLSY97 Asterisk Tables, the Topical Guide in the NLSY97 User's Guide, and the first few rounds of NLSY97 Questionnaires (when the respondents were still in high school). In this example, there is no information about "high school athletics" in these sources. Continue to the next recommendation for your search.
  2. Next try searching in the NLS Investigator using words such as "athletic" and other synonyms such as "sport" and "football." Use the singular or part of a word to be more inclusive. Use the "Codebook" search option with OR, or enter each term in a separate search. Sometimes the list of NLSY97 variables you get back is long, and you have to scroll down to make sure none of the questions apply to your topic of interest. Once again, these searches turn up no information about "high school athletics." Continue to the next recommendation for your search.
  3. Next try "Codebook" "contains" "extra" AND "Codebook" "contains" "curricular" in the NLS Investigator. Once again, there are no results for this alternative search option.
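The difference between the OR search in step 2 and the AND search in step 3 can be sketched in a few lines of Python. This is an illustration only: the codebook text below is invented, the variable ids are hypothetical, and substring matching is an assumption about how a "contains" search behaves.

```python
# Invented codebook text and ids, for illustration only.
CODEBOOK_PAGES = {
    "V1": "How many hours per week do you usually watch television?",
    "V2": "What is the highest grade you have completed?",
    "V3": "How many days per week do you exercise for 30 minutes or more?",
}


def pages_with_any(terms, pages):
    """OR search: ids of pages containing at least one term."""
    return [vid for vid, text in pages.items()
            if any(t.lower() in text.lower() for t in terms)]


def pages_with_all(terms, pages):
    """AND search: ids of pages containing every term."""
    return [vid for vid, text in pages.items()
            if all(t.lower() in text.lower() for t in terms)]


# Step 2: several synonyms joined with OR.
synonym_hits = pages_with_any(["athletic", "sport", "football"], CODEBOOK_PAGES)

# Step 3: both word fragments must appear on the same page.
extracurricular_hits = pages_with_all(["extra", "curricular"], CODEBOOK_PAGES)

# A check that the search itself works on terms that are present.
sanity_hits = pages_with_any(["exercis", "television"], CODEBOOK_PAGES)

print(synonym_hits, extracurricular_hits, sanity_hits)
```

An empty result is only convincing when the same search method is known to find terms that do exist, which is what the last query checks.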

At this point, you would be able to conclude that the NLSY97 does not ask questions about respondents' participation in high school sports or any extra-curricular activities for that matter. If you still have questions about whether a cohort was asked about a specific topic, please contact NLS User Services.

Does the NLSY97 have any information about respondents' personality type?

  1. First look in the NLSY97 User's Guide. In this example, there is no information about "personality type" even in the Attitudes sub-section, where it would potentially fall. Continue to the next recommendation for your search.
  2. Next, look in the Asterisk Tables. Under Section VIII, "Attitudes, Behaviors, and Time Use," we learn that round 6 (2002 survey) contained a sequence of questions about the "Youth's Perception of Own Personality Traits."
  3. Next, view the round 6 questionnaire to see the exact questions. It's not easy to find the questions. They appear in the Self-Administered Questionnaire section, and if you search on the word "trait," you find the beginning of the section: YSAQ-282I. Looking through the question sequence, we can see that words such as "agreeable" and "dependable" are in the question text. We can use those words to search for the question sequence in the NLS Investigator, or we can search on the "Question Name" "YSAQ-282" to get the sequence.
  4. Open the NLS Investigator, and search by "Question Name (pick from list)" "starts with" "YSAQ-2" AND "Survey Year" "equals" "2002." If we scroll down, we can see a question sequence about personality traits.

A few more search hints for common search problems.

  • When searching for basic core concepts such as demographics, hours worked per year, or highest grade completed, do the following:
    • NLSY79: Search on the "Key Variables" "Area of Interest"
    • NLSY79 Child/Young Adult: Search on the "Child Background" "Area of Interest"
    • NLSY97: Search for "Variable Type" "equals" "Created Variables" and "Word in Title." For example, the "Word in Title" search term might be "hours" or "grade."
  • In the NLSY79, the "Birth Record" Area of Interest contains raw survey data about the respondent's children, and the "Fertility and Relationship History/Created" Area of Interest contains created variables that have undergone extensive cleaning and editing by survey staff. We strongly recommend using the latter.

Part 2: How to systematically look for information

Part 2 of this tutorial will show you how to systematically look for information on a specific research topic in the NLS documentation, and then find these variables via the NLS Investigator. To do so, we'll go through a specific example of a research question, and describe how to find the variables of interest using documentation and the NLS Investigator.

Example: Effect of marital status on labor supply (NLSY79)

Preview of steps

  1. Step 1: Construct list of the concepts that you want to measure
  2. Step 2: Check and search
    1. Check documentation to find information for Step 1 concepts
    2. Search for the appropriate variables in the NLS Investigator

Step 1: Construct list of the concepts that you want to measure

  1. Dependent variable
    • Hours worked in a calendar year
  2. Key independent variable
    • Current marital status
  3. Control variables
    • AFQT score
    • Year of birth
    • Race/ethnicity
    • Sex
    • Spouse labor supply
    • Educational attainment
    • Number of children

Step 2: Check and search

  1. Check documentation to find information for Step 1 concepts
  2. Search for the appropriate variables in the NLS Investigator

Note that the NLSY79 User's Guide and Asterisk Table will be particularly helpful here. They provide information on the topics of interest, describe the types of variables that may be available, and point to particular Areas of Interest and other information that will help you search for variables in the NLS Investigator.

  1. Dependent variable
    • Hours worked in a calendar year:
      1. Read the entry on Work Experience in the NLSY79 Topical Guide. The created variables list indicates that "number of hours worked in past calendar year" is created for each survey wave. We can search using a few of these words to find this set of variables.
      2. Search using "Word in Title" "contains" "hours" AND "Word in Title" "contains" "year"; this search yields 28 variables. To refine the search to only the variables we want, add "Word in Title" "does not contain" "norc"; this yields 23 variables, one for every round, called "Number of hours worked in past calendar year."
  2. Key independent variable
    • Current marital status:
      1. Read the Topical Guide entry on Marital Status, Marital Transitions & Attitudes. This section notes that the "Key Variables" Area of Interest includes created marital status variables for each interview date.
      2. Search using "Word in Title" "contains" "marital status" AND "Area of Interest" "equals" "Key Variables"; this yields 46 variables, two for each round: marital status and marital status (collapsed).
  3. Control variables
    • AFQT score:
      1. Read the Topical Guide entry on Aptitude, Achievement & Intelligence Scores. Table 2 is very helpful. It shows that we should look under the "Profiles" Area of Interest, and even gives the reference numbers to look for. Of course, searching for the initials AFQT could work as well, since we are looking for a particular score.
      2. Search using "Word in Title" "contains" "AFQT"; this yields 3 scores from which to choose.
    • Year of birth:
      1. See the Asterisk Table or the Topical Guide entry on Age. In the latter, Table 1 shows us that "Date of Birth of R" is available in 1979 and 1981, and provides the reference numbers (R00003.00, R00005.00, R04101.00, R04103.00).
      2. Use the "Reference Number" "starts with" search to obtain the four birth date variables.
    • Race/ethnicity:
      • Already pre-selected in the NLS Investigator
    • Sex:
      • Already pre-selected in the NLS Investigator
    • Spouse labor supply:
      1. See the Asterisk Table or the Topical Guide entry on Marital Status, Marital Transitions & Attitudes. In the latter, we see under the topic "Spousal Characteristics" that weeks and hours worked in the past calendar year are two of the available variables.
      2. Choose "Word in Title" "contains" "spouse," "Area of Interest" "equals" "marriage," and "Word in Title" "contains" "worked"; this yields 143 variables, including number of weeks worked in past calendar year and hours per week worked during weeks worked by spouse.
    • Educational attainment:
      1. See the Topical Guide entry on Educational Attainment and School Enrollment. In the "data files" section, we find out that a yearly created variable for highest grade completed is found in the "Key Variables" Area of Interest.
      2. Choose "Word in Title" "contains" "highest," "Word in Title" "contains" "grade," and "Area of Interest" "equals" "key variables"; this yields 46 variables, two for each round. These variables are "Highest grade completed as of May 1 of survey year" and a revised version of the same variable. Use the revised one, as it was recleaned and made consistent over the years by survey staff.
    • Number of children:
      1. See the Asterisk Table or the Topical Guide entry on Fertility. We see there are variables on this topic, and that the "Fertility and Relationship History/Created" Area of Interest might be a good place to look.
      2. Choose "Word in Title" "contains" "number," "Word in Title" "contains" "children," and "Area of Interest" "equals" "fertility and relationship history/created"; this yields 62 variables, including number of children ever born and number of bio/step/adopted children in household.
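When a project involves this many searches, it can help to keep a written record of each one so the variable list can be rebuilt later. The Python sketch below shows one possible way to do that for three of the searches above. The `PlannedSearch` class and `describe` helper are hypothetical names introduced here, not part of the NLS Investigator, and the counts are simply copied from the steps above, so they may not match a later data release.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PlannedSearch:
    concept: str
    criteria: tuple      # (search method, operator, value) triples, joined with AND
    reported_count: int  # number of variables this tutorial reports for the search


PLAN = [
    PlannedSearch(
        "Hours worked in a calendar year",
        (("Word in Title", "contains", "hours"),
         ("Word in Title", "contains", "year"),
         ("Word in Title", "does not contain", "norc")),
        23,
    ),
    PlannedSearch(
        "Current marital status",
        (("Word in Title", "contains", "marital status"),
         ("Area of Interest", "equals", "Key Variables")),
        46,
    ),
    PlannedSearch("AFQT score", (("Word in Title", "contains", "AFQT"),), 3),
]


def describe(search):
    """Render the criteria in the quoted style used in this tutorial."""
    return " AND ".join(f'"{m}" "{op}" "{v}"' for m, op, v in search.criteria)


for search in PLAN:
    print(f"{search.concept}: {describe(search)} ({search.reported_count} variables)")
```

Comparing the recorded count with what the Investigator returns is a quick way to notice a mistyped criterion.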

Using the NLS Investigator

Once your variable search is complete, create a tagset of your specific variables and then extract the data set using the Save / Download Tab in the NLS Investigator.