Tutorial: Variable Search in the NLS Investigator

Objective: This tutorial has two parts. The first gives search hints for finding variables on a particular topic of interest in the NLS Investigator. The second shows how one can research information on particular topics using available NLS documentation and then locate the variables in the NLS Investigator.

  1. Common search problems and tips for resolving them
  2. How to systematically look for information on a specific research topic in the NLS documentation, and then find these variables via the NLS Investigator

Knowledge Assumed:  This tutorial assumes that you already know how to use the NLS Investigator to search for variables. If you need assistance with the NLS Investigator before starting this tutorial, please see "How to use the NLS Investigator."

Background Information about Searching in the NLS Investigator:

The NLS Investigator offers eight search methods, which can be combined using AND/OR logic operators:

  1. Area of Interest: the main research area of the variable; helps narrow a search to a particular topical area
  2. Word in Title: any word that appears in the variable title
  3. Question Text: any word that appears in the text of the question; you can also enter part of a word
  4. Question Name: a code assigned to identify each question, which gives the location of the variable in the questionnaire or identifies it as a created variable. When possible, question names remain the same for the same question across survey years.
  5. Reference Number: abbreviated RNUM, a letter and number combination uniquely assigned to each variable
  6. Survey Year: restrict your search to a particular survey interview or group of survey interviews
  7. Codebook: search for any text appearing anywhere on the codebook page
  8. Variable Type: pick a class of variable such as created variables or roster variables (NLSY97 only)

Background Information about Key NLS Documentation:

The NLS surveys are extensively documented, and each cohort has its own set of documents. The following three types of documentation are the most useful for figuring out what has been collected in the surveys.

  1. The cohort-specific User's Guides are always a great place to start, as they contain a topic-by-topic guide to the surveys, and describe the majority of topics covered in the survey. [Cohort-specific User's Guides: NLSY97, NLSY79, NLSY79 Child/Young Adult, NLS Mature and Young Women, NLS Older and Young Men]
  2. Asterisk Tables show topics covered and rounds in which the topics are covered. [Cohort-specific Asterisk Tables: NLSY97, NLSY79, NLSY79 Child/Young Adult, NLS Mature and Young Women, NLS Older and Young Men]
  3. Questionnaires show the questions asked of the respondents. They include universe restrictions (explanations of which respondents are asked certain questions) and skip patterns (the flow of questions a given respondent will get during the survey, depending on their answers and characteristics). [Links to the questionnaires can be found in each cohort's User's Guide under the Other Documentation link.]


A. Common search problems and tips for resolving them

1. I can't seem to find variables on my topic of interest.

Try one or more of the following:

  • Use the "Word in Title" "pick from list" option rather than the "enter search term" option. The drop-down list includes every word that appears in a question title.
  • When searching using "Word in Title," use a related word for the same concept, like "drink" instead of "alcohol" or "smoking" instead of "cigarettes."
  • When searching using "Word in Title," enter just the first part of the word in the "enter search term" box. Entering "smok" will find variable titles that contain "smoke," "smoking," or "smoked."
  • Use the "Codebook Search" instead of just searching on "Word in Title." It looks for a word anywhere on the codebook page, including the question text and answer categories.
  • After finding one variable related to your research, use the "Reference Number" search option to see if related variables appear next to it in the data.
  • Note that some variables are not linked to a specific survey year, but instead are classified as "Survey Year" equals "XRND" (cross-round). XRND variables often present cumulative information for a respondent (such as highest grade completed in the NLSY97) or group age-related information (such as the series on health status at age 40 in the NLSY79).
  • Get familiar with "Areas of Interest," as shown in the NLS User's Guides and used as a search option in the NLS Investigator.
  • If the "Variable Preference Level" is set to "show primary only" (the default), change the setting to "show all."
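
The partial-word and related-word tips above can be sketched in a few lines of Python. This is only an illustration of the matching idea, not the NLS Investigator's actual search code: the titles are invented, the helper title_contains is hypothetical, and case-insensitive substring matching is an assumption.

```python
# Toy variable titles invented for this illustration; they are not real NLS titles.
TITLES = [
    "R EVER SMOKE CIGARETTES",
    "DAYS SMOKED IN LAST 30 DAYS",
    "R CURRENTLY SMOKING",
    "R EVER DRINK ALCOHOL",
]


def title_contains(titles, fragment):
    """Return the titles that contain `fragment`, ignoring case (an assumption)."""
    needle = fragment.lower()
    return [title for title in titles if needle in title.lower()]


# A word stem matches every form of the word.
stem_matches = title_contains(TITLES, "smok")

# A full word matches only titles that use that exact form.
full_word_matches = title_contains(TITLES, "smoking")

# A related word can find titles that the first word misses.
related_matches = title_contains(TITLES, "drink")

print(len(stem_matches), len(full_word_matches), len(related_matches))
```

In this toy list, the stem "smok" picks up all three smoking-related titles, while the full word "smoking" picks up only one. The same pattern is why a shorter search term usually returns a longer list.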

2. My search returned too many variables. How can I get a smaller, more targeted list?

Try one or more of the following:

  • Limit your search to just one or two survey years. Variables often appear in multiple rounds of the surveys. If a variable is repeated, it will have the same question name (in the NLSY97) and variable title (in all NLS cohorts). Once you find the variables you need, you can search for the "Question Name" or a few unique words using the "Word in Title" search option to find the variable in more survey years.
  • Sort the search results by clicking on the heading at the top of each column. If you sort by question name or variable title, it will be easier to find sets of variables that repeat across survey years.
  • Limit your search to just the first loop. Use the search option "Question Name" "enter search term" "contains" ".01". Once you find the variables you need, you can use the question name to get the rest of the loops more quickly.
  • Use the "enter search term" option instead of "pick from list" in a particular search. Then you can enter a whole phrase (for example, "hours worked," "past calendar year") instead of entering one word at a time. It will limit your results to variables that have the exact phrase.
  • Make use of "Not" searches, which exclude rather than include a given term. Depending on the type of search, the operator might be "doesn't equal" or "doesn't contain." This lets you quickly exclude whole groups of variables.
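
To see why these narrowing steps shrink a result list, here is a minimal Python sketch of three of them: restricting to one survey year and the first loop, requiring an exact phrase, and a "Not" search. The Variable record, the question names, and the titles are all invented for illustration and do not describe how the NLS Investigator stores or searches its data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variable:
    question_name: str
    title: str
    survey_year: int


# Invented records for illustration only; not real NLS variables.
VARIABLES = [
    Variable("JOB-10.01", "HOURS WORKED PER WEEK, JOB 01", 2002),
    Variable("JOB-10.02", "HOURS WORKED PER WEEK, JOB 02", 2002),
    Variable("JOB-10.01", "HOURS WORKED PER WEEK, JOB 01", 2003),
    Variable("EMP-20", "WEEKS WORKED AND USUAL HOURS", 2002),
]

# One survey year and the first loop only (question name contains ".01").
first_loop_2002 = [
    v for v in VARIABLES
    if v.survey_year == 2002 and ".01" in v.question_name
]

# Exact phrase: the words must appear together and in order.
exact_phrase = [v for v in VARIABLES if "HOURS WORKED" in v.title]

# "Not" search: exclude titles that contain a term.
without_job = [v for v in VARIABLES if "JOB" not in v.title]

print(len(first_loop_2002), len(exact_phrase), len(without_job))
```

Note that the last record contains both "HOURS" and "WORKED" but not the phrase "HOURS WORKED," so the exact-phrase filter drops it. Entering one word at a time would have kept it.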

3. How can I be sure my topic is really NOT in the data set?

The best way to demonstrate this is with a couple of examples:

A. Does the NLSY97 contain questions about respondents' participation in high school athletics?

  • First, look in the NLSY97 Asterisk Tables, the Topical Guide in the NLSY97 User's Guide, and the first few rounds of NLSY97 questionnaires (when the respondents were still in high school). Nothing.
  • Now let's try searching in the NLS Investigator using "athletic" and related words such as "sport" and "football." Use the singular or part of a word to be more inclusive. Use the "Codebook" search option with OR, or enter each word in a separate search. Sometimes the list of NLSY97 variables you get back is long, and you have to scroll down to make sure none of the questions apply to your topic of interest. Nothing.
  • Now let's try "Codebook" "contains" "extra" AND "Codebook" "contains" "curricular." Nothing, again.
  • The NLSY97 does not ask questions about respondents' participation in high school sports or any extra-curricular activities for that matter.
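
The difference between the OR search and the AND search in this example can be sketched as follows. This is a simplified illustration under stated assumptions: the codebook text is invented (it is not taken from the NLSY97), the helpers search_any and search_all are hypothetical, and matching is assumed to be case-insensitive.

```python
# Invented codebook text for illustration only; not real NLSY97 codebook pages.
CODEBOOK_PAGES = {
    "PAGE-1": "How many hours per week do you usually watch television?",
    "PAGE-2": "Did you take any extra classes or lessons outside of school?",
    "PAGE-3": "What was the main sport you followed on television?",
}


def search_any(pages, terms):
    """OR search: keep pages whose text contains at least one term."""
    return [
        name for name, text in pages.items()
        if any(term.lower() in text.lower() for term in terms)
    ]


def search_all(pages, terms):
    """AND search: keep pages whose text contains every term."""
    return [
        name for name, text in pages.items()
        if all(term.lower() in text.lower() for term in terms)
    ]


or_hits = search_any(CODEBOOK_PAGES, ["athletic", "sport", "football"])
and_hits = search_all(CODEBOOK_PAGES, ["extra", "curricular"])
print(or_hits, and_hits)
```

In this toy data, the OR search returns one page that mentions "sport" but is not about playing sports, which is why it pays to read through the hits before concluding. The AND search returns nothing, because no page contains both "extra" and "curricular."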

B. Does the NLSY97 have any information about respondents' personality type?

  • First, let's look in the NLSY97 User's Guide. Nothing--even in the Attitudes sub-section, where it would potentially fall.
  • Next, let's look in the Asterisk Tables. Yes! Under Section VIII, "Attitudes, Behaviors, and Time Use," we learn that round 6 (2002 survey) contained a sequence of questions about the "Youth's Perception of Own Personality Traits."
  • Now let's look in the round 6 questionnaire to see the exact questions. They are not easy to find. They appear in the Self-Administered Questionnaire section, and if you search on the word "trait," you find the beginning of the sequence: YSAQ-282I. If we look through the question sequence, we can see that words such as "agreeable" and "dependable" are in the question text. So we can use those words to search for the question sequence in the NLS Investigator, or we can search on Question Name "YSAQ-282" to get the sequence.
  • Next, go into the NLS Investigator, and search by "Question Name (pick from list)" "starts with" "YSAQ-2" AND "Survey Year" "equals" "2002." If we scroll down, we can see a question sequence about personality traits.

4. A few more search hints for common search problems.

  • When searching for basic core concepts such as demographics, hours worked per year, or highest grade completed, do the following:

    • NLSY79: Search on the "Key Variables" "Area of Interest"
    • NLSY79 Child/Young Adult: Search on the "Child Background" "Area of Interest"
    • NLSY97: Search for "Variable Type" "equals" "Created Variables" combined with a "Word in Title" search on, for example, "hours" or "grade"

  • In the NLSY79, the "Birth Record" Area of Interest contains raw survey data about the respondent's children, and the "Fertility and Relationship History/Created" Area of Interest contains created variables that have undergone extensive cleaning and editing by survey staff. We strongly recommend using the latter.