Tutorial: Variable Search in the NLS Investigator

B. How to systematically look up a specific research topic in the NLS documentation and then find the corresponding variables in the NLS Investigator

Now we'll go through a specific example of a research question, and describe how to find the variables of interest using documentation and the NLS Investigator.

Effect of Marital Status on Labor Supply (NLSY79):

Preview of Steps

  1. Construct a list of the concepts that you want to measure
  2. a.  Check the documentation to find information on the concepts from step 1
     b.  Search for the appropriate variables in the NLS Investigator

Step 1: Construct a list of the concepts that you want to measure

  1. Dependent Variable
    Hours worked in a calendar year
  2. Key Independent Variable
    Current marital status
  3. Control Variables
    AFQT score
    Year of birth
    Race/ethnicity
    Sex
    Spouse labor supply
    Educational attainment
    Number of children
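
While working through Step 2, it can help to keep the concept list above as a small checklist. The following is a minimal Python sketch of that idea. The dictionary layout and the names `CONCEPTS` and `unresolved` are illustrative assumptions and are not part of the NLS Investigator.

```python
# Concept checklist for the marital status / labor supply example.
# The structure and all names here are illustrative assumptions.
CONCEPTS = {
    "dependent": ["Hours worked in a calendar year"],
    "key_independent": ["Current marital status"],
    "control": [
        "AFQT score",
        "Year of birth",
        "Race/ethnicity",
        "Sex",
        "Spouse labor supply",
        "Educational attainment",
        "Number of children",
    ],
}


def unresolved(concepts, found):
    """Return the concepts for which no variable has been located yet."""
    return [c for group in concepts.values() for c in group if c not in found]


# Race/ethnicity and sex are already pre-selected in the NLS Investigator,
# so they can be marked as found before any searching begins.
found = {"Race/ethnicity", "Sex"}
print(unresolved(CONCEPTS, found))
```

As each search in Step 2 succeeds, add the concept to `found`; the checklist is complete when `unresolved` returns an empty list.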

Step 2:

  1. Check the documentation to find information on the concepts from step 1
  2. Search for the appropriate variables in the NLS Investigator

Note that the NLSY79 User's Guide and Asterisk Table will be particularly helpful here. They provide information on the topics of interest, describe the types of variables that may be available, and point to particular Areas of Interest and other information that will help you search for variables in the NLS Investigator.

A. Dependent Variable: Hours worked in a calendar year

  1. Read the entry on Work Experience in the NLSY79 Topical Guide. The created variables list indicates that "number of hours worked in past calendar year" is created for each survey wave. We can search using a few of these words to find this set of variables.
  2. Search using "Word in Title" "contains" "hours" AND "Word in Title" "contains" "year"; this search yields 28 variables. To narrow the results to only the variables we want, add "Word in Title" "does not contain" "norc"; this yields 23 variables, one for each round, titled "Number of hours worked in past calendar year."
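
The logic of the search above (two "contains" criteria joined by AND, plus one "does not contain" criterion) can be sketched in Python as follows. This is only an illustration of the boolean logic: the sample titles are hypothetical, and the case-insensitive substring matching is an assumption about how the NLS Investigator compares words, not documented behavior.

```python
def title_matches(title, contains=(), excludes=()):
    """True if the title has every `contains` term and none of the `excludes` terms.

    Matching is case-insensitive substring matching (an assumption).
    """
    lowered = title.lower()
    has_all = all(term.lower() in lowered for term in contains)
    has_excluded = any(term.lower() in lowered for term in excludes)
    return has_all and not has_excluded


# Hypothetical titles for illustration; not actual NLS Investigator output.
titles = [
    "NUMBER OF HOURS WORKED IN PAST CALENDAR YEAR",
    "HOURS WORKED IN PAST CALENDAR YEAR (NORC VERSION)",
    "NUMBER OF WEEKS WORKED IN PAST CALENDAR YEAR",
]

hits = [
    t for t in titles
    if title_matches(t, contains=("hours", "year"), excludes=("norc",))
]
print(hits)
```

Only the first title survives all three criteria: the second is dropped by the "does not contain" test and the third never matches "hours".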

B. Key Independent Variable: Current marital status

  1. Read the Topical Guide entry on Marital Status, Marital Transitions & Attitudes. This section notes that the "Key Variables" Area of Interest includes created marital status variables for each interview date.
  2. Search using "Word in Title" "contains" "marital status" AND "Area of Interest" "equals" "Key Variables"; this yields 46 variables, two for each round: marital status and marital status (collapsed).

C. Control Variables

AFQT score:

  1. Read the Topical Guide entry on Aptitude, Achievement & Intelligence Scores. Table 2 in that entry shows that we should look under the "Profiles" Area of Interest, and it even gives the reference numbers to look for. Because we are looking for one particular score, searching for the initials AFQT works as well.
  2. Search using "Word in Title" "contains" "AFQT"; this yields 3 scores from which to choose.

Year of birth:

  1. See the Asterisk Table or the Topical Guide entry on Age. In the latter, Table 1 shows that "Date of Birth of R" is available in 1979 and 1981, and it provides the reference numbers (R00003.00, R00005.00, R04101.00, R04103.00).
  2. Use the "Reference Number" "starts with" search to obtain the four birth date variables.
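
A "starts with" search on reference numbers amounts to a prefix match. The sketch below illustrates this with the four reference numbers quoted above. The `catalog` list is hypothetical (its last two entries are placeholders added only to show non-matches), and the helper name `starts_with_any` is an assumption for illustration.

```python
# Reference numbers quoted from Table 1 of the Topical Guide entry on Age.
BIRTH_DATE_REFS = ("R00003.00", "R00005.00", "R04101.00", "R04103.00")


def starts_with_any(reference_number, prefixes):
    """True if the reference number begins with any of the given prefixes."""
    return reference_number.startswith(tuple(prefixes))


# Hypothetical catalog: the last two entries are placeholders, not looked-up values.
catalog = [
    "R00003.00",
    "R00005.00",
    "R04101.00",
    "R04103.00",
    "R00001.00",
    "R09999.00",
]

selected = [ref for ref in catalog if starts_with_any(ref, BIRTH_DATE_REFS)]
print(selected)
```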

Race/ethnicity:

No search is needed: this variable is already pre-selected in the NLS Investigator.

Sex:

No search is needed: this variable is already pre-selected in the NLS Investigator.

Spouse labor supply:

  1. See the Asterisk Table or the Topical Guide entry on Marital Status, Marital Transitions & Attitudes. In the latter, we see under the topic "Spousal Characteristics" that weeks and hours worked in the past calendar year are two of the available variables.
  2. Choose "Word in Title" "contains" "spouse," "Area of Interest" "equals" "marriage," and "Word in Title" "contains" "worked"; this yields 143 variables, including the spouse's number of weeks worked in the past calendar year and hours per week worked during those weeks.

Educational attainment:

  1. See the Topical Guide entry on Educational Attainment and School Enrollment. In the "data files" section, we find out that a yearly created variable for highest grade completed is found in the "Key Variables" Area of Interest.
  2. Choose "Word in Title" "contains" "highest," "Word in Title" "contains" "grade," and "Area of Interest" "equals" "key variables"; this yields 46 variables, two for each round. These variables are "Highest grade completed as of May 1 of survey year" and a revised version of the same variable. Use the revised one, because survey staff re-cleaned it and made it consistent across years.

Number of children:

  1. See the Asterisk Table or the Topical Guide entry on Fertility. We see there are variables on this topic, and that the "Fertility and Relationship History/Created" Area of Interest might be a good place to look.
  2. Choose "Word in Title" "contains" "number," "Word in Title" "contains" "children," and "Area of Interest" "equals" "fertility and relationship history/created"; this yields 62 variables, including number of children ever born and number of bio/step/adopted children in household.
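
Every search in this section combines criteria of the form (field, operator, term), all of which must hold. The following Python sketch models that pattern in general form. The record layout, the field names `title` and `area`, the `OPERATORS` table, and the sample records are all hypothetical, and case-insensitive matching is an assumption.

```python
# Operators named after the NLS Investigator search choices used in this section.
# Case-insensitive comparison is an assumption made for this sketch.
OPERATORS = {
    "contains": lambda value, term: term.lower() in value.lower(),
    "does not contain": lambda value, term: term.lower() not in value.lower(),
    "equals": lambda value, term: value.lower() == term.lower(),
    "starts with": lambda value, term: value.lower().startswith(term.lower()),
}


def search(variables, criteria):
    """Return the variables that satisfy every (field, operator, term) criterion."""
    return [
        v for v in variables
        if all(OPERATORS[op](v[field], term) for field, op, term in criteria)
    ]


# Hypothetical records for illustration; not actual NLS Investigator output.
variables = [
    {"title": "NUMBER OF CHILDREN EVER BORN",
     "area": "FERTILITY AND RELATIONSHIP HISTORY/CREATED"},
    {"title": "NUMBER OF JOBS EVER REPORTED",
     "area": "KEY VARIABLES"},
    {"title": "AGE OF YOUNGEST CHILD IN HOUSEHOLD",
     "area": "FERTILITY AND RELATIONSHIP HISTORY/CREATED"},
]

criteria = [
    ("title", "contains", "number"),
    ("title", "contains", "children"),
    ("area", "equals", "fertility and relationship history/created"),
]

hits = search(variables, criteria)
print([v["title"] for v in hits])
```

Writing the criteria as data makes it easy to record, for each concept, exactly which search produced the variables you later add to your tagset.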

Additional Information

Next, you would create a tagset of your specific variables and then extract your data set.