Tutorial: Linking Roster Items Across Rounds in the NLSY97

Example: NLSY97 Household Roster

Objective:  Our goal is to link household roster items across survey rounds using unique household member ID codes provided in the NLSY97 data set. This tutorial explains how to find the same member of the respondent's household across different survey rounds in the NLSY97.

Note that the additional information at the end of the tutorial provides some basic guidance on using the same concepts to investigate other rosters in the NLSY97.

Knowledge Assumed: This tutorial assumes that you already know how to use the NLS Web Investigator to create a tag set that saves your variables and to extract data. If you need assistance with the NLS Web Investigator before starting this tutorial, please contact NLS User Services.

Background Reading: To understand how to find the same household member across different survey rounds, you should first know how rosters are created. A detailed discussion is provided in the Types of Variables section of the NLSY97 Users Guide. Additional information about the household roster can be found in the Household Composition section of the guide.

Preview of Steps

  1. Find the household roster variables.
  2. Extract selected roster data.
  3. Compare Unique ID (UID) codes across rounds.


Step 1: Find the household roster variables

The household roster is basically a list of all the people in the respondent's household. It can be pictured as a grid or table, where each person in the household gets a line on the roster which contains their information. For example, a simple round 1 roster might look like this:

Line number | Household member name | Age | Highest Grade Completed | Relationship to Respondent

So, in round 1, all household roster variables labeled "HH Member 01" (or similar terms like Person 01, Line 01, etc.) have information about John, variables labeled "HH Member 02" have information about his father Steve, and so on.
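In an extracted data file, each roster line is spread across columns that share a line suffix. For example, HHI2_UID.02 and HHI2_HIGHGRADE.02 both describe household member 02. The following Python sketch regroups one respondent's wide record into one record per roster line. It assumes that column names keep the `QUESTION.NN` form, with NN being the two-digit line number. The helper name `roster_lines` and the UID and grade values are made-up placeholders, not actual NLSY97 data.

```python
from collections import defaultdict


def roster_lines(record, prefix):
    """Group wide roster columns like 'HHI2_UID.02' into {line: {item: value}}.

    Assumes each roster column is named '<prefix>_<ITEM>.<NN>', where NN is the
    two-digit roster line number (an assumption made for this sketch).
    """
    lines = defaultdict(dict)
    for column, value in record.items():
        if not column.startswith(prefix + "_") or "." not in column:
            continue  # skip columns that belong to other rosters or sections
        item, line = column[len(prefix) + 1:].rsplit(".", 1)
        lines[int(line)][item] = value
    return dict(lines)


# Hypothetical round 1 record for one respondent (placeholder values).
record = {
    "HHI2_UID.01": 101, "HHI2_HIGHGRADE.01": 9,
    "HHI2_UID.02": 102, "HHI2_HIGHGRADE.02": 12,
}
print(roster_lines(record, "HHI2"))
# {1: {'UID': 101, 'HIGHGRADE': 9}, 2: {'UID': 102, 'HIGHGRADE': 12}}
```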

Rosters are used not only to organize data within one round but also to link information across rounds. For example, Susan's gender won't change in round 2, but her highest grade completed probably will, so a researcher might take the gender code from one round and the highest grade completed from another. Doing so requires linking items on a roster across survey rounds using a unique identification number, or UID.

This tutorial explains how to find the same household member across rounds using the UID. The first step is to use NLS Web Investigator to find the household roster variables.

  1. Let's start with round 1. The household roster variables for round 1 have question names starting with "HHI2". In NLS Web Investigator, select the "Question Name (search text)" search. Type "HHI2" into the search box and submit. Now, you’ll see a long list of variables that begin with HHI2.  If you scroll down through the list you'll see various characteristics of household members, such as gender, race/ethnicity, marital status, age, etc.
  2. Tag all the variables that describe characteristics that you're interested in. For this example, please tag highest grade completed (HHI2_HIGHGRADE) for all household members.
  3. Next, tag the unique ID variables (HHI2_UID) for all household members.
  4. You can use the same process to find the corresponding variables from round 2. The round 2 roster variables have question names starting with "HHI_" (without the "2"), so search for "HHI_" in the NLS Web Investigator. Make sure you tag the HHI_UID variable for all household members, along with HHI_HIGHGRADE for our example.

    Note that searching for question name=HHI_ will return the household roster variables from round 2 and all subsequent rounds, since they all have the same question name.  If you only want round 2, you can do a combined search in which question name=HHI_ and survey year=1998. This will give you a much shorter list to pick from.
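Because the same question name repeats in every round after round 1, a name search alone returns several years of variables. The following Python sketch shows that logic on a short, made-up list of (question name, survey year) pairs: a prefix match on "HHI_" picks up round 2 and later rounds but not the round 1 "HHI2_" names, and adding a year condition narrows the result to round 2. The list and the helper name `select` are hypothetical and only mimic the combined search described above; they are not how the NLS Web Investigator is implemented.

```python
# Hypothetical (question name, survey year) pairs; illustrative only,
# not an actual NLSY97 variable listing.
variables = [
    ("HHI2_UID.01", 1997),
    ("HHI2_HIGHGRADE.01", 1997),
    ("HHI_UID.01", 1998),
    ("HHI_HIGHGRADE.01", 1998),
    ("HHI_UID.01", 1999),
]


def select(variables, prefix, year=None):
    """Return question names starting with prefix, optionally limited to one survey year."""
    return [
        name
        for name, survey_year in variables
        if name.startswith(prefix) and (year is None or survey_year == year)
    ]


print(select(variables, "HHI_"))        # round 2 and later: ['HHI_UID.01', 'HHI_HIGHGRADE.01', 'HHI_UID.01']
print(select(variables, "HHI_", 1998))  # round 2 only: ['HHI_UID.01', 'HHI_HIGHGRADE.01']
```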


Step 2: Extract the roster data

In step 1, you created a tag set with the unique ID and highest grade completed for the respondent's household members in rounds 1 and 2. Now it's time to run an extract to create a data set and the corresponding SAS, SPSS, or Stata programs.


Step 3: Compare UID codes between rounds

Now that you have your data set, you're ready to start comparing UID codes. The logic of this is as follows:

  1. We'll start by looking at the second household member in round 1 (since the first member is the respondent, who does not appear on the round 2 roster). The unique identification code for this person is found in the variable HHI2_UID.02.
  2. Then we'll look at each UID variable in round 2, that is HHI_UID.01-HHI_UID.14, and see if one has a value that matches the number in HHI2_UID.02. If we find a match, we know that particular entry on the round 2 roster is the same person as the second entry on the round 1 roster.
  3. We'll record the line number on the round 2 roster in a new variable called "position". There will be a "position" variable for each line number (2-17) on the round 1 household roster, so our position variables will be numbered position2-position17.
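The three steps above can be sketched as a small Python function that returns the 1-based round 2 line number holding a given round 1 UID, or 0 when no line matches. The function name `find_position` and the UID values are hypothetical. The sketch assumes UIDs are stored as integers and that negative values are NLS missing-data codes (for example, -4 for a valid skip or -5 for a noninterview). It deliberately never matches a negative value. That is a precaution added here so that an empty round 1 line and an empty round 2 line carrying the same missing code are not mistaken for the same person.

```python
def find_position(r1_uid, r2_uids):
    """Return the round 2 roster line (1-based) whose UID equals r1_uid, else 0.

    r2_uids holds the round 2 UIDs in line order (HHI_UID.01, HHI_UID.02, ...).
    Negative values are treated as missing-data codes and never matched
    (a precaution added in this sketch).
    """
    if r1_uid is None or r1_uid <= 0:
        return 0
    for line, r2_uid in enumerate(r2_uids, start=1):
        if r2_uid == r1_uid:
            return line
    return 0


# Placeholder data: 14 round 2 lines, unused lines coded -4.
print(find_position(102, [104, 102] + [-4] * 12))  # 2
```

Because a UID identifies one person, a roster should contain it at most once, so returning the first match gives the same answer as checking every line in turn.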

Here's sample programming code in SAS, SPSS, or Stata to find round 1 household member 2 on the round 2 household roster. The code uses these shorter variable names:

  • HHI2_UID.02 (round 1 UID) will be shown as R1uid2
  • HHI_UID.01 (round 2 UID) will be shown as R2uid1
  • position2 will be the newly created variable corresponding to household member #2 in round 1
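Typing 31 renames by hand is error-prone, so the short names can be generated from the line numbers instead. The following Python sketch builds the mapping from question names to short names. It assumes the two-digit suffixes shown above, 17 lines on the round 1 roster, and 14 lines on the round 2 roster (the ranges used in step 3). The helper name `rename_map` is hypothetical.

```python
def rename_map(question_prefix, short_prefix, n_lines):
    """Map roster columns like 'HHI2_UID.02' to short names like 'R1uid2'."""
    return {
        f"{question_prefix}.{n:02d}": f"{short_prefix}{n}"
        for n in range(1, n_lines + 1)
    }


# Round 1 roster: lines 01-17; round 2 roster: lines 01-14 (as assumed above).
renames = {**rename_map("HHI2_UID", "R1uid", 17), **rename_map("HHI_UID", "R2uid", 14)}
print(renames["HHI2_UID.02"], renames["HHI_UID.01"])  # R1uid2 R2uid1
```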

Suppose that after running your program you find that position2 = 9. You can then compare HHI2_HIGHGRADE.02 in round 1 with HHI_HIGHGRADE.09 in round 2 to see whether that person completed an additional year of schooling. If position2 = 0, either the person no longer lives in the respondent's household in round 2 or the respondent did not complete a round 2 interview (in which case all round 2 variables have a value of -5).
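The comparison described above can be sketched in Python as follows. The function name `grade_change` and the data are hypothetical. The sketch assumes that grades are stored as integers and that negative values are missing-data codes. Check the codebook for any other special (non-year) codes before treating the difference as years of schooling.

```python
def grade_change(position, r1_grade, r2_grades):
    """Return round 2 minus round 1 highest grade completed, or None if not comparable.

    position is the 1-based round 2 line found for this person (0 = not found).
    r2_grades holds HHI_HIGHGRADE.01, .02, ... in line order.
    """
    if position <= 0:
        return None  # not on the round 2 household roster, or no round 2 interview
    r2_grade = r2_grades[position - 1]  # convert 1-based line to 0-based index
    if r1_grade < 0 or r2_grade < 0:
        return None  # a negative value is a missing-data code, not a grade
    return r2_grade - r1_grade


# Placeholder data: round 1 HHI2_HIGHGRADE.02 = 10; round 2 line 9 = 11.
r2_grades = [-4] * 14
r2_grades[8] = 11  # line 9 is index 8
print(grade_change(9, 10, r2_grades))  # 1
```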

You can use the same code to find the position of round 1 household members #3-17 in round 2. Simply start by creating a new "position" variable for each member (position3, position4, etc.). Then substitute "R1uid3," "R1uid4," etc. for "R1uid2" in the original code.
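Instead of copying the code for each member, the positions for lines 2 through 17 can be computed in a single loop. The following Python sketch uses a different technique from the if-chains above: it builds a dictionary from round 2 UID to line number once and then looks up each round 1 UID. This gives the same result as long as each UID appears at most once on the round 2 roster. The function name `all_positions` and the UID values are hypothetical placeholders. As before, negative missing-data codes are never matched.

```python
def all_positions(r1_uids, r2_uids, first_line=2):
    """Return {'position2': ..., 'position17': ...} for one respondent.

    r1_uids and r2_uids hold the UIDs in roster line order. A value of 0 means
    the round 1 member was not found on the round 2 household roster.
    """
    # Index round 2 lines by UID once, ignoring negative missing-data codes.
    line_of_uid = {uid: line for line, uid in enumerate(r2_uids, start=1) if uid > 0}
    return {
        f"position{line}": line_of_uid.get(uid, 0) if uid > 0 else 0
        for line, uid in enumerate(r1_uids, start=1)
        if line >= first_line
    }


# Placeholder data: 17 round 1 lines and 14 round 2 lines, unused lines coded -4.
r1_uids = [101, 102, 103, 104] + [-4] * 13
r2_uids = [104, 102] + [-4] * 12
positions = all_positions(r1_uids, r2_uids)
print(positions["position2"], positions["position3"], positions["position4"])  # 2 0 1
```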

Additional Information

Note that rosters are also used in a number of other sections of the NLSY97 data.

The techniques described here for using the household roster can be applied to these other rosters as well. Figure 4 in the Types of Variables: Raw, Symbols, Rosters & Created section of the NLSY97 guide shows what rosters are available in each round; the question name can be used to find the various items in the NLS Web Investigator.

Special note about UID codes on the household, nonresident, biochild, bioadoptchild, partners, otherparents, and cumpartners rosters:

These seven rosters are interlinked. People related to or living with the respondent will have the same UID code on all seven rosters. This makes it possible to get information about the same person from more than one roster. For example, assume in our code above that position2 = 0, meaning that the person left the respondent's household between rounds 1 and 2. If this person was related to the respondent, he or she will appear on the nonresident roster in round 2 and can be identified using similar programming code (R2NRuid1 = NONHHI_UID.01 from round 2):

SAS:
* Default: not found on the round 2 nonresident roster;
NRposition2 = 0;
if R1uid2 = R2NRuid1 then NRposition2 = 1;
if R1uid2 = R2NRuid2 then NRposition2 = 2;
if R1uid2 = R2NRuid3 then NRposition2 = 3;
* ... and so on through R2NRuid22;

SPSS:
* Default: not found on the round 2 nonresident roster.
compute NRposition2 = 0.
if (R1uid2 = R2NRuid1) NRposition2 = 1.
if (R1uid2 = R2NRuid2) NRposition2 = 2.
if (R1uid2 = R2NRuid3) NRposition2 = 3.
* ... and so on through R2NRuid22.
execute.

Stata:
* Default: not found on the round 2 nonresident roster
gen NRposition2 = 0
replace NRposition2 = 1 if R1uid2 == R2NRuid1
replace NRposition2 = 2 if R1uid2 == R2NRuid2
replace NRposition2 = 3 if R1uid2 == R2NRuid3
* ... and so on through R2NRuid22

The same programming logic applies to household members who may also appear on the biochild/bioadoptchild roster or the various partners rosters.
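Because the interlinked rosters share UIDs, one lookup can check several rosters in turn, for example the round 2 household roster first and then the round 2 nonresident roster. The following Python sketch returns the first roster and line where a UID appears. The function name `locate`, the roster labels, and the UID values are hypothetical. The sketch assumes each roster's UIDs are given in line order (HHI_UID.NN for the household roster, NONHHI_UID.NN for the nonresident roster) and that negative missing-data codes should never match.

```python
def locate(uid, rosters):
    """Return (roster_name, line) for the first roster containing uid, else None.

    rosters is an ordered list of (name, [UIDs in line order]) pairs for one round.
    Negative missing-data codes are never matched.
    """
    if uid <= 0:
        return None
    for name, uids in rosters:
        for line, candidate in enumerate(uids, start=1):
            if candidate == uid:
                return name, line
    return None


# Placeholder round 2 data: 14 household lines and 22 nonresident lines.
round2 = [
    ("household", [104, 102] + [-4] * 12),
    ("nonresident", [103] + [-4] * 21),
]
print(locate(103, round2))  # ('nonresident', 1)
```

Checking the household roster first mirrors the order used above: a member found there keeps a nonzero position, and only members with position = 0 need to be searched for on the nonresident roster.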