Tutorial objective and prerequisites
Objective
This tutorial explains how to find the same member of the respondent's household across NLSY97 survey rounds by linking household roster items with the unique household member ID (UID) codes provided in the data set.
Knowledge assumed
This tutorial assumes that you already know how to use the NLS Investigator to create a tagset that saves your variables and to extract data. If you need assistance with the NLS Investigator before starting this tutorial, please review the Investigator User Guide or contact NLS User Services.
Background reading
To understand how to find the same household member across different survey rounds, you should first know how rosters are created. A detailed discussion is provided in the Types of Variables section of the NLSY97 Users Guide. Additional information about the household roster can be found in the Household Composition section of the guide.
Example: NLSY97 household roster
Preview of steps
- Step 1: Find the household roster variables
- Step 2: Extract selected roster data
- Step 3: Compare Unique ID (UID) codes across rounds
Additional information at the end of the tutorial provides some basic guidance on using the same concepts to investigate other rosters in the NLSY97.
Step 1: Find the household roster variables
The household roster is a list of all the people in the respondent's household. It can be pictured as a grid or table in which each household member occupies one line containing his or her information. For example, a simple round 1 roster might look like this:
Line number | Household member name | Age | Sex | Highest Grade Completed | Relationship to Respondent |
---|---|---|---|---|---|
1 | John | 17 | M | 11 | [Respondent] |
2 | Steve | 48 | M | 16 | Father |
3 | Mary | 46 | F | 16 | Mother |
4 | Susan | 11 | F | 6 | Sister |
In round 1, all household roster variables labeled HH Member 01 (or similar terms like Person 01, Line 01, etc.) have information about John, variables labeled HH Member 02 have information about his father Steve, and so on.
Rosters are used not only to organize data in one round but also to link information across rounds. For example, Susan's gender won't change in round 2, but her highest grade completed probably will. So, researchers might need to look at the gender code from one round and the highest grade completed from another round. This requires linking items on a roster across survey rounds using a unique identification number, or UID.
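The idea of linking by UID rather than by line number can be sketched in a few lines of Python. Python is used here only for illustration; the tutorial's own code is in SAS, SPSS, and STATA. The UID values and the round 2 roster below are hypothetical. Matching by line number would, for example, pair Mary's round 1 line 3 with Susan's round 2 line 3, while matching by UID pairs each person with himself or herself.

```python
# Hypothetical round 1 and round 2 rosters for the example household above.
# The UID values and the round 2 line order are invented for illustration.
round1 = [
    {"line": 1, "uid": 1, "name": "John", "sex": "M", "highgrade": 11},
    {"line": 2, "uid": 2, "name": "Steve", "sex": "M", "highgrade": 16},
    {"line": 3, "uid": 3, "name": "Mary", "sex": "F", "highgrade": 16},
    {"line": 4, "uid": 4, "name": "Susan", "sex": "F", "highgrade": 6},
]
# In this made-up round 2, Steve has left and Mary and Susan have moved up a line.
round2 = [
    {"line": 1, "uid": 1, "highgrade": 12},
    {"line": 2, "uid": 3, "highgrade": 16},
    {"line": 3, "uid": 4, "highgrade": 7},
]


def link_by_uid(r1, r2):
    """Pair each round 1 member's sex with his or her round 2 highest grade."""
    r2_by_uid = {row["uid"]: row for row in r2}  # index round 2 by UID, not by line
    linked = {}
    for row in r1:
        match = r2_by_uid.get(row["uid"])
        linked[row["name"]] = (row["sex"], match["highgrade"] if match else None)
    return linked


print(link_by_uid(round1, round2)["Susan"])  # ('F', 7)
```

A member who is missing from round 2 (Steve here) simply gets None for the round 2 value.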
The following steps will show you how to find the same household member across rounds using the UID. The first step is to use NLS Investigator to find the household roster variables.
- Start with round 1. The household roster variables for round 1 have question names starting with HHI2. In the NLS Investigator, select the Question Name (search text) search option, type HHI2 into the search box, and submit. You will see a long list of variables that begin with HHI2; scrolling through it, you'll find various characteristics of household members, such as gender, race/ethnicity, marital status, and age.
- Tag all the variables that describe characteristics that you are interested in. For this example, please tag highest grade completed (HHI2_HIGHGRADE) for all household members.
- Next, tag the unique ID variables (HHI2_UID) for all household members.
- You can use the same process to find the corresponding variables from round 2. The round 2 roster variables have question names starting with HHI_, so search for HHI_ in the NLS Investigator. Make sure you tag the HHI_UID variable for all household members, along with HHI_HIGHGRADE for our example.
Note that searching for question name=HHI_ returns the household roster variables from round 2 and all subsequent rounds, since they all have the same question name. If you only want round 2, combine the criteria question name=HHI_ and survey year=1998, which gives you a much shorter list to pick from. The variables tagged for this example are listed below.
Reference Number | Question Name | Variable Title | Year |
---|---|---|---|
R10994.00 | HHI2_HIGHGRADE.01 | HHI2_HIGHGRADE (ROS ITEM) L1 1997 | 1997 |
R10995.00 | HHI2_HIGHGRADE.02 | HHI2_HIGHGRADE (ROS ITEM) L2 1997 | 1997 |
R10996.00 | HHI2_HIGHGRADE.03 | HHI2_HIGHGRADE (ROS ITEM) L3 1997 | 1997 |
R10997.00 | HHI2_HIGHGRADE.04 | HHI2_HIGHGRADE (ROS ITEM) L4 1997 | 1997 |
R10998.00 | HHI2_HIGHGRADE.05 | HHI2_HIGHGRADE (ROS ITEM) L5 1997 | 1997 |
R10999.00 | HHI2_HIGHGRADE.06 | HHI2_HIGHGRADE (ROS ITEM) L6 1997 | 1997 |
R11000.00 | HHI2_HIGHGRADE.07 | HHI2_HIGHGRADE (ROS ITEM) L7 1997 | 1997 |
R11001.00 | HHI2_HIGHGRADE.08 | HHI2_HIGHGRADE (ROS ITEM) L8 1997 | 1997 |
R11002.00 | HHI2_HIGHGRADE.09 | HHI2_HIGHGRADE (ROS ITEM) L9 1997 | 1997 |
R11003.00 | HHI2_HIGHGRADE.10 | HHI2_HIGHGRADE (ROS ITEM) L10 1997 | 1997 |
R11004.00 | HHI2_HIGHGRADE.11 | HHI2_HIGHGRADE (ROS ITEM) L11 1997 | 1997 |
R11005.00 | HHI2_HIGHGRADE.12 | HHI2_HIGHGRADE (ROS ITEM) L12 1997 | 1997 |
R11006.00 | HHI2_HIGHGRADE.13 | HHI2_HIGHGRADE (ROS ITEM) L13 1997 | 1997 |
R11007.00 | HHI2_HIGHGRADE.14 | HHI2_HIGHGRADE (ROS ITEM) L14 1997 | 1997 |
R11008.00 | HHI2_HIGHGRADE.15 | HHI2_HIGHGRADE (ROS ITEM) L15 1997 | 1997 |
R11009.00 | HHI2_HIGHGRADE.16 | HHI2_HIGHGRADE (ROS ITEM) L16 1997 | 1997 |
R11621.00 | HHI2_UID.01 | HHI2_UID (ROS ITEM) L1 1997 | 1997 |
R11622.00 | HHI2_UID.02 | HHI2_UID (ROS ITEM) L2 1997 | 1997 |
R11623.00 | HHI2_UID.03 | HHI2_UID (ROS ITEM) L3 1997 | 1997 |
R11624.00 | HHI2_UID.04 | HHI2_UID (ROS ITEM) L4 1997 | 1997 |
R11625.00 | HHI2_UID.05 | HHI2_UID (ROS ITEM) L5 1997 | 1997 |
R11626.00 | HHI2_UID.06 | HHI2_UID (ROS ITEM) L6 1997 | 1997 |
R11627.00 | HHI2_UID.07 | HHI2_UID (ROS ITEM) L7 1997 | 1997 |
R11628.00 | HHI2_UID.08 | HHI2_UID (ROS ITEM) L8 1997 | 1997 |
R11629.00 | HHI2_UID.09 | HHI2_UID (ROS ITEM) L9 1997 | 1997 |
R11630.00 | HHI2_UID.10 | HHI2_UID (ROS ITEM) L10 1997 | 1997 |
R11631.00 | HHI2_UID.11 | HHI2_UID (ROS ITEM) L11 1997 | 1997 |
R11632.00 | HHI2_UID.12 | HHI2_UID (ROS ITEM) L12 1997 | 1997 |
R11633.00 | HHI2_UID.13 | HHI2_UID (ROS ITEM) L13 1997 | 1997 |
R11634.00 | HHI2_UID.14 | HHI2_UID (ROS ITEM) L14 1997 | 1997 |
R11635.00 | HHI2_UID.15 | HHI2_UID (ROS ITEM) L15 1997 | 1997 |
R11636.00 | HHI2_UID.16 | HHI2_UID (ROS ITEM) L16 1997 | 1997 |
R11636.01 | HHI2_UID.17 | HHI2_UID (ROS ITEM) L17 1997 | 1997 |
R24079.00 | HHI_HIGHGRADE.01 | HHI HIGHGRADE (ROS ITEM) L1 1998 | 1998 |
R24080.00 | HHI_HIGHGRADE.02 | HHI HIGHGRADE (ROS ITEM) L2 1998 | 1998 |
R24081.00 | HHI_HIGHGRADE.03 | HHI HIGHGRADE (ROS ITEM) L3 1998 | 1998 |
R24082.00 | HHI_HIGHGRADE.04 | HHI HIGHGRADE (ROS ITEM) L4 1998 | 1998 |
R24083.00 | HHI_HIGHGRADE.05 | HHI HIGHGRADE (ROS ITEM) L5 1998 | 1998 |
R24084.00 | HHI_HIGHGRADE.06 | HHI HIGHGRADE (ROS ITEM) L6 1998 | 1998 |
R24085.00 | HHI_HIGHGRADE.07 | HHI HIGHGRADE (ROS ITEM) L7 1998 | 1998 |
R24086.00 | HHI_HIGHGRADE.08 | HHI HIGHGRADE (ROS ITEM) L8 1998 | 1998 |
R24087.00 | HHI_HIGHGRADE.09 | HHI HIGHGRADE (ROS ITEM) L9 1998 | 1998 |
R24088.00 | HHI_HIGHGRADE.10 | HHI HIGHGRADE (ROS ITEM) L10 1998 | 1998 |
R24089.00 | HHI_HIGHGRADE.11 | HHI HIGHGRADE (ROS ITEM) L11 1998 | 1998 |
R24090.00 | HHI_HIGHGRADE.12 | HHI HIGHGRADE (ROS ITEM) L12 1998 | 1998 |
R24091.00 | HHI_HIGHGRADE.13 | HHI HIGHGRADE (ROS ITEM) L13 1998 | 1998 |
R24092.00 | HHI_HIGHGRADE.14 | HHI HIGHGRADE (ROS ITEM) L14 1998 | 1998 |
R24093.00 | HHI_UID.01 | HHI UNIQUE ID (ROS ITEM) L1 1998 | 1998 |
R24094.00 | HHI_UID.02 | HHI UNIQUE ID (ROS ITEM) L2 1998 | 1998 |
R24095.00 | HHI_UID.03 | HHI UNIQUE ID (ROS ITEM) L3 1998 | 1998 |
R24096.00 | HHI_UID.04 | HHI UNIQUE ID (ROS ITEM) L4 1998 | 1998 |
R24097.00 | HHI_UID.05 | HHI UNIQUE ID (ROS ITEM) L5 1998 | 1998 |
R24098.00 | HHI_UID.06 | HHI UNIQUE ID (ROS ITEM) L6 1998 | 1998 |
R24099.00 | HHI_UID.07 | HHI UNIQUE ID (ROS ITEM) L7 1998 | 1998 |
R24100.00 | HHI_UID.08 | HHI UNIQUE ID (ROS ITEM) L8 1998 | 1998 |
R24101.00 | HHI_UID.09 | HHI UNIQUE ID (ROS ITEM) L9 1998 | 1998 |
R24102.00 | HHI_UID.10 | HHI UNIQUE ID (ROS ITEM) L10 1998 | 1998 |
R24103.00 | HHI_UID.11 | HHI UNIQUE ID (ROS ITEM) L11 1998 | 1998 |
R24104.00 | HHI_UID.12 | HHI UNIQUE ID (ROS ITEM) L12 1998 | 1998 |
R24105.00 | HHI_UID.13 | HHI UNIQUE ID (ROS ITEM) L13 1998 | 1998 |
R24106.00 | HHI_UID.14 | HHI UNIQUE ID (ROS ITEM) L14 1998 | 1998 |
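The question names in the table follow a regular pattern: a stem, a period, and a two-digit roster line number. As a hedged sketch, a hypothetical helper like the one below can build a checklist of names to confirm that every roster line has been tagged. The line counts (17 for HHI2_UID in 1997, 14 for HHI_UID in 1998) are taken from the table above; the function name is illustrative only.

```python
def roster_question_names(stem, n_lines):
    """Build roster question names such as HHI2_UID.01 ... HHI2_UID.17."""
    # Roster line numbers are zero-padded to two digits in the question name.
    return [f"{stem}.{line:02d}" for line in range(1, n_lines + 1)]


round1_uids = roster_question_names("HHI2_UID", 17)
round2_uids = roster_question_names("HHI_UID", 14)
print(round1_uids[0], round1_uids[-1])  # HHI2_UID.01 HHI2_UID.17
```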
Step 2: Extract selected roster data
In Step 1, you created a tagset with the Unique ID (UID) and Highest Grade Completed for the respondent's household members in rounds 1 and 2. In this step, you will run the extract process to create a data set and corresponding SAS/SPSS/STATA/R programs.
- Click the Save/Download tab in the NLS Investigator.
- Choose either the Basic Download Tab or the Advanced Download Tab.
- Basic downloads include: Tagset, SAS/SPSS/STATA files, Codebook, and Comma-delimited data file.
- Advanced downloads include: Tagset, SAS/SPSS/STATA/R, Codebook, Short Description file, Comma-delimited data file, and the ability to create frequency tables or apply universe restrictors.
- To review the download process, visit the Save/Download Tab section of the Investigator User Guide.
- After you have chosen the Basic or Advanced options, assign a filename and click the download button to process your variable request.
- Once your request has been processed, the files will be available in the Manage Downloads Tab for you to access.
Importing SAS/SPSS/STATA/R files
Instructions for loading files into your statistics software can be found in the Importing Data section of the Investigator User Guide. You can also view our video How to Import NLS Data into Statistical Software.
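If you prefer to work outside SAS, SPSS, STATA, or R, the comma-delimited data file can also be read directly. The Python sketch below rests on an assumption that you should verify against your own download: that the header row holds the reference numbers with the period removed (R11622.00 becomes R1162200). The column mapping, the function name, and the sample row are hypothetical; the short names R1uid2 and R2uid1 match the ones used in Step 3.

```python
import csv
import io

# Hypothetical header-to-name mapping. Assumption: the comma-delimited file's header
# row holds the reference numbers with the period removed (R11622.00 -> R1162200).
COLUMN_MAP = {
    "R1162200": "R1uid2",  # HHI2_UID.02, round 1
    "R2409300": "R2uid1",  # HHI_UID.01, round 2
    "R2409400": "R2uid2",  # HHI_UID.02, round 2
}


def load_extract(handle, column_map):
    """Read the comma-delimited extract into dicts keyed by short variable names."""
    rows = []
    for record in csv.DictReader(handle):
        # Unmapped columns keep their original header; all values are integer codes.
        rows.append({column_map.get(col, col): int(val) for col, val in record.items()})
    return rows


# Made-up one-respondent sample standing in for a real download.
sample = io.StringIO("R1162200,R2409300,R2409400\n7,1,7\n")
print(load_extract(sample, COLUMN_MAP))  # [{'R1uid2': 7, 'R2uid1': 1, 'R2uid2': 7}]
```

For a real file, pass a handle opened with `open(path, newline="")` instead of the `io.StringIO` sample.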
Step 3: Compare Unique ID (UID) codes across rounds
Now that you have your data set, you are ready to start comparing UID codes. The logic of this is as follows:
- Start by looking at the second household member in round 1 (the first member is the respondent, so there is no need to search for him or her). The unique identification code for this person is found in the variable HHI2_UID.02.
- Next, check each round 2 UID variable (HHI_UID.01 through HHI_UID.14) to see whether its value matches the value of HHI2_UID.02. If there is a match, that entry on the round 2 roster is the same person as the second entry on the round 1 roster.
- Record the line number on the round 2 roster in a new variable called position. There will be a position variable for each line number (2-17) on the round 1 household roster, so our position variables will be numbered position2-position17.
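The three steps above can be sketched as a single Python function. This is an illustration of the matching logic, not NLS-supplied code. One assumption is built in and labeled in the comments: negative values are NLS missing-data codes (such as -4 for a valid skip or -5 for a noninterview), so a negative round 1 value is never treated as a UID. Without that check, an empty round 1 line coded -4 would "match" an empty round 2 line also coded -4. The UID values in the usage lines are hypothetical.

```python
def find_position(r1_uid, r2_uids):
    """Return the round 2 line (1-based) whose UID equals r1_uid, or 0 if none does."""
    # Assumption: negative values are NLS missing-data codes (e.g., -4, -5), not UIDs,
    # so they must never count as a match.
    if r1_uid < 0:
        return 0
    for line, uid in enumerate(r2_uids, start=1):
        if uid == r1_uid:
            return line
    return 0


# Hypothetical values for HHI2_UID.02 and HHI_UID.01 ... HHI_UID.14.
r1uid2 = 23
r2_uids = [1, 22, 23, 24, -4, -4, -4, -4, -4, -4, -4, -4, -4, -4]
position2 = find_position(r1uid2, r2_uids)
print(position2)  # 3
```

Because a UID appears at most once on a roster, returning the first match gives the same result as the sample code, where a later matching line would overwrite an earlier one.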
The following sample programming code in SAS, SPSS, and STATA finds round 1 household member 2 on the round 2 household roster.
SAS
/* Run inside a DATA step that reads the extracted data */
position2 = 0;
if R1uid2 = R2uid1 then position2 = 1;
if R1uid2 = R2uid2 then position2 = 2;
if R1uid2 = R2uid3 then position2 = 3;
if R1uid2 = R2uid4 then position2 = 4;
if R1uid2 = R2uid5 then position2 = 5;
if R1uid2 = R2uid6 then position2 = 6;
if R1uid2 = R2uid7 then position2 = 7;
if R1uid2 = R2uid8 then position2 = 8;
if R1uid2 = R2uid9 then position2 = 9;
if R1uid2 = R2uid10 then position2 = 10;
if R1uid2 = R2uid11 then position2 = 11;
if R1uid2 = R2uid12 then position2 = 12;
if R1uid2 = R2uid13 then position2 = 13;
if R1uid2 = R2uid14 then position2 = 14;
/* Negative values are NLS missing-data codes, not UIDs, so they cannot match */
if R1uid2 < 0 then position2 = 0;
Note that more experienced SAS programmers can use arrays to achieve the same result.
SPSS
compute position2 = 0.
if (R1uid2 = R2uid1) position2 = 1.
if (R1uid2 = R2uid2) position2 = 2.
if (R1uid2 = R2uid3) position2 = 3.
if (R1uid2 = R2uid4) position2 = 4.
if (R1uid2 = R2uid5) position2 = 5.
if (R1uid2 = R2uid6) position2 = 6.
if (R1uid2 = R2uid7) position2 = 7.
if (R1uid2 = R2uid8) position2 = 8.
if (R1uid2 = R2uid9) position2 = 9.
if (R1uid2 = R2uid10) position2 = 10.
if (R1uid2 = R2uid11) position2 = 11.
if (R1uid2 = R2uid12) position2 = 12.
if (R1uid2 = R2uid13) position2 = 13.
if (R1uid2 = R2uid14) position2 = 14.
* Negative values are NLS missing-data codes, not UIDs, so they cannot match.
if (R1uid2 < 0) position2 = 0.
execute.
STATA
gen position2 = 0
replace position2 = 1 if R1uid2 == R2uid1
replace position2 = 2 if R1uid2 == R2uid2
replace position2 = 3 if R1uid2 == R2uid3
replace position2 = 4 if R1uid2 == R2uid4
replace position2 = 5 if R1uid2 == R2uid5
replace position2 = 6 if R1uid2 == R2uid6
replace position2 = 7 if R1uid2 == R2uid7
replace position2 = 8 if R1uid2 == R2uid8
replace position2 = 9 if R1uid2 == R2uid9
replace position2 = 10 if R1uid2 == R2uid10
replace position2 = 11 if R1uid2 == R2uid11
replace position2 = 12 if R1uid2 == R2uid12
replace position2 = 13 if R1uid2 == R2uid13
replace position2 = 14 if R1uid2 == R2uid14
* Negative NLS missing-data codes and Stata missing values are not UIDs
replace position2 = 0 if R1uid2 < 0 | missing(R1uid2)
For readability, the sample code uses shortened variable names:
- HHI2_UID.02 (round 1 UID) will be shown as R1uid2
- HHI_UID.01 (round 2 UID) will be shown as R2uid1
- position2 will be the newly created variable corresponding to household member 2 in round 1
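The renaming convention above is mechanical, so it can be expressed as a small lookup. In this hypothetical Python helper, the prefixes apply only to this example: HHI_UID is reused in later rounds, so the "R2" prefix would need to change for round 3 and beyond. NONHHI_UID (the round 2 nonresident roster UID discussed in the special note at the end of the tutorial) is included for completeness.

```python
# Round-specific prefixes for this example only; HHI_UID is reused in later rounds,
# so the "R2" prefix would need changing for round 3 and beyond.
PREFIXES = {"HHI2_UID": "R1uid", "HHI_UID": "R2uid", "NONHHI_UID": "R2NRuid"}


def short_name(question_name):
    """Map a question name such as HHI2_UID.02 to its short name, R1uid2."""
    stem, line = question_name.rsplit(".", 1)
    return f"{PREFIXES[stem]}{int(line)}"  # int() drops the zero padding


print(short_name("HHI2_UID.02"), short_name("HHI_UID.01"))  # R1uid2 R2uid1
```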
Suppose that after running your program you find that position2 = 9. You can then compare HHI2_HIGHGRADE.02 in round 1 with HHI_HIGHGRADE.09 in round 2 to see whether that person completed an additional year of schooling.
- If position2 = 0, this means that the person no longer lives in the respondent's household in round 2 or the respondent did not complete a round 2 interview (and all variables will have a value of -5 for that round).
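Once a position is known, the grade comparison described above is an index lookup into each round's HIGHGRADE variables. This Python sketch is a hypothetical illustration: it returns None when the member was not found (position 0) or when either grade is negative, on the assumption that negative values are NLS missing-data codes rather than grades. The grade values are made up.

```python
def grade_change(member, position, r1_grades, r2_grades):
    """Years of schooling gained between rounds, or None if it cannot be computed."""
    if position == 0:  # not found on the round 2 roster
        return None
    g1 = r1_grades[member - 1]    # HHI2_HIGHGRADE.<member>
    g2 = r2_grades[position - 1]  # HHI_HIGHGRADE.<position>
    # Assumption: negative values are missing-data codes, not grades.
    if g1 < 0 or g2 < 0:
        return None
    return g2 - g1


# Hypothetical grades: round 1 member 2 completed grade 10 and is on round 2 line 3.
r1_grades = [11, 10, 16]
r2_grades = [12, 16, 11]
print(grade_change(2, 3, r1_grades, r2_grades))  # 1
```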
Use the same code to find the round 2 positions of round 1 household members 3-17: create a new position variable for each member (position3, position4, etc.), then substitute R1uid3, R1uid4, etc. for R1uid2 in the original code.
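Repeating the block sixteen times works, but a loop, like the SAS array approach mentioned above, is shorter and less error-prone. This Python sketch builds position2 through positionN for one respondent from the list of round 1 UIDs (HHI2_UID.01 onward) and the list of round 2 UIDs. As before, it assumes negative values are missing-data codes that must never match; the function names and data are hypothetical.

```python
def find_position(r1_uid, r2_uids):
    """1-based round 2 line whose UID equals r1_uid, or 0 if there is no match."""
    if r1_uid < 0:  # assumption: negative values are missing-data codes
        return 0
    for line, uid in enumerate(r2_uids, start=1):
        if uid == r1_uid:
            return line
    return 0


def all_positions(r1_uids, r2_uids, first_member=2):
    """Build position2 ... positionN for one respondent in a single loop."""
    return {
        f"position{member}": find_position(r1_uids[member - 1], r2_uids)
        for member in range(first_member, len(r1_uids) + 1)
    }


# Hypothetical values for HHI2_UID.01-.04 and HHI_UID.01-.04.
print(all_positions([1, 7, 9, -4], [1, 9, 7, -4]))
# {'position2': 3, 'position3': 2, 'position4': 0}
```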
Additional information
Rosters are used in a number of other sections of the NLSY97 data:
- Children: Marital History, Children & Fertility
- Non-resident relatives of the respondent: Characteristics of Non-Residential Relatives
- Spouses and partners: Marital History, Children & Fertility
- Employers: Employment
- Schools: Education
The techniques described here for using the household roster can be applied to these other rosters as well. Figure 4 in the Types of Variables: Raw, Symbols, Rosters & Created section of the NLSY97 guide shows what rosters are available in each round; the question name can be used to find the various items in the NLS Investigator.
Special note about UID codes and rosters
The household, nonresident, biochild, bioadoptchild, partners, otherparents, and cumpartners rosters are interlinked: a person related to or living with the respondent keeps the same UID code on each of these seven rosters on which he or she appears. This makes it possible to get information about the same person from more than one roster. For example, suppose the code above gives position2 = 0, meaning that the person left the respondent's household between rounds 1 and 2. If this person was related to the respondent, he or she will appear on the round 2 nonresident roster and can be identified with similar programming code, where R2NRuid1 stands for NONHHI_UID.01 from round 2:
SAS
NRposition2 = 0;
if R1uid2 = R2NRuid1 then NRposition2 = 1;
if R1uid2 = R2NRuid2 then NRposition2 = 2;
if R1uid2 = R2NRuid3 then NRposition2 = 3;
/* ...and so on through R2NRuid22 */
if R1uid2 < 0 then NRposition2 = 0; /* negative values are missing-data codes */

SPSS
compute NRposition2 = 0.
if (R1uid2 = R2NRuid1) NRposition2 = 1.
if (R1uid2 = R2NRuid2) NRposition2 = 2.
if (R1uid2 = R2NRuid3) NRposition2 = 3.
* ...and so on through R2NRuid22.
if (R1uid2 < 0) NRposition2 = 0.
execute.

STATA
gen NRposition2 = 0
replace NRposition2 = 1 if R1uid2 == R2NRuid1
replace NRposition2 = 2 if R1uid2 == R2NRuid2
replace NRposition2 = 3 if R1uid2 == R2NRuid3
* ...and so on through R2NRuid22
replace NRposition2 = 0 if R1uid2 < 0 | missing(R1uid2)
The same programming logic applies to household members who may also appear on the biochild/bioadoptchild roster or the various partners rosters.
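Because the UID is shared across the interlinked rosters, a two-stage lookup follows naturally: search the round 2 household roster first and, if the person is not there, search the round 2 nonresident roster (up to 22 lines, R2NRuid1-R2NRuid22). The Python sketch below is a hypothetical illustration of that order, again assuming negative values are missing-data codes; the function names and data are invented.

```python
def find_position(uid, roster_uids):
    """1-based line on a roster whose UID equals uid, or 0 if none does."""
    if uid < 0:  # assumption: negative values are missing-data codes, not UIDs
        return 0
    for line, candidate in enumerate(roster_uids, start=1):
        if candidate == uid:
            return line
    return 0


def locate_member(r1_uid, r2_household_uids, r2_nonresident_uids):
    """Look for a round 1 member on the round 2 household roster, then the nonresident one."""
    position = find_position(r1_uid, r2_household_uids)
    if position:
        return ("household", position)
    nr_position = find_position(r1_uid, r2_nonresident_uids)
    if nr_position:
        return ("nonresident", nr_position)
    return (None, 0)


# Hypothetical data: UID 12 left the household and appears on nonresident line 2.
print(locate_member(12, [1, 3, 4], [5, 12]))  # ('nonresident', 2)
```

The same pattern extends to the biochild, bioadoptchild, and partner rosters by adding further lookups in whatever order suits the analysis.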