
Linking NLSY79 Mothers and Their Children

Tutorial objective and prerequisites

Objective

The goal is to link NLSY79 mothers with their children. This tutorial explains the general logic to link mothers and children of any age covered in the Children of the NLSY79. The tutorial then gives a specific example of using data on mothers and young adult daughters by creating two variables: (1) whether the mother had a first birth prior to age 18, and (2) whether the daughter had a first birth prior to age 18. This allows one to examine intergenerational correlations in teenage childbearing.

Knowledge assumed

This tutorial assumes that you already know how to use the NLS Investigator to create a tagset that saves your variables and to extract data. If you need assistance with the NLS Investigator before starting this tutorial, please review the Investigator User Guide or contact NLS User Services.

Background reading

To understand how to link mother and child/young adult files, see the NLSY79 Child and Young Adult Users Guide section on Linking Children, Young Adults and Mothers and Appendix E and Appendix F for sample SPSS and SAS programs. In the NLSY79 User's Guide, see the sections on Age and Fertility.

Example: Intergenerational linking of NLSY79 mothers and their young adult daughters

Preview of steps

  1. Find and extract the Child/Young Adult variables
  2. Find and extract the NLSY79 variables
  3. Merge NLSY79 and Child/Young Adult data files and create new fertility variables

The Additional information section at the end gives the statistics produced by the sample program and suggestions for extending the tutorial.

Step 1: Find and extract the Child/Young Adult variables

Find and extract the respondent IDs, age at first birth, and other needed variables in the Child/Young Adult data set using the NLS Investigator.

  1. Start by finding the mother and child/young adult IDs.
    • Select variables by reference number and pick "C000" and then submit. C00001.00 is the respondent ID from the child/young adult file, and C00002.00 is the mother ID. Tag these two variables.
    • Note that the respondent ID, C00001.00, is a comprehensive ID variable, created for all children regardless of age or young adult status.
    • For this example, one could also use the Young Adult ID, Y00001.00, which will have the same value as C00001.00, but exists only for those children who participate at age 15 or older.
  2. Find age at first birth.
    • Search on the YA Fertility and Relationship Data/Created Area of Interest and Survey Year = 2006 (or whatever the most recent survey year is). Y12111.00 is age at first birth at the most recent interview the young adult completed. Tag Y12111.00.
  3. Find the gender variable, since this tutorial looks at young adult females, and find whether the respondent was at least 18 years old at her last young adult interview.
    • Select variables by the YA Common Key Variables Area of Interest.
    • Scroll down and select gender (Y06774.00), most recent young adult interview year (Y12051.00), and age at each young adult interview (Y19485.00, Y16727.00, Y14343.00, Y11924.00, Y09748.00, Y06776.00, Y03424.00).
  4. Run an extract to create the data set and corresponding SAS/SPSS/STATA/R program.

Reference Number  Question Name  Variable Title                                      Year
C0000100          CPUBID         ID CODE OF CHILD                                    2006
C0000200          MPUBID         ID CODE OF MOTHER OF CHILD                          2006
Y0342400          YADULT.AGE     YOUNG ADULT AGE                                     1994
Y0677400          YASEX          SEX OF YOUNG ADULT                                  2006
Y0677600          AGEINT96       AGE OF YOUNG ADULT (IN YEARS) AT DATE OF INTERVIEW  1996
Y0974800          AGEINT98       AGE OF YOUNG ADULT (IN YEARS) AT DATE OF INTERVIEW  1998
Y1192400          AGEINT2000     AGE OF YOUNG ADULT (IN YEARS) AT DATE OF INTERVIEW  2000
Y1205100          LASTINTYR      YEAR OF MOST RECENT YOUNG ADULT INTERVIEW           2006
Y1211100          AGE1B          AGE OF R AT 1ST BIRTH                               2006
Y1434300          AGEINT2002     AGE OF YOUNG ADULT (IN YEARS) AT DATE OF INTERVIEW  2002
Y1672700          AGEINT2004     AGE OF YOUNG ADULT (IN YEARS) AT DATE OF INTERVIEW  2004
Y1948500          AGEINT2006     AGE OF YOUNG ADULT (IN YEARS) AT DATE OF INTERVIEW  2006
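
The step instructions write reference numbers in dotted form (Y03424.00), while the table and the sample programs use the undotted form (Y0342400). As a rough aid for keeping the two straight, the Step 1 variables can be collected in a small Python lookup. This is only a sketch: the constant names and the helper function are hypothetical, and only the reference numbers and years come from the table above.

```python
# Reference numbers transcribed from the Step 1 table.
# The Python names on the left are hypothetical labels chosen for this sketch.
CHILD_ID = "C0000100"
MOTHER_ID = "C0000200"
YA_SEX = "Y0677400"
YA_LAST_INTERVIEW_YEAR = "Y1205100"
YA_AGE_FIRST_BIRTH = "Y1211100"

# Young adult age at interview, keyed by survey year.
YA_AGE_AT_INTERVIEW = {
    1994: "Y0342400",
    1996: "Y0677600",
    1998: "Y0974800",
    2000: "Y1192400",
    2002: "Y1434300",
    2004: "Y1672700",
    2006: "Y1948500",
}


def dotted_form(refnum):
    """Convert 'Y0342400' to the dotted form 'Y03424.00' used in the step text."""
    return f"{refnum[:-2]}.{refnum[-2:]}"


dotted_ages = [dotted_form(ref) for ref in YA_AGE_AT_INTERVIEW.values()]
```

Keying the age variables by survey year makes it easy to pick the age that matches the year of the most recent interview later on.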


Using the NLS Investigator

To create a tagset of specific variables and then extract the data set, use the Save / Download Tab in the NLS Investigator.

Step 2: Find and extract the NLSY79 variables

Find and extract comparable variables (to those in Step 1) in the NLSY79 data set using the NLS Investigator.

  1. Start by finding the NLSY79 respondent ID.
    • Select variables by reference number and pick "R000" and then submit. R00001.00 is the respondent ID for the mother. Tag this variable.
    • Note that R00001.00 = C00002.00 in the child/young adult data set.
  2. Combine three search criteria: Word in Title "Birth", Search Variable Title "Age", and the Fertility and Relationship History/Created Area of Interest.
    • From the fairly long list, you can find the age at first birth created variables from 1982 forward (R08988.40, R11468.32, R15220.39, R18927.39, R22598.39, R24480.39, R28778.00, R30768.44, R34079.00, R36590.49, R40094.49, R44449.00, R50877.00, R51730.00, R64866.00, R70144.00, R77120.00, R85045.00, T09962.00).
    • Note that you will need this variable for each year because if the respondent misses an interview, it is not created for that interview year.
  3. Searching on Word in Title "Age" and the Key Variables Area of Interest will provide a list of created variables for age at the interview date, which you need from 1982 forward (R08983.10, R11451.10, R15203.10, R18910.10, R22581.10, R24455.10, R28713.00, R30750.00, R34017.00, R36571.00, R40076.00, R44187.00, R50817.00, R51670.00, R64798.00, R70075.00, R77048.00, R84972.00, T09890.00).
  4. Run an extract to create the data set and corresponding SAS/SPSS/STATA/R program (see "Using the NLS Investigator" in Step 1).

Reference Number  Question Name  Variable Title              Year
R0000100          CASEID         IDENTIFICATION CODE         1979
R0898310          *Created       AGE OF R AT INTERVIEW DATE  1982
R0898840          *Created       AGE OF R AT 1ST BIRTH       1982
R1145110          *Created       AGE OF R AT INTERVIEW DATE  1983
R1146832          *Created       AGE OF R AT 1ST BIRTH       1983
R1520310          *Created       AGE OF R AT INTERVIEW DATE  1984
R1522039          *Created       AGE OF R AT 1ST BIRTH       1984
R1891010          *Created       AGE OF R AT INTERVIEW DATE  1985
R1892739          *Created       AGE OF R AT 1ST BIRTH       1985
R2258110          *Created       AGE OF R AT INTERVIEW DATE  1986
R2259839          *Created       AGE OF R AT 1ST BIRTH       1986
R2445510          *Created       AGE OF R AT INTERVIEW DATE  1987
R2448039          *Created       AGE OF R AT 1ST BIRTH       1987
R2871300          *Created       AGE OF R AT INTERVIEW DATE  1988
R2877800          *Created       AGE OF R AT 1ST BIRTH       1988
R3075000          *Created       AGE OF R AT INTERVIEW DATE  1989
R3076844          *Created       AGE OF R AT 1ST BIRTH       1989
R3401700          *Created       AGE OF R AT INTERVIEW DATE  1990
R3407900          *Created       AGE OF R AT 1ST BIRTH       1990
R3657100          *Created       AGE OF R AT INTERVIEW DATE  1991
R3659049          *Created       AGE OF R AT 1ST BIRTH       1991
R4007600          *Created       AGE OF R AT INTERVIEW DATE  1992
R4009449          *Created       AGE OF R AT 1ST BIRTH       1992
R4418700          AGEATINT       AGE OF R AT INTERVIEW DATE  1993
R4444900          *Created       AGE OF R AT 1ST BIRTH       1993
R5081700          AGEATINT       AGE OF R AT INTERVIEW DATE  1994
R5087700          AGE1B94        AGE OF R AT 1ST BIRTH       1994
R5167000          AGEATINT       AGE OF R AT INTERVIEW DATE  1996
R5173000          AGE1B96        AGE OF R AT 1ST BIRTH       1996
R6479800          AGEATINT       AGE OF R AT INTERVIEW DATE  1998
R6486600          AGE1B98        AGE OF R AT 1ST BIRTH       1998
R7007500          AGEATINT       AGE OF R AT INTERVIEW DATE  2000
R7014400          AGE1B00        AGE OF R AT 1ST BIRTH       2000
R7704800          AGEATINT       AGE OF R AT INTERVIEW DATE  2002
R7712000          AGE1B02        AGE OF R AT 1ST BIRTH       2002
R8497200          AGEATINT       AGE OF R AT INTERVIEW DATE  2004
R8504500          AGE1B04        AGE OF R AT 1ST BIRTH       2004
T0989000          AGEATINT       AGE OF R AT INTERVIEW DATE  2006
T0996200          AGE1B06        AGE OF R AT 1ST BIRTH       2006
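
The sample programs in Step 3 show only the 1982 and 2006 blocks and ask you to repeat them for every year in between. One way to avoid typing those blocks by hand is to generate them from the table above. The Python sketch below is an assumption about how you might do that, not part of the original programs: the dictionary is transcribed from the table, and the function name is hypothetical. It emits the same SAS pattern that the sample program uses for the age and year of the mother's last interview.

```python
# (age at interview, age at 1st birth) reference numbers by survey year,
# transcribed from the Step 2 table.
NLSY79_AGE_VARS = {
    1982: ("R0898310", "R0898840"),
    1983: ("R1145110", "R1146832"),
    1984: ("R1520310", "R1522039"),
    1985: ("R1891010", "R1892739"),
    1986: ("R2258110", "R2259839"),
    1987: ("R2445510", "R2448039"),
    1988: ("R2871300", "R2877800"),
    1989: ("R3075000", "R3076844"),
    1990: ("R3401700", "R3407900"),
    1991: ("R3657100", "R3659049"),
    1992: ("R4007600", "R4009449"),
    1993: ("R4418700", "R4444900"),
    1994: ("R5081700", "R5087700"),
    1996: ("R5167000", "R5173000"),
    1998: ("R6479800", "R6486600"),
    2000: ("R7007500", "R7014400"),
    2002: ("R7704800", "R7712000"),
    2004: ("R8497200", "R8504500"),
    2006: ("T0989000", "T0996200"),
}


def sas_last_interview_block(year, age_var):
    """Return the SAS block that records age and year of interview for one year."""
    return (
        f"if {age_var} gt 0 then do;\n"
        f"m_age_lint = {age_var};\n"
        f"m_year_lint = {year};\n"
        "end;"
    )


# Emit the blocks in chronological order so later interviews overwrite earlier ones.
sas_code = "\n".join(
    sas_last_interview_block(year, NLSY79_AGE_VARS[year][0])
    for year in sorted(NLSY79_AGE_VARS)
)
```

Printing `sas_code` gives text that can be pasted into the data step; the chronological order matters because each block overwrites the values set by the previous one.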


Step 3: Merge NLSY79 and Child/Young Adult data files and create new fertility variables

Once you have the two data sets from Steps 1 and 2, you are ready to merge them and start programming the variables. The logic is as follows:

  1. Start by merging the two data sets. Merge NLSY79 mother characteristics in with the Child/Young Adult data set using the code sample below.
  2. Code whether the mother had a birth prior to age 18.
    • Create this variable only for mothers who were interviewed after they turned 18.
    • Calculate the age at last interview and the year of last interview from 1982 forward.
    • Use this information to code the teen birth variable.
  3. Follow similar steps for the young adults.
    • Restrict the data to female young adults, and construct variables for age and year of last interview.
    • Code the teen birth variable for young adults who are interviewed after they turn 18.
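
The merge in item 1 is a many-to-one match: each mother record is attached to every one of her children, and NLSY79 respondents with no child record drop out. A minimal Python sketch of that idea follows, assuming the extracts have already been read into lists of dictionaries keyed by reference number. The function name and the toy records are invented for illustration and are not NLSY data.

```python
def merge_children_with_mothers(children, mothers):
    """Attach each mother's variables to every one of her children.

    children: list of dicts with C0000100 (child ID) and C0000200 (mother ID).
    mothers:  list of dicts with R0000100 (NLSY79 respondent ID).
    Respondents without a child record never appear in the result.
    """
    mothers_by_id = {m["R0000100"]: m for m in mothers}
    merged = []
    for child in children:
        # R0000100 in the mother file equals C0000200 in the child file.
        mother = mothers_by_id.get(child["C0000200"], {})
        merged.append({**mother, **child, "momid": child["C0000200"]})
    return merged


# Toy records, invented purely for illustration.
children = [
    {"C0000100": 201, "C0000200": 2},
    {"C0000100": 202, "C0000200": 2},
    {"C0000100": 301, "C0000200": 3},
]
mothers = [
    {"R0000100": 1, "T0989000": 45},  # no children in the child file
    {"R0000100": 2, "T0989000": 44},
    {"R0000100": 3, "T0989000": 47},
]
merged = merge_children_with_mothers(children, mothers)
```

Looping over the child records rather than the mother records is what keeps one output row per child, which mirrors the observation counts quoted in the sample programs.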

SAS sample code

Part A

*sort two data sets by mother id, and then merge;
data child;
set x;
*rename mother id to match name of variable in mom dataset;
momid = C0000200;
proc sort;  by momid;

data mom;
set y;
*rename NLSY79 id to match name of mother id variable in child dataset;
momid = R0000100;
proc sort;  by momid;

data childmom;
merge child mom;  by momid;
*eliminate NLSY79 respondents with no children in child data set, final data set has 11469 observations using data through 2006;
if C0000100 ne . ;

Part B

*create variables for age and year of last interview for mom;
*note that we name these variables to start with an "m" to denote that these are variables for the mother;
if R0898310 gt 0 then do;
m_age_lint = R0898310;
m_year_lint = 1982;
end;
*repeat for all intervening years;
*note that we redefine these variables each time "age at interview" is reported to find the age at last interview;

if T0989000 gt 0 then do;
m_age_lint = T0989000;
m_year_lint = 2006;
end;
*create variable indicating that mom had a teen birth;
*note that we define this variable only for women ages 18 and over;
if m_age_lint ge 18 then do;     
*age at 1st birth is between 0 and 17;
if (m_year_lint = 1982 and  R0898840 gt 0  and R0898840 lt 18) then m_teenbirth = 1;
*age at 1st birth is 18 or greater, so no teen birth;
if (m_year_lint = 1982 and  R0898840 ge 18) then m_teenbirth = 0;
*never gave birth, so no teen birth;  
if (m_year_lint = 1982 and R0898840 = -998) then m_teenbirth = 0;
*repeat for each year-- this strategy lets us define these variables using data reported at the last interview;

if (m_year_lint = 2006 and T0996200 gt 0  and T0996200 lt 18) then m_teenbirth = 1;
if (m_year_lint = 2006 and T0996200 ge 18) then m_teenbirth = 0;
if (m_year_lint = 2006 and T0996200 = -998) then m_teenbirth = 0;
end;

*using data through 2006;
*m_teenbirth (mean = .201, N = 11463);
*m_year_lint (mean = 2002, N = 11465);
*m_age_lint (mean =41.4, N = 11465);
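
For readers who want to check the Part B logic outside SAS, the same rules can be sketched in Python. This is an illustrative reading of the program above rather than an official implementation: the function name is hypothetical, the year-to-variable mapping is abbreviated to the two years shown in the sample code, and the example records are invented.

```python
NEVER_GAVE_BIRTH = -998  # value the sample program treats as "never gave birth"


def mother_teen_birth(row, age_vars):
    """Return (m_age_lint, m_year_lint, m_teenbirth) for one merged record.

    age_vars maps survey year -> (age-at-interview var, age-at-1st-birth var).
    A missing or non-positive age means no usable interview for that year.
    """
    m_age_lint = m_year_lint = None
    for year in sorted(age_vars):
        age = row.get(age_vars[year][0])
        if age is not None and age > 0:
            m_age_lint, m_year_lint = age, year  # later years overwrite earlier ones

    m_teenbirth = None
    if m_age_lint is not None and m_age_lint >= 18:
        # Use age at first birth as reported in the year of the last interview.
        first_birth = row.get(age_vars[m_year_lint][1])
        if first_birth is not None:
            if 0 < first_birth < 18:
                m_teenbirth = 1
            elif first_birth >= 18 or first_birth == NEVER_GAVE_BIRTH:
                m_teenbirth = 0
    return m_age_lint, m_year_lint, m_teenbirth


# Abbreviated mapping: only the two years shown in the sample code.
age_vars = {
    1982: ("R0898310", "R0898840"),
    2006: ("T0989000", "T0996200"),
}
# Invented records; negative values stand in for "no valid response".
rows = [
    {"R0898310": 20, "R0898840": 17, "T0989000": -5, "T0996200": -5},
    {"R0898310": 19, "R0898840": -998, "T0989000": 43, "T0996200": 25},
    {"R0898310": 17, "R0898840": -998, "T0989000": -5, "T0996200": -5},
]
results = [mother_teen_birth(row, age_vars) for row in rows]
```

The third record shows why the indicator is left undefined for a respondent last seen at age 17: she could still have had a birth before turning 18.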

Part C

*restrict sample to female young adults, drops 8322 observations using data through 2006;
*sample size is now 3147;
if Y0677400 = 2;
*create variables for age and year of last interview for young adult;
*note that we name these variables to start with "y" to denote that these are variables for the young adult;
if Y1205100 gt 0 then y_year_lint = Y1205100;
if y_year_lint = 1994 then y_age_lint = Y0342400;
*repeat for each year--this strategy lets us define these variables using data reported at the last interview;

if y_year_lint = 2006 then y_age_lint = Y1948500;
*create variable indicating that female young adults who are 18 or over had a teen birth;
if y_age_lint ge 18 then do;
if (Y1211100 gt 0  and Y1211100 lt 18) then y_teenbirth = 1;
if (Y1211100 ge 18) then y_teenbirth = 0;
if (Y1211100 = -998) then y_teenbirth = 0;
end;

*Final Statistics from Program: Data through 2006 survey;
*m_teenbirth (mean = .249, N = 3147);
*y_teenbirth (mean = .137, N = 2419) smaller sample size because only created for those at least 18;
*y_year_lint (mean = 2006, N = 3147);
*y_age_lint (mean = 21.3, N = 3147);
*m_year_lint (mean = 2005, N = 3147);
*m_age_lint (mean = 44.6, N = 3147);
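
The Part C rules for the young adult daughters can be sketched the same way in Python. Again this is an assumption-laden illustration, not the authors' code: the function and constant names are hypothetical, the reference numbers come from the Step 1 table, and the example records are invented.

```python
# Young adult age at interview by survey year (from the Step 1 table).
YA_AGE_AT_INTERVIEW = {
    1994: "Y0342400", 1996: "Y0677600", 1998: "Y0974800", 2000: "Y1192400",
    2002: "Y1434300", 2004: "Y1672700", 2006: "Y1948500",
}
FEMALE = 2               # sex code kept by the sample program
NEVER_GAVE_BIRTH = -998  # value the sample program treats as "never gave birth"


def daughter_teen_birth(row):
    """Return the young adult variables for a female record, or None otherwise."""
    if row.get("Y0677400") != FEMALE:
        return None  # the sample program drops these records

    year = row.get("Y1205100")
    y_year_lint = year if year is not None and year > 0 else None
    age_var = YA_AGE_AT_INTERVIEW.get(y_year_lint)
    y_age_lint = row.get(age_var) if age_var else None

    y_teenbirth = None
    first_birth = row.get("Y1211100")
    if y_age_lint is not None and y_age_lint >= 18 and first_birth is not None:
        if 0 < first_birth < 18:
            y_teenbirth = 1
        elif first_birth >= 18 or first_birth == NEVER_GAVE_BIRTH:
            y_teenbirth = 0
    return {"y_year_lint": y_year_lint, "y_age_lint": y_age_lint,
            "y_teenbirth": y_teenbirth}


# Invented records for illustration only.
rows = [
    {"Y0677400": 2, "Y1205100": 2006, "Y1948500": 21, "Y1211100": 16},
    {"Y0677400": 2, "Y1205100": 2004, "Y1672700": 17, "Y1211100": -998},
    {"Y0677400": 1, "Y1205100": 2006, "Y1948500": 22, "Y1211100": -998},
    {"Y0677400": 2, "Y1205100": 2002, "Y1434300": 19, "Y1211100": -998},
]
results = [daughter_teen_birth(row) for row in rows]
```

Unlike the mother file, the young adult file carries a single age-at-first-birth variable, so only the age at interview has to be looked up by year.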

SPSS sample code

Part A

/* sort two data sets by mother id, and then merge
/* data files are "free" or "space-delimited" format
data list file=' _your filename and location for NLSY79 data here_' list/
  R0000100
  R0216500
  R0406510
  R0619010
  R0898310
  R0898840
  R1145110
  R1146832
  R1520310
  R1522039
  R1891010
  R1892739
  R2258110
  R2259839
  R2445510
  R2448039
  R2871300
  R2877800
  R3075000
  R3076844
  R3401700
  R3407900
  R3657100
  R3659049
  R4007600
  R4009449
  R4418700
  R4444900
  R5081700
  R5087700
  R5167000
  R5173000
  R6479800
  R6486600
  R7007500
  R7014400
  R7704800
  R7712000
  R8497200
  R8504500
  T0989000
  T0996200
execute
/* rename NLSY79  id to match name of mother id variable in child dataset
compute momid=r0000100
/* following "list" command can be deleted to reduce size of *.log file
list
save outfile="datmom.sav"

data list file='_your filename and location for Child/Young Adult data here_' list/
  C0000100
  C0000200
  Y0000100
  Y0342400
  Y0677400
  Y0677600
  Y0974800
  Y1192400
  Y1205100
  Y1211100
  Y1434300
  Y1672700
  Y1948500
execute
/* rename mother id to match name of variable in mom dataset
compute momid=c0000200
/* following "list" command can be deleted to reduce size of *.log file
list
save outfile="datkid.sav"

get file="datmom.sav"
sort cases by momid
save outfile="datmom2.sav"

get file="datkid.sav"
sort cases by momid
save outfile="datkid2.sav"

match files file="datkid2.sav" /table="datmom2.sav" /by=momid
/* following "list" command can be deleted to reduce size of *.log file
list

/* eliminate NLSY79 respondents with no children in child data set, final data set has 11469
/* observations using data through 2006
select if not(sysmis(c0000200))

/* rename NLSY79  id to match name of mother id variable in child dataset
compute momid = c0000200
sort cases by momid

Part B

/* create variables for age and year of last interview for mom
/* note that we name these variables to start with an "m" to denote that these are variables for
/* the mother
do if (t0989000 gt 0)
compute magelint = t0989000
compute myrlint = 2006

/* repeat for all intervening years
/* note that we redefine these variables each time "age at interview" is reported to find the age
/* at last interview

else if (r0898310 gt 0)
compute magelint = r0898310
compute myrlint = 1982
else
compute magelint = -4
compute myrlint = -4
end if

/* create variable indicating that mom had a teen birth
/* note that we define this variable only for women ages 18 and over
/* currently age 18 or over and age at 1st birth is between 0 and 17
do if (magelint ge 18 and myrlint eq 2006 and t0996200 gt 0 and t0996200 lt 18)
compute mtnbirth=1
/*currently age 18 or over and  age at 1st birth is 18 or greater, so no teen birth
else if (magelint ge 18 and myrlint eq 2006 and t0996200 ge 18)
compute mtnbirth=0
/* currently age 18 or over and never gave birth, so no teen birth
else if (magelint ge 18 and myrlint eq 2006 and t0996200 eq -998)
compute mtnbirth=0

/* repeat for each year-- this strategy lets us define these variables using data reported at the
/* last interview

else if (magelint ge 18 and myrlint eq 1982 and r0898840 gt 0 and r0898840 lt 18)
compute mtnbirth=1
else if (magelint ge 18 and myrlint eq 1982 and r0898840 ge 18)
compute mtnbirth=0
else if (magelint ge 18 and myrlint eq 1982 and r0898840 eq -998)
compute mtnbirth=0
else
compute mtnbirth=-4
end if

/* using data through 2006
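/* mtnbirth, called m_teenbirth in the SAS program (mean = .201, N = 11463)
/* myrlint, called m_year_lint in the SAS program (mean = 2002, N = 11465)
/* magelint, called m_age_lint in the SAS program (mean = 41.4, N = 11465)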

Part C

/* restrict sample to female young adults, drops 8322 observations using data through 2006
/* sample size is now 3147
select if (y0677400 eq 2)

/*create variables for age and year of last interview for young adult
/*note that we name these variables to start with "y" to denote that these are variables 
/*   for the young adult
if (y1205100 gt 0) yyrlint = y1205100

do if (yyrlint eq 2006)
compute yagelint = y1948500

/* repeat for each year--this strategy lets us define these variables using 
/*   data reported at the last interview

else if (yyrlint eq 1994)
compute yagelint = y0342400
end if

/* create variable indicating that female young adults who are 18 or over had a teen birth
/* currently age 18 or over and  age at 1st birth is between 0 and 17
do if (yagelint ge 18 and y1211100 gt 0 and y1211100 lt 18)
compute ytnbirth=1
/* currently age 18 or over and age at 1st birth is 18 or greater, so no teen birth
else if (yagelint ge 18 and y1211100 ge 18)
compute ytnbirth=0
/* currently age 18 or over and never gave birth, so no teen birth
else if (yagelint ge 18 and y1211100 eq -998)
compute ytnbirth=0
end if

/* Final Statistics from Program:  Data through 2006 survey
/* variable names in parentheses are those used in the SAS program
/* mtnbirth (m_teenbirth) (mean = .249, N = 3147)
/* ytnbirth (y_teenbirth) (mean = .137, N = 2419) smaller sample size because only created 
/*    for those at least 18
/* yyrlint (y_year_lint) (mean = 2006, N = 3147)
/* yagelint (y_age_lint) (mean = 21.3, N = 3147)
/* myrlint (m_year_lint) (mean = 2005, N = 3147)
/* magelint (m_age_lint) (mean = 44.6, N = 3147)

Stata sample code

Part A

#delimit ;
*the commands below end with semicolons, so the delimiter is switched first;
*sort two data sets by mother id, and then merge;
use child;
*rename mother id to match name of variable in mom dataset;
gen momid = C0000200;
sort momid;
save child, replace;

use mom;
*rename NLSY79  id to match name of mother id variable in child dataset;
gen momid = R0000100;
sort momid;
save mom, replace;

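*the mom data are in memory, so attach each mother to all of her children (Stata 11 or later syntax);
merge 1:m momid using child;
*eliminate NLSY79 respondents with no children in child data set, final data set has 11469 observations using data through 2006;
drop if C0000100 == . ;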

Part B

*create variables for age and year of last interview for mom;
*note that we name these variables to start with an "m" to denote that these are variables for the mother;
gen m_age_lint = .;
gen m_year_lint = .;

replace m_age_lint = R0898310 if R0898310 > 0;
replace m_year_lint = 1982 if R0898310 > 0;
*repeat for all intervening years;
*note that we redefine these variables each time "age at interview" is reported to find the age at last interview;

replace m_age_lint = T0989000 if T0989000 > 0;
replace m_year_lint = 2006 if T0989000 > 0;
*create variable indicating that mom had a teen birth;
*note that we define this variable only for women ages 18 and over;
gen m_teenbirth = .;
*age at 1st birth is between 0 and 17;
replace m_teenbirth = 1 if m_age_lint >= 18 & m_year_lint == 1982 & R0898840 > 0 & R0898840 < 18;
*age at 1st birth is 18 or greater, so no teen birth;
replace m_teenbirth = 0 if m_age_lint >= 18 & m_year_lint == 1982 & R0898840 >= 18;
*never gave birth, so no teen birth;  
replace m_teenbirth = 0 if m_age_lint >= 18 & m_year_lint == 1982 & R0898840 == -998;
*repeat for each year-- this strategy lets us define these variables using data reported at the last interview;

replace m_teenbirth = 1 if m_age_lint >= 18 & m_year_lint == 2006 & T0996200 > 0 & T0996200 < 18;
replace m_teenbirth = 0 if m_age_lint >= 18 & m_year_lint == 2006 & T0996200 >= 18;
replace m_teenbirth = 0 if m_age_lint >= 18 & m_year_lint == 2006 & T0996200 == -998;

*using data through 2006;
*m_teenbirth (mean = .201, N = 11463);
*m_year_lint (mean =2002, N = 11465);
*m_age_lint (mean =41.4, N = 11465);

Part C

*restrict sample to female young adults, drops 8322 observations using data through 2006;
*sample size is now 3147;
keep if Y0677400 == 2;
*create variables for age and year of last interview for young adult;
*note that we name these variables to start with "y" to denote that these are variables for the young adult;
gen y_year_lint = .;
gen y_age_lint = .;
replace y_year_lint = Y1205100 if Y1205100 > 0;
replace y_age_lint = Y0342400 if y_year_lint == 1994;
*repeat for each year--this strategy lets us define these variables using data reported at the last interview;

replace y_age_lint = Y1948500 if y_year_lint == 2006;
*create variable indicating that female young adults who are 18 or over had a teen birth;
gen y_teenbirth = .;
replace y_teenbirth = 1 if y_age_lint >= 18 & Y1211100 > 0 & Y1211100 < 18;
replace y_teenbirth = 0 if y_age_lint >= 18 & Y1211100 >= 18;
replace y_teenbirth = 0 if y_age_lint >= 18 & Y1211100 == -998;

*Final Statistics from Program:  Data through 2006 survey;
*m_teenbirth (mean = .249, N = 3147);
*y_teenbirth (mean = .137, N = 2419) smaller sample size because only created for those at least 18;
*y_year_lint (mean = 2006, N = 3147);
*y_age_lint (mean = 21.3, N = 3147);
*m_year_lint (mean = 2005, N = 3147);
*m_age_lint (mean = 44.6, N = 3147);


Additional information

Final statistics from sample program (data through 2006 survey)

m_teenbirth (mean = .249, N = 3147);
y_teenbirth (mean = .137, N = 2419) smaller sample size because only created for those at least 18;
y_year_lint (mean = 2006, N = 3147)
y_age_lint (mean = 21.3, N = 3147)
m_year_lint (mean = 2005, N = 3147)
m_age_lint (mean = 44.6, N = 3147)
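
The means above describe each generation separately. To look at the intergenerational pattern itself, one simple starting point is to compare the daughters' teen birth rate by whether the mother had a teen birth. The Python sketch below shows the calculation on invented pairs of (m_teenbirth, y_teenbirth) values. The function name is hypothetical, the numbers are not NLSY results, and survey weights are ignored.

```python
from collections import Counter


def daughter_rate_by_mother(pairs):
    """Share of daughters with a teen birth, split by the mother's indicator.

    pairs: iterable of (m_teenbirth, y_teenbirth); None marks an undefined value.
    Pairs with either value undefined are left out of the table.
    """
    counts = Counter(
        (m, y) for m, y in pairs if m is not None and y is not None
    )
    rates = {}
    for m in (0, 1):
        total = counts[(m, 0)] + counts[(m, 1)]
        rates[m] = counts[(m, 1)] / total if total else None
    return rates


# Invented mother/daughter pairs, for illustration only.
pairs = [
    (1, 1), (1, 0), (0, 0), (0, 0),
    (0, 1), (1, None), (0, 0), (1, 1),
]
rates = daughter_rate_by_mother(pairs)
```

Here `rates[1]` is the share among daughters whose mothers had a teen birth and `rates[0]` is the share among the rest; a gap between the two is one informal sign of intergenerational correlation.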

Extensions

This tutorial focuses on linking mothers and their young adult daughters. Similar techniques can be used to link other characteristics of mothers with characteristics of their children. For further suggestions, review the Possible Research Agendas for Intercohort and Cross Generational Research section.