Tutorial objective and prerequisites
Objective
The goal is to link NLSY79 mothers with their children. This tutorial explains the general logic to link mothers and children of any age covered in the Children of the NLSY79. The tutorial then gives a specific example of using data on mothers and young adult daughters by creating two variables: (1) whether the mother had a first birth prior to age 18, and (2) whether the daughter had a first birth prior to age 18. This allows one to examine intergenerational correlations in teenage childbearing.
Knowledge assumed
This tutorial assumes that you already know how to use the NLS Investigator to create a tagset that saves your variables and to extract data. If you need assistance with the NLS Investigator before starting this tutorial, please review the Investigator User Guide or contact NLS User Services.
Background reading
To understand how to link mother and child/young adult files, see the NLSY79 Child and Young Adult Users Guide section on Linking Children, Young Adults and Mothers and Appendix E and Appendix F for sample SPSS and SAS programs. In the NLSY79 User's Guide, see the sections on Age and Fertility.
Example: Intergenerational linking of NLSY79 mothers and their young adult daughters
Preview of steps
- Step 1: Find and extract the Child/Young Adult variables
- Step 2: Find and extract the NLSY79 variables
- Step 3: Merge NLSY79 and Child/Young Adult data files and create new fertility variables
The Additional information section provides the statistics output from the sample program and suggestions for extending the tutorial.
Step 1: Find and extract the Child/Young Adult variables
Find and extract the respondent IDs, age at first birth, and other needed variables in the Child/Young Adult data set using the NLS Investigator.
- Start by finding the mother and child/young adult IDs.
- Select variables by reference number and pick "C000" and then submit. C00001.00 is the respondent ID from the child/young adult file, and C00002.00 is the mother ID. Tag these two variables.
- Note that the respondent ID, C00001.00, is a comprehensive ID variable, created for all children regardless of age or young adult status.
- For this example, one could also use the Young Adult ID, Y00001.00, which will have the same value as C00001.00, but exists only for those children who participate at age 15 or older.
- Find age at first birth.
- Search on the YA Fertility and Relationship Data/Created Area of Interest and Survey Year = 2006 (or whatever the most recent survey year is). Y12111.00 is age at first birth at the most recent interview the young adult completed. Tag Y12111.00.
- Find the gender variable, since this tutorial looks at young adult females, and find whether the respondent was at least 18 years old at her last young adult interview.
- Select variables by the YA Common Key Variables Area of Interest.
- Scroll down and select gender (Y06774.00), most recent young adult interview year (Y12051.00), and age at each young adult interview (Y19485.00, Y16727.00, Y14343.00, Y11924.00, Y09748.00, Y06776.00, Y03424.00).
- Run an extract to create the data set and corresponding SAS/SPSS/STATA/R program.
Reference Number | Question Name | Variable Title | Year |
---|---|---|---|
C0000100 | CPUBID | ID CODE OF CHILD | 2006 |
C0000200 | MPUBID | ID CODE OF MOTHER OF CHILD | 2006 |
Y0342400 | YADULT.AGE | YOUNG ADULT AGE | 1994 |
Y0677400 | YASEX | SEX OF YOUNG ADULT | 2006 |
Y0677600 | AGEINT96 | AGE OF YOUNG ADULT (IN YEARS) AT DATE OF INTERVIEW | 1996 |
Y0974800 | AGEINT98 | AGE OF YOUNG ADULT (IN YEARS) AT DATE OF INTERVIEW | 1998 |
Y1192400 | AGEINT2000 | AGE OF YOUNG ADULT (IN YEARS) AT DATE OF INTERVIEW | 2000 |
Y1205100 | LASTINTYR | YEAR OF MOST RECENT YOUNG ADULT INTERVIEW | 2006 |
Y1211100 | AGE1B | AGE OF R AT 1ST BIRTH | 2006 |
Y1434300 | AGEINT2002 | AGE OF YOUNG ADULT (IN YEARS) AT DATE OF INTERVIEW | 2002 |
Y1672700 | AGEINT2004 | AGE OF YOUNG ADULT (IN YEARS) AT DATE OF INTERVIEW | 2004 |
Y1948500 | AGEINT2006 | AGE OF YOUNG ADULT (IN YEARS) AT DATE OF INTERVIEW | 2006 |
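The bullets above cite reference numbers with a decimal point (for example C00001.00), while the table lists the same variables without it (C0000100), which is also the form the sample programs use. The following is a minimal Python sketch of that conversion, assuming every reference number follows the one-letter, five-digit, two-digit pattern seen in this tutorial. The function name `to_column_name` is hypothetical and is not part of the NLS Investigator.

```python
import re

def to_column_name(reference_number: str) -> str:
    """Turn a reference number such as 'C00001.00' into the
    column-name form such as 'C0000100' (assumed pattern only)."""
    ref = reference_number.strip().upper()
    # Assumption: one letter, five digits, a period, two digits.
    if not re.fullmatch(r"[A-Z]\d{5}\.\d{2}", ref):
        raise ValueError(f"unexpected reference number: {reference_number!r}")
    return ref.replace(".", "")

# Reference numbers quoted in Step 1 of this tutorial.
step1_refs = ["C00001.00", "C00002.00", "Y12111.00", "Y06774.00", "Y12051.00"]
step1_columns = [to_column_name(ref) for ref in step1_refs]
print(step1_columns)
```

A list like `step1_columns` can help confirm that an extract contains every tagged variable before moving on to Step 2.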
Using the NLS Investigator
To create a tagset of specific variables and then extract the data set, use the Save / Download Tab in the NLS Investigator.
Step 2: Find and extract the NLSY79 variables
Find and extract comparable variables (to those in Step 1) in the NLSY79 data set using the NLS Investigator.
- Start by finding the NLSY79 respondent ID.
- Select variables by reference number and pick "R000" and then submit. R00001.00 is the respondent ID for the mother. Tag this variable.
- Note that R00001.00 = C00002.00 in the child/young adult data set.
- Find age at first birth. Search on Word in Title = "Birth", Search Variable Title = "Age", and the Fertility and Relationship History/Created Area of Interest.
- From the fairly long list, you can find the age at first birth created variables from 1982 forward (R08988.40, R11468.32, R15220.39, R18927.39, R22598.39, R24480.39, R28778.00, R30768.44, R34079.00, R36590.49, R40094.49, R44449.00, R50877.00, R51730.00, R64866.00, R70144.00, R77120.00, R85045.00, T09962.00).
- Note that you will need this variable for each year because if the respondent misses an interview, it is not created for that interview year.
- Searching on Word in Title = "Age" and the Key Variables Area of Interest will provide a list of created variables for age at the interview date, which you need from 1982 forward (R08983.10, R11451.10, R15203.10, R18910.10, R22581.10, R24455.10, R28713.00, R30750.00, R34017.00, R36571.00, R40076.00, R44187.00, R50817.00, R51670.00, R64798.00, R70075.00, R77048.00, R84972.00, T09890.00).
- Run an extract to create the data set and corresponding SAS/SPSS/STATA/R program (see "Using the NLS Investigator" in Step 1).
Reference Number | Question Name | Variable Title | Year |
---|---|---|---|
R0000100 | CASEID | IDENTIFICATION CODE | 1979 |
R0898310 | *Created | AGE OF R AT INTERVIEW DATE | 1982 |
R0898840 | *Created | AGE OF R AT 1ST BIRTH | 1982 |
R1145110 | *Created | AGE OF R AT INTERVIEW DATE | 1983 |
R1146832 | *Created | AGE OF R AT 1ST BIRTH | 1983 |
R1520310 | *Created | AGE OF R AT INTERVIEW DATE | 1984 |
R1522039 | *Created | AGE OF R AT 1ST BIRTH | 1984 |
R1891010 | *Created | AGE OF R AT INTERVIEW DATE | 1985 |
R1892739 | *Created | AGE OF R AT 1ST BIRTH | 1985 |
R2258110 | *Created | AGE OF R AT INTERVIEW DATE | 1986 |
R2259839 | *Created | AGE OF R AT 1ST BIRTH | 1986 |
R2445510 | *Created | AGE OF R AT INTERVIEW DATE | 1987 |
R2448039 | *Created | AGE OF R AT 1ST BIRTH | 1987 |
R2871300 | *Created | AGE OF R AT INTERVIEW DATE | 1988 |
R2877800 | *Created | AGE OF R AT 1ST BIRTH | 1988 |
R3075000 | *Created | AGE OF R AT INTERVIEW DATE | 1989 |
R3076844 | *Created | AGE OF R AT 1ST BIRTH | 1989 |
R3401700 | *Created | AGE OF R AT INTERVIEW DATE | 1990 |
R3407900 | *Created | AGE OF R AT 1ST BIRTH | 1990 |
R3657100 | *Created | AGE OF R AT INTERVIEW DATE | 1991 |
R3659049 | *Created | AGE OF R AT 1ST BIRTH | 1991 |
R4007600 | *Created | AGE OF R AT INTERVIEW DATE | 1992 |
R4009449 | *Created | AGE OF R AT 1ST BIRTH | 1992 |
R4418700 | AGEATINT | AGE OF R AT INTERVIEW DATE | 1993 |
R4444900 | *Created | AGE OF R AT 1ST BIRTH | 1993 |
R5081700 | AGEATINT | AGE OF R AT INTERVIEW DATE | 1994 |
R5087700 | AGE1B94 | AGE OF R AT 1ST BIRTH | 1994 |
R5167000 | AGEATINT | AGE OF R AT INTERVIEW DATE | 1996 |
R5173000 | AGE1B96 | AGE OF R AT 1ST BIRTH | 1996 |
R6479800 | AGEATINT | AGE OF R AT INTERVIEW DATE | 1998 |
R6486600 | AGE1B98 | AGE OF R AT 1ST BIRTH | 1998 |
R7007500 | AGEATINT | AGE OF R AT INTERVIEW DATE | 2000 |
R7014400 | AGE1B00 | AGE OF R AT 1ST BIRTH | 2000 |
R7704800 | AGEATINT | AGE OF R AT INTERVIEW DATE | 2002 |
R7712000 | AGE1B02 | AGE OF R AT 1ST BIRTH | 2002 |
R8497200 | AGEATINT | AGE OF R AT INTERVIEW DATE | 2004 |
R8504500 | AGE1B04 | AGE OF R AT 1ST BIRTH | 2004 |
T0989000 | AGEATINT | AGE OF R AT INTERVIEW DATE | 2006 |
T0996200 | AGE1B06 | AGE OF R AT 1ST BIRTH | 2006 |
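Because the age at interview and age at first birth variables come in pairs, one per survey year, it can help to hold them in a lookup keyed by year. The Python sketch below copies the pairs from the table above and finds the most recent year in which a positive age at interview was recorded. The function `last_interview` and the sample row are hypothetical illustrations, not NLS data or NLS-provided code.

```python
# Survey year -> (age at interview, age at first birth), copied from the table.
NLSY79_VARS = {
    1982: ("R0898310", "R0898840"), 1983: ("R1145110", "R1146832"),
    1984: ("R1520310", "R1522039"), 1985: ("R1891010", "R1892739"),
    1986: ("R2258110", "R2259839"), 1987: ("R2445510", "R2448039"),
    1988: ("R2871300", "R2877800"), 1989: ("R3075000", "R3076844"),
    1990: ("R3401700", "R3407900"), 1991: ("R3657100", "R3659049"),
    1992: ("R4007600", "R4009449"), 1993: ("R4418700", "R4444900"),
    1994: ("R5081700", "R5087700"), 1996: ("R5167000", "R5173000"),
    1998: ("R6479800", "R6486600"), 2000: ("R7007500", "R7014400"),
    2002: ("R7704800", "R7712000"), 2004: ("R8497200", "R8504500"),
    2006: ("T0989000", "T0996200"),
}

def last_interview(row):
    """Return (year, age at interview, age at first birth) for the most
    recent year with a positive age at interview, or None if there is none."""
    for year in sorted(NLSY79_VARS, reverse=True):
        age_var, birth_var = NLSY79_VARS[year]
        age = row.get(age_var)
        if age is not None and age > 0:
            return year, age, row.get(birth_var)
    return None

# Made-up respondent: interviewed in 1982 and 1983 only.
example_row = {"R0000100": 1, "R0898310": 24, "R0898840": 17,
               "R1145110": 25, "R1146832": 17}
print(last_interview(example_row))
```

Scanning from the latest year backward gives the same result as overwriting the value year by year, which is the approach the sample programs in Step 3 take.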
Step 3: Merge NLSY79 and Child/Young Adult data files and create new fertility variables
Once you have the two data sets from Steps 1 and 2, you are ready to merge them and start programming the variables. The logic is as follows:
- Start by merging the two data sets. Merge NLSY79 mother characteristics in with the Child/Young Adult data set using the code sample below.
- Code whether the mother had a birth prior to age 18.
- Create this variable only for mothers who are interviewed after they turned 18.
- Calculate the age at last interview and the year of last interview from 1982 forward.
- Use this information to code the teen birth variable.
- Follow similar steps for the young adults.
- Restrict the data to female young adults, and construct variables for age and year of last interview.
- Code the teen birth variable for young adults who are interviewed after they turn 18.
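The logic above can be sketched in Python before turning to the sample programs. This is an illustrative sketch, not the tutorial's own method: the records are made up, and the fields `m_age_lint`, `m_age1b`, and `y_age_lint` are assumed to have been computed already from the last interview. The coding rule follows the sample programs: the variable is defined only for respondents aged 18 or older at their last interview, and an age at first birth of -998 is treated as never having given birth.

```python
FEMALE = 2               # Y0677400 value kept by the sample programs
NEVER_GAVE_BIRTH = -998  # code treated as "no birth" in the sample programs

def code_teen_birth(age_last_interview, age_first_birth):
    """Return 1 (first birth before 18), 0 (no teen birth), or None (not coded)."""
    if age_last_interview is None or age_last_interview < 18:
        return None  # only coded for those interviewed at 18 or older
    if age_first_birth is None:
        return None
    if 0 < age_first_birth < 18:
        return 1
    if age_first_birth >= 18 or age_first_birth == NEVER_GAVE_BIRTH:
        return 0
    return None  # other negative codes are left missing

# Made-up records; mother 12 has no child in the child file.
mothers = [
    {"R0000100": 10, "m_age_lint": 44, "m_age1b": 17},
    {"R0000100": 11, "m_age_lint": 45, "m_age1b": 22},
    {"R0000100": 12, "m_age_lint": 43, "m_age1b": NEVER_GAVE_BIRTH},
]
children = [
    {"C0000100": 1001, "C0000200": 10, "Y0677400": 2, "y_age_lint": 21, "Y1211100": -998},
    {"C0000100": 1002, "C0000200": 10, "Y0677400": 1, "y_age_lint": 19, "Y1211100": -998},
    {"C0000100": 1101, "C0000200": 11, "Y0677400": 2, "y_age_lint": 16, "Y1211100": -998},
]

# Merge: attach each mother's record to her children (C0000200 = R0000100).
mothers_by_id = {m["R0000100"]: m for m in mothers}
linked = [{**child, **mothers_by_id[child["C0000200"]]}
          for child in children if child["C0000200"] in mothers_by_id]

# Restrict to daughters, then code both teen birth variables.
daughters = [row for row in linked if row["Y0677400"] == FEMALE]
for row in daughters:
    row["m_teenbirth"] = code_teen_birth(row["m_age_lint"], row["m_age1b"])
    row["y_teenbirth"] = code_teen_birth(row["y_age_lint"], row["Y1211100"])
    print(row["C0000100"], row["m_teenbirth"], row["y_teenbirth"])
```

Keeping the child file as the base of the merge means mothers without children drop out automatically, which matches the step in the sample programs that removes NLSY79 respondents with no children.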
Part A (SAS)

    *sort two data sets by mother id, and then merge;
    data child;
      set x;
      *rename mother id to match name of variable in mom dataset;
      momid = C0000200;
    proc sort; by momid;
    data mom;
      set y;
      *rename NLSY79 id to match name of mother id variable in child dataset;
      momid = R0000100;
    proc sort; by momid;
    data childmom;
      merge child mom;
      by momid;
      *eliminate NLSY79 respondents with no children in child data set, final data set has 11469 observations using data through 2006;
      if C0000100 ne . ;
Part B (SAS)

    *create variables for age and year of last interview for mom;
    *note that we name these variables to start with an "m" to denote that these are variables for the mother;
    if R0898310 gt 0 then do;
      m_age_lint = R0898310;
      m_year_lint = 1982;
    end;
    *repeat for all intervening years;
    *note that we redefine these variables each time "age at interview" is reported to find the age at last interview;
    if T0989000 gt 0 then do;
      m_age_lint = T0989000;
      m_year_lint = 2006;
    end;
    *create variable indicating that mom had a teen birth;
    *note that we define this variable only for women ages 18 and over;
    if m_age_lint ge 18 then do;
      *age at 1st birth is between 0 and 17;
      if (m_year_lint = 1982 and R0898840 gt 0 and R0898840 lt 18) then m_teenbirth = 1;
      *age at 1st birth is 18 or greater, so no teen birth;
      if (m_year_lint = 1982 and R0898840 ge 18) then m_teenbirth = 0;
      *never gave birth, so no teen birth;
      if (m_year_lint = 1982 and R0898840 = -998) then m_teenbirth = 0;
      *repeat for each year-- this strategy lets us define these variables using data reported at the last interview;
      if (m_year_lint = 2006 and T0996200 gt 0 and T0996200 lt 18) then m_teenbirth = 1;
      if (m_year_lint = 2006 and T0996200 ge 18) then m_teenbirth = 0;
      if (m_year_lint = 2006 and T0996200 = -998) then m_teenbirth = 0;
    end;
    *using data through 2006;
    *m_teenbirth (mean = .201, N = 11463);
    *m_year_lint (mean = 2002, N = 11465);
    *m_age_lint (mean = 41.4, N = 11465);
Part C (SAS)

    *restrict sample to female young adults, drops 8322 observations using data through 2006;
    *sample size is now 3147;
    if Y0677400 = 2;
    *create variables for age and year of last interview for young adult;
    *note that we name these variables to start with "y" to denote that these are variables for the young adult;
    if Y1205100 gt 0 then y_year_lint = Y1205100;
    if y_year_lint = 1994 then y_age_lint = Y0342400;
    *repeat for each year--this strategy lets us define these variables using data reported at the last interview;
    if y_year_lint = 2006 then y_age_lint = Y1948500;
    *create variable indicating that female young adults who are 18 or over had a teen birth;
    if y_age_lint ge 18 then do;
      if (Y1211100 gt 0 and Y1211100 lt 18) then y_teenbirth = 1;
      if (Y1211100 ge 18) then y_teenbirth = 0;
      if (Y1211100 = -998) then y_teenbirth = 0;
    end;
    *Final Statistics from Program: Data through 2006 survey;
    *m_teenbirth (mean = .249, N = 3147);
    *y_teenbirth (mean = .137, N = 2419) smaller sample size because only created for those at least 18;
    *y_year_lint (mean = 2006, N = 3147);
    *y_age_lint (mean = 21.3, N = 3147);
    *m_year_lint (mean = 2005, N = 3147);
    *m_age_lint (mean = 44.6, N = 3147);
Part A (SPSS)

    /* sort two data sets by mother id, and then merge
    /* data files are "free" or "space-delimited" format
    data list file='_your filename and location for NLSY79 data here_' list/
      R0000100 R0216500 R0406510 R0619010 R0898310 R0898840 R1145110 R1146832
      R1520310 R1522039 R1891010 R1892739 R2258110 R2259839 R2445510 R2448039
      R2871300 R2877800 R3075000 R3076844 R3401700 R3407900 R3657100 R3659049
      R4007600 R4009449 R4418700 R4444900 R5081700 R5087700 R5167000 R5173000
      R6479800 R6486600 R7007500 R7014400 R7704800 R7712000 R8497200 R8504500
      T0989000 T0996200
    execute
    /* rename NLSY79 id to match name of mother id variable in child dataset
    compute momid=r0000100
    /* following "list" command can be deleted to reduce size of *.log file
    list
    save outfile="datmom.sav"
    data list file='_your filename and location for Child/Young Adult data here_' list/
      C0000100 C0000200 Y0000100 Y0342400 Y0677400 Y0677600 Y0974800 Y1192400
      Y1205100 Y1211100 Y1434300 Y1672700 Y1948500
    execute
    /* rename mother id to match name of variable in mom dataset
    compute momid=c0000200
    /* following "list" command can be deleted to reduce size of *.log file
    list
    save outfile="datkid.sav"
    get file="datmom.sav"
    sort cases by momid
    save outfile="datmom2.sav"
    get file="datkid.sav"
    sort cases by momid
    save outfile="datkid2.sav"
    match files file="datkid2.sav" /table="datmom2.sav" /by=momid
    /* following "list" command can be deleted to reduce size of *.log file
    list
    /* eliminate NLSY79 respondents with no children in child data set, final data set has 11469
    /* observations using data through 2006
    select if not(sysmis(c0000200))
    /* rename NLSY79 id to match name of mother id variable in child dataset
    compute momid = c0000200
    sort cases by momid
Part B (SPSS)

    /* create variables for age and year of last interview for mom
    /* note that we name these variables to start with an "m" to denote that these are variables for
    /* the mother
    do if (t0989000 gt 0)
    compute magelint = t0989000
    compute myrlint = 2006
    /* repeat for all intervening years
    /* note that we redefine these variables each time "age at interview" is reported to find the age
    /* at last interview
    else if (r0898310 gt 0)
    compute magelint = r0898310
    compute myrlint = 1982
    else
    compute magelint = -4
    compute myrlint = -4
    end if
    /* create variable indicating that mom had a teen birth
    /* note that we define this variable only for women ages 18 and over
    /* currently age 18 or over and age at 1st birth is between 0 and 17
    do if (magelint ge 18 and myrlint eq 2006 and t0996200 gt 0 and t0996200 lt 18)
    compute mtnbirth=1
    /* currently age 18 or over and age at 1st birth is 18 or greater, so no teen birth
    else if (magelint ge 18 and myrlint eq 2006 and t0996200 ge 18)
    compute mtnbirth=0
    /* currently age 18 or over and never gave birth, so no teen birth
    else if (magelint ge 18 and myrlint eq 2006 and t0996200 eq -998)
    compute mtnbirth=0
    /* repeat for each year-- this strategy lets us define these variables using data reported at the
    /* last interview
    else if (magelint ge 18 and myrlint eq 1982 and r0898840 gt 0 and r0898840 lt 18)
    compute mtnbirth=1
    else if (magelint ge 18 and myrlint eq 1982 and r0898840 ge 18)
    compute mtnbirth=0
    else if (magelint ge 18 and myrlint eq 1982 and r0898840 eq -998)
    compute mtnbirth=0
    else
    compute mtnbirth=-4
    end if
    /* using data through 2006
    /* m_teenbirth (mean = .201, N = 11463)
    /* m_year_lint (mean = 2002, N = 11465)
    /* m_age_lint (mean = 41.4, N = 11465)
Part C (SPSS)

    /* restrict sample to female young adults, drops 8322 observations using data through 2006
    /* sample size is now 3147
    select if (y0677400 eq 2)
    /* create variables for age and year of last interview for young adult
    /* note that we name these variables to start with "y" to denote that these are variables
    /* for the young adult
    if (y1205100 gt 0) yyrlint = y1205100
    do if (yyrlint eq 2006)
    compute yagelint = y1948500
    /* repeat for each year--this strategy lets us define these variables using
    /* data reported at the last interview
    else if (yyrlint eq 1994)
    compute yagelint = y0342400
    end if
    /* create variable indicating that female young adults who are 18 or over had a teen birth
    /* currently age 18 or over and age at 1st birth is between 0 and 17
    do if (yagelint ge 18 and y1211100 gt 0 and y1211100 lt 18)
    compute ytnbirth=1
    /* currently age 18 or over and age at 1st birth is 18 or greater, so no teen birth
    else if (yagelint ge 18 and y1211100 ge 18)
    compute ytnbirth=0
    /* currently age 18 or over and never gave birth, so no teen birth
    else if (yagelint ge 18 and y1211100 eq -998)
    compute ytnbirth=0
    end if
    /* Final Statistics from Program: Data through 2006 survey
    /* m_teenbirth (mean = .249, N = 3147)
    /* y_teenbirth (mean = .137, N = 2419) smaller sample size because only created
    /* for those at least 18
    /* y_year_lint (mean = 2006, N = 3147)
    /* y_age_lint (mean = 21.3, N = 3147)
    /* m_year_lint (mean = 2005, N = 3147)
    /* m_age_lint (mean = 44.6, N = 3147)
Part A (Stata)

    #delimit ;
    *sort two data sets by mother id, and then merge;
    use child;
    *rename mother id to match name of variable in mom dataset;
    gen momid = C0000200;
    sort momid;
    save child, replace;
    use mom;
    *rename NLSY79 id to match name of mother id variable in child dataset;
    gen momid = R0000100;
    sort momid;
    save mom, replace;
    *mom is already in memory, so only the child file is named after "using";
    merge momid using child;
    *eliminate NLSY79 respondents with no children in child data set, final data set has 11469 observations using data through 2006;
    drop if C0000100 == . ;
Part B (Stata)

    *create variables for age and year of last interview for mom;
    *note that we name these variables to start with an "m" to denote that these are variables for the mother;
    gen m_age_lint = .;
    gen m_year_lint = .;
    replace m_age_lint = R0898310 if R0898310 > 0;
    replace m_year_lint = 1982 if R0898310 > 0;
    *repeat for all intervening years;
    *note that we redefine these variables each time "age at interview" is reported to find the age at last interview;
    replace m_age_lint = T0989000 if T0989000 > 0;
    replace m_year_lint = 2006 if T0989000 > 0;
    *create variable indicating that mom had a teen birth;
    *note that we define this variable only for women ages 18 and over;
    gen m_teenbirth = .;
    *age at 1st birth is between 0 and 17;
    replace m_teenbirth = 1 if m_age_lint >= 18 & m_year_lint == 1982 & R0898840 > 0 & R0898840 < 18;
    *age at 1st birth is 18 or greater, so no teen birth;
    replace m_teenbirth = 0 if m_age_lint >= 18 & m_year_lint == 1982 & R0898840 >= 18;
    *never gave birth, so no teen birth;
    replace m_teenbirth = 0 if m_age_lint >= 18 & m_year_lint == 1982 & R0898840 == -998;
    *repeat for each year-- this strategy lets us define these variables using data reported at the last interview;
    replace m_teenbirth = 1 if m_age_lint >= 18 & m_year_lint == 2006 & T0996200 > 0 & T0996200 < 18;
    replace m_teenbirth = 0 if m_age_lint >= 18 & m_year_lint == 2006 & T0996200 >= 18;
    replace m_teenbirth = 0 if m_age_lint >= 18 & m_year_lint == 2006 & T0996200 == -998;
    *using data through 2006;
    *m_teenbirth (mean = .201, N = 11463);
    *m_year_lint (mean = 2002, N = 11465);
    *m_age_lint (mean = 41.4, N = 11465);
Part C (Stata)

    *restrict sample to female young adults, drops 8322 observations using data through 2006;
    *sample size is now 3147;
    keep if Y0677400 == 2;
    *create variables for age and year of last interview for young adult;
    *note that we name these variables to start with "y" to denote that these are variables for the young adult;
    gen y_year_lint = .;
    gen y_age_lint = .;
    replace y_year_lint = Y1205100 if Y1205100 > 0;
    replace y_age_lint = Y0342400 if y_year_lint == 1994;
    *repeat for each year--this strategy lets us define these variables using data reported at the last interview;
    replace y_age_lint = Y1948500 if y_year_lint == 2006;
    *create variable indicating that female young adults who are 18 or over had a teen birth;
    gen y_teenbirth = .;
    replace y_teenbirth = 1 if y_age_lint >= 18 & Y1211100 > 0 & Y1211100 < 18;
    replace y_teenbirth = 0 if y_age_lint >= 18 & Y1211100 >= 18;
    replace y_teenbirth = 0 if y_age_lint >= 18 & Y1211100 == -998;
    *Final Statistics from Program: Data through 2006 survey;
    *m_teenbirth (mean = .249, N = 3147);
    *y_teenbirth (mean = .137, N = 2419) smaller sample size because only created for those at least 18;
    *y_year_lint (mean = 2006, N = 3147);
    *y_age_lint (mean = 21.3, N = 3147);
    *m_year_lint (mean = 2005, N = 3147);
    *m_age_lint (mean = 44.6, N = 3147);
Additional information
Final statistics from sample program (data through 2006 survey)
- m_teenbirth (mean = .249, N = 3147)
- y_teenbirth (mean = .137, N = 2419); smaller sample size because only created for those at least 18
- y_year_lint (mean = 2006, N = 3147)
- y_age_lint (mean = 21.3, N = 3147)
- m_year_lint (mean = 2005, N = 3147)
- m_age_lint (mean = 44.6, N = 3147)
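Once both variables exist, one simple way to look at the intergenerational association is to compare the share of daughters with a teen birth by whether their mother had one. The Python sketch below shows that tabulation on invented pairs. The values are placeholders for illustration only and are not NLSY79 results, and the function `teen_birth_rates` is a hypothetical helper.

```python
def teen_birth_rates(pairs):
    """Given (m_teenbirth, y_teenbirth) pairs, return the share of daughters
    with a teen birth for mothers coded 0 and coded 1.
    Pairs with a missing (None) value on either side are skipped."""
    counts = {0: [0, 0], 1: [0, 0]}  # mother's code -> [daughters, teen births]
    for m_teenbirth, y_teenbirth in pairs:
        if m_teenbirth is None or y_teenbirth is None:
            continue
        counts[m_teenbirth][0] += 1
        counts[m_teenbirth][1] += y_teenbirth
    return {code: (births / n if n else None)
            for code, (n, births) in counts.items()}

# Invented pairs for illustration; the last one is skipped because the
# daughter was under 18 at her last interview (y_teenbirth not coded).
example_pairs = [(1, 1), (1, 0), (0, 0), (0, 0), (0, 1), (0, 0), (1, None)]
print(teen_birth_rates(example_pairs))
```

Skipping pairs with a missing value mirrors the smaller N for y_teenbirth reported above, since that variable is created only for daughters who were at least 18.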
Extensions
This tutorial focuses on linking mothers and their young adult daughters. Similar techniques can be used to link other characteristics of mothers with characteristics of their children. For further suggestions, review the Possible Research Agendas for Intercohort and Cross Generational Research section.