National Longitudinal Survey of Youth 1997 (NLSY97)

Appendix 6: Event History Creation and Documentation

The NLSY97 survey records significant life-course transitions experienced by young people in a longitudinal format. The event history arrays document these events in a chronological format that records the significant transitions in a meaningful manner while maintaining data quality. Using these arrays, researchers can extract the status of a respondent at a point in time or over time. Event history arrays are generated for five distinct areas: employment, marital/cohabitation status, program participation, schooling, and arrests/incarceration. This section presents information on each type of event history array; for details on the chronological format of the arrays and the naming conventions used to identify the variables, users should refer to Appendix 7: Continuous Month Scheme and Crosswalk.


Three employment arrays provide information on the respondent's civilian employment on a weekly basis. These arrays include information about employee jobs and self-employment; jobs reported in the freelance section are not included in the arrays. Please see the NLSY97 User's Guide for a more complete description of these job types. All employment arrays provide information starting in the month when the respondent turned 14 and ending in the week that he or she was last interviewed.

  1. EMP_STATUS
    This main array presents the civilian employment status of a respondent in a particular week. The codes and their explanations follow:

    Code
      Definition

    Status=0: No information reported to account for week
      Week cannot be assigned due to missing job start and stop dates.

    Status=1: Not associated with an employer, not actively searching for an employee job
      Refers to weeks during a between-jobs gap in which the respondent is not actively searching and reports working at a freelance job. Since the actual weeks working at a freelance job cannot be determined, all weeks in which the respondent is not actively searching are coded in this manner. This status code is only used when respondents reported working at a freelance job in addition to a gap in a regular job. As a result, this code only exists through round 5, after which all respondents aged out of the freelance section.

    Status=2: Not working (unemployment vs. out of labor force cannot be determined)
      Assigned when the respondent is not asked follow-up questions about his or her search activity during a within-job gap or a between-jobs gap.

    Status=3: Associated with an employer, periods not working for employer are missing
      Used when a respondent reports an indeterminate start or stop date for a within-job gap.

    Status=4: Unemployed
      Indicates that the respondent reports actively searching for work during a within-job gap or a between-jobs gap. When the number of weeks unemployed does not account for the entire gap period, weeks unemployed are assumed to occur in the middle of that period.

    Status=5: Out of the labor force
      Assigned during a between-jobs gap or a within-job gap when the respondent is either not actively searching for work or on layoff from a job.

    Status=6: Active military service
      Indicates that the respondent is tied to the military.

    Status=9701 to 201010: Employer on roster
      Refers to the employer number on the employer roster (YEMP_UID.xx). Presence of an employer number indicates that the respondent was working during a given week. Civilian work takes precedence over other activities, such as job search. Respondents who report working at an employer job for one day in a given week are listed as having worked at that job for the entire week, regardless of other activities.
  2. EMP_DUAL_JOB#
    If a respondent holds more than one civilian employee job during a week, the second employee job is presented in a dual job array. These arrays contain only the job number of the overlapping job; labor force status information is only included in the main array. For example, if a respondent held two civilian employee jobs (e.g., the first and third jobs listed on the employer roster) in one week, the employer number for the first job would be recorded in the EMP_STATUS array and the employer number for the third job would be recorded in the EMP_DUAL_2 array. If a respondent held three jobs (e.g., jobs #01, #04, and #05 on the roster) in one week, the first job would be recorded in the EMP_STATUS array, the employer ID for job #04 would be recorded in the EMP_DUAL_2 array, and the employer ID for job #05 would be recorded in the EMP_DUAL_3 array. Unlike the NLSY79 work history arrays, jobs are recorded in the status and dual jobs arrays based upon the order presented in the employer rosters, which is sorted by the ending date with the current or most recent job listed first.
  3. EMP_HOURS
    This final array calculates the total number of hours worked by a respondent at any civilian employee job during each week. Hours per week worked at each job are assumed constant except during a reported gap, when the hours for that job are assumed to be zero. Each week is assigned a code of '-3 (invalid skip)' when any of the jobs has an indeterminate gap date.
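
As an illustration of how the codes in the EMP_STATUS and EMP_DUAL_JOB# arrays might be read, the following Python sketch maps a weekly code to a label and collects the employer IDs held in one week. The function names, the paraphrased labels, and the handling of negative codes as missing are assumptions made for this example; they are not part of the NLSY97 programs.

```python
# Labels paraphrase the EMP_STATUS code definitions described above.
NONWORK_LABELS = {
    0: "no information",
    1: "not associated with an employer, not actively searching",
    2: "not working (unemployed vs. out of labor force unknown)",
    3: "associated with an employer, gap dates missing",
    4: "unemployed",
    5: "out of the labor force",
    6: "active military service",
}

# Lowest employer ID listed in the code definitions (9701 to 201010).
FIRST_EMPLOYER_ID = 9701


def describe_status(code):
    """Return a readable label for one weekly EMP_STATUS value."""
    if code < 0:
        # Assumption: negative values are missing-data codes such as -3 or -4.
        return "missing"
    if code >= FIRST_EMPLOYER_ID:
        return f"working for employer {code}"
    return NONWORK_LABELS.get(code, "unknown code")


def jobs_held(status_code, dual_codes):
    """List the employer IDs held in one week.

    status_code is the EMP_STATUS value; dual_codes holds the values of the
    dual job arrays (e.g., EMP_DUAL_2, EMP_DUAL_3) for the same week.
    """
    jobs = [status_code] if status_code >= FIRST_EMPLOYER_ID else []
    jobs += [code for code in dual_codes if code >= FIRST_EMPLOYER_ID]
    return jobs


print(describe_status(4))
print(jobs_held(9701, [9703, -4]))
```

Keeping the labor force codes and the employer IDs in a single lookup mirrors the array itself, where one weekly value carries either a status or an employer number.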

Other information about employment event history arrays

Continuous week crosswalk. A secondary set of variables translates the reported beginning and ending dates (day, month, and year) of employee jobs and the gaps within those jobs to the week and year naming scheme (e.g., EMP_GAP_START_YEAR.01.01 and EMP_GAP_END_YEAR.01.01 provide the start and end dates of the respondent's first gap at the first job in the continuous week and year format). More information about the week and year naming scheme is provided in Appendix 7 in this document.

Linking to survey data using unique ID codes. The created event history variables can be used in conjunction with the main file information about the respondent's employment. In the main data, unique employer ID numbers are listed under the question name YEMP_UID.xx (e.g., R24761.); these codes are used in the weekly employment status variables. Using these unique ID codes, researchers can identify the comparable job information (e.g., complete start and stop dates, fringe benefits, job satisfaction, industry and occupation, etc.) from the main file. The unique ID codes are assigned based on the survey round in which the employer was first reported.

User note about freelance and self-employment

The collection of freelance and self-employment information changed in the round 4 interview, as described in the introduction to Appendix 2: Employment Variable Creation. A small number of round 4 self-employed jobs may have a unique ID of 199999. The assignment of unique ID codes is described in detail in Appendix 8: Instrument Rosters.

Denial of previously reported employers. Respondents sometimes deny that they ever worked for an employer reported in a previous round. If this situation occurs, the data for that employer remain in the event history arrays, but a flag variable (EMP_DENY) indicates that the employer was denied in a subsequent round. For example, assume that a respondent reported working for employer number 9802 from January 1, 1998 through the round 2 interview date. In round 3, however, the respondent stated that he or she never worked for that employer. The weekly STATUS variables for January 1, 1998, through the round 2 interview date will continue to report the respondent's status as working for employer 9802, but the EMP_DENY variable will also have a value of 9802, indicating that the respondent denied working for that employer during the round 3 interview.
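
Researchers who want to treat denied employers differently could flag the affected weeks themselves. The Python sketch below assumes the weekly status values and the denied employer IDs (the values of EMP_DENY) have already been read into plain lists; the function name is hypothetical.

```python
def flag_denied_weeks(weekly_status, denied_ids):
    """Return one boolean per week: True if that week's status is an
    employer ID that the respondent later denied."""
    denied = set(denied_ids)
    return [code in denied for code in weekly_status]


# Employer 9802 was reported in round 2 and denied in round 3.
weeks = [5, 9802, 9802, 4, 9701]
print(flag_denied_weeks(weeks, [9802]))
```

The arrays themselves are left unchanged, as in the released data; the flags only mark which weeks an analyst may wish to recode or exclude.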

Missing and imputed values. Occasionally, respondents cannot provide information about the start and end dates of employment periods or gaps in employment. Because dates of employment are often used in subsequent questions in the jobs section, default values are substituted for these missing values so that the interview program can continue. The missing values are then reinserted in the public use data file so that researchers will know the true value. However, to follow the flow of an interview, users may need to understand what values were substituted so that the correct question path can be followed. Similarly, in the creation of the event history arrays, some missing values are imputed. Imputation rules and the effect of each on the event history arrays are as follows:

Type of missing data: Missing job start or stop day
Imputed value in interview and event history data: Start day = 1; Stop day = 28
Effect on event history variables: If there is a valid month and year, the imputed days are used in the creation of status variables as if they were valid data. For example, a respondent with an imputed start day of "1" for employer 9701 will be listed as working for employer 9701 for the first week (and each subsequent week) of the reported month in the STATUS array.

Type of missing data: Missing job start or stop month
Imputed value in interview and event history data: Start month = 1; Stop month = 12
Effect on event history variables: In the STATUS array, weeks in imputed months are assigned a status of "0"--no information. Each month from the beginning of the job to the next known date or from the last known date to the end of the job is assigned a 0. For example, assume a respondent reports starting a job in an unknown month of 1997 but then reports a within-job gap starting on 6/1/97. All weeks in the months of January-May will be assigned a value of 0.

Type of missing data: Missing job start or stop year
Imputed value in interview and event history data: Start year = year of last interview; Stop year = year of current interview
Effect on event history variables: In the STATUS array, weeks in imputed years are assigned a status of 0. For example, in round 2 a respondent who reported a new job with an unknown start year would be assigned an imputed value equal to the year of the round 1 interview (usually 1997). In the STATUS variables, each week from the round 1 interview date to the first known employment date (or the current interview date) would have a value of 0.

Type of missing data: Missing gap start or stop day
Imputed value in interview and event history data: Start day = 1; Stop day = 28
Effect on event history variables: If there is a valid month and year, the imputed days are used in the creation of status variables as if they were valid data. For example, a respondent with an imputed start day of "1" for a within-job gap will be assigned a value of 2, 4, or 5--depending on information about layoff and job search--for the first week (and each subsequent week) of the reported month in the STATUS array.

Type of missing data: Missing gap start or stop month; missing gap start or stop year
Imputed value in interview and event history data: Start month = job start date (or the date of a previous known gap); Stop month = job stop month (or the date of a later known gap)
Effect on event history variables: Each week in the imputed period is assigned a value of 3 in the STATUS array, meaning associated with an employer but with missing gap information. For example, a respondent with a job start date of 4/12/97 and a gap with an unknown start month and year would have an imputed gap start date of 4/12/97. Each week from that date to the next known employment date would have a status of 3.
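
The missing-day rule above (start day = 1, stop day = 28) can be sketched in Python as follows. This is a minimal illustration only: the function name and the use of None to represent a missing day are assumptions, and the month and year rules are not covered because they change the status codes rather than simply filling in a date.

```python
from datetime import date


def impute_day(year, month, day, is_start):
    """Apply the missing-day rule to a job or gap date.

    Returns (date, imputed), where imputed is True when the day was filled
    in. A missing day is represented here by None (an assumption).
    """
    if year is None or month is None:
        raise ValueError("month and year must be known for the day rule")
    if day is None:
        # Start day = 1, stop day = 28, as in the imputation rules above.
        return date(year, month, 1 if is_start else 28), True
    return date(year, month, day), False


print(impute_day(1997, 6, None, is_start=True))
print(impute_day(1997, 6, None, is_start=False))
```

Returning the flag alongside the date lets an analyst keep track of which dates were reported and which were filled in.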

Respondents may have more than one job in a given week due to the imputation of dates as described above. If the month or year was imputed for one job, resulting in the assignment of zeros to a given set of weeks, but another job with known dates falls in some of those weeks, the zeros will be dropped and replaced by information about the known job. However, the respondent will not be listed as having a dual job in those weeks. The imputed employer will not be listed in any array if the zeros are dropped because another job provides valid information.

Backreporter variables. Some respondents report during the current interview a new job with a start date prior to the date of the last interview that was not reported during that interview. If these jobs had been reported at the previous interview, the weeks and hours worked would have been represented in the arrays at that time. When they are instead reported in the current interview, the event history arrays created at the previous interview date are not changed to include information about these new jobs. Three "backreporter" variables alert users to changes that would have resulted if the jobs had been correctly reported during the previous interview.

The first variable, EMP_BK_WKS, tells how many weeks before the previous interview date the job started. The second and third variables show how the status and hours arrays would have been affected had the job beginning before the date of last interview been reported at the prior interview and included in the original array construction. One variable, EMP_BK_STATUS, indicates the number of weeks from the job's start date to the date of last interview for which a nonworking status would have changed to an employer ID had the job been reported during the previous interview round. The other variable, EMP_BK_HOURS, informs users about the additional number of hours per week worked on this job for the weeks from the job's start date to the date of the previous interview.

For example, assume a respondent named Mary was interviewed on January 15, 1999 (round 3), and January 15, 2000 (round 4). In round 3, Mary reported no employers. In round 4, she reported working 30 hours a week on a job that began on January 1, 1999. Since the job began 2 weeks before the round 3 interview, EMP_BK_WKS would have a value of 2. EMP_BK_STATUS would also have a value of 2, indicating that 2 weeks in the round 3 arrays would have changed from nonworking to working status. EMP_BK_HOURS would have a value of 30, indicating that 30 additional hours would have been worked in each of those weeks.

Similarly, assume a respondent named John was interviewed on the same dates as Mary in rounds 3 and 4. In round 3, John reported a job that he had worked at for 10 hours per week since the round 2 interview. In round 4, he reported a second, 20 hours-per-week job that began on January 1, 1999, 2 weeks before his round 3 interview. Like Mary, John would have a value of 2 for the EMP_BK_WKS variable. However, the weeks between January 1 and January 15, 1999, would already indicate that John was working (at the original employer). Therefore, EMP_BK_STATUS would have a value of 0, because no weeks would have changed from nonworking to working status if John had reported the new job in round 3. EMP_BK_HOURS would have a value of 20, indicating the number of hours per week that John worked at the new job. In John's case, the hours worked array variables created in round 3 would have a value of 10, reflecting the job he reported in round 3. Researchers can add the value of EMP_BK_HOURS to the value in the original round 3 arrays for the 2 weeks before January 15, 1999, to determine that John worked 30 hours per week in those weeks.
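
The adjustment described for Mary and John can be sketched in Python. The sketch assumes the weekly hours up to (but not including) the week of the previous interview are held in a list, and that negative values are missing-data codes that should be left alone; the function name and the list layout are hypothetical.

```python
def add_backreported_hours(hours, interview_idx, bk_wks, bk_hours):
    """Return a copy of the weekly hours with backreported hours added.

    hours         : weekly hours from the arrays created at the prior round
    interview_idx : list position of the previous interview week; the
                    bk_wks weeks just before it are adjusted (assumption)
    bk_wks        : value of EMP_BK_WKS
    bk_hours      : value of EMP_BK_HOURS
    """
    adjusted = list(hours)
    start = max(0, interview_idx - bk_wks)
    for i in range(start, interview_idx):
        if adjusted[i] >= 0:  # leave negative missing-data codes unchanged
            adjusted[i] += bk_hours
    return adjusted


# Mary: no job in the round 3 arrays, 30 backreported hours for 2 weeks.
print(add_backreported_hours([0, 0, 0, 0], 4, 2, 30))
# John: 10 hours already recorded, 20 backreported hours for 2 weeks.
print(add_backreported_hours([10, 10, 10, 10], 4, 2, 20))
```

For John, the two adjusted weeks sum to 30 hours, matching the total given in the example above.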

The NLSY97 marital and cohabitation arrays record changes in the respondent's marital and cohabitation status on a monthly basis. The marital/cohabitation history program converts dates reported in the marriage section (beginning and ending dates of cohabitations, marriages, separations, divorces, and widowhoods) to an actual month number, using January 1980 as month #1. Used jointly, these arrays allow the researcher to obtain a detailed history of the respondent's partners and changes in his/her marital and cohabitation status on a monthly basis. All marital/cohabitation arrays provide information beginning in the month that the respondent turned 14 (although respondents do not answer marriage and cohabitation questions until they reach age 16) and ending in the month that he or she was last interviewed. Additionally, the beginning dates of the youth's first marriage, first cohabitation, and first divorce are provided in three variables: CVC_FIRST_MARRY_MONTH, CVC_FIRST_COHAB_MONTH, and CVC_FIRST_DIVORCE_MONTH.
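
With January 1980 as month #1, the conversion between a calendar date and a month number is simple arithmetic. The Python sketch below shows one way to write it; the function names are chosen for this example and are not taken from the NLSY97 programs.

```python
BASE_YEAR = 1980  # January 1980 is month #1


def to_month_number(year, month):
    """Convert a calendar year and month (1-12) to a continuous month number."""
    return (year - BASE_YEAR) * 12 + month


def from_month_number(number):
    """Convert a continuous month number back to (year, month)."""
    years, month_index = divmod(number - 1, 12)
    return BASE_YEAR + years, month_index + 1


print(to_month_number(1980, 1))
print(to_month_number(1998, 9))
print(from_month_number(225))
```

Under this scheme September 1998 is month 225, since 18 full years (216 months) precede it and September is the 9th month of the year.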

Three types of arrays record transitions from living without a partner of the opposite sex to cohabiting or to marriage.

  1. MAR_STATUS
    The main array presents the status (e.g., never married/not cohabiting, cohabiting, married, divorced) of a respondent during a particular month. Marital status takes precedence over cohabiting; for example, if a respondent is divorced and living with another partner, the status listed in this array will be 'divorced.' Respondents who are married but not living with their spouse are coded as married. If a respondent reports an annulment, the previously reported dates of marriage are maintained and the marital status code after the annulment is 'divorced.'

    Missing and imputed values. Some respondents do not provide complete information about marriage and cohabitation dates. In the creation of the event history arrays, these missing values are imputed so that an array can be constructed for each respondent. Imputed values are as follows:

    Type of Missing Data
      Imputed Value

    Missing start date for marriage or cohabitation: both month and year missing
      Month and year are imputed to one month after the date of the respondent's last interview

    Missing start date for marriage or cohabitation: year known, month missing
      January (of the known year) is assigned as the imputed start month of marriage/cohabitation

    Missing start date for separation or divorce: both month and year missing
      Month and year are imputed to the month of the current interview

    Missing start date for separation or divorce: year known, month missing
      December (of the known year) is assigned as the imputed date of separation/divorce

    Missing end date for cohabitation: both month and year missing
      Month and year are imputed to one month before the month of the current interview

    Missing end date for cohabitation: year known, month missing
      December (of the known year) is assigned as the imputed date of the end of cohabitation
  2. MAR_COHABITATION
    This second array details the partner that the respondent is living with in a particular month. For example, if the respondent is cohabiting, the variable for each month identifies whether the respondent lives with partner 1, partner 2, spouse 1, spouse 2, etc. Users should note that "1" and "2" in this case refer to the respondent's partners/spouses in chronological order. The numbers do not necessarily refer to the same person as the loop numbers in the spouse/partner questions asked directly of the respondent during the survey. Users can distinguish between partners and spouses because partner IDs begin with "1" (e.g., 101, 102) and spouse IDs begin with "2" (e.g., 201, 202). The partner ID also gives a running count of cohabitations and marriages; for example, a code of 106 indicates that the respondent has had a total of six partners from round 1 through the current round, and a code of 203 means that the respondent has been married three times over the same period.

    Note that some respondents are married but are not living with their spouse. These respondents are coded as "married, spouse absent" in the created marital status variable (CV_MARSTAT), and in this array they will have a -4 (valid skip) rather than a partner ID. Additionally, a few respondents are married but cohabiting with someone other than their spouse. These respondents are coded as "married, spouse absent" in the created marital status variable (CV_MARSTAT), and in this array they will have the ID of the partner they are cohabiting with (not the ID of the spouse).

  3. MAR_PARTNER_LINK
    The third array links the cohabiting partner or spouse to the partner order in the main survey questions. This array allows the researcher to identify characteristics of the respondent's partner and to link them with spells of marriage or cohabitation. For example, a researcher might look at the MAR_COHABITATION variable for the 9th month of 1998 and determine that a respondent was living with his second partner ever in that month because the variable's value is 102. The MAR_PARTNER_LINK variable provides a crosswalk between this value and the new partner ID variable on the partner roster (PARTNERS_ID). The researcher can then examine the roster and survey variables for that partner to determine the person's characteristics, such as race, ethnicity, age, religion, and so on.

    User note about partner rosters

    Researchers should be aware that the partner rosters were created for all rounds and released as part of the round 5 data set. Consequently, the partner link variable in the event history data (MAR_PARTNER_LINK) now uses those new IDs (PARTNERS_ID). Because the new IDs make it easier to link partners across rounds, the cohabitation (MAR_COHABITATION) and marital status (MAR_STATUS) arrays were updated for the round 5 event history release. These changes, combined with careful cleaning of the data, minimized the possibility that one spouse/partner is incorrectly recorded as a second spouse/partner due to the respondent reporting the same information in more than one interview. As a result, overcounting of the total number of marriages and spells of cohabitation (MAR_COHABITATION) is less likely. The changes also reduced the number of dual partners reported (MAR_DUAL).

    Other information about marital status event history arrays

    MAR_DUAL. Rounds 1 and 2 contain a fourth monthly array. If there is an overlap of partners (e.g., partner 1 leaves at the beginning of the month and partner 2 moves in at the end of the month), this array records the presence of the new partner. The format of these variables is the same as the MAR_COHABITATION variables. Beginning in round 3, this array was changed to one single variable. Because this is a relatively rare event, there is only one variable per round that indicates whether there was any month with an overlap period between the current interview and the previous interview.

    Denial of previous data. Occasionally, respondents report that the marital status information from a previous round is not true. As is the case with employment, the information in arrays based on that round's information is maintained, but the respondent is assigned a value in a flag variable (MAR_DENY) indicating later denial of the information. This flag variable has several different possible values, depending on the type of information denied. For example, assume a respondent reported cohabiting at the round 2 interview date but denied that the cohabitation had occurred in round 3. The status variables for each month from the beginning of the cohabitation to the round 2 interview date would continue to reflect the cohabitation, but the MAR_DENY variable would have a value of 3, indicating that the cohabitation was later denied. Starting from round 9, MAR_DENY is no longer available.
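
The partner and spouse IDs used in the MAR_COHABITATION array (e.g., 102, 203) can be split into a type and a chronological order number. The Python sketch below is a minimal illustration; the function name is hypothetical, and treating every value outside the 1xx and 2xx ranges (including negative codes such as -4) as "no partner ID" is an assumption.

```python
def decode_partner_id(code):
    """Split a MAR_COHABITATION value into (type, order).

    IDs beginning with 1 are partners and IDs beginning with 2 are spouses.
    Returns None for values that are not partner or spouse IDs.
    """
    if code < 0:
        return None  # e.g., -4 (valid skip) for married, spouse absent
    leading, order = divmod(code, 100)
    kind = {1: "partner", 2: "spouse"}.get(leading)
    if kind is None or order == 0:
        return None
    return kind, order


print(decode_partner_id(102))
print(decode_partner_id(203))
print(decode_partner_id(-4))
```

The order number is chronological, so it should be matched to the survey's partner roster through MAR_PARTNER_LINK rather than read as a loop number.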

Program participation arrays are constructed individually for four programs--Unemployment Insurance, AFDC, Food Stamps, and WIC. These arrays were also constructed for Worker's Compensation for rounds 1 to 3. The AFDC array includes all federal and state programs created under Temporary Assistance to Needy Families (TANF) or any government program for needy families that replaces AFDC. All other programs (e.g., LIHEAP, SSI, other) are combined into one set of questions in the survey and are presented in a sixth array entitled 'Other.' For each program type, except Unemployment Insurance, three arrays are created. All program participation arrays provide information starting in the month that the respondent turned 14. The Unemployment Insurance arrays end in the month that the respondent was last interviewed, while the other arrays end by September 2009.

A secondary set of variables translates the reported beginning and ending dates (month and year) of a spell within the program into the continuous month scheme (e.g., AFDC_START_MONTH and AFDC_STOP_MONTH). More information about the continuous month scheme is provided in Appendix 7: Continuous Month Scheme and Crosswalk.

  1. STATUS
    The main array (e.g., AFDC_STATUS) presents the status--receiving or not--of a respondent during each month. When asked for the start or stop date of a spell, the respondent could answer 'don't know' or 'refuse' to any component. In this case, the respondent was then asked how many weeks the spell lasted. The number of reported weeks was then divided by 4.3 to determine the equivalent number of months. If a fraction of a month was reported, then the entire month was counted as a month receiving benefits. Using a combination of start date, stop date, and week information, each spell was defined and a value of '1' inserted into the status array to indicate months of receipt. The months that a respondent did not receive that benefit, but was eligible to receive it, have a value of '0.' An edit variable (e.g., AFDC_EDIT_DATE) flags respondent-reported and imputed dates. The process by which imputed dates and the corresponding edit flag were assigned is described below:

    Flag: Definition

    Edit Flag=1: Respondent reported participation dates

    Respondent reported a complete start and stop date and is not currently receiving. If the respondent reports still receiving at the time of the interview, the interview date is assigned as the temporary stop date. In the next survey round, the respondent will be asked if he or she is still receiving; if not, a permanent stop date equivalent to the previous round's interview date will be assigned. If the respondent reports still receiving, participation continues to fill the array.
    Edit Flag=2: Start month imputed

    Total weeks known: If the respondent reports not currently receiving, then set the month equal to January and count forward by the number of weeks to imply a stop date. If currently receiving, then count back by the number of weeks from the interview date to impute a start month. If the month indicated by the count falls short of the start year, the start month is December of the start year. If the month occurs in the year before the reported start year, then the start month is January of the start year.

    Total weeks unknown: If the respondent reports not currently receiving, then the start month is set to January. Use December as the stop month and the start year as the stop year. If the respondent reports currently receiving, use December as the start month.

    Edit Flag=3: Start month and year imputed

    Total weeks known: Count back by the number of weeks from the interview date if currently receiving. If not currently receiving, then count back from interview date to find the most recent year the respondent could have begun receiving and call the start date January of that year; then count forward the number of weeks from that date to imply a stop date.

    Total weeks unknown: If currently receiving or the stop date is reported, begin the spell at the respondent's 14th birthday (in round 1) or the last interview month (in later rounds).

    Edit Flag=4: Stop month imputed

    Total weeks known: If not currently receiving, then count forward from start date. If the month indicated falls short of the stop year, then use January of the stop year as the stop month; if the number of months exceeds the stop year, then set the stop month to December of the stop year. If the stop year is equal to the interview year and the stop month exceeds the interview month, then stop at the interview date.

    Total weeks unknown: Use December of stop year or the interview month, whichever comes earlier, for the stop month.

    Edit Flag=5: Stop month and year imputed

    Total weeks known: If not still receiving, count forward from the start date.

    Total weeks unknown: If not currently receiving, then use December of the start year as the stop month and the start year as the stop year.

    Edit Flag=6: Start and stop dates imputed

    Total weeks unknown: The imputed dates are based on the previous interview's date (start date) to the current interview date (stop date); in round 1, the last interview date is the respondent's 14th birth month and year.

    Edit Flag=7: Start and stop dates complete but gap information missing

    No information was collected about gaps in receipt.

    Starting in survey year 2009, we have added values 8, 9, 10, 11, 12, 13, and 14 to Edit Flag. Their definitions are similar to those of Edit Flag=1, 2, 3, 4, 5, 6, and 7, respectively, except the stop dates are truncated to September 2009.

  2. AMOUNT RECEIVED
    If a respondent reports receiving in a particular month, a second array presents the amount received in each month (e.g., AFDC_AMT). The dollar values asked about during the interview were meant to be monthly values. However, some responses were higher than the federal or state limits on the amount received from a particular benefit. A likely reason is that the respondent mistakenly reported a total value rather than a monthly value. Values determined to be too high were divided by the number of months the respondent reported receiving the benefit. These values were used in the AMT arrays instead. A second set of edit variables (e.g., AFDC_EDIT_AMT) flags these values for a particular spell.
  3. HOUSEHOLD MEMBERS RECEIVING
    If a respondent reports receiving in a particular month, the persons in the household who benefit from the program in each month (e.g., respondent only, child only, respondent and child) are recorded in a third array (e.g., AFDC_HH). This program condenses the set of answers from the question in the survey that collects this information; for example, see YPRG-35920_UPD.01~000001 to YPRG-35920_UPD.01~000005 for AFDC. Users should note that this array is not present for Worker's Compensation and Unemployment Insurance because these programs are collected for the respondent only.

    Other information about program participation event history arrays

    A few respondents report receiving assistance but then deny that receipt in a later interview. These situations are treated in the same way as in the marriage arrays, as described above. The denial flag variables for the program participation arrays incorporate the name of the program (e.g., AFDC_DENY).

    Researchers should be aware of an important source of variability in the Worker's Compensation data. These data suggest that some respondents report the dates the payment was actually received and some report the period of time to which the payment applied. For example, if a respondent was out of work for six months but received a lump sum payment a year later, he or she might report either the date the lump sum was paid or the dates he or she was unable to work.
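
Two of the calculations described for the program participation arrays, converting a reported number of weeks to months and rescaling amounts that appear to be totals, can be sketched in Python. Rounding any fraction of a month up to a whole month is one reading of the rule described for the STATUS arrays. The limit argument is a hypothetical stand-in for the federal or state limits, which are not listed in this document, and the function names are chosen for this example.

```python
def weeks_to_months(weeks):
    """Convert whole weeks of receipt to months, counting any fraction of
    a month as a full month. Computes ceil(weeks / 4.3) with integer
    arithmetic to avoid floating-point rounding."""
    if weeks < 0:
        raise ValueError("weeks must be non-negative")
    return -(-(weeks * 10) // 43)


def monthly_amount(reported, months_received, limit):
    """Return (amount, edited).

    If the reported amount exceeds the limit (hypothetical parameter), it
    is treated as a total and divided by the months of receipt.
    """
    if reported > limit and months_received > 0:
        return reported / months_received, True
    return reported, False


print(weeks_to_months(6))
print(monthly_amount(2400, 6, 500))
```

The second return value plays the role of an edit flag such as AFDC_EDIT_AMT, marking amounts that were rescaled.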

There are three sets of schooling event history arrays: monthly grade school histories, yearly grade school histories, and monthly college event histories. Together, these three sets of information provide researchers with a complete overview of a respondent's education. Grade school histories, which cover kindergarten through 12th grade, are available only from rounds 2 to 12. They were discontinued after round 12 because by then the youngest NLSY97 respondents were in their mid-twenties and almost none were in school or providing information about their grade school activities. College event histories start in round 2 and are ongoing at present.

The grade school education arrays are somewhat different than the other event history arrays. Information on a respondent's education is reported in both yearly and monthly variables. This approach is used to combine information from the youth questionnaire, which collects more detailed data, and from the round 1 parent questionnaire, which presented information only for each year. Users should be aware that, because questions were not identical in the round 1 parent questionnaire and the round 2 youth questionnaire, the transition between the two data sources was not seamless and some information for the yearly variables had to be imputed. If they feel that a given value is questionable, researchers may wish to compare created yearly variables to the raw data and to the monthly schooling arrays described below.

Yearly grade schooling variables

A set of grade school variables provides information for each year beginning in 1980, the year when the first information is available in the survey, through round 12. In general, these variables refer to the school year rather than the calendar year. That is, 1991 in a variable title or in the data for a variable generally indicates the school year starting in fall 1991 and ending in spring 1992.

  1. SCH_YEAR_to_GRADE
    This array presents the grade the respondent attended during the school year. The last four digits of the question name indicate the school year. For example, SCH_YEAR_to_GRADE.1990 refers to the grade attended by the respondent during the school year that starts in fall 1990 and ends in spring 1991.
  2. SCH_GRADE_to_YEAR
    This array refers to the year the respondent attended a certain grade. For example, if the respondent attended second grade in 1992-93, then SCH_GRADE_to_YEAR.2 would have the value 1992.
  3. SCH_CHANGES
    This array counts the number of times the respondent changed the school attended during the school year. For example, SCH_CHANGES.1990 shows how many different schools the respondent attended during the school year that started in fall 1990 and ended in spring 1991.
  4. SCH_MNTHS_MISSED
    This array presents the number of months during the school year that the respondent did not attend school. For example, if SCH_MNTHS_MISSED.1990 has a value of 3 for a respondent, then that respondent had a gap in attendance of three months during the school year that started in the fall of 1990 and ended in the spring of 1991. A gap is defined as missing school for one or more months (not including summer vacation); gaps do not have to be consecutive.
  5. SCH_SUMMER_SCHOOL
    This array refers to extra school classes during an educational break in a given school year, such as summer school. For example, SCH_SUMMER_SCHOOL.1990 shows whether the respondent attended school during a break in the 1990-91 school year.
  6. SCH_SUSPENSIONS
    This array counts the number of days during the school year the respondent was suspended from school. For example, if SCH_SUSPENSIONS.1990 has a value of 3 then the respondent was suspended from school 3 days during the school year that started in fall 1990 and ended in spring 1991.
  7. SCH_GRADE_PROGRESS
    This array has positive values if any special events occurred during the school grade. For example, a positive value in SCH_GRADE_PROGRESS.2 indicates that the respondent was skipped or demoted during second grade. Researchers should note that parents might have been confused as to how to answer the skip grade questions asked during the interview. For example, some parents say their child skipped from 5th to 6th grade, while others say from 4th to 6th grade. Both of these cases probably mean that the child missed most or all of the 5th grade. To resolve this ambiguity, the program assumes that if a parent reports a skip between consecutive grades (e.g., 5th to 6th), the first grade (5th) was missed. If a parent reports non-consecutive grades (e.g., 4th to 6th), the program assumes the grade(s) in the middle are the ones not attended.
  8. SCH_YEAR_PROGRESS
    This array refers to any special events that occurred during the school year. The question name's last four digits indicate the school year this variable refers to. For example, SCH_YEAR_PROGRESS.1990 shows special events that occurred during the school year that starts in fall 1990 and ends in spring 1991. The special events, such as grades skipped or demoted to, are defined in the same way as in the previous array.
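The rule for resolving ambiguous skipped-grade reports described under SCH_GRADE_PROGRESS can be sketched as follows. This is a minimal Python illustration, not the program used to build the arrays; the function name `grades_missed` and its arguments are hypothetical.

```python
from typing import List


def grades_missed(from_grade: int, to_grade: int) -> List[int]:
    """Return the grade(s) assumed not attended for a reported skip.

    Hypothetical helper: from_grade and to_grade are the two grades a
    parent names when reporting that the child skipped a grade.
    """
    if to_grade <= from_grade:
        raise ValueError("a skip must move to a higher grade")
    if to_grade - from_grade == 1:
        # Consecutive grades reported (e.g. 5th to 6th):
        # the first grade named is treated as the one missed.
        return [from_grade]
    # Non-consecutive grades reported (e.g. 4th to 6th):
    # the grade(s) in between are treated as the ones missed.
    return list(range(from_grade + 1, to_grade))


print(grades_missed(5, 6))  # consecutive report
print(grades_missed(4, 6))  # non-consecutive report
```

Under this rule, both example reports resolve to the same missed grade (5th), which is the intent of the disambiguation.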

User note about education variables

As discussed in the Educational Status and Attainment section, there are a number of apparent inconsistencies in the raw survey data with respect to grade progression. Through a data quality review after round 6, survey staff determined that the complexity of the survey questions, coupled with problems in the way the data were interpreted during the programming of the event history arrays, led to a significant number of spurious repeated and skipped grades. For example, because of errors in reporting or programming, it may appear that a respondent completed 10th grade twice and then jumped ahead to 12th grade when in fact the respondent had a normal progression through the grades. The following paragraphs detail the six main problems found in the data and the steps taken to correct them.

  1. Survey staff reviewed the grade reported in the initial 1997 survey and the date of high school graduation. While the detailed school enrollment loops ask for information that individuals may not always report correctly, the date of graduation from high school is a salient event that respondents should report with a high degree of accuracy. Using this information, survey staff identified all respondents who moved from the grade reported in 1997 to high school graduation in the expected amount of time. If a respondent's graduation date indicates a normal school progression (one grade completed per school year), the event history program flagged the respondent and imposed a normal progression on the event history variables.
  2. A number of respondents enroll in college courses while they are still in high school. Event history arrays only contain a single grade attended for a given time period, and the original event history program was written so that college courses were given precedence over high school. For example, if an 11th-grader also took a freshman-level college class during first semester, the program assigned a grade of "13" (first year in college) for that semester. If the student then finished 11th grade but did not take any college classes during second semester, it would appear in the data that the student jumped ahead to year 13 of schooling and then back to 11th grade during the course of a single year. This resulted in a number of extra promotions and regressions. Consequently, the event history program has been rewritten to prioritize high school over college, removing these spurious grade changes.
  3. Some respondents provided a high school graduation date but then reported additional secondary school enrollment after that date. Survey staff decided to exclude post-graduation secondary school enrollment from the event histories, although this information is preserved in the raw data for researchers who might be interested in the additional training received by respondents after graduation.
  4. While answering the schooling questions, some respondents reported initial enrollment at a school but apparently did not understand that they should report each grade attended at that school in a separate loop within the schooling section. This resulted in some respondents appearing to remain in one grade for a long period of time, particularly if they had missed one or more interviews, and then apparently jumping ahead several grades. If, for example, a respondent appeared to be in 9th grade for 3 years and then jump ahead to 12th grade, the most likely reason is that he or she did not understand the schooling questions and actually did progress normally through 10th and 11th grade. The event history program now flags these respondents and adjusts their schooling history to follow a normal grade progression.
  5. In a number of cases, respondents appear to jump backward and then forward across multiple grades. For example, some respondents were listed as attending 9th grade, then 1st grade, then 11th grade. The most likely explanation for this pattern is a data entry error where the interviewer accidentally dropped the zero from 10th grade. Jumps in a normal school progression which appear to be caused by a missing digit in a two-digit grade were corrected.
  6. Finally, data review of individual cases indicates that, when asked what grade they had first attended at a given school, some respondents reported instead the first grade offered at that school. As with the problem in the previous paragraph, this causes respondents to appear to jump backwards across a number of grades and then jump forward again the next year. Hand edits were made to adjust the event histories for these respondents to a normal grade progression.
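Two of the corrections listed above (items 2 and 5) lend themselves to a short illustration. The Python sketch below is an assumption-laden approximation, not the event history program itself: the function names, the use of grade 13 and above for college, and the exact test used to detect a dropped digit are all hypothetical.

```python
from typing import List


def resolve_concurrent_grades(grades: List[int]) -> int:
    """Pick one grade when several are reported for the same period.

    Assumption: grades 1-12 are grade school and 13+ are college years.
    High school takes priority over college, as in correction 2.
    """
    high_school = [g for g in grades if g <= 12]
    return max(high_school) if high_school else max(grades)


def fix_dropped_digit(grades: List[int]) -> List[int]:
    """Repair a grade sequence where a two-digit grade lost a digit.

    Assumption: a single-digit value is replaced only when the grade
    expected from normal progression is 10-12 and contains that digit
    (e.g. 9, 1, 11 becomes 9, 10, 11), as in correction 5.
    """
    fixed = list(grades)
    for i in range(1, len(fixed)):
        expected = fixed[i - 1] + 1
        if fixed[i] < 10 and 10 <= expected <= 12 and str(fixed[i]) in str(expected):
            fixed[i] = expected
    return fixed


print(resolve_concurrent_grades([11, 13]))
print(fix_dropped_digit([9, 1, 11]))
```

A sequence such as 7, 8, 9 passes through unchanged, because the expected next grade never falls in the two-digit range where a digit could have been dropped.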

The six changes described above significantly reduced the number of abnormal grade progressions found in the event history SCH_GRADE_PROGRESS variables. About three-quarters of the promotions and demotions found in the raw survey data for rounds 1-6 appear to be the result of reporting or programming errors. After the corrections were implemented, about 100 demotions and 570 promotions remained. Although it is possible that errors remain, survey staff feel, based on inspection of the data, that the vast majority of these grade changes reflect actual atypical progressions. Additional information about younger respondents' schooling continues to be collected, and staff will continue to review the data to determine whether newer information indicates that any of the remaining promotions or demotions are artifacts of inaccurate reporting.

Monthly grade schooling variables

Starting in round 2, three types of monthly arrays are created. Each array captures information for each month from the respondent's interview date in round 1 to the round 12 interview date.

  1. SCH_STATUS
    This array reports the respondent's enrollment status during each month from the round 1 interview date through the current interview date. Coding categories include unknown, not enrolled, in grades K to 12, on vacation, expelled, and other.
  2. SCH_TERM
    These variables report the respondent's school type and grade for each month in the time period. The first two digits represent the type of school (public = 10, private = 20, religious = 30 and unknown = 40). The last two digits provide the respondent's grade in school (1-12).
  3. SCH_ID
    This variable permits users to link array information to the school roster in the main data file and access other information about the school. The variable uses the same ID codes as the identification variable on the school roster in the main data set (for example, NEWSCHOOL_PUBID.01).
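The two-part SCH_TERM code can be split into its school type and grade with integer division. The following Python sketch assumes the value is stored as a single integer (for example, 1009 for public school, grade 9) and that non-positive values are missing-data codes; both points are assumptions, and the function name is hypothetical.

```python
from typing import Optional, Tuple

# School type codes as listed for SCH_TERM.
SCHOOL_TYPES = {10: "public", 20: "private", 30: "religious", 40: "unknown"}


def decode_sch_term(value: int) -> Optional[Tuple[str, int]]:
    """Split a SCH_TERM value into (school type, grade).

    Assumption: non-positive values are missing-data codes and are
    returned as None rather than decoded.
    """
    if value <= 0:
        return None
    type_code, grade = divmod(value, 100)  # first two digits, last two digits
    return SCHOOL_TYPES.get(type_code, "unrecognized"), grade


print(decode_sch_term(1009))
print(decode_sch_term(3012))
```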

Monthly college schooling variables

Starting in round 2, four types of monthly arrays are created. Each array captures information for each month from the respondent's interview date in round 2 to the present.

  1. SCH_COLLEGE_STATUS
    This array reports the respondent's enrollment status during each month from the round 2 interview date through the current interview date. Coding categories include unknown, not enrolled, in a two year college, in a four year college and in graduate school.
  2. SCH_COLLEGE_TERM
    These variables report the respondent's school type and grade for each month in the time period. The first two digits represent the type of school (public = 10, private = 20 and unknown = 40). The last two digits provide the respondent's term in college (1-98; 99 means no term information provided).
  3. SCH_COLLEGE_ID
    This variable permits users to link array information to the school roster in the main data file and access other information about the school. The variable uses the same ID codes as the identification variable on the school roster in the main data set (for example, NEWSCHOOL_PUBID.01).
  4. SCH_COLLEGE_DEGREE
    This variable shows what type of degree the respondent is trying to obtain. The first two digits indicate whether the respondent is going to college full-time (code = 1), part-time (code = 2), or their status is unknown (code = 3). The last two digits provide the type of degree (1 = Associates; 3 = BA or BS; 4 = MA, MBA, MS; 5 = Ph.D.; 6 = MD, JD; 10 = Joint BA/MA; 40 = Unknown).
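As an illustration of the SCH_COLLEGE_DEGREE coding, the Python sketch below separates the enrollment intensity from the degree type. It assumes the value is stored as one integer whose last two digits are the degree code (so 103 would mean full-time, BA or BS) and that non-positive values are missing-data codes. These storage details and the function name are assumptions, not taken from the program file.

```python
from typing import Optional, Tuple

ENROLLMENT = {1: "full-time", 2: "part-time", 3: "unknown"}
DEGREES = {
    1: "Associates",
    3: "BA or BS",
    4: "MA, MBA, MS",
    5: "Ph.D.",
    6: "MD, JD",
    10: "Joint BA/MA",
    40: "Unknown",
}


def decode_college_degree(value: int) -> Optional[Tuple[str, str]]:
    """Split a SCH_COLLEGE_DEGREE value into (enrollment, degree type).

    Assumption: non-positive values are missing-data codes (None).
    """
    if value <= 0:
        return None
    enroll_code, degree_code = divmod(value, 100)
    return (
        ENROLLMENT.get(enroll_code, "unrecognized"),
        DEGREES.get(degree_code, "unrecognized"),
    )


print(decode_college_degree(103))
print(decode_college_degree(210))
```

The same `divmod(value, 100)` split applies to SCH_COLLEGE_TERM, where a term value of 99 would need to be treated as "no term information provided".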

Open the College Schooling Event History program file

There are two sets of variables related to respondent arrests and incarcerations. These event history arrays consist of monthly variables that document the number of arrests and incarcerations in each month starting at the respondent's 12th birthday. Using these arrays, researchers can extract the status of a respondent at a point in time or over time.

Please note: The variable INCARC_INCOMPLETE (Title: INCOMPLETE INCARCERATION HISTORY) has been created to indicate whether the incarceration event histories are affected by a round 7 questionnaire design change. For more information, see the codebook information for this variable in NLS Investigator.

ARREST EVENT HISTORY ARRAY: ARREST_STATUS_year.month

This array lists the respondent's number of arrests on a monthly basis. It starts in January 1992 (the month in which the oldest respondent turned age 12) and ends with the most recent publicly available interview date. Arrests and incarcerations that occurred before the respondent turned 12 are not included in this array. The codes and their definitions are as follows:

Code

Definition

-4 Assigned if R is younger than 12 years old or has not been interviewed about this month
0 Assigned if R was not arrested in this month and previously was not arrested
1-98 Indicates the number of times R was arrested in this month
99 Assigned if R had been arrested previously but was not arrested in this month
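Because 99 is a status flag rather than a count, the monthly values cannot simply be summed. The Python sketch below shows one way to total the arrests recorded in the array and locate the first month with an arrest. It assumes the monthly ARREST_STATUS values have already been loaded into a chronologically ordered list, and the function name is hypothetical.

```python
from typing import List, Optional, Tuple


def summarize_arrest_codes(codes: List[int]) -> Tuple[int, Optional[int]]:
    """Return (total arrests, index of first month with an arrest).

    Only values 1-98 are arrest counts; -4, 0 and 99 are status codes
    and are excluded from the total.
    """
    total = sum(c for c in codes if 1 <= c <= 98)
    first = next((i for i, c in enumerate(codes) if 1 <= c <= 98), None)
    return total, first


# Hypothetical respondent: not yet 12, no arrests, 2 arrests, none, 1 arrest, none.
months = [-4, 0, 0, 2, 99, 1, 99]
print(summarize_arrest_codes(months))
```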

Missing and imputed values: Occasionally, respondents cannot provide information about arrest dates and the number of arrests. In early rounds of the survey, if respondents could not provide the arrest date (both month and year) or the year of the arrest, the arrest was not populated in the arrest event history array. However, dates have been imputed for skipped arrests from round 7 onwards. In the main questionnaire, for respondents who reported 4 or more arrests since the date of last interview, only the first and last arrests were dated. For the arrest arrays, the middle arrest dates were imputed as being evenly spaced between the first and last arrest dates. Where the first or last arrest year or month was missing, it was imputed on a case-by-case basis using the last interview date and known arrest date information.
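The even spacing of undated middle arrests can be sketched as below. This Python example is illustrative only: the source does not state how fractional months are rounded, so the use of `round` here is an assumption, as are the function names and the (year, month) tuple representation.

```python
from typing import List, Tuple

YearMonth = Tuple[int, int]  # (year, month), month in 1-12


def to_index(year: int, month: int) -> int:
    """Convert a (year, month) pair to a running month count."""
    return year * 12 + (month - 1)


def from_index(index: int) -> YearMonth:
    """Convert a running month count back to (year, month)."""
    year, month0 = divmod(index, 12)
    return year, month0 + 1


def impute_arrest_dates(first: YearMonth, last: YearMonth, n_arrests: int) -> List[YearMonth]:
    """Spread n_arrests evenly between the first and last dated arrests.

    Assumption: fractional positions are rounded to the nearest month.
    """
    if n_arrests < 2:
        raise ValueError("need at least a first and a last arrest")
    start, end = to_index(*first), to_index(*last)
    step = (end - start) / (n_arrests - 1)
    return [from_index(start + round(i * step)) for i in range(n_arrests)]


# Four arrests, first in January 2003 and last in October 2003.
print(impute_arrest_dates((2003, 1), (2003, 10), 4))
```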

If respondents can provide the arrest year but not the month, the month is imputed as the month at the midpoint of the period since the last interview date. For example, if a respondent was interviewed in round 6 and in round 8, but not in round 7, the program will take the month from the mid-date between the round 6 interview date and the round 8 interview date. If respondents cannot provide the arrest month in round 1, the missing month is set to June.
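The month imputation rule can be illustrated with standard date arithmetic. In the Python sketch below, the interview dates are invented for the example, the function name is hypothetical, and passing `None` for the previous interview is an assumed way of representing a round 1 report.

```python
from datetime import date, timedelta
from typing import Optional


def impute_arrest_month(prev_interview: Optional[date], this_interview: date) -> int:
    """Impute a missing arrest month.

    prev_interview is None for a round 1 report (assumption), in which
    case the month is set to June. Otherwise the month of the mid-date
    between the two interviews is used.
    """
    if prev_interview is None:
        return 6
    half = (this_interview - prev_interview).days // 2
    return (prev_interview + timedelta(days=half)).month


# Hypothetical respondent interviewed in round 6 and round 8 but not round 7.
round6 = date(2002, 11, 15)
round8 = date(2005, 1, 15)
print(impute_arrest_month(round6, round8))
print(impute_arrest_month(None, date(1997, 8, 1)))
```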

Arrest summary file: An arrest summary data set was also created to provide summary measures of the respondent's arrest event history. For each respondent, this data set includes the first arrest date reported, the total number of arrests (both dated and undated), and the number of arrests with missing year, month, or both month and year.

Variable

Definition

ARREST_FIRST Earliest arrest date as reported by R. If R did not provide an arrest date but was arrested, this is set to "-3."
ARREST_TOTNUM Total number of arrests as reported by R
ARREST_MISSNUM Total number of rounds (question years) that R refused to answer the question on number of arrests since the date of last interview
ARREST_DATED Total number of arrests with arrest dates (including missing months). This should equal the number of arrests in the ARREST_STATUS array.
ARREST_UNDATED Sum of ARREST_MISSINGYR and ARREST_MISSINGDT
ARREST_MISSINGDT Number of arrest dates with missing month and missing year
ARREST_MISSINGYR Number of arrest dates with missing year
ARREST_MISSINGMON Number of arrest dates with missing month
ARREST_UNASKED Number of arrests where arrest date was not asked
ARREST_LASTINTDATE Date of last interview with R

Open the Arrest Event History program file

INCARCERATION HISTORY ARRAY: INCARC_STATUS_year.month

This array lists the respondent's incarceration status on a monthly basis. It starts in January 1992 and ends with the most recent publicly available interview date. Once again, note that incarcerations that occurred before the respondent turned 12 will not be included in this array. Incarceration refers to jail or adult correctional facilities. Juvenile detention centers are not included. The codes and their definitions are as follows:

Code

Definition

-4 Assigned if R is younger than 12 years old or has not been interviewed about this month
0 Assigned if R was not incarcerated this month and previously never incarcerated
1 Indicates R was incarcerated during all or some portion of this month
99 Indicates R was not incarcerated this month but has previously been incarcerated

Missing and Imputed Values. Occasionally, respondents cannot provide information about entry and exit months/years for incarcerations. Where these were missing, they were imputed using known/given prior and future arrest date information as well as prior and future interview dates. For example, if a respondent indicated being actively incarcerated in one round’s interview and then not being incarcerated in the next interview, with no given exit date, the exit date was given as the mid-month between the interviews. If, instead, the respondent had a listed re-arrest date that was earlier than the next interview date, the mid-month between re-arrest and when the respondent was last known to be incarcerated (the prior interview date) was used. 
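The exit-date imputation described above can be sketched in Python as follows. The function names and the (year, month) tuple representation are hypothetical, and rounding the midpoint down to a whole month is an assumption, since the source does not specify how an uneven interval is split.

```python
from typing import Optional, Tuple

YearMonth = Tuple[int, int]  # (year, month), month in 1-12


def to_index(ym: YearMonth) -> int:
    """Convert a (year, month) pair to a running month count."""
    return ym[0] * 12 + (ym[1] - 1)


def from_index(index: int) -> YearMonth:
    """Convert a running month count back to (year, month)."""
    year, month0 = divmod(index, 12)
    return year, month0 + 1


def impute_exit_month(
    last_known_incarcerated: YearMonth,
    next_interview: YearMonth,
    rearrest: Optional[YearMonth] = None,
) -> YearMonth:
    """Impute a missing incarceration exit date as a mid-month.

    If a re-arrest is dated before the next interview, the re-arrest
    replaces the next interview as the end of the interval.
    """
    start = to_index(last_known_incarcerated)
    end = to_index(next_interview)
    if rearrest is not None and to_index(rearrest) < end:
        end = to_index(rearrest)
    return from_index((start + end) // 2)  # assumption: round down


# Incarcerated at a March 2004 interview, not incarcerated in March 2005.
print(impute_exit_month((2004, 3), (2005, 3)))
# Same respondent, but with a re-arrest dated July 2004.
print(impute_exit_month((2004, 3), (2005, 3), rearrest=(2004, 7)))
```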

Incarceration dates are only given in month/year. Hence, a one-day and a 30-day incarceration could both be marked as extending from the same month to the same month, if the 30-day incarceration started on the 1st and the respondent was released before the next month. For the summary values, any incarceration starting and ending in the same month is counted as a one-month incarceration. All longer incarcerations are equally inclusive, counting both the entry month and exit month as full months. The actual time incarcerated could therefore be almost an entire month shorter than the recorded length, depending on the exact entry and exit dates.
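The inclusive month counting can be written out as a short calculation. The Python sketch below uses hypothetical function names and invented spells. It assumes spells are listed in chronological order and do not share calendar months, so it may not match how the summary variables are actually derived.

```python
from typing import Dict, List, Tuple

YearMonth = Tuple[int, int]  # (year, month), month in 1-12


def spell_length(entry: YearMonth, exit_: YearMonth) -> int:
    """Length of a spell in months, counting entry and exit months in full."""
    return (exit_[0] - entry[0]) * 12 + (exit_[1] - entry[1]) + 1


def summarize_spells(spells: List[Tuple[YearMonth, YearMonth]]) -> Dict[str, int]:
    """Summary lengths for chronologically ordered, non-overlapping spells."""
    lengths = [spell_length(entry, exit_) for entry, exit_ in spells]
    return {
        "length_first": lengths[0],
        "length_longest": max(lengths),
        "total_months": sum(lengths),
    }


# A spell within March 2005, then a spell from January to March 2006.
spells = [((2005, 3), (2005, 3)), ((2006, 1), (2006, 3))]
print(summarize_spells(spells))
```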

Incarceration summary file: An incarceration summary file provides summary measures of the respondent's incarceration history. The variables in this data file for each respondent include the date of last interview with the respondent, the first entry date into incarceration, the total number of separate incarceration spells, the age at first incarceration, the length of the first incarceration and longest incarceration, as well as whether the respondent was currently incarcerated at the date of the last interview.

Variable

Definition

INCARC_FIRST Earliest entry date into incarceration as reported by R 
INCARC_TOTNUM Total number of separate incarcerations reported by R
INCARC_AGE_FIRST Age of R when first incarcerated
INCARC_LENGTH_FIRST Months R was incarcerated during R's first incarceration
INCARC_LENGTH_LONGEST Months R was incarcerated during R's longest incarceration
INCARC_TOTMONTHS Total months R has spent incarcerated
INCARC_CURRENT Yes if R currently incarcerated at date of last interview

Open the Incarceration Event History program file