Item Nonresponse & Interview Timings

National Longitudinal Survey of Youth - 1997 Cohort

The codebook provides information on item nonresponse, that is, which questions respondents declined to answer, answered with "don't know," or were skipped past. The database also contains timing variables that record how long respondents took to complete the total interview and each individual section.

Item Nonresponse

Missing data, or nonresponse, occurs for a number of reasons in the NLSY97 survey. First, some respondents do not participate at all in a given survey year, so all information for those respondents is missing for that year. (Note: Data that are missing because of a noninterview are coded -5.) The created variable "Reason for Noninterview" (RNI) is available in each survey round and provides counts for the different reasons (unable to be located, refusal, deceased, etc.) a respondent was not interviewed. The extent of nonparticipation in each survey round is illustrated in Retention & Reasons for Noninterview.

A second reason missing data occurs is that respondents do not provide a valid answer to a question. When this happens, interviewers determine whether to mark the answer as a 'refusal' or a 'don't know' value. Interviewers are trained to distinguish between refusal and don't know responses. For example, a refusal usually stems from such respondent comments as "That's none of your business," "I don't want to say," "I'm not comfortable telling you that," or "I don't want to answer." A 'don't know' response is coded from respondents' comments such as "I have no idea," "I don't know how I could guess," "I wouldn't know," or "I'm not sure how to answer that." Standard interviewing protocol calls for interviewers to try to convert an item nonresponse either by allaying the concerns underlying a refusal (for example, by assuring privacy or citing the research reasons for a particular questionnaire item) or by providing cognitive aids to the respondent who "doesn't know" (for example, asking "Do you remember what season it was?" or "Do you have a guess what the range might be?"). Only if conversion attempts are ineffective do interviewers record a 'refusal' or 'don't know' response.

A valid skip is another reason for missing data. Respondents do not answer every question of the survey. For instance, some questions might apply to only females or a certain age range. Users should trace back skip patterns to determine whether a respondent was skipped out because a given topic was inapplicable to him/her or because the respondent answered similar questions along a different path. Survey questions not on a path answered by the respondent are coded with a -4.

Missing data can also occur when there is an incorrect flow in the survey instrument. Incorrect flows may result in some respondents being skipped over a set of questions that they should have answered, while others answer questions that they should not have been asked. NLS data archivists have removed most of the extraneous question responses from the data. While extra information can be removed, missing data are not imputed in the NLSY97 surveys. Data missing for this reason are flagged with a special 'invalid skip' code. The use of CAPI for surveys reduces the number of invalid skips in complex questionnaires; nevertheless, some invalid skips are still possible in CAPI data. When these errors are found, the CAPI survey can be corrected in the field to prevent further invalid skips, but the missing data from already completed cases are not retrieved.

All missing data are clearly flagged in the NLSY97 data set with five negative values: (-1) refusal, (-2) don't know, (-3) invalid skip, (-4) valid skip, and (-5) noninterview. In general, these five negative values are reserved as missing value flags. As an example, Figure 1 shows the item, "How is R's general health?" Within the item codeblock, the user can see that 7,494 respondents in 2004 gave responses ranging from "excellent" to "poor," three people refused to answer (-1), four people reportedly did not know (-2), one person was not asked the question and was thus a valid skip (-4), and 1,482 people were not interviewed that survey year (-5).  In this example, there are no invalid skips (-3).

Figure 1. NLSY97 Questionnaire Item Codeblock with Nonresponse Highlighted

              S49195.00   [YHEA-100]                             Survey Year: 2004                           

                PRIMARY VARIABLE
                             HOW IS R'S GENERAL HEALTH?
             Now I would like to ask you some questions about your health.
             In general, how is your health?
                2237       1 Excellent
                2709       2 Very good
                1988       3 Good
                 508       4 Fair
                  42       5 Poor
             Refusal(-1)            3
             Don't Know(-2)         4
             TOTAL =========>     7501   VALID SKIP(-4) 1   NON-INTERVIEW(-5)  1482


Lead In:  S49194.00 [Default]    S48965.00 [1:1]

Default Next Question:  S49196.00
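
When working with an item like the one in Figure 1, the five negative codes generally need to be separated from substantive answers before any statistics are computed. The Python sketch below shows one way to do that. The function name tabulate_item and the sample values are hypothetical illustrations and are not part of any NLS tool.

```python
from collections import Counter

# Missing-value codes documented for the NLSY97 data set.
MISSING_CODES = {
    -1: "refusal",
    -2: "don't know",
    -3: "invalid skip",
    -4: "valid skip",
    -5: "noninterview",
}


def tabulate_item(values):
    """Split raw item values into valid-answer counts and missing-code counts.

    `values` is any iterable of integer codes for a single survey item.
    Returns (valid_counts, missing_counts), two Counter objects; the second
    is keyed by the label of the missing code.
    """
    valid_counts = Counter()
    missing_counts = Counter()
    for value in values:
        if value in MISSING_CODES:
            missing_counts[MISSING_CODES[value]] += 1
        else:
            valid_counts[value] += 1
    return valid_counts, missing_counts


# Hypothetical raw values for a general-health item (1 = Excellent ... 5 = Poor).
sample = [1, 2, 2, 3, -1, -2, -4, -5, -5, 5]
valid, missing = tabulate_item(sample)
print(dict(valid))
print(dict(missing))
```

Keeping the missing codes in a separate tally, rather than dropping them, preserves the distinction between respondents who were never asked an item (-4, -5) and those who were asked but gave no valid answer (-1, -2).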


As would be expected, the more sensitive questions in the survey tend to yield more refusals (-1) than other questions.

To improve accuracy of reporting, many of the more sensitive questions are found in the self-administered questionnaire (SAQ) portion of the survey, which in-person respondents answer privately using a laptop. (Note: If the survey is done by phone, the SAQ section is not self-administered and must be administered verbally by the interviewer.)

Rounds 9 and up also include the following response variables (the variable name prefix changes with the round number):

  • R9_RESPONSES. The total number of responses provided by the respondent during the round 9 interview.
  • R9_PCT_DK. The percentage of all the respondent's responses to round 9 interview questions that were "Don't Know (-2)."
  • R9_PCT_REF. The percentage of all the respondent's responses to round 9 interview questions that were "Refused (-1)."
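
The relationship among these three variables can be illustrated with a short Python sketch. This is an illustration only: the exact rule used to construct the RESPONSES count is not stated here, so the sketch assumes that every item the respondent was actually asked (any value other than invalid skip -3, valid skip -4, or noninterview -5) counts as a response. The function name response_summary and the sample values are hypothetical.

```python
REFUSAL = -1
DONT_KNOW = -2
NOT_ASKED = {-3, -4, -5}  # invalid skip, valid skip, noninterview


def response_summary(values):
    """Return (n_responses, pct_dk, pct_ref) for one respondent's item values.

    Assumption: an item counts as a response whenever it was asked, i.e. its
    value is not one of the NOT_ASKED codes. Percentages are on a 0-100 scale
    and are None when the respondent has no responses at all.
    """
    asked = [v for v in values if v not in NOT_ASKED]
    n_responses = len(asked)
    if n_responses == 0:
        return 0, None, None
    pct_dk = 100.0 * sum(v == DONT_KNOW for v in asked) / n_responses
    pct_ref = 100.0 * sum(v == REFUSAL for v in asked) / n_responses
    return n_responses, pct_dk, pct_ref


# Hypothetical respondent: 8 items asked, 2 valid skips.
items = [1, 3, -2, 2, -1, -4, 5, -2, -4, 0]
print(response_summary(items))
```

Under a different denominator (for example, one that also counts invalid skips), the percentages would change, so the released variables should be preferred over recomputed values wherever they are available.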

Interview Timings

Starting in round 5, timing variables are available that provide the total time taken to conduct the interview (in seconds) and section timings for each section (Household, Schooling, Employment, Health, etc.) of the survey.

The total interview time (see R16_TIM_INTVW in round 16) is reported in seconds and excludes the time taken for the locator section and interviewer remarks. Round 7 timings are available only on the geocode CD.
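
Because timings are stored in seconds, a conversion step is usually needed before reporting interview lengths. The Python sketch below assumes that the timing variables carry the same negative missing-value flags (-1 through -5) as the rest of the data set. The helper names and the sample values are hypothetical.

```python
def seconds_to_minutes(seconds):
    """Convert a timing value in seconds to minutes.

    Negative values are treated as missing-value flags (-1 through -5)
    and returned as None rather than converted.
    """
    if seconds is None or seconds < 0:
        return None
    return seconds / 60.0


def mean_minutes(timings):
    """Average interview length in minutes, ignoring missing values."""
    minutes = [m for m in map(seconds_to_minutes, timings) if m is not None]
    if not minutes:
        return None
    return sum(minutes) / len(minutes)


# Hypothetical total-interview timings in seconds; -5 marks a noninterview.
timings = [3600, 2700, -5, 4500]
print(mean_minutes(timings))
```

Screening out the negative flags before averaging matters here: leaving a -5 in the list would silently pull the mean interview length downward.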

Starting in round 8, timings for subsections (migration, household roster, schooling attainment, etc.) also became available.

Timing data can be found under the "Timing" Area of Interest in the NLS Investigator.