Appendix 11: Collection of the Transcript Data

National Longitudinal Survey of Youth - 1997 Cohort

To complement data on respondents' educational experiences collected during the yearly interviews, NLSY97 staff collected transcripts directly from respondents' high schools once the youths graduated or left school. Once the transcripts were received from the schools, survey staff coded the transcript record into a standard format. The resulting created variables comprise a history of the respondent's terms in school, courses taken, and other academic indicators. This appendix describes the survey materials used during data collection and explains the procedures and criteria for data entry and coding. It also lists specific details about individual Transcript Survey variables.

Transcript Survey Data Collection

Conducted in 1999-2000, Wave 1 of the NLSY97 Transcript Survey sought hard copy transcripts from 1,622 NLSY97 respondents who had provided signed authorization for transcript collection and who were no longer enrolled in high school in spring 2000. Non-enrollment occurred when the youth either graduated from high school or dropped out of school and was at least 18 years old. From Wave 1, coded transcript data are available for 1,417 respondents.

To complete the Transcript Survey effort, a second and final wave of the NLSY97 Transcript Survey requested hard copy transcripts from 5,701 eligible NLSY97 respondents. Youth respondents eligible for the Wave 2 Transcript Survey had a signed Permission to Contact School form on file and a known high school reported during a previous interview, and did not have a transcript collected during Wave 1. The vast majority of NLSY97 respondents finished their high school careers by the end of the 2004 academic year, resulting in complete transcript records submitted from the schools. Transcript data were collected and coded for 4,815 respondents during Wave 2. Transcript data combined from both waves are available for 6,232 respondents.
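The counts quoted above can be cross-checked with a few lines of arithmetic. The sketch below uses only the figures given in the text; the "coded share" is simply coded respondents divided by transcripts requested and is an illustrative calculation, not an official response rate.

```python
# Counts quoted in the text above.
wave1_requested, wave1_coded = 1622, 1417
wave2_requested, wave2_coded = 5701, 4815

# Wave 2 excluded respondents with a Wave 1 transcript, so the two
# groups do not overlap and the coded counts can simply be added.
combined_coded = wave1_coded + wave2_coded


def coded_share(coded, requested):
    """Percent of requested transcripts that yielded coded data."""
    return round(100 * coded / requested, 1)


print(combined_coded)                             # 6232
print(coded_share(wave1_coded, wave1_requested))  # 87.4
print(coded_share(wave2_coded, wave2_requested))  # 84.5
```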

User Note

All transcript variables are listed as round 3 variables in the dataset. These variables are associated with round 3 because that was the timing of the first wave of transcript data collection.

NORC mailed a transcript request packet to each school from which an NLSY97 youth received his or her high school diploma, or to the last school the youth reported attending in the Youth interview. The packet contained informational materials about the NLSY97 and a pamphlet describing the NLSY97 Transcript Survey. In addition, packets included the following items:

  1. a cover letter addressed to the school principal
  2. a one-page cover sheet questionnaire collecting school-specific grading and transcript policies
  3. a Student Request list identifying the sampled students in the school
  4. the signed permission forms for these students

These documents are available in PDF form at the links below:

Collection of the Transcript Data, Wave 1 - example documents
Collection of the Transcript Data, Wave 2 - example documents

Creation of the Transcript Data File

Organization of the data. There are several different types of variables in the transcript data file. First, at the school level the variable TRANS_SCH_CAT reports whether a course catalog was received from the school to aid in coding. The highest number of schools reported for any respondent is 12, so this variable is repeated 12 times. This course catalog variable also functions as the identification number of the school. During the data entry process described below, each school attended by a respondent was assigned a unique sequence ID number between 1 and 12, with the school that provided the transcript always listed as school #01. These numbers were used in variables that report which school the respondent attended in each term. For example, if a respondent has a value of 4 for term 1, then he or she attended school #04 in the course catalog variables during that term. This school ID number does not link to any variables in the main data file.
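The link between a term's school ID and the course catalog variables can be illustrated with a small lookup. This is a minimal sketch in Python: the record layout and the key names `sch_cat` and `term_school` are hypothetical and are not the variable names used in the data file.

```python
# Hypothetical record for one respondent. The key names are illustrative
# only; they are not the variable names in the NLSY97 data file.
respondent = {
    # Catalog flag per school slot (1-12): True if a catalog was received.
    "sch_cat": {1: True, 2: False, 4: True},
    # School sequence ID attended in each term.
    "term_school": {1: 4, 2: 2, 3: 1},
}


def catalog_received_for_term(record, term):
    """Return the catalog flag of the school attended in the given term."""
    school_id = record["term_school"][term]
    if not 1 <= school_id <= 12:
        raise ValueError(f"school ID {school_id} is outside the range 1-12")
    return record["sch_cat"][school_id]


print(catalog_received_for_term(respondent, 1))  # True  (term 1 -> school #04)
print(catalog_received_for_term(respondent, 2))  # False (term 2 -> school #02)
```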

Second, the transcript file includes information about the respondents that is not associated with a specific term or course. For example, these variables present test scores on a variety of achievement tests (ACT, PSAT, SAT, SAT II, AP), information on absences and tardies, the student's school completion status, and dates of enrollment. Variables also indicate whether the respondent participated in programs such as gifted, bilingual, or special education.

A number of variables refer to the respondent's terms of enrollment. For up to 28 terms, these items report the beginning and ending dates of the term, the way in which the school year is divided (such as a season, entire year, or another term-based system), the academic year of the term, the respondent's grade level that term, and the number of credits earned. A variable listing the school the respondent attended during that term can be linked to the course catalog variable as described above.

Finally, the transcript file provides details about each course appearing on a student's high school transcript. Course-specific variables include the course code from the Revised Secondary School Taxonomy (SST-R), the grade earned in the course, and the credit value of the course. Because schools use many different grading systems, the course grades were converted into a standard scale that can be compared across respondents. A series of variables called "Recoding Status of Grade" indicates how the grade earned variable for each course was created. This process is described in more detail below.

Data entry and processing procedures. The transcript data capture process involved several distinct data entry steps, tailored to the structure of the data, the cleaning and reconciliation needs for the relevant variables, and scheduling requirements of the data collection process. The basic data entry and processing steps utilized during Wave 1 were:

  1. Entry of course-level data into an Access data capture system from high school transcripts
  2. Coding of entered course-level data using Access coding system
  3. Entry of student-level data from Student Request List and high school transcripts into NORC's SurveyCraft Computer-Assisted Data Entry (CADE) system
  4. Entry and coding of transfer school information from Student Request List, high school transcripts, and NLSY97 youth interview data using Access and SAS programs
  5. Entry of school-level data from one-page Transcript Cover Sheet into SurveyCraft CADE system
  6. Assigning course grades to a uniform grade scale using SAS transformations.

Each data entry and processing step is described in greater detail below. Enhancements to transcript processing developed specifically for the Wave 2 effort are noted following each section.

Wave 2 data entry and processing enhancements. Building on the Wave 1 model, the transcript data entry and processing steps were revised, improving the efficiency of the process and enhancing the quality of the data. The revised process included adding an edit and retrieval task at the beginning, streamlining the data entry instruments for a one-time, comprehensive entry task, utilizing an improved coding system separate from the data entry instrument, and building an auto-coding program.

Wave 2 transcript editing process. Due to the wide variation in the layout of high school transcript records, an editing and review task was implemented prior to data entry. Editing provided the first level of standardization of each transcript in preparation for data entry and also allowed clerks to identify problematic transcripts requiring a retrieval contact with the school. Editor staff identified key student-level data elements, counted the number of transfer schools reported, and sequenced term and course data as they appeared on the transcript. Editors highlighted terms and dates on the transcript, which created a series of reference points to maintain the sequence of courses and terms during data entry. Editor staff also reviewed the transcript for problematic or missing course and term data. If a potential data entry or coding problem existed, a retrieval form was completed and reviewed by a supervisor to determine whether a call to the school was necessary for further clarification.

Wave 1 course-level data entry. Course-level data include the course title, course number (assigned by school), grade earned, credits earned, and honors designation. For matching purposes, the school ID was assigned and term dates were captured during this phase of data entry. Entry was performed using an MSAccess data-capture system. All courses were independently entered twice. Where entry and re-entry matched perfectly, no further quality control was performed. If one or more discrepancies were found electronically between the entry and re-entry, a supervisor adjudicated the two data-entered versions with the original hard copy transcript to determine the accurate values. Courses were entered in the order that they appeared on the transcript. This order varied from school to school; courses could be listed chronologically, alphabetically by course title, numerically by course number, or in some other order.

Wave 1 data entry of student-specific data down to the term level. All other student-specific data were captured in a SurveyCraft instrument for computer-assisted data entry. These variables include the student's enrollment in gifted, special education, or bilingual programs, standardized test scores, dates of enrollment at the school, class rank and cumulative grade-point average, term-level information on beginning and ending dates of terms, absences and tardies, and credits earned by term. The SurveyCraft program generated a single record for each youth, containing up to 18 terms of study. Term date information was used to match term-level data with the school attended during that term. All transcripts from a school were data entered at the same time to exploit clerk familiarity with transcript formats and school-specific abbreviations. All transcripts were independently entered twice. Where entry and re-entry matched perfectly, no further quality control was performed. If one or more discrepancies were found electronically between the entry and re-entry, a supervisor adjudicated the two data-entered versions with the original hard copy transcript to determine the accurate values. Terms were entered in chronological order when such sequence could be determined.
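The double-entry check described for both course-level and student-level data can be sketched as a field-by-field comparison of two independent keyings. This is an illustration only: the field names are hypothetical, and the actual comparison was performed inside the MSAccess and SurveyCraft systems.

```python
def compare_entries(first, second):
    """Return the sorted list of fields on which two keyings disagree."""
    fields = sorted(set(first) | set(second))
    return [name for name in fields if first.get(name) != second.get(name)]


def needs_adjudication(first, second):
    """A record goes to a supervisor if any field differs between keyings."""
    return bool(compare_entries(first, second))


# Hypothetical course record keyed independently by two clerks.
entry = {"title": "ALGEBRA 1", "grade": "B+", "credits": "1.0"}
reentry = {"title": "ALGEBRA 1", "grade": "B", "credits": "1.0"}

print(compare_entries(entry, reentry))     # ['grade']
print(needs_adjudication(entry, reentry))  # True
print(needs_adjudication(entry, entry))    # False
```

A perfect match needs no further quality control; any listed discrepancy would be resolved against the hard copy transcript.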

Wave 2 data entry system. A more comprehensive SurveyCraft computer-assisted data entry system was constructed for the Wave 2 data processing effort. The updated instrument allowed clerks to key all contents of the transcript at one time, capturing student, school, term and course level data in a series of loops. The editing process allowed a standard transcript sequence to be followed during data entry. Course and term data were reported in a chronological sequence whenever possible. The consolidated CADE system eliminated the need to match course level and term level data from two different systems, allowed data entry to sequence terms in chronological order by school for each youth record, and added another level of quality control through double entry and adjudication of both the data entry and coded items. The same rules for adjudication used during Wave 1 data entry were also applied.

Wave 1 course coding. Course-level data were used for coding courses into the Revised Secondary School Taxonomy (SST-R), a hierarchical framework for high school course offerings. After all course-level data from a transcript had been entered, re-entered, and adjudicated, the transcript was available for course coding. To maximize coder familiarity with school naming and catalog conventions, all transcripts from a school were usually coded together. Coding of all courses was done independently by two coders. If the two codes were not equal, a supervisor adjudicated the discrepancy and assigned a final code. Because many schools did not submit course catalogs or had indecipherable course titles (e.g., Course 1), clerks called some schools directly for assistance in coding, speaking to administrative or instructional staff who were able to clarify course content. The coding process used a menu-driven MSAccess system, which exploited the hierarchical structure of the code frame and prevented coders from inadvertently entering invalid codes. All 'uncodable' courses were reviewed by the coding supervisor and project director where necessary.

Wave 2 course coding. The course coding process in Wave 2 utilized a similar menu-driven MSAccess system. After transcript records were entered, re-entered, and adjudicated, a flag was set in the data entry system. Flagged transcript records were extracted from the SurveyCraft data on a regular schedule and loaded in batches into the coding system. Within each batch, transcript records were grouped by school to allow clerks to maximize familiarity with school naming and catalog conventions. Along with course-level data presented on the coding screen, key term-level information, including dates, term season, and grade level, was also presented, allowing the clerks to easily reference course titles in the transcript record and course catalog. Mirroring Wave 1, each course was coded independently by two different coders, and any discrepancies between the two codes assigned were reviewed by a supervisor responsible for assigning the final code.

Wave 2 auto-coding program. Using course description and coding matches from the Wave 1 coding effort, a list of course descriptions with codes assigned was developed for an auto-coding program. This matching program was run before courses were loaded into the MSAccess coding system. Approximately 25% of all courses coded were completed by the auto-coding program. Project staff reviewed all auto-coded course descriptions and codes assigned for consistency and flagged any discrepancies for manual coding.
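The matching step of an auto-coding program of this kind can be sketched as a lookup built from previously coded course descriptions. The sketch makes several assumptions that are not stated in the source: titles are matched after upper-casing and collapsing whitespace, only titles that always received the same code are kept in the lookup, and the codes shown are placeholders rather than real SST-R codes.

```python
def normalize(title):
    """Upper-case a course title and collapse runs of whitespace."""
    return " ".join(title.upper().split())


def build_lookup(coded_courses):
    """Build {normalized title: code} from (title, code) pairs.

    Titles that received more than one code are left out, so they fall
    through to manual coding (an assumption of this sketch).
    """
    seen = {}
    for title, code in coded_courses:
        seen.setdefault(normalize(title), set()).add(code)
    return {t: next(iter(codes)) for t, codes in seen.items() if len(codes) == 1}


def auto_code(titles, lookup):
    """Split titles into auto-coded ({title: code}) and manual (list)."""
    coded, manual = {}, []
    for title in titles:
        code = lookup.get(normalize(title))
        if code is None:
            manual.append(title)
        else:
            coded[title] = code
    return coded, manual


# Placeholder codes, not real SST-R values.
wave1 = [("Algebra 1", "CODE_A"), ("algebra  1", "CODE_A"),
         ("English 9", "CODE_B"),
         ("Course 1", "CODE_A"), ("Course 1", "CODE_C")]
lookup = build_lookup(wave1)
coded, manual = auto_code(["ALGEBRA 1", "Course 1", "Biology"], lookup)
print(coded)   # {'ALGEBRA 1': 'CODE_A'}
print(manual)  # ['Course 1', 'Biology']
```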

Transfer data. Transcripts often included information about courses attended at other institutions. These data could appear either as an original hard copy attachment to the sampled school's transcript or as additional lines on the sampled school's transcript. These terms and courses were data entered during the appropriate stage of data entry, with a designation that the term or course pertained to a transfer school. Course and term-specific information about transferred work was generally complete, but information about the school from which work was transferred was often inadequate for coding purposes. As described above, all terms attended at the same school are associated with the same school ID.

Wave 2 transfer data and sequence of schools and terms. Building on lessons learned during the Wave 1 transcript processing, special effort was made to preserve a chronological sequence within the transcript for course, term and school data reported. The sequence established during the edit and data entry processes was used to order the terms chronologically. When preparing the term level data, the term year and season were used to confirm the sequence. For a small group of cases, the term sequence was difficult to assign when the transcript record indicated attendance at one or more institutions during similar term years. In these instances, attempts to sequence terms were based on the time period reported on the hard copy transcript whenever possible.

School 01 is always associated with the primary school or the school submitting the transcript. For the Wave 2 data, transfer schools are numbered in reverse chronological order as they appear on the transcript, often beginning with the most recent transfer school event moving in reverse order to the earliest transfer school event. In most instances, the school first attended by the student on the transcript will have the highest school number in the SCH_CAT.xx series.
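The Wave 2 numbering convention can be illustrated with a short sketch. It assumes the transfer schools are already known in chronological order, which was not always the case in practice, and the school names are hypothetical.

```python
def assign_school_numbers(primary, transfers_oldest_first):
    """Number the primary school 1 and transfer schools from most recent
    to earliest, so the earliest school gets the highest number."""
    if len(transfers_oldest_first) > 11:
        raise ValueError("at most 12 schools can be numbered per respondent")
    numbers = {primary: 1}
    for number, school in enumerate(reversed(transfers_oldest_first), start=2):
        numbers[school] = number
    return numbers


# Hypothetical school names, listed from earliest to most recent transfer.
print(assign_school_numbers("Central HS", ["North MS", "East HS"]))
# {'Central HS': 1, 'East HS': 2, 'North MS': 3}
```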

Missing course catalogs and the Internet. For Wave 2 processing, if a series of transfer schools was present for a student, the SCH_CAT.xx variable was set to "no" indicating the catalog was not received. While a catalog for that school may have been received during the data collection period, it may not have been accessible to coding staff during the course of the transcript data collection. When available, online course catalogs were useful in clarifying particular types of coursework reported at a given school and were utilized by supervisors during the adjudication process.

Coursework reported below grade 9. Most transcripts entered and coded span a typical high school career from grades 9 or 10 through 12. For some districts and states, the transcript record includes middle school or junior high coursework, usually taken during grades 7 and 8. Other high school transcripts also record equivalency or classroom coursework eligible for high school credit that was earned while the student was in grade 8 or below. While no effort was made to collect middle school or junior high level coursework for the NLSY97 Transcript Survey, courses taken at these grade levels were coded and have been made available when provided as part of the hard copy transcript record.

School data. The one-page Transcript Cover Sheet provided information for assigning course grades to a uniform grade scale. During Wave 1 transcript processing, these data were entered into a SurveyCraft data capture instrument, once for each school submitting valid transcripts. Ten percent of schools were re-entered, and a supervisor referred to the original hard-copy to adjudicate discrepancies.

Wave 2 Transcript Cover Sheet procedures. Because only a small percentage of schools during the Wave 1 effort reported unique grading scales, a data entry system was not built for Wave 2. Rather, the grade scale data were captured by a data processing clerk in a spreadsheet containing high and low equivalents for each letter grade. An entry was made for each school submitting valid transcripts and a completed Transcript Cover Sheet. A supervisor reviewed the contents of the spreadsheet to ensure accuracy. When discrepancies reported on the Transcript Cover Sheet were discovered, the school was contacted as part of the retrieval process for clarification. The final grade scale spreadsheet was used in the standardized course grade procedures noted below.

Course grades. High school transcripts included a variety of systems for course grades, including letter grades or numbers. For ease of comparison, these were standardized into a uniform grading system. The standardized grading scale for the resulting CRS_GRADE variable ranges from 01 to 20. Table 1 lists the corresponding letter grades for each of the CRS_GRADE values.

Table 1. Grading system for coded transcript variables

CRS_GRADE Corresponding letter grade   CRS_GRADE Corresponding letter grade
01 A+   11 D
02 A   12 D-
03 A-   13 F
04 B+   14 Pass, satisfactory or credit
05 B   15 Unsatisfactory or no credit
06 B-   16 Withdrew or dropped course
07 C+   17 Incomplete
08 C   18 Non-graded course or audit
09 C-   19 Blank, no grade provided
10 D+   20 Unrecodable grade
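The letter-grade portion of Table 1 can be expressed as a simple lookup. In this minimal sketch, the treatment of case and surrounding whitespace is an assumption, an empty string is mapped to 19 (blank), and any other unrecognized value is mapped to 20 (unrecodable).

```python
# Letter grades in the order of CRS_GRADE values 01-13 from Table 1.
LETTERS = ["A+", "A", "A-", "B+", "B", "B-", "C+", "C", "C-",
           "D+", "D", "D-", "F"]
LETTER_TO_CRS_GRADE = {letter: value
                       for value, letter in enumerate(LETTERS, start=1)}
BLANK, UNRECODABLE = 19, 20


def recode_letter(raw):
    """Map a transcript letter grade to its CRS_GRADE value."""
    text = raw.strip().upper()
    if not text:
        return BLANK
    return LETTER_TO_CRS_GRADE.get(text, UNRECODABLE)


for raw in ["A+", "b-", "", "Z"]:
    print(repr(raw), "->", f"{recode_letter(raw):02d}")
# 'A+' -> 01
# 'b-' -> 06
# '' -> 19
# 'Z' -> 20
```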

In addition to the standardized grade variable, survey staff created a variable for each course called CRS_GRADE_RECODE_STATUS. This variable provides information on how the CRS_GRADE variable was created from the information provided by the school. The values of the recoding status variable are listed in Table 2.

Table 2. Values of CRS_GRADE_RECODE_STATUS

CRS_GRADE_RECODE_STATUS Description
0 Directly recoded
1 Recoded using grade specifications of own school
2 Recoded using standard grade specifications
3 Uncodable grade

Each standardized grade was assigned using one of the following four methods:

  1. The transcript reported letter grades using the system in Table 1 above. All letter grades were directly assigned to the corresponding standardized grade in Table 1. Letters that could not be classified into one of the categories 1-19 were considered to be unrecodable and included in category 20. In the cases where the CRS_GRADE variable was recoded directly from the grade on the transcript, CRS_GRADE_RECODE_STATUS was assigned a value of 0.
  2. The school used numeric grades and provided grading specifications on the one-page Transcript Cover Sheet. For these respondents, numeric grades were converted to standardized grades using the grading specifications provided by the school. For example, if the numeric grade fell within the range for an 'A' as specified by that particular school, it was assigned to category 02. Fewer than 5% of schools provided multiple grading specifications; in all cases, the primary specifications were used. Due to the possibility of transcription errors, numeric grades below 15 were considered to be unrecodable when the minimum passing grade was higher than 15. For all cases where the CRS_GRADE variable was recoded from the transcript using the school's own grading specifications, CRS_GRADE_RECODE_STATUS was assigned a value of 1.
  3. The school used letter grades of a type different from those shown in Table 1. During Wave 1, grades of 'G' were classified as 05, 'O' and 'E' as 02, and 'O+' and 'E+' as 01. CRS_GRADE_RECODE_STATUS was assigned a value of 2. During Wave 2 grade construction, a variation in the interpretation of the 'E' grade across schools was discovered. In these cases, school-specific grade scales were consulted to properly classify 'E' grades as 02, 13, 14, or 15. If the grades could not be recoded, then CRS_GRADE was assigned a value of 20 and CRS_GRADE_RECODE_STATUS was assigned a value of 3.
  4. The school used numeric grades and did not provide grading specifications. The means of the upper and lower limits of the grading systems across all schools were used to construct the standard grading system shown in Table 3. If the school did not specify its grading specifications, numeric grades (and numeric grades with a qualifier attached) were recoded based on this standard system. For Wave 2, the means of the upper and lower limits of the grading systems were recalculated using the grading systems received from all Wave 2 schools, as a check on possible fluctuation in school grading systems. A different set of limits was developed and can be found in Table 3 below.

Once again, to take into account the possibility of transcription errors, numeric grades below 15 were considered to be unrecodable. CRS_GRADE_RECODE_STATUS was assigned a value of 2 when recoding was done using the standard grade specifications. If the grades could not be recoded, then CRS_GRADE was given a value of 20 and CRS_GRADE_RECODE_STATUS was coded as 3.
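Method 2, in which numeric grades are converted using the school's own specifications, can be sketched as a range lookup. The scale shown is hypothetical, integer scores with inclusive bounds are assumed, and only whole letter grades are mapped. Returning a status of 3 for an unrecodable grade under this method is also an assumption, since the text states that value explicitly only for methods 3 and 4.

```python
# CRS_GRADE values for whole letter grades, taken from Table 1.
LETTER_TO_CRS_GRADE = {"A": 2, "B": 5, "C": 8, "D": 11, "F": 13}


def recode_with_school_scale(score, scale, min_passing):
    """Return (CRS_GRADE, CRS_GRADE_RECODE_STATUS) for a numeric grade.

    scale maps a letter to inclusive (low, high) bounds reported by the
    school; min_passing is the school's minimum passing grade.
    """
    # Very low scores are treated as likely transcription errors.
    if score < 15 and min_passing > 15:
        return 20, 3
    for letter, (low, high) in scale.items():
        if low <= score <= high:
            return LETTER_TO_CRS_GRADE[letter], 1
    return 20, 3


# Hypothetical school-reported scale (not taken from any real school).
scale = {"A": (93, 100), "B": (85, 92), "C": (77, 84),
         "D": (70, 76), "F": (0, 69)}

print(recode_with_school_scale(88, scale, min_passing=70))  # (5, 1)
print(recode_with_school_scale(50, scale, min_passing=70))  # (13, 1)
print(recode_with_school_scale(9, scale, min_passing=70))   # (20, 3)
```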

Table 3. Standard numeric grading system

Wave   Lower limit   Upper limit     CRS_GRADE
1      91            100             02
2      90            100             02
1      82            Less than 91    05
2      80            Less than 90    05
1      73            Less than 82    08
2      70            Less than 80    08
1      65            Less than 73    11
2      60            Less than 70    11
1      15            Less than 65    13
2      15            Less than 59    13
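The standard numeric system in Table 3 can be sketched as a threshold lookup keyed by wave. Two assumptions are made here. First, each band is treated as running from its lower limit up to the next band's lower limit, so a Wave 2 score of 59 is assigned 13 even though the table prints that band's upper limit as "Less than 59". Second, scores above 100 are treated as unrecodable.

```python
# (lower limit, CRS_GRADE) pairs from Table 3, highest band first.
STANDARD_SCALE = {
    1: [(91, 2), (82, 5), (73, 8), (65, 11), (15, 13)],
    2: [(90, 2), (80, 5), (70, 8), (60, 11), (15, 13)],
}


def recode_with_standard_scale(score, wave):
    """Return (CRS_GRADE, CRS_GRADE_RECODE_STATUS) for a numeric grade."""
    # Grades below 15 are unrecodable; rejecting grades above 100 is an
    # assumption of this sketch.
    if score < 15 or score > 100:
        return 20, 3
    for lower, crs_grade in STANDARD_SCALE[wave]:
        if score >= lower:
            return crs_grade, 2
    return 20, 3  # not reached for valid input; kept as a safe default


print(recode_with_standard_scale(90, wave=1))  # (5, 2)
print(recode_with_standard_scale(90, wave=2))  # (2, 2)
print(recode_with_standard_scale(64, wave=1))  # (13, 2)
print(recode_with_standard_scale(64, wave=2))  # (11, 2)
print(recode_with_standard_scale(14, wave=1))  # (20, 3)
```

The same raw score can map to different standardized grades depending on the wave, as the score of 90 shows.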