The Post-Secondary Transcript Study (PSTRAN) was led by researchers from the University of Texas-Austin and the University of Wisconsin-Madison. Data collection and file documentation preparation were conducted by NORC at the University of Chicago. This study was funded by a grant from the Eunice Kennedy Shriver National Institute of Child Health and Human Development (1 R01 HD061551-01A2).
This study adds detailed information on postsecondary education for the National Longitudinal Survey of Youth 1997 (NLSY97) respondents, drawn from transcripts from all postsecondary institutions attended by respondents for undergraduate study. Postsecondary transcripts were collected and courses were coded according to a well-established taxonomy used by researchers, policy makers, and administrators. The postsecondary transcript data provide detailed chronological information about students' enrollment patterns across post-secondary institutions, the courses they took, and their performance in those courses.
B. Sample Design
The sample of youth potentially eligible for postsecondary transcript data collection consisted of all NLSY97 respondents who reported attendance in a postsecondary undergraduate degree program during any of the NLSY97 interviews, rounds 1 through 15. The variable series NEWSCHOOL_PUBID.nn_YYYY and NEWSCHOOL_SCHCODE.nn_YYYY identified postsecondary institutions in the survey data. For the youth questionnaire section on regular schooling, respondents are asked to exclude training at technical institutes or license trade programs, unless the credits obtained are transferable to regular schools and could count toward a degree. Therefore, enrollment spells not associated with a degree program are not enumerated in the survey data and did not qualify youths for inclusion in the PSTRAN data collection effort.
Table 1. Youth-level PSTRAN Eligibility and Transcript Receipt (PSTRAN_DISP)
No postsecondary enrollment reported in survey data, no waiver
No postsecondary enrollment reported in survey data, waiver
Postsecondary enrollment reported in survey data, no waiver
Postsecondary enrollment reported in survey data, waiver, no transcript(s) requested
Postsecondary enrollment reported in survey data, waiver, at least one transcript requested, none received
Postsecondary enrollment reported in survey data, waiver, at least one transcript requested, at least one received
Of the 8,984 youths, 2,830 had not reported any post-secondary enrollment through the R15 interview. An additional 1,445 respondents had reported post-secondary enrollment, but declined to provide a waiver and therefore could not be fielded for transcript collection.
There were 310 respondents who provided a waiver and reported post-secondary enrollment, but for whom transcripts could not be requested because their enrollment institutions could not be identified or located, the institutions refused to provide instructions on how transcripts could be requested, or the institution was located outside of the U.S. This group includes some institutions that had closed and whose records could not be recovered elsewhere. Many institutions that had closed had made their records available from another source; their data are found in the PSTRAN variables. In addition, transcripts were fielded from more than 100 institutions that could not be located in the Integrated Postsecondary Education Data System files maintained by the National Center for Education Statistics, but which we were able to locate through other means.
Of the 4,399 youths for whom one or more transcripts were requested, at least one transcript was received for 3,818 youths. Thus, 86.8 percent of youths for whom transcripts were requested have transcript data in the PSTRAN variables.
During data collection, some youths were identified as not having had adequate enrollment at an institution to generate a transcript. These could be, for example, enrollment spells that did not result in a single completed course (even one with a failing grade) or spells in non-degree programs. There were 207 youths identified for whom none of the reported institutions had enrollment that could generate a transcript. (The variable PSTRAN_DISP_UPD incorporates this confirmed non-enrollment information.) Reclassifying these youths as ineligible for the transcript effort, we find that we received at least one transcript for 3,818 of 4,192 youths who may have had undergraduate enrollment spells, yielding a transcript collection rate of 91.1 percent.
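The two youth-level rates reported above follow directly from the counts given in this section. A minimal sketch of the arithmetic in Python; the function and variable names are illustrative and are not PSTRAN variable names.

```python
def rate_percent(numerator: int, denominator: int) -> float:
    """Return numerator/denominator as a percentage rounded to one decimal."""
    return round(100 * numerator / denominator, 1)

requested = 4399      # youths with at least one transcript requested
received = 3818       # youths with at least one transcript received
not_enrolled = 207    # youths confirmed to have no transcript-generating enrollment

# Rate among all youths for whom a transcript was requested.
rate_requested = rate_percent(received, requested)

# Rate after reclassifying confirmed non-enrollees as ineligible.
eligible = requested - not_enrolled
rate_eligible = rate_percent(received, eligible)

print(rate_requested, eligible, rate_eligible)
```

The only difference between the two rates is the denominator: the second removes the 207 youths whose reported spells could not have produced a transcript.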
C. Waiver Collection
All institutions required written consent of the student in order to release a transcript to NORC.
In Round 14, all respondents completing the R14 interview in-person were asked to sign a waiver for transcript release, regardless of their educational attainment. Also in Round 14, NORC sent a mailing (no incentive) to all non-incarcerated youth completing the R14 interview but for whom no waiver had been received, again regardless of educational attainment. These were primarily individuals completing the Round 14 interview by telephone. No mailing was sent to currently incarcerated respondents, from whom return mail is difficult to receive. No systematic outreach was made to individuals who did not complete the Round 14 interview, although we received a handful of waivers from these individuals in the course of fielding activities.
In Round 15, we again requested waivers during the field period from all youth completing the Round 15 interview who had not yet submitted a waiver and who had reported some post-secondary enrollment prior to the Round 15 interview. After Round 15 data collection was complete, we sent a mailing to all individuals who had reported some post-secondary enrollment to date but who had not yet provided a waiver, regardless of their Round 15 completion status. As a goodwill gesture, a pre-paid incentive was included in this last mailing to individuals.
In summary, individuals not reporting college enrollment prior to Round 14 and not completing the Round 14 interview were not systematically asked to provide a waiver, so absence of a waiver for them should not be considered an indication of refusal to provide access. Individuals who completed the Round 14 or Round 15 interview, and anyone reporting postsecondary enrollment prior to Round 14, were asked for a release at least by mail. Note that a small fraction of these (approximately 300) are unlocatable respondents who may not have actually received the mailing. In addition, we know a non-trivial fraction of respondents do not open mail from NORC even when they are expecting to receive a check from us. No phone calls or other personalized outreach efforts were made to collect waivers from individuals who did not complete either the Round 14 or Round 15 interviews. Please see the variable PSTRAN_WAIVER_REQUESTED for a youth-level indication of whether or not a waiver was requested. The variables PSTRAN_DISP_UPD and PSTRAN_DISP indicate which youths provided a waiver, whether or not they were eligible for PSTRAN.
Overall, NORC collected signed waivers from 6,523 total individuals. Of these individuals, 4,709 were youth who had attended a postsecondary degree program and for whom a transcript could be requested. This represents 76.5 percent (4,709 of 6,154) of the NLSY97 respondents who had reported at least one post-secondary undergraduate degree enrollment spell by the time of the Round 15 interview.
1. Transcript Collection
Transcript data collection had several steps once a student's signed waiver had been received.
School locating. Project staff began by attempting to locate all necessary contacting information for each postsecondary institution where a sample member reported attending an undergraduate degree program. Contacting information included not only the full postal address for the branch of the postsecondary institution the student attended, but also registrar or transcript request information for each institution. We collected registrar contacting information, including registrar name, address, phone number, and most importantly, any fees associated with transcript requests. Fee and transcript request information were unavailable on many websites and required follow-up telephone calls. The locating process for each institution was ongoing throughout data collection, both to accommodate institutions that were added to the PSTRAN sample during the data collection period and to ensure that all contacting information was kept up to date. In addition, the school locating task collected the URL for each institution's most recent college course catalog.
Mailings to institutions. Once institution locating was complete, NORC mailed a transcript request to each institution attended by PSTRAN sample members. To the extent possible, each institution received a single mailing with requests for all of its PSTRAN youth. Transcript request mailings included a cover letter explaining the study. The letter made no mention of the NLSY97 in order to avoid compromising the sample member's confidential status as a member of this study. The mailings included a list of each student for whom a request was being made, along with that student's date of birth, other names used by that student, start and end dates of attendance at each school as provided by the student's NLSY responses, and an internal study-specific ID. The mailings also included a copy of the signed waiver for each youth who attended that institution. Finally, these mailings included checks to each institution for each sample member to cover any fees the institution charged per transcript copy, as well as a business return envelope for the institution to use when returning the requested transcripts. Institutions were also given instructions for sending transcript copies to NORC via fax.
Mailings were sent as waivers became available and were staggered to manage the flow of work. Requests for youth whose waivers were received during Round 14 were mostly sent from mid-January to mid-May 2012. Additional mailings were sent infrequently between November 2012 and June 2013 as Round 15 waivers were received and additional institution locating and cooperation issues were resolved. A pre-test mailing was sent in November 2011 to pilot the transcript collection process and determine its feasibility. The pre-test mailing was sent out for 100 youth-college pairs, of which 60 were returned as complete transcripts.
All institutions that received transcript requests also received at least one follow-up phone call to clarify NORC's request, confirm the request's arrival, and prompt institutional cooperation. These prompting efforts were made to each school one week after the requests were mailed. If schools received multiple rounds of transcript request mailings, they received prompts after each mailing. If institutions indicated that they were not able to find a student in their records, we contacted the institution by telephone with additional identifying information in order to confirm the status of the student at that institution.
Issues associated with transcript non-receipt. We encountered several specific circumstances at the institution level during transcript data collection. Through our prompting efforts, we learned that some spells reported in the interview were not associated with undergraduate degree program enrollment and were thus out of this study's scope. In addition, a large portion of non-received transcripts were due to financial or other holds placed on the transcript by the institution because of the students' pending obligations. Although NORC requested "unofficial" transcripts for these students, few institutions were willing to provide any data when such a hold was in place. A summary of these issues, at the level of youth-institution pairs (i.e., potential transcript requests), is found in the table below.
Table 2. Final Transcript Disposition of Youth-Institution Pairs
INELIGIBLE, GRADUATE TRANSCRIPT ONLY
REQUESTED, RECORDS ON HOLD
INELIGIBLE, NO ENROLLMENT OR ATTENDANCE
REQUESTED, NON-RESPONSE OR REFUSAL FROM NON-PARTICIPATING INSTITUTION
REQUESTED, NON-RESPONSE OR REFUSAL FROM PARTICIPATING INSTITUTION
RECORD APPEARS UNDER ANOTHER TRANSCRIPT ID
Note: Only youths providing a waiver for transcript release and institutions reported in R1-R15 survey data for undergraduate enrollment. All institution loops 1-8 combined.
As shown in the table above, some fielding efforts indicated that there was no transcript associated with an undergraduate degree program for the sampled student at a given institution. We identified 995 youth-institution pairs for which transcripts were requested but were out of scope because there was only graduate enrollment or other non-undergraduate enrollment. Excluding these out-of-scope transcript requests, we received 79 percent of all student transcripts that we requested. The remaining requested transcripts were not sent to us for one of three reasons: 1) the student had a hold on his or her account at the institution (for financial or other reasons), 2) the institution refused to cooperate with our study, or 3) the institution never responded to any of our requests.
We note that there may be additional out-of-scope requests that we were not able to confirm. For instance, 558 of the 1,046 transcripts that were requested but not completed were at institutions that were otherwise cooperative; that is, the institution sent us a transcript for another sampled student or informed us that another sampled student had a hold on his or her account. Based on the qualitative information collected for students who were confirmed not to have available transcripts, these unfulfilled requests could include students who had enrolled but never attended classes, or who had taken classes but never in a degree program. It is also possible that a student's institution was mis-recorded in the original interview data, for example, because an incorrect campus was identified within a larger institutional system that does not maintain student records at the system-wide level.
2. Transcript Processing
i. Initial check
Received transcripts were marked as complete only if they contained full information about the student's classes, including items such as grading scales. If any information was missing, institutions were re-contacted via phone and email so that the missing data could be retrieved. Transcripts also were confirmed to contain undergraduate course information. Transcripts that contained exclusively graduate study information were excluded from the final PSTRAN data. Only undergraduate data was included from transcripts that contained both graduate and undergraduate course information.
ii. Editing
Received transcripts were edited by production staff. Editing involved highlighting information such as degrees earned, transfer information, term information, credits earned, credits attempted, term and overall transcript summary statistics, and all other class information on the transcripts to facilitate accurate and efficient data entry. Editors undertook the following activities:
Determine if any degree information was present on the transcript, and if so, record how many degrees were earned.
Capture if any transfer credits were present on the transcript and if so determine the type of these credits (postsecondary, military, AP, other).
Distinguish undergraduate from non-undergraduate coursework and terms. If a transcript listed both undergraduate and graduate courses in clearly distinct sections, all non-course related information (such as majors or degrees) and all undergraduate courses were marked for data entry; graduate courses were not. If an undergraduate student took graduate courses and those courses were not listed in a separate section, it is likely that these courses appear in the PSTRAN data. If a transcript only included graduate courses, the transcript was labeled "graduate only" (see transcript disposition) and none of the information on the transcript will appear in PSTRAN variables.
Count and record the number of terms and the number of courses within each term. The earliest term was assigned a term ID of 1 and each subsequent term was numbered in sequence, with the number of courses specified after a decimal (e.g., the oldest term with five courses would have a term ID of 1.5).
Review terms for how credits, units, or hours were assigned grades or quality points.
Assess courses listed on the transcript for designation as: remedial, repeated, withdrawn, failed, or given a non-conventional grade.
Capture all summary statistics present on the transcript, such as Credits/Units/Hours Attempted, Credits/Units/Hours Earned, GPA, or others.
Note any distinctions given such as Honors, Cum Laude, Magna Cum Laude, Summa Cum Laude, Dean's List, or other.
Record any standardized test information present on the transcript such as ACT or SAT scores.
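The term ID convention in the list above packs two values into one number: the term's position in sequence before the decimal point and its course count after it. A minimal sketch of decoding such an ID in Python, assuming the ID is held as text. The function name is hypothetical, and how the production file stores these IDs, or how a term with ten or more courses was written, is not stated here.

```python
def parse_term_id(term_id: str) -> tuple[int, int]:
    """Split a term ID such as "1.5" into (term sequence, course count)."""
    sequence, _, courses = term_id.partition(".")
    if not sequence.isdigit() or not courses.isdigit():
        raise ValueError(f"unexpected term ID format: {term_id!r}")
    return int(sequence), int(courses)

# Oldest term with five courses, then a hypothetical third term with ten courses.
first_term = parse_term_id("1.5")
later_term = parse_term_id("3.10")
print(first_term, later_term)
```

Treating the ID as text rather than as a floating-point number matters in this sketch: as numbers, 3.1 and 3.10 are identical, so a one-course term and a ten-course term could not be told apart.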
iii. Computer Assisted Data Entry (CADE)
After editing, data entry clerks keyed the information highlighted by the editors. Approximately one third of all transcripts were keyed twice, each instance independent of the other, to verify accuracy across all variables. Production and research staff reviewed the data file for internal inconsistencies and adjudicated these by reviewing the actual transcripts.
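Independent double entry of this kind can be pictured as a field-by-field comparison of two keyed versions of the same transcript line. A minimal sketch in Python, assuming each keyed record is a dictionary of field names to values. The field names and values below are hypothetical and are not PSTRAN variables, and this is not a description of the CADE software itself.

```python
def find_discrepancies(first_pass: dict, second_pass: dict) -> dict:
    """Return fields whose values differ between two independent keyings."""
    fields = sorted(set(first_pass) | set(second_pass))
    return {
        field: (first_pass.get(field), second_pass.get(field))
        for field in fields
        if first_pass.get(field) != second_pass.get(field)
    }

# Hypothetical keyed records for one course line on a transcript.
entry_a = {"course_title": "INTRO TO BIOLOGY", "credits": "3.0", "grade": "B+"}
entry_b = {"course_title": "INTRO TO BIOLOGY", "credits": "3.0", "grade": "B"}

# Any field listed here would be resolved by going back to the paper transcript.
to_adjudicate = find_discrepancies(entry_a, entry_b)
print(to_adjudicate)
```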
iv. Coding
All majors and courses were given six-digit codes according to the 2010 College Course Map guidelines (College Course Map_2010 Revision: http://nces.ed.gov/pubs2012/2012162rev.pdf), the code frame used by NCES for post-secondary transcript data in the Beginning Postsecondary Students Longitudinal Study and the Education Longitudinal Study. Almost half of the transcript courses were coded in an automated process using the most common course names, course departments and codes. These assignments were reviewed manually by research staff at a rate of 10 percent, and all identified errors were then corrected in the full file. The other half of the courses were manually coded in batch form by several graduate research assistants, with their work reviewed and reconciled by research staff. Batches were sorted by subject area to streamline the coding process. Each batch was reviewed by research staff before being incorporated into the final course-level file. About 6 percent of all college courses did not have course names informative enough to be coded and were thus investigated online via the College Source Online database. This web-lookup task provided a detailed description and, in some cases, an updated course title, allowing most of these courses to be coded. A variety of quality assurance activities were conducted on the coding, primarily at aggregate levels. For example, individual youths' transcripts were reviewed manually for consistency; low-frequency pairings of verbatim course titles with course codes were checked as possible outliers; easily confused assignments were double-checked (e.g., assignment of ENG to English vs. Engineering); and verbatim course titles assigned to a single code were reviewed for consistency.
v. Term dates
To facilitate linkage between transcript data and the event history arrays derived from the interview responses, we collected start and end dates associated with each of the terms when students took classes. To secure these dates, we consulted the College Source online database (http://tes.collegesource.com) for the exact campus for each school. Start and end dates were sought for each type of term at each institution appearing in the PSTRAN course file. For institutions not appearing in College Source, we conducted online searches for course catalogs and other information directly available on institution websites.
The first item captured was the "Calendar System" variable, coded as: semester, quarter, trimester, variable or other. "Other" captured institutions that changed from semester to quarter or that used a 4-1-4 system. The "variable" category was used when institution term length varied based on how long the student took to complete the class at his or her own pace. To minimize the risk of disclosure, the public release file combines "variable" and "trimester" into the "other" academic type category. In fewer than 100 schools, the "Calendar System" captured from catalog sources differed from the calendar system reported in the transcript. In these cases, the differences were adjudicated to ensure that the "Calendar System" variable and transcript data could be reconciled. "Calendar System" captures the primary calendar system for each institution; institutions may also have additional secondary calendar systems. For example, many semester institutions also have month-long terms.
The most recent available course catalog was used to identify the academic calendar for that institution and capture the most recent start and end dates for all of the term types (e.g., Winter, Spring) in our file from that institution.
If a school listed both the last day of classes and the last day of exams for a term, the last day of exams was recorded as the end date. Some institutions, primarily those that provide online instruction, have flexible terms. Because the term end date was not recoverable in these cases, the school's reported term length (e.g., a month) was used to calculate the term end date. All prior academic years at that institution were assumed to follow the same dates as the selected academic year. For disclosure protection, the released file includes start and end month and year, but not start or end day.
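The two date rules above, deriving an end date from a reported term length and releasing only month and year, can be sketched in Python as follows. The function names, the example start date, and the treatment of "a month" as 30 days are all assumptions made for illustration; the text does not specify how term length was converted to days.

```python
from datetime import date, timedelta

def end_date_from_length(start: date, length_days: int) -> date:
    """Estimate a term end date as the last day of a term of the given length."""
    return start + timedelta(days=length_days - 1)

def to_month_year(d: date) -> tuple[int, int]:
    """Keep only month and year, mirroring the disclosure-protected release."""
    return d.month, d.year

start = date(2010, 1, 11)                 # hypothetical term start date
end = end_date_from_length(start, 30)     # term reported as roughly one month
print(end, to_month_year(start), to_month_year(end))
```

Because only month and year are released, short terms that begin and end in the same month will show identical start and end values in the public file.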
vi. School-level linkage with survey data
The variables PSTRAN_NEWSCHPUBID_X can be linked with the variables NEWSCH_PUBID from youth survey records to connect the self-reported institution information with data extracted from transcripts. The non-disclosive NEWSCH_PUBID values are youth-specific but not meaningful across youths. Users may apply for a geo-code license to access IPEDS IDs for these institutions.
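A minimal sketch of this school-level link in Python, assuming both sources have been reshaped to one row per youth-institution pair. The row layout, the PUBID youth identifier column, the extra columns, and all values are hypothetical; only the two school ID variable stems come from the text above.

```python
# Hypothetical long-format rows: one per youth-institution pair.
transcript_rows = [
    {"PUBID": 101, "PSTRAN_NEWSCHPUBID": 1, "transcript_received": True},
    {"PUBID": 101, "PSTRAN_NEWSCHPUBID": 2, "transcript_received": False},
    {"PUBID": 202, "PSTRAN_NEWSCHPUBID": 1, "transcript_received": True},
]
survey_rows = [
    {"PUBID": 101, "NEWSCH_PUBID": 1, "first_reported_round": 5},
    {"PUBID": 101, "NEWSCH_PUBID": 2, "first_reported_round": 8},
    {"PUBID": 202, "NEWSCH_PUBID": 1, "first_reported_round": 6},
]

# School IDs are youth-specific, so the key pairs the youth ID with the school ID.
survey_by_key = {(r["PUBID"], r["NEWSCH_PUBID"]): r for r in survey_rows}

linked = []
for t in transcript_rows:
    match = survey_by_key.get((t["PUBID"], t["PSTRAN_NEWSCHPUBID"]))
    if match is not None:
        linked.append({**match, **t})

print(len(linked))
```

Joining on the school ID alone would wrongly pair school 1 for youth 101 with school 1 for youth 202, which need not be the same institution.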
E. Assessing the Coverage of the PSTRAN Data
Coverage can be assessed for these data in a variety of ways: at the youth level, at the youth-institution pair level, or as a proportion of total terms enrolled by all of these youth. A complete assessment of coverage or associated bias is not yet available. The loss of coverage due to not having a waiver appears to be greater than the loss of coverage due to non-response by institutions. One uncertainty that is not likely to be resolved is the "true" enrollment status of all sampled youths.
Youth level. Based on the survey data, we identified 4,709 youths who had a waiver and at least one post-secondary degree program enrollment for which a transcript could be requested. During the course of fielding, we determined that 5.1 percent of these youths had never been enrolled in a degree program at the named institution (most had been registered but not attended, or enrolled in non-degree coursework only). Setting aside these "never enrolled" youth, there are 4,502 youths who have a waiver and are presumed to have available transcripts associated with at least one degree program enrollment; the transcript data files contain transcripts for 87.1 percent of these individuals. This rate assumes that all individuals for whom we received no transcript response did indeed have an undergraduate spell. That assumption likely overstates the number of youths with available transcripts, so the 87.1 percent is likely an underestimate.
Youth-Institution level. Among youths with waivers, we identified 8,942 potential transcripts to be secured (that is, youth-institution pairs). Of these, we determined 995 to not be associated with undergraduate enrollment. Taking these into account, 74.8 percent of youth-institution pairs that might have an undergraduate transcript do have a transcript in the PSTRAN data. Again, it is likely that there are some additional pairs not associated with undergraduate enrollment among the 1,046 transcripts for which no response could be secured; this is particularly likely among the 558 transcript requests not fulfilled by "participating" institutions (that did provide transcripts or information for other youth). Thus, the 74.8 percent is a likely underestimate of the percentage of youth-institution pairs covered by the PSTRAN data.
No analysis has been done yet at the level of terms of reported enrollment.
F. Weighting Report
Executive Summary. This section describes the weights for the NLSY97 Post-Secondary Transcript (PSTRAN) Study. Of the 8,984 NLSY97 round 1 respondents, it was determined that 5,939 attended post-secondary schooling. We were able to receive waivers to collect post-secondary transcripts from 4,494 respondents. For 77 of these respondents, the institutions were closed, foreign, or unlocatable, leaving 4,417 NLSY97 respondents for whom transcripts were requested from valid institutions. At least one transcript was received for 3,818 NLSY97 respondents. These 3,818 NLSY97 respondents have a positive PSTRAN_WT.
Below, we describe the steps summarized in the paragraph above. For each of the response rate steps, we use non-response weight adjustments by race/ethnicity ("Hispanic", "non-Hispanic Black", and "non-Hispanic Other") and gender, resulting in six cells for each non-response step.
Construction of PSTRAN Weights:
WT0: PSTRAN Base weight
Since all 8,984 NLSY97 round 1 respondents were considered eligible for the PSTRAN study, the base weight for the PSTRAN study is the final Wave 1 NLSY weight (cumulated cases weight). The sum of WT0 is also the sum of NLSY97 Round 1 weights: 19,378,453. This is the number of 12-16 year-olds in the United States in 1997.
WT1: Adjustment for eligibility
NLSY97 round 1 respondents with no post-secondary schooling are ineligible for the PSTRAN study. In this step, the weight for ineligible cases will be set to zero (in all future steps, the weight will be missing). The weights for eligible cases will be unchanged.
The sum of the weights now equals an estimate of the number of 12-16 year-olds in 1997 who have post-secondary schooling by the time of the study. Table W1 shows the estimated rates of post-secondary schooling by our six demographic cells and overall:
Table W1. Estimated Post-Secondary Schooling Rates by Race/Ethnicity and Gender
Table W1 shows that females and non-Hispanic Others are more likely to have post-secondary schooling. The gender gap is widest for non-Hispanic Blacks.
The sum of WT1 is 13,276,012, which divided by the sum of WT0 (19,378,453) is the overall total post-secondary schooling rate (68.51%).
WT2: Adjustment for waiver non-response
We next adjusted for non-response in acquiring a waiver to collect transcript data from the NLSY97 respondents. We calculated the waiver response rate separately within the six race/ethnicity-gender cells. In each cell, weights for responding cases increased by the reciprocal of the response rate such that the responding cases take on the additional weight of the non-responding cases. Table W2 shows the waiver response rates by our six demographic cells and overall:
Table W2. Waiver Response Rates by Race/Ethnicity and Gender
Table W2 shows that the highest waiver response rates were for non-Hispanic Blacks and females. A large gender gap appears only for non-Hispanic Blacks. The sum of WT2 is 13,276,012, which is the same as the sum of WT1.
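The cell-level adjustment described for WT2 can be sketched in Python as follows. The sketch assumes the response rate in each cell is the weighted rate (responding weight divided by total eligible weight), which is what allows the sum of weights to carry over unchanged. It also sets non-responders' weights to zero. The function name, cell labels, and weights are hypothetical and are not taken from the PSTRAN files.

```python
from collections import defaultdict

def adjust_for_nonresponse(cases):
    """Reweight responders within each cell by the inverse weighted response rate.

    cases: list of dicts with keys "cell", "weight", "responded".
    Returns a list of adjusted weights in input order; non-responders get 0.0.
    """
    total = defaultdict(float)       # weight of all eligible cases per cell
    responding = defaultdict(float)  # weight of responding cases per cell
    for c in cases:
        total[c["cell"]] += c["weight"]
        if c["responded"]:
            responding[c["cell"]] += c["weight"]

    adjusted = []
    for c in cases:
        if c["responded"]:
            rate = responding[c["cell"]] / total[c["cell"]]
            adjusted.append(c["weight"] / rate)  # multiply by reciprocal of rate
        else:
            adjusted.append(0.0)
    return adjusted

# Hypothetical cases in two of the six race/ethnicity-gender cells.
cases = [
    {"cell": "Hispanic female", "weight": 100.0, "responded": True},
    {"cell": "Hispanic female", "weight": 300.0, "responded": False},
    {"cell": "Hispanic male", "weight": 200.0, "responded": True},
    {"cell": "Hispanic male", "weight": 200.0, "responded": True},
]
new_weights = adjust_for_nonresponse(cases)
print(new_weights, sum(new_weights))
```

In the first cell the single responder carries the full cell weight of 400, so the total across cells is unchanged at 800. The same pattern would apply to the later institution and completion adjustments, each starting from the previous step's weights.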
WT3: Adjustment for unavailable institutions
We next adjusted for closed, foreign, and unlocatable institutions. We treated this as another non-response weight adjustment so that NLSY97 round 1 respondents whose institutions were unavailable were still represented by those whose institutions were available. We called this the "institution" response rate, and calculated the institution response rate separately within the six race/ethnicity-gender cells. In each cell, weights for responding cases increased by the reciprocal of the response rate such that the responding cases take on the additional weight of the non-responding cases. Table W3 shows the institution response rates by our six demographic cells and overall:
Table W3. Institution Response Rates by Race/Ethnicity and Gender
Only 77 out of 4,494 NLSY97 round 1 respondents who provided waivers had institutions that were closed, foreign, or unlocatable, so the "response" rates in Table W3 are quite high. There is very little difference by gender, but the non-Hispanic Other rate of closed, foreign, and unlocatable institutions is the lowest.
The sum of WT3 is 13,276,013, which is only different by rounding error from the sum of WT1.
PSTRAN_WT = WT4: Adjustment for transcript receipt
The final adjustment is a completion non-response weight adjustment. We received transcripts for 3,818 NLSY97 round 1 respondents, so PSTRAN_WT = WT4 is positive for exactly these 3,818 NLSY97 round 1 respondents. We again calculated the completion response rate separately within the six race/ethnicity-gender cells. In each cell, weights for responding cases increased by the reciprocal of the response rate such that the responding cases take on the additional weight of the non-responding cases. Table W4 shows the completion response rates by our six demographic cells and overall:
Table W4. Completion Response Rates by Race/Ethnicity and Gender
Completion rates were highest for non-Hispanic Others. The only gender gap was a higher rate for Hispanic males than Hispanic females.
The sum of PSTRAN_WT = WT4 is 13,276,012, which is again equal to the sum of WT1.