Speech Data in the NLSY97

In round 15, speech data were collected to learn about the relationship between a worker's speech and his/her labor market success, elaborating on the pilot study carried out by Grogger (2011). There were two main steps involved: collecting audio data and converting the audio data into numerical data suitable for regression analysis.

Important information: Locating speech variables

The speech variables are best located using the question name (Qname) search in NLS Investigator. Search for "Question Name starts with SPCH" to find this set of variables.
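For users working with a downloaded extract rather than NLS Investigator itself, the same "starts with SPCH" rule can be applied to a list of question names. The following is a minimal Python sketch. The function name `speech_qnames` and the sample list are hypothetical; only the four SPCH_ names are taken from this page.

```python
def speech_qnames(qnames, prefix="SPCH"):
    """Return the question names that start with the given prefix."""
    return [q for q in qnames if q.upper().startswith(prefix)]


# Hypothetical list of question names. The SPCH_ entries appear on this page;
# the other entries are placeholders for illustration only.
qnames = [
    "PLACEHOLDER_ID",
    "SPCH_JS_WORDS",
    "SPCH_JS_TOKENS",
    "SPCH_HM_WORDS",
    "SPCH_HM_TOKENS",
    "PLACEHOLDER_OTHER",
]

speech_vars = speech_qnames(qnames)
```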

Audio data collection

Audio data were collected during round 15 of the NLSY97. The data were collected in response to two speech prompts, designed to capture both informal and formal speech. One prompt was administered at the end of the interview, when respondents were asked to recount the happiest moment (HM) in their life since the date of their last interview. The second question, administered during the employment section of the interview, involved a job-search (JS) role-playing exercise where respondents were asked the following:

Let's suppose you applied for a job that sounded really interesting to you and they called you and asked you to come in for an interview. How would you describe your skills, qualifications, and experience to me if I were the person interviewing you for this job? (Employed respondents heard a slightly different preamble to the question.)

All respondents who completed in-person interviews and who gave consent to be recorded were eligible to be assigned at least one speech prompt. Answers were recorded by the on-board microphone in each field interviewer's (FI's) laptop. To make the recording, the CAPI interview software was programmed to turn on the FI's laptop microphone for one minute once a prompt was reached. FIs were provided with instructions designed to keep the respondent talking for as much of that minute as possible.

Because of similarities between African American Vernacular English (AAVE) and Southern American English (SoAE), both stimulus questions were assigned to all African-American and Southern white respondents. Southern white respondents are defined as non-Hispanic whites who resided in the South Census region at age 12. A random sample of 500 respondents who were neither black nor Southern white was also to be assigned both speech prompts, as were roughly 295 other respondents for whom speech data were collected in 2006 as part of Grogger (2011) but who did not fall into any of the categories above. All remaining respondents were randomly assigned to only one of the speech prompts.
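The sampling plan described above can be summarized as a small decision rule. The Python sketch below is illustrative only and is not the CAPI assignment code. The group labels, the flag names `in_random_500` and `in_2006_pilot`, and the equal-probability choice between the two prompts are all assumptions made for this example.

```python
import random


def assign_prompts(race_region, in_random_500=False, in_2006_pilot=False, rng=random):
    """Return the tuple of prompts a respondent would be assigned under the plan.

    race_region is one of the hypothetical labels 'black', 'southern_white',
    'non_southern_white', or 'other'.
    """
    # All black and Southern white respondents receive both prompts.
    if race_region in ("black", "southern_white"):
        return ("HM", "JS")
    # So do members of the random sample of 500 and the 2006 pilot respondents.
    if in_random_500 or in_2006_pilot:
        return ("HM", "JS")
    # Everyone else receives exactly one prompt, chosen at random
    # (equal probability is an assumption of this sketch).
    return (rng.choice(["HM", "JS"]),)
```

For example, `assign_prompts("other", in_2006_pilot=True)` returns both prompts, while `assign_prompts("non_southern_white")` returns a single prompt.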

Table 1 provides data on round-15 speech-prompt sampling and response rates, disaggregated by race/region at age 12. Of the 8,984 original NLSY97 respondents, 7,423 were interviewed during round 15. Among those interviews, 6,579 were carried out in person. Among those, 6,080 respondents provided consent to be recorded and were thus eligible for this coding exercise. The share of round-15 respondents participating in in-person interviews and consenting to be recorded was .83 for blacks, .80 for both Southern whites and non-Southern whites, and .84 for the remaining group.

The center panel of Table 1 shows how eligible respondents were assigned to speech prompts. For the most part, the assignments followed the sampling plan fairly closely. All but seven of the black respondents, and all but two of the Southern white respondents, were assigned both questions. Among non-Southern whites and others, 795 respondents were assigned to both stimulus questions. Ten otherwise eligible respondents were not assigned either speech question.

The bottom panel of Table 1 provides counts of eligible respondents for whom audio files were actually generated by the interviews. There is a discrepancy between the number of respondents from whom audio data should have been collected and the number from whom it was actually collected. Audio files were obtained from only 4,907 of the 6,080 eligible respondents. The rate of loss among eligibles was 17 percent for blacks and Southern whites, 21 percent for non-Southern whites, and 20 percent for others. The panel also shows that there were black and Southern white respondents for whom only one audio file was obtained, when there should have been two.

The reasons for this loss of data are unclear. NLSY project staff indicate that audio files appear not to have been captured for the 1,173 (=6,080-4,907) respondents who were eligible to be recorded but for whom no audio data are available, perhaps due to technical difficulties in the CAPI interviewing system. The loss of recordings is widely distributed among FIs, rather than being concentrated among a few, so the loss appears to have been unintentional.
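The loss figures quoted above follow directly from the Table 1 counts. As a check, the short Python sketch below recomputes them. The counts are copied from Table 1; the variable names are chosen for this example.

```python
# Counts copied from Table 1, in the order of its columns.
groups = ["Black", "Southern White", "Non-Southern White", "Other"]
eligible = [1698, 741, 2079, 1562]      # in person and consented to be recorded
with_audio = [1402, 616, 1638, 1251]    # at least one audio file obtained

# Percentage of eligible respondents with no audio file, by group.
loss_pct = {
    g: round(100 * (e - a) / e)
    for g, e, a in zip(groups, eligible, with_audio)
}

# Total number of eligible respondents with no audio file.
total_lost = sum(eligible) - sum(with_audio)
```

Here `loss_pct` gives 17, 17, 21, and 20 percent for the four groups, and `total_lost` is 1,173.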

Dialect density measures

Four additional variables are available in the round 15 speech data. These variables consist of dialect density measures (DDMs) derived from the audio recordings collected in round 15.

Variable	Description
SPCH_JS_WORDS	DDM: Count of words in Job Search response
SPCH_JS_TOKENS	DDM: Count of AAVE tokens in Job Search response
SPCH_HM_WORDS	DDM: Count of words in Happiest Moment response
SPCH_HM_TOKENS	DDM: Count of AAVE tokens in Happiest Moment response

The DDM is the ratio of the number of African American Vernacular English (AAVE) tokens in the audio file to the number of words in the audio file; tokens and words are provided for both the Happiest Moment (HM) and Job Search (JS) prompts. Many respondents with valid audio recordings, mostly non-Southern whites and others, lack DDMs due to budget limitations.
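Because the data provide token and word counts rather than the ratio itself, users need to construct the DDM. The following is a minimal Python sketch of that calculation. The function name `dialect_density` is hypothetical, and treating negative values as missing is an assumption based on the usual NLS convention of negative missing-data codes; users should confirm the codes in the codebook.

```python
def dialect_density(tokens, words):
    """Return the DDM (AAVE tokens / words), or None when it is undefined.

    Assumption: negative values are missing-data codes. A word count of
    zero also yields None, because the ratio cannot be computed.
    """
    if tokens is None or words is None:
        return None
    if tokens < 0 or words <= 0:
        return None
    return tokens / words


# Made-up example values for illustration; these are not NLSY97 data.
ddm_js = dialect_density(tokens=6, words=120)    # 6 tokens in 120 words
ddm_hm = dialect_density(tokens=-4, words=-4)    # treated as missing
```

The same function applies to either prompt: pass SPCH_JS_TOKENS and SPCH_JS_WORDS for the Job Search DDM, or SPCH_HM_TOKENS and SPCH_HM_WORDS for the Happiest Moment DDM.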

Producing numerical data from the audio files

To generate data suitable for the regression analysis, anonymous listeners were recruited to listen to the audio files and answer questions about the speakers. After listening to each audio file, listeners were asked to specify the speaker's sex, race/ethnicity, and region of origin. Three listeners were assigned to each audio file. Thus speakers who responded to both the HM and JS prompts have six listener reports, whereas speakers who responded to only one of the prompts have three. To deal with data security issues surrounding the use of potentially identifiable voice data, listeners were recruited from the pool of NORC FIs and research assistants. Data processing was carried out remotely using specially configured laptops that provided secure connections to NORC's computer network, where the audio files resided. All listeners received confidentiality training stipulated by both NORC and BLS.

Summary characteristics of the listeners are reported in Table 2. The modal listener was white and female, reflecting the demographics of the available workforce. Listeners were drawn from throughout the US, with disproportionately many Midwesterners. All listeners had completed high school; most had at least some tertiary education. The 22 listeners who listened to the JS audio files tended to be older, more Southern, and less educated than the 43 listeners who listened to the HM audio files (10 listened to both). Care was taken to ensure that speakers were not assigned to listeners who had interviewed them during round 15.

The HM files were processed first. All speakers with an HM audio file were in scope for HM data processing unless the file was empty or unintelligible. The top part of Table 3 shows that about 94 percent of the HM audio files were in scope; this fraction varied from 89 percent for black speakers to 99 percent for non-Southern whites.

Budgetary issues limited the scope of processing for the JS files. The goals for JS file processing were to maximize the number of blacks and Southern whites for whom both HM and JS data were available and to maximize the number of non-Southern whites for whom data from at least one of the speech prompts would be available, while meeting the project budget constraint. A handful of "other" speakers were processed as well. As with the HM data, JS files that were empty or inaudible were deemed out of scope. The middle part of Table 3 shows that 83 percent of the available JS files for black speakers were processed, compared to 92 percent of those for Southern whites and 79 percent of those for non-Southern whites. Speech data from at least one prompt are available for a total of 4,225 NLSY respondents.
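The in-scope percentages cited for the HM and JS files can be recomputed from the Table 3 counts. The Python sketch below does so. The counts are copied from Table 3; the helper name `pct_in_scope` is chosen for this example.

```python
# Counts copied from Table 3, in the order of its columns.
groups = ["Black", "Southern White", "Non-Southern White", "Other", "Total"]
hm_files = [1305, 576, 900, 819, 3600]
hm_in_scope = [1162, 526, 890, 810, 3388]
js_files = [1380, 610, 932, 851, 3773]
js_in_scope = [1139, 564, 739, 59, 2501]


def pct_in_scope(in_scope, files):
    """Return the rounded percentage of audio files in scope, by group."""
    return {g: round(100 * n / d) for g, n, d in zip(groups, in_scope, files)}


hm_pct = pct_in_scope(hm_in_scope, hm_files)
js_pct = pct_in_scope(js_in_scope, js_files)
```

The results match the text: 94 percent of HM files overall (89 percent for black speakers, 99 percent for non-Southern whites), and 83, 92, and 79 percent of JS files for black, Southern white, and non-Southern white speakers.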

Table 1. Round 15 response counts by respondent's race and region at age 12
Segment		Black	Southern White	Non-Southern White	Other	Total
Original 1997 sample		2,335	1,160	3,253	2,236	8,984
R15 respondents		2,036	931	2,588	1,868	7,423
In-person interviews		1,833	797	2,269	1,680	6,579
…and consent to record		1,698	741	2,079	1,562	6,080
Speech prompt assignment:	Both questions	1,691	739	257	538	3,225
	Happiest Moment (HM) only	1	0	906	516	1,423
	Job Search (JS) only	6	2	913	501	1,422
	No assignment	0	0	3	7	10
Counts of eligible respondents for whom audio files were actually generated:	At least one audio file	1,402	616	1,638	1,251	4,907
	Both questions	1,283	570	194	419	2,466
	Happiest Moment (HM) only	22	6	706	400	1,134
	Job Search (JS) only	97	40	738	432	1,307

Table 2. Percentage distribution of listener characteristics by speech prompt
Listener Characteristics		Happiest Moment (HM) Prompt (1)	Job Search (JS) Prompt (2)
SEX	Male	27	16
	Female	73	84
	Total	100	100
RACE/ETHNICITY	White	83	84
	Black	13	15
	Hispanic	2	1
	Other	2	0
	Total	100	100
REGION OF RESIDENCE	Northeast	21	19
	Midwest	37	35
	South	21	37
	West	21	10
	Unknown	0	0
	Total	100	100
LEVEL OF EDUCATION	HS diploma or GED	5	24
	HS and some college	38	33
	Bachelor's degree or higher	57	43
	Total	100	100
Mean age of listener (years)		48	54

Table 3. Counts of speakers with speech data by speaker's race and region at age 12
Speech Data Type	Black	Southern White	Non-Southern White	Other	Total
Happiest Moment (HM) audio file	1,305	576	900	819	3,600
In-scope for Happiest Moment (HM) speech data	1,162	526	890	810	3,388
Job Search (JS) audio file	1,380	610	932	851	3,773
In-scope for Job Search (JS) speech data	1,139	564	739	59	2,501
Any speech data	1,168	567	1,629	861	4,225

U.S. Bureau of Labor Statistics

National Longitudinal Surveys