Created variables
Public use variables:
- Region of residence at each survey date (Northeast, North Central, South, or West)
- Information on whether the current residence is in an urban or rural area
  - Through 1996, this series was based on the respondent's State and county of residence and the "% urban population" data from the County & City Data Book. From 1998 to 2002, this item was based on whether the respondent was living in an urbanized area or in an area with a population greater than 2,500. Beginning in 2004, this item indicates whether the respondent resides within an urban cluster or urbanized area. For further information, see the Geocode Codebook Supplement.
- Information on whether the current residence is in a Metropolitan Statistical Area (MSA), the central city of an MSA, or outside of an MSA
  - Based on zip code, State, and county matches with the metropolitan statistical area designations for the place of residence, the respondent's location is determined to be within or outside of a metropolitan statistical area.
- Beginning in 1988, whether the current residence is in the United States.
Geocode file variables:
- The specific county and State (both edited) of residence at the time of interview, coded with Federal Information Processing Standards (FIPS) codes
- Similar information is provided for the respondent's residence at birth and at age 14
- The specific metropolitan area of residence at the time of interview. As applicable, information may be included for the following types of metropolitan areas:
  - SMSA: Standard Metropolitan Statistical Area
  - MSA: Metropolitan Statistical Area
  - CMSA: Consolidated Metropolitan Statistical Area
  - PMSA: Primary Metropolitan Statistical Area
  - NECMA: New England County Metropolitan Area
  - CBSA: Core Based Statistical Area
- Distance between respondent addresses at each interview round (see Appendix 22: Migration Distance Variables for Respondent Locations)
  - This supplements the data on State and county of residence and is available only on the Geocode release
  - The distance between the respondent's addresses at each date of interview was created for all unique pairs of survey years
  - These variables do not provide a location for the respondent's residence; they provide only the distances between the various places the respondent has lived
  - This pairwise matrix of variables supports various types of migration research by allowing users to consider the distance between residences and to identify return migration to an area where the respondent has lived in the past
- Indicators of the quality of the geographic data:
  - An address may not be available for the respondent
  - In such cases, the respondent's residence is geocoded to the centroid of the zip code when the zip code can be determined
  - To identify these cases, an indicator for the quality of each distance measure was created based on the quality of the matches in both years
  - An indicator for whether the respondent was located in the same zip code was also created for all pairs of years
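A minimal Python sketch of how the pairwise distance variables described above might be used to spot return migration. It assumes one respondent's distances have been read into a dictionary keyed by (earlier year, later year), that distances are in miles, and that two addresses within a 1-mile cutoff count as the same place; the function names, the unit, and the cutoff are hypothetical choices for illustration, not features of the released files.

```python
from itertools import combinations


def pair_keys(years):
    """Return every unique (earlier, later) pair of survey years.

    For n survey years there are n * (n - 1) / 2 pairs, i.e. one
    distance variable per pair for each respondent.
    """
    return list(combinations(sorted(years), 2))


def find_return_moves(years, dist, threshold=1.0):
    """Flag survey years in which the respondent appears to have returned
    to a place lived at in an earlier, non-adjacent survey year.

    years     -- survey years with a distance available for the respondent
    dist      -- dict mapping (earlier_year, later_year) -> distance (miles, assumed)
    threshold -- hypothetical cutoff below which two addresses count as the same place
    """
    ordered = sorted(years)
    returns = []
    for i in range(1, len(ordered)):
        prev, curr = ordered[i - 1], ordered[i]
        # Skip rounds with no move since the previous interview.
        if dist[(prev, curr)] <= threshold:
            continue
        # Look for an earlier, non-adjacent round at (nearly) the same place.
        earlier = [y for y in ordered[:i - 1] if dist[(y, curr)] <= threshold]
        if earlier:
            returns.append((curr, earlier))
    return returns


# Hypothetical example: moved away in 1992, back near the 1990 address by 1994.
years = [1990, 1992, 1994]
dist = {(1990, 1992): 250.0, (1990, 1994): 0.4, (1992, 1994): 249.8}
print(find_return_moves(years, dist))  # [(1994, [1990])]
```

Before treating a small distance as a true return, the distance-quality and same-zip-code indicators should be checked, since a zip-centroid match can make two different addresses look identical.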
Important information: Using restricted-use Geocode data
- The level of detail available determines whether a variable is placed within the restricted release "Geocode" files. For example, general country-level information, such as whether the respondent resided at various points in time within or outside of the United States, is available to all users with no restriction, while the specific county or SMSA in which he or she resided at a specific interview point is present only within the restricted-use Geocode data files.
- Researchers interested in using restricted-use Geocode data must submit an application to BLS. These confidential files are available for use only at the BLS National Office in Washington, DC, and at Federal Statistical Research Data Centers (FSRDCs) on statistical research projects approved by BLS. Access to data is subject to the availability of space and resources. Information about applying to use the zip code and Census tract data is available on the BLS Restricted Data Access page.
- The "Household Interview" areas of interest contain a set of variables titled 'Does R Live on a Farm or in a Rural Area?' The interviewer answers this question based on observation when at the respondent's permanent residence; if the interview takes place elsewhere, the interviewer asks the respondent about the place of residence. There are no consistent criteria for the definition of nonfarm property as rural. These variables should not be considered a replacement for the created KEY VARIABLE, 'Current Residence Urban/Rural?'
- The coding of respondents' geographic location before 1993 required extensive hand-editing and may not be completely accurate. The most common error is the potential assignment of a respondent to an adjacent county of residence. Data on addresses, zip codes, and phone numbers are used to clean the geographic codes. The post-1988 use of telephone number information improved data quality. A brief discussion below provides more information on both the hand-edits performed each year and the created variable that indicates the extent of hand-editing required for each case; see Appendix 10 in the Geocode Codebook Supplement for more details.
Geographic data for NLSY79 respondents fall into two categories: information on the main public file and more detailed information released as restricted-use Geocode data. Table 1 lists NLSY79 geographic variables with their areas of interest and corresponding documentation found in the NLSY79 Geocode Codebook Supplement and the NLSY79 Codebook Supplement. Variables with a "Geocode" area of interest are restricted-use data; all others are public use.
Table 1. NLSY79 geographic variables, areas of interest, and documentation

| Variable Group | Variable | Survey Years | Area of Interest | Documentation |
| --- | --- | --- | --- | --- |
| Residence at Birth | Country - U.S. or Other Country | 1979, 1983 | Geocode | |
| | Country - Actual Other Country | 1979 | Geocode | Attachment 101 |
| | County | 1979 | Geocode | Attachment 102 |
| | State | 1979 | Geocode | Attachment 102 |
| | South/Non-South | 1979 | Family Background | Attachment 100 |
| Residence at Age 14 | Country - U.S. or Other Country | 1979 | Geocode | |
| | Country - Actual Other Country | 1979 | Geocode | Attachment 101 |
| | County | 1979 | Geocode | Attachment 102 |
| | State | 1979 | Geocode | Attachment 102 |
| | South/Non-South | 1979 | Family Background | Attachment 100 |
| | Area of Residence - Urban/Rural | 1979 | Family Background | NLSY79 User's Guide and Appendix 6 |
| Present Residence | Lived in Since Birth | 1979 | Family Background | |
| | Year of Move to | 1979 | Family Background | |
| Migration History | Country/County/State Since Jan. 1978/Last Interview | 1979-1980, 1982, 2000-2020 | Geocode | Attachment 101, Attachment 102 |
| | Month/Year of Move(s) | 1979-1980, 1982, 2000-2020 | Family Background | |
| | Main Reason for Move | 2018-2022 | Family Background | |
| | Months Spent at Alternate Residence | 2018-2022 | Family Background | |
| Current Residence | Region | 1979-2022 | Key Variables | Attachment 100 |
| | Urban/Rural | 1979-2022 | Key Variables | NLSY79 User's Guide and Appendix 6 |
| | SMSA/Central City | 1979-2022 | Key Variables | NLSY79 User's Guide and Appendix 6 |
| | In U.S. | 1979-2022 | Key Variables | NLSY79 User's Guide |
| | County | 1979-2022 | Geocode | Attachment 102 |
| | State | 1979-2022 | Geocode | Attachment 102 |
| | SMSA | 1979-2022 | Geocode | Attachment 104 |
| | PMSA | 1979-2022 | Geocode | Attachment 104 |
| | MSA | 1979-2022 | Geocode | Attachment 104 |
| | CMSA | 1979-2022 | Geocode | Attachment 104 |
| | MSA/CMSA/NECMA | 1979-2022 | Geocode | Appendix 10 |
| | CBSA | 1979-2022 | Geocode | Appendix 10 |
| | Main Reason for Moving Since Date of Last Interview | 2018-2022 | Family Background | NLSY79 User's Guide |
Geocode file variables
The Geocode files provide data on NLSY79 respondents' residence at the State, county, and metropolitan statistical area levels, merge in information from Census reference files and data books, and include additional variables such as local unemployment rates, job location, and college and military discharge locations where available.
- Information on the State, county, and metropolitan statistical area of residence for each respondent (the current residence variables) is merged with information from several other data files, namely the City Reference File (Census 1973, 1982, 1983, 1987, 1992) and the County & City Data Book (Census 1972, 1977, 1983, 1988, 1994), to provide detailed information on the environmental characteristics of the State, county, and metropolitan statistical area in which each NLSY79 respondent resides. Note: Users may attach additional county- and metropolitan statistical area-level data from a variety of sources by merging information from the desired source with the Geocode data on the State, county, and metropolitan statistical area of residence codes in the Geocode file.
- For select survey years, Geocode information is available on the location of respondents' jobs, the location of colleges attended, and the point of discharge from military service.
- Unemployment rate of each respondent's labor market of current residence:
  - The source of the 'Unemployment Rate' variables is the May issue of the Bureau of Labor Statistics' Employment and Earnings for the year following the survey year; figures from March of each survey year are used. This table supplies unemployment rates for each State and for selected metropolitan statistical areas. Respondents who reside within one of these metropolitan statistical areas are assigned that area's unemployment rate. For those residing outside these areas, a "balance of State" unemployment rate is computed from the State totals for the civilian labor force and the number employed, after subtracting the corresponding figures for the metropolitan statistical areas.
  - Additional information on these variables can be found in Appendix 7 in the NLSY79 Geocode Codebook Supplement.
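The assignment rule and the "balance of State" calculation described above can be sketched in Python. The sketch assumes the State totals and each listed metropolitan area's civilian labor force and employment are available as plain numbers in the same units; the function names and the example figures are hypothetical and are not taken from Employment and Earnings.

```python
def unemployment_rate(labor_force, employed):
    """Unemployment rate in percent: (labor force - employed) / labor force."""
    return 100.0 * (labor_force - employed) / labor_force


def balance_of_state_rate(state_lf, state_emp, msa_figures):
    """Rate for the part of a State outside its listed metropolitan areas.

    state_lf, state_emp -- State civilian labor force and number employed
    msa_figures         -- list of (labor_force, employed) pairs, one per
                           listed metropolitan area in the State
    """
    # Subtract the metropolitan-area figures from both State totals.
    rest_lf = state_lf - sum(lf for lf, _ in msa_figures)
    rest_emp = state_emp - sum(emp for _, emp in msa_figures)
    return unemployment_rate(rest_lf, rest_emp)


def assign_rate(msa_code, msa_rates, balance_rate):
    """Use the area's own rate if the respondent lives in a listed MSA,
    otherwise fall back to the balance-of-State rate."""
    return msa_rates.get(msa_code, balance_rate)
```

For a State with a civilian labor force of 1,000 and 930 employed, and two listed areas with (400, 380) and (200, 184), the remainder of the State has 400 in the labor force and 366 employed, so `balance_of_state_rate(1000, 930, [(400, 380), (200, 184)])` gives 8.5 percent.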
Types of county or Metropolitan Statistical Area environmental characteristics in the NLSY79 restricted-use Geocode data
- Population sizes
- Percent of population that is:
  - urban
  - black
  - female
  - under 5 years old
  - 65+ years old
- Birth/death/marriage/divorce rates
- Physician and hospital bed rates
- Crime rates
- Poverty level data
- Educational attainment levels
- Median family and per capita income
- Recipients of and payments from:
  - Social Security
- Labor force statistics:
  - total labor force
  - civilian labor force
  - number of females in the civilian labor force
  - civilians unemployed versus employed
  - percent employed in various industries
- Unemployment rate for labor market of residence
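As noted earlier, users can attach county-level characteristics beyond those listed above by merging an outside source with the Geocode State and county codes. The sketch below uses only the Python standard library and assumes the State and county codes are FIPS codes stored as integers; the record layout and key names ('id', 'state', 'county', 'county_attrs') are hypothetical.

```python
def fips_key(state, county):
    """5-digit county FIPS key: 2-digit State code + 3-digit county code."""
    return f"{int(state):02d}{int(county):03d}"


def attach_county_data(respondents, county_data):
    """Attach outside county-level values to respondent records.

    respondents -- list of dicts with hypothetical keys 'id', 'state', 'county'
    county_data -- dict mapping 5-digit FIPS key -> dict of county attributes
    Records with missing (negative) codes or no matching county get None.
    """
    merged = []
    for r in respondents:
        record = dict(r)  # copy so the input records are not modified
        if r["state"] < 0 or r["county"] < 0:
            record["county_attrs"] = None
        else:
            key = fips_key(r["state"], r["county"])
            record["county_attrs"] = county_data.get(key)
        merged.append(record)
    return merged
```

Zero-padding the codes before joining them matters: without it, State 6 with county 37 and State 63 with county 7 would both produce "637".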
Geographic residence
Detailed geographic mobility information was collected in 1979, 1980, and 1982 and from 2000 forward: data were gathered on the country/county/State and timing of up to five residential moves since January 1978 or since the last interview. Beginning in 2000, only significant geographic moves were recorded.
Neighborhood quality
The neighborhood quality series (1992 and 1994-2000) is taken from the National Commission on Children Parent & Child Study, 1990 Parent Questionnaire. In this series of questions, respondents rate how much of a problem issues such as crime, lack of police protection, unsupervised children, and joblessness are in their neighborhood.
Other geographic variables
Users may obtain special permission to use zip code and Census tract data available at the BLS offices in Washington, DC.
Edited versus unedited versions of state/county of residence
For some years (1979-82, 1988-89, 1991-92), two versions of the State and county of residence variables have been included in the "Geocode" files. The set at the beginning of each file is the edited version, while the variables near the end of the files for these years are unedited. If a variable has an actual source question number/name, it is the original from NORC; if its source question name reads "created," it is the edited/created version. Note that the unedited State and county codes are sometimes combined into a single variable, with the two codes appended to each other. These raw variables are preceded by the word "GEOCODE" in the variable title. The edited residence variables contain the corrections made for erroneous address information and are the ones from which the Geocode files themselves are constructed. Users should be aware that the edited version of these variables does not contain data for respondents who are in the active military forces or who are living abroad or in a U.S. territory. Codes of "-4" that appear in the unedited versions of the State or county variables (because foreign country and U.S. territory codes are placed in one field or the other) should not appear in the edited versions of these residence variables.
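When comparing the edited and unedited residence variables, a combined unedited code may need to be split into its State and county parts. The Python sketch below assumes, purely for illustration, that the combined value is a State code followed by a three-digit county code and that negative values are missing-value codes such as -4; the actual layout of each year's raw "GEOCODE" variable should be confirmed in the codebook before relying on this.

```python
def split_combined_code(value):
    """Split a combined State+county code into (state, county).

    ASSUMPTION (illustration only): the value is a State code followed by a
    3-digit county code, e.g. 17031 -> (17, 31). Negative values are treated
    as missing-value codes such as -4 and returned as (None, None).
    """
    if value is None or value < 0:
        return (None, None)
    return divmod(int(value), 1000)


def codes_agree(edited_state, edited_county, raw_combined):
    """True if the edited State/county pair matches the raw combined code."""
    return (edited_state, edited_county) == split_combined_code(raw_combined)
```

Under these assumptions, `split_combined_code(17031)` returns `(17, 31)`, while a raw value of -4 comes back as `(None, None)` and never agrees with an edited code.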
Geocode procedures for assigning residence codes and hand-editing discrepant cases
During the 1988 hand-editing process, it became evident that the telephone numbers were very accurate, even in cases for which the address information contained discrepancies. Beginning in 1989, the area code and phone exchange were used to assign State and county of residence codes. The State assigned by the area code was then compared to the State assigned on the basis of zip code alone and the State contained in the original NORC respondent file. A "quality of match" variable was computed on the basis of how well these States match. For a more detailed discussion of these new assignment and matching procedures, refer to Appendix 10 in the Geocode Codebook Supplement. This process was used through the 1994 release.
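The comparison of the three State assignments can be sketched as a small scoring function. The four-level scale below is hypothetical and only shows the idea of grading agreement among the area-code State, the zip-code State, and the original NORC State; the actual categories of the "quality of match" variable are documented in Appendix 10.

```python
def quality_of_match(area_code_state, zip_state, norc_state):
    """Hypothetical quality-of-match score from three State assignments.

    Illustrative scale (not the official coding):
      3 -- all three sources agree
      2 -- the area-code State agrees with exactly one other source
      1 -- zip code and NORC States agree, but the area-code State differs
      0 -- no two sources agree (missing values never count as agreement)
    """
    def same(a, b):
        return a is not None and a == b

    phone_zip = same(area_code_state, zip_state)
    phone_norc = same(area_code_state, norc_state)
    zip_norc = same(zip_state, norc_state)

    if phone_zip and phone_norc:
        return 3
    if phone_zip or phone_norc:
        return 2
    if zip_norc:
        return 1
    return 0
```

Giving the area-code State priority in the middle categories reflects the observation above that telephone numbers proved very accurate even when address information was discrepant.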
The hand-editing procedure has also been streamlined. In 1989, the first year in which the phone assignment procedure was used, the residence codes assigned on the basis of the area code and exchange were compared to the raw residence variables received from NORC. Those with information that did not match were identified for individual examination. Ideally, the discrepancies requiring individual examination would be reduced to those cases which are "genuine movers" or which have zip codes covering multiple counties and would require some verification that the correct county was assigned based upon the phone information. The current process for identifying discrepancies and hand-editing is aimed more directly at achieving this objective.
Beginning in 1990, the residence codes assigned based on phone information were compared to the 1989 CHRR-edited residence information to identify cases for individual examination. Because the previous year's edited variables incorporate the corrections that were made in the hand-editing process from earlier years, repeated editing of the same cases across years decreased. Through this process, the discrepancies in residential Geocode information were reduced. The number of cases requiring individual examination also decreased and was restricted more closely to the population of "genuine movers" and people with multiple-county zip codes and phone numbers that require verification of county of residence.
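The screening rule used from 1990 onward, which compares phone-based codes with the previous year's edited codes and also singles out multi-county zip codes, can be sketched as follows. The data structures and function name are hypothetical, and the actual CHRR procedures described in Appendix 10 involve more checks than this.

```python
def cases_for_review(phone_assigned, prior_edited, zip_codes, multi_county_zips):
    """Return sorted IDs of respondents needing individual examination.

    phone_assigned    -- dict id -> (state, county) from area code and exchange
    prior_edited      -- dict id -> (state, county) from last year's edited file
    zip_codes         -- dict id -> current zip code
    multi_county_zips -- set of zip codes that span more than one county
    """
    flagged = []
    for rid, codes in phone_assigned.items():
        # Codes differ from last year's edited codes: possible "genuine mover".
        moved = prior_edited.get(rid) != codes
        # Zip code covers several counties: county needs verification.
        ambiguous = zip_codes.get(rid) in multi_county_zips
        if moved or ambiguous:
            flagged.append(rid)
    return sorted(flagged)
```

Because the comparison is against already-edited codes, a respondent whose address error was corrected in an earlier year and who has not moved is not flagged again, which is the reduction in repeated editing described above.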
The hand-editing process in previous years included not only these genuine movers and multi-county zip code dwellers, but also other cases for which elements of the address are simply in error or incompatible with each other. Some of these cases could potentially require editing for the same errors in more than one year, even if the respondent stayed in one location. Hand-editing procedures were further streamlined, and in some cases automated, to produce the 1992 data.
Beginning in 1996, a new procedure for verifying and assigning correct final Geocode information was instituted. This procedure is now performed using specialized address-tracking Geocode software; the processes are described in Appendix 10. CHRR staff believe that the current procedures not only are more efficient at identifying true discrepancies and streamlining the hand-editing process, but also result in more accurate and consistent assignment of State and county codes in general.
Missing values, New England cases, and mobility
Missing values in location of residence variables and metropolitan statistical area codes are associated with respondents who are in the active military forces or who are living abroad or in a U.S. territory. Users should be aware that, because the New England County Metropolitan Area (NECMA) codes are not comparable to metropolitan statistical areas from the remainder of the country, New England cases are eliminated from some of the procedures used to construct the Geocode files.
The review and hand-editing process has been periodically revised to improve the accuracy of the data and the efficiency of data production. The potential effects of these changes on mobility rates between some years are noted in Appendix 10. Users should read Appendix 10 carefully to understand these issues and their implications for specific research projects.
Comparison to Other NLS Surveys
Data on the respondent's area of residence are available for all cohorts. Geographic residence information for NLSY79 children who resided with their mother can be inferred from their mothers' residence data. The NLSY97 main created variables indicate whether the respondent lives in an urban or rural area, whether the respondent lives in a Metropolitan Statistical Area, and in which Census region the respondent resides; more detailed information is available on the restricted-use Geocode data. Region of residence and geographic mobility of Original Cohort respondents are provided for most survey years.
Survey Instruments & Documentation
Data on residence at birth and at age 14, as well as the 1979-82 present/most recent residence series, were collected using questions found within Section 1 ("Family Background" and "On Family") of the 1979, 1980, and 1982 questionnaires. All other variables are created from or determined by the geographic information provided by each NLSY79 respondent within the locator section of the questionnaire or from the interviewing Face Sheet or internal NORC locating files. Several attachments and appendices in the NLSY79 Codebook Supplement and/or the NLSY79 Geocode Codebook Supplement offer creation procedure information and coding systems for the geographic residence variables. The following are relevant to the Geocode:
Related Variables
Related NLSY79 main file variables discussed in the Household Composition and Family Background sections of this guide include:
Areas of Interest
Residence variables can be found within the "Family Background," "Key Variables," "Geocode," or "Household Interview" areas of interest; Table 1 above specifies the particular area of interest for each variable. All environmental variables, including the 'Unemployment Rate for the Labor Market of Current Residence,' are present in the "Geocode" areas of interest in the restricted-use Geocode data.