Geographic Residence & Geocode Data

The NLSY79 Child datafile does not contain child-specific geographic information for children under age 15. However, to be interviewed as part of the child survey, all children must reside with their mothers for at least part of the year. Limited geographic information about mother's residence in each survey year is available in the main NLSY79 data set, and more detailed information is provided on the restricted-use NLSY79 geocode CD.

The following created variables are provided, for mothers of NLSY79 children, in the main Youth datafile:

  • REGION. Region of residence at birth, age 14, and survey dates (Northeast, North Central, South, or West)
  • URBAN-RURAL. Information on whether the current residence is in an urban or rural county
    • Through 1996, this series was based on the respondent's State and county of residence and the "% urban population" data from the County & City Data Book. From 1998 to 2002, this item was based on whether the respondent was living in an urbanized area or in an area with a population greater than 2,500. Beginning in 2004, this item indicates whether the respondent resides within an urban cluster or urbanized area. For further information, see the Geocode Codebook Supplement.
  • SMSARES. Information on whether the current residence is in a Metropolitan Statistical Area (MSA), the central city of an MSA, or outside of an MSA
    • Based upon zip code, State, and county matches with metropolitan statistical designations for place of residence, the location of the respondent is determined to be within or outside of a metropolitan statistical area
  • USRES. Beginning in 1988, whether the current residence is in the United States

This mother-based geographic information can be merged with any child's record by using the case identification code. As children age into the Young Adult sample, information on their geographic residence is included in the Young Adult data set as described below. By combining the main NLSY79 and Young Adult data with the Child file, data users can track geographic residence from birth to the current survey round. They can also link information on the location of family members who are themselves respondents in the NLSY79 or Child/Young Adult surveys.
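
The merge described above can be sketched in Python as follows. This is an illustration only: the field names (`mother_id`, `child_id`, `region`) and the sample values are hypothetical placeholders, not actual NLSY79 variable names, and a real extract would use the case identification variables documented in the codebooks.

```python
# Hypothetical extracts; field names and values are placeholders for illustration.
mother_rows = [
    {"mother_id": 1, "year": 1990, "region": "South"},
    {"mother_id": 1, "year": 1992, "region": "West"},
    {"mother_id": 2, "year": 1990, "region": "Northeast"},
]
child_rows = [
    {"child_id": 101, "mother_id": 1},
    {"child_id": 102, "mother_id": 1},
    {"child_id": 201, "mother_id": 2},
    {"child_id": 301, "mother_id": 3},  # no mother record in this extract
]


def attach_mother_geography(children, mothers):
    """Return one row per child per mother survey year.

    Children whose mother has no geographic record get a single row
    with year and region set to None.
    """
    # Index the mother rows by the shared case identification code.
    by_mother = {}
    for row in mothers:
        by_mother.setdefault(row["mother_id"], []).append(row)

    merged = []
    for child in children:
        matches = by_mother.get(child["mother_id"])
        if not matches:
            merged.append({**child, "year": None, "region": None})
            continue
        for m in matches:
            merged.append({**child, "year": m["year"], "region": m["region"]})
    return merged


merged = attach_mother_geography(child_rows, mother_rows)
```

Siblings share their mother's residence history in this sketch, so children 101 and 102 each receive both of mother 1's survey-year rows.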

The geographic data available for mothers in the main NLSY79 survey are described in the Geographic Residence & Neighborhood Composition section of the NLSY79 topical guide.


Young Adult

Geographic Residence (Public Data)

Created variables

REGIONyyyy. Region of Current Residence. Year-specific variables are available for each survey year.
URBAN-RURAL. Is Current Residence Urban or Rural? Year-specific variables are available for each survey year.
SMSARES. Is Current Residence in SMSA? Year-specific variables are available for each survey year.

Publicly available created variables for the Young Adult respondent's geographic residence include U.S. region of residence (Northeast, North Central, South, and West), an urban/rural designation for the residence, and whether the residence is in an SMSA (standard metropolitan statistical area). These variables are in the "YA Common Key Variables" Area of Interest. More specific information about geographic residence, including an explanation of missing data, can be found on the geocode data CD described below.

Geocode Data

Beginning with the 2000 data release, the decision was made to create a set of geocode data files for the Young Adults comparable to those created each round for the NLSY79. A full set of geocode variables was created for all Young Adult years from 1994 to 2000 at that time, and geocode variables continue to be prepared for each new round. Researchers interested in obtaining the geocode CD must submit a short geocode application to the Bureau of Labor Statistics and agree to meet certain security requirements. Researchers can find more information about this process at

The Young Adult supplemental data files provide geographic variables from the NLSY79 Young Adult survey data file. Additionally, for survey years 1994-2002, these supplemental data files provide selected variables from the County and City Data Books.  

The Young Adult geocode file includes the state and county of residence for each survey round. For the creation of the 1994 through 2002 geocode data, for Young Adults living in their mother's household, the county and state of residence were drawn from the mother's NLSY79 data if the mother was interviewed for that year. For Young Adults not living with their mothers, and those whose mothers were not interviewed, county and state of residence were coded from the Young Adult survey data. In cases where the mother's data were missing or incomplete, Young Adult survey data were used to provide accurate codes wherever possible. Since 2004, all county and state of residence variables were coded from the Young Adult survey data.
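
The sourcing rule in the preceding paragraph can be summarized with a small Python sketch. The function name, its arguments, and the returned labels are hypothetical and chosen for illustration; they are not part of the geocode files.

```python
def residence_source(year, lives_with_mother, mother_interviewed,
                     mother_data_complete=True):
    """Return which survey supplied county and state of residence.

    Returns "mother" (the mother's NLSY79 data) or "young_adult"
    (the Young Adult survey data), following the rule described above.
    """
    if year < 1994:
        raise ValueError("Young Adult geocode data begin with the 1994 survey year")
    if year >= 2004:
        # Since 2004, all codes come from the Young Adult survey data.
        return "young_adult"
    # 1994-2002: use the mother's data only when the Young Adult lived in her
    # household, she was interviewed that year, and her data were usable.
    if lives_with_mother and mother_interviewed and mother_data_complete:
        return "mother"
    return "young_adult"
```

For example, `residence_source(1998, True, True)` returns `"mother"`, while the same respondent in 2006 returns `"young_adult"`.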

The county and state of residence for each Young Adult respondent for each survey year between 1994 and 2002 were matched with the county and state variables from the County and City Data Book data files for both 1988 and 1994, so that the geocode data files include selected county-level and SMSA-level environmental variables. Users should note that geocode variables for all five 1994-2002 Young Adult survey years were extracted from only the 1988 and 1994 County and City Data Book data files. As a result, the 1994 and 1996 Young Adult geocode variables are not directly comparable to those of their mothers, whose geocode variables were extracted from the 1983 and 1988 County and City Data Book data files.

The County and City Data Book data files were prepared by the U.S. Bureau of the Census. Related printed matter for each of these data files can be found in the County and City Data Book for the specified year, which is also published by the U.S. Bureau of the Census. 

The Geocode Codebook for the Young Adult survey provides the following detailed information on each geocode variable: its reference number, variable description, coding information, frequency distribution, file name, variable name, and source of the variable. Included are references to pertinent attachments and appendices from the NLSY79 Geocode Codebook Supplement providing supplementary coding and variable creation procedures. Variables are grouped within the geocode codebook according to the year with YA GEOCODE 1994 variables followed by YA GEOCODE 1996 and so forth. 

Users of the Young Adult geocode data are encouraged to review the NLSY79 Geocode Codebook Supplement for greater detail on the geocoding processes, because the procedures used for the Young Adult files are comparable to those used for the NLSY79 main file. This supplement has several appendices and attachments, including:

  • Appendix 10: Geocode Documentation, which provides background information on how the original 1979-1982 geocode tape and subsequent updates were created and how those data were modified to form the 1979-2010 release.
  • Attachment 100: Geographic Regions, which lists the states that make up each of the four regions used in such variables as region of residence and South/non-South place of birth and place of residence at age 14.
  • Attachment 102: State Federal Information Processing Standards (FIPS) Codes, which are used to code respondents' state of residence.
  • Attachment 104: SMSA Codes, which contains the coding information used to classify the SMSA, MSA, CMSA, or PMSA of residence at each interview date.
  • Attachment 105: Addendum to FICE Codes, which contains the supplementary identification numbers for colleges and universities not listed in the Education Directory, Colleges and Universities (1981-1982 and 1982-1983 supplement) published by the National Center for Education Statistics.
  • Appendix 7: Unemployment Rates, which explains how the continuous and collapsed versions of the variable "unemployment rate for labor market of current residence" were created.

Geocode Data File Creation Procedure

The software package Maptitude (V4.2) was used in the creation of the NLSY79 Young Adults 1994-2004 geocode data files for Young Adults who could not be matched to previous mother data (see NLSY79 Geocode Codebook Supplement for greater detail). Since 2006, the geocoding process has been undertaken with ArcGIS (V9.2). These programs link respondent address data to standard geographic information such as the FIPS (Federal Information Processing Standards) codes for state and county. Three graduated matching methods were applied, depending on the quality of address data available.

  1. Where possible, an automated match was done between the respondent's locating address data and the GIS database. Address records with matching street segments were assigned the latitude and longitude of the location. In some cases, addresses had to be cleaned before they could be matched by the program. Cleaning involved steps such as standardizing the address format, correcting obvious misspellings, and identifying apartment numbers and placing them in the correct field. It did not include any changes that might alter the actual address location.
  2. For some addresses, the procedure outlined in Step 1 failed to produce a match between the respondent's address data and the GIS database. In these cases, geocode staff used the Maptitude or ArcGIS program to locate the correct street. If the street number could be located along this street, staff assigned the correct latitude and longitude. However, some streets in the GIS database do not include information about street numbers; in these cases, the address was manually located at the center of the street. The street was then classified as either a short street or a long street: long streets cross Census tract or block group boundaries, while short streets do not. As a result, the level of certainty about geographic information is much higher for short streets than for long streets.
  3. Addresses unmatched by either of the first two procedures were assigned latitude and longitude coordinates according to a 5-digit ZIP code centroid. A centroid is essentially the midpoint of a ZIP code area. The geographic information is less certain for respondents located using the ZIP centroid method.
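
The graduated fallback in the three steps above can be sketched in Python. This is a simplified illustration, not the Maptitude or ArcGIS workflow: the lookup tables, addresses, coordinates, and method labels are all made up, and the manual street-location work in Step 2 is reduced here to a plain lookup of a street's center point.

```python
def locate(record, segment_index, street_centers, zip_centroids):
    """Return ((lat, lon), method) using the most precise match available."""
    # Step 1: automated match on the full (cleaned) address.
    full_key = (record["number"], record["street"], record["zip5"])
    if full_key in segment_index:
        return segment_index[full_key], "address_match"

    # Step 2 (simplified): the street is known, but not the street number,
    # so the address is placed at the center of the street.
    street_key = (record["street"], record["zip5"])
    if street_key in street_centers:
        return street_centers[street_key], "street_center"

    # Step 3: fall back to the 5-digit ZIP code centroid.
    if record["zip5"] in zip_centroids:
        return zip_centroids[record["zip5"]], "zip_centroid"

    return None, "unmatched"


# Made-up reference data for illustration only.
segment_index = {("12", "EXAMPLE ST", "99999"): (40.001, -83.001)}
street_centers = {("SAMPLE AVE", "99999"): (40.010, -83.010)}
zip_centroids = {"99999": (40.050, -83.050)}

exact = locate({"number": "12", "street": "EXAMPLE ST", "zip5": "99999"},
               segment_index, street_centers, zip_centroids)
street = locate({"number": "7", "street": "SAMPLE AVE", "zip5": "99999"},
                segment_index, street_centers, zip_centroids)
zip_only = locate({"number": "3", "street": "UNKNOWN RD", "zip5": "99999"},
                  segment_index, street_centers, zip_centroids)
```

Returning the method label alongside the coordinates keeps a record of how certain each location is, which is the role a quality of match variable plays.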

Some Young Adults had latitude and longitude derived from Maptitude through 2004, while others had these data matched from their mothers' NLSY79 records from years when different systems were used. For this reason, a quality of match variable equivalent to GEO10 in the NLSY79 geocode data was not released for survey years 1994-2002; it is available for 2004 and for subsequent survey years. Researchers who need to determine the level of certainty for the respondent's geographic data prior to 2004 may contact CHRR User Services for further details.

Supplementary Created Geocode Variables

College Variables. In all Young Adult survey rounds, information was gathered on the name and location of the college or university that the respondent currently or most recently attended. Included in the geocode variables for survey years 1994 through 2000 are Federal Interagency Committee on Education (FICE) codes for these colleges or universities as well as FIPS codes for the state where they are located. Additionally, beginning in 2000, respondents who were in either their senior year in high school or their first year of college were asked about what colleges and/or universities they had applied to. FICE codes are provided for these colleges and universities.

Beginning in 2002, the codes provided for colleges applied to and college attended are UNITID codes from the Integrated Postsecondary Education Data System (IPEDS) database rather than the FICE codes used in previous rounds. A crosswalk between FICE codes and UNITID codes is available in the IPEDS database. For cases where a UNITID code was unavailable but a FICE code existed, the FICE code is provided. A code of 999999 was assigned to cases where neither a FICE code nor a UNITID code could be found for a given college or university.
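
The order of preference for college codes from 2002 onward can be expressed as a short Python sketch. The function name and the use of `None` for an unavailable code are assumptions made for illustration; the example code values are made up.

```python
UNKNOWN_COLLEGE = 999999  # assigned when neither code could be found


def college_code(unitid, fice):
    """Return the code reported for a college in rounds from 2002 onward.

    Prefers the IPEDS UNITID, falls back to the FICE code, and otherwise
    returns 999999. None stands for "code unavailable" in this sketch.
    """
    if unitid is not None:
        return unitid
    if fice is not None:
        return fice
    return UNKNOWN_COLLEGE
```

Because a FICE code can appear where a UNITID was unavailable, users of these variables may need to check which coding system a given value belongs to before matching against the IPEDS database.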

Child Support Variables. Information has been collected in all Young Adult rounds about the state in which child support agreements were reached. The FIPS codes for these states are included in the geocode variables for each year.  

Missing Data

Following the same convention as the NLSY79 Child and Young Adult public release data, missing values on the geocode data files are coded –7. This code indicates either (a) a non-interview for a given year or (b) a missing value on a variable from the County and City Data Book for one of the following reasons:

  1. The respondent was in the military or had an APO address.
  2. The respondent was residing outside of the United States.
  3. The respondent's state or county codes could not be determined.
  4. The respondent resides in a county or SMSA/MSA for which the County and City Data Book has no data on that specific item.
  5. The respondent does not reside in an SMSA in a given survey year (1994-2010) and is therefore missing SMSA-level environmental variables for that year.
  6. The respondent's state, county, and ZIP codes for a given survey year (1994-2010) do not lead to an unambiguous SMSA designation. This generally applies only to a small number of respondents living in New England.
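
When analyzing these variables, the missing-data code should be excluded before computing statistics. A minimal Python sketch follows; the function names and sample values are hypothetical, and it assumes the missing-data code is stored as the integer -7.

```python
MISSING = -7  # missing-data code described above


def recode_missing(values, missing=MISSING):
    """Replace the missing-data code with None, leaving valid values as is."""
    return [None if v == missing else v for v in values]


def mean_of_valid(values, missing=MISSING):
    """Average the non-missing values; return None if none are valid."""
    valid = [v for v in values if v != missing]
    return sum(valid) / len(valid) if valid else None


# Made-up county-level values for four respondent-years.
sample = [5.0, -7, 7.0, -7]
recoded = recode_missing(sample)
average = mean_of_valid(sample)
```

Leaving the code in place would pull the average of `sample` toward a negative value that has no substantive meaning.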

In the 1994-2002 geocode data, for the 1988 and 1994 metropolitan statistical area variables included in the data (GEO7 and GEO9A, respectively), respondents with NECMA codes (i.e., respondents living in the New England states of Connecticut, Maine, Massachusetts, New Hampshire, Rhode Island, and Vermont) were not treated any differently than those residing elsewhere. The "Record Type" variable from the 1988 and 1994 County and City Data Book data files (GEO9 and GEO9C, respectively, in the Young Adult data) allows the user to designate these cases as missing and remove them from the analysis without having to conduct a county-by-county or state-by-state determination of NECMA/non-NECMA status. These data from the County and City Data Books are not available as part of the geocode data releases for survey rounds after 2002.

Use of the Geocode Files

A few suggestions concerning the use of the NLSY79 Young Adult geocode files follow. First, the data file and the accompanying documentation should be used in conjunction with the printed versions of the 1988 and 1994 County and City Data Book and the IPEDS codes so that researchers have complete information regarding variable descriptions and coding idiosyncrasies. Second, users should familiarize themselves with the NLSY79 Geocode Codebook Supplement. Third, the data must not be used in any fashion that would endanger the confidentiality of any sample member. To use these data, researchers must sign a written licensing agreement consenting to protect respondent confidentiality and to other conditions; agree not to make, or allow to be made, unauthorized copies of the geocode file; and further agree to indemnify the Center for Human Resource Research for all claims arising from misuse of the file.

Comparison to Other NLS Cohorts: Data on the respondent's area of residence are available for all cohorts. Geographic data for NLSY79 respondents are available for all survey rounds and fall into two categories: information on the main public file and more detailed information released on a restricted-access geocode CD. Geographic residence information for those NLSY79 children who resided with their mother can be inferred from the residence data of their mothers. The NLSY97 main created variables indicate whether the respondent lives in an urban or rural area, whether the respondent lives in a Metropolitan Statistical Area, and in which Census region the respondent resides. More detailed information is available on the restricted-use Geocode CD. Region of residence and geographic mobility of Original Cohort respondents are provided for most survey years.