The CenSoc team is happy to announce the release of supplemental geography files for the Berkeley Unified Numident Mortality Database (BUNMD) and the CenSoc-Numident. These files contain additional geographic information on place of birth and/or place of death for the majority of decedents in the BUNMD and the CenSoc-Numident file. The supplemental variables can be attached to the BUNMD using Social Security number, and to the CenSoc-Numident using the HISTID identifier variable.
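As a rough illustration of attaching the supplemental variables, the sketch below left-joins geography rows onto main-file rows by a shared identifier. Only HISTID comes from the files themselves; the function name `attach_geography` and the columns `death_age` and `death_state` are hypothetical stand-ins, so check the codebooks for the real variable names.

```python
def attach_geography(records, geography, key):
    """Left-join supplemental geography rows onto main-file rows by `key`."""
    geo_by_key = {row[key]: row for row in geography}
    merged = []
    for rec in records:
        # Records with no geography match keep only their original columns.
        extra = geo_by_key.get(rec[key], {})
        # Main-file values win if a column name appears in both files.
        merged.append({**extra, **rec})
    return merged


# Hypothetical toy rows; real column names are listed in the codebooks.
censoc = [
    {"HISTID": "A1", "death_age": 81},
    {"HISTID": "B2", "death_age": 74},
]
geo = [{"HISTID": "A1", "death_state": "California"}]

merged = attach_geography(censoc, geo, key="HISTID")
for row in merged:
    print(row)
```

The same function would work for the BUNMD by passing the Social Security number column as `key`.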
Place of death variables
ZIP code of residence from Social Security Numident death records is currently published in the BUNMD and CenSoc-Numident datasets. However, these uncleaned ZIP codes may have any number of digits, contain invalid characters, or otherwise fail to be valid postal codes. Using United States Postal Service data, we map valid 5-digit ZIP codes to state, county (names and FIPS codes), city, census region, and country. Of 49 million records in the BUNMD, approximately 35 million contain a ZIP code that can be mapped to state and county information. About 6.3 million of the 7 million CenSoc-Numident records contain valid ZIP codes of death.
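To give a sense of what this cleaning step involves, here is a minimal sketch that keeps only strings of exactly five ASCII digits and looks them up in a ZIP-to-place table. It is not the procedure used to build the files: the names `clean_zip`, `zip_to_place`, and `ZIP_LOOKUP` are hypothetical, the one-row lookup table stands in for the USPS data, and the handling of longer codes (such as ZIP+4) is an assumption.

```python
import re

# Exactly five ASCII digits; anything else is treated as invalid in this sketch.
FIVE_DIGITS = re.compile(r"[0-9]{5}")

# Hypothetical one-row stand-in for a ZIP-to-place table built from USPS data.
ZIP_LOOKUP = {
    "94720": {"state": "California", "county": "Alameda", "county_fips": "06001"},
}


def clean_zip(raw):
    """Return the ZIP code if it is exactly five digits, otherwise None."""
    if raw is None:
        return None
    candidate = str(raw).strip()
    return candidate if FIVE_DIGITS.fullmatch(candidate) else None


def zip_to_place(raw):
    """Map a raw ZIP string to place information, or None if it cannot be mapped."""
    zip5 = clean_zip(raw)
    return ZIP_LOOKUP.get(zip5) if zip5 else None


for raw in ["94720", " 94720 ", "9472", "9472A", "00000"]:
    print(repr(raw), "->", zip_to_place(raw))
```

Note that a well-formed 5-digit string can still be unmappable if it is not in the lookup table, which is why the count of mapped records is lower than the count of records with a ZIP code.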
Place of birth variables
Social Security Numident application and claims records contain a 12-character city of birth field. We map these uncleaned, unstandardized strings onto Geographic Names Information System (GNIS) codes using a crosswalk developed for the paper:
The GNIS contains numeric codes identifying millions of current and historical geographic features, including populated places, in the United States. GNIS codes for birthplace are then mapped to county names and 5-digit FIPS county codes using a database from the U.S. Board on Geographic Names. Approximately 30 million records in the BUNMD and 6.5 million records in the CenSoc-Numident contain valid birth city strings that could be matched to a GNIS location in the United States.
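The two-step mapping described above (city string to GNIS code, then GNIS code to county) can be pictured as a pair of chained lookups. The sketch below is only a schematic: the actual crosswalk is more involved, the GNIS ID shown is a placeholder rather than a real feature ID, keying the first lookup on city string plus state of birth is an assumption, and all function and table names are hypothetical.

```python
# Step 1 (hypothetical rows): cleaned birth-city string + state -> GNIS ID.
# The ID below is a placeholder, not a real GNIS feature ID.
CITY_TO_GNIS = {("SAN FRANCISC", "California"): 1000001}

# Step 2 (hypothetical rows): GNIS ID -> county name and 5-digit FIPS code.
GNIS_TO_COUNTY = {1000001: ("San Francisco", "06075")}


def normalize_city(raw):
    """Uppercase, trim, and collapse repeated whitespace in a raw city string."""
    return " ".join(str(raw).upper().split())


def birth_county(raw_city, birth_state):
    """Chain both lookups; return (county name, FIPS) or None if unmatched."""
    gnis_id = CITY_TO_GNIS.get((normalize_city(raw_city), birth_state))
    if gnis_id is None:
        return None
    return GNIS_TO_COUNTY.get(gnis_id)


# The 12-character field truncates longer names, e.g. "SAN FRANCISC".
print(birth_county(" san  francisc", "California"))
print(birth_county("UNKNOWN PLAC", "California"))
```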
Analyses with geography files
These data will allow researchers to analyze geographic mortality variation at a wide variety of levels, including state, county, and ZIP code. The relevance of location to mortality can be assessed both in terms of where individuals are born and where they live out their final days. For the millions of individuals for whom both place of birth and place of death are available, researchers can easily identify interstate and intercounty migrants.
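One simple way to flag migrants, sketched below, relies on the fact that the first two digits of a 5-digit FIPS county code identify the state. The function name `migration_status` and its labels are hypothetical, and the sketch ignores county boundary changes between birth and death, which a careful analysis would need to consider.

```python
def migration_status(birth_fips, death_fips):
    """Classify a decedent by comparing 5-digit county FIPS codes at birth and death."""
    if not birth_fips or not death_fips:
        return None  # one of the two places is unavailable
    if birth_fips[:2] != death_fips[:2]:
        return "interstate"  # the first two digits are the state FIPS code
    if birth_fips != death_fips:
        return "intercounty"  # same state, different county
    return "non-migrant"


# Toy examples: state code 28 is Mississippi, 06 is California.
print(migration_status("28163", "06001"))  # interstate
print(migration_status("28163", "28049"))  # intercounty
print(migration_status("28163", "28163"))  # non-migrant
print(migration_status("28163", None))     # None
```

The labels here are mutually exclusive, so interstate movers are not also counted as intercounty movers; an analysis of all intercounty moves would combine the two groups.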
To illustrate the type of CenSoc research made possible with this data release, we use the BUNMD supplemental geography file to model differences in age at death by county/parish of birth for women in select southern states using linear regression. The results are visualized below with a map created using the urbnmapr R package. This map shows that counties/parishes where women have the shortest average lifespans are clustered along the Mississippi River. In the state of Mississippi itself, women born in the Yazoo–Mississippi Delta region in the western part of the state tend to die earlier than those born in the east. Many regions with the lowest longevity have prominent histories of plantation slavery and sharecropping, and today have some of the highest county-level poverty rates within these states and the country at large.
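The exact model specification is not given here, so the following is a simplified stand-in rather than the analysis behind the map. It computes mean age at death by county of birth, which is what a linear regression of age at death on county indicators alone (with no other covariates) would recover. The column names `birth_county_fips` and `death_age` and the toy values are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def mean_age_by_county(records):
    """Average age at death for each county of birth, skipping incomplete rows."""
    ages = defaultdict(list)
    for rec in records:
        fips = rec.get("birth_county_fips")
        age = rec.get("death_age")
        if fips and age is not None:
            ages[fips].append(age)
    return {fips: mean(values) for fips, values in ages.items()}


# Hypothetical toy rows, not real data.
toy = [
    {"birth_county_fips": "28163", "death_age": 80},
    {"birth_county_fips": "28163", "death_age": 84},
    {"birth_county_fips": "28049", "death_age": 86},
    {"birth_county_fips": None, "death_age": 90},  # dropped: no birth county
]

county_means = mean_age_by_county(toy)
print(county_means)
```

A fuller model would typically add controls such as birth cohort, since comparing raw ages at death across groups with different cohort compositions can be misleading.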
For codebooks and links to download the BUNMD and CenSoc-Numident supplementary geography files, visit the CenSoc Data page.
Story by Maria Osborne. For questions, contact censoc@berkeley.edu