Download Data
To access and download data, please visit the CenSoc Data Download Page. The data are available free of charge to all users.
The CenSoc-DMF and CenSoc-Numident datasets described below contain a HISTID variable, which allows users to merge the CenSoc data with the complete-count 1940 Census. Researchers can download the public version of the complete-count 1940 Census from IPUMS USA. Alternatively, researchers can access a restricted version of the complete-count 1940 Census (with full names and street addresses) with permission via a secure server.
We highly recommend referring to the Documentation page for more information on downloading and working with these data.
Please use the following citation when using CenSoc data:
Joshua R. Goldstein, Monica Alexander, Casey Breen, Andrea Miranda González, Felipe Menares, Maria Osborne, Mallika Snyder, and Ugur Yildirim. CenSoc Mortality File: Version 3.0. Berkeley: University of California, 2023.
Summary of Datasets
The CenSoc project disseminates three primary mortality datasets free of charge: the CenSoc-DMF, the CenSoc-Numident, and the BUNMD. A summary of the features of the different CenSoc datasets is presented in the table below:
Variable | CenSoc-DMF | CenSoc-Numident | BUNMD |
---|---|---|---|
Males-only | ✓ | | |
All-Gender | | ✓ | ✓ |
Linked to Census Characteristics | ✓ | ✓ | |
SS Application Covariates | | ✓ | ✓ |
High Coverage of Deaths | 1975-2005 | 1988-2005 | 1988-2005 |
Size | 4.7 million | 7.0 million | 50 million |
Demo Available | ✓ (42 thousand) | ✓ (65 thousand) | |
Additional Place of Birth and Death Variables Available | | ✓ | ✓ |
The CenSoc Project also publishes World War II era American Army enlistment records, both unlinked and linked to other data sources, which are described generally in this post and at the bottom of this page.
CenSoc-DMF (download, codebook)
The CenSoc-DMF dataset links the 1940 census to the Death Master File, a collection of over 83 million death records reported to the Social Security Administration. This matched file includes only men, because women's surname changes at marriage make accurate linkage difficult. Our linking strategy relies on first name, last name, and year of birth. We use the ABE fully automated linking approach developed by Abramitzky, Boustan, and Eriksson (2012, 2014, 2017). To work with this dataset, researchers must download the 1940 full-count Census sample from IPUMS USA and link it to the CenSoc-DMF on the HISTID variable.
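The HISTID link described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the project's own tooling: the function name `link_on_histid`, every column other than HISTID, and the tiny in-memory tables are hypothetical stand-ins for the real downloads. The sketch holds the smaller CenSoc file in memory and streams the census file row by row, so the full-count census never needs to fit in memory.

```python
import csv
import io


def link_on_histid(censoc_rows, census_rows, key="HISTID"):
    """Inner-join census rows to CenSoc rows on a shared identifier.

    censoc_rows is read fully into memory (the smaller file);
    census_rows is consumed one row at a time.
    """
    censoc_by_id = {row[key]: row for row in censoc_rows}
    for census_row in census_rows:
        match = censoc_by_id.get(census_row[key])
        if match is not None:
            # Combine the census columns with the CenSoc mortality columns.
            yield {**census_row, **match}


# Hypothetical stand-ins for the two downloaded files.
censoc_csv = io.StringIO("HISTID,byear,dyear\nA1,1910,1980\nB2,1915,1990\n")
census_csv = io.StringIO("HISTID,EDUC\nA1,8\nC3,12\nB2,10\n")

linked = list(
    link_on_histid(csv.DictReader(censoc_csv), csv.DictReader(census_csv))
)
```

With real files, each `io.StringIO` object would be replaced by a file opened with `open(path, newline="")`. Before merging, it is worth confirming that the identifier is named and formatted identically in both files.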
Recommended reading for this dataset: CenSoc: Public Linked Administrative Mortality Records for Individual-level Research (especially Usage Notes section)
Related and Supplementary Files:
- Prelinked “demo” version of the CenSoc-DMF (download, codebook) with 42 thousand mortality records (~1% of the complete CenSoc-DMF dataset) and 20 mortality covariates from the 1940 census. While not conducive to high-resolution mortality research, this file is the easiest way to get started with the CenSoc-DMF dataset.
CenSoc-Numident (download, codebook)
The CenSoc-Numident dataset links the 1940 census to the National Archives’ public release of the Social Security Numident file (“NARA Numident”). Our linking strategy relies on first name, last name, year of birth, and place of birth. To link unmarried women, we use father’s last name as a proxy for women’s maiden name. We use the ABE fully automated linking approach developed by Abramitzky, Boustan, and Eriksson (2012, 2014, 2017). To work with this dataset, researchers must download the 1940 full-count Census sample from IPUMS USA and link it to the CenSoc-Numident on the HISTID variable.
Recommended reading for this dataset: CenSoc: Public Linked Administrative Mortality Records for Individual-level Research (especially Usage Notes section)
Related and Supplementary Files:
- Prelinked “demo” version of the CenSoc-Numident with 65 thousand mortality records (~1% of the complete CenSoc-Numident dataset) and 15 mortality covariates from the 1940 census (download, codebook). While not conducive to high-resolution mortality research, this file is the easiest way to get started with the CenSoc-Numident dataset.
- Supplemental geography file with granular place of birth and place of death variables (download, codebook). These include state of death, county of birth & death, and more. Users can attach the supplementary geography file to the CenSoc-Numident dataset using the HISTID variable.
- Sibship files that identify sibling groups in the CenSoc-Numident (download, codebook)
BUNMD (download, codebook)
The Berkeley Unified Numident Mortality Database (BUNMD) is a cleaned and harmonized version of the NARA Numident file. The BUNMD is a single standalone file composed of the most informative parts of the 60+ application, claim, and death files released by the National Archives. All records are linked by Social Security Number. Variables of interest include race, place of birth, state in which the Social Security card was applied for, and ZIP Code of residence at the time of death.
Recommended reading for this dataset: The Berkeley Unified Numident Mortality Database describes the original release and steps that were taken to create the BUNMD.
Related and Supplementary Files:
- The Supplemental Geography File (download, codebook) contains granular place of birth and place of death variables including state of death, county of birth & death, and more. Users can attach this file to the BUNMD using Social Security number.
- The BUNMD Cleaned Names File (download, codebook) provides cleaned and standardized names for decedents and their parents. Users can attach this file to the BUNMD using Social Security number.
- Sibship files that identify sibling groups in the BUNMD (download, codebook)
- Researchers interested in the raw NARA Numident data can use the version published by Anthony Wray, which is freely available to download at: https://doi.org/10.3886/E207202V1.
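Attaching the supplementary files to the BUNMD by Social Security number amounts to a left join: every BUNMD record is kept, and supplementary columns are added where a matching record exists. The sketch below illustrates this under stated assumptions. The function name `attach_by_ssn`, the key column name `ssn`, all other column names, and the sample values are hypothetical; the codebooks give the actual variable names.

```python
import csv
import io


def attach_by_ssn(bunmd_rows, *supplements, key="ssn"):
    """Left-join any number of supplementary tables onto BUNMD rows.

    Every BUNMD row is kept. Columns from a supplement are added only
    when that supplement has a row with the same key.
    """
    lookups = [{row[key]: row for row in supp} for supp in supplements]
    for row in bunmd_rows:
        out = dict(row)
        for lookup in lookups:
            out.update(lookup.get(row[key], {}))
        yield out


# Hypothetical stand-ins; the identifiers are deliberately fake.
bunmd_csv = io.StringIO("ssn,byear\n000000001,1912\n000000002,1920\n")
geography_csv = io.StringIO("ssn,death_county\n000000001,Alameda\n")
names_csv = io.StringIO("ssn,fname_clean\n000000001,JOHN\n")

merged = list(
    attach_by_ssn(
        csv.DictReader(bunmd_csv),
        csv.DictReader(geography_csv),
        csv.DictReader(names_csv),
    )
)
```

The `csv` module reads every field as text, which preserves any leading zeros in the identifier. The supplementary lookups are held in memory here, so for the full files a database or dataframe library may be more practical.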
Army Enlistment Records (download)
The CenSoc WWII Army Enlistment Dataset (codebook) is a cleaned and harmonized version of the National Archives and Records Administration’s Electronic Army Serial Number Merged File, ca. 1938 – 1946 (2002). It contains enlistment records for over 9 million men and women who served in the United States Army, including the Army Air Corps, Women’s Army Auxiliary Corps, and Enlisted Reserve Corps. It is a rich source of data on enlistee sociodemographic information, military service, and anthropometry.
We also publish links between men in the CenSoc WWII Army Enlistment Dataset, Social Security Administration mortality data, and the 1940 Census:
- The CenSoc Enlistment-Census-1940 file (codebook) links enlistment records to the complete 1940 Census, and may be merged with IPUMS census data using the HISTID identifier variable.
- The CenSoc Enlistment-Numident file (codebook) links enlistment records to the BUNMD, and the CenSoc Enlistment-DMF file (codebook) links enlistment records to the Social Security Death Master File. For enlistment records in the Enlistment-Numident and Enlistment-DMF datasets that have been independently and additionally linked to the 1940 Census, we include the HISTID identifier variable that can be used to merge the data with IPUMS census data.
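Because only some records in the Enlistment-Numident and Enlistment-DMF files carry a HISTID (fewer than half, according to the summary table on this page), it can help to separate census-linked records from the rest before merging with IPUMS data. The following sketch shows one way to do that. The function name `split_by_census_link`, the `serial_number` column, and the way a missing HISTID is encoded (an empty string or `NA`) are assumptions; the codebooks describe the actual coding.

```python
import csv
import io


def split_by_census_link(rows, key="HISTID", missing=("", "NA")):
    """Partition rows by whether they carry a usable census identifier."""
    linked, unlinked = [], []
    for row in rows:
        value = (row.get(key) or "").strip()
        (unlinked if value in missing else linked).append(row)
    return linked, unlinked


# Hypothetical stand-in for a linked enlistment file.
enlistment_csv = io.StringIO(
    "serial_number,HISTID\n"
    "S001,A1\n"
    "S002,\n"
    "S003,NA\n"
)

with_census, without_census = split_by_census_link(
    csv.DictReader(enlistment_csv)
)
```

Only the rows in `with_census` can be merged with IPUMS census data on HISTID. The rows in `without_census` still carry the enlistment and mortality information.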
Recommended reading for this dataset:
- The technical documentation provides more information on the creation of the CenSoc WWII Army Enlistment Dataset, variables, linked enlistment records, and choosing which enlistment dataset to use.
- If using linked mortality data, we recommend consulting the Usage Notes section of CenSoc: Public Linked Administrative Mortality Records for Individual-level Research for mortality estimation strategies.
A summary comparison of Enlistment datasets is presented in the table below:
Variable | WWII Army Enlistment Dataset | Enlistment-Census-1940 | Enlistment-Numident | Enlistment-DMF |
---|---|---|---|---|
Size | 9.0 million | 2.6 million | 1.7 million | 1.9 million |
All-Gender | ✓ | | | |
Linked to Social Security Mortality Records | | | ✓ | ✓ |
Linked to 1940 Census | | ✓ (all records) | ✓ (762 thousand records) | ✓ (779 thousand records) |
Mortality Coverage Window | Not applicable | Not applicable | 1988-2005 | 1975-2005 |
Social Security Application Covariates | | | ✓ | |