Download Data

To access and download data, please visit the CenSoc Data Download Page. Data is available free of charge to all users.

The CenSoc-DMF and CenSoc-Numident datasets described below contain a HISTID variable, which allows users to merge the CenSoc data with the complete-count 1940 Census. Researchers can download the public version of the complete-count 1940 Census from IPUMS-USA. Alternatively, researchers can access a restricted version of the complete-count 1940 Census (with full names and street addresses) with permission via a secure server.

We highly recommend referring to the Documentation page for more information on downloading and working with these data.

Please use the following citation when using CenSoc data:

Joshua R. Goldstein, Monica Alexander, Casey Breen, Andrea Miranda González, Felipe Menares, Maria Osborne, Mallika Snyder, and Ugur Yildirim. CenSoc Mortality File: Version 3.0. Berkeley: University of California, 2023.

Summary of Datasets

The CenSoc project disseminates three mortality datasets free of charge: the CenSoc-DMF, the CenSoc-Numident, and the BUNMD. A summary of the features of the different CenSoc datasets is presented in the table below:

| Variable | CenSoc-DMF | CenSoc-Numident | BUNMD |
| --- | --- | --- | --- |
| Males-only | | | |
| All-Gender | | | |
| Linked to Census Characteristics | | | |
| SS Application Covariates | | | |
| High Coverage of Deaths | 1975-2005 | 1988-2005 | 1988-2005 |
| Size | 4.7 Million | 7.0 Million | 50 Million |
| Demo Available | ✓ (42 thousand) | ✓ (65 thousand) | |
| WWII Army Enlistment Links Available | | | |
| Additional Place of Birth and Death Variables Available | | | |
Notes: SS Application Covariates means that additional covariates are available from Social Security applications, such as place of birth, place of death, race, gender, and parents’ names.

We also publish World War II era American Army enlistment records, both unlinked and linked to other data sources, which are described generally in this post and at the bottom of this page.

CenSoc-DMF (download, codebook)

The CenSoc-DMF dataset links the 1940 census to the Death Master File, a collection of over 83 million death records reported to the Social Security Administration. This matched file includes only men, because women’s surname changes at marriage make accurate linkage difficult. Our linking strategy relies on first name, last name, and year of birth. We use the ABE fully automated linking approach developed by Abramitzky, Boustan, and Eriksson (2012, 2014, 2017). To work with this dataset, researchers must download the 1940 full-count Census sample from IPUMS USA and link it to the CenSoc-DMF file on the HISTID variable.
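The HISTID link described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the project's own workflow: the in-memory CSV text stands in for the downloaded files, the mortality columns (`byear`, `dyear`) and census columns (`EDUC`, `INCWAGE`) are illustrative choices, and `read_rows` and `merge_on_histid` are hypothetical helper names. Only the `HISTID` key is taken from the description above.

```python
import csv
import io


def read_rows(text):
    """Parse CSV text into a list of dict rows."""
    return list(csv.DictReader(io.StringIO(text)))


def merge_on_histid(censoc_rows, census_rows):
    """Inner-join CenSoc mortality rows to census rows on HISTID."""
    census_by_id = {}
    for row in census_rows:
        key = row["HISTID"].strip()
        # Each person should appear once in the census extract.
        if key in census_by_id:
            raise ValueError(f"duplicate HISTID in census extract: {key}")
        census_by_id[key] = row

    merged = []
    for row in censoc_rows:
        census_row = census_by_id.get(row["HISTID"].strip())
        if census_row is not None:
            # Census covariates first, then the mortality variables.
            merged.append({**census_row, **row})
    return merged


# Toy stand-ins for the two files (values are made up).
censoc_csv = """HISTID,byear,dyear
AAA-001,1910,1985
AAA-003,1915,1999
"""
census_csv = """HISTID,EDUC,INCWAGE
AAA-001,8,1200
AAA-002,12,900
AAA-003,10,0
"""

linked = merge_on_histid(read_rows(censoc_csv), read_rows(census_csv))
for person in linked:
    print(person["HISTID"], person["EDUC"], person["dyear"])
```

The join is an inner join, so census records without a matched death record are dropped. For the full-count census, a data-frame or database tool would likely be more practical than plain dictionaries.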

A prelinked “demo” version of the file (download, codebook) is available with 42 thousand mortality records (~1% of the complete CenSoc-DMF dataset) and 20 mortality covariates from the 1940 census. While not conducive to high-resolution mortality research, this file is the easiest way to get started with the CenSoc-DMF dataset.

CenSoc-Numident (download, codebook)

The CenSoc-Numident dataset links the 1940 census to the National Archives’ public release of the Social Security Numident file (“NARA Numident”). Our linking strategy relies on first name, last name, year of birth, and place of birth. To link married women, we use father’s last name as a proxy for women’s maiden name. We use the ABE fully automated linking approach developed by Abramitzky, Boustan, and Eriksson (2012, 2014, 2017). To work with this dataset, researchers must download the 1940 full-count Census sample from IPUMS USA and link it to the CenSoc-Numident file on the HISTID variable.

A prelinked “demo” version of the file (download, codebook) is available with 65 thousand mortality records (~1% of the complete CenSoc-Numident dataset) and 15 mortality covariates from the 1940 census. While not conducive to high-resolution mortality research, this file is the easiest way to get started with the CenSoc-Numident dataset.

A supplemental file with granular place of birth and place of death variables is available for this file (download, codebook). These include state of death, county of birth & death, and more. Users can attach the supplementary geography file to the CenSoc-Numident dataset using the HISTID variable.

BUNMD (download, codebook)

The Berkeley Unified Numident Mortality Database (BUNMD) is a cleaned and harmonized version of the NARA Numident file. The BUNMD is a single standalone file composed of the most informative parts of the 60+ application, claim, and death files released by the National Archives. All records are linked by Social Security Number. Variables of interest include race, place of birth, state in which the Social Security card was applied for, and ZIP Code of residence at the time of death.

For more information, see the paper “The Berkeley Unified Numident Mortality Database,” which describes the original release and steps that were taken to create the BUNMD.

Two supplemental files are available for the BUNMD. The BUNMD Supplemental Geography File (download, codebook) contains granular place of birth and place of death variables including state of death, county of birth & death, and more. The BUNMD Cleaned Names File (download, codebook) provides cleaned and standardized names for decedents and their parents. Users can attach either supplementary file to the BUNMD using Social Security number.
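Attaching a supplemental file amounts to a left join on the Social Security number: every BUNMD record is kept, and supplemental columns are filled in where a match exists. The sketch below assumes the key column is named `ssn` and uses a made-up `death_state` column and placeholder Social Security numbers; `attach_supplement` is a hypothetical helper, and the codebooks give the actual variable names.

```python
import csv
import io


def read_rows(text):
    """Parse CSV text into a list of dict rows."""
    return list(csv.DictReader(io.StringIO(text)))


def attach_supplement(main_rows, supplement_rows, key="ssn"):
    """Left-join: keep every main row, add supplement columns on a key match."""
    supp_by_key = {row[key]: row for row in supplement_rows}
    supp_cols = []
    if supplement_rows:
        supp_cols = [col for col in supplement_rows[0] if col != key]

    out = []
    for row in main_rows:
        extra = supp_by_key.get(row[key], {})
        merged = dict(row)
        for col in supp_cols:
            # Unmatched records get an empty string for each supplement column.
            merged[col] = extra.get(col, "")
        out.append(merged)
    return out


# Toy stand-ins with placeholder identifiers (not real records).
bunmd_csv = """ssn,byear,dyear
000-00-0001,1910,1990
000-00-0002,1922,2001
000-00-0003,1905,1989
"""
geo_csv = """ssn,death_state
000-00-0001,California
000-00-0003,Ohio
"""

with_geo = attach_supplement(read_rows(bunmd_csv), read_rows(geo_csv))
for record in with_geo:
    print(record["ssn"], record["dyear"], record["death_state"] or "(no match)")
```

A left join is used here so that no mortality records are lost when the supplemental file lacks a matching row.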

Army Enlistment Records (download)

The CenSoc WWII Army Enlistment Dataset (codebook) is a cleaned and harmonized version of the National Archives and Records Administration’s Electronic Army Serial Number Merged File, ca. 1938 – 1946 (2002). It contains enlistment records for over 9 million men and women who served in the United States Army, including the Army Air Corps, Women’s Army Auxiliary Corps, and Enlisted Reserve Corps. It is a rich source of data on enlistee sociodemographic information, military service, and anthropometry.

We also publish links between men in the CenSoc WWII Army Enlistment Dataset, Social Security Administration mortality data, and the 1940 Census. The CenSoc Enlistment-Census-1940 file (codebook) links enlistment records to the complete 1940 Census, and may be merged with IPUMS census data using the HISTID identifier variable. The CenSoc Enlistment-Numident file (codebook) links enlistment records to the BUNMD, and the CenSoc Enlistment-DMF file (codebook) links enlistment records to the Social Security Death Master File. For enlistment records in the Enlistment-Numident and Enlistment-DMF datasets that have been independently and additionally linked to the 1940 Census, we include the HISTID identifier variable that can be used to merge the data with IPUMS census data. 
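Because only some Enlistment-Numident and Enlistment-DMF records carry a HISTID, it can help to separate census-linked records from the rest before merging with IPUMS data. The sketch below assumes a missing HISTID appears as an empty field or the token `NA`; that encoding, the `serial_number` and `dyear` columns, and the helper name `split_by_census_link` are illustrative assumptions rather than documented features of the files.

```python
import csv
import io

# Assumed encodings of a missing HISTID; check the codebook for the real one.
MISSING_TOKENS = {"", "NA"}


def read_rows(text):
    """Parse CSV text into a list of dict rows."""
    return list(csv.DictReader(io.StringIO(text)))


def has_histid(row):
    """True when the row carries a usable HISTID value."""
    value = (row.get("HISTID") or "").strip()
    return value not in MISSING_TOKENS


def split_by_census_link(rows):
    """Return (census-linked rows, rows without a census link)."""
    linked = [row for row in rows if has_histid(row)]
    unlinked = [row for row in rows if not has_histid(row)]
    return linked, unlinked


# Toy stand-in for an Enlistment-Numident extract (values are made up).
enlistment_csv = """serial_number,HISTID,dyear
E1,AAA-001,1990
E2,,1995
E3,NA,2001
"""

census_linked, mortality_only = split_by_census_link(read_rows(enlistment_csv))
print(len(census_linked), "records can be merged with IPUMS census data")
print(len(mortality_only), "records have mortality information only")
```

Only the first group can be merged with IPUMS census data on HISTID; the second group still carries the enlistment and mortality variables.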

For more information on the creation of the CenSoc WWII Army Enlistment Dataset, variables, linked enlistment records, and choosing which enlistment dataset to use, please refer to our technical documentation. A summary comparison of Enlistment datasets is presented below:

| Variable | WWII Army Enlistment Dataset | Enlistment-Census-1940 | Enlistment-Numident | Enlistment-DMF |
| --- | --- | --- | --- | --- |
| Size | 9.0 million | 2.6 million | 1.7 million | 1.9 million |
| All-Gender | | | | |
| Linked to Social Security Mortality Records | | | | |
| Linked to 1940 Census | | ✓ (all records) | ✓ (762 thousand records) | ✓ (779 thousand records) |
| Mortality Coverage Window | Not applicable | Not applicable | 1988-2005 | 1975-2005 |
| Social Security Application Covariates | | | | |