The CenSoc team is pleased to announce the release of the CenSoc WWII Army Enlistment data sets, which link the National Archives’ public release of WWII Army Enlistment Records to mortality data and the 1940 census.
What are these data sets?
The CenSoc-DMF-Enlistment and CenSoc-Numident-Enlistment files contain WWII-era Army enlistment data alongside data from the Social Security Death Master File and the National Archives NUMIDENT files, respectively. Each file also contains a unique historical identifier variable (HISTID) that allows users to link the data to the 1940 full-count census from IPUMS USA.
Enlistment records have been matched to mortality records and census records with the standard ABE record linkage algorithm developed by Abramitzky, Boustan, and Eriksson (2012, 2014, 2017, 2020). This is the same linking method used for existing CenSoc version 2.0 data files and is explained further in our release post.
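To make the linking logic concrete, here is a simplified Python sketch of the general ABE approach. It is not the CenSoc production code. It assumes names have already been standardized (the published method cleans names phonetically first, which is omitted here) and that both files record birthplace. The record fields `id`, `first`, `last`, `bpl`, and `byear` are hypothetical. In the sketch, records must be unique on name, birthplace, and birth year within their own file. Each record is then matched exactly, and the birth-year band widens to ±1 and then ±2 years only if no match is found. A record is discarded if a band yields more than one candidate. Matching runs in both directions, and only pairs found both ways are kept.

```python
from collections import Counter, defaultdict


def _key(rec, year):
    """Exact-match key: standardized names, birthplace, and a birth year."""
    return (rec["first"], rec["last"], rec["bpl"], year)


def _unique(records):
    """Keep only records whose (name, birthplace, birth year) is unique in their own file."""
    counts = Counter(_key(r, r["byear"]) for r in records)
    return [r for r in records if counts[_key(r, r["byear"])] == 1]


def _match_one_way(source, target):
    """Match each source record to target, widening the birth-year band 0, ±1, ±2."""
    index = defaultdict(list)
    for t in target:
        index[_key(t, t["byear"])].append(t["id"])
    matches = {}
    for r in source:
        for band in (0, 1, 2):
            years = {r["byear"] - band, r["byear"] + band}
            hits = [tid for y in years for tid in index.get(_key(r, y), [])]
            if len(hits) == 1:  # unique match in this band: accept it
                matches[r["id"]] = hits[0]
                break
            if len(hits) > 1:  # ambiguous: discard this record
                break
    return matches


def abe_match(file_a, file_b):
    """Link A to B and B to A, keeping only pairs found in both directions."""
    a, b = _unique(file_a), _unique(file_b)
    a_to_b = _match_one_way(a, b)
    b_to_a = _match_one_way(b, a)
    return {ia: ib for ia, ib in a_to_b.items() if b_to_a.get(ib) == ia}
```

For example, a man born in 1920 links to a record born in 1921 with the same name and birthplace. A man with two same-name candidates, one year on either side of his birth year, is dropped as ambiguous.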
WWII Army enlistment data come from the National Archives’ World War II Army Serial Number Merged File, a collection of about 9 million Army, Army Reserve Corps, and Women’s Auxiliary Army Reserve Corps enlistment records from ca. 1938–1947. A history of these data can be found in the National Archives’ Prologue magazine. Only men are included in the linked CenSoc enlistment data sets. We include a subset of the variables in the original Army enlistment file, covering military service (e.g., Army rank, term of service), personal characteristics (e.g., marital status, place of residence), body measurements (height and weight), and useful metadata (time and place of enlistment).
Codebooks, available on the CenSoc data page, provide a full description of all variables. Although most enlistment variables have been cleaned or partially cleaned, some data quality issues remain, particularly in the height and weight fields. These fields were used for multiple purposes (for instance, the Army General Classification Test score is recorded in the same columns as weight on the enlistment punch cards), and implausible values are common. The codebooks contain more information, and Ferrie et al. (2012) describe a strategy for distinguishing body weights from test scores.
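As a first pass, a simple range check can flag values that cannot plausibly be body measurements. The Python sketch below is hypothetical throughout. The column names, the assumption that weight is recorded in pounds and height in inches, and the bounds are illustrative choices. They are not values from the CenSoc codebooks or from Ferrie et al. (2012). A range check cannot separate test scores that happen to fall within a plausible weight range. That is the problem the Ferrie et al. strategy addresses.

```python
import math

# Hypothetical plausibility bounds, for illustration only. They are not taken
# from the CenSoc codebooks or from Ferrie et al. (2012).
WEIGHT_LB_RANGE = (90, 350)  # assumes weight is recorded in pounds
HEIGHT_IN_RANGE = (48, 84)   # assumes height is recorded in inches


def _to_number(value):
    """Return value as a float, or None if it is missing, non-numeric, or NaN."""
    try:
        x = float(value)
    except (TypeError, ValueError):
        return None
    return None if math.isnan(x) else x


def flag_implausible(row, weight_col="weight_enlistment", height_col="height_enlistment"):
    """Return the measurement columns whose values fall outside the assumed bounds.

    Missing or non-numeric values are not flagged here; handle them separately.
    """
    flagged = []
    for col, (low, high) in ((weight_col, WEIGHT_LB_RANGE), (height_col, HEIGHT_IN_RANGE)):
        x = _to_number(row.get(col))
        if x is not None and not low <= x <= high:
            flagged.append(col)
    return flagged
```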
How to get started
Download the file(s) you need; the CenSoc data page links to the most recent version of each. In the CenSoc-DMF-WW2-Army-Enlistment file, variables end in ‘_dmf’ or ‘_enlistment’. In the CenSoc-Numident-WW2-Army-Enlistment file, variables end in ‘_numident’ or ‘_enlistment’. These suffixes indicate the source of each variable: the CenSoc DMF, the CenSoc Numident, or the Army enlistment data.
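To separate the columns by source programmatically, one option is to sort the header names by suffix. The Python sketch below assumes the download is a CSV file. The file name in the usage comment and the example column names are hypothetical; only the suffixes come from the files described above.

```python
import csv

# Source suffixes used in the CenSoc enlistment files.
SUFFIXES = ("_dmf", "_numident", "_enlistment")


def read_header(path):
    """Return the column names from the first row of a CSV file."""
    with open(path, newline="") as f:
        return next(csv.reader(f))


def group_by_source(columns):
    """Map each source suffix to the columns that carry it; unsuffixed columns go to 'other'."""
    groups = {suffix: [] for suffix in SUFFIXES}
    groups["other"] = []
    for col in columns:
        for suffix in SUFFIXES:
            if col.endswith(suffix):
                groups[suffix].append(col)
                break
        else:  # no suffix matched (e.g., the HISTID identifier)
            groups["other"].append(col)
    return groups


# Hypothetical usage; the file name is an assumption, not the released file name:
# groups = group_by_source(read_header("censoc_dmf_enlistment.csv"))
```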
To add census covariates, download the desired variables from the 1940 full-count census via IPUMS USA and merge them onto the CenSoc file using the HISTID variable.
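If you are not working in a statistical package, here is a minimal Python sketch of that merge using only the standard library. It assumes both files are CSVs whose identifier column is named `HISTID`, with identically formatted values in each file. It also assumes each HISTID appears at most once in the census extract. The file names in the usage comment are hypothetical.

```python
import csv


def load_rows(path):
    """Read a CSV file into a list of dicts keyed by column name."""
    with open(path, newline="") as f:
        return list(csv.DictReader(f))


def merge_on_histid(censoc_rows, census_rows, key="HISTID"):
    """Left-join 1940 census columns onto CenSoc rows by the HISTID identifier.

    CenSoc rows without a census match are kept unchanged. If a column name
    appears in both inputs, the CenSoc value is kept.
    """
    census_by_id = {row[key]: row for row in census_rows}
    merged = []
    for row in censoc_rows:
        combined = dict(row)  # copy so the input rows are not modified
        for col, val in census_by_id.get(row[key], {}).items():
            combined.setdefault(col, val)
        merged.append(combined)
    return merged


# Hypothetical usage; both file names are assumptions:
# linked = merge_on_histid(load_rows("censoc_enlistment.csv"), load_rows("ipums_1940_extract.csv"))
```

A left join keeps every CenSoc record even when its HISTID is absent from your census extract, for example because the extract was restricted to a subsample.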
Mortality analyses with enlistment data
CenSoc Army Enlistment data sets are a rich source of information on the life course, mortality, and the military experience of WWII veterans. They can be used on their own to study mortality outcomes or supplemented with 1940 census data.
Unlike previous CenSoc data sets, these enlistment data sets are not weighted to align with a known population. Some factors that affect selection into military service, such as education and health, are also related to mortality, so results from these data may not generalize to the broader population. False matches may also introduce attenuation bias, because these records are linked with the standard ABE matching process rather than a conservative variant designed to minimize false matches. (Evidence of this bias in CenSoc data sets can be seen here.) When working with CenSoc enlistment data, consider both the population the records come from and the biases the linking process may introduce.
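To see why false matches attenuate estimates, consider a toy simulation in Python. It assumes that a false match pairs one person's covariate with an unrelated person's outcome, so the covariate and the mismatched outcome are independent. The sample size, slope, and false-match rate are made-up values, not estimates for CenSoc data. Under that assumption, false matches shrink the covariance between covariate and outcome by the share of false matches, while the covariate's variance is unchanged. An OLS slope is therefore pulled toward zero by roughly a factor of (1 − false-match rate).

```python
import random


def ols_slope(x, y):
    """Slope from a simple least-squares regression of y on x."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def simulate(n=20000, beta=-0.5, false_rate=0.2, seed=1):
    """Return (slope with correct links, slope with some outcomes falsely linked)."""
    rng = random.Random(seed)
    x = [rng.gauss(0, 1) for _ in range(n)]
    y = [beta * xi + rng.gauss(0, 1) for xi in x]
    # A false match gives record i the outcome of a randomly chosen record
    # (occasionally record i itself, which is negligible for large n).
    y_linked = list(y)
    for i in range(n):
        if rng.random() < false_rate:
            y_linked[i] = y[rng.randrange(n)]
    return ols_slope(x, y), ols_slope(x, y_linked)
```

With the default values, the first slope should land near -0.5 and the second near -0.5 × (1 − 0.2) = -0.4.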
Story by Maria Osborne. For questions, please email censoc@berkeley.edu.