October 20th, 2023 (Version 3.0)

Improved Weights

  • We use mortality data from the National Center for Health Statistics to create person-level poststratification weights for records in the CenSoc-Numident and CenSoc-DMF files. These weights are computed using age at death, year of death, sex, race, and state of birth. Previous versions of the CenSoc-Numident and CenSoc-DMF data used Human Mortality Database mortality data to construct weights by sex and Lexis triangle. The new weights account for racial and geographic disparities in coverage that may interact with year/age coverage trends. The calculation of weights is covered extensively in our technical documentation.
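
As a rough illustration of cell-based poststratification, the sketch below divides the population death count in each cell by the number of linked records in that cell. It is a minimal sketch under stated assumptions, not the CenSoc procedure, which is described in the technical documentation. The function name, field names, and all counts are hypothetical.

```python
from collections import Counter

# Hypothetical field names for the five weighting dimensions named above.
KEYS = ("death_age", "death_year", "sex", "race", "birth_state")


def poststratification_weights(sample_records, population_counts, keys=KEYS):
    """Return one weight per record: population deaths in the record's cell
    divided by the number of sample records in the same cell."""
    def cell_of(rec):
        return tuple(rec[k] for k in keys)

    sample_counts = Counter(cell_of(rec) for rec in sample_records)
    weights = []
    for rec in sample_records:
        cell = cell_of(rec)
        pop = population_counts.get(cell)
        # A cell without a population total cannot be weighted; mark it None.
        weights.append(None if pop is None else pop / sample_counts[cell])
    return weights


# Made-up linked records and made-up population death counts.
sample = [
    {"death_age": 80, "death_year": 1995, "sex": "F", "race": "White", "birth_state": "OH"},
    {"death_age": 80, "death_year": 1995, "sex": "F", "race": "White", "birth_state": "OH"},
    {"death_age": 75, "death_year": 1990, "sex": "M", "race": "Black", "birth_state": "GA"},
]
population = {
    (80, 1995, "F", "White", "OH"): 10,
    (75, 1990, "M", "Black", "GA"): 6,
}

weights = poststratification_weights(sample, population)
```

In this toy example the weights sum to the total number of population deaths across the covered cells, which is the property poststratification is meant to deliver.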

Simplified links

  • CenSoc 3.0 contains the same links between 1940 Census and mortality records as Version 2.1. However, in Version 2.1 files, we published both “standard” and “conservative” links established using the ABE matching algorithm. “Standard” matches comprise all links established by the basic ABE algorithm, which matches on name, birthplace (if present in both datasets), and birth year, allowing birth years to differ by up to 2 years. “Conservative” matches are a subset of standard matches, and require matches on name/birthplace to be unique within and between datasets for a +/- 2 year interval around birth year. CenSoc 3.0 files include only conservative matches, as these are likely to be of higher quality than standard links and contain fewer false matches. Publishing only conservative links increases data quality and simplifies the user experience, as researchers no longer need to choose which type of links to use.
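
The difference between the two link types can be sketched as follows. This is a simplified, one-directional illustration, not the ABE implementation used for CenSoc: it assumes names are already standardized, treats birthplace as always present, and does not repeat the match from the mortality side. All function names, field names, and records are hypothetical.

```python
def candidates(record, pool, max_gap):
    """Records in `pool` with the same name and birthplace whose birth year
    is within `max_gap` years of `record`'s birth year."""
    return [
        r for r in pool
        if r["name"] == record["name"]
        and r["bpl"] == record["bpl"]
        and abs(r["birth_year"] - record["birth_year"]) <= max_gap
    ]


def standard_match(record, other_pool):
    """Widen the birth-year band (0, then 1, then 2 years) and accept the
    first band holding exactly one candidate; give up on a tie."""
    for gap in (0, 1, 2):
        found = candidates(record, other_pool, gap)
        if len(found) == 1:
            return found[0]
        if len(found) > 1:
            return None
    return None


def conservative_match(record, own_pool, other_pool):
    """Keep a standard match only if name/birthplace is unique within
    +/- 2 years in both the record's own dataset and the other dataset."""
    match = standard_match(record, other_pool)
    if match is None:
        return None
    # `own_pool` contains `record` itself, so exactly one hit means unique.
    if len(candidates(record, own_pool, 2)) != 1:
        return None
    if len(candidates(record, other_pool, 2)) != 1:
        return None
    return match


# Made-up records for illustration only.
census = [
    {"id": "c1", "name": "john smith", "bpl": "OH", "birth_year": 1910},
    {"id": "c2", "name": "mary jones", "bpl": "NY", "birth_year": 1915},
]
deaths = [
    {"id": "d1", "name": "john smith", "bpl": "OH", "birth_year": 1910},
    {"id": "d2", "name": "john smith", "bpl": "OH", "birth_year": 1912},
    {"id": "d3", "name": "mary jones", "bpl": "NY", "birth_year": 1916},
]

standard_ids = [m["id"] if (m := standard_match(c, deaths)) else None for c in census]
conservative_ids = [
    m["id"] if (m := conservative_match(c, census, deaths)) else None for c in census
]
```

Here the first census record has a standard link (one exact birth-year match) but no conservative link, because a second death record with the same name and birthplace falls inside the +/- 2 year window.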

May 19th, 2022 (Version 2.1)

Improved Links

  • In Version 2.0, links between 1940 Census and mortality records (Numident or DMF) were established based on close agreement on birth year. However, the IPUMS birth year (BIRTHYR) reports the difference between 1940 and age at the time of the census, rather than the exact birth year. In Version 2.1, matches were instead established based on age at census (i.e., 1940 minus birth year for those whose birth month is January through March; otherwise, 1939 minus birth year).
  • We corrected an error where marital status of women in Version 2.0 was measured by the MARRNO variable, which doesn’t cover all women in 1940. In Version 2.1, females are matched using MARST instead.
  • We matched to the most recent (2021) version of the IPUMS 1940 Census (Version 2.0 was linked to the 2019 version of the Census).
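
The age-at-census rule from the first item above can be written out directly. The sketch below is only an illustration of that rule; the function name is hypothetical, and it assumes birth month is known and coded 1 through 12.

```python
def age_at_1940_census(birth_year, birth_month):
    """Age at census as described above: 1940 minus birth year for January
    through March births, otherwise 1939 minus birth year."""
    if not 1 <= birth_month <= 12:
        raise ValueError("birth_month must be between 1 and 12")
    reference_year = 1940 if birth_month <= 3 else 1939
    return reference_year - birth_year


age_early = age_at_1940_census(1900, 2)  # born February 1900
age_late = age_at_1940_census(1900, 7)   # born July 1900
```

Two people born in the same calendar year can therefore have different ages at census, which is why matching on age at census differs from matching on a birth year derived as 1940 minus age.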

November 1st, 2020 (Version 2.0)

New linking methodology

  • We have updated our record linkage methodology for CenSoc Version 2.0. Specifically, we use the ABE method developed by Abramitzky, Boustan, and Eriksson (2012, 2014, 2017, 2020).
  • We implemented the standard and conservative variants of the ABE method on exact, standardized names. This allows researchers to test the robustness of their results across samples.

Expanded Variables

  • link_abe_exact_conservative indicates whether a record was linked using the conservative version of the ABE algorithm for the CenSoc-Numident and CenSoc-DMF files
  • weight_conservative: weight to HMD totals for the conservative subset of matches for CenSoc-Numident and CenSoc-DMF files
  • socstate_string and bpl_string variables now available for CenSoc-Numident and BUNMD files

Revised Variables

  • socstate: a more exhaustive coding scheme was used to match a Social Security number to the state in which it was issued, reducing the number of missing values