The CenSoc team is pleased to announce the release of the CenSoc-Numident and CenSoc-DMF version 3.0 datasets, which link the 1940 Census to Social Security mortality records. This version of the data contains new person weights calculated using NCHS data.

New Statistical Weights

In the version 3.0 CenSoc-Numident and CenSoc-DMF data, we construct inverse inclusion-probability weights that scale CenSoc deaths to National Center for Health Statistics (NCHS) Multiple Cause-of-Death totals. The Multiple Cause-of-Death mortality files consist of microdata compiled from death certificates by state vital statistics offices, covering nearly all deaths occurring within the United States. We use this information to weight CenSoc deaths to population totals by race and birthplace, in addition to year of death, age at death, and sex. Previous versions of the CenSoc-Numident and CenSoc-DMF datasets were weighted to Human Mortality Database (HMD) totals of deaths by year of birth, year of death, age at death, and sex. Compared with the HMD weights, weighting to NCHS data has two benefits:

  • Weighting by race and birthplace in conjunction with age and year of death helps correct for systematic disparities in coverage by geography and demographic characteristics, which may occur because of differential matching or the absence of certain death records from public Social Security Administration mortality data. This adjustment is especially important if the true distribution of ages at death for certain groups is not accurately represented in Social Security or linked CenSoc data.
  • Previous weights did not account for post-1940 immigration. CenSoc data capture only persons present in the United States on census day in 1940, but HMD death totals include individuals who migrated to the US after that date. With the new weights, native-born groups are weighted only to native-born population death totals, and weights for non-native-born people are calculated separately.
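
As a rough illustration of the general idea behind inverse inclusion-probability weighting, the sketch below computes a weight for each cell as the reference death total divided by the number of linked deaths in that cell. It is a minimal sketch on hypothetical toy data, not the CenSoc procedure: the function name `cell_weights`, the cell definition, and all counts are assumptions made for illustration, and the actual weights involve adjustments described in the technical documentation.

```python
from collections import Counter

def cell_weights(sample_cells, population_totals):
    """Return a weight per cell: reference deaths / sample deaths.

    Hypothetical helper for illustration only. Cells with no
    reference total are skipped rather than guessed.
    """
    sample_counts = Counter(sample_cells)
    weights = {}
    for cell, n_sample in sample_counts.items():
        n_pop = population_totals.get(cell)
        if n_pop is None:
            continue  # no reference total available for this cell
        weights[cell] = n_pop / n_sample
    return weights

# Toy data (invented). Each cell is a tuple of
# (year of death, age at death, sex, race, birthplace).
sample_cells = [
    (1990, 75, "F", "White", "Minnesota"),
    (1990, 75, "F", "White", "Minnesota"),
    (1990, 75, "F", "Black", "Alabama"),
]
population_totals = {
    (1990, 75, "F", "White", "Minnesota"): 10,
    (1990, 75, "F", "Black", "Alabama"): 20,
}

weights = cell_weights(sample_cells, population_totals)
# Every person in a cell receives that cell's weight.
person_weights = [weights[cell] for cell in sample_cells]
```

In this toy example the person weights sum to the total number of reference deaths across the cells, which is the defining property of weighting a sample to population totals.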

While weighting to NCHS totals is advantageous in these respects, this weighting strategy presents difficulties in some cases, such as deaths before 1979, for which birthplace data are not present in NCHS mortality files. We recommend that researchers refer to the CenSoc 3.0 technical documentation for more information on how weights are calculated and adjusted.

Mortality Analyses

In many cases, we expect results obtained using the new weights to align closely with those obtained using previous versions of the weights or unweighted data. For example, an OLS regression of longevity on educational attainment using CenSoc-Numident data shows that results are largely robust to the choice of weights, though point estimates for the lowest education groups are larger in magnitude using the new weights:

Figure 1: Association between educational attainment and longevity after age 65 in years for American-born men born 1905-1915 using CenSoc-Numident data. All models use middle school completion as the reference group and include cohort fixed effects. “HMD” weights refer to the previous version of CenSoc weights, while “NCHS” weights are those included in the new data version 3.0 release.

It is possible, however, to encounter unexpected results when using unweighted CenSoc data. As an example, the figure below shows the association between state of birth and age at death for women born from 1915 to 1920 in select states, relative to those born in Alabama. Modeling this relationship without weights implies that women born in high life-expectancy states such as Minnesota have longevity similar to or even worse than women born in Alabama. Using HMD weights from CenSoc 2.1 data, which weight by sex and Lexis triangle only, the estimates are virtually identical. When the new NCHS weights that account for birthplace and race are used, however, we see that women from Minnesota, Washington, Michigan, and Massachusetts all live longer than those from Alabama.

Figure 2: Relationship between state of birth and longevity after age 65 in years for select states for women born 1915-1920; CenSoc-Numident data. All models use Alabama as the reference group and include cohort fixed effects. “HMD” weights refer to the previous version of CenSoc weights, while “NCHS” weights are those included in the new data version 3.0 release.
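
To make concrete how weights can change a comparison of this kind, the sketch below computes the gap in mean age at death between each group and a reference group, with and without person weights. It is a simplified stand-in for the models in the figure: it omits the cohort fixed effects, the function name `weighted_mean_gaps` is hypothetical, and the ages and weights are invented for illustration rather than taken from CenSoc data.

```python
from collections import defaultdict

def weighted_mean_gaps(records, reference):
    """Gap in weighted mean age at death relative to a reference group.

    `records` is an iterable of (group, age_at_death, weight) tuples.
    Hypothetical helper; ignores cohort fixed effects.
    """
    sum_wy = defaultdict(float)  # weighted sum of ages per group
    sum_w = defaultdict(float)   # sum of weights per group
    for group, age, weight in records:
        sum_wy[group] += weight * age
        sum_w[group] += weight
    means = {g: sum_wy[g] / sum_w[g] for g in sum_w}
    return {g: m - means[reference] for g, m in means.items() if g != reference}

# Invented toy records: (state of birth, age at death, person weight).
records = [
    ("Alabama", 80.0, 1.0),
    ("Alabama", 70.0, 3.0),    # under-covered group, so a larger weight
    ("Minnesota", 72.0, 1.0),
    ("Minnesota", 82.0, 1.0),
]

# Unweighted comparison: every record counts equally.
unweighted = weighted_mean_gaps([(g, a, 1.0) for g, a, _ in records], "Alabama")
# Weighted comparison: records count in proportion to their weights.
weighted = weighted_mean_gaps(records, "Alabama")
```

In this toy example, up-weighting the under-covered early deaths in the reference group widens the estimated gap, which is the kind of shift that coverage-correcting weights can produce.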

Story by Maria Osborne. For inquiries, contact censoc@berkeley.edu