Visualizing Mortality Differentials by ZIP Code

This tutorial will walk through the steps of visualizing differences in \(e_{65}\) — life expectancy conditional on surviving to age 65 — by ZIP Code using the Berkeley Unified Numident Mortality Database

Specifically, this vignette will walk through the steps of (1) calculating differences by ZIP Code and (2) creating a choropleth map in R using the choroplethr package. Choropleths are thematic maps where each geographic region is colored or shaded according to some metric.

Though this analysis focuses on the New York City borough of Manhattan, the approach can be replicated for other regions to similar effect.

Geography in BUNMD

There are several geographic variables in the Berkeley Unified Numident Mortality Database (BUNMD).

bpl variable reports birthplace. The variable is coded on the state-level for persons born in the United States and country level for those born outside the United States.
zip_residence variable reports the 9-digit ZIP Code of residence at the time of death for a portion of records. ZIP Codes, while not the most robust geographic unit of analysis, can offer insights into a variety of spatial questions.

## library packages
library(tidyverse)
library(choroplethrZip)
library(choroplethr)
library(broom)
library(viridis)
library(data.table)

## Read in data
bunmd <- fread("/data/josh/CenSoc/censoc_data/bunmd_v2/bunmd_v2.csv")

Preparing Data for Analysis

We will look at the subsample of the BUNMD with high death coverage — deaths occurring between 1988-2005. For our analysis, we will focus on 5-digit ZIP Codes, rather than the full 9-digit ZIP Codes.

## Read ZIP Codes
data("zip.regions")
 
## Filter to  ZIP Codes in manhattan
county <- c("new york")

manhattan.zipcodes <- zip.regions %>%   
  filter(state.name == "new york") %>% 
  filter(county.name %in% county)

## Select first 5 digits of zip_residence variable
## filter to only include high coverage Manhattan ZIP Codes
bunmd.manhattan <- 
  bunmd %>% 
  mutate(zip5 = as.numeric(substr(zip_residence, 1, 5))) %>% 
  filter(zip5 %in% manhattan.zipcodes$region) %>% 
  filter(byear %in% c(1910:1919)) %>% 
  filter(dyear %in% 1988:2005)

## prepare for regression
bunmd.manhattan <- bunmd.manhattan %>% 
  mutate(byear = as.factor(byear)) %>% 
  mutate(byear = relevel(byear, ref = "1910")) %>% 
  mutate(zip5 = as.factor(zip5)) %>% 
  mutate(zip5 = relevel(zip5, ref = "10463"))

Ordinary Least Squares Regression

Regression on age at death is an easy and effective way to analyze the BUNMD mortality data. Regression coefficients tell the effect of covariates on the mean age at death. Because we are only observing a narrow window of deaths, the left and right truncation ages vary by cohort. It is important to include fixed effect terms for each year of birth.

Because each birth cohort has a different age of left and right truncation, it is important to include fixed effect terms for each year of birth. Models of the form:

\[ Age\_at\_death = birth\_year\_dummy + covariates\_of\_interest\]

provide estimates of the effect of the covariates on the age of death in the sample, controlling for birth cohort truncation effects.

The code below first fits a linear regression with fixed-effect terms for each year of birth. It then extracts the regression coefficients for each ZIP code.

Note: When running the regression, we weight each observation via the ccweight variable — a post-stratification weight to Human Mortality Database death totals by age, period, and sex.

## Linear model predicting age at death from ZIP and byear 
manhattan.lm <- lm(death_age ~ zip5 +  byear,
                     data = bunmd.manhattan,
                     weight = ccweight) 

## Put model results into a data.frame 
manhattan.lm.df <- tidy(manhattan.lm)

## Select coefficients and ZIP Codes
manhattan.lm.df <- manhattan.lm.df %>%
  select(term, estimate) %>% 
  filter(str_detect(term, "zip")) %>% 
  mutate(zip = substr(term, 5, 9)) %>% 
  select(region = zip, value = estimate)

Choropleth Map

The choroplethr package provides helpful functions to simplify the construction of choropleth maps. The code below constructs a choropleth map where each ZIP code in Manhattan is colored according to its coefficient differentials.

## Plot Manhattan

manhattan.fip = 36061
choro = ZipChoropleth$new(rev(manhattan.lm.df))
choro$title = "Manhattan E(65) Differentials by ZIP Code"
choro$set_zoom_zip(state_zoom = NULL, county_zoom = manhattan.fip, msa_zoom = NULL, zip_zoom = NULL)
manhattan.plot <- choro$render() +
    theme(text=element_text(size=15)) + 
  scale_fill_viridis(option="magma", discrete = "true", name="Difference in E65")

manhattan.plot

Interpretation

When interpreting these mortality differentials, it is important to remember that we are only observing deaths for a truncated window. This truncation excludes the tails of the distribution and reduces the average difference between groups, likely creating a downward bias on the estimated effects of any covariates.

Please see our our BUNMD paper for more information.