This tutorial will walk through the steps of visualizing differences in \(e_{65}\) — life expectancy conditional on surviving to age 65 — by ZIP Code using the Berkeley Unified Numident Mortality Database
Specifically, this vignette will walk through the steps of (1)
calculating differences by ZIP Code and (2) creating a choropleth map in
R using the choroplethr
package. Choropleths are thematic
maps where each geographic region is colored or shaded according to some
metric.
Though this analysis focuses on the New York City borough of Manhattan, the approach can be replicated for other regions to similar effect.
There are several geographic variables in the Berkeley Unified Numident Mortality Database (BUNMD).
bpl
variable reports birthplace. The variable is
coded on the state-level for persons born in the United States and
country level for those born outside the United States.
zip_residence
variable reports the 9-digit ZIP Code
of residence at the time of death for a portion of records. ZIP Codes,
while not the most robust geographic unit of analysis, can offer
insights into a variety of spatial questions.
## library packages
library(tidyverse)
library(choroplethrZip)
library(choroplethr)
library(broom)
library(viridis)
library(data.table)
## Read in data
<- fread("/data/josh/CenSoc/censoc_data/bunmd_v2/bunmd_v2.csv") bunmd
We will look at the subsample of the BUNMD with high death coverage — deaths occurring between 1988-2005. For our analysis, we will focus on 5-digit ZIP Codes, rather than the full 9-digit ZIP Codes.
## Read ZIP Codes
data("zip.regions")
## Filter to ZIP Codes in manhattan
<- c("new york")
county
<- zip.regions %>%
manhattan.zipcodes filter(state.name == "new york") %>%
filter(county.name %in% county)
## Select first 5 digits of zip_residence variable
## filter to only include high coverage Manhattan ZIP Codes
<-
bunmd.manhattan %>%
bunmd mutate(zip5 = as.numeric(substr(zip_residence, 1, 5))) %>%
filter(zip5 %in% manhattan.zipcodes$region) %>%
filter(byear %in% c(1910:1919)) %>%
filter(dyear %in% 1988:2005)
## prepare for regression
<- bunmd.manhattan %>%
bunmd.manhattan mutate(byear = as.factor(byear)) %>%
mutate(byear = relevel(byear, ref = "1910")) %>%
mutate(zip5 = as.factor(zip5)) %>%
mutate(zip5 = relevel(zip5, ref = "10463"))
Regression on age at death is an easy and effective way to analyze the BUNMD mortality data. Regression coefficients tell the effect of covariates on the mean age at death. Because we are only observing a narrow window of deaths, the left and right truncation ages vary by cohort. It is important to include fixed effect terms for each year of birth.
Because each birth cohort has a different age of left and right truncation, it is important to include fixed effect terms for each year of birth. Models of the form:
\[ Age\_at\_death = birth\_year\_dummy + covariates\_of\_interest\]
provide estimates of the effect of the covariates on the age of death in the sample, controlling for birth cohort truncation effects.
The code below first fits a linear regression with fixed-effect terms for each year of birth. It then extracts the regression coefficients for each ZIP code.
Note: When running the regression, we weight each observation via the
ccweight
variable — a post-stratification weight to Human
Mortality Database death totals by age, period, and sex.
## Linear model predicting age at death from ZIP and byear
<- lm(death_age ~ zip5 + byear,
manhattan.lm data = bunmd.manhattan,
weight = ccweight)
## Put model results into a data.frame
<- tidy(manhattan.lm)
manhattan.lm.df
## Select coefficients and ZIP Codes
<- manhattan.lm.df %>%
manhattan.lm.df select(term, estimate) %>%
filter(str_detect(term, "zip")) %>%
mutate(zip = substr(term, 5, 9)) %>%
select(region = zip, value = estimate)
The choroplethr
package provides helpful functions to
simplify the construction of choropleth maps. The code below constructs
a choropleth map where each ZIP code in Manhattan is colored according
to its coefficient differentials.
## Plot Manhattan
= 36061
manhattan.fip = ZipChoropleth$new(rev(manhattan.lm.df))
choro $title = "Manhattan E(65) Differentials by ZIP Code"
choro$set_zoom_zip(state_zoom = NULL, county_zoom = manhattan.fip, msa_zoom = NULL, zip_zoom = NULL)
choro<- choro$render() +
manhattan.plot theme(text=element_text(size=15)) +
scale_fill_viridis(option="magma", discrete = "true", name="Difference in E65")
manhattan.plot
When interpreting these mortality differentials, it is important to remember that we are only observing deaths for a truncated window. This truncation excludes the tails of the distribution and reduces the average difference between groups, likely creating a downward bias on the estimated effects of any covariates.
Please see our our BUNMD paper for more information.