Abstract
Longevity differs strikingly across sociodemographic groups in the United States. Yet can sociodemographic characteristics meaningfully explain individual-level variation in longevity? Here, we apply machine-learning algorithms to a large-scale administrative dataset (N = 122,651) to predict individual-level longevity from an array of social, economic, and demographic predictors. Our top-performing model explains only 1.4% of the variation in age at death, demonstrating that human longevity is highly unpredictable from sociodemographic characteristics alone. These results underscore the limitations of using machine learning to predict major life outcomes and emphasize the need to better account for stochastic processes in demographic theory.