HISTID variable.
Note: The 1940 census file is large (10+ GB). We recommend having an appropriate workflow for handling large datasets in R before getting started.
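One simple way to keep memory use down is to read only the columns you need. The sketch below uses base R and a small toy file in place of the real extract; the column names other than HISTID and RACE are illustrative assumptions, not a statement of what your extract contains.

```r
# Toy stand-in for a large IPUMS extract (the real file is 10+ GB).
csv_path <- tempfile(fileext = ".csv")
toy <- data.frame(
  HISTID = c("A1", "B2", "C3"),
  RACE   = c(1L, 2L, 1L),
  SEX    = c(1L, 1L, 2L),
  AGE    = c(34L, 51L, 27L),
  stringsAsFactors = FALSE
)
write.csv(toy, csv_path, row.names = FALSE)

# Read a single row just to learn the column names.
header <- names(read.csv(csv_path, nrows = 1))

# A colClasses entry of "NULL" tells read.csv to skip that column entirely.
col_classes <- rep("NULL", length(header))
col_classes[header == "HISTID"] <- "character"
col_classes[header == "RACE"]   <- "integer"

census_small <- read.csv(csv_path, colClasses = col_classes)
```

For files of this size, many users prefer `data.table::fread()`, whose `select` argument serves the same purpose and is typically much faster than `read.csv()`.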
Download the CenSoc-DMF or CenSoc-Numident file from: https://censoc-download.demog.berkeley.edu
Whether the CenSoc-DMF or CenSoc-Numident file is a better choice for your analysis will depend on the research question. See the data page for more information.
The CenSoc datasets link the 1940 Census to Social Security mortality records (the Death Master File or the Numident).
IPUMS provides integrated census and survey data from across the world free of charge to the broader research community. To access the IPUMS-USA data collection, you first need to register.
Once you have an account, go to https://usa.ipums.org/usa/ and under ‘CREATE YOUR CUSTOM DATA SET’ click the ‘GET DATA’ button.
Select the 1940 Full Count Census:
This will take you back to the variable selection page.
All extracts include HISTID by default. This is the variable used to link the census file to the CenSoc file.
Choose variables for your analysis. For example, you could select RACE, which is under PERSONAL → RACE, ETHNICITY, AND NATIVITY.
The IPUMS “select cases” feature allows users to conditionally choose which cases to include in an extract. This can be helpful if you are only interested in a subset of the Census. For example, if you are working with the CenSoc-DMF file, which includes only men, it makes sense to restrict your extract to men only.
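If an extract was not restricted at download time, the same restriction can be applied afterwards in R. This is a minimal sketch on toy data. It assumes the extract includes the SEX variable and that it follows the IPUMS coding of 1 = male and 2 = female; check the codebook for your extract before relying on it.

```r
# Toy census extract; HISTID values here are made up for illustration.
census <- data.frame(
  HISTID = c("A1", "B2", "C3", "D4"),
  SEX    = c(1L, 2L, 1L, 2L),
  stringsAsFactors = FALSE
)

# Keep men only (assumed coding: SEX == 1 is male).
census_men <- census[census$SEX == 1L, ]
```

Restricting cases in the extract itself is still preferable for the full-count file, since it reduces the download size.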
To work with IPUMS data in R, it is usually easiest to download the data as a CSV file. To do this, on the EXTRACT REQUEST page, next to ‘DATA FORMAT’, click Change, select ‘Comma delimited (.csv)’ and click the submit button.
You can work with other formats in R as well, but CSV is generally the easiest. The only downside is that variable values are stored as numeric codes without labels. The ipumsr package helps assign variable labels, value labels, and more.
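To illustrate what a value label is, the sketch below attaches labels to numeric RACE codes by hand with base R. The three code-to-label pairs shown are an illustrative subset and should be checked against the IPUMS codebook; the ipumsr package can apply the full set of labels from the extract's metadata instead.

```r
# Toy extract with numeric RACE codes, as they appear in a CSV download.
census <- data.frame(
  HISTID = c("A1", "B2", "C3", "D4"),
  RACE   = c(1L, 2L, 1L, 3L),
  stringsAsFactors = FALSE
)

# Illustrative subset of codes and labels (verify against the codebook).
race_labels <- c(
  "1" = "White",
  "2" = "Black/African American",
  "3" = "American Indian or Alaska Native"
)

# Store the labelled version alongside the original numeric code.
census$RACE_label <- factor(
  census$RACE,
  levels = as.integer(names(race_labels)),
  labels = unname(race_labels)
)
```

Codes that are not listed in `race_labels` become `NA` in the labelled column, which makes an incomplete mapping easy to spot.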
Once you are happy with your dataset, click the ‘SUBMIT EXTRACT’ button. Because it is full count data, you will need to agree to special usage terms. Click OK to extract the dataset.
Given the size of the file, the processing may take several hours. Once the file is ready, you will receive an email from IPUMS with a link to download the resulting dataset. As all IPUMS datasets come compressed, the data needs to be uncompressed before it can be used. For more information on IPUMS extracts, please see IPUMS-USA.
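A gzip-compressed CSV can also be read in R without uncompressing it on disk first. The sketch below assumes the download is a `.csv.gz` file and uses a toy file created on the fly; the file path and columns are hypothetical.

```r
# Create a small gzip-compressed CSV to stand in for the IPUMS download.
gz_path <- tempfile(fileext = ".csv.gz")
con <- gzfile(gz_path, "w")
write.csv(
  data.frame(HISTID = c("A1", "B2"), RACE = c(1L, 2L)),
  con,
  row.names = FALSE
)
close(con)

# gzfile() decompresses on the fly, so no separate unzip step is needed.
census <- read.csv(
  gzfile(gz_path),
  colClasses = c(HISTID = "character", RACE = "integer")
)
```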
After downloading the 1940 Census and CenSoc files, the files must be merged before analysis. The HISTID variable, available in both the CenSoc and Census files, can be used to merge the two datasets.
Sample R code:
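The following is a minimal sketch of the merge step using base R and toy data rather than the real files. The HISTID values and the `byear` and `dyear` columns are illustrative assumptions; with the real data you would first read in the CenSoc file and your IPUMS extract.

```r
# Toy stand-in for the IPUMS 1940 census extract.
census <- data.frame(
  HISTID = c("A1", "B2", "C3"),
  RACE   = c(1L, 2L, 1L),
  stringsAsFactors = FALSE
)

# Toy stand-in for a CenSoc file (column names are illustrative).
censoc <- data.frame(
  HISTID = c("B2", "C3", "Z9"),
  byear  = c(1905L, 1912L, 1899L),
  dyear  = c(1980L, 1995L, 1977L),
  stringsAsFactors = FALSE
)

# Inner join: keep only people who appear in both files.
merged <- merge(censoc, census, by = "HISTID")
```

An inner join is used here because only records present in both files carry census covariates and mortality information. For the full-count file, a faster join from a package such as data.table or dplyr may be a better fit.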