This post goes over how to download and use Census and ACS data from within R using the tidycensus and acs packages. When possible, I would suggest using the tidycensus package, as its functions are much easier to work with and provide cleaner output. Unfortunately, the tidycensus package does not cover the entirety of the census geographies, and that is when we must use the acs package. Both packages access the same data, so results should not differ if you decide to use one package over the other.
library(acs)
library(jsonlite)
library(stringr)
library(knitr)
library(DT)
library(sf)
library(tidycensus)
library(tidyverse)
# api key needs to be your own ACS api key!
# you can get one here http://api.census.gov/data/key_signup.html
apiKey <- read_json("~/Documents/PopDataDemo/keys/acs.json")
census_api_key(apiKey)
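Since census_api_key() expects the key as a character string and stores it in the CENSUS_API_KEY environment variable, one alternative to a JSON file is to read the key from that environment variable yourself. The helper below is a minimal sketch of that idea; the name get_census_key is hypothetical and not part of tidycensus, and it assumes the variable was set beforehand (for example in ~/.Renviron).

```r
# Hypothetical helper (not part of tidycensus): fetch the key from an
# environment variable and fail early with a clear message if it is missing.
get_census_key <- function(env_var = "CENSUS_API_KEY") {
  key <- Sys.getenv(env_var, unset = "")
  if (!nzchar(key)) {
    stop("No Census API key found in environment variable ", env_var,
         call. = FALSE)
  }
  key
}

# Usage, assuming the environment variable is already set:
# census_api_key(get_census_key())
```

Failing early like this keeps a missing key from surfacing later as a less obvious error in the download step.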
View Variables for a particular Census/5 year ACS using tidycensus
Trying to figure out exactly what you want from the census data for any given year is probably the most tedious part of navigating the Census and ACS data. There is a lot of information in there, and its structure is not exactly intuitive. Let's say that you want to look at data from the 2014 five-year ACS and you want to find household income data broken down by race. We can start by downloading the variable name sheet, which has on the order of tens of thousands of variables, and navigate from there. Using the interactive table code below, I can narrow my search results a bit by typing “income race” in the search box (a working demo is found here). For this demo I am going to use the B19013 median household income variables.
v14acs <- load_variables(2014, "acs5", cache = TRUE)
## Search interactively with this code
# v14acs %>%
#   datatable(style = "bootstrap")
v14acs %>%
  filter(grepl("B19013", name)) %>%
  as.data.frame() %>%
  head() %>%
  kable()
name | label | concept |
---|---|---|
B19013_001 | Estimate!!Median household income in the past 12 months (in 2014 Inflation-adjusted dollars) | MEDIAN HOUSEHOLD INCOME IN THE PAST 12 MONTHS (IN 2014 INFLATION-ADJUSTED DOLLARS) |
B19013A_001 | Estimate!!Median household income in the past 12 months (in 2014 Inflation-adjusted dollars) | MEDIAN HOUSEHOLD INCOME IN THE PAST 12 MONTHS (IN 2014 INFLATION-ADJUSTED DOLLARS) (WHITE ALONE HOUSEHOLDER) |
B19013B_001 | Estimate!!Median household income in the past 12 months (in 2014 Inflation-adjusted dollars) | MEDIAN HOUSEHOLD INCOME IN THE PAST 12 MONTHS (IN 2014 INFLATION-ADJUSTED DOLLARS) (BLACK OR AFRICAN AMERICAN ALONE HOUSEHOLDER) |
B19013C_001 | Estimate!!Median household income in the past 12 months (in 2014 Inflation-adjusted dollars) | MEDIAN HOUSEHOLD INCOME IN THE PAST 12 MONTHS (IN 2014 INFLATION-ADJUSTED DOLLARS) (AMERICAN INDIAN AND ALASKA NATIVE ALONE HOUSEHOLDER) |
B19013D_001 | Estimate!!Median household income in the past 12 months (in 2014 Inflation-adjusted dollars) | MEDIAN HOUSEHOLD INCOME IN THE PAST 12 MONTHS (IN 2014 INFLATION-ADJUSTED DOLLARS) (ASIAN ALONE HOUSEHOLDER) |
B19013E_001 | Estimate!!Median household income in the past 12 months (in 2014 Inflation-adjusted dollars) | MEDIAN HOUSEHOLD INCOME IN THE PAST 12 MONTHS (IN 2014 INFLATION-ADJUSTED DOLLARS) (NATIVE HAWAIIAN AND OTHER PACIFIC ISLANDER ALONE HOUSEHOLDER) |
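The names in the table follow a pattern: a table ID (B19013), an optional race or ethnicity iteration letter (A through I), and a line number (001), sometimes followed by E for an estimate or M for a margin of error. The sketch below splits names of that shape into their parts. The helper parse_acs_variable is hypothetical, and its regular expression is an assumption that covers only the names used in this post; other ACS tables use shapes it does not handle (for example Puerto Rico tables ending in PR).

```r
# Race/ethnicity iteration letters used as table suffixes (A through I).
race_iterations <- c(
  A = "White alone",
  B = "Black or African American alone",
  C = "American Indian and Alaska Native alone",
  D = "Asian alone",
  E = "Native Hawaiian and Other Pacific Islander alone",
  F = "Some other race alone",
  G = "Two or more races",
  H = "White alone, not Hispanic or Latino",
  I = "Hispanic or Latino"
)

# Hypothetical helper: split variable names such as "B19013I_001E"
# into table ID, iteration letter, line number, and E/M suffix.
parse_acs_variable <- function(x) {
  pattern <- "^([A-Z][0-9]{5})([A-I]?)_([0-9]{3})([EM]?)$"
  parts <- regmatches(x, regexec(pattern, x))
  if (any(lengths(parts) == 0)) {
    stop("Unrecognised variable name", call. = FALSE)
  }
  mat <- do.call(rbind, parts)
  iteration <- mat[, 3]
  data.frame(
    variable = mat[, 1],
    table = mat[, 2],
    iteration = iteration,
    group = unname(ifelse(iteration == "",
                          "Total (no race iteration)",
                          race_iterations[iteration])),
    line = mat[, 4],
    type = mat[, 5],
    stringsAsFactors = FALSE
  )
}

parse_acs_variable(c("B19013_001", "B19013I_001E"))
```

Read this way, B19013I_001 is line 001 of the B19013 table restricted to Hispanic or Latino householders.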
To get the household income values, we need to build a vector of the variables we want from the ACS, following the naming structure shown in the table above. We create it below, and when we call the ACS API using get_acs we ask for these values at the county level along with the geometries so that we can map the data.
# the variables that I want in my download
incomeVars <- paste0("B19013", c("", LETTERS[1:9]), "_001E")
# download the data; the capture.output wrapper isn't necessary, it just
# suppresses the printed output
t <- capture.output(rawIncomeDF <- get_acs(
  geography = "county", # I want county-level data
  variables = incomeVars, # I want the variables from this list
  year = 2014, # from the 2014 ACS
  geometry = TRUE, cache = TRUE)) # I also want the geometry data for mapping later
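The download returns one row per county and variable, with the variable name stored without the trailing E. The sketch below mimics the non-spatial columns of that tidy output with a tiny hand-made data frame, so the filtering logic can be tried without an API key. All numbers are invented for illustration, and the interval calculation assumes the margins of error are at tidycensus's default 90 percent level.

```r
# Made-up rows shaped like the non-spatial part of a tidy get_acs() result.
mockIncomeDF <- data.frame(
  GEOID = c("06001", "06001", "48001"),
  NAME = c("Alameda County, California",
           "Alameda County, California",
           "Anderson County, Texas"),
  variable = c("B19013_001", "B19013I_001", "B19013I_001"),
  estimate = c(70000, 55000, 40000), # illustrative values, not real data
  moe = c(1000, 2500, 3000),         # illustrative values, not real data
  stringsAsFactors = FALSE
)

# Keep only the Hispanic householder variable for California counties.
hispanicCA <- subset(
  mockIncomeDF,
  variable == "B19013I_001" & endsWith(NAME, ", California")
)

# Estimate plus or minus the margin of error gives an approximate interval.
hispanicCA$lower <- hispanicCA$estimate - hispanicCA$moe
hispanicCA$upper <- hispanicCA$estimate + hispanicCA$moe
hispanicCA
```

Matching on the ", California" suffix of NAME is the same approach used on the real data in this post.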
hispanicSPDF <- rawIncomeDF %>% # take my sf structure
  filter(variable == "B19013I_001") %>% # isolate Hispanic income data
  filter(endsWith(NAME, ", California")) %>% # only look at CA
  st_zm() %>% # drop the Z/M dimensions (the third dimension is probably altitude)
  as("Spatial") # convert to a spatial data frame
hispanicSPDF$id <- row.names(hispanicSPDF) # add a new column id from row names
hispanicSPDF %>%
  fortify() %>% # make a regular data frame that's plottable
  left_join(hispanicSPDF@data, by = "id") %>% # merge the original values back on
  ggplot(aes(x = long, y = lat)) +
  geom_polygon(aes(group = group, fill = estimate)) +
  geom_path(aes(group = group), size = .1) +
  scale_fill_distiller(palette = "Spectral", direction = 1) +
  theme_void() +
  ggtitle("Hispanic Median Household Income")
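Recent versions of ggplot2 can also draw sf objects directly with geom_sf, which would skip the conversion to a Spatial object and the fortify step. The sketch below shows the idea on stand-in data, the North Carolina counties shapefile that ships with the sf package, because it needs no API key. The BIR74 column is only a placeholder for an ACS estimate column.

```r
library(sf)
library(ggplot2)

# Stand-in data: the North Carolina counties shapefile bundled with sf.
nc <- st_read(system.file("shape/nc.shp", package = "sf"), quiet = TRUE)

# geom_sf reads the geometry column itself, so no x/y aesthetics are needed.
ncMap <- ggplot(nc) +
  geom_sf(aes(fill = BIR74)) +
  scale_fill_distiller(palette = "Spectral", direction = 1) +
  theme_void() +
  ggtitle("BIR74 by county (stand-in for an ACS estimate)")

# print(ncMap) draws the map
```

For the data in this post, the same pattern would presumably apply to the filtered sf object before it is converted, with fill = estimate.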
The tidycensus package is great, but it isn't comprehensive, and sometimes we need to rely on the acs package, which basically provides an R interface to the ACS API. This allows us to access geographies such as school districts. Below is a demo of using the acs package to pull some data, after which we can plot the distribution of Hispanic household incomes by school district.
# make the unified and secondary geographies separately
schoolsCA <- list(
  unified = geo.make(
    state = "CA",
    school.district.unified = "*"),
  secondary = geo.make(
    state = "CA",
    school.district.secondary = "*")
)
# pull both data sets
schoolResults <- lapply(schoolsCA, function(x) {
  acs.fetch(
    endyear = 2014,
    span = 5,
    geography = x,
    variable = str_sub(incomeVars, 1, -2), # drop the trailing "E"
    key = apiKey)
})
# clean and merge them
schoolIncomeDF <- bind_rows(
  schoolResults$unified@estimate %>%
    as.data.frame() %>%
    mutate(`School District` = row.names(.)),
  schoolResults$secondary@estimate %>%
    as.data.frame() %>%
    mutate(`School District` = row.names(.)))
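The cleaning step relies on the estimate slot being a matrix whose row names are the geography names and whose column names are the variable codes. The sketch below reproduces that reshaping on a small hand-made matrix in base R. The district names and values are invented, and the matrix layout is an assumption inferred from how the code above uses the slot.

```r
# Made-up matrix laid out like the estimate slot: one row per school
# district, one column per variable (values are illustrative only).
mockEstimate <- matrix(
  c(52000, 61000, 48000, 57000),
  nrow = 2,
  dimnames = list(
    c("Example Unified School District, California",
      "Sample Union High School District, California"),
    c("B19013_001", "B19013I_001")
  )
)

# Convert to a data frame and move the row names into a regular column.
mockDF <- as.data.frame(mockEstimate)
mockDF$`School District` <- row.names(mockDF)
row.names(mockDF) <- NULL
mockDF
```

Copying the names into a regular column keeps the district labels attached to the values when the unified and secondary results are stacked.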
# plot the distribution
schoolIncomeDF %>%
  ggplot(aes(x = B19013I_001)) +
  geom_density() +
  theme_classic() +
  xlab("Income in Dollars") +
  ylab("Density") +
  ggtitle("Hispanic Household Income Across California School Districts")
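Income distributions tend to be right-skewed, so a log-scaled x axis can make the shape easier to read. The sketch below shows one way to do that with scale_x_log10. It uses simulated incomes rather than the downloaded data, and the log-normal parameters are arbitrary assumptions chosen only to produce plausible-looking values.

```r
library(ggplot2)

# Simulated incomes standing in for the school district estimates.
set.seed(1)
mockSchoolDF <- data.frame(
  B19013I_001 = rlnorm(200, meanlog = log(50000), sdlog = 0.4)
)

# Same density plot as above, with the x axis on a log10 scale.
logPlot <- ggplot(mockSchoolDF, aes(x = B19013I_001)) +
  geom_density() +
  scale_x_log10() +
  theme_classic() +
  xlab("Income in Dollars (log scale)") +
  ylab("Density")

# print(logPlot) draws the plot
```

With the real data, districts that have no estimate would be dropped from the density with a warning.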