Many GIS analysts work with American Community Survey (ACS) data from the US Census Bureau. This data is based on a sample of the population, so, like all survey data, it contains sampling error.
The Census Bureau publishes a margin of error (MOE) with every estimate, which is an incredible resource. However, when users see a large MOE for a specific measure in a specific tract, they may erroneously dismiss the entire ACS dataset. Even more unfortunately, many users don't know that aggregating ACS data geographically, a step many analyses already take, can noticeably increase its reliability.
[To learn more about MOEs, read “The Importance of Margins of Error and Mapping” in the summer 2021 issue of ArcUser.]
The Census Bureau can publish estimates down to the census tract and even block group levels by aggregating data across years: pooling five years of data makes estimates possible at these fine geographic levels. Just as data can be coarsened across time, it can also be coarsened across space.
While this may seem like heresy to many geographers, there are benefits to coarsening, or aggregating, data geographically. Many people need data disaggregated by race/ethnicity, gender, income, and other dimensions, and these breakdowns often describe small populations with large MOEs. Coarsening can make this data more reliable.
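To see why aggregation helps, here is a minimal Python sketch with hypothetical numbers. It assumes the approximation commonly used for sums of ACS estimates (the MOE of a sum is the square root of the sum of the squared MOEs) and computes the coefficient of variation from a 90% MOE as (MOE / 1.645) / estimate × 100. The tract values are made up for illustration.

```python
import math


def cv_percent(estimate, moe):
    """CV (%) from a 90% MOE: standard error = MOE / 1.645."""
    return (moe / 1.645) / estimate * 100


def aggregate(tracts):
    """Sum the estimates; approximate the MOE as the root sum of squared MOEs."""
    total = sum(t["estimate"] for t in tracts)
    moe = math.sqrt(sum(t["moe"] ** 2 for t in tracts))
    return total, moe


# Four hypothetical tracts with identical, fairly unreliable counts.
tracts = [{"estimate": 100, "moe": 80} for _ in range(4)]

single_cv = cv_percent(tracts[0]["estimate"], tracts[0]["moe"])
total, moe = aggregate(tracts)
combined_cv = cv_percent(total, moe)
print(f"One tract:   CV = {single_cv:.1f}%")
print(f"Four tracts: estimate = {total}, MOE = {moe:.1f}, CV = {combined_cv:.1f}%")
```

With four identical tracts, the estimate quadruples while the approximate MOE only doubles, so the CV falls by half, from about 48.6% to about 24.3%.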
Some cities have gone through robust validation processes to create their own geographies that are coarser than census tracts. For example, New York City has its Neighborhood Tabulation Areas, and Houston has its Super-Neighborhoods. You can also test your own groupings of tracts. Call them anything you like; I've been calling them super tracts.
The ACS Summarization App
A lightweight app, ACS Summary Statistics with Margin of Error, is available at no charge to help you get a quick sense of how many tracts you'll need to aggregate to get an estimate that meets your desired level of reliability. The left panel of the app lets you search for and select layers and attributes. If you select a count, the tract centroids are symbolized by size; if you select a percentage, they are symbolized by color.
Use the sketch tools at the top of the map to create your super tract. Start by drawing a rough shape around the neighborhood or district you know you want to analyze; the goal at this stage is simply to get the basic shape on the map.
The right panel of the app shows summary statistics for the chosen layers and attributes. The gauge shows the coefficient of variation (CV) for the selection, calculated on the fly. [CV expresses the standard error as a percentage of the estimate.] Watch the gauge change as you modify your sketch. The app characterizes the reliability of the estimate as high, medium, or low based on the CV: the lower the CV, the higher the reliability, and the higher the CV, the lower the reliability.
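The CV behind the gauge can be sketched as follows. ACS MOEs are published at the 90% confidence level, so the standard error is the MOE divided by 1.645, and the CV is that standard error as a percentage of the estimate. The reliability cutoffs here (below 12% high, 12% to 40% medium, above 40% low) are an assumption based on thresholds commonly used with Esri's ACS layers; the app's exact rules may differ.

```python
def coefficient_of_variation(estimate, moe, z=1.645):
    """Return the CV as a percentage.

    ACS MOEs are published at the 90% confidence level, so the standard
    error is MOE / 1.645 and CV = SE / estimate * 100.
    """
    if estimate == 0:
        return float("inf")  # CV is undefined when the estimate is zero
    return (moe / z) / estimate * 100


def reliability(cv):
    """Classify a CV using ASSUMED cutoffs of 12% and 40%."""
    if cv < 12:
        return "high"
    if cv <= 40:
        return "medium"
    return "low"


# Hypothetical (estimate, MOE) pairs.
for estimate, moe in [(1200, 150), (250, 120), (40, 45)]:
    cv = coefficient_of_variation(estimate, moe)
    print(f"estimate={estimate}, MOE={moe}: CV={cv:.1f}% ({reliability(cv)})")
```

The same MOE means very different things depending on the size of the estimate, which is why the gauge reports a relative measure rather than the raw MOE.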
Best Practices for Tract Summarization
When aggregating tracts to improve reliability, try to
- Preserve patterns in the map.
- Be mindful of tracts with an estimate of zero.
- Use official estimates when available.
Preserve Patterns in the Map
Try to follow the patterns in the map when combining tracts. For example, combine high values with other high values and low values with other low values. The symbology in this app can help, but your own local, on-the-ground knowledge is invaluable here. I realize this is hard when aggregating just a few tracts while balancing other considerations. However, combining areas with wildly different characteristics smooths out the numbers and makes the result less informative.
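To illustrate how mixing dissimilar tracts blurs a pattern, here is a small Python sketch with hypothetical counts. It aggregates a percentage by summing numerators and denominators across tracts rather than averaging the tract percentages.

```python
def pooled_percent(tracts):
    """Aggregate a percentage by summing numerators and denominators."""
    numerator = sum(t["num"] for t in tracts)
    denominator = sum(t["den"] for t in tracts)
    return numerator / denominator * 100


# Hypothetical tracts: num = people in the group of interest, den = total.
high_a = {"num": 450, "den": 1000}  # 45%
high_b = {"num": 520, "den": 1300}  # 40%
low_c = {"num": 60, "den": 1200}    # 5%

similar = pooled_percent([high_a, high_b])  # two high-value tracts
mixed = pooled_percent([high_a, low_c])     # a high tract plus a low tract
print(f"similar: {similar:.1f}%, mixed: {mixed:.1f}%")
```

Pooling the two high-value tracts gives about 42.2%, which still describes both places. Pooling a 45% tract with a 5% tract gives about 23.2%, a value that describes neither.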
Be Mindful of Tracts with an Estimate of Zero
Tracts with an estimate of zero are symbolized by transparent teal symbols (for counts). These tracts are generally in places such as airports, cemeteries, and open land, but they can be anywhere. Even zero estimates have MOEs, which means a few members of your population of interest may be present in these tracts. Such tracts add nothing to your estimate, but because their MOE is nonzero, they can add slightly to your error.
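Here is a minimal sketch of how zero-estimate tracts can enter the MOE of a sum. It follows our reading of the approximation guidance in Understanding and Using American Community Survey Data: when several zero estimates are summed, include only the largest of their MOEs rather than all of them. The tract values are hypothetical.

```python
import math


def moe_of_sum(tracts):
    """Approximate the MOE of a sum of tract estimates.

    Nonzero estimates contribute their squared MOEs. Per our reading of the
    Census Bureau's guidance, only the largest MOE among zero estimates is
    included, so many zero tracts don't each add to the error.
    """
    squares = sum(t["moe"] ** 2 for t in tracts if t["estimate"] != 0)
    zero_moes = [t["moe"] for t in tracts if t["estimate"] == 0]
    if zero_moes:
        squares += max(zero_moes) ** 2
    return math.sqrt(squares)


# Hypothetical tracts; the last two have zero estimates but nonzero MOEs.
tracts = [
    {"estimate": 300, "moe": 90},
    {"estimate": 250, "moe": 80},
    {"estimate": 0, "moe": 13},
    {"estimate": 0, "moe": 17},
]

total = sum(t["estimate"] for t in tracts)
print(f"Without zero tracts: {moe_of_sum(tracts[:2]):.1f}")
print(f"With zero tracts:    {moe_of_sum(tracts):.1f} (estimate still {total})")
```

The two zero-estimate tracts leave the estimate at 550 but raise the approximate MOE from about 120.4 to about 121.6.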
Use Official Estimates when Available
In addition, this aggregation method gives you only an approximation. The MOE is approximated using the formulas in Understanding and Using American Community Survey Data, and the estimate covers the tracts you selected rather than an exact boundary. The Census Bureau produces official estimates for school districts, incorporated places, county subdivisions, congressional districts, and many other areas. If you're aggregating up to get values for a defined census geography, such as a city boundary, check data.census.gov for an official estimate you can use. The official estimate uses the true boundary, and its MOE is calculated directly rather than approximated, which often results in a lower MOE than you'd get from aggregating tracts.
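For percentages built from aggregated counts, the approximation formulas work in two steps: combine the numerator and denominator MOEs with the root-sum-of-squares rule, then apply the proportion formula MOE_p = (1/Y) × sqrt(MOE_X^2 - p^2 × MOE_Y^2), where X is the numerator and Y the denominator. If the value under the square root is negative, the ratio formula (with a plus sign) is used instead. This Python sketch uses hypothetical tract counts and is not the app's own code.

```python
import math


def moe_of_sum(moes):
    """Approximate MOE of a sum: square root of the summed squared MOEs."""
    return math.sqrt(sum(m ** 2 for m in moes))


def proportion_with_moe(num, num_moe, den, den_moe):
    """Return (p, MOE of p) for p = num / den, where num is a subset of den."""
    p = num / den
    under_root = num_moe ** 2 - p ** 2 * den_moe ** 2
    if under_root < 0:
        # Fall back to the ratio formula when the proportion formula fails.
        under_root = num_moe ** 2 + p ** 2 * den_moe ** 2
    return p, math.sqrt(under_root) / den


# Hypothetical tracts: numerator and denominator estimates with their MOEs.
num_est, num_moes = [120, 95, 140], [45, 40, 50]
den_est, den_moes = [900, 850, 1100], [110, 95, 120]

p, p_moe = proportion_with_moe(
    sum(num_est), moe_of_sum(num_moes), sum(den_est), moe_of_sum(den_moes)
)
print(f"{p:.1%} +/- {p_moe:.1%}")
```

For these made-up tracts the aggregated percentage is about 12.5%, with an approximate MOE of about 2.6 percentage points.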
Reliability and the Nature of the Estimate
Adjust your reliability comfort level depending on the nature of the estimate. Ideally, you would like all estimates to have high reliability with a low coefficient of variation. However, this will be hard to achieve for very small populations, such as these:
- Children in the care of grandparents
- Female veterans
- Rental housing units that are mainly heated by solar energy
In cases like these, ask yourself whether you could live with a medium level of reliability if it meant keeping finer geographic detail. Also, neighboring tracts can have different levels of reliability for the same attribute. When a tract already has a reliable estimate, there's no need to aggregate it and lose its geographic detail.
Start with the ACS Summarization App
The app is designed to give you a jumping-off point for aggregating tracts. It helps you iterate quickly and get a sense of how far you'll have to aggregate geographically to meet your reliability comfort level. Note that the super tract you sketch does not persist outside the app.
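The iterate-until-reliable idea can be sketched in Python. This is not how the app works internally; it assumes you have already ordered candidate tracts (for example, a seed tract followed by the neighbors you would add next), uses the plain root-sum-of-squares MOE approximation, and stops once the aggregate CV reaches a target you choose. Tract IDs and values are hypothetical.

```python
import math


def cv_percent(estimate, moe):
    """CV (%) from a 90% MOE; infinite when the estimate is zero."""
    return (moe / 1.645) / estimate * 100 if estimate else float("inf")


def grow_until_reliable(ordered_tracts, target_cv=40.0):
    """Add tracts in order until the aggregate CV is at or below target_cv.

    Returns the IDs of the tracts used and the final CV. If the target is
    never met, every tract is used and the last CV is returned.
    """
    chosen, total, moe_squares = [], 0, 0.0
    cv = float("inf")
    for tract in ordered_tracts:
        chosen.append(tract["id"])
        total += tract["estimate"]
        moe_squares += tract["moe"] ** 2  # root-sum-of-squares approximation
        cv = cv_percent(total, math.sqrt(moe_squares))
        if cv <= target_cv:
            break
    return chosen, cv


# Hypothetical seed tract "A" followed by neighbors in the order you'd add them.
candidates = [
    {"id": "A", "estimate": 60, "moe": 55},
    {"id": "B", "estimate": 80, "moe": 60},
    {"id": "C", "estimate": 70, "moe": 50},
    {"id": "D", "estimate": 90, "moe": 65},
]
used, cv = grow_until_reliable(candidates, target_cv=40.0)
print(used, f"{cv:.1f}%")
```

With a 40% target, the sketch stops after two tracts at a CV of about 35.3%; a stricter target would pull in more neighbors and give up more geographic detail.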
Create your final super tract using geoprocessing tools such as Merge and Dissolve. The app uses tract centroids for faster performance, but for this step you'll likely want the polygon versions of these layers, which contain the tract boundaries.
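If you tag each tract polygon with a super tract ID before dissolving, the attribute math for counts can be sketched as below. One approach, assuming you handle the MOE yourself, is to store each tract's squared MOE, sum it by super tract alongside the estimates, and take the square root at the end. The field names and values here are hypothetical.

```python
import math
from collections import defaultdict

# Hypothetical attribute rows: each tract is tagged with the super tract
# it belongs to, plus one ACS count and its 90% MOE.
rows = [
    {"tract": "T1", "super_tract": "North", "estimate": 410, "moe": 95},
    {"tract": "T2", "super_tract": "North", "estimate": 380, "moe": 88},
    {"tract": "T3", "super_tract": "South", "estimate": 150, "moe": 70},
    {"tract": "T4", "super_tract": "South", "estimate": 220, "moe": 75},
]


def summarize_by_super_tract(rows):
    """Sum estimates and squared MOEs per super tract; return (estimate, MOE)."""
    estimates = defaultdict(float)
    moe_squares = defaultdict(float)
    for row in rows:
        key = row["super_tract"]
        estimates[key] += row["estimate"]
        moe_squares[key] += row["moe"] ** 2
    # The approximate MOE of each sum is the square root of its summed squares.
    return {key: (estimates[key], math.sqrt(moe_squares[key])) for key in estimates}


for name, (estimate, moe) in summarize_by_super_tract(rows).items():
    print(f"{name}: {estimate:.0f} +/- {moe:.0f}")
```

Summing the MOEs directly would overstate the error; the squares must be summed first and the square root taken only once per super tract.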
Estimates of medians are not included in the app, because aggregating medians requires the full distribution of values, not just the median for each tract. For the same reason, some entire layers are excluded, such as Median Age, since every attribute in that layer is a median.
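A toy Python example shows why tract medians can't simply be combined. The values are hypothetical individual-level data for two tracts; neither the simple average of the tract medians nor a population-weighted average matches the median of the pooled values.

```python
import statistics

# Hypothetical individual-level values in two tracts.
tract_a = [10, 20, 30]
tract_b = [40, 45, 50, 95, 100, 105, 110]

median_a = statistics.median(tract_a)
median_b = statistics.median(tract_b)

# The true median of the combined area needs every value, not just two medians.
pooled = statistics.median(tract_a + tract_b)
mean_of_medians = (median_a + median_b) / 2
weighted = (len(tract_a) * median_a + len(tract_b) * median_b) / (
    len(tract_a) + len(tract_b)
)
print(pooled, mean_of_medians, weighted)
```

Here the pooled median is 47.5, while the average of the tract medians is 57.5 and the weighted average is 72.5.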
The estimates update every year with new values, since they use ArcGIS Living Atlas of the World layers that are updated annually. Remember that an aggregation that meets your reliability requirement now may not hold in future data releases.