Geocoding is a fundamental GIS process for plotting your data on the map. It is often the first step in applying GIS to understand the “where.” The accuracy of the plotted points directly affects the success of downstream decisions. Regardless of the GIS application you choose, you will make decisions based on analysis, and that analysis is only as reliable as the geocoding behind it.
To ensure geocoding accuracy, consider these five questions:
- How do you define accuracy and why does it matter?
- What are the benefits of using a solution built with authoritative, commercial street data versus a low-cost or free solution that uses OpenStreetMap and Census TIGER data?
- Which match levels does your current geocoding solution support, and what is the benefit of cascading to get the best match at the highest accuracy?
- What is the difference between an Address Point, a Delivery Point, and an Address Range match?
- What do the statuses ‘Match,’ ‘Tie,’ and ‘Unmatch’ mean, and what does the score indicate?
Defining Accuracy
Many people confuse accuracy and precision, and the two terms are sometimes wrongly treated as interchangeable.
Here are the correct definitions. Accuracy describes how close a measurement is to the true value. Precision describes how closely a set of results agree with one another. The graphic below shows the difference between the two.
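To make the two definitions concrete, the following minimal sketch scores a set of repeated geocodes of one address against a known true location. It assumes planar (x, y) coordinates in meters and uses two illustrative metrics chosen for this example: the accuracy error is the distance from the mean of the results to the true point, and the precision spread is the average distance of the results from their own mean. The sample points are made up.

```python
import math
from statistics import mean

Point = tuple[float, float]  # (x, y) in meters; planar coordinates assumed


def centroid(points: list[Point]) -> Point:
    """Mean position of a set of results."""
    return (mean(p[0] for p in points), mean(p[1] for p in points))


def accuracy_error(points: list[Point], truth: Point) -> float:
    """Distance from the mean result to the true location (lower = more accurate)."""
    return math.dist(centroid(points), truth)


def precision_spread(points: list[Point]) -> float:
    """Average distance of each result from the results' own mean (lower = more precise)."""
    c = centroid(points)
    return mean(math.dist(p, c) for p in points)


truth: Point = (0.0, 0.0)
# Made-up samples: tightly clustered about 100 m east of the truth (precise, not accurate).
precise_not_accurate = [(100.0, 0.0), (101.0, 1.0), (99.0, -1.0), (100.0, 0.0)]
# Made-up samples: scattered evenly around the truth (accurate on average, not precise).
accurate_not_precise = [(40.0, 0.0), (-40.0, 0.0), (0.0, 40.0), (0.0, -40.0)]

for name, pts in [("precise, not accurate", precise_not_accurate),
                  ("accurate, not precise", accurate_not_precise)]:
    print(f"{name}: accuracy error {accuracy_error(pts, truth):.1f} m, "
          f"spread {precision_spread(pts):.1f} m")
```

The first set has a small spread but a large error, and the second has the reverse, which is exactly the distinction the definitions draw.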
In geocoding, depending on your solution, you can obtain results that are highly accurate and highly precise, results with low accuracy and low precision, or anything in between. Accurate and precise geocoding results are essential for many use cases, as discussed in our previous blog, because they shape the decisions stakeholders make from downstream analysis. Many factors influence accuracy and precision, as the following sections discuss.
Street Reference Datasets and Accuracy
To match a spatial coordinate to your address record, you need a street reference dataset that contains addresses and their associated coordinates. The software uses this dataset as a reference when matching each input address to a coordinate. There are three options for street reference data:
- Publicly available sources, such as US Census TIGER and local or national government addressing data.
- A commercial street dataset, such as those from HERE and TomTom.
- A street dataset assembled from contributions by the vendor’s own user community (e.g., Google).
Each of these sources has pros and cons related to accuracy. How do you determine which is right for you?
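At its simplest, the matching step described above is a lookup: standardize the input address, then find it in the reference dataset. The sketch below is a toy under stated assumptions: the reference table, its addresses, and its coordinates are made up, and the normalizer only uppercases, strips punctuation, and abbreviates a few street types. Real geocoders parse addresses into components and use fuzzy matching, but even this toy shows why coverage of the reference data matters.

```python
import re
from typing import Optional

# Hypothetical reference dataset: normalized address -> (longitude, latitude).
# Addresses and coordinates are made up for illustration.
REFERENCE = {
    "123 MAPLE ST SPRINGFIELD": (-89.65, 39.78),
    "45 OAK AVE RIVERTON": (-108.38, 43.02),
}

# A handful of street-type abbreviations; a real normalizer would use a full list.
ABBREVIATIONS = {"STREET": "ST", "AVENUE": "AVE", "ROAD": "RD", "BOULEVARD": "BLVD"}


def normalize(address: str) -> str:
    """Uppercase, replace punctuation with spaces, and abbreviate street types."""
    tokens = re.sub(r"[^\w\s]", " ", address.upper()).split()
    return " ".join(ABBREVIATIONS.get(token, token) for token in tokens)


def lookup(address: str) -> Optional[tuple[float, float]]:
    """Return the reference coordinate for an exact normalized match, else None."""
    return REFERENCE.get(normalize(address))


print(lookup("123 Maple Street, Springfield"))  # found in the reference data
print(lookup("999 Elm St, Springfield"))        # not covered, so no match
```

An address missing from the reference data simply cannot be matched at that level, so the completeness of each source is one practical way to weigh the options above.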
Cascade Matching
The next aspect of accuracy and precision concerns the software itself: its ability to leverage reference data at multiple levels of precision.
The most accurate and precise geocode you can return is at the address point (rooftop) or delivery point location. One benefit of a geocoding solution built on a commercial dataset is access to multiple levels of precision for matching, including address points.
Another benefit is that software vendors who leverage commercial datasets typically let users match at different levels of precision, so each record is matched at the most precise level its input supports. This feature is called cascade matching. For example, if a user submits an incomplete address without a building number, the software recognizes that an address point match is not possible. Instead of returning an unreliable, inaccurate address point match, it cascades down to a street name centroid match, which is less precise but more reliable and accurate.
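Cascade matching can be sketched as trying match levels from most to least precise and returning the first one the input supports. This minimal sketch assumes hypothetical per-level lookup tables keyed on simple values (house number plus street, street, postal code, city) with made-up coordinates; the level names mirror those listed under Match Quality Indicators, but the data structures and function names are illustrative, not any vendor's API. Interpolation is omitted here for brevity.

```python
from dataclasses import dataclass
from typing import Optional

Point = tuple[float, float]


@dataclass
class AddressInput:
    house_number: Optional[str]
    street: Optional[str]
    postal_code: Optional[str]
    city: Optional[str]


@dataclass
class CascadeResult:
    level: str                # match level reached, or "unmatched"
    location: Optional[Point]


# Hypothetical reference tables, one per match level (coordinates made up).
ADDRESS_POINTS = {("12", "MAPLE ST"): (10.0, 20.0)}
STREET_CENTROIDS = {"MAPLE ST": (12.0, 22.0)}
POSTAL_CENTROIDS = {"99999": (15.0, 25.0)}
CITY_CENTROIDS = {"SPRINGFIELD": (30.0, 40.0)}


def cascade_geocode(a: AddressInput) -> CascadeResult:
    """Try match levels from most to least precise and return the first hit."""
    # An address point match needs both a house number and a street.
    address_key = (a.house_number, a.street) if a.house_number and a.street else None
    levels = [
        ("address_point", address_key, ADDRESS_POINTS),
        ("street_centroid", a.street, STREET_CENTROIDS),
        ("postal_centroid", a.postal_code, POSTAL_CENTROIDS),
        ("city_centroid", a.city, CITY_CENTROIDS),
    ]
    for level, key, table in levels:
        # Use a level only if its required input is present and found in its table.
        if key is not None and key in table:
            return CascadeResult(level, table[key])
    return CascadeResult("unmatched", None)


# The example from the text: no building number, so no address point match is attempted.
print(cascade_geocode(AddressInput(None, "MAPLE ST", None, "SPRINGFIELD")))
```

The missing building number makes the address point level inapplicable, so the record falls through to a street centroid instead of receiving a guessed rooftop location.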
Address Point, Delivery Point, and Interpolation
With commercial street datasets, an address match usually returns two points:
- Address point (display x, display y), or Display Point, which is used to center the map on the location and for other use cases that require high location accuracy. These points are often placed on the rooftop or at the parcel centroid.
- Road-adjusted point (x, y), or Routing Point, which is used for routing and network analysis.
Both points provide highly accurate matches, and neither can be obtained consistently from TIGER or crowd-sourced data.
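To illustrate how the two points differ, the sketch below derives a road-adjusted point by snapping a display point to the nearest location on a straight street centerline segment. This is an assumption made for illustration only: commercial datasets supply the Routing Point directly, and a data provider may place it differently rather than at the nearest point on the centerline. Coordinates are planar and made up.

```python
Point = tuple[float, float]  # planar (x, y) in meters


def snap_to_segment(p: Point, a: Point, b: Point) -> Point:
    """Return the point on segment a-b that is closest to p."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    length_sq = dx * dx + dy * dy
    if length_sq == 0:
        return a  # degenerate segment: both ends coincide
    # Project p onto the line through a and b, clamping t to [0, 1] to stay on the segment.
    t = ((px - ax) * dx + (py - ay) * dy) / length_sq
    t = max(0.0, min(1.0, t))
    return (ax + t * dx, ay + t * dy)


# Made-up example: a rooftop (display point) set back 30 m from a street along the x-axis.
display_point = (50.0, 30.0)
routing_point = snap_to_segment(display_point, (0.0, 0.0), (100.0, 0.0))
print(routing_point)
```

Keeping the two points separate lets a map center on the building while a routing engine starts or ends on the street network.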
In addition to these point-level matches, commercial street datasets also provide Street Address Range data: street centerlines attributed with house-number ranges. The software uses these ranges to interpolate, or approximate, the match location along a ranged street segment. Although less accurate than an Address Point match, interpolation still delivers high location accuracy (within 10 to 50 meters), which meets the needs of most use cases.
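Address range interpolation can be sketched as linear interpolation: work out what fraction of the way through the segment's house-number range the input number falls, then place the point that same fraction of the way along the centerline. This minimal sketch assumes a single straight segment in planar coordinates and evenly spaced house numbers; the `RangedSegment` schema is hypothetical, and the sketch ignores details such as odd/even parity on the left and right sides of the street and offsetting the point from the centerline, which production geocoders typically handle.

```python
from dataclasses import dataclass

Point = tuple[float, float]  # planar (x, y) in meters


@dataclass
class RangedSegment:
    """A straight street centerline with a house-number range (hypothetical schema)."""
    start: Point       # location of from_number
    end: Point         # location of to_number
    from_number: int
    to_number: int


def interpolate(segment: RangedSegment, house_number: int) -> Point:
    """Place house_number proportionally along the segment between its range ends."""
    lo, hi = sorted((segment.from_number, segment.to_number))
    if not lo <= house_number <= hi:
        raise ValueError(f"{house_number} is outside the range {lo}-{hi}")
    span = segment.to_number - segment.from_number
    # Fraction of the way from start to end; works for ascending or descending ranges.
    t = 0.0 if span == 0 else (house_number - segment.from_number) / span
    (x0, y0), (x1, y1) = segment.start, segment.end
    return (x0 + t * (x1 - x0), y0 + t * (y1 - y0))


# Made-up segment: 200 m long, numbered 100 at one end and 200 at the other.
segment = RangedSegment(start=(0.0, 0.0), end=(200.0, 0.0), from_number=100, to_number=200)
print(interpolate(segment, 150))
```

Because real buildings are rarely spaced evenly, the interpolated point is an estimate, which is why it ranks below an Address Point match.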
Match Quality Indicators
Along with the matched coordinate, a good geocoding solution also reports how accurate the match is and how confident it is in that match. Without these indicators, a match is often unreliable for analysis or decision making. A good solution should indicate:
- Match status: whether the input matched a candidate in the reference data (Match), produced multiple tied candidates at the same level of precision that require user review (Tie), or could not be matched (Unmatch).
- Match score: for records that are matched or tied, a Confidence score or Match score expresses the geocoder’s confidence in the match.
- Match accuracy: whether the match was made at the address point, interpolated point, street centroid, postal centroid, or city centroid level.
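The three indicators can be illustrated with a small classifier over scored candidates. This sketch assumes each candidate carries a match score on a 0 to 100 scale and the level it was matched at; the acceptance threshold, the level ordering, and all names are hypothetical choices for illustration, not the behavior of Esri's or any other geocoder. Candidates below the threshold are discarded, the highest-scoring remainder is reported as a Match, and equally scored candidates at the same level are reported as a Tie.

```python
from dataclasses import dataclass
from typing import Optional

Point = tuple[float, float]

# Assumed ordering of match levels, most precise first.
LEVEL_ORDER = ["address_point", "interpolated", "street_centroid",
               "postal_centroid", "city_centroid"]
MIN_SCORE = 85.0  # hypothetical acceptance threshold on an assumed 0-100 scale


@dataclass
class Candidate:
    location: Point
    score: float
    level: str  # one of LEVEL_ORDER


@dataclass
class MatchReport:
    status: str                  # "Match", "Tie", or "Unmatch"
    score: Optional[float]       # match score of the best candidate
    level: Optional[str]         # match accuracy: level of the best candidate
    candidates: list[Candidate]  # best candidate(s); several when tied


def classify(candidates: list[Candidate], min_score: float = MIN_SCORE) -> MatchReport:
    """Derive match status, score, and accuracy from scored candidates."""
    acceptable = [c for c in candidates if c.score >= min_score]
    if not acceptable:
        return MatchReport("Unmatch", None, None, [])
    best = max(c.score for c in acceptable)
    # Among top-scoring candidates, prefer the most precise level.
    top = sorted((c for c in acceptable if c.score == best),
                 key=lambda c: LEVEL_ORDER.index(c.level))
    tied = [c for c in top if c.level == top[0].level]
    # A Tie means several equally scored candidates at the same level of precision.
    status = "Tie" if len(tied) > 1 else "Match"
    return MatchReport(status, best, top[0].level, tied)


# Made-up candidates: two equally scored address points need user review.
print(classify([Candidate((0.0, 0.0), 92.0, "address_point"),
                Candidate((50.0, 50.0), 92.0, "address_point")]).status)
```

Returning the tied candidates rather than silently picking one is what makes user review of Tie records possible.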
Geocoding location accuracy depends on many factors. We hope that asking the questions outlined here helps you choose a solution that best supports your use cases, analysis, and downstream decision making. Esri considers these aspects when building its own Esri World Geocoding capabilities and products, which support users across the globe in all industries. To learn more about geocoding, please refer to our previous blog or visit our webpage.