October - December 2007
The GDK was downloaded from Esri's Developer Network (EDN) Web site. The kit contains everything a user needs to customize the geocoding process, including the Geocoding Rule Base Developer Guide; the geocoding rule bases of the current release of ArcGIS; an interactive standardizer (STANEDIT.EXE) used for syntax checking and debugging of standardization pattern rules; and the standardizer pattern rule encryption program (ENCODEPAT.EXE). After the GDK was installed, modifications were made to the classification file, us_addr.cls, and the pattern file, us_addr.pat. These files support the geocoding locators for several of the most commonly used U.S.-style addresses (U.S. Streets, U.S. One Range, and U.S. One Address locators). In general, changes to the classification file were made to
Next, changes to the pattern file improved road-specific pattern recognition. Certain road names in Leon County are intrinsically baffling to the default Esri geocoding routines (e.g., North by Northwest Rd); the GDK gives users the tools to create custom patterns that recognize and properly parse addresses containing these confusing road names on a case-by-case basis. Edits are made to the unencrypted version of the pattern file (us_addr.xat); ENCODEPAT.EXE is then used to encrypt the file into the version ArcGIS uses (us_addr.pat). By making changes in the classification file and the pattern file, it is possible to improve geocoding rates on problematic roads without requiring changes to the master address database.
A balancing act is required when making changes in these two files. For example, in Leon County the placement of street-type keywords is not always consistent (e.g., Ride is sometimes used as a street type and sometimes as part of the street name), so judgment must be used in deciding whether to retain Ride as a street-type keyword in the classification file or to remove it. This decision, in turn, determines which specific road names must be handled on a case-by-case basis by adding pattern recognition routines to the pattern file.
After the pattern and classification files have been modified and the pattern file has been encrypted, the files are copied over the default versions in the Program Files\ArcGIS\Geocode directory.
Significant Improvements
For benchmark testing, 97,834 addresses were extracted from the Leon County parcel address layer. These addresses were geocoded against a parcel-based locator service using the Esri default classification and pattern files. The geocoding settings were left as default (spelling sensitivity=80, minimum match score=60, ties allowed). Even though all the addresses in the benchmark test had exact string matches to addresses in the locator, 1,131 of the addresses did not geocode successfully. This clearly illustrates the magnitude of improperly parsed addresses in Leon County when using the default ArcGIS classification and pattern files. Next, these same addresses were geocoded using the customized classification and pattern files. Only 59 of the records did not geocode successfully; the customizations to the classification and pattern files resulted in a 95 percent reduction in the number of unmatched addresses. The remaining unmatched records were primarily addresses containing confusing unit numbers for apartments and condos.
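The benchmark figures above can be checked with a few lines of arithmetic. The following is a minimal sketch in Python; the function and constant names are illustrative only and are not part of the GDK or ArcGIS. It derives the reduction in unmatched addresses and the implied overall match rates from the counts reported in the test.

```python
def unmatched_reduction(before: int, after: int) -> float:
    """Fractional reduction in unmatched records between two geocoding runs."""
    if before <= 0:
        raise ValueError("before must be a positive count")
    return (before - after) / before


def match_rate(total: int, unmatched: int) -> float:
    """Fraction of addresses that geocoded successfully."""
    if total <= 0:
        raise ValueError("total must be a positive count")
    return (total - unmatched) / total


# Counts reported for the parcel address benchmark.
TOTAL_ADDRESSES = 97_834
UNMATCHED_DEFAULT = 1_131  # default Esri classification and pattern files
UNMATCHED_CUSTOM = 59      # customized classification and pattern files

reduction = unmatched_reduction(UNMATCHED_DEFAULT, UNMATCHED_CUSTOM)
rate_default = match_rate(TOTAL_ADDRESSES, UNMATCHED_DEFAULT)
rate_custom = match_rate(TOTAL_ADDRESSES, UNMATCHED_CUSTOM)

print(f"Reduction in unmatched addresses: {reduction:.1%}")
print(f"Match rate with default files:    {rate_default:.2%}")
print(f"Match rate with customized files: {rate_custom:.2%}")
```

The reduction works out to (1,131 - 59) / 1,131, or roughly 94.8 percent, which rounds to the 95 percent reported. Expressed as match rates, the same counts correspond to about 98.84 percent with the default files and about 99.94 percent with the customized files.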
Further testing was conducted to gauge the improvements when geocoding typical user-supplied addresses in a real-world environment. A total of 7,019 addresses pulled from the Leon County Animal Services service request database were geocoded against a standard composite locator (a parcel-based locator with no ties allowed, followed by a street centerline-based locator allowing ties). In this test, the GDK customized files reduced the number of unmatched records by 29 percent.
Conclusion
The Geocoding Development Kit enables GIS professionals to improve geocoding match rates for address datasets containing unique local address styles by modifying the Esri default classification and pattern files. The time invested in understanding the GDK and implementing the necessary local modifications is easily justified by the reduction in manual matching effort throughout an entire organization. History and culture leave a unique signature on local road names, no matter what corner of the world you work in. So look around; you are certain to find the GDK can improve your geocoding rates too.
For more information, contact
GIS Specialist III
Tallahassee-Leon County GIS
E-mail: johnsonj-gis@hotmail.com
About the Author
While implementing the GDK, Jay Johnson provided GIS support to the Tallahassee-Leon County (Florida) Interlocal GIS program. He has more than 12 years of professional GIS experience and received his master's degree in GIS from the University of Colorado at the Denver College of Engineering and Applied Science. He recently relocated to Reno, Nevada.
References
Esri's Geocoding Development Kit (visit edn.esri.com and search the Downloads section for geocoding).