ArcGIS Blog

Dev Summit 2020: Extract and map data from unstructured text

By Rohit Singh and Kimberly Peter

Geospatial data doesn’t always come neatly packaged in the form of file geodatabases and shapefiles. Often, data is hidden away in an unstructured format, such as text-based reports.

To use this data with ArcGIS, you need to convert it into a structured, standardized format. However, reading unstructured text and converting it by hand is difficult and time-consuming.

At the Developer Summit 2020 plenary, Lauren Bennett demonstrated how you can automate this process using the Learn module of ArcGIS API for Python.

She analyzed thousands of unstructured text files containing police reports from Madison, Wisconsin, and created a map of the crime locations.

You can watch the presentation below. Then read the rest of the blog for a summary of the information you need to implement the same type of workflow in your organization.

Prepare training data

First, Lauren labeled the contents of a subset of the text files to define the entities that matter for crime data. In the presentation, these entities were the location, time, and type of the crime, the time the crime was reported, the reporting officer, and any weapons used.

She labeled the entities with Doccano, an open source text annotation tool.

These labeled reports served as the training data for an AI model that extracts the same entities from unstructured text.

Crime incident report with labeled entities, such as the type of crime, where it occurred, the time of the incident, and when it was reported.
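To make the idea of labeled entities concrete, here is a minimal sketch of reading one labeled record. It assumes a JSON Lines export in which each record has a `text` string and a `labels` list of `[start, end, label]` character spans; the exact key names depend on the Doccano version, and the report text and label names below are invented for illustration.

```python
import json

# Hypothetical labeled record: the text and the label names are made up,
# and the "text"/"labels" keys are an assumption about the export format.
SAMPLE_JSONL = (
    '{"text": "Burglary reported at 100 Main St on 01/02/2020.", '
    '"labels": [[0, 8, "Crime"], [21, 32, "Address"], [36, 46, "Reported_date"]]}'
)


def spans_from_jsonl(lines):
    """Return (label, entity text) pairs for every labeled span."""
    pairs = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines between records
        record = json.loads(line)
        text = record["text"]
        for start, end, label in record["labels"]:
            # Spans are character offsets into the report text.
            pairs.append((label, text[start:end]))
    return pairs


pairs = spans_from_jsonl([SAMPLE_JSONL])
```

Each pair ties a label to the exact substring it covers, which is the information a model needs to learn where entities occur in a report.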

Train the model

Next, Lauren used the arcgis.learn module and the training data to train an EntityRecognizer model.

Training a natural language processing model with arcgis.learn follows the same pattern as training a computer vision model: you create the model, fit it to the training data, visualize the results, and save it for later use.

Once satisfied that the model could identify the needed information, Lauren used it to extract the entities from all the text files. The result was a pandas DataFrame containing the extracted entities for each police report.
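The training and extraction steps might look roughly like the sketch below. It is not the code from the presentation. It assumes the `prepare_data` and `EntityRecognizer` calls as documented for arcgis.learn, and argument names can differ between releases. The function name, the default label name `Address`, the epoch count, and the model name are placeholders.

```python
def train_and_extract(labeled_json_path, report_texts,
                      address_label="Address", epochs=30,
                      model_name="crime_model"):
    """Train an entity recognizer and run it on a list of report strings."""
    # Imported lazily so this sketch loads without the ArcGIS API installed.
    from arcgis.learn import EntityRecognizer, prepare_data

    # Assumption: the labeled export is in the JSON format that arcgis.learn
    # reads with dataset_type="ner_json"; class_mapping names the label
    # that holds the address.
    data = prepare_data(
        path=labeled_json_path,
        class_mapping={"address_tag": address_label},
        dataset_type="ner_json",
    )

    model = EntityRecognizer(data)   # create the model
    model.fit(epochs=epochs)         # fit it to the training data
    model.show_results()             # inspect predictions on sample text
    model.save(model_name)           # save it for later use

    # Extract entities from every report; the result is a pandas DataFrame.
    return model.extract_entities(report_texts)
```

A call such as `train_and_extract("labeled_reports.json", reports)` would need real labeled data and an environment with the deep learning dependencies of arcgis.learn installed; the file name here is hypothetical.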

Create a feature layer

With the data in a structured data frame, she could use ArcGIS API for Python to geocode the locations and create a point feature layer. Each point represented a crime location.
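One way to sketch this step is shown below. It assumes the DataFrame has an address column (named `Address` here, a placeholder), geocodes each address with `arcgis.geocoding.geocode`, and publishes the result through the spatially enabled DataFrame accessor. The presentation may have used a different route, and geocoding and publishing require a signed-in `GIS` connection and may consume credits.

```python
def geocode_and_publish(gis, reports_df, address_col="Address",
                        title="Crime incidents"):
    """Geocode one address per row and publish the rows as a point layer."""
    # Imported lazily so this sketch loads without the ArcGIS API installed.
    import pandas as pd
    from arcgis.features import GeoAccessor
    from arcgis.geocoding import geocode

    rows = []
    for _, row in reports_df.iterrows():
        # geocode() uses the geocoder of the active GIS connection.
        candidates = geocode(row[address_col], max_locations=1)
        if not candidates:
            continue  # skip addresses the geocoder cannot resolve
        location = candidates[0]["location"]
        rows.append({**row.to_dict(), "x": location["x"], "y": location["y"]})

    if not rows:
        raise ValueError("no address could be geocoded")

    # Build a spatially enabled DataFrame from the x/y columns.
    sdf = GeoAccessor.from_xy(pd.DataFrame(rows), "x", "y")
    # Publish a hosted feature layer with one point per report.
    return sdf.spatial.to_featurelayer(title, gis=gis)
```

Keeping the original columns on each row means the report text and extracted entities travel with the point, so they can appear in the layer's pop-ups.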

Once the layer was added to a map, clicking a point showed the police report and the specific entities extracted from it.

Additionally, the extracted crimes were clustered into different categories using scikit-learn, a popular machine learning library.
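The post does not say which scikit-learn technique was used. As one plausible sketch, the example below turns short crime descriptions into TF-IDF vectors and groups them with k-means. The choice of algorithm, the number of clusters, and the sample strings are all assumptions made for illustration.

```python
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer


def cluster_crime_types(crime_texts, n_clusters=2, seed=0):
    """Return one cluster id per input string."""
    # Turn each short crime description into a TF-IDF weighted word vector.
    vectors = TfidfVectorizer().fit_transform(crime_texts)
    # n_init and random_state are fixed so repeated runs give the same grouping.
    model = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed)
    return model.fit_predict(vectors).tolist()


# Invented examples of extracted crime-type strings.
crimes = [
    "residential burglary",
    "burglary forced entry residential",
    "retail theft shoplifting",
    "shoplifting theft retail store",
]
labels = cluster_crime_types(crimes)
```

The cluster ids are arbitrary integers, so they only indicate which descriptions were grouped together; naming each category is still a manual step.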

You can view a web map containing the published feature layer.

Feature layer of crime incident points

Try it for yourself

Follow these links to additional resources to help you use the arcgis.learn module to extract and map data from unstructured text files:
