With data volumes reaching exabyte levels, big data can easily become unmanageable and useless without the right tools to analyze it quickly. The strength of your data stewardship therefore depends on how much command you have over big data. To get past the capacity limits of working with billions of records at a time, Esri created GIS Tools for Hadoop, a toolkit for running spatial analysis in the Hadoop environment and bringing the results back into ArcGIS.
Geoenable Big Data in Hadoop
Many data stewards and developers have come to rely on Hadoop's open source framework to handle large data stores. Because Hadoop lacks native functionality for exploiting the location component of big data, GIS Tools for Hadoop was designed to extend this popular data management platform with utilities for running spatial operations on billions of records at a time.
Although most data includes a spatial component, large volumes typically require tedious, sequential processing before any meaningful analysis is possible. For non-GIS users of Hadoop, the toolkit lets data stewards analyze big data as a whole, directly in Hadoop. The results reveal patterns and relationships that traditionally could only be derived from smaller, more manageable datasets. For ArcGIS users, GIS Tools for Hadoop bypasses traditional geoprocessing workflows by letting them run spatial queries on Hadoop data from within ArcGIS.
This capability gives developers and data analysts much-needed control, turning big data from something to be dealt with later into an immediately useful resource.
Bring Big Data into ArcGIS
After you run spatial analytics on your Hadoop data, the toolkit lets you bring the results into ArcGIS for further processing and visualization. Those geoprocessed datasets can then be saved in ArcGIS or exported back into Hadoop, creating a looping workflow between the ArcGIS platform and the big data environment.
Forget about archiving your large data stores for later use. Command your big data now with GIS Tools for Hadoop.
The free, open source toolkit lets you do the following:
- Run filter and aggregate operations on billions of spatial data records
- Define new areas represented as polygons and run point-in-polygon analyses inside Hadoop
- Integrate big data maps into reports or publish them as big data web map applications
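A point-in-polygon analysis like the one described above comes down to two steps. First, each point record is tested for containment in the defined areas. Second, the matches are aggregated per area. The sketch below illustrates that pattern in plain Python. It uses a standard ray-casting (even-odd) containment test and a map/reduce-style count. It does not use the toolkit's own API, and the names `point_in_polygon`, `map_points`, `reduce_counts`, and the sample areas are hypothetical. In a Hadoop job, the map step would run in parallel across input splits, and the reduce step would sum the per-area counts.

```python
from collections import Counter
from typing import Dict, Iterable, Iterator, List, Tuple

Point = Tuple[float, float]
Polygon = List[Point]  # vertices in order; the closing edge back to the first vertex is implied


def point_in_polygon(pt: Point, poly: Polygon) -> bool:
    """Ray-casting (even-odd) test: cast a ray toward +x and count edge crossings."""
    x, y = pt
    inside = False
    n = len(poly)
    for i in range(n):
        x1, y1 = poly[i]
        x2, y2 = poly[(i + 1) % n]
        # Only edges that straddle the horizontal line through y can be crossed;
        # this also excludes horizontal edges, so the division below is safe.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def map_points(points: Iterable[Point], areas: Dict[str, Polygon]) -> Iterator[Tuple[str, int]]:
    """Hypothetical map step: emit (area_name, 1) for every area that contains a point."""
    for pt in points:
        for name, poly in areas.items():
            if point_in_polygon(pt, poly):
                yield name, 1


def reduce_counts(pairs: Iterable[Tuple[str, int]]) -> Dict[str, int]:
    """Hypothetical reduce step: sum the emitted counts per area."""
    counts: Counter = Counter()
    for name, one in pairs:
        counts[name] += one
    return dict(counts)


# Two illustrative adjacent squares and a few sample points.
areas = {
    "west": [(0, 0), (5, 0), (5, 10), (0, 10)],
    "east": [(5, 0), (10, 0), (10, 10), (5, 10)],
}
points = [(1, 1), (2, 8), (7, 3), (12, 12)]  # (12, 12) falls outside both areas

print(reduce_counts(map_points(points, areas)))  # {'west': 2, 'east': 1}
```

Points lying exactly on a shared boundary are assigned by the even-odd rule's edge convention. A production workflow would need an explicit policy for such cases.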
GitHub Project
Esri sees big data as the next frontier of IT, one that will take many perspectives to manage. For that reason, Esri hosts GIS Tools for Hadoop for free on GitHub, the popular open source project site. Esri encourages developers to download the toolkit, report issues, and actively contribute to improving the tools through GitHub.
To download GIS Tools for Hadoop, visit esri.github.io/gis-tools-for-hadoop/.
Related Video
Big Data: Using ArcGIS with Apache Hadoop
Esri staff demonstrate how to perform batch processing operations against large quantities of data using Hadoop.