There is a strong, cross-industry initiative to extend analytical capabilities so big data can work harder for organizations. Because of this, many have recognized that spatial analytics are the missing piece of their intelligence strategies. Previously considered niche and used only by specialists in GIS departments, spatial analytics are now gaining momentum with data scientists and analysts as they continue to open the aperture of their data analysis lens. Esri is committed to supporting this evolution by making our spatial analytics available beyond the GIS environment, with flexible use options, so they can be delivered to whoever needs them, when they need them, and how they need them.
We’re excited to announce the next step in fulfilling that commitment with the release of ArcGIS GeoAnalytics Engine, a comprehensive library of spatial analytics that’s delivered directly to the big data analysis workflow from Apache Spark™. Now, the tools developed and tested by the experts in spatial data science and trusted by organizations worldwide are accessible by data scientists and analysts – at the speed and scale required for big data – removing the barrier to adding spatial analysis to intelligence strategies.
ArcGIS GeoAnalytics Engine includes over 120 spatial functions and tools that are delivered directly to data processing services in the cloud or in the enterprise. It also offers a breadth of powerful spatial analytics that go beyond the basics found in most open-source spatial analytics packages. This means data scientists no longer need to stitch together tools and functions from various packages to perform comprehensive, end-to-end spatial data analysis.
The complexity of geospatial data
Corporate data continues to grow at an exponential pace, and more and more organizations are leveraging it as the foundation of their decision-making process. The vast majority of big data contains spatial characteristics such as location, and in many cases shape and size. In other words, big data is increasingly also geospatial data, and it can provide important context about the characteristics of places and the relationships between them.
Why is rich spatial information still not leveraged to the extent it should be? One contributing factor has to do with modern computing paradigms. Many organizations have outgrown their traditional IT environments and have turned to the cloud for both storage and computational resources. As of 2022, over 60% of all corporate data is stored in the cloud (statista.com). However, spatially analyzing big data stored in the cloud is challenging for several reasons:
- The distributed processing approach used in big data workflows by data scientists and analysts was not built with spatial analytics in mind, so processing and analysis are slow and resource-intensive
- Geospatial data analysis has generally required ingesting data into specialized GIS software as opposed to cloud-native technology
- In order to get comprehensive, useful results from spatial analysis, data scientists and analysts must patch together many disconnected spatial libraries or packages
The net outcome of these barriers is that spatial analysis is often left out of intelligence strategies.
Applying the spatial lens: bigger data, bigger potential
Massive amounts of data are generated every day from trillions of sources, including personnel and asset tracking devices, mobile devices, and internet-of-things sensors, and the applications for that data are endless. As GIS users know well, when we use analytics to go beyond basic mapping, analyzing geospatial data can fuel awareness of what happens where, when, and why. This awareness has boundless benefits, especially when we’re talking about the scale of big data. Uncovering spatial patterns from millions – or even billions – of records can give organizations intelligence like never before.
With spatial analytics, organizations can:
- Measure the size, shape, and distribution of physical objects
- Determine how places are related to one another and why
- Find the best locations to place things and the right paths to get there
- Detect and quantify patterns
- Make predictions about what might happen next, and where
Spatial analysis of big data can improve decisions about corporate and public policies, provide operational insights that increase efficiency, and support smart growth and development, to name just a few potential applications.
Spatial analytics for everyone
For many years, Esri has provided a wide range of spatial analytics within the ArcGIS ecosystem to support the entire spectrum of spatial analysis – from simple geometry operations to spatial aggregation tools to advanced statistical algorithms. Our traditional user community has used these spatial analysis capabilities as part of their GIS workflows and has been instrumental in informing the evolution of spatial capabilities. There are multiple Esri products that expose ArcGIS spatial analytics to the desktop user, the online user, and enterprise users.
With the introduction of ArcGIS GeoAnalytics Engine, ArcGIS spatial analytics are now also available for big data analysis environments. Data scientists can now access the largest number of spatial tools and functions available today in a single library.
Unlike other ArcGIS products, ArcGIS GeoAnalytics Engine does not require an ArcGIS installation. Users simply deploy best-in-class ArcGIS spatial analytics to their cloud big data processing environments. It’s a Spark-native library that data scientists drop into their existing workflow to run spatial processing right away.
Customers can deploy ArcGIS GeoAnalytics Engine on their own Spark environment or use managed Spark products such as Amazon EMR, Azure Synapse Analytics, Databricks, and Google Cloud Dataproc. Built-in capabilities in Spark allow them to easily connect to and analyze data from their data lakes, data warehouses, and other cloud data stores. They can also write results back to these systems, and then consume those results in business intelligence applications or in ArcGIS. Overall, this simplified analysis process reduces the need to move massive amounts of data or create redundant copies.
Because ArcGIS GeoAnalytics Engine exposes spatial analytics outside of ArcGIS, there are only two main requirements to use it:
- A supported Spark cluster. GeoAnalytics Engine must run on a Spark cluster that uses a supported version of Apache Spark (3.0.1 through 3.2.x). This can be a Spark cluster on machines your organization has deployed, or you can access it through a managed Spark service. A list of certified Spark environments is available in the Install and set up help topic.
- Vector data in a data source supported by Spark. The functions and tools provided by GeoAnalytics Engine are designed to operate on vector geometry data (like points, lines, polygons, and multipoints). They do not work with imagery or raster data. The data should be stored in a location where Spark can connect to it.
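The first requirement is easy to check before deploying. The snippet below is a minimal sketch in plain Python, not part of GeoAnalytics Engine itself: the function names `parse_version` and `is_supported_spark_version` are hypothetical, and the only fact it encodes is the supported range stated above (Spark 3.0.1 through 3.2.x).

```python
import re

# Supported range quoted in this article: Spark 3.0.1 through 3.2.x.
MIN_SUPPORTED = (3, 0, 1)
MAX_SUPPORTED_MINOR = (3, 2)  # any 3.2.x patch release is accepted


def parse_version(version: str) -> tuple:
    """Extract (major, minor, patch) from strings like '3.1.2' or '3.2.1-amzn-0'."""
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", version.strip())
    if match is None:
        raise ValueError(f"Unrecognized Spark version string: {version!r}")
    return tuple(int(part) for part in match.groups())


def is_supported_spark_version(version: str) -> bool:
    """Return True if the version falls in the 3.0.1 through 3.2.x range."""
    parsed = parse_version(version)
    # Lower bound compares all three parts; upper bound only major.minor.
    return MIN_SUPPORTED <= parsed and parsed[:2] <= MAX_SUPPORTED_MINOR
```

In a PySpark session, the version string could come from `spark.version`, so a notebook might call `is_supported_spark_version(spark.version)` before going further. The list of certified environments in the Install and set up help topic remains the authoritative source.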
ArcGIS GeoAnalytics Engine was released on June 22nd, 2022 and is available for environments that are either connected to or disconnected from the internet. This means even customers using secure cloud options can leverage ArcGIS spatial analytics for big data. Licensing is available through prepaid plans that suit organizations’ analysis needs. To learn more, simply contact Esri by completing the Contact Us form on the GeoAnalytics Engine product page.