ArcGIS GeoAnalytics Server is a capability of ArcGIS Enterprise that speeds up spatial analysis and data management workflows using distributed computing. This means you can use multiple cores across multiple machines in parallel to process your data, generating results quickly. Using the analysis tools in GeoAnalytics Server, you can understand patterns, anomalies and trends in your larger datasets.
This blog outlines the types of data storage locations that are supported as inputs and outputs with GeoAnalytics – after all, analysis starts with your data! The blog reflects what is possible at ArcGIS Enterprise 10.6.1 and includes enhancements released at 10.7 and 10.8. As the list of supported data connections grows, we will keep both this blog and the documentation updated.
Data inputs
GeoAnalytics is flexible in that it can connect to many different data sources that you can use as input into your analysis. Keep in mind that GeoAnalytics works with vector (points, lines and polygons) and tabular data. This can take the form of customer & sales data, sensor reads, energy consumption, crowdsourced data, large collections of movement data (cars, people, storms) and much more.
GeoAnalytics can connect to the following sources to pull data into your analysis:
- HDFS – Hadoop Distributed File System* (including HDFS secured using Kerberos)
- Hive*
- Microsoft Azure blob storage
- Microsoft Azure Data Lake
- Amazon S3 bucket
- Local or network file shares
*Think of the scope of connection to HDFS and Hive as connecting to ‘vanilla’ HDFS and Hive, without additional software packages in between.
File types can include collections of shapefiles, parquet, ORC, and delimited files (such as .csv, .tsv and .txt). For example, analysts could store a collection of CSVs in an Amazon S3 bucket and dynamically use those in GeoAnalytics to perform aggregations, spatial joins, and other analyses.
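As a rough illustration of the file types listed above, the following Python sketch groups the files in a folder by format, which can be a handy check before you point GeoAnalytics at that folder. It is not part of GeoAnalytics itself: the function name `group_by_format` and the extension mapping are assumptions made for this example (`.parquet` and `.orc` are common naming conventions, not requirements).

```python
from collections import defaultdict
from pathlib import PurePath

# Assumed extension-to-format mapping, for illustration only.
# Shapefile sidecar files (.dbf, .shx, .prj) are not handled here
# and would land in the "unsupported" group.
SUPPORTED_FORMATS = {
    ".shp": "shapefile",
    ".parquet": "parquet",
    ".orc": "orc",
    ".csv": "delimited",
    ".tsv": "delimited",
    ".txt": "delimited",
}


def group_by_format(file_names):
    """Group file names by format; unrecognized files go under 'unsupported'."""
    groups = defaultdict(list)
    for name in file_names:
        # Compare extensions case-insensitively (.CSV and .csv are the same).
        suffix = PurePath(name).suffix.lower()
        groups[SUPPORTED_FORMATS.get(suffix, "unsupported")].append(name)
    return dict(groups)


files = ["sales_2019.csv", "sales_2020.CSV", "stores.shp", "trips.parquet", "notes.docx"]
print(group_by_format(files))
```

Anything that ends up under `unsupported` would need to be converted to one of the formats above before it can be used as input.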
In addition to these external stores, GeoAnalytics can use your existing GIS data as input into your analysis. This can take the form of:
- Hosted feature layers stored in your relational ArcGIS Data Store
- Hosted feature layers stored in your spatiotemporal big data store (as a result of tracks from Tracker for ArcGIS, real-time feeds from GeoEvent Server, or GeoAnalytics analysis results)
- Feature services (as long as you have query access, you can use it for analysis!)
- Stream services (data ingested through GeoEvent Server can be used in GeoAnalytics analysis)
If you have data stored in a geodatabase, the best way to use it in GeoAnalytics is to publish what you want to analyze as a hosted feature layer and use that as input. Analysis will be faster with a hosted feature layer than with a feature layer published by reference because the tools can read directly from the data store. So, when possible, use hosted layers!
Data outputs
When you run analysis, you have the option to store your results as a hosted feature layer in the ArcGIS Data Store, in either the relational data store or the spatiotemporal big data store. In most cases, we recommend the spatiotemporal big data store, as it is designed to store a large number of features and uses distributed storage for fast reads and writes. Starting at ArcGIS Enterprise 10.7, you can also write your results back to HDFS, an Amazon S3 bucket, an Azure Data Lake or a network share. Starting at 10.8, you can also write results to Azure blob storage. Read on to learn more!
Above is an illustration of how data flows through GeoAnalytics Server, with inputs and outputs. Data can be used from external stores to run analysis against your GeoAnalytics server(s). The output is a feature layer in ArcGIS Enterprise that can be used in other areas of ArcGIS Enterprise like web maps and applications.
New at 10.7:
At 10.7, we’ve added the ability to write back to your big data file shares.
The above illustration outlines how your data flows through GeoAnalytics Server at 10.7, with inputs (the same as 10.6.1) and outputs (new options for 10.7). Your analysis output can be a feature layer in ArcGIS Enterprise or – and this is the new part – a big data file share. You can mix and match sources and outputs as you need. This means you can read from one big data file share, write a hosted layer, and then use that as input and write to another big data file share. It’s all up to you where you want your data to go.
New at 10.8:
At 10.8, we’ve added Azure blob storage as an additional output option, so you can also write your big data file share results there.
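To summarize the release-by-release changes described above, here is a small Python sketch of a lookup that returns the output options available at a given ArcGIS Enterprise release. It is only a summary of this blog, not an Esri API: the function names and the labels such as `hdfs` and `azure_blob` are made up for this example, and it only covers the releases discussed here (10.6.1 and later).

```python
# Output options by ArcGIS Enterprise release, as described in this blog.
# All identifiers below are hypothetical labels chosen for this sketch.
HOSTED_LAYER_OUTPUTS = {"relational_data_store", "spatiotemporal_big_data_store"}

# Output options added at each release (big data file share targets).
ADDED_OUTPUTS = {
    (10, 7): {"hdfs", "amazon_s3", "azure_data_lake", "file_share"},
    (10, 8): {"azure_blob"},
}


def parse_version(version):
    """Turn a version string such as '10.6.1' into a comparable tuple (10, 6, 1)."""
    return tuple(int(part) for part in version.split("."))


def available_outputs(version):
    """Return the set of output options available at the given release."""
    target = parse_version(version)
    outputs = set(HOSTED_LAYER_OUTPUTS)
    for release, added in ADDED_OUTPUTS.items():
        # Tuples compare element by element, so (10, 7, 1) >= (10, 7).
        if target >= release:
            outputs |= added
    return outputs


print(sorted(available_outputs("10.6.1")))
print("azure_blob" in available_outputs("10.7"))
print("azure_blob" in available_outputs("10.8"))
```

A check like this could sit in front of a scripted workflow so that it fails early when the target release does not support the output location you have chosen.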
When you’re ready to get started, visit the help topic Get started with big data file shares for technical information on how to prepare and connect to your data. For more information on analysis tools provided by GeoAnalytics, visit Perform big data analysis using GeoAnalytics Server.
Keep in mind that this list is current as of ArcGIS Enterprise 10.6.1, 10.7, and 10.8. You can always check the documentation for an updated list, and we will keep this blog updated as new options are released.
For questions, comments and product needs for GeoAnalytics Server, please contact GeoAnalytics@esri.com. As we continue to develop the product, your data needs will help drive us to add support for other storage types and options.
Hilary & Sarah