Data is essential for making informed decisions, but integrating and transforming it into a usable format can be a challenge. When data preparation consumes a significant amount of time, when cleaning and preparing data causes problems, or when datasets from multiple sources need to be combined, ArcGIS Data Pipelines can help with data integration and preparation.
Accessed from the app launcher in ArcGIS Online, Data Pipelines is a new data integration capability that makes it faster and easier to access, prepare, and maintain data for mapping, analysis, and reporting. It streamlines extract, transform, and load (ETL) workflows in ArcGIS Online and goes beyond traditional ETL tools by combining a drag-and-drop interface with task scheduling functionality, so users can create data pipelines without writing any code and automatically keep data up to date. Data Pipelines enhances data-driven decision-making in ArcGIS Online and enables users to maximize the value of their geospatial data.
Benefits of Using Data Pipelines
By streamlining data integration for ArcGIS Online, Data Pipelines centralizes data from various sources into a single, accessible location. This saves time and effort so that users can focus on deriving valuable insights from the data rather than on data preparation and cleaning tasks.
Data Pipelines also expands access to data stores by supporting cloud-based data storage options such as Amazon S3, Microsoft Azure Blob Storage, Google BigQuery, and Snowflake. Users can leverage previously inaccessible or difficult-to-integrate data to enrich analysis and decision-making. They can also bring in files from a local drive or a URL, as well as feature layers from ArcGIS Online. Access to a broader range of data sources can help foster data-driven decision-making across an organization and between organizations.
The application’s data transformation tools enable users to get data into an ideal state for mapping, analysis, and reporting. Users can clean, transform, and enrich data to ensure accuracy, consistency, and completeness.
An intuitive graphical interface removes the need for specialized coding skills: users build data pipelines through a drag-and-drop authoring experience. This approach allows users of all skill levels to construct and manage data pipelines.
Automation is another key advantage of Data Pipelines. Its scheduling functionality runs data pipelines on a recurring basis, keeping datasets up to date and synchronized with the latest changes in the source data.
Data Pipelines offers a comprehensive set of benefits for ETL processes, including streamlined data integration, expanded data access, data transformation tools, democratization of data management, and workflow automation. By leveraging these capabilities, organizations can unlock more insights from their data, support data-driven decision-making, and gain a competitive edge in today’s data-centric landscape.
Key Use Cases
While Data Pipelines was in beta, organizations used the application to streamline data integration in exciting and innovative ways, such as the following:
- Bringing data in from external sources and keeping it up to date with ease: Data Pipelines allows users to bring data into ArcGIS Online from files stored locally, on the web, or in a cloud store, with support for various formats including GeoJSON, CSV, and shapefiles. For example, users can take the download URL of a file that is maintained on a public data website and add that URL as an input to a data pipeline. Additionally, if the source data is changing, users can schedule a data pipeline to run on a recurring basis, ensuring that the dataset is kept up to date.
- Visualizing tabular data on a map: Users can enhance tabular data by geospatially enabling it. For instance, in a table with ZIP code information, Data Pipelines can match the ZIP code values to corresponding polygons from a layer in ArcGIS Living Atlas or one of the boundary layers that an organization maintains. Datasets can be joined based on matching attributes, spatial relationships, or temporal relationships.
- Transforming datasets to fit unique requirements, without impacting the source: With Data Pipelines, users can modify the data from an input feature layer, for example by changing attribute values or adding and calculating new fields, while keeping the original source unchanged. This allows users to maintain a copy of a feature layer and adjust it to suit specific requirements without affecting the data source or the teams that maintain it.
- Getting a unified view of the data: Users can merge features from multiple inputs into a single source with Data Pipelines. This is particularly useful when there is a need to consolidate data from separate sources into a unified layer with a consistent and predictable schema. For example, users may have subsets of data being maintained by different individuals, teams, or organizations; with Data Pipelines, those multiple input datasets can be combined into a single feature layer for use in a web map, web app, or dashboard.
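Data Pipelines carries out these operations through its drag-and-drop tools, so no code is required. For readers who find it easier to reason about the underlying operations in code, the following is a minimal conceptual sketch in plain Python of three of the ideas above: joining a table to boundary polygons on a matching attribute, calculating a new field on a copy so the source stays unchanged, and merging several inputs into one output with a consistent schema. It is not the Data Pipelines API or implementation; every function name, field name, and sample record here is hypothetical and chosen only for illustration.

```python
import copy


def join_by_attribute(rows, boundaries, key):
    """Attach a boundary geometry to each row that shares the same key value."""
    lookup = {b[key]: b["geometry"] for b in boundaries}
    joined = []
    for row in rows:
        geometry = lookup.get(row[key])
        if geometry is not None:  # keep only rows that found a matching boundary
            joined.append({**row, "geometry": geometry})
    return joined


def calculate_field(features, name, func):
    """Return a modified copy with a new calculated field; the input is untouched."""
    result = copy.deepcopy(features)
    for feature in result:
        feature[name] = func(feature)
    return result


def merge_inputs(*inputs, schema):
    """Combine several inputs into one list whose records share a fixed schema."""
    merged = []
    for features in inputs:
        for feature in features:
            # Fields outside the schema are dropped; missing fields become None.
            merged.append({field: feature.get(field) for field in schema})
    return merged


# Hypothetical sample data (not from any real dataset)
sales = [
    {"zip": "92373", "orders": 12},
    {"zip": "92374", "orders": 7},
    {"zip": "00000", "orders": 1},  # no matching boundary polygon
]
zip_boundaries = [
    {"zip": "92373", "geometry": [(0, 0), (0, 1), (1, 1), (1, 0)]},
    {"zip": "92374", "geometry": [(1, 0), (1, 1), (2, 1), (2, 0)]},
]
team_a = [{"zip": "92373", "orders": 12, "notes": "priority"}]
team_b = [{"zip": "92374", "orders": 7}]

mapped = join_by_attribute(sales, zip_boundaries, key="zip")
flagged = calculate_field(sales, "high_volume", lambda f: f["orders"] >= 10)
unified = merge_inputs(team_a, team_b, schema=["zip", "orders"])
```

In this sketch, `mapped` contains only the two rows whose ZIP codes have a boundary, `flagged` carries the new field while `sales` itself is unchanged, and `unified` holds records from both teams with identical fields.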
These are just a few of the workflows that ArcGIS Data Pipelines can help users accomplish. By leveraging its capabilities, users can streamline data management processes and enhance geospatial workflows without writing a single line of code or relying on repetitive, manual efforts.
Getting Started with ArcGIS Data Pipelines
For organizations that are looking for a tool to streamline data management and integration workflows in ArcGIS Online, Data Pipelines is available now. It does not require an additional license. Instead, it consumes credits based on the amount of time spent working in the app.