Is data preparation the most time-consuming part of your workflow? Do you have datasets you want to bring into ArcGIS and use there? Is your data difficult to clean? If these problems sound familiar, keep reading to learn about the new data preparation and data ingest capability of ArcGIS Online: Data Pipelines.
What is Data Pipelines?
Data Pipelines is a new native data integration capability of ArcGIS Online that makes it faster and easier to access, prepare, and integrate data.
With Data Pipelines, you can:
- Connect to datasets in your external data stores, like Amazon S3 or Snowflake
- Ingest public data that is accessible via URL, such as datasets found in open data portals or a downloadable CSV provided by your local county
- Filter and clean your data using data processing tools, like Filter by attribute, Select fields, and Remove duplicates
- Enhance your data by joining it with information from Living Atlas layers using the Join tool, or use Arcade functions to calculate field values using the Calculate field tool
- Integrate and clean data in ArcGIS Online with an easy-to-use drag-and-drop interface
- Create reproducible, no-code data prep workflows
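Data Pipelines itself requires no code, but if you think in code, it may help to see roughly what a few of the tools above do to a table. The following is a plain-Python analogy only, not an Esri API and not how Data Pipelines is implemented; the function names and sample records are hypothetical and exist just to illustrate Filter by attribute, Select fields, and Remove duplicates.

```python
# Conceptual analogy only: plain-Python stand-ins for three Data Pipelines tools.
# The function names and sample records are hypothetical, not an Esri API.

def filter_by_attribute(rows, field, value):
    """Keep only the rows whose `field` equals `value`."""
    return [row for row in rows if row.get(field) == value]

def select_fields(rows, fields):
    """Keep only the named fields in each row."""
    return [{name: row[name] for name in fields} for row in rows]

def remove_duplicates(rows):
    """Drop rows that exactly repeat an earlier row, preserving order."""
    seen = set()
    unique = []
    for row in rows:
        key = tuple(sorted(row.items()))
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique

# Made-up sample records for illustration.
records = [
    {"name": "Station A", "status": "active", "notes": "x"},
    {"name": "Station A", "status": "active", "notes": "y"},
    {"name": "Station B", "status": "retired", "notes": "z"},
]

cleaned = remove_duplicates(
    select_fields(filter_by_attribute(records, "status", "active"), ["name", "status"])
)
print(cleaned)  # [{'name': 'Station A', 'status': 'active'}]
```

Note that the two "Station A" rows only become duplicates after the differing notes field is dropped, which is why the order of steps in a prep workflow matters.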
Check out the video below to see Data Pipelines in action.
How do you build a Data Pipeline?
Data pipeline workflows consist of the following element types:
- Inputs–These are the connections to data sources used to read in the data you want to prepare. You can add one or multiple inputs to build your workflow. A full list of supported inputs can be found here.
- Tools–Once you’re connected to your data, you can configure tools to prepare and transform your data. For example, you can filter for certain records using queries, integrate datasets by using joins, merge multiple datasets together, or calculate a geometry field to enable location. A full list of the available tools can be found here.
- Outputs–Once your data is prepared, it can be written to feature layers. You can create a new feature layer or update existing feature layers. For more detailed information on configuring data pipeline outputs, see the output feature layer documentation.
The image below shows an example workflow using the three elements:
- An input–In this case, a connection to a public Google BigQuery dataset for Citi Bike locations.
- A tool–This workflow uses the Create geometry tool to define the geometry for each location from coordinates.
- An output–In this example, writing the Citi Bike point locations to a feature layer.
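For readers who find code easier to follow than a diagram, the same input, tool, and output shape can be sketched in a few lines of Python. This is an analogy under stated assumptions, not the Data Pipelines implementation: a small in-memory CSV stands in for the BigQuery connection, the column names and coordinates are made up, and the "output" simply returns the features instead of writing a feature layer.

```python
import csv
import io

# Stand-in input: a tiny in-memory CSV instead of a real BigQuery connection.
# Column names and coordinate values are made up for illustration.
SAMPLE_CSV = """station_id,latitude,longitude
1,40.0,-74.0
2,40.5,-73.5
"""

def read_input(text):
    """Input: read rows from the source into a list of dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))

def create_geometry(rows, x_field, y_field):
    """Tool: add a GeoJSON-style point built from two coordinate columns."""
    out = []
    for row in rows:
        point = {
            "type": "Point",
            "coordinates": [float(row[x_field]), float(row[y_field])],
        }
        out.append({**row, "geometry": point})
    return out

def write_output(rows):
    """Output: stand-in that returns the features rather than writing a layer."""
    return list(rows)

features = write_output(
    create_geometry(read_input(SAMPLE_CSV), "longitude", "latitude")
)
print(features[0]["geometry"])  # {'type': 'Point', 'coordinates': [-74.0, 40.0]}
```

Each function consumes the result of the one before it, which mirrors how elements are connected in the pipeline diagram.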
You can also make a more complicated data pipeline, with multiple inputs, tools, and outputs.
Interactive data prep
Data Pipelines provides an interactive experience for investigating your data while building out your prep workflow. While working with your data, you may want to check that each step is completed as you expect. You can do this through the preview option. At each step, you can visualize your data in a table or a map to get a better understanding of how it has been processed, or identify any remaining processing steps that you need to do.
Previewing means you can easily identify any remaining steps in preparing your data. If you realize you’ve missed a step, you can update existing tool parameters or add or delete tools in your diagram.
Get started with Data Pipelines
You’ve seen how you can now connect to an external data source, use a suite of data preparation tools, and save the results in ArcGIS Online. Data Pipelines takes one of the most challenging parts of your GIS workflows and simplifies it by giving you the power to easily prepare and ingest your data. We hope that you feel empowered to try it out. To get started with Data Pipelines, check out the following resources:
- Documentation
- Tutorial: Create your first data pipeline
- Video: Take an introductory tour of Data Pipelines!
More information
Data Pipelines consumes credits based on the amount of time the editor session is active (active means in a connecting or connected state). While the session is active, you can continuously preview and run your data pipeline workflow. To learn more about credits in ArcGIS Online, see the Understand credits topic. For more specific details on how and when Data Pipelines consumes credits, see the Data Pipelines FAQ on credit consumption.
For more information and additional details about Data Pipelines, see the Data Pipelines documentation. Consider checking out the FAQ topic to find answers to any specific questions. For any other questions or suggestions, please post in the Esri Community forum where one of the Data Pipelines team members will be happy to help you out.
If you’re interested in data preparation workflows in ArcGIS Pro, see the blog Explore and Prepare Your Data with ArcGIS Pro Data Engineering.