The R-ArcGIS Bridge is the R integration for ArcGIS Pro; it enriches GIS workflows with the statistical analysis packages of the R language. arcgisbinding, the R package developed by the R-ArcGIS Bridge team, has new enhancements that we are excited to share at the 2021 Esri Developer Summit. In addition to our tech workshop on leveraging the R-ArcGIS Bridge, we will be showcasing the new features of arcgisbinding at the plenary.
Technology Highlights
- Working with R notebooks alongside ArcGIS Pro
- Creating interactive maps in R notebooks
- Calling geoprocessing tools from R
Problem Definition
Ecoregions are geographic regions of ecological systems based on vegetation, climate conditions, and land cover.
In Part 1 of this blog series, we learned about the famous Bailey’s Ecoregions map. This map is an expert-driven interpretation of the geography of US ecoregions, which was hand-drawn by US Forest Service researchers in 1994.
Bailey’s Ecoregions are organized hierarchically by size. The largest regions are domains, which group spatial units by their similarities in precipitation and temperature. Divisions are subgroups within domains and are defined by similarities in precipitation and temperature levels and patterns. Lastly, divisions are made up of provinces, which are differentiated by vegetation and other natural land cover.
Ecoregion provinces defined by Bailey are given in the map below:
Bailey's ecoregion provinces for the conterminous United States
Using datasets representing climate and vegetation conditions from the same period of the 1990s, we applied a series of regionalization (clustering) algorithms in ArcGIS and R to create several data-driven interpretations of these US ecoregions.
Working with R Notebooks Alongside ArcGIS Pro
We will use R notebooks inside an ArcGIS Pro Conda environment. The Conda package r-arcgis-essentials will be used to set up this computational environment. If you would like to learn the detailed steps of setting up an R notebook environment to run alongside ArcGIS Pro, please visit this blog.
Once the r-arcgis-essentials package is installed, it enables R notebooks, the R-ArcGIS Bridge, and commonly used spatial R packages, such as sf, sp, and raster.
With our Conda environment correctly configured to power an R notebook, we can bring in the Bailey’s Ecoregions as a spatial R data frame for visualization and analysis.
Working with Cloud-Based Data Sources
Feature services and image services can be directly read in as spatial R data frames using the R-ArcGIS Bridge. A detailed explanation about accessing remote data sources in R can be found in this blog. The Bailey’s Ecoregions feature service will be directly brought in as follows:
library(arcgisbinding)
arc.check_product()
base_ecoregion_url <- 'https://services3.arcgis.com/oZfKvdlWHN1MwS48/arcgis/rest/services/Ecoregions/FeatureServer/0'
base_ecoregion_obj <- arc.open(base_ecoregion_url)
Using the R-ArcGIS Bridge’s conversion functions, the arc-type data frame can be converted to commonly used spatial R data types such as sf or sp using arc.data2sf or arc.data2sp, respectively.
The Bailey’s Ecoregions dataset can be mapped interactively using the R-ArcGIS Bridge’s integration with esri-leaflet. The script below creates an interactive map of Bailey’s Ecoregions:
1. First, the data is converted to an sf object (the original arc object can also be used; however, the current version of leaflet requires the WGS84 projection):
base_ecoregion_arc <- arc.select(base_ecoregion_obj)
base_ecoregion_sf <- arc.data2sf(base_ecoregion_arc)
2. A color palette is defined for every unique province:
num.clust <- length(unique(base_ecoregion_sf$PROVINCE))
cluster.pal <- colorFactor(rainbow(num.clust), domain=base_ecoregion_sf$PROVINCE)
3. Lastly, a leaflet object is created using the sf object for Bailey’s Ecoregions and the associated color palette:
L <- leaflet(elementId = 'ecoregion_map') %>%
  addProviderTiles(providers$Esri) %>%
  addPolygons(data = st_transform(base_ecoregion_sf, 4326),
              fillOpacity = 1,
              color = ~cluster.pal(base_ecoregion_sf$PROVINCE),
              label = ~sprintf("Ecoregion Province: %s", base_ecoregion_sf$PROVINCE))
Note that the label parameter defines interactive, data-driven text that shows the name of the province being hovered over.
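To make the palette and label steps concrete, the sketch below reproduces the same idea in base R, without leaflet or a feature service. The province names are made-up placeholders, and palette_lookup is a hypothetical stand-in for the function that colorFactor returns. It illustrates the category-to-color mapping only and is not leaflet's implementation.

```r
# Hypothetical province names standing in for base_ecoregion_sf$PROVINCE
province <- c("Prairie Parkland", "Great Plains Steppe",
              "Prairie Parkland", "Colorado Plateau")

# One color per unique province, in the spirit of colorFactor(rainbow(n), domain = ...)
levels_found <- sort(unique(province))
palette_lookup <- setNames(rainbow(length(levels_found)), levels_found)

# Look up each feature's color by its province name
feature_colors <- unname(palette_lookup[province])

# sprintf is vectorized, so it yields one hover label per feature
feature_labels <- sprintf("Ecoregion Province: %s", province)
```

Features that share a province share a color, and each feature gets its own label string, which is what the color and label arguments of addPolygons rely on.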
Calling ArcPy Geoprocessing Tools from R
One of the R packages that we imported into our R notebook was reticulate. Reticulate is a commonly used package for calling Python functions from R. It is frequently used to call low-level Python functions and return their results as R data types, so that Python analysis can be performed from an R session. The new reticulate integration in the R-ArcGIS Bridge allows us to import ArcPy, Esri’s Python package containing hundreds of functions for spatial data science, data conversion and management, and map automation. Importing ArcPy lets us call and execute geoprocessing tools directly in the R notebook, side by side with our R code.
The geoprocessing tool we used for our first data-driven interpretation of the Bailey’s Ecoregions map is ArcPy’s Spatially Constrained Multivariate Clustering tool. Based on a set of attributes, this tool assigns spatial units with similar attribute values to the same cluster (region). It also lets the user impose a “spatial constraint” on the clusters, which ensures that each cluster is spatially contiguous.
The attributes used to create the clusters represent a time-series of different climatic and land-cover variables summarized in each spatial unit for the year 1994. The most impactful variables for defining ecoregions were determined through trial-and-error, and are discussed in more detail in the previous blog in this series:
1. Maximum FAPAR
2. Mean FAPAR
3. Minimum FAPAR
4. Range of FAPAR
5. Maximum LAI
6. Mean LAI
7. Minimum LAI
8. Range of LAI
9. Maximum Precipitation
10. Mean Precipitation
11. Minimum Precipitation
12. Range of Precipitation
13. Maximum Temperature
14. Minimum Temperature
15. Standard Deviation of Temperature
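Each attribute in this list is a summary statistic of a time series within a spatial unit. As a rough illustration of that reduction step, the base R sketch below collapses a made-up series into maximum, mean, minimum, and range per unit. The zone IDs, the values, and the summarize_zone helper are hypothetical, and the actual zonal statistics in this workflow were computed from gridded data rather than a small table like this one.

```r
# Hypothetical precipitation observations for two spatial units
monthly <- data.frame(
  zone_id = rep(c("A", "B"), each = 4),
  precip  = c(10, 20, 30, 40, 5, 5, 15, 15)
)

# Summary statistics mirroring the attribute list above
summarize_zone <- function(x) {
  c(max = max(x), mean = mean(x), min = min(x), range = max(x) - min(x))
}

# Split the series by zone, summarize each piece, and put zones in rows
zonal <- t(sapply(split(monthly$precip, monthly$zone_id), summarize_zone))
```

The result is a matrix with one row per spatial unit and one column per statistic, which has the same shape as the attribute table passed to a clustering tool.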
The following code snippet is used to call the Spatially Constrained Multivariate Clustering function from ArcPy:
ARCPY$stats$SpatiallyConstrainedMultivariateClustering
The R-ArcGIS Bridge’s reticulate integration allows us to write the result out to an in-memory feature class and seamlessly read it back in as a spatial R data frame:
skater_regions <- arc.select(arc.open(out.fc), fields = c('CLUSTER_ID'))
skater_regions.sf <- arc.data2sf(skater_regions)
Performing Ecological Regionalization Using vegclust
Our second data-driven interpretation of the Bailey’s Ecoregions map was created using the R package vegclust. The vegclust package provides methods for clustering ecological data, so it is well suited to this analysis of ecoregions.
Like the Spatially Constrained Multivariate Clustering tool, the vegclust function requires us to specify the attributes of interest. We define a generic (non-spatial) R data frame from the original Bailey’s Ecoregions:
vars <- c("FAPAR_MAX_ZONAL", "FAPAR_MEAN_ZONAL", "FAPAR_MIN_ZONAL", "FAPAR_RANGE_ZONAL",
          "FAPAR_STD_ZONAL", "LAI_MAX_ZONAL", "LAI_MEAN_ZONAL", "LAI_MIN_ZONAL", "LAI_RANGE_ZONAL",
          "LAI_STD_ZONAL", "PRECIP_MAX_ZONAL", "PRECIP_MEAN_ZONAL", "PRECIP_MIN_ZONAL", "PRECIP_RANGE_ZONAL",
          "PRECIP_STD_ZONAL", "TEMP_MAX_ZONAL", "TEMP_MEAN_ZONAL", "TEMP_MIN_ZONAL", "TEMP_RANGE_ZONAL",
          "TEMP_STD_ZONAL")
eco.vars <- st_set_geometry(ecoregions_data.sf[vars], NULL)
Clusters (regions) are defined using the following function, which specifies the number of clusters to create, and the clustering model. Given that we want each input polygon to be a member of only one cluster, we chose Hard C-Medoids (KMdd) as the clustering method:
eco_groups <- vegclust(x = eco.vars, mobileCenters=num.clust, method="KMdd", nstart=20)
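For readers without vegclust installed, the sketch below shows what a hard partition looks like using kmeans from base R as a stand-in. This is k-means, not the Hard C-Medoids (KMdd) method used above, and the data, column names, and the standardization step are assumptions made for illustration; the source does not say whether the variables were scaled.

```r
set.seed(42)  # random starts make results vary; fix the seed for reproducibility

# Hypothetical attribute table: 30 spatial units, two variables on different scales
eco_demo <- data.frame(
  PRECIP_MEAN = c(rnorm(15, 200, 10), rnorm(15, 900, 10)),
  TEMP_MEAN   = c(rnorm(15, 25, 1), rnorm(15, 8, 1))
)

# Standardize so that no single variable dominates the distance calculation
eco_scaled <- scale(eco_demo)

# Hard partition into k clusters; nstart repeats the fit from several random starts
k <- 2
fit <- kmeans(eco_scaled, centers = k, nstart = 20)

# Exactly one cluster label per spatial unit
cluster_id <- fit$cluster
```

As with the KMdd setting, every input row ends up in exactly one cluster, and nstart plays the same role of guarding against a poor random initialization.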
We then use arc.write to write our R ecoregions out to a local feature class for further analysis in ArcGIS Pro:
arc.write(out.fc.vegclust, ecoregions_data.sf, overwrite=TRUE)
Lastly, both data-driven ecoregion maps, created using ArcGIS and the vegclust package in R, are compared against the original, expert-driven Bailey’s Ecoregions map using the new Spatial Association Between Zones tool. For details, please refer to the original blog post on defining data-driven ecoregions.
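As a quick sanity check before running the tool, two zonings can also be cross-tabulated directly in R. The sketch below is only a rough illustration and does not reproduce the Spatial Association Between Zones tool: the labels are made up, and it counts polygons without considering their area.

```r
# Hypothetical labels for six polygons under two zonings
expert_zone  <- c("P1", "P1", "P2", "P2", "P3", "P3")
cluster_zone <- c(1, 1, 2, 2, 2, 3)

# Rows are expert-drawn zones, columns are data-driven clusters
overlap <- table(expert_zone, cluster_zone)

# Share of each expert zone that falls in its best-matching cluster
best_match <- apply(overlap, 1, max) / rowSums(overlap)
```

A share of 1 means an expert zone sits entirely inside one cluster, while lower values indicate that the clustering splits that zone.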
Inspired? Come Join Us at 2021 Dev Summit
New enhancements to the R-ArcGIS Bridge make it possible to work with multiple programming languages, leverage functionality from ArcGIS Pro directly, and create interactive maps that visualize spatial data to its full potential. If you are interested in learning more, come visit the virtual booth at the 2021 Developer Summit, and check out our product page.