At the Esri 2020 Developer Summit plenary I presented a web app, One Ocean, which allows you to interactively explore data from Esri’s open Ecological Marine Unit (EMU) dataset. This app is one example of how you can prepare a massive dataset for data exploration in the browser using the ArcGIS API for JavaScript (ArcGIS JS API).
The app allows the user to explore temperature, salinity, direction, and velocity at various depths throughout the world’s ocean. It renders the data in the Spilhaus projection, which displays what we typically consider multiple oceans as one continuous body of water.
You can view the five-minute demo of this app as it was presented at the Developer Summit in the video below.
Jeremy Bartley and I recorded a one-hour Developer Summit virtual session, Best Practices for Building Web Apps that Visualize Large Datasets, in which we discuss and demonstrate how to optimize large datasets for web applications. Check out the video below.
You can also access the session slides and demos in my Dev Summit 2020 GitHub repo. This blog will highlight some of the steps that went into creating the One Ocean app to demonstrate the principles Jeremy and I discussed in the session above.
Get to know your data
I’ve been interested in the EMU data since it was made available to the public a few years ago. In fact, anyone can freely download the full dataset as an ArcGIS Pro package.
Upon downloading the package, I came to the daunting realization that I was dealing with a massive dataset – a 4 GB geodatabase containing nearly 52 million points.
These are the key points I learned prior to processing it:
- The data is represented as 52 million points in a single layer
- It takes up 4 GB of storage
- Points are spaced every ¼ degree of latitude/longitude
- Up to 102 points are stacked at each location (from sea surface to ~5,000m depth)
- There are 12 attributes per point (temperature, salinity, phosphate, oxygen, silicates, EMU cluster, etc.)
Since data of that size is too large to work with in the browser, Jeremy and I brainstormed ways to work with it in a meaningful way.
Data processing
Users frequently ask about the limit for the number of points or features the ArcGIS JS API can render in a web application. Ultimately, it depends on a number of factors: the device, the browser, the speed of the internet connection, network latency, etc. But 4 GB of data is too large in any scenario.
Our goal was to reduce the size to be as small as possible without compromising the story or the desired experience. That involved processing our data using the following methods:
Table pivot
Our first order of business was to reduce the number of rows in the table. Since the original layer has up to 102 points (or rows) at each x/y location, we felt we could flatten the table to one row for each x/y coordinate.
Preparing all the attribute data for each geometry in a single row allows the ArcGIS JS API to update the underlying data values driving the visual properties of the renderer at up to 60 fps without reprocessing geometries on the GPU. This prevents flashing when the renderer updates with new data.
The goal was to take a table that looked like this:
And pivot it to a table with fewer rows (one unique row per location), but more columns…
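To make the pivot concrete, here is a simplified sketch with hypothetical values and column names (the real layer has many more variables and depth levels). The original table stores one row per point per depth level:

| x | y | depth_level | temp | salinity |
|---|---|---|---|---|
| -140.25 | 35.75 | 1 | 18.1 | 35.2 |
| -140.25 | 35.75 | 2 | 17.6 | 35.2 |
| -140.25 | 35.75 | 3 | 16.9 | 35.1 |

The pivoted table stores the same values in a single row per location:

| x | y | temp_1 | temp_2 | temp_3 | salinity_1 | salinity_2 | salinity_3 |
|---|---|---|---|---|---|---|---|
| -140.25 | 35.75 | 18.1 | 17.6 | 16.9 | 35.2 | 35.2 | 35.1 |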
Conveniently, my colleague Keith VanGraafeiland had already done this. By using a series of spatial joins in ArcGIS Pro, he was able to create a new layer with one point per x/y location. Now our data looked like the following:
- The number of points was reduced to 677,109 features! The layer now contains one row per x/y location, with fields for each attribute at each depth level.
- This introduced a new problem: the layer now has 1,224 fields, or 102 fields per variable, one for each depth level (e.g. temp_1, temp_2, temp_3, etc., for the temperature at depth level 1, depth level 2, and so on at each location).
Geometry thinning
While the table pivot significantly reduced the number of points, 677,000 features is still too many for a global extent. Knowing each point was ¼ degree apart, we experimented with thinning points to every ½ degree.
We accomplished this using a Select Layer By Attribute operation in ArcGIS Pro with the SQL expression in the image below.
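Conceptually, the expression keeps only the points whose coordinates fall on ½ degree increments. A hypothetical version, assuming fields named latitude and longitude and a grid aligned to exact ¼ degree multiples, might look like this:

```sql
-- Hypothetical sketch: keep only points on ½ degree increments.
-- Assumes latitude/longitude fields and points at exact ¼ degree multiples.
MOD(CAST(latitude * 4 AS INTEGER), 2) = 0 AND
MOD(CAST(longitude * 4 AS INTEGER), 2) = 0
```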
This reduced the number of points to 84,711 features, symbolized with blue points in the image below.
While the data looks coarser at this scale, the ½ degree resolution still works great at a worldwide extent.
Attribute thinning
Feature services are limited to 1,024 columns, so I removed about 300 fields, bringing the layer down to 960 fields so it could be published.
After publishing, I created a hosted feature layer view and limited its fields to only the attributes I was concerned with: temperature, salinity, direction, and velocity.
After all the processing we were able to reduce the data to a manageable 84,711 points and 317 attributes. However, that many attributes for that many points still results in more than 311 MB of uncompressed data.
Thanks to performance improvements made in the ArcGIS JS API and hosted and enterprise feature services over the last year, we can reduce that size down to 145 MB of compressed data.
How the ArcGIS JS API and ArcGIS Online improve web app performance
The ArcGIS JS API and ArcGIS Online do a number of things in tandem to render more data more efficiently in the browser. You can see these at work by opening your developer tools and inspecting the network traffic for the feature service queries.
The image below shows an example of a query the ArcGIS JS API makes to a hosted ArcGIS Online feature service.
PBF – Data is requested in protocol buffer (PBF) format, a compact binary encoding that improves vertex encoding and therefore speeds up drawing times.
outFields – The ArcGIS JS API only requests the fields required for rendering, plus any specifically requested by the developer. This can significantly reduce payload size, improving download speed. (In the One Ocean app I requested all fields for reasons mentioned in the Fast, client-side analytics section below.)
Quantization – Quantization parameters ensure the client only requests one vertex per pixel based on the view tolerance. The query in the image above indicates each pixel had an approximate size of 4.9 km. Vertices in the returned geometries are guaranteed to conform to that tolerance, except for point data. Since point geometries aren't dropped from the response, quantization doesn't make much of a difference in performance for the One Ocean app. However, it can have a major impact in reducing payload size for polygon and polyline geometries.
Tiled requests – The `resultType: tile` parameter is the API's way of telling the server the query can be repeated across multiple clients and should therefore be cached as a feature tile.
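Put together, a feature tile query carries parameters along these lines (a hedged sketch with illustrative values, not the app's exact request):

```
f=pbf
resultType=tile
outFields=*
quantizationParameters={"mode":"view","originPosition":"upperLeft","tolerance":4891.97,"extent":{...}}
```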
The responses for these requests will look like those in the image below. Note the outlined columns.
Content-Encoding – Data is compressed using Brotli compression for efficient data transfer.
Caching – The `x-cache: Hit from cloudfront` header indicates the response comes from the CDN cache, and `x-esri-ftiles-cache-compress: true` indicates the response is cached as a feature tile on the server. ArcGIS Online takes advantage of several tiers of feature tile caching on the server and CDN (for public layers only). This reduces load on the server and underlying database, significantly improving performance. Paul Barker wrote an excellent blog detailing how this works.
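In text form, the relevant response headers look like this:

```
content-encoding: br
x-cache: Hit from cloudfront
x-esri-ftiles-cache-compress: true
```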
All of these concepts help reduce the time it takes to load the data. This is especially evident in large layers, where these improvements go a long way toward improving performance.
Fast, client-side analytics
Despite those performance improvements, the One Ocean app still makes heavy queries, totaling up to 145 MB of data. That requires the user to wait about 30 seconds before they can use the app in a desktop browser (we chose to block mobile devices from loading the app because of the enormous size of the data). While 30 seconds is a long time, the wait is worth it. Once all that data is loaded, you can explore authoritative data spanning the vast global ocean without the need for fancy software or advanced knowledge of oceans. An ocean full of data is at your fingertips!
While the ArcGIS JS API dynamically requests data as needed by the renderer, I opted to request it all up front by setting the layer's `outFields` property to `["*"]`.
```js
// FeatureLayer is loaded from the esri/layers/FeatureLayer module
const emuLayer = new FeatureLayer({
  portalItem: {
    id: "9ff162e1ea1a4885acd8c4ceeab588a5"
  },
  // request all fields up front so later renderer updates
  // never have to fetch additional attributes from the server
  outFields: ["*"]
});
```
Even though this makes for a very long initial load time of just under 30 seconds, it allows the app to update the underlying data values driving the visual properties of the renderer without reprocessing geometries on the GPU or requesting additional attributes from the server.
Every time the user explores the data by switching a variable or moving the depth slider, the renderer updates in the following way:
```js
emuLayer.renderer = {
  type: "simple",
  symbol: {
    type: "web-style",
    name: "tear-pin-1",
    styleName: "Esri2DPointSymbolsStyle"
  },
  visualVariables: [{
    type: "color",
    // temp_1, temp_2, temp_3, etc.
    field: `temp_${getLevelFromDepth(depthSlider.values[0])}`,
    stops: [
      { value: 0, color: "#1993ff" },
      { value: 5, color: "#2e6ca4" },
      { value: 15, color: "#403031" },
      { value: 27, color: "#a6242e" },
      { value: 30, color: "#ff2638" }
    ]
  }, {
    type: "size",
    // Velocity_1, Velocity_2, Velocity_3, etc.
    field: `Velocity_${getLevelFromDepth(depthSlider.values[0])}`,
    minDataValue: 0,
    maxDataValue: 0.5,
    // update size based on view scale
    minSize: {
      type: "size",
      valueExpression: "$view.scale",
      stops: [
        { value: 7812500, size: "12px" },
        { value: 15625000, size: "8px" },
        { value: 31250000, size: "6px" },
        { value: 62500000, size: "4px" },
        { value: 125000000, size: "2px" },
        { value: 250000000, size: "1px" }
      ]
    },
    maxSize: {
      type: "size",
      valueExpression: "$view.scale",
      stops: [
        { value: 7812500, size: "32px" },
        { value: 15625000, size: "20px" },
        { value: 31250000, size: "16px" },
        { value: 62500000, size: "12px" },
        { value: 125000000, size: "6px" },
        { value: 250000000, size: "4px" }
      ]
    }
  }, {
    type: "rotation",
    // Direction_1, Direction_2, Direction_3, etc.
    field: `Direction_${getLevelFromDepth(depthSlider.values[0])}`,
    rotationType: "geographic"
  }]
};
```
Note the field names in the snippet above. It's not the color and size properties that change with each slider update; rather, it's the attribute names driving the visualization that update.
Each slider value corresponds to a depth level used in the field names (e.g. `temp_1` is the temperature for a point at the first depth level, `temp_2` is the temperature at the same location at the next depth level, and so on). For each slider value change, I simply update the renderer to point to the field names for the depth level indicated by the slider, as sketched below.
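As a minimal sketch of that wiring (not the app's exact code; setDepthLevel is a hypothetical helper, and getLevelFromDepth is the same function used in the renderer above):

```js
// Hypothetical helper: swap the depth level suffix on every visual variable.
function setDepthLevel(level) {
  const renderer = emuLayer.renderer.clone();
  renderer.visualVariables = renderer.visualVariables.map((vv) => {
    const updated = vv.clone();
    // e.g. temp_3 becomes temp_7; color stops, sizes, and rotation stay intact
    updated.field = updated.field.replace(/_\d+$/, `_${level}`);
    return updated;
  });
  emuLayer.renderer = renderer;
}

// Update the renderer whenever the user commits a new slider position
depthSlider.on("thumb-change", () => {
  setDepthLevel(getLevelFromDepth(depthSlider.values[0]));
});
```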
Then the ArcGIS JS API takes over and quickly makes the update so we can see how temperature, salinity, current speed, and current direction change for each location at various depths of the world’s ocean.
Wrap up
There’s a lot the ArcGIS JS API and ArcGIS Online do to help improve web app performance. But that doesn’t excuse the app developer from doing what they can to reduce the size of large datasets in the browser.
Remember to do the following to reduce the size of your data if possible:
- Familiarize yourself with the data
- Preprocess the data (e.g. table pivots) to leverage powerful JS API capabilities
- Thin the geometries
- Thin the attributes
Here are some related resources to check out: