The world is producing data at an extraordinary pace, and this growth is expected to escalate with the widespread adoption of generative artificial intelligence. This has created a pressing need for skilled data scientists across industries and disciplines. For example, the US Bureau of Labor Statistics predicts that data scientist jobs will grow by 35 percent (about 59,400 positions) between 2022 and 2032.
Data scientists are also increasingly expected not only to interpret data but also, through spatial data science, to explain why and where things happen. This is where geospatial technologies and methodologies prove invaluable.
“The result is more powerful and actionable intelligence,” said Wendy Keyes, PhD, principal data scientist at Esri.
As part of Esri’s Professional Services division, Keyes specializes in customer enablement and consulting. She leverages data to help customers comprehend their challenges, make informed decisions, troubleshoot workflows, and assess methodologies.
“I love that I get to play with data all day,” Keyes said. “I’m like a kid in a candy store. But mostly I love the variety of things that I get to work on because I get to work across industries.”
As the need for spatial data scientists expands, so do the educational programs designed to train them. In this interview, Keyes provides insight into this burgeoning field.
This interview has been edited and condensed.
Q: What is spatial data science?
Keyes: Spatial data science combines advanced spatial concepts and GIS features such as location, proximity, topography, autocorrelation, and connectivity with data science tools and processes such as AI, machine learning, advanced statistics, and quantitative solutions. These tools include clustering, neural networks, deep learning, and computer vision.
Spatial data science is part of the larger field of data science. The way I think of it is to break it down into capital-D and lowercase-d roles. A data scientist with a capital D has a comprehensive set of skills in scientific processes, mathematics, scientific methodologies, and advanced problem-solving. Data scientists perform various tasks, including data exploration, advanced statistical modeling, experimentation with AI, and building and tuning machine learning models. Having a geospatial skill set in this field can further expand job opportunities, allowing individuals to apply their knowledge in new contexts and solve diverse problems. Capital-D data science careers span various industries, including environmental conservation, urban planning, public safety, technology, and business.
In comparison, lowercase-d data scientists aren’t part of a scientific discipline, yet some of the tasks they perform are considered data science tasks and may require spatial data science skills. These roles include business intelligence analysts, data analysts, and computer vision specialists.
Q: Why are spatial data science skills becoming more important?
Keyes: There’s so much data from Internet of Things devices, telemetry, and satellite imagery—we need to make sense of it all. Technologies allow us to process massive amounts of data and do complex analyses to understand the world better. This is where spatial data comes in. Many of our challenges have a spatial component. For instance, the rapid melting of ice caps is not just a matter of rate. We must also consider the volume of the oceans, slopes, landmasses, and where the water will flow. This involves physics, data, and geography.
Spatial data science offers a crucial perspective for addressing complex issues like climate risks and social justice. It complements other fields such as epidemiology and sociology by analyzing spatial relationships. This integration contributes to more comprehensive solutions for multidimensional problems.
Q: What are some common misconceptions about spatial data science?
Keyes: How we talk about our work varies by person. Even within the data science field, terms like GeoAI and spatial data are misunderstood by traditional capital-D data scientists because of learning barriers and a lack of knowledge about spatial science applications.
Spatial data science involves more than just overlaying data and mapping for visualization or plotting information on a map. Similar to how the medical field encompasses various specializations such as internal medicine, general practice, oncology, cardiology, and neurology, spatial data science is a distinct specialization within the broader field of data science.
This variability can lead to misconceptions and confusion about what a data scientist’s job entails, and [also] what jobs are not data scientist roles but [could] use some data science skills. And these misconceptions extend into the hiring process. Many recent graduates may have experience with certain aspects of data science or familiarity with spatial data. However, they often lack the broad skill set required for a role as a capital-D data scientist.
Q: What advice do you have for aspiring data scientists?
Keyes: Before pursuing a data science career, students should ask themselves if they enjoy problem-solving and innovation. To assess this, they should consider how they approach assignments. Do they just complete them for a grade, or do they view them as challenges to showcase their best work? Do they keep working on a problem after the grade is assigned?
When considering a career in data science, it’s important to decide if you want to focus on theory or apply your knowledge to solve specific problems. This distinction will impact how you choose your studies.
Individuals pursuing this career should possess quantitative and programming skills. This means having an intuition about the data, anticipating findings, and spotting code errors to evaluate results better.
I’ve always had very strong quantitative skills and a passion for engaging in dialogues with data. I love uncovering the stories that data tells us and deriving insights from them. I have the best job in the world.
Q: What factors should students consider when selecting a university’s spatial data science program?
Keyes: Many of today’s data science programs are new due to the recent popularity of and demand for data scientists.
I would advise students to look for rigorous and accredited programs that develop mathematical prowess and a foundation in curiosity, problem-solving, innovation, and a scientific approach. There should be a balance of methodology and theory behind the program. It’s important to note that it doesn’t necessarily have to be a data science program. Many other courses of study, such as statistics or physics, can provide you with the necessary skill sets.
In addition, college students should examine the coursework. Is the focus more on theory or application? Is it engaging enough that you can envision yourself studying it for a significant amount of time? It’s also advisable to review Hugging Face, a collaborative community website for AI and machine learning resources, and compare what’s being discussed there with the program’s curriculum to see whether the curriculum is up to date.
If spatial data science is what you want to specialize in, it’s worth researching the curriculum to see how the university has adopted spatial data. Is it only mentioned in a course title? Or from reading the course description, do you get a sense that you will learn how location and spatial relationships affect methodologies? In other words, does the program clearly highlight what makes spatial data unique and different from traditional statistical concepts?
Read the publications of the faculty teaching in that program. This will give you an idea of their work, interests, and understanding of spatial analysis as well as research opportunities for assistants.
Lastly, does the program require a comprehensive project, thesis, capstone, or dissertation? Don’t just aim for a good grade. As a future data science practitioner, you need to finish the job. A comprehensive project builds your portfolio and shows [that] you can see a challenge through.
Note: Esri offers a wide variety of massive open online courses, tutorials, and other resources for geospatial data science.