Epidemiologists have the critical task of investigating the patterns and causes of disease and injury. This investigative work begins with a cohort study, a type of epidemiological study in which a group of people with a common characteristic is followed over time to discover how many reach certain health outcomes. During the COVID-19 pandemic, public health surveillance became critical, and a state health agency was ready to respond.
One important aspect of this research was understanding the impacts of COVID-19 on maternal and infant health outcomes. Maternal health epidemiologists first needed to identify women who were pregnant when they contracted COVID-19. The agency partnered with Esri to modernize its disease surveillance program and used ArcGIS Knowledge to accelerate cohort identification.
Challenge
To create the 2020 cohort of 7,000 people for this study, the maternal health epidemiologists reviewed 3 million case records, 30 million lab records, and over a million public birth and death certificate records. This data resided in eight different databases and in varying formats, such as spreadsheets. The data review took the team two years and hundreds of hours to complete.
Data management consumed so much time because working through the data accurately was manual, slow, and inconsistent. Defining a cohort required a fully dedicated person, and the work was hard to repeat because methods differed depending on who did it. The team needed to significantly cut the time spent on data management to achieve precise and repeatable results, detect risk signals rapidly, and address clinical questions related to maternal health.
An epidemiologist believed graph data and graph analytics might be a promising way to analyze the kind of data they had.
“The data will get larger, the number of labs, access to data. The amount of data we need to analyze will only increase. Esri has a product without SQL database design for Public Health Services. I needed to stay with the times,” says an epidemiologist at the state health agency.
Solution
The agency turned to Esri and became an early adopter of ArcGIS Knowledge. ArcGIS Knowledge is an enterprise extension that seamlessly provides powerful new graph analysis capabilities to ArcGIS Enterprise. ArcGIS Knowledge would enable them to modernize the state’s integrated disease surveillance system by using a graph database as the foundation to interact with public health data from multiple data sources. The knowledge graph would help them better identify and validate the right candidates for the cohort among the 30+ million people by connecting names, addresses, and other relationships that may be different in lab records.
This process required epidemiologists to first determine, based on other supporting attributes, whether a woman in a lab result reported to the state was the same woman listed on a birth record for a new baby. To address this requirement, the department of health worked with Esri Partner Senzing, which offers entity resolution software for advanced data matching. This partnership allowed the agency to transition from earlier methods of identity resolution, which applied deterministic and probabilistic matching techniques to snapshots of data, to identity matching in real time as the latest information is reported.
After creating a new data pipeline and resolving identities across the various data records, the agency used the ArcGIS Knowledge graph database as a data warehouse for analysis of the cleaned data.
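To make the difference between the two earlier matching styles concrete, here is a minimal Python sketch. It is not Senzing's method or API. The record fields, sample values, and weights are all hypothetical, and the weighted string similarity is only a simple stand-in for a true probabilistic model.

```python
from difflib import SequenceMatcher

def normalize(value):
    """Lowercase and collapse whitespace so trivial differences do not block a match."""
    return " ".join(value.lower().split())

def deterministic_match(a, b, keys=("name", "dob", "address")):
    """Deterministic matching: every key must agree exactly after normalization."""
    return all(normalize(a[k]) == normalize(b[k]) for k in keys)

def similarity_score(a, b, weights=None):
    """Weighted string similarity in [0, 1]; a simple stand-in for probabilistic matching.

    The weights are illustrative assumptions, not values from the case study.
    """
    weights = weights or {"name": 0.5, "dob": 0.3, "address": 0.2}
    return sum(
        w * SequenceMatcher(None, normalize(a[k]), normalize(b[k])).ratio()
        for k, w in weights.items()
    )

# Hypothetical records: the same woman with a spelling variant and an abbreviated address.
lab_record = {"name": "Maria Gonzales", "dob": "1990-04-12", "address": "12 Oak St"}
birth_record = {"name": "Maria Gonzalez", "dob": "1990-04-12", "address": "12 Oak Street"}
other_record = {"name": "John Smith", "dob": "1985-11-02", "address": "400 Pine Ave"}

print(deterministic_match(lab_record, birth_record))  # exact rule rejects the pair
print(round(similarity_score(lab_record, birth_record), 2))
print(round(similarity_score(lab_record, other_record), 2))
```

The exact rule rejects the first pair because of a one-letter spelling difference, while the similarity score ranks it well above the unrelated record. Both approaches here run against a fixed snapshot of records, which is the limitation the real-time entity resolution described above is meant to remove.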
Benefits
Using the graph visualization tools in ArcGIS, the epidemiologist and team can now easily validate collected data and see the full picture of their selected cohort, including the location patterns of disease discovery and impact. The team can also get more details on any person of interest within the cohort for further investigation. Querying the knowledge graph for a cohort of people with a COVID-19 diagnosis became a more streamlined exercise that returned the 12,000 expected results for the 2021 cohort in seconds.
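The shape of such a cohort query can be illustrated with a toy in-memory graph in Python. This sketch does not use the ArcGIS Knowledge API. The relationship names, properties, sample data, and the 280-day pregnancy window are assumptions made only for illustration.

```python
from datetime import date, timedelta

# Hypothetical relationships: (person_id, relationship_type, properties).
relationships = [
    ("p1", "HAS_LAB_RESULT", {"result": "positive", "date": date(2021, 3, 1)}),
    ("p1", "GAVE_BIRTH", {"date": date(2021, 6, 15)}),
    ("p2", "HAS_LAB_RESULT", {"result": "positive", "date": date(2021, 3, 1)}),
    ("p2", "GAVE_BIRTH", {"date": date(2020, 1, 1)}),
    ("p3", "HAS_LAB_RESULT", {"result": "positive", "date": date(2021, 5, 20)}),
]

def find_cohort(relationships, year, gestation_days=280):
    """Return people with a positive result in `year` that falls within the
    assumed pregnancy window before one of their recorded births."""
    positives, births = {}, {}
    for person_id, rel_type, props in relationships:
        if rel_type == "HAS_LAB_RESULT" and props["result"] == "positive":
            positives.setdefault(person_id, []).append(props["date"])
        elif rel_type == "GAVE_BIRTH":
            births.setdefault(person_id, []).append(props["date"])

    cohort = set()
    for person_id, birth_dates in births.items():
        for birth in birth_dates:
            window_start = birth - timedelta(days=gestation_days)
            if any(
                window_start <= tested <= birth and tested.year == year
                for tested in positives.get(person_id, [])
            ):
                cohort.add(person_id)
    return sorted(cohort)

print(find_cohort(relationships, 2021))
```

Only the first person qualifies: the second tested positive after her recorded birth, and the third has no birth record. Once identities are resolved into single entities, a cohort definition reduces to a traversal over a person's relationships, which is why changing the definition becomes a matter of editing a query rather than repeating a manual review.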
With the new data warehouse and entity resolution workflows in place, the epidemiologist can define and find additional cohorts of pregnant women with COVID-19 in minutes instead of the hundreds of hours it took previously. These time savings allow the team of epidemiologists to focus on higher-value analyses.
Now, hundreds of other epidemiologists at the state health agency are also poised to leverage this disease surveillance modernization work, using the same graph data warehouse to find other cohorts of people impacted by known and future diseases with public health risks. With this new infrastructure, it may be possible to gain information quickly enough to establish preventative measures and save lives.
Contact your Esri representative to learn more about ArcGIS Knowledge.
Get started with ArcGIS Knowledge in this technical workshop. Learn how you can extend your workflows to explore, edit and visualize knowledge graphs, discover new data patterns through graph analysis, and more! Watch the introduction here.
See how ArcGIS Knowledge was used to track foodborne illnesses in Ohio by integrating knowledge graphs and analytics. ArcGIS Knowledge allowed field investigators to connect spatial and non-spatial data through entities and relationships. See how it works.