Today, humans are awash in an endless stream of online information channeled by social media platforms. The content, posted mainly by the general public, includes commentary, criticism, observations, photographs, and videos about virtually anything and everything. Often, the postings prove to be little more than a distraction in people’s hectic, day-to-day lives.
Still, social media is a powerful communication mechanism. Market research company Statista estimates that there are 2.77 billion regular users of social media today. Technology has driven this growth through progressively more powerful servers, faster internet speeds, and mobile devices that are increasingly common throughout the world. In addition, social media platforms often provide application programming interfaces (APIs) that allow developers to access their databases and create apps that make use of social media data. In turn, data scientists can aggregate and analyze social media posts in real time and apply that data to a wide range of studies. These include figuring out how to mitigate traffic congestion on roads, monitoring and responding to national and international events as they unfold, and detecting potential disease outbreaks and uncovering their impacts on society.
At San Diego State University (SDSU) in San Diego, California, the Center for Human Dynamics in the Mobile Age (HDMA) was founded in 2013 with the aim of transforming academic research into serviceable information that can lead to making real-time decisions and sound public policy changes. HDMA works across disciplines, including geography, computer science, civil engineering, sociology, public health, linguistics, management information systems, accounting, communication, social work, digital humanities, and public affairs. And it develops scientific theories and computational models for human dynamics (which includes people’s interactions on communication networks), big data, social media, and data science.
“Programs of this type point to the future of education because they have the potential to address real needs, rather than purely academic research,” said Ming-Hsiang Tsou, a professor of geography at SDSU and the director of HDMA.
Tsou also believes that geography is the key to understanding real-world big data and integrating it into people’s daily activities and operations to provide actionable solutions. That’s because geographically based data sources, such as GPS devices, environmental sensors, and other monitoring instruments, specify where things occur and provide contextual knowledge for big data.
“By its nature, this data is big, messy, unstructured, and noisy,” said Tsou. “The key concepts of geography—place, time, and scale—can help data scientists clean the noise, understand the context, and answer the questions when and where. This can provide greater insight and knowledge for data analytics.”
For example, analyzing a flu outbreak using Twitter posts that include a geographic location can help public health agencies allocate vaccines to the right places at the right time.
“Our research focuses on the location-based analysis of geotagged social media data,” said Tsou. “This allows us to identify hot spots and compare our results within city districts or between local regions for analysis and action.”
Recently, HDMA completed a project to determine urban land-use patterns in Beijing, China. Over six months, the group collected 9.5 million geotagged messages posted on the social media platform Sina-Weibo from the urban core areas of Beijing and compared them with 385,792 commercial points of interest (POI) from Datatang, a Chinese digital data content provider. To estimate urban land-use types and patterns, the team divided the urban core areas into a grid of 18,492 cells, each measuring 400 by 400 meters.
“By analyzing the temporal frequency trends of social media messages within each cell using the K-means clustering algorithm, we identified seven types of land-use clusters in Beijing: residential areas, university dormitories, commercial areas, work areas, transportation hubs, and two types of mixed land-use areas,” explained Tsou. “Text mining, word clouds, and the distribution analysis of POI were used to verify the estimated land-use types successfully. This methodology can help urban planners create and analyze up-to-date land-use patterns in a cost-effective manner and better understand dynamic human activity patterns within a city.”
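The workflow Tsou describes can be illustrated with a minimal Python sketch. It is not HDMA's code: the function names, the use of 24 hourly bins as the temporal profile, and the assumption that message coordinates are already projected to meters are all illustrative. Only the 400-meter cell size and the use of K-means come from the project description.

```python
import math
import random

CELL_SIZE_M = 400.0  # cell edge length reported for the Beijing grid


def cell_index(x_m, y_m, cell_size=CELL_SIZE_M):
    """Map projected coordinates in meters to a (column, row) grid cell."""
    return (math.floor(x_m / cell_size), math.floor(y_m / cell_size))


def hourly_profiles(messages):
    """Build one normalized 24-hour posting profile per grid cell.

    `messages` is an iterable of (x_m, y_m, hour) tuples (assumed format).
    """
    counts = {}
    for x_m, y_m, hour in messages:
        counts.setdefault(cell_index(x_m, y_m), [0] * 24)[hour] += 1
    return {
        cell: [n / sum(hours) for n in hours] for cell, hours in counts.items()
    }


def kmeans(points, k, max_iterations=100, seed=0):
    """Plain Lloyd's K-means; returns (labels, centroids)."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = []
    for _ in range(max_iterations):
        # Assignment step: attach each profile to its nearest centroid.
        labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        updated = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                updated.append([sum(col) / len(members) for col in zip(*members)])
            else:
                updated.append(centroids[j])  # keep an empty cluster in place
        if updated == centroids:
            break
        centroids = updated
    return labels, centroids
```

With real data, calling `kmeans` on the per-cell profiles with `k=7` would produce seven clusters, as in the study. Deciding which land-use type each cluster represents would still depend on the verification steps Tsou mentions, such as text mining and the POI distribution.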
HDMA has developed several computer programs to automatically or semiautomatically collect social media data from Twitter, Sina-Weibo, Google Places, and Reddit. The data is saved in MongoDB, a NoSQL database. NoSQL provides a way to store and retrieve data that is not modeled in tabular relations, the method used in relational or SQL databases.
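To make the contrast with tabular storage concrete, here is a small Python sketch of what document-style records might look like. The field names and values are hypothetical and are not HDMA's schema. The sketch uses plain dictionaries and the standard `json` module rather than a database driver, since MongoDB documents are JSON-like, and it writes the location as a GeoJSON point, the format MongoDB uses for geospatial queries.

```python
import json

# Two hypothetical post documents; every field name and value is illustrative.
geotagged_post = {
    "platform": "Twitter",
    "author": "user_123",
    "posted_at": "2018-02-01T08:15:00Z",
    "text": "Stuck in traffic again",
    "reposts": 4,
    # GeoJSON Point: coordinates are ordered [longitude, latitude].
    "location": {"type": "Point", "coordinates": [-117.0719, 32.7757]},
}
untagged_post = {
    "platform": "Reddit",
    "author": "user_456",
    "posted_at": "2018-02-01T09:02:00Z",
    "text": "Any road closures downtown?",
    "subreddit": "sandiego",  # a field the first document does not have
}

# Documents in one collection need not share the same set of fields.
collection = [geotagged_post, untagged_post]


def geotagged(documents):
    """Return only the documents that carry a location."""
    return [doc for doc in documents if "location" in doc]


if __name__ == "__main__":
    print(json.dumps(geotagged(collection), indent=2))
```

In a relational table, the missing location and the extra `subreddit` field would each require a nullable column or a separate table. In a document store, each record simply carries the fields it has.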
The data is also exported from MongoDB to the ArcGIS platform to create point and kernel density maps, among other analyses. In addition, HDMA has installed ArcGIS Enterprise for its GIS and ArcGIS GeoEvent Server capabilities for a special project it’s doing for the County of San Diego’s Office of Emergency Services. The project involves collecting and processing real-time traffic data feeds from Waze, the community-based traffic and crowdsourcing navigation app. GeoEvent Server will host the data layers created for an ArcGIS Online service the county is putting together.
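For readers unfamiliar with the kernel density maps mentioned above, the underlying idea can be sketched in a few lines of Python. This is a generic Gaussian kernel density estimate over planar coordinates in meters, not the ArcGIS tool or HDMA's workflow. The function names and the bandwidth parameter are assumptions made for illustration.

```python
import math


def kernel_density(points, x, y, bandwidth_m):
    """Gaussian kernel density at (x, y), in points per square meter."""
    norm = 1.0 / (2.0 * math.pi * bandwidth_m ** 2)
    total = 0.0
    for px, py in points:
        squared_distance = (x - px) ** 2 + (y - py) ** 2
        total += math.exp(-squared_distance / (2.0 * bandwidth_m ** 2))
    return norm * total


def density_grid(points, x_min, y_min, cell_size, n_cols, n_rows, bandwidth_m):
    """Evaluate the density at the center of every cell; rows run south to north."""
    return [
        [
            kernel_density(
                points,
                x_min + (col + 0.5) * cell_size,
                y_min + (row + 0.5) * cell_size,
                bandwidth_m,
            )
            for col in range(n_cols)
        ]
        for row in range(n_rows)
    ]
```

Each post contributes a smooth bump centered on its location, so cells near many posts receive high values and isolated posts fade into the background. A larger bandwidth produces a smoother, more generalized surface.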
The center has also created two software toolsets to analyze and display the data it collects from social media platforms. The first is SMART Dashboard, a search tool for geotagged social media messages. It monitors and aggregates the dissemination of information related to changes in social behavior and provides insight into how a local population is responding to an event or situation. It has been used to track the spread of Ebola, ovarian cancer clusters, wildfires, hurricanes, and marijuana legalization initiatives.
The second is GeoViewer, a web-based mapping app that visualizes the results of the geotagged social media analyses performed by HDMA. Its geospatial functions are easy to use and include the ability to display hot spot and cluster data layers; store multimedia images, including photos and videos; and map historic and real-time social media data. Tsou believes that this can be an important asset for emergency response.
“The metadata collected by social media platforms includes quite a bit of information, such as the identity of the author, when the post occurred, a geotagged location of the post, the content of the post itself, number of reposts, and so on,” said Tsou.
While all that data is incredibly useful, it also brings up concerns about privacy, which Tsou and his team are acutely aware of.
“Our research at HDMA is concerned about privacy issues, and we try to protect the users’ privacy as much as we can,” he said. “For example, our GeoViewer software includes geomasking techniques to randomize the actual geotagged locations of users within a 100-meter radius, even though this may slightly reduce the accuracy of our spatial analysis results.”
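One simple way to implement the kind of geomasking Tsou describes is to move each point a random distance in a random direction, capped at 100 meters. The Python sketch below is an assumption about how such masking could work, not GeoViewer's actual algorithm. The function names are hypothetical, and the offset uses a small-distance approximation on a spherical Earth.

```python
import math
import random

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius, spherical approximation


def geomask(lat, lon, max_radius_m=100.0, rng=random):
    """Return (lat, lon) displaced randomly within max_radius_m of the input."""
    # The square root makes displaced points uniform over the disc's area.
    distance = max_radius_m * math.sqrt(rng.random())
    bearing = rng.uniform(0.0, 2.0 * math.pi)
    north_m = distance * math.cos(bearing)
    east_m = distance * math.sin(bearing)
    # Convert meter offsets to degrees; valid for small offsets away from the poles.
    new_lat = lat + math.degrees(north_m / EARTH_RADIUS_M)
    new_lon = lon + math.degrees(
        east_m / (EARTH_RADIUS_M * math.cos(math.radians(lat)))
    )
    return new_lat, new_lon


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two latitude/longitude points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = (
        math.sin(d_phi / 2.0) ** 2
        + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2.0) ** 2
    )
    return 2.0 * EARTH_RADIUS_M * math.asin(math.sqrt(a))
```

The trade-off Tsou points out is visible here: a masked point can land up to 100 meters from the true location, which could place it in a neighboring cell of a fine-grained grid while leaving district-level or regional patterns largely intact.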
Tsou thinks that GIScience and data science will only become more tightly integrated.
“That will allow the creation of a new discipline that I call geospatial data science,” he said. “I see it as a transdisciplinary field that will extract knowledge and insight from geospatial big data using high-performance computing resources, spatial and nonspatial statistics, spatiotemporal analysis models, GIS algorithms, machine learning methods, and geovisualization tools.”
He also expects the Esri Geospatial Cloud to play a key role in this new field because of its comprehensive cyberinfrastructure.
“Its seamless technology stack includes a geodata hub for sharing assets and facilitating community engagement, cloud services, online analytic tools, real-time big data processing, and a nice set of presentation options,” Tsou explained. “I believe that geospatial data science will facilitate critical spatial thinking and problem solving for various applications and industries and enable the exploration of new scientific theories.”