Summer 2005
Managing Spatial Data in ArcGIS 9.2
In recent years, two major trends have had a profound impact on GIS data management. First, data volumes have expanded greatly and are continuing to increase dramatically. Ten years ago, 100 GB was thought to be a large GIS database. Today, 10 TB is considered a large GIS database, and it will not be long before GIS users are working with databases measured in petabytes. Second, GIS enterprises are becoming increasingly distributed, that is, users at different (sometimes mobile) geographic locations want to use data stored at several different locations. This has major data management implications. Users in different locations want transactional access to common enterprise databases; therefore, databases in different locations must be synchronized. Similarly, some users want to take parts of an enterprise database into the field for viewing or editing.
Geodatabases Are Esri's Solution for Managing Geographic Information
Esri uses the term "geodatabase" to describe an integrated collection of geographic information. Geodatabases are managed using ArcGIS software and can store and retrieve virtually any type of spatially referenced data. Geodatabases can manage large volumes of data with high performance in a multiuser environment. Geodatabases manage all the basic geodata types, including simple feature vector data types (points, lines, and polygons), as well as more advanced features that use rules for defining relationships, topologies, and behaviors of features. Geodatabases also manage feature attributes, feature-linked annotation, terrains, survey measurements, addresses, 3D objects, CAD drawings, and images. ArcGIS software is used to maintain data quality and make it easy to control the editing of work flows. The result is that geodatabases can model the world better than any other geographic database management environment.
Geodatabases Manage Transactions: Updates and History
Geodatabases implement advanced multiuser access procedures (versioning) that manage the long transactions and design alternatives common in GIS applications, such as land administration and utility work order applications. These transactions can last for long periods of time: minutes, hours, weeks, and even years. Geodatabases must also allow multiple participants to reconcile updates that may conflict. Esri's ArcGIS software and the geodatabase environment manage versions in a seamless and high-performance environment.
Geodata Management Is Key for a Successful Enterprise GIS Implementation
Data management is a very important and serious business. Data integrity and security are vital because building and maintaining spatial databases are time consuming and expensive and can be central to the core mission of some organizations (e.g., land records administration). Data management often constitutes a large portion of the GIS activities of enterprise GIS organizations.
Geodatabases Leverage DBMS Technology
Esri generally recommends that large multiuser geodatabases be stored and managed using industrial-strength DBMS technology. ArcGIS has been engineered to work openly with a variety of different DBMS platforms, including IBM DB2, IBM Informix, Microsoft SQL Server, and Oracle. This gives Esri users flexibility and avoids the requirement to standardize on a single DBMS vendor. This open platform strategy has been implemented using Esri's ArcGIS data access technology, commonly known as ArcSDE. ArcSDE is optimized to give maximum performance when accessing data in a DBMS from any of Esri's products: ArcGIS Desktop, ArcGIS Engine, and ArcGIS Server. ArcSDE stores GIS features in a DBMS using a binary storage format, which experience has shown provides the fastest query performance and the most compact storage (least disk space) of any known technology. At ArcGIS 9.2, Esri will be able to work fully with the Oracle spatial data type that implements simple feature support in the Oracle DBMS. Although the Oracle type storage option is generally slower, ArcGIS supports the same GIS functionality when it is used. This is in addition to the existing Esri spatial type implementations for data storage and access on IBM DB2 and Informix databases.
GIS Data Management Requires More Than DBMS Technology
While DBMS technology offers excellent tools for managing tabular data and providing distributed access, it does not deal with the many significant issues introduced by GIS work flows (e.g., data compilation and editing, ensuring spatial data integrity, supporting long transactions, and reconciling versions in distributed databases). Where sufficient for the task, core DBMS technology is used for data management (e.g., administration, accessing data, database replication, and security). However, Esri uses GIS-specific functionality for complete spatial data management work flows.
These work flows make it possible for multiple users to access a central database from any wired or wireless network connection (subject to the usual access privileges). For example, users can check out/check in a version for use in a remote, field-based editor.
ArcGIS 9.2 Data Management Enhancements
At ArcGIS 9.2, Esri is making some major improvements and enhancements to the data management capabilities of the software as part of a long-term commitment to provide state-of-the-art spatial data management capabilities to GIS users.
Enterprise Integration
Recognizing that spatial data has many unique characteristics and management requirements, Esri's goal is to manage it using the same industry-standard DBMS products that are used to manage other enterprise data assets. Esri's approach is to build on top of these enterprise technologies to accommodate the specific workflow requirements of advanced GIS applications. ArcGIS 9.2 adds three new features for integrating GIS data with other enterprise data.
Nonversioned Editing
Prior to ArcGIS 9.2, a geodatabase could only be edited by multiple users if it had been versioned. This included all spatial and nonspatial database tables. While this presented few problems for GIS-only databases, it did create some difficulties for organizations that used the same database for GIS and non-GIS applications. At ArcGIS 9.2, multiuser editing is possible without versioning. Esri has added a short transaction editing model for simple feature databases that can be applied on a table-by-table (feature class-by-feature class) basis. In this way, both GIS and non-GIS applications can share access to a common DBMS without adding the overhead of versioning to those applications that do not need it.
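The short transaction idea can be illustrated with a generic database example. The following is only a minimal sketch, not ArcGIS or ArcSDE code: it uses Python's standard sqlite3 module as a stand-in for an enterprise DBMS, and the parcels table, its columns, and all function names are hypothetical. It shows an edit committed directly to the base table, so a second application reading the same table sees the change immediately, with no version to reconcile.

```python
import os
import sqlite3
import tempfile


def make_demo_db():
    """Create a throwaway database file with one hypothetical parcel row."""
    path = os.path.join(tempfile.mkdtemp(), "shared.db")
    conn = sqlite3.connect(path)
    with conn:
        conn.execute("CREATE TABLE parcels (id INTEGER PRIMARY KEY, owner TEXT)")
        conn.execute("INSERT INTO parcels VALUES (1, 'Smith')")
    conn.close()
    return path


def short_transaction_edit(db_path, parcel_id, new_owner):
    """Apply one edit to the base table and commit it immediately."""
    conn = sqlite3.connect(db_path)
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            cur = conn.execute(
                "UPDATE parcels SET owner = ? WHERE id = ?", (new_owner, parcel_id)
            )
            if cur.rowcount != 1:
                raise ValueError(f"parcel {parcel_id} not found")
    finally:
        conn.close()


def read_owner(db_path, parcel_id):
    """Read the current owner the way a separate, non-GIS application might."""
    conn = sqlite3.connect(db_path)
    try:
        row = conn.execute(
            "SELECT owner FROM parcels WHERE id = ?", (parcel_id,)
        ).fetchone()
        return row[0] if row else None
    finally:
        conn.close()
```

Calling make_demo_db(), then short_transaction_edit(path, 1, "Jones"), then read_owner(path, 1) returns the new owner from a separate connection. An edit that fails is rolled back and leaves the table unchanged, which is the behavior a non-GIS application sharing the table would expect.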
Versioned Data Replication
The challenge of providing distributed users access to federated databases (a single logical database spread over several network nodes) is addressed in ArcGIS 9.2, which allows a version of a geodatabase to be replicated to another geodatabase. Users can choose to replicate all or only some of the datasets in a version and can further restrict the data that is replicated using spatial and attribute queries. At ArcGIS 8.3, single-generation (checkout/check-in) replication was added to the software to allow users to take a portion of an enterprise database into the field, edit it, and then check in their changes. ArcGIS 9.2 addresses the more difficult problem of synchronizing two or more databases with multiple generations of edits to each database. For performance or enterprise workflow reasons, organizations sometimes need to have editable copies of databases in two or more locations. Synchronization requires that all edits made in each of the databases be transferred in a robust way to the other databases. ArcGIS 9.2 extends the checkout/check-in model, allowing the checked-out version to be periodically refreshed from the master geodatabase and allowing multiple check-ins. Because of the need to handle long transactions and be able to reconcile conflicting edits, the replication procedure builds on top of the Esri versioning model. In fact, database changes are moved between the databases as changes to versions, and the standard reconcile and post mechanisms are used to integrate (synchronize) the changes. In this way, changes can be moved between databases without a network (e.g., using DVDs) or by periodic transfer over loosely coupled, slow networks, such as the Internet.
Spatial SQL for Oracle
ArcGIS 9.2 adds full support for a spatial SQL interface for the geodatabase when implemented with Oracle.
This interface was requested by many users and allows access to ArcSDE simple features using standard ISO Multi-Media/Open Geospatial Consortium SQL statements. Esri already supports spatial SQL interfaces for IBM DB2 and Informix. This interface allows users to access, create, update, and delete spatial data via standard SQL, the de facto database access language. The interface also accesses the geodatabase environment with an open and standards-based set of functions. The SQL capability makes Esri's geodatabase features available for use by any SQL developer or user and provides open access to geodatabases. Data is stored using the Oracle large object data type. Esri's spatial SQL does not require Oracle Locator or the Oracle Spatial extension and does take advantage of the faster performance, indexing, and greater data compression of the ArcSDE technology.
Information Model Enhancements
ArcGIS 9.2 features several significant enhancements to the geodatabase information model:
Archiving Geodatabase History
A key goal of ArcGIS 9.2 is to allow efficient storage and query of historical database states. Prior to 9.2, this was possible using the geodatabase versioning model, but performance degraded over time as history archives grew in size. A new implementation that extends the geodatabase schema with new tables, a mechanism to automatically transfer changes to the archive tables, and a new query interface makes it easy to store and query database history. For example, querying a history-enabled geodatabase to find the area of all land parcels owned by a specific individual is a simple spatial and attribute query.
Terrain
Esri is introducing an innovative approach to working with massive terrain datasets (billions of points) at ArcGIS 9.2. Terrains are defined as a collection of feature classes that contain terrain elements (e.g., mass points; break lines; and special polygons, such as lakes). Like all feature classes, they are stored in geodatabases.
Terrains are defined on the fly as the user displays and queries the data. High performance is achieved by using terrain pyramids. In this way, large terrains built from vast lidar and other datasets can be handled with ArcGIS.
Double Precision
The entire GIS stack (ArcGIS Desktop, ArcGIS Engine, ArcGIS Server, the geodatabase, and ArcSDE) now stores and processes data using double-precision mathematics (technically 53 bits). This allows a single spatial domain to be used for the whole globe, which greatly simplifies the creation and definition of spatial datasets in a geodatabase.
Unicode
Single and multibyte characters are also now fully supported throughout the ArcGIS software stack.
Geodatabase Storage Format
In the past, users have always equated geodatabases with storing data in a DBMS. At ArcGIS 9.2, this will no longer be true because Esri is introducing a file-based implementation of the geodatabase.
File-Based Geodatabase
At ArcGIS 9.2, the complete functionality of a personal geodatabase and its complete information model have been implemented on top of a file system. File-based geodatabases fully support vector; raster; terrain; annotation; and all other geodatabase data types, rules, and relationships without the performance and geodatabase size limitations of a Microsoft Access database. In performance tests to date, the file-based geodatabase implementation not only outperforms the Microsoft Access personal geodatabase, but it also outperforms the shapefile for display and query operations. As with the Microsoft Access personal geodatabase, the file-based geodatabase has a single-user editing model and does not support versioning. The file-based geodatabase will be a standard part of core ArcView, ArcEditor, ArcInfo, ArcGIS Engine, ArcIMS, ArcMap Server, and ArcGIS Server. While Microsoft Access personal geodatabases will continue to be a supported option, users will be encouraged to adopt file-based geodatabases as the native format for ArcGIS at 9.2 and beyond (converting data from Microsoft Access geodatabases to file-based geodatabases is a simple copy/paste operation). Since they can be compressed and are cross platform (Linux, Solaris, and Windows), file-based geodatabases are a good choice for data publishing. As it did with shapefiles, Esri is providing an open API that will allow anyone to create and use file-based geodatabases.
Conclusion
Clearly, ArcGIS 9.2 encompasses major enhancements to the already substantial data management capabilities of the ArcGIS platform.
Optimized performance, closer enterprise integration, support for a wider range of data types and work flows, and a low-cost, easy-to-use geodatabase option are some of the many advantages ArcGIS 9.2 offers a wide range of users. ArcGIS 9.2 also includes major enhancements in mapping and visualization, spatial analysis and modeling, and the developer framework that will be the subject of other articles.
See also "Open Geodatabases."