GIS - Computational Problems: § 1: GIS Overview

Instructors: G.X. Ritter and M.S. Schmalz

In the vast majority of human societies that have developed in this historical epoch, land ownership or control is or has been the primary basis for wealth. Agriculture, access to fisheries, mining, and access to shelter are all constrained by control of land. Upon these basic activities civilization is constructed, including ancient institutions such as barter markets, universities, places of worship, or seats of government as well as more modern institutions such as manufacturing, technology, or finance. Today, we denote land control by two phrases -- land ownership and regulation of land use.

Background. Different cultures have different policies and methods for regulating land use. For example, in the United States, much land is publicly owned (e.g., over 70 percent of the land area in the states of Nevada and Alaska is directly owned and controlled by the Federal government). Hence, local, state, and federal governments have much to say about how land is used (e.g., zoning, pollution control, public works construction, etc.). In contrast, in the United Kingdom, where very little land is publicly owned, land use policy tends to be approached more from the perspective of consensus among private individuals or organizations who may be guided by a government advisory institution, group, or individual.

In order to optimize land usage for the benefit of society as well as the individual, one must know the attributes of a given land area. For example, in agriculture one might ask if the terrain is suitable for farming, what is the climate and rainfall, and who is (are) the landowner(s)? Also, what is the proximity of one's prospective farm site to nearby markets, and what is the flow of commerce (e.g., market potential) in those areas?

To help answer such questions, the discipline of cartography (i.e., the creation and study of maps) was united with computer science subdisciplines of database design and management, computer graphics, image processing, and with the mathematical subdiscipline of spatial analysis, to produce an area of study called geographic information systems (GIS). This subdiscipline of computer science did not emerge full-grown, but evolved as computer hardware and software became more capable and available. The GIS evolutionary process took approximately 30 years to reach its present state, and geographic information systems are still in a state of adolescence, as evidenced by numerous implementational problems. For example, algorithms for database query and search are not sufficiently fast to accommodate the very large amounts of GIS data currently accumulated. Similarly, workstations generally have insufficient computational power for merging GIS datasets in real time, with high accuracy. In some cases, dataset merging algorithms are not available that perform the desired types of feature detection, recognition, and combination.

In this course, we briefly examine the history and culture of GIS, then consider a number of key computational problems. The purpose of this course is to (a) provide orientation to the concepts and general practice of GIS, (b) provide sufficient background information to facilitate analysis of theory and system performance, and (c) pose research questions and support design and development at the leading edge of GIS theory and algorithms, analysis of GIS data, and efficient, accurate GIS implementation.

Definition. A geographic information system is:

A computer software system (with supporting hardware)
that manages data pertaining to land, water, and air resources,
such that one can store and retrieve such geographic information
as well as analyze stored information or additional parameters derived from stored information.

Section Overview. In this section, for purposes of orientation, we summarize basic concepts of GIS. It is important to note that this section does not have as its goal the teaching of technique for using any particular GIS software system. The material is structured as follows:

In Section 1.1, we trace the development of GIS from map databases to the present state-of-the-art systems with graphical map interfaces and integrated map imagery or positional analysis capabilities. Section 1.2 contains an overview of GIS applications, which range from resource management to determining the suitability of a given land area for setting up a particular type of business. In Section 1.3, we discuss the high-level organization of GIS systems and data, which introduces a summary of GIS software and data analysis tools in Section 1.4. The historic and detailed topic of map projections is overviewed in Section 1.5, with comments on the adaptation of classical map projection theory to the needs of GIS system designers. Resources for obtaining GIS map data and satellite/airborne imagery are discussed in Sections 1.6 and 1.7, with special emphasis devoted to on-line resources available via the World-Wide Web.

1.1. Brief History of GIS System Development.

The following synopsis is abstracted largely from Coppock and Rhind's excellent history of GIS presented in [Magu91], with details added from [Toml88].

Manual GIS systems evolved from the discipline of cartography, where architects or site designers needed to visually compare the building plan with the site survey. With more elaborate urban planning in the 19th and early 20th centuries, map overlays became popular. These were translucent paper (and later, plastic) sheets upon which maps were drawn or printed. For example, a map of site drainage could be overlaid on a topographic map of the site, or on a street plan. This could help the urban planner or site designer determine whether or not to locate houses or other buildings at a given place (e.g., close to a creek or river floodplain). The same overlay process can be used to superimpose structural data on an aerial photograph. This technique was often used by site designers or architects when presenting designs to a prospective client.

It was the experience of publishing atlases of national and international scope which convinced mapmakers that computers could provide cost-effective means of map drawing, cataloging, and analysis [Bick63]. Perhaps the earliest attempt to automate map production was the Atlas of British Flora, which used a punch-card tabulator to produce maps on preprinted paper from cards that recorded map coordinates of plant occurrences [Per62]. Although not repeated due to its primitive nature at a time when computers were evolving rapidly, this effort anticipated later (and frequent) practices of producing approximate maps via line printer. Slightly later work by Bertin (circa 1967) involved modification of IBM Selectric typewriters driven by punch-card readers to produce maps with proportional symbols.

In North America, the earliest ancestors of GIS appeared at the University of Washington in the early 1950s, where geographers and transportation engineers developed quantitative methods for analyzing transportation study data [Duek74]. In 1964, IBM introduced the System 360 computer, which was the first available general-purpose machine, having 400 times greater speed and 32 times greater storage than its predecessor, the IBM 1401. By 1965 the US Bureau of the Budget compiled an inventory of automatic data processing (ADP) in the United States (US), which noted the significant use of computers to handle land use and land title data [Cook66]. In 1967, a symposium on comprehensive unified land management systems at the University of Cincinnati was advised about usefulness and design constraints of ADP machines for these purposes.

The first demonstrations of address matching, computer mapping, and small area data analysis were provided through the 1967 New Haven Census Use Study [USBC69]. Launch of the DIME workshops in 1970 and the development and distribution of ADMATCH (address matching) software influenced the creation of the Spatially Orientated Referencing Systems Association (SORSA), which still holds international conferences. The increasing availability of computers to universities strongly motivated development of the quantitative revolution in academic geography in the early 1960s [Jame78, Huds79], particularly in the field of spatial analysis. These applications, despite their capability of handling geographic data, had little interaction with computer mapping, since the statistical methodology was primarily aspatial. An exception was the previously-mentioned making of crude maps using a line printer [Rush69].

In the 1960s, computers available to domestic government agencies and university laboratories had few graphics capabilities, generally operated in batch mode, and were quite expensive and unreliable by today's standards. Despite this, the US National Ocean Survey (NOS) was creating charts on mechanical plotters for production of bathymetric maps, and military organizations such as Rome Laboratory (then the Rome Air Development Center) and the CIA were active in this area [Diel98, Toml72]. By the end of the 1960s, map production by computer was widespread. Although little cost analysis information is available, it appears as though the automated methods were not yet competitive with manual map production [Toml85], due largely to the high cost of procuring and maintaining computer hardware.

Unlike the United Kingdom, where automated map digitization was in progress, with automated map production from 1973 onward, the United States Geological Survey's (USGS) Topographic Division did not implement automatic production of topographic maps until the 1980s, which severely hindered the development of many GIS projects in the US. However, the universities, hampered by lack of funds, continued to produce line printer maps. Although crude, this method allowed sufficient visualization capabilities that effort could be devoted to the development of primitive analysis software, which was designed to operate in conjunction with the cartographic systems.

In 1963, a grant was obtained from the Ford Foundation to initiate the Laboratory for Computer Graphics at Harvard University, where Fisher and his colleagues built a team of scientists, engineers, and programmers who eventually created SYMAP. This was a computer mapping package that could produce isoline, choropleth, and proximal maps on the line printer. It was particularly useful for analyzing census data, and was widely distributed (approximately 500 legal copies, about half of which were in universities). A subsequent package, CALFORM, which used pen plotters and produced more accurate maps, was less successful, probably due to the high cost of plotter hardware. SYMAP was the first widely distributed package for handling geographical data and introduced large numbers of users to the potential of computer mapping. Steinitz and Sinton produced a cell-based program, GRID, which permitted overlays of data in conjunction with SYMAP. The Laboratory also produced a number of professional cartographers and computer scientists who contributed to the design and construction of ODYSSEY, the prototype of contemporary vector GIS [Chri88].

At approximately the same time that Fisher was developing computer mapping at Harvard, Tomlinson (working with the Canadian government) guided the creation of what may have been the first GIS [Toml88]. Indeed, Tomlinson is often hailed as the "father of GIS", due to his persuasion of the Canadian Government that the creation of the Canada Geographic Information System (CGIS) in 1966 was worthwhile. This effort dated from the early 1960s, when Tomlinson worked for Spartan Air Services and had numerous conversations with colleagues and Department of Agriculture (DOA) administrators regarding the utility of digital computers in mapping. Tomlinson eventually worked within the Agricultural Rehabilitation and Development Administration (ARDA) which, in cooperation with IBM, led to the following significant developments in GIS technology:

A drum scanner for rapid digitization of maps, which was based on IBM's earlier work in digitizing aerial photographs;
An efficient data indexing scheme called the Morton Index (1966), which was widely copied; and
Topological coding schemes for map region boundaries that involved the first known use of the link/node concept for encoding lines and regions.
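
The notes do not describe how CGIS implemented the Morton Index. As a rough illustration of the general idea usually associated with that name (interleaving the bits of a grid cell's column and row numbers so that cells close together on the map tend to receive nearby keys), the following Python sketch may help. The function names, the 16-bit default, and the choice of placing the column (x) bits in the even positions are assumptions made for this example only.

```python
def morton_encode(x: int, y: int, bits: int = 16) -> int:
    """Interleave the low `bits` bits of x and y into one integer key.

    Assumption for this sketch: x occupies the even bit positions
    and y the odd bit positions of the result.
    """
    code = 0
    for i in range(bits):
        code |= ((x >> i) & 1) << (2 * i)       # bit i of x -> position 2i
        code |= ((y >> i) & 1) << (2 * i + 1)   # bit i of y -> position 2i + 1
    return code


def morton_decode(code: int, bits: int = 16) -> tuple[int, int]:
    """Recover (x, y) from a key produced by morton_encode."""
    x = y = 0
    for i in range(bits):
        x |= ((code >> (2 * i)) & 1) << i
        y |= ((code >> (2 * i + 1)) & 1) << i
    return x, y


# Hypothetical usage: order the cells of a 2 x 2 block by their keys.
cells = [(1, 1), (0, 1), (1, 0), (0, 0)]
ordered = sorted(cells, key=lambda c: morton_encode(c[0], c[1]))
```

Sorting by such a key lets a one-dimensional index (for example, a sorted file) keep spatially adjacent cells reasonably close together, which is one plausible reason an index of this kind would be attractive for map data.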

The system Tomlinson helped develop was fully operational in 1971 and contained (as of 1991) over 1,000 maps on more than 100 different topics. Maguire [Macg91] states that, excluding systems based on remote sensing data and the recent TIGER system, CGIS may be the largest GIS in operation and the only one to cover a continental area in great detail. Other factors, such as land management policy in Canada, the apparently passive nature of CGIS' custodianship, and apparent lack of computer networking within the various agencies that administered CGIS tended to contribute to its under-utilization. Despite his success at ARDA, Tomlinson left in 1969 and became a private consultant in GIS, continuing to chair the International Geographic Union's Commission on Geographical Data Sensing and Processing, from its establishment in 1968 through 1980.

By 1976, there were at least 285 computer software packages that handled spatial data and were developed outside USGS [Toml76], which number rose to over 500 in 1980 [Marb80]. Due to lack of contact between software developers, much duplication resulted. Large states in the US developed their own GIS software, some using CGIS, others adapting SYMAP, and still others generating proprietary, in-house products. Tomlinson [Toml88] describes the 1970s as a period of "lateral diffusion" rather than innovation. In the defense community the CIA developed a world data bank that subsequently was made available in the public domain [Ande78]. No comprehensive history exists of these local approaches, but the Bureau of the Census and USGS have made representative progress, namely linking cover from the USGS 1:100,000 topographic maps with tracts for the 1990 census, which has motivated further GIS developments in the US [Call84].

The Census Bureau's involvement in geographical data processing began with the New Haven Census Use Study in 1967 and led to the Dual Independent Map Encoding (DIME) scheme that featured data encoding for census areas and experimental computer-generated maps of census data [Schw73]. DIME recorded the topological relationships of streets, but did not use coordinate information in its earliest versions. During 1972, the Bureau developed (with Harvard Graphics Laboratory) the Urban Atlas Project, which digitized approximately 35,000 metropolitan census tracts in a cost-effective manner. This required software development, which was supported mathematically by Corbett's paper [Corb79] on the topological principles underlying cartography and GIS. From these beginnings emerged DIME, ARITHMICON (an improved system with analytical capabilities), and TIGER, a large, comprehensive civilian GIS [Rhin91].

USGS was concurrently involved in the creation of the Geographical Information Retrieval and Analysis System (GIRAS), developed specifically for handling information on land use and land cover. Input was manually-produced maps at a scale of 1:250,000 derived from aerial photography, which are currently updated automatically using Landsat imagery [Mitc77]. As graphics hardware became available the batch-mode GIRAS-1 evolved into an interactive version (GIRAS-2).

An advanced project undertaken at the state level was the Minnesota Land Management Information System (MLMIS), which transitioned from early developments in 1976 at the University of Minnesota's Center for Urban and Regional Analysis to the state level, where it operated on a "fee for service" basis. MLMIS was based on a digital land use map of the state that was prepared from aerial photography, and was unfortunately based on a coarse grid. Despite its shortcomings, MLMIS has supported several hundred successful GIS projects in its lifetime [Robi91].

As mentioned previously, after the termination of the original Ford Foundation grant, Harvard's Graphics Laboratory produced the vector-based GIS system, ODYSSEY, which was in operation by 1979. Unfortunately, the commercial vendor, ISSCO, who contracted with the Laboratory, withdrew after early advertisement of the software, incurring heavy debt that eventually caused termination of the Laboratory. A more fortunate circumstance occurred with the establishment of the Environmental Systems Research Institute (ESRI) by R. Dangermond in 1969 [Dang88], who eventually produced the ARC/INFO package that has become something of a standard in GIS. Intergraph, ComputerVision, and Synercom were also major players in the 1970s, and most of these approached GIS from the CAD/CAM area. However, ESRI's excellent record of high-quality products is exemplary, as discussed in the following paragraph.

ESRI began as a not-for-profit organization that developed the cell-based GRID package, which remained its main applications system until the introduction of ARC/INFO in 1982. A three-dimensional version of GRID was called GRID TOPO, and in the late 1970s ESRI marketed a vector-based system called the Planning Information Overlay System (PIOS). ESRI has been the most successful vendor throughout the 1980s and 1990s, due to its ARC/INFO system, now in Revision 7 and available cross-platform for a wide variety of applications. ESRI's commitment to high-quality products and their ongoing service provision have led many state and local governments to adopt ARC/INFO as their standard.

In the 1990s, GIS has matured somewhat, with research directed away from the basic issues of map production and encoding, which have been solved after a fashion. The current issues of importance are more related to engineering concerns, such as map accuracy, precise co-registration of maps and imagery, and the vexing problem of combining datasets of diverse formats. In the latter case, much data has accumulated from earlier vector-format GIS, which is difficult to accurately integrate with current GIS systems. It is issues such as these, including the troublesome problem of errors in GIS data, that we will address in this course.

1.2. Overview of GIS System Applications.

As we have seen from the historical overview, GIS have been used widely in land management, census, forestry, resource management, and other environmental areas. What has not been discussed is the utility of GIS in two key areas, namely, military and commercial applications.

In the military sector of government, one needs to answer several questions, namely,

What resources are deployed over a given area?
What are the temporal and spatial relationships among various resources?
What are the attributes of each resource that are of salient operational interest?

Additionally, battle planners must reason over these databases to extract information that describes, for example, shortest (and safest) routes and travel times across various types of friendly or hostile terrain, probable destinations of mobile threats, or likelihood of engagement under various force projection scenarios. Military GIS systems are often integrated with battle simulation systems (wargaming software) to yield high information content in support of key tactical decisions. This capability is expected to enhance the current trend toward speedup of warfare, which will further reinforce the development of military GIS applications.
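
To make the idea of extracting shortest routes and travel times from a network database concrete, here is a minimal sketch using Dijkstra's algorithm. This is one standard technique, not necessarily the method used in any particular military GIS. The road network, node names, and travel times below are hypothetical. A "safest" route could be approximated by adding a risk penalty to each edge weight, which is likewise an assumption of this sketch rather than something the notes specify.

```python
import heapq


def shortest_travel_time(graph, source, target):
    """Return (total_minutes, route) for the fastest route, or (None, []).

    graph maps each node to a list of (neighbor, minutes) pairs;
    all travel times are assumed to be non-negative.
    """
    best = {source: 0.0}      # best known travel time to each node
    previous = {}             # predecessor of each node on its best route
    heap = [(0.0, source)]
    while heap:
        elapsed, node = heapq.heappop(heap)
        if node == target:
            break
        if elapsed > best.get(node, float("inf")):
            continue          # stale heap entry; a faster route was found
        for neighbor, minutes in graph.get(node, []):
            candidate = elapsed + minutes
            if candidate < best.get(neighbor, float("inf")):
                best[neighbor] = candidate
                previous[neighbor] = node
                heapq.heappush(heap, (candidate, neighbor))
    if target not in best:
        return None, []
    route = [target]
    while route[-1] != source:
        route.append(previous[route[-1]])
    return best[target], route[::-1]


# Hypothetical road network: travel times in minutes between junctions.
roads = {
    "A": [("B", 10), ("C", 4)],
    "B": [("D", 5)],
    "C": [("B", 3), ("D", 12)],
    "D": [],
}
minutes, route = shortest_travel_time(roads, "A", "D")
```

In a real system the edge weights would be derived from terrain type, road class, and current conditions stored as attributes in the GIS database, so the same routine could be rerun as those attributes change.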

In practice, DoD's mapping effort has grown manyfold in the 1990s, and currently emphasizes terrestrial mapping at a putative resolution of one meter (although interpolation schemes have been developed that enable much greater resolution). The data storage, manipulation, and analysis technology required to support global warfare using such large-scale systems challenges the best of computer science and technology. Indeed, this thrust serves as a motivator for much of the research discussed in this course.

The second application of GIS is in commercial forecasting, for example, in mass marketing of products and services. GIS are widely available for personal computers and workstations that have large marketing databases which can be searched in a content-based fashion using graphical user interface (GUI) directed queries based on displayed map information. This fusion of spatial analysis with commercial databases is currently being accelerated by the field of database mining, which provides useful information on temporal trends by exploiting archival data acquired over periods ranging from years to decades. Since much of the census data in the US is publicly available, and datasets about consumer behavior are frequently bought and sold, there are rich, extensive resources to support GIS-based commercial analyses.

As an example of different types of queries that may be issued to a GIS, consider the following query and analysis modes:

Resource Inventory -- What objects and object types are available within a given geographic area? For example, what tourist attractions are available for children ages 3-10 within a 20 mile radius of the Orlando, FL metropolitan area?
Network Analysis -- What routes and travel times are available within a given geographic area? For example, how does one get to the attractions mentioned above, and how long will each trip take at a given time of day? What are optimal routes under given road conditions? (Note that a current topic of research interest is the integration of such information with local weather and traffic reports).
Terrain Analysis -- Examples of information derived from three-dimensional data include degree and direction of slope, analysis of visibility between locations and the generation of maps that show points visible from a given point under prespecified conditions. For example, one might investigate the siting of a tourist attraction in relation to an upscale neighborhood whose residents might not care to see such structures. Additionally, one could analyze watershed patterns in a given area to determine optimal drain placement for street and parking lot runoff during times of heavy rain.
Layer-based Analysis -- Consider a GIS that has various layers that describe terrain; surface structures such as buildings, roads, and bridges; as well as surface natural resources such as crops, timber, and bodies of water. Now, add to this data on subterranean structure, mineral deposits, etc.; together with land use data such as traffic patterns, types of buildings and construction, etc. Given such a database, one could (for example) analyze the probability of producing an aesthetically objectionable or environmentally dangerous siting of a phosphate mine, a limestone quarry, or a landfill near metropolitan Orlando. The combination and reasoning processes that occur between layers are currently the topic of much research, and will be discussed in this course.
Location Analysis -- Suppose that one plans to site a gravel pit in the Orlando area near tourist attractions, recreational watershed areas, and housing developments. One would use the aforementioned types of data and analysis to answer questions similar to the following:
- Where are the most promising gravel deposits located?
- How far are such deposits from major roads?
- What would the travel times be to the nearest Department of Transportation depot or sand-and-gravel companies? What types of roads would be encountered, having what truck weight restrictions?
- If a gravel pit were dug at location x, what would be the probability of environmental damage due to runoff at locations y or z?
From answers to these questions, one could formulate (either manually or by an optimization procedure) a list of locations that might be suitable for such an enterprise. Such data would be integral to a business plan, and could support one's request for bank loans or tax relief from local government.
Spatio-Temporal Information -- GIS are often equated with static spatial information such as maps. Unfortunately, this restricted view neglects the importance of change. In temporal GIS, data are referenced to space, time, and attribute. Temporal analysis is within the reach of present GIS data-handling capabilities, but is still an elusive goal after years of research. The problem lies mainly with the lack of a rigorous, complete temporal calculus. This deficiency will be discussed in lectures and may constitute a project or exam question.
In practice, one would use temporal data to answer questions such as:
- Which streets in a given neighborhood changed name over the last 30 years?
- Assuming the availability of real estate data, what areas of a given city experienced the most frequent home sales in a given price interval?
- At a more sophisticated level, what is the pattern of migration of small factories out of the upper New England states since World War II?
More interesting is the combination of temporal and spatial analyses, to answer questions about time history of the adjacency of different objects or object types. For example, what was the incidence of factories within 1,000 feet of residential areas between 1950 and 1990? How did such trends correlate with movement of residential populations to the suburbs?
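
As a small illustration of the combined spatial and temporal query described above (factories within a given distance of residential areas in a given year), consider the following sketch. It assumes a deliberately simplified data model that is not taken from the notes: every object is a point with planar coordinates in feet and an interval of years during which it existed. All records, names, and coordinates are invented for the example.

```python
from math import hypot

# Hypothetical records: (name, x_ft, y_ft, first_year, last_year)
FACTORIES = [
    ("mill",       0.0,   0.0, 1940, 1965),
    ("foundry", 5000.0,   0.0, 1955, 1995),
    ("plant",    300.0, 400.0, 1970, 1985),
]
RESIDENCES = [
    ("elm_st",  500.0, 700.0, 1945, 2000),
    ("oak_st", 5600.0,   0.0, 1980, 2000),
]


def existed_in(record, year):
    """True if the record's lifetime interval includes the given year."""
    return record[3] <= year <= record[4]


def factories_near_homes(year, radius_ft=1000.0):
    """Names of factories active in `year` within radius_ft of an active residence."""
    homes = [r for r in RESIDENCES if existed_in(r, year)]
    near = []
    for factory in FACTORIES:
        if not existed_in(factory, year):
            continue
        if any(hypot(factory[1] - h[1], factory[2] - h[2]) <= radius_ft
               for h in homes):
            near.append(factory[0])
    return near


# Time history of the adjacency relation, sampled once per decade.
history = {year: factories_near_homes(year) for year in range(1950, 1991, 10)}
```

The resulting table of counts per year is the kind of time history that could then be correlated with other trends, such as population movement. A production GIS would use region geometry rather than points and a spatial index rather than an exhaustive comparison.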

In order to better understand how GIS systems deal with such queries, we next examine GIS organization and datatypes.

1.3. Organization of GIS Systems and Data.

GIS have four primary components, namely,

Data, which may be of type spatial, temporal, or attribute;
Engines that perform various data storage, retrieval, analysis, reporting, and communication functions;
Interfaces such as GUIs having widgets based on toolboxes such as X-Windows or MOTIF; and
Hardware, including workstations and networks, disk and tape storage, digitizers, plotters, and communications devices.

Since we assume that students in this course are well acquainted with interfaces and hardware, we will concentrate first on data, then develop the theory, algorithms, and implementations for specialized GIS engines as the course progresses. Engines are further discussed in Section 1.4.

1.3.1. Data and Databases.

GIS data has traditionally been classified as raster (array-based) or vector (line segments defined by their endpoints).

Raster data are array-based, and are able to represent a large range of computable objects, although at limited resolution. Rasters stored as uncompressed raw data can be extremely inefficient spacewise. Compression of rasters to meet feasible storage requirements increasingly involves error due to approximation in the compression and decompression transforms. This is a matter of some concern where land use data are involved in commercial practice (e.g., lot boundary surveys in congested areas) or when targeting precision is required in military applications.

Vector data have the advantages of storage efficiency and infinite resolution within the limits of accuracy and precision at which the data was acquired or can be computed or displayed. Vectors are appropriate for a wide range of spatial data, especially as found in maps, due to the fact that region boundaries tend to occupy only a small fraction of a map or image. Hence, the vector information can be extremely compact in relation to uncompressed raster data at equivalent resolution. Unfortunately, vectors imply a hard boundary model that does not match observations of gradations between region boundaries in the natural world. For example, consider the boundary between a meadow that is gradually submerged as it blends into an adjacent swamp. Where does the swamp start and the meadow end?

A serious drawback of the raster/vector dichotomy is the conversion between formats, which is a key topic in this course. Rasterization of vector data has associated quantization errors, as does vectorization of raster data. Furthermore, approximation errors in coarse-grained rasters can often lead to serious misinterpretations of data, with unfortunate results for GIS users who are not aware of these problems. In Section 2, we discuss the related problem of data format conversion and coregistration of datasets, with theory given for error analysis and profiling. The format conversion and merging of GIS datasets is a topic of keen research interest that is discussed in detail in this course.
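
The quantization error introduced by rasterizing vector data can be seen in a small experiment. The sketch below compares the exact area of a polygon with the area obtained after rasterization. It assumes one simple rasterization rule (a cell belongs to the region if its center lies inside the polygon); other rules exist and give different errors. The function names and the test polygon are illustrative assumptions.

```python
from math import ceil


def polygon_area(vertices):
    """Exact area of a simple polygon via the shoelace formula."""
    total = 0.0
    n = len(vertices)
    for i in range(n):
        x0, y0 = vertices[i]
        x1, y1 = vertices[(i + 1) % n]
        total += x0 * y1 - x1 * y0
    return abs(total) / 2.0


def point_in_polygon(px, py, vertices):
    """Even-odd ray-casting test for a point against a simple polygon."""
    inside = False
    n = len(vertices)
    for i in range(n):
        x0, y0 = vertices[i]
        x1, y1 = vertices[(i + 1) % n]
        if (y0 > py) != (y1 > py):            # edge straddles the ray's height
            x_cross = x0 + (py - y0) * (x1 - x0) / (y1 - y0)
            if px < x_cross:
                inside = not inside
    return inside


def rasterized_area(vertices, cell):
    """Area covered by square cells whose centers fall inside the polygon."""
    xs = [v[0] for v in vertices]
    ys = [v[1] for v in vertices]
    min_x, min_y = min(xs), min(ys)
    cols = ceil((max(xs) - min_x) / cell)
    rows = ceil((max(ys) - min_y) / cell)
    count = 0
    for r in range(rows):
        for c in range(cols):
            cx = min_x + (c + 0.5) * cell     # cell-center coordinates
            cy = min_y + (r + 0.5) * cell
            if point_in_polygon(cx, cy, vertices):
                count += 1
    return count * cell * cell


# Hypothetical parcel: a right triangle with legs of 10 map units.
triangle = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
exact = polygon_area(triangle)
errors = {cell: abs(rasterized_area(triangle, cell) - exact)
          for cell in (4.0, 0.1)}
```

The error depends on cell size and on how the boundary happens to align with the grid, which is why a coarse raster can misstate a region's extent even though the underlying vector boundary is exact.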

1.3.2. Data Quality and Usefulness.

The value of a database (a category that includes GIS databases) derives primarily from the quality (e.g., accuracy, precision, scope, and depth) and usefulness of its data. The following issues pertain:

Correctness and accuracy, which relate to the consistency and completeness of the link between data and the source area from which the data was collected. Accuracy has several constituents, including accuracy of attribute values and, for geographic data, accuracy of spatial and temporal references. Data quality is especially important when measurements are to be derived from the data, because input error is propagated through, and often increased by, the subsequent cascade of computations.
Timeliness, relevance, and cost -- Timeliness is especially important when (a) GIS are used in short-term applications where response time is critical, or (b) temporal analysis is required. Relevance to the analysis is an understandably important (but often overlooked) feature of GIS data. This implies that the data faithfully portray the applications domain being modelled, with appropriate scope and depth. Cost is generally computed in terms of storage cost, effort required to obtain and validate the data, monetary cost of procurement and possible licensing, as well as peculiar features of the data that increase processing cost beyond that incurred by more routine data.
Usability and Accessibility imply several considerations. Accessibility depends first upon one's ability to reach data repositories through computer networks, and second upon the ability to read and write the data. For example, if data take a long time to read and write, they are accessible but not efficiently so. Additionally, data formats must be tractable to existing GIS tools and I/O libraries. In large databases, this can be a serious problem due to lack of version control and update, leading to data formats that differ unpredictably.

A database is a repository of data that should be logically unified but may be physically distributed. GIS databases are created and maintained using database management systems (DBMS), which have the following requirements for usefulness:

Security -- Prevention of unauthorized access and use, with allowance for different levels of access, is a basic feature of secure GIS. This holds especially in military, medical, and financial data processing. For example, one should be able to read one's own medical records or checkbook balance, but not the records of others.
Reliability -- Databases should be available and efficiently accessible when needed. Data contained in the database should be accurate within the tolerance implied or dictated by a given application for which the data is certified for use. Additionally, physical database implementations should be invariant to routine physical interruptions such as power surges or failures.
Correctness and Consistency -- GIS data should be consistent spatially, i.e., be coregistered or able to be coregistered with minimal effort. Furthermore, data specified at a given map coordinate should be referenced to that coordinate, with ground truth available to test spatial accuracy. This is a problem with older GIS databases, where map data was digitized using mechanical scanners that had accuracy less than one map linewidth. For example, in map production at a scale of 1:250,000, a digitizing error of 0.01 inches (which can occur frequently due to paper warping with temperature and humidity) implies a ground measurement error of 250,000 × 0.01 = 2,500 inches, or approximately 208 feet.
Transparency -- Data revisions and upgrades should be transparent to the user, with minimal notification of upgrade presented without demand and maximal information available to the user if requested. For example, in subsequent notes, we assert that an error standard for GIS data should evolve that could be associated with each GIS data set. This publication of associated error measures would represent a significant advance in data processing, and would bring GIS data standards up to current practice in other engineering disciplines.
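
The digitizing-error arithmetic given under Correctness and Consistency can be checked with a few lines of Python. This is only a sketch: the function name is ours (hypothetical), and the computation assumes that ground error is simply the map error multiplied by the scale denominator.

```python
def ground_error_inches(scale_denominator, map_error_inches):
    """Ground distance (in inches) represented by an error measured on the map sheet."""
    return scale_denominator * map_error_inches

# Example from the text: 0.01 inch of digitizing error at a scale of 1:250,000.
error_in = ground_error_inches(250_000, 0.01)
error_ft = error_in / 12.0  # 12 inches per foot
print(f"{error_in:.0f} inches, or about {error_ft:.0f} feet")
```

The same function can be applied to other map scales to see how quickly a fixed digitizing error grows as the scale becomes smaller.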

Associated requirements also exist for data capture and display. Since the focus of this course is on data manipulation, we simply assume that data carry a prespecified error and concentrate on error propagation. Worboys' excellent text [Worb95] and Maguire's comprehensive text [Macg91] both furnish ample discussion of the sources and propagation of errors in GIS datasets.

1.4. GIS Software and Data Analysis Tools.

Thus far, we have discussed GIS modes of operation, including data generation, storage, modification, and retrieval. The utility of graphical interfaces for manipulation of data in terms of map coordinates has also been mentioned. Together with map creation and printing, and the formatting and printing of data analysis reports, these can be viewed as the basic capabilities of GIS.

1.4.1. Software Capabilities.

In order to provide these capabilities to users, GIS systems usually include the following software modules:

Database of spatial, attribute, and temporal data for a variety of regions, features, and modalities;
Database management system (DBMS) that organizes and attempts to optimize user and system access to, as well as creation/modification of, data stored in the GIS database;
Computer graphics software that is used to draw a map, which includes keyboard and pointing-device interfaces;
Mapping engine that links selected items from the database to map coordinates;
Analytical engines that can be applied to the database contents to answer basic spatial and temporal questions, provide sophisticated statistical analyses, and determine trends in various datasets; and
Data maintenance engines that are distinct from the DBMS and provide sophisticated support for dataset merging, combination of imagery and GIS data, and estimation of error between various GIS datasets.

1.4.2. Graphics Capabilities.

Graphics is a key component of GIS that has progressed from low-level routines that were device- and manufacturer-dependent, to general purpose device-independent software packages. Modern graphics hardware and software can be used by many different software packages (e.g., spreadsheets, word processors, drawing programs, computer games, multimedia software, and GIS). Until the 1990s, most graphic displays were two-dimensional, but 3-D software and rendering devices are now widely available.

Graphics software typically provides the capability of performing many operations (e.g., rotation, translation, scaling, reflection) on a collection of graphics primitives (e.g., point, line segment, polyline, B-spline, rectangle, circle, and ellipse). User-friendly features such as differing pen widths, patterns, colors, and shape modification extend previous graphical interfaces to be quite paper-like. This conformance to evolved manual practice benefits both the user and the software manufacturer. The user is able to work within a familiar paradigm that has evolved to meet human needs and limitations. The software manufacturer is able to provide a convenient, efficient user interface with a familiar "look and feel". Thus, the user is happy with the software, which can increase productivity, and the manufacturer has a loyal customer base, which could increase profitability.

Common standards for graphics packages are the Graphics Kernel System (GKS), established in 1985 and extended to three dimensions in 1988, and the Programmer's Hierarchical Interactive Graphics System (PHIGS, 1988). PHIGS supported full three-dimensional capabilities from the outset and allows the definition of hierarchies of graphics primitives. Furthermore, PHIGS provides the convenient feature of defining 3D primitives in a device-independent coordinate space.

1.4.3. Hardware Customarily Employed with GIS.

GIS, having evolved in part from the CAD/CAM field, in part from cartography, and being increasingly involved with image processing, has a diverse base of hardware requirements and resources. A brief list of useful hardware devices follows:

Computational capabilities that are intensive in floating-point arithmetic are usually required for GIS, due to the use of transcendental functions and complex transformations in map projection and high-level graphics rendering that is performed in software. Although 32-bit workstations have proven adequate, as GIS resolution improves, 64-bit precision may be required to efficiently compute the multiresolution map projections and image/data compression algorithms required for GIS' commercial feasibility.
I/O devices include the usual keyboard and pointing device, as well as magnetic tape, removable magnetic disks (including magneto-optical storage), plotters, printers, and digitizers, map scanners (with attendant software), and photographic film digitizing and output devices.
Ancillary hardware that is not properly a part of a GIS but is nevertheless useful includes modems or telephone switches that facilitate high-bandwidth communications among GIS users over local-area networks (LANs) and wide-area networks (WANs). Also considered as ancillary devices, but fast becoming a part of the core GIS system, are hardware-based data compression devices. These objects replace slower and more space-consumptive image compression algorithms that are customarily implemented in software. Due to the quickly growing data resources devoted to GIS, compression is being almost universally adopted as the method of choice for storage optimization.

We next consider on-line GIS resources that provide lists of software implementations, as well as available GIS data.

1.4.4. GIS Resources on the World-Wide Web.

Here follows a list of pointers to Web pages that contain information about GIS software and mapping capabilities. This list will be updated as the course progresses.

The United States Geological Survey's GIS Page.
Edinburgh University's List of GIS Resources
GISlinx - A new website covering the GIS industry, including software, hardware, and sources for base map data. Some sections are still under construction.

The United States Geological Survey's GIS Page has numerous links to access various types of data. Here follow several key links for cartographic and map data:

"Best Sellers" -- A quick guide to the most popular USGS geospatial data products.
Product Category -- An alphabetic listing of all major categories of USGS geospatial data.
Theme -- Data organized by topic.

1.5. Map Projections - Theory and Practice.

Many of us have scanned through atlases and have seen a given map displayed in several different ways. For example, consider the polar projection, which looks like a bird's-eye view of the Earth. The Mercator projection allows most (but not all) of the globe to be mapped to a cylinder that can be "unrolled" to yield a flat map. Similarly, there are map projections that maintain the continental areas approximately in the same proportion as they are when one rotates a globe of the Earth and inspects the continents visually. An example of this projection is the equal-area projection.

In this section, we discuss map projections in terms of underlying concept, theory, and practice. Much of this discussion is based on Maling's excellent summary [Mali91] in [Macg91], to which the reader is referred for further detail.

1.5.1. Basic Concepts.

In GIS, map projections transform spatial data to facilitate coregistration with other spatial data. The results of analyses on such data can be output as cartographic documents called maps. The primary sources for GIS spatial data are the large databases of paper maps maintained by government entities. These maps, when digitized, are converted into machine readable form in one of two coordinate systems:

Terrestrial Coordinates in three dimensions, denoted in the Cartesian coordinate system by (x,y,z); and
Plane Coordinates in two dimensions (latitude and longitude), denoted by (φ, λ).

Plane coordinates may be rectangular or polar coordinates, a raster grid, or a map projection. Prior to the emergence of high-capacity storage devices, plane coordinates were preferred to terrestrial coordinates, which are now being more extensively adopted.

The key reason for using map projections is the transformation of digitized map data into a uniform system of spatial referencing within a given GIS. This obviates preprocessing when layers of a GIS are compared, analyzed, or rendered, and is especially important for combining vector and raster data.

In this section, we briefly review concepts, theory, and practice of map projections, which are fundamental to GIS map rendering. This is not a course in map projection, but is designed to provide background so that students can understand subsequent terminology. A more detailed notational background is given in the Overview of Image Algebra.

1.5.2. Fundamental Theory.

Definition. A geographic map a is a mathematical mapping from a spatial domain X (customarily a subset of Euclidean n-space) to a value set F. We write a : X -> F, which can be expressed more concisely as a ∈ F^X.

Definition. If a ∈ F^X, then X = domain(a) and F = range(a).

Observation. In GIS practice, X ⊂ R³, the set of three-dimensional terrestrial coordinates.

Definition. The graph of a ∈ F^X is denoted by:

G(a) = {(x, a(x)) : a(x) ∈ F, x ∈ X} .

Notice that this formalizes the association of a given map coordinate with its attributes in F.

Observation. We also write a ≡ G(a), where ≡ denotes equivalence. Note that p₁(G(a)) = X = domain(a) and p₂(G(a)) = F = range(a), where p_k denotes projection onto the k^th coordinate.

Map projections are based on spatial transformations, which manipulate the map domain.

Definition. Let a denote a map in F^X, and define a spatial transformation f: Y -> X. For purposes of simplicity, assume that f is a one-to-one and onto mapping.

The composition of a with f is denoted by:

b = a o f ≡ {(y, b(y)) : b(y) = a(f(y)), y ∈ Y} .

Observation. Each map projection can be defined as the spatial transformation of a given map domain. Prior to the development of GIS, such transformations were well understood (since they were the basis for cartography), but inverse transformations were often not well defined. GIS requires that a map transformation have an inverse, so one can create, query, or modify GIS data in one map projection (which may be more convenient for that type of data), then revert to the previous map projection for further manipulation.
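
To make the composition of a map with a spatial transformation concrete, the following sketch stores a small map as its graph (a Python dictionary from coordinates to attribute values) and composes it with an invertible transformation. The four-cell map, the shift-by-one transformation, and all names are hypothetical and chosen only for illustration.

```python
# A toy map a : X -> F, stored as its graph G(a) = {(x, a(x))}.
a = {(0, 0): "water", (1, 0): "forest", (0, 1): "urban", (1, 1): "farm"}

def f(y):
    """Hypothetical spatial transformation f : Y -> X (shift by one cell)."""
    return (y[0] - 1, y[1] - 1)

def f_inv(x):
    """Inverse transformation f^-1 : X -> Y."""
    return (x[0] + 1, x[1] + 1)

# Choose Y as the image of X under f^-1, so that f is one-to-one and onto.
Y = [f_inv(x) for x in a]

# b = a o f, i.e., b(y) = a(f(y)) for every y in Y.
b = {y: a[f(y)] for y in Y}

# Because f has an inverse, the original map is recovered as b o f^-1.
a_recovered = {x: b[f_inv(x)] for x in a}
print(a_recovered == a)
```

The final comparison illustrates why an inverse is required: without f^-1, data edited in the transformed domain could not be returned to the original map domain.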

1.5.3. Map Projection Practice.

The simplest method for comparing map data is the grid cell, which comprises a close pattern of spherical quadrilateral cells that are derived by subdividing the spatial domain of a map into one degree or one-half degree units. Grid cells are not rectangular, since their sides are formed by curved lines that describe two meridians (lines of longitude) and two parallels (lines of latitude). The meridians converge to the poles. Therefore, grid cells are similar to a trapezoid.
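
The convergence of the meridians can be quantified with a short computation. The sketch below assumes a spherical Earth of radius 6371 km (an assumption of ours, not a value given in these notes) and computes the lengths of the southern edge, northern edge, and meridian sides of a grid cell; the function name is hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean radius of a spherical Earth model

def cell_edges_km(lat_south_deg, cell_deg=1.0, radius_km=EARTH_RADIUS_KM):
    """Return (south edge, north edge, meridian side) lengths of a grid cell in km."""
    step = math.radians(cell_deg)
    # A parallel at latitude phi is a circle of radius R * cos(phi).
    south = radius_km * math.cos(math.radians(lat_south_deg)) * step
    north = radius_km * math.cos(math.radians(lat_south_deg + cell_deg)) * step
    # Every meridian is a great circle, so the side length is independent of latitude.
    side = radius_km * step
    return south, north, side

for lat in (0, 30, 60, 89):
    south, north, side = cell_edges_km(lat)
    print(f"lat {lat:2d}: south {south:7.2f} km, north {north:7.2f} km, side {side:7.2f} km")
```

In the northern hemisphere the northern edge is always shorter than the southern edge, which is the trapezoid-like shape described above.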

GIS generalizes the concept of grid cells by using map projections to modify spatial data, for the following two reasons:

The database for a large area (e.g., nation or continent) will be large and does not necessarily lend itself to a contiguous spatial model. Thus, it is reasonable to segment the map into grid cells that can be further subdivided using techniques discussed below.
In the case of large areas, convenient approximations (e.g., a flat, plane Earth model with rectangular grid cells) yield infeasibly large errors, which are manifested as spatial distortions. Thus, a nonplanar map model must be employed.

1.5.3.1. Conventions in GIS Map Projection Practice.

The GIS framework most likely to be employed in large-area surveys is the Universal Transverse Mercator (UTM) or the Lambert Conformal Conical (LCC) projection. However, since the coverage of GIS increasingly extends over multiple governmental entities, the different reference points and projections that are local standards must be reconciled. Although the need for a common map projection has been discussed extensively in the literature [Brig89,Moun91], little progress has been made in this area. For example, in the design of an environmental GIS for Antarctica, a collection of LCC and stereographic projections was used in the role of raster grids [Siev89]. This rather involved framework formed the mathematical basis for the system.

Another consideration is limiting resolution, i.e., the size of the smallest object that can be shown legibly on a map. This is usually assumed to be approximately 0.15mm (approximately 0.006 inch) [Mali91] and is often called the zero dimension (a term from earlier cartographic practice). If a particular computation affects spatial location by less than the zero dimension, then a simpler (e.g., approximate) computation may be used. In this sense, the zero dimension permits one to make assumptions about the accuracy of spatial data and its utility in various scenarios. This will be discussed in later sections of these notes and in the class.
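
As a rough illustration of how the zero dimension can be used, the sketch below converts the 0.15mm figure to a ground distance at a given map scale and tests whether a computed positional shift falls below it. The function names and the example scales are our own choices, and the multiplication by the scale denominator ignores projection distortion.

```python
ZERO_DIMENSION_MM = 0.15  # limiting resolution on the map sheet, from [Mali91]

def ground_resolution_m(scale_denominator, zero_dimension_mm=ZERO_DIMENSION_MM):
    """Ground distance (meters) corresponding to the zero dimension at a map scale."""
    return zero_dimension_mm * scale_denominator / 1000.0  # mm -> m

def is_negligible(ground_shift_m, scale_denominator):
    """True if a positional shift would be smaller than the zero dimension on the map."""
    return ground_shift_m < ground_resolution_m(scale_denominator)

for denominator in (24_000, 250_000):
    print(f"1:{denominator:,} -> {ground_resolution_m(denominator):.1f} m")
```

For example, a 10 m difference between an exact and an approximate computation would be negligible at 1:250,000 under this criterion, but not at 1:24,000.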

An additional issue in GIS is that the complex shape of the geoid (Earth) can be approximated by a spheroid. Unfortunately, there is no standard spheroid, and hence map projections exhibit a certain amount of error that is a function of the accuracy with which their reference points are mapped to a given spheroidal model. A further problem is that some paper maps were made with particular spheroidal or ellipsoidal models in mind. Given these more complex (or different) models, errors accrue when digitizing paper maps to fit simpler models (e.g., planar or spherical models). For local surveys, planar models may be apropos. (Aside: This leads to the interesting implementational concept of the surface of a flatbed plotter as a datum plane for each map produced on that plotter.)

1.5.3.2. Examples of Map Projections.

The Cartesian coordinates (x,y) of a point on a map are related to latitude φ and longitude λ via the projections x = f_x(φ, λ) and y = f_y(φ, λ), which can be computed via three methods:

Analytical transformation, whereby points are located and plotted from their geographical coordinates, in an approximation to classical cartographic methods;
Direct or grid-on-grid transformation, whereby a map domain referenced to a grid is warped using, for example, a linear or affine transformation; and
Polynomial transformation, which provides a mechanism for allowing control points within a grid cell to constrain the projective geometry (which need not be linear).
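
The direct (grid-on-grid) method can be sketched with an affine transformation and its inverse. The six coefficients below are hypothetical values chosen for illustration; in practice they would be estimated from control points, a step that is not shown here.

```python
def affine(point, coeffs):
    """Apply x' = a*x + b*y + tx, y' = c*x + d*y + ty to a point (x, y)."""
    a, b, c, d, tx, ty = coeffs
    x, y = point
    return (a * x + b * y + tx, c * x + d * y + ty)

def invert(coeffs):
    """Return the coefficients of the inverse affine transformation."""
    a, b, c, d, tx, ty = coeffs
    det = a * d - b * c
    if det == 0:
        raise ValueError("transformation is not invertible")
    ia, ib, ic, id_ = d / det, -b / det, -c / det, a / det
    # The inverse translation is -(A^-1) applied to (tx, ty).
    return (ia, ib, ic, id_, -(ia * tx + ib * ty), -(ic * tx + id_ * ty))

# Hypothetical coefficients: a slight rotation and shear plus an offset.
forward = (1.0, 0.1, -0.1, 1.0, 500.0, 200.0)
warped = affine((10.0, 20.0), forward)
restored = affine(warped, invert(forward))
print(warped, restored)
```

A polynomial transformation generalizes this idea by adding higher-order terms in x and y, at the cost of requiring more control points.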

The conversion from geographical to plane coordinates is the normal practice in cartography, which is called forward transformation. The inverse transform, which yields geographical coordinates from map coordinates, is a more recent development, due to the need for spatial format conversion (e.g., transformation between different map projections) in GIS. The vast majority of map projection texts only provide theory for forward transformations. In this review, we will exemplify both forward and inverse transformation using the Mercator projection.

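Definition. The forward Mercator projection of a sphere of radius r, with longitudes measured from a datum meridian λ₀, is given by:

x = f_x(φ, λ) = r · (λ - λ₀) and y = f_y(φ, λ) = r · ln tan(π/4 + φ/2) ,

where φ denotes latitude and λ denotes longitude, both in radians.

Definition. The inverse Mercator projection is arrived at by assuming a datum meridian λ₀ from which longitudes are measured (e.g., the Greenwich Meridian), as follows:

φ = f_y^-1(y) = π/2 - 2 · tan^-1(e^(-y/r)) and λ = f_x^-1(x) = x/r + λ₀.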
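
A minimal sketch of the forward and inverse Mercator transformations follows. It assumes the standard spherical formulas, with latitude and longitude in radians, a sphere of radius r, and a datum meridian lon0; the function names, the radius of 6371 km, and the sample point are our own assumptions.

```python
import math

def mercator_forward(lat, lon, r=1.0, lon0=0.0):
    """Geographic (lat, lon) in radians -> plane (x, y) on a sphere of radius r."""
    x = r * (lon - lon0)
    y = r * math.log(math.tan(math.pi / 4 + lat / 2))
    return x, y

def mercator_inverse(x, y, r=1.0, lon0=0.0):
    """Plane (x, y) -> geographic (lat, lon) in radians."""
    lat = math.pi / 2 - 2 * math.atan(math.exp(-y / r))
    lon = x / r + lon0
    return lat, lon

# Round trip for a hypothetical sample point (29.65 N, 82.32 W).
lat, lon = math.radians(29.65), math.radians(-82.32)
x, y = mercator_forward(lat, lon, r=6371.0)
lat2, lon2 = mercator_inverse(x, y, r=6371.0)
print(math.degrees(lat2), math.degrees(lon2))
```

Note that y grows without bound as the latitude approaches the poles, which is why the Mercator projection maps most, but not all, of the globe.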

1.5.3.3. Implementational Issues in Map Projection.

Unfortunately, although the spheroidal assumption is justified in most cases, non-map information often has boundaries that are not specified in map coordinates (e.g., vegetation or land use data). Additionally, much time is spent revising files of various governmental entities, to keep place names, survey boundaries, etc., up to date. These are problems that map projections cannot necessarily address. Furthermore, as use of the Global Positioning System (GPS) becomes more widespread, elevation and spatial data must be revised, which also impacts the accuracy of map projections.

Alternatives to computationally burdensome map projections include interpolation schemes, where the detail in a small square or quadrangle on a map may be transformed to another projection by using sparsely specified control points only. Although this approach is attractive from the computational cost perspective, it can lead to large errors when terrestrial data is modelled over discontinuous surfaces (e.g., cliffs or bluffs).
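
One simple interpolation scheme of this kind is bilinear interpolation between the four corner control points of a quadrangle. The sketch below is only one possible choice, not necessarily the scheme used in any particular GIS, and the corner coordinates are hypothetical.

```python
def bilinear(u, v, p00, p10, p01, p11):
    """Interpolate a projected (x, y) position inside one quadrangle.

    u and v are fractional positions in [0, 1] across and up the cell;
    p00, p10, p01, p11 are the projected control points at the corners
    (u, v) = (0, 0), (1, 0), (0, 1), and (1, 1).
    """
    w00, w10 = (1 - u) * (1 - v), u * (1 - v)
    w01, w11 = (1 - u) * v, u * v
    x = w00 * p00[0] + w10 * p10[0] + w01 * p01[0] + w11 * p11[0]
    y = w00 * p00[1] + w10 * p10[1] + w01 * p01[1] + w11 * p11[1]
    return x, y

# Hypothetical corner positions of one cell in the target projection.
corners = dict(p00=(0.0, 0.0), p10=(100.0, 5.0), p01=(-3.0, 80.0), p11=(98.0, 86.0))
center = bilinear(0.5, 0.5, **corners)
print(center)
```

Only the four corners require a full projection computation; every interior point costs a handful of multiplications. The weights vary smoothly across the cell, which is why the approach performs poorly where the underlying surface is discontinuous.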

1.6. GIS Map Databases.

Important Note: Do not download data until you find out if it is a priced product!

FGDC (Federal Geographic Data Clearinghouse) Metadata Search Form -- Searchable database of GIS data whose search parameters are:
1. Spatial Coverage: Entered in the interactive search form as a bounding rectangle on a visible map. Returned datasets will include those that cover, intersect, or are covered by this bounding rectangle. To operate the map, click a Western corner followed by an Eastern opposite corner on the map or type in values in the text input boxes.
2. Temporal Range: Choose a "date type" (single, range, relative) by pressing one of the radio buttons. Next select an operator and date, date range, or number of days from present with which to apply the search.
3. Keywords and Search Fields: A variety of index terms are supported that reference various types of maps, surface features, etc. This is similar to the attribute data discussed previously.
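
The spatial coverage criterion (datasets that cover, intersect, or are covered by the query rectangle) can be sketched as a bounding-box test. This is an illustration of the idea only and not the search form's actual implementation; the class and function names are hypothetical, and rectangles that cross the 180-degree meridian are not handled.

```python
from typing import NamedTuple

class BBox(NamedTuple):
    west: float
    south: float
    east: float
    north: float

def contains(outer, inner):
    """True if rectangle `outer` completely covers rectangle `inner`."""
    return (outer.west <= inner.west and outer.east >= inner.east
            and outer.south <= inner.south and outer.north >= inner.north)

def intersects(a, b):
    """True if the two rectangles share at least one point."""
    return (a.west <= b.east and b.west <= a.east
            and a.south <= b.north and b.south <= a.north)

def relation(query, dataset):
    """Classify a dataset's extent relative to the query rectangle."""
    if contains(dataset, query):
        return "covers"
    if contains(query, dataset):
        return "covered by"
    if intersects(query, dataset):
        return "intersects"
    return "disjoint"

query = BBox(west=-83.0, south=29.0, east=-82.0, north=30.0)
print(relation(query, BBox(-90.0, 24.0, -80.0, 31.0)))
```

Since both containment cases imply intersection, a search that returns every non-disjoint dataset satisfies all three conditions at once.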
This searchable database was developed as a joint collaboration between the following agencies:
- The Federal Geographic Data Clearinghouse (FGDC)
- The North Carolina Center for Geographic Information and Analysis (NC CGIA)
- The Clearinghouse for Networked Information Discovery and Retrieval (CNIDR)
- A/WWW Enterprises.
- The Naval Research Lab's Master Environmental Library (MEL).
As of Sat 14 Jun 1997, individual databases in the FGDC system included the following:
Another valuable resource features Digital Elevation Models (DEMs) from the USGS, Defense Mapping Agency, etc.
The preceding link contains digital records of terrain elevations for ground positions at regularly spaced horizontal intervals, which are derived from United States Geological Survey (USGS) 7.5-minute, 15-minute, 30- by 60-minute, and 1-degree (1:250,000-scale) quadrangle maps by means of hypsographic data and/or photogrammetric methods. Metadata are provided.
Availability: National for 7.5-minute and 15-minute quadrangles (Alaska); conterminous United States and Hawaii for 30- by 60-minute quadrangles.
Information Content: Elevation data spacing varies from 30 meters for 7.5-minute DEMs to 3 arc seconds for 1:250,000 scale maps. Complete U.S. coverage is only available for the 1:250,000-scale digital line graph (DLG) data. All DEM data are similar in logical data structure and are ordered from south to north in profiles that are ordered from west to east.
- 7.5-minute DEM data are produced in 7.5-minute units which correspond to USGS 7.5-minute topographic quadrangle map series. 7.5-minute DEM data consist of a regular array of elevations referenced horizontally on the Universal Transverse Mercator (UTM) coordinate system of the North American Datum of 1927 (NAD 27). These data are stored as profiles with 30-meter spacing along and between each profile.
- 15-minute DEM data correspond to USGS 15-minute topographic quadrangle map series in Alaska. The unit sizes in Alaska vary depending on the latitudinal location of the unit. 15-minute DEM data consist of a regular array of elevations referenced horizontally to the geographic (latitude/longitude) coordinate system of the North American Datum of 1927 (NAD 27). The spacing between elevations along profiles is 2 arc seconds of latitude by 3 arc seconds of longitude.
- 30-minute DEM data cover 30-minute by 30-minute areas which correspond to the east half or west half of the USGS 30- by 60-minute topographic quadrangle map series for the conterminous United States and Hawaii. Each 30-minute unit is produced and distributed as four 15- by 15-minute cells. 30-minute DEM data have the same characteristics as the 15-minute DEM data except that the spacing of elevations along and between each profile is 2 arc seconds.
- 1-degree DEM data are produced by the Defense Mapping Agency in 1-degree by 1-degree units which correspond to the east half or west half of USGS 1- by 2-degree topographic quadrangle map series, for all the United States and its territories. 1-degree DEM data consist of a regular array of elevations referenced horizontally using the geographic (latitude/longitude) coordinate system of the World Geodetic System 1972 Datum. A few units are also available using the World Geodetic System 1984 Datum. Spacing of the elevations along and between each profile is 3 arc seconds with 1,201 elevations per profile. The only exception is DEM data in Alaska, where the spacing and number of elevations per profile vary depending on the latitudinal location of the DEM.
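
The figure of 1,201 elevations per profile for 1-degree DEM data follows from the 3 arc second spacing, assuming that a profile includes elevations at both edges of the unit. A short check (the function name is hypothetical):

```python
ARCSEC_PER_DEGREE = 3600

def elevations_per_profile(extent_arcsec, spacing_arcsec):
    """Number of elevations along a profile that includes both end points."""
    intervals, remainder = divmod(extent_arcsec, spacing_arcsec)
    if remainder:
        raise ValueError("extent is not a whole number of spacings")
    return intervals + 1

# 1-degree DEM unit sampled every 3 arc seconds.
count = elevations_per_profile(1 * ARCSEC_PER_DEGREE, 3)
print(count)  # 1201
```

At this spacing, a full 1-degree unit with 1,201 profiles would therefore hold 1,201 × 1,201 elevations, which gives a sense of the storage required per unit.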
Information about USGS product categories for which digital data sets are available can be accessed from the following list:
- Digital Elevation Models (DEM's) -- Discussed above
- Digital Line Graphs (DLG's) -- Digital representations by points, lines and areas of planimetric information derived from 7.5- and 15-minute scale topographic quadrangle maps (large-scale); 30- by 60-minute intermediate scale maps; and 1:2 million-scale (small scale) sectional National Atlas maps.
- Digital Orthophoto Quadrangles (DOQ's) -- A digital orthophoto is a digital image of an aerial photograph in which displacements caused by the camera and the terrain have been removed. It combines the image characteristics of a photograph with the geometric qualities of a map.
  The standard digital orthophoto produced by the USGS is a black-and-white, or color infrared, 1-meter ground resolution quarter quadrangle image.
  The accuracy and quality of USGS digital orthophotos must meet National Map Accuracy Standards at 1:12,000 scale for 3.75-minute quadrangles and at 1:24,000-scale for 7.5-minute quadrangles.
- Digital Raster Graphics (DRG's) -- A digital raster graphic (DRG) is a scanned image of a U.S. Geological Survey (USGS) standard series topographic map, including all map collar information. The image inside the map neatline is georeferenced and fit to the Universal Transverse Mercator projection. The horizontal positional accuracy and datum of the DRG matches the accuracy and datum of the source map. The map is scanned at a minimum resolution of 250 dots per inch.
- Digital Satellite Images -- Digital images of the United States are available from the Earth Resources Observation System (EROS) Data Center, including satellite Advanced Very-High Resolution Radiometer (AVHRR) useful for tracking vegetation growth on a regional basis. The EROS Data Center also serves as the central ordering agency within the Federal Government for Land Satellite (LANDSAT) Thematic Mapper (TM) and Multispectral Scanner (MSS) digital data.
- Geologic Data -- Geological data consist of earthquake, volcano, and landslide hazards research, geologic framework and process studies, global change research, marine and coastal geologic surveys, and mineral and energy resource surveys.
- Geographic Names Information System (GNIS) -- Geographic names for all known places, features, and areas in the United States that are identified by a proper name. Each feature is located by State, county, and geographic coordinates, and referenced to the appropriate 1:24,000-scale U.S. Geological Survey (USGS) topographic map on which it is shown.
- Hydrologic Unit Maps -- These maps include a hydrologic unit map of the United States, and hydrologic unit maps for each State. They delineate the hydrographic boundaries of major river basins and show numeric codes assigned to each river basin. The maps were prepared in a cooperative project between the United States Geological Survey (USGS) and the U.S. Water Resources Council, which was initiated in 1972. Boundaries and numeric codes are depicted for 21 regions, 222 subregions, 352 accounting units, and 2,100 cataloging units. River basins are delineated that have drainage area greater than 700 square miles. Also included on the maps are State and county codes that use the Federal Information Processing Standards (FIPS). State maps are published at a scale of 1:500,000; the U.S. map (unfortunately, out of print) is at a scale of 1:2,500,000.
  The report "Hydrologic Unit Maps", (USGS Water-Supply Paper 2294), describes the maps and contains the numeric codes for the river basins. Digital data sets for hydrologic units are available at scales of 1:2,000,000 and 1:250,000. Each is a single coverage for the conterminous United States. Attributes of the 1:2,000,000-scale version include basin names.
- Land Use and Land Cover -- Land use and land cover (LULC) data are derived from thematic overlays registered to 1:250,000-scale base maps and a limited number of 1:100,000-scale base maps. Land use and land cover data provide information on urban or built up land, agricultural land, rangeland, forest land, water, wetlands, barren land, tundra, and perennial snow or ice. Associated maps display information in five data categories: (1) political units, (2) hydrologic units, (3) census county subdivisions, (4) Federal land ownership, and (5) State land ownership.
- National Uranium Resource Evaluation (NURE) Data -- Maps and reports containing geologic, geophysical, and geochemical data obtained under the Department of Energy's National Uranium Resource Evaluation (NURE) Program from 1974 to 1980. The data were transferred to the United States Geological Survey (USGS) in 1984. Types of NURE data available from USGS are: geologic maps, aerial radiometric data, aeromagnetic data, data from hydrological and stream-sediment sample analyses, geochemical data from rock sample analyses, radiometric data from borehole logging, and evaluation data for uranium resource estimates.
- National Water-Quality Assessment (NAWQA) Program -- The Nation's water resources are the basis for life and our economic vitality. These resources support a complex web of human activities and fishery and wildlife needs that depend upon clean water. Demands for good-quality water for drinking, recreation, farming, and industry are rising, and as a result, the American public is concerned about the condition and sustainability of our water resources. The American public is asking: Is it safe to swim in and drink water from our rivers or lakes? Can we eat the fish that come from them? Is our ground water polluted? Is water quality degrading with time, and if so, why? Has all the money we've spent to clean up our waters done any good? The U.S. Geological Survey's National Water-Quality Assessment (NAWQA) Program was designed to provide information that will help answer these questions.
  NAWQA is designed to assess historical, current, and future water-quality conditions in representative river basins and aquifers nationwide. One of the primary objectives of the program is to describe relations between natural factors, human activities, and water-quality conditions and to define those factors that most affect water quality in different parts of the Nation. The linkage of water quality to environmental processes is of fundamental importance to water-resource managers, planners, and policy makers. It provides a strong and unbiased basis for better decisionmaking by those responsible for making decisions that affect our water resources, including the United States Congress, Federal, State, and local agencies, environmental groups, and industry. Information from the NAWQA Program also will be useful for guiding research, monitoring, and regulatory activities in cost effective ways.
- Upper Mississippi and Lower Missouri Data Base -- These metadata describe a database designed and built for the Scientific Assessment and Strategy Team (SAST) for a study of the Great Flood of 1993. This is a database level implementation of the Federal Geographic Data Committee's Content Standards for Digital Geospatial Metadata, and is here to augment more detailed dataset and file specific information.
- Water Resources Information -- Water quantity and quality data for geographic regions of the United States are available in print form and as machine-readable files.
  Many USGS studies of water resources have used a Geographic Information System (GIS) to produce digital geospatial data sets for a wide variety of water topics. Themes include basic hydrologic data as well as ancillary data to support hydrologic studies.

1.7. Satellite and Airborne Imagery Libraries.

Many USGS data products are available in the Spatial Data Transfer Standard (SDTS) format, a new Federal standard to ensure data compatibility, or in the older Digital Line Graph (DLG) format. Others are in more specialized formats, which are described in their metadata. Whenever you retrieve a data set, it is important to retrieve the metadata as well, since the metadata provides important information for using the data set. The Internet browser software used to view this page is likely able to print the metadata files or save them locally.

Several sites where map and imagery data may be obtained follow:

Digital Satellite Images from the Earth Resources Observation System Data Center. Types of imagery include:
1. AVHRR imagery is collected in 5 channels measuring visible, near infrared, and thermal infrared radiation. Applications include cloud temperatures, sea-surface temperature, land temperature and vegetation index. The metadata at this Web site describe the holdings from the sensors that are carried on NOAA's Polar Orbiting Environmental Satellites (POES) beginning with TIROS-N in 1978. This is a data-set level implementation of the Federal Geographic Data Committee's Content Standards for Digital Geospatial Metadata.
2. The LANDSAT MSS dataset covers temporal and spatial maps between 1972 and the present (approximately 25 years of data).
  Datasets include:
  - LANDSATs 1 and 2 MSS (4 spectral bands) imagery is archived between January 1972 and February 1982.
  - LANDSAT 3 MSS (4 spectral bands) imagery is archived between March 1978 and March 1983.
  - LANDSATs 4 and 5 MSS (4 spectral bands) imagery is archived between July 1982 and present.
  Query forms are available, since the LANDSAT_MSS dataset is a very large collection of data. You can specify Geographic Coordinates and Acquisition Date(s) for your area of interest.
3. LANDSAT Thematic Mapper (TM) (7 spectral bands) images are archived between July 1982 and present. The TM data have their own Query forms that are similar to the query forms for the LANDSAT MSS data.
Product Delivery Format:

This concludes our introductory discussion of GIS issues.
We next consider computational problems related to GIS features.