GIS - Computational Problems: § 4: Advanced Problems

Instructors: G.X. Ritter and M.S.Schmalz


The majority of GIS datasets are currently represented in vector format, and have an inherent representational error that arises from sensor errors as well as from discretization or polygonalization processes that we discussed in detail in Section 2. GIS algorithms propagate this dataset error through various stages of computation, yielding a GIS product or result that often has unexpected errors. Such errors influence coregistration (e.g., map overlay) and tend to corrupt the derivation of range or elevation data from stereo imagery. This further impacts the integration of surface models (e.g., spline models based on elevation data) with GIS datasets.

Section Overview. In this section, we discuss errors in GIS datasets as well as error and complexity measures, then investigate the effect of such errors on elevation and surface modelling with GIS. The section is structured as follows:

In Section 4.1, we present an in-depth discussion of errors in spatial datasets, together with measures for quantifying such errors and various types of error analysis. Section 4.2 details techniques for error analysis as well as the estimation of complexity in GIS algorithms, both of which are required for the design and implementation of accurate, efficient software. The difficult problem of automatic analysis of stereophotogrammetric data is presented in Section 4.3, and an extension of this problem to multi-modal GIS (e.g., derived and measured elevation data) is discussed in Section 4.4.

Goals. For the purposes of the current course, goals for this section include:

  1. Learning error/complexity analysis terminology and background;
  2. Exploring two error-related problems instantiated as stereophotogrammetry and data integration of vector- and image-format elevation data;
  3. Understanding the underlying issues of error detection, measurement, modelling, and management in GIS; and
  4. Prediction of computational cost and accuracy (e.g., the space-time-error bandwidth product) for a given GIS algorithm.

We begin our discussion with a distinction between cartographic error and GIS error.

4.1. Estimating Error in GIS Datasets

Note that map accuracy is a relatively minor issue in cartography. Users of maps are rarely aware of this problem, due to their familiarity with map notation and its underlying assumptions. For example, if a map shows that a drainage ditch runs parallel to a road, one assumes from world knowledge that the ditch is located close to the road. However, on the map, for purposes of cartographic license, the map may appear to be offset laterally from the road by as much as the width of the road.

4.1.1. Background.

In contrast to cartographic practice, GIS has evolved under the following circumstances:

  1. GIS precision is limited only by computational hardware, for example, ALU register size, cost of I/O and memory, or speed of storage devices.

  2. All spatial data have limited accuracy which may be expressed in terms of positional error, abstraction or generalization error, measurement error, etc. Consider the following examples:

  3. Precision of GIS processing exceeds data accuracy -- GIS processing is performed at high (16- to 32-bit) arithmetic precision, but GIS data have much lower precision (e.g., four to ten parts per 10,000).

  4. In conventional map analysis, precision is usually adapted to accuracy

  5. The ability to change scale and combine data at different scales implies that GIS precision is not necessarily adapted to accuracy.

  6. The accuracy of complex spatial objects is not well understood. The accuracy of points, lines, and simple neighborhood areas (e.g., circle or rectangle) has been analyzed extensively. However, methods of abstracting error as a function of lateral position is not clear.

  7. The goal of GIS error analysis is a measure of uncertainty associated with every GIS product (e.g., datasets, maps, and analyses).

Hence, the key issues in this section are measurement, estimation, and prediction of GIS error. The following observations pertain:

4.1.2. Current Approaches to GIS Error Modelling.

The following discussion is adapted from Veregin (1994).

The error modelling process can be decomposed into the following steps or levels:

At Level 1 (error isolation), the following types of error are found:

In this course, we will concentrate upon positional and attribute error primarily, and thus examine Levels 1-5 of the preceding hierarchy for error sources.

Level 1 (Error isolation) sources of error are classified by type as shown in the following examples:

Level 2 (Error detection and measurement) focuses on methods of assessing accuracy levels in spatial data, for example:

Level 3 (Error propagation modelling) focuses on error propagation and error production, which are distinguished in the following examples:

Level 4 (Error management) focuses on coping with error and making specifiably accurate decisions in the presence of error. In GIS, this can involve:

Level 5 (Strategies for error reduction) includes but is not limited to the following methods:

4.1.3. Examples of Key GIS Errors.

The majority of GIS errors can be exemplified by the following three case studies. In Section 4.1.3.1, we discuss polygon fragmentation in map overlay. Section 4.1.3.2 contains an overview of thematic and attribute errors, and Section 4.1.3.3 summarizes category-based error.

4.1.3.1. Case Study: Polygon Fragmentation Error.


Figure 4.1.2. Two types of vertex interleaving in digitized datasets.

4.1.3.2. Case Study: Thematic and Attribute Error.

Attribute error requires a different modelling approach, since it may not be rigorously mapped to an indexing set. For purposes of convenience, we assume raster data.

4.1.3.3. Case Study: Errors in Categorical Data.

Attributes that are classified taxonomically (i.e., categorized) may be difficult to map to an index set in a rigorous manner. Hence, one may want to deal with misclassification and erroneous assignment in a probabilistic fashion.

This concludes our brief discussion of category-based errors. We next consider the error prediction and modelling process, which includes sensitivity analysis.

4.2. Prediction of Error and Complexity in GIS Algorithms

In order to employ statistical methods in analysis and prediction of GIS algorithm error, confidence limits are required for each dataset. Such measures can be propagated through error models, which can be designed to test algorithm sensitivity to a variety of input perturbations. In Section 4.2.1, we discuss the generation and interpretation of suitability maps, which can portray the utility of a dataset in a given GIS processing scenario. Section 4.2.2 overviews the process of GIS sensitivity analysis, and Sections 4.2.3 and 4.2.4 discuss the use of specialized software to determine error bounds on GIS algorithms. The analysis of complexity in GIS algorithms is summarized in Section 4.2.5.

4.2.1. Suitability Map Generation

The following high-level model describes the process of suitability map generation from primary maps that describe observed physical reality (e.g., ground truth).


Figure 4.2.1. Underlying model of suitability analysis applied to confidence limits.

Definition. A primary map is a map of an existing geographic entity of interest (e.g., geologic or vegetation map).

Definition. A rank map has attributes that are ordinal-, interval-, or ratio-valued and pertain to rankings that represent mathematical relations between attributes of more than one map. For example, a soil map could be ranked to determine potential for landslides or rate of water percolation.

Definition. Geographic suitability analysis applies a given operation or transformation to the attributes of one or more maps, for example, a rank map or an overlay.

Notation. Given n primary maps that are overlaid, let salient variables be defined as follows:

Definition. The result of a suitability analysis restricted to the p-th polygon of the resultant suitability map has an attribute rp that is a function of the weight vector w = (w1, w2, ... , wn) and the attribute vector ap = (ap1, ap2, ... ,apn)', as follows:

rp = f(w,ap) ,

which yields the resultant vector r = (r1, r2, ... , rP(n)).

Observation. If ap is replaced by point or line attributes (e.g., raster data is employed), then the preceding formulation attains full generality, including most commonly known GIS suitability analyses. In particular, the weighted intersection overlay is represented by:

rp = f(w,ap) = wi · ap,i , p = 1,2,...,P(n) .

Observation. The multidimensional scaling approach derives rp from the primary map attributes and a given attribute ci, where i = 1..n, as follows:

rp = f(w,ap) = ( wi · ap,i)1/2 , p = 1,2,...,P(n) .

Note that the scalar ci is chosen for each of the primary or rank maps. Hence, c = (c1, c2, ..., cn).

Given the preceding theory, we now address the problem of sensitivity analysis of GIS algorithms.

4.2.2. GIS Sensitivity Analysis

The difference between error propagation analysis (or error analysis) and sensitivity analysis is as follows:

Definition. Confidence limits for attribute errors indicate the range of validity in an output attribute value given the error ranges in the input attribute values (e.g., in the primary maps).

Algorithm. The process of suitability map generation involves the following steps:

Observation. One can represent the attributes of the suitability map as

rp = wi · ap,i = wi · Ti(bp,i) , p = 1,2,...,P(n) .

to obtain overlay suitability and

rp = ( wi · (ap,i - ci)2 )1/2 = ( wi · (Ti(bp,i) - ci)2 )1/2, p = 1,2,...,P(n) ,

for target suitability, where ci denotes the most or least preferred value of the i-th rank map.

The matrix product provides a concise, convenient expression for the preceding two equations. For example, consider the following equation:

Likewise, given the ideal attributes I1 through In, if t = (r12, r22, ... , rP(n)2), then the target attributes are given by:

We next consider measures of geographic sensitivity. Given an expression for Tn, the sensitivity of individual polygons can be determined. At least five sensitivity measures for entire maps have been defined by Lodwick [Lod94], as follows:

  1. Attribute Sensitivity Measures (ASMs) describe the net magnitude of changes in attribute values from their unperturbed values.

  2. Position Sensitivity Measures (PSMs) describe how many of the attributes change with respect to their rank order from their unperturbed (original or source) ranking.

  3. Map Removal Sensitivity Measures (MRSMs) describe how the sensitivity associated with removing a set of maps from a suitability analysis. This measure derives from perturbation of the weight vector w that is equal in magnitude byt opposite in sign to its value in the unperturbed suitability analysis.

  4. Polygon Sensitivity Measures (PoSMs) determine which polygons are more sensitive to perturbations.

  5. Areal Sensitivity Measures (ArSMs) compute the total area over which attribute changes occur.

We examine each type of sensitivity measure, as follows.

1. Attribute Sensitivity Measures (ASMs):

2. Position Sensitivity Measures (PSMs): 3. Map Removal Sensitivity Measures (MRSMs):

4. Polygon Sensitivity Measures (PoSMs):

5. Area Sensitivity Measures (ArSMs): Algorithm. The following procedure outlines the methodology employed in computing geographic sensitivity analyses. Given a GIS that can a) perform intersection overlay and b) access individual attributes of n primary maps, perform the following steps:

4.2.3. Confidence Limits Associated with Geographic Sensitivity.

LEFT OFF HERE (middle of p.28 in notes)

4.2.3. Theory of Error Bounds in Discrete Algorithms

4.2.4. Software for Determining Error Bounds in Algorithms

4.2.5. Overview of Complexity Analysis for GIS Algorithms

4.3. Determining Elevation from Stereo Images in GIS Data

4.4. Integrating Surface Models with Elevation Data


References.

[Bib77] Bibby, J. "The general linear model: A cautionary tale", in Analysis of Survey Data 2:35-80, Eds. C. O'Muircheartaigh and C. Payne, New York: John Wiley (1977).

[Bis75] Bishop, Y., S. Fienberg, and P. Holland. Discrete Multivariate Analysis: Theory and Practice, Boston, MA: MIT Press (1975).

[Chr94] Chrisman, N.R. "Modelling error in overlaid categorical maps", in Accuracy of Spatial Databases, Eds. M. Goodchild and S. Gopal, London: Taylor and Francis, Second Printing (1994).

[Goo78] Goodchild, M.F. "Statistical aspects of the polygon overlay problem", Harvard Papers on Geographic Information Systems, Volume 6, Reading, MA: Addison-Wesley (1978).

[McA71] McAlpine, J.R. and B.G. Cook. "Data reliability from map overlay", in Proceedings of the 43rd Congress of the Australian and New Zealand Association for the Advancement of Science (1971).

[Ver94] Veregin, H. "Error modeling for the map overlay operation", in Accuracy of Spatial Databases, Eds. M. Goodchild and S. Gopal, London: Taylor and Francis, Second Printing (1994).


This concludes our introductory discussion of GIS issues.
We next consider computational problems related to GIS features.