GIS - Computational Problems: § 1: GIS Features

Instructors: G.X. Ritter and M.S.Schmalz


The majority of GIS datasets are currently represented in vector format, which is convenient due to storage efficiency but can be difficult to manipulate analytically. To achieve tractability, vector format GIS data is often rasterized (e.g., conformed to a grid-cell representation). The processes involved in vectorization of map data as well as rasterization of vector data have indiosyncratic errors that manifest as representational error in a given GIS system.

Section Overview. In this section, for purposes of orientation, we overview the vectorization of GIS data, then discuss theoretical and practical issues associated with feature and dataset merging. It is important to note that this section does not have as its goal the development of a specific "best" technique or algorithm for performing vectorization or rasterization operations. The material is structured as follows:

In Section 2.1, we outline basic problems in dealing with vector-format GIS datasets, then discuss vectorization, rasterization, and dataset merging. Sections 2.2 and 2.3 detail the unsolved problem of determining feature type from map data, satellite imagery, etc., and resolving conflicting feature types or attributes in rasterized vector datasets or grid-cell data. Section 2.4 extends the preceding concepts to include the merging of disjoint datasets for which some coordinate information is available. Section 2.5 details a related sub-problem, namely, the coregistration of map information, imagery, and GIS data.

Goals. For the purposes of this course, goals for this section include:

  1. Learning GIS terminology and background;
  2. Exploring two dataset-related problems instantiated as merging and alignment of vector- or raster-format data;
  3. Understanding GIS feature types, attributes, and errors inherent in GIS data, with the ability to estimate output error for GIS algorithms when input error is known; and
  4. Prediction of computational cost and accuracy (e.g., the space-time-error bandwidth product) for a given algorithm.
Orientation, Terminology, and Problems. The process of meeting our goals will be more efficient if we understand some differences between GIS and the better-established discipline of cartography or map-making (which is often confused with GIS). GIS terms can often be confused with corresponding terminology and notation in cartography. Hence, we offer the following definitions: Key Problems of Interest in this Section include the feature conflict resolution problem and the dataset merging problem. Exact or approximate solutions to these problems discussed in this section are intended to support further research in the higher-level problems of heterogeneous dataset merging and alignment.

As an aside, note that in the extension of the corresponding points problem, if the disparity map d is a constant image, then coregistration can be a trivial process, provided that significant data does not lie outside the common domain W. Otherwise, we say that d represents nonuniform disparity. In cases where the magnitude or direction component of the disparity d is a nonlinear function of x, we say that there exists anisomorphic disparity. For example, consider the simple cases of optical distortions such as barrel distortion or pincushion distortion, which nonlinearly warp an input image.

We next consider the processes of map segmentation into vector data sets, as well as rasterization and merging of vector data sets.

2.1. Polygonalization and Merging of Vector Data Sets.

Given a scene as shown in Figure 2.1, a corresponding map a on a two-dimensional domain X R2 contains |X| points x X that may be expressed in terms of the map coordinates (x,y) X. For example, one could have the customary coordinates of latitude and longitude, or survey measure from a given benchmark.


Figure 2.1. Schematic diagram of (a) natural scene and (b) corresponding 2-D map.

A given domain point x X customarily has measurement error associated with locating an object on the map a. One way to express such error is via an abstraction called Circular Error Probable (CEP), as shown in Figure 2.2. For example, given North-South error NS (sometimes called Northing error) or East-West error EW (sometimes called Easting error), we can approximate CEP, denoted by CEP, as follows:

CEP = ((NS2 + EW2)/2)1/2 ,

which assumes RMS error accumulation that may or may not hold in practice.


Figure 2.2. Schematic diagram of circular error probable.

The concept of statistical characterization of spatial error will be important to our subsequent analyses of GIS data and algorithm accuracy. In practice, spatial error propagates through the various phases of GIS data acquisition and processing algorithms, as shown in Figure 2.3. Hence, if we have analytical expressions for the constituent processes shown in Figure 2.3, we can apply well-known error analysis procedures to obtain error functions of these processes, from which estimations of GIS dataset error can be made.


Figure 2.3. Schematic diagram of error propagation in a GIS data acquisition, preprocessing, and storage system.

In order to better understand the mathematics of algorithms involved in GIS data processing and archival, we present the following discussion of vectorization and rasterization processes.

2.1.1. Overview of Vectorization and Rasterization

Map data can be converted to a more compact format that describes (a) the boundary coordinates of regions within the map, and (b) the contents of each region. By coercing each bounded region to have one and only one feature type, one can construct a map from a set of boundary coordinates to a set of feature types, which we call the vectorization map.

High-Level Vectorization Algorithm. In general, vectorization proceeds according to the following steps:

Rasterizing might initially appear to be the inverse of vectorization, since it might appear that one merely converts between vector and map data, or vice versa. However, this is an oversimplification that yields little information about the subprocesses and errors that are idiosyncratic to each of these two important and markedly distinct processes.

In particular, we note that vectorization is concerned primarily with the spatial definition of map neighborhoods or regions in terms of a set of vector components (e.g., line segments or arcs) that delimit a region having one and only one feature type or attribute set. In contrast, rasterization assumes a given spatial segmentation scheme (e.g., an image pixel grid) that is not necessarily data-dependent, then associates one or more feature types or attribute sets per grid cell. A key problem in vectorization is accurately portraying the region boundary with a limited set of vector primitives, whereas rasterization can be seen as primarily emphasizing the resolution of feature type or attribute conflicts within a fixed, prespecified spatial framework.

Discussion. Vectorization is acustomary procedure that has many different implementations. Errors inherent in vectorization include:

By choosing curvilinear vector segments or by decreasing the error bound, it may be possible to reduce vector alignment and quantization errors, albeit with increased computational cost. However, quantization errors, which derive from Nyquists's sampling theorm, are generally considered irreducible unless they can be averaged to zero over the course of a computation. Although such errors still remain as local features of the GIS dataset, they are considered (erroneously) to be "averaged out" over a larger area of the dataset domain.

High-Level Rasterization Algorithm. The following steps are generally employed in rasterization:

Discussion. Rasterization tends to have similar errors to those exhibited by vectorization, which include: Note that quantization error derives from the Nyquist criterion (sampling theory) and decreases as q approaches zero. Additionally, when two conflicting feature types are present within a given grid cell, feature type or attribute resolution errors are unavoidable.

Dataset Merging Problem. The dataset merging problem in GIS can be stated as follows:

Inherent in the process of combining datasets is feature conflict resolution. Assume, for purposes of simplicity, that each GIS dataset is registered on a common, two-dimensional spatial domain X. Then, PDMP can be reduced to the following question: Inherent in this specification of PDMP is the assumption that GIS data can be coregistered on a common spatial domain. Raster format data (e.g., an MxN-pixel rectangularly-structured domain) is the domain most commonly employed, which is also called grid cell format. Hence, if a dataset is in raster format, it may have to be resampled to produce a raster of different resolution. If the data is in vector format, then rasterization may be required. Thus, we next examine vector-to-raster conversion in detail.

2.1.2. Rasterization of Vector-format GIS Data.

We have previously shown that the rasterization problem can be reduced to the problem of resolving which feature attribute a grid cell should be labelled. Additionally, we discussed area-based methods for feature label assignment. Three pertinent cases will be discussed in this section, which are listed in increasing order of difficulty, as follows:

Fortunately, the area of each polygonal partition of a grid cell can be accurately computed in every case using a simple algorithm derived from the plane geometric formula for the area of a trapezoid.

Recall. Given a trapezoid T having parallel sides of length a and b and height h, the area of T is given by

< T > = h · (a + b)/2 .

Observation. If a given polygon A in a vectorized map can be decomposed into trapezoids A1, A2, ..., Ai, ..., An, each of which has parallel sides of length ai and bi, with height hi, then the area of A would be computed as:

< A > = < Ai > = hi · (ai + bi)/2 .

Applying this observation to Case I, we see that there are two sub-cases: In Case II, the grid cell is bisected by a sequence of line segments whose vertices are given by the sequence S = {x1, x2,..., xi, ..., xn} X. Each consecutive pair of vertices xi,xi+1 defines the slant side of a trapezoid, as shown in Figure ___a. Summing the areas of the component trapezoids is a simple matter, unless the sequence reverses direction, as shown in Figure ___b. In this case, we must employ a more generalized algorithm for specifying the area of an irregular polygon, as follows.

Algorithm. Given a map domain X R2, let the image a ,

%%%%LEFT OFF HERE%%%%

2.2. Determining Feature Type from GIS/Image Data.

2.3. Resolving GIS Attribute Conflicts.

2.4. Merging Disjoint Heterogeneous Regions.

2.5. Coregistration of GIS Features and Neighborhoods.


References.

{Doa97} Doaks, J. Another Introduction to GIS, New York: Bogus Press (1997).

{Doe97} Doe, J.H. An Introduction to GIS, New York: Bogus Press (1997).


This concludes our introductory discussion of GIS issues.
We next consider computational problems related to GIS features.