SECTION 1
WHAT IS SPATIAL ANALYSIS?
Section 1 - What is spatial analysis? - Basic GIS concepts for spatial analysis - GIS functionality - Integrating GIS and spatial analysis - Issues of error and uncertainty:
- Definition of spatial analysis, major types and areas for application.
- How should an analyst view a spatial database? Objects, layers, relationships, attributes, object pairs, data models.
- How to organize the functions of a GIS into a coherent scheme.
- Levels of integration of GIS and spatial analysis - loose and tight coupling, and full integration. Scripts and macros, lineage and analytical toolboxes.
- The uncertainty problem - why is it such an issue in spatial analysis? What can we do now about data quality?
What is spatial analysis?
A set of techniques for analyzing spatial data
- used to gain insight as well as to test models
- ranging from inductive to deductive
  - finding new theories as well as testing old ones
- can be highly technical, mathematical
  - can also be very simple and intuitive
Definitions
"A set of techniques whose results are dependent on the locations of the objects being analyzed"
- move the objects, and the results change
  - e.g. move the people, and the US Center of Population moves
  - by contrast, move the people, and average income does not change
- most statistical techniques are invariant under changes of location
  - compare the techniques in SAS, SPSS, Systat etc.
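As a minimal sketch of the first definition (Python, with made-up zone coordinates, populations and incomes), the population-weighted mean center below changes when every zone is shifted, while population-weighted mean income does not. The planar calculation is a simplification that ignores the Earth's curvature.

```python
# Hypothetical zones: (x, y, population, mean_income), in planar coordinates.
zones = [
    (0.0, 0.0, 100, 40_000.0),
    (10.0, 0.0, 300, 55_000.0),
    (0.0, 10.0, 100, 30_000.0),
]


def mean_center(zones):
    """Population-weighted mean center; depends on where the zones are."""
    total = sum(p for _, _, p, _ in zones)
    cx = sum(x * p for x, _, p, _ in zones) / total
    cy = sum(y * p for _, y, p, _ in zones) / total
    return (cx, cy)


def mean_income(zones):
    """Population-weighted mean income; ignores location entirely."""
    total = sum(p for _, _, p, _ in zones)
    return sum(inc * p for _, _, p, inc in zones) / total


def move(zones, dx, dy):
    """Shift every zone by (dx, dy), leaving its attributes unchanged."""
    return [(x + dx, y + dy, p, inc) for x, y, p, inc in zones]


moved = move(zones, 5.0, 0.0)
print(mean_center(zones), mean_center(moved))    # (6.0, 2.0) (11.0, 2.0)
print(mean_income(zones) == mean_income(moved))  # True
```

Moving the zones changes the result of a spatial technique (the center) but not of an ordinary statistical one (the mean).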
"A set of techniques requiring access both to the locations of objects and also to their attributes"
- requires methods for describing locations (i.e. a GIS)
- some techniques do not look at attributes
- mapping is a form of spatial analysis?
Is spatial analysis the ultimate objective of GIS?
Some books on spatial analysis:
- Anselin L (1988) Spatial Econometrics: Methods and Models. Kluwer
- Bailey T C, Gatrell A C (1995) Interactive Spatial Data Analysis. Harlow: Longman Scientific & Technical
- Berry B J L, Marble D F (1968) Spatial Analysis: A Reader in Statistical Geography. Prentice-Hall
- Boots B N, Getis A (1988) Point Pattern Analysis. Sage
- Burrough P A, McDonnell R A (1998) Principles of Geographical Information Systems. New York: Oxford University Press
- Cliff A D, Ord J K (1973) Spatial Autocorrelation. Pion
- Cliff A D, Ord J K (1981) Spatial Processes: Models and Applications. Pion
- Fischer M, Scholten H J, Unwin D J, editors (1996) Spatial Analytical Perspectives on GIS. London: Taylor & Francis
- Fotheringham A S, O'Kelly M E (1989) Spatial Interaction Models: Formulations and Applications. Kluwer
- Fotheringham A S, Rogerson P A (1994) Spatial Analysis and GIS. Taylor & Francis
- Fotheringham A S, Wegener M (2000) Spatial Models and GIS: New Potential and New Models. London: Taylor & Francis
- Fotheringham A S, Brunsdon C, Charlton M (2000) Quantitative Geography: Perspectives on Spatial Data Analysis. London: SAGE
- Getis A, Boots B N (1978) Models of Spatial Processes: An Approach to the Study of Point, Line and Area Patterns. Cambridge University Press
- Ghosh A, Ingene C A (1991) Spatial Analysis in Marketing: Theory, Methods, and Applications. JAI Press
- Ghosh A, Rushton G (1987) Spatial Analysis and Location-Allocation Models. Van Nostrand Reinhold
- Goodchild M F (1986) Spatial Autocorrelation. CATMOG 47, GeoBooks
- Griffith D A (1987) Spatial Autocorrelation: A Primer. Association of American Geographers
- Griffith D A (1988) Advanced Spatial Statistics. Special Topics in the Exploration of Quantitative Spatial Data Series. Kluwer
- Haggett P, Chorley R J (1970) Network Analysis in Geography. St Martin's Press
- Haggett P, Cliff A D, Frey A (1977) Locational Methods. Wiley
- Haggett P, Cliff A D, Frey A (1978) Locational Models. Wiley
- Haining R P (1990) Spatial Data Analysis in the Social and Environmental Sciences. Cambridge University Press
- Harries K (1999) Mapping Crime: Principle and Practice. Washington, DC: Crime Mapping Research Center, Department of Justice
- Haynes K E, Fotheringham A S (1984) Gravity and Spatial Interaction Models. Sage
- Hodder I, Orton C (1979) Spatial Analysis in Archaeology. Cambridge: Cambridge University Press
- Leung Y (1988) Spatial Analysis and Planning under Imprecision. Amsterdam: North Holland
- Longley P A, Batty M, editors (1996) Spatial Analysis: Modelling in a GIS Environment. Cambridge: GeoInformation International
- Mitchell A (1999) The ESRI Guide to GIS Analysis, Volume 1: Geographic Patterns and Relationships. ESRI Press
- Odland J (1988) Spatial Autocorrelation. Sage
- Raskin R G (1994) Spatial Analysis on the Sphere: A Review. Santa Barbara, CA: National Center for Geographic Information and Analysis
- Ripley B D (1981) Spatial Statistics. Wiley
- Ripley B D (1988) Statistical Inference for Spatial Processes. Cambridge University Press
- Taylor P J (1977) Quantitative Methods in Geography: An Introduction to Spatial Analysis. Houghton Mifflin
- Unwin D (1981) Introductory Spatial Analysis. Methuen
- Upton G J G, Fingleton B (1985) Spatial Data Analysis by Example. Wiley
Geographic Information Systems and Science
Paul Longley, Mike Goodchild, David Maguire, and David Rhind
Wiley, 2001
Some background slides:
Landsat image of New York area
Indianapolis database
Snow map of Soho, 1854
the pump
Openshaw GAM map of NE England
Atlantic Monthly mystery map
Northridge earthquake epicenters
Environmental justice in LA
World map
England and Wales demography
South Wales demography
Vandenberg service station
Service station subsurface
Service station plume
How does an analyst/modeler/decision-maker work with a GIS?
What tools exist for helping/conceptualizing/problem-solving?
Assumption: these (analysis, modeling, decision-making) are the primary purposes of GIS technology.
The cost of input to a GIS is high, and can only be justified by the benefits of analysis/modeling/decision-making performed with the data.
- 60 polygons per hour = $1 per polygon (i.e. digitizing at about $60 per hour)
- estimates as high as $40 per polygon
- 500,000 polygon database costs $500,000 to create using the low estimate
- $20m using the high estimate
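The cost arithmetic above as a small Python sketch. The $60 hourly rate is only what the $1-per-polygon figure implies for 60 polygons per hour, and `digitizing_cost` is a hypothetical helper.

```python
def digitizing_cost(n_polygons: int, cost_per_polygon: float) -> float:
    """Total capture cost at a flat per-polygon rate (hypothetical helper)."""
    return n_polygons * cost_per_polygon


# Low estimate: 60 polygons per hour at the implied $60 per hour.
low_rate = 60.0 / 60   # $1 per polygon
high_rate = 40.0       # $40 per polygon

print(digitizing_cost(500_000, low_rate))   # 500000.0
print(digitizing_cost(500_000, high_rate))  # 20000000.0
```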
What types of analysis can justify these costs?
- Query (if it is faster than manual lookup)
  - very repetitive
  - highly trained user
- Analyses which are simple in nature but difficult to execute manually
  - overlay (topological)
  - map measurement, particularly area
  - buffer zone generation
- Analyses which can take advantage of GIS capabilities for data integration
  - Browsing/plotting independently of map boundaries and with zoom/scale-change
    - seamless database
    - need for automatic generalization
    - editing
- Complex modeling/analysis (based on the above and extensions)
The list of possibilities is endless
- List of generic GIS functions has 75 entries
- ESRI's ARC/INFO has over 1000 commands/functions
How can we organize/conceptualize the possibilities?
- A taxonomy/classification of GIS functions
- A customized view of a spatial database designed for the needs of the analyst/modeler
- A set of tools to support analysis and database manipulation
- Associated tools for defining needs in the analysis/modeling area, and testing systems against those needs
- Methods for dealing with problems associated with analysis/modeling of spatial databases, particularly error/inaccuracy
A geographical data model consists of the set of entities and relationships used to create a representation of the geographical world. The choices made when the world is modeled determine how the database is structured, and what kinds of analysis can be done with it. These choices occur when the data are captured in the field, recorded, mapped, digitized, and processed.
There are two distinct ways of conceiving of the geographical world.
In the field view, the world is conceived as a finite set of variables, each having a single value at every point on the Earth's surface (or every point in a three-dimensional space; or a four-dimensional space if time is included).
Examples of fields: elevation, temperature, soil type, vegetation cover type, land ownership
Some field-like phenomena: elevation, spectral response
To be represented digitally, a field must be constructed out of primitive zero-, one-, two-, three-, or four-dimensional objects. There are six ways of representing fields in common use in GIS:
raster (a rectangular array of homogeneous cells)
grid (a rectangular array of sample points)
irregular points
digitized contours
polygons
TIN
Other methods can be found in environmental modeling, but not commonly in GIS.
finite element methods
splines
The field view underlies the following ESRI implementation models:
coverage
TIN
grid
but not shapefiles
in the Arc8 Geodatabase the distinction can be implemented in object behaviors
In the discrete object view an otherwise empty space is littered with objects, each of which has a series of attributes. Any point in space (two, three, or four dimensional) can lie in any number of discrete objects, including zero, and objects can therefore overlap, and need not exhaust the space.
objects can be counted
how many mountains are there in Scotland?
what's a mountain?
objects can be manipulated
they maintain integrity as they move
objects are homogeneous
the whole thing is the object
parts can inherit properties from the whole
Field and discrete object views can be implemented in either raster or vector forms
compare manipulation of shapefiles (objects) and coverages (fields)
the distinction concerns how the world is conceived, and the rules governing object behavior
a field can be represented as raster cells, points (e.g., spot heights), triangles (TIN), lines (contours), or areas (land ownership)
in many of these cases the primitive elements are not real (cannot be located on the ground), but are artifacts of the representation
If we ignore the field/discrete object distinction we may easily apply meaningless forms of analysis
buffer makes sense only for discrete objects
interpolation makes sense only for fields
Attributes can be of several types:
numeric
alphanumeric
quantitative
qualitative
nominal
ordinal
interval/ratio
cyclic
Spatial objects are distinguished by their dimensions or topological properties:
points (0-cells)
lines (1-cells)
areas (2-cells)
volumes (3-cells)
A class of objects is a set with the same topological properties (e.g. all points) and with the same set of attributes (e.g. a set of wells or quarter sections or roads). In the Arc8 Geodatabase a class also has the same behaviors, and may inherit behaviors from other classes. A class is associated with an attribute table.
Geodatabase introduces a consistent set of terms for primitive geometric objects
When a class represents a field, certain rules apply to the component objects. The objects belonging to one class of area or volume objects will fill the area and will not overlap (they are space-exhausting, they partition or tessellate the space, they are planar enforced).
the layer provides one value at every point (recall the definition of a field)
- e.g. soil type
- e.g. elevation
- e.g. zoning
Slide: Planar enforcement
Spatial objects are abstractions of reality. Some objects are well-defined (e.g. road, bridge) but others are not. Objects representing a discrete entity view tend to be well-defined; objects representing a field are not.
- A TIN or DEM is an approximation to a topographic surface, with an accuracy which is usually undetermined. Even if accuracy is known at the sampled points, it is unknown between them.
- We assume that all of the points within an area object have the attributes ascribed to the object. In reality the area inside the object is not homogeneous, and the boundaries are zones of transition rather than sharp discontinuities (e.g. soil maps, climatic zones, geological maps).
A topographic surface can be represented as either a TIN or a DEM.
Slides: Elevation model options
digital elevation model (raster)
digitized contours
triangular mesh
TIN
Advantages of TIN:
- sampling intensity can adapt to local variability
- many landforms are approximated well by triangular mosaics
- triangles can be rendered quickly by graphics processors
Advantages of DEM:
- uniform sampling intensity is suited to automatic data collection, e.g. via an analytical stereoplotter
- many applications require uniform-sized spatial objects
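Within a TIN the surface is commonly treated as a plane over each triangle. A minimal Python sketch, assuming the triangle containing the query point has already been found (locating it is a separate search step), interpolates height linearly using barycentric weights:

```python
from typing import Tuple

Point3 = Tuple[float, float, float]  # (x, y, z) of a TIN vertex


def tin_height(p: Tuple[float, float], a: Point3, b: Point3, c: Point3) -> float:
    """Height at planar point p on the plane through triangle abc.

    Raises ValueError if the triangle is degenerate or p lies outside it.
    """
    x, y = p
    (x1, y1, z1), (x2, y2, z2), (x3, y3, z3) = a, b, c
    det = (y2 - y3) * (x1 - x3) + (x3 - x2) * (y1 - y3)
    if det == 0:
        raise ValueError("degenerate triangle")
    # Barycentric weights: how much each vertex contributes at p.
    w1 = ((y2 - y3) * (x - x3) + (x3 - x2) * (y - y3)) / det
    w2 = ((y3 - y1) * (x - x3) + (x1 - x3) * (y - y3)) / det
    w3 = 1.0 - w1 - w2
    if min(w1, w2, w3) < -1e-12:
        raise ValueError("point lies outside the triangle")
    return w1 * z1 + w2 * z2 + w3 * z3


# Vertices sampled from the plane z = x + 2y.
a, b, c = (0.0, 0.0, 0.0), (10.0, 0.0, 10.0), (0.0, 10.0, 20.0)
print(tin_height((2.0, 3.0), a, b, c))  # plane value at (2, 3) is 2 + 2*3 = 8
```

On a DEM the equivalent step would instead interpolate between the surrounding grid values.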
A spatial database consists of a number of classes of spatial objects with associated attribute tables.
The methods used to store the attribute and locational information about the objects are not of immediate concern to the analyst/modeler.
- In fact this object/attribute view of the database may have little in common with the actual data structures/models used by the system designer.
A database encodes and represents the complex relationships which exist between objects.
- spatial relationships
- functional relationships
- A GIS must be capable of computing these relationships through such geometrical operations as intersection.
Spatial relationships include:
- Relationships between objects of different classes
- Relationships between objects of the same class
The potential set of relationships within a complex spatial database is enormous. No system can afford to compute and store all of them in the database.
A cartographic data structure stores no spatial relationships among objects.
- Since it must compute any relationship as and when needed it is inefficient for complex spatial analyses.
A topological data structure stores certain spatial relationships among objects. Common stored relationships are:
- ID of left and right polygons stored as attributes of shared boundaries (requires planar enforcement, so is associated with representation of a field).
- ID of incident links stored as attributes of nodes in line networks
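A minimal Python sketch of why stored topology helps: with left and right polygon IDs recorded on each boundary arc, polygon contiguity and node incidence come from a single pass over the arc table, with no geometric computation. The `Arc` record and the use of 0 for the area outside the map are assumptions of this sketch, not any particular system's format.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Arc:
    """A boundary arc with stored topology (hypothetical record layout)."""
    arc_id: int
    from_node: int
    to_node: int
    left_poly: int   # ID of the polygon on the left; 0 = outside the map
    right_poly: int  # ID of the polygon on the right


def contiguity(arcs):
    """Polygon adjacency read directly from left/right IDs: no geometry needed."""
    neighbors = defaultdict(set)
    for arc in arcs:
        if arc.left_poly != arc.right_poly:
            neighbors[arc.left_poly].add(arc.right_poly)
            neighbors[arc.right_poly].add(arc.left_poly)
    return neighbors


def incident_arcs(arcs):
    """IDs of the arcs meeting at each node."""
    at_node = defaultdict(list)
    for arc in arcs:
        at_node[arc.from_node].append(arc.arc_id)
        at_node[arc.to_node].append(arc.arc_id)
    return at_node


# Two adjacent polygons (1 and 2) sharing arc 1 between nodes 1 and 2.
arcs = [
    Arc(1, 1, 2, left_poly=1, right_poly=2),  # shared boundary
    Arc(2, 2, 1, left_poly=1, right_poly=0),  # outer boundary of polygon 1
    Arc(3, 1, 2, left_poly=2, right_poly=0),  # outer boundary of polygon 2
]
print(sorted(contiguity(arcs)[1]))  # [0, 2]
```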
UML relationship types
association
a functional linkage between objects in different classes
aggregation and composition
linkage between an object and its component objects
type inheritance
classes inherit properties from more general classes
Relations between objects
An object pair is a combination of objects of the same or different types/classes which may have its own attributes.
- e.g. the hydrologic relationship between a spring and a sink may have attributes (direction, volume of flow, flow through time) but may not exist as a spatial object itself.
The ability to generate object pairs, give them attributes and include them in analysis is an important component of a full GIS.
giving attributes to associations
Examples of object pairs:
- Matrix of distances between pairs of objects
- Traffic flows between origin/destination pairs
Object pairs in ESRI products
turntable (link-link pairs)
distance matrix (first object, second object, distance)
association class in UML
attributed relationship class in Geodatabase
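An object pair table can be sketched in Python as rows of (first object, second object, distance), built here from hypothetical well locations. Because each pair is its own row, it can carry attributes of its own, such as a flow, that belong to neither object alone:

```python
import math
from itertools import permutations


def distance_pairs(objects):
    """Object-pair table: one row per ordered pair, with the distance between them.

    `objects` maps an object ID to a planar (x, y) location.
    """
    rows = []
    for (id_a, (xa, ya)), (id_b, (xb, yb)) in permutations(objects.items(), 2):
        rows.append({"first": id_a, "second": id_b,
                     "distance": math.hypot(xb - xa, yb - ya)})
    return rows


wells = {"W1": (0.0, 0.0), "W2": (3.0, 4.0), "W3": (6.0, 8.0)}
pairs = distance_pairs(wells)

# A pair-level attribute (hypothetical placeholder, e.g. a measured flow).
for row in pairs:
    row["flow"] = 0.0
```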
Visio example
Example: Data Model for Traffic Routing
What are the essential components of a data model for route planning in a complex street network?
1. Links - attributes: length, street name, traffic count, terminal nodes. A street can be represented either by a single link whose attributes include its direction(s) of travel, or by a pair of links with associated directions, one of the pair being omitted in the case of one-way streets.
2. Nodes or intersections - attributes: incident links, presence of traffic light.
- Turn prohibitions - are attributes not of nodes or links but of link/link object pairs ("turntable")
- Stop signs - are attributes of link/node object pairs.
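The components listed above can be sketched as a small Python data model. This is a hypothetical structure (the class and field names are illustrative, not any product's schema) that uses directed links, so a one-way street simply has no link in the reverse direction. Turn prohibitions are stored as link/link pairs and stop signs as link/node pairs:

```python
from dataclasses import dataclass, field


@dataclass
class Link:
    link_id: int
    from_node: int
    to_node: int
    length: float
    street_name: str


@dataclass
class Node:
    node_id: int
    traffic_light: bool = False


@dataclass
class RoutingModel:
    links: dict = field(default_factory=dict)   # link_id -> Link
    nodes: dict = field(default_factory=dict)   # node_id -> Node
    # Attributes of link/link object pairs: (incoming link, outgoing link).
    prohibited_turns: set = field(default_factory=set)
    # Attributes of link/node object pairs: (approach link, node).
    stop_signs: set = field(default_factory=set)

    def turn_allowed(self, in_link: int, out_link: int) -> bool:
        """A turn exists only if the links meet and the pair is not prohibited."""
        a, b = self.links[in_link], self.links[out_link]
        if a.to_node != b.from_node:
            return False
        return (in_link, out_link) not in self.prohibited_turns


model = RoutingModel()
for link in (Link(1, 1, 2, 120.0, "Main St"), Link(2, 2, 3, 80.0, "Main St"),
             Link(3, 2, 4, 95.0, "Elm St")):
    model.links[link.link_id] = link
for node_id in (1, 2, 3, 4):
    model.nodes[node_id] = Node(node_id)
model.prohibited_turns.add((1, 3))  # no turn from link 1 onto link 3 at node 2
model.stop_signs.add((1, 2))        # traffic on link 1 stops at node 2
```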
Visio example
Data modeling examples
1. Design a database to capture and analyze data on recreational fishing in the Scottish Highlands, to support decision-making by the tourist industry and regulatory agencies. The database should be able to represent the following:
- locations of fishing (rivers, lakes)
- locations of accommodation (hotels, guest houses)
- preferences and rights (fishing locations owned by hotels, locations accessible to hotels)
2. Design a database to support analysis and modeling of shoreline erosion on the Great Lakes. It is necessary to represent conditions and processes transverse to the shoreline in much more detail than variation parallel to the shoreline.
3. Design a database to support water resource analysis and planning for complex hydrographic networks that include streams, rivers, lakes and reservoirs.
GEOGRAPHIC INFORMATION SYSTEM FUNCTION DESCRIPTIONS
A. BASIC SYSTEM CAPABILITIES
A1 Digitizing (di)
Digitizing is the process of converting point and line data from source documents to a machine-readable format.
A2 Edgematching (ed)
Edgematching is the process of joining lines and polygons across map boundaries in creation of a "seamless" database. The join should be topological as well as graphic, that is, a polygon so joined should become a single polygon in the data base, a line so joined should become a single line segment.
A3 Polygonization (po)
Polygonizing is the process of connecting together arcs ("spaghetti") to form polygons.
A4 Labelling (la)
This process transfers labels describing the contents (attributes) of polygons, and the characteristics of lines and points, to the digital system. This input of labels must not be confused with the process of symbolizing and labelling output described below.
A5 Reformatting digital data for input from other systems (rf)
Data previously digitized are made accessible through an interface or converted by software to the system format, and made to be topologically useful as well as graphically compatible.
A6 Reformatting for output to other systems (ro)
This function is the inverse of the previous one. Internal data is reformatted to meet the requirements of other systems or standards.
A7 Data base creation and management (db)
Data is typically digitized from map-sheets, and may be edgematched. The creation of a true "seamless" database requires the establishment of a map sheet directory, and may include tiling to partition the database.
A8 Raster/vector conversion (rv)
The ability to convert data between vector and raster forms with grid cell size, position and orientation selected by the user.
A9 Edit and display on input (ei)
This function allows continuous display and editing of input data, usually in conjunction with digitizing.
A10 Edit and display on output (eo)
The ability to preview and edit displays before creation of hard copy maps.
A11 Symbolizing (sy)
To create high quality output from a GIS, it is necessary to be able to generate a wide variety of symbols to replace the primitive point, line and area objects stored in the database.
A12 Plotting (pl)
Creation of hard copy map output.
A13 Updating (up)
Updating of the digital data base with new points, lines, polygons and attributes.
A14 Browsing (br)
Browse is used to search the data base to answer simple locational queries, and includes pan and zoom.
B. DATA MANIPULATION AND ANALYSIS FUNCTIONS
B1 Create lists and reports (cl)
This is the ability to create lists and reports on objects and their attributes in user-defined formats, and to include totals and subtotals.
B2 Reclassify attributes (ra)
Reclassification is the change in value of a set of existing attributes based on a set of user specified rules.
B3 Dissolve lines and merge attributes (dm)
Boundaries between adjacent polygons with identical attributes are dissolved to form larger polygons.
B4 Line thinning and weeding (lt)
This process is used to reduce the number of points defining a line or set of lines to a user defined tolerance.
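The source does not name a thinning algorithm; one widely used choice is Douglas-Peucker, sketched below in Python. It keeps a subset of the original points (unlike smoothing, B5), recursively retaining the vertex farthest from the chord between the current end points whenever it lies more than the tolerance away:

```python
import math


def _perp_distance(p, a, b):
    """Distance from point p to the straight line through a and b."""
    (x, y), (x1, y1), (x2, y2) = p, a, b
    dx, dy = x2 - x1, y2 - y1
    if dx == 0 and dy == 0:
        return math.hypot(x - x1, y - y1)
    return abs(dy * x - dx * y + x2 * y1 - y2 * x1) / math.hypot(dx, dy)


def douglas_peucker(points, tolerance):
    """Thin a polyline to the given tolerance, keeping original vertices only."""
    if len(points) < 3:
        return list(points)
    # Find the interior vertex farthest from the chord joining the end points.
    index, dmax = 0, 0.0
    for i in range(1, len(points) - 1):
        d = _perp_distance(points[i], points[0], points[-1])
        if d > dmax:
            index, dmax = i, d
    if dmax <= tolerance:
        return [points[0], points[-1]]
    # Otherwise keep that vertex and thin each half separately.
    left = douglas_peucker(points[: index + 1], tolerance)
    right = douglas_peucker(points[index:], tolerance)
    return left[:-1] + right


print(douglas_peucker([(0, 0), (1, 0.1), (2, -0.1), (3, 0.05), (4, 0)], 0.5))
```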
B5 Line smoothing (ls)
Automatically smooth lines to a user-defined tolerance, creating a new set of points (compare B4).
B6 Complex generalization (cg)
Generalization which may require change in the type of an object, or relocation in response to cartographic rules.
B7 Windowing (wi)
The ability to clip features in the database to some defined polygon.
B8 Centroid calculation and sequential numbering (cn)
Calculate a contained, representative point in a polygon and assign a unique number to the new object.
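A common starting point is the area-weighted centroid, sketched below in Python with a hypothetical sequential numbering counter. For concave shapes the centroid can fall outside the polygon, so a system that guarantees a contained point would need to test it (e.g. by point in polygon, B48) and choose another point when the test fails.

```python
from itertools import count


def polygon_centroid(ring):
    """Area-weighted centroid of a simple polygon [(x, y), ...] (not closed).

    For concave polygons the result may lie outside the polygon.
    """
    a = cx = cy = 0.0
    n = len(ring)
    for i in range(n):
        x0, y0 = ring[i]
        x1, y1 = ring[(i + 1) % n]
        cross = x0 * y1 - x1 * y0   # shoelace term for this edge
        a += cross
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross
    a *= 0.5                        # signed area
    if a == 0:
        raise ValueError("polygon has zero area")
    return (cx / (6.0 * a), cy / (6.0 * a))


_next_id = count(1)  # hypothetical sequential numbering scheme


def label_polygon(ring):
    """Return (new sequential ID, centroid) for a polygon."""
    return next(_next_id), polygon_centroid(ring)
```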
B9 Spot heights (sh)
Given a digital elevation model, interpolate the height at any point.
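For a DEM stored as a grid of sample points, one common choice is bilinear interpolation from the four surrounding samples. A minimal Python sketch, assuming row 0 is the southern edge with rows increasing northward (a convention chosen for this example) and a grid of at least 2 x 2 samples:

```python
def spot_height(dem, origin, cell_size, x, y):
    """Bilinearly interpolate elevation at (x, y).

    dem[row][col] is the elevation at (x0 + col * cell_size, y0 + row * cell_size).
    """
    x0, y0 = origin
    nrows, ncols = len(dem), len(dem[0])
    fx = (x - x0) / cell_size
    fy = (y - y0) / cell_size
    if not (0.0 <= fx <= ncols - 1 and 0.0 <= fy <= nrows - 1):
        raise ValueError("point lies outside the grid")
    # Lower-left sample of the enclosing cell (clamped on the far edges).
    col = min(int(fx), ncols - 2)
    row = min(int(fy), nrows - 2)
    tx, ty = fx - col, fy - row
    bottom = dem[row][col] * (1 - tx) + dem[row][col + 1] * tx
    top = dem[row + 1][col] * (1 - tx) + dem[row + 1][col + 1] * tx
    return bottom * (1 - ty) + top * ty


dem = [[0.0, 10.0],
       [20.0, 30.0]]
print(spot_height(dem, (0.0, 0.0), 100.0, 50.0, 50.0))  # 15.0
```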
B10 Heights along streams (hs)
Given a digital elevation model and a hydrology net, interpolate points along streams at fixed increments of height.
B11 Contours (isolines) (ci)
Given a set of regularly or irregularly spaced point values, interpolate contours at user-specified intervals.
B12 Elevation polygons (ep)
Given a digital elevation model, interpolate contours of height at user-specified intervals.
B13 Watershed boundaries (wb)
Given a digital elevation model and a hydrology net, interpolate the position of the watershed between basins.
B14 Scale change (sc)
Perform the operations associated with change of scale, which may include line thinning and generalization.
B15 Rubber sheet stretching (rs)
The ability to stretch one map image to fit over another, given common points of known locations.
B16 Distortion elimination (de)
The ability to remove various types of systematic distortion generated by different input methods.
B17 Projection change (pc)
The ability to transform maps from one map projection to another.
B18 Generate points (gp)
The ability to generate points and insert them in the database.
B19 Generate lines (gl)
The ability to generate lines and insert them in the database.
B20 Generate polygons (ga)
The ability to generate polygons and insert them in the database.
B21 Generate circles (gc)
The ability to generate circles defined by center point and radius.
B22 Generate grid cell nets (gg)
The ability to generate a network of grid cells given a point of origin, grid cell dimension and orientation.
B23 Generate latitude/longitude nets (gn)
The ability to generate graticules for a variety of map projections.
B24 Generate corridors (gb)
This process generates corridors of given width around existing points, lines or areas.
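A GIS builds the corridor as a new polygon; the membership test underneath it can be sketched in Python for a line feature: a point lies in the corridor if its distance to the nearest segment of the line is at most the corridor width.

```python
import math


def point_segment_distance(p, a, b):
    """Shortest distance from point p to the segment ab."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    seg_len2 = dx * dx + dy * dy
    if seg_len2 == 0:
        return math.hypot(px - ax, py - ay)
    # Project p onto the segment, clamping to the end points.
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / seg_len2))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def in_corridor(p, line, width):
    """True if p lies within `width` of any segment of the polyline."""
    return any(point_segment_distance(p, line[i], line[i + 1]) <= width
               for i in range(len(line) - 1))


road = [(0.0, 0.0), (10.0, 0.0), (10.0, 10.0)]
print(in_corridor((5.0, 2.0), road, 3.0))  # True
```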
B25 Generate graphs (gr)
Create a graph illustrating attribute data by symbols, bars or fitted trend line.
B26 Generate viewshed maps (gv)
Given a digital elevation model and the locations of one or more viewpoints, generate polygons enclosing the area visible from at least one viewpoint.
B27 Generate perspective views (ge)
From a digital elevation model, generate a three-dimensional block diagram.
B28 Generate cross sections (cs)
Given a digital elevation model, show the cross-section along a user-specified line.
B29 Search by attribute (sa)
The ability to search the data base for objects with certain attributes.
B30 Search by region (sr)
The ability to search the data base within any region defined to the system.
B31 Suppress (su)
The ability to exclude objects by attribute (the converse of selecting by attribute).
B32 Measure number of items (mi)
The ability to count the number of objects in a class.
B33 Measure distances along straight and convoluted lines (md)
The ability to measure distances along a prescribed line.
B34 Measure length of perimeter of areas (mp)
The ability to measure the length of the perimeter of a polygon.
B35 Measure size of areas (ma)
The ability to measure the area of a polygon.
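Measurements B33 to B35 can be sketched in Python for planar coordinates, with polygon area computed by the shoelace formula. Lengths and areas measured from latitude/longitude would need geodesic formulas instead:

```python
import math


def line_length(points):
    """Distance along a (possibly convoluted) polyline."""
    return sum(math.dist(points[i], points[i + 1]) for i in range(len(points) - 1))


def perimeter(ring):
    """Length of the closed boundary of a polygon [(x, y), ...] (not closed)."""
    n = len(ring)
    return sum(math.dist(ring[i], ring[(i + 1) % n]) for i in range(n))


def area(ring):
    """Planar area of a simple polygon by the shoelace formula."""
    n = len(ring)
    s = sum(ring[i][0] * ring[(i + 1) % n][1] - ring[(i + 1) % n][0] * ring[i][1]
            for i in range(n))
    return abs(s) / 2.0


rect = [(0, 0), (4, 0), (4, 3), (0, 3)]
print(perimeter(rect), area(rect))  # 14.0 12.0
```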
B36 Measure volume (mv)
The ability to compute the volume under a digital representation of a surface.
B37 Calculate - arithmetic (ca)
The ability to perform arithmetic, algebraic and Boolean calculations separately and in combination.
B38 Calculate bearings between points (cb)
The ability to calculate the bearing (with respect to True North) from a given point to another point.
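Bearings with respect to True North are defined on the globe, so one hedged sketch works from latitude and longitude. The Python function below computes the initial great-circle bearing on a spherical Earth model; on projected coordinates a simple atan2 gives a bearing from grid north, which generally differs from True North.

```python
import math


def initial_bearing(lat1, lon1, lat2, lon2):
    """Initial great-circle bearing from point 1 to point 2.

    Inputs in decimal degrees; result in degrees clockwise from True North, [0, 360).
    """
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dlon = math.radians(lon2 - lon1)
    y = math.sin(dlon) * math.cos(phi2)
    x = math.cos(phi1) * math.sin(phi2) - math.sin(phi1) * math.cos(phi2) * math.cos(dlon)
    return math.degrees(math.atan2(y, x)) % 360.0
```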
B39 Calculate vertical distance or height (ch)
Given a digital elevation model, calculate the vertical distance (height) between two points.
B40 Calculate slopes along lines (gradients) (al)
The ability to measure the slope between two points of known height and location or to calculate the gradient between any two points along a convoluted line which contains two or more points of known elevation.
B41 Calculate slopes of areas (sl)
Given a digital elevation model and the boundary of a specified region (e.g., a part of a watershed), calculate the average slope of the region.
B42 Calculate aspect of areas (aa)
Given a digital elevation model and the boundary of a specified region, calculate the average aspect of the region.
B43 Calculate angles and distances along linear features (ad)
Given a prescribed linear feature, generalize its shape into a set of angles and distances from a start point, at user-set angular increments, and constrained to any known points along the linear feature.
B44 Subdivide area according to a set of rules (sb)
Given the corner points of a rectangular area, topologically subdivide the area into four quarters.
B45 Locations from traverses (lo)
Given a direction (one of eight radial directions) and distance from a given point, calculate the end point of the traverse.
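A minimal Python sketch of the traverse calculation, assuming compass directions with y increasing northward and the distance measured along the diagonal for NE, SE, SW and NW:

```python
import math

# Unit offsets for the eight radial directions (y increases northward).
_DIRECTIONS = {
    "N": (0, 1), "NE": (1, 1), "E": (1, 0), "SE": (1, -1),
    "S": (0, -1), "SW": (-1, -1), "W": (-1, 0), "NW": (-1, 1),
}


def traverse_end(start, direction, distance):
    """End point of a straight traverse of `distance` from `start`."""
    dx, dy = _DIRECTIONS[direction]
    scale = distance / math.hypot(dx, dy)  # diagonal offsets have length sqrt(2)
    return (start[0] + dx * scale, start[1] + dy * scale)


print(traverse_end((0.0, 0.0), "E", 5.0))  # (5.0, 0.0)
```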
B46 Statistical functions (sf)
The ability to carry out simple statistical analyses and tests on the database.
B47 Graphic overlay (go)
The ability to superimpose graphically one map on another and display the result on a screen or on a plot.
B48 Point in polygon (pp)
The ability to superimpose a set of points on a set of polygons and determine which polygon (if any) contains each point.
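One standard way to implement this test is the even-odd (ray casting) rule, sketched below in Python: count how many polygon edges a horizontal ray from the point crosses, and an odd count means inside. Points lying exactly on a boundary are not handled specially here, and the ID dictionaries are illustrative.

```python
def point_in_polygon(pt, ring):
    """Even-odd test: is pt inside the polygon ring [(x, y), ...]?"""
    x, y = pt
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Does this edge straddle the horizontal line through pt?
        if (y1 > y) != (y2 > y):
            # x-coordinate where the edge crosses that line
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def assign_points(points, polygons):
    """Map each point ID to the first polygon ID that contains it, or None."""
    return {pid: next((gid for gid, ring in polygons.items()
                       if point_in_polygon(p, ring)), None)
            for pid, p in points.items()}
```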
B49 Line on polygon overlay (lp)
The ability to superimpose a set of lines on a set of polygons, breaking the lines at intersections with polygon boundaries.
B50 Polygon overlay (op)
The ability to overlay digitally one set of polygons on another and form a topological intersection of the two, concatenating the attributes.
B51 Sliver polygon removal (sp)
The ability to delete automatically the small sliver polygons which result from a polygon overlay operation when certain polygon lines on the two maps represent different versions of the same physical line.
B52 Line of sight (ln)
The ability to determine the intervisibility of two points, or to determine those parts of pairs of lines or polygons which are intervisible.
B53 Nearest neighbor search (nn)
The ability to identify points, lines or polygons that are nearest to points, lines or polygons specified by location or attribute.
B54 Shortest route (ps)
The ability to determine the shortest or minimum cost route between two points or specified sets of points.
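Treating the network as a graph with non-negative link costs, Dijkstra's algorithm is a standard way to find such a route. A minimal Python sketch on a hypothetical adjacency dictionary:

```python
import heapq


def shortest_route(graph, source, target):
    """Dijkstra's algorithm on {node: [(neighbor, cost), ...]}.

    Costs must be non-negative. Returns (total cost, list of nodes),
    or (inf, []) if the target cannot be reached.
    """
    dist = {source: 0.0}
    prev = {}
    heap = [(0.0, source)]
    visited = set()
    while heap:
        d, node = heapq.heappop(heap)
        if node in visited:
            continue          # stale queue entry
        visited.add(node)
        if node == target:
            break
        for nbr, cost in graph.get(node, []):
            nd = d + cost
            if nd < dist.get(nbr, float("inf")):
                dist[nbr] = nd
                prev[nbr] = node
                heapq.heappush(heap, (nd, nbr))
    if target not in dist:
        return float("inf"), []
    # Walk back from the target to rebuild the route.
    path = [target]
    while path[-1] != source:
        path.append(prev[path[-1]])
    return dist[target], path[::-1]


graph = {"A": [("B", 4), ("C", 1)], "C": [("B", 2), ("D", 7)], "B": [("D", 1)]}
print(shortest_route(graph, "A", "D"))  # (4.0, ['A', 'C', 'B', 'D'])
```

Turn prohibitions (the turntable in the traffic routing example) would require searching over links rather than nodes; this sketch ignores them.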
B55 Contiguity analysis (co)
The ability to identify areas that have a common boundary or node.
B56 Connectivity analysis (cy)
The ability to identify areas or points that are (or are not) connected to other areas or points by linear features.
B57 Complex correlation (cx)
The ability to compare maps representing different time periods, extracting differences or computing indices of change.
B58 Weighted modelling (wm)
The ability to assign weighting factors to individual data sets according to a set of rules and to overlay those data sets and carry out reclassify, dissolve and merge operations on the resulting concatenated data set.
B59 Scene generation (sg)
The ability to simulate an image of the appearance of an area from map data. The image would normally consist of an oblique view, with perspective.
B60 Network analysis (na)
Simple forms of network analysis are covered in Shortest route and Connectivity. More complex analyses are frequently carried out on network data by electrical and gas utilities, communications companies etc. These include the simulation of flows in complex networks, load balancing in electrical distribution, traffic analysis, and computation of pressure loss in gas pipes. In many cases these capabilities can be found in existing packages which can be interfaced to the GIS database.
Other groupings of GIS functions:
Berry, J.K., 1987, "Fundamental operations in computer-assisted map analysis". International Journal of GIS 1 119-36.
- Measuring distance and connectivity
- Characterizing neighborhoods
Goodchild, M.F., 1988, "Towards an enumeration and classification of GIS functions". Proceedings, IGIS '87
Tomlin, Dana, 1990. Geographic Information Systems and Cartographic Modeling. Prentice Hall.
based on a standard, semi-formal taxonomy of analytic functions for raster data
- Local: operations that process a single cell
- Focal: operations that process a cell and a fixed neighborhood
- Zonal: operations that process an area of homogeneous characteristics
- Global: operations that process the entire map
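The four classes can be illustrated on small Python lists of lists. These are simplified sketches (the function names are hypothetical, and edge cells in the focal mean use only the neighbors that exist), not Tomlin's own notation:

```python
def local_op(grid_a, grid_b, fn):
    """Local: each output cell depends only on the same cell in the inputs."""
    return [[fn(a, b) for a, b in zip(row_a, row_b)]
            for row_a, row_b in zip(grid_a, grid_b)]


def focal_mean(grid):
    """Focal: each output cell is the mean of its 3x3 neighborhood."""
    nrows, ncols = len(grid), len(grid[0])
    out = []
    for r in range(nrows):
        row = []
        for c in range(ncols):
            vals = [grid[i][j]
                    for i in range(max(0, r - 1), min(nrows, r + 2))
                    for j in range(max(0, c - 1), min(ncols, c + 2))]
            row.append(sum(vals) / len(vals))
        out.append(row)
    return out


def zonal_sum(values, zones):
    """Zonal: total of `values` within each zone ID."""
    totals = {}
    for vrow, zrow in zip(values, zones):
        for v, z in zip(vrow, zrow):
            totals[z] = totals.get(z, 0) + v
    return totals


def global_max(grid):
    """Global: a result computed from every cell of the map."""
    return max(max(row) for row in grid)
```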
Maguire, David, 1991. Chapter 21: The Functionality of GIS. In D.J. Maguire, M.F. Goodchild and D.W. Rhind, editors, Geographical Information Systems: Principles and Applications. Longman, London.
A Six-way Classification of Spatial Analysis
1. Query and reasoning
based on database views
catalog
map
table
histogram
scatterplot
linked views
2. Measurement
simple geometric measurements associated with objects
area, distance, length, perimeter, shape
3. Transformation
buffers
point in polygon
polygon overlay
interpolation
density estimation
4. Descriptive summaries
centers
dispersion
spatial dependence
fragmentation
5. Optimization
best routes
raster version
network version
Paul's ride
best locations
6. Hypothesis testing
inference from sample to population
Integration of GIS and Spatial Analysis
1. Full integration (embedding)
- spatial analysis as GIS commands
- requires modification of source code
  - difficult with proprietary packages
  - analysis is not the strongest commercial motivation
  - third party macros, scripting languages
2. Loose coupling
- unsatisfactory
  - hooks too awkward
  - loss of higher structures in data
  - transfer of simple tables
3. Close coupling
- discretization problem
  - discretization often not explicit in models
  - e.g. slope and length, which both depend on the resolution at which they are measured
- user interface design
  - models easy to use?
  - the user-friendly grand piano
  - user community is already frustrated