UNIT 4 - THE RASTER GIS
Compiled with assistance from Dana Tomlin,
The Ohio State University
A. THE DATA MODEL
B. CREATING A RASTER
Cell by cell entry
Digital data
C. CELL VALUES
Types of values
One value per cell
D. MAP LAYERS
Resolution
Orientation
Zones
Value
Location
E. EXAMPLE ANALYSIS USING A RASTER GIS
Objective
Procedure
Result
Operations used
REFERENCES
EXAM AND DISCUSSION QUESTIONS
NOTES
Although most of the material in this Curriculum
is designed to be as independent as possible from specific data
models, it is necessary to deal with this basic concept early
so that students can start hands-on exercises with a GIS program.
Following Unit 5, we return to the more fundamental concepts and
do not address specific vector GIS issues until Units 13 and 14.
There are several other places these topics could be placed in
a course sequence. We have tried to make Units 4 and 5 as independent
as possible so that you can move them within the Curriculum relatively
easily.
A. THE DATA MODEL
- geographical variation in the real world is infinitely complex
- the closer you look, the more detail you see,
almost without limit
- it would take an infinitely large database to capture the
real world precisely
- data must somehow be reduced to a finite and
manageable quantity by a process of generalization or abstraction
- geographical variation must be represented
in terms of discrete elements or objects
- the set of rules used to convert real geographical variation into
discrete objects is the data model
- Tsichritzis and Lochovsky (1977) define a data
model as "a set of guidelines for the representation of the
logical organization of the data in a database... (consisting)
of named logical units of data and the relationships between them."
- current GISs differ according to the way in which they organize
reality through the data model
- each model tends to fit certain types of data and applications
better than others
- the data model chosen for a particular project or application
is also influenced by:
- the software available
- the training of the key individuals
- historical precedent
- there are two major choices of data model - raster and vector
overhead - Major GIS data models
- raster model divides the entire study area into a regular
grid of cells in specific sequence
- the conventional sequence is row by row from
the top left corner
- each cell contains a single value
- is space-filling since every location in the
study area corresponds to a cell in the raster
- one set of cells and associated values is a
layer
- there may be many layers in a database, e.g. soil
type, elevation, land use, land cover
- vector model uses discrete line segments or points to identify
locations
- discrete objects (boundaries, streams, cities)
are formed by connecting line segments
- vector objects do not necessarily fill space,
not all locations in space need to be referenced in the model
- a raster model tells what occurs everywhere - at each place
in the area
- a vector model tells where everything occurs - gives a location
to every object
- conceptually, the raster models are the simplest of the
available data models
- therefore, we begin our examination of GIS
data and operations with the raster model and will consider vector
models after the fundamental concepts have been introduced.
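The raster model described above can be sketched in a few lines of
Python. This is only an illustration under stated assumptions: the
class name RasterLayer, its fields and the example layer names are
hypothetical and do not come from any particular GIS package. It shows
one value per cell, cells stored row by row from the top left corner,
and a database made of several layers over the same grid.

```python
# Minimal sketch of a raster layer stored row by row from the top left corner.
# RasterLayer and its fields are illustrative, not taken from any GIS package.

class RasterLayer:
    def __init__(self, name, nrows, ncols, fill=0):
        self.name = name
        self.nrows = nrows
        self.ncols = ncols
        # One value per cell, kept in a single flat list in row-major order.
        self.cells = [fill] * (nrows * ncols)

    def index(self, row, col):
        """Position of cell (row, col) in the row-by-row sequence."""
        if not (0 <= row < self.nrows and 0 <= col < self.ncols):
            raise IndexError("cell lies outside the study area")
        return row * self.ncols + col

    def get(self, row, col):
        return self.cells[self.index(row, col)]

    def set(self, row, col, value):
        self.cells[self.index(row, col)] = value


# A database is a collection of layers that share the same grid.
database = {
    "soil type": RasterLayer("soil type", nrows=4, ncols=5),
    "elevation": RasterLayer("elevation", nrows=4, ncols=5),
}
database["elevation"].set(1, 2, 310)          # hypothetical elevation value
elevation_value = database["elevation"].get(1, 2)
position_in_sequence = database["elevation"].index(1, 2)
```

Because the grid is space-filling, every (row, column) pair inside the
study area maps to exactly one position in the list.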
B. CREATING A RASTER
- consider laying a grid over a geologic map
- create a raster by coding each cell with a
value that represents the rock type which appears in the majority
of that cell's area
- when finished, every cell will have a coded
value
overhead - Creating a raster
- this illustrates a more complex example
- in most cases the values that are to be assigned to each
cell in the raster are written into a file, often coded in ASCII
- this file can be created manually by using
a word processor, database or spreadsheet program or it can be
created automatically
- then it is normally imported into the GIS so
that the program can reformat the data for its specific processing
needs
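As a rough sketch of the import step, the Python below reads cell
values from a block of ASCII text. The file layout is an assumption
made for illustration (first line gives the number of rows and
columns, then the values follow row by row); real programs each have
their own requirements. The function name read_ascii_layer is
hypothetical.

```python
# Sketch of importing an ASCII file of cell values.
# Assumed (hypothetical) layout: "nrows ncols" on the first line,
# followed by integer cell values in row-by-row order.

def read_ascii_layer(text):
    tokens = text.split()
    nrows, ncols = int(tokens[0]), int(tokens[1])
    values = [int(token) for token in tokens[2:]]
    if len(values) != nrows * ncols:
        raise ValueError("number of values does not match the grid size")
    # Cut the flat sequence into rows, top row first.
    return [values[r * ncols:(r + 1) * ncols] for r in range(nrows)]


sample_text = """4 5
0 0 0 1 1
0 0 1 1 1
0 0 1 1 1
0 1 1 1 1
"""
layer = read_ascii_layer(sample_text)
```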
- there are several methods for creating raster databases
Cell by cell entry
- direct entry of each layer cell by cell is simplest
- entry may be done within the GIS or into an
ASCII file for importing
- each program will have specific requirements
overhead - Typical ASCII file formats used in importing
- the process is normally tedious and time-consuming
- a layer can contain millions of cells
- average Landsat image is around 7.4 x 10^6
pixels, average TM scene is about 34.9 x 10^6
pixels
- run length encoding can be more efficient
- values often occur in runs across several cells
- this is a form of spatial autocorrelation -
tendency for nearby things to be more similar than distant things
- data entered as pairs, first run length, then
value
0 0 0 1 1
0 0 1 1 1
0 0 1 1 1
0 1 1 1 1
would be entered as 3 0 2 1 2 0 3 1 2 0 3 1 1 0 4 1
- this is 16 items to enter, instead of 20
- in this case the saving is 20%, but much higher
savings occur in practice
- imagine a database of 10,000,000 cells and a layer which
records the county containing each pixel
- suppose there are only two counties in the
area covered by the database
- each cell can have one of only two values so
the runs will be very long
- only some GISs have the capability to use run length encoded
files
- note: Units 35 and 36 cover run length encoding and other
aspects of raster storage in more detail
Digital data
- much raster data is already in digital form, as images,
etc.
- however, resampling will likely be needed in
order that pixels coincide in each layer
- because remote sensing generates images, it is easier to
interface with a raster GIS than with any other type
- elevation data is commonly available in digital raster form
from agencies such as the US Geological Survey
C. CELL VALUES
Types of values
- the type of values contained in cells in a raster depends
upon both the reality being coded and the GIS
- different systems allow different classes of values, including:
overhead - Raster data values
- whole numbers (integers)
- real (decimal) values
- alphabetic values
- many systems allow only integers; others that
allow different types restrict each separate raster layer to a
single kind of value
- if systems allow several types of values, e.g. some layers
numeric, some non-numeric, they should warn the user against doing
unreasonable operations
- e.g. it is unreasonable to try to multiply
the values in a numeric layer with the values in a non-numeric
layer
- integer values often act as code numbers, which "point"
to names in an associated table or legend
- e.g. the first example might have the following
legend identifying the name of each soil class:
0 = "no class"
1 = "fine sandy loam"
2 = "coarse sand"
3 = "gravel"
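A minimal Python sketch of these two ideas follows: integer codes that
point to names in a legend, and a guard against an unreasonable
operation on a non-numeric layer. The small layer of codes and the
function name multiply_layers are hypothetical; the legend is the one
listed above.

```python
# Integer cell values act as codes that point to names in a legend.
legend = {0: "no class", 1: "fine sandy loam", 2: "coarse sand", 3: "gravel"}

soil_codes = [          # hypothetical layer of code numbers
    [1, 1, 2],
    [0, 3, 2],
]
soil_names = [[legend[code] for code in row] for row in soil_codes]


def multiply_layers(layer_a, layer_b):
    """Multiply two layers cell by cell, refusing non-numeric values."""
    for layer in (layer_a, layer_b):
        for row in layer:
            for value in row:
                if isinstance(value, bool) or not isinstance(value, (int, float)):
                    raise TypeError("cannot multiply a non-numeric layer")
    return [[a * b for a, b in zip(row_a, row_b)]
            for row_a, row_b in zip(layer_a, layer_b)]


product = multiply_layers([[1, 2]], [[3, 4]])   # numeric layers are accepted
```

Note that the guard only checks the type of the values. Multiplying
two layers of code numbers would pass the check but would still be
meaningless, so a warning of this kind is a partial safeguard at best.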
One value per cell
- each pixel or cell is assumed to have only one value
- this is often inaccurate - the boundary of
two soil types may run across the middle of a pixel
- in such cases the pixel is given the value
of the largest fraction of the cell, or the value of the middle
point in the cell
- note, however, a few systems allow a pixel to have multiple
values
- the NARIS system developed at the University
of Illinois in the 1970s allowed each pixel to have any number
of values and associated percentages
- e.g. 30% a, 30% b, 40% c
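The rules for assigning a value to a cell can be compared with a small
Python sketch. It assumes, purely for illustration, that the area
inside one cell has been sampled on a 3 x 3 grid of points; the sample
values and function names are hypothetical.

```python
from collections import Counter

# Hypothetical 3 x 3 set of sample points inside a single cell;
# a boundary between types "a" and "b" runs across the cell.
samples = [
    ["a", "a", "a"],
    ["a", "b", "b"],
    ["a", "b", "b"],
]


def largest_fraction(samples):
    """Value covering the largest share of the cell."""
    counts = Counter(value for row in samples for value in row)
    return counts.most_common(1)[0][0]


def middle_point(samples):
    """Value found at the middle point of the cell."""
    return samples[len(samples) // 2][len(samples[0]) // 2]


def percentages(samples):
    """All values present with their share of the cell, in percent."""
    counts = Counter(value for row in samples for value in row)
    total = sum(counts.values())
    return {value: 100.0 * n / total for value, n in counts.items()}


by_area = largest_fraction(samples)     # "a" covers 5 of the 9 samples
by_centre = middle_point(samples)       # the middle sample is "b"
shares = percentages(samples)
```

The two single-value rules disagree for this cell, which is why the
choice of rule matters; the percentage form keeps both values, as in
the multiple-value approach described above.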
D. MAP LAYERS
- the data for an area can be visualized as a set of map
layers
- a map layer is a set of data describing a single
characteristic for each location within a bounded geographic area
- only one item of information is available for each location
within a single layer - multiple items of information require
multiple layers
- on the other hand, a topographic map can show
multiple items of information for each location, within limits
- e.g. elevation (contours), counties (boundaries),
roads, railroads, urbanized areas (grey tint)
- these would be 5 layers in a raster GIS
- typical raster databases contain up to a hundred layers
- each layer (matrix, lattice, raster, array)
typically contains hundreds or thousands of cells
- important characteristics of a layer are its resolution,
orientation and zone(s)
Resolution
- in general, resolution can be defined as the minimum linear
dimension of the smallest unit of geographic space for which data
are recorded
- in the raster model the smallest units are generally rectangular
(occasionally systems have used hexagons or triangles)
- these smallest units are known as cells, pixels
- note: high resolution refers to rasters with small cell
dimensions
- high resolution means lots of detail, lots
of cells, large rasters, small cells
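The link between cell size and raster size is simple arithmetic, shown
here as a Python sketch. The study area dimensions and the function
name cell_count are hypothetical.

```python
import math


def cell_count(width_m, height_m, cell_size_m):
    """Number of square cells needed to cover a rectangular study area."""
    ncols = math.ceil(width_m / cell_size_m)
    nrows = math.ceil(height_m / cell_size_m)
    return nrows * ncols


# Hypothetical 10 km x 10 km study area at three resolutions.
coarse = cell_count(10_000, 10_000, 100)   # 100 x 100 cells
medium = cell_count(10_000, 10_000, 50)    # 200 x 200 cells
fine = cell_count(10_000, 10_000, 10)      # 1000 x 1000 cells
```

Halving the cell dimension quadruples the number of cells, so higher
resolution quickly produces much larger rasters.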
Orientation
- the angle between true north and the direction defined by
the columns of the raster
Zones
- each zone of a map layer is a set of contiguous locations
that exhibit the same value
- these might be:
- ownership parcels
- political units such as counties or nations
- lakes or islands
- individual patches of the same soil or vegetation
type
overhead - Example raster database
- there is considerable confusion over terms here
- other terms commonly used for this concept
are patch, region, polygon
- each of these terms, however, has different
meanings to individual users and different definitions in specific
GIS packages
- in addition, there is a need for a second term
which refers to all individual zones that have the same characteristics
- class is often used for this concept
diagram
- note that not all map layers will have zones, cell contents
may vary continuously over the region making every cell's value
unique
- e.g. satellite sensors record a separate value
for reflection from each cell
- major components of a zone are its value and location(s)
Value
- is the item of information stored in a layer for each pixel
or cell
- cells in the same zone have the same value
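The difference between a zone (one contiguous patch) and a class (all
zones with the same value) can be sketched in Python. Two assumptions
are made here that the notes do not fix: cells count as contiguous
only when they share an edge, and the small example layer and the
function name label_zones are hypothetical.

```python
def label_zones(layer):
    """Give every contiguous group of equal-valued cells its own zone number.

    Assumption: cells are contiguous only if they share an edge.
    """
    nrows, ncols = len(layer), len(layer[0])
    zone_of = [[None] * ncols for _ in range(nrows)]
    next_zone = 0
    for r in range(nrows):
        for c in range(ncols):
            if zone_of[r][c] is not None:
                continue
            value = layer[r][c]
            zone_of[r][c] = next_zone
            stack = [(r, c)]
            while stack:                      # flood fill from the seed cell
                cr, cc = stack.pop()
                for nr, nc in ((cr - 1, cc), (cr + 1, cc), (cr, cc - 1), (cr, cc + 1)):
                    if (0 <= nr < nrows and 0 <= nc < ncols
                            and zone_of[nr][nc] is None
                            and layer[nr][nc] == value):
                        zone_of[nr][nc] = next_zone
                        stack.append((nr, nc))
            next_zone += 1
    return zone_of, next_zone


layer = [               # hypothetical layer of soil codes
    [1, 1, 2, 2],
    [1, 2, 2, 1],
    [3, 3, 1, 1],
]
zone_of, zone_count = label_zones(layer)
classes = {value for row in layer for value in row}
```

In this example there are three classes but four zones, because the
cells with value 1 form two separate patches.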
Location
- generally location is identified by an ordered pair of coordinates
(row and column numbers) that unambiguously identify the location
of each unit of geographic space in the raster (cell, pixel, grid
cell)
- usually the true geographic location of one or more of the
corners of the raster is also known
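When the true location of the top left corner is known, row and column
numbers can be converted to geographic coordinates and back. The
Python sketch below assumes square cells, an orientation of zero
(columns aligned with north) and coordinates that increase to the east
and north; the corner coordinates and function names are hypothetical.

```python
def cell_centre(row, col, x_left, y_top, cell_size):
    """Geographic coordinates of the centre of cell (row, col)."""
    x = x_left + (col + 0.5) * cell_size
    y = y_top - (row + 0.5) * cell_size     # rows are counted from the top
    return x, y


def cell_containing(x, y, x_left, y_top, cell_size):
    """Row and column of the cell that contains the point (x, y)."""
    col = int((x - x_left) // cell_size)
    row = int((y_top - y) // cell_size)
    return row, col


# Hypothetical raster: top left corner at (1000, 5000), 100 m cells.
centre = cell_centre(0, 0, x_left=1000.0, y_top=5000.0, cell_size=100.0)
where = cell_containing(1250.0, 4720.0, x_left=1000.0, y_top=5000.0, cell_size=100.0)
```

A raster with a non-zero orientation would need a rotation as well,
which this sketch leaves out.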
E. EXAMPLE ANALYSIS USING A RASTER GIS
Objective
- identify areas suitable for logging
- an area is suitable if it satisfies the following criteria:
- is Jackpine (Black Spruce are not valuable)
- is well drained (poorly drained and waterlogged
terrain cannot support equipment, logging causes unacceptable
environmental damage)
- is not within 500 m of a lake or watercourse
(erosion may cause deterioration of water quality)
Procedure
overheads - Example project steps (1 page) and details (3
pages)
- recode layer 2 as follows, creating layer 4
- y if value 2 (Jackpine)
- n if other value
- recode layer 3 as follows, creating layer 5
- y if value 2 (good)
- n if other value
- spread the lake on layer 1 by one cell (500 m), creating
layer 6
- recode the spread lake on layer 6 as follows, creating layer
7
- n if in spread lake
- y if not
- overlay layers 4 and 5 to obtain layer 8, coding as follows
- y if both 4 and 5 are y
- n otherwise
- overlay layers 7 and 8 to obtain layer 9, coding as follows
- y if both 7 and 8 are y
- n otherwise
Result
- the loggable cells are y on layer 9
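The procedure above can be traced with a small Python sketch. Several
things are assumptions made only for illustration: the 4 x 4 input
layers are invented, lake cells on layer 1 are coded 1 and other cells
0, codes other than 2 on layers 2 and 3 are arbitrary, and spreading
by one cell is taken to reach the four edge-sharing neighbours. The
function names recode, spread and overlay mirror the operation names
in the text but are not any package's actual commands.

```python
def recode(layer, target):
    """y where the cell equals target, n otherwise."""
    return [["y" if value == target else "n" for value in row] for row in layer]


def spread(layer, target, steps=((-1, 0), (1, 0), (0, -1), (0, 1))):
    """Grow the target cells by one cell; result is 1 in the spread area, else 0."""
    nrows, ncols = len(layer), len(layer[0])
    out = [[1 if value == target else 0 for value in row] for row in layer]
    for r in range(nrows):
        for c in range(ncols):
            if layer[r][c] == target:
                for dr, dc in steps:
                    nr, nc = r + dr, c + dc
                    if 0 <= nr < nrows and 0 <= nc < ncols:
                        out[nr][nc] = 1
    return out


def overlay(layer_a, layer_b):
    """y where both layers are y, n otherwise."""
    return [["y" if a == "y" and b == "y" else "n" for a, b in zip(row_a, row_b)]
            for row_a, row_b in zip(layer_a, layer_b)]


# Hypothetical input layers (500 m cells).
layer1 = [[1, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]]   # 1 = lake
layer2 = [[2, 2, 2, 1], [2, 2, 2, 2], [1, 2, 2, 2], [2, 2, 1, 2]]   # 2 = Jackpine
layer3 = [[2, 2, 2, 2], [2, 2, 1, 2], [2, 2, 2, 2], [1, 2, 2, 2]]   # 2 = good drainage

layer4 = recode(layer2, 2)          # y if Jackpine
layer5 = recode(layer3, 2)          # y if well drained
layer6 = spread(layer1, 1)          # lake spread by one cell
layer7 = recode(layer6, 0)          # y if not in spread lake
layer8 = overlay(layer4, layer5)
layer9 = overlay(layer7, layer8)    # y = loggable

for row in layer9:
    print(" ".join(row))
```

With these invented inputs, 8 of the 16 cells end up coded y on
layer 9.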
Operations used
- recode
- overlay
- spread
- we could have achieved the same result using the operations
in other sequences, or by combining recode and overlay operations
- e.g. overlay layers 2 and 3, coding as follows
- y if layer 2 is 2 and layer 3 is 2, n otherwise
- this would replace two recodes and an overlay
- e.g. some systems allow layers to be overlaid
3 or more at a time
- the names given to operations vary from system to system,
but most of the operations themselves are common across systems
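As a sketch of the combined approach, the Python below overlays three
layers at once and applies the recode tests inside the overlay, so no
intermediate y/n layers are created. The function name overlay_where,
the 2 x 3 input layers and the coding of the third layer (1 = within
one cell of water, as if already spread) are hypothetical.

```python
def overlay_where(layers, conditions):
    """Code a cell y when every layer passes its own test, n otherwise."""
    nrows, ncols = len(layers[0]), len(layers[0][0])
    return [
        ["y" if all(test(layer[r][c]) for layer, test in zip(layers, conditions))
         else "n"
         for c in range(ncols)]
        for r in range(nrows)
    ]


# Hypothetical layers on a common 2 x 3 grid.
species = [[2, 2, 1], [2, 2, 2]]        # 2 = Jackpine
drainage = [[2, 1, 2], [2, 2, 2]]       # 2 = good
near_water = [[1, 0, 0], [0, 0, 0]]     # 1 = inside the spread lake

loggable = overlay_where(
    [species, drainage, near_water],
    [lambda v: v == 2, lambda v: v == 2, lambda v: v == 0],
)
```

The result is the same as chaining separate recode and overlay steps;
only the number of intermediate layers differs.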
REFERENCES
Star, J.L. and J.E. Estes, 1990. Geographic Information
Systems: An Introduction, Prentice Hall, Englewood Cliffs,
NJ. An introduction to GIS with a strong raster orientation.
Tomlin, C.D., 1990. Geographic Information Systems
and Cartographic Modeling, Prentice Hall, Englewood Cliffs,
NJ.
Further references can be found following Unit 5.
EXAM AND DISCUSSION QUESTIONS
1. What types of geographical data fit the raster
GIS data model best? What types fit worst?
2. Review the issues involved in selecting a resolution
for a raster GIS project.
3. What resolutions would be appropriate for the
following problems: (a) determining logging areas in a National
Forest, (b) finding suitable locations for backcountry campsites,
(c) planning subdivisions to take account of noise from an airport?
4. Review the methods of planning described in Ian
McHarg's classic book Design with Nature (1969, Doubleday,
New York). In what ways would they (a) benefit and (b) suffer
from implementation using raster GIS?
5. Using the documentation for the raster GIS program
you have, determine how that program uses (a) the concept of "zone"
as a contiguous group of cells of the same value, and (b) the
concept of several groups of cells that all have the same value.
Is there any ambiguity in the way your program deals with these
two concepts?