1. Cartographic Issues of Data Collection

The Census Bureau and Sampling

Associated reading: Montello & Sutton, Chapter 8




Terms to know:

Sample/population/sample frame
Sampling design: nonprobability/probability sampling

For an overview of Census 2000 results, see here

Census Resources

At the Census:

Boundary files
American FactFinder
Census forms. Here is the short form online; five out of six households received this form. Here is the long form (Adobe Acrobat format); one household in six received it.


Genealogy information

Census and population resources

Decision making with the Census

Why collect a census?

The Bible: Numbers Chapter 1
First US Census 1790--mandated in Constitution
First UK Census 1801

Attempts to map the US Census:

1870, 1890, 1910


Sample, sampling frame, population

Population: entire set of entities of interest
Sample: an incomplete subset of the population

Sampling frame: the subset of the population from which cases are actually drawn

Eg., in a telephone poll you might identify people from a list of registered voters. The "sampling frame" would be:

All people on the list of registered voters whose phone number is correct, and who were in town during the study period

Eg., identifying species of cacti by driving along a desert service road. Sampling frame:

All cacti visible from the roads driven by the researcher

Sampling Design

Sampling design: how cases are drawn from the sampling frame

1. Nonprobability sampling: the probability of selecting a particular case is unknown

Examples include snowball sampling, where a researcher uses a case already selected to find other cases, eg., a study of the spatial pattern of drug users might ask one user about other users;

Also convenience sampling, where the researcher takes every case that is convenient, eg., sampling down mining pits

2. Probability sampling: cases have a known probability of being selected

Examples include simple random sampling, where each case has an equal probability of being selected. Selecting a case does not affect the probability of selecting other cases;

Also systematic random sampling, where selecting a case does affect the probability of selecting other cases (eg., taking every kth case after a random start);

Also stratified random sampling, where the sampling frame is segmented into subsets and a sample is drawn from each subset. Eg., segment by race, ethnicity, age, socioeconomic status etc.

Also multistage area sampling, or sampling at different spatial scales (states/counties/census tracts)
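
The three simplest probability designs above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the sampling frame of 100 households, the "tenure" attribute used for stratification, and all function names are hypothetical and not part of the course material.

```python
import random

def simple_random_sample(frame, n, rng):
    """Every case in the frame is equally likely to be drawn."""
    return rng.sample(frame, n)

def systematic_sample(frame, n, rng):
    """Random start, then every k-th case, where k = len(frame) // n."""
    k = len(frame) // n
    start = rng.randrange(k)
    return [frame[start + i * k] for i in range(n)]

def stratified_sample(frame, stratum_of, n_per_stratum, rng):
    """Split the frame into strata, then draw a simple random sample in each."""
    strata = {}
    for case in frame:
        strata.setdefault(stratum_of(case), []).append(case)
    return {name: rng.sample(cases, n_per_stratum)
            for name, cases in strata.items()}

# Hypothetical sampling frame: 100 households, one in four is a renter.
rng = random.Random(0)
frame = [{"id": i, "tenure": "renter" if i % 4 == 0 else "owner"}
         for i in range(100)]

srs = simple_random_sample(frame, 10, rng)
sys_sample = systematic_sample(frame, 10, rng)
strat = stratified_sample(frame, lambda c: c["tenure"], 5, rng)

print([c["id"] for c in sys_sample])  # ids are evenly spaced, k apart
print({name: len(cases) for name, cases in strat.items()})
```

In the systematic design only the starting case is random; once it is chosen, every other selection is fixed, which is why selecting one case affects the probability of selecting the others.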

Nonprobability sampling        Probability sampling

Snowball sampling              Simple random sampling
Convenience sampling           Systematic random sampling
                               Stratified random sampling
                               Multistage area sampling

Implications of sampling

1. Representativeness: Is the sample representative of the sampling frame? And is the sampling frame representative of the population?

2. Generalizability: what larger set can we draw inferences about based on the sample?

Most textbooks (and the Wright article below, p. 3-4) suggest a scientific sample is a probability sample.

Why? Because it makes the link between the sample and the sampling frame more (but not perfectly) certain.

However (and it's a big however!) nonprobability designs are common, and perfectly worthwhile. They are common because probability sampling only secures the sample--sampling frame link; it says nothing about whether the sampling frame is representative of the population.

Eg., a migration study. Population = all people who are moving, who have ever moved, or who will ever move.

Obviously it is not possible to sample from that population. A realistic sampling frame is based on convenience, eg., the last two decades of US Census data

3. Spatial sampling from continuous data. Sampling from truly continuous data (eg., temperature) requires breaking the surface into discrete objects (points, lines or areas). This is a spatial sampling frame.

How many objects (eg., rain gauges) required?
What density of sampling points?
What length transect (linear sampling)?
What size/shape areal units? (MAUP)
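
The questions about number and density of sampling points can be explored with a small sketch. Everything here is an assumption made for illustration: the temperature surface is an invented mathematical function on a unit square (not real data), and the function names are hypothetical. The sketch compares a regular grid of points with the same number of randomly placed points at three densities.

```python
import math
import random

def temperature(x, y):
    """Hypothetical continuous surface on the unit square (not real data)."""
    return 20 + 5 * math.sin(3 * x) + 3 * math.cos(2 * y)

def random_points(n, rng):
    """n sampling points placed uniformly at random."""
    return [(rng.random(), rng.random()) for _ in range(n)]

def grid_points(per_side):
    """Regular grid of per_side x per_side points at cell centres."""
    step = 1 / per_side
    return [((i + 0.5) * step, (j + 0.5) * step)
            for i in range(per_side) for j in range(per_side)]

def mean_at(points):
    """Estimate the surface mean from the values at the sampled points."""
    return sum(temperature(x, y) for x, y in points) / len(points)

rng = random.Random(1)
for per_side in (2, 5, 20):
    n = per_side ** 2
    grid_est = mean_at(grid_points(per_side))
    rand_est = mean_at(random_points(n, rng))
    print(n, round(grid_est, 3), round(rand_est, 3))
```

With more points both estimates settle toward the same value, but how quickly they do so depends on the design as well as the density, which is the practical content of the questions above.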



Politics of sampling

US Census 2000 was affected by political disputes over the use of sampling


Article by Tommy Wright

Census data comes at various scales:

Metropolitan Statistical Area (MSA): a city containing at least 50,000 people, or an urbanized area of 50,000 with a total metropolitan population of 100,000
Census Tract: a subdivision of the MSA containing about 2,500-8,000 people
Census Block: the smallest unit of measurement



Supplements to the 2000 Census

American Community Survey (ACS) continues to collect data.

Rank, State Name (ranked by Average Travel Time to Work, in minutes)

United States
1 New York
2 Maryland
3 District of Columbia
4 New Jersey
5 Illinois
6 California
7 Georgia
8 West Virginia
9 Massachusetts
10 Virginia




Redistricting: carried out after every census. The 1999 elections determine who will have a role in redistricting. A recent Supreme Court ruling found that inter-decade redistricting was legal.

After the 1980 Census, 17 US congressional seats were moved from NE and North-central USA to southern and western states. This was due to population shifts within the United States.

Georgia's redistricting information was released on March 22, 2001. It shows:


Biggest Counties

Fulton 816,006
DeKalb 665,865
Cobb 607,751
Gwinnett 588,448
Clayton 236,517

Fastest Growing Counties 1990-2000

Forsyth 123.2%
Henry 103.2%
Gwinnett 66.7%
[Fulton 25.7%]

Fastest Growing Incorporated Places 1990-2000

Augusta-Richmond County 347.5%
Alpharetta City 168.1%
Athens-Clarke County 121.9%
[Atlanta City 5.7%]
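
The growth percentages above follow the usual formula, growth = (P2000 - P1990) / P1990 x 100. As a small worked example, the formula can be inverted to recover the 1990 population implied by the 2000 count and the growth rate. Only Gwinnett and Fulton appear in both county lists, so only those two are used; the function names are hypothetical, and the results are approximate because the percentages are rounded.

```python
def percent_growth(pop_start, pop_end):
    """Percent change from pop_start to pop_end."""
    return (pop_end - pop_start) / pop_start * 100

def implied_start(pop_end, growth_pct):
    """Invert the growth formula to recover the starting population."""
    return pop_end / (1 + growth_pct / 100)

# 2000 counts and 1990-2000 growth rates taken from the lists above.
pop_2000 = {"Gwinnett": 588_448, "Fulton": 816_006}
growth_pct = {"Gwinnett": 66.7, "Fulton": 25.7}

implied_1990 = {county: implied_start(pop_2000[county], growth_pct[county])
                for county in pop_2000}

for county, p1990 in implied_1990.items():
    added = pop_2000[county] - p1990
    print(county, round(p1990), round(added))
```

The example also shows why a growth rate alone can mislead: a county with a lower percentage can still add a large number of people if its base population was large.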

Total Population






Problems with using census data

1. Key terms may vary from census to census.

Example: What is "race" and "ethnicity" in the Census?

Anthropologists assert that "race" and "ethnicity" are not biological in origin, but rather cultural. There is more genetic diversity within a race (eg., whites or African Americans) than between races.

Race is therefore a cultural construct which can change over time. The word "race" in the modern sense dates back only to the late 18th century!

Eg., the category of "Hispanics" was only invented in the 1950 census, so it is not possible to compare figures for this group before that date


2. Gerrymandering/MAUP

3. Nonparticipation and the "undercount"

The undercount is the number of people missed by the census. The net error is mostly an undercount, but it also includes some overcount (eg., students counted both in dorms and at their parents' homes)

However, this error is not distributed evenly. That is, some people are more likely to be undercounted than others. These include children, renters in rural areas, and racial minorities. For example, the actual undercount in 1990 among whites was 0.7%, but among African Americans it was 4.4%, and among Hispanics 5%. Because federal funding is allocated using census counts, this means that the allocation is not accurate.
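
A short calculation shows how a differential undercount distorts population shares. The undercount rates are the 1990 figures quoted above; the counted populations are hypothetical round numbers chosen only for illustration, and the sketch assumes each rate is the share of the group's true population that was missed.

```python
# 1990 undercount rates quoted in the text.
undercount_rate = {"White": 0.007, "African American": 0.044, "Hispanic": 0.05}

# Hypothetical counted populations for an imaginary area (not real data).
counted = {"White": 700_000, "African American": 200_000, "Hispanic": 100_000}

def adjusted(counted_n, rate):
    """Estimate the true population, assuming `rate` is the share missed."""
    return counted_n / (1 - rate)

estimated_true = {g: adjusted(counted[g], undercount_rate[g]) for g in counted}
total_counted = sum(counted.values())
total_true = sum(estimated_true.values())

share_counted = {g: counted[g] / total_counted for g in counted}
share_true = {g: estimated_true[g] / total_true for g in counted}

for g in counted:
    print(g, round(share_counted[g] * 100, 2), round(share_true[g] * 100, 2))
```

Groups with a higher undercount rate hold a smaller share of the counted population than of the estimated true population, so any allocation based on counted shares shortchanges them.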

There is also a geographical unevenness to the error. Some places in the US, for example rural areas, have higher error than others.


Undercount in California. Each dot = 50 people undercounted.

Source: California Department of Finance

Undercount in California. Rates by county. Notice that correct usage of the choropleth technique (mapping rates rather than counts) reveals a noticeably different, but not completely dissimilar, pattern, as the San Francisco/LA urban areas are evened out.
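
The difference between the two maps comes down to counts versus rates. The sketch below uses three invented counties (all names and numbers are hypothetical, not California data) to show how the same undercount figures rank differently when shown as dots (one dot per 50 people missed) and as a percentage rate.

```python
# Hypothetical counties: population and number of people missed.
counties = {
    "Urban A": {"population": 9_000_000, "missed": 180_000},
    "Rural B": {"population": 50_000, "missed": 2_000},
    "Suburban C": {"population": 1_000_000, "missed": 10_000},
}

def dots(missed, people_per_dot=50):
    """Number of dots on a dot map where each dot = people_per_dot missed."""
    return missed // people_per_dot

def rate_pct(missed, population):
    """Undercount as a percentage of population (the choropleth value)."""
    return missed / population * 100

dot_counts = {name: dots(c["missed"]) for name, c in counties.items()}
rates = {name: rate_pct(c["missed"], c["population"])
         for name, c in counties.items()}

most_dots = max(dot_counts, key=dot_counts.get)
highest_rate = max(rates, key=rates.get)
print(most_dots, highest_rate)
```

In this invented example the big urban county dominates the dot map simply because it has the most people, while the small rural county has the highest rate.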

You might ask how they know how many people they didn't count?

The answer is that it is a prediction based on sampling models.
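
One widely used sampling model for this kind of problem is a dual-system (capture-recapture) estimate: an independent follow-up survey re-counts a sample of blocks, and the overlap between the two counts indicates how many people both missed. Whether this is exactly the model the notes have in mind is an assumption, and the numbers and function name below are hypothetical.

```python
def dual_system_estimate(census_count, survey_count, matched):
    """Lincoln-Petersen style estimate of the true population.

    census_count: people counted by the census in the sample blocks
    survey_count: people counted by the independent follow-up survey
    matched:      people found in both counts
    """
    if matched <= 0:
        raise ValueError("need at least one matched person")
    return census_count * survey_count / matched

# Hypothetical sample blocks (not real data).
census_count = 950
survey_count = 1000
matched = 900

estimated_true = dual_system_estimate(census_count, survey_count, matched)
undercount = estimated_true - census_count
undercount_rate = undercount / estimated_true

print(round(estimated_true, 1), round(undercount, 1), round(undercount_rate, 3))
```

The logic is that if the census found only 900 of the 1,000 people in the survey, it probably found about 90% of everyone, so the census count is scaled up accordingly. The estimate rests on the two counts being independent, which is itself a modelling assumption.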


Gerrymandering: "packing and cracking"




The original gerrymander (1812)
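
"Packing and cracking" can be illustrated with a toy example. The electorate here is invented: 50 voters, 30 for party A and 20 for party B, divided into 5 districts of 10. The district plans and names are hypothetical; only the way the lines are drawn changes between plans.

```python
def seats(plan):
    """Count districts won by each party; a plan lists (a_votes, b_votes)."""
    a_seats = sum(1 for a_votes, b_votes in plan if a_votes > b_votes)
    b_seats = sum(1 for a_votes, b_votes in plan if b_votes > a_votes)
    return {"A": a_seats, "B": b_seats}

def totals(plan):
    """Total votes for each party across all districts."""
    return (sum(a for a, _ in plan), sum(b for _, b in plan))

plans = {
    # Each party's voters kept together: seats roughly match vote share.
    "homogeneous districts": [(10, 0), (10, 0), (10, 0), (0, 10), (0, 10)],
    # B's voters "cracked" evenly so they are a minority everywhere.
    "crack B": [(6, 4)] * 5,
    # A's voters "packed" into two districts, the rest cracked.
    "pack A": [(9, 1), (9, 1), (4, 6), (4, 6), (4, 6)],
}

results = {name: seats(plan) for name, plan in plans.items()}
for name, plan in plans.items():
    print(name, totals(plan), results[name])
```

Every plan contains the same 30 A voters and 20 B voters, yet the seat split is 3-2 for A, 5-0 for A, or 3-2 for B depending on the boundaries.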

Ensuring privacy
Personal information collected by the Census is protected by law. Release of personal information that would allow an individual to be identified is prohibited for as long as 100 years. Only aggregated information can be released (eg., down to census block level). Under a law passed in the 1950s, Census data can be released after 72 years, which means that the 1930 census has been fully released.