Read Statistics for Dummies Online

Authors: Deborah Jean Rumsey


Correlation and causation

Of all of the misunderstood statistical issues, the most problematic is the misuse of the concepts of correlation and causation.

Correlation means that two numerical variables have some sort of linear relationship. For example, the number of times crickets chirp per second is related to temperature; when it's cold outside, they chirp less frequently, and when it's warm outside they chirp more frequently. (This actually happens to be true!) Another example of correlation has to do with police staffing. The number of crimes (per capita) has often been found to be related to the number of police officers in a given area. When more police officers patrol an area, crime tends to be lower, and when fewer police officers are present, crime tends to be higher. However, seemingly unrelated events have also been found to be correlated. One such example is the consumption of ice cream (pints per person) and the number of murders in certain areas. Now maybe having more police officers deters crime, but does having people eat less ice cream deter crime? What's the difference?

The difference is that with correlation, a link or relationship is found to exist between two variables, x and y. With causation, one makes that leap and says "a change in x will cause a change in y to happen." Too many times in research, in the media, or in the public consumption of statistical results, that leap is made when it shouldn't be. When can it be done? When a well-designed experiment is conducted that eliminates any other factors that could have been related to the outcomes. For more on correlation and causation, see Chapter 18.
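To make the distinction concrete, here is a minimal Python sketch. The numbers are invented purely for illustration (they do not come from any study), and the helper name pearson_r is an assumption of this sketch. It measures how strongly two lists of values move together in a straight-line fashion. The result is close to +1 for these made-up ice cream and murder figures, yet nothing in the calculation says that one causes the other; some other factor could be driving both.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Return the Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations from each mean
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(ss_x * ss_y)

# Invented values for six hypothetical areas (illustration only).
ice_cream_pints_per_person = [1.2, 1.4, 2.1, 2.4, 3.1, 3.4]
murders = [3, 5, 4, 7, 8, 9]

r = pearson_r(ice_cream_pints_per_person, murders)
print(f"r = {r:.2f}")  # a value near +1 signals a strong linear relationship
```

A high value of r describes a link between x and y; it takes a well-designed experiment, not a bigger r, to justify the leap to causation.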

 

Part II: Number-Crunching Basics

Chapter List
Chapter 4: Getting the Picture—Charts and Graphs
Chapter 5: Means, Medians, and More

Number crunching: It's a dirty job, but somebody has to do it. Why not let it be you? Even if you aren't a numbers person and calculations aren't your thing, the step-by-step approach in this part may be just what you need to boost your confidence in doing and really understanding statistics.

In this part, you get down to the basics of number crunching, from making and interpreting charts and graphs to cranking out and understanding means, medians, standard deviations, and more. You also develop important skills for critiquing someone else's statistical information and getting at the real truth behind the data.

 

Chapter 4: Getting the Picture—Charts and Graphs

Someone once said that a picture is worth a thousand words. In statistics, a picture may be worth a thousand data points — as long as that picture is done correctly, of course. Data displays, such as charts and graphs, appear often in everyday life. These displays show everything from election results, broken down by every conceivable characteristic, to how the stock market has fared over the past few years. Today's society is a fast-food, fast-information society; everyone wants to know the bottom line and be spared the details. The main use of statistics is to boil down information into summary form, and data displays are a natural way to do that. But do data displays give you the whole picture of what's happening with the data? That depends on the quality of the data display and its intended purpose. Pictures can be misleading (sometimes intentionally and sometimes by accident), and not every data display that you see will be correct. This chapter helps you gain a better understanding of the use of charts and graphs in the media and the workplace and shows you how to read and make sense of these data displays. In this chapter, I also give you some tips for evaluating data displays and for spotting those (oh so many!) misleading displays.

Getting Graphic with Statistics

The main purpose of a data display is to make a certain point, make the point clearly and effectively, and make the point correctly. A chart or graph, for example, is used to give impact to a specific characteristic of the data, highlight changes over time, compare and contrast opinions or demographic data, or show links between pieces of information. Data displays "break down" a statistical story that the author wants to relay about a data set, so that the reader can quickly see the issue at a glance and come to some conclusion. For this reason, data displays are powerful: Used properly, they can be informative and effective; used improperly, they can be misleading and destructive.

Data displays can impact your life in large and small ways — responding to these critically and understanding what they say and don't say helps you become a savvy consumer of statistical data. You want to become familiar with the various types of data displays that you're likely to come across and to explore how these displays are used in the media and the workplace.

Researchers and journalists use different ways to display each of the two major types of data: categorical data, which represent qualities or characteristics (such as gender or political party), and numerical data, which represent measured quantities (such as height or income).

The most common types of data displays for categorical data are as follows:

For numerical data, tables are commonly used to display the data. In addition, histograms should be commonly used to display numerical data (but often aren't), so I include them in the "Picturing Data with a Histogram" section.

In this chapter, I present examples of each type of data display, some thoughts on interpretation, and tips for critically evaluating each type.

 

Getting a Piece of the Pie Chart

The pie chart is one of the most commonly used data displays because it's easy to read and can quickly make a point. You most likely have seen them before — they seem so simple. Can anything go wrong with an innocent pie chart? The answer is yes.

A pie chart takes categorical data and breaks them down by group, showing the percentage of individuals that fall into each group. Because a pie chart takes on the shape of a circle, or pie, the "slices" that represent each group can easily be compared and contrasted to one another. Because each individual in the group falls into one and only one category, the sum of all the slices of the pie should be 100% or close to it (subject to a bit of round-off error).
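The round-off effect is easy to see in a small Python sketch. The group names and counts below are hypothetical, and slice_percentages is a helper name made up for this illustration; it converts counts to percentages rounded to one decimal place, the way a published pie chart might.

```python
def slice_percentages(counts, decimals=1):
    """Convert category counts to rounded percentages of the total."""
    total = sum(counts.values())
    return {group: round(100 * n / total, decimals) for group, n in counts.items()}

# Hypothetical tallies: each individual falls into exactly one group.
counts = {"Group A": 1, "Group B": 1, "Group C": 1}

slices = slice_percentages(counts)
print(slices)                # each slice is shown as 33.3
print(sum(slices.values()))  # close to 100, but not exactly 100, because of rounding
```

If the slices of a pie chart add up to something far from 100%, rounding alone can't explain it, and the chart deserves a closer look.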

Tallying personal expenses

When you spend your money, what do you spend it on? What are the top three expenses that you have? According to the U.S. Bureau of Labor Statistics, the top three sources of consumer expenditures in 1994 were housing (32%), transportation (19%), and food (14%). Figure 4-1 shows these results in a pie chart. (Notice that the "other" category is a bit large in this chart. But in this case determining which of the other items should be included as a single pie chart slice would be difficult, because so many different types of expenditures for different people are possible.)

Figure 4-1: Total U.S. consumer expenditures for 1994.

How did the U.S. government get this information? From something called the Consumer Expenditure Survey. Many federal agencies are charged with collecting data (often using surveys) and disseminating the results through written reports. (The U.S. government is a good source of information about many aspects of everyday life in the United States.)

Sizing up the lottery

State lotteries bring in a great deal of revenue, and they also return a large portion of the money received, with some of the revenues going to prizes and some being allocated to state programs, such as education. Where does the money come from? Figure 4-2 is a pie chart showing the types of games and the percentage of revenue each generates for the Ohio lottery.

Figure 4-2: Ohio lottery revenue breakdown (1993–2002).

You can see by this pie chart that most of the Ohio lottery sales revenues (49.25%) come from the instant (scratch-off) games. The rest of the revenues come from various lottery-type games in which players choose a set of numbers and win if a certain number of their numbers match those chosen by the lottery. Why do the instant games account for such a large portion of the lottery sales? One possible reason is that the payouts for instant games are frequent, even though they're not very big. Also, you get instant feedback with the scratch-off games; with the lottery games you have to wait until a drawing occurs before you know whether you're a winner. On the other hand, maybe people just enjoy the satisfaction of scratching off those boxes!

Notice that this pie chart doesn't tell you how much money came in, only what percentage of the money came from each type of game. In other words, you know how the pie is divided up, but you don't really know how big the pie is to begin with. This is something you may want to know as a consumer of this information. About half of the money (49.25%) came from instant scratch-off games; does this revenue represent a million dollars, two million dollars, ten million dollars, or more? The pie chart in Figure 4-2 doesn't tell you that information, and you can't determine it on your own without being given the total amount of revenue dollars. I was, however, able to find this information on another chart provided by the Ohio lottery: The total revenue for 2002 from Ohio lottery sales was reported as "1,983.1 million dollars," which you also know as 1.983 billion dollars. If the 49.25% share from the 1993–2002 chart also held for 2002, instant games accounted for $976,676,750 in sales revenue in that one year (49.25% of $1,983.1 million). That's a lot of scratching!
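The arithmetic behind the instant-game figure can be reproduced in a few lines of Python. This is only a sketch: the variable names are made up here, and it assumes that the 49.25% share shown in Figure 4-2 applies to the 2002 revenue total.

```python
total_revenue_millions = 1983.1  # 2002 Ohio lottery sales, in millions of dollars
instant_share = 0.4925           # share from Figure 4-2, assumed to apply to 2002

# Convert millions to dollars, then take the instant-game share.
instant_revenue_dollars = total_revenue_millions * 1_000_000 * instant_share
print(f"${instant_revenue_dollars:,.0f}")  # prints $976,676,750
```

Without the total, the percentage alone could not have produced a dollar figure at all.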

HEADS UP 

Pie charts often show the breakdown of the portion or percentage of the total that falls in each group or category. But they often do not show you the total number in each group, in terms of original units (number of dollars, number of people, and so on). This approach results in a loss of information, may not necessarily present the whole story behind the data, and leaves you wondering what the total amount is that's being divided up. You can always go from amounts to percentages, but you can't go from percentages back to the original amounts without knowing a total. With survey results, this lack of information can be a real problem; oftentimes, pie charts show the percentage of people who answered the question in a certain way, but they don't tell you how many people responded to the survey, a critical piece of information needed to assess the accuracy of the results. (See Chapter 10 for more on accuracy and margin of error for surveys.)
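A short Python sketch shows why the total matters. The survey counts are hypothetical, and both function names are invented for this illustration. Two surveys of very different sizes produce identical percentages, so the percentages alone can't tell you which survey you're looking at.

```python
def amounts_to_percentages(amounts):
    """Always possible: the total is just the sum of the amounts."""
    total = sum(amounts.values())
    return {group: 100 * value / total for group, value in amounts.items()}

def percentages_to_amounts(percentages, total):
    """Only possible when someone supplies the total."""
    return {group: total * pct / 100 for group, pct in percentages.items()}

# Two hypothetical surveys: 10 respondents versus 1,000 respondents.
small_survey = {"Yes": 6, "No": 4}
large_survey = {"Yes": 600, "No": 400}

small_pcts = amounts_to_percentages(small_survey)
large_pcts = amounts_to_percentages(large_survey)
print(small_pcts == large_pcts)  # True: both pie charts would look the same

# Recovering the counts requires the total number of respondents.
print(percentages_to_amounts(large_pcts, total=1000))
```

Notice that percentages_to_amounts can't even be called without a total, which is exactly the piece of information a bare pie chart leaves out.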

REMEMBER 

Always look for the total number of individuals when given any data display. If it's not directly available, ask for it!

The Florida lottery uses a pie chart to report where your money goes when you purchase one of its lottery tickets (see Figure 4-3). You can see that half of the Florida lottery revenues (50 cents of every dollar spent) go to prizes, and 38 cents of every dollar goes to education. This pie chart does break down the way each dollar of revenue is spent, but you probably also want to know how many dollars are spent playing the Florida lottery. Florida lottery ticket sales for 2001 actually totaled $2,360.6 million (or $2.36 billion), which amounts to $147.70 per capita (that is, per person), as shown in Table 4-1.

Figure 4-3: Florida lottery expenditures (fiscal year 2001–2002).

Table 4-1: Top Ten Lotteries (2001)

| Rank | Lottery | Population (Millions) | Ticket Sales ($ Millions) | Prizes ($ Millions) | Net Income ($ Millions) | Prizes (% of Revenues) | Sales per Capita ($) |
|------|--------------|--------|-------|-------|-------|-------|--------|
| 1 | New York | 18.976 | 4,178 | 2,274 | 1,447 | 54.4% | 220.16 |
| 2 | Mass. | 6.349 | 3,923 | 2,774 | 865 | 70.7% | 617.85 |
| 3 | California | 33.872 | 2,896 | 1,492 | 1,048 | 51.5% | 85.49 |
| 4 | Texas | 20.852 | 2,826 | 1,639 | 865 | 58.0% | 135.50 |
| 5 | Florida | 15.982 | 2,361 | 1,180 | 862 | 50.0% | 147.70 |
| 6 | Georgia | 8.186 | 2,194 | 1,142 | 692 | 52.0% | 267.98 |
| 7 | Ohio | 11.353 | 1,920 | 1,113 | 637 | 58.0% | 169.11 |
| 8 | New Jersey | 8.414 | 1,807 | 991 | 695 | 54.8% | 214.72 |
| 9 | Pennsylvania | 12.281 | 1,780 | 996 | 627 | 55.9% | 144.93 |
| 10 | Michigan | 9.938 | 1,615 | 874 | 586 | 54.1% | 162.49 |
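The last two columns of Table 4-1 can be rebuilt from the other columns, which is a handy way to check that you're reading the table correctly. The Python sketch below is one way to do that; the function and variable names are invented for this illustration. Because the sales and prize figures in the table are rounded to the nearest million, the recomputed values can differ from the published ones by a few cents or a few hundredths of a percentage point.

```python
# Rows from Table 4-1: (lottery, population, ticket sales, prizes,
# published prize %, published sales per capita).
# Population is in millions of people; dollar amounts are in millions.
TABLE_4_1 = [
    ("New York",     18.976, 4178, 2274, 54.4, 220.16),
    ("Mass.",         6.349, 3923, 2774, 70.7, 617.85),
    ("California",   33.872, 2896, 1492, 51.5,  85.49),
    ("Texas",        20.852, 2826, 1639, 58.0, 135.50),
    ("Florida",      15.982, 2361, 1180, 50.0, 147.70),
    ("Georgia",       8.186, 2194, 1142, 52.0, 267.98),
    ("Ohio",         11.353, 1920, 1113, 58.0, 169.11),
    ("New Jersey",    8.414, 1807,  991, 54.8, 214.72),
    ("Pennsylvania", 12.281, 1780,  996, 55.9, 144.93),
    ("Michigan",      9.938, 1615,  874, 54.1, 162.49),
]

def sales_per_capita(sales_millions, population_millions):
    """Dollars of ticket sales per person (the millions cancel out)."""
    return sales_millions / population_millions

def prize_percentage(prizes_millions, sales_millions):
    """Prizes as a percentage of ticket sales."""
    return 100 * prizes_millions / sales_millions

for name, pop, sales, prizes, pub_pct, pub_per_capita in TABLE_4_1:
    per_capita = sales_per_capita(sales, pop)
    pct = prize_percentage(prizes, sales)
    print(f"{name:<13} ${per_capita:7.2f} per capita  {pct:5.1f}% to prizes")
```

The per capita column is what makes the rankings comparable: New York sells the most tickets in total, but Massachusetts sells far more per person.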

Interestingly, the Web site for the Michigan lottery reports the amount, in dollars, that the lottery gives to education each year, but not the percentage of the total lottery revenue that goes to education. For example, the 2001 amount reportedly given to education by the Michigan lottery was $587 million. Because you know from Table 4-1 that the total lottery sales revenue for Michigan was $1,615 million (aka $1.6 billion), you can calculate the percentage of revenue that was given to education in this state. In Michigan, about 36% ($587 million ÷ $1,615 million × 100%) of the lottery sales revenue was given to education.
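The Michigan calculation goes from amounts to a percentage, which is always possible once you have the total. As a quick check in Python (the variable names are made up for this sketch):

```python
education_millions = 587      # 2001 amount given to education by the Michigan lottery
ticket_sales_millions = 1615  # Michigan ticket sales from Table 4-1

education_pct = education_millions / ticket_sales_millions * 100
print(f"{education_pct:.1f}%")  # about 36% of sales revenue
```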

Pie charts are easy to use to compare the sizes of slices within a single pie itself, but they can also be used to compare one entire pie to another. For example, the New York lottery reports its expenditures using a pie chart (see Figure 4-4).

Figure 4-4: New York lottery expenditures (2001–2002).

Comparing Figures 4-3 and 4-4, you can see that for the New York lottery, 56% of the money goes to prizes (slightly more than the percentage for Florida), and 33% goes to education (slightly less than the percentage for Florida). Included with each of the New York lottery's pie charts is a table showing the actual dollar amounts, allowing you to see more of the whole story. (However, the New York lottery makes you add the total up for yourself; it's well over $4.5 billion.)

The state of New York also wants you to realize how much money it's putting toward education, in terms of a piece of the school-revenue pie (which is a very smart move, politically speaking). Figure 4-5 shows that whereas 4% of New York school revenue in 2001–2002 came from federal aid, 5% came from the New York lottery. Again, this pie chart also comes with a table showing the actual dollar amounts. (In actuality, the only amount that's needed is the grand total dollar amount, because from the grand total and the percentages in the pie chart, you can generate the numbers in the table. Not having to do that extra work is nice, however.)

Figure 4-5: New York schools' revenue (2001–2002).
