The Bell Curve: Intelligence and Class Structure in American Life (110 page)

Authors: Richard J. Herrnstein,Charles A. Murray

This situation is relevant to some of the outcome measures discussed in Chapter 14, such as short-term male unemployment, where the black and white means are quite different, but IQ has little relationship to short-term unemployment for either whites or blacks. This figure was constructed assuming only that there are factors influencing outcomes that are not captured by the predictor, hence its low validity, resulting in the low slope of the parallel regression lines.[11] The intercepts differ, expressing the generally higher level of performance by whites compared to blacks that is unexplained by the predictor variable. If we knew what the missing predictive factors were, we could include them in the predictor, and the intercept difference would vanish, and so would the implication that the newly constituted predictor is biased against whites. What such results seem to be telling us is, first, that IQ tests are not predictively biased against blacks but, second, that IQ tests alone do not explain the observed black-white differences in outcomes. It therefore often looks as if the IQ test is biased against whites.
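
The parallel-lines argument can be illustrated with a minimal sketch. Everything here is hypothetical: the two groups are labeled A and B, the numbers are invented, and the helper names `fit_line` and `mean_residual` are assumptions introduced for illustration only. Both groups are given the same slope and different intercepts, and a single pooled line is then fitted to see which group it overpredicts.

```python
# Hypothetical data: two groups share a slope but differ in intercept because
# of a factor the predictor does not capture. No number here comes from the text.

def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def mean_residual(xs, ys, intercept, slope):
    """Average of (actual - predicted); negative means the line overpredicts."""
    return sum(y - (intercept + slope * x) for x, y in zip(xs, ys)) / len(xs)


# Group A: lower intercept; group B: higher intercept; identical slope of 0.5.
x_a = [0, 1, 2, 3, 4]
y_a = [10 + 0.5 * x for x in x_a]
x_b = [2, 3, 4, 5, 6]
y_b = [12 + 0.5 * x for x in x_b]

# Fit one common line to everyone, ignoring group membership.
pooled_intercept, pooled_slope = fit_line(x_a + x_b, y_a + y_b)

resid_a = mean_residual(x_a, y_a, pooled_intercept, pooled_slope)
resid_b = mean_residual(x_b, y_b, pooled_intercept, pooled_slope)

print(f"pooled line: y = {pooled_intercept:.3f} + {pooled_slope:.3f} x")
print(f"mean residual, group A: {resid_a:+.3f}")
print(f"mean residual, group B: {resid_b:+.3f}")
```

In this toy setup the common line overpredicts for the lower-intercept group and underpredicts for the higher-intercept group, which is the pattern the paragraph describes. The sketch says nothing about any real data set.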

More on Internal Evidence of Bias: Item Analysis
 

Laymen are often skeptical that IQ test items could measure anything as deep as intelligence. Knowing the answers seems to them to depend less on intelligence than on having been exposed to certain kinds of cultural or historical information. It is usually a short step from here to the conclusion that the tests must be biased. Pundits of varying sorts reinforce this intuition about test item bias, claiming that the middle- and upper-class white culture infuses test items even after vigorous efforts to expunge it.

The data confirming Spearman’s hypothesis, which we discussed at some length in Chapter 13, provide the most convincing conceptual refutation of this allegation by providing an alternative explanation that has been borne out by many studies: the items on which blacks and whites differ most widely are not those with the most esoteric cultural content, but the ones that best measure the general intelligence factor, g.[12] But many other studies, which we review here, have directly asked whether the cultural content of items is associated with the magnitude of the black-white difference.

One of the earliest of the studies, a 1951 doctoral thesis at Catholic University, proceeded on the assumption that some test items are more dependent on exposure to culture than others.[13] Frank McGurk, the study’s author, consequently had large numbers of independent judges rate many test items for their cultural loading. On exploratory tests, he was able to establish each item’s general difficulty, which is defined simply as the proportion of a population that gets the item wrong. He could therefore identify pairs of items, one highly loaded with cultural information and the other not highly loaded but of equal difficulty. Now, finally, the crucial evaluation could be made with a sample of black and white high school students matched for schooling and socioeconomic background. The black-white gap, he discovered, was about twice as large on items rated as low in cultural loading as on items rated as high in cultural loading. Consider, for example, a pair of equally difficult test items. The one that is culturally loaded is probably difficult because it draws on esoteric knowledge; the other item is probably difficult because it calls on complex cognitive processing, that is, on g. McGurk’s results undermined the proposition that access to esoteric knowledge was to blame for the black-white difference.

Another approach in the pursuit of test-item bias is based on which items blacks and whites find hard or easy. Conceptually, this is much like McGurk’s approach, except that it does not require us to have items rated by experts, a subjective procedure that some might find suspect. Instead, if the cultural influence matters and if blacks and whites have access to different cultural backgrounds, then items that pick up these cultural differences should split the two groups. Items drawing on cultural knowledge more available to whites than to blacks should be, on average, relatively easier for whites than for blacks. Items lacking this tip for whites or items with a tip for blacks should not be differentially easier for whites and may be easier for blacks.

This idea is tested by ranking the items on a test separately for whites and for blacks, in order of difficulty. That is, the easiest item for whites is the one with the highest proportion of correct answers among whites; the next easiest item for whites is the one with the second highest proportion of correct answers for whites; and so on. Now repeat the procedure using the blacks’ proportions of correct answers. This will result in two sets of rank orders for all the items. The rank-order correlation between them is a measure of the test-item bias hypothesis: The larger the correlation is, the less support the hypothesis finds. Alternatively, the proportions of correct responses within each group are transformed into standard scores and then correlated by some other measure of correlation, such as the Pearson product-moment coefficient.
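
The ranking procedure can be sketched in a few lines of Python. The six item proportions below are invented for two hypothetical groups labeled A and B, and the helper names (`average_ranks`, `pearson`, `spearman`) are assumptions introduced for this illustration. The sketch reads "standard scores" as normal deviates of the proportions correct, which is one plausible interpretation rather than a procedure the text spells out.

```python
from statistics import NormalDist

def average_ranks(values):
    """Rank items from easiest (rank 1 = highest proportion correct) to hardest.
    Tied values share the average of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: -values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # average of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson(xs, ys):
    """Pearson product-moment correlation."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def spearman(xs, ys):
    """Rank-order correlation: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(xs), average_ranks(ys))


# Hypothetical proportions of correct answers on six items, one list per group.
p_correct_a = [0.92, 0.85, 0.71, 0.64, 0.40, 0.22]
p_correct_b = [0.88, 0.74, 0.52, 0.58, 0.33, 0.25]

rho = spearman(p_correct_a, p_correct_b)

# Alternative: convert proportions to normal deviates, then use Pearson's r.
to_z = NormalDist().inv_cdf
r = pearson([to_z(p) for p in p_correct_a], [to_z(p) for p in p_correct_b])

print(f"rank-order correlation: {rho:.3f}")
print(f"Pearson r on normal deviates: {r:.3f}")
```

In this invented example only the third and fourth items swap places between the two groups, so the rank-order correlation stays close to 1. A test whose items split the groups along cultural lines would instead show many such reversals and a lower correlation.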

Either way, the result is clear. Relative item difficulties are essentially the same for both races (by sex). That is, blacks and whites of the same sex come close to finding the same item the easiest, the same item next easiest, all the way down to the hardest item.[14] When the rank order of difficulty differs across races, the differences tend to be small and unsystematic. Rank order correlations above .95 are not uncommon for the items on the Wechsler and Stanford-Binet tests, which are, in fact, the tests that provide most of the anecdotal material for arguing that test items are biased. Pearson correlations are often somewhat lower but typically still above .8. Moreover, when items do vary in difficulty across races, most of the variation is eliminated by taking mental age into account. Since blacks and whites of the same chronological age differ on average in mental age, allowing a compensating lag in chronological age will neutralize the contribution of mental age. Compare, say, the item difficulties for 10-year-old blacks with those for 9-year-old or 8-year-old whites. When this is done, the correlations in difficulty almost all rise into the .9 range and above.[15]

Because “item bias” as ordinarily defined has failed to materialize, the concept has been extended to encompass item characteristics that are intertwined with the underlying rationale for thinking that an item measures g. For example, one researcher has found that the black-white gap is diminished for items that call for the subject to identify the one false response, compared to items requiring the subject to identify the one correct response.[16] Is this a matter of bias, or a matter of how well the two types of items tap the construct called intelligence? This in turn brings us full circle to Spearman’s hypothesis, discussed in Chapter 13, which offers an interpretative framework for explaining such differences.

More on Other Potential Sources of Bias
 

We turn now to one of the least precisely but most commonly argued reasons for thinking that tests are biased: Tests are a sort of game, and, as in most games, it helps to have played the testing game, it helps to get coaching, and it helps to be playing on the home field. Privileged groups get more practice and coaching than underprivileged groups. They have a home-court advantage; the tests are given in familiar environments, administered by familiar kinds of people. A major part of the racial differences in test scores may be attributed to these differences. In this discussion, we begin with coaching and practice, then turn to some of the other ways in which the testing situation might influence scores.

PRACTICE AND COACHING. For IQ tests, coaching and practice are not a significant issue because coaching and practice effects exist only under conditions that virtually never apply. To get a sizable practice effect for an IQ test, it is necessary to use subjects who have never taken an IQ-like test, administer the identical test twice, and do so quickly (preferably within a few weeks).[17] If any one of those conditions is not met, the chances of finding a practice effect are small, and the size of any effect, if one is found, will be just a few points. Coaching effects are even harder to obtain. We are unable to identify any IQ data in any study, large or small, in which the results are compromised because the IQ scores of part of the sample have been obtained after this kind of experience. That’s not the way that IQ tests have been administered anywhere to any significant sample at any time during the history of IQ testing, except to the samples used to assess practice and coaching effects, and sometimes to the subjects of intensive remedial programs such as those discussed in Chapter 17.

The story regarding practice and coaching for such tests as the Scholastic Aptitude Test (SAT), the Law School Admission Test (LSAT), and the Medical College Admission Test (MCAT) is much more contentious than the story about IQ. Many people do take these tests more than once, many people practice for them, and many people get extensive coaching. Moreover, these tests are supposed to be “coachable,” insofar as they measure the verbal, reasoning, and analytic skills that a good education is supposed to enhance, and prolonged exposure to such coaching should produce better scores. Or to put it another way, two students with the same IQ should be able to get different LSAT and MCAT scores if one student has taken more appropriate courses and studied harder than the other student. That SAT scores declined by almost half a standard deviation from 1964 to 1980 strongly suggests that something coachable (or “negatively coachable” in this example) is being measured. In Chapter 17, we discuss the effects of coaching for the SAT, which are real but also smaller and harder to obtain than the widely advertised claims of the coaching industry.

The belief that coaching might explain part of the black-white gap often rests on a notion that, on the average, blacks receive less of the practice and coaching that might have elevated their scores than does the average white. We have already undermined this notion by showing that the tests are biased against blacks neither predictively nor in terms of particular item difficulties. There is, however, a literature that bears more directly on this idea, by looking for an interaction effect between practice or coaching and race.

If practice and coaching explain any portion of a group difference in scores in the population as a whole, then it necessarily follows that representative samples of those groups who are equally well practiced and well coached will show a smaller difference than is observed in the population at large. It is not enough that practice or coaching raises the mean score of the lower-scoring group; it must raise its mean score more than it raises the score of the higher-scoring group.

Several studies have investigated whether this is found for blacks and whites. In a well-designed study, representative samples of blacks and whites are randomly divided into two groups. The experimental black and white groups receive identical coaching (or practice), and the control groups receive no treatment at all. At the end of the experiment, the investigator has four different sets of results: test scores for coached blacks, uncoached blacks, coached whites, and uncoached whites. These results may be analyzed in three basic ways: One may compare blacks overall with whites overall, which will reveal the main effect of race; or compare the coached samples overall with the uncoached samples overall, which will reveal the main effect of the coaching; or examine the way in which the effects of coaching vary according to the race of the persons being coached, known as the interaction effect.
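
The three comparisons in this two-by-two design can be written out as simple arithmetic on the four cell means. The sketch below uses invented cell means for two hypothetical groups labeled A and B, and the variable names are assumptions introduced for illustration. A real analysis would also need the within-cell variation to judge statistical significance, which this sketch omits.

```python
# Hypothetical cell means for a 2 x 2 design (group x treatment).
# None of these numbers come from any study.
cell_means = {
    ("A", "coached"): 54.0,
    ("A", "uncoached"): 50.0,
    ("B", "coached"): 45.0,
    ("B", "uncoached"): 40.0,
}

def mean(values):
    values = list(values)
    return sum(values) / len(values)

# Main effect of group: group A overall minus group B overall.
main_effect_group = (
    mean(v for (g, _), v in cell_means.items() if g == "A")
    - mean(v for (g, _), v in cell_means.items() if g == "B")
)

# Main effect of coaching: coached overall minus uncoached overall.
main_effect_coaching = (
    mean(v for (_, t), v in cell_means.items() if t == "coached")
    - mean(v for (_, t), v in cell_means.items() if t == "uncoached")
)

# Interaction: how much more the lower-scoring group (B) gains from coaching
# than the higher-scoring group (A) does.
gain_a = cell_means[("A", "coached")] - cell_means[("A", "uncoached")]
gain_b = cell_means[("B", "coached")] - cell_means[("B", "uncoached")]
interaction = gain_b - gain_a

# The gap between groups shrinks under coaching by exactly the interaction.
gap_uncoached = cell_means[("A", "uncoached")] - cell_means[("B", "uncoached")]
gap_coached = cell_means[("A", "coached")] - cell_means[("B", "coached")]

print(f"main effect of group:    {main_effect_group:.1f}")
print(f"main effect of coaching: {main_effect_coaching:.1f}")
print(f"interaction (B gain - A gain): {interaction:.1f}")
print(f"gap uncoached: {gap_uncoached:.1f}, gap coached: {gap_coached:.1f}")
```

The last two lines make the logical point explicit: coaching can raise both groups' scores (a main effect) without narrowing the gap at all. The gap changes only by the size of the interaction term.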

One study found a statistically significant differential response to practice, but not to direct instruction, on a reasoning test, between black and white college students.[18] The differential advantage of practice for blacks compared to whites was about an eighth of the overall black-white gap on this test. Other studies have failed to find even this much of a differential response, or they have found differential responses in the opposite direction, tending to increase the black-white gap after practice.[19] Taking the evidence as a whole, any differential coaching and practice effects by race (or socioeconomic status) are at most sporadic and small. If such a differential effect exists, it is too small to be replicated reliably. The scattered evidence of a differential effect is about as supportive of a white advantage from coaching as of a black advantage.

EXAMINER EFFECTS AND OTHER SITUATIONAL VARIABLES. Is it possible that disadvantaged groups come to the test with greater anxiety than confident middle-class students, and that this mental state depresses their scores? Is it possible that, when a black student takes a bus across town to an unfamiliar neighborhood and goes into a testing room filled with white students and overseen by a white test supervisor, the situation has an intimidating effect on performance? What about the time limits on tests? Might these have more pronounced effects on disadvantaged students than on test-wise middle-class students? All are plausible questions, but the answer to each is the same: Investigations to date give no reason to believe that such considerations explain a nontrivial portion of the group differences in scores.

The race of the examiner has been the subject of numerous studies. Of those with adequate experimental designs, most have shown nonsignificant effects; of the rest, the evidence is as strong that the presence of a white examiner reduces the overall black-white difference as that a white examiner exacerbates the difference.[20] Examinations of the results of time pressures fail to demonstrate either that blacks do better in untimed than in timed tests or that the test-taking “personal tempo” of blacks is different from that of whites.[21] Test anxiety has been investigated extensively but, as in so many other aspects of this discussion, the relationship tends to be the opposite of the expected one: To the extent that test anxiety affects performance at all, it seems to help slightly. Only a few studies have specifically addressed black-white differences in test anxiety; they have shown either nonsignificant results, or that the white subjects were slightly more anxious than the black subjects.[22]
