Not random enough
In the story about Research 2000, a polling firm was caught out when statisticians noticed the data was not random enough. There have been other, more famous cases where a lack of randomness cast doubt on statistical results.
Gregor Mendel, from experiments on peas, discovered the basic laws of genetic inheritance, reported in his 1865 paper (available in English). In 1936, R. A. Fisher, the Babe Ruth of statisticians, burrowed through Mendel's data, assessing the goodness-of-fit of the reported data to the genetic theory. For example, in one set of "bifactorial" experiments, 529 plants were classified according to the genotype of their seeds' form (A = round or a = wrinkled) and color (B = yellow or b = green). (Thus each plant had two form letters and two color letters.) The results (as in Fisher's Table I):
| ||AA||Aa||aa|
|BB||38 (1)||60 (2)||28 (1)|
|Bb||65 (2)||138 (4)||68 (2)|
|bb||35 (1)||67 (2)||30 (1)|
Theory says the observations should be in the ratio 1:2:1 within each row and within each column, so the nine cells of the table should be in the ratio shown in parentheses. Do the data fit the theory? A chi-squared test has observed value 2.8110 on 8 degrees of freedom, which yields a p-value of 0.9457. That is very large, meaning the data fit very well.
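To make the arithmetic concrete, here is a minimal sketch of that chi-squared calculation using only the Python standard library. The function names are mine, not Fisher's, and the tail-probability helper uses a closed form that holds only for even degrees of freedom, which happens to cover this case.

```python
import math

# Observed counts from the table above (rows BB, Bb, bb) and the
# theoretical weights shown in parentheses there.
observed = [[38, 60, 28], [65, 138, 68], [35, 67, 30]]
weights = [[1, 2, 1], [2, 4, 2], [1, 2, 1]]


def chi_square_stat(observed, weights):
    """Pearson statistic: sum of (observed - expected)^2 / expected."""
    n = sum(map(sum, observed))        # 529 plants in total
    w_total = sum(map(sum, weights))   # 16
    stat = 0.0
    for obs_row, w_row in zip(observed, weights):
        for o, w in zip(obs_row, w_row):
            expected = n * w / w_total
            stat += (o - expected) ** 2 / expected
    return stat


def chi2_upper_tail_even_df(x, df):
    """P(chi-squared with df degrees of freedom >= x), even df only."""
    if df <= 0 or df % 2:
        raise ValueError("this closed form needs a positive even df")
    half = x / 2.0
    term = total = 1.0
    # exp(-x/2) * sum_{i=0}^{df/2 - 1} (x/2)^i / i!
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total


stat = chi_square_stat(observed, weights)
# 9 cells, no parameters estimated from the data: 9 - 1 = 8 degrees of freedom.
p_value = chi2_upper_tail_even_df(stat, df=8)
print(f"chi-squared = {stat:.4f}, p-value = {p_value:.4f}")
```

Run as is, this should reproduce the statistic and p-value quoted above, up to rounding in the last digit.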
Fisher looked at several other chi-squared tests. Here's a summary (from Fisher's Table V):
|Experiment||Chi-square statistic||Degrees of freedom||p-value|
|3:1 ratio: seeds||0.2779||2||0.8694|
|3:1 ratio: plants||1.8610||5||0.8680|
|2:1 ratio: seed||0.5983||2||0.7414|
|2:1 ratio: seed||4.5750||6||0.5994|
Note that in every experiment, the data fit very well. In fact, the chance that the overall data would fit as well as they did is only 3/100,000. The data fit a little too well for comfort. Fisher's observation brewed up quite a storm, one that still rages. Did Mendel cheat? My opinion is no, but he probably didn't report all his results, e.g., he left out the ones he didn't think fit well. (Note that every experiment fit better than the median fit, which suggests Mendel's approach was consistent.) Nowadays that behavior would be considered a no-no, but back then the standards for statistical experimentation had not yet been established.
An interesting paper, A Statistical Model to Explain the Mendel-Fisher Controversy, by Ana M. Pires and João A. Branco (Statistical Science, 2010, Vol. 25, pp. 545-565), presents a model that possibly explains Mendel's results. Briefly, the idea is that "the data to be presented can be modeled by assuming that an experiment is repeated whenever its p-value is smaller than α, where 0 ≤ α ≤ 1 is a parameter fixed by the experimenter, and then only the one with the largest p-value is reported." Mendel's reported data fit this model very well, suggesting that Mendel could have followed something close to this process, whether formally or not.
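As a rough illustration of how such a rule shifts reported p-values upward, here is a small sketch. It is my reading of the quoted sentence, not the authors' code: I assume p-values are uniform on [0, 1] when the theory is true, that an experiment is repeated at most once, and I pick α = 0.2 purely for illustration.

```python
import random


def reported_p_value(alpha, rng):
    """One reported p-value under the repeat-if-small rule (assumed form).

    The first p-value is uniform on [0, 1). If it falls below alpha, the
    experiment is repeated once and the larger of the two is reported.
    """
    p = rng.random()
    if p < alpha:
        p = max(p, rng.random())
    return p


def fraction_above_median(alpha, n_sims=100_000, seed=0):
    """Share of reported p-values above 0.5, the median with no selection."""
    rng = random.Random(seed)
    above = sum(reported_p_value(alpha, rng) > 0.5 for _ in range(n_sims))
    return above / n_sims


# alpha = 0 means no repetition at all; alpha = 0.2 is a hypothetical value.
for alpha in (0.0, 0.2):
    print(alpha, fraction_above_median(alpha))
```

Under these assumptions the share of reported p-values above 0.5 works out to 0.5 + 0.5α for α ≤ 0.5, so even a modest α tilts the reported fits toward "too good" without any data being invented.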
Another British scientist, the educational psychologist Cyril Burt, has had his research questioned because the data are not random enough (among other things). See The Cyril Burt Affair. A key element of Burt's theory rested on data he collected on identical twins reared separately (so they'd have the same genetics, but different environments). The correlation coefficient between the IQs of the two individuals in 53 such pairs of twins was 0.771. Very high! The data for the 53 pairs were collected over time, and the correlation was updated as more data came in. The three stages yielded the following:
|Year reported||# Twin pairs so far||Correlation coefficient so far|
|1943||15||0.770|
|1955||21||0.771|
|1966||53||0.771|
That is, the 15 are part of the 21, and the 21 are part of the 53. What's suspicious is that adding 6 pairs moved the correlation coefficient by only 0.001, and the subsequent additional 32 pairs didn't move the correlation at all! Did he just make up those extra twins to reinforce his theory? The controversy still rages.
But there is still the question of how unlikely it is to get such closely agreeing correlations in such a situation without cheating. How about simulation? First, I simulated 53 bivariate normal observations from a population with correlation 0.771; calculated the correlation coefficient for the first 15 observations, the first 21 observations, and the complete set of 53 observations; then rounded the three values to three decimal places. I did this a million times. The first few sets of coefficients:
0.838 0.722 0.800
0.767 0.821 0.786
0.792 0.805 0.819
0.816 0.786 0.847
0.760 0.756 0.753
What is the chance the three are as close to each other as (0.770, 0.771, 0.771) are? Of the million simulated triples, 47 triples were exactly equal (e.g., 0.749, 0.749, 0.749), and 247 were just as close as the observed (e.g., 0.832, 0.833, 0.833 and 0.796, 0.797, 0.797). Thus the answer is estimated to be 294/1,000,000, i.e., about 3/10,000. The chance is very small, but not infinitesimal.
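For anyone who wants to try this at home, here is a minimal sketch of the simulation described above, written with only the Python standard library. It is not the code I originally ran. The function names are made up for this example, "just as close" is taken to mean that the three rounded coefficients span at most 0.001 (my assumption about the rule), and the default number of replications is far below a million to keep the run short.

```python
import math
import random


def pearson(xs, ys):
    """Sample correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def simulate_triple(rng, rho=0.771, sizes=(15, 21, 53)):
    """Correlations of the first 15, 21 and 53 pairs, in thousandths."""
    xs, ys = [], []
    for _ in range(sizes[-1]):
        z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
        xs.append(z1)
        # Standard construction of a normal pair with correlation rho.
        ys.append(rho * z1 + math.sqrt(1 - rho ** 2) * z2)
    # Integers avoid floating-point trouble when comparing rounded values.
    return tuple(round(1000 * pearson(xs[:k], ys[:k])) for k in sizes)


def estimate_close_chance(n_sims=20_000, seed=0):
    """Fraction of simulated triples whose rounded values span <= 0.001."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_sims):
        triple = simulate_triple(rng)
        if max(triple) - min(triple) <= 1:
            hits += 1
    return hits / n_sims


print(estimate_close_chance())
```

With only 20,000 replications the estimate is noisy, since an event this rare turns up just a handful of times. Raising `n_sims` toward a million gives a steadier answer, at the cost of a longer run in pure Python.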
The fit-too-well chances of 3/100,000 for Mendel and 3/10,000 for Burt were small enough to raise eyebrows, but not quite the slam-dunk that the Research 2000 data showed, with chances like 10^-228.