Not random enough
In the story about Research 2000, a polling firm was caught out when statisticians noticed that its data were not random enough. There have been other, more famous cases in which a lack of randomness cast doubt on statistical results.
Gregor Mendel
Gregor Mendel (wiki), from experiments on peas, discovered the basic laws of genetic inheritance (his 1865 paper in English). In 1936, R. A. Fisher, the Babe Ruth of statisticians, burrowed through Mendel's data, assessing the goodness-of-fit of the reported data to the genetic theory. For example, in one set of "bifactorial" experiments, 529 plants were classified according to the genotype of their seeds' form (A = round or a = wrinkled) and color (B = yellow or b = green). (Thus each plant had two form letters and two color letters.) The results (as in Fisher's Table I):
Observed (Theory)    AA        Aa         aa
BB                   38 (1)    60 (2)     28 (1)
Bb                   65 (2)   138 (4)     68 (2)
bb                   35 (1)    67 (2)     30 (1)
Theory says the observations should be in the ratio 1:2:1 for each row and for each column, so the interior of the table should be in the ratios shown in parentheses. Do the data fit the theory? A chi-squared test has observed value 2.8110 on 8 degrees of freedom, which yields a p-value of 0.9457, very large, meaning the data fit very well.
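That chi-squared computation can be reproduced from the table above. Here is a minimal sketch in Python, standard library only; for an even number of degrees of freedom the chi-squared survival function has a closed form, so no extra packages are needed:

```python
import math

def chi2_sf_even_df(x, df):
    # Survival function P(X > x) for a chi-squared variable with EVEN df,
    # using the closed form exp(-x/2) * sum_{k=0}^{df/2 - 1} (x/2)^k / k!.
    half = x / 2.0
    term, total = 1.0, 0.0
    for k in range(df // 2):
        total += term
        term *= half / (k + 1)
    return math.exp(-half) * total

# Observed counts from Fisher's Table I (rows BB, Bb, bb; columns AA, Aa, aa).
observed = [38,  60, 28,
            65, 138, 68,
            35,  67, 30]
# The 1:2:1 row and column ratios give cell weights summing to 16.
ratios = [1, 2, 1,
          2, 4, 2,
          1, 2, 1]
n = sum(observed)                        # 529 plants
expected = [n * r / 16 for r in ratios]  # e.g. 529/16 = 33.0625 for a corner cell

chisq = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
p = chi2_sf_even_df(chisq, df=8)
print(round(chisq, 4), round(p, 4))      # 2.811 0.9457
```

The degrees of freedom are 8 because the 3x3 table has 9 cells and the theoretical ratios are fully specified, leaving only the total fixed.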
Fisher looked at several other chi-squared tests. Here's a summary (from Fisher's Table V):
Experiment           Chi-square statistic   Degrees of freedom   p-value
3:1 ratio: seeds      0.2779                 2                   0.8694
3:1 ratio: plants     1.8610                 5                   0.8680
2:1 ratio: seed       0.5983                 2                   0.7414
2:1 ratio: seed       4.5750                 6                   0.5994
Bifactorial           2.8110                 8                   0.9457
Gametic ratios        3.6730                15                   0.9986
Trifactorial         15.3224               26                   0.9511
Plant variation      12.4870               20                   0.8983
Total                41.6056               84                   0.99997
Note that in every experiment the data fit very well. In fact, the chance that the overall data would fit as well as they did is only 3/100,000. The data fit a little too well for comfort. Fisher's observation brewed up quite a storm, one that still rages. Did Mendel cheat? My opinion is no, but he probably didn't report all his results, e.g., the ones he didn't think fit well. (Note that every experiment fit better than the median fit, which suggests consistency in Mendel's approach.) Nowadays that behavior would be considered a no-no, but back then the standards for statistical experimentation had not yet been established.
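The 3/100,000 figure is the left-tail probability of the total chi-squared statistic: the chance of a fit at least as good as the one observed. A sketch of the arithmetic, again using the closed form available for an even number of degrees of freedom:

```python
import math

def chi2_cdf_even_df(x, df):
    # P(X <= x) for a chi-squared variable with EVEN df, via the closed-form
    # survival function exp(-x/2) * sum_{k=0}^{df/2 - 1} (x/2)^k / k!.
    half = x / 2.0
    term, total = 1.0, 0.0
    for k in range(df // 2):
        total += term
        term *= half / (k + 1)
    return 1.0 - math.exp(-half) * total

# Chance of a total chi-squared of at most 41.6056 on 84 degrees of freedom,
# i.e. a fit at least as good as Mendel's aggregate fit:
left_tail = chi2_cdf_even_df(41.6056, 84)
print(left_tail)   # roughly 3e-5, i.e. about 3/100,000
```

This is just 1 minus the 0.99997 p-value in the Total row of the table.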
An interesting paper, "A Statistical Model to Explain the Mendel-Fisher Controversy" by Ana M. Pires and João A. Branco (Statistical Science, 2010, Vol. 25, pp. 545-565), presents a model that may explain Mendel's results. Briefly, the idea is that "the data to be presented can be modeled by assuming that an experiment is repeated whenever its p-value is smaller than α, where 0 ≤ α ≤ 1 is a parameter fixed by the experimenter, and then only the one with the largest p-value is reported." Mendel's reported data fit this model very well, suggesting that something close to this process, whether formal or not, could have been followed by Mendel.
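The selection rule is easy to simulate. A minimal sketch, under two assumptions not spelled out above: that p-values are Uniform(0, 1) under the null hypothesis, and that an experiment falling below α is repeated just once (the paper's model is more general; the α values below are arbitrary illustrations):

```python
import random

def reported_p(alpha, rng):
    # Under the null, a p-value is Uniform(0, 1).  In this sketch of the
    # selection rule, an experiment whose p-value falls below alpha is
    # repeated once, and the larger of the two p-values is reported.
    p1 = rng.random()
    if p1 >= alpha:
        return p1
    return max(p1, rng.random())

rng = random.Random(0)
for alpha in (0.0, 0.2, 1.0):
    mean_p = sum(reported_p(alpha, rng) for _ in range(100_000)) / 100_000
    print(alpha, round(mean_p, 3))
```

With α = 0 nothing is ever repeated and the mean reported p-value is about 0.5; with α = 1 every experiment is run twice and the mean rises toward 2/3. That upward drift in reported p-values is exactly the direction of Mendel's too-good-to-be-true fits.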
Cyril Burt
Another British scientist, the educational psychologist Cyril Burt, has had his research questioned because, among other things, the data are not random enough. See The Cyril Burt Affair. A key element of Burt's theory rested on data he collected on identical twins reared separately (so they'd have the same genetics but different environments). The correlation coefficient between the IQs of the two individuals in 53 such pairs of twins was 0.771. Very high! The data for the 53 pairs were collected over time, and the correlation was updated as more data came in. The three stages yielded the following:
Year reported   # Twins so far   Correlation coefficient so far
1943            15               0.770
1955            21               0.771
1966            53               0.771
That is, the 15 are part of the 21, and the 21 are part of the 53. What's suspicious is that adding 6 pairs moved the correlation coefficient by only 0.001, and the subsequent 32 additional pairs didn't move it at all! Did he just make up those extra twins to reinforce his theory? The controversy still rages. But there is still the question of how unlikely it is to get such closely agreeing correlations in such a situation without cheating. How about a simulation? First, I simulated 53 bivariate normal observations from a population with correlation 0.771; calculated the correlation coefficient for the first 15 observations, the first 21 observations, and the complete set of 53 observations; then rounded the three values to three decimal places. I did this a million times. The first few sets of coefficients:
0.838 0.722 0.800
0.767 0.821 0.786
0.792 0.805 0.819
0.816 0.786 0.847
0.760 0.756 0.753
What is the chance that the three are as close to each other as (0.770, 0.771, 0.771)? Of the million simulated triples, 47 were exactly equal (e.g., 0.749, 0.749, 0.749), and another 247 were just as close as the observed triple (e.g., 0.832, 0.833, 0.833 and 0.796, 0.797, 0.797). Thus the answer is estimated to be 294/1,000,000, i.e., about 3/10,000. The chance is very small, but not infinitesimal.
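The simulation can be sketched as follows. This is a smaller run of 100,000 repetitions using numpy rather than a million, and the closeness criterion here (a range of at most 0.001 after rounding) is one reading of "as close as"; it may admit slightly more patterns than the count above, so if anything it overestimates the chance:

```python
import numpy as np

rng = np.random.default_rng(0)
rho, n, reps = 0.771, 53, 100_000

# Bivariate normal pairs with population correlation rho:
z1 = rng.standard_normal((reps, n))
z2 = rng.standard_normal((reps, n))
x = z1
y = rho * z1 + np.sqrt(1 - rho**2) * z2   # Corr(x, y) = rho

def prefix_corr(x, y, m):
    # Vectorized Pearson correlation of the first m columns of each row.
    xc = x[:, :m] - x[:, :m].mean(axis=1, keepdims=True)
    yc = y[:, :m] - y[:, :m].mean(axis=1, keepdims=True)
    return (xc * yc).sum(axis=1) / np.sqrt(
        (xc**2).sum(axis=1) * (yc**2).sum(axis=1))

# Correlations after 15, 21, and 53 pairs, rounded to three decimals:
r = np.column_stack([prefix_corr(x, y, m) for m in (15, 21, 53)]).round(3)

# Count triples whose three rounded values span at most 0.001:
close = (r.max(axis=1) - r.min(axis=1)) <= 0.001 + 1e-12
print(close.mean())   # a small number, comparable to the 3/10,000 above
```

Note the prefixes share observations, so the three correlations are positively dependent; the simulation accounts for that automatically, which is why simulating is easier than computing the chance analytically.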
The fit-too-well chances of 3/100,000 for Mendel and 3/10,000 for Burt were small enough to raise eyebrows, but not quite the slam dunk of the Research 2000 data, where the chances were more like 10^{-228}.