THE CREDENTIALS OF SCIENTIFIC EVIDENCE: REBALANCING THE EPISTEMOLOGICAL SCALES
9.2.6 Fisher’s fixation and enduring legacy
Abstract
This commentary provides a historical and biographical critique of Sir Ronald Fisher. It sets out the evidence and opinions indicating that his preoccupation with frequentist statistics has impeded the progress of medical science, because frequentism disregards basic science and causality. The paper details Fisher’s aggressive promotion of frequentism and P values and his simultaneous dismissal of Bayesian probability. His misdirection is illustrated by his strident, lifelong refusal to accept the causal link between smoking and lung cancer. The paper also contrasts Fisher’s high profile with the secret war-work of two equally eminent scientists, Tukey and Turing. They used Bayesian techniques successfully in World War II to protect the transatlantic convoys and to crack the Enigma code. Because of that very success, the methods were kept classified under official secrets acts for decades. This secrecy greatly retarded the adoption of Bayesian methods in science and medicine and allowed frequentist methods to dominate unchallenged. Thus, statistics remained stuck in what Tukey called Fisher’s ‘childhood of experimental statistics.’ This has occurred despite numerous eminent scientists having repeatedly pointed out the extensive problems of the frequentist approach, which is inextricably entwined with RCT methodology. The paper concludes that progressing beyond Fisher’s legacy is essential for developing the Bayesian and causal inference methods that are necessary tools for advancing scientific discovery. That requires experiments aimed at elucidating causes and mechanisms rather than the superficial activity of merely comparing one drug with another; it means asking ‘how and why’, not just ‘what’ [1].
Introduction
One might consider that Fisher’s fixation on frequentism, and the legacy of P values, is an albatross that still hangs around the neck of medical science. For those who regard such a statement as heresy, the references cited herein show that many other eminent scientists and statisticians agree with it. I am not citing fringe researchers and obscure papers: many of the key papers cited here have been cited thousands of times, which puts them in the top 0.01% of all published papers.
Tukey (1915-2000), an eminent American statistician (FRS, Princeton), said Fisher’s ideas ‘emanated from the world of infancy, the childhood of experimental statistics’. This chasm-like dichotomy of views relates directly to Bayes’ theorem.
Fisher was ‘the’ statistical frequentist, and has been elevated (by some) to the status of one of the greatest scientists of the 20th century.
Fisher’s background
Although Fisher (1890-1962) was regarded as a great statistician, he was perceived by many as a disputatious and disagreeable character. It is not ad hominem to note that Fisher had egregious personality traits and behaviours, some of which were deplorable (cf. Armitage), and that these aid in understanding his ideas and the single-minded way he pursued his objectives.
He left his family in straitened circumstances on their ‘farm’, which he rarely visited, according to McGrayne, although other sources are less condemning. Nevertheless, the marriage ended in 1943, perhaps due to financial strain, professional pressures, and his personality. After the divorce, his wife appears to have managed with difficulty, with some support from her own family and others, but none from Fisher. Colleagues said he was obsessed with ambition and driven by personal bitterness against many of the people with whom he rubbed shoulders [2], including his fellow frequentist statisticians. Making enemies was his favourite hobby. Some have expressed the view that he was charming and kind to colleagues; however, these were junior and non-threatening colleagues, and this pleasantness did not extend to those who were his equals and who threatened his dominance. This psychological aspect, and his capriciousness, explain the divergent opinions expressed about him; see, e.g., Bodmer [3].
He was unable to accept the data, put forward by Sir Austin Bradford Hill and Sir Richard Doll, indicating that smoking caused lung cancer. This illustrates two points: his difficult personality (shared by some of his acolytes, and his daughter), and his incongruous denial of prior probability; he was blind to the fact that he himself used it (see 9.2.3).
Armitage stated (see Bodmer [3]) that at a meeting at the National Institutes of Health in the late 1950s, at which Fisher gave a seminar on smoking and lung cancer…
he was really quite appalling to people of more seniority against whom he had a grievance. He actually said there that Bradford Hill did not deserve to have been made a Fellow of the Royal Society. That seems an incredibly vicious sort of remark to me.
As a measure of how obsessed and unpleasant Fisher and some of his colleagues were (birds of a feather flock together), it may be noted that they sustained a barrage of vitriolic attacks against Hill and Doll, including books and articles, and even accused them of scientific dishonesty, to the extent that Hill and Doll mooted suing Fisher for libel [2].
In relation to the dispute Armitage stated:
The records we have examined suggest that this [Fisher’s ‘accusation’] is a mischievous misrepresentation of the facts
This may be seen as an example of how Fisher created dispute and antipathy amongst many with whom he interacted, which adds further weight to Stolley’s [4] observation that he was at heart ‘a confrontational polemicist’, and a dishonest one at that.
If one considers what the likely explanations connecting smoking with cancer might be, in the light of knowledge of biology, medicine etc., then Fisher’s hypothesis (that genes cause smoking) becomes transparently untenable — when Doll pointed out (in the discussion [3]) that genetic factors could not have accounted for the increased death rate with increasing number of cigarettes smoked, the invalidity of Fisher’s hypothesis became obvious. Indeed, the various pieces of evidence contradicting Fisher’s ‘genes-cause-smoking’ idea are so glaringly obvious that one must conclude Fisher was being disingenuous and tendentious in advancing these arguments.
Disingenuous and tendentious are the two words that best sum up Fisher’s attitudes
The frequentist position assumes that it is dealing with supposedly ‘pure data’, something that does not exist in reality; nor does it take account of the pre-existing knowledge contained in science and biology, as illustrated above. Such considerations help us to understand what Tukey meant by his criticisms of Fisher.
One might note that his daughter wrote his hagiography, which was for a long time the only biography; hence the sanitised image of him that was promulgated.
Fisher was even at odds with his senior fellow frequentist, Karl Pearson. Indeed, it was his well-known antipathy to Pearson (1857-1936) that caused London University to create a new chair, and department, which they called ‘Eugenics’ (yes, you read that correctly), to attract him to a professorship. They knew he would refuse to work in the same department as Pearson. Although the two occupied adjacent floors of the same building, neither they nor their associates were on civil terms. The two groups even took tea in the common room at different times, so that they would not cross each other’s paths. Hence the in-house joke: ‘What is the collective noun for a group of statisticians? A quarrel.’ Some situation: two separate groups of frequentists at each other’s throats, whilst battling the Bayesians with equal vitriol. Fisher fomented a turbulent and toxic atmosphere in his department and infected many of those around him.
Fisher worked as a consultant to the tobacco industry, and he was a heavy smoker. He actively advocated eugenics and was a member of the London Eugenics Society for 20 years; he was professor and head of the department of eugenics at University College London from 1933 to 1943 [it was not really a eugenics department]. He sired eight children, which relates to his views about eugenics and the necessity for superior specimens of humanity to produce more offspring.
He was editor of the journal Annals of Eugenics from 1934 to 1954 (after Pearson, 1925-1933); for many years he was also involved with the Society for Psychical Research, which tells one something. Eugenics and psychical research are certainly ‘red flags’: I wonder if he went searching for the Cottingley fairies with Sir Arthur Conan Doyle?
One should also note his comments on the statement that UNESCO made on The Nature of Race and Racial Differences. He said:
One fundamental objection … destroys the very spirit of the whole document…. human groups differ profoundly in their innate capacity for intellectual and emotional development. …
Although his views, as he might have wished them to come across after further reflection, may have been a little more nuanced than the above suggests, that statement is still revealing: the word ‘profoundly’ hardly suggests a balanced and objective scientific assessment. Remember that, as Stolley [4] notes, ‘he was at heart a confrontational polemicist’.
In the final analysis, what is clear is that Fisher had ideas which were implausible in real life, especially in the light of knowledge of biology or pharmacology. Among his many and great achievements were ANOVA and the promulgation of P values. But, as Tukey said, his ideas ‘emanated from the world of infancy, the childhood of experimental statistics’ (see below).
In ‘When genius errs: RA Fisher and the lung cancer controversy’, Stolley [4] said that Fisher had trouble distinguishing the possible from the probable. His judgements were dictated by his misconceptions, prejudices, and antipathies.
Fisher’s daughter (his biographer) married a statistician called Box. He soon divorced her — at that time divorce was difficult and represented a more extreme course of action — stating that it was because she had inherited a temper like her father’s!
John Tukey, a professor at Princeton University (further interesting details are in McGrayne’s book [2]), coined the terms ‘bit’ and ‘software’ and used Bayes’ theorem extensively. He was deeply involved in secret work during the war years and after, in which he used Bayesian techniques to facilitate, inter alia, code-breaking throughout WW2 and the Cold War. Because those techniques were used in secret government projects, he was obliged to pretend not to be a Bayesian; this mirrored what happened in the UK (Alan Turing and Enigma).
Tukey made severe and scathing criticisms of Fisher; he described his frequency-based ideas with these words [2]:
emanating from the world of infancy, the childhood of experimental statistics, the childhood spent in the school of agronomy … almost invariably when closely inspected data are found to violate the standard assumptions required by frequentists. … far better [is] an approximate answer to the right question, which is often vague, than an exact answer to the wrong question, which can always be made precise. … by and large the great innovations in frequency-based statistics have not had correspondingly great effects on data analysis
NB. These are all highly cited and influential papers, not obscure crank rants: Meehl has been cited 3,000 times, Lykken 1,800, Rozeboom 1,000, Rothman 500, Killeen 600, and Levine 200.
Meehl stated [5]:
I suggest to you that Sir Ronald [Fisher] has befuddled us, mesmerized us, and led us down the primrose path. I believe that the almost exclusive reliance on merely refuting the null hypothesis as the standard method for corroborating substantive theories is a terrible mistake, is basically unsound, poor scientific strategy, and one of the worst things that ever happened in the history of psychology. I am not making some nit-picking statistician’s correction. I am saying that the whole business is so radically defective as to be scientifically almost pointless.
Levine [6]:
Heated debate between Neyman–Pearson and Fisher ensued. Both sides saw their models as superior and incompatible with the other’s approach
Other examples of opinions about Fisher:
Rozeboom [7]: [Statistical significance testing] is based upon a fundamental misunderstanding of the nature of rational inference and is seldom if ever appropriate to the aims of scientific research
Rothman [8]: Testing for statistical significance continues today not on its merits as a methodological tool but on the momentum of tradition. Rather than serving as a thinker’s tool, it has become for some a clumsy substitute for thought, subverting what should be a contemplative exercise into an algorithm prone to error
Killeen [9]: Our unfortunate historical commitment to significance tests forces us to rephrase good questions in the negative, attempt to reject those nullities, and be left with nothing we can logically say about the questions
Lykken [10]: Statistical significance is perhaps the least important attribute of a good experiment; it is never a sufficient condition for claiming that a theory has been usefully corroborated, that a meaningful empirical fact has been established, or that an experimental report ought to be published
I feel impelled to add this comment to end this list, from the famous nuclear physicist Lord Rutherford:
If your experiment needs statistics, then you should have done a better experiment
And he would have been in agreement with the English psychiatrist William Sargant, who said:
We are never going to learn how to treat depression in an MRC statistician’s office
This reflects the conclusion I have reached in this series of commentaries (cf. 9.2.5 on Placebo): an understanding of epistemology indicates that good clinical observation of treatment effects has distinct advantages compared to RCTs, because it is less influenced by regression to the mean (RTM) and utilises both cause-effect relationships and real-world outcomes. These are non-trivial advantages.
References
- Deaton, A. and N. Cartwright, Reflections on Randomized Control Trials. Soc Sci Med, 2018. 210: p. 86-90.
- McGrayne, S.B., The Theory That Would Not Die: How Bayes’ Rule Cracked the Enigma Code. 2011: Yale University Press.
- Bodmer, W., et al., Fisher and Bradford Hill: theory and pragmatism? IJE, 2003. 32(6): p. 945-48.
- Stolley, P.D., When genius errs: RA Fisher and the lung cancer controversy. American Journal of Epidemiology, 1991. 133(5): p. 416-425.
- Meehl, P.E., Theoretical risks and tabular asterisks: Sir Karl, Sir Ronald, and the slow progress of soft psychology. 1992.
- Levine, T.R., et al., A critical assessment of null hypothesis significance testing in quantitative communication research. Human Communication Research, 2008. 34(2): p. 171-187.
- Rozeboom, W.W., The fallacy of the null-hypothesis significance test. Psychological Bulletin, 1960. 57(5): p. 416.
- Rothman, K.J., Causal inference in epidemiology. Modern Epidemiology, 1986: p. 7-21.
- Killeen, P.R., An alternative to null-hypothesis significance tests. Psychological Science, 2005. 16(5): p. 345-353.
- Lykken, D.T., Statistical significance in psychological research. Psychol Bull, 1968. 70(3): p. 151-9.
