THE CREDENTIALS OF SCIENTIFIC EVIDENCE: REBALANCING THE EPISTEMOLOGICAL SCALES
9.2.5 The placebo effect: meaning or muddle? phenomenon or phantasm?
Abstract
This commentary deconstructs the concept of the ‘placebo effect’, arguing that it is a scientifically imprecise muddle, not a unitary phenomenon. It reviews the foundational literature, starting with Beecher’s 1955 paper, which distinguished subjective symptoms from objective change. It contends that much of what is labelled as the placebo response is actually attributable to the statistical artifact called regression to the mean (RTM), to spontaneous remission, and to the fluctuating course of illness. Systematic reviews find no significant placebo effects when objective outcomes are used, supporting the idea that the concept is largely a phantasm. Consequently, reliance on the ‘magic wand’ of placebo control should be reduced in favour of other methodologies, rigorous clinical judgment involving objective end points, and an understanding of causal mechanisms.
Introduction
The ‘placebo effect’ (whenever I use this term, imagine it in air quotes) is an imprecisely defined concept. It is invoked as if it were some kind of magical spell: if the magic wand of ‘placebo control’ has been waved over a trial, its results are given more credence, but without sufficient justification or critical analysis.
I have a predilection for tongue-in-cheek papers which poke fun at researchers in an educative manner; I was therefore attracted by the title of this paper [1]: ‘How to prove that your therapy is effective, even when it is not: a guideline’. It reminded me of one of the lectures I gave to psychiatry trainees in the UK in the early part of my career, entitled ‘Baked-bean therapy’. It was, of course, an illustration of how to arrange an RCT to obtain a positive result, which is oh so easy to do. I had clean forgotten, until I wrote this, how early in my research career I recognised this problem.
Definitions and analysis
Like many notions in ‘pop psychology’, placebo rests on uncertain foundations, as a reconsideration of Henry Beecher’s foundational 1955 paper ‘The Powerful Placebo’ [2] reveals. Beecher’s paper, as Kienle opined [3], was beset by:
‘… poor scholarship, misquotation, uncritical reporting of anecdotes, and inclusion of studies in which no placebo was given’
But what also strikes one forcibly is that Beecher made a clear distinction between subjective and objective changes. Much of what he wrote was about the subjective experience of pain (he was an anaesthetist). Whilst Kienle had a point in saying that Beecher overemphasised the power of the placebo, there are nevertheless useful observations and discussions in Beecher’s material, and it is noteworthy that his view was that the effect was manifest in subjective symptoms but not in objective changes.
The notion of the placebo effect is mired in conceptual and methodological problems
The notion of the placebo effect is explained by several separate processes and phenomena, which renders it meaningless as a unitary concept. These include regression to the mean (RTM), waxing and waning in the natural course of illness, patient and clinician expectations, non-specific therapeutic effects (illness education, empowerment), and the recruitment of inappropriate patients. These and other problems have been discussed by many authors [4-9].
Hróbjartsson’s thorough review [10] suggests three main meanings for the placebo effect:
1 change after a placebo intervention (temporal)
2 the effect of a placebo intervention (supposedly causal)
3 the effect of the patient-provider interaction
It is hard to improve on his well-expressed arguments; accordingly, I quote him extensively:
I will argue that despite common use, the three meanings are either not interesting to placebo research or so vague that no clearly delineated group of interventions can be defined
At least for some researchers, the term has clearly lost any implication of causal relation to a placebo treatment, and it is synonymous with change after placebo intervention, thus implying only a temporal relation
There is a confusion between causal and temporal associations
So, at present there is no commonly accepted definition of placebo, and it has even been argued that placebo cannot be defined in any logical way [11]
Hróbjartsson and Gøtzsche performed a systematic review of 114 randomized trials with a placebo group and a no-treatment group [12, 13] and found no significant effects of placebo with binary or objective outcomes.
Hróbjartsson concludes:
Thus, there is no evidence of a general and clinically important effect of placebo interventions. … Generally, the conceptual and methodological confusion in the field of placebo is of such a magnitude that references to placebo effects are incomprehensible without further clarification. It might be time to stop using the term placebo effect and instead specify which kind of intervention one is referring to, and how its effect was measured.
That agrees with the view that the more objective the outcome, and the longer the follow-up, the smaller the apparent placebo effect.
For substantial useful clinical effects, assessed by objective outcomes of meaningful clinical end points, the concept of the placebo response is irrelevant
Tens of thousands of patients have been treated with placebo when the validity and meaning of that concept are dubious, which raises serious ethical and methodological concerns. The fact that such a large proportion of studies are funded by pharmaceutical companies has inhibited the development of other trial methodologies and instruments of assessment (cf. criticism of the HDRS). That is regrettable. Regulatory agencies such as the EMA and the FDA have also played a part by failing to encourage and support the development of non-RCT methodologies and methods of assessment.
Hróbjartsson and Gøtzsche reviewed 114 RCTs and found no significant effects of placebo with binary or objective outcomes
Regression to the mean (RTM)
In clinical scenarios such as the treatment of heterogeneous syndromes, including depression, which involve small changes in subjective outcome assessments, regression to the mean is arguably the major factor implicated in apparent placebo improvements [14, 15].
The statistical term regression to the mean (RTM) describes the observation that the further an initial measurement deviates from the mean (average) for the group concerned, the more likely it is to move in the direction of the mean when subsequent measurements are carried out.
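The definition can be made concrete with a minimal simulation sketch. It is not taken from any of the cited papers, and every number in it is an illustrative assumption: each simulated patient has a stable underlying severity, each measurement adds independent noise, and only patients whose first score reaches an entry threshold are kept. No treatment is applied at any point.

```python
import random
import statistics

def simulate_rtm(n=20000, true_mean=20.0, true_sd=4.0, noise_sd=4.0,
                 threshold=25.0, seed=1):
    """Simulate two untreated measurements per patient and select on the first.

    All parameter values are hypothetical and chosen only for illustration.
    """
    rng = random.Random(seed)
    baseline, follow_up = [], []
    for _ in range(n):
        true_score = rng.gauss(true_mean, true_sd)       # stable underlying severity
        first = true_score + rng.gauss(0.0, noise_sd)    # screening measurement
        second = true_score + rng.gauss(0.0, noise_sd)   # later measurement, no intervention
        if first >= threshold:                           # entry criterion selects high scorers
            baseline.append(first)
            follow_up.append(second)
    return statistics.mean(baseline), statistics.mean(follow_up)

baseline_mean, follow_up_mean = simulate_rtm()
print(f"selected baseline mean: {baseline_mean:.1f}")
print(f"follow-up mean:         {follow_up_mean:.1f}")
print(f"apparent 'improvement': {baseline_mean - follow_up_mean:.1f}")
```

Under these assumptions the selected group's mean falls by roughly four points at the second measurement, although nothing was done to the patients; the fall is produced entirely by selecting on a noisy score.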
A cogent criticism of Beecher from a statistical standpoint has been made by McDonald [15]:
[Beecher] reported that 35.2 per cent of the pooled population of patients from all these studies improved after placebo therapy. This is neither the average improvement per patient nor the average direction of change. The percentage reported is the number of patients who improved divided by the number treated, and contains no information about patients who worsened under placebo treatment. … The magnitude of improvement and the number of patients who worsened under placebo therapy were not reported in many of the papers of that era
Hengartner [16] specifically examined the effect in acute depression studies, and assessed whether there was a genuine placebo effect by reassessing RTM and spontaneous remission:
Nevertheless, it follows that the placebo effect in antidepressant trials is largely (though not entirely) a methodological artefact, and that the symptom reduction seen in placebo recipients is mostly due to both regression to the mean, and spontaneous remission.
Hengartner concludes:
In order to reduce bias due to regression to the mean and to arrive at efficacy estimates that are more likely to reflect true treatment effects, researchers should use hard (objective) real-world outcomes on which the sample was initially not selected for, such as, for instance, employment rates or receipt of disability benefits.
The ‘placebo effect’ in depression studies is mostly due to RTM and spontaneous remission
Trials of antidepressants (ADs) maximise RTM for three reasons: the initial sample is selected for extreme values, the same values are assessed repeatedly over time, and measurement reliability is low. Over 30 years (~1980-2010) treatment/placebo differences have decreased by ~3 HAMD points [17], and they were higher in trials that lasted 6 weeks or less. All these factors tend to maximise RTM, especially in illnesses that show short-term temporal variation.
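The roles of extreme selection and low reliability can be made explicit with the standard textbook expression for RTM. This is a sketch under simplifying assumptions, and the symbols are introduced here, not taken from the cited papers: two measurements X_1 and X_2 of the same untreated patient are assumed to be bivariate normal with a common mean μ, a common standard deviation, and test-retest correlation ρ.

```latex
\mathbb{E}[X_2 \mid X_1 = x] = \mu + \rho\,(x - \mu)
\quad\Longrightarrow\quad
\mathbb{E}[X_1 - X_2 \mid X_1 = x] = (1 - \rho)\,(x - \mu)
```

The expected untreated fall therefore grows with the distance of the entry score from the mean (selection for extreme values) and with unreliability (small ρ). For example, with ρ = 0.5, a patient who enters 8 points above the population mean would be expected to score 4 points lower on re-measurement without any intervention.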
RTM also applies in reverse, as McDonald’s serial measurements of biochemical parameters demonstrate, thereby also demonstrating the lack of a cause-effect relationship to the ‘intervention’. For details, see McDonald’s explanation ([15] p421-22).
Furthermore, studies which have a ‘no-treatment’ arm show the same separation from active treatment as studies with a placebo arm, thereby further reinforcing that the explanation is RTM [18].
More pop psychology
Another poorly defined ‘pop psychology’ concept has been invoked and conjoined: the meaning-deficient concept called the Hawthorne effect, which has been relegated to the status of a myth [19, 20]. It is analogous to invoking telepathy to explain ghosts, and it is an example of the reproducibility crisis that has beset psychology for decades. Remember the mantra: if it is not replicated, it is not (yet) science [21-23].
There has been little penetrating research about placebos. Two crucial aspects of the notion of the placebo response have received insufficient attention or recognition: first, is there a difference in magnitude between objective and subjective measures of outcome; second, is it an enduring response? There is convincing evidence that the effect is larger for subjective symptoms than for objectively measured outcomes [3, 24, 25]; this was further strongly substantiated by the meta-analysis of 114 studies by Hróbjartsson [12, 13].
There is a profound implication resulting from the conclusion that RTM is the main explanation for the notion of placebo response. It is this: how homogeneous is the condition being considered, and how much do we know about its natural history? If the natural history of the condition is one of chronicity without substantial short-term remission, then RTM is of little consequence. However, if the natural history is one of waxing and waning over shorter periods of time, then RTM is a major determinant of (apparent) change. This problem is compounded by the use of subjective rating scales as outcome measures, because changes in subjective symptoms do not accurately reflect changes in the underlying illness pathology. This consideration has received insufficient attention.
As Marder explains [26], active drugs have not separated from placebos in phase 3 trials, even though the effect size in the drug group has been the same as in phase 2 trials. This is because the placebo response has become greater in phase 3 trials. He notes that the factors influencing this are less experienced sites, less experienced clinical raters, increases in population heterogeneity, and the inclusion of sites contributing few subjects.
Professional patients in CRO trials
The placebo effect is further distorted by the fact that there are undoubtedly patients in trials who have been randomized to the ‘drug’ group, but who have no drug in their system, a factor that is infrequently measured or accounted for. This is especially concerning now that so many trials are managed by contract research organisations (CROs) and populated by ‘professional patients’, some of whom are frauds simulating illness [27] to receive the remuneration offered. Registries have even been set up to try to identify and exclude ‘fraudulent’ patients [26].
RTM and AD trials over decades
More severe illnesses of greater duration that lead to hospital admission, which are more frequently represented in the old trials of TCAs, show less short-term temporal variation and hence a lesser short- to medium-term RTM tendency; one assumes that illnesses showing persistent and severe symptoms are more likely to lead to hospital admission (cf. Marder [26]). When an effective treatment is tested in severe depression it will separate from the control group (be that placebo, or a less effective drug) more decisively because RTM will be minimal.
Milder illnesses are more likely to present for ‘non-hospital’ treatment at a peak of symptom severity and therefore are likely to exhibit a greater tendency to RTM, especially over the timeframe of a typical clinical trial, thereby showing smaller differences to placebo — because the dominant mechanism in both is RTM.
This analysis also indicates the importance of the previous pattern of illness, because failing to control for it invites difficulties (cf. McDonald), especially those resulting from Simpson’s paradox; few RCTs are able to control for it.
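Simpson’s paradox is easier to grasp with numbers. The sketch below uses entirely hypothetical counts, invented for illustration and not drawn from any trial: patients are split by prior illness pattern, and the two arms happen to contain very different proportions of each pattern.

```python
# Hypothetical counts (remitted, total), invented purely for illustration.
trial = {
    "fluctuating course": {"drug": (18, 20), "placebo": (64, 80)},
    "chronic course":     {"drug": (24, 80), "placebo": (4, 20)},
}

def rate(remitted, total):
    """Proportion of patients who remitted."""
    return remitted / total

def pooled_rate(arm):
    """Remission rate for one arm when the illness-pattern strata are ignored."""
    remitted = sum(stratum[arm][0] for stratum in trial.values())
    total = sum(stratum[arm][1] for stratum in trial.values())
    return rate(remitted, total)

for name, stratum in trial.items():
    print(f"{name}: drug {rate(*stratum['drug']):.0%}, "
          f"placebo {rate(*stratum['placebo']):.0%}")
print(f"pooled: drug {pooled_rate('drug'):.0%}, placebo {pooled_rate('placebo'):.0%}")
```

With these invented figures the drug does better within each stratum (90% vs 80%, and 30% vs 20%) yet worse when the strata are pooled (42% vs 68%), simply because the drug arm contains more chronic-course patients. Randomisation makes such an imbalance between arms unlikely in large trials; the sketch only shows what can happen when the prior pattern of illness is unevenly distributed and ignored.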
Resultant considerations
The notion of placebo is an ill-defined concept that obscures as much as it illuminates; this analysis indicates that it frequently misleads. The more objective the outcome measures, the longer the duration of the intervention, and the greater the intervention effect, the less relevant placebo is. A corollary concerns P values: the less significant the P value, the more problematic the placebo effect is for interpreting small differences (cf. Bayes: the lower the likelihood of the proposition being true before the trial, the less weight a typical P value adds to the conclusion).
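The Bayesian point can be illustrated with a simplified calculation. The sketch below is an assumption-laden illustration, not the author's method: it treats the evidence as the bare fact that a trial reached significance at level `alpha`, and it assumes a fixed `power`; the function name and default values are hypothetical.

```python
def prob_effect_is_real(prior, power=0.8, alpha=0.05):
    """Posterior probability that an effect is real, given a 'significant' result.

    Bayes' rule with two hypotheses: the effect is real (detected with
    probability `power`) or absent (falsely 'detected' with probability `alpha`).
    """
    true_positive = power * prior
    false_positive = alpha * (1.0 - prior)
    return true_positive / (true_positive + false_positive)

for prior in (0.5, 0.1, 0.01):
    print(f"prior {prior:.2f} -> posterior {prob_effect_is_real(prior):.2f}")
```

Under these assumptions a significant result raises a prior of 0.5 to about 0.94, a prior of 0.1 to only 0.64, and a prior of 0.01 to about 0.14, so the same P value carries much less weight when the proposition was implausible beforehand.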
So-called placebo effects are only relevant for subjective symptoms showing minor changes in severity that are clinically insignificant
As far as the ‘hard’ long-term outcomes of serious illnesses are concerned, the placebo response resembles the morning mist. It is a prime example of an ill-defined concept misused over decades, and it has had extensive negative consequences for clinical trial methodologies and results [26]. Worse still, it has encouraged much follow-up research of inconsequential results; it has thus caused a great waste of research money and effort, and researchers have spent much time chasing mirages.
One of the definitions of placebo that Hróbjartsson identified is ‘change after a placebo intervention (temporal)’. This alerts us to a critical consideration, namely cause-effect relationships and a theory of mechanism. Science dictates that there must be a specific mechanism for a ‘real’ effect. That being so, the time interval between the intervention and the outcome must be governed by that mechanism, and the outcome must therefore occur only within a defined and pre-specified time interval. A change which is spread ‘randomly’ over a longer time, like a placebo response (or spontaneous remission), is difficult to explain via any specific mechanism; that fact alone should cast doubt on the validity of the placebo response. As Hróbjartsson states, ‘there is a confusion between causal and temporal associations’, which from an epistemological viewpoint is a serious mistake. The crucial concept of cause-effect relationships and mechanisms, in relation to the timing of the measured response, is further elaborated in another part of this series of commentaries (cf. Pearl 9.2.7).
A concluding observation is that the above considerations give further substantive foundation to the proposition that clinical experience and judgement should be given a higher priority in comparison to RCTs. Two key points from Hengartner’s paper are that there should be serial assessments prior to initiation of treatment (to minimise the selection of transient illnesses and placebo responders, i.e. RTM) and that the outcome measure should be independent of the variables used to select the sample. In addition, the outcome measure should be an objective end point, not a subjective surrogate measure.
Usual specialist clinic practice generally conforms to these dictates much more closely than do RCTs, because when patients come to specialists, they are more likely to have been consistently ill for longer, to have failed previous treatment, and to be more severely ill. Also, clinicians will be more likely to judge outcome not by rating scales, but by using objective endpoints, like patients leaving hospital, getting back to work, or functioning in everyday life (work, social, and leisure).
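The value of assessing patients more than once before treatment can be illustrated with a minimal simulation sketch. It is one simple reading of that recommendation, not Hengartner's own analysis, and all numbers are hypothetical: patients are selected on a screening score, and the baseline is either that same score or a separate pre-treatment measurement. No treatment is given in either case.

```python
import random
import statistics

def apparent_change(separate_baseline, n=40000, noise_sd=4.0,
                    threshold=25.0, seed=2):
    """Mean untreated 'improvement' (baseline minus follow-up) among selected patients.

    All parameter values are hypothetical and chosen only for illustration.
    """
    rng = random.Random(seed)
    changes = []
    for _ in range(n):
        true_score = rng.gauss(20.0, 4.0)                     # stable underlying severity
        screening = true_score + rng.gauss(0.0, noise_sd)     # used to select the sample
        if screening < threshold:
            continue                                          # not enrolled
        if separate_baseline:
            baseline = true_score + rng.gauss(0.0, noise_sd)  # second pre-treatment visit
        else:
            baseline = screening                              # selection score reused as baseline
        follow_up = true_score + rng.gauss(0.0, noise_sd)     # no treatment given
        changes.append(baseline - follow_up)
    return statistics.mean(changes)

single = apparent_change(separate_baseline=False)
serial = apparent_change(separate_baseline=True)
print(f"selection score reused as baseline: {single:.1f}")
print(f"separate baseline measurement:      {serial:.1f}")
```

Under these assumptions, reusing the selection score as the baseline produces a spurious improvement of several points, whereas taking the baseline at a separate visit leaves an average change close to zero, because the baseline is then no more extreme than the follow-up measurement.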
The above analysis therefore constitutes a strong argument that clinical judgement and clinical experience have sounder epistemological foundations than conclusions drawn from RCTs: therefore, clinicians should trust their clinical judgement more and not base their conclusions and treatments so much on the results of RCTs.
References
- Cuijpers, P. and I. Cristea, How to prove that your therapy is effective, even when it is not: a guideline. Epidemiology and psychiatric sciences, 2016. 25(5): p. 428-435.
- Beecher, H.K., The powerful placebo. Journal of the American Medical Association, 1955. 159(17): p. 1602-1606.
- Kienle, G. and H. Kiene, The placebo effect: a scientific critique. Complementary Therapies in Medicine, 1998. 6(1): p. 14-24.
- Freeman, M.P., et al., Guarding the Gate: Remote Structured Assessments to Enhance Enrollment Precision in Depression Trials. J Clin Psychopharmacol, 2017. 37(2): p. 176-181.
- Benedetti, F., Placebo effects: from the neurobiological paradigm to translational implications. Neuron, 2014. 84(3): p. 623-37.
- Moerman, D.E. and W.B. Jonas, Deconstructing the placebo effect and finding the meaning response. Annals of Internal Medicine, 2002. 136(6): p. 471-476.
- Stahl, S.M. and G.D. Greenberg, Placebo response rate is ruining drug development in psychiatry: why is this happening and what can we do about it? Acta Psychiatr Scand, 2019. 139(2): p. 105-107.
- Meissner, K., H. Distel, and U. Mitzdorf, Evidence for placebo effects on physical but not on biochemical outcome parameters: a review of clinical trials. BMC Med, 2007. 5: p. 3.
- Fava, M., et al., The problem of the placebo response in clinical trials for psychiatric disorders: culprits, possible remedies, and a novel study design approach. Psychother Psychosom, 2003. 72(3): p. 115-27.
- Hróbjartsson, A., What are the main methodological problems in the estimation of placebo effects? Journal of Clinical Epidemiology, 2002. 55(5): p. 430-435.
- Gotzsche, P.C., Is there logic in the placebo? Lancet, 1994. 344(8927): p. 925-6.
- Hrobjartsson, A. and P.C. Gotzsche, Is the placebo powerless? Update of a systematic review with 52 new randomized trials comparing placebo with no treatment. J Intern Med, 2004. 256(2): p. 91-100.
- Hrobjartsson, A. and P.C. Gotzsche, Is the placebo powerless? An analysis of clinical trials comparing placebo with no treatment. N Engl J Med, 2001. 344(21): p. 1594-602.
- Morton, V. and D.J. Torgerson, Effect of regression to the mean on decision making in health care. BMJ, 2003. 326(7398): p. 1083-4.
- McDonald, C.J., S.A. Mazzuca, and G.P. McCabe, Jr., How much of the placebo ‘effect’ is really statistical regression? Stat Med, 1983. 2(4): p. 417-27.
- Hengartner, M.P., Is there a genuine placebo effect in acute depression treatments? A reassessment of regression to the mean and spontaneous remission. BMJ Evidence-Based Medicine, 2020. 25(2): p. 46-48.
- Khan, A., et al., Why has the antidepressant-placebo difference in antidepressant clinical trials diminished over the past three decades? CNS Neurosci Ther, 2010. 16(4): p. 217-26.
- Jones, B.D.M., et al., Magnitude of the Placebo Response Across Treatment Modalities Used for Treatment-Resistant Depression in Adults. JAMA Network Open, 2021. 4(9): p. e2125531.
- McCambridge, J., J. Witton, and D.R. Elbourne, Systematic review of the Hawthorne effect: new concepts are needed to study research participation effects. Journal of clinical epidemiology, 2014. 67(3): p. 267-277.
- Kompier, M.A., The “Hawthorne effect” is a myth, but what keeps the story going? Scandinavian Journal of Work, Environment & Health, 2006: p. 402-412.
- Leppink, J. and P. Pérez-Fuster, What is science without replication? Perspect Med Educ, 2016. 5(6): p. 320-322.
- Picho, K., L.A. Maggio, and A.R. Artino, Jr., Science: the slow march of accumulating evidence. Perspect Med Educ, 2016. 5(6): p. 350-353.
- Ioannidis, J.P.A., The Reproducibility Wars: Successful, Unsuccessful, Uninterpretable, Exact, Conceptual, Triangulated, Contested Replication. Clin Chem, 2017. 63(5): p. 943-945.
- Coton, J., et al., Do patients with cystic fibrosis participating in clinical trials demonstrate placebo response? A meta-analysis. Journal of Cystic Fibrosis, 2019.
- Kelley, J.M., et al., Mirror, mirror on the wall: placebo effects that exist only in the eye of the beholder. J Eval Clin Pract, 2009. 15(2): p. 292-8.
- Marder, S.R., T. Laughren, and S.J. Romano, Why Are Innovative Drugs Failing in Phase III? Am J Psychiatry, 2017. 174(9): p. 829-831.
- Rand, L.Z., et al., Impostor Syndrome: Fraudulent Participants in Qualitative Research Can Skew Results. J Law Med Ethics, 2025: p. 1-6.
