Randomized Controlled Trials

Script / Documentation


We hold these truths to be self-evident that … all studies are not created equal. That’s why Just Facts Academy gives you the tools to sort out the crummy studies from the solid ones.

In today’s lesson, we examine the most reliable type of study, see what makes it great, and learn how to spot when these “gold standard” studies go off the rails. Hang on.

Randomized controlled trials, or RCTs, are studies in which people are randomly assigned to receive or not receive a certain treatment.[1] [2] This can mean anything from a drug[3] or welfare benefit[4] [5] to a face mask[6] or educational opportunity.[7] [8] [9]

The defining feature of RCTs is that they control for every factor but the treatment being studied. This is a natural outcome of randomly giving the treatment to some people and not others, ensuring that any major differences between the groups are due to the treatment and not some other variable.[10] [11] [12]
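To see how powerful that is, here’s a minimal Python sketch (with made-up numbers, not data from any real trial) showing how random assignment balances a hidden trait, like age, between two groups:

```python
import random
import statistics

random.seed(1)

# Hypothetical pool of 10,000 subjects with one hidden trait (age)
# that could affect outcomes if it were unevenly distributed.
ages = [random.gauss(50, 12) for _ in range(10_000)]

# Randomly assign each subject to treatment or control.
random.shuffle(ages)
treatment, control = ages[:5_000], ages[5_000:]

print(f"Treatment mean age: {statistics.mean(treatment):.1f}")
print(f"Control mean age:   {statistics.mean(control):.1f}")
# With enough subjects, the two means come out nearly identical, so any
# major difference in outcomes can be attributed to the treatment.
```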

Done properly, RCTs are the “gold standard” for research because they provide “a rigorous tool to examine cause–effect,” which “is not possible with any other study design.”[13] [14]

That’s why, as a medical textbook explains, “a well-designed” RCT “overcomes the major weaknesses of all other types of study designs….”[15]

Take a special note of that phrase: “well-designed.”

First, a well-designed RCT begins with a pre-analysis plan, in which researchers disclose what they plan to measure before they measure it. Pre-analysis plans prevent biased or dishonest researchers from moving the goalposts after results begin to pour in.

Per the journal Epidemiologic Reviews, credible RCTs are designed with “endpoints and case definitions … clearly laid out in advance … to avoid what is sometimes termed ‘data dredging,’ or looking for those outcomes for which significant differences would be found.”[16] [17]
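Here’s a quick, hypothetical Python simulation of why that matters: if researchers test enough outcomes for a treatment that truly does nothing, a few will look “significant” by pure chance:

```python
import math
import random
import statistics

random.seed(7)

def two_sample_p(a, b):
    """Approximate two-sided p-value for a difference in means (z-test)."""
    se = math.sqrt(statistics.variance(a) / len(a) + statistics.variance(b) / len(b))
    z = (statistics.mean(a) - statistics.mean(b)) / se
    return math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail area

# 100 outcomes where the treatment truly does NOTHING.
false_positives = 0
for outcome in range(100):
    treated = [random.gauss(0, 1) for _ in range(200)]
    control = [random.gauss(0, 1) for _ in range(200)]
    if two_sample_p(treated, control) < 0.05:
        false_positives += 1

print(f"'Significant' findings among 100 null outcomes: {false_positives}")
# Roughly 5 outcomes look 'significant' by chance alone -- which is why
# honest researchers commit to their endpoints before seeing the data.
```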

Surely no one would deviate from their pre-analysis plan, would they?

Well, the pre-analysis plan for the famed Bangladesh mask study states that it will measure “hospitalizations and mortality,”[18] but these measures are completely absent from the study’s results.[19] [20]

This is a flagrant breach of research ethics,[21] [22] [23] and it obscures the only data that can objectively prove whether masks save or cost lives on net.[24]

Another attribute of a well-designed RCT is the use of placebos, or fake treatments.

The mind is a powerful thing, so researchers often give some of their subjects a placebo to keep them from altering their mindsets or behaviors based on knowing whether they received the treatment.[25]

In some studies, even the researchers don’t know who receives the treatment, so they don’t respond to the subjects differently. This is called a double-blind study.[26] [27]

However, placebos aren’t always possible. For example, real-life studies of face masks cannot be blinded because subjects can clearly tell whether or not they’re wearing a mask.[28]


Besides pre-analysis plans and placebos, here are five more things to be on the lookout for when analyzing RCTs.

1. Data Manipulation

If an RCT is well-designed and honestly presented, there’s no need for complex analytical strategies like a “generalized linear model with a Poisson family and log-link function.”[29] [30] [31] Huh? Exactly.

As the Journal of the Royal College of Physicians of London warns, statistical “malpractice typically occurs when complex analytical techniques are combined with large data sets. … Indeed, as a general rule, the better the science, the less the need for complex analysis….”[32] [33] [34] [35] [36]
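By contrast, here’s roughly what that “better science, less analysis” ideal looks like in practice: a plain comparison of event rates from a hypothetical RCT, sketched in Python (the counts are invented for illustration):

```python
import math

# Hypothetical results from a well-run RCT: just count events in each arm.
treated_events, treated_n = 30, 1_000
control_events, control_n = 60, 1_000

risk_treated = treated_events / treated_n
risk_control = control_events / control_n
risk_ratio = risk_treated / risk_control

# 95% confidence interval for the risk ratio (standard log-scale formula).
se_log_rr = math.sqrt(1/treated_events - 1/treated_n + 1/control_events - 1/control_n)
lo = math.exp(math.log(risk_ratio) - 1.96 * se_log_rr)
hi = math.exp(math.log(risk_ratio) + 1.96 * se_log_rr)

print(f"Risk ratio: {risk_ratio:.2f} (95% CI {lo:.2f} to {hi:.2f})")
# No Poisson families or link functions needed: randomization already
# handled the confounders, so a simple tally tells the story.
```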

 

2. Attrition

Beware of RCTs where people drop out. If a drug takes one month to be fully effective, but the initial side effects cause people to drop out before the first month is over and data is collected, then the results from the remaining subjects do not accurately measure the drug’s benefits and harms.[37] [38]

If you think no one would ever test a drug in that way, think again.[39] [40]
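Here’s a hypothetical Python sketch of how that plays out: if the people harmed by early side effects drop out before data is collected, the remaining subjects make the drug look better than it is:

```python
import random

random.seed(3)

# Hypothetical drug: 20% of users get early side effects and drop out
# before the one-month mark, when benefits (and data) would appear.
# Suppose those who drop out are exactly the people the drug harms.
n = 1_000
subjects = []
for _ in range(n):
    harmed = random.random() < 0.20          # harmed by side effects
    benefited = (not harmed) and random.random() < 0.50
    subjects.append((harmed, benefited))

completers = [(h, b) for h, b in subjects if not h]   # the harmed drop out

true_benefit_rate = sum(b for _, b in subjects) / len(subjects)
observed_rate = sum(b for _, b in completers) / len(completers)

print(f"Benefit rate among all who started:  {true_benefit_rate:.0%}")
print(f"Benefit rate among those who stayed: {observed_rate:.0%}")
# The study 'sees' only the completers, so the drug looks better than it
# is, and the harms that drove people out never show up in the data.
```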

3. Intent-to-Treat

Real RCTs include “all of the people taking part in a trial … regardless of whether or not they … fully adhered to the treatment or switched to an alternative treatment.” This is how you measure “actual practice” in the real world instead of utopia.[41] [42] [43]

The moment a researcher departs from an intent-to-treat analysis, the groups being compared are no longer random, and randomness is the linchpin that makes RCTs reliable.[44] Yet, some researchers repeatedly try to get away with this.[45]

This kind of trickery can fool people who don’t understand the importance of intent-to-treat, but not you.
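For the curious, here’s a hypothetical Python simulation of the trick: restrict the analysis to the best adherers, and a drug that does nothing suddenly looks effective:

```python
import random

random.seed(5)

n = 2_000
itt_outcomes = []        # everyone randomized to treatment, as assigned
adherent_outcomes = []   # only those who fully adhered

for _ in range(n):
    # Hypothetical: health-conscious people both adhere to the pills AND
    # do better anyway -- adherence is tangled up with underlying health.
    health_conscious = random.random() < 0.5
    adhered = random.random() < (0.9 if health_conscious else 0.4)
    # The drug itself does nothing; recovery depends only on health.
    recovered = random.random() < (0.7 if health_conscious else 0.4)
    itt_outcomes.append(recovered)
    if adhered:
        adherent_outcomes.append(recovered)

print(f"Intent-to-treat recovery rate: {sum(itt_outcomes)/len(itt_outcomes):.0%}")
print(f"'Adherers-only' recovery rate: {sum(adherent_outcomes)/len(adherent_outcomes):.0%}")
# Restricting to adherers inflates the apparent benefit even though the
# drug does nothing, because adherence itself is no longer random.
```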

4. Meaningful Outcomes

Make sure the RCT directly measures results that truly matter, not surrogates or proxies for them. For more details, watch our video on Overgeneralization (2:37 time marker).

5. Margins of Error

Money, time, and circumstances often limit the number of people in studies, and if the number is too small, the apparent results can merely be due to chance. We’ll show you how to sort this out in an upcoming video about margins of error.
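As a preview, here’s a minimal Python sketch (using the standard formula for a proportion’s 95% confidence interval) of how margins of error shrink as studies grow:

```python
import math

def margin_of_error(p, n):
    """Half-width of a 95% confidence interval for a proportion."""
    return 1.96 * math.sqrt(p * (1 - p) / n)

# Suppose 60% of treated subjects improve. How sure can we be?
for n in (25, 100, 1_000, 10_000):
    moe = margin_of_error(0.60, n)
    print(f"n = {n:>6}: 60% ± {moe:.1%}")
# With 25 subjects the margin is about ±19 points, so a '60% success
# rate' could easily be chance; with 10,000 it shrinks to about 1 point.
```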

One last thing. If RCTs are so great, why not use them all the time?

Well, first of all, “randomized controlled trials are sometimes not feasible when ethical considerations, costs, resources, or time, prove prohibitive.”[46] [47]

And of course, some matters, like global warming, can’t be studied with RCTs, since we don’t have 1,000 earths to randomize to different treatments.

Therefore, sometimes we need to analyze other types of studies that aren’t as reliable as RCTs. We’ll show you how to do that in upcoming videos.

In the meantime, keep it locked to Just Facts Academy, so you can research like a genius.


Footnotes

[1] Paper: “Randomized Controlled Trials.” Deutsches Ärzteblatt International (The German Medical Association’s Official International Bilingual Science Journal). By Maria Kabisch and others. September 30, 2011. <www.ncbi.nlm.nih.gov>

In RCTs the patients are randomly assigned to the different study groups. This is intended to ensure that all potential confounding factors are divided equally among the groups that will later be compared (structural equivalence). These factors are characteristics that may affect the patients’ response to treatment, e.g., weight, age, and sex. Only if the groups are structurally equivalent can any differences in the results be attributed to a treatment effect rather than the influence of confounders.

[2] Book: Rutherford’s Vascular Surgery (8th edition, Volume 1). Edited by Jack L. Cronenwett and K. Wayne Johnston. Elsevier Saunders, 2014.

Chapter 1: “Epidemiology and Clinical Analysis.” By Louis L. Nguyen and Ann DeBord Smith. Pages 2–14.

Experimental studies differ from observational studies in that the former expose patients to a treatment being tested. Many experimental trials involve randomization of patients to the treatment group or appropriate control group. Although randomization ensures that known factors are evenly distributed between the exposure and control groups, the importance of RCTs lies in the even distribution of unknown factors. Thus, a well-designed RCT will result in more simplified endpoint analyses because complex statistical models are not necessary to control for confounding factors.

[3] Paper: “Randomized Controlled Trials.” Deutsches Ärzteblatt International (The German Medical Association’s Official International Bilingual Science Journal). By Maria Kabisch and others. September 30, 2011. <www.ncbi.nlm.nih.gov>

“In clinical research, randomized controlled trials (RCTs) are the best way to study the safety and efficacy of new treatments.”

[4] Paper: “Medicaid Increases Emergency-Department Use: Evidence from Oregon’s Health Insurance Experiment.” By Amy N. Finkelstein, Sarah L. Taubman, and others. Science, January 2, 2014. Pages 263–268. <www.science.org>

Page 263:

In 2008, Oregon initiated a limited expansion of a Medicaid program for uninsured, low-income adults, drawing names from a waiting list by lottery. This lottery created a rare opportunity to study the effects of Medicaid coverage using a randomized controlled design. Using the randomization provided by the lottery and emergency-department records from Portland-area hospitals, we study the emergency-department use of about 25,000 lottery participants over approximately 18 months after the lottery. We find that Medicaid coverage significantly increases overall emergency use by 0.41 visits per person, or 40 percent relative to an average of 1.02 visits per person in the control group. We find increases in emergency-department visits across a broad range of types of visits, conditions, and subgroups, including increases in visits for conditions that may be most readily treatable in primary care settings. …

[5] Paper: “Some Interim Results From a Controlled Trial of Cost Sharing.” By Joseph P. Newhouse and others. Rand, January 1982. <www.rand.org>

Page v:

A total of 7706 participants in six cities have taken part in a controlled experiment related to cost sharing in health insurance policies. …

The families were assigned in an unbiased manner to insurance plans that covered a broad range of medical services but varied the coinsurance rate, i.e., the fraction of its medical bills that the family must pay. This out-of-pocket expenditure was subject to an upper limit of $1000 per year or 5, 10, or 15% of income, whichever was less. …

Expenditure per person responds to variation in cost sharing. It is about 50 percent greater in the plan with no cost sharing [100% coverage] than in the one with 95-percent coinsurance [5% coverage] up to a maximum of $1000 in any one year. …

As cost sharing declines, the percentage of individuals seeking care rises, as does the number of ambulatory [outpatient] visits per user. The number of adults hospitalized increases, but the number of children hospitalized shows no systematic relationship to plan. Cost per person hospitalized does not appear to be related to plan.

Pages 339–340:

The reduced service use under the cost-sharing plans had little or no net adverse effect on health for the average person (Chapters 6 and 7). Indeed, restricted activity days fell with more cost sharing.

Health among the sick poor—approximately the most disadvantaged 6 percent of population—was adversely affected, however. In particular, the poor who began the Experiment with elevated blood pressure had their blood pressure lowered more on the free plan than on the cost-sharing plans. The effect on predicted mortality rates—a fall of about 10 percent—was substantial for this group. In addition, free care marginally improved both near and far corrected vision, primarily among the poor, and increased the likelihood that a decayed tooth would be filled—an effect found disproportionately among the less well educated. Health of gums was marginally better for those with free care. And serious symptoms were less prevalent on the free plan, especially for those who began the experiment poor and with serious symptoms. Finally, there appeared to be a beneficial effect on anemia for poor children. Although sample sizes made it impossible to detect any beneficial effects that free care might have had on relatively rare conditions, it is highly improbable that there were beneficial effects (one standard error of the mean changes) that we failed to detect in the physiologic measures of health taken as a group. Moreover, the confidence intervals are tight enough to rule out any beneficial effect of free care on the General Health Index, our best summary measure of health.

[6] Paper: “Effectiveness of N95 Respirators Versus Surgical Masks Against Influenza: A Systematic Review and Meta-Analysis.” By Youlin Long and others. Journal of Evidence-Based Medicine, March 13, 2020. <onlinelibrary.wiley.com>

We aimed to assess the effectiveness of N95 respirators versus surgical masks for prevention of influenza by collecting randomized controlled trials (RCTs). …

A total of six RCTs involving 9,171 participants were included. There were no statistically significant differences in preventing laboratory-confirmed influenza (RR = 1.09, 95% CI 0.92-1.28, P > .05), laboratory-confirmed respiratory viral infections (RR = 0.89, 95% CI 0.70-1.11), laboratory-confirmed respiratory infection (RR = 0.74, 95% CI 0.42-1.29) and influenza-like illness (RR = 0.61, 95% CI 0.33-1.14) using N95 respirators and surgical masks. Meta-analysis indicated a protective effect of N95 respirators against laboratory-confirmed bacterial colonization (RR = 0.58, 95% CI 0.43-0.78).

[7] Paper: “A Reanalysis of the High/Scope Perry Preschool Program.” By James Heckman and others. University of Chicago, January 22, 2010. <www.researchgate.net>

Page 2:

The High/Scope Perry Preschool program, conducted in the 1960s, was an early childhood intervention that provided preschool to low-IQ, disadvantaged African-American children living in Ypsilanti, Michigan, a town near Detroit. … The beneficial long-term effects reported for the Perry program constitute a cornerstone of the argument for early intervention efforts throughout the world.

Page 2: “The study was evaluated by the method of random assignment.”

[8] Book: The Education Gap: Vouchers and Urban Schools (Revised edition). By William G. Howell and Paul E. Peterson with Patrick J. Wolf and David E. Campbell. Brookings Institution Press, 2006 (first published in 2002). <www.brookings.edu>

Page 39:

To conduct an experiment in the social sciences that nonetheless approximates the natural-science ideal, scientists have come up with the idea of random assignment—drawing names out of a hat (or, today, by computer) and putting subjects into a treatment or control group. When individuals are assigned randomly to one of two categories, one can assume that the two groups do not differ from each another systematically, except in the one respect under investigation.

[9] Paper: “Private School Vouchers and Student Achievement: An Evaluation of the Milwaukee Parental Choice Program.” By Cecilia Elena Rouse. Quarterly Journal of Economics, May 1998. Pages 553–602. <eml.berkeley.edu>

Page 554:

Ideally, the issue of the relative effectiveness of private versus public schooling could be addressed by a social experiment in which children in a well-defined universe were randomly assigned to a private school (the “treatment group”), while others were assigned to attend public schools (the “control group”). After some period of time, one could compare outcomes, such as test scores, high school graduation rates, or labor market success between the treatment and control groups. Since, on average, the only differences between the groups would be their initial assignment—which was randomly determined—any differences in outcomes could be attributed to the type of school attended.

[10] Paper: “Randomized Controlled Trials.” Deutsches Ärzteblatt International (The German Medical Association’s Official International Bilingual Science Journal). By Maria Kabisch and others. September 30, 2011. <www.ncbi.nlm.nih.gov>

In RCTs the patients are randomly assigned to the different study groups. This is intended to ensure that all potential confounding factors are divided equally among the groups that will later be compared (structural equivalence). These factors are characteristics that may affect the patients’ response to treatment, e.g., weight, age, and sex. Only if the groups are structurally equivalent can any differences in the results be attributed to a treatment effect rather than the influence of confounders.

[11] Book: Multiple Regression: A Primer. By Paul D. Allison. Pine Forge Press, 1998.

Chapter 1: “What Is Multiple Regression?” <us.sagepub.com>

Page 20:

Randomization controls for all characteristics of the experimental subjects, regardless of whether those characteristics can be measured. Thus, with randomization there’s no need to worry about whether those in the treatment group are smarter, more popular, more achievement oriented, or more alienated than those in the control group (assuming, of course, that there are enough subjects in the experiment to allow randomization to do its job effectively).

[12] Book: Rutherford’s Vascular Surgery (8th edition, Volume 1). Edited by Jack L. Cronenwett and K. Wayne Johnston. Elsevier Saunders, 2014.

Chapter 1: “Epidemiology and Clinical Analysis.” By Louis L. Nguyen and Ann DeBord Smith. Pages 2–14.

Experimental studies differ from observational studies in that the former expose patients to a treatment being tested. Many experimental trials involve randomization of patients to the treatment group or appropriate control group. Although randomization ensures that known factors are evenly distributed between the exposure and control groups, the importance of RCTs lies in the even distribution of unknown factors.

[13] Paper: “Randomised Controlled Trials – The Gold Standard for Effectiveness Research.” By Eduardo Hariton & Joseph J. Locascio. BJOG (British Journal of Obstetrics & Gynecology), June 19, 2018. <obgyn.onlinelibrary.wiley.com>

Randomised controlled trials (RCTs) are the reference standard for studying causal relationships between interventions and outcomes as randomisation eliminates much of the bias inherent with other study designs. …

RCTs are prospective studies that measure the effectiveness of interventions. Although no study is likely on its own to prove causality, randomisation reduces bias and provides a rigorous tool to examine cause–effect relationships between an intervention and outcome. This is because the act of randomisation in a large study balances participant characteristics (both observed and unobserved) between the groups, allowing attribution of any differences in outcome to the intervention. This is not possible with any other study design, so RCTs are considered the reference standard for driving practice….

[14] Paper: “Randomized Controlled Trials.” Deutsches Ärzteblatt International (The German Medical Association’s Official International Bilingual Science Journal). By Maria Kabisch and others. September 30, 2011. <www.ncbi.nlm.nih.gov>

In clinical research, randomized controlled trials (RCTs) are the best way to study the safety and efficacy of new treatments. RCTs are used to answer patient-related questions and are required by governmental regulatory bodies as the basis for approval decisions. …

In clinical research, randomized controlled trials are the gold standard for demonstrating the efficacy and safety of a new treatment.

[15] Textbook: Principles and Practice of Clinical Research. By John I. Gallin and Frederick P. Ognibene. Academic Press, 2012.

Page 226:

The strength of the well-designed clinical trial is its ability to establish causality. In this way the clinical trial overcomes the major weaknesses of all other types of study designs, although randomized controlled trials are sometimes not feasible when ethical considerations, costs, resources, or time, prove prohibitive.

[16] Paper: “Francis Field Trial of Inactivated Poliomyelitis Vaccine: Background and Lessons for Today.” By Arnold S. Monto. Epidemiologic Reviews, March 1, 1999. <academic.oup.com>

Selection of Endpoints

In designing the protocol of any clinical trial conducted today, there would be a requirement that the endpoints and case definitions be clearly laid out in advance. In fact, regulatory authorities hold the investigators to these predetermined endpoints to avoid what is sometimes termed “data dredging,” or looking for those outcomes for which significant differences would be found.

[17] Article: “A Pre-Analysis Plan Checklist.” By David McKenzie. World Bank, October 28, 2012. <blogs.worldbank.org>

There are several goals in specifying an analysis plans, but one important reason is to avoid many of the issues associated with data mining and specification searching by setting out in advance exactly the specifications that will be run and with which variables. This is particularly important for interventions which have a whole range of possible different outcomes, like the CDD [Community-Driven Development] programs looked at by Casey et al. They look at 334 different outcomes, and illustrate that they could have picked 7 outcomes that made their program look like it strengthened institutions, or alternatively have picked 6 alternate outcomes that make the program look like it weakened institutions. This is less of an issue in evaluating many other policies in which there are one or two most important key outcome (e.g. profits and sales for a firm intervention, attendance and test scores for a school intervention, or incidence of some disease for some health interventions). But even in those cases there are often many different possible choices of how to measure the key outcome, so some ex-ante discipline on how this outcome is defined can be useful.

[18] Pre-analysis plan: “Can Face Masks Reduce Transmission of SARS-CoV-2 in Bangladesh? A Cluster Randomized Controlled Trial.” By Laura H. Kwong and others, December 2020. <osf.io>

Page 9:

We will run both the intent-to-treat and IV regression for a number of auxiliary outcomes in addition to the primary outcome of symptomatic seropositivity.

These auxiliary outcomes include …

• Hospitalizations and mortality (in both the village-level and individual-level experiments)

[19] Paper: “Impact of Community Masking on COVID-19: A Cluster-Randomized Trial in Bangladesh.” By Jason Abaluck and others. Science, December 2, 2021. <www.science.org>

[20] Article: “Famed Bangladesh Mask Study Excluded Crucial Data.” By James D. Agresti. Just Facts, April 8, 2022. <www.justfactsdaily.com>

Because the lead and final authors of a clinical study are most responsible for it, Just Facts asked Yale economics professors Jason Abaluck and Ahmed Mushfiq Mobarak why they flouted their pre-analysis plan to measure deaths. As documented in this full record of the email exchange, they gave counterfactual answers and then failed to reply after they painted themselves in a corner.

In a key part of the exchange, Abaluck claimed:

Collecting mortality data would have required us to revisit every household at endline in order to survey them (we only collected blood from the small subset of households symptomatic during our study period). Given the nationwide lockdown that went into effect, another round of revisits would have been prohibitively expensive and complicated, and we prioritized the other outcome variables where we had much better hope of being statistically powered.

Directly quoting the authors’ working paper, Just Facts asked:

Given that your team was “able to collect follow-up symptom data” from “98%” of the individuals in the study, why would they need to “revisit every household at endline to survey them”?

Likewise, the working paper reveals that their study “surveyed all reachable participants about Covid-related symptoms” and then used the data to calculate that masks reduce the risk of “Covid-like symptoms.”

During the very same surveys, they could have easily asked the participants if anyone in their household died. In fact, the authors may have done that, because they wouldn’t answer these questions:

Did you collect mortality data during any part of the study before the endline? If so, would you share it?

Just Facts asked those straightforward questions twice, but the authors did not reply.

[21] Paper: “Francis Field Trial of Inactivated Poliomyelitis Vaccine: Background and Lessons for Today.” By Arnold S. Monto. Epidemiologic Reviews, March 1, 1999. <academic.oup.com>

Selection of Endpoints

In designing the protocol of any clinical trial conducted today, there would be a requirement that the endpoints and case definitions be clearly laid out in advance. In fact, regulatory authorities hold the investigators to these predetermined endpoints to avoid what is sometimes termed “data dredging,” or looking for those outcomes for which significant differences would be found.

[22] Article: “A Pre-Analysis Plan Checklist.” By David McKenzie. World Bank, October 28, 2012. <blogs.worldbank.org>

There are several goals in specifying an analysis plans, but one important reason is to avoid many of the issues associated with data mining and specification searching by setting out in advance exactly the specifications that will be run and with which variables. This is particularly important for interventions which have a whole range of possible different outcomes, like the CDD [Community-Driven Development] programs looked at by Casey et al. They look at 334 different outcomes, and illustrate that they could have picked 7 outcomes that made their program look like it strengthened institutions, or alternatively have picked 6 alternate outcomes that make the program look like it weakened institutions. This is less of an issue in evaluating many other policies in which there are one or two most important key outcome (e.g. profits and sales for a firm intervention, attendance and test scores for a school intervention, or incidence of some disease for some health interventions). But even in those cases there are often many different possible choices of how to measure the key outcome, so some ex-ante discipline on how this outcome is defined can be useful.

[23] The Handbook of Social Research Ethics. Edited by Donna M. Mertens and Pauline E. Ginsberg. Sage, 2009.

Chapter 24: “Use and Misuse of Quantitative Methods: Data Collection, Calculation, and Presentation.” By Bruce L. Brown and Dawson Hedges. Pages 373–386.

Page 384:

Science is only as good as the collection, presentation, and interpretation of its data. The philosopher of science Karl Popper argues that scientific theories must be testable and precise enough to be capable of falsification (Popper, 1959). To be so, science, including social science, must be essentially a public endeavor, in which all findings should be published and exposed to scrutiny by the entire scientific community. Consistent with this view, any errors, scientific or otherwise, in the collection, analysis, and presentation of data potentially hinder the self-correcting nature of science, reducing science to a biased game of ideological and corporate hide-and-seek.

… Any hindrance to the collection, analysis, or publication of data, such as inaccessible findings from refusal to share data or not publishing a study, should also be corrected for science to fully function.

[24] Article: “Famed Bangladesh Mask Study Excluded Crucial Data.” By James D. Agresti. Just Facts, April 8, 2022. <www.justfactsdaily.com>

Import of the Death Data

To accurately measure the impact of masking or any other medical intervention on death, one has to measure actual deaths—not some other variable. This is because measuring whether masks prevent C-19 infections, as the Bangladesh study does, doesn’t measure how many people died from C-19 or any of the lethal risks of masks identified in medical journals, such as:

cardio-pulmonary events.

elevated CO2 inhalation, which can impair high-level brain functions and lead to fatal mistakes.

social isolation, which can lead to drug abuse and suicide.

heat, humidity, and other discomforts of wearing a mask, which can cause increased error rates and response times in situations where mental sharpness is vital to safety.

Only RCTs that measure deaths can capture the net effects of all such factors. That’s why medical journals call “all-cause mortality” in RCTs:

• “the most objective outcome” (Journal of Critical Care)

• “the most relevant outcome” (The Lancet Respiratory Medicine)

• “the most significant outcome” (JAMA Internal Medicine)

• “the most important outcome” (PLoS Medicine)

• “the most important outcome” (Journal of the National Medical Association)

• “the most important outcome” (International Journal of Cardiology)

• “a hard and important end point” (JAMA Internal Medicine)

Unlike other data which can be easily manipulated through statistical tampering, all-cause mortality in RCTs is straightforward and solid. If an RCT is large enough and properly conducted, a simple tally of all deaths among people who receive and don’t receive a treatment proves whether the treatment saves more lives than it takes. This gets more complicated for cluster RCTs, but it is still a clear-cut process.

[25] Paper: “Blinding: Who, What, When, Why, How?” By Paul J. Karanicolas, Forough Farrokhyar, and Mohit Bhandari. Canadian Journal of Surgery, October 2010. <www.ncbi.nlm.nih.gov>

Blinding refers to the concealment of group allocation from one or more individuals involved in a clinical research study, most commonly a randomized controlled trial (RCT). Although randomization minimizes differences between treatment groups at the outset of the trial, it does nothing to prevent differential treatment of the groups later in the trial or the differential assessment of outcomes, either of which may result in biased estimates of treatment effects. The optimal strategy to minimize the likelihood of differential treatment or assessments of outcomes is to blind as many individuals as possible in a trial.

Randomized controlled trials of surgical interventions are frequently more difficult to blind than RCTs of medications, which typically achieve blinding with placebos. However, imaginative techniques may make blinding more feasible in surgical trials than is commonly believed by many researchers. In this article we discuss the importance of blinding and provide practical suggestions to researchers who wish to incorporate blinding into their surgical studies.

[26] Paper: “Randomized Controlled Trials.” Deutsches Ärzteblatt International (The German Medical Association’s Official International Bilingual Science Journal). By Maria Kabisch and others. September 30, 2011. <www.ncbi.nlm.nih.gov>

In a double-blind study neither patient nor study physician knows to which treatment the patient has been assigned. Double-blind studies are advantageous if knowledge of the treatment might influence the course and therefore the results of the study. Thus it is particularly important that the study physician is blinded to treatment if the endpoints are subjective. Blinding of patients to their treatment is important, for example, if their attitude could potentially affect their reliability in taking the test medication (compliance) or even their response to treatment.

[27] Book: Rutherford’s Vascular Surgery (8th edition, Volume 1). Edited by Jack L. Cronenwett and K. Wayne Johnston. Elsevier Saunders, 2014.

Chapter 1: “Epidemiology and Clinical Analysis.” By Louis L. Nguyen and Ann DeBord Smith. Pages 2–14:

Double-blinded trials are conducted so that both clinicians and patients are unaware of the treatment assignment. Often, a separate research group is responsible for the randomization allocation and has minimal or no contact with the clinicians and patients.

[28] Paper: “A Cluster Randomised Trial of Cloth Masks Compared with Medical Masks in Healthcare Workers.” By C Raina MacIntyre and others. BMJ Open, April 22, 2015. <bmjopen.bmj.com>

“As facemask use is a visible intervention, clinical end points could not be blinded.”

NOTE: The CDC tried to dismiss this RCT on cloth masks by declaring that the study was “unblinded,” which could bias “self-reporting of illness.” The duplicity of the CDC’s criticism is exposed by the simple fact that all real-world studies of masks are unblinded because the participants can easily tell if they are wearing a mask and what type of mask they are wearing. Yet, the CDC misleads by singling out this one study as “unblinded,” even though this is the case with every real-life study of masks.

[29] Paper: “Impact of Community Masking on COVID-19: A Cluster-Randomized Trial in Bangladesh.” By Jason Abaluck and others. Science, December 2, 2021. <www.science.org>

“In table S7, we report results from our pre-specified linear model and in Table 2 we report results from a generalized linear model with a Poisson family and log-link function.”

[30] Book: Rutherford’s Vascular Surgery (8th edition, Volume 1). Edited by Jack L. Cronenwett and K. Wayne Johnston. Elsevier Saunders, 2014.

Chapter 1: “Epidemiology and Clinical Analysis.” By Louis L. Nguyen and Ann DeBord Smith. Pages 2–14.

Although randomization ensures that known factors are evenly distributed between the exposure and control groups, the importance of RCTs lies in the even distribution of unknown factors. Thus, a well-designed RCT will result in more simplified endpoint analyses because complex statistical models are not necessary to control for confounding factors.

[31] Paper: “Multiple Inference and Gender Differences in the Effects of Early Intervention: A Reevaluation of the Abecedarian, Perry Preschool, and Early Training Projects.” By Michael L. Anderson. Journal of the American Statistical Association, December 2008. Pages 1481–1495. <are.berkeley.edu>

Page 1483: “The random assignment process makes estimation of causal effects straightforward.”

Page 1484: “Note that no assumptions regarding the distributions or independence of potential outcomes are needed. This is because the randomized design itself is the basis for inference (Fisher 1935), and preexisting clusters cannot be positively correlated with the treatment assignments in any systematic way.”

[32] Article: “Statistical Malpractice.” By Bruce G. Charlton. Journal of the Royal College of Physicians of London, March 1996. Pages 112–114. <www.ncbi.nlm.nih.gov>

Page 112:

Statistical malpractice is an insidious, and indeed prestige-laden and grant-rewarded, activity. Brilliantly clever, but fundamentally wrong-headed, number-crunchers are encouraged to devise inappropriate applications of mathematical methods to health problems. …

Epidemiology and cognate disciplines such as health economics are the main source of culprits, because statistical malpractice typically occurs when complex analytical techniques are combined with large data sets. The mystique of mathematics blended with the bewildering intricacies of big numbers makes a potent cocktail. …

The relationship between science and statistical analysis in medicine is quite simple: statistics is a tool of science which may or may not be useful for a given task. Indeed, as a general rule, the better the science, the less the need for complex analysis, and big databases are a sign not of rigor but of poor control. Basic scientists often quip that if statistics are needed, you should go back and do a better experiment.

[33] Book: Business and Competitive Analysis: Effective Application of New and Classic Methods (2nd edition). By Craig S. Fleisher and Babette E. Bensoussan. Pearson Education, 2015.

Page 338: “Statistical analysis is very easy to misuse and misinterpret. Any method of analysis used, whenever applied to data, will provide a result, and all statistical results look authoritative.”

[34] Book: Health Promotion & Education Research Methods: Using the Five Chapter Thesis/Dissertation Model (2nd edition). By Randall R. Cottrell and James F. McKenzie. Jones and Bartlett, 2011.

Chapter 6: “Research Ethics.” Article: “Data Analyses.” Pages 112–113.

Page 112:

Torabi (1994)† has identified several specific ethical issues related to analyses of data. They include using data in an ignorant or careless way, direct manipulation of data, selective use or elimination of data, and overanalysis of data. … Manipulation of data involves subjecting data to multiple statistical techniques until one achieves the desired outcome.

NOTE: † Article: “Reliability Methods and Numbers of Items in Development of Health Instruments.” By M. R. Torabi. Health Values: The Journal of Health Behavior, Education & Promotion, November–December 1994. Pages 56–59. <psycnet.apa.org>

[35] Book: Health Promotion & Education Research Methods: Using the Five Chapter Thesis/Dissertation Model (2nd edition). By Randall R. Cottrell and James F. McKenzie. Jones and Bartlett, 2011.

Chapter 6: “Research Ethics.” Article: “Data Analyses.” Pages 112–113.

Page 112:

Torabi (1994)† has identified several specific ethical issues related to analyses of data. …

The final ethical issue noted by Torabi (1994) dealing with data analyses is overanalysis of data. There are times in the analyses of data that advanced statistical techniques need to be used to deal with complicated studies that include many variables. These advanced techniques should not be used just to impress others. “A general principle of data analysis recommends using the most appropriate, yet simplest, statistical techniques in research so findings can be better understood, interpreted, and communicated” (p. 11). A simple way to avoid overanalysis is to remember the old saying “Don’t use Cadillac statistics with Volkswagen data.”

NOTE: † Article: “Reliability Methods and Numbers of Items in Development of Health Instruments.” By M. R. Torabi. Health Values: The Journal of Health Behavior, Education & Promotion, November–December 1994. Pages 56–59. <psycnet.apa.org>

[36] Book: Social Work Research: Methods for the Helping Professions (Revised edition). Edited by Norman A. Polansky. University of Chicago Press, 1975.

Chapter 10: “Applications of Computer Technology.” By William J. Reid. Pages 229–253.

Page 238:

Misuse of Computers.

The computer has made an incalculable contribution to data analysis. At the same time it has created, or aggravated, some problems. If computers are to be used to best advantage, researchers must be alert to ways in which they can be misused.

While computers have made the use of complex methods of analysis possible, they have, by the same token, made it easy for researchers to use statistical methods they do not fully understand. As a result, researchers may use methods inappropriately or may produce “findings” that they cannot properly interpret. In the precomputer era, investigators either did their own computations or had them done under their supervision; consequently they had a better grasp of what they were doing and tended to limit themselves to methods they knew reasonably well.

Ignorance of what goes into a method of analysis is no longer a barrier to its use. An investigator need know only that a method is generally relevant to his purposes. He can then call for its instant application. Thus he may whip his data through one or several factor analyses with only a hazy idea of the limitations of the technique or what the resulting printout really means.

[37] Paper: “Equal Censoring but Still Informative: When the Reasons for Censoring Differ Between Treatment Arms.” By Timothée Olivier and Vinay Prasad. European Journal of Cancer, February 17, 2024. <www.ejcancer.com>

In randomized controlled trials, informative censoring has been described as a potential bias, mainly affecting time-to-event composite endpoints, like progression-free survival (PFS). It is usually suspected in the presence of unequal attrition rates between arms. Early censoring occurs for different reasons: patients may withdraw from a trial because of toxicity, or because of disappointment with their allocation arm. If censoring is more frequent in one arm due to increased toxicity, this removes the frailest individuals and introduces a bias favoring this arm. Conversely, patients who withdraw because of disappointment of their allocation arm may be more affluent and healthy patients, who will seek treatment options outside the protocol. In trials with one treatment arm presenting higher toxicity rates, and the other arm potentially leading to patient disappointment, censoring can occur for different reasons in each arm however with the same rates.

[38] Commentary: “Hard-Wired Bias: How Even Double-Blind, Randomized Controlled Trials Can Be Skewed From the Start.” By Vinay Prasad and Vance W. Berger. Mayo Clinic Proceedings, September 2015. <www.mayoclinicproceedings.org>

Selection Bias

Consider the open-label “run-in period.” The Heart Protection Study,20 a randomized trial of 20,536 individuals with high cardiovascular risk, tested whether 40 mg/d of simvastatin could improve outcomes as compared with placebo. The trial found that the medication decreased major vascular events by 25%, and the authors go further, arguing that without nonadherence the improvement would have been 33%.20 Notably, this trial used a 4-week placebo run-in period followed by a 4- to 6-week simvastatin run-in period before randomization. During this time, a patient’s primary doctor could remove a patient from randomization, and any patient could elect not to be randomized for “any reason.”21 All together, 11,609 patients who were eligible for the study and began the run-in period dropped out before randomization.21 Thus, more than a third of the patients who began the study were not randomized, and no set of specified inclusion criteria can define the set of patients who remained. Others have noted that the use of run-in periods can limit the applicability of study findings and can inflate estimates of benefits.22 This occurs in part because run-in periods of the active drug test a different clinical question, whether discontinuation of a therapy is harmful, rather than whether initiating a therapy is beneficial. …

Censoring

Informative censoring28 in clinical trials can distort our perception of the benefits of a treatment. All survival analyses are based on the premise that censoring is uninformative—the patients censored are no different from those who are followed. However, this assumption should be questioned. In many trials for cancer treatment, censoring occurs because patients who cannot tolerate the study medication, or have excess toxicity, withdraw from the study. These patients are likely to be different from those who tolerate therapy well. In a recent study, Campigotto and Weller29 provide 2 examples in which patients who are censored are likely to have better or worse survival than those who are followed. The authors then provide a range of estimates for the outcome had these patients not been excluded, on the basis of simulation. However, if patients discontinue treatment, and are no longer followed, we cannot reconstruct their outcomes without making assumptions. We are left with a hard-wired bias.

[39] Report: “Summary Basis for Regulatory Action: Pfizer-BioNTech mRNA Covid-19 Vaccine.” By Ramachandra Naik and others. U.S. Food and Drug Administration, Division of Vaccines and Related Product Applications, Office of Vaccines Research and Review, August 23, 2021. <www.justfacts.com>

Page 17:

The ongoing Phase 3 portion of the study is evaluating the safety and efficacy of COMIRNATY for the prevention of COVID-19 occurring at least 7 days after the second dose of vaccine. Efficacy is being assessed throughout a participant’s blinded follow-up in the study through surveillance for potential cases of COVID-19.

[40] Article: “FDA Violated Own Safety and Efficacy Standards in Approving Covid-19 Vaccines For Children.” By James D. Agresti. Just Facts, July 14, 2022. <www.justfactsdaily.com>

Another area of uncertainty in the Covid vaccine trials is whether adverse events are “serious” or not. This involves the FDA and vaccine manufacturers making subjective and arguable judgments like the following.

A 2021 Pfizer briefing document for the FDA claims that all “serious adverse events” which occurred in a study of about 4,500 children aged 5 to 11 were “not related” to the vaccine. On the very same page, however, the document reveals a supposedly non-serious event “related” to the vaccine which led a participant to withdraw from the study after the first dose. This occurred because the child experienced:

• a “severe” fever that peaked at 104.2 ºF (40.1 °C) on the third day after the vaccine and subsided a day later.

• a “severe” decline in a type of white blood cells produced in bone marrow that are crucial to stopping bacterial infections. The girl had a “benign” history of this condition, but her levels dropped from 480/mm3 before the vaccine to 20 on the second day after the vaccine and then “improved to 70” by 23 days after the vaccine. Per the Gale Encyclopedia of Medicine, when levels of these cells decline below 200, there is a “risk of overwhelming infection” which “requires hospital treatment with antibiotics.”

In a study of children aged 2–4 years, the FDA characterized a fever of 105.4 ºF (40.8 °C) as “non-serious.” The fever, which the FDA decided was “related” to the Pfizer vaccine, began 2 days after the first dose, lasted for 5 days, and led to the child’s withdrawal from the study.

In other cases involving children aged 6 months to 4 years, 5 of them withdrew from Pfizer studies after the first dose because of events that the FDA deemed to be “non-serious,” including 4 “related” to the vaccine.

Given that severe reactions to Covid vaccines tend to worsen with each dose, the withdrawal of these children from the study could mask more serious problems that might have occurred if they took the later doses. This applies to parents who follow the CDC’s advice and give their toddlers and preschoolers multiple doses of the vaccine, even if their child has a “non-serious” event on the first dose.

[41] “Glossary.” National Institute for Health and Care Excellence (United Kingdom). Accessed July 6, 2024 at <www.nice.org.uk>

Intention-to-treat analysis

An assessment of the people taking part in a trial, based on the group they were initially (and randomly) allocated to. This is regardless of whether or not they dropped out, fully adhered to the treatment or switched to an alternative treatment. Intention-to-treat analysis (ITT) analyses are often used to assess clinical effectiveness because they mirror actual practice, when not everyone adheres to the treatment, and the treatment people have may be changed according to how their condition responds to it. Studies of drug treatments often use a modified ITT analysis, which includes only the people who have taken at least 1 dose of a study drug.

[42] Paper: “Intent-to-Treat Analysis Versus as-Treated Analysis.” By Jonas H. Ellenberg. Therapeutic Innovation & Regulatory Science, April 1, 1996. <journals.sagepub.com>

[43] Paper: “Statistical Considerations in the Intent-to-Treat Principle.” By John M. Lachin. Controlled Clinical Trials, June 2000. Pages 167–189. <www.sciencedirect.com>

The pivotal property of a clinical trial is the assignment of treatments to patients at random. Randomization alone, however, is not sufficient to provide an unbiased comparison of therapies. An additional requirement is that the set of patients contributing to an analysis provides an unbiased assessment of treatment effects, or that any missing data are ignorable. A sufficient condition to provide an unbiased comparison is to obtain complete data on all randomized subjects. This can be achieved by an intent-to-treat design wherein all patients are followed until death or the end of the trial, or until the outcome event is reached in a time-to-event trial, irrespective of whether the patient is still receiving or complying with the assigned treatment.

[44] Paper: “Randomised Controlled Trials – The Gold Standard for Effectiveness Research.” By Eduardo Hariton & Joseph J. Locascio. BJOG (British Journal of Obstetrics & Gynecology), June 19, 2018. <obgyn.onlinelibrary.wiley.com>

Randomised controlled trials (RCTs) are the reference standard for studying causal relationships between interventions and outcomes as randomisation eliminates much of the bias inherent with other study designs. …

RCTs are prospective studies that measure the effectiveness of interventions. Although no study is likely on its own to prove causality, randomisation reduces bias and provides a rigorous tool to examine cause–effect relationships between an intervention and outcome. This is because the act of randomisation in a large study balances participant characteristics (both observed and unobserved) between the groups, allowing attribution of any differences in outcome to the intervention. This is not possible with any other study design, so RCTs are considered the reference standard for driving practice….

[45] In 2015, a scholarly journal published an RCT which found that the use of cloth masks should “be discouraged in high-risk situations” because “filtration was extremely poor (almost 0%).”

However, the authors of this study started backpedaling on it during 2020 under the pressure of Covid-19 mask mandates. They did so by focusing on a subset of people in their study who had the highest “adherence to mask use.” This violates the very essence of RCTs, which are supposed to be “randomized,” not limited to people who take the most precautions to avoid getting sick.

The same authors pulled the same stunt in a 2020 paper that claims to be an analysis of “randomised controlled trials” and asserts that masking “appeared to be effective.” The problem is that not one of the eight RCTs they cited found a statistically significant positive impact from masking.

Yet, the CDC cited that fatally flawed paper as evidence that “randomized trials” show masks work, while ignoring a host of actual RCTs that show nothing of the sort.

[46] Textbook: Principles and Practice of Clinical Research. By John I. Gallin and Frederick P. Ognibene. Academic Press, 2012.

Page 226:

The strength of the well-designed clinical trial is its ability to establish causality. In this way the clinical trial overcomes the major weaknesses of all other types of study designs, although randomized controlled trials are sometimes not feasible when ethical considerations, costs, resources, or time, prove prohibitive.

[47] Book: Rutherford’s Vascular Surgery (8th edition, Volume 1). Edited by Jack L. Cronenwett and K. Wayne Johnston. Elsevier Saunders, 2014.

Chapter 1: “Epidemiology and Clinical Analysis.” By Louis L. Nguyen and Ann DeBord Smith. Pages 2–14.

To expose patients to randomization of treatment, clinical equipoise must exist. The principle of equipoise relies on a situation in which clinical experts professionally disagree on the preferred treatment method. …

Although RCTs represent the pinnacle in clinical design, there are many situations in which RCTs are impractical or impossible. Clinical equipoise may not exist, or common sense may prevent randomization of well-established practices, such as the use of parachutes.6 RCTs are also costly to conduct and must generate a new control group with each trial.