The Masking Debate We Didn’t Have

The Covid establishment saw masking policies as purely scientific. Now critics are making the same mistake.

“Mask Up DC” signs are still visible in the windows of some businesses around Washington, D.C. Are these signs public-health recommendations based on science, or just outdated reminders of a bygone pandemic era? Or could they be relics of a time when many mistakenly believed that masks were actually protecting us? That is the conclusion many have drawn in recent months as new scientific evidence has emerged that seems to challenge conventional pandemic wisdom. An early 2023 study by the Cochrane Library, a well-respected UK-based not-for-profit that specializes in analyzing evidence in medicine and public health, found that “wearing masks in the community probably makes little or no difference” to the outcome of either influenza-like illness or Covid-19.

Masking policies aren’t the only thing being questioned again. Two weeks after the Cochrane study appeared, the Lancet published a study suggesting that natural immunity to Covid-19 (that is, immunity acquired through infection) is strong and long-lasting, and that it should be considered alongside immunity conferred through vaccination in policy decisions. At about the same time, both the Department of Energy and the Federal Bureau of Investigation announced that, based on classified intelligence, the most likely origin of the pandemic was an accident at a laboratory in Wuhan, China.

All of this feeds into a narrative that is crystallizing as the pandemic recedes: everything the experts said was wrong. Far from “following the science,” the entire pandemic policy regime appears to be collapsing all around us under the weight of scientific evidence.

“The mask mandates did nothing,” proclaims Bret Stephens in the New York Times. “After Americans spent 2021 and much of 2022 screaming at each other about natural immunity vs. vaccination, now ‘the science’ is telling us that the previously infected unvaccinated had adequate protection all along,” write the editors of National Review. From the lab leak to masks to vaccines, “the conventional wisdom about COVID seems to have been upended,” Derek Thompson summarizes in the Atlantic.

So were the skeptics right after all? Does science now definitively prove that our pandemic policies were pointless? Actually, what recent research really highlights is the complexity of scientific evidence, and why it alone can never dictate policy.

Evidence is of course vital to making and assessing policy decisions, especially when faced with a global public health crisis. But deciding what policies are called for and when — and, yes, even assessing those policies after the fact — requires more than scientific evidence. It also requires practical reasoning — deciding what to do in light of the evidence, making judgment calls, weighing uncertainties, accepting tradeoffs, and grappling with conflicting goals. Scientific evidence may be necessary in all of this, but never sufficient — especially so when evidence is partial, imperfect, and contested but political decision-making nevertheless demands an answer.

That evidence cannot dictate policy is a lesson we all should have learned from the pandemic. The dream of technocracy — to replace the political decision-making of elected representatives with the technical problem-solving of apolitical experts — is not only morally undesirable; it is epistemologically impossible. It doesn’t matter whether the policies in question are those favored by the left or the right. Unfortunately, the ongoing public debate over masks and other Covid-era policies suggests that this lesson continues to fall on deaf ears.

In light of new research findings, many who spent the pandemic criticizing the technocratic tyranny of those professing to “follow the science” are now turning around and pointing to “the science” as if it dictated the policy outcomes they preferred all along. The implication is that if we had really followed the science, we would not have imposed mask policies in the first place. At the very least, they say, we should now accept that science has decided the matter against such policies. “The science,” it seems, is unsettled until it settles in your favor.

In response, those who had been rallying behind the “follow the science” banner during the pandemic, now faced with recalcitrant evidence, appear to be retreating. Do these new studies show that the experts were wrong? No, says Vox, “finding answers in science isn’t that easy.” When it comes to “complex questions with imperfect data,” Derek Thompson writes in the Atlantic, do not trust those who “manufacture simplistic answers with perfect confidence.” True, that. But then why did so many elite institutions, including media outlets such as Vox and the Atlantic, spend the pandemic doing just that — and castigating those who disagreed as irrational anti-science ideologues?

Perhaps because it was never really about “the science” in the first place; it was always about the policies themselves. That is to say, the disagreements that really drove — and continue to drive — controversy over pandemic policies were not about science, but about the same questions that underlie most of our political disputes: whether and when a risk calls for precautionary interventions; whether and when to prioritize individual freedom over the common good; whether and when constitutional protections may be limited for the sake of the national interest; whether and when federal action is needed to achieve a given end versus relying on state, local, or private institutions.

Why, then, have we spent three years arguing about science? Because the illusion that science can dictate policy is comforting — and strategically useful — to those on both sides of our political divide. It makes the hard work of politics appear unnecessary — an appealing proposition in a country as divided as ours. If the correct policy decisions — such as whether to implement mask mandates — simply follow from the science, then there is no need for deliberation, persuasion, accommodation, or compromise, either about the goals of political decisions or the best way to achieve them. Instead, we just need to “do science.”

Those who disagree can then be dismissed as “anti-science.” Perhaps they need to be educated, or perhaps they are simply irrational, impervious to evidence and reason. Or perhaps they are driven by nefarious motives — to assert power, to restrict our freedoms, and so forth. Whatever the case, science, on this view, leaves no room for reasonable disagreements over policy decisions. Yet that’s precisely what democratic politics is all about — making collective decisions amidst uncertainty and despite sometimes profound disagreements.

Now, more than ever, we need a reasonable debate, not about whether we are (or were) “following the science,” but about where we ultimately want to go — and whether, and to what extent, scientific evidence can help us get there.

What is the Cochrane Review, Anyway?

“But wait!” you say, “didn’t the Cochrane study prove that masks don’t work? There is nothing complicated about that. If masks don’t work, then masks are bad policy. It doesn’t matter what the circumstances are or whether we have different goals or clashing political worldviews. If the science says masks don’t work, then following the science means not imposing mask policies. Doesn’t the evidence dictate policy in such a clear-cut case as this?”

Perhaps, this argument goes, policymakers could be forgiven for recommending masks three years ago when they had little evidence to go on but needed to do something in the face of such a large and unknown threat. But now that we have learned more, shouldn’t the experts admit they were wrong? Yet some of them seem to be digging in their heels instead. When asked by a congressional committee in February about the Cochrane review and mask policies for schools, former Centers for Disease Control and Prevention director Rochelle Walensky said: “our masking guidance doesn’t really change with time; what it changes with is disease, so when there’s a lot of disease in the community we recommend that those communities and those schools mask.” What better illustration is there of not following the science?

To understand what’s going on here, we have to take a closer look at the Cochrane study itself — and where it fits into the larger debate over pandemic policies.

The Cochrane review is not a scientific study in the sense that many people might think — a new experiment or set of empirical observations. It is a systematic review; it does not provide — nor does it purport to provide — any new empirical or experimental evidence about the efficacy or effectiveness of masks. Rather, it uses a well-established set of procedures, including meta-analysis, for reviewing the best available evidence for masks and other “physical measures,” including hygiene protocols such as handwashing. This particular study is an update to an earlier Cochrane review, released in 2020, incorporating research that has taken place since that time.

The updated review, which analyzes seventy-eight studies, purports to show that “wearing masks in the community probably makes little or no difference to the outcome of influenza-like illness (ILI)/COVID-19 like illness compared to not wearing masks.” The lead author, British epidemiologist Tom Jefferson, has been even more emphatic in interviews, saying for instance that, “There is just no evidence that they make any difference. Full stop.” And about N95 respirators, “it makes no difference — none of it.” Yet the study has numerous limitations, as Walensky noted in her response to Congress, and as the study’s abstract and a follow-up statement by Cochrane both emphasize. These limitations call into question Jefferson’s confident and far-ranging interpretation.

Most of these limitations concern the methodology the Cochrane review employs — so we need to say a bit more about that first. Meta-analyses pool the results of scientific studies, typically randomized controlled trials (RCTs). These are tests that compare the results of a given intervention or treatment, such as masking, in a treatment group against a control group. The “randomized” part refers to the fact that test participants are randomly assigned to one of these two groups. Ideally, RCTs are also blinded, meaning neither those administering nor those receiving the treatment know who is in which group. Together, randomization and blinding are intended to reduce the chance that one of the two groups is biased in favor of a given treatment effect.

If a well-conducted RCT shows a significant “average treatment effect” — or difference in mean outcome between the treatment group and the control group — it provides strong evidence that the intervention had a genuine causal effect, that “it works.” (This is a slight oversimplification, as we will later see.) Because RCTs allow causal inferences to be drawn even when we do not know what potential factors could confound the effects, RCTs are extremely powerful tools. And many experts consider them to be the “gold standard” of evidence in medicine and public policy.
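To make the idea of an average treatment effect concrete, here is a minimal Python sketch of a simulated trial. Everything in it is hypothetical: the function names, the outcome scale, and the assumed true effect of -1.0 are illustrative choices, not figures from any study discussed here. The sketch only shows that, with coin-flip assignment, the difference in group means tends to recover the effect that was built into the simulation.

```python
import random
import statistics

def simulate_rct(n, true_effect, seed=0):
    """Randomly assign n participants; return (treated, control) outcomes."""
    rng = random.Random(seed)
    treated, control = [], []
    for _ in range(n):
        # Hypothetical outcome this person would have without treatment.
        baseline = rng.gauss(10.0, 2.0)
        # Coin-flip assignment: this is the "randomized" part of an RCT.
        if rng.random() < 0.5:
            treated.append(baseline + true_effect)
        else:
            control.append(baseline)
    return treated, control

def average_treatment_effect(treated, control):
    """Difference in mean outcome between treatment and control groups."""
    return statistics.mean(treated) - statistics.mean(control)

# Assumed numbers for illustration only.
treated, control = simulate_rct(n=20000, true_effect=-1.0)
ate = average_treatment_effect(treated, control)
print(f"estimated average treatment effect: {ate:.2f}")
```

With a large simulated population the estimate lands close to the assumed effect; with a small one it can miss by a wide margin, which is the issue of statistical power.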

Yet RCTs are often conducted on small populations, limiting the validity of the inferences we can draw from them. (Statistically speaking, such studies are not “well-powered.”) Basically, this is because the larger the test population, the less likely it is that the observed effect is due to chance. Here is where meta-analyses come in: they pool the results of multiple RCTs to simulate large-scale populations. This allows for more reliable statistical inferences — and, in principle, more robust conclusions — about the causal effect of a given treatment. Hence meta-analyses can provide even stronger evidence than RCTs. For this reason, some consider meta-analyses, not RCTs, to be the “gold standard.”
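The pooling step can be sketched in a few lines of Python. This is a simplified illustration that assumes fixed-effect, inverse-variance weighting, one common textbook approach and not necessarily the model the Cochrane authors used. The four effect estimates and standard errors are invented for the example, and the function name is hypothetical.

```python
import math

def pooled_estimate(studies):
    """Fixed-effect inverse-variance pooling.

    studies: list of (effect_estimate, standard_error) pairs.
    Returns (pooled_effect, pooled_standard_error).
    """
    # More precise studies (smaller standard error) get more weight.
    weights = [1.0 / se ** 2 for _, se in studies]
    total_weight = sum(weights)
    pooled = sum(w * est for w, (est, _) in zip(weights, studies)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    return pooled, pooled_se

# Invented numbers: four small trials, each too noisy to be conclusive.
studies = [(-0.10, 0.20), (-0.30, 0.25), (0.05, 0.15), (-0.20, 0.30)]
pooled, pooled_se = pooled_estimate(studies)
print(f"pooled effect: {pooled:.3f}, standard error: {pooled_se:.3f}")
```

The pooled standard error comes out smaller than that of any single study, which is the sense in which pooling yields more reliable inferences. The sketch also assumes the studies measure the same thing, an assumption that matters a great deal in practice.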

But the quality of a meta-analysis partly depends on the quality of the RCTs it analyzes. How large was the test population in each? Was there blinding? Did test participants comply with the guidelines? The quality of the meta-analysis also depends on the way it pools the RCTs — whether they are similar in the relevant respects. If we can’t draw reliable inferences from apples, it is no help trying to draw inferences from apples and oranges.

Study Limitations

The Cochrane study is limited on several of these fronts. For instance, when it comes to masking studies, blinding is hard, if not impossible. Can those administering or wearing masks be prevented from knowing who is wearing masks? A similar problem obtains for enforcing or encouraging adherence. How do researchers ensure that test participants actually wear masks and wear them consistently and correctly? Given these uncertainties, if the RCTs show little or no effect of masking, it is hard to know whether that means masks aren’t very effective or whether test participants simply did not wear masks very effectively (or even at all). Simply put, it could be that these studies show that mask policies don’t work, not that masks themselves don’t work.
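A little arithmetic shows how low adherence can hide a real effect. The Python sketch below assumes, purely for illustration, that consistent masking cuts infection risk by 30 percent, that nobody in the control group masks, and that only a fraction of the group told to mask actually does so. None of these numbers comes from the studies in the review, and the function name is hypothetical.

```python
def intention_to_treat_risk(base_risk, relative_reduction, adherence):
    """Average risk in a group told to mask when only some members comply."""
    masked_risk = base_risk * (1.0 - relative_reduction)
    # Compliers get the reduced risk; everyone else keeps the base risk.
    return adherence * masked_risk + (1.0 - adherence) * base_risk

base_risk = 0.10           # assumed infection risk without a mask
relative_reduction = 0.30  # assumed benefit of consistent masking

for adherence in (1.0, 0.5, 0.1):
    arm_risk = intention_to_treat_risk(base_risk, relative_reduction, adherence)
    observed = 1.0 - arm_risk / base_risk
    print(f"adherence {adherence:.0%}: observed reduction {observed:.1%}")
```

Under these assumptions the reduction a trial observes is simply the true reduction scaled by adherence, so a sizable benefit for consistent wearers can shrink to something a small trial cannot distinguish from zero.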

Of course, the fact that the RCTs included in the Cochrane review are of variable quality can cut both ways in the debate over masks. After all, it is precisely the putatively low quality of these studies that leads the authors to conclude that evidence for masking is poor. They point to “large disparities between studies with respect to the clinical outcome events, which were imprecisely defined in several studies,” as well as “differences in the extent to which laboratory-confirmed viruses were included in the studies that assessed them.” More generally, the authors note the “high risk of bias” in the studies, the “variation in outcome measurement,” and “relatively low adherence with the interventions,” all of which “hampers drawing firm conclusions.”

But there are other reasons to be skeptical that the review really shows that masks don’t work, “full stop,” as Jefferson puts it. These concern the way in which the studies being analyzed in the review are pooled together.

Of the seventy-eight studies reviewed by Cochrane, only seven looked specifically at the transmission of Covid. (The other seventy-one were conducted prior to the pandemic or were looking at other respiratory illnesses, such as influenza.) And of the seven Covid studies, only two were looking specifically at masking. (Five were looking at other physical interventions.) It matters, then, when considering the Cochrane review’s implications for mask policies, whether and to what extent the modes of transmission of Covid and these other respiratory diseases are similar. More generally, it matters what these modes of transmission are in the first place, especially the relative importance of droplets versus aerosols. But this is something that the Cochrane review itself does not answer, and does not purport to answer.

Then there are the various settings of the RCTs included in the review. Some were conducted in health-care settings while others were conducted in community settings; some were conducted in epidemic settings while others were not. These differences are potentially significant because the risk of infection is not the same in every setting, nor are the variables that contribute to spread. For instance, disease prevalence may be much higher in epidemic than in non-epidemic settings, and physical interventions other than masking may be more common in community than in health-care settings, or vice versa.

Another potential problem is that the Cochrane review combines RCTs where masks were worn intermittently — as in health-care settings — and RCTs in which masks were worn continuously. A further reason these different contexts are important is that both influenza and Covid are transmitted at least in part through aerosolized particles, meaning that individuals who wear masks intermittently — for example, only when caring for infected patients — may still be at risk of infection once they remove their masks.

Experts have also pointed out that the Cochrane review is insufficiently attentive to the differences between types of face coverings: cloth, medical, or surgical masks versus N95s and KN95s. At issue is not simply the quality of these different face coverings but also their intended purposes. Technically, N95s and similar face coverings are not masks; they are respirators, personal protective equipment designed (and, in the case of N95s, certified by the U.S. National Institute for Occupational Safety and Health) to provide an uninfected wearer with protection against diseases spread through the air by filtering out aerosols.

Medical and surgical masks, by contrast, are designed primarily to block fluid splatter. As such, they are generally not recommended for personal protection against respiratory infection. When well-made and worn correctly and consistently, such masks may be effective at blocking large droplets, one mode of transmission for Covid. This was one reason why experts, after initially advising against masks early in the pandemic, reversed course and recommended masks — including cloth masks — for “source control,” that is, reducing transmission from infected wearers to uninfected individuals within the community.

This points to a broader ambiguity that has confounded the entire debate over mask policy during the pandemic: Are face coverings recommended because they protect the uninfected wearer directly or because they limit spread from infected individuals to the community? The answer is, of course, both. Source control protects everyone indirectly by reducing total community prevalence. But this is a different policy goal than providing individual wearers with maximum protection. And different kinds of face coverings may be more or less effective — or more or less feasible — for these different purposes in different kinds of settings.

The Cochrane review, for its part, skews heavily toward RCTs that studied the effects of masks and respirators for protection against illness — not for source control — simply because that’s what most of the RCTs on masks were designed to test. Yet this is a particularly notable limitation, given that officials changed course and began recommending the blanket use of any face coverings as a means of source control — not direct protection — after the discovery of asymptomatic Covid transmission in early 2020.

Taken together, these considerations provide good reasons to be cautious about the Cochrane review’s conclusions, or at least about interpretations that assume those conclusions provide a definitive answer on the effectiveness of face coverings in general. But ultimately, whether you think the Cochrane review settles the debate over masks, or even adds much weight to the evidence one way or the other, depends in part on what kind of evidence you believe should decide the matter in the first place. And this is something that experts themselves do not agree on, an important fact that has been largely ignored by both sides in the public debate over masks.

What Counts as Evidence?

There are relatively few randomized studies on masks, and these have yielded mixed results. Recall that the Cochrane review included only two RCTs on masks and Covid-19 specifically. One showed a small but not statistically significant reduction in Covid infections among people who wore surgical masks in public spaces. The other showed a significant positive effect of masking for controlling community spread of the virus, but only with high-quality masks. Each of these studies has important limitations, along the lines noted earlier.

Now, we already knew this before Cochrane published its updated review this year. Those two studies were from late 2020 and late 2021, respectively. Tellingly, the conclusions of the 2023 review do not differ significantly from those of the 2020 review, despite three years of experience in a global pandemic and a bewildering amount of (mostly) non-randomized scientific research. If you take double-blinded RCTs to be the gold standard of evidence, the Cochrane review more or less only confirms what you probably already thought: that the evidence for masks was lacking. According to this line of thinking, the fact that there are so few high-quality randomized studies, and that what studies there are cannot be pooled in a consistent way, just goes to show how scanty the evidence for mask policies remains. That the Cochrane review adds little new evidence is precisely the point. The subject is badly understood and poorly studied. What we need are more well-conducted randomized trials to settle the matter. Until we have them, we have little evidence to go on.

But should randomized trials really be the gold standard? Do we really have so little evidence to go on? And wouldn’t implementing mask policies have been the right way to hedge against uncertainty in any case? What is at issue in the debate over masks (and pandemic policies generally), and always has been, is not only whether the evidence shows that policy interventions are effective. At issue, more fundamentally, is what kind of evidence we can and should rely on when making and assessing policy decisions in the first place. And here the expert community remains divided.

Cochrane is a prominent institution in the field of evidence-based medicine, a methodological school whose origins stretch back to the nineteenth (or perhaps even the late eighteenth) century, but which really began to take root in medicine and public health in the twentieth century, especially the 1980s and early 1990s. (The Cochrane Collaboration, which includes the Library, was founded in 1993.) The goal of evidence-based medicine is not simply to base medicine and public health on evidence but to base them on a particular kind of evidence, typically “experimental evidence” in the form of quantitative data from randomized trials.

Evidence-based approaches often employ a “hierarchy of evidence,” which is used to guide decision-making and to inform policy evaluation based on the relative quality of evidence. At the top of this hierarchy are well-powered, double-blinded, randomized controlled trials (or meta-analyses, as discussed above). Hence non-blinded randomized trials, randomized trials that lack statistical power, and non-randomized trials all rank lower on the evidence hierarchy. In practice, this means that observational evidence, for example from studies of non-randomized populations, gets downgraded or discounted.

The rationale for this approach is that without sufficient statistical power, blinding, and randomization, we cannot draw reliable inferences from data. For instance, without randomly assigning test participants to the treatment and control groups, how do we know that a given effect (fewer infections of a given disease in a population, say) is due to the intervention in question (such as masks) as opposed to some other factor (for example, that people who wear masks also practice social distancing or good hygiene)? And without reliable inferences from data, we have little solid evidence on which to base our decisions.
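The worry about hidden factors can be illustrated with a small simulation. In the Python sketch below, masks are assumed to have no effect at all; only a hidden trait, “caution” (standing in for distancing or hygiene), lowers risk, and cautious people are more likely to choose to mask. All probabilities and names are invented for the illustration, and the sketch makes no claim about whether real masks work.

```python
import random

def infection_rate(people):
    """Share of a group that was infected."""
    return sum(p["infected"] for p in people) / len(people)

def masked_minus_unmasked(n, randomized, seed=1):
    """Infection-rate gap between mask wearers and non-wearers."""
    rng = random.Random(seed)
    masked, unmasked = [], []
    for _ in range(n):
        cautious = rng.random() < 0.5          # hidden trait: the confounder
        if randomized:
            wears_mask = rng.random() < 0.5    # coin flip ignores the trait
        else:
            # Self-selection: cautious people mask far more often.
            wears_mask = rng.random() < (0.8 if cautious else 0.2)
        # Assumption for illustration: the mask itself changes nothing;
        # only caution lowers the risk of infection.
        risk = 0.05 if cautious else 0.15
        person = {"infected": rng.random() < risk}
        (masked if wears_mask else unmasked).append(person)
    return infection_rate(masked) - infection_rate(unmasked)

observational_gap = masked_minus_unmasked(100_000, randomized=False)
randomized_gap = masked_minus_unmasked(100_000, randomized=True)
print(f"observational gap: {observational_gap:+.3f}")
print(f"randomized gap:    {randomized_gap:+.3f}")
```

When people select themselves into masking, wearers show markedly fewer infections even though the mask does nothing in this toy world. Under random assignment the gap all but disappears, because caution is spread evenly across both groups.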

This point should not be underestimated. Two notorious problems have plagued the human sciences when compared to the natural sciences: a relative lack of theoretical consensus about underlying causes and difficulty acquiring controlled evidence. What makes RCTs so powerful is that they offer a reliable way to infer causal effects from data even in the absence of complete knowledge of the underlying causal pathways. Hence many experts see randomization as a way to introduce scientific rigor into fields such as medicine, public health, and the social sciences that have historically been thought to lack it.

Yet whatever their virtues, traditional evidence-based approaches also have important limitations both in science and in policymaking.

Problems with the “Gold Standard”

No one disputes that medicine, public health, or public policy generally should be informed by evidence, and that we should aim for high-quality evidence — and even that RCTs can provide it. Experts also generally agree that randomized trials are very powerful tools for inferring causal effects, and that they are particularly useful in clinical contexts. For instance, there is widespread agreement among experts that the current method for testing vaccines — multi-phase, large-scale, double-blinded randomized clinical trials — is robust.

The dispute among experts, rather, is about the underlying idea that there is a hierarchy of evidence, which can be used to guide decision-making in a cut-and-dried way. Instead of thinking of evidence as a hierarchy, we might think of it as a toolkit, with different methods being better suited for achieving different goals in different settings. There are many reasons — practical, ethical, and scientific — for favoring this approach.

Start with the practical reasons. As we saw above with masking, it is not always possible to conduct an RCT in a rigorous way, or at all. There can be implementation challenges, such as difficulties with blinding or policing post-randomization effects. Also, some problems are intrinsically difficult to study in controlled settings, whether because of the kinds of exposures involved or the time it takes for effects to appear. Then there’s the fact that RCTs are expensive and time-consuming.

For these reasons, there are circumstances in which we may need to rely on other types of evidence simply because that’s all we have. This was the argument that a number of experts made in the early days of the pandemic about a range of interventions, including cloth masks. In a perfect world, we might conduct large-scale, randomized controlled trials to see how effective masks or other interventions are. But in the face of a massive and urgent new threat, policy-making couldn’t wait. So leaders looked to non-randomized and indirect evidence — from observational studies, computer simulations, and basic research — to assess whether these interventions were appropriate. Though the evidence on hand was not perfect, the argument went, it was good enough for the time being.

Note that this kind of pragmatic approach is, in principle at least, perfectly compatible with the idea that there is a hierarchy of evidence. It just emphasizes that, in practice, we don’t always have the luxury of gathering the best evidence according to that hierarchy. So while RCTs may be the highest standard, they cannot and should not be taken as the only standard, at least during a fast-moving crisis such as a pandemic.

Then there are ethical reasons why we may want to consider evidence other than that provided by RCTs. Randomized controlled trials require withholding a treatment from a control group. Since the early days of large-scale clinical trials in the mid-twentieth century, some physicians have voiced concerns that this requirement violates their ethical and professional obligations. It is unethical, this argument goes, to withhold a potentially life-saving treatment from patients in order to carry out a certain test, if there is already some evidence, such as from observational studies, that it works. It is unethical to wait until a large-scale clinical trial is complete before treating patients, especially if waiting could mean they get sicker or die in the meantime.

Such reasoning was part of the justification for mask policies during the pandemic. The argument was not only that it was impractical to await evidence from RCTs, since we had to make policy decisions quickly, but also that it was unethical, since it meant not implementing a potentially life-saving intervention. The same reasoning applies to the scientific studies themselves, since RCTs also require withholding the intervention from a control group. Some commentators argued that this requirement makes it ethically impossible to determine whether masks are effective using RCTs.

This kind of ethical reasoning does not always favor precautionary policies. On the contrary, it has been employed, for instance, in critiques of FDA regulation, which enshrines RCTs as the gold standard. Critics, including conservatives and libertarians, have long made the case that in some circumstances, such as with rare or terminal diseases, it is unethical to prevent physicians from prescribing innovative drug therapies that have the potential to help patients simply because they have not yet been vetted by the kinds of RCTs required by the FDA. In these cases, they argue, non-clinical evidence may be sufficient to justify use or temporary approval of a drug.

It is also interesting to note, given the politics of the pandemic, that physicians who touted the benefits of alternative Covid treatments, such as hydroxychloroquine and ivermectin, reasoned about RCTs in a similar way to mainstream proponents of masking. Their argument, in effect, was that because there appeared to be evidence that the treatments worked on individual patients, and given the urgency of the pandemic, it would be unethical to prevent doctors from prescribing these treatments until they had been vetted by large-scale RCTs. (Multiple RCTs have since shown neither treatment to be effective.)

Finally, there are scientific reasons why we may need to consider evidence other than that provided by RCTs. Like all scientific methods, RCTs rest on certain assumptions, so that the conclusions drawn from them are valid only insofar as those assumptions hold. Two particular features of RCTs are worth mentioning in this regard.

First, as noted above, the kind of causal effect that RCTs can capture is statistical in nature: it holds for a given subpopulation within a study. But the RCT does not tell you for which subpopulation that effect holds. So, even if an RCT shows a positive effect for a given treatment, we cannot infer from this that this individual here will experience that effect — unless we happen to know that that individual is a member of the relevant statistical population. But that is precisely what the RCT cannot tell you. This is an instance of what statisticians call the “reference class problem.” And it relates to a second limitation: though a well-conducted RCT can isolate a robust causal effect for a given test population, it cannot tell you whether that effect will hold in a new population, or whether it will hold generally for the population at large. This is what is known as the problem of “external validity.”

In sum, a well-conducted RCT may show a genuine causal effect — it may be “internally valid.” But, by itself, it cannot indicate whether that effect will hold for, say, an individual patient under a physician’s care or whether it will hold for, say, a demographic group that is different from the ones represented in the RCT. Of course, the effect might hold for this individual or that demographic. But it could also be the case that there are some distinctive characteristics of that individual or group — such as genetic predispositions or environmental factors — that render the treatment ineffective in those settings. To know whether the effect will hold you have to know something about the new setting — whether it is similar to the test setting in the relevant respects. And knowing that usually requires drawing on other sources of knowledge and evidence.
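The problem of external validity can be put in numerical terms. The Python sketch below assumes a treatment that helps one kind of person (“responders”) and does nothing for another, and then compares a trial population with a differently composed target population. The subgroup labels, effect sizes, and population shares are all hypothetical.

```python
def average_effect(subgroup_effects, population_shares):
    """Population-wide average effect, weighting each subgroup by its share."""
    return sum(subgroup_effects[group] * share
               for group, share in population_shares.items())

# Assumed effects on infection risk (negative means fewer infections).
subgroup_effects = {"responders": -0.20, "non_responders": 0.0}

# The trial happened to enroll mostly responders; the target setting did not.
trial_population = {"responders": 0.7, "non_responders": 0.3}
target_population = {"responders": 0.1, "non_responders": 0.9}

trial_ate = average_effect(subgroup_effects, trial_population)
target_ate = average_effect(subgroup_effects, target_population)
print(f"average effect in trial population:  {trial_ate:+.2f}")
print(f"average effect in target population: {target_ate:+.2f}")
```

The trial’s estimate is perfectly valid for the people in the trial, yet it overstates the effect in the target population several times over. Nothing in the trial data alone reveals this; one needs outside knowledge of how the two populations differ.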

For instance, a physician will often have enough first-hand knowledge of her patient — medical history, demographic characteristics, risk factors, prognosis — to decide whether a given treatment may be effective or is worth any possible side effects. Or researchers may have basic scientific knowledge of underlying causal mechanisms, observational data, or even local knowledge about a target population, to help them decide whether a causal effect from an RCT is likely to hold in a new setting.

Hence various kinds of evidence — not just RCTs — are often needed to inform sound decision-making, even on purely epistemic grounds.

The Case for Scientific Pluralism

Taken together, these practical, ethical, and scientific considerations offer reasons why RCTs should not be taken as the gold standard of evidence. Powerful as RCTs are, they are not all-purpose, nor are they the only tools available. We should think about them as one important item in a larger toolkit needed for making informed decisions. This means approaching problems contextually, not hierarchically, being willing to draw on different sources of evidence when needed, depending on the specifics at hand and the goals we are trying to achieve. It also means that we cannot assess the quality of evidence in the abstract, apart from such specifics and goals.

Defenders of the “gold standard” approach tend to see it as the most scientific, because, as in the natural sciences, it privileges conclusions drawn from the most rigorous experimental evidence. But the pluralistic approach to evidence is hardly unprecedented in the natural sciences.

In fact, the natural sciences employ a range of methods for drawing inferences from empirical data, from the traditional “hypothetico-deductive method” (what is popularly called the scientific method) to statistical analysis of observational data to reliance on computer simulations. There is no single experimental or inferential method in the natural sciences, nor is there an established hierarchy of evidence, much less one that discounts observational data. The latter would be particularly problematic for sciences such as astronomy. Natural scientists often appeal to a range of evidence in order to establish a given claim — what philosophers call a “no miracles” or “no coincidence” argument. The idea is that, taken together, multiple sources of evidence provide more evidentiary weight than any one of them would alone.

A classic example is the experimental confirmation of the theory of Brownian motion, for which French physicist Jean Perrin won the Nobel Prize in 1926. In 1905, Albert Einstein had proposed a causal explanation of an empirical phenomenon that had long puzzled scientists — the apparently random motion of small particles suspended in a gas or liquid. (The phenomenon was named after the Scottish botanist Robert Brown, who first identified it in 1827 while observing pollen grains.) In characterizing the phenomenon mathematically, Einstein posited the existence of water molecules to explain the particles’ behavior. He also predicted a precise relationship between the mean square displacement of the particles and Avogadro’s number (the number of molecules or atoms in one mole of a given substance).

A few years later, Perrin confirmed Einstein’s prediction experimentally by calculating the value of Avogadro’s number. This was widely considered a “crucial experiment” — confirming not only Einstein’s theory but also the existence of molecules and atoms more generally, ending a long-standing scientific debate. Yet to test the theory, Perrin offered no fewer than thirteen distinct methods for determining the same value. His argument was not that any one of these methods was definitive, but, rather, that all of them together provided compelling evidence that his calculation was correct. It could not simply be a coincidence — a “miracle” — that so many lines of evidence all pointed in the same direction.

There is no reason to suppose, a priori, that the medical and social sciences should be more restrictive in their methodological approach to evidence than the natural sciences, especially if we take the latter to be the paragon of scientific rigor. As we have seen, RCTs are not the only available source of evidence in the medical and social sciences. Nor are they the only formal method for inferring causes from effects. Over the last few decades, fields from economics to epidemiology to computer science have pioneered a whole range of causal inference methods, including techniques for drawing causal inferences from observational data. To be sure, these methods have their own strengths and weaknesses, and are more or less useful in different settings. But, from a pluralist standpoint, that is all the more reason to be pragmatic about the methods we use to inform and assess policy interventions.

The pluralist approach in public health predates the pandemic. But it played an important — if often overlooked — role in pandemic policy decisions. It is why, for instance, experts could point to evidence that masks work, despite a relative lack of randomized studies: because there is evidence — indeed, lots of evidence — about masks. It’s just that much (though by no means all) of that evidence comes from basic research, computer modeling, and observational data — the kinds of studies that rank low on hierarchies in evidence-based medicine. It is the kind of evidence that Tom Jefferson, lead author of the recent Cochrane review, dismisses as “non-randomised studies, flawed observational studies.”

In contrast, many of the critical responses to the Cochrane review emphasize the importance of weighing various sources of evidence, and of doing so in a way that is sensitive to context. For instance, critics have pointed out that while there are relatively few RCTs for masks in general, and for cloth masks in particular, there is lots of non-randomized evidence that masks, even cloth masks, are effective for source control (something that is difficult to study using RCTs). And there is considerable evidence from observational and comparative studies that high-quality masks and respirators provide protection against transmission of coronaviruses such as SARS, MERS, and SARS-CoV-2. A number of RCTs have also shown that N95s can reduce the risk of clinical respiratory illness and even provide some herd protection.

Of course, none of this evidence definitively settles the matter for science, much less for policy. The true justification for mask policies was never one definitive piece of evidence, one conclusive scientific study proving that masks, in general, work, full stop. There isn’t (and may never be) such evidence. The justification, rather, was that given the “preponderance of evidence” that wearing face coverings can help to reduce transmission or provide personal protection — and given the relatively low cost of the intervention — it is good policy to require certain face coverings in certain settings, especially when the stakes are as high as they were during the pandemic.

Public Health Policies Are Political Acts

To be clear, this justification was not how mask and other pandemic policies were typically framed by journalists, politicians, or even by many experts themselves — far from it. On the contrary, throughout the pandemic, the rhetoric of “following the science” obscured the complexity of scientific evidence, not to mention the much greater complexity of using such evidence to inform policy. Rather than judgment calls based on various kinds of evidence, policy recommendations were often couched as definitive conclusions, based on incontrovertible evidence. Here Anthony Fauci is exemplary.

As is well known, the United States and other Western countries initially did not recommend that the public wear face coverings — whether cloth or surgical masks or respirators. Instead, the CDC recommended, “Take everyday preventive actions, like staying home when you are sick and washing hands with soap and water, to help slow the spread of respiratory illness.” Yet at the same time, other countries — especially non-Western ones, such as China and Japan, which had been at the forefront of the SARS and H1N1 pandemics — embraced masks from the beginning. Then, in April 2020, the CDC reversed course and began recommending mask-wearing. The World Health Organization soon followed suit; and by summer, mask policies had become more or less universal.

When asked why official guidance about masks changed so abruptly, Fauci responded: “It became clear that cloth coverings … and not necessarily a surgical mask or an N95, cloth coverings, work…. Meta-analysis studies show that, contrary to what we thought, masks really do work in preventing infection.” This statement was quickly picked up by mainstream media outlets, such as Vox, which had dutifully cautioned the public against wearing masks only weeks prior. But this was, frankly, an inaccurate characterization of both the state of the evidence surrounding masks — particularly cloth masks — and the reasons experts gave for recommending them.

In fact, the debate over mask policies is rooted in a scientific dispute that was going on long before the Covid pandemic — and has continued since. As a recent article by leading public health experts criticizing the Cochrane review puts it: “The question of whether and to what extent face masks work to prevent respiratory infections such as COVID and influenza has split the scientific community for decades.” As recently as 2017, one RCT on masks stated that “There is currently a lack of consensus around the efficacy of medical masks and respirators for healthcare workers … against influenza, with only five published randomised control trials … conducted to date.”

About cloth masks there was even less agreement. An RCT from 2015 that compared the use of surgical masks to cloth masks in health-care settings not only found no statistically significant evidence that cloth masks were effective; it also cautioned against their use because of evidence that wearing cloth masks actually increased rates of infection among wearers. At the same time, many experts prior to 2020 argued that surgical masks and N95-type respirators were effective, both at reducing the spread of disease and for personal protection, at least when worn properly and with sufficient compliance. And they could point to evidence, both randomized and non-randomized, showing just that.

The fact that the expert community was already divided on this issue helps explain why there was so much ambiguity and inconsistency about mask policies going into the pandemic. The disagreement was not irrational; it reflected what we knew and didn’t know about Covid-19 as well as the pre-existing — and ongoing — divisions within the expert community over the effectiveness of different kinds of face coverings and the methods we should employ to assess it. So what changed between winter and spring 2020? No one definitive study — contrary to what Fauci implied.

Rather, the justification for recommending cloth masks turned on an assessment of various lines of mostly indirect and observational evidence that they worked, mainly for source control, and that they were a relatively low-cost intervention when surgical masks and respirators were a scarce resource. This assessment was made, moreover, in light of mounting evidence that the virus could be spread asymptomatically as well as the belief that droplets were an important mode of transmission. (The growing recognition that aerosols were at least as — and probably more — important to transmission is partly why official guidance quietly shifted from cloth to “high quality” masks during the pandemic.)

Contrast Fauci’s statements on cloth masks with what a study in the Proceedings of the National Academy of Sciences had to say on the same subject only a few months later:

These policies are being developed in a complex decision-making environment, with a novel pandemic, rapid generation of new research, and exponential growth in cases and deaths in many regions. There is currently a global shortage of N95/FFP2 respirators and surgical masks for use in hospitals. Simple cloth masks present a pragmatic solution for use by the public.

A “pragmatic solution” in a “complex decision-making environment” with limited resources — this may well have been the best we could do in early 2020. In fact, it isn’t a bad characterization of what it means to use evidence for high-stakes decision-making in general. And it goes a long way toward showing why pluralism is necessary on practical grounds, at least during a fast-moving crisis. But this is a far cry from simply “following the science.” Crucially, the PNAS statement leaves room for reasonable disagreement about the best means to limit the spread of disease — and what tradeoffs we are willing to accept to achieve this end.

Yet this suggests that recent critics of Covid-19 policies have things exactly backwards. The problem with official pandemic policy was not that the science has since shown the policies to be unscientific, but rather that the policies were presented as entirely scientific in the first place. Political leaders have to be responsive to competing interests, values, and goals. That means grappling with sometimes intractable disagreements over what goals to prioritize, how to meet them, and at what cost.

When it comes to pandemic policies, should we err on the side of precautionary intervention, knowing that it will likely have adverse effects on the economy and on other areas of public health? Or should we err on the side of non-intervention, knowing that we may be putting countless lives at risk? Should we mandate masks, given what we know and don’t know about their effectiveness and potential downsides? If so, what kind of masks and in what settings — and for whom? Do, say, the upsides of requiring two-year-olds to wear cloth masks in schools and daycare centers really outweigh the downsides? What about children with learning disabilities? What about once vaccines are widely available?

These are fundamentally philosophical, ethical, and political questions about the best way to deal with a problem while hedging against uncertainty. They can be informed by science, but they are not the sort of thing that an RCT or a meta-analysis — or any kind of scientific study — can answer for us.

Technocratic vs. Anti-Technocratic Scientism

During the pandemic, the phrase “follow the science” became a shibboleth, a marker of which political tribe one belonged to. For one tribe, it became a rallying cry for rational decision-making and a cry of resistance against the ignorant masses duped by demagogues and disinformation. For the other, it became something to rally against, a mockery of the ideological cover the “expert class” used to advance its own interests at the expense of individual freedoms.

And yet, though apparently polarized over whether to “follow the science” or not, both sides of the ongoing debate over masks are in fact captivated by the same illusion: that the correct political decisions simply “follow” from “the science.” It’s just that each wants to follow different science. Or, more accurately (to stretch the metaphor), they want to go to different places, in terms of policy outcomes, and are happy to “follow the science” if and when it helps them get there.

The problem is that each side believes that science can dictate policy. And yet neither side “follows the science” to the other’s satisfaction. Each blames the other for this failure, and so they wind up disagreeing about policy after all. But because the disagreement remains framed as being about the science rather than about what to do, the arguments remain intractable. The very fact of such disagreement, ironically, further undermines the idea that we can simply “follow the science” to reach incontestable policy decisions.

Are both sides therefore “anti-science”? After all, isn’t the whole point of science-based policy to base political decisions on objective evidence rather than subjective preferences or ideologies? Shouldn’t “the science,” rather than our political worldviews, dictate our policies?

It is precisely this illusion that the pandemic should have dispelled. We need to use evidence to make and evaluate policy decisions. But doing so always requires making judgment calls, weighing uncertainties, accepting tradeoffs, and grappling with conflicting values and goals. It’s to be expected, then, that those with divergent worldviews may disagree about what policies are called for under which circumstances, and that they will continue to disagree even as we learn more about the effectiveness of those policies. This is so precisely because different people are apt to make different judgment calls, weigh uncertainties differently, accept different tradeoffs, and prioritize different values and goals.

Ultimately, questions about whether and how to act on the evidence at hand are not strictly scientific in nature. Should we impose an intervention or not? What kind or amount of evidence is sufficient to act? What are the potential downsides of action or inaction? These kinds of questions fall under the umbrella of what classical philosophers called “practical reason”: reasoning about means and ends.

Practical reasoning is always difficult, especially in a large and highly polarized democracy such as ours. What makes practical reasoning all the more difficult is that evidence for policy is very often messy and uncertain, especially in moments of crisis. As a result, disagreements over science-based policy will almost inevitably bleed over into disagreements about the nature and extent of the evidence for any proposed policy. Though unavoidable, this is all the more reason for experts to be open and honest about the difficulty.

Instead, too often during the pandemic, policymakers, experts, and media elites cast what were fundamentally political decisions as purely technical matters to be decided by “the science.” In so doing, they misrepresented to the public both the nature of the evidence and the role of scientific experts in democratic decision-making. We should fault them not for being unscientific, but for trying to be too scientific: for being scientistic.

When it comes to complex, high-stakes political decisions, we cannot outsource responsibility to science. In such situations, the questions that ultimately matter most are: What are we trying to achieve? What should we be trying to achieve? Evidence and experts can help us grapple with these questions, but they cannot provide answers for us. Our leaders failed to learn this lesson during the pandemic. Their critics are in danger of unlearning it now, just because a couple of new studies appear to support their policy preferences.

We do need scientific evidence to inform policy decisions. But we also need the capacity to deliberate well about the best course of action under the circumstances at hand. This requires practical wisdom, what Aristotle called “prudence”: the ability to discern the proper ends and determine the appropriate means to achieve them. No scientific study, no matter how well powered or cleverly designed, can give us that.