Calculating COVID-19

How researchers are using math to predict and understand a global pandemic

Note: I researched, interviewed, and wrote the bulk of this in mid-March 2020. Unable to get it published, overwhelmed by the pandemic and the need to finish other work, I left it sitting, untouched from March 22, 2020 until about last week. I’ve fleshed out parts of the draft that were incomplete (relying solely on my notes from the time) but tried to avoid injecting future knowledge, so as to preserve it in some kind of digital amber. It is my hope that the reader finds this a useful addition to the record of epidemiological knowledge of that time.

Forty-eight confirmed cases of coronavirus in the entire world existed on January 17, 2020. The bulk—45 cases—were in Wuhan, one was in Japan, and two were in Thailand. What was there to glean from a few dozen cases inside China and three lonely data points in nearby countries? For epidemiologists, that meager information was more than enough. 

Using the three cases abroad and travel data from Wuhan, two teams of researchers worked backwards to estimate the size of the epidemic. They calculated likely infections in Wuhan based on the odds of exporting three cases to Japan and Thailand. Conservative estimates for Wuhan using the method predicted 1,250 and 1,700 infections, respectively. Worst-case scenarios ranged from 4,000 to 5,000. By the time testing in Wuhan ramped up at the end of January, the estimates proved prescient as thousands of people were discovered to have been infected.
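The logic of that back-calculation fits in a few lines of Python. This is a minimal sketch, not either team's actual method: only the three exported cases come from the text, while the catchment population, daily passenger count, and detection window are illustrative assumptions.

```python
def estimate_outbreak_size(exported_cases, daily_intl_passengers,
                           catchment_population, detection_window_days):
    """Back-calculate total infections from cases detected abroad.

    Assumes every infected person is as likely to travel as anyone else
    and that every exported case is detected; both are simplifications.
    """
    # Chance that any one resident flies abroad on a given day.
    daily_travel_prob = daily_intl_passengers / catchment_population
    # Chance that an infected person travels while still detectable.
    export_prob = daily_travel_prob * detection_window_days
    return exported_cases / export_prob


# Hypothetical inputs: 19 million people served by the airport, 3,300
# international passengers a day, 10 days from infection to detection.
estimate = estimate_outbreak_size(
    exported_cases=3,
    daily_intl_passengers=3_300,
    catchment_population=19_000_000,
    detection_window_days=10,
)
print(f"Estimated infections in Wuhan: {estimate:,.0f}")
```

With these assumed inputs the estimate lands in the same range as the conservative figures above. Three exported cases imply well over a thousand infections because so few residents fly abroad on any given day.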

In the two months that have passed, COVID-19 has become a full-blown pandemic. Countries like Italy and Iran have buckled under the strain of thousands of cases; others, like Singapore, South Korea, and Taiwan, seem to have wrestled the virus into an uneasy submission. As of this writing, more than 180 countries around the world have confirmed cases.

Armed with data and a bevy of epidemiological approaches, scientists around the world have worked at a frenetic pace, publishing hundreds of papers about COVID-19. The research ranges from quantifying basic characteristics of the virus to creating complex models that can predict the impact of specific interventions, like school closures. 

Their findings, typically reserved for dusty journals and rarely downloaded pdfs, are now widely discussed on platforms like Twitter. Once obscure epidemiological terms like R0 now make headlines. Epidemiological research shapes how we understand the outbreak and guides the moves of policymakers around the world.

Like everything else, misinformation about the epidemiology of COVID-19 is rampant. Some non-experts perform their own data analyses and self-publish on sites like Medium where they garner millions of views; others simply turn to the op-ed pages of the New York Times. Claims that 40 or 70 percent of the world will be infected are tossed out with little to no context.

But the research isn’t incomprehensible. It’s more than possible to get a clear grasp on what epidemiologists know about COVID-19 so far and how they learned it. Even the calculations they make and the math they use can be made lucid.

Compared to the obvious role of drugs and medical techniques, math is an unlikely hero in an outbreak. The patterns mathematical models identify can help researchers make predictions that have real practical value, such as how many cases there are likely to be, or what the efficacy of a quarantine is.

“There hasn’t been time for the sort of wet lab biological experiments to figure out all the characteristics of the virus itself,” says Kimberlyn Roosa, an epidemiological researcher at the Georgia State School of Public Health. “These studies can inform things like how long we think isolation should be, and how testing could affect when we should try and test individuals and isolate them to reduce the number of transmission.”

Mathematical models are often a dry and—at least in the media—overlooked area of research. Theory rarely draws the attention that experiment does. But when it comes to forecasting early in an epidemic, these models are often all there is to work from. Theory provides the only measure of predictability in an otherwise chaotic situation. By reducing the messy human world and viral distribution to simpler models and equations, by shrinking the aggregate of activity in a city to a parameter, a virus can be understood and eventually tamed. 

It is not a hopeless endeavor. In many cases, the math seems to work unreasonably well. Laws of disease are not hard-coded into the fabric of our universe, as, say, the speed of light or Newton’s laws of motion. But patterns turn up unerringly, and the course of an epidemic can be tracked, explained, and even predicted with math from simple growth equations to complex, multivariable models. 

This is not just a story about what the predictions made by researchers are; it is also about how the researchers made those predictions and which mathematical tools they used. In that sense, it is a story that aims to help the reader grasp the calculation of COVID-19.

And there has never been a more important time to understand the math.

The Data 

COVID-19 is the first truly 21st century pandemic. Nowhere is that clearer than in the way data about the outbreak has been tracked down, collected, disseminated, visualized, and stored. 

One of the first transmissions of data about COVID-19 occurred on December 30, after Ai Fen, a doctor at Wuhan Central Hospital, received a test result for a patient. Ai informed her hospital and sent a photo of the test to colleagues with the critical information circled in red pen: “SARS coronavirus, Pseudomonas aeruginosa, 46 types of oral / respiratory colonization bacteria.” The image made its way to another doctor, Li Wenliang, who wrote “There are 7 confirmed cases of SARS at Huanan Seafood Market” to a group of 150 medical school classmates on WeChat, a popular messaging app in China. Screenshots of Li’s message circulated rapidly online, and on January 3, he was threatened with prosecution for “making false comments on the internet” by Chinese officials.

News about the infections quickly spread beyond China. The World Health Organization’s office in China was alerted to “pneumonia of unknown origin” on December 31, and by January 3, national authorities in China informed the WHO that there were 44 patients with symptoms.

At that same time, local officials ordered the destruction of virus test samples and China’s National Health Commission (NHC) prohibited publication of information about the virus. The Chinese government would sit on information about the virus’ genetic origins for a week, and it would be over two weeks before the government admitted that the virus spread from person-to-person. Starting on January 21, the NHC began to aggregate regional reports and publish plaintext daily updates with epidemiological information including confirmed infections, recoveries, deaths, and the number of close contacts tracked. 

Data is the basis for all predictions about COVID-19. No pre-existing model, no matter how sophisticated, can work in the absence of data. Previous outbreaks can guide researchers, but they aren’t a substitute for what’s happening on the ground. 

“The data we use is from those published online by the National Health Commission of China,” says Roosa. “So they publish daily and we have someone who daily goes on there and gets the data and sends it to us.” 

When a team from Johns Hopkins University (JHU) launched the first global coronavirus tracker on January 22, the world map had only a few splotches of red—mostly in China. Now, enormous red circles blot out much of the globe. By mid-February, the site had over 140 million views, and due to its popularity, copycat sites with malware on them had popped up. 

At first, the team updated the site manually, adding data to a Google spreadsheet, primarily from a social network of physicians that has been tracking cases and aggregating reports from hospitals and municipal governments to provide a real-time map of COVID-19 in China.

Data continued to pour in—from the WHO, the European Centers for Disease Control and more—so the JHU team automated most of the system to update every 15 minutes and moved the data to GitHub, a website typically used by programmers to host their code. This automation allowed it to give more timely updates than the WHO and Chinese CDC. On GitHub, the data is stored in spreadsheets that can be easily accessed by other researchers. Though the site is primarily used by coders and researchers handling data, people in China have even used GitHub as a censor-proof archive for news articles and personal accounts. 

In the U.S., data on the amount of testing, which is critical, has been difficult to track down due to the lack of centralization, which has led to efforts like the COVID-19 Tracking Project.

Demand for data comes not just from policymakers and researchers, but everyone who wants to stay informed about the spread of the pandemic.

“In terms of who is using this dashboard, as far as I can tell it’s pretty much everybody,” said JHU epidemiologist Lauren Gardner in a presentation. “I think this really speaks to this huge demand for reliable, trustworthy, objective information—especially around situations like these.” 

Uncertainties about data have been around since the dawn of epidemiology. When the Swiss physicist Daniel Bernoulli developed the first mathematical model to predict the spread of smallpox in 1766, he ran into problems with data provided by the British astronomer, Edmund Halley. Though Halley listed the number of children in Breslau who made it to one year old as 1,000, he did not provide information about how many had died in their first year. “It would appear that M. Halley wished to start with a round number,” Bernoulli wrote, evidently frustrated at the lack of precision. 

Though data collection is now modernized, organized in Excel spreadsheets and updated in real time, researchers still face many of the same uncertainties Bernoulli did. For example, many researchers are questioning whether the cases we’re seeing are really representative of the true number of infections.

“The data collected so far on how many people are infected and how the epidemic is evolving are utterly unreliable,” Stanford data scientist John Ioannidis wrote in an op-ed. “We don’t know if we are failing to capture infections by a factor of three or 300.”

Additionally, China’s report of zero new domestic cases has caused some to wonder aloud if Chinese officials are suppressing data, in part because of the initial cover-up.

But every country, not just China, is undercounting cases because there are a large number of asymptomatic and mild cases (estimates range from 18 percent to 50 percent). Many researchers believe that without universal testing, we will miss roughly half of all infections because the virus is mild or asymptomatic in many infected individuals. Paradoxically, this mildness has made the virus more deadly overall: If those who were infectious were incapacitated, the virus would be much worse at spreading.

When testing was minimal in China, prior to January 23, researchers estimate that only 14 percent of cases were documented. Once testing ramped up in February, that number jumped to about 65 percent. In other words, minimal testing only saw the tip of the iceberg, and even rigorous testing left massive gaps.
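A rough sense of the iceberg comes from dividing confirmed cases by the documented share. The sketch below uses the two documented fractions from the text; the 1,000 confirmed cases are a hypothetical round number chosen only to make the scaling visible.

```python
def implied_infections(confirmed_cases, documented_fraction):
    """Scale confirmed cases up by the share of infections that get documented."""
    if not 0 < documented_fraction <= 1:
        raise ValueError("documented_fraction must be in (0, 1]")
    return confirmed_cases / documented_fraction


confirmed = 1_000  # hypothetical count, not a reported figure
for label, fraction in [("minimal testing", 0.14), ("ramped-up testing", 0.65)]:
    total = implied_infections(confirmed, fraction)
    print(f"{label}: {confirmed:,} confirmed -> about {total:,.0f} infections")
```

At 14 percent documentation, every confirmed case stands in for about seven infections; at 65 percent, for about one and a half.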

Uncertainties about the data remain, but we are far from ignorant. With more testing, the data we have will only get better and so will our understanding of the virus.

Modeling pt. 1

A quote often attributed to the Danish physicist Niels Bohr goes something like this: “It is difficult to make predictions, especially about the future.” In no small irony, even the quote’s origins are murky. Compared to Bohr’s work making predictions in the subatomic realm, modeling outbreaks is substantially trickier, though epidemiologists emphasize it has advanced by leaps and bounds in the past decade, much as weather forecasting has.

“At one time hurricanes just arrived in Miami with little notice and there were no opportunities to prepare,” says Glenn Webb, a mathematical epidemiologist at Vanderbilt University. Just as meteorologists track the possible paths of a hurricane, Webb says his goal is to “predict the path of an epidemic.”

What those paths can look like depends a great deal on how infectious a disease is. Epidemiologists keep track of epidemics with a number they call R0, pronounced “R-naught.”

“R0 is just a very traditional parameter we use to estimate the number of infected cases generated by the primary cases,” says Qiao Fan, an epidemiologist at the Center for Quantitative Medicine in Singapore.

For example, an R0 of 2 means that on average, one person infects two others; an R0 of 1 means that one person usually infects only one other person. Below one, the spread shrinks; above it, the disease can spiral out of control, into an epidemic. It is not a magic number, but a statistical proxy for contagiousness, and exceptions do exist.

Determining what R0 is is critical to understanding the virus. But calculating the value isn’t so straightforward—there’s nothing in the virus’ genetic code that gives an unambiguous answer. R0 also depends on circumstantial factors, such as time of year and location. There are no lab experiments where infected volunteers cough near susceptible volunteers. Instead, researchers have to peer into the data and draw out a value for R0.

One way of doing this is by looking at how quickly the total number of cases grows and working backward to the rate of spread that would produce it. In the early days of an outbreak, researchers do this by relying on messy and limited data. The problem is that the approach lacks precision: especially when there are many unreported cases, the growth of the total case count is a poor approximation. More targeted approaches to sussing out R0 use contact tracing, in which researchers look at infected individuals and count how many people each one infected, on average.
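The growth-rate approach can be sketched as follows. The case counts are synthetic, the seven-day infectious period is an assumption, and the conversion R0 = 1 + r × D is a textbook approximation for the simplest epidemic models rather than the method of any particular study mentioned here.

```python
import math


def fit_growth_rate(daily_cumulative_cases):
    """Least-squares slope of log(cases) against day number."""
    days = range(len(daily_cumulative_cases))
    logs = [math.log(c) for c in daily_cumulative_cases]
    n = len(logs)
    mean_day = sum(days) / n
    mean_log = sum(logs) / n
    covariance = sum((d - mean_day) * (y - mean_log) for d, y in zip(days, logs))
    variance = sum((d - mean_day) ** 2 for d in days)
    return covariance / variance


def r0_from_growth_rate(growth_rate, infectious_period_days):
    """Simple approximation: R0 = 1 + r * D."""
    return 1 + growth_rate * infectious_period_days


# Synthetic counts that grow about 20 percent a day; not real data.
synthetic_cases = [round(50 * 1.2 ** day) for day in range(10)]
r = fit_growth_rate(synthetic_cases)
doubling_time = math.log(2) / r
r0 = r0_from_growth_rate(r, infectious_period_days=7)  # assumed duration
print(f"growth rate {r:.3f}/day, doubling every {doubling_time:.1f} days, R0 ~ {r0:.1f}")
```

The weakness described above is visible in the code: the fit sees only reported cases, so any change in testing or reporting shows up as a change in the apparent growth rate.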

So, what is R0 for SARS-CoV-2? There are dozens of estimates, but most settle around 2.5 and modelers tend to use values around there as a baseline when making predictions.

At the beginning of the outbreak, multiple teams measured the outbreak in Wuhan to have R0 closer to 4 or 5. This could have been due to inadequate data, but values that high are not totally out of the question. Under conditions like a 40,000 person potluck, the virus could spread far more prolifically. Notably, after China implemented the lockdown, R0 went down, as low as 0.3, as each infected individual cut off routes of transmission. Some of the highest R0 calculations were found for a cruise ship, the Diamond Princess. Estimates averaged around 6-7 prior to intervention. With ultra-dense conditions (roughly 24,000 people per square kilometer), the disease spread furiously.

Growth is also asymmetric, according to David Fisman, an epidemiologist at the University of Toronto. The characteristic shape of the number of cases in an epidemic has a steep slope upward, until a peak, and then a drop with a long tail.

“The level you get to before you shut this down is critically important not just because you had more cases getting there, but because the journey to the finish line is now that much longer, from that much higher a peak,” Fisman says. 

At the beginning of an outbreak, when answers about the nature of the virus transmission and detailed data are scarce—the incubation period and number of asymptomatic infections are as yet undetermined—researchers rely on phenomenological models which seek mainly to reproduce the growth patterns present in the data. 

“In the absence of sufficient data, it would be wise to have a model that has less model parameters involved,” says Jianhong Wu, an applied mathematician at York University in Toronto. “The phenomenological model is useful, especially when you don’t know or know much about the mechanism of transmission.”

More complicated models that rely on assumptions could go wrong, Wu argues, so fewer parameters means a smaller chance for error. These models can be as simple as an exponential curve with one parameter governing growth over time. 

Some more complex phenomenological models try to fit the data to stochastic, or random, processes that mimic the spread of a virus. One such process has its roots in Victorian England, where a major concern of the ruling class was that the aristocracy was dying out. “Surnames that were once common have since become scarce or have wholly disappeared,” the statistician and founder of eugenics Francis Galton wrote in 1875.

To determine if this was a result of fertility problems or part of some sort of mathematical inevitability, he enlisted the help of a mathematician, the Reverend Henry William Watson, to make the calculation. Watson found that the nefarious force was math itself. If surnames were passed down patrilineally—from father to son—and each family had some chance of not having a son reach adulthood, the family tree of surnames would be continually pruned, leading to fewer surnames over time. 

This branching process turns out to apply to far more than just English surnames. It has uses for describing the survival of a genetic mutation, the start of a nuclear chain reaction, and even the spread of viruses. For an epidemic, the branching process models how each individual creates a new generation of infections. When the branch goes extinct, instead of a surname going extinct, it means that the viral transmission is cut off. Key questions about how contagious a disease is can be rephrased in terms of this branching process.

“We do the same in epidemics—just how many of these branches will go extinct?” says Gergely Rost, a mathematical epidemiologist at the University of Szeged in Hungary.
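Rost's question has a compact numerical answer under one common simplifying assumption: that each case infects a Poisson-distributed number of people with mean R0. That assumption is mine, for illustration, and real transmission is often more uneven. The chance that a chain started by a single case dies out is then the smallest solution of q = exp(R0 × (q − 1)), which the sketch below finds by repeated substitution.

```python
import math


def extinction_probability(r0, iterations=10_000):
    """Chance that a chain started by one case dies out on its own.

    Assumes Poisson-distributed secondary cases with mean r0. Starting
    from q = 0 and iterating converges to the smallest fixed point.
    """
    q = 0.0
    for _ in range(iterations):
        q = math.exp(r0 * (q - 1.0))
    return q


for r0 in (0.8, 1.5, 2.5):
    q = extinction_probability(r0)
    print(f"R0 = {r0}: a single introduction fizzles out with probability {q:.2f}")
```

Below an R0 of 1, every branch eventually goes extinct. Above it, extinction is still possible but becomes less likely as R0 grows, which is why a few imported cases can sometimes fail to spark an outbreak.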

Using basic assumptions like R0, epidemiologists can calculate what the branching spread of an outbreak looks like. Because phenomenological models are simple, or rely on random spreading processes to guide them, they “just kind of let the data do the talking,” according to Roosa.

“The downside to that is with the phenomenological models, we can’t assess different intervention strategies,” she says. “For forecasting purposes, the simple models are good, but for generating scenarios, not so much.” 

Modeling pt. 2

Enter the compartmental model—so termed because it segments the population to better account for the mechanisms of spread.

“The fundamental idea is try to partition or stratify the entire population into different disjoint groups according to their epidemiological status,” says Wu. “Being susceptible, being infected, being recovered.” 

Different models use slightly different variations on the basic SIR (susceptible, infected, recovered) partition; some add an exposed category. These models are critical for generating long term predictions, far beyond what can easily be extrapolated with a simple phenomenological model.
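A bare-bones SIR model takes only a few lines. The sketch below is a textbook version stepped forward in small time increments, not any of the researchers' actual models; the city of one million, the seven-day infectious period, and the R0 of 2.5 are illustrative assumptions.

```python
def simulate_sir(r0, infectious_days, population, initial_infected, days, dt=0.1):
    """Integrate the basic SIR equations with small Euler steps."""
    gamma = 1.0 / infectious_days       # recovery rate per day
    beta = r0 * gamma                   # transmission rate per day
    s = population - initial_infected
    i = float(initial_infected)
    r = 0.0
    history = []                        # (day, susceptible, infected, recovered)
    steps_per_day = round(1 / dt)
    for step in range(days * steps_per_day + 1):
        if step % steps_per_day == 0:
            history.append((step // steps_per_day, s, i, r))
        new_infections = beta * s * i / population * dt
        new_recoveries = gamma * i * dt
        s -= new_infections
        i += new_infections - new_recoveries
        r += new_recoveries
    return history


# Hypothetical city of one million with no interventions.
run = simulate_sir(r0=2.5, infectious_days=7, population=1_000_000,
                   initial_infected=10, days=300)
peak_day, s_at_peak, peak_infected, _ = max(run, key=lambda row: row[2])
final_share = run[-1][3] / 1_000_000
print(f"peak on day {peak_day} with {peak_infected:,.0f} infected at once")
print(f"share ever infected by day 300: {final_share:.0%}")
```

In this unmitigated scenario, infections peak once the susceptible pool has shrunk to roughly the population divided by R0, and most of the city is eventually infected. Interventions are meant to prevent exactly that trajectory.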

“The usual way an epidemic is controlled is that so many people get infected, they have immunity,” says Webb. “There’s not enough susceptible people out there to support the epidemic.” 

This is the calculus that has been filtered down to predictions like “40 to 70 percent of the world will be infected” or California governor Gavin Newsom’s claim that 25 million people in California will be infected, or those which fueled the UK’s initial strategy. These sky high numbers are not a realistic scenario, according to Roosa, because they assume little to no intervention. 

But they make a certain kind of sense. From an epidemiological perspective, a virus only stops if there are no more susceptible people. For that to happen, it usually requires a huge reduction in the susceptible numbers either due to acquired immunity or a vaccine. In countries where coronavirus has been relatively controlled, like China, Taiwan, South Korea, and Japan, that is not the case. The vast majority of the population is naive, or not immune. Massive interventions have changed the course from the usual to a sort of artificially tamped-down level.
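One standard way figures like “40 to 70 percent” can arise is the herd immunity threshold, 1 − 1/R0: the share of the population that must be immune before each case infects fewer than one other person on average. The sketch below illustrates that formula only; it is not necessarily how the quoted predictions were derived, and the R0 values bracketing 2.5 are chosen for illustration.

```python
def herd_immunity_threshold(r0):
    """Immune share at which each case infects fewer than one other
    person on average: 1 - 1/R0 (zero if the disease cannot spread)."""
    if r0 <= 1:
        return 0.0
    return 1.0 - 1.0 / r0


# Illustrative R0 values around the 2.5 baseline cited earlier.
for r0 in (1.7, 2.5, 3.3):
    print(f"R0 = {r0}: roughly {herd_immunity_threshold(r0):.0%} immune")
```

The threshold marks where an unchecked epidemic starts to decline, not where it ends; infections continue past that point, so the share ultimately infected can be higher.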

By subdividing the population, compartmental models offer a chance to understand the mechanics behind these interventions and more.

“We’re focused on the unreported cases, and the pre-symptomatic cases that are infectious,” says Webb. “There’s no doubt that there’s people who are not showing symptoms that are infectious. And also there are many unreported cases. Both are a little mysterious, but we wanted to include them because they’re definitely a factor in the transmission.”

Many compartmental modelers feel that their goal is not to pinpoint equations that may perfectly forecast the outbreak, but to capture the broader dynamics of the system.

“It’s not really about being as correct as possible. It’s more about like, ‘which mechanisms are sufficient to describe the data?’” says Ben Maier, an epidemiologist at Humboldt University in Germany. Getting the mechanisms right can provide better long term forecasting than a short term prediction which is right for the wrong reasons. 

Some of these compartments have subdivisions. Wu’s model, for example, adds a “quarantine” compartment within the exposed compartment, and subdivides an exposed component into identified, confirmed, and hospitalized compartments. The goal is to reflect strategies being employed by countries like South Korea and Singapore. By creating compartments for “quarantined” and “identified” individuals, Wu’s model aims to capture the results of contact tracing, in which everyone who possibly came in contact with the infected is placed into quarantine until their test results come back.

So, which model should be used? Phenomenological? SIR? SEIR? Here, scientists tend to agree in unsatisfying consensus: No one model is necessarily the best; each has different strengths and weaknesses—as Wu puts it, the best model “depends on what types of issues you’re trying to address.” 

The modeling situation is dynamic; predictions based on data only a few days old may be thousands of cases behind and a poor reflection of the current situation. Other models perform with remarkable accuracy weeks later. 

We know this largely due to the way the research is being done and published. Normally, epidemiological research goes through a publication process that includes months of peer review. But to accelerate their research and conversations, scientists have been using preprint servers like MedRxiv to rapidly publish papers prior to peer review and the typical publication cycle. For Fisman, who remembers how slow publication led to a lagged response to the 2014 Ebola crisis, adoption of preprints has been a “godsend.” 

As the latest round of predictions rolls out, it is providing even more rigorous and sophisticated models of China, which has the most data, and, critically, estimates about the outbreak in other countries: Singapore, South Korea, Japan, Italy, Iran, and the United States.

One conservative estimate for the U.S. based only on direct air travel from Wuhan to the U.S. found that by March 1, there could be as many as 9,484 cases. At the time of the preprint’s release on March 8, the U.S. was reporting only 500 cases.

Another key result: Interventions are massively important. A February 28 preprint estimated that the epidemic would level off around 81,000 total cases by March 21, but if interventions had been implemented one week earlier, the number would be about 6,000 total cases, more than ten times fewer. Conversely, if China had waited another week to intervene, due to the virus’ exponential spread, the total number of cases would be about 1.2 million.
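A quick calculation shows how steep the growth must be for a single week to matter that much. The sketch below treats the ratio between the preprint's totals as if it came from pure exponential growth during the delay. That is a simplification of mine, not the preprint's model, but it gives a feel for the numbers.

```python
import math

# Totals quoted from the February 28 preprint, keyed by how many days the
# interventions are shifted relative to what actually happened.
totals_by_shift = {-7: 6_000, 0: 81_000, 7: 1_200_000}


def implied_daily_growth(cases_before, cases_after, days_apart):
    """Daily exponential rate that would turn one total into the other."""
    return math.log(cases_after / cases_before) / days_apart


early = implied_daily_growth(totals_by_shift[-7], totals_by_shift[0], 7)
late = implied_daily_growth(totals_by_shift[0], totals_by_shift[7], 7)

for label, rate in [("one week earlier vs. actual", early),
                    ("actual vs. one week later", late)]:
    factor = math.exp(rate * 7)
    doubling = math.log(2) / rate
    print(f"{label}: x{factor:.1f} per week, doubling every {doubling:.1f} days")
```

Both comparisons imply cases doubling roughly every two days, fast enough that each week of delay multiplies the final toll more than tenfold.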

There are, at this point, dozens and dozens of papers and preprints about the epidemiology of the pandemic, but two conclusions are inescapable: interventions must be sufficiently restrictive and they must come early. If they fail in either regard, many, many more people will become sick and die.


What we know about the beginnings of SARS-CoV-2, the technical name for the virus, is that it originated in a non-human animal, and that it has great genetic similarity to coronaviruses found in regional bats. At some point in November, the infection was passed on to a human, and it spread from the now infamous Huanan Market, in Wuhan.

In December, as people began to fall sick, local doctors noticed something odd. They began to put the pieces together and send out warnings to their colleagues. At the end of December, eight doctors including Li Wenliang attempted to get the word out. Their attempts were silenced, and the response was stalled. A month later, Li would die from coronavirus.

Xi Jinping was well aware of the epidemic early on, which he discussed in an internal meeting on January 7, though it would be 13 days until he made public comments. As of January 23, there were still fewer than 1,000 confirmed cases in Hubei province. Wuhan was placed under lockdown and severe travel restrictions were placed across China.

As of this writing, China has had a total of 80,000 confirmed cases and many of those cases have resolved and the victims have recovered. China therefore offers an almost complete trajectory of COVID-19 through a country—and there is much to learn.

We know now that the very first estimates of spread in China by Imai et al. and Chinazzi et al., which predicted thousands of untested cases, were largely correct: Under minimal testing and no interventions, the virus had been spreading, well, virulently.

Over the next few weeks, multiple teams attempted to predict the cumulative number of cases in China. Some used a phenomenological model while others used a mechanistic model. Almost all teams ended up being off, in no small part due to the fact that China changed the way it counted cases from February 13 to February 19. The change caused an unexpected spike of about 15,000 cases, mostly centered in Hubei. 

“Originally they were reporting only lab-confirmed as their confirmed cases,” says Roosa. “And then on February 13, they changed it to include all those who also had clinical symptoms and hadn’t been confirmed yet.” Less than a week later, they went back to counting only lab-confirmed cases. Roosa’s model, which was accurate for the rest of China, estimated the number of cases in Hubei by Feb. 24 would be 37,000. The actual number of reported cases was 65,000.

Several models by non-epidemiologists attempting to predict the final total ended up lowballing the number of cases around 45,000–50,000. Estimates by epidemiologists were closer: 63,600 for Maier’s model, about 63,000 for Wu’s model.

It is worth reiterating that these curves which level off, suggesting an end to the epidemic in China, are not natural. Only when China implemented a lockdown on Wuhan and quarantine on the rest of China did the situation change. Wu found that prior to lockdown, R0 in Hubei was about 6.4; a week later it was 1.6; a week after that it was 0.4. Maier saw similar results: the number of infected grew exponentially until growth hit the brick wall of quarantine.

Lockdowns and quarantines were initially mocked or criticized by many Western researchers who felt them unfeasible. How could such measures really work? How could a full quarantine work on 1.2 billion people, and what would be the point if the population would still be naive afterwards? Meanwhile, many Chinese epidemiologists had worked on these papers while quarantined inside their apartments. 

Wu, who was born in China, credits the success of the interventions to a culture of resilience. Other countries with democratic governments like South Korea and Taiwan have also implemented their own version of the quarantine/suppression and contact tracing to keep numbers low. 

“These measures were extreme and we viewed this as a very drastic change in transmission that occurred at a certain date,” says Webb. “Before that, the transmission was constant, because there was an exponential growth phase.”

When the measures were implemented mattered most, as Webb’s Feb. 28 paper showed. Roughly speaking, interventions a week earlier would have resulted in one-tenth the number of cases, while waiting a week longer would have resulted in 10 times as many. Other effects, like the ratio between the number of reported and unreported cases, were much less consequential, only doubling the total cases. (More unreported cases implies a hard-to-track population that is nevertheless infectious.) Similarly, changing the incubation period, lessening the amount of time between when people were infected and when they started being infectious themselves, produced little change in the cumulative cases.

Getting to these measures is no small matter. It requires an immense amount of political capital. When I asked Fisman about the feasibility of implementing quarantines on March 8, he emphasized the difficulty.

“Oh my God, you can’t do that unless people are terrified,” he says. Conditions often already have to be bad enough that “they’re ventilating people in hallways, their hospitals are overfilled and people are dying. What you’re looking at with this disease is situations where you literally cannot care for people.” 

More contagious than the flu, SARS-CoV-2 is also far deadlier. If the epidemic peaked in Canada, Fisman estimates that roughly 0.7 percent of the adult population would require a ventilator. Canada has roughly 30 million adults, so 210,000 would require a ventilator. No country in the world has close to that many ventilators; a 2018 report estimated that at best, the U.S. could ventilate 160,000 patients.

The case is clear: Our only option to avert this dire scenario is massive suppression measures to reduce R0 and force the outbreak’s growth down to manageable levels, at which point it is possible to switch to contact tracing and containment. 

“It requires massive social distancing. That’s no football matches. That’s no concerts. That’s no school. That’s no working from the office. Anyone who can work from home is working from home and you’re shutting places down,” says Fisman. “The difficulty for Italy and for all of us, is you can do that—you can mobilize political capital to do that, when you’re in the middle of the crisis, and people are seeing deaths around them. When it’s more effective is before. That’s the problem.”

If suppression is achieved, then contact tracing and containment strategies can kick in. As Wu describes it, the strategy relies on teams who do quick tests to identify the infected, who are put into isolation. Then, investigators search for anyone the infected individual might have been in contact with while contagious. A February 11 preprint found that, given R0 of 2.5 to 3.5, roughly 70 to 90 percent of contacts would have to be traced in order to keep outbreaks suppressed.

However, contact tracing also depends on the number of presymptomatic and asymptomatic cases and how much these cases—which are temporarily or completely unreported—drive infections.

Within China, differences between provinces are stark. While Wuhan City in Hubei was the epicenter of the crisis, the virus spread throughout China, necessitating intervention measures nationwide. Unlike Hubei, the virus did not get the chance to grow exponentially in other provinces. Interventions prevented that, and the growth remained relatively low. 

These infections in other provinces did not necessarily act in expected ways. For instance, Heilongjiang, a province in the northeast of China far from Wuhan, had the highest infection growth rate of any province. Conversely, Beijing, an ultra dense city, had the lowest growth rate. According to Fan, this was a result of stricter measures in Beijing than other areas.

Tianyi Li, a Ph.D. candidate at MIT’s Sloan School of Management, used his experience modeling complex systems to come up with a network that looked at how transportation modes led to spread across China. 

“Especially in China, where the public flow is massive, you have to consider the local dynamics,” says Li. “On different transportation media, the transmissivity is different.” For example, on planes, fewer people talk, leading to less transmission. But on trains, talking amongst neighbors is common, which could lead to high transmission.

There are other differences between modes of transit. Notably, neither cars nor planes have path overlap. But on a train, say, from Shanghai to Beijing, passengers will board at stops along the way. Path overlap greatly increases the chance of cross-infection, allowing the virus to disseminate widely and quickly. Li found that big cities accounted for 15 percent of transit-driven infections, while small cities contributed nearly zero cases—it was all driven locally after an infection was imported.

One other important transit factor is the time of year: around January, during Chinese New Year, hundreds of millions travel. By many estimates, it is the largest holiday travel on the globe. While some of that travel was cut short by restrictions, some of it had already happened in January before things were locked down. 

An important value to determine from the data is the incubation period, or the time during which a person is infected but doesn’t yet show symptoms. For the coronavirus, it has proven quite short, at roughly 5 days.


“There’s this obsession early in an epidemic with where the cases came from,” says Fisman. 

When King Charles VIII of France invaded Italy in 1494, his soldiers were stricken with a plague. After they returned home, they spread the disease far and wide. The French called it the Italian disease; the Italians called it the French disease. For the Dutch, it was the Spanish disease, for the Russians, the Polish disease, and for the Ottoman Turks, it was the Christian disease. We know it today as syphilis. 

Enmity can spread much like a virus, but there’s much to learn from other countries. In particular, exported cases can be a powerful indicator of how far the virus has spread within the country of origin. This is what Imai et al. and Chinazzi et al. did to predict cases early on in China, and it’s what Fisman’s team has been able to do for Italy and Iran. From 46 exported cases abroad and travel data, the researchers estimated that the true number of cases in Italy on Feb. 29 was roughly 4,000, not the 1,128 that were counted. For Iran, they found a much larger discrepancy: 18,000 estimated cases against only 43 reported cases, as of February 23. A Hong Kong-led team came to a similar conclusion, estimating around 16,500 cases on February 25. Both countries then underwent an explosion of confirmed cases and deaths.

“Someone’s termed it ‘forensic epidemiology,’ which I quite like,” says Fisman. “When it’s a new outbreak and people are catching up with testing and they don’t know what the extent of the outbreak is inside the country, you can indirectly estimate what the size must be if you have access to travel volumes.” 

It’s a little like figuring out how much confetti was dropped at a party by counting the number of people at nearby restaurants with paper stuck to them. For Fisman, the takeaway is not that there are exactly 4,000 cases in Italy; it’s that the exported cases suggest Italy is missing a ton—maybe 75 percent of their cases.
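
The back-calculation Fisman describes can be sketched in a few lines of Python. This is a minimal illustration, not the researchers' actual model: it assumes every infected person is as likely to fly abroad as anyone else, and that a case can only be exported during a fixed window before detection. The function name is hypothetical, and the inputs (catchment population, daily outbound travelers, detection window) are illustrative values chosen to be roughly the scale of a large travel hub, not figures from this article.

```python
def estimate_outbreak_size(exported_cases, daily_outbound_travelers,
                           catchment_population, detection_window_days):
    """Rough back-calculation of total cases from cases detected abroad.

    Simplifying assumptions: infected people travel at the same rate as
    everyone else, and a case can only be exported during the window
    between infection and detection.
    """
    # Chance that any one resident flies abroad on a given day.
    daily_travel_probability = daily_outbound_travelers / catchment_population
    # Chance that a single case travels abroad before being detected.
    export_probability = daily_travel_probability * detection_window_days
    # Expected exports = total_cases * export_probability; solve for total_cases.
    return exported_cases / export_probability


# Illustrative inputs (assumptions, not data from this article).
estimate = estimate_outbreak_size(
    exported_cases=3,
    daily_outbound_travelers=3_300,
    catchment_population=19_000_000,
    detection_window_days=10,
)
print(f"Estimated total cases: {estimate:,.0f}")
```

With these inputs, three exported cases imply an outbreak in the low thousands. As Fisman says, the value is in the scale of the gap between estimated and counted cases, not in the exact figure.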

Some countries, like Singapore, have proven remarkably resilient, and thus far avoided exponential growth, which several researchers attribute to diligent contact tracing, among other measures. One odd data point: Singapore has not required masks, which has surprised a number of researchers, including Jin Cheng, a mathematician at Fudan University in Shanghai. Cheng attributes a substantial amount of China’s success at quashing the epidemic to masks, but admits that Singapore seems to be succeeding with a different strategy. 

What’s the best way to reduce a country’s risk of an outbreak? A team of researchers at the University of Szeged in Hungary, led by mathematical biologist Gergely Röst, estimated the most efficient way to reduce risk while the virus was mainly in China.

“You achieve the largest reduction of your risk with the smallest effort,” says Röst. “If you have very low connectivity to China, for example, but very high local transmission potential, then the best thing to do is you reduce your connection more with China.” 

He gives the example of a soccer player who has strengths and weaknesses. According to Röst, it’s better for countries to focus on their strengths—low connectivity to countries with infection, or low local transmission potential—than try to shore up weaknesses. 

South Korea has proven its strength at reducing transmission. The country was on the way to out-of-control growth at the end of February, when nearly 1,000 cases were reported on a single day. But by mid-March, South Korea had flattened the curve, with fewer than 100 cases reported per day. How? One strategy has been to use high-tech contact tracing, leveraging masses of mobile data to track where infected people have been and whom they might have been in contact with. The approach has raised concerns about privacy, but the results are hard to argue with. 

Transmission may also be impacted by demographic-level differences. A February 27 preprint by researchers from the University of Warwick predicted reduced transmission across Africa, Central America, the Middle East and India, but high rates in Europe and Japan due to the age of the population, because older individuals are more susceptible to SARS-CoV-2.

Another factor can be temperature. The more time people spend indoors, the more likely they are to catch a virus, and cold weather is known to lower the strength of an immune system. Even so, it’s not everything. Countries like Singapore have done well primarily because of their intervention measures, says Wenbin Chen, a researcher on the Fudan University team.

A paper by a Portuguese team found that temperature and humidity contributed only 18 percent of the epidemic’s variance. That is, the difference in viral spread between regions in China was only partially due to differences in temperature. A standard doubling time—the duration it takes for confirmed cases to double—at 20 degrees Celsius might be 5 days, but at a steamy 40 degrees Celsius, the doubling time should increase to roughly 7 days, slowing down growth.
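
To see what that difference in doubling time means in practice, here is a minimal Python sketch assuming clean, uninterrupted exponential growth, which real outbreaks rarely follow. The function name, the starting count of 100 cases, and the 30-day horizon are illustrative assumptions, not figures from the paper.

```python
def cases_after(days, doubling_time_days, initial_cases=100):
    """Project case counts under pure exponential growth.

    Cases double once every `doubling_time_days`, so after `days` they
    have doubled days / doubling_time_days times.
    """
    return initial_cases * 2 ** (days / doubling_time_days)


# Compare the two doubling times over the same hypothetical 30 days.
for doubling_time in (5, 7):
    projected = cases_after(days=30, doubling_time_days=doubling_time)
    print(f"Doubling every {doubling_time} days: "
          f"{projected:,.0f} cases after 30 days")
```

Under these assumptions, the 5-day curve reaches 6,400 cases in a month while the 7-day curve stays under 2,000: a two-day change in doubling time produces more than a threefold gap.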


There are now thousands of people dying from the virus every day. Many are old and infirm, or immunocompromised. Others are young and healthy, like Li Wenliang. When patients die, they are converted into a numeral, shifted from one column to the next. A journey in human suffering across three compartments: susceptible, infected, deceased. 

In general, epidemiologists don’t make predictions based on deaths.

“This mortality rate is not necessarily just a scaled down rate of the infection with certain delay—it’s not that simple. It’s much more complicated,” says Wu. 

Comorbidities have powerful effects on fatalities, as do age and even gender. A February 27 preprint tracking 1,590 patients in China found that nearly a quarter of patients had comorbidities such as smoking or hypertension. Men accounted for nearly 60 percent of patients with comorbidities, and patients with a comorbidity were about 15 years older than those without.

Differences exist between countries as well. China’s total fatalities have been passed by Italy, though Italy has fewer confirmed cases. According to Wu, there are two main reasons for this much higher mortality rate. First, the proportion of seniors is far higher in Italy than in China. The average age in Italy is about 8 years higher, which has led to devastating numbers of octogenarians succumbing. 

The other is the availability of public health resources. In China, healthcare providers poured into Wuhan and temporary hospitals were constructed in a week. Doctors in Italy are being forced to make terrible choices, triaging patients. In some places, there are so few ventilators that if a patient is over 65, they will not receive one. 

If the virus is not controlled with intervention measures, 2.2 million people in the U.S. would die, a March 16 Imperial College study found.


At this point, it is time to address a caveat. Mathematical models, even when they succeed, will not capture or predict many important intangible outcomes. There are social and economic costs of quarantines that no SEIR model is capable of approximating. Neither can they assess the emotional toll from the loss of life.

Even factors that are, in principle, capable of being captured—varying degrees of industrialization, healthcare payment models—complicate matters and prevent easy mathematical generalizations. 

In 2003, the SARS epidemic hit Toronto after an elderly woman contracted the disease in Hong Kong. The city implemented strict protocols in hospitals and isolated infectious individuals, allowing it to control the virus. But when the virus seemed to be gone, and the city released the protocols, the outbreak came roaring back, eventually infecting several hundred people. 

For Fisman, a Torontonian, 2003 is not the proper historical precedent. “None of us were around in 1918. I think these are unchartered waters,” he says. The number of cases countries are currently missing is a strong indicator of how difficult the virus will be to control. 

Other researchers have similarly dire outlooks. 

“In the very long term, you somehow have to reduce the number of susceptibles. Either by vaccination or they just contract the disease,” Röst says. 

“My feeling and again—keep in mind I’m a mathematician—my feeling is this problem cannot be solved until we have the vaccine widely available,” Wu says.

How to write an email to a researcher you’ve never spoken to before

Since I’ve gotten asked about careers in science writing/journalism twice in the past week, I’ve been hunting down basic resources (what is science writing, how to pitch, where a science writing career starts) from excellent sites like The Open Notebook to help get folks started.

But this is a particularly basic question—so basic that people usually don’t ask it and (IMO) it doesn’t get a lot of good answers. Here’s my take.


Media Inquiry: Interesting Research

You want to clearly label your email as a media email—ideally from a specific publication, but if you’re a freelancer and not sure where it will appear, “Media” is just fine. You also want to make the topic of the email clear. Specific keywords that are relevant to their specific research are often helpful. For example, it might be better to include “Penrose process” than just “black hole” in the subject line. A more specific topic is more relevant to them and means your email is more likely to be read.


Dear Dr. So and So,

Titles can be tricky. On first contact, I always use Dr. (as opposed to Prof.) unless I am positive they don’t have a PhD. If there are three people or fewer, use Drs. If for some reason there are more than three you can address it to “all.” Keep in mind that you generally want to avoid sending a single email to more than 3 or so researchers—things get messy. (One or two really is best.) Make sure to double check that you have spelled their name(s) correctly before sending.


My name is Dan Garisto and I’m a freelance science journalist currently on assignment with Such and Such publication writing about [topic of interest].

You want to convey who you are and what you’re knocking on their door about, generally within a sentence or two. Often you’ll want to add a clarifying sentence about the article you’re writing.

In particular, I’m hoping to give readers a glimpse of [topic] from [relatively under-reported angle].

Sometimes, but not always, you’ll want to prove your credentials upfront with the appropriate links.

I’ve previously written about [topic] here, here, and here.


I’m reaching out because of your work on [topic of interest], especially [somewhat recent paper].

In some ways, this is the most important sentence of your entire email. It’s one thing to receive a cold email from a science writer asking to talk; it’s another if they link to a highly specific (and relevant!) paper you published 18 months ago which has 3 citations. Linking to their relevant research demonstrates that you’ve actually done your homework. It’s an investment of your time into them; it shows you have genuine interest. They are so much more likely to respond if you do this.

Another possible reason:
I’m emailing because So and So said you were the expert to talk to about [topic].

Slightly less good:
Your university bio said you had expertise in [topic] and [related topic].


I was hoping to speak with you about [topic].

This is maybe the least important sentence of the entire email. Don’t spend too much time on it. That you want their time is implicit; how you explicitly state that you want it is somewhat less important. That said, a couple variants to keep in mind:

I was wondering if you’d be willing to look over [forthcoming paper from another researcher] and share your thoughts with me.

Rather than emailing multiple people, it’s often easier to put this request in the ask. Also a good way to diversify your sources.
Would you or someone in your lab/one of your coauthors have time to chat?

My schedule is pretty flexible later this week and I’m available via Skype/Zoom/phone. Could you let me know if there are any times that work for you?

Be clear about your availability, but on the first email, don’t list every time that you’re available. It’s messy and presumes a bit too much. Sometimes you’re in a crunch. Be upfront about that too.

Unfortunately I’m on deadline and I really need to get a draft to my editor by tomorrow morning or she’ll have my hide. I know this is a tough ask, but would you have time later today?

There are dozens of other permutations here, but the important thing is to remember to be gracious. Nobody owes you their time.


Looking forward to hearing from you.

This one is totally up to you. “Thanks for your time” works just as well.




Media Inquiry: Interesting Research

Dear Dr. So and So,

My name is Dan Garisto and I’m a freelance science journalist currently on assignment with Such and Such publication writing about [topic of interest]. In particular, I’m hoping to give readers a glimpse of [topic] from [relatively under-reported angle].

I’m reaching out because of your work on [topic of interest], especially [somewhat recent paper].

I was wondering if you’d be willing to look over [forthcoming paper from another researcher] and share your thoughts with me.

My schedule is pretty flexible later this week and I’m available via Skype/Zoom/phone. Could you let me know if there are any times that work for you?

Looking forward to hearing from you.


I’ll update this later if I think of stuff. But for now, that’s it.


This was not the election of a healthy democracy. A complete list of the undemocratic measures taken before, during, and after this election would stretch for pages. The proximate cause for so many—including the attempted coup—was Donald Trump. Following his lead, Republican voters and officials, all the way up to sitting senators, have conjured up a phantasmagoria where the only explanation for defeat is a conspiracy of rampant voter fraud around every corner. Where democracy is defined as a subset of the population; a herrenvolk. Theirs is a fever dream motivated by an explicit rejection of both democratic values and the intransigent reality of the ballot box. At roughly 31 cases in a billion ballots, voter fraud is astronomically rare. Voter suppression of minorities, meanwhile, remains commonplace.

It would not be unreasonable, given the efforts to throw out votes, intimidate election officials, and spread lie after lie after lie until blood was spilled on the Capitol steps, to believe that this was a more undemocratic election than most in the nation’s history.

I want to first be clear about what I am not saying. I’m not trying to issue a Panglossian polemic, in the style of Steven Pinker. I am not here to acknowledge the wrongs of American electoral history only insofar as they serve to illustrate a vague ideal of neoliberal progress. I am, frankly, not abundantly optimistic about the future of American democracy, which is likely to remain yoked to undemocratic institutions like the Senate and Electoral College for the foreseeable future.

That said: I’d like to make the case that in spite of it all, this was in fact the most democratic general election in the history of the United States.

A caveat of sorts: I’m not a historian, and my work as a journalist is mostly restricted to science—particularly physics. In short, this is not my expertise. But seeing as the data bears it out, and nobody else seems to have forcefully articulated the point, I thought I’d try my hand at it.

Total vote as a percentage of population (blue) and winner’s vote as a percentage of population (red) for U.S. president. Note the increases around 1820, 1865, and 1920.

If you wanted to tell the story of democracy in America over the past 231 years—all of its fits and starts, flaws and virtues, banalities and oddities, triumphs and tragedies—you could do worse than the following graph.

What this chart illustrates, perhaps reductively, is that American democracy has not always been so. Elections today bear little resemblance to those a century ago, let alone two centuries ago. We the people were not we the voters.

With absentee ballots finally counted, President Joe Biden has topped 81 million votes, which is not only the highest total ever, but also—by a small margin—the highest total as a percentage of population. Perhaps more importantly, the total vote in 2020 nearly scraped 50% of the population, about 5 percentage points higher than the previous record in 2008.

In the sense that democracy refers to “a system of government by the whole population or all the eligible members of a state, typically through elected representatives,” the elections closest to the democratic ideal are those in which more of the population votes, not less. 2012 was a more democratic election than 1912; 1912 was more democratic than 1812. This is a simple, mathematically ineluctable definition without caveat or context, but it works.

And it works because increases in participation are so deeply entwined with expansion of the franchise that we can see, in the dips and rises of that graph, milestones of suffrage: universal white male suffrage in the 1820s, the partial success of the 15th Amendment after the Civil War, the resounding increase on the passage of the 19th Amendment, and the impact of Civil Rights legislation with which the U.S. first became a multiracial democracy.

There are plenty of confounding variables, and enormous setbacks remain. Roughly 5 million Americans remain disenfranchised due to a felony. Shelby County v. Holder needs to be counteracted with a new VRA. Automatic voter registration and the repeal of voter I.D. laws are a must.

Electoral politics are not the be-all end-all of a democracy, and maybe not a “lifeblood” (or even hemolymph) but they are a sort of transmission fluid—a substance that allows for the continued maintenance and survival of the system. Better that it flow freely.

I’ll hopefully update this later with some more historical context to flesh out the point, but for now, it’s

Lorem ipsum yadda yadda

The first blog post on a new website is probably more akin to the befouling of a pristine litterbox than the Platonic ideal of writing: the crisp scrawl of a black pen across ecru pages of a notepad. (If I knew more about typewriters I’d throw their aficionados a bone, but alas.)

And yet, it must be done. Especially if your new website comes with it all set up ahead of time like some sort of Calvinist imperative to blog.

Anyways, here’s to future words.

Oh, and I took this picture of a ground story window somewhere near Stuyvesant Village on the Lower East Side. I’m afraid there’s no context for it, but I am fond of it.