Calculating COVID-19

How researchers are using math to predict and understand a global pandemic

Note: I researched, interviewed, and wrote the bulk of this in mid-March 2020. Unable to get it published, overwhelmed by the pandemic and the need to finish other work, I left it sitting, untouched from March 22, 2020 until about last week. I’ve fleshed out parts of the draft that were incomplete (relying solely on my notes from the time) but tried to avoid injecting future knowledge, so as to preserve it in some kind of digital amber. It is my hope that the reader finds this a useful addition to the record of epidemiological knowledge of that time.

On January 17, 2020, there were 48 confirmed cases of coronavirus in the entire world. The bulk, 45 cases, were in Wuhan; one was in Japan, and two were in Thailand. What was there to glean from a few dozen cases inside China and three lonely data points in nearby countries? For epidemiologists, that meager information was more than enough.

Using the three cases abroad and travel data from Wuhan, two teams of researchers worked backward to estimate the size of the epidemic. They calculated the likely number of infections in Wuhan from the odds of exporting three cases to Japan and Thailand. The two teams' conservative estimates were 1,250 and 1,700 infections; worst-case scenarios ranged from 4,000 to 5,000. By the time testing in Wuhan ramped up at the end of January, the estimates had proved prescient: thousands of people were found to have been infected.
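The logic of this back-calculation can be sketched in a few lines of Python. This is a simplified version of the approach, not either team's actual code: it assumes every infected person in the city is equally likely to fly abroad while their infection is still detectable, so the expected number of exported cases equals the number of infections times that probability. The traveler count, catchment population, and detection window below are hypothetical placeholders of a plausible order of magnitude, not the studies' inputs.

```python
def estimate_infections(exported_cases: int,
                        daily_travelers: float,
                        catchment_population: float,
                        detection_window_days: float) -> float:
    """Back-calculate total infections in a city from cases detected abroad.

    Assumes each infected resident has the same chance of traveling abroad
    during the window in which their infection could be detected there.
    """
    # Probability that any single infection is exported and detected abroad.
    p_export = daily_travelers * detection_window_days / catchment_population
    # Expected exported cases = infections * p_export, so invert it.
    return exported_cases / p_export


# Hypothetical inputs: 3 exported cases, 3,300 international travelers per day,
# a catchment of 19 million people, and a 10-day detection window.
n = estimate_infections(3, 3_300, 19_000_000, 10)
print(f"Estimated infections: {n:,.0f}")  # prints: Estimated infections: 1,727
```

Three exported cases is a tiny sample, so real analyses attach wide uncertainty ranges to estimates like this one.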

In the two months since, COVID-19 has become a full-blown pandemic. Countries like Italy and Iran have buckled under the strain of thousands of cases; others, like Singapore, South Korea, and Taiwan, seem to have wrestled the virus into an uneasy submission. As of this writing, more than 180 countries around the world have confirmed cases.

Armed with data and a bevy of epidemiological approaches, scientists around the world have worked at a frenetic pace, publishing hundreds of papers about COVID-19. The research ranges from quantifying basic characteristics of the virus to creating complex models that can predict the impact of specific interventions, like school closures. 

Their findings, typically confined to dusty journals and rarely downloaded PDFs, are now widely discussed on platforms like Twitter. Once-obscure epidemiological terms like R0 now make headlines. Epidemiological research shapes how we understand the outbreak and guides the moves of policymakers around the world.

As with everything else about the pandemic, misinformation about its epidemiology is rampant. Some non-experts perform their own data analyses and self-publish on sites like Medium, where they garner millions of views; others simply turn to the op-ed pages of the New York Times. Claims that 40 or 70 percent of the world will be infected are tossed out with little to no context.

But the research isn’t incomprehensible. It’s more than possible to get a clear grasp on what epidemiologists know about COVID-19 so far and how they learned it. Even the calculations they make and the math they use can be made lucid.

Compared to the obvious role of drugs and medical techniques, math is an unlikely hero in an outbreak. Yet the patterns mathematical models identify can help researchers make predictions of real practical value, such as how many cases there are likely to be or how effective a quarantine will be.

“There hasn’t been time for the sort of wet lab biological experiments to figure out all the characteristics of the virus itself,” says Kimberlyn Roosa, an epidemiological researcher at the Georgia State University School of Public Health. “These studies can inform things like how long we think isolation should be, and how testing could affect when we should try and test individuals and isolate them to reduce the number of transmission.”

Mathematical models are often a dry and, at least in the media, overlooked area of research. Theory rarely draws the attention that experiment does. But when it comes to forecasting early in an epidemic, these models are often all there is to work from. Theory provides the only measure of predictability in an otherwise chaotic situation. By reducing the messy human world and the spread of a virus to simpler models and equations, and by shrinking the aggregate of activity in a city to a single parameter, researchers can come to understand a virus and eventually tame it.

It is not a hopeless endeavor. In many cases, the math seems to work unreasonably well. Laws of disease are not hard-coded into the fabric of our universe the way, say, the speed of light or Newton’s laws of motion are. But patterns turn up unerringly, and the course of an epidemic can be tracked, explained, and even predicted with math ranging from simple growth equations to complex, multivariable models.

This is not just a story about what researchers have predicted; it is also about how they made those predictions and which mathematical tools they used. In that sense, it is a story that aims to help the reader grasp the calculation of COVID-19.

And there has never been a more important time to understand the math.

The Data 

COVID-19 is the first truly 21st century pandemic. Nowhere is that clearer than in the way data about the outbreak has been tracked down, collected, disseminated, visualized, and stored. 

One of the first transmissions of data about COVID-19 occurred on December 30, after Ai Fen, a doctor at Wuhan Central Hospital, received a test result for a patient. Ai informed her hospital and sent a photo of the test to colleagues with the critical information circled in red pen: “SARS coronavirus, Pseudomonas aeruginosa, 46 types of oral / respiratory colonization bacteria.” The image made its way to another doctor, Li Wenliang, who wrote “There are 7 confirmed cases of SARS at Huanan Seafood Market” to a group of 150 medical school classmates on WeChat, a popular messaging app in China. Screenshots of Li’s message circulated rapidly online, and on January 3, Chinese officials threatened him with prosecution for “making false comments on the internet.”

News about the infections quickly spread beyond China. The World Health Organization’s office in China was alerted to “pneumonia of unknown origin” on December 31, and by January 3, national authorities in China informed the WHO that there were 44 patients with symptoms.

At the same time, local officials ordered the destruction of virus test samples, and China’s National Health Commission (NHC) prohibited publication of information about the virus. The Chinese government would sit on the virus’ genetic sequence for a week, and it would be over two weeks before the government admitted that the virus spread from person to person. Starting on January 21, the NHC began to aggregate regional reports and publish plain-text daily updates with epidemiological information, including confirmed infections, recoveries, deaths, and the number of close contacts tracked.

Data is the basis for all predictions about COVID-19. No pre-existing model, no matter how sophisticated, can work in the absence of data. Previous outbreaks can guide researchers, but they aren’t a substitute for what’s happening on the ground. 

“The data we use is from those published online by the National Health Commission of China,” says Roosa. “So they publish daily and we have someone who daily goes on there and gets the data and sends it to us.” 

When a team from Johns Hopkins University (JHU) launched the first global coronavirus tracker on January 22, the world map had only a few splotches of red—mostly in China. Now, enormous red circles blot out much of the globe. By mid-February, the site had over 140 million views, and due to its popularity, copycat sites with malware on them had popped up. 

At first, the team updated the site manually, adding data to a Google spreadsheet, primarily from a social network of physicians that has been tracking cases, aggregating reports from hospitals and municipal governments to provide a real-time map of COVID-19 in China.

Data continued to pour in from the WHO, the European Centre for Disease Prevention and Control, and other sources, so the JHU team automated most of the system to update every 15 minutes and moved the data to GitHub, a website typically used by programmers to host their code. This automation allowed the team to give more timely updates than the WHO and the Chinese CDC. On GitHub, the data is stored in spreadsheets that other researchers can easily access. Though the site is used primarily by coders and researchers handling data, people in China have even used GitHub as a censorship-resistant archive for news articles and personal accounts.

In the U.S., data on the amount of testing, a critical figure, has been difficult to track down because reporting is not centralized. That gap has led to efforts like the COVID Tracking Project.

Demand for data comes not just from policymakers and researchers, but everyone who wants to stay informed about the spread of the pandemic.

“In terms of who is using this dashboard, as far as I can tell it’s pretty much everybody,” said JHU epidemiologist Lauren Gardner in a presentation. “I think this really speaks to this huge demand for reliable, trustworthy, objective information—especially around situations like these.” 

Uncertainties about data have been around since the dawn of epidemiology. When the Swiss physicist Daniel Bernoulli developed the first mathematical model to predict the spread of smallpox in 1766, he ran into problems with data provided by the British astronomer Edmund Halley. Though Halley listed the number of children in Breslau who made it to one year old as 1,000, he did not provide information about how many had died in their first year. “It would appear that M. Halley wished to start with a round number,” Bernoulli wrote, evidently frustrated at the lack of precision.

Although data collection has been modernized, organized in Excel spreadsheets and updated in real time, researchers still face many of the same uncertainties Bernoulli did. For example, many researchers question whether the cases we’re seeing are really representative of the true number of infections.

“The data collected so far on how many people are infected and how the epidemic is evolving are utterly unreliable,” Stanford data scientist John Ioannidis wrote in an op-ed. “We don’t know if we are failing to capture infections by a factor of three or 300.”

Additionally, China’s report of zero new domestic cases has caused some to wonder aloud if Chinese officials are suppressing data, in part because of the initial cover-up.

But every country, not just China, is undercounting cases, because a large share of infections are asymptomatic or mild (estimates of the asymptomatic share range from 18 percent to 50 percent). Many researchers believe that without universal testing, we will miss roughly half of all infections because the virus is mild or asymptomatic in so many infected individuals. Paradoxically, this mildness has made the virus more dangerous: if everyone who was infectious were incapacitated, the virus would be much worse at spreading.

When testing was minimal in China, prior to January 23, researchers estimate that only 14 percent of cases were documented. Once testing ramped up in February, that figure jumped to about 65 percent. In other words, minimal testing revealed only the tip of the iceberg, and even rigorous testing left massive gaps.

Uncertainties about the data remain, but we are far from ignorant. With more testing, the data we have will only get better and so will our understanding of the virus.

Modeling pt. 1

A quote often attributed to the Danish physicist Niels Bohr goes something like this: “It is difficult to make predictions, especially about the future.” In no small irony, even the quote’s origins are murky. Compared to Bohr’s work making predictions in the subatomic realm, modeling outbreaks is substantially trickier, though epidemiologists emphasize that the field has advanced by leaps and bounds in the past decade, much like weather forecasting.

“At one time hurricanes just arrived in Miami with little notice and there were no opportunities to prepare,” says Glenn Webb, a mathematical epidemiologist at Vanderbilt University. Just as meteorologists track the possible paths of a hurricane, Webb says his goal is to “predict the path of an epidemic.”

What those paths can look like depends a great deal on how infectious a disease is. Epidemiologists keep track of epidemics with a number they call R0, pronounced “R-naught.”

“R0 is just a very traditional parameter we use to estimate the number of infected cases generated by the primary cases,” says Qiao Fan, an epidemiologist at the Center for Quantitative Medicine in Singapore.

For example, an R0 of 2 means that, on average, one person infects two others; an R0 of 1 means that one person usually infects only one other person. Below 1, the spread dwindles; above it, the disease can spiral out of control into an epidemic. R0 is not a magic number, but a statistical proxy for contagiousness, and exceptions do exist.
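Because each generation of cases multiplies the previous one by R0 on average, the gap between values just below and just above 1 compounds quickly. A minimal sketch in Python, ignoring randomness and the gradual depletion of susceptible people (both simplifying assumptions):

```python
def infections_per_generation(r0: float, generations: int, seed: float = 1.0) -> list[float]:
    """Expected new infections in each generation if every case infects r0 others on average."""
    counts = [seed]
    for _ in range(generations):
        counts.append(counts[-1] * r0)
    return counts


for r0 in (0.5, 1.0, 2.0):
    print(r0, infections_per_generation(r0, 5))
# 0.5 [1.0, 0.5, 0.25, 0.125, 0.0625, 0.03125]
# 1.0 [1.0, 1.0, 1.0, 1.0, 1.0, 1.0]
# 2.0 [1.0, 2.0, 4.0, 8.0, 16.0, 32.0]
```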

Determining R0 is critical to understanding the virus. But calculating the value isn’t straightforward: there’s nothing in the virus’ genetic code that gives an unambiguous answer. R0 also depends on circumstantial factors, such as time of year and location. There are no lab experiments in which infected volunteers cough near susceptible volunteers. Instead, researchers have to peer into the data and draw out a value for R0.

One way of doing this is to look at the overall growth rate of cases and work backward from that rate to R0. In the early days of an outbreak, researchers do this with messy and limited data. The problem is that the approach lacks precision: especially when many cases go unreported, the growth of the total number of reported cases is a poor approximation. More targeted approaches to sussing out R0 use contact tracing, in which researchers look at infected individuals and count how many people each one infected on average.
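The growth-rate approach can be sketched as follows, under stated assumptions. The code fits a straight line to the logarithm of case counts to estimate the exponential growth rate r, then converts it with R0 = 1 + r × D, a relation that holds for the simplest SIR model with an exponentially distributed infectious period of mean D days. Other models, and other assumed values of D, give different conversions, and the case series here is synthetic.

```python
import math


def growth_rate(days: list[float], cases: list[float]) -> float:
    """Least-squares slope of log(cases) against time: the exponential growth rate r per day."""
    logs = [math.log(c) for c in cases]
    n = len(days)
    mean_t = sum(days) / n
    mean_y = sum(logs) / n
    cov = sum((t - mean_t) * (y - mean_y) for t, y in zip(days, logs))
    var = sum((t - mean_t) ** 2 for t in days)
    return cov / var


def r0_from_growth(r: float, infectious_days: float) -> float:
    """R0 = 1 + r * D; valid only for a basic SIR model with exponential infectious period of mean D."""
    return 1.0 + r * infectious_days


# Synthetic data with a continuous growth rate of 0.2 per day (assumption).
days = list(range(10))
cases = [10 * math.exp(0.2 * t) for t in days]
r = growth_rate(days, cases)
print(round(r, 3), round(r0_from_growth(r, 5.0), 2))  # 0.2 2.0
```

A constant reporting fraction only shifts the logarithm of the counts and leaves the fitted slope unchanged; the trouble comes when the share of cases detected changes over time, for example as testing ramps up, which distorts the apparent growth rate.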

So, what is R0 for SARS-CoV-2? There are dozens of estimates, but most settle around 2.5 and modelers tend to use values around there as a baseline when making predictions.

At the beginning of the outbreak, multiple teams estimated R0 in Wuhan to be closer to 4 or 5. This could have been due to inadequate data, but values that high are not totally out of the question. Under conditions like a 40,000-person potluck, the virus could spread far more prolifically. Notably, after China implemented the lockdown, the reproduction number went down, as low as 0.3, as the lockdown cut off routes of transmission. Some of the highest R0 estimates came from a cruise ship, the Diamond Princess, where they averaged around 6 to 7 before intervention. Under ultra-dense conditions (roughly 24,000 people per square kilometer), the disease spread furiously.

Growth is also asymmetric, according to David Fisman, an epidemiologist at the University of Toronto. The characteristic shape of the number of cases in an epidemic has a steep slope upward, until a peak, and then a drop with a long tail.

“The level you get to before you shut this down is critically important not just because you had more cases getting there, but because the journey to the finish line is now that much longer, from that much higher a peak,” Fisman says. 

At the beginning of an outbreak, when answers about the nature of virus transmission and detailed data are scarce (the incubation period and the number of asymptomatic infections are as yet undetermined), researchers rely on phenomenological models, which seek mainly to reproduce the growth patterns present in the data.

“In the absence of sufficient data, it would be wise to have a model that has less model parameters involved,” says Jianhong Wu, an applied mathematician at York University in Toronto. “The phenomenological model is useful, especially when you don’t know or know much about the mechanism of transmission.”

More complicated models that rely on assumptions could go wrong, Wu argues, so fewer parameters mean a smaller chance of error. These models can be as simple as an exponential curve with one parameter governing growth over time.
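One commonly used few-parameter phenomenological form is the generalized growth model, dC/dt = r × C^p, where C is cumulative cases, r is a growth rate, and p (between 0 and 1) controls how close growth is to exponential: p = 1 gives pure exponential growth, and smaller values give slower, sub-exponential growth. The sketch below integrates it with simple Euler steps; the parameter values are arbitrary, and it only illustrates the kind of curve such models fit, not any specific team's model.

```python
def generalized_growth(c0: float, r: float, p: float, days: int, dt: float = 0.01) -> list[float]:
    """Integrate dC/dt = r * C**p with Euler steps; return cumulative cases at each whole day."""
    steps_per_day = round(1 / dt)
    c = c0
    daily = [c]
    for _ in range(days):
        for _ in range(steps_per_day):
            c += dt * r * c ** p
        daily.append(c)
    return daily


# Illustrative parameters (assumptions): 10 initial cases, r = 0.3 per day.
exponential = generalized_growth(c0=10, r=0.3, p=1.0, days=30)
sub_exponential = generalized_growth(c0=10, r=0.3, p=0.8, days=30)
print(f"Day 30, p=1.0: {exponential[-1]:,.0f}")
print(f"Day 30, p=0.8: {sub_exponential[-1]:,.0f}")
```

A single extra parameter is enough to separate explosive growth from the slower growth seen once behavior or interventions start to bite.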

Some more complex phenomenological models try to fit the data to stochastic, or random, processes that mimic the spread of a virus. One such process has an unlikely origin. In Victorian England, a major concern of the ruling class was that the aristocracy was dying out. “Surnames that were once common have since become scarce or have wholly disappeared,” the statistician and founder of eugenics Francis Galton wrote in 1875.

To determine if this was a result of fertility problems or part of some sort of mathematical inevitability, he enlisted the help of a mathematician, the Reverend Henry William Watson, to make the calculation. Watson found that the nefarious force was math itself. If surnames were passed down patrilineally—from father to son—and each family had some chance of not having a son reach adulthood, the family tree of surnames would be continually pruned, leading to fewer surnames over time. 

This branching process turns out to apply to far more than English surnames. It has been used to describe the survival of a genetic mutation, the start of a nuclear chain reaction, and even the spread of viruses. For an epidemic, the branching process models how each individual creates a new generation of infections. When a branch goes extinct, it means not that a surname has disappeared but that a chain of viral transmission has been cut off. Key questions about how contagious a disease is can be rephrased in terms of this branching process.
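For a branching process, the chance that a chain of transmission eventually dies out can be computed directly. A minimal sketch, assuming each case infects a Poisson-distributed number of others with mean R0 (a textbook simplification; real transmission is often more uneven, with a few cases infecting many people): the extinction probability q is the smallest solution of q = exp(R0 × (q - 1)), and iterating that equation from q = 0 converges to it.

```python
import math


def extinction_probability(r0: float, iterations: int = 10_000) -> float:
    """Probability that a Galton-Watson branching process started by one case dies out.

    Offspring per case are assumed Poisson(r0), whose probability generating
    function is G(s) = exp(r0 * (s - 1)). Iterating q <- G(q) from q = 0 gives
    the probability of extinction within n generations, which converges to the
    smallest fixed point of G.
    """
    q = 0.0
    for _ in range(iterations):
        q = math.exp(r0 * (q - 1.0))
    return q


for r0 in (0.8, 1.5, 2.5):
    print(r0, round(extinction_probability(r0), 3))
```

Under these assumptions, with R0 = 2.5 roughly one introduced case in ten fizzles out on its own, while with R0 below 1 every chain eventually dies out.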

“We do the same in epidemics—just how many of these branches will go extinct?” says Gergely Rost, a mathematical epidemiologist at the University of Szeged in Hungary.

Using basic parameters like R0, epidemiologists can calculate what the branching spread of an outbreak looks like. Because phenomenological models are simple or rely on random spreading processes, they “just kind of let the data do the talking,” according to Roosa.

“The downside to that is with the phenomenological models, we can’t assess different intervention strategies,” she says. “For forecasting purposes, the simple models are good, but for generating scenarios, not so much.” 

Modeling pt. 2

Enter the compartmental model—so termed because it segments the population to better account for the mechanisms of spread.

“The fundamental idea is try to partition or stratify the entire population into different disjoint groups according to their epidemiological status,” says Wu. “Being susceptible, being infected, being recovered.” 

Models vary. Starting from the basic SIR (susceptible, infected, recovered) partition, some add an exposed category, producing an SEIR model. These models are critical for generating long-term predictions, far beyond what can easily be extrapolated with a simple phenomenological model.
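The basic SIR bookkeeping can be written as three coupled equations: dS/dt = -βSI/N, dI/dt = βSI/N - γI, and dR/dt = γI, where N is the population, β the transmission rate, γ the recovery rate (so 1/γ is the mean infectious period), and R0 = β/γ. The Python sketch below steps these forward with a simple Euler scheme; the population size, the R0 of 2.5, and the five-day infectious period are illustrative assumptions, not fitted values, and no interventions are modeled.

```python
def simulate_sir(population: float, initial_infected: float, r0: float,
                 infectious_days: float, days: int, dt: float = 0.1) -> list[tuple[float, float, float]]:
    """Euler integration of the basic SIR model. Returns one (S, I, R) tuple per day."""
    gamma = 1.0 / infectious_days   # recovery rate
    beta = r0 * gamma               # transmission rate, since R0 = beta / gamma
    s, i, r = population - initial_infected, float(initial_infected), 0.0
    history = [(s, i, r)]
    steps_per_day = round(1 / dt)
    for _ in range(days):
        for _ in range(steps_per_day):
            new_infections = beta * s * i / population * dt
            new_recoveries = gamma * i * dt
            s -= new_infections
            i += new_infections - new_recoveries
            r += new_recoveries
        history.append((s, i, r))
    return history


history = simulate_sir(population=1_000_000, initial_infected=10, r0=2.5,
                       infectious_days=5, days=365)
peak_day = max(range(len(history)), key=lambda d: history[d][1])
final_fraction = history[-1][2] / 1_000_000
print(f"Infections peak around day {peak_day}; {final_fraction:.0%} are eventually infected")
```

Shrinking `dt` makes the Euler approximation more accurate at the cost of more steps.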

“The usual way an epidemic is controlled is that so many people get infected, they have immunity,” says Webb. “There’s not enough susceptible people out there to support the epidemic.” 

This is the calculation behind predictions like “40 to 70 percent of the world will be infected,” California governor Gavin Newsom’s claim that 25 million people in California will be infected, and the projections that fueled the UK’s initial strategy. These sky-high numbers are not a realistic scenario, according to Roosa, because they assume little to no intervention.

But they make a certain kind of sense. From an epidemiological perspective, a virus stops spreading on its own only when too few susceptible people remain to sustain transmission. That usually requires a huge reduction in the number of susceptible people, through either acquired immunity or a vaccine. In countries where the coronavirus has been relatively controlled, like China, Taiwan, South Korea, and Japan, that is not the case: the vast majority of the population is naive, or not immune. Massive interventions have bent the epidemic’s usual course down to an artificially suppressed level.
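Two textbook results from the simplest SIR model, which assumes a well-mixed population with no interventions or vaccine, make this logic concrete. Transmission starts to decline once the immune fraction passes the herd-immunity threshold 1 - 1/R0, but an unchecked epidemic overshoots that point: the final fraction infected, z, solves z = 1 - exp(-R0 × z). The Python sketch below computes both; it is meant only to show where figures like “40 to 70 percent” can come from, not to reconstruct any particular projection.

```python
import math


def herd_immunity_threshold(r0: float) -> float:
    """Immune fraction above which each case infects fewer than one other person on average."""
    return 1.0 - 1.0 / r0


def final_size(r0: float, tol: float = 1e-12) -> float:
    """Solve z = 1 - exp(-r0 * z) for the final fraction infected in an unchecked SIR epidemic."""
    if r0 <= 1.0:
        raise ValueError("a major outbreak requires r0 > 1")
    z = 1.0  # start above the root so the iteration converges to the nonzero solution
    while True:
        z_next = 1.0 - math.exp(-r0 * z)
        if abs(z_next - z) < tol:
            return z_next
        z = z_next


for r0 in (1.7, 2.5, 3.3):
    print(f"R0={r0}: threshold {herd_immunity_threshold(r0):.0%}, "
          f"final size {final_size(r0):.0%}")
```

For R0 between about 1.7 and 3.3, the threshold alone spans roughly 40 to 70 percent.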

By subdividing the population, compartmental models offer a chance to understand the mechanics behind these interventions and more.

“We’re focused on the unreported cases, and the pre-symptomatic cases that are infectious,” says Webb. “There’s no doubt that there’s people who are not showing symptoms that are infectious. And also there are many unreported cases. Both are a little mysterious, but we wanted to include them because they’re definitely a factor in the transmission.”

Many compartmental modelers feel that their goal is not to pinpoint equations that may perfectly forecast the outbreak, but to capture the broader dynamics of the system.

“It’s not really about being as correct as possible. It’s more about like, ‘which mechanisms are sufficient to describe the data?’” says Ben Maier, an epidemiologist at Humboldt University in Germany. Getting the mechanisms right can provide better long-term forecasting than a short-term prediction that is right for the wrong reasons.

Some of these compartments have subdivisions. Wu’s model, for example, adds a “quarantine” compartment within the exposed compartment and subdivides the exposed population further into identified, confirmed, and hospitalized compartments. The goal is to reflect strategies being employed by countries like South Korea and Singapore. By creating compartments for “quarantined” and “identified” individuals, Wu’s model aims to capture the results of contact tracing, in which everyone who might have come into contact with an infected person is placed into quarantine until their test results come back.
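A toy version of the idea can be sketched by adding a single quarantine compartment Q to an SEIR model. This is far simpler than Wu’s model, and its structure is an assumption: a fixed fraction of newly exposed people is traced and isolated in Q, where they never transmit, so the effective reproduction number drops to roughly R0 × (1 - fraction). All parameter values are illustrative.

```python
def simulate_seir_quarantine(population: float, initial_infected: float, r0: float,
                             latent_days: float, infectious_days: float,
                             quarantine_fraction: float, days: int,
                             dt: float = 0.1) -> dict[str, float]:
    """Euler integration of a toy SEIR model with a quarantine compartment Q.

    A fraction `quarantine_fraction` of newly exposed people is assumed to be
    traced and isolated in Q, where they never transmit. Returns the final
    size of each compartment; S + E + I + R + Q always sums to the population.
    """
    sigma = 1.0 / latent_days       # exposed -> infectious rate
    gamma = 1.0 / infectious_days   # infectious -> recovered rate
    beta = r0 * gamma               # transmission rate of unquarantined cases
    s = population - initial_infected
    e, i, r, q = 0.0, float(initial_infected), 0.0, 0.0
    for _ in range(round(days / dt)):
        new_exposed = beta * s * i / population * dt
        onset = sigma * e * dt
        recovered = gamma * i * dt
        s -= new_exposed
        q += quarantine_fraction * new_exposed
        e += (1 - quarantine_fraction) * new_exposed - onset
        i += onset - recovered
        r += recovered
    return {"S": s, "E": e, "I": i, "R": r, "Q": q}


for fraction in (0.0, 0.4, 0.7):
    result = simulate_seir_quarantine(1_000_000, 10, 2.5, 3, 5, fraction, 365)
    print(f"{fraction:.0%} of exposures quarantined: "
          f"{1_000_000 - result['S']:,.0f} ever infected")
```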

So, which model should be used? Phenomenological? SIR? SEIR? Here, scientists reach an unsatisfying consensus: no one model is necessarily the best, and each has different strengths and weaknesses. As Wu puts it, the best model “depends on what types of issues you’re trying to address.”

The modeling situation is dynamic: predictions based on data only a few days old may be thousands of cases behind and a poor reflection of the current situation. Other models hold up with remarkable accuracy weeks later.

We know this largely because of the way the research is being done and published. Normally, epidemiological research goes through a publication process that includes months of peer review. But to accelerate their research and conversations, scientists have been using preprint servers like medRxiv to rapidly publish papers prior to peer review and the typical publication cycle. For Fisman, who remembers how slow publication led to a lagged response to the 2014 Ebola crisis, adoption of preprints has been a “godsend.”

The latest round of predictions provides even more rigorous and sophisticated models of China, which has the most data, and, critically, estimates of the outbreak in other countries: Singapore, South Korea, Japan, Italy, Iran, and the United States.

One conservative estimate for the U.S. based only on direct air travel from Wuhan to the U.S. found that by March 1, there could be as many as 9,484 cases. At the time of the preprint’s release on March 8, the U.S. was reporting only 500 cases.

Another key result: interventions are massively important. A February 28 preprint estimated that the epidemic would level off around 81,000 total cases by March 21, but that if interventions had been implemented one week earlier, the total would have been about 6,000 cases, more than ten times fewer. Conversely, if China had waited another week to intervene, the virus’ exponential spread would have pushed the total to about 1.2 million cases.
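A back-of-the-envelope calculation shows why a single week matters so much. During unchecked exponential growth with doubling time T_d days, infections multiply by 2^(7/T_d) over a week. The sketch below also backs out the doubling time that would, under that crude rule alone, produce the gap between the preprint’s two scenarios; real models include the post-intervention tail and other effects, so the implied figure should not be read as an estimate of the actual doubling time.

```python
import math


def delay_multiplier(doubling_time_days: float, delay_days: float = 7.0) -> float:
    """Factor by which infections grow during a delay, assuming pure exponential growth."""
    return 2.0 ** (delay_days / doubling_time_days)


def implied_doubling_time(multiplier: float, delay_days: float = 7.0) -> float:
    """Doubling time that would turn a delay of `delay_days` into the given multiplier."""
    return delay_days * math.log(2) / math.log(multiplier)


# Ratio between the preprint's two scenarios: 81,000 vs. about 6,000 cases.
ratio = 81_000 / 6_000
print(f"One week earlier: {ratio:.1f}x fewer cases")
print(f"Implied doubling time under the crude rule: {implied_doubling_time(ratio):.1f} days")
print(f"Weekly multiplier for a 3-day doubling time: {delay_multiplier(3.0):.1f}x")
```

Even with a more modest three-day doubling time, a week’s delay multiplies infections roughly fivefold.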

There are, at this point, dozens and dozens of papers and preprints about the epidemiology of the pandemic, but two conclusions are inescapable: interventions must be sufficiently restrictive and they must come early. If they fail in either regard, many, many more people will become sick and die.


What we know about the beginnings of SARS-CoV-2, the technical name for the virus, is that it originated in a non-human animal and that it has great genetic similarity to coronaviruses found in regional bats. At some point in November, the infection was passed on to a human, and it spread from the now-infamous Huanan Market, in Wuhan.

In December, as people began to fall sick, local doctors noticed something odd. They began to put the pieces together and send warnings to their colleagues. At the end of December, eight doctors, including Li Wenliang, attempted to get the word out. Their attempts were silenced, and the response stalled. A month later, Li would die from coronavirus.

Xi Jinping was aware of the epidemic early on and discussed it in an internal meeting on January 7, though it would be 13 days before he made public comments. On January 23, with still fewer than 1,000 confirmed cases in Hubei province, Wuhan was placed under lockdown, and severe travel restrictions were imposed across China.

As of this writing, China has had a total of about 80,000 confirmed cases, many of which have resolved, with the patients recovering. China therefore offers an almost complete trajectory of COVID-19 through a country, and there is much to learn.

We know now that the very first estimates of spread in China, by Imai et al. and Chinazzi et al., which predicted thousands of untested cases, were largely correct: under minimal testing and no interventions, the virus had been spreading, well, virulently.

Over the next few weeks, multiple teams attempted to predict the cumulative number of cases in China, some with phenomenological models and others with mechanistic ones. Almost all of them ended up off the mark, in no small part because China changed the way it counted cases from February 13 to February 19. The change caused an unexpected spike of about 15,000 cases, mostly centered in Hubei.

“Originally they were reporting only lab-confirmed as their confirmed cases,” says Roosa. “And then on February 13, they changed it to include all those who also had clinical symptoms and hadn’t been confirmed yet.” Less than a week later, they went back to counting only lab-confirmed cases. Roosa’s model, which was accurate for the rest of China, estimated that the number of cases in Hubei by Feb. 24 would be 37,000. The actual number of reported cases was 65,000.

Several models by non-epidemiologists attempting to predict the final total lowballed it at around 45,000 to 50,000 cases. Estimates by epidemiologists came closer: 63,600 for Maier’s model and about 63,000 for Wu’s.

It is worth reiterating that these curves, which level off and suggest an end to the epidemic in China, are not natural. Only when China implemented a lockdown on Wuhan and a quarantine on the rest of China did the situation change. Wu found that before the lockdown, R0 in Hubei was about 6.4; a week later it was 1.6; a week after that it was 0.4. Maier saw similar results: the number of infected grew exponentially until growth hit the brick wall of quarantine.

Lockdowns and quarantines were initially mocked or criticized by many Western researchers who considered them infeasible. How could such measures really work? How could a full quarantine work on 1.4 billion people, and what would be the point if the population would still be naive afterward? Meanwhile, many Chinese epidemiologists had worked on these papers while quarantined inside their apartments.

Wu, who was born in China, credits the success of the interventions to a culture of resilience. Democracies like South Korea and Taiwan have also implemented their own versions of quarantine, suppression, and contact tracing to keep numbers low.

“These measures were extreme and we viewed this as a very drastic change in transmission that occurred at a certain date,” says Webb. “Before that, the transmission was constant, because there was an exponential growth phase.”

The timing of the measures mattered most, as Webb’s Feb. 28 paper showed. Roughly speaking, intervening a week earlier would have resulted in one-tenth the number of cases, while waiting a week longer would have resulted in ten times as many. Other factors, like the ratio between the number of reported and unreported cases, were much less consequential, only doubling the total cases. (More unreported cases implies a hard-to-track population that is nevertheless infectious.) Similarly, changing the incubation period, which shortens or lengthens the time between when people were infected and when they became infectious themselves, produced little change in the cumulative cases.

Getting to these measures is no small matter. It requires an immense amount of political capital. When I asked Fisman on March 8 about the feasibility of implementing quarantines, he emphasized the difficulty.

“Oh my God, you can’t do that unless people are terrified,” he says. Conditions often already have to be bad enough that “they’re ventilating people in hallways, their hospitals are overfilled and people are dying. What you’re looking at with this disease is situations where you literally cannot care for people.” 

More contagious than the flu, SARS-CoV-2 is also far deadlier. If the epidemic peaked in Canada, Fisman estimates, roughly 0.7 percent of the adult population would require a ventilator. Canada has roughly 30 million adults, so 210,000 people would need one. No country in the world has close to that many ventilators; a 2018 report estimated that at best, the U.S. could ventilate 160,000 patients.

The case is clear: Our only option to avert this dire scenario is massive suppression measures to reduce R0 and force the outbreak’s growth down to manageable levels, at which point it is possible to switch to contact tracing and containment. 

“It requires massive social distancing. That’s no football matches. That’s no concerts. That’s no school. That’s no working from the office. Anyone who can work from home is working from home and you’re shutting places down,” says Fisman. “The difficulty for Italy and for all of us, is you can do that—you can mobilize political capital to do that, when you’re in the middle of the crisis, and people are seeing deaths around them. When it’s more effective is before. That’s the problem.”

If suppression is achieved, then contact tracing and containment strategies can kick in. As Wu describes it, the strategy relies on teams who do quick tests to identify infected people, who are then put into isolation. Investigators then search for anyone the infected individual might have been in contact with while contagious. A February 11 preprint found that, given an R0 of 2.5 to 3.5, roughly 70 to 90 percent of contacts would have to be traced to keep outbreaks suppressed.
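An idealized calculation shows why the required tracing fraction rises with R0. If a fraction f of contacts were found and isolated before they could infect anyone, the effective reproduction number would fall to roughly R0 × (1 - f), so suppression would require f > 1 - 1/R0. This sketch ignores tracing delays and presymptomatic transmission; more detailed models that include those effects can demand higher tracing rates, which is consistent with the February 11 preprint’s 70 to 90 percent range sitting above this idealized minimum.

```python
def min_tracing_fraction(r0: float) -> float:
    """Smallest fraction f of contacts to trace so that r0 * (1 - f) <= 1.

    Idealized: assumes traced contacts are isolated before they transmit at all.
    """
    return max(0.0, 1.0 - 1.0 / r0)


def effective_r(r0: float, traced_fraction: float) -> float:
    """Reproduction number left over when a fraction of contacts is isolated in time."""
    return r0 * (1.0 - traced_fraction)


for r0 in (2.5, 3.5):
    print(f"R0={r0}: trace more than {min_tracing_fraction(r0):.0%} of contacts "
          f"(at 80%, effective R = {effective_r(r0, 0.8):.2f})")
```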

However, contact tracing also depends on the number of presymptomatic and asymptomatic cases and how much these cases—which are temporarily or completely unreported—drive infections.

Within China, differences between provinces are stark. While Wuhan City in Hubei was the epicenter of the crisis, the virus spread throughout China, necessitating intervention measures nationwide. Unlike in Hubei, the virus did not get the chance to grow exponentially in other provinces: interventions prevented that, and growth remained relatively low.

The outbreaks in other provinces did not always behave as expected. For instance, Heilongjiang, a province in northeastern China far from Wuhan, had the highest infection growth rate of any province outside Hubei. Conversely, Beijing, an ultra-dense city, had the lowest growth rate. According to Fan, this was a result of stricter measures in Beijing than in other areas.

Tianyi Li, a Ph.D. candidate at MIT’s Sloan School of Management, drew on his experience modeling complex systems to build a network model of how different modes of transportation spread the virus across China.

“Especially in China, where the public flow is massive, you have to consider the local dynamics,” says Li. “On different transportation media, the transmissivity is different.” For example, on planes, fewer people talk, leading to less transmission. But on trains, talking amongst neighbors is common, which could lead to high transmission.

There are other differences between modes of transit. Notably, neither cars nor planes have path overlap. But on a train from, say, Shanghai to Beijing, passengers board at stops along the way. Path overlap greatly increases the chance of cross-infection, allowing the virus to disseminate widely and quickly. Li found that big cities accounted for 15 percent of transit-driven infections, while small cities contributed nearly zero cases; in small cities, spread was driven locally after an infection was imported.

One other important transit factor is the time of year. Around Chinese New Year, in late January, hundreds of millions of people travel; by many estimates, it is the largest holiday travel event on the globe. While restrictions cut some of that travel short, some of it had already happened in January, before the lockdowns.

An important value to determine from the data is the incubation period: the time between when a person is infected and when they show symptoms. For the coronavirus, it has proven relatively short, roughly 5 days.


“There’s this obsession early in an epidemic with where the cases came from,” says Fisman. 

When King Charles VIII of France invaded Italy in 1494, his soldiers were stricken with a plague. After they returned home, they spread the disease far and wide. The French called it the Italian disease; the Italians called it the French disease. For the Dutch, it was the Spanish disease; for the Russians, the Polish disease; and for the Ottoman Turks, the Christian disease. We know it today as syphilis.

Enmity can spread much like a virus, but there’s much to learn from other countries. In particular, exported cases can be a powerful indicator of how widely the virus has spread in the country of origin. This is the approach Imai et al. and Chinazzi et al. used to estimate early case counts in China, and it’s what Fisman’s team has been able to do for Italy and Iran. From 46 exported cases abroad and travel data, the researchers estimated that the true number of cases in Italy on February 29 was roughly 4,000, not the 1,128 that had been counted. For Iran, they found a much larger discrepancy: 18,000 estimated cases against only 43 reported cases, as of February 23. A Hong Kong-led team came to a similar conclusion, estimating around 16,500 cases in Iran on February 25. Both countries then saw an explosion of confirmed cases and deaths.

“Someone’s termed it ‘forensic epidemiology,’ which I quite like,” says Fisman. “When it’s a new outbreak and people are catching up with testing and they don’t know what the extent of the outbreak is inside the country, you can indirectly estimate what the size must be if you have access to travel volumes.” 

It’s a little like figuring out how much confetti was dropped at a party by counting the number of people at nearby restaurants with paper stuck to them. For Fisman, the takeaway is not that there are exactly 4,000 cases in Italy; it’s that the exported cases suggest Italy is missing a ton of its cases, maybe 75 percent.
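The logic behind this kind of forensic estimate fits in a few lines of Python. It rests on simplifying assumptions: every infected resident is equally likely to travel, and a traveling case is detected abroad if it departs during a fixed window of days in which it would be picked up. Under those assumptions, the chance that any one case shows up abroad is roughly daily outbound travelers × window length ÷ population of the source region. Dividing the number of exported cases by that probability gives an estimated outbreak size. This is a generic sketch of the approach, not the code or parameters used by Imai et al., Chinazzi et al., or Fisman’s team. The function name and all travel inputs below are hypothetical.

```python
def estimate_outbreak_size(exported_cases: int,
                           daily_outbound_travelers: float,
                           catchment_population: float,
                           detection_window_days: float) -> float:
    """Back-calculate the size of an outbreak from cases detected abroad.

    Assumes every infected resident is equally likely to travel, and that a
    traveling case is detected abroad if it leaves during the detection window.
    """
    # Probability that any one case in the source region is exported and detected.
    p_export = daily_outbound_travelers * detection_window_days / catchment_population
    # Expected exports = outbreak size * p_export, so invert to estimate the size.
    return exported_cases / p_export


# Hypothetical inputs for illustration only (not figures from any study).
estimate = estimate_outbreak_size(
    exported_cases=10,
    daily_outbound_travelers=2_000,
    catchment_population=10_000_000,
    detection_window_days=10,
)
print(f"Estimated cases: {estimate:,.0f}")

# Share of cases missed, using the Italy figures quoted in the text.
estimated, counted = 4_000, 1_128
print(f"Share missed: {1 - counted / estimated:.0%}")
```

With those made-up inputs, the estimate comes to 5,000 cases. The last lines apply simple arithmetic to the Italy figures: 1,128 counted cases against roughly 4,000 estimated implies that about 72 percent of cases were missed, in line with Fisman’s rough 75 percent.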

Some countries, like Singapore, have proven remarkably resilient and have thus far avoided exponential growth, which several researchers attribute to diligent contact tracing, among other measures. One odd data point: Singapore has not required masks, which has surprised a number of researchers, including Jin Cheng, a mathematician at Fudan University in Shanghai. Cheng attributes a substantial amount of China’s success at quashing the epidemic to masks, but admits that Singapore seems to be succeeding with a different strategy.

What’s the best way to reduce a country’s risk of an outbreak? A team of researchers at Szeged University in Hungary, led by mathematical biologist Gergely Röst, set out to answer that question while the virus was still mainly confined to China.

“You achieve the largest reduction of your risk with the smallest effort,” says Röst. “If you have very low connectivity to China, for example, but very high local transmission potential, then the best thing to do is you reduce your connection more with China.” 

He compares a country to a soccer player with strengths and weaknesses. According to Röst, it’s better for countries to focus on their strengths, such as low connectivity to countries with infections or low local transmission potential, than to try to shore up their weaknesses.

South Korea has proven its strength at reducing transmission. The country was headed for out-of-control growth at the end of February, when nearly 1,000 cases were reported in a single day. But by mid-March, South Korea had flattened the curve, with fewer than 100 cases reported per day. How? One strategy has been high-tech contact tracing, which leverages masses of mobile data to track where infected people have been and whom they might have been in contact with. The approach has raised concerns about privacy, but the results are hard to argue with.

Transmission may also be affected by demographic differences. A February 27 preprint by researchers from the University of Warwick predicted reduced transmission across Africa, Central America, the Middle East, and India, but high rates in Europe and Japan because of their older populations, since older individuals are more susceptible to SARS-CoV-2.

Another factor can be temperature. The more time people spend indoors, the more likely they are to catch a virus, and cold weather is known to weaken the immune system. Even so, temperature is not everything. Countries like Singapore have done well primarily because of their intervention measures, says Wenbin Chen, a researcher on the Fudan University team.

A paper by a Portuguese team found that temperature and humidity explained only 18 percent of the variation in the epidemic’s spread. That is, differences in viral spread between regions in China were only partially due to differences in temperature and humidity. A typical doubling time (the time it takes for confirmed cases to double) at 20 Celsius might be 5 days, but at a steamy 40 Celsius, the doubling time would stretch to roughly 7 days, slowing growth.
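Doubling time maps directly onto a growth rate. If cases double every T_d days, the continuous daily growth rate is r = ln 2 / T_d. The short Python sketch below converts the two doubling times mentioned above, treating them as given rather than estimating them. It then projects a month of unchecked growth from a hypothetical starting point of 100 cases. The function names are illustrative.

```python
import math


def daily_growth_rate(doubling_time_days: float) -> float:
    """Continuous daily growth rate r implied by a doubling time: T_d = ln 2 / r."""
    return math.log(2) / doubling_time_days


def cases_after(initial_cases: float, days: float, doubling_time_days: float) -> float:
    """Cases after `days` of unchecked exponential growth."""
    return initial_cases * 2 ** (days / doubling_time_days)


# Doubling times quoted in the text (taken as given, not fitted here).
for label, td in (("20 C", 5.0), ("40 C", 7.0)):
    r = daily_growth_rate(td)
    print(f"{label}: r = {r:.3f}/day, "
          f"{math.exp(r) - 1:.1%} more cases each day, "
          f"{cases_after(100, 30, td):,.0f} cases after 30 days from 100")
```

At a 5-day doubling time, 100 cases become 6,400 after 30 days; at 7 days, the total is roughly 1,950. A two-day difference in doubling time thus compounds into more than a threefold difference within a month.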


There are now thousands of people dying from the virus every day. Many are old and infirm, or immunocompromised. Others are young and healthy, like Li Wenliang. When patients die, they are converted into a numeral, shifted from one column to the next. A journey in human suffering across three compartments: susceptible, infected, deceased. 

In general, epidemiologists don’t make predictions based on deaths.

“This mortality rate is not necessarily just a scaled down rate of the infection with certain delay—it’s not that simple. It’s much more complicated,” says Wu. 

Comorbidities have powerful effects on fatalities, as do age and even gender. A February 27 preprint tracking 1,590 patients in China found that nearly a quarter of patients had comorbidities such as smoking or hypertension. Men accounted for nearly 60 percent of patients with comorbidities, and patients with a comorbidity were about 15 years older than those without.

Differences exist between countries as well. Italy’s total fatalities have surpassed China’s, even though Italy has fewer confirmed cases. According to Wu, there are two main reasons for Italy’s much higher mortality rate. First, the share of seniors is far higher in Italy than in China. The average age in Italy is about 8 years higher, and octogenarians have been succumbing in devastating numbers.

The other is the availability of healthcare. In China, healthcare providers poured into Wuhan, and temporary hospitals were constructed in a week. Doctors in Italy are being forced to make terrible choices as they triage patients. In some places, there are so few ventilators that patients over 65 will not receive one.

A March 16 Imperial College study estimated that, without intervention measures to control the virus, 2.2 million people in the U.S. would die.


At this point, it is time to address a caveat. Mathematical models, even when they succeed, will not capture or predict many important intangible outcomes. There are social and economic costs of quarantines that no SEIR model is capable of approximating. Neither can they assess the emotional toll from the loss of life.

Even factors that are, in principle, capable of being captured, such as varying degrees of industrialization or healthcare payment models, complicate matters and prevent easy mathematical generalizations.

In 2003, the SARS epidemic hit Toronto after an elderly woman contracted the disease in Hong Kong. The city implemented strict protocols in hospitals and isolated infectious individuals, allowing it to control the virus. But when the virus seemed to be gone and the city relaxed the protocols, the outbreak came roaring back, eventually infecting several hundred people.

For Fisman, a Torontonian, 2003 is not the proper historical precedent. “None of us were around in 1918. I think these are unchartered waters,” he says. The number of cases countries are currently missing is a strong indicator of how difficult the virus will be to control. 

Other researchers have similarly dire outlooks. 

“In the very long term, you somehow have to reduce the number of susceptibles. Either by vaccination or they just contract the disease,” Röst says. 
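Röst’s point about susceptibles can be illustrated with the simplest compartmental model, SIR (susceptible, infected, recovered). In it, each infected person causes about R0 × s new infections, where s is the share of the population still susceptible, so case numbers grow only while s stays above 1/R0. Once vaccination or infection pushes s below that line, the epidemic recedes. The Python sketch below is a toy. It assumes a well-mixed population, no interventions, an R0 of 2.5, and a 5-day infectious period, all chosen for illustration rather than taken from any study, and it integrates the model with simple fixed time steps. The function name is hypothetical.

```python
def simulate_sir(r0: float, infectious_days: float, days: float,
                 i0: float = 1e-4, dt: float = 0.01):
    """Integrate a basic SIR model (population fractions) with Euler steps.

    Returns a list of (time, susceptible, infected) samples.
    """
    gamma = 1 / infectious_days  # recovery rate per day
    beta = r0 * gamma            # transmission rate, since R0 = beta / gamma
    s, i = 1 - i0, i0
    t = 0.0
    history = [(t, s, i)]
    while t < days:
        new_infections = beta * s * i * dt
        recoveries = gamma * i * dt
        s -= new_infections
        i += new_infections - recoveries
        t += dt
        history.append((t, s, i))
    return history


r0 = 2.5  # illustrative value within the range quoted earlier in the article
history = simulate_sir(r0, infectious_days=5, days=365)
t_peak, s_peak, i_peak = max(history, key=lambda row: row[2])
print(f"Infections peak on day {t_peak:.0f}, when {s_peak:.1%} remain susceptible")
print(f"Threshold 1/R0 = {1 / r0:.1%}")
```

In this toy run, infections peak when roughly 40 percent of the population is still susceptible, which is 1/2.5. Because people infected around the peak keep transmitting, the epidemic overshoots that threshold, and the final susceptible share ends up well below 40 percent.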

“My feeling and again—keep in mind I’m a mathematician—my feeling is this problem cannot be solved until we have the vaccine widely available,” Wu says.