COVID-19 Data Myopia

People around the world are hooked on following daily updates of confirmed COVID-19 cases and their related deaths, and conclusions and projections drawn from them are everywhere these days. However, not enough attention is given to understanding what these numbers really mean, what we can learn from them, and how to avoid over-interpreting them.

Let’s briefly touch upon this issue. The number of confirmed cases is not an easy number to work with. Countries apply different strategies on how many tests to perform and whom to test (e.g. only people with symptoms, or anyone who has been in contact with an infected person), and they also differ in their capacity to secure enough tests. Not surprisingly, this number can be inconsistent over time and between countries. It does not necessarily represent a consistent fraction of the total number of infected people; therefore, it is hard to draw any conclusion from it except by looking at its trends within each country (or US state) separately, while staying informed about possible changes in testing policy (e.g. the faulty tests in the US, which were followed by a new round of increased testing efforts).

The number of deaths is, unfortunately, a far more informative dataset, but it too must be considered with caution. Even though there are some unified rules on how to attribute deaths to a disease, inconsistencies are possible. More importantly, not all COVID-19 deaths are registered as such. When the virus infects a large fraction of a community, the local medical services collapse under the flood of people in need of hospitalization. The lock-down is then particularly harsh on vulnerable social groups, such as elderly people living alone at home or people with risky medical preconditions. Many people die in their homes without ever being tested for the SARS-CoV-2 virus. Some people have also died simply because they were unable to access their regular medical services. The extent of this effect is a matter of debate, but for illustration, the mayor of Bergamo, the hardest-hit area of Italy, claims that the true number of COVID-19 victims in his town is four times higher than the official count. Similar stories come from France, Spain, the UK, Germany, China and the US. On top of that, many undemocratic governments have decided to suppress the true extent of the pandemic in their countries, as they fear political unrest that could topple their regimes.

Despite this grim warning about the validity of the official death counts, the published numbers are intrinsically consistent in the sense that deaths inspected by medical personnel will be checked for signs of COVID-19. Thanks to that, we can treat the number of deaths as a proxy for how far the virus has spread through the community. Even if you do not know how deadly the virus is or whether the number of confirmed cases is trustworthy, you can watch the pressure on hospitals and the number of deaths to see whether the epidemic is slowing down. What we want to show here is how to fit a typical epidemiological curve to the growth of deaths and use it to estimate the final death toll and how long the state measures need to stay in place.

Factors affecting death rates

Before we embark on this exercise, there are still many factors affecting the death trends that one should be aware of when comparing countries, or even regions within the same country. For example:

  • There is a delay between the peak of infections and the peak in daily deaths. The disease takes time to produce symptoms and to escalate to life-threatening levels. Moreover, the growth of confirmed cases also lags behind the actual growth of infections, which is a stark warning to countries whose confirmed case counts are still rising fast.
  • The growth of deaths will strongly depend on the imposed state measures and how strictly the population follows them. Cultural differences can make a dramatic difference between countries, as people react differently to restrictions on freedom of movement or privacy.
  • Different countries (or even regions or provinces within a country) have health systems with different capacities to cope with the potential tsunami of infected people in need of hospitalization. At some point the hospitals will have to start using medical triage, and then the death rate will increase simply because many people will die while waiting to be treated. The stories from Italy that epitomize this issue are heartbreaking and should be a strong warning to anyone ignoring the severity of this disease.
  • When the first deaths occur, they are essentially random events, typically involving people with very risky preconditions, such as oncological patients. Fitting epidemiological trends to such small numbers of random events is not meaningful and should be avoided. Hence, the growth of deaths should show a visible trend over several days, preferably a week, before it becomes a useful guide for modelling. Health officials, of course, have all the details, including reconstructed networks of individual contacts that allow them to build very detailed models of where and how fast the virus is spreading.
  • The public often pays a lot of attention to short-term trends. One must be aware that the total number of deaths is a sum over many clusters of infected communities, which produces random fluctuations. Deaths can go up or down for several days before they return to the overall trend.
  • The demographic and socioeconomic conditions within each country or province affect the death rate.
  • Finally, local climate can also be an important factor. The virus spreads at different rates under different climate conditions.
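To see a trend through the short-term fluctuations listed above, daily counts are commonly smoothed with a moving average before anyone reads a trend into them. The sketch below is a minimal illustration, not part of the original analysis: it derives daily deaths from a cumulative series and applies a trailing 7-day mean. The function names, the 7-day window and the sample numbers are our own assumptions.

```python
def daily_from_cumulative(cumulative):
    """Turn a cumulative death series into daily new deaths.

    Assumes the series starts on the first reported day, so the first
    daily value equals the first cumulative value.
    """
    daily = [cumulative[0]]
    for previous, current in zip(cumulative, cumulative[1:]):
        daily.append(current - previous)
    return daily


def trailing_mean(values, window=7):
    """Trailing moving average; None until a full window of data exists."""
    smoothed = []
    for i in range(len(values)):
        if i + 1 < window:
            smoothed.append(None)
        else:
            smoothed.append(sum(values[i + 1 - window:i + 1]) / window)
    return smoothed


cumulative = [1, 3, 6, 10, 15, 21, 28, 36]  # made-up numbers for illustration
daily = daily_from_cumulative(cumulative)
print(daily)                 # [1, 2, 3, 4, 5, 6, 7, 8]
print(trailing_mean(daily))  # [None, None, None, None, None, None, 4.0, 5.0]
```

A longer window suppresses more noise but also delays the moment a genuine change in trend becomes visible, which matters when the question is whether the curve has started to bend.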

These and many other disease-related factors make modelling COVID-19 extremely difficult, quickly escalating into some very complicated math.

The curve fitting

This does not mean we cannot learn something from the daily counts of infected and deceased. The public is becoming aware of the concept of exponential growth. The spread of diseases is a typical example of complex dynamics in a network of social contacts, which explains the emergence of an exponential function. It describes how an epidemic starts when state measures are not imposed immediately: the number of infected grows exponentially. Thus, we are now painfully aware that doubling the number of cases every two days is a much more severe crisis than doubling it every week. However, an exponential function does not describe how the disease will eventually evolve.
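The gap between the two doubling times compounds quickly. A short illustrative calculation (the function names are ours, not from any library) shows that over four weeks, doubling every two days multiplies a count by 2^14 = 16,384, while doubling every week multiplies it by only 2^4 = 16. The second helper estimates a doubling time from two observations, one simple way to follow whether growth is slowing within a single country.

```python
import math


def growth_factor(days, doubling_time):
    """Factor by which a count grows in `days` if it doubles every `doubling_time` days."""
    return 2 ** (days / doubling_time)


def doubling_time(count_before, count_after, days_between):
    """Doubling time implied by two counts taken `days_between` days apart."""
    return days_between * math.log(2) / math.log(count_after / count_before)


print(growth_factor(28, 2))        # 16384.0: four weeks of doubling every two days
print(growth_factor(28, 7))        # 16.0: four weeks of doubling every week
print(doubling_time(100, 400, 4))  # about 2.0: quadrupling in four days
```

A rising doubling time computed over successive windows is a sign that growth is slowing, although, as noted above, it says nothing reliable while the counts are still small.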

A virus needs hosts to multiply and if the number of potential hosts to infect is dropping then the disease will not be able to expand exponentially anymore. If nothing is done the virus will eventually infect as many people as it can reach, and this will be the end of the epidemic. Of course, this means a huge number of deaths — a rough prediction is 2.2 million deaths in the US alone if the government did nothing.

Governments will therefore do something to reduce the virus’ access to new hosts. The non-medical measures of social distancing and suppression of human gatherings are known to work, as they have been used before in other situations (e.g. the responses to the Spanish flu in 1918 followed these same methods, which later helped scientists explore their efficacy).

A simple way to describe this is to set up a model with three groups of people: susceptible to the infection, infectious, and recovered (thus, immune). Such a simple epidemic model gives a good sense of what kind of curves we should expect for the counts of infected, deaths and recovered. In simple terms, as the virus runs out of new hosts to invade, the exponential growth slows down and eventually stops. This is what we now often call “bending the curve”. The problem, of course, is how to adjust the model to include all the aforementioned factors.
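A minimal sketch of such a susceptible-infectious-recovered (SIR) model follows. The transmission rate `beta`, the recovery rate `gamma`, the population size and the forward-Euler integration are all illustrative assumptions, not values or methods taken from this article. The point is only to show the number of infected rising exponentially at first, then bending and falling as susceptible hosts run out.

```python
def simulate_sir(population, initial_infected, beta, gamma, days, dt=0.1):
    """Integrate the SIR equations with a simple forward-Euler scheme.

    dS/dt = -beta * S * I / N
    dI/dt =  beta * S * I / N - gamma * I
    dR/dt =  gamma * I

    beta  -- transmission rate per day (illustrative)
    gamma -- recovery rate per day, i.e. 1 / average infectious period
    dt    -- step in days; should divide one day evenly
    Returns one (S, I, R) tuple per day, starting with day 0.
    """
    s = float(population - initial_infected)
    i = float(initial_infected)
    r = 0.0
    steps_per_day = round(1 / dt)
    history = [(s, i, r)]
    for _day in range(days):
        for _step in range(steps_per_day):
            new_infections = beta * s * i / population * dt
            new_recoveries = gamma * i * dt
            s -= new_infections
            i += new_infections - new_recoveries
            r += new_recoveries
        history.append((s, i, r))
    return history


# Illustrative parameters: early on, each infectious person infects beta / gamma = 3 others.
history = simulate_sir(population=1_000_000, initial_infected=10, beta=0.3, gamma=0.1, days=200)
infected = [i for _, i, _ in history]
peak_day = infected.index(max(infected))
print("infections peak on day", peak_day)
print("share recovered by day 200:", round(history[-1][2] / 1_000_000, 2))
```

Lowering `beta`, which is roughly what social-distancing measures aim to do, delays and flattens the peak in this toy model.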

A quick-and-dirty trick that we can use is based on the fact that we now have two parts of the world where COVID-19 has been present long enough to see what this curve-bending looks like in real life: the Hubei province in China and Italy. Think of it this way. Imagine you’re following the scores of your favorite sports team during a season, but you are kept in the dark about whom they play against or any other statistics of their games. What you have are the scores of teams in previous seasons. What you would do then is compare your team's performance to the trends that various teams showed in previous years. A few games at the start will not reveal much, but as the season progresses you will notice that the range of possible scenarios for your team's final success becomes increasingly limited.

This is what we wish to show. Instead of a simple exponential curve, we fit an asymptotic regression curve to the cumulative number of deaths as a function of time: f(t) = exp(a - (a - b)*exp(-c*t)). At t = 0 the curve starts at exp(b) deaths; as t grows it levels off at exp(a), the projected final death toll; and c sets how quickly the curve bends. This curve is known to provide a good fit to the epidemic spread of diseases. We then use it to explore what alternative futures lie ahead for various countries.
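A minimal sketch of such a fit in plain Python follows. The article does not say how its fits were computed, so the method here is our own assumption. Because ln f(t) = a + (b - a)*exp(-c*t) is a straight line in exp(-c*t) once c is fixed, we scan a grid of c values, solve for a and b by ordinary least squares on the logarithm of cumulative deaths, and keep the c with the smallest squared error. Mathematically, this curve is a Gompertz curve. Fitting on the log scale gives early, small counts as much weight as later ones, which is a deliberate but debatable choice. The grid range for c is likewise an assumption.

```python
import math


def curve(t, a, b, c):
    """Cumulative deaths f(t) = exp(a - (a - b) * exp(-c * t))."""
    return math.exp(a - (a - b) * math.exp(-c * t))


def fit_for_fixed_c(t, log_deaths, c):
    """Least-squares a and b for a fixed c, plus the sum of squared residuals.

    With c fixed, ln f(t) = a + (b - a) * x where x = exp(-c * t), so this is a
    straight-line fit: the intercept is a and the slope is b - a.
    """
    x = [math.exp(-c * ti) for ti in t]
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(log_deaths) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, log_deaths))
    slope = sxy / sxx
    a = mean_y - slope * mean_x
    sse = sum((yi - (a + slope * xi)) ** 2 for xi, yi in zip(x, log_deaths))
    return a, a + slope, sse


def fit_curve(t, deaths, c_grid=None):
    """Grid search over c; returns (a, b, c). All death counts must be positive."""
    if c_grid is None:
        c_grid = [k / 1000 for k in range(1, 501)]  # assumed range: 0.001 to 0.5 per day
    log_deaths = [math.log(d) for d in deaths]
    best_a, best_b, best_c, best_sse = None, None, None, math.inf
    for c in c_grid:
        a, b, sse = fit_for_fixed_c(t, log_deaths, c)
        if sse < best_sse:
            best_a, best_b, best_c, best_sse = a, b, c, sse
    return best_a, best_b, best_c


# Synthetic check: 30 days of noise-free data generated from a known curve.
days = list(range(30))
synthetic = [curve(t, math.log(40_000), math.log(10), 0.1) for t in days]
a, b, c = fit_curve(days, synthetic)
print(round(math.exp(a)), round(c, 3))  # 40000 0.1
```

On noise-free data the fit recovers the generating curve. With real, noisy counts taken before the curve has visibly bent, many combinations of a and c fit almost equally well, so the projected plateau exp(a) is poorly constrained until the bending shows up in the data.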

Then we use the cases of Hubei and Italy to obtain two sets of a, b, c parameters. These represent two possible future scenarios for the US, which give us a range for the final death toll and a time-scale for the fight against COVID-19.
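One hypothetical way to turn the Hubei and Italy fits into scenarios for another country is to borrow only the bending rate c from each reference fit, then refit a and b to the target country's own data with c held fixed. The article does not spell out its exact procedure, so this is an assumption on our part. The rates below are made-up placeholders, not the values fitted to Hubei or Italy, and the time unit (days) must match between the reference and target fits. With c fixed, the fit reduces to a straight line through (exp(-c*t), ln deaths), so no optimizer is needed.

```python
import math


def fit_with_fixed_c(t, deaths, c):
    """Least-squares a and b of f(t) = exp(a - (a - b) * exp(-c * t)) with c held fixed.

    For fixed c, ln f(t) = a + (b - a) * x with x = exp(-c * t): a straight line
    whose intercept is a and whose slope is b - a.
    """
    x = [math.exp(-c * ti) for ti in t]
    y = [math.log(d) for d in deaths]
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    slope = (sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
             / sum((xi - mean_x) ** 2 for xi in x))
    a = mean_y - slope * mean_x
    return a, a + slope


def scenario_final_tolls(t, deaths, reference_rates):
    """Final death toll exp(a) implied by each reference bending rate c."""
    return {name: math.exp(fit_with_fixed_c(t, deaths, c)[0])
            for name, c in reference_rates.items()}


# Made-up target series: 50 deaths at t = 0, levelling off at 100,000, c = 0.10.
days = list(range(25))
a_true, b_true = math.log(100_000), math.log(50)
target = [math.exp(a_true - (a_true - b_true) * math.exp(-0.10 * t)) for t in days]

# Hypothetical placeholder rates, NOT the values fitted to Hubei or Italy.
rates = {"fast_bending": 0.10, "slow_bending": 0.05}
for name, toll in scenario_final_tolls(days, target, rates).items():
    print(f"{name}: about {toll:,.0f} deaths")
```

With the rate that generated the data, the sketch recovers the 100,000 plateau; the other rate implies a different plateau. The spread between such scenario plateaus is what would give a range for the final toll.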

The Hubei province — a radical approach to COVID-19

The story of COVID-19 started in the city of Wuhan in the Hubei province of China. We know that the lock-down of Wuhan started on January 23rd, 2020. Soon, other cities and regions within Hubei imposed the same measure and quickly the entire province was under quarantine. At the same time, a health crisis of epic proportions was happening in the Wuhan hospitals.

What we see in the official data is that the death toll follows our curve (see graph below) all the way to the end of the lock-down two months later. Hubei thus represents an extreme approach, in which the economy and regular daily life stopped for two months. Its concern now is how to prevent the re-emergence of COVID-19 due to “imported” cases, the problem that forced even Singapore, one of the most successful countries in the fight against COVID-19, to impose a lock-down a few days ago.

The case of Italy — a warning that many ignored

Even though the situation in China was already dramatic in January 2020, politicians around the world were not willing to risk their popularity by spreading fear of the new disease. The virus started to spread in Italy in the first half of February, or maybe even earlier. It is not clear exactly how the epidemic started in Northern Italy, but tracing back cases led some experts to suspect that the UEFA Champions League game between Atalanta and Valencia may have been the reason why Bergamo became one of the epicenters of the pandemic.

Italy also had bad luck, given that the virus entered its hospitals almost from the beginning. Even though the situation escalated quickly in mid-February, the state measures were not imposed quickly enough to suppress the pandemic. Politicians were faced with the prospect of terrible economic losses and hesitated with the lock-down. In the meantime, the virus kept spreading exponentially. Our fit (the black solid line in the graph below) shows that the measures are now bending the curve toward a final official death toll close to 40,000 people. Hopefully the curve bending will get stronger and somewhat reduce this projection, but currently this is the number that the curve will reach when it completely flattens.

The case of Italy is now interesting for several reasons. First, it is a country with drastic measures whose severity depends on the region, which is a more realistic scenario for other countries than Hubei. It also shows that a lock-down indeed starts to work almost immediately, but it takes weeks for the daily death toll to flatten. Finally, as in Hubei, it will take about two months from the start of the lock-down to establish conditions for lifting the restrictions.
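Once a, b and c are fitted, timing questions like these can be read off the curve in closed form. Writing u(t) = (a - b)*exp(-c*t), cumulative deaths are f(t) = exp(a)*exp(-u) and daily deaths are f'(t) = c*u*f(t). Daily deaths peak where u = 1, i.e. at t = ln(a - b)/c, at which point cumulative deaths have reached exp(-1), about 37%, of the final toll. Cumulative deaths reach a fraction p of the final toll when u = -ln p, i.e. at t = ln((a - b)/(-ln p))/c. The sketch below, with hypothetical function names and illustrative parameters, simply evaluates these expressions. It assumes a - b > 1 so that the peak falls after t = 0.

```python
import math


def cumulative_deaths(t, a, b, c):
    """f(t) = exp(a - (a - b) * exp(-c * t))."""
    return math.exp(a - (a - b) * math.exp(-c * t))


def daily_deaths(t, a, b, c):
    """Derivative f'(t) = c * u * f(t), with u = (a - b) * exp(-c * t)."""
    u = (a - b) * math.exp(-c * t)
    return c * u * cumulative_deaths(t, a, b, c)


def peak_day(a, b, c):
    """Day on which daily deaths peak: u = 1, so t = ln(a - b) / c."""
    return math.log(a - b) / c


def day_reaching_fraction(a, b, c, p):
    """Day on which cumulative deaths reach fraction p (0 < p < 1) of exp(a)."""
    return math.log((a - b) / -math.log(p)) / c


# Illustrative parameters only: final toll 40,000, 10 deaths at t = 0, c = 0.1 per day.
a, b, c = math.log(40_000), math.log(10), 0.1
print(f"daily deaths peak around day {peak_day(a, b, c):.1f}")
print(f"99% of the final toll is reached around day {day_reaching_fraction(a, b, c, 0.99):.1f}")
```

The gap between the two printed days is one way to quantify the weeks-long tail after the daily death toll has peaked.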

The big European economies in race to avoid the Italian scenario

About ten days after COVID-19 deaths started to climb in Italy, an even worse situation developed in Spain. When the daily death toll reached 100, the government introduced a lock-down. Aggressive measures to contain the disease were taken, and more radical state measures were introduced as the total death toll approached 10,000. The curve is now flattening, and our projection puts Spain just below 30,000 deaths in total.

In France, deaths started to rise at about the same time as in Spain, but at a slower rate. Unfortunately, France is still seeing a disturbing trend, reaching record highs of more than 1,000 deaths per day, currently comparable to the US. This latest increase in deaths is due to an unfortunate spread of COVID-19 within retirement homes. The overall growth is still so strong that we cannot make a convincing prediction, which means that the next few days are crucial for bending the curve in France. If the trend does not show a convincing slowdown very soon, France will end up with numbers worse than Italy's.


In Germany, deaths started to rise slightly after Spain and France, but Germany took an aggressive approach to testing in order to trace all the infected. About a million people have now been tested for COVID-19 in Germany, which makes it the global leader in the number of tests per capita. Its efforts are paying off: our fit shows that its curve has the potential to stop at about 10,000 deaths in total. However, the testing policy is about to become stricter, as shortages of basic testing equipment and reagents are being reported.

The UK experienced a sudden rise in deaths at about the same time as Germany, but the situation in the UK is dramatically different. While Germany has managed to bend the curve and is testing extensively, the UK’s response was somewhat chaotic in the beginning, as the government tried to delay a lock-down. Unfortunately, even though the exponential growth of deaths is slower than in the beginning, it does not yet show a convincing bending of the curve. This means that our fit cannot predict where the final death toll will end up. The last few days are encouraging, but judging by Hubei and Italy, it is hard to expect the UK's final death toll to be lower than Italy's.

The dramatic events in the US

The situation in the US is surprisingly chaotic. The lack of coherent federal policy and a deep political divide created a situation where each state is devising its own COVID-19 policy and competes for medical resources with other states. This makes projections of the final death toll extremely difficult and uncertain. For example, some states have taken urgent and drastic measures that enabled them to slow down the pandemic and avoid escalation. Washington and California are such examples.

After the pandemic was slowed down on the West Coast, the situation escalated on the East Coast. The New York metropolitan area is now hit hardest by the pandemic. Its social measures have started to bend the curve, but our models are currently too uncertain to make a convincing prediction. However, unless more is done to speed up the curve bending, this region will end up with a large number of deaths (50,000 or more).

We can, however, look at the cases of Hubei and Italy to estimate a best-case range for the total number of deaths that will accumulate before this summer. We apply the curve fits obtained from the Hubei and Italy data to the US data and observe how each scenario would play out. The graph below shows that the total number of deaths before the summer will fall anywhere within the range of 80,000 to 180,000. This approach assumes that the curve bending starts immediately for the entire US. The plausibility of this assumption will become clear in the coming days, but one should be aware that each single day of delay in bending the curve probably increases the final death toll by tens of thousands of people.

These numbers are in agreement with the projections presented by the White House, as well as some more detailed epidemiological models.

The biggest challenge for this best-case scenario is that social distancing and stay-at-home rules have not been implemented strategically across the entire country. Many state governors did not introduce measures in time, which resulted in a large fraction of the US population travelling extensively just a couple of weeks ago. It remains to be seen how much this helped the virus spread.

Americans also underestimate how long the crisis and the restrictions will last. The difficult period will probably last at least until June, when everyone hopes the warm summer weather will help slow down the virus in addition to the state-imposed measures. It is not clear how people will react once they realize the situation is dragging on for many weeks. This is where good political leadership and social cohesion are crucial. Unfortunately, Americans are still as deeply politically divided as they were before the start of this crisis.

The shocking discovery of asymptomatic spread

Today, while countries are struggling to avoid the Wuhan and Italian scenarios, the focus is on “bending the curve”. Nonetheless, many ask themselves what comes after that. Lifting the restrictions is obviously highly desirable from an economic point of view, but new insights into the COVID-19 disease are worrisome. Some studies have shown that a large fraction of infected people, from 50% to perhaps as many as 80%, go through the disease asymptomatically. This means that the virus can easily be re-introduced into the population once the restrictions are lifted.

One popular view on this problem is that the current measures are simply useless, since the virus will return while the restrictions cannot stay in place for a long time. But the dilemma of whether to lock down or not is a false one: there is simply no choice, as otherwise the country’s health system will collapse. Only a few countries, like Singapore, Taiwan, Japan or South Korea, had the appropriate procedures in place, thanks to their previous experience with the SARS epidemic. Even they are now under threat from imported COVID-19 cases, which have stirred up local transmission again (Singapore introduced a lock-down because of this).

Thus, the problem that countries will face now is what to do once they have managed to keep the number of hospitalized cases low enough to avoid a healthcare system meltdown. It is like bleeding heavily from a big wound. The very first thing you have to do is stop the bleeding, but then you need medical help or you will, most likely, die. The problem now is that we are faced with a possibility of bleeding in the middle of a forest and help (in the form of a COVID-19 vaccine) will not come anytime soon. Under this scenario, economic and political problems start to overtake medical concerns. And that’s when things really start to get tough.

Figuring out what to do next?

All this can be mitigated. Before we can figure out the health and economic impact of the measures being implemented, we need to understand what drives people’s behavioral patterns and their responses to the restrictions. We need to figure out what drives panic and fear in order to prevent them during the worst weeks of the quarantine. In Italy, people stopped singing from their balconies. In Wuhan, after two and a half months of quarantine, the city is “profoundly damaged”, with its spirit shattered. We need to start figuring out what people think about during isolation, help them cope with the situation, and help both governments and businesses around the world figure out what to do next.

We will update our predictions of total death tolls on a weekly basis and add survey sentiment and social network data to try to understand how people are feeling right now and what can be done to alleviate their pain. We will build an index that measures the condition people are in, which will help us figure out what the recovery will look like: a quick and joyful one or a long and depressing one.
