The Hamilton Institute's COVID-19 Data Dive: How long will COVID-19 last in Ireland?

Andrew Parnell; Amin Shoari Nejad; Ken Duffy; HoChan Cheon; Bentelhoda Binaei

tl;dr

We created an app that predicts how long COVID-19 will last in Ireland
See below to play with the app yourself (or direct link here)
With current best guesses, and no further lockdowns, COVID-19 will last well into 2021 (and beyond) before it is likely to die out naturally

Do it yourself

Without further ado, this is the app. Put in what you think are your best guess values on the left hand side and see how long it will last by looking at the calendar on the right:

The app gives you the likely date of the 10% chance, the 50% chance (i.e. best guess) and the 90% chance that the virus will be extinct, i.e. Ireland will have no more infected people.

Perhaps the most important thing to try is to fiddle with the R-number at the top left. If you are a member of the Zero COVID crew you can see how long it would take to get to extinction by moving the R number all the way down to 0.1 (simulating an almost total lockdown). Or if you want to follow something like the Swedish strategy you could put it up to closer to 3 to see how quickly it would burn through the population. Or you could put in what you think are best guesses as to all the figures and see what the model comes up with for how long this will last.

How does the app work? (simple explanation)

The app takes four input values down the left hand side:

The R-number. In case you haven’t heard of this before (where have you been?) this is the average number of people infected by an infected person. You may have heard that if this number is above one then COVID-19 is spreading, and if it’s below 1 then it’s dying out. This isn’t exactly true, but you’ll have to read on if you want to learn more
The ‘Current number of non-symptomatic spreaders’. This value is the number of people who have the virus but aren’t yet showing symptoms (if they ever do). They can pass the virus on to other people (they are infectious), but they might not know or have been tested. This is a really hard value to estimate. I usually set it to the same as the …
… `Current number of symptomatic infected cases’. This number we do know a little bit better. It’s the number of people who currently have the virus and know about it (i.e. have symptoms). They have likely tested positive, and can perhaps be estimated by summing up the number of people over the last two weeks who have had a positive test result. At the time of writing, that’s just under 2000 people. Our app won’t handle values that large right now but we’re working on it.
The ‘Current total of immune/recovered/dead’. This number includes all the people who have some kind of natural immunity, or have recovered from the disease, and so very unlikely to get it again, or have unfortunately already died from it. We have put a default value of 300,000 in for this figure, base on a guess that around 6% of the population have antibodies to the disease.

Once we have these values they are run through an epidemic model of the type used by the National Public Health Emergency Team (NPHET) which simulates many thousands of future scenarios measuring the time to virus extinction. There is randomness in this model; in some scenarios we get lucky and the epidemic dies out quite quickly, in others we don’t and it goes on for quite some time. We summarise the thousands of future extinction times and this gives us an estimate of the date at which 10%, 50% and 90% of the scenarios had the virus extinct.

At this point you might have the question; but we’ve all read in the papers that if the R number is bigger than one then the virus is spreading and if it’s less than 1 then it’s dying out? Well unfortunately this isn’t quite true. It’s possible to have a scenario where the R number is bigger than one and the virus dies out quickly, and the opposite! The reason is that the R number only describes the ‘average’ person. At an individual level each person infects a random different number of people, equivalent to rolling a dice and infecting the number shown on the upward face. If we get lucky then everyone rolls a 1 and they infect very few people. If we get unlucky many people roll a 6 and infect lots of people.

Having said all of the above, there are caveats to our model, and understanding them can help judge whether or not the approach is useful for you. Quite a few of the assumptions are a bit abstract, and are listed in full at the end of the article. Perhaps the main disconnect from reality though is that it assumes that the R value stays the same (at whatever you set it to) until the virus goes extinct. As we are now fully aware, tuning our daily behaviour to stop the virus getting out of control (or killing the economy) is one of the key discussions being had at government level. Were you to set a high R value you might see the virus burn through the population quicker, but it’s unlikely the government would let that happen without being put under some political pressure.

How does the app work? (longer explanation)

The full app is a bit richer than that described above and in this section I’ll give some of the more technical details. Feel free to stop here and go back to playing with the app if you are not interested in the maths.

We use the stochastic discrete-time approximation to the common Susceptible, Exposed, Infected, Removed (SEIR) infection model. This is a multi-compartment epidemic model and is commonly used by governments and organisations around the world to forecast epidemic behaviour. There are versions of the model that we would consider overly simplistic; such as those without stochastic components which seem to yield way-too-certain estimates of cases and deaths. Other more advanced versions have changing R number behaviour which allows lockdown effects to be modelled, but that’s the subject of a post for a different day.

The basic stochastic models have exponential waiting times, meaning that the amount of time someone spends in each of the 4 states (S, E, I or R) follows an exponential distribution. We have extended this to allow for gamma distributions which provides a slightly richer and more realistic estimate of the waiting time. The reason this entire app works as a prediction tool is because the exponential distribution is memoryless, and so this model is appropriate to estimate the future behaviour of the epidemic given only current estimates of the parameters.

To run our stochastic simulator we have to set some values of the parameters of the SEIR model. We have tried to follow guidance from the documents provided by the IEMAG as given here. The values we have chosen include:

The mean length of time an individual spends in the Exposed category is 6.6 days
The mean length of time an individual spends in the Infected category is 7.4 days
The population of Ireland is 4.9 million people

If you think that we have got any of these values wrong, then please contact us.

Even with this model, that’s not the end of the story. Since repeatedly running the model thousands of times to get estimated extinction quantiles is very computationally intensive, we build an emulator of the output quantiles. At the first stage we used a Latin hypercube sample to cover a large range of R_0, E, I, and R values from the input space. We then repeatedly build Gaussian Process models to determine where the uncertainty in the quantiles is largest. After a large number of these simulations we then used the scikit-learn Python package to build a set of extreme boosted trees on the multivariate outputs to produce the final predictions of the quantiles.

A list of caveats

To end this post, here are some of the final caveats to the model, which we might try and improve at a later date:

The R_0 value is constant throughout the epidemic. In reality this changes as lockdowns are introduced and removed
The use of Exponential/Gamma distributions to model the waiting times in each compartment. There seems to be very little published information as to the proper shape of these distributions
The parameters we have used in the model may be inaccurate and also may change dynamically throughout the epidemic
The simulations of the time to extinction can be very noisy. This is especially true when R_0 approaches 1 as the behaviour starts to behave like a random walk (with an absorbing boundary) which leads to very large (infinite?) variance in the time to extinction.
The SEIR model itself might be unrealistic. For example there are versions which allow for re-infection of previously infected individuals. However the CDC say this is unlikely.
The emulation approach we have used introduces noise in the estimates of the time to extinction, and could be reduced with more simulations or a superior machine learning or emulation model.

If you can think of other caveats or issues with our app please contact us.

How long will COVID-19 last in Ireland?

tl;dr

Do it yourself

How does the app work? (simple explanation)

How does the app work? (longer explanation)

A list of caveats

Citation

Links

Developer area