BACKGROUND: The inherent difficulty of identifying and monitoring emerging outbreaks caused by novel pathogens can lead to their rapid spread, and if left unchecked, they may become major public health threats to the planet. The ongoing COVID-19 outbreak, which has infected over 2,300,000 individuals and caused over 150,000 deaths, is an example of one of these catastrophic events.
OBJECTIVE: Our aim is to propose a methodology able to forecast COVID-19 activity in real time.
METHODS: We present a timely and novel methodology that combines disease estimates from mechanistic models and digital traces, via interpretable machine-learning methodologies, to reliably forecast COVID-19 activity in Chinese provinces in real time. Specifically, our method uses as inputs (a) official health reports, (b) COVID-19-related internet search activity, (c) news media activity, and (d) daily forecasts of COVID-19 activity from a metapopulation mechanistic model. Our machine-learning methodology uses a clustering technique that enables the exploitation of geo-spatial synchronicities of COVID-19 activity across Chinese provinces, and a data augmentation technique to deal with the small number of historical disease observations, characteristic of emerging outbreaks.
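For illustration only, the two machine-learning ingredients named above (grouping provinces with synchronous activity, and augmenting a short history of observations) might be sketched as follows. The abstract does not specify either technique, so everything here is an assumption: clustering by thresholded Pearson correlation, augmentation by bootstrap resampling with Gaussian noise, and the hypothetical names `cluster_by_correlation`, `augment`, `threshold`, `n_copies`, and `noise_scale`.

```python
import numpy as np


def cluster_by_correlation(series, threshold=0.8):
    """Group regions whose activity series are strongly correlated.

    ASSUMPTION: thresholded Pearson correlation plus connected components
    stands in for the (unspecified) clustering technique.
    series: array of shape (n_regions, n_days), non-constant rows.
    Returns one integer cluster label per region.
    """
    corr = np.corrcoef(series)
    n = series.shape[0]
    labels = [-1] * n
    next_label = 0
    for i in range(n):
        if labels[i] != -1:
            continue
        # Start a new cluster and absorb every region reachable through
        # a chain of correlations at or above the threshold.
        labels[i] = next_label
        stack = [i]
        while stack:
            j = stack.pop()
            for k in range(n):
                if labels[k] == -1 and corr[j, k] >= threshold:
                    labels[k] = next_label
                    stack.append(k)
        next_label += 1
    return labels


def augment(X, y, n_copies=5, noise_scale=0.05, seed=0):
    """Enlarge a small training set.

    ASSUMPTION: bootstrap resampling with Gaussian noise scaled to each
    feature's standard deviation stands in for the (unspecified)
    data augmentation technique.
    X: array (n_samples, n_features); y: array (n_samples,).
    """
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    idx = rng.integers(0, n, size=n * n_copies)  # rows drawn with replacement
    noise = rng.normal(0.0, noise_scale, size=(idx.size, X.shape[1]))
    noise = noise * X.std(axis=0)
    X_aug = np.vstack([X, X[idx] + noise])
    y_aug = np.concatenate([y, y[idx]])
    return X_aug, y_aug
```

In such a sketch, one forecasting model would be trained per cluster on the pooled, augmented observations of its member provinces, so that provinces with few observations can borrow information from synchronous neighbors.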
RESULTS: Our model is able to produce stable and accurate forecasts two days ahead of current time, and outperforms a collection of baseline models in 27 out of the 32 Chinese provinces.
CONCLUSIONS: Our methodology could be easily extended to other geographies currently affected by the COVID-19 outbreak to help decision makers.