Evaluation of the methodology
Three regions of the United Kingdom are considered in our scheme: Central England & Wales,
South England and North England. The experiments are carried out on data from 22/06/2009 to
28/03/2010 (overall 40 weeks or 280 days). The datasets comprise
vector space representations of the Twitter corpus, using a
vocabulary of 2675 candidate features, and
flu rates published by the Health Protection Agency (HPA), denoting the number of GP consultations per
10^5 citizens where the diagnosis was Influenza-like Illness (ILI).
| MAE: Mean Absolute Error | Magenta Line: Inferred flu rates from the Twitter corpus |
| LCC: Linear Correlation Coefficient | Red Line: Flu rates from the HPA |
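The two metrics in the legend can be sketched as follows. This is a minimal illustration, not the code used for the experiments: the function names and the sample values in the usage note are hypothetical, and the P-value reported alongside the LCC is not computed here.

```python
import math


def mean_absolute_error(actual, inferred):
    """MAE: average absolute difference between HPA and inferred flu rates."""
    if len(actual) != len(inferred) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(a - b) for a, b in zip(actual, inferred)) / len(actual)


def linear_correlation(x, y):
    """LCC: Pearson linear correlation coefficient between two series."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("inputs must have equal length of at least 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-deviations and root sums of squared deviations.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    dev_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    dev_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if dev_x == 0 or dev_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / (dev_x * dev_y)
```

For example, with made-up rates `[10.0, 12.0, 15.0, 11.0]` (HPA) and `[11.0, 12.5, 13.0, 10.0]` (inferred), `mean_absolute_error` returns 1.125.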
- We train on days 1-259 (weeks 1-37) on all 3 regions and test our inferences on days 260-280 (weeks 38-40) per region.
Here is the list of selected markers derived by applying Bolasso.
| Central England & Wales | South England | North England |
| MAE: 5.27 | MAE: 4.26 | MAE: 2.18 |
- We train on days 22-280 (weeks 4-40) on all 3 regions and test our inferences on days 1-21 (weeks 1-3) per region.
Here is the list of selected markers derived by applying Bolasso.
| Central England & Wales | South England | North England |
| MAE: 18.34 | MAE: 9.38 | MAE: 27.29 |
| LCC: 0.94 -- P-value: 3.48e-10 | LCC: 0.84 -- P-value: 2.18e-06 | LCC: 0.87 -- P-value: 4.01e-07 |
- As an overall performance quantification, 10-fold cross-validation is performed, where each
fold consists of 4 contiguous weeks; the MAE is on average 11.1 with a standard deviation of 10.04.
The MAEs for folds 1-10 are, respectively, 30.4592, 27.4488, 4.5617, 4.4625, 8.8856, 7.7987, 14.8908, 3.1253,
3.5159 and 5.8573.
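The fold layout and the summary statistics can be checked with a short sketch. It assumes that folds are numbered chronologically (fold 1 covers days 1-28, and so on), which the text does not state explicitly; the helper name `contiguous_folds` is hypothetical. The per-fold MAE values are copied from the text above.

```python
import statistics

DAYS_PER_WEEK = 7


def contiguous_folds(total_days=280, weeks_per_fold=4):
    """Return (first_day, last_day) pairs, 1-indexed and inclusive, per fold."""
    fold_len = weeks_per_fold * DAYS_PER_WEEK
    if total_days % fold_len != 0:
        raise ValueError("total_days must be a multiple of the fold length")
    return [(start, start + fold_len - 1)
            for start in range(1, total_days + 1, fold_len)]


# Per-fold MAE values as reported for folds 1-10.
FOLD_MAE = [30.4592, 27.4488, 4.5617, 4.4625, 8.8856,
            7.7987, 14.8908, 3.1253, 3.5159, 5.8573]

folds = contiguous_folds()
mean_mae = statistics.mean(FOLD_MAE)
# Sample standard deviation (n - 1 in the denominator).
std_mae = statistics.stdev(FOLD_MAE)

for (first, last), mae in zip(folds, FOLD_MAE):
    print(f"days {first:3d}-{last:3d}: MAE = {mae:.4f}")
print(f"mean = {mean_mae:.2f}, std = {std_mae:.2f}")
```

The reported standard deviation of 10.04 matches the sample standard deviation (`statistics.stdev`); the population form (`statistics.pstdev`) would give a smaller value of about 9.53.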