The main achievement of the action was the implementation and tuning of a machine learning architecture for the prediction of heat waves.
The input features of the model were divided into large-scale (remote) and local-scale predictors. The remote predictors were sea surface temperatures from several global ocean regions, together with sea ice concentration from four Arctic areas; the local drivers were soil moisture and temperature, surface fluxes, and snow water equivalent. The two groups differ in the prescribed lag at which they exert their main effect on heat waves: one month for the remote drivers and two weeks for the local drivers. In addition, the global CO2 concentration was used as a global predictor, serving as a proxy for the global warming trend, which has been shown to be a main driver of predictability, especially for summer weather in Europe.
For each region, a feature selection algorithm builds candidate subsets of features that are ingested by a random forest base learner. The model calculates the root mean squared error (RMSE) between the observed heat waves, used for training, and the predicted heat waves. The feature subset that yields the minimum RMSE represents the pool of drivers that best contributes to the prediction of heat waves in that region.
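The selection step described above can be illustrated with a minimal sketch. It is not the implementation used in the action: the exhaustive search over small subsets, the held-out split used to score each subset, the synthetic data, and the feature names are all assumptions made for illustration, and scikit-learn is assumed to be available.

```python
from itertools import combinations

import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split


def select_features(X, y, names, max_size=2, seed=0):
    """Return (best_rmse, best_subset) over all subsets of up to max_size features.

    Assumption: subsets are scored on a held-out 30% split; the search
    strategy and scoring protocol of the original work are not specified.
    """
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.3, random_state=seed
    )
    best_rmse, best_subset = float("inf"), ()
    for k in range(1, max_size + 1):
        for cols in combinations(range(X.shape[1]), k):
            idx = list(cols)
            # Random forest base learner trained on the candidate subset only.
            model = RandomForestRegressor(n_estimators=50, random_state=seed)
            model.fit(X_tr[:, idx], y_tr)
            pred = model.predict(X_te[:, idx])
            rmse = float(np.sqrt(mean_squared_error(y_te, pred)))
            if rmse < best_rmse:
                best_rmse, best_subset = rmse, tuple(names[c] for c in idx)
    return best_rmse, best_subset


# Synthetic stand-in for one region: the target depends only on two of
# the five hypothetical predictors ("co2" and "soil_moisture").
rng = np.random.default_rng(0)
names = ["co2", "sst_atlantic", "sst_med", "soil_moisture", "snow_we"]
X = rng.normal(size=(200, len(names)))
y = 2.0 * X[:, 0] - X[:, 3] + 0.1 * rng.normal(size=200)

best_rmse, best_subset = select_features(X, y, names)
print(best_subset, round(best_rmse, 3))
```

An exhaustive search is only feasible for a handful of candidate predictors; with larger pools a greedy or stochastic search would be the usual substitute.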
The goodness of fit of the ML model is measured by the coefficient of determination. Skill varies widely among regions and seasons, but the summer prediction is generally better than the prediction for the other seasons, possibly because of the greater persistence and lower interannual variability that characterize the summer season in Europe. Many areas show no or very limited prediction skill; however, a few regions, such as the western Mediterranean, the Middle East, and western and central Europe, show considerable skill.
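For reference, the coefficient of determination compares the squared prediction errors with the variance of the observations around their mean. The following is a minimal sketch using made-up values, not results from the action; the function name `r_squared` is hypothetical.

```python
def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


# Illustrative values only.
observed = [2.0, 4.0, 6.0, 8.0]
predicted = [2.5, 3.5, 6.5, 7.5]

r2 = r_squared(observed, predicted)
# A model that always predicts the observed mean scores exactly zero.
r2_baseline = r_squared(observed, [5.0, 5.0, 5.0, 5.0])
print(r2, r2_baseline)
```

A value near 1 indicates that the model explains most of the observed variance, while a value near or below 0 corresponds to the "no or very limited skill" case, where the model does no better than predicting the mean.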
Heat waves are predicted by an ensemble of best models in every region for the four main seasons. The most represented features are generally those that contribute the most to the predicted European HWP; the relative contribution of each feature is given by its SHAP value. The most represented feature is the CO2 concentration, which appears as a predictor of summer heat waves in all the European regions. Heat waves in other seasons are also associated with CO2 concentration in many areas. CO2 concentration is an obvious proxy of the anthropogenic forcing, which has kept increasing since the 1950s, with an acceleration in the last 40 years, and has previously been linked to the skill of European seasonal forecasts. Apart from the external forcing, some predictors appear to be important for heat waves in many regions in specific seasons: SST anomalies in different areas of the Atlantic Ocean for the spring predictions and in the Mediterranean Sea for the summer predictions, North Pacific SST anomalies for the autumn, soil moisture anomalies at several layers for the autumn and summer, and snow water equivalent for the winter.
The western Mediterranean is one of the regions where the model skill is highest in summer, with a correlation between observed and predicted heat waves above 0.6. This value exceeds that of the dynamical prediction provided by ECMWF System 5, one of the most widely used prediction systems.
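The correlation quoted above is taken here to be the Pearson correlation between the observed and predicted series, which is an assumption since the text does not name the coefficient. A minimal sketch with made-up values (the function name `pearson_r` is hypothetical):

```python
from math import sqrt


def pearson_r(observed, predicted):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(observed)
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov / sqrt(var_o * var_p)


# Illustrative values only, not data from the action.
observed = [1.0, 2.0, 3.0, 4.0, 5.0]
predicted = [2.0, 1.0, 4.0, 3.0, 5.0]

r = pearson_r(observed, predicted)
print(r)
```

Applying the same function to the machine learning predictions and to a dynamical system's hindcasts over the same years would give the kind of side-by-side comparison described in the text.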