Enhancing data fusion, parallelisation for hydrological modelling and estimating sensitivity to spatial parameterization of SWAT to model nitrogen and phosphorus runoff at local and global scale

Periodic Reporting for period 1 - GLOMODAT (Enhancing data fusion, parallelisation for hydrological modelling and estimating sensitivity to spatial parameterization of SWAT to model nitrogen and phosphorus runoff at local and global scale)

Reporting period: 2019-09-01 to 2021-08-31

When rain falls, water runs off over the surface into streams and infiltrates into the soil, taking part of the applied fertilizer with it. A similar effect occurs during snowmelt in spring, when meltwater carries nutrients from the fields. This is an example of diffuse water pollution (DWP). The main pollutants we are looking at are nitrogen-based (N) and phosphorus-based (P) nutrients. Excess nutrients can harm the overall health of plant and animal life in our lakes and streams.
The main prerequisite for combatting excessive nutrient losses to water bodies is to study how the pollution sources can be traced and reduced. Compared to point pollution, where polluted water is directly pumped into a river, e.g. from wastewater plants or industrial water use, DWP is more difficult to control due to its numerous and dispersed sources, and the difficulties in tracing its pathways. Spatially distributed hydrological models like the Soil and Water Assessment Tool (SWAT) have been successfully used for these analyses.
However, there are challenges. First, the data demand for these models is considerable. Even with the recent advances in standardised data access, data discovery, quality assessment and conversion remain major challenges, and up to 50% of research time is still spent on data processing. Secondly, this type of modelling is computationally very expensive. Finally, there is a lack of scientific understanding of whether and how these models change their predictions in response to different input datasets.
Previous studies have shown that SWAT results are affected by the choice of input data, but they typically tested only one type of input at a time, e.g. exchanging the soil or land cover dataset for another one, or testing different elevation models. There are no comprehensive studies that vary all spatial input data types in combination and examine the effect at the field, catchment, national or even global level. In particular, the use of high-resolution data has been neglected, mainly because very high-resolution data were unavailable and/or the computational requirements were very high. To our knowledge, this is also the reason why nutrient runoff has not yet been modelled at a global scale.
If these analyses could be automated, computers would excel at testing various scenarios and at analysing and predicting pollution loads. First, we want to enhance and automate data preparation. Subsequently, to improve large-scale and high-resolution SWAT modelling, we aim to design and test a computation framework that distributes stages of the model computation across multiple servers, much like today's cloud computing.
Once the technical basis is ready, we test and estimate the effect of climate, topographical, soil and land use input datasets of different resolutions on SWAT modelling results for flow and nutrient runoff at the local scale in smaller catchments. We then try to scale up to a global application in order to analyse and predict global nitrogen and phosphorus runoff. In the future, this could allow us to create SWAT models more easily at any desired level with reasonable input data, and to understand their reliability.
We estimated runoff and effects on soil and water quality, and applied an extensive parameter sensitivity analysis in Estonian catchments. The results confirm the general pattern that agriculture is an important contributor in many places. However, predicting future nutrient pollution has limited applicability. Scenario testing proved very helpful and gave detailed insight into the sources and pathways of pollution at the sub-catchment level.
And finally, although high-resolution spatial maps from satellite, radar, and other sources are available, it does not mean that all data should be used in modelling.
A major part of the project revolved around data preparation, data management, processing and conversion between formats. Existing datasets for catchment-scale modelling were translated into structures that conform to nationally and internationally agreed standards and formats made available by the Open Geospatial Consortium (OGC). European and global datasets were retrieved in the recommended structures and formats from so-called Spatial Data Infrastructure (SDI) data services, which make them accessible online in Europe and globally. Many software modules were developed and shared under FAIR principles; they were used to retrieve and convert data and to discretise it into appropriate units for the SWAT model. We first analysed the accuracy and reliability of freely available global digital elevation models and decided to use MERIT for all modelling exercises outside of Estonia.
Establishing a high-resolution reference soil dataset for Estonia and linking its labels to international World Reference Base (WRB) soil types and USDA texture classes turned out to be extremely challenging. This resulted in the major effort of creating EstSoil-EH: a high-resolution eco-hydrological modelling parameters dataset for Estonia. The additional work had a great impact in Estonia and Europe.
Finally, for a global data management processing system, a new and innovative data structure was necessary in order to provide geospatially correct (area and/or distance preserving spatial reference system) and computationally efficient means of massive scale and parallelizable data access. Discrete Global Grid Systems (DGGS) provide these capabilities.
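The area-preserving property mentioned above can be illustrated with a deliberately simple sketch. It is not one of the standardised DGGS designs and not the data structure used in the project; it only assumes a spherical Earth and a plain latitude/longitude grid whose rows are spaced evenly in sin(latitude), which makes every cell cover the same surface area. The function names, the default grid size and the radius constant are illustrative assumptions.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # assumed mean radius, spherical approximation


def cell_index(lat_deg, lon_deg, n_lat=180, n_lon=360):
    """Return (row, col) of the equal-area cell containing a point."""
    if not -90.0 <= lat_deg <= 90.0:
        raise ValueError("latitude must be within [-90, 90]")
    # Wrap longitude into [-180, 180).
    lon = (lon_deg + 180.0) % 360.0 - 180.0
    # On a sphere, equal steps in sin(latitude) enclose equal areas.
    y = math.sin(math.radians(lat_deg))
    row = min(int((y + 1.0) / 2.0 * n_lat), n_lat - 1)
    col = min(int((lon + 180.0) / 360.0 * n_lon), n_lon - 1)
    return row, col


def cell_bounds(row, col, n_lat=180, n_lon=360):
    """Return (lat_south, lat_north, lon_west, lon_east) in degrees."""
    y0 = -1.0 + 2.0 * row / n_lat
    y1 = -1.0 + 2.0 * (row + 1) / n_lat
    lon0 = -180.0 + 360.0 * col / n_lon
    lon1 = -180.0 + 360.0 * (col + 1) / n_lon
    return (math.degrees(math.asin(y0)), math.degrees(math.asin(y1)), lon0, lon1)


def cell_area_km2(n_lat=180, n_lon=360):
    """Area of any single cell: the sphere's surface divided by the cell count."""
    return 4.0 * math.pi * EARTH_RADIUS_KM ** 2 / (n_lat * n_lon)
```

Because a cell identifier is a pure function of the coordinates, data can be partitioned by cell and processed independently, which is what makes this kind of indexing attractive for parallel access. Real DGGS implementations additionally offer hierarchical refinement and more uniform cell shapes, which this sketch does not attempt.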
We implemented a network- and server-based software algorithm that can execute SWAT model runs, read out the results, and apply and control calibration and sensitivity analysis. This was subsequently also applied at the national level to assess nutrient pollution sensitivity across Estonia's rivers and streams. Although the work was initially planned at the level of smaller catchments, the assessment was done across all of Estonia with a 5 m resolution DEM, the high-resolution Estonian soil dataset (EstSoil-EH) and topographic databases (1:10 000), where 75% of mapped units are smaller than 4.0 ha.
We also validated this parallelisation methodology in a massive-scale parallel geospatial analysis exercise, in which we quantified the spatial and temporal patterns of forest dynamics in the Brazilian Legal Amazon with high spatial resolution data on forest cover and forest loss.
During the project, I contributed to several working groups in order to develop and provide guidance on interoperable geo-scientific software development, on FAIR data sharing standards and recommendations, and on how to apply the methodology to other modelling packages. Geospatial advances and future directions were discussed at many national and international forums, incl. EUROGI's Beyond SDI initiative, the OGC working groups, the Estonian Ministry of Environment and the Land Board.
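The general idea of dispatching independent model runs in parallel and deriving sensitivities from the results can be sketched as follows. This is a minimal illustration under stated assumptions, not the project's software: `toy_model` is a hypothetical stand-in for a single model run (SWAT itself is not called here), the parameter names are invented, a local thread pool stands in for remote servers, and a simple one-at-a-time perturbation replaces whatever sensitivity method was actually used.

```python
from concurrent.futures import ThreadPoolExecutor


def toy_model(params):
    """Hypothetical stand-in for one model run; returns a load in arbitrary units."""
    return params["runoff_coeff"] * params["rain_mm"] * params["fert_kg_ha"] * 0.01


def oat_runs(baseline, rel_step=0.1):
    """Build one-at-a-time parameter sets: each parameter raised by rel_step."""
    runs = {}
    for name, value in baseline.items():
        perturbed = dict(baseline)
        perturbed[name] = value * (1.0 + rel_step)
        runs[name] = perturbed
    return runs


def sensitivity(model, baseline, rel_step=0.1, max_workers=4):
    """Relative sensitivity per parameter: (dY / Y) / (dX / X).

    Assumes the baseline output is non-zero.
    """
    runs = oat_runs(baseline, rel_step)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # All runs are independent, so they can be submitted at once.
        base_future = pool.submit(model, baseline)
        futures = {name: pool.submit(model, p) for name, p in runs.items()}
        y0 = base_future.result()
        return {
            name: ((fut.result() - y0) / y0) / rel_step
            for name, fut in futures.items()
        }


if __name__ == "__main__":
    baseline = {"runoff_coeff": 0.3, "rain_mm": 650.0, "fert_kg_ha": 120.0}
    print(sensitivity(toy_model, baseline))
```

A value near 1 means the output changes in proportion to the parameter, and a value near 0 means the output barely reacts. In a real setting each call would launch a full model run on a worker, so the executor would typically be replaced by a process pool or a job queue spanning several machines.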
main textures in Estonian soils