Methodologies and Data mining techniques for the analysis of Big Data based on Longitudinal Population and Epidemiological Registers

Periodic Reporting for period 2 - LONGPOP (Methodologies and Data mining techniques for the analysis of Big Data based on Longitudinal Population and Epidemiological Registers)

Reporting period: 2018-02-01 to 2020-07-31

European societies face rapid social changes, bringing both challenges and benefits, which can be studied with traditional tools of analysis only with serious limitations. This rapid transformation covers changes in family forms and fertility, the decline of mortality and increase in longevity, and periods of economic and social instability. Owing to population ageing across Europe, countries are now experiencing the impact of these rapid changes on the sustainability of their welfare systems. At the same time, the use of space and residential mobility have become key topics, with migration within the EU and from outside Europe at the centre of the political agenda. Over the past decade, research teams across Europe have been involved in the development and construction of longitudinal population registers and large research databases, while opening up avenues for new linkages between different data sources (i.e. administrative and health data), making it possible to gain an understanding of these fast societal transformations. However, working with these types of datasets requires advanced skills in both data management and statistical techniques.
This Marie Sklodowska-Curie Innovative Training Network project, LONGPOP, aimed to create a network in which these different research teams could share experiences, conduct joint research, create a training track for specialists in the field and increase the number of users of these large, possibly underused, databases, making more scientists and stakeholders aware of their richness.
The primary research goal was to analyse the causes and consequences of the rapid evolution of European societies, from different disciplinary perspectives but using common methodologies and longitudinal analysis techniques.
Moreover, it aimed to train a new generation of Early-Stage Researchers (ESRs), who were in their first four years of research with no PhD awarded, equipping them with innovative and creative skills to face today's challenges. The 15 ESRs moved to another country to develop their individual projects, which are embedded in a broader research framework. The network, composed of 10 beneficiaries and 1 partner, hosted the 15 ESRs, providing them with training-through-research as well as in-house and external courses. Their professional development was completed by secondments in the different organisations and in the partner organisation, IECA. Strong emphasis was placed on their participation in and active contribution to dissemination and communication activities and events, such as conferences, seminars, meetings with non-specialist audiences, etc.
The 15 recruited ESRs and their research teams, supported by the partner organisation, undertook analyses in the fields of humanities, demography, sociology, informatics, economics, statistics, health and geography, using longitudinal analysis techniques, Big Data, data mining and data linkage, under the common structure of three research Work Packages (WPs). The Network exploited a large number of historical and current databases, and also made progress in the complex process of conceptualising and implementing the Intermediate Data Structure (IDS), in order to take advantage of the rich information in these databases and to understand social processes through individual life trajectories. The exchange of information and methodologies, the delivery of training and the implementation of the Individual Research Projects (IRPs) by the ESRs were essential to achieving a substantial impact on the fields of humanities, social sciences and health sciences.
The training programme was implemented according to the objectives, and was also tailored to the ESRs’ needs in terms of research training and skills gaps. Apart from the planned training, other courses organised by the network and by other organisations were offered to the ESRs, according to their Career Development Plans. Exposure to the private sector was ensured by the participation of two companies (ESRI and TELNET). The secondments (in the private and/or the public beneficiaries) increased cooperation among teams and enhanced the fellows’ knowledge of research methodologies, data management and statistical techniques.
Dissemination and outreach were carried out through participation in conferences, seminars and workshops, the publication of articles in peer-reviewed journals, the use of social networks and websites, meetings with non-specialist audiences, etc.

The main exploitable results of the project are:
1.1 Catalogue of longitudinal registers
1.2 Report on Concepts and Techniques for data curation, management and statistical analysis
1.3 Report on Concepts and Techniques for Record Linkage
1.4 Report on Concepts and Techniques for Record Linkage
2.1 GIS mobility tool
2.2 Compilation of different GIS layers on mortality
2.3 Compilation of GIS layers on Italian demographics
2.4 Tool to visualize indicators of environmental exposures
2.5 Tool to locate life courses on maps
2.6 Web Portal for Health and Population
3.1 Report on the IDS and extraction software
3.2 Report on the coordination of the building of extraction software
3.3 Report on different Algorithms
3.4 Data mining, extraction software
6.7 Dissemination via a special volume of an e-journal
These results will be exploited as follows:
- Reports: these will be useful for other researchers and data scientists who work with data linkage and Big Data in different fields;
- Tools and methods: these will be exploited by scientists who work with similar kinds of health or historical data, or who need to compare different databases for their own research;
- E-journal: the volume, «Major Databases with Historical Longitudinal Population Data: Development, Impact and Results» (Special issue 2020), will help spread the results of different approaches to its main topic among the scientific community.

Finally, at the individual level, the ESRs have produced different results according to their IRPs: GIS layers on mortality; definitions of concepts, data models, methods and analytical tools for longitudinal analyses, individual record linkage and data exploitation; working papers and reports on health, sociological and demographic analyses, etc.
The project advanced the methods and techniques for data management and for the analysis of longitudinal and epidemiological registers. The ESRs acquired a scientific background characterised by a multidisciplinary approach to common problems on the H2020 agenda such as health, demography, ageing, etc. The tools, methodologies and reports produced advanced the knowledge and practical application of data analysis, with many examples of their usefulness. A new generation of scientists is now prepared to address current sociological and health challenges using historical and current registers (for example, by comparing the historical data of the Spanish flu with the current data on COVID-19). These skills and this knowledge have definitely fostered their employability, as shown by the postdoctoral opportunities already offered to most of them.
In the coming months, those ESRs who have not yet defended their PhD theses will do so. All the researchers will keep publishing the results of their research in peer-reviewed journals. This will enhance the scientific and socio-economic impact of LONGPOP beyond the lifetime of the project.
Homepage of the LONGPOP website