CORDIS - EU research results

Research and Innovation Staff Exchange Network of European Data Scientists

Periodic Reporting for period 2 - NeEDS (Research and Innovation Staff Exchange Network of European Data Scientists)

Reporting period: 2022-05-01 to 2024-10-31

The digital transformation is rapidly reshaping the landscape for users and producers of data across Europe and globally. New technologies for data processing, analysis, and communication are essential to support data-driven decision-making. However, companies and public sector institutions across Europe often lack the ability to build the necessary capabilities quickly enough. Addressing this challenge is central to the multidisciplinary and intersectoral NeEDS consortium.

All NeEDS partners share the common goal of strengthening European innovation capacity in Data Science. Industrial participants have emphasized that building more Data Science expertise is vital to the long-term success of their enterprises and the broader European economy. NeEDS' research is also relevant to citizens, who generate data through mobile use and social networks, consume data visualizations, and are influenced by models based on data such as demographics, finances, and education levels. This generates demand for user-friendly visualization tools and models that comply with the EU’s right-to-explanation regulation introduced in 2018.

Scientifically and technologically, challenges arise from complex raw data (e.g. large-scale, network-type, time-evolving, hierarchical, multivariate, unstructured, or noisy), new demands (e.g. interpretable or personalized models, or models under strict time constraints), and the need for nonexperts to visualize and interact with extracted knowledge. Meeting these challenges requires innovative mathematical modeling and advanced numerical optimization methods to create new Data Science tools that outperform the current state-of-the-art and become core skills for an increasingly mobile workforce.

NeEDS achieved its objectives through international and intersectoral mobility of both experienced and early-stage innovation and research staff. Over 100 person-months of secondments were implemented, involving researchers from academia and industry across Europe, the USA, and Latin America. These secondments enabled knowledge exchange between disciplines (Business Analytics, Computer Science, Operations Research), sectors (academia and industry), and locations. PhD students and postdocs tackled real-world challenges from industry and the public sector and developed valuable transferable skills. Industry professionals upgraded their competencies with the latest academic developments in Data Science. Senior academic and industry colleagues engaged in knowledge transfer activities, enhancing mutual understanding—especially regarding Explainable Artificial Intelligence. Short videos of these secondments are available on the NeEDS website and have been actively promoted on social media.

Further objectives were achieved through NeEDS events, including modeling weeks, PhD schools, workshops, and conferences. During modeling weeks and hackathons, PhD students worked in teams under the supervision of academic and industry professionals to solve real-world problems presented by NeEDS’ industrial partners and others. These events offered students valuable career development opportunities and gave companies exposure to emerging talent.

NeEDS has contributed to advancing the state-of-the-art in Data Science by addressing open research questions. It has developed and released open-source software tools, expanding the toolkit available to researchers and practitioners. It has also enhanced knowledge transfer between academic and industrial stakeholders, helping to build stronger Data Science capacity across Europe.

WP1: Completed work includes the development of an algorithm to train credit scoring models based on social network analytics applied to mobile phone call graphs; the investigation of sequence mining techniques for urban mobility graphs, with the aim of detecting and visualizing behavioral patterns; and community detection of linguistic groups in call networks. Key ongoing work includes network visualization of operational logistics data, graph representation learning for fraud detection in payment transactions, and the construction of new inductive algorithms that automatically generate features from large transaction networks.

WP2: Completed work includes the development of a Mixed Integer Nonlinear Programming model that embeds the binary feature-selection decisions in the nonlinear numerical optimization formulation that builds the Support Vector Machine model. Additionally, we have defined novel ways to pursue interpretability in a number of data analysis models, namely linear regression models and generalized linear models, classification and regression trees, factor analysis models, contingency tables, and clustering.
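To illustrate what embedding feature-selection decisions in a Support Vector Machine formulation can look like, the following is a generic textbook-style sketch, not the consortium's actual model. The soft-margin penalty C, the big-M bound M, and the feature budget k are assumptions introduced here for illustration only. Each binary variable z_j switches feature j on or off by forcing its weight w_j to zero when z_j = 0.

```latex
\[
\begin{aligned}
\min_{w,\,b,\,\xi,\,z}\quad & \tfrac{1}{2}\sum_{j=1}^{p} w_j^2 + C\sum_{i=1}^{n}\xi_i \\
\text{s.t.}\quad & y_i\Big(\sum_{j=1}^{p} w_j x_{ij} + b\Big) \ge 1-\xi_i, && i=1,\dots,n,\\
& -M z_j \le w_j \le M z_j, && j=1,\dots,p,\\
& \sum_{j=1}^{p} z_j \le k,\\
& \xi_i \ge 0,\quad z_j\in\{0,1\}.
\end{aligned}
\]
```

With a quadratic objective and linear constraints, this sketch is a mixed-integer quadratic program, which is a special case of Mixed Integer Nonlinear Programming.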

WP3: We have developed a scalable mixed-paradigm trace clustering technique that introduces the idea of so-called super-instances in the field of process analytics. Moreover, we have started working on a novel predictive process monitoring system in the context of airport operational logistics. An initial proof of concept of a predictive model for the timely arrival of luggage items was built, which relies solely on time-stamped event data recorded in the airport’s information system. We have formulated a new design space for origin-destination data visualization and used it to create a novel software tool for visualizing complex transport data. Finally, we have also made significant progress in the area of representation learning applied to business process analytics.

WP4: We have developed a methodology to deal with hierarchical categorical data in linear regression. In our approach, we propose to jointly select the level of granularity of the hierarchical categorical variables and estimate the linear regression model. We have developed novel Mixed Integer Nonlinear Programming formulations as well as numerical optimization solution approaches, in which we can trade off accuracy against granularity of information. Additionally, we have developed innovative Mixed Integer Nonlinear Programming formulations and numerical optimization solution approaches for model specification in Benchmarking. We have applied these, with excellent results, to the Benchmarking of Electricity Distribution System Operators.
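The trade-off between accuracy and granularity described above can be illustrated with a minimal sketch. It is not the project's method: instead of a Mixed Integer Nonlinear Programming formulation, it simply enumerates two candidate granularity levels of one hierarchical categorical variable, fits ordinary least squares for each, and picks the level with the lowest penalized error. The function names, the toy data, and the penalty weight are all hypothetical.

```python
import numpy as np

def one_hot(labels):
    """Return a one-hot matrix with one column per distinct label."""
    cats = sorted(set(labels))
    index = {c: j for j, c in enumerate(cats)}
    X = np.zeros((len(labels), len(cats)))
    for i, lab in enumerate(labels):
        X[i, index[lab]] = 1.0
    return X

def fit_mse(X, y):
    """Fit least squares and return the in-sample mean squared error."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return float(np.mean((X @ coef - y) ** 2))

def select_granularity(levels, y, penalty):
    """Score each level as MSE + penalty * (number of categories).

    levels: dict mapping a level name to the list of labels at that level.
    Returns the best level name and the dictionary of scores.
    """
    scores = {}
    for name, labels in levels.items():
        X = one_hot(labels)
        scores[name] = fit_mse(X, y) + penalty * X.shape[1]
    best = min(scores, key=scores.get)
    return best, scores

# Hypothetical toy hierarchy: fine categories a1, a2, b1, b2 nest in a, b.
fine = ["a1", "a1", "a2", "a2", "b1", "b1", "b2", "b2"]
coarse = [label[0] for label in fine]
levels = {"coarse": coarse, "fine": fine}

# Response that only varies between the coarse groups.
y_coarse = np.array([1.0, 1.0, 1.0, 1.0, 3.0, 3.0, 3.0, 3.0])
# Response that also varies within the coarse groups.
y_fine = np.array([1.0, 1.0, 2.0, 2.0, 3.0, 3.0, 4.0, 4.0])

best_for_coarse, _ = select_granularity(levels, y_coarse, penalty=0.1)
best_for_fine, _ = select_granularity(levels, y_fine, penalty=0.1)
print(best_for_coarse, best_for_fine)
```

In this toy setting, the coarse level is selected when the response does not vary within the coarse groups, and the fine level is selected when the extra categories reduce the error by more than their penalty. Exhaustive enumeration only works for a handful of candidate levels; a joint optimization formulation is what makes the problem tractable when many hierarchical variables are involved.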

In addition to the completed secondments, several network activities have so far contributed to the transfer of knowledge between industrial and academic stakeholders. The results have been presented at leading conferences and workshops in the fields of Business Analytics, Computer Science and Operations Research, and published open access in renowned peer-reviewed journals. This includes plenary and keynote speeches by senior NeEDS researchers at major conferences, as well as PhD dissertation and paper awards won by early-career researchers. The intersectoral secondments have created the right environment for the exploitation of these results.

By the end of the project, we expect to advance the state-of-the-art in the field of interpretable Data Science tools, thus enlarging the set of tools available to researchers and practitioners in Data Science. Furthermore, we aim to enhance the transfer of knowledge between industrial and academic stakeholders with the goal of improving Data Science capacity in Europe.