CORDIS - EU research results

Research and Innovation Staff Exchange Network of European Data Scientists

Periodic Reporting for period 1 - NeEDS (Research and Innovation Staff Exchange Network of European Data Scientists)

Reporting period: 2019-01-01 to 2022-04-30

The digital transformation is radically changing the landscape for users and producers of data in Europe and beyond. New technologies for data processing, data analysis and data communication are required to aid data driven decision making, but companies and public sector bodies around Europe find they cannot build up the required capabilities quickly enough to meet this demand. This pressing challenge is at the heart of the multidisciplinary and intersectoral NeEDS consortium.

All beneficiaries in NeEDS share the common objective of strengthening European innovation capacity in the area of Data Science. The industrial participants have consistently underlined their belief that developing more Data Science capacity is instrumental to the long-term success of their enterprises and to the success of the European economy at large. The research developed by NeEDS is also timely for citizens, who are providers of data through the use of mobile communication and social networks, consumers of data visualizations through the democratization of data, and subjects of models that use data about them, such as demographics, finances, and type and level of education. This creates demand for new self-explanatory visualization tools, but also for new models that fulfil the right to explanation in algorithmic decision making required by the EU as of 2018.

From a scientific and technological perspective, the challenges come from the complexity of the raw data (many records and/or features, network-type, time-evolving, multilevel hierarchical, multivariate, unstructured, or noisy), from completely novel questions posed to data scientists (easy-to-interpret models, personalized models, or models running under strict time regulations), and from the need of nonexperts to visualize and interact with the knowledge extracted from data. Tackling these challenges calls for innovative mathematical modeling and cutting-edge numerical optimization methods to build new Data Science tools that significantly improve on today’s state of the art and that will become an elementary part of the skill sets of an increasingly mobile labor force. These overall objectives will be achieved through intersectoral and international mobility of experienced as well as early-stage innovation and research staff, and through the NeEDS events, such as the NeEDS Modeling Weeks, the NeEDS workshops, and the NeEDS conferences.
WP1: Completed work includes the development of an algorithm to train credit scoring models based on social network analytics applied to mobile phone call graphs, the investigation of sequence mining techniques for urban mobility graphs with the aim of detecting and visualizing behavioral patterns, and community detection of linguistic groups in call networks. Key ongoing work includes network visualization of operational logistics data, graph representation learning for fraud detection in the context of payment transactions, and the construction of new inductive algorithms that can automatically generate features from large transaction networks.

WP2: Completed work includes the development of a Mixed Integer Nonlinear Programming model that embeds binary decisions associated with the selection of features in the nonlinear numerical optimization formulation that builds the Support Vector Machine model. Additionally, we have defined novel ways to pursue interpretability in a number of data analysis models, namely linear regression models and generalized linear models, classification and regression trees, factor analysis models, contingency tables, and clustering.
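
To illustrate the general idea of embedding binary feature-selection decisions in a Support Vector Machine training problem, one textbook-style formulation is sketched below. It is an illustrative assumption, not the model developed in NeEDS: the symbols (training pairs $(x_i, y_i)$ with $y_i \in \{-1, +1\}$, weights $w$, intercept $b$, slacks $\xi_i$, binary selectors $z_j$, regularization constant $C$, bound $M$, and feature budget $k$) are all introduced here for illustration only.

```latex
\begin{aligned}
\min_{w,\, b,\, \xi,\, z} \quad & \tfrac{1}{2} \sum_{j=1}^{p} w_j^2 + C \sum_{i=1}^{n} \xi_i \\
\text{s.t.} \quad & y_i \Big( \sum_{j=1}^{p} w_j x_{ij} + b \Big) \ge 1 - \xi_i, && i = 1, \dots, n, \\
& \xi_i \ge 0, && i = 1, \dots, n, \\
& -M z_j \le w_j \le M z_j, && j = 1, \dots, p, \\
& \sum_{j=1}^{p} z_j \le k, \\
& z_j \in \{0, 1\}, && j = 1, \dots, p.
\end{aligned}
```

In this sketch, $z_j = 0$ forces $w_j = 0$, so feature $j$ is excluded from the classifier, and the budget $k$ limits how many features may be used. The quadratic objective combined with binary variables makes this a mixed-integer nonlinear program.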

WP3: We have developed a scalable mixed-paradigm trace clustering technique that introduces the idea of so-called super-instances in the field of process analytics. Moreover, we have started working on a novel predictive process monitoring system in the context of airport operational logistics. An initial proof of concept of a predictive model for the timely arrival of luggage items was built, which relies solely on time-stamped event data recorded in the airport’s information system. We have formulated a new design space for origin-destination data visualization, and used it to create a novel software tool for visualizing complex transport data. Finally, we have also made significant progress in the area of representation learning applied to business process analytics.

WP4: We have developed a methodology to deal with hierarchical categorical data in linear regression. In our approach, we propose to jointly select the level of granularity of the hierarchical categorical variables and estimate the linear regression model. We have developed novel Mixed Integer Nonlinear Programming formulations as well as numerical optimization solution approaches, in which we can trade off accuracy against granularity of information. Additionally, we have developed innovative Mixed Integer Nonlinear Programming formulations and numerical optimization solution approaches for model specification in Benchmarking. We have successfully applied these, with excellent results, to the Benchmarking of Electricity Distribution System Operators.
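
The trade-off between accuracy and granularity can be illustrated with a small sketch. It is not the Mixed Integer Nonlinear Programming approach developed in NeEDS: it simply enumerates a few candidate granularity levels of one hierarchical categorical variable, fits a least-squares model on dummy variables for each level, and picks the level with the lowest penalized error. The function names, the penalty form (mean squared error plus a cost per category), and the toy data are all assumptions made for illustration.

```python
import numpy as np


def fit_mse(codes, y):
    """Fit y on one-hot dummies of `codes` by least squares.

    Returns the mean squared error and the number of categories used.
    """
    groups, idx = np.unique(codes, return_inverse=True)
    X = np.zeros((len(y), len(groups)))
    X[np.arange(len(y)), idx] = 1.0
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return float(np.mean(resid ** 2)), len(groups)


def select_granularity(levels, y, penalty):
    """Pick the level minimizing mse + penalty * number_of_categories.

    `levels` maps a (hypothetical) level name to the category code of
    each observation at that level of the hierarchy.
    """
    best_name, best_score = None, float("inf")
    for name, codes in levels.items():
        mse, n_categories = fit_mse(np.asarray(codes), y)
        score = mse + penalty * n_categories
        if score < best_score:
            best_name, best_score = name, score
    return best_name


# Toy hierarchy: 4 fine categories nested in 2 coarse categories.
levels = {
    "coarse": ["A", "A", "A", "A", "B", "B", "B", "B"],
    "fine": ["a1", "a1", "a2", "a2", "b1", "b1", "b2", "b2"],
}

# Response that varies only between coarse categories.
y_coarse = np.array([1.0, 1.0, 1.0, 1.0, 5.0, 5.0, 5.0, 5.0])
# Response that varies between fine categories.
y_fine = np.array([0.0, 0.0, 2.0, 2.0, 5.0, 5.0, 9.0, 9.0])

choice_coarse = select_granularity(levels, y_coarse, penalty=0.1)
choice_fine = select_granularity(levels, y_fine, penalty=0.1)
```

When the response varies only between the coarse categories, the finer level adds categories without reducing the error, so the penalized criterion prefers the coarse level; when the response genuinely differs within the coarse groups, the finer level is selected. Exhaustive enumeration is only practical for very small hierarchies, which is one reason an optimization formulation is attractive for realistic data.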

In addition to the completed secondments, several network activities have so far contributed to the transfer of knowledge between industrial and academic stakeholders.
By the end of the project, we expect to advance the state of the art in the field of interpretable Data Science tools, thus enlarging the set of tools available to researchers and practitioners in Data Science. Furthermore, we aim to enhance the transfer of knowledge between industrial and academic stakeholders with the goal of improving Data Science capacity in Europe.