SoBigData Research Infrastructure

Periodic Reporting for period 3 - SoBigData (SoBigData Research Infrastructure)

Reporting period: 2018-09-01 to 2019-12-31

SoBigData serves a wide community of data scientists carrying out studies of all aspects of society. It supports policy making through novel ways of producing high-quality statistical information; empowers citizens with self-awareness tools; and promotes ethical uses of big data. In 2014, SoBigData anticipated the rising demand for cross-disciplinary research and innovation on the multiple aspects of social complexity. SoBigData’s initial vision may be summarised as follows:
-The necessary starting point to tackle the challenges is to observe how our society works, and the big data originating from the digital breadcrumbs of human activities offer a huge opportunity to scrutinize the ground truth of individual and collective behaviour in unprecedented detail and at a global scale.
-There is an urgent need to thoroughly exploit this opportunity for scientific advancement and social good as currently the predominant exploitation of Big Data revolves around either commercial purposes or social control and surveillance.
-There is an urgent need to develop strategies that allow the protection of personal information and fundamental human rights to coexist with the safe use of information for scientific purposes. There is a need to democratise the benefits of data science and Big Data within an ethical responsibility framework.

SoBigData was designed to promote large-scale, interdisciplinary social mining which is both repeatable and open-science oriented, based on three pillars:
1-A continuously growing, distributed data ecosystem for procurement, access and curation of big social data within an ethics-sensitive context.
2-A continuously growing, distributed platform of interoperable social data mining tools, methodologies and services for mining, analysing, and visualising massive datasets.
3-A social mining community comprising scientific, industrial and third-party stakeholders, such as policy makers, supported by joint research, transnational and virtual access activities.

SoBigData integrated 12 European centres of excellence in Big Data analytics and social mining. The aim was to create a distributed, networked Research Infrastructure that leveraged each partner’s scientific resources. SoBigData designed a platform for open, ethically-minded social mining research and innovation that comprises a resource catalogue and storage and curation primitives for methods and datasets, organised through the metaphor of Virtual Research Environments. As a design choice, SoBigData introduced a set of six exploratories: City of Citizens, Societal Debates, Well-being and Economy, Migration Studies, Sport Data Science and Explainable Machine Learning. The activity of the exploratories has produced over 170 high-profile publications stemming from Big Data experiments executed with resources provided by SoBigData.
The approach developed by SoBigData created a comprehensive e-infrastructure, which offers access to over 205 resources designed to support data scientists in the execution of large-scale experiments. The e-infrastructure is currently used by more than 4,700 users. The Transnational Access programme of the project supported on-site visits by 60 scientists, who were hosted to develop their projects. The project’s training and innovation actions have produced courses for over 900 students and 120 pilot projects developed with companies. SoBigData has also promoted the creation of ethical guidelines and tools regarding the protection of personal data and intellectual property, influencing national research bodies and public authorities and becoming a reference initiative in this domain. SoBigData adhered to the FAIR (Findability, Accessibility, Interoperability and Reusability) and FACT (Fairness, Accuracy, Confidentiality and Transparency) principles, making ethical, privacy-preserving and responsible data science research operational through the development of tools and training activities.
The project has initiated a program of curation and publication of unique Big Data assets, fostering novel multi-disciplinary research performed by its user-scientists. This activity must be consolidated to encourage scientists to publish data papers, so that data and analytical workflows are linked to their publication output.
Summary of major achievements in the reporting period:
1-SoBigData e-infrastructure: a software platform providing functionalities for sharing and exploring the integrated social mining resources of the national infrastructures, for executing experiments, and for working in a common space. The integration of national infrastructures is now complete (WP7,8,9,10).
2-Six Exploratories on the following challenges: City of Citizens, Societal Debates, Well-being and Economy, Migration Studies, Sport Data Science, Explainable Machine Learning (WP11). Each exploratory is both a Virtual Research Environment and a multidisciplinary scientific community that provides social mining resources, scientific achievements and research challenges.
3-Web site: for external communication and dissemination, and for guiding visitors to the e-infrastructure through the exploratories and the various communication channels activated (WP3).
4-Legal and ethical framework operationalizing many of the legal and ethical issues involved in the data management of curated datasets. On-line tutorial on responsible data science (WP2).
5-A wide outreach to a diverse range of stakeholders (900 trainees, 120 companies, 4,700 e-infra users, 68 transnational access users) (WP3,4,5).
6-A consolidated community of researchers in Social Mining (200 peer-reviewed international scientific papers and 100 talks) (WP8,9).
7-The consortium, extended with other partners, has been awarded a new advanced-community project: SoBigData++: European Integrated Research Infrastructure for Social Mining & Big Data, Jan 2020 - Dec 2023 (WP1).
Scientific impact is represented by more than 200 scientific publications in journals and international conferences (see D3.6). A number of open source software tools, described in D5.2, are offered by the consortium and form the core seed for the commercial exploitation, knowledge transfer and consultancy services offered in this first part of the project. Another measure of the impact on society is the number of users of the SoBigData RI through virtual and transnational access (more than 4,700), together with an estimated outreach of more than 5,000 and more than 900 trainees.

An important impact on society is the contribution of SoBigData to the debate on ethics and data protection. Some of the principal investigators (J. van der Hooven, K. Bontcheva and D. Pedreschi) have been invited by policy-making bodies at national and European level to contribute as experts to new regulations. The project promoted responsibility in executing social mining experiments on the basis of the Ethical and Legal framework (WP2).

Another important impact is the presence of SoBigData as backbone infrastructure in several project proposals: 5 ERC Advanced Grants (XAI: Science and technology for the eXplanation of AI decision making; Designing algorithms to reduce filter bubbles in social media; The Process Improvement Explorer: Automated Discovery and Assessment of Business Process Improvement Opportunities; Urban Innovations and Smart Cities) and 5 H2020 Grants (HumaneAI; AI4UE; WeVerify; HamMingBird; and SoBigData++).