SoBigData serves a wide community of data scientists carrying out studies of all aspects of society. It supports policy making through novel ways of producing high-quality statistical information; empowers citizens with self-awareness tools; and promotes ethical uses of big data. In 2014, SoBigData anticipated the rising demand for cross-disciplinary research and innovation on the multiple aspects of social complexity. SoBigData’s initial vision may be summarised as follows:
-The necessary starting point to tackle the challenges is to observe how our society works, and the big data originating from the digital breadcrumbs of human activities offer a huge opportunity to scrutinize the ground truth of individual and collective behaviour in unprecedented detail and at a global scale.
-There is an urgent need to thoroughly exploit this opportunity for scientific advancement and social good, as the predominant exploitation of Big Data currently revolves around either commercial purposes or social control and surveillance.
-There is an urgent need to develop strategies that allow the protection of personal information and fundamental human rights to coexist with the safe use of information for scientific purposes.
-There is a need to democratise the benefits of data science and Big Data within an ethical responsibility framework.
SoBigData was designed to promote large-scale, interdisciplinary social mining which is both repeatable and open-science oriented, based on three pillars:
1-A continuously growing, distributed data ecosystem for procurement, access and curation of big social data within an ethics-sensitive context.
2-A continuously growing, distributed platform of interoperable social data mining tools, methodologies and services for mining, analysing, and visualising massive datasets.
3-A social mining community comprising scientific, industrial and third-party stakeholders, such as policy makers, supported by joint research, transnational and virtual access activities.
SoBigData integrated 12 European centres of excellence in Big Data analytics and social mining. The aim was to create a distributed, networked Research Infrastructure that leveraged each partner’s scientific resources. SoBigData designed a platform for open, ethically-minded social mining research and innovation that comprises a resource catalogue and storage and curation primitives for methods and datasets, organised through the metaphor of Virtual Research Environments. As a design choice, SoBigData introduced a set of six exploratories: City of Citizens, Societal Debates, Well-being and Economy, Migration Studies, Sport Data Science and Explainable Machine Learning. The exploratories have produced over 170 high-profile publications stemming from Big Data experiments executed via resources provided by SoBigData. The approach developed by SoBigData created a comprehensive e-infrastructure, which offers access to over 205 resources designed to support data scientists in the execution of large-scale experiments. The e-infrastructure is currently used by more than 4,700 users. The project’s Transnational Access programme supported on-site visits by 60 scientists, who were hosted to develop their projects. The project’s training and innovation actions have produced courses for over 900 students and 120 pilot projects developed with companies.
SoBigData has also promoted the creation of ethical guidelines and tools (http://fair.sobigdata.eu/moodle/) regarding the protection of personal data and intellectual property, influencing national research bodies and public authorities and becoming a reference initiative in this domain. SoBigData adhered to the FAIR (Findability, Accessibility, Interoperability and Reusability) and FACT (Fairness, Accuracy, Confidentiality and Transparency) principles, making ethical, privacy-preserving and responsible data science research operational through the development of tools and training activities. The project has initiated a programme of curation and publication of unique Big Data assets, fostering novel multi-disciplinary research performed by its user-scientists. This activity must be consolidated to encourage scientists to publish data papers, so that data and analytical workflows are linked to their publication output.