EUrope-BRAzil Collaboration on BIG Data Scientific REsearch through Cloud-Centric Applications

Periodic Reporting for period 2 - EUBra-BIGSEA (EUrope-BRAzil Collaboration on BIG Data Scientific REsearch through Cloud-Centric Applications)

Reporting period: 2017-01-01 to 2017-12-31

The exponential increase in available Open Data and the affordability of cloud computing resources are an excellent opportunity for the democratisation of data analysis. However, developing data analytics applications in the cloud is a complex task that requires high-level technical skills and faces essential challenges such as guaranteeing Quality of Service and minimising privacy risks.

EUBra-BIGSEA developed a framework, a platform and a library to ease the development of highly scalable, privacy-aware data analytics applications running on top of Quality of Service cloud infrastructures, reducing development cycles and deployment costs. While EUBra-BIGSEA targets Data Scientists in general, within the project timeline it has been demonstrated by implementing a set of applications for analysing transportation data, aiming at improving the experience of urban transportation users.

The work has required the collaboration of European and Brazilian experts combining their expertise in data analytics, application performance modelling, privacy management, parallel processing and cloud services.

The EUBra-BIGSEA project has produced three novel components (a QoS Data Analytics Platform, a Data Analytics Development Framework, and a toolbox of models for building applications on traffic data), demonstrated on three applications for urban transportation data management and disseminated in 60 publications and 50 conference contributions. The software is available on the project GitHub (https://github.com/eubr-bigsea) and DockerHub (https://hub.docker.com/u/eubrabigsea/), as well as on the EUBra-BIGSEA website (http://www.eubra-bigsea.eu) along with papers and presentations. Video demos are available on the project's YouTube channel (https://goo.gl/FTCq3g).

These components and applications target cloud providers, data analysis application developers, Data Scientists and Municipalities.

The 20 EUBra-BIGSEA building blocks are organised in 6 layers (Infrastructure Services, Big Data Services, Programming Frameworks, Security Services, High-Level Services and Applications). Of these 20 building blocks, 15 are new developments and 5 are legacy components notably improved during the project.

System Administrators and cloud infrastructure providers can benefit from the QoS Data Analytics Platform, which comprises a set of conveniently deployable applications and services that provide transparent horizontal and vertical elasticity, performance modelling and optimisation. The platform enables the automatic deployment of services to run Big Data applications, ensuring that deadlines are met through the automatic reconfiguration of the infrastructure. It is tailored for the OpenStack, OpenNebula and Mesos frameworks, using IM, CLUES and MONASCA.

Data Scientists and Data Analytics Application developers can benefit from the Data Analytics Development Framework, which integrates a graphical framework (LEMONADE) capable of building parallel data analytics workflows on Spark and COMPSs and supporting OLAP functions through Ophidia, with privacy and QoS constraints. The programming framework includes a broad suite of pre-built tools and descriptive and predictive models for traffic data analytics, ready to be integrated into the applications developed by Data Scientists. The programming models include support for privacy policies, generic Data Quality and Entity Matching services, and a toolbox of models for building applications on traffic data, with models for extracting routes, predicting crowdedness, estimating traffic jams, sentiment analysis and computing trajectories, among others.

The dissemination of results has produced 60 publications in scientific and technical journals (some of them high-impact journals, such as Future Generation Computer Systems, the Journal of Grid Computing, the Journal of Systems and Software and the Journal of Parallel and Distributed Computing), as well as 50 participations in events (including the organisation of a satellite workshop at CCGRID 2017). The number of publications is expected to increase if the joint publications already submitted are accepted.

The Europe-Brazil collaboration has been crucial to achieving:
- The QoS Cloud services, integrating the vertical elasticity framework from UFCG with horizontal elasticity and convenient deployment from UPV, and the performance modelling and optimisation of the configuration by applications from POLIMI and UFMG.
- The Data Analytics Development Framework, which includes LEMONADE, developed by UFMG, executing workflows in parallel through COMPSs, developed by BSC, and connecting to processing functions from Ophidia, developed by CMCC. The framework provides privacy preservation through PRIVaaS, developed jointly by UNICAMP and UC, Entity Matching, developed by UFCG, and Data Quality, developed by POLIMI.
- The toolbox of models for extracting routes, predicting crowdedness, estimating traffic jams, sentiment analysis and computing trajectories, developed by UFMG, UFCG and UTFPR and integrated into applications developed by CMCC, UFCG and UPV.

EUBra-BIGSEA has developed a Big Data application development framework that comprises three primary assets that are not available in the market:
- The QoS Data Analytics Platform based on cloud services for enabling a running application to meet an execution deadline.
- The open-source Data Analytics application development framework, which provides a graphical interface to build data analytics workflows that include automatic discovery of parallelism, OLAP functions, privacy annotation, quality assurance and Entity Matching.
- A toolbox of 8 Descriptive and Predictive models for building traffic data analysis applications.

The maturity of the components has been assessed externally by the CloudWatch project through a Technology and Market Readiness Level assessment. The minimum TRL score was 4, with an average TRL score of 6.25. The project components are released under Open Source licenses (Apache 2 and GPLv3).

EUBra-BIGSEA components have been validated on infrastructures involving different geographically distributed sites with up to 120 cores. Components of the project have been transferred to large-scale projects and initiatives, such as EOSC-Hub, INDRA, the Bioexcel CoE, the TANGO project, ESiWACE (Centre of Excellence in Simulation of Weather and Climate in Europe) and the European Network for Earth System Modeling (ENES). The project has also produced upstream contributions to OpenStack in the MONASCA and TOSCA parser modules, and has started discussions with the Brazilian RNP for the promotion of the components and with the Curitiba municipality for the adoption of the applications.

The socio-economic impacts of the project include:
- The project provided the analysis of other aspects, such as noise, cost of land, accidents, sidewalk-based route planners for wheelchair users, business activity, speed limit devices, health, and related municipal laws.
- The project enhanced communication between the municipality and academia. As a result, the project enabled the storage of the complete data at UFPR (http://dadosabertos.c3sl.ufpr.br/curitiba/) along with a GIS database open to the community (stored at UFPR - bigsea.c3sl.ufpr.br).