EUrope-BRAzil Collaboration on BIG Data Scientific REsearch through Cloud-Centric Applications

Periodic Reporting for period 1 - EUBra-BIGSEA (EUrope-BRAzil Collaboration on BIG Data Scientific REsearch through Cloud-Centric Applications)

Reporting period: 2016-01-01 to 2016-12-31

EUBra-BIGSEA (Europe-Brazil Collaboration on BIG Data Scientific Research through Cloud-Centric Applications), funded under the topic EUB-1-2015 Cloud Computing, aims at providing services in the cloud for the processing of massive data coming from highly connected societies, which impose challenges on resource provision, performance, QoS and privacy.
The three main aims of the project are:
● The development of innovative Big Data services for capturing, federating and annotating high volumes of data on top of efficient programming models.
● The development of advanced cloud services to support Big Data.
● The demonstration of such services on applications with high social and business impact, addressing main scenarios of high interest for both Europe and Brazil.
During the first year, EUBra-BIGSEA has fully achieved the goals set on management, infrastructure, user scenarios and dissemination, as confirmed and detailed in the 18 deliverables produced and the 13 milestones achieved.
Project coordination and management is shared between Europe and Brazil. All the management and communication procedures and tools were set up and explained in D1.1 Quality Assurance & Risk Management Plan.
In this period, the activities of WP2 (Community Engagement, Communication & Impact) focused on the definition of the communication and dissemination strategy and the assessment of its impact. D2.2 User communities engagement and dissemination strategy provides an overview of the identified user communities and the actions defined to engage the target audiences. Furthermore, D2.3 Preliminary Action Plan has been released, presenting the EU-BR challenges and joint research and innovation opportunities.
The main achievements of WP3 (Quality of Service Cloud Infrastructure) have been the definition of the architecture, released in D3.1 QoS Monitoring System Architecture, and the design of new optimization, analytical and simulation models to predict the performance of Big Data Spark and Hadoop applications, described in deliverable D3.2 Big Data Application Performance models and run-time Optimization Policies. WP3 performed initial work on the implementation of the configuration and contextualization service and its planned compliance with the TOSCA standard. EUBra-BIGSEA has released a tool to configure and set up a fully working cloud environment, described in D3.3 QoS infrastructure services initial version, with horizontal elasticity, automatic reconfiguration and monitoring capabilities. A Docker container is published to facilitate its deployment on different IaaS cloud providers.
WP4 (Integrated fast and Big Data cloud platform) activities relate to the design and implementation of the integrated ecosystem for Big Data analytics and mining in the cloud. Based on the design stated in D4.1 Design of the integrated big and fast data eco-system, the implementation of the first prototype has followed. In this context, WP4 and WP7 partners have worked jointly to select a set of applications. A first set of the candidate technologies, among those assessed in D4.1, has been adopted, deployed and tested on a development infrastructure at CMCC. Moreover, an investigation and identification of security and privacy concerns at the WP4 level (linked to WP6 and D6.1 Requirements and Coordinated Security Strategy) has been performed.
WP5 (Programming abstractions layer) has completed D5.1 EUBra-BIGSEA software architecture. The report describes the overall functioning of the platform components and the interactions between them. COMPSs has been extended to elastically negotiate resources through a Mesos framework and to execute in Docker containers. A prototype of a COMPSs application on top of the Ophidia framework is also available. Another important asset is Lemonade (Live Environment for Mining Of Non-trivial Amount of Data from Everywhere), an analytics platform that exposes a web GUI and supports the intuitive definition of tasks for knowledge discovery, mining, and learning from large amounts of data coming from a wide spectrum of scenarios. A detailed description of the WP5 components is provided in D5.2 Programming abstractions design, together with the software artefacts available in the project repository.
WP6 (Security Provisioning and Assurances) has defined the coordinated strategy for addressing the security concerns of the project in D6.1 Requirements and Coordinated Security Strategy (M6), together with an analysis of the state of the art and the definition of the EUBra-BIGSEA security scope and requirements. Since then, work has been carried out towards three goals: 1) the definition of an integrated solution for AAA requirements, 2) researching security assessment methodologies and tools t
The general impact of EUBra-BIGSEA on the state of the art is twofold. On one side, EUBra-BIGSEA is implementing a cloud monitoring system that reflects the dynamic aspects of the cloud. On the other side, the monitoring system will integrate monitoring metrics at multiple levels of abstraction, including physical and virtual resources and Big Data service applications.
A key point beyond the state of the art is the design of an integrated, dynamic and elastic ecosystem that addresses Big Data challenges by taking into account privacy, security and QoS aspects. For this purpose, we defined a software architecture that uses proactive policies to request the proper amount of resources. Finally, reactive elasticity linked to the monitoring system is defined to correct deviations and guarantee the QoS.
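The interplay between proactive policies and reactive elasticity described above can be illustrated with a minimal sketch. It is not the EUBra-BIGSEA implementation: the `Job` fields, the linear throughput model and the function names `proactive_nodes` and `reactive_nodes` are all assumptions made for illustration only. The proactive step sizes the allocation from a predicted per-node throughput, and the reactive step re-sizes it from the progress reported by monitoring.

```python
import math
from dataclasses import dataclass


@dataclass
class Job:
    # All fields are hypothetical quantities chosen for this sketch.
    work_units: float        # estimated total amount of work
    deadline_s: float        # QoS target: finish within this many seconds
    units_per_node_s: float  # predicted throughput of one node (units/second)


def proactive_nodes(job: Job) -> int:
    """Request, up front, enough nodes for the predicted run to meet the deadline."""
    needed = job.work_units / (job.units_per_node_s * job.deadline_s)
    return max(1, math.ceil(needed))


def reactive_nodes(job: Job, nodes: int, done_units: float, elapsed_s: float) -> int:
    """Correct the allocation when monitored progress deviates from the plan."""
    remaining_units = job.work_units - done_units
    remaining_s = job.deadline_s - elapsed_s
    if remaining_units <= 0 or remaining_s <= 0:
        return nodes  # finished, or deadline already passed: leave allocation as is
    if elapsed_s <= 0 or done_units <= 0:
        return nodes  # no usable measurement yet
    observed = done_units / (elapsed_s * nodes)  # measured per-node throughput
    needed = remaining_units / (observed * remaining_s)
    return max(nodes, math.ceil(needed))  # this sketch only scales out, never in
```

For example, a job of 1000 work units with a 100-second deadline and a predicted throughput of 1 unit per node per second starts on 10 nodes. If monitoring reports only 200 units done after 40 seconds, the reactive step raises the allocation to 27 nodes. If 400 units are done at that point, the allocation stays at 10.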
Although there are plenty of programming models (e.g. Hadoop, Spark, Storm) for the development of Big Data analytics software, they require end users to adapt their applications to a specific API, and they are tied to single Big Data solutions rather than to integrated Big Data ecosystems or platforms. Lemonade facilitates application development and the handling of QoS constraints by exposing data analytics components that have been previously modelled.