Community Research and Development Information Service - CORDIS

H2020

CloudLightning Report Summary

Project ID: 643946
Funded under: H2020-EU.2.1.1.3.

Periodic Reporting for period 1 - CloudLightning (Self-Organising, Self-Managing Heterogeneous Cloud)

Reporting period: 2015-02-01 to 2016-07-31

Summary of the context and overall objectives of the project

Cloud Computing is defined as “a model for enabling ubiquitous, convenient, on-demand network access to a shared pool of configurable computing resources (e.g. networks, servers, storage, applications and services) that can be rapidly provisioned and released with minimal management effort or service provider interaction” (Mell and Grance, 2011). It represents a convergence of two major trends in information technology – (a) IT efficiency and (b) business agility. Cloud computing is typically elaborated by reference to three service models (Cloud Software as a Service (SaaS), Cloud Platform as a Service (PaaS), Cloud Infrastructure as a Service (IaaS)) and four deployment models (Private cloud, Community cloud, Public cloud, Hybrid cloud) (Mell and Grance, 2011). Current cloud infrastructures are typically centrally managed and composed of a large number of machines of the same type. These clouds often make use of homogeneous resources; typically, identical general purpose microprocessors that are relatively inexpensive. Currently, due to constraints arising from homogeneity of resources, cloud service providers (CSPs) use over-provisioning as a method for assuring service availability and performance. The use of homogeneous resources and over-provisioning are major contributors to energy inefficiency in the cloud computing sector.

For clouds to be successful they need to provide affordable and reliable computing and storage services. They must continue to adapt to an ever-growing user community whose service needs are unpredictable and always changing. This extends to accessing High Performance Computing (HPC) capabilities. Consumers of cloud services seek efficient access to cloud resources and look to reduce the effort required for application development and deployment. To address this, clouds must embrace heterogeneous hardware to provide specialist services. As clouds increase in size and as machines of different types are added to the infrastructure to maximise performance and power efficiency, heterogeneous clouds are being created.

The growing size and complexity of the evolving heterogeneous cloud creates a confluence of issues for CSPs. CloudLightning is funded under an EC Research & Innovation Action with regard to high performance heterogeneous cloud infrastructures under the Advanced Cloud Infrastructure and Services (ICT-07-2014) call. CloudLightning proposes a novel architecture for provisioning heterogeneous cloud resources to deliver services, specified by the user, using a bespoke service description language. The CloudLightning system will be built on the principles of self-management and self-organisation. Our use cases relate to high performance computing and specifically the use of the cloud for workloads related to ray tracing, genome processing, oil and gas exploration and related scientific computing use cases.

Issues of heterogeneous clouds and how CloudLightning addresses them.

• Issue 1: Rising complexity of managing cloud infrastructures
How it's addressed:
Rather than a typical centrally managed cloud infrastructure, CloudLightning uses the principles of decentralisation, self-organisation and self-management to manage complexity effectively and tackle the challenges of providing a Service-Oriented Architecture.

• Issue 2: Cloud infrastructures are typically composed of homogeneous resources.
How it's addressed:
The CloudLightning architecture is built to accommodate increasing heterogeneity, specifically across CPUs, GPUs, many integrated core processors (MICs) and dataflow engines (DFEs).

• Issue 3: Providing consumers with access to Infrastructure as a Service and relinquishing of control over that infrastructure
How it's addressed:
The CloudLightning system establishes a clear services interface between the service consumer and the service provider. The essence of this interface is the establishment of separation of concerns between the consumer and the provider. Thus, consumers should only be concerned with what they want to do, and providers should be concerned only with how that should be done.

• Issue 4: Increasing the energy efficiency of cloud infrastructures.
How it's addressed:
The CloudLightning system increases energy efficiency through a variety of strategies. Firstly, it uses energy efficient, non-commodity heterogeneous resources. Secondly, it reduces over-provisioning, where possible, from the perspectives of both the end user and the CSP. Thirdly, it maximises VM/server density, where appropriate; and finally it turns off idle servers, when possible.

• Issue 5: Management and efficiencies of resource utilisation
How it's addressed:
The CloudLightning system uses dynamic workload and resource management to increase the efficiency of resource utilisation. Resource management in the CloudLightning system is a local activity in each component. Components cooperate to flexibly adjust the utilisation of the resources under their control.

• Issue 6: Service deployment for non-technical consumers
How it's addressed:
The CloudLightning deployment mechanism simplifies the operational overhead of deploying services in the resource fabric by eliminating the need for the Enterprise IT Decision-Maker to know about the intricacies of the CSP’s infrastructure.
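
Two of the strategies listed under Issue 4, maximising VM/server density and turning off idle servers, can be illustrated with a small consolidation sketch. The example below is not the CloudLightning mechanism: it is a centralised first-fit-decreasing heuristic chosen only for brevity, and the `Server` class, the `consolidate` function and all capacities are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Server:
    """Hypothetical server with a fixed CPU capacity (in cores)."""
    name: str
    capacity: int
    vms: list = field(default_factory=list)

    @property
    def used(self) -> int:
        # Cores currently committed to the VMs placed on this server.
        return sum(cores for _, cores in self.vms)


def consolidate(vms, servers):
    """Place (vm_name, cores) pairs using first-fit decreasing.

    Returns (active, idle, unplaced): servers hosting at least one VM,
    servers left empty (candidates for powering down), and VMs that
    did not fit anywhere.
    """
    unplaced = []
    # Largest VMs first, so that small VMs fill the remaining gaps.
    for vm in sorted(vms, key=lambda v: v[1], reverse=True):
        for server in servers:
            if server.used + vm[1] <= server.capacity:
                server.vms.append(vm)
                break
        else:
            unplaced.append(vm)
    active = [s for s in servers if s.vms]
    idle = [s for s in servers if not s.vms]
    return active, idle, unplaced


if __name__ == "__main__":
    servers = [Server("s1", 16), Server("s2", 16), Server("s3", 16)]
    vms = [("vm-a", 8), ("vm-b", 4), ("vm-c", 8), ("vm-d", 4), ("vm-e", 6)]
    active, idle, unplaced = consolidate(vms, servers)
    print("active:", [s.name for s in active])
    print("idle (could be switched off):", [s.name for s in idle])
    print("unplaced:", unplaced)
```

Packing the same workload onto fewer servers leaves whole machines empty, which is what makes switching them off possible; a real system would also have to weigh performance interference and migration cost.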

Service descriptions, provided by prospective cloud consumers, will result in the cloud evolving to deliver the required services. The self-organising behaviour built into, and exhibited by, the cloud infrastructure will result in the formation of a number of potential resource coalitions capable of meeting the service needs. These coalitions will typically be composed of heterogeneous components and thus the quality of service that each could deliver will differ. The end user will choose from these offerings and appropriate resources will be commissioned to deliver the desired service.
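
The flow described above (a service description comes in, several candidate coalitions with different quality of service are offered, and the end user chooses one) can be sketched as follows. This is a minimal illustration under stated assumptions, not the CloudLightning implementation: it enumerates coalitions centrally by brute force rather than through self-organisation, and every name, throughput and power figure is hypothetical.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Resource:
    """Hypothetical resource element of the cloud fabric."""
    name: str
    kind: str           # e.g. "CPU", "GPU", "MIC", "DFE"
    throughput: float   # illustrative work units per second
    power_watts: float


@dataclass(frozen=True)
class Offer:
    """A candidate coalition together with its estimated quality of service."""
    resources: tuple
    est_runtime_s: float
    est_energy_j: float


def propose_coalitions(work_units, pool, size):
    """Enumerate coalitions of `size` resources and estimate each one's QoS."""
    offers = []
    for combo in combinations(pool, size):
        rate = sum(r.throughput for r in combo)
        runtime = work_units / rate
        energy = runtime * sum(r.power_watts for r in combo)
        offers.append(Offer(combo, runtime, energy))
    return offers


def choose(offers, max_runtime_s):
    """Pick the lowest-energy offer that meets the deadline, or None."""
    feasible = [o for o in offers if o.est_runtime_s <= max_runtime_s]
    return min(feasible, key=lambda o: o.est_energy_j, default=None)


if __name__ == "__main__":
    pool = [
        Resource("cpu-1", "CPU", 10.0, 100.0),
        Resource("gpu-1", "GPU", 40.0, 250.0),
        Resource("dfe-1", "DFE", 50.0, 150.0),
    ]
    offers = propose_coalitions(work_units=1000.0, pool=pool, size=2)
    for o in offers:
        names = [r.name for r in o.resources]
        print(names, round(o.est_runtime_s, 1), "s", round(o.est_energy_j), "J")
    best = choose(offers, max_runtime_s=18.0)
    print("chosen:", [r.name for r in best.resources] if best else None)
```

Because the coalitions are heterogeneous, each offer trades runtime against energy differently, which is why the choice is left to the consumer's stated requirements.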

An important objective in creating this system is to remove the burden of low-level service provisioning, optimisation and orchestration from the cloud consumer and to vest them in the collective response of the individual resource elements comprising the cloud infrastructure. A related objective is to locate decisions pertaining to resource usage with the individual resource components, where optimal decisions can be made.

By addressing the inefficient use of resources as a result of over-provisioning, the CloudLightning system delivers savings to the cloud provider and the cloud consumer with reduced power consumption and improved service delivery, with hyperscale systems particularly in mind.

Work performed from the beginning of the project to the end of the period covered by the report and main results achieved so far

The work performed within the first half of the project is divided between technical and non-technical work.

1.1.1 Technical Work
The technical work during Months 1-18 was undertaken and delivered under the technical work packages (WP2, 3, 4, 5 and 6). The remaining technical work package, WP7, does not commence until Month 20.

The Use Case Requirements Report (Deliverable 2.1.1) detailed the requirements of each of the use cases being considered in the CloudLightning project, the opportunities and challenges that each presents and the approaches used in evaluating those use cases. In addition to the three target application domains of Genomics, Oil & Gas Exploration and Ray Tracing, the opportunities around both dense and sparse matrix analysis are also explored. This was due to the frequency with which matrix operations arise in scientific computing domains. Primary and secondary research on market dynamics for each use case and wider market opportunities and challenges was undertaken and will be updated on an iterative basis throughout the project. Work on this continued into Task 2.2 and the conversion of these use cases into deployable services is due in Month 24. D2.1.1 is available for download here: http://cloudlightning.eu/portfolio-view/d2-1-1-use-case-requirements/

Resource Characterisation was performed under Task 6.1 and reported in Deliverable D6.2.1. The report, which focused on three co-processor/accelerator categories (GPUs, MICs, DFEs), captured some of the main characteristics of these resources and how they impact resource allocation. Additional architectures and cloud environments were examined including multicore CPU servers and massively parallel CPU environments.

The CloudLightning architecture was developed under Deliverable D3.1. This involved a thorough review of the state of the art of cloud architectures, specifically in the context of large-scale server clusters, resource scheduling, resource monitoring and heterogeneity of resources. The review also included an examination of platforms for abstracting compute resources such as container technologies and cluster management frameworks. Other concepts examined include coalition formation to optimise resource utilisation and resource allocation in proposed heterogeneous cloud infrastructure environments. The deliverable is available for download on the project website here: http://cloudlightning.eu/portfolio-view/d3-1-report-on-state-of-the-art-and-draft-architecture/.

In addition to the core architecture, the core system components, their semantics and implementations were defined under D4.1.1 Prototype Specification and API. These components comprise the Gateway, the Catalogue and the Cell Management Service. The report also outlined how the components interact with each other and the APIs they expose. The proposed APIs are designed in such a way that they can be easily extended as the project evolves in time. The report, which provides the foundation work for the framework that is going to be developed to successfully support the entire CloudLightning infrastructure, is available for download from the CloudLightning website, here: http://cloudlightning.eu/work-packages/public-deliverables/.

The CloudLightning service description language (CL-SDL) was defined and specified under Task 5.1 Service Description Format for Deliverable 5.1.1. This was achieved after an in-depth state-of-the-art survey of service description languages for different service delivery models/levels: IaaS, PaaS and SaaS. The CL-SDL intersects with five other tasks (the integrated use cases in Task 2.2; the service deployment and management mechanism developed under Task 4.4; and the web service endpoint in Task 5.2 and Task 6.2). It is also aligned with the service delivery model articulated in the Architecture Deliverable (D3.1.1).

The algorithms for the Local Decision Strategies were selected in line with Milestone M4.2 in Month 15. While investigating and integrating these strategies, a framework was developed that addresses the hosting and execution of self-organising and self-managing strategies associated with each of the various levels in the CloudLightning hierarchy. To concretely investigate a number of self-management and self-organisation strategies, a dedicated simulator was developed to model the execution of the strategies. The resulting investigation demonstrates how the CloudLightning architecture reconfigures, through the process of self-organisation, to achieve dynamic stasis in balancing the tendency towards the desired performance objectives with the physical constraints captured by these specific characteristic functions.

Work on Coalition Formation began in Month 13. This task is due to end in Month 20. Coalition Formation is a task performed by the vRack Manager and is concerned with locating, forming and managing the resources associated with each service. The Coalition Formation mechanism has been built on the self-organising framework designed in Deliverable 4.2.1. The Coalition Formation strategy has been designed and implemented using both OpenStack and Mesos frameworks. The implementation will be extended to accommodate Docker Swarm and/or Google Kubernetes. As with Coalition Formation, work on Service Deployment and Management began in Month 13.

This involved an initial scoping of service lifecycle management and monitoring to identify the relationships and dependencies with adjacent tasks in the service gateway, service description format and the self-organisation and self-management (SOSO) systems. The state of the art on resource/service instrumentation and monitoring tools was researched to develop an integrated telemetry system for the CloudLightning system. This extended to gathering the telemetry requirements for CL heterogeneous resources, i.e. MIC, GPU, DFE and CPU clusters. This contributed to the development of schematics to illustrate the interfaces between the service gateway engine, SOSO engine and the telemetry system. The design and integration of the telemetry and monitoring system is, at the time of writing this report, a work in progress.

1.1.2 Non-Technical Work
Non-technical work comprises WP1 and WP8 relating to management and coordination, and exploitation, dissemination and concertation respectively.

WP1 relates to the management and coordination of the project and is led by UCC. The consortium agreement was signed and submitted prior to starting the project in M1 (Deliverable 1.4). Except for system issues in SyGMa as reported in 2 below, the administrative and financial management (Task 2.1) and the coordination and reporting to the EC have largely been performed as expected. Relevant financial summary tables and graphs were compiled in M1 (Deliverable 1.1.1), M12 (Deliverable 1.1.2) and now M18 (Deliverable 1.1.3) as per the DoA. The programme management infrastructure, including systems to support appropriate coordination among partners, was put in place. Appropriate governance as set out in the DoA was established, including an Executive Board, Technical Board and Work Package teams. The Executive Board Meeting Minutes were recorded as per Deliverable 1.3.1 in M1 and M12. There is a detailed project plan which is maintained and reviewed periodically. The initial project plan and quarterly control reports have been collected and compiled as per the DoA in M3, M6, M9, M12, M15 and M18 (Deliverables 1.3.2 – 1.3.7). There is active collaboration and communication between partners; over 450 files and 228 discussions are recorded in Basecamp and over 75 meetings, virtual and physical, have been held at various levels within the project. A detailed quality plan has been established and a formal risk assessment was initiated in M3 and updated in M11 as per the DoA (Deliverable 1.3.14). A second formal risk assessment will take place in M23. In addition, an ethics review was completed in M15 and is discussed in Section 6 of this report. No regulatory, policy-related or health and safety issues have been reported. The External Advisory Board was formed in M11 and comprises 16 members reflecting the various stakeholders identified in WP8. The first meeting was held in M14. As the project evolved, additional industry participants were added.
Milestone 1.1 was met; milestone 1.2 relates to the submission of this mid-term report.

WP8 relates to exploitation, dissemination and concertation. The bulk of the exploitation activity will take place in the second phase of the project in line with the release of technical deliverables and results. An exploitation plan including strategic guidelines for exploitation was prepared as part of Milestone 8.2, which was completed on time. The associated deliverable (Deliverable 8.2.1) was delivered on time in M12. Three market briefings were prepared based on research in WP2 and will feed into the exploitation plan. 27 contacts have indicated that they will participate in the Delphi research (Milestone 8.2) scheduled to be completed by M30. Academic exploitation has commenced, including activities within the EU, the US, Japan and Mexico (non-EU activities were funded separately); a gold open access text is planned for the final phase to support both exploitation and dissemination objectives. The first report on the CloudLightning Exploitation Plan (Deliverable 8.1.1) is available to download at http://cloudlightning.eu/portfolio-view/d8-1-18-1-4-exploitation-plan-and-strategic-guidelines-for-exploitation/.

During the first 12 months, a dissemination plan and communications toolkit were developed for the project. A core set of communication material was designed and disseminated, including a brand style guide, the project website, project research posters, flyers and templates. Localised materials were provided on demand, e.g. in Japanese. By M18, unique website visits and social network engagement were on target or exceeding expectations. Since M13, the project has focused on increasing awareness through online content marketing, including three market briefings based on WP2 and guest articles in trade publications including Information Management and InsideHPC. In M18, the website was reviewed for usability and received a System Usability Scale (SUS) score of 80.75; the issues identified are being addressed. The project has presented at 15 conferences or workshops targeting the research community and has had 3 journal articles accepted for publication. As the project enters its second phase and the technical deliverables are refined, dissemination impact is expected to accelerate. Both milestones for dissemination (M8.1 and M8.2) due during the review period and Deliverable 8.2.1 were delivered on time. The first report on the CloudLightning Dissemination Plan (Deliverable 8.2.1) is available to download at http://cloudlightning.eu/portfolio-view/d8-2-18-2-3-dissemination/.

A concertation plan was prepared and submitted as Deliverable 8.3.1 in M12. CloudLightning is an active participant in the NATRES and Inter-cloud Challenges, Expectations and Issues clusters. As part of these clusters, CloudLightning contributed to three publications and participated in three cluster working group meetings. CloudLightning had a number of concertation activities with other EU projects including CACTOS, HARNESS, MO-BIZZ, MODAClouds, and DICE. Other related projects were invited both to write guest blogs for the CloudLightning website and to present at CloudLightning meetings or related events. Separately, CloudLightning e-mailed 30 funding and standardisation organisations to build awareness of CloudLightning and to seek opportunities for knowledge sharing. The milestone (M8.2) and deliverable for concertation (Deliverable 8.3.1) during Phase 1 were delivered on time. The first report on the CloudLightning Concertation Plan (Deliverable 8.3.1) is available to download at http://cloudlightning.eu/portfolio-view/d8-3-18-3-3-concertation/.

The next deliverables due for WP8 are in M24 and are reports on CloudLightning Exploitation (Deliverable 8.1.2), Dissemination (Deliverable 8.2.2) and Concertation (Deliverable 8.3.2).

Progress beyond the state of the art and expected potential impact (including the socio-economic impact and the wider societal implications of the project so far)

Firstly, we have designed and implemented a self-organising, self-managing heterogeneous cloud architecture that supports separation of concerns. The CloudLightning system establishes a clear services interface between the service consumer and the service provider. The essence of this interface is the establishment of separation of concerns between the consumer and the provider. Thus, consumers should only be concerned with what they want to do, and providers should be concerned only with how that should be done.

Secondly, a description language was developed for specifying HPC workloads and for capturing general application blueprints. This language included facilities for expressing general service level agreement parameters and for capturing inter-service relationships such as collocation. This SDL can be used for describing CloudLightning application Blueprints, including services constituting HPC workloads and an orchestration framework for supporting those applications, while maintaining the separation of concerns between application lifecycle management and management of heterogeneous resources. The CL-SDL extends existing approaches through the definition of CL Blueprints. While the Blueprint mechanism is currently linked with Brooklyn blueprints, the CL-SDL extension is a TOSCA extension. As a service description language, by using the blueprints mechanism, CL-SDL enables usage of heterogeneous resource coalitions. The exploitation of the CL-SDL will be realised through the Gateway Service which, in turn, will offer the means to exploit both abstract and concrete CL-SDL specifications (blueprints), and trigger concrete specifications via a decomposition mechanism.

Thirdly, we created a local decision strategies framework that allows the flexible incorporation of future strategies into any hierarchical architecture. This framework hosts and executes the self-organising and self-managing strategies associated with the various layers in the CloudLightning hierarchy. This framework is, itself, an additional scientific output and is a potential contribution to the field of self-organising and self-managing systems. In operation, the framework is configured to use customised characteristic functions that embody the requirements and goals of a particular system design. Thus a mechanism is accommodated for incorporating steering information, which derives from the process of directed evolution.

Fourthly, coalition formation in the CL system moves beyond the state of the art in the following ways:

• The concept of resource coalitions is novel and allows for the co-formation, colocation and co-management of resources required to support process parallelism within a service.

• By persisting and re-using resource coalitions, the time to initiate a service is reduced.

• CloudLightning coalition formation attempts to reduce the overheads associated with over-provisioning by taking advantage of profiled services.

The Coalition Formation mechanism is transparently implemented across a number of cloud frameworks.

Contributions beyond state of the art in the chosen application domains:

Ray Tracing

We have developed a containerised workload orchestration model for applications targeting Xeon Phi co-processors. This approach allows applications packaged inside Linux containers to directly access the underlying co-processors using kernel device pass-through. However, the assignment of specific co-processors to an application is decided by a central resource manager in order to optimise the utilisation of resources.

Oil and Gas

We have contributed to the extension of the Upscaling Module for the Open Porous Media (OPM) initiative. The Upscaling Module previously relied on the DUNE library for its solver. In CloudLightning, interfaces were re-written to work with PETSc enabling richer access to modern solvers. The Upscaling Module was ported to the NUMAScale cluster, taking advantage of its fast interconnects.

Genome Processing

We have developed a dataflow acceleration model for genomics processing. Myers' algorithm is a compute-intensive string matching algorithm that is used for sequence alignment in the genomics domain. Since it is the bottleneck in the overall execution time, it has been implemented on a Maxeler Dataflow Engine (DFE), resulting in significantly higher performance and improved energy efficiency compared to the original CPU-based implementation. This DFE-accelerated model will form the basis of an efficient cloud-based genomics processing service.
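
The report does not state which formulation of the algorithm was accelerated. Assuming it is the bit-vector (bit-parallel) algorithm for approximate string matching, the CPU-side idea can be sketched in Python as below, in its global edit-distance form. This is an illustrative reference only, not the project's DFE implementation; the function name is hypothetical.

```python
def myers_edit_distance(pattern: str, text: str) -> int:
    """Levenshtein distance via a bit-parallel (bit-vector) recurrence.

    One column of the dynamic-programming matrix is encoded as two bit
    vectors holding the +1 / -1 vertical differences, so each character of
    `text` is processed with a constant number of word-level operations.
    """
    m = len(pattern)
    if m == 0:
        return len(text)

    # peq[c] has bit i set when pattern[i] == c.
    peq = {}
    for i, ch in enumerate(pattern):
        peq[ch] = peq.get(ch, 0) | (1 << i)

    mask = (1 << m) - 1    # keep only the m low bits (Python ints are unbounded)
    high = 1 << (m - 1)    # bit of the last pattern row
    pv, mv = mask, 0       # vertical +1 and -1 difference vectors
    score = m              # distance between pattern and the empty prefix

    for ch in text:
        eq = peq.get(ch, 0)
        xv = eq | mv
        xh = ((((eq & pv) + pv) ^ pv) | eq) & mask
        ph = (mv | ~(xh | pv)) & mask   # horizontal +1 differences
        mh = pv & xh                    # horizontal -1 differences

        # Update the score in the last row from the horizontal difference.
        if ph & high:
            score += 1
        elif mh & high:
            score -= 1

        # Shift in a +1 at row 0: the top boundary grows by one per text
        # character, which gives global distance rather than substring search.
        ph = ((ph << 1) | 1) & mask
        mh = (mh << 1) & mask
        pv = (mh | ~(xv | ph)) & mask
        mv = ph & xv

    return score


if __name__ == "__main__":
    print(myers_edit_distance("GATTACA", "GCATGCU"))
```

The inner loop consists only of shifts, additions and bitwise operations on fixed-width vectors, which is the kind of regular computation that tends to map well onto dataflow hardware.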

Contributions beyond state of the art in large scale simulation

We have derived a class of differential indices that can be used to characterise performance, power efficiency, etc., and that can scale seamlessly across different types of hardware. These indices could potentially be used in the large scale simulation of the proposed self-organising, self-managing cloud environment. The work package for large scale simulation work (WP7) does not commence until M20. However, it is expected that the CloudLightning simulation engine will move beyond the state of the art by enabling the simulation of cloud systems with heterogeneous resources.

IMPACT
CloudLightning remains confident that it is contributing, and will continue to contribute, to (i) the expected impacts listed in the call topic, (ii) the innovative capacity of the consortium members, (iii) the innovative capacity of European industry, and (iv) other European environmental and societal priorities, as detailed in our original proposal.

Proposed Impact 1: Significantly higher quality of user experience
Impact Type: Listed in Call Topic
How it’s addressed: The CloudLightning deployment mechanism simplifies the operational overhead of deploying services in the resource fabric by eliminating the need for the Enterprise IT Decision Maker to know about the intricacies of the CSP’s infrastructure. CloudLightning reduces the complexity in deploying HPC workloads, both generally through the cloud and specifically using heterogeneous resources.

Proposed Impact 2: Demonstration of cloud-based services in federated, heterogeneous and multi-layered cloud environments; of the dynamic provisioning of interoperable applications and services over heterogeneous resources and devices, of high level of performance and quality of service even in highly secure solutions.
Impact Type: Listed in Call Topic
How it’s addressed: The CloudLightning project can demonstrate cloud-based services in a heterogeneous cloud including dynamic provisioning. The project will demonstrate end-to-end solutions in Phase 2 including performance studies using large scale simulations.

Proposed Impact 3: Increased innovation opportunities for service providers evidenced through implementations of advanced cloud infrastructures and services.
Impact Type: Listed in Call Topic
How it’s addressed: The CloudLightning system can dramatically increase energy efficiency resulting in a significantly more attractive cost structure for service providers, whether multinationals, SMEs or public sector service providers.

Proposed Impact 4: Promotion of the reuse of open source software solutions in cloud environments.
Impact Type: Listed in Call Topic
How it’s addressed: CloudLightning is building awareness for the use of the cloud for high performance computing and specifically the disruption of the lower end of the loosely-coupled workload sector, opening up the market for SMEs and public administrations.
CloudLightning intends to open source the majority of the system and extensions for associated use case applications from M24 onwards. CloudLightning is making use of and informing the OPM and OpenStack community, amongst others.

Proposed Impact 5: Demonstration through appropriate use cases of the potential to improve the competitive position of the European cloud sector.
Impact Type: Listed in Call Topic
How it’s addressed: The CloudLightning self-organising and self-optimising approach continues to be novel; the energy efficiency and performance impacts remain relevant. HPC in the cloud remains a small but fast-growing part of the worldwide cloud market but one in which European cloud service providers can lead. CloudLightning will demonstrate its impact potential through at least three use cases, namely ray tracing, genome processing and oil and gas exploration. CloudLightning has the potential not only to open up the market for these use cases to smaller organisations but also to dramatically reduce the computational cost of these workloads.

Proposed Impact 6: Greater insight and information on the needs and preferences of customers and end-users of HPC services
Impact Type: Improving the Innovative Capacity of (i) Project Partners and (ii) European Industry
How it’s addressed: CloudLightning has completed both primary and desk research on the use cases, including drivers and barriers to adoption of cloud computing for HPC. This has involved over 350 participants, including over 150 SMEs and 70 public sector decision-makers. By the end of the project, this will extend to knowledge of the needs and preferences of over 500 potential customers of HPC services. These insights cover both the wider market and the agreed use cases specifically, and are being circulated to stakeholders on an ongoing basis.

Proposed Impact 7: Technical knowledge regarding how self-organising and self-managing principles can address the issues of energy efficiency and costs associated with over-provisioning in the cloud sector.
Impact Type: Improving the Innovative Capacity of (i) Project Partners and (ii) European Industry
How it’s addressed: CloudLightning is leading the discussion regarding how self-organising and self-managing principles can be applied to cloud computing. For many CloudLightning partners and wider stakeholder groups, this is novel and represents a science communication challenge in itself. CloudLightning has engaged widely with both the business and research community through its dissemination and concertation strategies. This will increase as technical deliverables and performance results are released in the second phase of the project.

Academic partners have commenced exploiting this knowledge both internally and externally by highlighting the innovation taking place within the EU to stakeholders in major markets including the USA, Japan and Mexico. The consortium is planning a gold open access book which will be distributed widely. Industrial partners are demonstrating their resources in a new architecture and in some instances novel use cases.

In the second phase of the project, CloudLightning’s impact in terms of performance and energy efficiency will be demonstrated through large scale simulations and communicated to stakeholders.

Proposed Impact 8: Realisation of significant energy and carbon dioxide savings.
Impact Type: Other European Environmental and Societal Priorities
How it’s addressed: The energy efficiencies generated by CloudLightning through the use of heterogeneous resources and reducing over-provisioning can dramatically increase data centre energy efficiency thus supporting the Europe 2020 Strategy. CloudLightning will demonstrate its energy efficiency through large scale simulations in Phase 2 of the project.

Proposed Impact 9: Fast, low-cost genomic processing
Impact Type: Other European Environmental and Societal Priorities
How it’s addressed: A cloud-based dataflow acceleration model for genomics processing is being developed as part of CloudLightning which we believe will reduce the overall execution time and improve energy efficiency for sequence alignment. This may make a significant contribution to fast low-cost genomic processing, a commonly cited key requirement for personalised medicine, control of infectious disease and the development of new medicines.

Proposed Impact 10: Increase in EU GDP and Job Creation
Impact Type: Other European Environmental and Societal Priorities
How it’s addressed: HPC in the cloud is one of the smallest but fastest growing segments in the market. CloudLightning will demonstrate at least three high-value use cases and build awareness of the potential for HPC in the cloud on the demand and supply side.
