
Reliable Capacity Provisioning and Enhanced Remediation for Distributed Cloud Applications

Periodic Reporting for period 2 - RECAP (Reliable Capacity Provisioning and Enhanced Remediation for Distributed Cloud Applications)

Reporting period: 2018-07-01 to 2019-12-31

Large-scale computing systems are today built from servers hosted in data centres for reasons of scale, cost, and energy efficiency. Software components and services are likewise distributed and accessed over the Internet. Since the advent of cloud computing we have seen many advances, yet resources are still often provisioned using best-effort models. This limitation is a major hindrance to the evolution of the IoT and the networked society. More drastically, it has also manifested itself in the limited cloud adoption of systems with demands beyond best effort, such as telco systems.
Addressing these challenges for the upcoming cloud-edge era requires an understanding of network topology and of user and application behaviour in order to decide where in the network to place compute resources. It further demands an understanding of application properties and infrastructure capabilities in order to decide what compute resources to provide. As edge data centres will be smaller than today's cloud data centres, a major question to be answered is which parts of which applications should run at the edge at any given time.
Overall, RECAP has addressed four goals: (i) the provision of application and workload models, together with a modelling framework for self-adaptive mechanisms that enables the creation and simulation-based evaluation of these models; (ii) the enablement of resource- and energy-efficient provisioning of infrastructure capacity, both in the planning phase of an IT infrastructure and in the operational phase, where current application demands are continuously revisited and weighed against each other; (iii) simulation support for large-scale distributed edge and cloud computing environments; and (iv) the release of several gigabytes of data from the project's use cases as open data (https://zenodo.org/record/3458559#.XjgbOmhKguU) to foster research in workload prediction, infrastructure planning, and application characterization.
RECAP has made major contributions to the development of the next generation of cloud and edge computing capacity provisioning. It provides an overarching methodology for creating, simulating, and enacting a set of six models that capture the behaviour of workloads, applications, users, and IT infrastructure. The RECAP reference architecture defines the interaction between the various RECAP tools, focussing on a strong separation of concerns and easy integration into existing systems. This led to the results and architectural components shown in the figure:
The Landscaper (1) acquires information on the state and configuration of the physical and virtual infrastructure resources and represents it in a graph database. The Monitoring component (2) collects telemetry from applications and infrastructure. Both are input to the optimisers, whose output is used to orchestrate and enact resource changes in the cloud network. The Application Optimiser (3) derives the optimal configuration for a specific application (hence there is an individual optimiser per deployed application), including scaling decisions. Applications can be scaled locally or globally, for instance in response to run-time traffic limits or resource levels being reached. Application Optimisers may make use of predictors to base their decisions not only on current but also on future workload. The Orchestrator (11) collects suggestions from multiple Application Optimisers and merges them into input for the Infrastructure Optimiser (4). The Infrastructure Optimiser ingests and augments these placement suggestions by considering additional and more granular information pertaining to the available physical infrastructure, infrastructure-specific features, infrastructure policies, and SLAs. Its decisions reach the Enactor (12), which is responsible for mapping tasks to a specific backend such as OpenStack or Kubernetes. The Simulator (5) is used by the Infrastructure Optimiser (4) to formulate deployment-mapping selections and calibrate its algorithms.
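The data flow described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual interfaces: all class names, the `site_capacity` parameter, and the 100-requests-per-replica scaling rule are hypothetical assumptions made for the example.

```python
from dataclasses import dataclass

@dataclass
class PlacementSuggestion:
    application: str
    site: str      # candidate edge or cloud site
    replicas: int  # suggested number of instances

class ApplicationOptimizer:
    """One optimiser per deployed application (component 3)."""
    def __init__(self, application, predictor):
        self.application = application
        self.predictor = predictor  # forecasts future workload from telemetry

    def suggest(self, telemetry):
        # Base the decision on predicted as well as current load.
        load = max(telemetry["load"], self.predictor(telemetry))
        replicas = max(1, round(load / 100))  # assumed: 100 req/s per replica
        return PlacementSuggestion(self.application, telemetry["site"], replicas)

class Orchestrator:
    """Collects and merges suggestions from all optimisers (component 11)."""
    def merge(self, suggestions):
        return sorted(suggestions, key=lambda s: s.application)

class InfrastructureOptimizer:
    """Refines suggestions against physical capacity and policies (component 4)."""
    def __init__(self, site_capacity):
        self.site_capacity = site_capacity  # max replicas each site can host

    def decide(self, suggestions):
        # Cap each suggestion at what the target site can physically host;
        # the resulting decisions are handed to the Enactor (component 12).
        return [PlacementSuggestion(s.application, s.site,
                                    min(s.replicas,
                                        self.site_capacity.get(s.site, 0)))
                for s in suggestions]
```

Under these assumptions, a predictor forecasting 320 req/s yields a suggestion of 3 replicas, which the Infrastructure Optimiser would cut to 2 if the target site can only host 2.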
The beginning of the project focussed on gaining an understanding of the use cases and their requirements. A website and flyers were created (MS2). The start of the simulation work led to the design of a simulation architecture as well as of infrastructure models. WP5, WP6, and WP8 started in M9 of the project. The second quarter of the project provided the project testbed (MS3), enabling the collection of monitoring data. Initial models were created for describing aspects of a (geo-)distributed infrastructure, of distributed, elastic applications and the users accessing them, and of the constraints imposed by their operators. These models guided the technical work and resulted in various tools and individual prototypes, which were composed into a common, overarching architecture (MS4).
The third quarter was characterized by a refinement of the use-case requirements, the definition of scenarios for validating RECAP against the use cases, and the production of an integrated prototype (MS5). In parallel, all technical work packages (WP4-WP8) continued their research and technical work, refining existing models, carrying results over to other work packages, and generating improved prototypes. The installation of an improved testbed led to MS6 and started the validation phase of the project. At the end of the project, the final version of the architecture was released (MS7), as were the system demonstrators (MS8). Several GB of RECAP data sets, as presented in D5.3, have been made publicly accessible: https://zenodo.org/record/3458559
Validation by applying the RECAP method in several use cases shows that RECAP can save large amounts of hardware and infrastructure investment (up to 60%) while increasing the utilization of existing infrastructure by more than a quarter.
RECAP results were presented at 25 conferences, at 13 workshops, and in 9 journal publications. Furthermore, RECAP partners attended 17 trade conferences and distributed more than 1,000 flyers. Since the beginning of the project, more than 16,000 unique visitors have been recorded on the website and more than 1,000 followers gathered across all social media platforms. Overall, we estimate that we reached an audience of more than 45,000 people from science, industry, and other fields.
RECAP realizes a novel concept in the provisioning and operation of cloud-edge services. The project demonstrated increased and predictable performance of cloud offerings, facilitating the deployment of critical applications and services. Through the use of simulation, emulation, and automation, RECAP was able to increase trust in clouds.
Experimentation across a variety of settings showed that RECAP can enhance Quality of Experience. RECAP outcomes are being open-sourced, and dissemination and exploitation have addressed SMEs and the public sector, paving the way for the adoption of the project's outcomes. By applying RECAP to appropriate use cases, we demonstrate the potential to improve the competitive position of the European cloud sector.
Regarding societal impact, the application of RECAP methods and tools demonstrated a significant reduction in required servers and resources and, consequently, in power consumption, leading to a reduced CO2 footprint while maintaining the same, or achieving improved, Quality of Experience.
[Figure: architecture-flow.png — RECAP architectural components and data flow]