Project description
An open and portable cloud management framework with smart scheduling
Cloud computing has grown enormously over the past decade. This growth has been accompanied by a rise in new applications that rely on specialised hardware. Meanwhile, user requirements like security and location awareness are becoming common in smart cities, industrial automation and data analytics. Modern cloud applications have also become more complex, which calls for a new cloud management framework. In this context, the EU-funded DECICE project will optimise the placement of workloads across a heterogeneous hardware landscape that includes cloud, edge and HPC. Bringing together 13 partners from Austria, Germany, Italy, Sweden, Turkey and the United Kingdom, the project will use a digital twin of the system to create a virtual training environment that generates test data for training machine learning models and for exploring what-if scenarios.
Objective
The cloud computing industry has grown massively over the last decade and, with that growth, new areas of application have arisen. Some of these areas require specialized hardware, which needs to be placed in locations close to the user. User requirements such as ultra-low latency, security and location awareness are becoming increasingly common, for example, in Smart Cities, industrial automation and data analytics. Modern cloud applications have also become more complex, as they usually run on a distributed computer system and are split up into components that must run with high availability.
Unifying such diverse systems into centrally controlled compute clusters and making sophisticated scheduling decisions across them are two major challenges in this field. Scheduling decisions for a cluster consisting of cloud and edge nodes must consider unique characteristics such as variability in node and network capacity. The common solution for orchestrating large clusters is Kubernetes; however, it is designed for reliable, homogeneous clusters. Many applications and extensions are available for Kubernetes. Unfortunately, none of them optimizes for both performance and energy or addresses data and job locality.
In DECICE, we develop an open and portable cloud management framework for automatic and adaptive optimization of applications by mapping jobs to the most suitable resources in a heterogeneous system landscape. By utilizing holistic monitoring, we construct a digital twin of the system that mirrors the original system. An AI-scheduler decides on the placement of jobs and data and reschedules jobs to adjust to system changes. A virtual training environment is provided that generates test data for the training of ML-models and for the exploration of what-if scenarios. The portable framework is integrated into the Kubernetes ecosystem and validated using relevant use cases on real-world heterogeneous systems.
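To make the idea of mapping jobs to the most suitable resources more concrete, the following is a minimal sketch and not the DECICE scheduler itself. It assumes a simple greedy heuristic in place of the AI-scheduler: nodes that cannot satisfy a job's CPU or latency requirement are filtered out, and the remaining nodes are ranked by a weighted mix of latency and energy cost. All names (`Node`, `Job`, `score`, `place`), the fields and the weighting are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    name: str
    kind: str             # hypothetical label: "cloud", "edge" or "hpc"
    free_cpu: float       # CPU cores currently available
    latency_ms: float     # network latency between node and user
    watts_per_cpu: float  # rough energy cost per core in use


@dataclass
class Job:
    name: str
    cpu: float             # CPU cores requested
    max_latency_ms: float  # latency requirement of the job


def score(node: Node, job: Job, energy_weight: float = 0.5) -> float:
    """Lower is better. Mixes relative latency with an assumed energy cost."""
    latency_term = node.latency_ms / job.max_latency_ms
    energy_term = node.watts_per_cpu * job.cpu / 100.0
    return (1.0 - energy_weight) * latency_term + energy_weight * energy_term


def place(job: Job, nodes: List[Node], energy_weight: float = 0.5) -> Optional[Node]:
    """Pick the best feasible node for the job and reserve its CPU, or return None."""
    feasible = [
        n for n in nodes
        if n.free_cpu >= job.cpu and n.latency_ms <= job.max_latency_ms
    ]
    if not feasible:
        return None
    best = min(feasible, key=lambda n: score(n, job, energy_weight))
    best.free_cpu -= job.cpu  # reserve capacity on the chosen node
    return best


nodes = [
    Node("cloud-1", "cloud", free_cpu=64, latency_ms=40, watts_per_cpu=5),
    Node("edge-1", "edge", free_cpu=4, latency_ms=5, watts_per_cpu=12),
]
latency_critical = place(Job("video-analytics", cpu=2, max_latency_ms=10), nodes)
batch = place(Job("batch-report", cpu=8, max_latency_ms=100), nodes)
too_large = place(Job("huge-simulation", cpu=128, max_latency_ms=100), nodes)
```

In this sketch the latency-critical job can only run on the edge node, the larger batch job falls back to the cloud node, and the oversized job is left unplaced. A learned scheduler of the kind described above would replace the fixed `score` function and could also revisit earlier placements when the system changes.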
Funding Scheme
RIA - Research and Innovation action
Coordinator
37073 Gottingen
Germany