Periodic Reporting for period 1 - CloudSkin (Adaptive virtualization for AI-enabled Cloud-edge Continuum)
Reporting period: 2023-01-01 to 2024-06-30
• The CloudSkin platform will leverage novel AI/ML techniques to optimize workloads, resources, and network traffic holistically, enabling rapid adaptation to changes in application behaviour and data variability.
• The CloudSkin platform will also help users to achieve “stack identicality” across the continuum, where legacy software stacks running in data centres and HPC clusters (e.g. MPI) can seamlessly run at remote edges, and the code does not need to be rewritten for the targeted platform at hand. No less important, this KET also pursues a high level of security, a critical requirement when processing data off-premises at the edge.
• CloudSkin will also contribute to instrumenting the storage infrastructure with hooks that enable optimizing end-to-end performance and other key performance indicators (KPIs). This includes developing novel storage systems that can cover an even wider range of use cases (e.g. bursty workloads, a real-time use case with improved fault-tolerance).
The various KETs will be showcased through four use cases that belong to different strategic domains in the EU: automotive; metabolomics; surgery; and agriculture.
To build consensus and escape the trap of fragmented views of the "problem," the major achievement has been the design of a layered architecture that meets the three objectives of the project. In more detail, the main activities in each layer have been:
L3. Orchestration layer. A fundamental piece of the software stack is its AI-enabled orchestration layer. By M18, this layer contributes an early implementation of the Learning Plane capable of identifying the best provisioning and partitioning strategies between the cloud and edge servers for two use cases in the project: automotive and surgery.
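To make the provisioning/partitioning decision concrete, the sketch below shows a toy cost model of the kind such a decision reduces to: run each pipeline stage at the edge or ship its input to the cloud, whichever yields lower estimated latency. This is purely illustrative; the actual Learning Plane uses AI/ML models, and all names and numbers here are hypothetical.

```python
# Toy edge-vs-cloud placement: compare edge compute latency against
# cloud compute latency plus the network transfer cost of the input.
# Hypothetical sketch; not the project's real Learning Plane logic.

def choose_placement(stages, bandwidth_mbps):
    """Pick 'edge' or 'cloud' per stage by comparing estimated latencies.

    stages: list of dicts with 'edge_ms' and 'cloud_ms' (compute latency
    estimates) and 'input_mb' (data crossing the network if the stage
    runs in the cloud).
    """
    plan = []
    for s in stages:
        transfer_ms = s["input_mb"] * 8 / bandwidth_mbps * 1000
        cloud_total_ms = s["cloud_ms"] + transfer_ms
        plan.append("edge" if s["edge_ms"] <= cloud_total_ms else "cloud")
    return plan

stages = [
    {"edge_ms": 40, "cloud_ms": 10, "input_mb": 50},  # data-heavy -> edge
    {"edge_ms": 500, "cloud_ms": 80, "input_mb": 1},  # compute-bound -> cloud
]
print(choose_placement(stages, bandwidth_mbps=100))   # -> ['edge', 'cloud']
```

A learned policy replaces the fixed latency estimates with model predictions, but the decision it must output has this same shape.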
L2. Execution layer. Another requirement of the platform is the ability to support applications capable of spanning the entire cloud-edge continuum. By M18, this layer contributes a fully functional implementation of an innovative execution abstraction built upon WebAssembly and termed C-Cells. C-Cells enable the execution of scientific applications (e.g. written in MPI) across the continuum for the very first time. In parallel, a prototype is under development to run TEE-protected C-Cells with Intel SGX.
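The low-overhead migration that C-Cells provide rests on the checkpoint/restore idea: capture a computation's state as a portable snapshot, move the bytes, and resume elsewhere. C-Cells do this at the level of WebAssembly memory; the toy sketch below conveys only the concept, using plain Python state and `pickle` as a stand-in for a real snapshot format.

```python
# Conceptual sketch of checkpoint/migrate/resume; not the C-Cells
# implementation, which snapshots WebAssembly memory, not Python objects.
import pickle

def run(state, steps):
    """Advance a running-sum computation; state is resumable anywhere."""
    for _ in range(steps):
        state["i"] += 1
        state["total"] += state["i"]
    return state

state = {"i": 0, "total": 0}
state = run(state, 5)              # runs on the "edge" node
snapshot = pickle.dumps(state)     # checkpoint: a portable byte string
state = pickle.loads(snapshot)     # restored on a "cloud" node
state = run(state, 5)              # resumes exactly where it left off
print(state["total"])              # sum of 1..10 = 55
```

Because WebAssembly gives every C-Cell an identical, platform-independent memory layout, the same snapshot restores correctly on any node in the continuum, which is what makes migrating MPI-style workloads feasible.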
L1. Infrastructure layer. The modern lightweight virtualization technology at the execution layer must be complemented with a varied set of storage abstractions that can support everything from real-time streaming to bursty workloads. At the current stage of the project, we highlight two activities: a new storage system termed GEDS specialized for the management of ephemeral data, and the addition of AI-enabled auto-scalability to Pravega, an open-source storage system that uses streams to store continuous data.
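Auto-scalability for a streaming store ultimately reduces to a decision of the form sketched below: map the observed ingest rate to a desired instance count. The sketch is a simple threshold rule for illustration only; Pravega's AI-enabled policy is learned, and the function name and thresholds here are hypothetical.

```python
# Hypothetical threshold-based scaling rule; a stand-in for the learned
# auto-scaling policy being added to Pravega in the project.
import math

def scale_decision(ingest_mbps, target_mbps_per_instance=50.0):
    """Desired number of store instances for the observed ingest rate."""
    return max(1, math.ceil(ingest_mbps / target_mbps_per_instance))

# A bursty workload: load spikes, the controller scales out, then back in.
for load in (20, 180, 420, 60):
    print(load, "->", scale_decision(load))  # 1, 4, 9, 2 instances
```

An ML-based policy replaces the fixed per-instance target with a prediction of near-future load, so the system can scale out before a burst arrives rather than after.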
Additionally, all the use cases have an early PoC that leverages a subset of the components in the platform. To cover some of their needs, it has been necessary to develop new software components that do not exist on the market. Among them, we highlight a serverless AI model serving system termed Lithops Serve, which runs offline batch inference at scale.
C-Cells goes beyond the SOTA, as it allows scientific applications based on shared-memory and message-passing programming models to run anywhere in the continuum with low-overhead migrations. This unique feature is interesting for cloud and edge stakeholders, since such workloads have so far been incompatible with cloud and edge resource management. Moreover, TEE protection of C-Cells is realized with SCONE, a SOTA technology for confidential computing (TRL8). Exploited by Scontain, SCONE, together with the Azure Confidential Computing service, leverages Intel SGX-enabled CPUs to provide solutions that protect applications at rest and at runtime. SCONE will create added value for C-Cells and other TEE-protected software in the project, which can also benefit from Scontain's market leadership in the cloud sector.
Lithops Serve is a new AI model serving system specialized for offline batch classification. It was born to fulfil the needs of the metabolomics use case, given the ill-suitability of de facto software stacks for this task. As a distinguishing feature, it uses Function-as-a-Service (FaaS) technology to run batch inferences at scale. It is expected to be put into production in the METASPACE platform (https://metaspace2020.eu/) by the end of this year. METASPACE is the largest metabolomics open science platform, with 2,770+ users from research laboratories and companies and 11,100+ datasets.
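The FaaS-based batch-inference pattern behind such a system can be sketched as a map over input batches, with one function invocation per batch. The sketch below is self-contained, standing in a local thread pool for the remote serverless functions; `classify` is a hypothetical placeholder for a real model, and none of the names come from the Lithops Serve codebase.

```python
# Map-style batch inference: split the dataset into batches and dispatch
# each one as an independent task. A local thread pool stands in here for
# remote FaaS invocations; classify() is a hypothetical toy "model".
from concurrent.futures import ThreadPoolExecutor

def classify(batch):
    """Hypothetical model: label each value by its sign."""
    return [("pos" if x >= 0 else "neg") for x in batch]

def batch_inference(dataset, batch_size=3):
    batches = [dataset[i:i + batch_size]
               for i in range(0, len(dataset), batch_size)]
    with ThreadPoolExecutor() as pool:        # one "function" per batch
        results = pool.map(classify, batches)
    return [label for chunk in results for label in chunk]

print(batch_inference([4, -1, 0, 7, -3, 2, -8]))
# -> ['pos', 'neg', 'pos', 'pos', 'neg', 'pos', 'neg']
```

Because each batch is stateless and independent, a FaaS backend can fan the same map out to hundreds of concurrent function invocations, which is what makes the pattern attractive for large offline classification jobs.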
Additions to the CNCF Pravega system, such as AI-enabled instance auto-scalability, are exploitable. Pravega lies at the core of the Dell Streaming Data Platform (SDP) product (TRL8). SDP is one of the software offerings that are part of the Dell NativeEdge product, which is an automated, secure, multi-cloud edge operations software platform to help businesses centrally manage and securely scale their edge applications across multiple locations. This opens up the possibility of considering CloudSkin-related outcomes in the future roadmap of Dell products.
Finally, the NearbyOne orchestrator (TRL5) by Nearby Computing can improve its SOTA performance through the project's contributions. For instance, the novel AI-oriented interfaces will enhance NearbyOne's orchestration capabilities, placing it in an advantageous position in the market for zero-touch service and network management.