
H2020

SSICLOPS Report Summary

Project ID: 644866
Funded under: H2020-EU.2.1.1.3.

Periodic Reporting for period 1 - SSICLOPS (Scalable and Secure Infrastructures for Cloud Operations)

Reporting period: 2015-02-01 to 2016-07-31

Summary of the context and overall objectives of the project

Context
Over the past decade, IT workloads have increasingly migrated to “cloud” infrastructures, i.e., homogeneous compute fabrics built from commodity servers, interconnected by Ethernet fabrics and supported by NAS and SAN storage backends, managed by control software such as OpenStack and Eucalyptus. Such cloud infrastructure exists in two flavors, public clouds and private clouds. Public cloud infrastructure is provided by global “hyper-giants” such as Amazon (EC2), Microsoft (Azure) and Google, but also by more regional providers such as major telecom operators. In addition to those cloud service providers, most enterprises and other large organizations prefer to run critical workloads on private cloud infrastructure in their own datacenters.

However, using commodity components and open-source systems, these companies struggle to match the performance that the large-scale cloud providers achieve at the same cost with their custom-built solutions (leveraging economies of scale). Moreover, because of their smaller scale, enterprises running private clouds lack the resources to grow their operations quickly on demand, unless they reach out to the very public cloud providers they are trying to avoid in the first place.

The SSICLOPS project puts these companies into a better position and reduces the performance gap, offering a unique opportunity for European manufacturers and service providers to supply the market with urgently needed technology.

Objectives
The overall objective of SSICLOPS is to empower enterprises to create and operate high-performance private cloud infrastructure that allows flexible scaling through federation with other private clouds without compromising their service level and security. The SSICLOPS federation supports the efficient integration of clouds, whether they are geographically co-located or distributed and whether they belong to the same or to different administrative entities or jurisdictions: in all cases, SSICLOPS delivers maximum performance for inter-cloud communication, enforces legal and security constraints, and minimizes the overall resource consumption. In such a federation, individual enterprises will be able to dynamically scale their private cloud services in and out: they offer their own spare resources when available and take in resources from others when needed. This maximizes each member's infrastructure utilization while minimizing its excess capacity needs.
To realize this vision, SSICLOPS targets the following concrete objectives:
1. To build a framework for on-demand and pro-active scale-in/out in private clouds that supports enterprises in matching highly variable service demands without compromising service quality, while maximizing infrastructure utilization and minimizing excess capacity needs for private cloud providers. This framework encompasses:
a. a control plane for scheduling and migrating workloads within a cloud (intra-cloud) as well as across federated clouds (inter-cloud) per objectives 2 and 3;
b. a dataplane for efficient and secure data transport within a cloud, across different clouds, and towards the end user per objectives 4 and 5; and
c. tools for supporting application development such as different programming abstractions and for performance monitoring of applications and cloud infrastructure per objectives 6 and 7.
2. To provide models characterizing the static and dynamic properties of workloads and topologies of federated clouds. The static workload properties include the known resource demands (CPU, memory/storage, data), while the dynamic ones extend to the runtime footprint (memory, temporary storage). Using these properties, SSICLOPS will develop (a) workload scheduling algorithms for utility-driven workload placement as well as (b) mechanisms for workload migration, including determining which workloads should be migrated and when migration should take place (a minimal placement sketch follows below).
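
To make objective 2(a) concrete, the following is a minimal sketch of what utility-driven workload placement could look like. Every name and the cost model here are invented for illustration; they are not the SSICLOPS scheduling algorithms themselves.

    from dataclasses import dataclass

    @dataclass
    class Workload:
        name: str
        cpu: int       # required cores (static property)
        mem_gb: int    # required memory (static property)

    @dataclass
    class Site:
        name: str
        free_cpu: int
        free_mem_gb: int
        cost_per_core: float  # e.g. higher for a remote federated cloud

    def utility(w, s):
        """Score a placement: infeasible sites are ruled out, cheap sites win."""
        if w.cpu > s.free_cpu or w.mem_gb > s.free_mem_gb:
            return float("-inf")
        return -w.cpu * s.cost_per_core

    def place(w, sites):
        """Pick the feasible site with the highest utility, or None."""
        best = max(sites, key=lambda s: utility(w, s))
        return best if utility(w, best) > float("-inf") else None

    sites = [Site("local", free_cpu=4, free_mem_gb=32, cost_per_core=1.0),
             Site("partner", free_cpu=64, free_mem_gb=512, cost_per_core=2.5)]
    print(place(Workload("db-query", cpu=8, mem_gb=64), sites).name)  # partner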

Work performed from the beginning of the project to the end of the period covered by the report and main results achieved so far

From the beginning of the project, work has progressed on the protocols and algorithms, driven by the use cases and their requirements for the needed infrastructure. All tasks on the use case scenarios, on the testing, simulation and experimentation facilities, and on the evaluation instrumentation and measurements are in progress, with detailed descriptions of the use cases provided by the partners. A common template is used to unify the use case descriptions. In particular, Mininet topologies, htsim environments, a CDN simulator, an instrumented storage system, and an OpenStack+SDN testbed have been established (a minimal Mininet example is sketched below). Initial training on OpenStack was given by HPI.
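
For readers unfamiliar with Mininet, a minimal topology script of the kind such emulated testbeds are built from is shown below. The two-switch "inter-datacenter" layout and its link parameters are our own illustrative assumptions, not the project's actual testbed configuration.

    from mininet.net import Mininet
    from mininet.topo import Topo
    from mininet.link import TCLink

    class InterDCTopo(Topo):
        def build(self):
            # Two "datacenters", each one switch with one host, joined by a
            # bandwidth- and latency-constrained inter-DC link.
            s1, s2 = self.addSwitch('s1'), self.addSwitch('s2')
            h1, h2 = self.addHost('h1'), self.addHost('h2')
            self.addLink(h1, s1)
            self.addLink(h2, s2)
            self.addLink(s1, s2, bw=100, delay='10ms')  # emulated WAN hop

    if __name__ == '__main__':
        net = Mininet(topo=InterDCTopo(), link=TCLink)
        net.start()
        net.pingAll()   # quick connectivity and latency check
        net.stop()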

In terms of protocols, a number of ideas have been generated and are currently in various stages of exploration. These include send-buffer size advertisements (see the sketch below), congestion mitigation by dropping payload while retaining only the TCP headers, and application-agnostic offloading of packet processing into the kernel (UDP/TCP). Work was also done on developing faster and more resilient TCP connections between datacenters, as well as on the client communication and content caching logic in the Accelerator.
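
As a toy illustration of a send-buffer size advertisement, the sketch below reads the local socket's send-buffer size and packs it into a message a sender might ship to its peer. The four-byte framing is invented for this example; the actual protocol extension explored in the project may look quite different.

    import socket
    import struct

    def sndbuf_advertisement(sock):
        """Pack the socket's current send-buffer size as a 4-byte message."""
        size = sock.getsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF)
        return struct.pack('!I', size)  # big-endian, 4 bytes

    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    adv = sndbuf_advertisement(s)
    print(struct.unpack('!I', adv)[0], 'bytes of send buffer advertised')
    s.close()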

A protocol evaluation system (for WP1, WP2, and WP3) was established on network nodes with GPUs and 10GE and 40GE NICs. Baseline measurements for different transport protocols were carried out (the sketch below shows the principle of such a baseline throughput measurement).
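
The principle behind such a baseline measurement can be shown with a self-contained bulk-transfer timing over loopback TCP; real measurements of course run across the physical NICs, and the payload size and transfer volume below are arbitrary choices for the sketch.

    import socket, threading, time

    PAYLOAD = b'x' * (64 * 1024)
    TOTAL = 256 * 1024 * 1024  # move 256 MiB

    def sink(srv):
        """Accept one connection and drain TOTAL bytes."""
        conn, _ = srv.accept()
        received = 0
        while received < TOTAL:
            chunk = conn.recv(1 << 20)
            if not chunk:
                break
            received += len(chunk)
        conn.close()

    srv = socket.create_server(('127.0.0.1', 0))
    port = srv.getsockname()[1]
    t = threading.Thread(target=sink, args=(srv,))
    t.start()

    cli = socket.create_connection(('127.0.0.1', port))
    start = time.perf_counter()
    sent = 0
    while sent < TOTAL:
        sent += cli.send(PAYLOAD)
    elapsed = time.perf_counter() - start
    cli.close()
    t.join()
    srv.close()
    print(f'{TOTAL / elapsed / 1e9:.2f} GB/s over loopback TCP')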

Performance improvements have been investigated in various parts of the networking stack. Activities here include the integration of netmap and the kernel TCP stack, fast switching in software, a PCIe DMA engine, speed mismatches between communicating virtual machines, and multi-core processing of packets. On a network-wide scope, an SDN controller for initiating and placing the optimal number of multipath subflows has been implemented (the sketch below illustrates the underlying placement decision).
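
The core placement decision can be illustrated with a toy graph computation: the number of edge-disjoint paths between two switches bounds how many independent subflows are worth creating. The topology below is a made-up four-switch example using the networkx library, not the controller's actual code.

    import networkx as nx

    # Toy fabric: two edge switches (s1, s4) joined via two spines (s2, s3).
    dc = nx.Graph()
    dc.add_edges_from([('h1', 's1'), ('s1', 's2'), ('s1', 's3'),
                       ('s2', 's4'), ('s3', 's4'), ('s4', 'h2')])

    paths = list(nx.edge_disjoint_paths(dc, 's1', 's4'))
    print(f'{len(paths)} disjoint paths -> create {len(paths)} MPTCP subflows')
    for p in paths:
        print(' -> '.join(p))  # each path would host one subflow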

Work has progressed on refining ideas and implementations for improving intra-cloud network performance. Various testbeds are being set up in order to evaluate the impact of the proposed mechanisms through the scenarios managed by WP4.

Work has been performed on modelling the network resources and topology, including the definition of the scenarios and requirements to be deployed with the given model.

Work was also performed on initial bandwidth and latency measurements on several NUMA systems, along with defining the key parameters to optimize in server-client communication. Instrumentation for performance measurements is progressing, covering PCI, network drivers, and software switches. Latency measurements of the networking stack and within unikernels have progressed well (a minimal RTT measurement sketch follows below). Work on improving the efficiency of the networking stack is also advancing, focusing on rules for response creation, the PCI-DMA driver, analysis of speed mismatches, and packet scheduling. StackMap is being integrated with the distributed database use case. Implementation of the NDP protocol continued.
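
A baseline latency measurement of this kind boils down to a timestamped ping-pong; the loopback UDP sketch below shows the principle, with the sample count and percentile choice being arbitrary.

    import socket, statistics, threading, time

    SAMPLES = 1000

    def echo(sock):
        """Echo SAMPLES datagrams back to their sender."""
        for _ in range(SAMPLES):
            data, addr = sock.recvfrom(64)
            sock.sendto(data, addr)

    srv = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    srv.bind(('127.0.0.1', 0))
    threading.Thread(target=echo, args=(srv,), daemon=True).start()

    cli = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    rtts = []
    for _ in range(SAMPLES):
        t0 = time.perf_counter()
        cli.sendto(b'ping', srv.getsockname())
        cli.recvfrom(64)
        rtts.append((time.perf_counter() - t0) * 1e6)  # microseconds

    print(f'median {statistics.median(rtts):.1f} us, '
          f'p99 {statistics.quantiles(rtts, n=100)[98]:.1f} us')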

In terms of policy languages, the partners first assembled knowledge on secure cloud data storage and content distribution. A detailed study of related work was conducted and the requirements for a custom policy language were discussed. The design of the privacy policy language has been finished, a prototype implementation has begun, and major efforts have been invested in submitting the policy work to different venues. Research efforts also covered how applications such as Hyrise-R and cloud frameworks such as OpenStack might implement such policies (a toy policy rule is sketched below). To this end, the virtualized OpenStack testbed is undergoing major revisions to provide a multi-site testbed for further evaluation of policy implementations.
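
The SSICLOPS policy language itself is not reproduced here; the sketch below merely illustrates the kind of constraint such a language has to express, namely restricting where a dataset may be stored in a federation. All rule and site attributes are invented for this example.

    from dataclasses import dataclass

    @dataclass(frozen=True)
    class StorageSite:
        name: str
        jurisdiction: str      # e.g. an ISO country code
        encrypted_at_rest: bool

    @dataclass(frozen=True)
    class PolicyRule:
        allowed_jurisdictions: frozenset
        require_encryption: bool

        def permits(self, site):
            """A site is acceptable if its jurisdiction and encryption
            status satisfy the rule."""
            return (site.jurisdiction in self.allowed_jurisdictions
                    and (site.encrypted_at_rest or not self.require_encryption))

    rule = PolicyRule(frozenset({'DE', 'FI', 'IT'}), require_encryption=True)
    sites = [StorageSite('eu-site', 'DE', True),
             StorageSite('us-site', 'US', True)]
    print([s.name for s in sites if rule.permits(s)])  # ['eu-site']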

With regard to secure storage and processing, related work has been collected and opportunities for approaching this topic have been identified from a wide range of different approaches. Furthermore, a multipath protocol has been investigated with the goal of impeding man-in-the-middle attacks (the sketch below illustrates the underlying idea).
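
The intuition for why multipath transport impedes a man-in-the-middle can be shown with a toy XOR split: each path carries one share of a secret, so a single on-path attacker learns nothing useful. The protocol investigated in the project is more involved; this is only the underlying idea.

    import os

    def split_two_ways(secret):
        """Split a secret into two XOR shares, one per network path."""
        share_a = os.urandom(len(secret))                        # path A
        share_b = bytes(x ^ y for x, y in zip(secret, share_a))  # path B
        return share_a, share_b

    def recombine(share_a, share_b):
        """Only the receiver, which sees both paths, can reconstruct."""
        return bytes(x ^ y for x, y in zip(share_a, share_b))

    a, b = split_two_ways(b'session-key-material')
    assert recombine(a, b) == b'session-key-material'
    assert a != b'session-key-material'  # one share alone reveals nothing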

An OpenStack testbed is being built at CERN and work is being performed in a variety of areas.

Progress beyond the state of the art and expected potential impact (including the socio-economic impact and the wider societal implications of the project so far)

In line with Section 2.1.3 Ambition of the DoA, SSICLOPS is making progress beyond the state of the art in the following areas:
• Cloud infrastructure (commodity hypervisors and operating systems, to allow easy optimization for specific applications, incorporate data security as a fundamental building block, and allow federations of private clouds to ensure scalability).
• Networking and Operating System Infrastructure (datacenter networks, transport protocols, sharing datacenter networks, high-speed packet processing).
• Distributed Computing Platforms (elastic cloud storage systems, running in-memory database management systems on cloud storage, high-performance computing in clouds).
• Securing data in clouds and across clouds.
• Inter-cloud communication and computation.

Progress beyond the state of the art and expected potential impact as reported by each partner are as follows:

UNIPI (Università di Pisa) has extended its netmap-based solutions for fast network I/O in VMs. A very fast link emulation has been implemented on top of netmap. An extensive model of I/O interactions in VMs has been studied, explaining several previously puzzling performance problems. Work has been done on the definition and prototype implementation of a novel packet scheduler architecture achieving high scalability through the separation of a centralized scheduler from parallelized packet I/O.

NEC (Nippon Electric Company) implemented an SDN controller which can initiate creation of the optimal number of MPTCP subflows and can place them optimally in the data centre. For inter-data centre connections, NEC developed an MPTCP proxy which splits up regular TCP connections into multiple MPTCP subflows in order to increase resilience and throughput. In addition, NEC released two pieces of software as open source: a high-speed software switch (mSwitch) and a CDN simulation system which can simulate the Internet’s AS-level topology.

AALTO (AALTO-KORKEAKOULUSAATIO) has developed an inter-cloud federation agent in an OpenStack environment that transparently and seamlessly interconnects the networks of different cloud systems. The agent is based on Software Defined Networking techniques and supports multipath operation for improved robustness. The platform enables highly available services across the different cloud domains and provides solutions for optimising the cost and performance of cloud services.

UH/CERN's (Helsingin Yliopisto) High-Energy Physics (HEP) scenario was examined to determine the effect of network latency and throughput on resource usage and processing time. Both higher latency and lower throughput increased job energy usage and processing time. The Layer 2 interconnectivity developed for OpenStack instances makes it possible to run HEP jobs in separate OpenStack instances. In this way, HEP Virtual Machines (VMs) can connect directly to each other's resources even though they are running behind strict corporate firewalls.

HPI (Hasso Plattner Institut) is investigating resource management ranging from core to cloud and has presented workload placement strategies for heterogeneous hardware and NUMA architectures on the intra-system level. On the inter-system level, replication mechanisms for the Hyrise in-memory database have been presented, enabling scale-out capabilities in cloud setups. Furthermore, HPI investigated implementation strategies for integrating policy concepts into a comprehensive use case built on top of OpenStack and Hyrise.

UCAM (University of Cambridge) focuses its work on reconfigurable hardware to support better instrumentation. As a first step, it released the first NetFPGA-SUME codebase, which serves as the building block for future designs. Progress was also made on testing, fixing, and integrating the new hardware DMA (Direct Memory Access) engine for the NetFPGA. In addition, UCAM worked on frameworks to generate synthetic OpenFlow rules, to assess PCIe performance, and to enable the e
