Community Research and Development Information Service - CORDIS

H2020

SSICLOPS Report Summary

Project ID: 644866
Funded under: H2020-EU.2.1.1.3.

Periodic Reporting for period 1 - SSICLOPS (Scalable and Secure Infrastructures for Cloud Operations)

Reporting period: 2015-02-01 to 2016-07-31

Summary of the context and overall objectives of the project

Context
Over the past decade, IT workloads have increasingly migrated to “cloud” infrastructures, i.e., homogeneous compute fabrics built from commodity servers, interconnected by Ethernet fabrics and supported by NAS and SAN storage backends, managed by control software such as OpenStack and Eucalyptus. Such cloud infrastructure exists in two flavors, public clouds and private clouds. Public cloud infrastructure is provided by global “hyper-giants” such as Amazon (EC2), Microsoft (Azure) and Google, but also by more regional providers such as major telecom operators. In addition to those cloud service providers, most enterprises and other large organizations prefer to run critical workloads on private cloud infrastructure in their own datacenters.

However, these companies find it hard to match the performance of the large-scale cloud providers at the same cost: they build their clouds from commodity components and open-source systems, whereas the large providers deploy custom solutions and leverage economies of scale. Moreover, because of their smaller operations, enterprises running private clouds lack the resources to scale quickly on demand, unless they reach out to the very public cloud providers they are trying to avoid in the first place.

The SSICLOPS project puts these companies into a better position and reduces the performance gap, offering a unique opportunity for European manufacturers and service providers to supply the market with this urgently needed technology.

Objectives
The overall objective of SSICLOPS is to empower enterprises to create and operate high-performance private cloud infrastructure that allows flexible scaling through federation with other private clouds without compromising their service level and security. The SSICLOPS federation supports the efficient integration of clouds, whether they are geographically co-located or distributed and whether they belong to the same or different administrative entities or jurisdictions: in all cases, SSICLOPS delivers maximum performance for inter-cloud communication, enforces legal and security constraints, and minimizes the overall resource consumption. In such a federation, individual enterprises will be able to dynamically scale their private cloud services in and out: they offer their own spare resources when available and take in resources from others when needed. This maximizes each federation member's infrastructure utilization while minimizing its excess capacity needs.
To realize this vision, SSICLOPS targets the following concrete objectives:
1. To build a framework for on-demand and pro-active scale-in/out in private clouds that supports enterprises in matching highly variable service demands without compromising service quality, while maximizing infrastructure utilization and minimizing excess capacity needs for private cloud providers. This framework encompasses:
a. a control plane for scheduling and migrating workloads within a cloud (intra-cloud) as well as across federated clouds (inter-cloud) per objectives 2 and 3;
b. a dataplane for efficient and secure data transport within a cloud, across different clouds, and towards the end user per objectives 4 and 5; and
c. tools for supporting application development such as different programming abstractions and for performance monitoring of applications and cloud infrastructure per objectives 6 and 7.
2. To provide models characterizing the static and dynamic properties of workloads and topologies of federated clouds. The static workload properties include the known resource demands (CPU, memory/storage, data), while the dynamic ones extend to the runtime footprint (memory, temporary storage). Using these properties, SSICLOPS will develop (a) workload scheduling algorithms for utility-driven workload placement as well as (b) mechanisms for workload migration including determining which workloads should be migrated and when migration should take place.
3. To provide means for specifying constraints for workloads and their data in terms of security requirements, geographic restrictions, and other properties that will be adhered to when processing, storing, and exchanging data in the private cloud. SSICLOPS will define a language to annotate workloads with metadata that describe additional requirements for data processing, such as the jurisdictions in which data may be stored and processed. These metadata will be interpreted by the SSICLOPS federation and scheduling infrastructure to ensure that workload placement and migration decisions are in line with the specified constraints (see the illustrative sketch after this list).
4. To develop an efficient and secure intra-cloud dataplane by combining specialized protocol design with smart interaction with the underlying datacenter network fabric (e.g., using SDN traffic engineering and cloud-tailored smart queue management).
5. To develop a hardened dataplane tailored to inter-cloud transport and transport towards clients, along with supportive mechanisms in the network infrastructure. These mechanisms will differ from those developed for objective 4, for a number of reasons. First, latencies are orders of magnitude higher and capacity may be much less predictable. Second, the communication takes place across the Internet, so the traffic must be “friendly” to other Internet traffic. Third, the protocols must be compatible with the expectations of routers, firewalls and other devices on the path so as not to be dropped. Fourth, the control mechanisms (if any) for traffic engineering between datacenters are less elaborate. Fifth, the open network traversed exposes a broader spectrum of adversaries to defend against.
6. To provide different application programming interfaces (APIs) to allow newly developed applications to maximize networking performance while preserving backward compatibility so that existing applications can continue operation as is (without the need for recompiling).
7. To offer tools for measuring the performance of cloud systems, both for optimization at system design time and for monitoring during live system operation. In particular, we are investigating software instrumentation techniques and integration with CPU performance counter facilities, developing new bus monitoring approaches, adding new instrumentation to NICs and switches, and developing distributed coordination and analysis approaches.
8. To validate the SSICLOPS results in four different scenarios with diverse workloads and system setups taken from, or derived from, the real world. The scenarios have been chosen to reflect a broad spectrum of industry and scientific use cases:
a. Using an in-memory database system for processing large industrial real-world data volumes at HPI.
b. Performing high-performance computing tasks for high-energy physics workloads using data from the CERN Large-Hadron Collider (LHC) at HIP.
c. Instantiating content delivery and caching networks for service providers to flexibly support cloud services towards end users and for cloud-internal processing at Deutsche Telekom and F-Secure.
d. Network Function Virtualization for providing network functions, service enablers and services at a next-generation point of presence at Orange Polska.
9. To contribute the project results as open source to the standard platforms used by the private cloud providers to allow for easy and broad adoption of the project results.
10. To contribute the technical designs to the appropriate standardization bodies and industry forums to ensure a lasting consensus. The SSICLOPS partners serve in many different leadership positions in the IETF and IRTF as well as in industry forums such as OpenStack.
11. To carry the project results into the (academic) community through publications in renowned venues and by introducing the concepts, tools and open source systems broadly into the academic education of future engineers and scientists.
12. To demonstrate the commercial value of the project results with a commercial industry evaluation.
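
As an illustration of the workload annotations of objective 3, the following minimal sketch shows how policy metadata might be attached to a workload and checked at placement time. The schema, field names and the is_placement_allowed helper are hypothetical and do not represent the actual SSICLOPS policy language.

```python
# Hypothetical sketch of workload policy annotations and a placement check.
# Field names and semantics are illustrative only; the actual SSICLOPS
# policy language is defined by the project.
from dataclasses import dataclass, field


@dataclass
class WorkloadPolicy:
    allowed_jurisdictions: set = field(default_factory=set)  # e.g. {"EU", "DE"}
    encryption_required: bool = True       # data at rest must be encrypted
    max_replicas: int = 2                  # limit data copies across clouds


@dataclass
class CloudSite:
    name: str
    jurisdiction: str                      # jurisdiction the site operates in
    supports_encryption: bool


def is_placement_allowed(policy: WorkloadPolicy, site: CloudSite) -> bool:
    """Return True if a federated scheduler may place the workload on 'site'."""
    if site.jurisdiction not in policy.allowed_jurisdictions:
        return False                       # geographic restriction violated
    if policy.encryption_required and not site.supports_encryption:
        return False                       # security requirement violated
    return True


# Example: a workload whose data may only be stored and processed in the EU.
policy = WorkloadPolicy(allowed_jurisdictions={"EU"})
print(is_placement_allowed(policy, CloudSite("partner-cloud", "EU", True)))   # True
print(is_placement_allowed(policy, CloudSite("overseas-cloud", "US", True)))  # False
```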

Work performed from the beginning of the project to the end of the period covered by the report and main results achieved so far

From the beginning of the project, work has progressed on the protocols and algorithms, driven by the use cases and their requirements regarding the needed infrastructure. All tasks on the use case scenarios, the testing, simulation and experimentation facilities, and the evaluation instrumentation and measurements are in progress, with detailed descriptions of the use cases provided by partners. A common template is used to unify the use case descriptions. In particular, Mininet topologies, htsim environments, a CDN simulator, an instrumented storage system, and an OpenStack+SDN testbed have been established. Initial training on OpenStack was given by HPI.

In terms of protocols, a number of ideas have been generated and are currently in various stages of exploration. These include send-buffer size advertisements, congestion mitigation by dropping payloads and retaining TCP headers only, and application-agnostic offloading of packet processing into the kernel (UDP/TCP). Work was also done on developing faster and more resilient TCP connections between datacenters, and on the client communication and content caching logic in the Accelerator.

A protocol evaluation system (for WP1, WP2, and WP3) was established with 10GE and 40GE NICs on network nodes equipped with GPUs. Baseline measurements for different transport protocols were carried out.

Performance improvements have been investigated at various points in the networking stack. Activities here include the integration of netmap and the kernel TCP stack, fast switching in software, a PCIe DMA engine, speed mismatches between communicating virtual machines, and multi-core processing of packets. On a network-wide scope, an SDN controller for initiating and placing the optimal number of multipath subflows has been implemented.

Work has progressed on refining ideas and implementations for improving intra-cloud network performance. Various testbeds are being set up in order to evaluate the impact of the proposed mechanisms through the scenarios managed by WP4.

Work has been performed on the modelling of the network resources and topology including the definition of scenarios and requirements to be deployed with the given model.

Work was also performed on initial bandwidth and latency measurements on several NUMA systems, along with defining key parameters to optimize in server-client communication. Instrumentation for performance measurements is progressing, covering PCI, network drivers, and software switches. Latency measurements of the networking stack and within unikernels have progressed well. Work on improving the efficiency of the networking stack, focusing on rules for response creation, the PCI DMA driver, the analysis of speed mismatches, and packet scheduling, is progressing well. StackMap is being integrated with the distributed database use case. Implementation of the NDP protocol continued.

In terms of policy languages, partners first assembled knowledge on secure cloud data storage and content distribution. A detailed study of related work was made and the requirements for a custom policy language were discussed. The design of the privacy policy language has been finished. Prototype implementation has begun, and major efforts have been invested into submissions of policy work to different venues. Research efforts also covered how applications like Hyrise-R and cloud frameworks like OpenStack might implement such policies. To this end, the virtualized OpenStack testbed is undergoing major revisions in order to provide a multi-site testbed for further evaluation of policy implementations.

With regard to secure storage and processing, related work has been collected and a wide range of opportunities for approaching this topic has been identified. Furthermore, a multipath protocol has been investigated with the goal of impeding man-in-the-middle attacks.

An OpenStack testbed is being built at CERN and work is being performed in a variety of areas, including analyzing current solutions, installation automation, accelerator design and implementation, client interfaces, scheduling of instances, developing and running networking tests, collecting performance metrics, and improving resource utilization. Measurements of sample traces of end-to-end delays have been taken, and the eager replication approach and an elastic Docker cluster for Hyrise-R have been implemented.

The kernel offloading testbed has been extended and upgraded to 10Gbps. Better test automation and evaluation has been implemented with further investigation to follow.

A test environment for High Energy Physics (HEP) computing tools and for benchmarking different computing scenarios has also been set up and an instance of a private cloud for further development of the vCPE use case has been deployed.
The federated testbed has become operational; an initial implementation of the secure multipath communication protocol exists, and further work focuses on proxies and client-facing traffic acceleration.

The first results for the solutions developed by the partners in the SSICLOPS project have been presented and evaluated. The improvements in addressing the problems and bottlenecks identified by SSICLOPS are discussed in detail, and conclusions are drawn. Deliverable D4.2 was submitted to the EU.

All of the scheduled deliverables and milestones have been completed.

Management Status
The project started with a successful kick-off meeting held on 11th-13th February 2015 at NEC in Heidelberg. Subsequent combined meetings of the Project Management Board (PMB), General Assembly (GA) and Plenary took place as follows:
• At Aalto in Helsinki, 3rd-5th June 2015.
• At NetApp in Munich, 23rd-25th November 2015.
• At UCAM in Cambridge, 21st-23rd March 2016.

PMB teleconference calls are held on the last Friday of each month.
WP teleconference calls are also held on a monthly basis (more frequently when deliverables are due).
An interim project technical review took place on 19th November 2015 with the PO in Brussels.
Contract Amendment 1 was approved on 15th October 2015 to take account of the unforeseen sale of F-Secure's personal cloud business to Synchronoss Technologies, which led to a re-distribution of their resources. F-Secure was no longer able to support the use case on efficient and secure cloud bursting, and Deutsche Telekom agreed to provide a new use case on CDN caching. Also, the University of Pisa joined the project as a new partner in order to replace some cloud competence lost from F-Secure.

At the same time, the following unrelated changes were requested and accepted:
• Joerg Ott took PMs and budget with him from Aalto University to the Technical University of Munich (TUM), as a result of his new appointment. TUM became a new partner in the project.
• The University of Cambridge streamlined their allocation of PMs into fewer WPs.
• NetApp shifted 180K from Consumables (for a 1Gbps access line to their premises) into Equipment (testbed), and used the high-capacity link to TUM instead.
• Every partner reduced their travel budget by approximately 5% in order to allocate €26K travel money for the Advisory Board members.

Contract Amendment 2 was approved on 2nd March 2016. This Amendment confirms that SSICLOPS opts out of the Open Research Data Pilot and corrects the University of Helsinki’s (HIP) method of claiming personnel costs (average, not actual).
The list of Advisory Board members was refined, and they were invited to a first Advisory Board meeting co-located with SIGCOMM in London (17th-21st August 2015), where some current papers from SSICLOPS were presented.

Dissemination Status

As part of the dissemination activities, the SSICLOPS partners have been successful in presenting multiple papers at prestigious top-level international conferences, workshops and scientific journals, giving invited talks, and producing many Internet-Drafts. There have been 17 standards contributions, the organisation of 3 workshops (including the Dagstuhl 16012 seminar and a Dagstuhl workshop), 2 press releases, 23 participations in conferences or workshops, 9 invited talks, 2 open source code releases, 1 bachelor thesis, 2 conference posters, 1 Twitter social media account and 1 project website. There are 17 future disseminations which have been or will be submitted to conferences and scientific journals.

Additionally, two pieces of software have been released to the public. An accelerated software switch (called “mSwitch”, http://cnp.neclab.eu/vale) is available as part of the netmap/VALE software distribution at https://github.com/luigirizzo/netmap. The CDN simulator code has been released at https://github.com/cnplab/cdnsim.

Progress beyond the state of the art and expected potential impact (including the socio-economic impact and the wider societal implications of the project so far)

"In line with Section 2.1.3 Ambition of the DoA, SSICLOPS is making progress beyond the state of the art in the following areas:
• Cloud platforms (hypervisors and commodity operating systems that allow easy optimization for specific applications, incorporate data security as a fundamental building block, and allow federations of private clouds to ensure scalability).
• Networking and Operating System Infrastructure (Datacenter networks, Transport Protocols, Sharing datacenter networks, High-speed packet processing).
• Distributed Computing Platforms (Elastic Cloud Storage Systems, Running In-Memory Database Management Systems on Cloud Storage, High performance computing in clouds).
• Securing data in clouds and across clouds.
• Inter-cloud communication and computation.

Progress beyond the state of the art and the expected potential impact as reported by each partner are as follows:

UNIPI (Università di Pisa) has extended its netmap-based solutions for fast network I/O in VMs. A very fast link emulation has been implemented on top of netmap. An extensive model of I/O interactions in VMs has been studied, explaining several previously puzzling performance problems. Work has been done on the definition and prototype implementation of a novel packet scheduler architecture that achieves high scalability by separating a centralized scheduler from parallelized packet I/O.
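
The following toy sketch illustrates this architectural idea of separating a centralized scheduler from parallelized packet I/O. The queue names, round-robin policy and threading structure are illustrative assumptions, not UNIPI's actual implementation.

```python
# Toy sketch: one scheduler thread decides *which* flow queue may transmit
# next (here: plain round-robin), while several I/O workers perform the
# actual (simulated) packet transmission in parallel.
import queue
import threading
import time

flow_queues = {name: queue.Queue() for name in ("flow-a", "flow-b", "flow-c")}
grants = queue.Queue()   # scheduler -> workers: "transmit this packet of flow X"

def scheduler():
    # Centralized decision logic; no packet I/O happens here.
    while True:
        for name, q in flow_queues.items():
            try:
                pkt = q.get_nowait()        # pick next packet, round-robin order
            except queue.Empty:
                continue
            grants.put((name, pkt))         # hand it to a worker for transmission
        time.sleep(0.001)                   # avoid busy-spinning in this toy

def io_worker(worker_id: int):
    # Parallelized packet I/O: workers only execute the scheduler's grants.
    while True:
        name, pkt = grants.get()
        print(f"worker {worker_id} sent {pkt} from {name}")

threading.Thread(target=scheduler, daemon=True).start()
for i in range(2):
    threading.Thread(target=io_worker, args=(i,), daemon=True).start()

for i in range(3):                          # enqueue a few dummy packets per flow
    for name in flow_queues:
        flow_queues[name].put(f"pkt-{i}")
time.sleep(0.2)                             # let the toy drain before exiting
```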

NEC (Nippon Electric Company) implemented an SDN controller which can initiate creation of the optimal number of MPTCP subflows and can place them optimally in the data centre. For inter-data centre connections, NEC developed an MPTCP proxy which splits up regular TCP connections into multiple MPTCP subflows in order to increase resilience and throughput. In addition, NEC released two pieces of software as open source: a high-speed software switch (mSwitch) and a CDN simulation system which can simulate the Internet’s AS-level topology.

AALTO (AALTO-KORKEAKOULUSAATIO) has developed an inter-cloud federation agent in an OpenStack environment that transparently and seamlessly interconnects the networks between different cloud systems. The inter-cloud federation agent is based on Software Defined Networking techniques and supports multipath operation to provide improved robustness. The platform enables high availability of services by utilising the different cloud domains, and offers solutions for optimising the cost and performance of cloud services.

UH/CERN's (Helsingin Yliopisto) High-Energy Physics (HEP) scenario was examined for the effect of latency and throughput on resource usage and processing time. Both longer latency and lower throughput in the network increased job energy usage and processing time. The Layer 2 interconnectivity developed for OpenStack instances makes it possible to run HEP jobs in separate OpenStack instances. In this way, HEP Virtual Machines (VMs) can connect directly to each other's resources even though they are running behind strict corporate firewalls.

HPI (Hasso Plattner Institut) is investigating resource management ranging from core to cloud and has presented workload placement strategies for heterogeneous hardware and NUMA architectures at the intra-system level. At the inter-system level, replication mechanisms for the Hyrise in-memory database have been presented, enabling scale-out capabilities in cloud setups. Furthermore, HPI investigated implementation strategies for integrating policy concepts into a comprehensive use case built on top of OpenStack and Hyrise.

UCAM (University of Cambridge) focuses its work on reconfigurable hardware to support better instrumentation. As a first step, it released the first NetFPGA-SUME codebase, which serves as the building block for future designs. Progress was also made on the testing, fixing and integration of the new hardware DMA (Direct Memory Access) engine for the NetFPGA. In addition, UCAM worked on frameworks to generate synthetic OpenFlow rules, to assess PCIe performance, and to enable the execution of high-level descriptions (written in C#) of network services on reconfigurable FPGA hardware. Finally, UCAM studied the many contributors to network latency, providing a breakdown of end-to-end latency from the application level down to the wire.

TDG's (Telekom Deutschland GmbH) work on the efficient usage of caches for content delivery via clouds and CDNs proposes new caching methods with low update effort relative to the standard Least Recently Used (LRU) strategy. Assuming independent Zipf-distributed requests from large user communities, TDG confirms absolute LRU hit-rate deficits of 10-20% compared to the proposed score-based strategies, which exploit statistics on past requests. The results are being extended to traces of request patterns on popular web platforms, including Wikipedia.
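
The qualitative effect can be reproduced with a small simulation. The sketch below compares LRU with a simple score-based cache (using a plain request counter as the score) under Zipf-distributed requests; all parameters and the scoring function are assumptions for illustration, not TDG's actual method.

```python
# Toy comparison of LRU vs. a score-based cache under Zipf requests.
# The score here is a plain request counter; TDG's actual score-based
# strategies and parameters may differ.
import random
from collections import Counter, OrderedDict

N_OBJECTS, CACHE_SIZE, N_REQUESTS, ALPHA = 10_000, 100, 100_000, 0.8

# Zipf popularity: P(rank k) ~ k^-alpha
weights = [k ** -ALPHA for k in range(1, N_OBJECTS + 1)]
requests = random.choices(range(N_OBJECTS), weights=weights, k=N_REQUESTS)

def lru_hits(reqs):
    cache, hits = OrderedDict(), 0
    for obj in reqs:
        if obj in cache:
            hits += 1
            cache.move_to_end(obj)            # refresh recency
        else:
            if len(cache) >= CACHE_SIZE:
                cache.popitem(last=False)     # evict least recently used
            cache[obj] = True
    return hits

def score_hits(reqs):
    counts, cache, hits = Counter(), set(), 0
    for obj in reqs:
        counts[obj] += 1                      # statistics on past requests
        if obj in cache:
            hits += 1
        elif len(cache) < CACHE_SIZE:
            cache.add(obj)
        else:
            victim = min(cache, key=counts.__getitem__)
            if counts[obj] > counts[victim]:  # admit only higher-scored objects
                cache.remove(victim)
                cache.add(obj)
    return hits

print(f"LRU hit rate:   {lru_hits(requests) / N_REQUESTS:.3f}")
print(f"Score hit rate: {score_hits(requests) / N_REQUESTS:.3f}")
```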

NetApp (Network Appliance) has been focusing on reducing the communication latency of transactional workloads through a combination of approaches that target commodity OSs. These include soft-switch improvements (mSwitch), a new low-latency communications framework (StackMap), an efficient approach for scaling out over TCP (Prism), and an investigation into how NVM can further reduce latencies. NetApp is also collaborating with HPI and other partners to incorporate those advances into the prototypes.

RWTH (Rheinisch-Westfälische Technische Hochschule Aachen) has worked on the application-agnostic packet processing engine, which was implemented in the Linux kernel, allowing faster responses to common requests by providing generic match and response composition capabilities to applications. HTTP (TCP) and DNS (UDP) use cases have shown speedups of up to 150% and 450%, respectively. The policy language design focuses on applicability in general scenarios, extensibility to future developments, support of provider expectations, efficient processing, and a small memory footprint to enable fine-grained privacy policies as data annotations.
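
The match-and-respond principle can be illustrated with a userspace toy: applications register patterns with precomposed responses, and matching requests are answered without further application involvement. The table format and server below are illustrative assumptions; the actual RWTH engine operates inside the Linux kernel.

```python
# Userspace toy of a generic match-and-respond table: common requests are
# answered from (pattern -> precomposed response) rules without invoking
# the application's own logic. Purely illustrative; the real engine is
# implemented in the Linux kernel.
import socket

RESPONSE_TABLE = {
    b"GET /ping": b"HTTP/1.1 200 OK\r\nContent-Length: 4\r\n\r\npong",
}

def serve(host="127.0.0.1", port=8080):
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    srv.bind((host, port))
    srv.listen()
    while True:
        conn, _ = srv.accept()
        with conn:
            data = conn.recv(4096)
            for pattern, response in RESPONSE_TABLE.items():
                if data.startswith(pattern):      # generic match ...
                    conn.sendall(response)        # ... and canned response
                    break
            else:
                conn.sendall(b"HTTP/1.1 404 Not Found\r\n\r\n")

if __name__ == "__main__":
    serve()
```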

OPL (Orange Polska) is developing a virtual Home Gateway (vHGW). The idea of an NFV-based vHGW is that most of the traditional advanced HGW logic is shifted from the physical device to network resources (e.g., commercial off-the-shelf servers). The SSICLOPS vHGW implementation is based on open source components, leveraging OpenStack software, Open vSwitch instances, native Linux networking mechanisms (e.g., network namespaces), and VxLAN tunneling.

FSC's (F-Secure) key outcome has been its approaches to content distribution and caching for use cases where the most frequently accessed item set is highly dynamic and the clients require low response latency. A caching content distribution server was developed, and its performance was validated in production use by F-Secure's mobile security clients. The approaches and observations based on the validation results contribute to the best practices for content distribution in similar use cases, and the implementation will serve as a testbed for future optimization of caching strategies.

UPB (Universitatea Politehnica din Bucuresti) has worked in three main directions in the project. The first direction explores the use of multipath communication to provide improved security without PKI support (secure multipath communication). Here, the state of the art has looked at using opportunistic encryption on a single path (tcpcrypt) to make eavesdropping much harder by forcing attackers to mount active man-in-the-middle attacks. UPB takes a complementary approach and asks what can be done when multiple communication paths are available, each path with its own attacker. This problem has not been explored in the literature until now. Specifically, in the context of two paths, UPB shows that when the two attackers mount active man-in-the-middle attacks but do not communicate, the endpoints can either establish a secure key or detect the attack, and it provides a protocol that achieves this goal.
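
The core intuition behind the two-path setting can be conveyed with a didactic secret-sharing toy: a key is split into two XOR shares sent over disjoint paths, so an attacker controlling only one path learns nothing, and a cross-path confirmation hash exposes tampering. This sketch illustrates the threat model only; it is not the protocol developed by UPB.

```python
# Toy illustration: split a key into two XOR shares sent over disjoint
# paths. An attacker controlling only ONE path learns nothing about the
# key, and tampering is detected via a confirmation hash sent cross-path.
# Didactic sketch only, not the SSICLOPS protocol.
import hashlib
import os

def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

# Sender: split a 16-byte key into two shares.
key = os.urandom(16)
share1 = os.urandom(16)           # sent over path 1
share2 = xor(key, share1)         # sent over path 2; share1 XOR share2 == key

# A hash of the key travels on the *other* path than each share, so a
# single-path attacker cannot consistently forge both share and hash.
confirm = hashlib.sha256(key).digest()

# Receiver: reconstruct and verify.
received_key = xor(share1, share2)
if hashlib.sha256(received_key).digest() == confirm:
    print("key established:", received_key.hex())
else:
    print("tampering detected on at least one path")
```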

The second direction proposes a novel datacenter networking architecture called NDP that has low latency and high utilization as its main goals. The main difference to related work is that NDP changes the way switches handle packets: when buffers are full, packets are trimmed to their headers, which are then priority-forwarded to the destination. Leveraging this novel feature of the networking fabric, UPB has developed a protocol in which the sender starts aggressively (sending at line rate in the first RTT) and the traffic is then paced by the receiver, which sends paced pull packets to all its senders such that there is no overload after the first round-trip time. In collaboration with UCAM, the proposed switch design has been implemented in hardware on a NetFPGA platform.
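
A sketch of the switch-side trimming behaviour follows. The queue depth, packet format and priority rule are illustrative assumptions; the real NDP design additionally covers receiver-driven pull pacing and the hardware implementation.

```python
# Toy model of NDP-style packet trimming at a switch output port: when the
# data queue is full, the payload is dropped but the header is kept and
# forwarded with priority, so the receiver learns of the loss immediately
# and can pull a retransmission. Sizes and structures are illustrative.
from collections import deque

DATA_QUEUE_LIMIT = 8                      # shallow buffer, as NDP intends

class Switch:
    def __init__(self):
        self.data_queue = deque()
        self.header_queue = deque()       # served with strict priority

    def enqueue(self, packet: dict):
        if len(self.data_queue) < DATA_QUEUE_LIMIT:
            self.data_queue.append(packet)
        else:
            trimmed = {**packet, "payload": None, "trimmed": True}
            self.header_queue.append(trimmed)   # trim, don't drop silently

    def dequeue(self):
        # Headers first: trimmed headers reach the receiver ahead of data,
        # enabling immediate pull/retransmit requests.
        if self.header_queue:
            return self.header_queue.popleft()
        return self.data_queue.popleft() if self.data_queue else None

sw = Switch()
for seq in range(12):                     # 12 packets into an 8-packet queue
    sw.enqueue({"seq": seq, "payload": b"x" * 1500})
out = [sw.dequeue() for _ in range(12)]
print([p["seq"] for p in out if p.get("trimmed")])   # seqs 8..11 arrive trimmed
```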

In the third direction, UPB has worked on changing the API between the cloud provider and the tenant, in CloudTalk (in submission) and TCP sendbuffer advertising (HotCloud 2015). Both works are based on the observation that there is very little communication about tenant needs or cloud provider resource availability, and they propose novel APIs to enable cloud-tenant optimisations that are otherwise not possible. In CloudTalk, the tenant specifies a few possible ways to perform its work and the cloud provider suggests the best approach; in TCP sendbuffer advertising, all flows carry the number of bytes remaining in the send buffer in every packet, which the cloud network can use to optimise various aspects.
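
As one illustration of how a network element might exploit such advertisements, the toy below prioritizes flows by the advertised bytes remaining in their send buffers (a shortest-remaining-first policy, assumed here for illustration rather than taken from the paper):

```python
# Toy use of TCP sendbuffer advertising: every packet carries the bytes
# still waiting in its sender's buffer, and a network element serves flows
# with the least remaining data first. This policy is one illustrative use
# of the advertisement, not the only one proposed.
import heapq

class SendbufferScheduler:
    def __init__(self):
        self._heap = []               # (advertised_remaining, seqno, packet)
        self._seq = 0

    def enqueue(self, packet: dict):
        # 'remaining' is the sendbuffer advertisement carried in the packet.
        heapq.heappush(self._heap, (packet["remaining"], self._seq, packet))
        self._seq += 1

    def dequeue(self) -> dict:
        return heapq.heappop(self._heap)[2]

sched = SendbufferScheduler()
sched.enqueue({"flow": "bulk", "remaining": 10_000_000})
sched.enqueue({"flow": "rpc", "remaining": 2_000})
sched.enqueue({"flow": "web", "remaining": 40_000})
print([sched.dequeue()["flow"] for _ in range(3)])   # ['rpc', 'web', 'bulk']
```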

TUM (Technische Universität München) started with a workforce on 1st March 2016 and has used the last five months to build up its environment and complete baseline tests. TUM will gradually move towards technical contributions beyond the state of the art.
