Scalable management Of LArge Scale cloud computing environments using enhanced software-defined networking technologies

Final Report Summary - SOLAS (Scalable management Of LArge Scale cloud computing environments using enhanced software-defined networking technologies)

FP7 Marie Curie Actions, Industry Academia Partnerships and Pathways Project
Grant agreement no: 612480
1 October 2013 - 30 September 2017
http://www.solas-project.eu

The SOLAS project was selected for funding under the FP7 Marie Curie Actions Industry Academia Partnerships and Pathways (IAPP) strand during 2013. The project commenced in October 2013 and ran for four years until September 2017. The project partners were Waterford Institute of Technology (Ireland) [project coordinator], KTH Royal Institute of Technology (Sweden), ETH Zürich (Switzerland), EMC Information Systems International (Ireland) [now known as Dell EMC following the company's acquisition by Dell Technologies] and Amadeus SAS (France). Over its four years, SOLAS involved fourteen researchers who spent periods working on secondment with a partner organisation from a different sector (academic or industry); during these secondment periods they not only collaborated on research topics but also accessed training and facilitated knowledge transfer between academia and industry.

Nowadays computer software plays a central role in the everyday lives of the majority of citizens, be it through applications accessed via mobile phones, computers or TVs, or through voice assistants. In most cases a significant portion of the processing necessary to deliver the appropriate content to the application user takes place in large data centres, accessed via the Internet. The SOLAS project addressed resource management for data centres, which is widely acknowledged as a hard problem due to the scale of the data centres (they may host hundreds of thousands of computers), the heterogeneity of the resource types and their interdependencies, the variability and unpredictability of the demand for application usage, and the range of objectives of the different actors in the business ecosystem. The project focussed on the potential for Software Defined Networking (SDN) to provide a base upon which resource management solutions could be built. SDN usually refers to, and relies on, the decoupling of control plane functionality from data/forwarding plane functionality in a network. Through SDN it is possible to provide a coherent view of available network resources and to make those resources highly programmable, so that the network can adapt better and more swiftly to the changing demands of the applications and services that rely on it for connectivity. Application of SDN in data centre environments is attractive as it offers the potential to significantly improve the monitoring and control of networking resources, which today are extremely difficult to allocate efficiently in data centres and are thus the source of many performance issues.

The initial work of the project centred on exploring how the application of SDN could provide valuable point solutions for resource management problems in data centres. A first example is a solution for the live migration of ensembles of Virtual Machines (and associated Virtual Switches and Virtual Disks) that together deliver an application’s functionality. Noting that migration of virtual disks can take a significant amount of time, SOLAS researchers designed a system that harnessed SDN to manage the cloning of virtual switches at the source and destination so that the cloned switches appeared to the virtual machines as a single logical switch. A prototype live ensemble migration controller was built on top of OpenFlow.
A second example is an SDN-enabled system that seeks to ensure that Quality-of-Service (QoS) targets for applications are met when multiple paths are used to route traffic between virtual machines in a data centre. SOLAS researchers developed and prototyped a latency-aware flow scheduling system that schedules flows based on an application’s QoS target and empirical estimations of the effective bandwidth required to meet that target. In particular, the system incorporates an SDN-based flow scheduler that monitors network changes in switches and dynamically reschedules flows to meet the QoS requirements, whilst balancing link utilisation. Thus, where the traffic profile between the virtual machines realising application functionality changes over time, negative impacts such as increased latency can be avoided or minimised.
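
The placement step of such a scheduler can be illustrated with a minimal Python sketch. It is not the SOLAS prototype: the `Path` class, the `place_flow` function and the rule "choose the feasible path with the lowest resulting utilisation" are assumptions made purely for illustration, and the effective bandwidth of each flow is taken as a given input rather than estimated empirically.

```python
from dataclasses import dataclass


@dataclass
class Path:
    """One of several equal-cost paths between two virtual machines (hypothetical model)."""
    name: str
    capacity_mbps: float
    reserved_mbps: float = 0.0


def place_flow(paths, effective_bw_mbps):
    """Assign a flow to the feasible path that ends up least utilised.

    A path is feasible if it can still carry the flow's effective bandwidth,
    i.e. the bandwidth assumed necessary to meet the flow's QoS target.
    Returns the chosen Path, or None if no path can carry the flow.
    """
    feasible = [
        p for p in paths
        if p.reserved_mbps + effective_bw_mbps <= p.capacity_mbps
    ]
    if not feasible:
        return None
    # Balance link utilisation: minimise utilisation after placement.
    best = min(
        feasible,
        key=lambda p: (p.reserved_mbps + effective_bw_mbps) / p.capacity_mbps,
    )
    best.reserved_mbps += effective_bw_mbps
    return best


if __name__ == "__main__":
    paths = [Path("A", 1000.0), Path("B", 1000.0)]
    for bw in (400.0, 300.0, 500.0, 700.0):
        chosen = place_flow(paths, bw)
        print(bw, chosen.name if chosen else "rejected")
```

In a real controller the same decision would be re-evaluated whenever switch statistics indicate that the traffic profile has changed, which is the dynamic behaviour described above.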

A parallel strand of research explored how data gathered by SOLAS partner Amadeus for its large data centre could be used to gain insights into the performance of the data centre and to assess the potential benefits, if any, of deploying SDN technologies. The first problem considered in this strand was how to measure the globality of the kind of On-Line Transaction Processing (OLTP) workloads handled by enterprise data centres such as the one operated by Amadeus. Globality in this context is, informally, a measure of the degree to which a single transaction utilises resources that are distributed across the data centre network topology; generally low globality will be desirable as it will localise transactions to a particular part of the data centre, making it easier to minimise latency. SOLAS researchers developed a globality measure, with a view towards its use in quantifying the degree to which an SDN-based resource management solution decreases the globality of transactions and thus increases efficiency and performance. Subsequently, SOLAS researchers developed a simulation-based model of the Amadeus data centre which captured the complexity of a representative subset of the transaction types. The model was used to show that a judicious adoption of SDN could provide a means of prioritising transactions in danger of exceeding latency targets, thus minimising violations of QoS targets. A separate piece of work demonstrated an alternative means of monitoring a data centre: SOLAS researchers extended a previous approach called “network search,” which enables searching operational and configuration data in a networked system in real time using keywords and relational operators, to facilitate searching of time series data.
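
To make the informal notion of globality concrete, the following Python sketch computes one possible stand-in for it. The formula is an assumption for illustration only and is not the measure developed in SOLAS: servers are identified by hypothetical `(pod, rack, host)` tuples in a three-tier tree topology, and the globality of a transaction is taken to be the mean pairwise hop distance between the distinct servers it touches.

```python
from itertools import combinations


def hop_distance(a, b):
    """Hop count between two servers given as (pod, rack, host) tuples.

    Assumes a three-tier tree: servers in the same rack communicate via the
    top-of-rack switch, servers in the same pod via an aggregation switch,
    and all others via the core.
    """
    if a == b:
        return 0
    if a[:2] == b[:2]:
        return 2  # same rack
    if a[0] == b[0]:
        return 4  # same pod, different rack
    return 6      # different pods


def globality(servers):
    """Mean pairwise hop distance over the distinct servers of one transaction."""
    distinct = sorted(set(servers))
    if len(distinct) < 2:
        return 0.0
    pairs = list(combinations(distinct, 2))
    return sum(hop_distance(a, b) for a, b in pairs) / len(pairs)


if __name__ == "__main__":
    rack_local = [(0, 0, 1), (0, 0, 2), (0, 0, 3)]
    spread_out = [(0, 0, 1), (0, 1, 1), (1, 0, 1)]
    print(globality(rack_local))   # low: all servers share one rack
    print(globality(spread_out))   # higher: servers span racks and pods
```

Under this illustrative definition, a resource management scheme that co-locates the components used by a transaction lowers the score, which matches the intuition that low globality makes latency easier to control.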

Whilst developing metrics, monitoring approaches and simulation models is very worthwhile, SOLAS also contributed significantly to the development of a novel approach involving the use of queryable online simulation that allows predicting the behaviour of a data centre in hypothetical scenarios. The developed system, Strymon (which has been open-sourced and is available from: http://strymon.systems.ethz.ch/), leverages the existing logging and monitoring pipelines of modern production data centres to ingest cross-layer events in a streaming fashion and predict the possible effects of such events in what-if scenarios. The actual predictions are made online by simulating the hypothetical data centre state alongside the real one. Furthermore, SOLAS researchers built a routing prototype using Strymon that shows a marked improvement in performance over that of state-of-the-art SDN-based routing schemes. The prototype expresses routing as an incremental dataflow computation on a stream of network updates. It proactively runs all-pairs shortest path, instead of reactively computing single-source shortest path per flow request as most SDN controllers do. As a result, Strymon’s module has latency an order of magnitude lower than that of ONOS, a state-of-the-art controller.
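
The difference between proactive and reactive route computation can be sketched in a few lines of Python. This is an illustration under stated assumptions, not Strymon's implementation: it uses the textbook Floyd-Warshall algorithm and recomputes the whole table after a topology change, whereas the prototype described above maintains its results incrementally as a dataflow computation. The function names and the example topology are hypothetical.

```python
import math


def all_pairs_next_hop(nodes, links):
    """Floyd-Warshall over an undirected weighted graph.

    `links` maps (u, v) pairs to link costs. Returns (dist, next_hop),
    where next_hop[u][v] is the neighbour of u on a shortest path to v,
    or None if v is unreachable from u.
    """
    dist = {u: {v: (0.0 if u == v else math.inf) for v in nodes} for u in nodes}
    nxt = {u: {v: (v if u == v else None) for v in nodes} for u in nodes}
    for (u, v), cost in links.items():
        if cost < dist[u][v]:
            dist[u][v] = dist[v][u] = cost
            nxt[u][v], nxt[v][u] = v, u
    for k in nodes:
        for i in nodes:
            for j in nodes:
                if dist[i][k] + dist[k][j] < dist[i][j]:
                    dist[i][j] = dist[i][k] + dist[k][j]
                    nxt[i][j] = nxt[i][k]
    return dist, nxt


def lookup_path(nxt, src, dst):
    """Answer a flow request from the precomputed table, without any graph search."""
    if nxt[src][dst] is None:
        return None
    path = [src]
    while src != dst:
        src = nxt[src][dst]
        path.append(src)
    return path


if __name__ == "__main__":
    nodes = ["s1", "s2", "s3", "s4"]
    links = {("s1", "s2"): 1, ("s2", "s3"): 1, ("s1", "s3"): 5, ("s3", "s4"): 1}
    _, nxt = all_pairs_next_hop(nodes, links)
    print(lookup_path(nxt, "s1", "s4"))
    # A network update (link s2-s3 fails) triggers recomputation before
    # any new flow request arrives.
    del links[("s2", "s3")]
    _, nxt = all_pairs_next_hop(nodes, links)
    print(lookup_path(nxt, "s1", "s4"))
```

Because the table is refreshed when the network changes rather than when a flow arrives, each flow request costs only a table walk along the path, which is the intuition behind the lower request latency reported above.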

The above describes some of the main outputs of the SOLAS project. For further information please contact Dr Brendan Jennings, SOLAS Project Coordinator, Waterford Institute of Technology. Tel: +353 51 302917, Email: bjennings@wit.ie.