Improving Performance and Cost of Content Delivery in a Hyperconnected World

Final Report Summary - CDN-H (Improving Performance and Cost of Content Delivery in a Hyperconnected World)

On the Internet, content was and still is recognized as “the King”. Indeed, the tremendous traffic growth over the last decades has been driven by the continuous and still growing user demand for content, including the Web, videos, file sharing, social networks, games, etc. Moreover, people as well as companies rely on the Internet and its services, and it is hard to imagine what a prolonged Internet outage would do to our information society. In today’s hyperconnected world a substantial fraction of the global population is almost continuously online, e.g. for work, shopping, or entertainment. Thus, the Internet has become the predominant channel for innovation, disruption, and the creation of new revenue streams.

In this project we performed a large-scale measurement-driven study to understand how content flows in the Internet, what the new trends are, and how the insights gained by this study can enhance future Internet architectures. In particular, we structured our study around three main research questions:

1. How is it possible to enable collaboration among the different content delivery stakeholders?

2. What are the trade-offs between peering cost and content delivery performance?

3. Is it possible to push content delivery analytics to the edge or the core of the network?

During the three years of the project we were able to tackle these questions and establish collaborations with a number of universities (MIT, TU Berlin, Yale University, Northwestern University, University Politehnica of Bucharest, Duke University, NTUA, University of Waikato), research institutes (CAIDA), and corporations (Akamai, Niksun, NEC Labs, AT&T Labs).

Our research received two best paper awards: at ACM IMC 2016 (for our work on "New Perspectives on the Active IPv4 Address Space")
and at ACM CoNEXT 2015 (for our work on "Mapping Peering Interconnections to a Facility"). These conferences are among the most competitive in the area of computer networks.

Our results can be summarized as follows:

- Regarding the collaboration among content delivery stakeholders:

(1) We propose IN-NET, an architecture that allows untrusted endpoints as well as content providers to deploy custom in-network processing on platforms owned by network operators. IN-NET relies on static analysis to allow platforms to check whether the requested processing is safe and whether it contradicts the operator's policies. We have implemented IN-NET and tested it in the wide area, supporting a range of use cases that are difficult to deploy today. Our experience shows that IN-NET is secure, scales to many users (thousands of clients on a single inexpensive server), allows for a wide range of functionality, and offers benefits to end-users, network operators and content providers alike. We show how IN-NET can be used to deploy a CDN within an ISP that is operated by third-party CDN operators or end-users.
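
The admission step can be illustrated with a minimal toy sketch in Python; the manifest fields and policy values below are hypothetical, not IN-NET's actual interface:

```python
# Toy sketch of IN-NET's admission idea (fields and policy values are
# hypothetical): statically check a requested processing manifest
# against operator policy before deploying it on the platform.
OPERATOR_POLICY = {
    "max_clients": 10_000,
    "allowed_actions": {"cache", "compress"},
}

def admit(manifest: dict) -> bool:
    """Return True iff the requested in-network processing is policy-safe."""
    if manifest["clients"] > OPERATOR_POLICY["max_clients"]:
        return False
    return set(manifest["actions"]) <= OPERATOR_POLICY["allowed_actions"]

ok = admit({"clients": 2_000, "actions": ["cache"]})             # admitted
bad = admit({"clients": 2_000, "actions": ["rewrite_headers"]})  # rejected
```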

(2) It is well reported that the Internet is flattening and that content delivery networks now peer at many locations and use new types of peering, e.g. multilateral peering. We report on an empirical analysis based on a unique collection of IXP-provided datasets from two different European IXPs that operate a route server and gave us access to a wealth of route server-specific BGP data. Both IXPs also made available the traffic datasets that they routinely collect from their public switching infrastructures. Using this information, we perform a first-of-its-kind study that correlates a detailed control-plane view with a rich data-plane view to reason about the different peering options available at these IXPs and how some of the major Internet players make use of them. In the process, we highlight the important role that the IXPs' route servers play for inter-domain routing in today's Internet and demonstrate the benefits of studying IXP peerings in a manner that is not agnostic to but fully aware of traffic. We conclude with a discussion of some of the ramifications of our findings for both network researchers and operators.

(3) We introduce a novel approach to characterize inter-domain traffic by reusing large, publicly available traceroute datasets. Our approach builds on a simple insight -- the popularity of a route on the Internet can serve as an informative proxy for the volume of traffic it carries. It applies structural analysis to a dual-representation of the AS-level connectivity graph derived from available traceroute datasets. Drawing analogies with city grids and traffic, it adapts data transformations and metrics of route popularity from urban planning to serve as proxies for traffic volume. We call this approach Network Syntax, highlighting the connection to urban planning Space Syntax. We apply Network Syntax in the context of a global ISP and a large Internet eXchange Point and use ground-truth data to demonstrate the strong correlation (r^2 values of up to 0.9) between inter-domain traffic volume and the different proxy metrics. Working with these two network entities, we show the potential of Network Syntax for identifying critical links and inferring missing traffic matrix measurements. With this we can infer the potential of peering between networks and content providers.
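The core idea, route popularity as a proxy for traffic volume, can be sketched on toy data; the AS names and traffic volumes below are made up for illustration:

```python
from collections import Counter

# Hypothetical traceroute-derived AS-level paths (toy data).
paths = [
    ["AS1", "AS2", "AS3"],
    ["AS1", "AS2", "AS4"],
    ["AS5", "AS2", "AS3"],
    ["AS1", "AS2", "AS3"],
]

# Route popularity: how often each AS-level link appears across paths.
popularity = Counter(link for p in paths for link in zip(p, p[1:]))

def r_squared(xs, ys):
    """Coefficient of determination between two equal-length series."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov * cov / (var_x * var_y)

# Hypothetical ground-truth traffic volumes (Mbps) for the same links.
traffic = {("AS1", "AS2"): 300, ("AS2", "AS3"): 310,
           ("AS2", "AS4"): 90, ("AS5", "AS2"): 120}
links = sorted(popularity)
r2 = r_squared([popularity[l] for l in links], [traffic[l] for l in links])
```

A real deployment would derive the popularity metrics from the Space Syntax-style dual representation rather than raw link counts, but the correlation step is the same.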

(4) While the performance characteristics of access networks and end-user-to-server paths are well studied, measuring the performance of the Internet's core remains, largely, uncharted territory. With more content being moved closer to the end-user, server-to-server paths have increased in length and play a significant role in dictating the quality of the services offered by content and service providers. In this paper, we present a large-scale study of the effects of routing changes and congestion on the end-to-end latencies of server-to-server paths in the core of the Internet. We exploit the distributed platform of a large content delivery network, composed of thousands of servers around the globe, to assess the performance characteristics of the Internet's core. We conduct measurement campaigns between thousands of server pairs, in both forward and reverse directions, and analyze the performance characteristics of server-to-server paths over both long durations (months) and short durations (hours). Our analyses show that there is a large variation in the frequency of routing changes. While routing changes typically have marginal or no impact on the end-to-end round-trip times (RTTs), 20% of them impact IPv4 (IPv6) paths by at least 26 ms (31 ms). We highlight how dual-stack servers can be utilized to reduce server-to-server latencies by up to 50 ms. Our results indicate that significant daily oscillations in the end-to-end RTTs of server-to-server paths are not the norm, but do occur, and, in most cases, contribute about a 20 ms increase in server-to-server path latencies.
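
The per-pair analysis can be illustrated with a toy sketch: given RTT samples annotated with a path identifier, a change in the identifier marks a routing change, and the RTT delta across it estimates the impact (all values below are hypothetical):

```python
# Toy RTT samples for one server pair, one per measurement round:
# (path_id, rtt_ms). A change in path_id marks a routing change.
samples = [("p1", 80.2), ("p1", 80.5), ("p2", 106.9),
           ("p2", 107.1), ("p1", 80.1)]

def routing_change_impacts(samples):
    """RTT delta (ms) observed at each routing change."""
    deltas = []
    for (prev_path, prev_rtt), (path, rtt) in zip(samples, samples[1:]):
        if path != prev_path:
            deltas.append(rtt - prev_rtt)
    return deltas

deltas = routing_change_impacts(samples)  # two changes: p1->p2 and p2->p1
```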

- Regarding the investigation of the trade-offs between peering cost and content delivery performance:

(1) We investigate new trends in IXP deployment that have the potential to improve content delivery performance by establishing additional peering points. The recently launched initiative by the Open-IX Association (OIX) to establish the European-style Internet eXchange Point (IXP) model in the US suggests an intriguing strategy to tackle a problem that some Internet stakeholders in the US consider detrimental to their business: a lack of diversity in available peering opportunities. We examine in this paper the cast of Internet stakeholders that are bound to play a critical role in determining the fate of this Open-IX effort. These include the large content and cloud providers, CDNs, Tier-1 ISPs, the well-established and some of the newer commercial datacenter and colocation companies, and the largest IXPs in Europe. In particular, we comment on these parties' current attitudes with respect to public and private peering and discuss some of the economic arguments that will ultimately determine whether the strategy currently pursued by OIX will achieve the main OIX-articulated goal: a more level playing field for private and public peering in the US, such that the actual demand and supply for the different peering opportunities is reflected in the cost structure.

(2) We revisit the question of the application mix in today's Internet and make two main contributions. First, we develop a methodology for classifying the application mix in packet-sampled traces collected at one of the largest IXPs in Europe and worldwide. We show that our method can classify close to 95% of the traffic by relying on a stateful classification approach that uses payload signatures and communication patterns, with port-based classification only as a fallback. Second, our results show that when viewed from this vantage point and aggregated over all the IXP's public peering links, the Internet's application mix is very similar to that reported in other recent studies that relied on different vantage points, peering links or classification methods. However, the observed aggregate application mix is by no means representative of the application mix seen on individual peering links. In fact, we show that the business type of the ASes that are responsible for much of the IXP's total traffic has a strong influence on the application mix of their overall traffic and of the traffic seen on their major peering links. We assess how much of the traffic in the core of the Internet can be attributed to content delivery platforms.
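
The layered fallback idea, payload signatures first and ports only as a last resort, can be sketched as follows; the signatures and port table are illustrative toy values, not our actual classifier:

```python
# Toy sketch of a layered classifier: payload signatures first, then
# (in a real system) stateful communication patterns, with well-known
# ports used only as a fallback.
SIGNATURES = {b"GET ": "web", b"HTTP/": "web", b"\x13BitTorrent": "p2p"}
WELL_KNOWN_PORTS = {80: "web", 443: "web", 25: "mail", 53: "dns"}

def classify(payload: bytes, dst_port: int) -> str:
    for signature, app in SIGNATURES.items():
        if payload.startswith(signature):
            return app
    # A production classifier would check communication patterns here
    # before resorting to the port-based fallback below.
    return WELL_KNOWN_PORTS.get(dst_port, "unknown")

classify(b"GET /index.html HTTP/1.1", 8080)  # "web" via payload signature
classify(b"", 53)                            # "dns" via port fallback
```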

(3) We report on advanced technologies that are used by CDNs to optimize content delivery and that have the potential to enable CDN-ISP collaboration. Although traffic between Web servers and Web browsers is readily apparent to many knowledgeable end users, fewer are aware of the extent of server-to-server Web traffic carried over the public Internet. We refer to the former class of traffic as front-office Internet Web traffic and to the latter as back-office Internet Web traffic (or just front-office and back-office traffic, for short). Back-office traffic, which may or may not be triggered by end-user activity, is essential for today's Web as it supports a number of popular but complex Web services, including large-scale content delivery, social networking, indexing, searching, advertising, and proxy services. This paper takes a first look at back-office traffic, measuring it from various vantage points, including from within ISPs, IXPs, and CDNs. We describe techniques for identifying back-office traffic based on the roles that this traffic plays in the Web ecosystem. Our measurements show that back-office traffic accounts for a significant fraction not only of core Internet traffic, but also of Web transactions in terms of requests and responses. Finally, we discuss the implications and opportunities that the presence of back-office traffic presents for the evolution of the Internet ecosystem.

(4) More and more large content/cloud and service providers are making colocation facilities that house an Internet eXchange Point (IXP) their location of choice for interconnecting with one another, in an effort to shrink the physical as well as network distance between where content resides and where it is consumed. The providers of these facilities and/or the operators of the co-located IXPs are reacting by constantly innovating and expanding their interconnection service offerings. These developments directly impact the long-standing public vs. private peering debate. A rigorous measurement study that provides a solid understanding of the actual, rather than the perceived, cost-performance trade-offs between the different interconnection service offerings available to networks in one and the same colocation facility would go a long way towards putting this debate on scientifically solid foundations. In particular, we argue that an important first step towards this goal is to establish a proven set of measurement methods and techniques to infer both the existence and the type of the different interconnections that the networks in a colocation facility with an IXP presence have established with one another. In fact, any realistic assessment of the application-level performance and user-perceived quality of experience of the traffic that is routed through such a facility relies critically on our ability to infer the existence and usage of the established interconnections in that facility.

(5) We also provide techniques to locate where peering takes place, which helps us better evaluate performance and peering trade-offs. Annotating Internet interconnections with robust physical coordinates at the level of a building facilitates network management, including interdomain troubleshooting, but also has practical value for helping to locate points of attack, congestion, or instability on the Internet. But, like most other aspects of Internet interconnection, the geophysical locus of an interconnection is generally not public; the facility used for a given link must be inferred to construct a macroscopic map of peering. We develop a methodology, called constrained facility search, to infer the physical facility where an interconnection occurs among all possible candidates. We rely on publicly available data about the presence of networks at different facilities, and execute traceroute measurements from more than 8,500 available measurement servers scattered around the world to identify the technical approach used to establish an interconnection. A key insight of our method is that inferring the technical approach for an interconnection sufficiently constrains the number of candidate facilities that it is often possible to identify the specific facility where a given interconnection occurs. Validation via private communication with operators confirms the accuracy of our method, which outperforms heuristics based on naming schemes and IP geolocation. Our study also reveals the multiple roles that routers play at interconnection facilities; in many cases the same router implements both private interconnections and public peerings, in some cases via multiple Internet exchange points. Our study also sheds light on the peering engineering strategies used by different types of networks around the globe.
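
The constrained facility search idea can be illustrated with a toy sketch: intersect the public colocation footprints of two networks, then narrow the candidates with the inferred interconnection type (the facility names below are hypothetical):

```python
# Toy sketch of constrained facility search (facility names are
# hypothetical): intersect the colocation footprints of two networks,
# then narrow the candidates with the inferred interconnection type.
presence = {
    "AS_A": {"Telehouse-London", "Equinix-AM7", "Interxion-FRA"},
    "AS_B": {"Telehouse-London", "Equinix-AM7"},
}
# Facilities from which the IXP's public peering fabric is reachable.
ixp_fabric = {"Equinix-AM7"}

def candidate_facilities(a: str, b: str, interconnection_type: str) -> set:
    candidates = presence[a] & presence[b]
    if interconnection_type == "public-peering":
        # Public peering must traverse the IXP's switching fabric.
        candidates &= ixp_fabric
    return candidates

candidate_facilities("AS_A", "AS_B", "public-peering")  # {"Equinix-AM7"}
```

When the inferred type is public peering, the candidate set here shrinks to a single facility, which is exactly the constraining effect the method exploits.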

(6) Together with economists, we provide a progress report on the evolution of content delivery in the Internet. Since the commercialization of the Internet, content and related applications, including video streaming, news, advertisements, and social interaction have moved online. It is broadly recognized that the rise of all of these different types of content (static and dynamic, and increasingly multimedia) has been one of the main forces behind the phenomenal growth of the Internet, and its emergence as essential infrastructure for how individuals across the globe gain access to the content sources they want. To accelerate the delivery of diverse content in the Internet and to provide commercial-grade performance for video delivery and the Web, content delivery networks (CDNs) were introduced. This paper describes the current CDN ecosystem and the forces that have driven its evolution. We outline the different CDN architectures and consider their relative strengths and weaknesses. Our analysis highlights the role of location, the growing complexity of the CDN ecosystem, and its relationship to and implications for interconnection markets.

- Regarding the collection of data at the core and the edge of the network to improve content delivery:

(1) We developed Datix, a fully decentralized, open-source analytics system for network traffic data that relies on smart partitioning storage schemes to support fast join algorithms and efficient execution of filtering queries. We outline the architecture and design of Datix and present its evaluation using real traces from an operational IXP. Datix deals with an important problem at the intersection of data management and network monitoring while utilizing state-of-the-art distributed processing engines. In brief, Datix manages to answer queries within minutes, compared to more than 24 hours of processing when executing existing Python-based code in single-node setups. Datix also achieves a nearly 70% speedup compared to baseline query implementations on popular big data analytics engines such as Hive and Shark.
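
The smart-partitioning idea behind Datix can be illustrated with a toy, in-memory sketch: partitioning flow records by a key (here, the first octet of the source address) lets a filtering query scan only the partitions that can match, rather than the full trace:

```python
from collections import defaultdict

# Toy flow records: (source IP, bytes). Partition them by the first
# octet of the source address so a filter query only scans partitions
# that can match, instead of the full trace.
flows = [("10.0.0.1", 5000), ("10.2.3.4", 120), ("192.0.2.7", 900)]

partitions = defaultdict(list)
for src, nbytes in flows:
    partitions[src.split(".")[0]].append((src, nbytes))

def bytes_from_prefix(first_octet: str) -> int:
    """Total bytes sourced from one /8 -- touches a single partition."""
    return sum(nbytes for _, nbytes in partitions.get(first_octet, []))

bytes_from_prefix("10")  # scans only the "10" partition
```

Datix applies the same principle at scale, on disk and across a distributed processing engine, and uses partition pruning to speed up joins as well as filters.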

(2) We report on techniques and analyses that enable us to capture Internet-wide activity at the granularity of individual IP addresses by relying on server logs of a large commercial content delivery network (CDN) that serves close to 3 trillion HTTP requests on a daily basis. Across the whole of 2015, these logs recorded client activity involving 1.2 billion unique IPv4 addresses, the highest ever measured, in agreement with recent estimates. Monthly client IPv4 address counts had shown constant growth for years, but since 2014 the IPv4 counts have stagnated while IPv6 counts have grown. Thus, it seems we have entered an era marked by increased complexity, one in which the sole enumeration of active IPv4 addresses is of little use for characterizing the recent growth of the Internet as a whole. With this observation in mind, we consider new points of view in the study of global IPv4 address activity. First, our analysis shows significant churn in active IPv4 addresses: the set of active IPv4 addresses varies by as much as 25% over the course of a year. Second, by looking across the active addresses in a prefix, we are able to identify and attribute activity patterns to network restructurings, user behaviors, and, in particular, various address assignment practices. Third, by combining spatio-temporal measures of address utilization with measures of traffic volume, and sampling-based estimates of relative host counts, we present novel perspectives on worldwide IPv4 address activity, including empirical observation of under-utilization in some areas, and complete utilization, or exhaustion, in others. We also discuss how these new perspectives can be utilized to improve content delivery.
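
The churn metric can be illustrated with a toy sketch: treating the active addresses of two observation windows as sets, churn is the fraction of ever-active addresses seen in only one of the two windows (addresses below are from documentation ranges, not real measurements):

```python
# Toy sketch: churn between two observation windows as the fraction of
# ever-active addresses seen in only one window (symmetric difference).
active_h1 = {"192.0.2.1", "192.0.2.2", "198.51.100.7", "203.0.113.9"}
active_h2 = {"192.0.2.1", "192.0.2.2", "198.51.100.8", "203.0.113.9"}

def churn(a: set, b: set) -> float:
    """Fraction of the union that was active in only one of the windows."""
    return len(a ^ b) / len(a | b)

churn(active_h1, active_h2)  # 2 of 5 ever-active addresses changed -> 0.4
```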