Staying Alive: Connection Path Reselection at the Edge

Authors: 

Raul Landa, Lorenzo Saino, Lennert Buytenhek, and Joao Taveira Araujo, Fastly

Abstract: 

Internet path failure recovery relies on routing protocols, such as BGP. However, routing can take minutes to detect failures and reconverge; in some cases, like partial failures or severe performance degradation, it may never intervene. For large-scale network outages, such as those caused by route leaks, bypassing the affected party completely may be the only effective solution.

This paper presents Connection Path Reselection (CPR), a novel system that operates on edge networks such as Content Delivery Networks and edge peering facilities and augments TCP to deliver transparent, scalable, multipath-aware end-to-end path failure recovery.

The key intuition behind CPR is that edge networks need not rely on BGP to learn of path impairments: they can infer the status of a path by monitoring transport-layer forward progress, and then reroute stalled flows onto healthy paths. Unlike routing protocols such as BGP, CPR operates at the timescale of round-trip times, providing connection recovery in seconds rather than minutes. By delegating routing responsibilities to the edge hosts themselves, CPR achieves per-connection re-routing protection for all destination prefixes without incurring the additional cost of reconstructing transport protocol state within the network. Unlike previous multipath-aware transport protocols, CPR is unilaterally deployable and has been running in production at a large edge network for over two years.
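
The idea of inferring path health from transport-layer forward progress can be illustrated with a toy sketch. This is not the paper's implementation: the class name `FlowState`, the round-robin choice of the next path, and the threshold of four round-trip times without progress are all assumptions made for illustration only.

```python
from dataclasses import dataclass


@dataclass
class FlowState:
    """Per-connection state for a toy stall detector (all names hypothetical)."""
    paths: list               # candidate egress paths, e.g. ["path-a", "path-b"]
    srtt: float               # smoothed round-trip time estimate, in seconds
    stall_rtts: float = 4.0   # assumed threshold: RTTs without progress before rerouting
    path_index: int = 0
    acked_bytes: int = 0
    last_progress: float = 0.0

    @property
    def current_path(self):
        return self.paths[self.path_index]

    def on_ack(self, acked_bytes, now):
        """Record forward progress when the cumulative ACK point advances."""
        if acked_bytes > self.acked_bytes:
            self.acked_bytes = acked_bytes
            self.last_progress = now

    def maybe_reselect(self, now):
        """Move to the next candidate path if the flow has stalled.

        Returns True when a new path was selected.
        """
        stalled = (now - self.last_progress) > self.stall_rtts * self.srtt
        if stalled and len(self.paths) > 1:
            # Assumption: simple round-robin over the candidate paths.
            self.path_index = (self.path_index + 1) % len(self.paths)
            self.last_progress = now  # give the new path a fresh window
            return True
        return False
```

With two candidate paths and a smoothed RTT of 50 ms, a flow in this sketch that last made progress at t = 0.1 s stays on its path when checked at t = 0.2 s and moves to the second path when checked at t = 0.5 s, because the assumed 0.2 s stall threshold has then been exceeded.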

NSDI '21 Open Access Sponsored by NetApp


BibTeX
@inproceedings {264997,
author = {Raul Landa and Lorenzo Saino and Lennert Buytenhek and Joao Taveira Araujo},
title = {Staying Alive: Connection Path Reselection at the Edge},
booktitle = {18th USENIX Symposium on Networked Systems Design and Implementation (NSDI 21)},
year = {2021},
isbn = {978-1-939133-21-2},
pages = {233--251},
url = {https://www.usenix.org/conference/nsdi21/presentation/landa},
publisher = {USENIX Association},
month = apr
}
