Keynote talk: Adaptive Resource Allocation in the Cloud


Srikanth Kandula, Microsoft Research, Redmond

Carefully allocating resources can improve throughput, lower latency and offer more predictable service. In this talk, I will present three recent examples and point out future directions.

With SWAN, we show that, given responsive networks and responsive applications, adapting who gets to send how much, when, and along which network paths can improve network utilization without sacrificing business priorities. We show how SWAN can be incorporated into the wide-area networks of enterprises that have a global datacenter footprint. With Kwiken, we show how to improve the tail latency of datacenter services that are built as workflows over many components, by appropriately allocating additional resources across the stages of the workflow. Interestingly, we also cast incompleteness (i.e., returning partial results) as a resource and show that small amounts of incompleteness can improve latency substantially. Finally, with RoPE, we show how execution plans for jobs in big data clusters can be improved given additional information about properties of the user code, the data, and how the code and data interact. We also describe a system that extracts such properties at scale.