Main-memory Database Systems, Programming Languages, Computer Architecture
Main-memory database systems waste valuable execution time because the processor waits for data to be fetched from the main memory instead of executing independent work from the many other tasks present in database workloads.
My current work focuses on hiding unavoidable cache misses through cooperative multitasking. I leverage language and compiler support to efficiently interleave the execution of parallel tasks and overlap memory accesses and computation across them. With such interleaved execution, we can eliminate the majority of memory stalls in latency-bound operations like index joins, leading to drastic performance improvements.
This work is done in collaboration with SAP SE.
|Ph.D.||Computer Science||Ecole Polytechnique Fédérale de Lausanne (EPFL)||Ongoing|
|M.Sc.||Electrical and Computer Engineering||National Technical University of Athens (NTUA)||July 2013|
Non-Volatile Memory (NVM) technologies exhibit 4× the read access latency of conventional DRAM. When the working set does not fit in the processor cache, this latency gap between DRAM and NVM leads to more than 2× runtime increase for queries dominated by latency-bound operations such as index joins and tuple reconstruction. We explain how to easily hide NVM latency by interleaving the execution of parallel work in index joins and tuple reconstruction using coroutines. Our evaluation shows that interleaving applied to the non-trivial implementations of these two operations in a production-grade codebase accelerates end-to-end query runtimes on both NVM and DRAM by up to 1.7× and 2.6× respectively, thereby reducing the performance difference between DRAM and NVM by more than 60%.
Index joins present a case of pointer-chasing code that causes data cache misses. In principle, we can hide these cache misses by overlapping them with computation: The lookups involved in an index join are parallel tasks whose execution can be interleaved, so that, when a cache miss occurs in one task, the processor executes independent instructions from another one. Yet, the literature provides no concrete performance model for such interleaved execution and, more importantly, production systems still waste processor cycles on cache misses because (a) hardware and compiler limitations prohibit automatic task interleaving and (b) existing techniques that involve the programmer produce unmaintainable code and are thus avoided in practice. In this paper, we address these shortcomings: we model interleaved execution explaining how to estimate the speedup of any interleaving technique, and we propose interleaving with coroutines, i.e., functions that suspend their execution for later resumption. We deploy coroutines on index joins running in SAP HANA and show that interleaving with coroutines performs like other state-of-the-art techniques, retains close resemblance to the original code, and supports both interleaved and non-interleaved execution in the same implementation. Thus, we establish the first systematic and practical approach for interleaving index joins of any type.
Index join performance is determined by the efficiency of the lookup operation on the involved index. Although database indexes are highly optimized to leverage processor caches, main memory accesses inevitably increase lookup runtime when the index outsizes the last-level cache; hence, index join performance drops. Still, robust index join performance becomes possible with instruction stream interleaving: given a group of lookups, we can hide cache misses in one lookup with instructions from other lookups by switching among their respective instruction streams upon a cache miss. In this paper, we propose interleaving with coroutines for any type of index join. We showcase our proposal on SAP HANA by implementing binary search and CSB+-tree traversal for an instance of index join related to dictionary compression. Coroutine implementations not only perform similarly to prior interleaving techniques, but also resemble the original code closely, while supporting both interleaved and non-interleaved execution. Thus, we claim that coroutines make interleaving practical for use in real DBMS codebases.