Computer Science Department
School of Computer Science, Carnegie Mellon University
Staged Database Systems
Keywords: Database Management Systems (DBMS), query execution,
query pipelining, operator parallelism, cache performance, instruction
cache misses, thread scheduling, Online Transaction Processing (OLTP),
Decision Support Systems (DSS).
Advances in computer architecture research yield increasingly powerful
processors that can execute code much faster than they can access data
in the memory hierarchy. Because of their data-intensive nature,
database management systems (DBMS) are at the forefront of commercial
applications that cannot harness the available computing power. To
prevent processors from idling, a multitude of hardware mechanisms and
software optimizations have been proposed.
Their effectiveness, however, is limited by the sheer volume of data
accessed and by the unpredictable sequence of memory requests. This
Ph.D. dissertation introduces Staged Database Systems, a new software
architecture for optimizing data and instruction locality at all levels
of the memory hierarchy. The key idea is to break database request
execution into stages and to process a group of subrequests at each
stage. Group processing at each stage allows for a context-aware
execution sequence of requests that promotes the reuse of both
instructions and data. The Staged Database System design requires only a small number
of changes to the existing DBMS codebase and provides a new set of
execution primitives that allow software to gain increased control
over what data and instructions are accessed, when, and by which
requests. The central thesis is the following:
"By organizing and assigning system components into self-contained
stages, database systems can exploit instruction and data commonality
across concurrent requests, thereby improving performance."
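The scheduling idea in the abstract (split request execution into stages, then let each stage process a group of subrequests before control moves on) can be illustrated with a minimal Python sketch. This is not the dissertation's implementation: the class names, the stage names parse/optimize/execute, and the batch_size parameter are all hypothetical, and the code models only the order of work, not actual cache behavior. The point it shows is that a stage's code and stage-local data are used for a whole group of requests back to back instead of being interleaved with other stages.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Callable, Deque, List, Tuple


@dataclass
class Request:
    rid: int
    trace: List[str] = field(default_factory=list)  # stages visited, in order


@dataclass
class Stage:
    name: str
    work: Callable[[Request], None]  # hypothetical per-request stage logic
    queue: Deque[Request] = field(default_factory=deque)


class StagedEngine:
    """Hypothetical scheduler: visits one stage at a time and runs a whole
    group of queued requests through it before moving to the next stage."""

    def __init__(self, stages: List[Stage], batch_size: int = 4):
        self.stages = stages
        self.batch_size = batch_size
        self.log: List[Tuple[str, List[int]]] = []  # (stage, request ids) per group
        self.completed: List[Request] = []

    def submit(self, req: Request) -> None:
        self.stages[0].queue.append(req)

    def run(self) -> None:
        # Sweep the stages in pipeline order until every queue is empty.
        while any(stage.queue for stage in self.stages):
            for i, stage in enumerate(self.stages):
                # Take up to batch_size requests from this stage's queue.
                group = [stage.queue.popleft()
                         for _ in range(min(self.batch_size, len(stage.queue)))]
                if not group:
                    continue
                # The same stage code serves the whole group back to back,
                # instead of alternating with other stages per request.
                for req in group:
                    stage.work(req)
                    req.trace.append(stage.name)
                self.log.append((stage.name, [r.rid for r in group]))
                # Hand the group to the next stage, or retire it after the last.
                if i + 1 < len(self.stages):
                    self.stages[i + 1].queue.extend(group)
                else:
                    self.completed.extend(group)


if __name__ == "__main__":
    noop = lambda req: None
    engine = StagedEngine([Stage("parse", noop), Stage("optimize", noop),
                           Stage("execute", noop)], batch_size=4)
    for rid in range(6):
        engine.submit(Request(rid))
    engine.run()
    for stage_name, ids in engine.log:
        print(stage_name, ids)
```

With six requests and batch_size=4, the log shows each stage handling requests 0 to 3 as one group, followed by a second pass over requests 4 and 5. In this sketch a larger batch_size would let each stage's working set serve more requests per visit, at the cost of longer waits for requests that arrive early in a group.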