CMU-CS-15-132
Computer Science Department
School of Computer Science, Carnegie Mellon University



Resource-Efficient Data-Intensive System
Designs for High Performance and Capacity

Hyeontaek Lim

September 2015

Ph.D. Thesis

CMU-CS-15-132.pdf


Keywords: Log Structure, Hash Table, Indexing, Concurrent Data Structure, I/O, Remote Procedure Call

Data-intensive systems are a critical building block for today's large-scale Internet services. These systems have enabled high throughput and capacity, reaching billions of requests per second for trillions of items in a single storage cluster. However, many systems are surprisingly inefficient; for instance, memcached, a widely used in-memory key-value store, handles 1–2 million requests per second on a modern server node, whereas an optimized software system could achieve over 70 million requests per second on the same hardware. Reducing such inefficiencies can significantly improve the cost-effectiveness of these systems.

This dissertation shows that by leveraging modern hardware and exploiting workload characteristics, data-intensive storage systems that process a large amount of fine-grained data can achieve an order of magnitude higher performance and capacity than prior systems that are built for generic hardware and workloads. As examples, we present SILT and MICA, which are resource-efficient key-value stores for flash and memory. SILT provides flash-speed query processing and 5.7X higher capacity than the previous state-of-the-art system. It employs new memory-efficient indexing schemes, including ECT, which requires only 2.5 bits per item in memory, and a system cost model built upon new accurate and fast analytic primitives to find workload-specific system configurations. MICA offers 4X higher throughput over the network than previous in-memory key-value store systems by performing efficient parallel request processing on multi-core processors and low-overhead request direction with modern network interface cards, and by using new key-value data structures designed for specific workload types.
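
The per-item index cost and the throughput figures quoted above can be turned into concrete numbers with simple arithmetic. The sketch below is illustrative only: the function names and the one-billion-item count are assumptions made for this example, not taken from the thesis, and the calculation counts only the 2.5 bits per item of index memory, ignoring any other overhead.

```python
def index_memory_bytes(num_items: int, bits_per_item: float = 2.5) -> float:
    """Memory needed for an index that costs `bits_per_item` bits per item.

    The 2.5 bits/item default is the ECT figure quoted in the abstract;
    everything else a real system stores in memory is ignored here.
    """
    return num_items * bits_per_item / 8  # 8 bits per byte


def speedup_range(optimized_mrps: float, low_mrps: float, high_mrps: float):
    """Return (min, max) speedup of an optimized system over a baseline
    whose throughput lies between low_mrps and high_mrps (million req/s)."""
    return optimized_mrps / high_mrps, optimized_mrps / low_mrps


if __name__ == "__main__":
    # Hypothetical item count chosen for illustration.
    items = 1_000_000_000
    print(f"Index memory: {index_memory_bytes(items) / 1e6:.1f} MB")

    # 70 million req/s versus memcached's 1-2 million req/s, as quoted above.
    low, high = speedup_range(70, 1, 2)
    print(f"Throughput gap: {low:.0f}x to {high:.0f}x")
```

Under these assumptions, the index for a billion items fits in a few hundred megabytes, and the gap between memcached and an optimized system spans the order-of-magnitude range the abstract claims.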

138 pages

Thesis Committee:
David G. Andersen (Chair)
Michael Kaminsky (Intel Labs)
Andrew Pavlo
Eddie Kohler (Harvard University)

Frank Pfenning, Head, Computer Science Department
Andrew W. Moore, Dean, School of Computer Science


