CMU-CS-25-126
Computer Science Department
School of Computer Science, Carnegie Mellon University



Toward Sustainable Datacenters through Efficient Data Retrieval

Sara McAllister

Ph.D. Thesis

August 2025


Keywords: Datacenters, sustainability, storage, caching, flash, hard disk drives

Datacenters are projected to account for 33% of global carbon emissions by 2050. As datacenters increasingly rely on renewable energy for power, the majority of datacenter emissions will be embodied emissions, which come from life-cycle stages including raw material acquisition, manufacturing, transportation, and disposal. To reach the ambitious emission reduction goals set by both companies and governments, datacenters need to reduce emissions throughout their operations, including the storage system, which is the focus of this thesis. Unfortunately, while data storage and retrieval systems are large contributors to embodied emissions, reducing their embodied emissions has largely been overlooked.

This dissertation addresses how to reduce emissions in data retrieval for large-scale storage systems. These storage systems can reduce their carbon footprint by enabling storage devices to have longer lifetimes and use denser media. However, the IO limits of storage hardware, combined with unnecessary additional IO from software, often severely restrict emission reductions or, at worst, cause increased emissions. Thus, this thesis focuses on reducing IO in several parts of the storage stack to enable efficient and sustainable data retrieval.

First, this dissertation addresses the sustainability of flash caching, a critical layer in datacenter storage systems that is limited by flash write endurance. It does so through two caching systems: Kangaroo and Fairy-WREN. Together, these caches reduce writes by over 28x, allowing flash devices to use denser flash for longer lifetimes, ultimately reducing emissions. Then, this thesis enables more sustainable bulk storage, where bandwidth limitations prevent deployment of denser HDDs. Declarative IO, a new interface for distributed storage, empowers the storage system to eliminate duplicate IO accesses in maintenance tasks by exposing the time and order flexibility of those tasks. This work enables deployment of larger HDDs, further reducing emissions from storage systems.

159 pages

Thesis Committee:
Nathan Beckmann (Co-Chair)
Gregory R. Ganger (Co-Chair)
George Amvrosiadis
Daniel S. Berger (Microsoft Azure / University of Washington)
Margo Seltzer (University of British Columbia)

Srinivasan Seshan, Head, Computer Science Department
Martial Hebert, Dean, School of Computer Science

Creative Commons: CC-BY (Attribution)

