CMU-HCII-25-102 Human-Computer Interaction Institute School of Computer Science, Carnegie Mellon University
Interactive Data Profiling Will Epperson June 2025 Ph.D. Thesis
This thesis develops systems for Interactive Data Profiling that accelerate data exploration through a fast feedback loop between interactive interfaces and data programming workflows. We first motivate this problem through a large-scale interview study and survey of data scientists that reveals the potential for tools to help users manage the repetitive code used for data profiling. We then discuss the design, implementation, and evaluation of three systems that develop the approach of interactive data profiling. First, we describe AUTOPROFILER, a system that augments programming environments with automatic data profiles that summarize the data in memory and update as a user programs. We then extend this approach with SOLAS, which tracks the history of a user's analysis code to create data profiles adapted to the current task and the user's interests. User evaluations demonstrate how the lightweight visualizations and fast feedback loops enabled by these systems help users quickly identify important patterns and data quality issues. Finally, we present TEXTURE, a general-purpose text exploration tool that enables users to iterate on attributes for describing their text and then explore the results in an interactive UI. Expert user studies show how TEXTURE enables more efficient exploration and helps users uncover new insights from their text datasets. Together, these tools establish how to situate interactive data profiling within data science workflows to enable a fast feedback loop between manipulating data and inspecting the results. As data becomes an increasingly important component of modern work, interactive data profiling systems can play a critical role in enabling faster, more reliable understanding of the data behind models and decisions.
120 pages
Brad A. Myers, Head, Human-Computer Interaction Institute