CMU-ISR-09-105
Institute for Software Research
School of Computer Science, Carnegie Mellon University




Topes: Enabling End-User Programmers
to Validate and Reformat Data

Christopher Scaffidi

May 2009

Ph.D. Thesis
Software Engineering

CMU-ISR-09-105.pdf


Keywords: End-user software engineering, end-user programming, data validation, dependability, assertions, data reformatting, data formats, data consistency, web macros, web applications, spreadsheets, programming by demonstration


Millions of people rely on software for help with everyday tasks. For example, a teacher might create a spreadsheet to compute grades, and a human resources worker might create a web form to collect contact information from co-workers.

Yet, too often, software applications offer poor support for automating certain activities, which people must then do manually. In particular, many tasks require validating and reformatting short human-readable strings drawn from categories such as company names and employee ID numbers. These categories of strings have three traits that existing applications do not reflect. First, each category can be multi-format in that each of its instances can be written in several different ways. Second, each category can include questionable values that are unusual yet still valid. During user tasks, such strings are often worth double-checking, as they are neither obviously valid nor obviously invalid. Third, each category is application-agnostic in that its rules for validating and reformatting strings are not specific to one software application; rather, its rules are agreed upon implicitly or explicitly by members of an organization or society.

For example, a web form might have a field for entering Carnegie Mellon office phone numbers like "8-3564" or "412-268-3564". Current web form design tools offer no convenient way to create code for putting strings into a consistent format, nor do they help users create code to detect inputs that are unusual but possibly valid, such as "7-3564" (since our office phone numbers rarely start with "7"). In order to help users with their tasks, this dissertation presents a new kind of abstraction called a "tope" and a supporting development environment. Each tope describes how to validate and reformat instances of a data category. Topes are sufficiently expressive for creating useful, accurate rules for validating and reformatting a wide range of data categories commonly encountered by end users. By creating and applying topes, end users can validate and reformat strings more quickly and effectively than they can with currently practiced techniques. Tope implementations are reusable across applications and by different people, highlighting the leverage provided by end-user programming research aimed at developing new kinds of application-agnostic abstractions. The topes model demonstrates that such abstractions can be successful if they model a shallow level of semantics, thereby retaining usability without sacrificing usefulness for supporting users' real-world goals.
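To make the phone number example concrete, the following Python sketch shows one way a tope-like rule set could recognize both formats, flag unusual values as questionable, and convert between formats. It is an illustration under stated assumptions, not the dissertation's implementation: the function names (parse, validate, reformat), the three result labels, and the assumption that the leading digit of the short form is the last digit of the "26x" exchange in the long form are all hypothetical.

```python
import re

# Hypothetical sketch; names and rules are assumptions, not the thesis's API.
# Two formats for the same category of office phone numbers.
SHORT = re.compile(r"(\d)-(\d{4})")           # e.g. "8-3564"
LONG = re.compile(r"412-26(\d)-(\d{4})")      # e.g. "412-268-3564"


def parse(text):
    """Return (exchange_digit, line_number) if text fits a known format, else None."""
    for pattern in (SHORT, LONG):
        match = pattern.fullmatch(text.strip())
        if match:
            return match.group(1), match.group(2)
    return None


def validate(text):
    """Classify a string as 'valid', 'questionable', or 'invalid'."""
    parts = parse(text)
    if parts is None:
        return "invalid"
    # Assumed rule: a leading "8" is typical; any other digit is unusual
    # but possibly valid, so it is flagged for double-checking.
    return "valid" if parts[0] == "8" else "questionable"


def reformat(text, target="long"):
    """Rewrite a recognized string in the requested format ('long' or 'short')."""
    parts = parse(text)
    if parts is None:
        raise ValueError(f"not a recognized office phone number: {text!r}")
    digit, line = parts
    if target == "long":
        return f"412-26{digit}-{line}"
    return f"{digit}-{line}"


if __name__ == "__main__":
    for value in ("8-3564", "412-268-3564", "7-3564", "hello"):
        print(value, "->", validate(value))
    print(reformat("8-3564", "long"))
```

Because the rules live in plain functions rather than in one form's validation code, the same sketch could be called from a spreadsheet macro or a web form, which mirrors the application-agnostic property described above.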

259 pages

