CMU-CS-04-133
Computer Science Department
School of Computer Science, Carnegie Mellon University




Helping Everyday Users Find Anomalies in Data Feeds

Orna Raz

April 2004

Ph.D. Thesis
(Software Engineering)

Also appears as Institute for Software Research International
Technical Report CMU-ISRI-04-119



Keywords: Semantic anomaly detection, user expectations, everyday information systems, data feeds


Much of the software people use for everyday purposes incorporates elements developed and maintained by someone other than the developer. These elements include not only code and databases but also data feeds. Although everyday information systems are not mission critical, they must be dependable enough for practical use, and their dependability is limited by the dependability of the elements they incorporate.

It is particularly difficult to evaluate the dependability of data feeds. The specifications of data feeds are often even sketchier than the specifications of software components, the data feeds may be changed by their proprietors, and everyday users of data feeds have only enough knowledge about the application domain to support their own usage. These factors hinder many dependability-enhancement techniques, which need a model of proper behavior, preferably in the form of specifications, in order to detect failures.

The research presented here addresses this problem by providing CUES, Checking User Expectations about Semantics. CUES is a method and a prototype implementation for making user expectations precise and for checking these precise expectations. CUES treats the precise expectations as a proxy for missing specifications. It checks them to detect semantic anomalies, that is, data feed behavior that does not adhere to these expectations. Three case studies and a validation study, all with real-world data, provide evidence of the practicality and usefulness of CUES. The case studies and the validation study indicate that a user of CUES gets substantial benefit for a modest investment of time and effort. In addition to automated detection of anomalies, the benefit often includes a better understanding of the user's own expectations, of the data feeds, and of existing and missing documentation.
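
To make the idea of "precise expectations as a proxy for specifications" concrete, here is a minimal hypothetical sketch. It is not the CUES implementation. It assumes that each expectation can be written as a named predicate over a single feed record and that a semantic anomaly is any record that violates one of them. The names Expectation and find_anomalies, the stock-quote-style fields, and the sample data are all invented for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

# Hypothetical record type: one observation from a data feed, field name -> value.
Record = Dict[str, float]


@dataclass(frozen=True)
class Expectation:
    """A user expectation made precise as a named predicate (hypothetical, not CUES's API)."""
    name: str
    holds: Callable[[Record], bool]


def find_anomalies(feed: List[Record], expectations: List[Expectation]) -> List[Tuple[int, str]]:
    """Return (record index, expectation name) for every expectation a record violates."""
    anomalies: List[Tuple[int, str]] = []
    for i, record in enumerate(feed):
        for exp in expectations:
            if not exp.holds(record):
                anomalies.append((i, exp.name))
    return anomalies


# Illustrative expectations about a made-up quote feed.
expectations = [
    Expectation("price is positive", lambda r: r["price"] > 0),
    Expectation("low <= price <= high", lambda r: r["low"] <= r["price"] <= r["high"]),
]

# Made-up feed: record 0 is normal, records 1 and 2 violate expectations.
feed = [
    {"price": 10.0, "low": 9.5, "high": 10.5},
    {"price": -1.0, "low": 9.5, "high": 10.5},
    {"price": 12.0, "low": 9.5, "high": 10.5},
]

if __name__ == "__main__":
    print(find_anomalies(feed, expectations))
```

Running the sketch reports record 1 for both expectations and record 2 for the range expectation. Keeping each expectation named means a flagged anomaly points back to the specific user assumption it contradicts, which is one way such checks can also help a user refine their own expectations.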

164 pages

