CMU-CS-98-119
Computer Science Department
School of Computer Science, Carnegie Mellon University




MindReader: Querying Databases through Multiple Examples

Yoshiharu Ishikawa*, Ravishankar Subramanya**, Christos Faloutsos

April 1998

CMU-CS-98-119.ps


Users often cannot easily express their queries. For example, in a multimedia (image-by-content) setting, the user might want photographs of sunsets; in current systems such as QBIC, the user has to give a sample query and specify the relative importance of color, shape, and texture. Even worse, the user might want correlations between attributes. For example, in a traditional medical record database, a medical researcher might want to find "mildly overweight patients", where the implied query would be "weight/height ~ 4 lb/inch."

Our goal is to provide a user-friendly but theoretically solid method for handling such queries. We allow the user to give several examples and, optionally, their "goodness" scores, and we propose a novel method to "guess" which attributes are important, which correlations are important, and with what weight.

Our contributions are twofold: (a) we formalize the problem as a minimization problem and show how to solve for the optimal solution, completely avoiding the ad-hoc heuristics of the past; (b) we are the first to handle "diagonal" queries (like the "overweight" query above). Experiments on synthetic and real datasets show that our method quickly and accurately estimates the "hidden" distance function in the user's mind.
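To make the idea of a "hidden" distance function concrete, the following is a minimal sketch of one way such a minimization can be set up for two attributes. The abstract itself gives no formulas, so everything here is an assumption of the sketch: the quadratic form d(x) = (x - q)^T M (x - q), the constraint det(M) = 1, the closed form M = det(C)^(1/2) C^(-1) built from the score-weighted covariance C, and the hypothetical names estimate_distance and distance. The sample patients are made up for illustration.

```python
import math

def estimate_distance(examples, scores):
    """Estimate a query point q and a symmetric 2x2 matrix M from scored 2-D examples.

    Assumed model (not taken from the abstract): the distance is
    d(x) = (x - q)^T M (x - q), and (q, M) minimize the score-weighted
    sum of distances to the examples subject to det(M) = 1.
    """
    total = sum(scores)
    # Under this model the best query point is the score-weighted mean.
    qx = sum(s * x for (x, _), s in zip(examples, scores)) / total
    qy = sum(s * y for (_, y), s in zip(examples, scores)) / total

    # Score-weighted covariance (scatter) matrix C of the examples around q.
    cxx = sum(s * (x - qx) ** 2 for (x, _), s in zip(examples, scores))
    cyy = sum(s * (y - qy) ** 2 for (_, y), s in zip(examples, scores))
    cxy = sum(s * (x - qx) * (y - qy) for (x, y), s in zip(examples, scores))

    det = cxx * cyy - cxy ** 2
    if det <= 0:
        raise ValueError("examples are collinear; covariance is singular")

    # M = det(C)^(1/n) * C^(-1) with n = 2, which gives det(M) = 1.
    scale = math.sqrt(det) / det
    m = [[scale * cyy, -scale * cxy],
         [-scale * cxy, scale * cxx]]
    return (qx, qy), m

def distance(point, q, m):
    """Quadratic-form distance of a point from q under the matrix m."""
    dx, dy = point[0] - q[0], point[1] - q[1]
    return m[0][0] * dx * dx + 2 * m[0][1] * dx * dy + m[1][1] * dy * dy

# Made-up (height in inches, weight in lb) examples lying near weight = 4 * height.
patients = [(60, 242), (64, 254), (68, 274), (72, 286), (66, 265)]
q, m = estimate_distance(patients, [1, 1, 1, 1, 1])

# A patient along the learned "diagonal" versus one equally far from q but off it.
print(distance((70, 280.0), q, m), distance((70, 248.4), q, m))
```

The off-diagonal entries of M are what let the sketch express a "diagonal" query: a candidate that follows the weight/height trend of the examples scores a much smaller distance than one that is equally far from q in plain Euclidean terms but breaks the trend. A per-attribute weighting alone (a diagonal M) could not make that distinction.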

Keywords: Databases, information retrieval, access methods, multimedia


27 pages

*Visiting from Nara Institute of Science and Technology, Japan.
**Pittsburgh Supercomputing Center, Carnegie Mellon University.

