Computer Science Department
School of Computer Science, Carnegie Mellon University


Locality in Search Engine Queries and Its Implications for Caching

Yinglian Xie, David O'Hallaron

May 2001

Keywords: Search engines, query locality, caching, trace analysis, user lexicon, prefetching

Caching is a popular technique for reducing both server load and user response time in distributed systems. In this paper, we are interested in the question of whether caching might be effective for search engines as well. We study two real search engine traces by examining query locality and its implications for caching. Our trace analysis results show that: (1) Queries have significant locality, with query frequency following a Zipf distribution. Very popular queries are shared among different users and can be cached at servers or proxies, while 16% to 22% of the queries are from the same users and should be cached at the user side. Multiple-word queries are shared less and should be cached mainly at the user side. (2) If caching is to be done at the user side, short-term caching for hours will be enough to cover query temporal locality, while server/proxy caching should be based on longer periods such as days. (3) Most users have small lexicons when submitting queries. Frequent users who submit many search requests tend to reuse a small subset of words to form queries. Thus, with proxy or user side caching, prefetching based on user lexicon looks promising.

21 pages

Return to: SCS Technical Report Collection
School of Computer Science homepage

This page maintained by