CMU-ML-09-100
Machine Learning Department
School of Computer Science, Carnegie Mellon University




VeWra: An Algorithm for Wrapper Verification

Charalampos R. Tsourakakis, Georgios Paliouras*

March 2009



Keywords: Wrapper verification, wrapper maintenance, web wrappers


Web wrappers play an important role in extracting information from distributed web sources and, subsequently, in the integration of heterogeneous data. Changes in the layout of web sources typically break the wrapper, leading to erroneous extraction of information. Monitoring and repairing broken wrappers is a major hurdle for data integration, since it is an expensive and painful procedure. In this paper we present VEWRA, a new approach to wrapper verification that improves on the successful family of trainable content-based methods. Compared to its predecessors, the new method aims to capture not only the syntactic patterns but also the correlations that exist among them due to the underlying semantics of the extracted information. Experiments show that our method achieves excellent performance, always performing at least as well as DATAPROG, the state-of-the-art related work.

28 pages

*Institute of Informatics & Telecommunications, NCSR "Demokritos", 15210, Ag. Paraskevi, Attiki, Greece

