Over the past few years, XML has become the dominant format for storing and exchanging information on the Internet. XML is often used to represent large text data sets, such as scientific corpora, repositories of Web pages, or streams of stock quotes. Processing large XML data sets efficiently has thus become one of the major challenges facing researchers in the database, information retrieval, and WWW communities today.
This proposal focuses on three issues at the forefront of XML research in the database community:
1) Evaluation of queries over XML streams.
2) Evaluation of queries over indexed XML data sets.
3) Fast approximate evaluation of queries over XML data sets.
The goals of the proposed project are threefold:
1) Develop the first theoretical and systematic framework of lower bounds on the amount of resources needed to accomplish the above tasks.
2) Exploit insights gained from the theoretical study to design more efficient and comprehensive algorithms that solve the above problems.
3) Build an experimental system to test the proposed algorithms on real and artificial data.
In the course of the project, I plan to continue existing collaborations in this area with researchers from the IBM Research Centre in California and to bring in new collaborators from the Technion whose areas of interest overlap with the subject of the project. I plan to draw on the expertise of my colleagues at the Technion in communication complexity, databases, and information theory in order to obtain high-quality results.
Field of science
- /natural sciences/computer and information sciences/databases
- /natural sciences/computer and information sciences/data science/data mining