A fundamental challenge in processing the massive quantities of information generated by modern applications is extracting suitable representations of the data that can be stored, manipulated and interrogated on a single machine. A promising approach is the design and analysis of compact summaries: data structures which capture key features of the data, and which can be created effectively over distributed data sets. Popular summary structures include the Bloom filter, which compactly represents a set of items, and sketches, which allow vector norms and products to be estimated. These are very attractive, since they can be computed in parallel and combined to yield a single, compact summary of the data. Yet the potential of summaries is far from being fully realized.
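The "computed in parallel and combined" property can be illustrated with a minimal Bloom filter sketch. This is an illustrative toy rather than anything taken from the project: the class name `BloomFilter`, the parameters `num_bits` and `num_hashes`, and the use of seeded SHA-256 as the hash family are all assumptions made for the example. Two filters built independently over different partitions of the data are merged with a bitwise OR, giving the same summary as a single filter built over all of the data.

```python
import hashlib


class BloomFilter:
    """Toy Bloom filter; sizes and hash choice are illustrative assumptions."""

    def __init__(self, num_bits=1024, num_hashes=4):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0  # a Python int used as a bit array

    def _positions(self, item):
        # Derive num_hashes bit positions from seeded SHA-256 digests.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits |= 1 << pos

    def __contains__(self, item):
        # May return a false positive, but never a false negative.
        return all((self.bits >> pos) & 1 for pos in self._positions(item))

    def merge(self, other):
        # Filters can only be combined if they share the same parameters.
        if (self.num_bits, self.num_hashes) != (other.num_bits, other.num_hashes):
            raise ValueError("filters must share num_bits and num_hashes")
        merged = BloomFilter(self.num_bits, self.num_hashes)
        merged.bits = self.bits | other.bits
        return merged


def build(items):
    """Summarize one partition of the data."""
    bf = BloomFilter()
    for item in items:
        bf.add(item)
    return bf


# Two partitions summarized independently (e.g. on different machines).
part_a = ["alice", "bob", "carol"]
part_b = ["dave", "erin"]
merged = build(part_a).merge(build(part_b))
whole = build(part_a + part_b)
```

Because insertion only ever sets bits, the OR of the two partial filters is bit-for-bit identical to the filter built over the union of the partitions, which is what makes this kind of summary suitable for distributed data.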
The Principal Investigator will lead a team working on important problems around creating Small Summaries for Big Data. The goal is to substantially advance the state of the art in data summarization, to the point where accurate and effective summaries are available for a wide array of problems and can be used seamlessly in applications that process big data. Several directions will be pursued, including: designing and evaluating new summaries for fundamental computations such as tracking the data distribution; summary techniques for complex structures, such as massive matrices, massive graphs, and beyond; and summaries that allow the verification of outsourced computation over big data. Success in any one of these areas could lead to substantial impact on practice, as evidenced by the influence of existing summary
Support in the form of a five-year research grant will allow the PI to consolidate his research in this area, and build an expert team to focus on these challenging algorithmic questions.
Field of science
- /natural sciences/physical sciences/theoretical physics/particle physics
- /natural sciences/computer and information sciences/data science/data processing
- /engineering and technology/civil engineering/urban engineering/smart city
- /natural sciences/computer and information sciences/data science/big data