The design of efficient algorithms and the mapping of the `boundary of tractability' have been major goals of computer science since the dawn of the digital era, and polynomial runtime has been the classical notion of efficiency throughout. Over the past few decades, our computational, measurement and storage capabilities have grown exponentially. However, the sizes of the datasets to be processed have grown even faster, resulting in the `big data' phenomenon, which has shifted the `boundary of tractability'. Indeed, on modern inputs, quadratic and sometimes even linear-time algorithms often become prohibitively expensive. This calls for a new class of techniques with sublinear resource requirements. Specifically, processing large datasets requires algorithms that compute answers in sublinear time, operate under tight restrictions on space (streaming algorithms) and communication (sketching algorithms), or even minimize the number of accesses to the input (sample complexity). The goal of this project is to design such techniques for fundamental data processing problems, thereby building a solid theoretical foundation for modern data analysis.
This project is focused on three main directions: sublinear-time graph algorithms and graph sketching; understanding the limits of robust graph exploration in small space; and sparse Fourier transform beyond sparsity. The first of these directions amounts to designing space-optimal algorithms for processing very large networks (e.g. community detection, clustering, finding matchings). The second direction asks for impossibility results showing that the algorithms we developed in the first direction are optimal. Such impossibility results are an integral part of our goal of `mapping the boundary of tractability', as they show us when the algorithmic results that we have are the best possible in our computational models. The last direction asks for very fast algorithms for one of the central tools of data analysis, namely the Fourier transform. Specifically, our goal in this direction is to design techniques for computing the Fourier transform that exploit structural properties of inputs that often occur in practice.
The project has resulted in several major results on central problems in sublinear algorithms and has opened several exciting new lines of inquiry that I am sure will continue to drive big data algorithms forward for years to come.