
Learning and Testing Structured Probability Distributions

Periodic Report Summary 1 - LTSPD (Learning and Testing Structured Probability Distributions)

We live in an era of "big data," where the amount of data that can be brought to bear on questions in biology, economics, and other fields is vast and expanding rapidly. In many domains, the majority of available data comes in raw and unlabeled form. This project has been advancing an ambitious research program to develop efficient unsupervised learning algorithms for a wide range of probabilistic models, bringing together techniques and insights from theoretical computer science, probability theory, and statistics.

The research has focused on sublinear-time algorithms, that is, algorithms whose running time is significantly smaller than the domain size of the underlying distributions. We have developed sublinear-time algorithms for estimating various classes of both continuous and discrete distributions over very large domains. These include optimal algorithms for estimating probability distributions whose probability density function satisfies various natural types of "shape restrictions," as well as sums of simple random variables and mixtures of structured distributions.
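To make the idea of a shape restriction concrete, here is a minimal Python sketch of one classical approach for a non-increasing distribution over {0, ..., n-1}: group the domain into buckets whose widths grow geometrically, then spread the empirical mass of each bucket evenly over its points. It is an illustration only, not necessarily one of the algorithms developed in this project. The function names and the parameter eps are hypothetical, and no accuracy guarantee is claimed for the code as written.

```python
import bisect

def geometric_buckets(n, eps):
    """Exclusive right endpoints of buckets over {0..n-1}.

    Bucket widths grow roughly by a factor (1 + eps), so the number of
    buckets grows only logarithmically with n (illustrative assumption).
    """
    ends = []
    start = 0
    width = 1.0
    while start < n:
        end = min(n, start + max(1, int(width)))
        ends.append(end)
        start = end
        width *= 1.0 + eps
    return ends

def learn_monotone(samples, n, eps):
    """Flattened empirical histogram: list of (start, end, per-point probability)."""
    ends = geometric_buckets(n, eps)
    counts = [0] * len(ends)
    for x in samples:
        # Index of the first bucket whose exclusive right endpoint exceeds x.
        counts[bisect.bisect_right(ends, x)] += 1
    m = len(samples)
    starts = [0] + ends[:-1]
    return [(s, e, c / (m * (e - s))) for s, e, c in zip(starts, ends, counts)]

if __name__ == "__main__":
    buckets = geometric_buckets(10**6, 0.5)
    print(len(buckets), "buckets cover a domain of size 10**6")
```

The work done per sample and the size of the resulting hypothesis depend on the number of buckets rather than on n, which is why such estimators can run in time far below the domain size.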

Highly efficient algorithms for these estimation tasks may play an important role in the next generation of large-scale machine learning applications. The Career Integration Grant has greatly supported the researcher's career aspirations by allowing him to build solid collaborative relationships with the algorithms community in Europe.