Trading Quantity for Quality in Machine Learning

Final Report Summary - QQML (Trading Quantity for Quality in Machine Learning)

Machine learning was born in an era when most datasets were small, low-dimensional, and built on carefully hand-crafted features. However, recent years have seen a dramatic change in the nature of typical machine learning tasks: these are now routinely performed on huge, web-scale datasets, with data quantity no longer being a major bottleneck. On the flip side, the large-scale, automated data-gathering methods used to create such massive datasets often go hand-in-hand with mediocre quality of individual data items. This data quality problem can hamper standard learning algorithms, despite the availability of more data. A related issue is the quality of available features: with more data, we are in a position to tackle harder tasks - particularly in AI-related areas such as computer vision and natural language processing. However, it is also becoming increasingly hard to hand-craft good features for such tasks, and much recent research is devoted to automatically learning higher-quality, multi-level representations of the data.

The objective of this project is to study how increasing data quantity can be used to improve or compensate for poor data quality, provably and efficiently. In particular, our goal is to study how to use large-scale, low-quality datasets to achieve the same learning performance as if we had a high-quality but more moderately sized dataset. We plan to explore several important settings where we believe such a trade-off can be obtained, using a theoretically principled approach. These include (1) Learning deep data representations, which capture complex and high-level features; (2) Learning from incomplete data, where some or even most of the data is missing; and (3) Bandit learning and optimization, which capture learning and decision making under uncertainty. Our research plan builds on concrete preliminary results and several novel ideas, which are outlined as part of the proposal.

With the successful completion of the project, we have made the following contributions:
- In the context of deep representations and deep learning, we provided new results on how to efficiently train neural networks, an enormously popular yet poorly understood class of learning algorithms. We also did pioneering work on the benefits of depth, provably showing that certain functions require deep networks in order to be represented compactly. Moreover, we showed that this applies to natural functions, which are of practical relevance and can be learned using existing methods. In addition, we identified and studied natural situations where deep learning actually does not work well, as well as potential remedies. We also delved into related important problems involving non-convex optimization, such as PCA, developing fundamentally new algorithms.
- In the context of learning with incomplete data, we developed new algorithms to tackle learning with missing features at training time; learning from data distributed across several machines; and handling partial information in the context of kernel learning and spectral learning. More recently, we developed a novel approach and analysis for distributed learning, which simultaneously achieves near-optimal runtime, communication costs and sample complexity, at least for certain learning problems. The approach is based on without-replacement sampling of data instances, and its analysis is of independent interest in the context of stochastic gradient methods. Finally, an interesting offshoot of our distributed learning work has been the first-ever lower bounds for second-order methods in convex optimization, as well as tight lower bounds for learning linear predictors with the squared loss.
- In the context of bandit learning and optimization, we made several contributions to the theory of this field, characterizing the attainable performance limits, and disproving some well-known conjectures and commonly-held beliefs. More recently, we introduced additional, practically motivated partial information settings, such as multi-armed bandits with multiple users, bandit learning with smooth losses across arms, as well as online learning where the learner can leverage the ability to control the order of predictions she is required to provide, in order to acquire more information on future examples. We also studied the setting of bandits with two-point feedback, obtaining simpler and essentially tight results compared to previous work.
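
The without-replacement idea mentioned above can be illustrated with a minimal sketch: instead of sampling data points i.i.d. with replacement, each epoch of stochastic gradient descent sweeps over a fresh random permutation of the data, so every instance is used exactly once per pass. The function name and toy data below are our own illustration, not the project's specific algorithm:

```python
import numpy as np

def sgd_without_replacement(X, y, lr=0.1, epochs=20, seed=0):
    """Least-squares SGD where each epoch processes the data in a fresh
    random permutation (without-replacement sampling), rather than
    drawing points i.i.d. with replacement."""
    rng = np.random.default_rng(seed)
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        for i in rng.permutation(len(y)):  # each point used exactly once per epoch
            grad = (X[i] @ w - y[i]) * X[i]  # gradient of 0.5 * (x.w - y)^2
            w -= lr * grad
    return w

# Toy usage: recover a planted linear predictor from noiseless data.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 3))
w_true = np.array([1.0, -2.0, 0.5])
y = X @ w_true
w_hat = sgd_without_replacement(X, y)
```

On this noiseless toy problem the iterates converge to the planted predictor; the project's analysis concerns the more delicate question of why such permutation-based passes can match i.i.d. sampling guarantees while fitting naturally into distributed implementations.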
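
For learning with missing features, one standard device (shown here only as an illustrative sketch, not necessarily the project's method) is inverse-probability weighting: if each feature is observed independently with probability p, rescaling the observed entries by 1/p yields an unbiased reconstruction of the full feature vector, which can then feed unbiased gradient estimates:

```python
import numpy as np

def ipw_impute(x_obs, mask, p):
    """Unbiased reconstruction of a feature vector when each feature is
    observed independently with probability p: rescale observed entries
    by 1/p (inverse-probability weighting), so that E[x_hat] = x."""
    return np.where(mask, x_obs / p, 0.0)

# Toy check: averaging many masked-and-reweighted copies of x recovers x.
rng = np.random.default_rng(0)
x = np.array([1.0, -2.0, 3.0])
p = 0.5
samples = []
for _ in range(50000):
    mask = rng.random(3) < p  # which features we get to see this round
    samples.append(ipw_impute(np.where(mask, x, 0.0), mask, p))
recon = np.mean(samples, axis=0)
```

The price of unbiasedness is extra variance (scaling with 1/p), which is exactly the kind of quantity-for-quality trade-off the project studies: more low-quality samples compensate for the noise each one introduces.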

Overall, the research performed provides a well-rounded contribution to the theory and practice of learning in the face of low-quality, complex and partial data. Since such problems are ubiquitous in data analysis and machine learning, these contributions can have high impact.