Content archived on 2024-06-18

Sparse Structured Methods for Machine Learning

Final Report Summary - SIERRA (Sparse Structured Methods for Machine Learning)

Machine learning is now a core part of many research domains, where the abundance of data has forced researchers to rely on automated information processing. In practice, machine learning techniques are today applied in two stages: practitioners first build a large set of features; then, off-the-shelf algorithms are used to solve the appropriate prediction tasks, such as classification or regression. While this approach has led to significant advances in many domains, it leaves the potential of machine learning far from fulfilled.
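The two-stage workflow described above can be sketched in a few lines. The data, feature map and learner below are illustrative choices, not taken from the project:

```python
import numpy as np

# Illustrative data: a noisy 1-D regression problem (not from the project).
rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, size=200)                    # raw inputs
y = np.sin(3.0 * x) + 0.1 * rng.standard_normal(200)    # targets

# Stage 1: hand-build a feature set (here, a polynomial expansion),
# chosen independently of the learner that follows.
def build_features(x, degree=5):
    return np.column_stack([x ** d for d in range(degree + 1)])

X = build_features(x)

# Stage 2: plug the features into an off-the-shelf learner
# (ordinary least squares via numpy).
w, *_ = np.linalg.lstsq(X, y, rcond=None)
mse = np.mean((y - X @ w) ** 2)
```

The point of the sketch is the separation: nothing about the prediction task informs the feature construction, which is exactly the decoupling the project set out to remove.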

The tenet of the SIERRA project was that, to achieve the expected breakthroughs, this two-stage paradigm should be replaced by an integrated process in which the specific structure of a problem is taken into account explicitly during learning. This makes it possible to consider massive numbers of features in ways that are both numerically efficient and theoretically well understood. The SIERRA project-team attacked this problem with the tools of regularization by sparsity-inducing norms. The scientific objective was thus to marry structure with sparsity. This was particularly challenging because structure may occur in various forms (discrete, continuous or mixed), and the targeted applications in computer vision and audio processing lead to large-scale convex optimization problems.
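As an illustration of how structure changes a sparsity-inducing penalty, the sketch below contrasts the proximal operator of the plain l1 norm (which zeroes coordinates independently) with that of a group-lasso penalty (which zeroes whole groups together). This is a generic textbook construction, not the project's own formulation; the example vector and group layout are made up:

```python
import numpy as np

def prox_l1(v, lam):
    """Soft-thresholding: proximal operator of lam * ||.||_1.
    Each coordinate is shrunk toward zero independently."""
    return np.sign(v) * np.maximum(np.abs(v) - lam, 0.0)

def prox_group_l2(v, groups, lam):
    """Proximal operator of lam * sum_g ||v_g||_2 (group lasso).
    Whole groups of coordinates are zeroed at once, encoding structure."""
    out = v.copy()
    for g in groups:
        norm = np.linalg.norm(v[g])
        scale = max(1.0 - lam / norm, 0.0) if norm > 0 else 0.0
        out[g] = scale * v[g]
    return out

v = np.array([0.5, -1.5, 0.2, 2.0, -0.1, 0.05])
groups = [np.array([0, 1]), np.array([2, 3]), np.array([4, 5])]

plain = prox_l1(v, 0.3)                      # coordinate-wise zeros
structured = prox_group_l2(v, groups, 0.3)   # group-wise zeros
```

Note the difference on coordinate 2: the l1 operator kills it because its value 0.2 is below the threshold, while the group operator keeps it (only shrunk), because its group partner at coordinate 3 is large; the weak pair at coordinates 4 and 5 is instead removed as a block.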

Throughout the project, the SIERRA project-team members provided a general and flexible framework that formalizes what can and cannot be achieved with tools from convex optimization. In particular, given explicit prior knowledge obtained from experts, we developed an automated way to transform it into a principled, stable and efficient convex optimization formulation. These theoretical developments went hand in hand with provably efficient optimization algorithms for problems with millions of features or observations. This has already enabled applications to large-scale problems in neuroimaging (brain reading in functional magnetic resonance imaging), computer vision (image denoising and category-level object recognition) and audio processing (separation of instruments and voice in musical audio tracks).
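One generic first-order method of the kind used for such large-scale sparse problems is iterative soft-thresholding (ISTA), i.e. proximal gradient descent on the lasso objective. The sketch below is a simplified illustration, not the project's own solver; the problem sizes and regularization strength are arbitrary:

```python
import numpy as np

def ista_lasso(X, y, lam, iters=500):
    """ISTA for min_w 0.5*||Xw - y||^2 + lam*||w||_1.
    Alternates a gradient step on the smooth quadratic loss with
    soft-thresholding (the proximal step for the l1 penalty)."""
    # Step size 1/L, with L = ||X||_2^2 the Lipschitz constant of the gradient.
    step = 1.0 / np.linalg.norm(X, ord=2) ** 2
    w = np.zeros(X.shape[1])
    for _ in range(iters):
        grad = X.T @ (X @ w - y)
        v = w - step * grad
        w = np.sign(v) * np.maximum(np.abs(v) - step * lam, 0.0)
    return w

# Synthetic sparse regression problem (illustrative sizes).
rng = np.random.default_rng(1)
X = rng.standard_normal((100, 50))
w_true = np.zeros(50)
w_true[:3] = [2.0, -1.5, 1.0]                 # only 3 active coordinates
y = X @ w_true + 0.01 * rng.standard_normal(100)

w_hat = ista_lasso(X, y, lam=0.5)
```

Each iteration costs only matrix-vector products, which is what makes methods of this family viable when the number of features or observations runs into the millions.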