CORDIS - EU research results
Content archived on 2024-05-29

Scalable Online Learning Systems

Final Activity Report Summary - SCOLES (Scalable online learning systems)

Classifiers are programs that assign input data to one of a set of categories. Learning classifiers automatically from examples is the subject of the multidisciplinary field of machine learning. The key properties of a learning system are: performance, which depends on how well the cost function used corresponds to the application at hand; scalability, which ensures that the memory and time complexity of learning grow gracefully with data size; and the ability to process examples online as they arrive.

The goals of the project were twofold. First, to develop scalable systems that learn online and use structured (hence more natural) costs. Second, to apply these new learning methods in computer security and bioinformatics.

The main achievements are as follows:
First, we developed a new training algorithm for linear support vector machine (SVM) classifiers. The method, called the optimised cutting plane algorithm for SVMs (OCAS), is based on an acceleration of the cutting plane algorithm. In an extensive empirical evaluation, OCAS significantly outperformed state-of-the-art SVM solvers, achieving speedups of more than 1,000 times on some datasets while obtaining the same precise support vector solution. Even in the early optimisation steps, OCAS often converges faster than the approximate online methods that prevail in this domain. We proved that the computational cost of OCAS scales gracefully with the number of examples: it grows as O(n log n), where n is the number of examples. We implemented the proposed methods in an open source library that includes support for parallel computation.
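To make the cutting plane idea concrete, the following is a minimal sketch of the building block that any cutting plane method for a linear SVM relies on: a linear lower bound on the average hinge loss, built from a subgradient at the current weight vector. It is an illustration only and not the OCAS implementation; the accelerations that distinguish OCAS are not reproduced here. The function names (`hinge_risk`, `cutting_plane`, `plane_value`) and the toy data are assumptions introduced for this example.

```python
def hinge_risk(w, X, y):
    """Average hinge loss R(w) = (1/n) * sum_i max(0, 1 - y_i * <w, x_i>)."""
    total = 0.0
    for xi, yi in zip(X, y):
        margin = yi * sum(wj * xj for wj, xj in zip(w, xi))
        total += max(0.0, 1.0 - margin)
    return total / len(X)


def cutting_plane(w_t, X, y):
    """Return (a, b) with R(w) >= <a, w> + b for all w, and equality at w_t."""
    n, d = len(X), len(w_t)
    a = [0.0] * d
    for xi, yi in zip(X, y):
        margin = yi * sum(wj * xj for wj, xj in zip(w_t, xi))
        if margin < 1.0:
            # Only margin-violating examples contribute to the subgradient.
            for j in range(d):
                a[j] -= yi * xi[j] / n
    # Choose the offset so that the plane touches R at w_t.
    b = hinge_risk(w_t, X, y) - sum(aj * wj for aj, wj in zip(a, w_t))
    return a, b


def plane_value(plane, w):
    """Evaluate the linear lower bound <a, w> + b at w."""
    a, b = plane
    return sum(aj * wj for aj, wj in zip(a, w)) + b


if __name__ == "__main__":
    # Hypothetical toy data: four 2-D points with labels in {-1, +1}.
    X = [[1.0, 2.0], [2.0, 0.5], [-1.0, -1.5], [-2.0, 0.3]]
    y = [1, 1, -1, -1]
    w_t = [0.1, 0.1]
    plane = cutting_plane(w_t, X, y)
    print("risk at w_t:      ", hinge_risk(w_t, X, y))
    print("plane at w_t:     ", plane_value(plane, w_t))
    print("risk elsewhere:   ", hinge_risk([1.0, 0.0], X, y))
    print("plane elsewhere:  ", plane_value(plane, [1.0, 0.0]))
```

A cutting plane solver collects such planes over successive iterations and minimises the regularised maximum of them, which is a much smaller problem than the original one. Because the hinge loss is convex, each plane never exceeds the true risk, so the model only tightens as planes are added.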

Second, we developed new methods for the supervised learning of so-called max-sum classifiers from example data. The max-sum classifier is a general model that subsumes, for example, structured classifiers based on Markov networks. Learning the parameters of max-sum classifiers has been a long-standing open problem because even computing the response is NP-complete. We developed polynomial-time learning algorithms for two subclasses of max-sum classifiers whose response can be computed in polynomial time. Moreover, we showed that approximate learning is possible even for a general max-sum classifier. We demonstrated the effectiveness of the proposed algorithms on real-life problems.
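As an illustration of what "computing the response" of a max-sum classifier means, the sketch below finds the labelling that maximises a sum of per-position and neighbouring-pair qualities. It assumes a chain-structured problem, a well-known case in which the maximisation can be solved exactly by dynamic programming; the report does not say which two subclasses the project studied, so this should not be read as one of them. All function names and the toy quality tables are hypothetical.

```python
from itertools import product


def labelling_score(labels, unary, pairwise):
    """Sum of unary qualities plus pairwise qualities of neighbouring labels."""
    score = sum(unary[t][k] for t, k in enumerate(labels))
    score += sum(pairwise[labels[t - 1]][labels[t]] for t in range(1, len(labels)))
    return score


def max_sum_chain(unary, pairwise):
    """Exact max-sum response on a chain via dynamic programming.

    unary[t][k]    : quality of label k at position t
    pairwise[j][k] : quality of label j being followed by label k
    Returns (best_score, best_labels). Runs in O(T * K^2) time.
    """
    T, K = len(unary), len(unary[0])
    score = list(unary[0])  # best score of any prefix ending in each label
    back = []               # back-pointers for recovering the labelling
    for t in range(1, T):
        new_score, pointers = [], []
        for k in range(K):
            best_j = max(range(K), key=lambda j: score[j] + pairwise[j][k])
            new_score.append(score[best_j] + pairwise[best_j][k] + unary[t][k])
            pointers.append(best_j)
        score = new_score
        back.append(pointers)
    last = max(range(K), key=lambda k: score[k])
    labels = [last]
    for pointers in reversed(back):
        labels.append(pointers[labels[-1]])
    labels.reverse()
    return score[last], labels


def max_sum_brute_force(unary, pairwise):
    """Reference solution that enumerates all K^T labellings."""
    T, K = len(unary), len(unary[0])
    best = max(product(range(K), repeat=T),
               key=lambda labels: labelling_score(labels, unary, pairwise))
    return labelling_score(best, unary, pairwise), list(best)


if __name__ == "__main__":
    unary = [[1, 0], [0, 2], [3, 0]]   # three positions, two labels
    pairwise = [[1, 0], [0, 1]]        # rewards equal neighbouring labels
    print(max_sum_chain(unary, pairwise))
    print(max_sum_brute_force(unary, pairwise))
```

The brute-force version makes the difficulty of the general case visible: without exploitable structure, the number of candidate labellings grows exponentially with the number of positions, whereas the chain recursion visits each position once.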

Third, we demonstrated that the proposed learning methods are useful in real-life applications, including malware classification (computer security) and DNA splice site detection (bioinformatics). For instance, we were able to train on a human splice dataset of 15 million examples (about 32 GB in size) in just 671 seconds; a competing state-of-the-art string kernel SVM required 97,484 seconds to train on 10 million examples sub-sampled from the same dataset.
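The two timings quoted above are for different training set sizes, so a rough way to compare them is by throughput (examples processed per second) as well as by wall-clock time. The short calculation below uses only the four figures stated in the report; the variable names are introduced for this example, and the comparison is indicative only, since the report summary does not describe the hardware or settings used for either run.

```python
# Figures quoted in the report summary.
proposed_examples, proposed_seconds = 15_000_000, 671
baseline_examples, baseline_seconds = 10_000_000, 97_484

# Wall-clock ratio: how many times longer the string kernel SVM run took,
# despite using a smaller training set.
time_ratio = baseline_seconds / proposed_seconds

# Throughput in training examples per second for each run.
proposed_rate = proposed_examples / proposed_seconds
baseline_rate = baseline_examples / baseline_seconds
throughput_ratio = proposed_rate / baseline_rate

print(f"wall-clock ratio:  {time_ratio:.1f}x")
print(f"proposed method:   {proposed_rate:,.0f} examples/s")
print(f"string kernel SVM: {baseline_rate:,.0f} examples/s")
print(f"throughput ratio:  {throughput_ratio:.1f}x")
```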

Fourth, we organised a challenge concerned with the scalability and efficiency of existing machine learning (ML) approaches with respect to computational, memory or communication resources. The challenge was designed to allow a fair and direct comparison of current large-scale classifiers, with the aim of answering the question "Which learning method is the most accurate given limited resources?" We provided a generic evaluation framework tailored to the specifics of the competing methods and collected a wide range of datasets. The challenge was funded by the EU under the PASCAL framework. In addition, we organised a workshop held alongside the International Conference on Machine Learning (ICML'08), where the challenge results were presented.