Scalable Online Learning Systems

Project information

SCOLES

Grant agreement ID: 42107

Closed project

Start date 1 November 2006

End date 31 October 2008

Funded under

Human resources and Mobility in the specific programme for research, technological development and demonstration "Structuring the European Research Area" under the Sixth Framework Programme 2002-2006

Total cost

No data

EU contribution

€ 149 692,00

Coordinated by

FRAUNHOFER-GESELLSCHAFT ZUR FORDERUNG DER ANGEWANDTEN FORSCHUNG E.V.
Germany

Final Activity Report Summary - SCOLES (Scalable online learning systems)

Classifiers are programs that assign input data to one of a set of categories. Learning classifiers automatically from examples is the subject of the multidisciplinary field of machine learning. The key properties of a learning system are: performance, which depends on how well the cost function used corresponds to the application at hand; scalability, meaning that the memory and time complexity of learning grow gracefully with data size; and the ability to process examples online as they arrive.

The goals of the project were twofold. First, to develop scalable systems that learn online and use structured (hence more natural) costs. Second, to apply these new learning methods in computer security and bioinformatics.

The main achievements are as follows:
First, we developed a new training algorithm for linear support vector machine (SVM) classifiers. The method, called the optimised cutting plane algorithm for SVMs (OCAS), is based on accelerating the cutting plane algorithm. In an extensive empirical evaluation, OCAS significantly outperformed current state-of-the-art SVM solvers, achieving speedups of more than a factor of 1,000 on some datasets while obtaining the same precise support vector solution. Even in the early optimisation steps, OCAS often converges faster than the approximate online methods that prevail in this domain. We proved that the computational cost of OCAS scales gracefully with the number of examples; in particular, it scales as O(n log n), where n is the number of examples. We implemented the proposed methods in an open-source library that includes support for parallel computation.
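To make the term "cutting plane" concrete, the sketch below computes a single cutting plane of the average hinge loss of a linear classifier. It is only the basic building block shared by cutting plane methods, not the OCAS algorithm itself, whose acceleration steps are not described in this summary. The function names, the toy data and the choice of the hinge loss without a bias term are assumptions made for illustration.

```python
from typing import List, Sequence, Tuple


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    """Inner product of two equally long vectors."""
    return sum(a * b for a, b in zip(u, v))


def hinge_risk(w: Sequence[float], X: Sequence[Sequence[float]], y: Sequence[int]) -> float:
    """Average hinge loss R(w) = (1/n) * sum_i max(0, 1 - y_i * <w, x_i>)."""
    return sum(max(0.0, 1.0 - yi * dot(w, xi)) for xi, yi in zip(X, y)) / len(X)


def cutting_plane(
    w: Sequence[float], X: Sequence[Sequence[float]], y: Sequence[int]
) -> Tuple[List[float], float]:
    """Return (a, b) with R(v) >= <a, v> + b for every v, and equality at v = w.

    a is a subgradient of R at w; b is chosen so the plane touches R at w.
    """
    n = len(X)
    a = [0.0] * len(w)
    for xi, yi in zip(X, y):
        if yi * dot(w, xi) < 1.0:  # only margin violators contribute
            for j, xij in enumerate(xi):
                a[j] -= yi * xij / n
    b = hinge_risk(w, X, y) - dot(a, w)
    return a, b


# Hypothetical toy data: four 2-D points with labels +1 / -1.
X = [[1.0, 2.0], [2.0, -1.0], [-1.0, -1.5], [-2.0, 0.5]]
y = [1, 1, -1, -1]
w = [0.1, 0.0]

a, b = cutting_plane(w, X, y)
print("plane:", a, b)
print("risk at w:", hinge_risk(w, X, y), "plane at w:", dot(a, w) + b)
```

A cutting plane method collects such planes at successive iterates and minimises the regulariser plus the maximum of the collected planes, which is a piecewise-linear lower bound on the true risk.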

Second, we developed new methods for supervised learning of so-called max-sum classifiers from example data. The max-sum classifier is a general model that subsumes, for example, structured classifiers based on Markov networks. Learning the parameters of max-sum classifiers has been a long-standing open problem because, in the general case, even computing the classifier's response is NP-complete. We developed polynomial-time learning algorithms for two subclasses of max-sum classifiers whose response can be computed in polynomial time. Moreover, we showed that approximate learning is possible even for a general max-sum classifier. We demonstrated the effectiveness of the proposed algorithms on real-life problems.
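As an illustration of why structure matters for computing the response, the sketch below solves a max-sum problem on a chain by dynamic programming and compares it with exhaustive search. The chain is chosen here only because it is a well-known tractable case; the summary does not say which two subclasses the project studied, so this should not be read as one of them. The function names and the toy quality values are hypothetical.

```python
from itertools import product
from typing import List, Sequence, Tuple

Table = Sequence[Sequence[float]]


def chain_max_sum(unary: Table, pairwise: Table) -> Tuple[List[int], float]:
    """Maximise sum_t unary[t][y_t] + sum_t pairwise[y_t][y_{t+1}] over labelings.

    Runs in O(T * K^2) time for T positions and K labels.
    """
    T, K = len(unary), len(unary[0])
    score = list(unary[0])  # best value of a prefix ending in each label
    back: List[List[int]] = []
    for t in range(1, T):
        new_score, ptr = [], []
        for k in range(K):
            best_j = max(range(K), key=lambda j: score[j] + pairwise[j][k])
            new_score.append(score[best_j] + pairwise[best_j][k] + unary[t][k])
            ptr.append(best_j)
        score = new_score
        back.append(ptr)
    last = max(range(K), key=lambda k: score[k])
    labels = [last]
    for ptr in reversed(back):  # follow back-pointers to recover the labeling
        labels.append(ptr[labels[-1]])
    labels.reverse()
    return labels, score[last]


def brute_force_max_sum(unary: Table, pairwise: Table) -> Tuple[List[int], float]:
    """Same objective by enumerating all K^T labelings (exponential time)."""
    T, K = len(unary), len(unary[0])

    def total(lab: Sequence[int]) -> float:
        return sum(unary[t][lab[t]] for t in range(T)) + sum(
            pairwise[lab[t]][lab[t + 1]] for t in range(T - 1)
        )

    best = max(product(range(K), repeat=T), key=total)
    return list(best), total(best)


# Hypothetical toy problem: 3 positions, 2 labels, neighbours prefer equal labels.
unary = [[1.0, 0.0], [0.0, 0.5], [0.2, 0.0]]
pairwise = [[0.5, -0.5], [-0.5, 0.5]]

labels, value = chain_max_sum(unary, pairwise)
print(labels, value)
```

On a chain the dynamic programme visits each position once, whereas exhaustive search grows exponentially with the number of positions; for general graph structures no such shortcut is available.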

Third, we demonstrated that the proposed learning methods are useful in real-life applications, including malware classification (computer security) and DNA splice site detection (bioinformatics). For instance, we were able to train on a human splice dataset of 15 million examples (about 32 GB in size) in just 671 seconds; a competing state-of-the-art string kernel SVM required 97,484 seconds to train on 10 million examples sub-sampled from the same dataset.
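The two timings are not directly comparable because the training sets differ in size, so it can help to normalise them. The short calculation below uses only the four figures quoted above; the variable names are illustrative. It gives roughly a 145-fold difference in wall-clock time and roughly a 218-fold difference in examples processed per second.

```python
# Figures quoted in the report summary.
proposed_examples, proposed_seconds = 15_000_000, 671
baseline_examples, baseline_seconds = 10_000_000, 97_484

# Examples processed per second of training time.
proposed_rate = proposed_examples / proposed_seconds
baseline_rate = baseline_examples / baseline_seconds

# Raw wall-clock ratio (ignores the difference in dataset size).
wall_clock_ratio = baseline_seconds / proposed_seconds

# Throughput ratio (accounts for the difference in dataset size).
throughput_ratio = proposed_rate / baseline_rate

print(f"proposed method: {proposed_rate:,.0f} examples/s")
print(f"string kernel SVM: {baseline_rate:,.1f} examples/s")
print(f"wall-clock ratio: {wall_clock_ratio:.1f}x")
print(f"throughput ratio: {throughput_ratio:.1f}x")
```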

Fourth, we organised a challenge concerned with the scalability and efficiency of existing machine learning approaches with respect to computational, memory or communication resources. The challenge was designed to allow a fair and direct comparison of current large-scale classifiers, with the aim of answering the question "Which learning method is the most accurate given limited resources?" We provided a generic evaluation framework tailored to the specifics of the competing methods and collected a wide range of datasets. The challenge was funded by the EU under the PASCAL framework. In addition, we organised a workshop held in conjunction with the International Conference on Machine Learning (ICML'08), where the challenge results were presented.
