Scalable Similarity Search

Project Information

SSS

Grant agreement ID: 614331

Project closed

Start date 1 May 2014

End date 30 April 2019

Funded under

Specific programme: "Ideas" implementing the Seventh Framework Programme of the European Community for research, technological development and demonstration activities (2007 to 2013)

Total cost

€ 1 889 711,73

EU contribution

€ 1 889 711,73

Coordinated by

IT-UNIVERSITETET I KOBENHAVN
Denmark

Objective

Similarity search is the task of identifying, in a collection of items, the ones that are “similar” to a given
query item. This task has a range of important applications (e.g. in information retrieval, pattern
recognition, statistics, and machine learning) where data sets are often big, high dimensional, and
possibly noisy. State-of-the-art methods for similarity search offer only weak guarantees when faced with
big data. Either the space overhead is excessive (thousands of times larger than the space for the data itself),
or the work needed to report the similar items may be comparable to the work needed to go through all
items (even if just a tiny fraction of the items are similar). As a result, many applications have to resort to
the use of ad-hoc solutions with only weak theoretical guarantees.
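
To make the cost of "going through all items" concrete, here is a minimal sketch of the exhaustive baseline. It assumes items are plain lists of floats compared by cosine similarity; the function names and the threshold parameter are illustrative choices, not something the project description specifies.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def linear_scan(items, query, threshold):
    """Return the indices of all items at least `threshold`-similar to `query`.

    Every item is compared against the query, so the work grows with the
    size of the collition regardless of how few items are actually similar.
    """
    return [
        i for i, item in enumerate(items)
        if cosine_similarity(item, query) >= threshold
    ]


# Hypothetical toy data: only the first two items point roughly the same way as the query.
items = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [-1.0, 0.0]]
similar = linear_scan(items, [1.0, 0.0], threshold=0.9)
```

The scan touches all four items to report two matches; the methods discussed in this project aim to avoid that full pass without an excessive space overhead.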

This proposal aims at strengthening the theoretical foundation of scalable similarity search, and
developing novel practical similarity search methods backed by theory. In particular, we will:

- Leverage new types of embeddings that are kernelized, asymmetric, and complex-valued.

- Consider statistical models of noise in data, and design similarity search data structures whose
performance guarantees are phrased in statistical terms.

- Build a new theory of the communication complexity of distributed, dynamic similarity search,
emphasizing the communication bottleneck present in modern computing infrastructures.

The objective is to produce new methods for similarity search that are: 1) provably robust, 2) scalable
to large and high-dimensional data sets, 3) substantially more resource efficient than current
state-of-the-art solutions, and 4) able to provide statistical guarantees on query answers.

The study of similarity search has been an incubator for techniques (e.g. locality-sensitive hashing and
random projections) that have wide-ranging applications. The new techniques developed in this project
are likely to have significant impacts beyond similarity search.
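
As an illustration of the two techniques named above, the following sketch combines them in the textbook random-hyperplane form of locality-sensitive hashing for angular similarity. It is not a method from this project. The function names, the number of bits, and the single-table layout are assumptions made for the example.

```python
import random
from collections import defaultdict


def make_hyperplanes(num_bits, dim, seed=0):
    """Draw `num_bits` random Gaussian directions in `dim` dimensions."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_bits)]


def signature(vector, hyperplanes):
    """One bit per hyperplane: which side of the hyperplane the vector lies on."""
    return tuple(
        1 if sum(h * x for h, x in zip(plane, vector)) >= 0.0 else 0
        for plane in hyperplanes
    )


def build_index(items, hyperplanes):
    """Group item indices into buckets keyed by their bit signature."""
    buckets = defaultdict(list)
    for i, item in enumerate(items):
        buckets[signature(item, hyperplanes)].append(i)
    return buckets


def candidates(buckets, query, hyperplanes):
    """Return indices of items that share the query's bucket (possibly none)."""
    return buckets.get(signature(query, hyperplanes), [])


# Hypothetical usage with toy three-dimensional data.
planes = make_hyperplanes(num_bits=8, dim=3, seed=1)
items = [[1.0, 2.0, 3.0], [-1.0, 0.5, 0.0]]
index = build_index(items, planes)
hits = candidates(index, [2.0, 4.0, 6.0], planes)  # same direction as items[0]
```

For a single random hyperplane, two vectors at angle θ receive the same bit with probability 1 - θ/π, so vectors pointing in similar directions tend to land in the same bucket. A query then inspects only its own bucket instead of the whole collection. Practical schemes typically use several such tables to reduce the chance of missing a similar item.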

Fields of science (EuroSciVoc)

CORDIS classifies projects with EuroSciVoc, a multilingual taxonomy of fields of science, through a semi-automatic process based on NLP techniques. See: The European Science Vocabulary.

Programme(s)

Multi-annual funding programmes that define the EU’s priorities for research and innovation.

FP7-IDEAS-ERC - Specific programme: "Ideas" implementing the Seventh Framework Programme of the European Community for research, technological development and demonstration activities (2007 to 2013)

Topic(s)

Calls for proposals are divided into topics. A topic defines a specific subject or area for which applicants can submit proposals. The description of a topic comprises its specific scope and the expected impact of the funded project.

ERC-CG-2013-PE6 - ERC Consolidator Grant - Computer Science and Informatics

Call for proposal

Procedure for inviting applicants to submit project proposals, with the aim of receiving EU funding.

ERC-2013-CoG

Funding Scheme

Funding scheme (or “Type of Action”) inside a programme with common features. It specifies: the scope of what is funded; the reimbursement rate; specific evaluation criteria to qualify for funding; and the use of simplified forms of costs like lump sums.

ERC-CG - ERC Consolidator Grants

Host institution

IT-UNIVERSITETET I KOBENHAVN

EU contribution

€ 1 889 711,73

Address

RUED LANGGAARDSVEJ 7
2300 KOBENHAVN
Denmark

Region

Danmark > Hovedstaden > Byen København

Activity type

Higher or Secondary Education Establishments

Beneficiaries (1)

IT-UNIVERSITETET I KOBENHAVN

Denmark

EU contribution

€ 1 889 711,73
