CORDIS
EU research results

Cross-lingual Knowledge Extraction

Objective

The goal of the X-LIKE project is to develop technology to monitor and aggregate knowledge that is currently spread across global mainstream and social media, and to enable cross-lingual services for publishers, media monitoring and business intelligence.

In terms of research contributions, the aim is to combine insights from several scientific areas to advance cross-lingual text understanding. By combining modern computational linguistics, machine learning, text mining and semantic technologies, we plan to address the following two key open research problems:

- to extract and integrate formal knowledge from multilingual texts with cross-lingual knowledge bases, and
- to adapt linguistic techniques and crowdsourcing to deal with irregularities in the informal language used primarily in social media.

As an interlingua, knowledge resources from the Linked Open Data cloud (http://linkeddata.org/) will be used, with a special focus on the general common-sense knowledge base CycKB (http://www.cyc.com/). For languages for which the required linguistic resources are not available, we will use a probabilistic interlingua representation trained on a comparable corpus drawn from Wikipedia.

The solution will be applied to two case studies, both from the area of news. For the Bloomberg case study the domain will be financial news, while for the Slovenian Press Agency we will deal with general news. The technology developed in the project will be used to introduce cross-lingual information and information from social media into services for publishers and end-users in the areas of summarization, contextualization, personalization, and plagiarism detection. Special attention will be paid to analysing news reporting bias across multilingual sources. The developed technology will be language-agnostic, while within the project we will specifically address English, German, Spanish, and Chinese as major world languages, and Catalan and Slovenian as minority languages.

Coordinator

INSTITUT JOZEF STEFAN

Address

Jamova 39
1000 Ljubljana

Slovenia

Activity type

Higher or Secondary Education Establishments

EU Contribution

€ 989 487

Administrative Contact

JADRAN LENARCIC (Prof.)

Participants (7)

TSINGHUA UNIVERSITY

China

EU Contribution

€ 193 599

KARLSRUHER INSTITUT FUER TECHNOLOGIE

Germany

EU Contribution

€ 670 366

INTELLIGENT SOFTWARE COMPONENTS S.A.

Spain

EU Contribution

€ 368 820

UNIVERSITAT POLITECNICA DE CATALUNYA

Spain

EU Contribution

€ 504 057

SVEUCILISTE U ZAGREBU

Croatia

EU Contribution

€ 334 800

SLOVENSKA TISKOVNA AGENCIJA DOO

Slovenia

EU Contribution

€ 205 201

BLOOMBERG LP

United States

EU Contribution

€ 283 670

Project information

Grant agreement ID: 288342

Status

Closed project

  • Start date

    1 January 2012

  • End date

    31 December 2014

Funded under:

FP7-ICT

  • Overall budget:

    € 4 745 468

  • EU contribution

    € 3 550 000

Coordinated by:

INSTITUT JOZEF STEFAN

Slovenia