News Agencies Multilingual Information Categorisation

Objective

NAMIC's main objective is to develop, and bring to a marketable stage, advanced NLP technologies for multilingual news customisation and broadcasting across distributed services. The project will develop an intelligent system for this purpose within three main user business cases (Financial Times, ANSA and EFE) in three EU languages (English, Spanish and Italian). A specific component of the system will support the management of user profiles that define, at a conceptual level, the information relevant to final customers. This core knowledge will guide a user-driven automatic enrichment in which all and only the information relevant to users serves as the trigger for monolingual hypertextual linking of texts. Cross-linguistic linking among news items in different languages will be supported by content-based alignment methods. An intelligent front end will provide NL-based querying and hypertextual, multilingual browsing within the resulting news networks. The consortium includes EU research centres in NLP and User Modelling, industrial partners committed to acquiring and exploiting advanced technological know-how, and information providers acting as final users and testers of the developed technology. Specific emphasis has been placed on the adoption of international standards in the area of news classification (XML and IPTC).

Every day, news agencies distribute news that is often provided by other news agencies, usually in different languages and according to different classification standards. The mismatch arises at the language level, as different languages are used, and at the conceptual level, as news is organised according to diverging schemes. NAMIC aims to exploit state-of-the-art language technologies for content-driven news classification and user-driven enrichment (based on the detection of information relevant to the user via advanced Information Extraction methods), for mono- and cross-lingual automatic authoring, and to build a multilingual hypertextual browsing space (i.e. hypernews) from user source repositories.

Work description:
NAMIC will develop an intelligent system for multilingual news customisation and broadcasting across distributed services within three main user business cases (ANSA, EFE and Financial Times) in three EU languages (English, Spanish and Italian). A specific component of the system will support the management of user profiles that define, at a conceptual level, the information relevant to final customers. This core knowledge will guide a user-driven automatic enrichment in which all and only the information relevant to users serves as the trigger for monolingual hypertext linking of texts. Cross-linguistic linking among news items in different languages will be supported by content-based alignment methods. An intelligent front end will provide NL-based querying and hypertextual, multilingual browsing within the resulting news networks.
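
As a rough illustration of the profile-driven linking idea described above, the following Python sketch links news items that share at least one concept listed in a user profile, and labels each link as monolingual or cross-lingual. All names here (`NewsItem`, `link_items`, the concept labels and item identifiers) are hypothetical and not taken from the project. The sketch also assumes each item already carries language-independent concept labels, which stands in for the Information Extraction and content-based alignment steps that NAMIC itself would have to perform.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class NewsItem:
    item_id: str
    language: str        # e.g. "en", "es", "it"
    concepts: frozenset  # assumed language-independent concept labels


def link_items(items, profile):
    """Link each pair of items sharing at least one profile concept."""
    links = []
    for a, b in combinations(items, 2):
        # Only concepts that appear in the user profile may trigger a link.
        shared = a.concepts & b.concepts & profile
        if shared:
            kind = "monolingual" if a.language == b.language else "cross-lingual"
            links.append((a.item_id, b.item_id, kind, sorted(shared)))
    return links


# Hypothetical profile and items, for illustration only.
profile = frozenset({"merger", "interest-rate"})
items = [
    NewsItem("ft-1", "en", frozenset({"merger", "banking"})),
    NewsItem("ansa-7", "it", frozenset({"merger", "football"})),
    NewsItem("efe-3", "es", frozenset({"football"})),
    NewsItem("ft-2", "en", frozenset({"merger", "interest-rate"})),
]

for link in link_items(items, profile):
    print(link)
```

In this toy setting the two items that share only "football" are not linked, because that concept is absent from the profile; this mirrors the "all and only the information relevant to users" criterion in a very simplified form.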

The NAMIC project is planned to last 24 months, with activities divided into three main phases:
1. Analysis (user and software requirements, overall architectural design definition; months 1-6).
2. Development (two releases of the software tools and implementation of the integrated prototype; months 7-20).
3. Pilot validation at the user sites (months 21-24).

Technical and administrative management will be active throughout the whole project life cycle, as will the dissemination and exploitation of the project results.

The work plan has been broken down into eleven work packages:
WP1 and WP2 are devoted to the definition of the user requirements;
WP3 to WP7 cover the development of the individual NAMIC components;
WP8 focuses on the integration of the overall prototype;
WP9 is specifically designed to carry out the validation of the final prototype at user sites;
WP10 deals with exploitation and dissemination; and finally
WP11 is devoted to project management.

Milestones:
GM1 - User Requirements (Month 3)
GM2 - Software Requirements, Research Reports and Overall Architectural Design (Month 6)
GM3 - Release of Linguistic (English, Italian, Spanish) Processors and First Release of the User Profile Manager (Month 10)
GM4 - Release of Automatic Authoring tools, Cross-Linguistic Linker, Hypernews Engine (Month 13)
GM5 - User Validation Feedback First Report (Month 15)
GM6 - Second Release of the Software components (Month 18)
GM7 - Release of NAMIC prototype and Start of the Pilot Validation Phase (Month 20)
GM8 - Final Reports (results of the Validation activities) (Month 24).
1. A set of semantic models, and OSMOS construction specific API.
2. A set of information repositories, toolkits and plug-ins enabling commercial and proprietary systems to participate in the OSMOS virtual enterprise.
3. Two OSMOS teamwork prototype services (Finland, France).
4. A set of process, technical, and business reports.

Coordinator

AGENZIA ANSA S.C.R.A.L.
Address
Via Della Dataria 94
00187 Roma
Italy

Participants (8)

AGENCIA EFE, S.A.
Spain
Address
Calle Esproneda 32
28003 Madrid

COMITE INTERNATIONAL DES TELECOMMUNICATIONS DE PRESSE
United Kingdom
Address
Hq Royal Albert House, Sheet Street 8
SL4 1BE Windsor

FINANCIAL TIMES LIMITED
United Kingdom
Address
One Southwark Bridge
SE1 9HL London

KNOWLEDGE STONES S.P.A.
Italy
Address
Via Cristoforo Colombo 456
00145 Roma

THE UNIVERSITY OF SHEFFIELD
United Kingdom
Address
Firth Court, Western Bank
S10 2TN Sheffield

UNIVERSITA DEGLI STUDI DI ROMA "TOR VERGATA"
Italy
Address
Via Orazio Raimondo 18
00173 Roma

UNIVERSITAT POLITECNICA DE CATALUNYA
Spain
Address
Jordi Girona 31
08034 Barcelona

VRIJE UNIVERSITEIT BRUSSEL
Belgium
Address
Pleinlaan 2
Brussel