AVENTINUS will underpin European police and government law-enforcement in the drugs field. It will enhance existing information systems for prevention and detection of offences, also providing multilingual facilities. This user-friendly application will support the collection and analysis of composite data, from a variety of sources and in several languages. Investigators will interrogate the data in their own language, and the search result will be profiled and presented in that language. It will integrate information indexing and extraction over distributed sources, accurate information retrieval, and multilingual and multimedia data handling techniques.
Progress on the Project
The user requirement study has established that police and intelligence units need to be supported in a number of ways:
- rough translation of foreign language documents to assess if they are worth accurate translation
- assistance in processing documents so as to assimilate relevant information into local systems
- aid to analysts in formulating searches, by offering additional search terms to expand a query and through natural language access to structured and textual databases
- access to foreign language databases by translating the search request and re-translating the texts found
To provide this support, AVENTINUS focuses on three main areas:
- translation support, integrating term substitution components, translation memory technology, and machine translation.
- information processing support with both information extraction (fact extraction) and intelligent indexing. Information extraction finds entities related to illegal activities (e.g. names of suspect organisations and individuals, narcotics, places, etc.), composing templates and scenarios, to be recorded in structured databases. Intelligent indexing identifies and links key words and phrases to textual documents.
- search support by searching names (dealing with problems of name similarity, transliteration etc.).
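The name search problem mentioned above (similar spellings and transliteration variants of the same name) can be illustrated with a minimal sketch. It is not the AVENTINUS component: the function names, the diacritic-stripping step and the 0.7 threshold are assumptions chosen for illustration, and the similarity measure is the generic one from Python's standard `difflib` module.

```python
import unicodedata
from difflib import SequenceMatcher


def normalise_name(name: str) -> str:
    """Lower-case a name, strip diacritics and collapse whitespace."""
    decomposed = unicodedata.normalize("NFKD", name)
    no_marks = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return " ".join(no_marks.lower().split())


def name_similarity(a: str, b: str) -> float:
    """Return a similarity ratio in [0, 1] between two normalised names."""
    return SequenceMatcher(None, normalise_name(a), normalise_name(b)).ratio()


def find_similar_names(query: str, candidates: list[str], threshold: float = 0.7) -> list[str]:
    """Return candidates at or above the (assumed) threshold, best match first."""
    scored = [(name_similarity(query, c), c) for c in candidates]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [c for score, c in scored if score >= threshold]
```

A call such as `find_similar_names("Mohammed", ["Muhammad", "Mohamed", "Smith"])` would rank the spelling variants and drop the unrelated name. A production system would more likely use language-specific transliteration rules or phonetic keys than plain string similarity.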
These components have been specified in close co-operation with the users. Prototypes and a common user interface now exist for most of them. Most low level tools are in place; linguistic resources for the drug domain have been collected by the users and are being reviewed and completed.
Key Technical Areas
Translation Tools have been identified:
- the GMS T1 machine translation system, initially covering de<->en and es<->en translations. Easy import of AVENTINUS terminology will be supported. Raw machine translation gives the best comprehension of an incoming text.
- the ILSP PC-TM translation memory, enriched by named entity recognition modules, and used for translating standardised texts, such as police reports; translation memory technology will achieve good quality translation in such cases.
- a term substitution component which indicates native language terms in a foreign language document. This will be used in languages where machine translation is not currently available.
These tools cover English, Spanish, German, and some Swedish.
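As an illustration of the term substitution idea (indicating native-language terms inside a foreign-language document), the following is a minimal sketch. The glossary contents, the bracket annotation style and the function name are assumptions made for the example; the source does not describe how the actual component presents its output.

```python
import re


def substitute_terms(text: str, glossary: dict[str, str]) -> str:
    """Annotate each known foreign term with its native-language equivalent in brackets."""
    if not glossary:
        return text
    # Longest terms first, so multi-word entries win over their sub-terms.
    terms = sorted(glossary, key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(t) for t in terms) + r")\b",
        re.IGNORECASE,
    )
    lookup = {term.lower(): native for term, native in glossary.items()}
    return pattern.sub(
        lambda m: f"{m.group(0)} [{lookup[m.group(0).lower()]}]",
        text,
    )
```

With a hypothetical German-to-English glossary `{"Rauschgift": "narcotics", "Zoll": "customs"}`, the sentence "Der Zoll fand Rauschgift." would be returned with the English equivalents inserted after the two known terms, leaving the rest of the text untouched.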
Information processing tools have been identified:
- an open architecture, GATE, has been created for developing the information extraction components (recognition of names of persons, organisations, places, narcotics etc., and their combination into templates and scenarios), in a multi-lingual environment
- intelligent indexing and retrieval components to support native language searches on foreign language data (both structured and unstructured), as well as post-search tools (passage retrieval, ranking, re-translation of relevant items)
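The recognition of names and their combination into templates can be pictured with a deliberately simple gazetteer lookup. This sketch does not use or imitate the GATE architecture; the function name, the entity types and the list-based gazetteer are assumptions for illustration only.

```python
import re


def extract_entities(text: str, gazetteer: dict[str, list[str]]) -> dict[str, list[str]]:
    """Fill a flat template: for each entity type, list the known names found in the text."""
    template: dict[str, list[str]] = {}
    for entity_type, names in gazetteer.items():
        found = []
        for name in names:
            # Whole-word, case-insensitive match against the gazetteer entry.
            if re.search(r"\b" + re.escape(name) + r"\b", text, re.IGNORECASE):
                found.append(name)
        template[entity_type] = found
    return template
```

Given a hypothetical gazetteer with the types `narcotic` and `place`, a sentence such as "Cocaine was shipped via Rotterdam." would yield one entry under each type. Real information extraction components would add grammar rules and context to handle names that are not in any list.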
User Group, Promotion and Awareness
The Aventinus User Group (AVUS), consisting of members of the intelligence community and of several police organisations, has contributed to project progress:
- the user requirements report, including the definition of the two scenarios just presented
- collection of a data pool, comprising texts, terminology, bitmaps, and other material, for training and test purposes
- definition of contextual aspects, such as legal aspects and security issues which are relevant to the integration of AVENTINUS modules into the users' proprietary systems
- active participation in test and evaluation planning
AVENTINUS has been presented at several police and scientific conferences as well as internally in the German Bundeskanzleramt, in Europol, and in the European Commission. A series of presentations is planned for next year, to police and intelligence conferences in Europe, the US and South East Asia.
The Way Ahead
In the short term the project will focus on:
- completing the AVENTINUS specifications
- extending the user group to widen the range of languages supported; the user requirements report identifies the need for many more languages than the current ones to be supported, including Arabic, Farsi, Russian, and Chinese
- researching contextual issues, such as messaging, intelligence systems, and security aspects.
The main tasks for the future will be integration, testing and documentation to produce the specified toolkit. The tools will then be installed at user sites for training and evaluation.
In the particular application targeted, users will be governmental institutions which define the requirements, and later on evaluate the results, in the specific application area. In later phases, an extension to other domains and other user groups is envisaged.
There is a public need for this technology, in the first place to improve existing workflows. However, there is also a large demand for multilingual text and information processing, given that worldwide integration forces business to become global. Aventinus will provide the basic building blocks for such a market.
The project aims at setting up a system which allows users to query information bases of different types (structured, textual, multimedia) and of different languages in their native language. The system supports the search request formulation by offering help tools (alternative search terms, variants of names, etc.). The search is then decomposed and routed to the respective information sources; this step implies a translation of the search request. The search result is recomposed and retranslated (using translation memory or automatic translation tools). Display is performed in the users' native language again.
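The search flow described above (expand the request, translate it for each source, route it, then recompose and retranslate the results) can be sketched as follows. This is an assumption-laden illustration, not the project's design: `Source`, `federated_search`, the `translate` callable and the substring search are all hypothetical stand-ins for the real retrieval and translation components.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical translator signature: (text, source_lang, target_lang) -> text
Translator = Callable[[str, str, str], str]


@dataclass
class Source:
    name: str
    lang: str
    documents: list[str]

    def search(self, term: str) -> list[str]:
        """Naive case-insensitive substring search over this source's documents."""
        return [d for d in self.documents if term.lower() in d.lower()]


def federated_search(
    query: str,
    user_lang: str,
    sources: list[Source],
    translate: Translator,
    expand: Callable[[str], list[str]] = lambda q: [q],
) -> list[tuple[str, str]]:
    """Return (source name, document in the user's language) pairs, without duplicates."""
    results: list[tuple[str, str]] = []
    for term in expand(query):  # request formulation: alternative search terms
        for source in sources:  # decomposition and routing to each source
            local_term = translate(term, user_lang, source.lang)
            for doc in source.search(local_term):
                # Retranslate each hit into the user's native language.
                hit = (source.name, translate(doc, source.lang, user_lang))
                if hit not in results:
                    results.append(hit)
    return results
```

The `translate` argument is where a translation memory or machine translation system would be plugged in, and `expand` is where name variants or alternative terms would be supplied.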
The following technologies will be used: an object-oriented database system, text retrieval technologies, fact extraction technologies, machine translation and translation memory techniques, and multilingual terminology facilities. These technologies will run in a client-server environment, with PC front ends.
The project itself will concentrate on the improvement of drug enforcement, by allowing users to combine multilingual information sources, and international databases if needed. In later stages, other applications (in the field of medical services, office solutions etc.) will be possible as well.
Progress and results
The result of the project will be a version of the system which serves as an input for the users to migrate their existing environment into an improved system like Aventinus.
There will be preparatory deliverables (user requirements analysis, system and data analysis, technology evaluation) which lead to a complete system specification. This specification will be used for implementation, evaluation and testing, and will lead to a corresponding report, written by the users, with input for further development.
Aventinus will create a first mockup system for the user interface, and a demonstrator serving as object for tests and evaluation. This demonstrator will be available at some of the users' sites and be used as a basis for further developments.
Funding Scheme: CSC - Cost-sharing contracts