Objective
MULINEX addresses problems encountered by providers and seekers of multilingual information on the WWW. For information providers, MULINEX will provide translation, content structuring, and multilingual indexation tools which are intended to lower the threshold at which the provision of multilingual content becomes economically feasible. For information seekers, the project will provide advanced multilingual facilities for access, navigation, browsing and filtering of multilingual information. End users of the application will require only standard WWW browsers. The system overall will be realised as a group of interacting tools which provide the necessary functionalities, including the retrieval of documents in different languages with a monolingual query and integrated machine-aided translation facilities. User needs will be analysed on the basis of current and planned multilingual web offers by the two major WWW service providers in the consortium.
There has recently been a dramatic change towards commercial applications on the Web which has given rise to an enormous volume of information. This has become almost unmanageable for the information seeker despite existing web search engines. Tools such as Alta Vista, Lycos and Yahoo are used by many people as their primary navigation aid. Since the language of the Web is still predominantly English, these tools mostly cater for monolingual searches. However, the number of documents in other languages is increasing, together with the number of users who are non-native English speakers with varying levels of English language skills and knowledge. There are currently no adequate tools for managing multi-lingual document collections on the Web which allow documents to be presented in the language preferred by the user, and which support search and navigation in several languages.
MULINEX is responding to this need by developing a leading-edge, Internet-based application that provides selective information access, navigation and browsing in a multi-lingual environment. The system will enable documents in different languages to be retrieved with one monolingual query, and will search using a combination of keywords, phrases and concepts. It will also provide tools for the management of multi-lingual web sites by supporting the creation, indexing and linking of multi-lingual documents. The system will run entirely on a web server, so that the user needs only a standard web browser with support for open web standards.
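One common way to retrieve documents in several languages from a single monolingual query is to expand the query with translations before matching. The sketch below illustrates that idea only; the source does not say which technique MULINEX uses, and the dictionary, the documents and the function names (`expand_query`, `cross_language_search`) are hypothetical.

```python
# Toy bilingual dictionary; every entry is hypothetical illustration data.
TRANSLATIONS = {
    "dog": {"fr": ["chien"], "de": ["hund"]},
    "health": {"fr": ["santé"], "de": ["gesundheit"]},
}

# Tiny multi-lingual collection: doc id -> (language, text). Also hypothetical.
DOCUMENTS = {
    "doc_en": ("en", "the dog sleeps"),
    "doc_fr": ("fr", "le chien dort"),
    "doc_de": ("de", "der hund schläft"),
    "doc_other": ("fr", "le chat dort"),
}


def expand_query(term):
    """Return the query term together with its known translations."""
    expanded = {term}
    for words in TRANSLATIONS.get(term, {}).values():
        expanded.update(words)
    return expanded


def cross_language_search(term):
    """Return the sorted ids of documents containing the term or a translation."""
    terms = expand_query(term.lower())
    hits = []
    for doc_id, (_lang, text) in DOCUMENTS.items():
        if terms & set(text.lower().split()):
            hits.append(doc_id)
    return sorted(hits)


print(cross_language_search("dog"))
```

With this toy data, the English query "dog" matches the English, French and German documents, while the unrelated French document is left out.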
Actual and potential users of the application are publishers, web content creators and information providers who want to create and manage multi-lingual document collections, or to offer multi-lingual search on their site. They also include providers of Internet access and search services, and their customers or end users, who want to benefit from the added value of multi-lingual search and navigation facilities. Another user segment is made up of companies and institutions which need facilities for searching in large multi-lingual document collections.
A survey in April 1996 found that 30 percent of users already had problems finding and organising information on the web. Five years from now we will be submerged by terabytes of data in a number of languages, within which it will be virtually impossible to navigate without sophisticated information extraction tools.
Existing search services are very efficient at locating documents based on a keyword query. However, prevalent string matching search techniques are too coarse grained, and information can only be found if exactly the same words are used in the query and the document. This results in poor recall, because many relevant documents are not found, and in reduced precision, because documents are often returned which contain terms mentioned in the query but are irrelevant to it.
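The effect of exact string matching on recall and precision can be made concrete with a small worked example. This is a minimal sketch under stated assumptions: the three documents and the relevance judgments are invented for illustration, and `exact_match` and `precision_recall` are hypothetical helper names.

```python
# Hypothetical collection and relevance judgments for the need "fixing a car".
DOCS = {
    1: "car repair manual",
    2: "automobile maintenance guide",  # relevant, but uses a different word
    3: "car rental prices",             # contains "car", but is not relevant
}
RELEVANT = {1, 2}


def exact_match(query_word):
    """Return ids of documents that contain the query word verbatim."""
    return {doc_id for doc_id, text in DOCS.items() if query_word in text.split()}


def precision_recall(retrieved, relevant):
    """Precision = hits / retrieved; recall = hits / relevant."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


retrieved = exact_match("car")
print(retrieved, precision_recall(retrieved, RELEVANT))
```

Here the query "car" misses document 2 (lowering recall to 0.5) and returns document 3 (lowering precision to 0.5), which is the behaviour the paragraph describes.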
Full text retrieval is a technology that indexes every word in a document. Hundreds of thousands of pages can be searched swiftly and accurately by word, phrase, or concept. Unlike a database, full text retrieval does not prejudge the type of query. Here are some of the features available in the most advanced systems:
- Search by boolean combinations (dog or cat)
- Relevance ranking where the expected relevance of a document is given based on how rich it is in the search criteria.
- Relevance feedback where a user highlights a section of particular interest and then requests that the system 'find more material like this'.
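The first two features listed above can be sketched with a small inverted index. This is an illustrative toy, not the design of any particular product: it indexes every word, answers a boolean OR query, and ranks results by how often the query terms occur. The documents and the names `build_index` and `search_or` are assumptions made for the example.

```python
from collections import Counter, defaultdict

# Hypothetical documents used only for illustration.
DOCS = {
    "a": "the dog chased the cat",
    "b": "the dog barked at the other dog and another dog",
    "c": "birds sing in the morning",
}


def build_index(docs):
    """Map every word to a Counter of {doc_id: occurrences} (full text index)."""
    index = defaultdict(Counter)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index[word][doc_id] += 1
    return index


def search_or(index, terms):
    """Boolean OR over the terms, ranked by total term frequency (ties by id)."""
    scores = Counter()
    for term in terms:
        for doc_id, count in index.get(term, {}).items():
            scores[doc_id] += count
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [doc_id for doc_id, _score in ranked]


index = build_index(DOCS)
print(search_or(index, ["dog", "cat"]))
```

Raw term frequency is the simplest possible ranking signal; real systems usually weight terms by rarity as well, which this sketch deliberately leaves out.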
New technologies are emerging for automatic document classification based on statistical methods and Artificial Intelligence algorithms, and for intelligent query parsing and retrieval. These systems can be added to text retrieval software using a modular approach. One proprietary search engine allows the user to deploy an Intelligent Search Agent to obtain information from the Web, CD-ROM or enterprise network. It uses a blend of statistical Natural Language processing (with plain English queries), dynamic semantic clustering, and advanced relevance feedback based on the user's query history. However, this and similar leading-edge systems are only available for the English language.
A group of interacting tools will be produced which improve access to multi-lingual information on the web. This will lower the threshold at which the provision of multi-lingual documents becomes economically feasible. It will provide the following features:
- Search, retrieval and navigation for the end user. The search application consists of two sub-systems for intelligent indexing and intelligent retrieval:
  - intelligent indexing: extracts indexing expressions automatically and semi-automatically
  - intelligent retrieval: this will be supported through on-line creation of help information, such as maps or menus, so that the user is guided through the search space
- Tools for web content providers and service providers. These will include machine aided translation of hyperlinked web documents, management of aligned, translated versions of a document and delivery of documents in the language preferred by a user.
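Delivery of a document in the language preferred by a user could, for instance, be driven by the browser's HTTP `Accept-Language` header. The sketch below assumes that approach, which the source does not specify; the file names and the functions `parse_accept_language` and `choose_version` are hypothetical, and the parsing is simplified compared with the full HTTP rules.

```python
def parse_accept_language(header):
    """Return language tags from an Accept-Language header, best first."""
    prefs = []
    for part in header.split(","):
        part = part.strip()
        if not part:
            continue
        lang, _, params = part.partition(";")
        quality = 1.0  # a tag without an explicit q-value has quality 1
        params = params.strip()
        if params.startswith("q="):
            try:
                quality = float(params[2:])
            except ValueError:
                quality = 0.0  # treat a malformed q-value as "not acceptable"
        prefs.append((lang.strip().lower(), quality))
    prefs.sort(key=lambda pref: -pref[1])  # stable: header order breaks ties
    return [lang for lang, quality in prefs if quality > 0]


def choose_version(header, available, default="en"):
    """Pick the aligned translation matching the user's preferred language."""
    for lang in parse_accept_language(header):
        primary = lang.split("-")[0]  # "de-CH" falls back to "de"
        if primary in available:
            return available[primary]
    return available[default]


# Hypothetical aligned versions of one document.
VERSIONS = {"en": "index.en.html", "fr": "index.fr.html", "de": "index.de.html"}
print(choose_version("de-CH, fr;q=0.8, en;q=0.5", VERSIONS))
```

Keeping the aligned versions in one mapping per document mirrors the idea of managing translated versions together, so that the choice of language becomes a lookup at request time.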
The demonstrator will be tested on Club Internet, which is regarded as the most prominent French access site for the general public. This will provide a wide multi-lingual domain. It will also be tested on a narrower domain with a substantial proportion of translated documents. This will be either a marketing service for professionals, traders and marketing experts, or a specialist service catering for health or construction issues.
The demonstrator will initially use French, English and German, but will be designed so that other languages can be added easily. Its performance will be compared with that of the best existing systems such as Fulcrum, Alta Vista and Lycos.
MULINEX will thus allow speakers of different languages equal access to information and will enable better representation of information from different countries and cultures. Educational establishments, libraries, public administrations and all other sectors using the web will be able to provide large multi-lingual sites more economically, due to cost reductions in transforming multi-lingual text into structured web documents.
The software will be used in the newly formed subsidiary of one of the partners to help develop web services. Another partner will use MULINEX's phrase and word level alignment technology in its next generation of alignment products to acquire multi-lingual terminology from existing corpora of translations.
Fields of science (EuroSciVoc)
CORDIS classifies projects with EuroSciVoc, a multilingual taxonomy of fields of science, through a semi-automatic process based on NLP techniques. See: The European Science Vocabulary.
- humanities > languages and literature > general language studies
- natural sciences > computer and information sciences > internet > internet access
- natural sciences > computer and information sciences > data science > natural language processing
- natural sciences > computer and information sciences > internet > world wide web
- social sciences > political sciences > public administration
Programme(s)
Multi-annual funding programmes that define the EU’s priorities for research and innovation.
Topic(s)
Calls for proposals are divided into topics. A topic defines a specific subject or area for which applicants can submit proposals. The description of a topic comprises its specific scope and the expected impact of the funded project.
Call for proposal
Procedure for inviting applicants to submit project proposals, with the aim of receiving EU funding.
Data not available
Funding Scheme
Funding scheme (or “Type of Action”) inside a programme with common features. It specifies: the scope of what is funded; the reimbursement rate; specific evaluation criteria to qualify for funding; and the use of simplified forms of costs like lump sums.
Data not available
Coordinator
66123 Saarbrücken
Germany