Objective
The language industries of the future will rely heavily on the availability of large-scale language resources (e.g. corpora, speech databases, dictionaries, linguistic descriptions), together with appropriate standards and methodologies. Ready access to harmonised databases of language data and rules would not only provide a direct benefit to research and development efforts across a wide range of private and public organisations, but would also foster fruitful academic and industrial co-operation.
The project aims to define a broad organisational framework for the creation of the language resources (LRs for short) for both written and spoken language engineering which are necessary for the development of an adequate language technology and industry in Europe, and to determine the feasibility of creating a co-ordinated European network of repositories which would store, disseminate and maintain such resources. This activity is intended to contribute towards the long-term goal of making large-scale LRs widely available to European organisations involved in R&D and educational activities.
Approach and Methodology
The overall approach and the results which the project intends to achieve can be summarised as follows:
to create structured, publicly available catalogues of existing linguistic resources, using and extending the information already collected by various international and national survey initiatives;
to evaluate the present European situation, comparing what is available with the most urgent needs of the European R&D and teaching communities, and then to formulate recommendations for a concerted European action in the field of reusable resources for natural language and speech;
to discuss with the relevant actors (e.g. owners of resources, producers, private and public users, funding bodies, scientific and professional associations) the various aspects of the problem, their needs and requirements, the possible solutions, their willingness to co-operate, and the conditions for a joint European action;
to identify, describe and evaluate at various levels (e.g. organisational, technical, legal) alternative methods and structures which could ensure the creation, management and maintenance of a European repository of reusable LRs, and their dissemination to the various types of users;
to experiment with the collection and dissemination of existing LRs using (i) a distributed electronic network and (ii) CD-ROM pressing facilities, with the aim of encouraging the reuse of already available resources, and also of acquiring experience which will feed into the formulation of final recommendations;
to present final recommendations for establishing a collaborative infrastructure that will act as a collection, verification, management and dissemination centre, built on the foundation provided by existing European structures and organisations.
Assessing Existing Resources: carrying out a review of what LRs currently exist, both in Europe and elsewhere. The goal of this survey is not to produce a comprehensive, exhaustive catalogue of such resources, but rather to assess which needs of the various European languages are still not satisfied by the available resources, and to compare and characterise the situations of the different languages. The results of this evaluation effort will provide the basis for the general recommendations (see below).
Needs Analysis: determining the main resource needs of European actors involved in RTD training and system development; discussing the various aspects (e.g. legal, financial, organisational problems; participation and role of different types of public and private actors) of the actions required to meet the needs for LRs in Europe, as a basis for defining an overall organisational framework for the development of adequate LRs in Europe.
Experimental Implementation: testing the usefulness and feasibility of a distributed resource repository by implementing an infrastructure on which a set of LRs will be mounted; in particular we will experiment with the dissemination of LRs using ELSNET's existing infrastructure for LRs: (i) a wide-area network running the AFS server software, and (ii) the formatting, mastering and distribution of data on CD-ROM.
Recommendations: making detailed recommendations for the creation, management, and maintenance of a distributed, managed repository of reusable LRs, based on a detailed analysis and evaluation of the alternatives.
Exploitation and Future Prospects
The goals of the project are the co-ordinated collection and distribution of LRs, the promotion of awareness of the need for widely available LRs, and the promotion of consensus on an overall European strategy. Consequently, dissemination activities are central to the project. The project consortium comprises representatives of major Europe-wide bodies and associations, most notably ELSNET, ESCA and EACL, and will be assisted by an industrial steering committee composed of representatives of leading IT companies, publishers, PTTs and other providers of electronic information services.
The action will be carried out in co-operation with relevant European groups and with on-going initiatives such as EAGLES, and will imply amongst other things an analysis of existing international structures. It is expected that the experimental activities carried out within the project and the recommendations for further larger-scale operations will contribute to the establishment of a broad language infrastructure covering all Community languages.
Fields of science (EuroSciVoc)
CORDIS classifies projects with EuroSciVoc, a multilingual taxonomy of fields of science, through a semi-automatic process based on NLP techniques. See: The European Science Vocabulary.
- natural sciences > computer and information sciences > software
- natural sciences > computer and information sciences > databases
Programme(s)
Multi-annual funding programmes that define the EU’s priorities for research and innovation.
Topic(s)
Calls for proposals are divided into topics. A topic defines a specific subject or area for which applicants can submit proposals. The description of a topic comprises its specific scope and the expected impact of the funded project.
Data not available
Call for proposal
Procedure for inviting applicants to submit project proposals, with the aim of receiving EU funding.
Data not available
Funding Scheme
Funding scheme (or “Type of Action”) inside a programme with common features. It specifies: the scope of what is funded; the reimbursement rate; specific evaluation criteria to qualify for funding; and the use of simplified forms of costs like lump sums.
Data not available
Coordinator
56100 Pisa
Italy