
Structured Information Management: Processing and Retrieval

Objective

The SIMPR project developed new techniques for the management of text stored in very large information banks, such as text published on optical media. A first prototype system for the semiautomatic indexing of technical texts was designed and built. Implemented as a module within an existing information retrieval software system, the indexing prototype processes ASCII texts to extract analytics, that is, terms that accurately reflect the meaning of a text. These analytics are validated by the user and then stored in indexes for subsequent search against keywords specified in a request for information. The prototype incorporates advanced linguistic software, performing morphological and syntactic analysis and disambiguation of texts. The results from the project include:
- linguistic software to analyse the morphology and syntax of texts (this software succeeds in disambiguating input texts);
- information indexing techniques for use with existing or innovative information retrieval systems (these techniques represent an advance on the current technique of file inversion, offering greater accuracy of retrieval via validated indexes);
- information modelling techniques, enabling the author of an information bank to impose different structures on it depending on different user requirements, and enabling the user to retrieve information by tracing a path through a selected model (these models will contain facilities for the subject classification of texts, including software to find information using a search request specifying the subject required);
- techniques to map information commonalities between texts and databases for computer-aided design (CAD) systems.
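
The indexing flow described above (extract candidate terms, have the user validate them, store the validated terms in an index, then search the index against request keywords) can be illustrated with a minimal sketch. This is not the SIMPR implementation: the function names, the stopword list, and the simple word-splitting step that stands in for morphological and syntactic analysis are all assumptions made for illustration only.

```python
from collections import defaultdict

# Hypothetical stopword list; a real system would rely on linguistic analysis.
STOPWORDS = {"the", "a", "an", "of", "to", "and", "is", "in", "for"}

def candidate_terms(text):
    """Naive stand-in for linguistic analysis: lowercased words minus stopwords."""
    words = [w.strip(".,;:()").lower() for w in text.split()]
    return {w for w in words if w and w not in STOPWORDS}

def build_validated_index(docs, validate):
    """Map each term accepted by `validate` to the ids of documents containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in candidate_terms(text):
            if validate(doc_id, term):  # the user-validation step
                index[term].add(doc_id)
    return index

def search(index, keywords):
    """Return ids of documents indexed under every requested keyword."""
    sets = [index.get(k.lower(), set()) for k in keywords]
    return set.intersection(*sets) if sets else set()

# Illustrative data: two short texts and a set of terms the user approved.
docs = {
    "d1": "The pump valve is replaced in the maintenance procedure.",
    "d2": "Valve maintenance for the cooling system.",
}
approved = {"pump", "valve", "maintenance", "cooling"}
index = build_validated_index(docs, lambda doc_id, term: term in approved)
```

Plain file inversion would index every word of every text. Here only terms that pass the validation callback reach the index, which is the property the project credits with more accurate retrieval.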

The final output of the project is a system for creating and managing large information banks, such as an authoring system for optical information stores. It resembles a hypertext system in its ability to link items of information at different levels, but it aids the user by establishing links automatically, according to subject analysis of texts and conformance with user-specified information models.
The main stages of the project were:

- Research into automatic indexing of texts using language analysis. Specification and development of rules for automatic, application-independent and language-independent indexing.
- Research into subject classification methodologies. Design, development, and evaluation of an expert system to classify texts according to a user-defined subject taxonomy.
- Design and development of special-purpose software support tools for information bank maintenance.
- Research into domain and task modelling using information extracted from design databases and domain structures, and analysis of text structures. Use of domain models to browse and retrieve texts by applying SIMPR text indexing to text structures.
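
Classification against a user-defined subject taxonomy, mentioned in the stages above, can be sketched as follows. The project built an expert system for this task; the keyword-overlap scoring below is a much simpler stand-in, and the taxonomy contents, the `classify` function, and its `min_hits` parameter are hypothetical.

```python
def classify(text, taxonomy, min_hits=1):
    """Return subjects whose indicator terms occur in the text, best match first."""
    words = {w.strip(".,;:()").lower() for w in text.split()}
    # Score each subject by how many of its indicator terms appear in the text.
    scores = {subject: len(words & terms) for subject, terms in taxonomy.items()}
    matched = [s for s, n in scores.items() if n >= min_hits]
    # Sort by descending score, then alphabetically for a stable order.
    return sorted(matched, key=lambda s: (-scores[s], s))

# Illustrative user-defined taxonomy: subject -> indicator terms.
taxonomy = {
    "hydraulics": {"pump", "valve", "pressure"},
    "electrical": {"cable", "fuse", "voltage"},
}

subjects = classify("Check the pump pressure and replace the fuse.", taxonomy)
```

A search request naming a subject could then be answered by returning the texts whose classification includes that subject.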

Coordinator

Peingown
Address
Kilmuir
IV51 9UB Portree
United Kingdom

Participants (9)

CAP GEMINI INTERNATIONAL SUPPORT
Netherlands
Address
Burgemeester Elsenlaan, 170, 3027
2280 GA Rijswijk
Computer Resources International
Denmark
Address
Bregneroedvej 144
3460 Birkeroed
DUBLIN CITY UNIVERSITY
Ireland
Address

Dublin 9
NOKIA CORPORATION
Finland
Address
, 226
02101 Helsinki
RESEARCH UNIT FOR COMPUTATIONAL LINGUISTICS
Finland
Address
Vuorikatu 5B
00100 Helsinki
UNIVERSIDAD CATOLICA PORTUGESA
Portugal
Address
Palma De Cima
1600 Lisboa
UNIVERSITEIT VAN AMSTERDAM
Netherlands
Address
Roetersstraat, 15
1018 Amsterdam
UNIVERSITY COLLEGE DUBLIN
Ireland
Address
Belfield
Dublin 4
UNIVERSITY OF STRATHCLYDE
United Kingdom
Address
16 Richmond Street
G1 1XQ Glasgow