Content archived on 2024-04-16

Structured Information Management: Processing and Retrieval


The SIMPR project developed new techniques for the management of text stored in very large information banks, such as text published on optical media. A first prototype system for the semiautomatic indexing of technical texts was designed and built. Implemented as a module within an existing information retrieval software system, the indexing prototype processes ASCII texts to extract analytics, that is, terms that accurately reflect the meaning of a text. These analytics are validated by the user and then stored in indexes for subsequent search against keywords specified in a request for information. The prototype incorporates advanced linguistic software, performing morphological and syntactic analysis and disambiguation of texts. The results from the project include:
- linguistic software to analyse the morphology and syntax of texts (this software succeeds in disambiguating input texts);
- information indexing techniques for use with existing or innovative information retrieval systems (these techniques represent an advance on the current technique of file inversion, offering greater accuracy of retrieval via validated indexes);
- information modelling techniques, enabling the author of an information bank to impose different structures on it depending on different user requirements, and enabling the user to retrieve information by tracing a path through a selected model (these models will contain facilities for the subject classification of texts, including software to find information using a search request specifying the subject required);
- techniques to map information commonalities between texts and databases for computer-aided design (CAD) systems.
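
The contrast drawn above between conventional file inversion and validated indexes can be sketched roughly as follows. This is a minimal Python illustration under stated assumptions, not the SIMPR implementation: `tokenize` and `candidate_analytics` are hypothetical stand-ins for the project's morphological and syntactic analysis, and the `accept` callback simulates the user validation step.

```python
import re
from collections import defaultdict
from typing import Callable, Dict, Iterable, List, Optional, Set

# Hypothetical stop-word list used by the toy extractor below.
STOPWORDS = {"a", "an", "and", "for", "in", "is", "of", "on", "the", "to", "with"}


def tokenize(text: str) -> List[str]:
    # Crude stand-in for morphological analysis: lowercase, keep alphabetic runs.
    return re.findall(r"[a-z]+", text.lower())


def inverted_file(docs: Dict[str, str]) -> Dict[str, Set[str]]:
    """Conventional file inversion: every word points to the documents containing it."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for doc_id, text in docs.items():
        for word in tokenize(text):
            index[word].add(doc_id)
    return dict(index)


def candidate_analytics(text: str) -> Set[str]:
    """Hypothetical extractor: content words plus adjacent content-word pairs."""
    words = tokenize(text)
    terms: Set[str] = set()
    for i, word in enumerate(words):
        if word in STOPWORDS:
            continue
        terms.add(word)
        if i + 1 < len(words) and words[i + 1] not in STOPWORDS:
            terms.add(f"{word} {words[i + 1]}")
    return terms


def validated_index(
    docs: Dict[str, str], accept: Callable[[str, str], bool]
) -> Dict[str, Set[str]]:
    """Index only the candidate terms that the (simulated) user accepts."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for doc_id, text in docs.items():
        for term in candidate_analytics(text):
            if accept(doc_id, term):  # user validation step
                index[term].add(doc_id)
    return dict(index)


def search(index: Dict[str, Set[str]], keywords: Iterable[str]) -> Set[str]:
    """Return the documents indexed under every keyword (boolean AND)."""
    result: Optional[Set[str]] = None
    for keyword in keywords:
        hits = index.get(keyword.lower(), set())
        result = set(hits) if result is None else result & hits
    return result if result is not None else set()


# Hypothetical example texts and the terms a user approved for each.
DOCS = {
    "d1": "Optical disc drive maintenance",
    "d2": "The drive shaft of the pump",
}
APPROVED = {("d1", "optical disc"), ("d1", "disc drive"), ("d2", "drive shaft")}

if __name__ == "__main__":
    inv = inverted_file(DOCS)
    val = validated_index(DOCS, lambda doc_id, term: (doc_id, term) in APPROVED)
    print(sorted(search(inv, ["drive"])))       # ['d1', 'd2']
    print(sorted(search(val, ["disc drive"])))  # ['d1']
```

Here a query for `drive` against the inverted file returns both documents, while the validated index answers `disc drive` with `d1` only, and `drive` on its own matches nothing because the user never accepted it as an analytic. The sketch only shows the mechanism by which validation can trade recall for precision; it does not measure the accuracy gain claimed for SIMPR.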

The final output of the project is a system for creating and managing large information banks, such as an authoring system for optical information stores. It resembles a hypertext system in its ability to link items of information at different levels, but it aids the user by establishing links automatically, according to subject analysis of texts and conformance with user-specified information models.
The main stages of the project were:

- Research into automatic indexing of texts using language analysis. Specification and development of rules for automatic, application-independent and language-independent indexing.
- Research into subject classification methodologies. Design, development, and evaluation of an expert system to classify texts according to a user-defined subject taxonomy.
- Design and development of special-purpose software support tools for information bank maintenance.
- Research into domain and task modelling using information extracted from design databases and domain structures, and analysis of text structures. Use of domain models to browse and retrieve texts by applying SIMPR text indexing to text structures.
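
The second stage's classifier is described only as an expert system that works against a user-defined subject taxonomy. As a rough illustration, and explicitly not the SIMPR design, the sketch below swaps in a simple threshold-rule classifier; `TAXONOMY`, `RULES` and the `min_hits` parameter are hypothetical. A subject is assigned when enough of its trigger terms appear among a text's validated analytics, and its broader subjects are inherited, so that a request naming a broad subject also finds texts filed under narrower ones.

```python
from typing import Dict, List, Optional, Set

# Hypothetical user-defined taxonomy: subject -> parent subject (None marks a root).
TAXONOMY: Dict[str, Optional[str]] = {
    "engineering": None,
    "mechanical": "engineering",
    "optical storage": "engineering",
}

# Hypothetical rule base: subject -> trigger terms drawn from validated analytics.
RULES: Dict[str, Set[str]] = {
    "mechanical": {"drive shaft", "pump", "bearing"},
    "optical storage": {"optical disc", "disc drive", "laser"},
}


def ancestors(subject: str) -> List[str]:
    """Walk up the taxonomy from a subject to its root."""
    chain: List[str] = []
    parent = TAXONOMY.get(subject)
    while parent is not None:
        chain.append(parent)
        parent = TAXONOMY.get(parent)
    return chain


def classify(terms: Set[str], min_hits: int = 1) -> Set[str]:
    """Fire every rule with at least `min_hits` matching terms; inherit broader subjects."""
    subjects: Set[str] = set()
    for subject, triggers in RULES.items():
        if len(terms & triggers) >= min_hits:
            subjects.add(subject)
            subjects.update(ancestors(subject))
    return subjects


def find_by_subject(doc_subjects: Dict[str, Set[str]], subject: str) -> Set[str]:
    """Answer a request that names a subject rather than keywords."""
    return {doc_id for doc_id, subs in doc_subjects.items() if subject in subs}


if __name__ == "__main__":
    filed = {
        "d1": classify({"optical disc", "disc drive"}),
        "d2": classify({"drive shaft", "pump"}),
    }
    print(sorted(find_by_subject(filed, "engineering")))  # ['d1', 'd2']
    print(sorted(find_by_subject(filed, "mechanical")))   # ['d2']
```

Inheriting ancestor subjects at classification time keeps subject search a simple membership test; expanding the query downward through the taxonomy at search time would be an equally valid alternative.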


