CORDIS - EU research results
Content archived on 2024-04-19

Semi-automatic indexing system for technical abstracts

Exploitable results

SISTA is designed for organizations that compile abstracting and indexing services in technical subject areas. In particular, it supports indexers who assign controlled-language descriptors selected from a thesaurus. Beyond assigning descriptors to scientific abstracts, SISTA technology could be applied to other indexing tasks and to text routing, for example of news service items.

SISTA proposes candidate descriptors for a document by analysing the text of its title and abstract, and provides a personal computer (PC)-based interface in which the indexer prepares the final list of index terms. The indexers may work in-house or externally to the database compiler. To build its model, SISTA applies natural language processing (NLP) techniques to an existing corpus of indexed documents: statistical and syntactic analysis of the text yields 'diagnostic units', and the statistical association of these units with the indexing originally assigned to each document is then determined. The resulting model is used to propose index terms for new texts.

Work on several corpora of abstracts showed that optimal SISTA performance depends on choosing a document representation and a descriptor assignment strategy suited to the corpus. In general, a model using single association between diagnostic units and descriptors exploits sophisticated representations such as noun groups better than a probabilistic model does.
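The training and proposal steps described above can be sketched roughly as follows. This is a minimal, hypothetical Python illustration, not SISTA's implementation. It assumes that diagnostic units are lowercase content words plus adjacent word pairs (a crude stand-in for noun groups), that association strength is the conditional probability P(descriptor | unit) estimated from the indexed corpus, and that "single association" means each unit is linked only to its one most strongly associated descriptor. The function names, the stop-word list, the `min_count` and `threshold` parameters and the toy corpus are all invented for illustration.

```python
import re
from collections import Counter, defaultdict

# Hypothetical stop-word list; a real system would use a proper linguistic resource.
STOPWORDS = {"a", "an", "the", "of", "for", "and", "in", "on", "to", "with", "by", "is", "are"}


def diagnostic_units(text):
    """Hypothetical diagnostic units: content words plus adjacent content-word
    pairs, used here as a crude proxy for noun groups."""
    words = [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]
    pairs = [f"{a} {b}" for a, b in zip(words, words[1:])]
    return set(words) | set(pairs)


def train_associations(corpus, min_count=2):
    """corpus: list of (text, descriptors) pairs with the original indexing.
    For each unit seen in at least min_count documents, keep only its single
    most strongly associated descriptor and the estimate P(descriptor | unit)."""
    unit_freq = Counter()
    joint = defaultdict(Counter)
    for text, descriptors in corpus:
        for unit in diagnostic_units(text):
            unit_freq[unit] += 1
            for d in descriptors:
                joint[unit][d] += 1
    model = {}
    for unit, n in unit_freq.items():
        if n < min_count or not joint[unit]:
            continue  # too rare, or only seen in documents without descriptors
        descriptor, k = joint[unit].most_common(1)[0]
        model[unit] = (descriptor, k / n)
    return model


def propose(model, text, top_n=3, threshold=0.5):
    """Each known unit in a new text votes for its associated descriptor,
    weighted by association strength; return the top_n candidates."""
    scores = Counter()
    for unit in diagnostic_units(text):
        if unit in model:
            descriptor, strength = model[unit]
            if strength >= threshold:
                scores[descriptor] += strength
    return [d for d, _ in scores.most_common(top_n)]


# Invented toy corpus of titles with their originally assigned descriptors.
corpus = [
    ("Neural network training for speech recognition", ["SPEECH RECOGNITION", "NEURAL NETS"]),
    ("Hidden Markov models in speech recognition", ["SPEECH RECOGNITION"]),
    ("Neural network hardware accelerators", ["NEURAL NETS"]),
    ("Corrosion of steel pipelines", ["CORROSION"]),
    ("Corrosion inhibitors for steel", ["CORROSION"]),
]

assoc_model = train_associations(corpus)
print(propose(assoc_model, "Speech recognition with neural networks"))
# → ['SPEECH RECOGNITION', 'NEURAL NETS']
```

The output is only a candidate list: in the workflow described above, the indexer still reviews and edits the proposals before the final index terms are recorded.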

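For contrast, a probabilistic descriptor-assignment strategy might look like the sketch below. The source does not say which probabilistic model SISTA used; this sketch assumes a multinomial naive Bayes ranker over single-word units with add-alpha smoothing. Its function names, parameters, stop-word list and toy corpus are hypothetical.

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical stop-word list for illustration only.
STOPWORDS = {"a", "an", "the", "of", "for", "and", "in", "on", "to", "with", "by", "is", "are"}


def units(text):
    """Hypothetical units: lowercase content words only (no noun groups)."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]


def train_naive_bayes(corpus):
    """corpus: list of (text, descriptors). Each document counts once toward
    every descriptor assigned to it (a multi-label simplification)."""
    assign_count = Counter()        # how often each descriptor was assigned
    unit_counts = defaultdict(Counter)
    vocab = set()
    for text, descriptors in corpus:
        toks = units(text)
        vocab.update(toks)
        for d in descriptors:
            assign_count[d] += 1
            unit_counts[d].update(toks)
    return assign_count, unit_counts, vocab


def rank_descriptors(model, text, alpha=1.0):
    """Rank descriptors by log P(d) + sum of log P(u | d) over the text's units,
    with add-alpha smoothing. The prior is estimated from assignment counts;
    units never seen in training are ignored."""
    assign_count, unit_counts, vocab = model
    total_assignments = sum(assign_count.values())
    toks = [t for t in units(text) if t in vocab]
    scores = {}
    for d, n_d in assign_count.items():
        denom = sum(unit_counts[d].values()) + alpha * len(vocab)
        score = math.log(n_d / total_assignments)
        for t in toks:
            score += math.log((unit_counts[d][t] + alpha) / denom)
        scores[d] = score
    return sorted(scores, key=scores.get, reverse=True)


# Invented toy corpus of titles with their originally assigned descriptors.
corpus = [
    ("Neural network training for speech recognition", ["SPEECH RECOGNITION", "NEURAL NETS"]),
    ("Hidden Markov models in speech recognition", ["SPEECH RECOGNITION"]),
    ("Neural network hardware accelerators", ["NEURAL NETS"]),
    ("Corrosion of steel pipelines", ["CORROSION"]),
    ("Corrosion inhibitors for steel", ["CORROSION"]),
]

nb_model = train_naive_bayes(corpus)
print(rank_descriptors(nb_model, "Corrosion of steel"))
# → ['CORROSION', 'NEURAL NETS', 'SPEECH RECOGNITION']
```

This sketch uses single words only. If noun groups were added as extra units alongside their component words, naive Bayes would treat the overlapping evidence as independent, a known weakness of its independence assumption. This is offered only as a plausible contrast with the finding on noun groups, not as SISTA's own explanation.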