Community Research and Development Information Service - CORDIS


French pharmaceutical theses are rarely cited. Although the main obstacles are language and access barriers, inadequate indexing may also be to blame: manually extracted keywords do not necessarily come from a structured thesaurus. This paper compares manual indexing with an automated method, "Nomindex", which is based on the UMLS and has been improved by the addition of a relevance scoring system. The first step consists of downloading, adapting and indexing theses in electronic format. The results are then analysed and scored for relevance by comparing classic statistical indices (noise, silence, relevance). It was assumed that the manually obtained keywords were always relevant. The silence of the manual indexing is nevertheless high: Nomindex proposes seven new keywords. The results of Nomindex itself are mixed (10% silence, but 50% noise). These results are promising for a first experiment on pharmaceutical documents carried out without any lexicon improvement. Although the indexing is currently insufficient for real-life use, it could easily be improved by targeted updates of the lexicon.
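The noise and silence indices mentioned above can be illustrated with a minimal sketch. It assumes the usual information-retrieval definitions (noise as the share of proposed keywords that are not relevant, silence as the share of relevant keywords that were not proposed); the abstract does not state the exact formulas used in the study. The function name and the sample keywords are hypothetical and are not taken from the paper.

```python
def indexing_scores(proposed, relevant):
    """Compare a set of proposed keywords with a reference set of relevant ones.

    Assumed definitions (not confirmed by the abstract):
      noise   = proposed keywords that are not relevant / all proposed keywords
      silence = relevant keywords that were not proposed / all relevant keywords
    """
    proposed = set(proposed)
    relevant = set(relevant)

    # Guard against empty sets to avoid division by zero.
    noise = len(proposed - relevant) / len(proposed) if proposed else 0.0
    silence = len(relevant - proposed) / len(relevant) if relevant else 0.0
    return {"noise": noise, "silence": silence}


# Hypothetical keywords, for illustration only.
proposed_keywords = {"aspirin", "headache", "tablet", "hospital"}
relevant_keywords = {"aspirin", "headache", "dosage"}

scores = indexing_scores(proposed_keywords, relevant_keywords)
print(scores)
```

In this invented example, two of the four proposed keywords are not in the reference set, giving a noise of 50%, and one of the three relevant keywords is missed, giving a silence of about 33%.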

Additional information

Authors: MARY V, Laboratoire d'informatique médicale, Faculté de Médecine, Rennes (FR); LE DUFF F, Laboratoire d'informatique médicale, Faculté de Médecine, Rennes (FR); LE BEUX P, Laboratoire d'informatique médicale, Faculté de Médecine, Rennes (FR); POULIQUEN B, European Commission, Joint Research Centre, Institute for the Protection and Security of the Citizen, Ispra (IT); DARMONI S J, Direction de l'Informatique et des Réseaux, Rouen (FR); SEGUI A, Laboratoire de mathématiques et de physique pharmaceutique, Faculté de Pharmacie, Rennes (FR)
Bibliographic Reference: An oral report given at: MIE2002, XVIIth International Congress of the European Federation for Medical Informatics. Organised by: European Federation for Medical Informatics (EFMI). Held at: Budapest (HU), 25-29 August 2002
Record Number: 200214520 / Last updated on: 2002-04-03
Original language: en
Available languages: en