Objective

This language technology project aims to bridge the gap from clausal syntax to text, and to show how the syntactic mechanisms of the language indicate topical themes in text. The project will investigate a large number of texts, using both human assessments of foreground and background statements and state-of-the-art syntactic analysis tools, to chart known and newly found systematic differences between how foreground and background themes are presented.

OBJECTIVES

A bottleneck for improving today's information management systems is that we know little of texts as text. Systems view texts as simple sets of words or terms, discarding information such as clause style and argument structure as noise. This project aims to bridge the gap from syntax to text, and to show how the syntactic mechanisms of language, which primarily concern clause-internal structure, carry text-level information as well. Once we are able to chart some features of the topical progression in a text, we will provide a road map for algorithms for further processing: indexing and search, summarisation, report generation, and optical text recognition are all application areas that would benefit from better knowledge of what makes a text.

DESCRIPTION OF WORK

We will take a large number of texts in several languages and partition their clauses into a number of graded categories according to foregroundedness. These clause categories can then be used in different ways for indexing, multi-document summarisation, and text item similarity calculation.
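As an illustration of how graded clause categories might feed indexing and text item similarity calculation, here is a minimal sketch. It is not the project's method: the numeric grade scale (0.0 for background, 1.0 for fully foregrounded), the whitespace tokenisation, and the function names `weighted_term_vector` and `cosine_similarity` are all assumptions made for this example.

```python
from collections import defaultdict
from math import sqrt

# Hypothetical input format: each clause is a (text, grade) pair, where
# grade is an assumed foregroundedness score in [0, 1] supplied by human
# assessors or by an automatic clause classifier.


def weighted_term_vector(clauses):
    """Build a term vector in which each term occurrence counts as much
    as the foregroundedness grade of the clause it occurs in."""
    vector = defaultdict(float)
    for text, grade in clauses:
        for term in text.lower().split():
            vector[term] += grade
    return dict(vector)


def cosine_similarity(a, b):
    """Cosine similarity between two sparse term vectors (dicts)."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = sqrt(sum(weight * weight for weight in a.values()))
    norm_b = sqrt(sum(weight * weight for weight in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    doc_a = [("the minister resigned", 1.0), ("the weather was cold", 0.2)]
    doc_b = [("the minister spoke", 0.9), ("rain fell all day", 0.1)]
    vec_a = weighted_term_vector(doc_a)
    vec_b = weighted_term_vector(doc_b)
    print(cosine_similarity(vec_a, vec_b))
```

Under these assumptions, terms from foregrounded clauses dominate the comparison, while terms from background clauses still contribute a little instead of being discarded.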
This first assessment project takes the form of an experiment on text. If the experiment is successful, it opens up an entire research field, which we will continue examining in a future project.

1. Assemble corpus. If possible, we will use the multilingual TREC corpus.
2. Define prototypical clause types based on our theory of foregroundedness.
3. Use human test subjects to partition clauses according to prototypical type.
4. Find and explain formal differences between types of clause as shown by test subjects, based on the theory of transitivity.
5. Build tools to automatically identify clause types.
6. Index a large number of texts using the tools, and run test sets of information retrieval queries.
7. Result dissemination.
8. Plan for continued and refined experimentation.

Programme(s)
FP5-IST - Programme for research, technological development and demonstration on a "User-friendly information society, 1998-2002"

Topic(s)
1.1.2.-6.1.1 - FET O: Open domain

Call for proposal
Data not available

Funding scheme
CSC - Cost-sharing contracts

Coordinator
SWEDISH INSTITUTE OF COMPUTER SCIENCE
EU contribution: No data
Address: ISAFJORDSGATAN 22, 164 29 KISTA, Sweden
Total cost: No data

Participants (1)
CONEXOR OY, Finland
EU contribution: No data
Address: PORRASSALMENKATU 19 A 15, 50100 MIKKELI
Total cost: No data