SPARKLE provides advanced methods and tools for powerful, flexible and automatic acquisition of lexical information from text corpora. The tools fall into two categories:
- robust, shallow parsers of unrestricted text, and
- lexical acquisition systems, capable of learning (from pre-parsed texts) aspects of word knowledge needed for Language Engineering applications.
The tools are based on up-to-date finite-state technology and were originally developed for statistical and inferential routines that efficiently resolve data deficiencies. The methods are applicable to any type of text and were tested, with remarkable results, on English, French, German and Italian.
SPARKLE is able to acquire lexical information for verbs, probably the most elusive and challenging category for lexical analysis and also the most important one for Language Engineering applications such as machine translation, information retrieval and speech recognition. The variety of syntactic patterns typical for a verb is detected efficiently, then statistically validated and automatically typed with respect to semantic preferences. SPARKLE technology has been used for intelligent cross-lingual text editing and translation filtering within multilingual information retrieval systems (Xerox, Sharp) and in speech recognition systems (Daimler-Benz), where it has demonstrated a steady improvement in performance. The acquired information was also used for automatic word sense disambiguation.
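The statistical validation of detected verb patterns can be illustrated with a minimal sketch. This is not SPARKLE's actual implementation: it assumes a simple binomial hypothesis test as the filter, and the function names, frame labels, error rate and significance threshold are all hypothetical. A pattern is kept for a verb only if it occurs more often than parser noise alone would plausibly explain.

```python
from collections import Counter
from math import comb


def binomial_tail(n, m, p):
    """Return P(X >= m) for X ~ Binomial(n, p)."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(m, n + 1))


def validate_frames(observations, error_rate=0.05, alpha=0.05):
    """Keep (verb, frame) pairs unlikely to be produced by parser noise.

    observations: list of (verb, frame) pairs taken from parsed text.
    error_rate:   assumed probability that the parser assigns a frame
                  to a verb by mistake (hypothetical value).
    alpha:        significance threshold for accepting a frame.
    """
    observations = list(observations)
    verb_totals = Counter(verb for verb, _ in observations)
    pair_counts = Counter(observations)

    accepted = {}
    for (verb, frame), m in pair_counts.items():
        n = verb_totals[verb]
        # Probability of seeing the frame at least m times in n
        # occurrences of the verb if every sighting were an error.
        if binomial_tail(n, m, error_rate) <= alpha:
            accepted.setdefault(verb, set()).add(frame)
    return accepted
```

Given a list of (verb, pattern) pairs extracted by a shallow parser, `validate_frames` returns, for each verb, the set of patterns that pass the test. Patterns seen only once or twice for a frequent verb are discarded as probable parsing errors.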
Work in SPARKLE actively contributes to efficient development in the following areas: automatic parsing of unrestricted text; computational lexical databases; speech dialogue systems; and cross-lingual information retrieval, exchange and filtering.
Project URL: http://www.ilc.pi.cnr.it/sparkle.html