Community Research and Development Information Service - CORDIS


GICQ Report Summary

Project ID: 40847
Funded under: FP6-MOBILITY
Country: Spain

Final Activity Report Summary - GICQ (Grammatical Inference with Correction Queries)

1) Correction queries. We have studied new kinds of correction queries in the framework of Grammatical Inference and investigated the learnability of different classes of languages using these queries. Main results:
We have introduced a new correction query based on edit distance. We have proposed to learn classes of languages defined via edit distance (i.e., topological balls of strings) with the help of this new correction query. Moreover, we have conducted several experiments with an oracle simulating a human expert, and shown that our algorithm is resistant to approximate answers (publication in the journal JMLR 2008).
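To make the notions of a ball of strings and an edit-distance correction query concrete, here is a minimal sketch. It assumes unit-cost Levenshtein distance and an oracle that answers "yes" (here `None`) when the queried string lies in the ball and otherwise returns one closest string of the ball. The function names, the brute-force search and the tie-breaking rule are illustrative assumptions, not the algorithm from the publication.

```python
from itertools import product


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance with unit-cost insertion, deletion and substitution."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute, or match at no cost
            ))
        prev = curr
    return prev[-1]


def in_ball(w: str, centre: str, radius: int) -> bool:
    """Membership in the ball of strings with the given centre and radius."""
    return edit_distance(w, centre) <= radius


def correction_query(w: str, centre: str, radius: int, alphabet: str = "ab"):
    """Hypothetical oracle (an assumption of this sketch).

    Returns None if w is in the ball, otherwise a string of the ball that is
    closest to w. Brute force: only usable for tiny alphabets and radii.
    """
    if in_ball(w, centre, radius):
        return None
    max_len = len(centre) + radius  # no ball member can be longer than this
    candidates = (
        "".join(t)
        for n in range(max_len + 1)
        for t in product(alphabet, repeat=n)
    )
    members = [s for s in candidates if in_ball(s, centre, radius)]
    # Ties are broken by length, then alphabetically (an arbitrary choice).
    return min(members, key=lambda s: (edit_distance(s, w), len(s), s))


correction = correction_query("aaaa", centre="ab", radius=1)
print(correction)
```

A learner would use such corrections to narrow down the unknown centre and radius; how that is done efficiently is the subject of the cited work and is not reproduced here.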
We have presented an algorithm that learns finite-state transducers using only one type of query, called an extension query (the teacher answers an extension query by connecting the new information requested by the learner with the information that the learner already knows). We have proved that this algorithm identifies the target finite-state transducer in polynomial time (publication in IDEAL 2009).
We have worked on the generalisation of the correction learning paradigm. We have explored the general idea of adding labels to the states of an automaton to make it easier to learn. We have considered the problem of learning finite automata with output using label queries (a label query with input string w returns both the output and the label of the state reached by w). We have given lower and upper bounds on the number of label queries needed to learn the output behaviour of a finite automaton in different scenarios (publication in ALT 2009).
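The label query described above can be sketched as follows. The sketch assumes the simplest conceivable scenario, in which every state carries a distinct label, so that a breadth-first exploration recovers the whole automaton. The class, the function names and the example machine are hypothetical, and the query count printed here is not one of the bounds from the publication.

```python
from collections import deque


class LabeledAutomaton:
    """Deterministic finite automaton with an output and a label on each state."""

    def __init__(self, start, delta, output, label):
        self.start, self.delta, self.output, self.label = start, delta, output, label
        self.queries = 0  # number of label queries answered so far

    def label_query(self, w):
        """Return (output, label) of the state reached by input string w."""
        self.queries += 1
        q = self.start
        for sym in w:
            q = self.delta[(q, sym)]
        return self.output[q], self.label[q]


def learn_with_unique_labels(teacher, alphabet):
    """Rebuild the automaton by breadth-first search over labels.

    Assumption of this sketch: distinct states have distinct labels, so a
    label identifies a state.
    """
    out0, lab0 = teacher.label_query("")
    outputs = {lab0: out0}
    access = {lab0: ""}  # one input string reaching each discovered state
    transitions = {}
    queue = deque([lab0])
    while queue:
        lab = queue.popleft()
        for sym in alphabet:
            out, nxt = teacher.label_query(access[lab] + sym)
            transitions[(lab, sym)] = nxt
            if nxt not in outputs:
                outputs[nxt] = out
                access[nxt] = access[lab] + sym
                queue.append(nxt)
    return lab0, transitions, outputs


# Example target: counts the letter "a" modulo 3, outputs 1 when the count is 0.
delta = {(q, "a"): (q + 1) % 3 for q in range(3)}
delta.update({(q, "b"): q for q in range(3)})
teacher = LabeledAutomaton(
    start=0,
    delta=delta,
    output={0: 1, 1: 0, 2: 0},
    label={0: "red", 1: "green", 2: "blue"},
)
start, transitions, outputs = learn_with_unique_labels(teacher, "ab")
print(start, outputs, teacher.queries)
```

With unique labels the exploration uses one query for the empty string plus one per state and input symbol. When several states share a label the problem is harder, and this simple search no longer suffices.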

2) Linguistically relevant classes of languages. We have studied classes of languages that are relevant from a linguistic point of view. Main results:
We have studied the learnability of Simple External Contextual (SEC) languages. The two main features of SEC are: (i) SEC is Mildly Context-Sensitive (i.e., it can express the non-context-free constructions that are most prevalent in natural languages and it has good computational properties); (ii) SEC is incomparable with the regular (REG) and context-free (CF) families, but it is included in the context-sensitive (CS) family (publications in ALT 2008, SEL 2010 and the TCS journal 2010).

We have investigated the linguistic relevance of Lindenmayer Systems (L systems). The three main features of this formalism are: bioinspiration, parallelism and generation of non-context-free languages. We have also studied the application of L systems to the description, analysis and processing of natural languages (publications in PAAMS 2010 and ICAART 2010).
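The parallelism and the generation of non-context-free languages mentioned above can be illustrated with a small deterministic, context-free L system (D0L system). The axiom and rules below are a textbook-style example chosen for this sketch, not taken from the cited publications. Every symbol is rewritten simultaneously at each step, and the resulting language, strings of the form a^(2^n) b^(2^n) c^(2^n), is not context-free.

```python
def rewrite(word: str, rules: dict) -> str:
    """One parallel derivation step: all symbols are rewritten at once.

    Symbols without a rule are copied unchanged.
    """
    return "".join(rules.get(sym, sym) for sym in word)


def derive(axiom: str, rules: dict, steps: int) -> list:
    """Return the axiom followed by the words of the first `steps` derivations."""
    words = [axiom]
    for _ in range(steps):
        words.append(rewrite(words[-1], rules))
    return words


# Illustrative rules (an assumption of this sketch): each letter doubles.
doubling_rules = {"a": "aa", "b": "bb", "c": "cc"}
for word in derive("abc", doubling_rules, 2):
    print(word)
```

The contrast with a sequential grammar is that no derivation step can rewrite only one of the three blocks, which is what keeps the three counts equal.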

3) New formal model of language learning. We have presented a new computational model of language learning that takes into account the context, semantics, positive data and corrections. It accommodates two different tasks: comprehension and production. This model has allowed us to investigate the roles of semantics and corrections in the process of learning to understand and speak a natural language.
Main results:
We have presented an algorithm that learns a meaning function and proved that it finitely converges to a correct result under a specific set of assumptions about the transducer and the examples used. We have tested our algorithm with natural language samples in an example domain of geometric shapes (publications in ICGI 2008, ForLing 2008, PsychoCompLA 2008, scientific poster in ICDL 2008 and TR Yale-2009).

We have explored the possibility of modelling language production with existing automata-theoretic approaches to machine translation (concretely, subsequential transducers and the OSTIA algorithm) (publication in CLAGI 2009).

We have considered a statistical approach to modelling comprehension and production, which has produced a more powerful version of our initial model and has allowed us to model meaning-preserving corrections. We have tested our model with limited sublanguages of several natural languages. The results show that access to semantics facilitates language acquisition, that the teacher can offer meaning-preserving corrections, that the learner can detect corrections intended by the teacher, and that the presence of corrections has an effect on language acquisition (TR Yale-2010, submission to ICGI 2010).


Tel.: +34-977559543
Fax: +34-977559597