
Multimodal Agents Grounded via Interactive Communication


One of the main goals of artificial intelligence is to build artificial agents that can interact with humans using natural language. To fully master language, an agent needs to know how to use it to accomplish a goal, to interact with another speaker, and to refer to objects in external reality. My research project aims to equip an artificial agent with all these skills within a single learning framework.

Communication helps humans accomplish things in the world and cooperate with each other, resulting in continuous and incremental updating of the speakers’ knowledge states. However, traditional machine learning methods used to model language are based on static and passive regimes, and are typically not grounded in external reality. I propose a radically different research programme, based on recent advances in training neural networks with reinforcement learning, that will enable the move from a static, fully supervised learning regime to a dynamic, interactive one in which agents need to use language to accomplish a task in the visual world. This will dramatically accelerate the development of machines that can talk with humans.

Even though I am an established researcher in computational linguistics, with substantial contributions to the integration of language and vision, I still need to fully develop my own line of research to become a leading, independent researcher in Europe. Carrying out the present proposal at Universitat Pompeu Fabra and Facebook Artificial Intelligence Research will be a fundamental step towards achieving my goal, since my hosts are leaders in computational linguistics, machine learning, and artificial intelligence in general, and specifically in the methods needed for the present proposal. Conversely, my unique profile, bridging computational linguistics and computer vision with machine learning methods, will widen the scope and outreach of the research conducted at both groups.

Field of science

  • /natural sciences/computer and information sciences/artificial intelligence/computer vision
  • /natural sciences/computer and information sciences/artificial intelligence
  • /natural sciences/computer and information sciences/artificial intelligence/computational intelligence
  • /humanities/languages and literature/linguistics
  • /humanities/languages and literature/languages - general
  • /natural sciences/computer and information sciences/artificial intelligence/machine learning/reinforcement learning
  • /natural sciences/computer and information sciences/artificial intelligence/machine learning

Funding Scheme

MSCA-IF-EF-ST - Standard EF


Placa De La Merce, 10-12
08002 Barcelona

Activity type

Higher or Secondary Education Establishments

EU contribution

€ 158 121,60