Project description
Teaching machines to see the world
The invention of deep neural networks has broadened the horizons of machine learning processes. It is now possible for a computer not only to process natural language and vision but even to learn models combining vision and language (V&L). The EU-funded IMAGINE project will integrate world knowledge with natural language generation and models of V&L. In other words, the machine will apply algorithms that mimic our reasoning abilities for solving tasks using knowledge available in machine-friendly multi-modal knowledge bases.
Objective
Deep neural networks have caused lasting change in the fields of natural language processing and computer vision. More recently, much effort has been directed towards devising machine learning models that bridge the gap between vision and language (V&L). In IMAGINE, I propose to take this work further and to integrate world knowledge into natural language generation models of V&L. Such knowledge is easily taken for granted and is necessary to perform even simple human-like reasoning tasks. For example, in order to properly answer the question “What are the children doing?” about an image which shows parents with children playing in a park, a model should be able to (a) tell children from parents (e.g. children are considerably shorter), and (b) infer that, because they are in a park, laughing, and with other children, they are very likely playing.
Much of this knowledge is presently available in large-scale machine-friendly multi-modal knowledge bases (KBs), and I will leverage these to improve multiple natural language generation (NLG) tasks that require human-like reasoning abilities. I will investigate (i) methods to learn representations for KBs that incorporate text and images, as well as (ii) methods to incorporate these KB representations to improve multiple NLG tasks that reason over V&L. In (i), I will research how to train a model that learns KB representations (e.g. learning that children are young adults and likely do not work) jointly with the component that understands the image content (e.g. identifies people, animals, objects and events in an image). In (ii), I will investigate how to jointly train NLG models for multiple tasks together with KB entity linking, so that these models benefit from one another by sharing parameters (e.g. a model that answers questions about an image benefits from the training data of a model that describes the contents of an image), and also benefit from the world knowledge representations in the KB.
Fields of science (EuroSciVoc)
CORDIS classifies projects with EuroSciVoc, a multilingual taxonomy of fields of science, through a semi-automatic process based on NLP techniques. See: https://op.europa.eu/en/web/eu-vocabularies/euroscivoc.
- natural sciences > computer and information sciences > data science > natural language processing
- natural sciences > computer and information sciences > artificial intelligence > computer vision
- natural sciences > computer and information sciences > knowledge engineering
- natural sciences > computer and information sciences > artificial intelligence > machine learning
- natural sciences > computer and information sciences > artificial intelligence > computational intelligence
Programme(s)
Funding Scheme
MSCA-IF - Marie Skłodowska-Curie Individual Fellowships (IF)
Coordinator
1012WX Amsterdam
Netherlands