Advancing AI’s ability to derive meaning from text and images

New deep learning models demonstrate the ability to holistically interpret text, image and video inputs.

Human intelligence is a remarkable thing. “By merely looking at our surroundings, we can immediately form conclusions about what is happening and who is involved,” says Amir Globerson, a computer science professor at Tel Aviv University. “When reading a book, we form a mental image of the world being described.” While artificial intelligence (AI) is impressive, it still struggles to match the human mind’s ability to connect the meaning of individual components to create a coherent understanding of the whole. But this skill is absolutely critical to AI’s use in, for example, self-driving cars, robotics and medical diagnostics. Enter HOLI, an EU-funded project working to help AI gain a more holistic understanding of text and image inputs.

A framework for designing deep learning models

The project, which received support from the European Research Council, delivered an innovative framework that can be used to design deep learning models capable of achieving a comprehensive interpretation of their inputs. “We did this by building models that explicitly represent scene components and then letting those representations interact with each other via deep learning architectures,” explains Globerson. In addition to the framework, the project provided new insights into how and why these models work. “We demonstrated that it is the specific way these models learn that enhances their generalisation capabilities,” notes Globerson.
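
To make the idea of interacting component representations more concrete, the following is a minimal illustrative sketch and not the HOLI project's actual code. It assumes each scene component is a small feature vector and uses a single round of scaled dot-product attention as a stand-in for the interaction step. The component names, the vector values and the helper functions `softmax`, `dot` and `interact` are all hypothetical.

```python
import math


def softmax(scores):
    """Turn raw similarity scores into positive weights that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def interact(components):
    """One round of interaction between component vectors.

    Each component is replaced by an attention-weighted average of all
    components, so its new representation depends on the whole scene.
    This is an assumed, simplified stand-in for a learned architecture.
    """
    dim = len(components[0])
    scale = math.sqrt(dim)
    updated = []
    for query in components:
        weights = softmax([dot(query, key) / scale for key in components])
        mixed = [
            sum(w * comp[i] for w, comp in zip(weights, components))
            for i in range(dim)
        ]
        updated.append(mixed)
    return updated


# Hypothetical scene components with made-up feature vectors.
scene = {
    "person": [1.0, 0.0, 0.5],
    "bicycle": [0.9, 0.1, 0.4],
    "tree": [0.0, 1.0, 0.0],
}
names = list(scene)
context = dict(zip(names, interact([scene[n] for n in names])))

for name in names:
    print(name, [round(v, 3) for v in context[name]])
```

In this sketch, components with similar features (here "person" and "bicycle") weight each other more heavily, so each updated vector reflects its context rather than the component in isolation. A trained model would learn the projections that produce these vectors instead of using fixed values.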

Exciting times for AI

According to Globerson, these achievements are the direct result of the work of his team of researchers, who approached the project’s challenges with excitement, rigour, creativity and diligence. Many of these researchers will apply the skills and knowledge they gained during the project to their careers in academia or industry. “I am proud of the many contributions our team has made to AI, both in terms of developing holistic AI architectures and in understanding why and how such models work,” remarks Globerson. These advances, together with the project’s contribution to deep learning theory, its introduction of a visual prompting paradigm and its discovery of how transformers perform in-context learning, have all inspired follow-up work. “These are exciting times for AI, and the key themes we explored in the project are still largely open,” concludes Globerson. “I am confident that the ideas and techniques we introduced during the HOLI project will help answer these questions, ultimately improving AI’s ability to form meaning from images, videos and text.”

Project Information

HOLI

Grant agreement ID: 819080

Project website

DOI

10.3030/819080

Project closed

EC signature date 17 January 2019

Start date 1 February 2019

End date 31 January 2025

Funded under

EXCELLENT SCIENCE - European Research Council (ERC)

Total cost

€ 1 932 500,00

EU contribution

€ 1 932 500,00

Coordinated by

TEL AVIV UNIVERSITY
Israel
