Autonomous Linguistic Emergence in neural Networks

Periodic Reporting for period 2 - ALiEN (Autonomous Linguistic Emergence in neural Networks)

Reporting period: 2023-07-01 to 2024-12-31

Deep artificial neural networks, and in particular so-called "language models" (such as OpenAI's ChatGPT), are revolutionizing our everyday lives. However, these AI systems are trained to reproduce common patterns extracted from huge amounts of text. As such, they have been called "stochastic parrots": systems that produce superficially fluent utterances without truly engaging in meaningful, goal-based linguistic interaction.

The ALiEN project studies what happens when we take these AI systems and make them communicate with each other to achieve a shared goal.

On the one hand, our hope is that in this way we will develop new systems that learn more efficiently from less data, and that are genuinely goal-driven and interactive (imagine, for example, two self-driving cars coordinating at an intersection through communication).

On the other hand, machine-to-machine communication poses a security threat: we and others have found that, by letting an algorithmic agent interact with a language model, the agent will discover apparently nonsensical word sequences that can make the language model engage in all sorts of behaviors, including producing potentially harmful responses. Moreover, we found that these opaque prompts can be discovered using one language model and then successfully transferred to other language models, suggesting that they constitute a genuine "universal machine code".

The ALiEN project designs and runs experiments on machine-to-machine communication, using language models, visual models and language+vision models, and analyzes the emergent machine communication protocols, with the twin goals of contributing to more genuinely human-like and safe AI and of broadening our understanding of how latest-generation AI systems really work.
During the first half of the project, we performed the following work:

1) We conducted a large-scale, systematic analysis of communication between visual networks, finding that: a) it is possible to establish successful communication between networks that have very different architectures; and b) a new visual network can learn the communication code of an existing community of networks in a much more efficient way than if it had to build a new communication protocol from scratch. In other words, it is possible to induce a transferable visual network communication protocol that can be rapidly taught to new networks.
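
To make the setup concrete, the sketch below implements a minimal two-network referential game in PyTorch: a "sender" encodes an image feature vector into a discrete symbol, and a "receiver" must pick the corresponding image among distractors. The dimensions, architectures and Gumbel-softmax relaxation are illustrative assumptions, not the project's actual configuration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Illustrative sizes; the project's actual configuration is not shown here.
FEAT_DIM, VOCAB, N_DISTRACTORS, BATCH = 64, 16, 4, 32

class Sender(nn.Module):
    """Maps an image feature vector to a discrete message symbol."""
    def __init__(self):
        super().__init__()
        self.proj = nn.Linear(FEAT_DIM, VOCAB)
    def forward(self, feats, tau=1.0):
        # Gumbel-softmax: a differentiable relaxation of sampling a
        # discrete symbol, so the game is trainable end to end.
        return F.gumbel_softmax(self.proj(feats), tau=tau, hard=True)

class Receiver(nn.Module):
    """Scores candidate images against the received message."""
    def __init__(self):
        super().__init__()
        self.embed = nn.Linear(VOCAB, FEAT_DIM)
    def forward(self, message, candidates):
        query = self.embed(message).unsqueeze(2)        # (batch, FEAT_DIM, 1)
        return torch.bmm(candidates, query).squeeze(2)  # similarity scores

sender, receiver = Sender(), Receiver()
opt = torch.optim.Adam(list(sender.parameters()) + list(receiver.parameters()))

for step in range(500):
    # Random vectors stand in for features from pre-trained visual networks.
    target = torch.randn(BATCH, FEAT_DIM)
    distractors = torch.randn(BATCH, N_DISTRACTORS, FEAT_DIM)
    candidates = torch.cat([target.unsqueeze(1), distractors], dim=1)
    labels = torch.zeros(BATCH, dtype=torch.long)  # target sits at index 0

    loss = F.cross_entropy(receiver(sender(target), candidates), labels)
    opt.zero_grad(); loss.backward(); opt.step()
```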

2) We looked at what happens when two multimodal systems are made to communicate with each other about natural images. We found that, when no explicit tuning for communication is performed, systems might succeed at communicating, but they develop a code that is completely opaque to humans. On the other hand, when the systems are tuned to solve a simple communication task together, they become better at describing images in a more general and human-like way.
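
The sketch below illustrates one standard way such communication-based tuning can be set up: a "describer" is rewarded, via REINFORCE, whenever a "guesser" identifies the described image among candidates. The toy modules, frozen guesser and fixed reward baseline are hypothetical stand-ins for the pretrained multimodal systems actually used.

```python
import torch
import torch.nn as nn

# Toy stand-ins for a pretrained captioner ("describer") and a pretrained
# image-text scorer ("guesser"); real systems would replace both.
FEAT, VOCAB, MSG_LEN, BATCH = 64, 50, 5, 8

describer = nn.Linear(FEAT, VOCAB * MSG_LEN)
guesser_txt = nn.Embedding(VOCAB, FEAT)
guesser_txt.weight.requires_grad_(False)  # the guesser stays frozen here

def play_round(images):
    """Describe image 0; the guesser must pick it among all candidates."""
    logits = describer(images[0]).view(MSG_LEN, VOCAB)
    dist = torch.distributions.Categorical(logits=logits)
    msg = dist.sample()                      # discrete caption tokens
    text_vec = guesser_txt(msg).mean(0)      # pool the message embedding
    scores = images @ text_vec               # match against all candidates
    reward = (scores.argmax() == 0).float()  # 1 if the guesser succeeded
    # REINFORCE with a crude fixed baseline: raise the probability of
    # messages that led to successful communication, lower the rest.
    return -(reward - 0.5) * dist.log_prob(msg).sum()

opt = torch.optim.Adam(describer.parameters())
for step in range(500):
    loss = play_round(torch.randn(BATCH, FEAT))
    opt.zero_grad(); loss.backward(); opt.step()
```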

3) We showed that, by tuning simple language-generation systems to interact with a large language model, a code emerges that allows control of the language model while remaining completely opaque to human inspection. Moreover, we showed that this code is transferable across language models. We are currently taking the first steps towards an in-depth analysis of this "unnatural language" code.
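
As a rough illustration of how such opaque control codes can be searched for, the sketch below hill-climbs over discrete token sequences to maximize a frozen scorer's response. The toy "language model" and the simple mutation search are illustrative assumptions; they do not reproduce the project's actual tuning method.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
VOCAB, PROMPT_LEN, DIM = 100, 8, 32

# Toy frozen "language model": scores how strongly a prompt triggers one
# fixed target behavior. A stand-in, not a real LLM.
embed = nn.Embedding(VOCAB, DIM)
target_direction = torch.randn(DIM)

def behavior_score(prompt_ids):
    with torch.no_grad():
        return (embed(prompt_ids).mean(0) @ target_direction).item()

# Hill-climbing over token sequences: mutate one position at a time and
# keep changes that raise the score. The result is typically an opaque,
# nonsensical-looking token sequence that nonetheless steers the model.
prompt = torch.randint(VOCAB, (PROMPT_LEN,))
best = behavior_score(prompt)
for step in range(2000):
    cand = prompt.clone()
    cand[torch.randint(PROMPT_LEN, (1,))] = torch.randint(VOCAB, (1,))
    score = behavior_score(cand)
    if score > best:
        prompt, best = cand, score

print("discovered prompt ids:", prompt.tolist(), "score:", round(best, 3))
```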

4) We developed an interactive environment in which two language models must cooperate through language in order to solve a common task. We are using this environment to study how the language of the models evolves to optimize interaction, as well as the limits of language model interaction.
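
The sketch below shows the bare scaffold of such an environment, assuming only a generic generate(history) interface per agent; the echo agents, the task and the success criterion are placeholders for the actual language models and tasks.

```python
def make_agent(name):
    """Toy agent that just echoes its turn count; in the real environment
    this would wrap a language model's text-generation call."""
    def generate(history):
        return f"{name}: message at turn {len(history)}"
    return generate

def run_episode(agent_a, agent_b, task_solved, max_turns=10):
    """Alternate turns between the two agents until the shared task is
    solved or the turn budget runs out."""
    history = []
    for turn in range(max_turns):
        speaker = agent_a if turn % 2 == 0 else agent_b
        history.append(speaker(history))
        if task_solved(history):
            return history, True
    return history, False

history, success = run_episode(
    make_agent("A"), make_agent("B"),
    task_solved=lambda h: len(h) >= 4,  # placeholder success criterion
)
print(success, *history, sep="\n")
```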

5) To support all lines of work above, we are developing tools for communication protocol interpretation, focusing in particular on a geometric analysis of the inner linguistic representations of language models and similar systems.
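
As one example of the kind of geometric measure involved, the sketch below estimates the intrinsic dimension of a set of representations using a maximum-likelihood variant of the TwoNN estimator (Facco et al., 2017). The specific tools under development in the project are not detailed here, and the input is a random stand-in for actual hidden states.

```python
import numpy as np

def two_nn_intrinsic_dim(points):
    """Maximum-likelihood variant of the TwoNN intrinsic-dimension
    estimator (Facco et al., 2017): the ratio mu = r2/r1 of each point's
    second- to first-nearest-neighbour distance follows a Pareto law
    whose exponent is the intrinsic dimension."""
    dists = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    np.fill_diagonal(dists, np.inf)        # ignore self-distances
    nn2 = np.sort(dists, axis=1)[:, :2]    # r1, r2 for every point
    mu = nn2[:, 1] / nn2[:, 0]
    return len(points) / np.sum(np.log(mu))

# Random stand-in for hidden states: 500 points on a 5-dimensional
# linear subspace of a 64-dimensional space; the estimate should be ~5.
hidden = np.random.randn(500, 5) @ np.random.randn(5, 64)
print("estimated intrinsic dimension:", round(two_nn_intrinsic_dim(hidden), 2))
```
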
We built on existing work on emergent deep net communication to show that pre-trained visual networks can develop a shared code that is reasonably general and can be rapidly transferred to other visual networks. This paves the way for applied work on letting visual-net-powered agents that operate in the real world cooperate through an emergent communication protocol.

We have demonstrated how communication-based training is an efficient and cheap way to develop better image description models.

We were the first to show that opaque information-extraction prompts are transferable from one language model to another, and the first to provide some insights into the nature of these prompts.

Our main aims for the remainder of the project are twofold. On the one hand, we want to provide a full characterization of "unnatural" emergent language: how it arises, what its properties are, how it is processed by the networks, and how it can be avoided, if necessary. On the other hand, we want to explore the limits of interactive, language-based problem solving in communities of large language models.