Deep artificial neural networks, and in particular so-called "language models" (such as OpenAI's ChatGPT), are revolutionizing our everyday life. However, these AI systems are trained to reproduce common patterns they extract from huge amounts of text. As such, they have been called "stochastic parrots" that can produce superficially fluent utterances without truly engaging in meaningful, goal-based linguistic interactions.
The ALiEN project studies what happens when we take these AI systems and make them communicate with each other to achieve a shared goal.
On the one hand, our hope is that in this way we will develop new systems that can learn more efficiently from less data, and that are genuinely goal-driven and interactive (imagine, for example, two self-driving cars that coordinate at an intersection through communication).
On the other hand, machine-to-machine communication poses a security threat: we and others have found that, when an algorithmic agent is allowed to interact with a language model, the agent discovers apparently nonsensical word sequences that can make the language model engage in all sorts of behaviors, including producing potentially harmful responses. We found, moreover, that these opaque prompts can be discovered using one language model and then successfully transferred to other language models, suggesting that they constitute a genuine "universal machine code".
The ALiEN project designs and runs experiments on machine-to-machine communication, using language models, visual models and language+vision models, and analyzes the emergent machine communication protocol, with the twin goals of contributing to more genuinely human-like and safe AI and of broadening our understanding of how latest-generation AI systems really work.