CORDIS - EU research results

Influence-based Decision-making in Uncertain Environments

Periodic Reporting for period 4 - INFLUENCE (Influence-based Decision-making in Uncertain Environments)

Reporting period: 2022-08-01 to 2023-01-31

Decision-theoretic sequential decision making (SDM) is concerned with endowing an intelligent agent with the capability to choose actions that optimize task performance. SDM techniques have the potential to revolutionize many aspects of society, and successes such as beating a grandmaster in the game of Go, have sparked renewed interest in this field. However, despite these successes, fundamental problems of scalability prevent these methods from addressing other problems with hundreds or thousands of state variables. For instance, there is no principled way of computing an optimal or near-optimal traffic light control plan for an intersection that takes into account the current state of traffic in an entire city.

INFLUENCE aimed to develop a new class of influence-based SDM methods that overcome scalability issues for such problems by using novel ways of abstraction. For instance, the intersection’s local problem is manageable, but the influence that the rest of the network exerts on it is complex. The key idea that we explored is that by using machine learning methods, we can learn sufficiently accurate representations of such influence to facilitate near-optimal decisions. We call these representations 'approximate influence points' (AIPs).
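The traffic example can be made concrete with a minimal sketch (all class names, the arrival model, and the queue dynamics are hypothetical illustrations, not the project's actual implementation): instead of simulating the whole city, a local intersection simulator draws its boundary inflow, the "influence" exerted by the rest of the network, from a learned predictor.

```python
import random

# Hypothetical sketch: a local traffic-light simulator whose boundary
# inflow (the influence of the rest of the road network) is supplied by
# a learned predictor instead of a city-wide simulation.

class ApproximateInfluencePoint:
    """Predicts P(car arrives at the boundary) from local history."""
    def __init__(self, arrival_prob=0.3):
        self.arrival_prob = arrival_prob  # stands in for a trained model

    def sample_inflow(self, local_history):
        # A trained model would condition on local_history; here we use
        # a fixed probability purely for illustration.
        return 1 if random.random() < self.arrival_prob else 0

class LocalIntersectionSimulator:
    def __init__(self, influence):
        self.influence = influence
        self.queue = 0        # cars waiting at the intersection
        self.history = []

    def step(self, light_green):
        # Boundary cars arrive according to the AIP, not a global model.
        self.queue += self.influence.sample_inflow(self.history)
        if light_green and self.queue > 0:
            self.queue -= 1   # one car passes on green
        self.history.append((light_green, self.queue))
        return self.queue

random.seed(0)
sim = LocalIntersectionSimulator(ApproximateInfluencePoint())
for t in range(10):
    sim.step(light_green=(t % 2 == 0))
print(sim.queue)
```

The point of the abstraction is that planning or learning only ever touches the cheap local simulator; all global complexity is compressed into the influence predictor.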

The objectives were to:
1. generate formal understanding of the use of AIP representations;
2. develop methods that can induce representations for AIPs;
3. develop novel simulation-based planning methods that use AIPs to efficiently plan for very large problems;
4. develop novel influence-based reinforcement learning (RL) methods;
5. investigate approaches to exploit AIPs in multiagent coordination.

These objectives have been realized to a large extent:

We published 42 papers, 23 of which appeared in top-tier AI and ML venues. We further developed the framework of influence-based abstraction and provided conditions under which we can bound the quality of AIPs. In a number of papers we learned effective AIP representations and demonstrated that these can improve the efficiency of several tasks. Specifically, we demonstrated that we can improve the task performance of complex online planning problems with hundreds of variables. For even more complex problems, we showed that AIPs can lead to more efficient deep RL, significantly reducing the time needed to train without reducing quality, and that they can also improve multiagent learning by parallelizing learning in different sub-problems.

In this way, INFLUENCE has made an important step towards realizing the promise of autonomous agent technology, particularly for domains with local effects of the actions of agents, such as intelligent traffic light control, or coordination of multi-robot teams.

In the initial phase of the project, the PI did preparatory work on the simulation and evaluation framework and on the theoretical foundations of AIPs. One postdoc and three PhD students started in fall 2018. An initial joint work, called InfluenceNet, integrated the ideas underlying influence-based abstraction in the context of deep RL. The resulting workshop paper (at NeurIPS'2019) gave the first empirical indication that the ideas behind INFLUENCE can improve sequential decision making and RL. In fall 2019, another postdoc and a PhD student joined the team, and the members started to pursue their own research lines, focusing on the objectives above. Although the Covid-19 pandemic affected our ability to collaborate efficiently, these research lines produced a number of key results and papers.

Some of the main results include AIPs for online planning (presented at NeurIPS’2020), decentralized MCTS with learned teammate models (IJCAI’21), generalizing the theory of IBA (JAIR’21), loss bounds for IBA (AAMAS’21), self-improving simulators for online planning (IJCAI’22), influence-augmented local simulators (IALS) for deep RL (ICML’22), and distributed IALS for multiagent learning (NeurIPS’22). Additionally, the PI has engaged in a number of collaborations closely related to the INFLUENCE project.

There were many novel results, leading to a total of 42 papers (excluding arXiv preprints). The most important achievements are the following.

-We have further developed the framework of influence-based abstraction to also deal with intra-stage dependencies (so-called 'instantaneous effects').

-Our results on performance loss for approximate influence points give clear criteria for what an AIP should satisfy to be ‘good’, and provide validation for the idea of learning AIPs with standard ML techniques: we showed that learning AIPs using the usual cross-entropy loss is aligned with minimizing performance loss.
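The "standard ML techniques" claim can be illustrated with a minimal supervised-learning sketch (the data-generating model, features, and all parameter values are hypothetical): an influence-source predictor is fit by gradient descent on the usual cross-entropy loss.

```python
import math
import random

# Hypothetical sketch: fit a logistic predictor of a binary influence
# source d from a local-history feature x by minimizing the standard
# cross-entropy loss, exactly as in ordinary supervised learning.

random.seed(1)
data = []
for _ in range(500):
    x = random.uniform(-2, 2)
    p_true = 1 / (1 + math.exp(-2 * x))   # ground-truth influence dist.
    d = 1 if random.random() < p_true else 0
    data.append((x, d))

w, b, lr = 0.0, 0.0, 0.5
for _ in range(300):                      # batch gradient descent on CE
    gw = gb = 0.0
    for x, d in data:
        p = 1 / (1 + math.exp(-(w * x + b)))
        gw += (p - d) * x                 # d(CE)/dw for one example
        gb += (p - d)                     # d(CE)/db for one example
    w -= lr * gw / len(data)
    b -= lr * gb / len(data)

# average cross-entropy of the learned predictor on the training data
ce = -sum(d * math.log(1 / (1 + math.exp(-(w * x + b)))) +
          (1 - d) * math.log(1 - 1 / (1 + math.exp(-(w * x + b))))
          for x, d in data) / len(data)
```

The learned weight recovers (approximately) the true coefficient, and the achieved cross-entropy sits well below the log 2 of an uninformed predictor, which is what the loss-bound results say matters for downstream performance.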

-As part of a number of papers we have learned effective AIP representations, and we demonstrated that these can improve the efficiency of several tasks.

-We clearly demonstrated the power of using AIPs to speed up online planning and thus achieve better task performance with a given time budget. Given that many realistic settings (traffic, robotics) do have severe real-time constraints, this result may have a huge impact on such domains.

-A possible downside of using AIPs in this way is that one first needs to learn them before one can start using them in planning. We can overcome this by learning the AIPs while planning: by switching between the use of the (slow) global model and the (fast, but possibly inaccurate) influence-augmented local simulator, we effectively create a self-improving simulator.
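The switching scheme can be sketched as follows (the error estimate, threshold, and count-based AIP are hypothetical simplifications of the actual method): each step on the slow global simulator also yields a training example for the AIP, and once the AIP's estimated error falls below a tolerance, rollouts switch to the fast local simulator.

```python
import random

# Hypothetical sketch of a self-improving simulator: rollouts start on
# a slow global simulator; each global step also trains the AIP, and
# once its estimated error is low enough, rollouts switch to the fast
# influence-augmented local simulator.

class GlobalSim:
    def step(self):                  # slow but exact boundary inflow
        return random.random() < 0.3

class LearnedAIP:
    def __init__(self):
        self.n = self.k = 0
    def update(self, inflow):        # count-based estimate of P(inflow)
        self.n += 1
        self.k += int(inflow)
    def error_estimate(self):        # shrinks as 1/sqrt(n)
        return float('inf') if self.n == 0 else 1 / self.n ** 0.5
    def sample(self):
        return random.random() < self.k / self.n

def rollout_inflows(global_sim, aip, steps, tol=0.1):
    inflows = []
    for _ in range(steps):
        if aip.error_estimate() > tol:   # not yet accurate: go global
            x = global_sim.step()
            aip.update(x)                # the self-improvement step
        else:                            # accurate enough: go local
            x = aip.sample()
        inflows.append(x)
    return inflows

random.seed(2)
aip = LearnedAIP()
inflows = rollout_inflows(GlobalSim(), aip, steps=300)
```

With tol=0.1 the sketch makes exactly 100 expensive global calls and serves the remaining 200 steps from the cheap local simulator, which is the source of the planning speed-up.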

-For even more complex problems, such as traffic light control settings simulated with SUMO, we showed that AIPs can lead to more efficient deep RL, thus significantly reducing the time needed to train without reducing quality. These deep RL results were also validated on a large-scale multiagent task inspired by robotic warehousing.

-We have explored approaches to exploit AIPs in multiagent coordination, showing that they can improve multiagent learning by using AIP descriptions to parallelize learning across different sub-problems.
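The decoupling idea can be sketched in a toy form (the bandit-style learners, reward function, and influence probabilities are all hypothetical): once each sub-problem's boundary is summarized by an AIP, the agents' learning problems no longer depend on each other and can be trained independently.

```python
import random

# Hypothetical sketch: with an AIP summarizing each sub-problem's
# boundary, the local learners decouple and can be trained
# independently (here sequentially; in practice in parallel workers).

def make_local_learner(aip_prob):
    """A learner that sees only its local state plus sampled influence."""
    q = {0: 0.0, 1: 0.0}             # value estimates for two actions

    def train(episodes=1000, lr=0.1, eps=0.1):
        for _ in range(episodes):
            inflow = random.random() < aip_prob   # sampled from the AIP
            if random.random() < eps:             # epsilon-greedy
                a = random.choice([0, 1])
            else:
                a = max(q, key=q.get)
            # toy reward: matching the inflow is the right local action
            r = 1.0 if a == int(inflow) else 0.0
            q[a] += lr * (r - q[a])
        return q
    return train

random.seed(3)
# two sub-problems with different learned influence distributions
policies = []
for p in (0.2, 0.8):
    policies.append(make_local_learner(p)())
```

Because neither learner queries the other's state, the two training loops could run in separate processes, which is the parallelization the distributed-IALS results exploit.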

-In most of our work, we manually specified which influence sources needed to be predicted. InfluenceNet showed that the ideas of influence-based abstraction can work even without this specification. It also provided empirical evidence that we can learn AIPs from experience, without an exact model. Similarly, our work on decentralized MCTS with learned teammate models provided evidence that it is possible to learn reasonably accurate predictive models of complex teammates.

-Integrating the learning of abstractions and RL is surprisingly complex: we provided a counter-example showing that theoretical model-based RL results may not hold when applied to abstracted learned models, and we detail how it is still possible to give guarantees by resorting to an analysis based on martingale bounds [under submission].

-Based on the insights of influence-based abstraction, we also wrote a blue sky paper that sets out the vision of using multiagent systems (potentially including influence-based abstraction) to deal with non-stationarity as often encountered when deploying machine learning systems.

In the last phase of the project, we explored practical machine learning methods to learn AIPs and investigated the use of abstraction techniques in Bayesian RL. Papers describing this work are still in preparation.
INFLUENCE investigates how the global problem influences the local problem and vice versa