Periodic Reporting for period 1 - TUPLES (Trustworthy Planning and Scheduling with Learning and Explanations)
Reporting period: 2022-10-01 to 2024-03-31
TUPLES is a 3-year project aiming to obtain scalable yet transparent, robust, and safe algorithmic solutions for planning and scheduling (P&S). The cornerstones of our scientific contributions will be (1) combining symbolic P&S methods with data-driven methods to benefit from the scalability and modelling power of the latter, while gaining the transparency, robustness, and safety of the former and (2) developing rigorous explanation and verification approaches for ensuring the transparency, robustness, and safety of a sequence of interacting machine-learned decisions. Both of these challenges are at the forefront of AI research.
We will demonstrate and evaluate our novel and rigorous methods in a laboratory environment, on a range of use-cases in manufacturing, aircraft operations, sport management, waste collection, and energy management. Our results also include practical guidelines derived from the lessons learnt in this process, and open-source software tools and test environments enabling the human-centered development and assessment of trustworthy P&S systems.
In particular, from a scientific standpoint, we have:
• advanced the state of the art with novel hybrid approaches (model-based / data-driven) for P&S that aim at increasing the robustness and/or scalability of existing methods;
• developed new methods for verifying, testing, or enforcing the safety and robustness of the solution schedules, plans or policies produced by these methods; and
• devised innovative explanation approaches allowing users to understand the solutions produced by P&S systems and justify why a particular solution was chosen over others.
In more detail:
• Regarding robustness, we designed novel methods for performing robustness verification of tree ensembles and neural networks. These algorithms form the building blocks for analyzing learned policies. We established a framework for assessing the robustness of tree ensemble predictions at deployment time to identify instances at a heightened risk of being mispredicted. For training neural networks with respect to downstream scheduling and planning systems (decision-focused learning, DFL), we have devised methods based on black-box differentiation that are capable of addressing two-stage and multi-stage optimization problems under uncertainty and that scale well at inference time. Furthermore, we investigated for the first time the use of DFL for learning to predict action costs.
• Regarding safety, we advanced safety verification of action policies based on predicate abstraction, for both neural policies and policies represented as tree ensembles, as well as policy testing methods that identify unsafe behavior through new fuzzing and test-oracle methodology. We established a methodology that allows us to learn ML models that are either verifiable by design or safe by design (i.e. provably compliant with a user-defined property).
• Regarding explainability, we designed new methods, based on minimal unsatisfiable subsets (MUS) of solution properties, to help the user navigate the solution space of constraint satisfaction (including scheduling), classical, numeric, and probabilistic planning problems, and to explain the lack of feasible solutions. We also designed new methods for explaining decisions produced by black-box and grey-box planning policies, including those represented by neural networks.
• Regarding scalability, we have developed a range of new architectures for learning generalised policies and heuristics that scale to large planning and scheduling problems, and novel ways of integrating them with search algorithms. For learning cost coefficients from user solutions in DFL, we improved scalability by systematically avoiding calls to an optimal solver. The inference scalability of our hybrid and DFL methods for two-stage and multi-stage optimisation is already considerably higher than that of widely employed methods, while their training scalability still requires improvement.
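To make the notion of a minimal unsatisfiable subset mentioned above more concrete, the sketch below shows a generic, textbook deletion-based MUS extraction on a toy two-task scheduling problem. It is an illustration only and not the method developed in TUPLES: the function names (satisfiable, deletion_mus), the brute-force satisfiability check, and the toy constraints are all assumptions introduced here.

```python
from itertools import product
from typing import Callable, Dict, List, Sequence

Assignment = Dict[str, int]
Constraint = Callable[[Assignment], bool]


def satisfiable(constraints: Sequence[Constraint],
                variables: Sequence[str],
                domain: Sequence[int]) -> bool:
    """Brute-force check: does any assignment satisfy every constraint?"""
    for values in product(domain, repeat=len(variables)):
        assignment = dict(zip(variables, values))
        if all(c(assignment) for c in constraints):
            return True
    return False


def deletion_mus(named: Dict[str, Constraint],
                 variables: Sequence[str],
                 domain: Sequence[int]) -> List[str]:
    """Shrink an unsatisfiable constraint set to a minimal unsatisfiable subset."""
    if satisfiable(list(named.values()), variables, domain):
        raise ValueError("constraint set is satisfiable; no MUS exists")
    core = list(named)
    for name in list(core):
        trial = [n for n in core if n != name]
        # If the rest is still unsatisfiable, this constraint is not needed
        # to explain the conflict and can be dropped from the core.
        if not satisfiable([named[n] for n in trial], variables, domain):
            core = trial
    return core


# Hypothetical toy problem: start times of tasks "a" and "b", each lasting 2 units.
variables = ["a", "b"]
domain = range(0, 6)
constraints: Dict[str, Constraint] = {
    "a_deadline": lambda s: s["a"] + 2 <= 5,     # a must finish by time 5
    "precedence": lambda s: s["a"] + 2 <= s["b"],  # a finishes before b starts
    "b_deadline": lambda s: s["b"] + 2 <= 4,     # b must finish by time 4
    "a_release": lambda s: s["a"] >= 1,          # a cannot start before time 1
}

mus = deletion_mus(constraints, variables, domain)
print(sorted(mus))
```

In this toy instance the extracted subset contains the precedence constraint, the deadline of b, and the release time of a, while the deadline of a is dropped because it plays no part in the conflict. A subset of this kind is what an explanation of infeasibility can present to the user: relaxing any one of its constraints resolves that particular conflict.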
With respect to use cases:
• for each use case, we have provided a formal specification, along with the properties and metrics we will use to evaluate the performance of our novel approaches, and a simulator to automate part of this evaluation;
• we have started conducting preliminary evaluations of our research against these metrics; and
• we also engaged with end-users of some of our use cases to capture their expectations and understand the human factors and ethical considerations that would influence their trust in a P&S system.
Moreover, we have started the design and, in some cases, the implementation of a number of tools to be publicly released by the end of the project. These include extensions of the scikit-decide library; the TUPLES Lab, a set of simulation environments enabling controlled experiments on simplified versions of our use cases; and a self-assessment tool to assess the trustworthiness of P&S systems.
In the sectors of waste collection and energy systems, TUPLES also expects to improve energy efficiency and reduce CO2 emissions and other pollutants, as well as traffic congestion.
In the aircraft operations industry, and in particular in the context of our flight diversion use case, TUPLES hopes to improve passenger safety and reduce the stress level of the pilot.
We can also expect improvements in working conditions. For example, when building a football team, sports scouts will be able to reduce the number of players they have to observe and the number of trips they have to make. And thanks to plan and schedule explanations, we will be able to increase the participation of employees in company decisions, for example in the context of manufacturing.
More broadly, planning and scheduling has the potential to have a major impact on industry and society by multiplying the benefits exemplified by these five sectors. But this requires systems that are sufficiently trustworthy to be widely adopted. This is what TUPLES aims to achieve.