Periodic Reporting for period 5 - XAI (Science and technology for the explanation of AI decision making)
Reporting period: 2024-11-01 to 2025-09-30
Under the XAI paradigm, the AI’s output—whether a prediction or recommendation—is enriched with an explanation that may take various forms: highlighting the input features that most influenced the result, providing examples, or supporting counterfactual reasoning (showing what changes would alter the outcome). The format and language depend on the target audience: developers, domain experts, lay users, or legal auditors.
XAI techniques fall into two main categories:
Post-hoc: These add an explanation module to an existing model to reconstruct its logic after the fact. The metaphor is "opening black boxes."
Interpretable-by-design: These involve creating inherently understandable models where the explanation is directly extractable without additional components. The metaphor is "designing white boxes."
Both approaches involve two critical steps: extracting the explanation artifact and presenting it in a language appropriate for the user.
This taxonomy of XAI problems and methods is a preliminary result of the XAI project team, based on “A survey of methods for explaining black box models” (ACM CSUR 2018, >7000 citations) and subsequent surveys [39,29,85].
The XAI project (Fig1) has pioneered the challenge of requiring AI to be explainable and is articulated along 5 Research Activities:
RA1) algorithms to infer local explanations and their generalization to global ones (post-hoc) and algorithms that are transparent by-design;
RA2) languages for expressing explanations in terms of logic rules, with statistical and causal interpretation;
RA3) XAI watchdog platform for sharing experimental datasets and explanation algorithms;
RA4) a repertoire of case studies that also involve end users;
RA5) a framework to study the interplay between XAI and ethical and legal dimensions.
The XAI project contributed to establishing XAI requirements for the design of AI-based systems under the AI Act, with a focus on key concepts such as AI Risk and Trustworthy AI [7,45,77,95], resulting from interdisciplinary collaboration with legal and ethical experts.
Rule-Based Factual and Counterfactual Explanations [1,2,9,69,109] Post-hoc, model-agnostic local methods explain black-box decisions by reconstructing the logic behind them. The core intuition is that decision boundaries, while globally complex, are locally simple and can therefore be approximated by interpretable models. LORE (LOcal Rule-based Explainer) (Fig4) pioneered this paradigm (Guidotti et al., arXiv:1805.10820). It generates a local neighborhood around the instance, labels it via the black box, and trains a local decision tree. Crucially, LORE derives a dual explanation: factual (rules explaining "why") and counterfactual (rules explaining "what if"), aligning with cognitive psychology. LORE outperforms competitors thanks to its exploration of the decision boundary (using genetic algorithms) and its robustness (ensuring actionable constraints). Extensions include reasoning (REASONX), merging local explanations into a global consensus (GLocalX), and natural language generation (MAINLE).
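The local-surrogate idea described above (neighborhood generation, black-box labeling, a local decision tree, then factual and counterfactual rules) can be illustrated with a minimal sketch. This is not the LORE implementation: the neighborhood is drawn by plain Gaussian perturbation rather than a genetic algorithm, the tree is a small hand-written one, and `black_box`, `make_neighborhood`, `build_tree`, and `explain` are hypothetical names introduced only for illustration.

```python
import random

def black_box(x):
    # Hypothetical opaque model with a nonlinear decision boundary.
    income, debt = x
    return 1 if income - 0.5 * debt * debt > 2.0 else 0

def make_neighborhood(x, n=200, scale=1.0, seed=0):
    # Assumption: Gaussian perturbations stand in for LORE's genetic search.
    rng = random.Random(seed)
    return [[v + rng.gauss(0.0, scale) for v in x] for _ in range(n)]

def gini(labels):
    p = sum(labels) / len(labels)
    return 2.0 * p * (1.0 - p)

def best_split(points, labels):
    # Exhaustive search for the axis-aligned split with lowest weighted Gini.
    best = None
    for f in range(len(points[0])):
        for t in sorted({p[f] for p in points}):
            left = [y for p, y in zip(points, labels) if p[f] <= t]
            right = [y for p, y in zip(points, labels) if p[f] > t]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
            if best is None or score < best[0]:
                best = (score, f, t)
    return best

def build_tree(points, labels, depth=2):
    majority = 1 if 2 * sum(labels) >= len(labels) else 0
    split = best_split(points, labels) if depth > 0 and gini(labels) > 0 else None
    if split is None:
        return {"leaf": majority}
    _, f, t = split
    left = [(p, y) for p, y in zip(points, labels) if p[f] <= t]
    right = [(p, y) for p, y in zip(points, labels) if p[f] > t]
    return {"feature": f, "threshold": t,
            "left": build_tree([p for p, _ in left], [y for _, y in left], depth - 1),
            "right": build_tree([p for p, _ in right], [y for _, y in right], depth - 1)}

def paths(tree, premises=()):
    # Every root-to-leaf path is one rule: (list of premises, predicted label).
    if "leaf" in tree:
        return [(list(premises), tree["leaf"])]
    f, t = tree["feature"], tree["threshold"]
    return (paths(tree["left"], premises + ((f, "<=", t),)) +
            paths(tree["right"], premises + ((f, ">", t),)))

def satisfies(x, premise):
    f, op, t = premise
    return x[f] <= t if op == "<=" else x[f] > t

def explain(x, n=200):
    neighborhood = make_neighborhood(x, n)
    labels = [black_box(z) for z in neighborhood]      # label via the black box
    rules = paths(build_tree(neighborhood, labels))    # local surrogate tree
    # Factual rule: the path that the instance itself follows.
    factual = next(r for r in rules if all(satisfies(x, p) for p in r[0]))
    # Counterfactual rule: a path with a different outcome that x violates least.
    others = [r for r in rules if r[1] != factual[1]]
    others.sort(key=lambda r: sum(not satisfies(x, p) for p in r[0]))
    return factual, (others[0] if others else None)
```

Calling `explain([3.0, 1.0])` returns a pair of rules. The premises of the counterfactual rule that the instance does not satisfy indicate which feature changes would flip the surrogate's outcome.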
Explanation by Example(s) [5,6,24,52,57,72,75] Based on Latent LORE (LLORE), this paradigm leverages latent feature spaces learned via autoencoders to handle complex data like images (ABELE) and time series (LAST). A local interpretable model filters plausible factual and counterfactual points in the latent space (enriched with black-box predictions), which are then mapped back to the original space as interpretable exemplars.
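The latent-space pipeline can be sketched on a toy problem. Everything here is an assumption made for illustration: `encode` and `decode` are a hand-written linear stand-in for a trained autoencoder, `black_box` is a hypothetical classifier, and the local interpretable model is reduced to a single threshold on a one-dimensional latent code. It is not the ABELE or LAST implementation.

```python
import random

def encode(x):
    # Hypothetical stand-in for a trained encoder: 2-D input -> 1-D latent code.
    return (x[0] + 2.0 * x[1]) / 5.0

def decode(z):
    # Hypothetical stand-in for the decoder: latent code -> point on x1 = 2 * x0.
    return [z, 2.0 * z]

def black_box(x):
    # Hypothetical opaque classifier.
    return 1 if x[0] + x[1] > 3.0 else 0

def fit_stump(codes, labels):
    # Local interpretable model: the latent threshold (and orientation) that
    # agrees most often with the black-box labels.
    best = None
    for t in sorted(codes):
        for above in (0, 1):
            hits = sum((above if z > t else 1 - above) == y
                       for z, y in zip(codes, labels))
            if best is None or hits > best[0]:
                best = (hits, t, above)
    _, t, above = best
    return lambda z: above if z > t else 1 - above

def exemplars(x, n=200, scale=0.5, k=3, seed=0):
    rng = random.Random(seed)
    z0 = encode(x)
    # Sample a neighborhood in latent space and label it through the black box.
    codes = [z0 + rng.gauss(0.0, scale) for _ in range(n)]
    labels = [black_box(decode(z)) for z in codes]
    stump = fit_stump(codes, labels)
    outcome = stump(z0)
    # Keep only latent points on which the surrogate and the black box agree,
    # ordered by distance from the instance's latent code.
    kept = [(z, y) for z, y in zip(codes, labels) if stump(z) == y]
    kept.sort(key=lambda zy: abs(zy[0] - z0))
    # Map the closest points back to the original space.
    same = [decode(z) for z, y in kept if y == outcome][:k]
    other = [decode(z) for z, y in kept if y != outcome][:k]
    return same, other
```

`exemplars([1.2, 2.4])` returns two lists of decoded points: exemplars that share the instance's outcome and counter-exemplars with the opposite outcome. With real data, the decoder would turn these into images or time series that a user can inspect.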
Domain-Informed Explanation DoctorXAI [4,18,19,22,43], presented at ACM FAT 2020, pioneered ontology-based explanations for sequential data. It adapts local explanations to the medical domain using specific ontologies and Health Records (sequences of events). The medical ontology graph helps generate synthetic neighborhoods and meaningful explanations, addressing group unfairness and supporting trust measurement in user studies.
Post-Hoc Global Explanations [59,120] The Interpretable Latent Space method defines a linear encoding of features by learning a latent space on black-box labeled data. Originating from Bodria’s PhD thesis, this led to ILLUME, a global-to-local approach that sets the stage for a new ML design pipeline merging global meta-explainers with local instantiation.
(RA3, RA4): The XAI Library (github.com/kdd-lab/XAI-Lib) powers the Watchdog platform, providing a benchmarking workspace for algorithms and quantitative evaluation measures. In RA4, we conducted qualitative validation with >200 health professionals using the Judge-Advisor System (JAS) to evaluate trust. This work received an Honorable Mention at ACM-CHI22 [18].
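In Judge-Advisor System studies, a participant typically gives an initial estimate, sees the advisor's suggestion, and then gives a final estimate. A commonly used measure of reliance is the weight of advice. The report does not state which measure was computed in the study, so the sketch below is only an assumed illustration, and the function names and trial values are hypothetical.

```python
def weight_of_advice(initial, advice, final):
    # WOA = (final - initial) / (advice - initial).
    # 0 means the advice was ignored, 1 means it was fully adopted.
    # Undefined when the advice equals the initial estimate.
    if advice == initial:
        return None
    return (final - initial) / (advice - initial)

def mean_weight_of_advice(trials):
    # Average over the trials where the measure is defined.
    values = [weight_of_advice(*t) for t in trials]
    values = [v for v in values if v is not None]
    return sum(values) / len(values) if values else None

# Hypothetical trials: (initial estimate, AI advice, final estimate).
trials = [(40, 80, 70), (20, 60, 20), (50, 50, 50)]
average = mean_weight_of_advice(trials)
```

Comparing this average between conditions (for example, advice shown with and without an explanation) is one way such a study could quantify the effect of explanations on trust.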
RA5: The interplay between privacy and fairness, the privacy risks of explainers, and broader trustworthiness issues were explored intensively. This work examined the practical implications of the European ethical and legal guidelines [45,32,95,92,117] and produced new algorithms for auditing, assessing, and balancing the advantages of explainability against various risks [19,56,45,77,114,110].
The project had wide international impact, reaching a scientific audience of >30,000 people and producing ~150 publications (72 Open Access).
We organized 26 workshops, 3 conferences, and 7 tutorials. PIs delivered 42+ keynotes, and team members presented at 130+ venues.
Highlights include the "XAI Distinguished Seminars" (2021), a 15-team Hackathon (2024), and invitations to ESOF 2024 (Katowice) and TEDx (2023). Fourteen PhD students wrote their theses on XAI topics, and the PI launched an XAI course at SNS.
The three strengths that make LORE a great advance over the state of the art are: i) the genetic algorithm for neighborhood generation, ii) the construction of factual and counterfactual logical rules, and iii) the combination with latent space.
These elements are like Lego pieces that allow adaptability to different forms of data and provide the basis for stability, robustness, and actionability, enabling reasoning, conversation, and explaining uncertainty. The last step is ILLUME, which marks a paradigm shift in post-hoc explainability by deploying a 'meta-explainer' that generates a global surrogate to approximate black-box behavior. Uniquely, this method bridges the gap between global and local explainability, providing a way to rethink Post-hoc Explanation (Fig5).