The work conducted during the project can be organized into three major research areas:
Deep learning faults and mutation: to investigate the specific nature of the faults that affect deep neural networks, we performed a qualitative analysis of software forum discussions and code commit messages, and we conducted semi-structured interviews. We organized the acquired knowledge into a taxonomy of real deep learning faults (ICSE 2020). We then developed a deep learning mutation tool, called DeepCrime (ISSTA 2021), which injects artificial faults into a deep learning component, mimicking the real faults described in our taxonomy. By simulating the occurrence of real faults in a deep learning component, we can assess the thoroughness of testing: new test cases should be created until every artificially injected fault is exposed by at least one test case. The test set augmented in response to the faults injected by DeepCrime gives greater confidence in the robustness of the system under test.
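The adequacy criterion described above (every injected fault exposed by at least one test case) can be illustrated with a minimal sketch. It does not reproduce DeepCrime's mutation operators or its killing criterion. The toy classifier, the two mutants, and the names `kills` and `mutation_score` are all hypothetical and chosen only for illustration.

```python
from typing import Callable, Dict, List, Tuple

Model = Callable[[float], int]   # hypothetical: maps an input to a predicted label
TestCase = Tuple[float, int]     # (input, expected label)


def kills(mutant: Model, test: TestCase) -> bool:
    """A test exposes (kills) a mutant if the mutant's output differs from the expected label."""
    x, expected = test
    return mutant(x) != expected


def mutation_score(mutants: Dict[str, Model], tests: List[TestCase]) -> Tuple[float, List[str]]:
    """Return the fraction of mutants killed by at least one test, plus the surviving mutants."""
    survivors = [name for name, m in mutants.items()
                 if not any(kills(m, t) for t in tests)]
    return (len(mutants) - len(survivors)) / len(mutants), survivors


# Toy stand-in for the original component: label 1 if x >= 0.5, else 0.
def original(x: float) -> int:
    return 1 if x >= 0.5 else 0


# Two artificial faults (assumed for illustration, not taken from the taxonomy).
mutants: Dict[str, Model] = {
    "shifted_threshold": lambda x: 1 if x >= 0.7 else 0,
    "inverted_output": lambda x: 0 if x >= 0.5 else 1,
}

# Initial test set: expected labels come from the original component.
tests = [(x, original(x)) for x in (0.9, 0.1)]
score, survivors = mutation_score(mutants, tests)

# A surviving mutant signals a gap in the test set; add an input that exposes it.
augmented = tests + [(0.6, original(0.6))]
augmented_score, augmented_survivors = mutation_score(mutants, augmented)
```

In this toy setting the initial test set leaves the `shifted_threshold` mutant alive, and adding a single input between the two thresholds is enough to expose it.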
Test scenario generation: deep learning components produce unreliable outputs when operating on inputs that deviate from those used to train them. For instance, a self-driving car trained to operate exclusively in sunny conditions might misbehave when it is raining or snowing. Our approach to assessing the reliability of a deep learning component automatically finds its frontier of behaviors, which consists of pairs of nearby conditions such that the component behaves properly in one and fails in the other. In the self-driving car example, this could be the transition from sunny to rainy conditions. Our approach is implemented in a tool, called DeepJanus (FSE 2020), that automatically finds and reports the frontier of behaviors. To support explainability and debugging of the failure scenarios, we developed an approach, called DeepHyperion (ISSTA 2021; TOSEM 2023), which characterizes the failure conditions in the form of a feature map. Developers can use DeepHyperion's feature map to understand the precise combination of features that leads to a misbehavior (e.g., a specific luminosity condition paired with a specific road shape might occasionally result in out-of-bounds episodes of a self-driving car).
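The notion of a frontier pair can be sketched in a deliberately simplified setting with a single scenario parameter. This is not DeepJanus's search algorithm: plain bisection is swapped in here because the toy condition is one-dimensional. The function `lane_keeping_ok`, the `rain_intensity` parameter, and the 0.37 failure point are hypothetical stand-ins for running the component in a simulator.

```python
from typing import Callable, Tuple


def find_frontier_pair(behaves_properly: Callable[[float], bool],
                       passing: float, failing: float,
                       max_distance: float = 0.01) -> Tuple[float, float]:
    """Narrow a (passing, failing) pair until the two conditions are at most max_distance apart."""
    if not behaves_properly(passing) or behaves_properly(failing):
        raise ValueError("need one passing and one failing starting condition")
    while abs(failing - passing) > max_distance:
        mid = (passing + failing) / 2
        if behaves_properly(mid):
            passing = mid   # the midpoint still works: move the passing end closer
        else:
            failing = mid   # the midpoint fails: move the failing end closer
    return passing, failing


# Hypothetical oracle: the car stays in lane only while rain intensity is below 0.37.
def lane_keeping_ok(rain_intensity: float) -> bool:
    return rain_intensity < 0.37


# 0.0 stands for sunny conditions, 1.0 for the heaviest rain.
ok_condition, failing_condition = find_frontier_pair(lane_keeping_ok, passing=0.0, failing=1.0)
```

The returned pair is one point on the frontier: two conditions no more than `max_distance` apart, of which the first is handled properly and the second is not. With realistic, high-dimensional scenarios a search-based exploration is needed instead of bisection.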
Misbehavior prediction: this line of research realizes Precrime’s vision of a self-oracle that can determine whether a system based on artificial intelligence components is facing unexpected execution conditions under which it should be safely disengaged to avoid damage. For instance, a self-driving car facing an unexpected driving scenario should activate a safe disengagement procedure. We took both a black-box and a white-box approach to the self-oracle problem. In the black-box approach, we consider only the input to the deep learning component and assess its proximity to the training data by means of autoencoders. When the input deviates from the training data, the self-oracle activates safe disengagement. This approach was presented at ICSE 2020. In the white-box approach, the internals of the deep learning component are inspected to obtain measurements of uncertainty. When the component is highly uncertain about its output, a safe disengagement procedure is again activated. The white-box techniques to measure uncertainty are implemented in UncertaintyWizard, a tool that we presented at ICST 2021.
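Both self-oracle variants reduce to comparing a score against a threshold, as the following minimal sketch suggests. It uses neither a real autoencoder nor the UncertaintyWizard API. `toy_reconstruct` is a hypothetical stand-in for a trained autoencoder, predictive entropy is just one possible uncertainty measure, and both thresholds are illustrative assumptions.

```python
import math
from typing import Callable, List, Sequence

Vector = Sequence[float]


def reconstruction_error(x: Vector, reconstruct: Callable[[Vector], Vector]) -> float:
    """Black-box score: mean squared error between an input and its reconstruction."""
    r = reconstruct(x)
    return sum((a - b) ** 2 for a, b in zip(x, r)) / len(x)


def predictive_entropy(sampled_probs: List[Vector]) -> float:
    """White-box score: entropy (in nats) of the class distribution averaged over several stochastic predictions."""
    n = len(sampled_probs)
    mean = [sum(p[i] for p in sampled_probs) / n for i in range(len(sampled_probs[0]))]
    return -sum(p * math.log(p) for p in mean if p > 0)


def should_disengage(score: float, threshold: float) -> bool:
    """The self-oracle triggers safe disengagement when the score exceeds the threshold."""
    return score > threshold


# Hypothetical stand-in for a trained autoencoder: it only reproduces values in [0.4, 0.6].
def toy_reconstruct(x: Vector) -> List[float]:
    return [min(max(v, 0.4), 0.6) for v in x]


# Black-box: calibrate the threshold on nominal inputs (the margin of 0.01 is an assumption).
nominal_inputs = [[0.45, 0.5, 0.55], [0.4, 0.6, 0.5], [0.5, 0.5, 0.5]]
error_threshold = max(reconstruction_error(x, toy_reconstruct) for x in nominal_inputs) + 0.01
unexpected_error = reconstruction_error([0.9, 0.1, 0.95], toy_reconstruct)

# White-box: made-up softmax outputs from repeated stochastic forward passes.
confident = predictive_entropy([[0.98, 0.01, 0.01], [0.97, 0.02, 0.01]])
uncertain = predictive_entropy([[0.4, 0.35, 0.25], [0.3, 0.3, 0.4]])
entropy_threshold = 0.5  # assumed value, for illustration only
```

In practice the thresholds would be calibrated on nominal data so that the rate of false alarms stays acceptable. The fixed values above only serve to make the example concrete.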