CORDIS - EU research results

DEveloper COmpanion for Documented and annotatEd code Reference

Periodic Reporting for period 2 - DECODER (DEveloper COmpanion for Documented and annotatEd code Reference)

Reporting period: 2020-01-01 to 2021-12-31

The DECODER (DEveloper COmpanion for Documented and annotatEd code Reference) project aims at providing a unified framework for gathering a comprehensive knowledge base about a given software project and offering a wide variety of requests over such information. This knowledge base is known as the Persistent Knowledge Monitor (PKM).
The needs of stakeholders vary significantly across the software development life-cycle, from modelling to implementation, validation and maintenance. Moreover, information is typically scattered across many documents, many of them written in natural language, which makes it difficult to process and to relate to the relevant parts of the code. This lack of traceability is a major impediment to software development, review and maintenance. For instance, it is sometimes difficult to assess whether a function fulfills its intended behavior when that behavior is described in ambiguous or incomplete English text. On the other hand, manually writing a fully formal specification is extremely tedious, especially for large software. Similarly, when maintaining an unfamiliar piece of software, it can be hard to tell whether a comment adequately represents what the code actually does. Also at the maintenance level, deciding from the mere description of a vulnerability discovered in a third-party dependency whether it affects the product, and if so how critically, is usually not an obvious task. To improve this situation, DECODER proposes to use Natural Language Processing (NLP) techniques to extract information from informal project-related documents and to find correspondences with the code artifacts and/or the associated formal specifications.
In addition, a semi-formal description language (ASFM, standing for Abstract Semi-Formal Model) is designed, together with an abstract graphical tool for animating models. ASFM models are generated either from informal documents or from the code. The user can then refine the models interactively. Once satisfied with the refinements, the user can derive a fully formal specification and use formal methods to verify the software, or generate test scenarios that cover all possible behaviors exhibited by the model.
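To make the idea of deriving test scenarios from a model concrete, here is a minimal sketch that enumerates every event sequence of a small transition model. It assumes the model is a finite, acyclic state graph; the state names, event names and the `enumerate_scenarios` helper are hypothetical and do not reflect the ASFM format or the DECODER tools.

```python
from __future__ import annotations


def enumerate_scenarios(
    transitions: dict[str, list[tuple[str, str]]],
    start: str,
    finals: set[str],
) -> list[list[str]]:
    """List every event sequence leading from start to a final state.

    Assumes the transition graph is acyclic; a cyclic model would need a
    depth bound or a coverage criterion instead of full enumeration.
    """
    scenarios: list[list[str]] = []

    def walk(state: str, path: list[str]) -> None:
        # Record the path each time a final state is reached.
        if state in finals:
            scenarios.append(path)
        # Explore every outgoing transition of the current state.
        for event, target in transitions.get(state, []):
            walk(target, path + [event])

    walk(start, [])
    return scenarios


# Hypothetical model of a file handle: open, optionally read, then close.
model = {
    "idle": [("open", "opened")],
    "opened": [("read", "was_read"), ("close", "closed")],
    "was_read": [("close", "closed")],
}
scenarios = enumerate_scenarios(model, "idle", {"closed"})
```

Each returned list is one candidate test scenario; together they cover every behavior this toy model can exhibit.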
Finally, the PKM and the associated tools developed within DECODER are assessed on several real-world use-cases in order to ensure that they are up to the task. Feedback from the use-case partners is taken into account throughout the project to improve the tools.
During the course of the project, DECODER explored several directions. First and foremost, all partners were involved in the definition of the PKM meta-model, in order to gain a comprehensive view of the user roles interacting with the PKM during the software development life-cycle, of the types of artifacts that might be stored in the knowledge base, and of the kinds of requests the PKM should be able to answer. In parallel, DECODER investigated methodological solutions for taking advantage of the PKM server in existing software development processes, such as Waterfall, V-cycle, or agile methods.
Based on the meta-model, the PKM server and its API were designed and implemented on top of MongoDB, a document-oriented database. The preferred way of interacting with the database is through a RESTful server, which exposes a set of requests specified in the OpenAPI format.
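As an illustration of what document-oriented storage means here, the following sketch keeps artifacts as free-form records grouped into collections and retrieves them by field values. It is a toy in-memory stand-in, not MongoDB and not the PKM API: the `InMemoryDocumentStore` class, the `artifacts` collection and all field names are assumptions made for the example.

```python
from __future__ import annotations

from typing import Any


class InMemoryDocumentStore:
    """Toy stand-in for a document-oriented database (not MongoDB, not the PKM)."""

    def __init__(self) -> None:
        # Maps a collection name to the list of documents it holds.
        self._collections: dict[str, list[dict[str, Any]]] = {}

    def insert(self, collection: str, document: dict[str, Any]) -> None:
        # Store a copy so later changes by the caller do not alter stored data.
        self._collections.setdefault(collection, []).append(dict(document))

    def find(self, collection: str, **criteria: Any) -> list[dict[str, Any]]:
        # Return every document whose fields equal all the given criteria.
        return [
            doc
            for doc in self._collections.get(collection, [])
            if all(doc.get(key) == value for key, value in criteria.items())
        ]


# Hypothetical artifacts: documents in one collection need not share a shape.
store = InMemoryDocumentStore()
store.insert("artifacts", {"kind": "source", "path": "src/parser.c", "language": "C"})
store.insert("artifacts", {"kind": "document", "path": "docs/requirements.txt"})
store.insert("artifacts", {"kind": "source", "path": "src/Main.java", "language": "Java"})

c_sources = store.find("artifacts", kind="source", language="C")
```

The point of the design is that source files, documents and analysis results can sit side by side without a fixed table layout, while still being queryable by whatever fields they carry.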
A prototype of the graphical user interface of the PKM client was designed and implemented, acting as the main user entry point to the PKM. Moreover, an orchestrator was developed that makes the relevant calls to the PKM server and to the individual tools, so that the most standard usages of the platform require little manual coordination from the user. To facilitate its insertion into existing software development chains, the PKM server can be synchronized with a git server, ensuring that the code artifacts stored in the PKM are always up to date.
In another part of the project, a first version of the ASFM language was designed. ASFM takes the PKM meta-model as reference, and is defined as a JSON schema that closely follows the PKM API, to ease the task of generating ASFM models from requests to the PKM server. ASFM schemas can be extracted from standard .doc or .pdf documents, and are the basis of the FormalDebug tool, which represents the data structures of the program graphically and can be manipulated interactively to better understand the invariants in the code.
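The role a schema plays for such models can be sketched as follows: a model is a JSON document, and the schema states which fields it must contain and of which type. The schema fragment, its field names and the `validation_errors` helper are hypothetical; the actual ASFM schema is not reproduced here, and a real setup would more likely rely on a full JSON Schema validator.

```python
import json

# Hypothetical schema fragment: required field names mapped to expected types.
# These names are illustrative assumptions, not the actual ASFM definition.
TOY_SCHEMA = {
    "required": {"name": str, "kind": str, "fields": list},
}


def validation_errors(model: dict, schema: dict) -> list:
    """Return human-readable problems; an empty list means the model conforms."""
    errors = []
    for key, expected_type in schema["required"].items():
        if key not in model:
            errors.append(f"missing field: {key}")
        elif not isinstance(model[key], expected_type):
            errors.append(f"wrong type for {key}: expected {expected_type.__name__}")
    return errors


# A conforming model and a deliberately broken one, both parsed from JSON text.
complete = json.loads('{"name": "Buffer", "kind": "struct", "fields": ["data", "size"]}')
incomplete = json.loads('{"name": "Buffer", "fields": "data"}')

complete_errors = validation_errors(complete, TOY_SCHEMA)
incomplete_errors = validation_errors(incomplete, TOY_SCHEMA)
```

Here `complete_errors` is empty, while `incomplete_errors` reports the missing `kind` field and the wrongly typed `fields` entry.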
Another work stream concerned NLP and the connections between code and informal documents. Code summarization tools were applied to track potentially misused identifiers and suggest a fix. The identification of correspondences between documentation and code using machine translation techniques also proceeded well. All NLP tools were integrated within the PKM. It is now possible to visualize the traceability matrix between high-level requirements (in English) and the relevant code. Finally, some work was devoted to semantically analyzing CVE messages and highlighting their main characteristics, in order to help users assess whether their code is impacted by a given vulnerability.
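A traceability matrix of this kind can be pictured as a table of scores, with one row per requirement and one column per code element. The sketch below fills such a table using plain word overlap (Jaccard similarity) between requirement text and function names. This is a deliberately simple substitute for the NLP and machine translation techniques used in the project, and the requirement identifiers, function names and helpers are all hypothetical.

```python
from __future__ import annotations

import re


def tokens(text: str) -> set[str]:
    # Insert a space at camelCase boundaries, then collect lowercase words.
    spaced = re.sub(r"([a-z])([A-Z])", r"\1 \2", text)
    return {word.lower() for word in re.findall(r"[A-Za-z]+", spaced)}


def jaccard(a: set[str], b: set[str]) -> float:
    # Shared words divided by all distinct words; 0.0 if either side is empty.
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def traceability_matrix(
    requirements: dict[str, str], functions: list[str]
) -> dict[str, dict[str, float]]:
    # One row per requirement, one scored column per function name.
    return {
        req_id: {fn: jaccard(tokens(text), tokens(fn)) for fn in functions}
        for req_id, text in requirements.items()
    }


# Hypothetical requirements and function names.
requirements = {
    "REQ-1": "The parser shall reject empty input",
    "REQ-2": "The user password shall be hashed",
}
functions = ["reject_empty_input", "hashUserPassword"]

matrix = traceability_matrix(requirements, functions)
best_match = {req: max(row, key=row.get) for req, row in matrix.items()}
```

Word overlap misses synonyms and inflections (for example "hashed" versus "hash"), which is precisely the kind of gap that learned NLP models are meant to close.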
Finally, the selection of use-cases was refined, ensuring that they are representative of a large class of applications and that they can lead to a wide spectrum of activities involving the PKM and the other DECODER tools. All use-cases were analyzed with the tools developed within the project and the results are stored in the PKM server.
The project was presented at several venues, in particular open-source conferences, where it sparked the interest of many developers. DECODER is thus in line with its ambition of delivering tools and methodologies for radically improving software development processes, thereby making it much easier to obtain robust and secure software. The PKM and a core set of tools have become mature enough to be offered outside of DECODER to early adopters. This was made possible by two beta-testing campaigns under the auspices of the ReachOut project, as well as by evaluations done with students and researchers outside of DECODER. CEA’s membership in the OW2 community ensures that the PKM and related tooling, which are already available under an open-source licence, will be maintained in the longer run.
To conclude, DECODER’s results can be extended in various ways, which could be the subject of future collaborative projects. These include a tighter integration with git servers to let individual tools exploit the correspondences between successive versions of the code base, enhancing the results of the NLP tools when generating formal specifications, and extending ASFM to facilitate formal reasoning with graphical representations of structures.
[Figure: pkm-ecosystem.png]