DEveloper COmpanion for Documented and annotatEd code Reference

Periodic Reporting for period 1 - DECODER (DEveloper COmpanion for Documented and annotatEd code Reference)

Reporting period: 2019-01-01 to 2019-12-31

The DECODER (DEveloper COmpanion for Documented and annotatEd code Reference) project aims at providing a unified framework for gathering a comprehensive knowledge base about a given software project and supporting a wide variety of requests over that information. Within DECODER, this knowledge base is known as the Persistent Knowledge Monitor (PKM).
Indeed, the needs of stakeholders vary significantly across the software development life-cycle, from requirements engineering and modelling through implementation and validation to maintenance. Moreover, information is typically scattered across many documents and often consists largely of natural language text, making it difficult to process and to relate to the relevant parts of the code itself. This lack of traceability is a major impediment to software development, review and maintenance. For instance, it is sometimes difficult to assess whether a function fulfills its intended behavior when that behavior is described in possibly ambiguous and/or incomplete English text. On the other hand, manually writing a fully formal specification can become extremely tedious, especially for large software, and is usually restricted to extremely critical code. Similarly, when facing a maintenance task on an unfamiliar piece of software, it can be complicated to understand whether a comment adequately represents what the actual code (which might have evolved independently from the comment) is doing. Still at the maintenance level, given only the description of a vulnerability discovered in a third-party dependency of the project at hand, deciding whether this vulnerability might have an impact on the product and, if so, how critical it is, is usually not an obvious task. In order to improve this situation, DECODER proposes to use Natural Language Processing (NLP) techniques to extract information from informal project-related documents and to find correspondences with the code artifacts and/or the associated formal specifications.
In addition, a semi-formal description language (ASFM, standing for Abstract Semi-Formal Model) will be designed, together with an abstract graphical tool for animating ASFM models of data structures to let users experiment with them at a more abstract level. On the one hand, ASFM models will be partially generated from informal requirements to provide a more precise description of the intended behavior of the system. On the other hand, other ASFM models will stem from the code itself, with the animation tool acting as a kind of abstract debugger showing how the implementation reacts to various sequences of events and data. In both cases, the user will be able to refine the model generated by NLP tools. Once they are satisfied with the ASFM model they obtained, they will then be able to derive a fully formal specification and use formal methods for verifying the software, or to generate test scenarios that cover all possible behaviors exhibited by the model.
Finally, the PKM and the associated tools developed within DECODER will be assessed on several real-world use-cases in order to ensure that they are up to the task.
Feedback from the use-case partners will be taken into account throughout the project to improve the tools.
During the first year of the project, DECODER explored several directions. First and foremost, all partners were involved in the definition of the PKM meta-model, in order to gain a comprehensive view of the various roles that users interacting with the PKM may play during the software development lifecycle, of the types of artifacts that might be stored in the knowledge base, and of the kinds of requests the PKM should be able to answer.
This resulted in a UML diagram defining these various entities and their relationships. In parallel to this work, and closely related to it, DECODER has investigated methodological solutions for taking advantage of the PKM server in existing software development processes, such as Waterfall, V-cycle, or agile methods.
The PKM meta-model was then used to drive the design of the PKM server, which is based on a document-oriented database (currently MongoDB) and uses JSON objects for handling requests and data. The request API is under active development, and a first prototype is scheduled for release in 2020.
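To make the idea of JSON requests over a document store more concrete, here is a minimal sketch in Python. It is not the PKM API: the field names (`type`, `path`, `functions`), the `filter` request key and the helper names are all hypothetical, and an in-memory list stands in for the database. It only illustrates how an equality filter expressed as a JSON object can select documents.

```python
import json

# Hypothetical documents: these field names are illustrative assumptions,
# not the actual PKM meta-model.
ARTIFACTS = [
    {"type": "source_code", "path": "src/buffer.c", "functions": ["buf_init", "buf_push"]},
    {"type": "requirement", "path": "docs/requirements.md",
     "text": "The buffer shall reject writes when full."},
    {"type": "source_code", "path": "src/main.c", "functions": ["main"]},
]


def matches(document, query):
    """Return True if every key/value pair of the query equals the document's value."""
    return all(document.get(key) == value for key, value in query.items())


def handle_request(request_json, store):
    """Decode a JSON request, filter the store, and return a JSON response."""
    request = json.loads(request_json)
    hits = [doc for doc in store if matches(doc, request["filter"])]
    return json.dumps({"count": len(hits), "results": hits})


# Ask for all source code artifacts (hypothetical request shape).
response = json.loads(handle_request('{"filter": {"type": "source_code"}}', ARTIFACTS))
print(response["count"], [doc["path"] for doc in response["results"]])
```

In a document-oriented database such as MongoDB, a comparable equality filter would be handed to the collection's find operation rather than evaluated in application code.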
In parallel, work has taken place on the design of the graphical user interface for the PKM client, with User eXperience (UX) workshops dedicated to identifying the main needs of the various kinds of PKM users.
On another aspect of the project, a first version of the ASFM language has been designed. ASFM takes the PKM meta-model as reference, and is defined as a JSON schema that closely follows the PKM API to ease the task of generating ASFM models from requests to the PKM server. Tools for interacting with ASFM models are currently under development.
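As an illustration of what defining a model language as a JSON schema can look like, the sketch below checks a candidate model against a small schema. The actual ASFM schema is not given in this report, so the schema, the property names (`name`, `fields`) and the `validate` helper are hypothetical. The checker also handles only a tiny subset of JSON Schema (`type`, `required`, `properties`); a real tool would more likely rely on a full validator.

```python
# Hypothetical schema fragment: NOT the real ASFM schema.
ASFM_SCHEMA = {
    "type": "object",
    "required": ["name", "fields"],
    "properties": {
        "name": {"type": "string"},
        "fields": {"type": "array"},
    },
}

# Mapping from the JSON Schema type names used here to Python types.
JSON_TYPES = {"object": dict, "array": list, "string": str}


def validate(instance, schema):
    """Check 'type', 'required' and 'properties' only; return a list of error messages."""
    expected = JSON_TYPES.get(schema.get("type"))
    if expected is not None and not isinstance(instance, expected):
        return [f"expected {schema['type']}"]
    errors = []
    if isinstance(instance, dict):
        for key in schema.get("required", []):
            if key not in instance:
                errors.append(f"missing required property '{key}'")
        for key, subschema in schema.get("properties", {}).items():
            if key in instance:
                errors.extend(f"{key}: {msg}" for msg in validate(instance[key], subschema))
    return errors


# A well-formed hypothetical model and two malformed ones.
good_model = {"name": "RingBuffer", "fields": ["head", "tail", "capacity"]}
no_fields = {"name": "RingBuffer"}
bad_name = {"name": 42, "fields": []}

print(validate(good_model, ASFM_SCHEMA))
print(validate(no_fields, ASFM_SCHEMA))
print(validate(bad_name, ASFM_SCHEMA))
```

Keeping the model format close to the structure of the server's responses, as the text describes, means a model can be assembled from request results and checked against the schema before being handed to other tools.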
Another avenue of work concerned natural language processing and the connections between code and informal documents. For both directions, training sets have been collected, and the tools are currently being exercised on them with encouraging results.
Finally, the selection of use-cases has been refined, ensuring that they are representative of a large class of applications and that they can lead to a wide spectrum of activities involving the PKM and the other DECODER tools. First analyses with individual tools have taken place.
While it is still a bit early to assess the impact of DECODER, as the core tools are not yet available outside the DECODER team, the project has been presented at several venues, in particular open-source conferences, where it has sparked the interest of many developers. DECODER is thus in line with its ambition of delivering tools and methodologies for radically improving software development processes, thereby making it much easier to obtain robust and secure software.
In addition, common interests have been identified with several H2020 projects, in particular OpenReq, which focuses on requirements engineering, FASTEN, which deals with fine-grained analysis of dependencies between software packages, and Cross-Miner, which aims at helping develop software systems based on existing open-source components and at selecting the most appropriate component for a given purpose. In the next phase of the project, DECODER will coordinate with these projects to maximize the impact of the results for software development and (open-source) software quality.
Persistent Knowledge Monitor
Encompassing the whole development lifecycle