CORDIS - EU research results

An Artificial Assistant for Software Developers

Periodic Reporting for period 2 - DEVINTA (An Artificial Assistant for Software Developers)

Reporting period: 2021-08-01 to 2023-01-31

Software permeates every aspect of our lives, including transportation, communication, the economy, and healthcare. As software becomes more central to our lives, software systems are also becoming increasingly complex, with millions of components interacting to provide users with the expected features. Software complexity alone already poses several non-trivial challenges for developers, and these are exacerbated by other contextual factors typical of the software industry, such as the short lifespan of the programming languages and frameworks used to create software. On top of that, the time to release software must be minimized without sacrificing the quality of the released product. This boils down to the need to increase software developers’ productivity while maximizing the quality of the code they write.

The DEVINTA project aims to introduce models and techniques serving as the basis for the next generation of recommender systems supporting software developers. These recommenders are expected to support developers in comprehending unfamiliar code and in writing high-quality code faster, thus reducing the considerable costs of developing and maintaining complex software. In particular, DEVINTA aims to support developers in different phases of the software lifecycle, tackling three main challenges:

1. Support developers in program comprehension activities by translating a given piece of code into natural language text.

2. While the developer is implementing software, predict the feature they are working on and suggest how to complete it automatically.

3. Provide support for online code review, meaning the ability to review in real time the code written by the developer, looking for possible bugs/suboptimal implementation choices.

*Automating the implementation of source code*
We exploited deep learning (DL) models to automatically recommend to developers how to complete an ongoing implementation task. DL models can be trained to "learn" how to deal with a specific task by looking at concrete examples (i.e. a training set). We provided the DL model with millions of examples of source code written by developers. We showed that DL models can correctly guess the next few code tokens the developer is likely to write in ~70% of cases. When the prediction task becomes more complex (i.e. predicting dozens of tokens), performance drops to ~30%, with the DL model still being able to generate quite complex code snippets. This work has been presented at the MSR'21 conference and published in the TSE journal.
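
To make the kind of measurement behind figures such as "~70% of cases" more concrete, the sketch below hides the last `k` tokens of each code sample and counts how often a predictor reproduces them exactly. This is only an illustration under stated assumptions: the function names, the token lists, and the toy predictor `always_close_call` are hypothetical and are not the project's models or evaluation pipeline.

```python
from typing import Callable, List, Sequence


def perfect_prediction_rate(
    samples: Sequence[Sequence[str]],
    predict: Callable[[List[str], int], List[str]],
    k: int,
) -> float:
    """Fraction of samples whose last k tokens are predicted exactly.

    `predict` is any callable taking (context_tokens, k) and returning
    a list of predicted tokens; here it stands in for a trained DL model.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    hits = 0
    total = 0
    for tokens in samples:
        if len(tokens) <= k:
            continue  # too short: no context would be left after hiding k tokens
        context = list(tokens[:-k])
        expected = list(tokens[-k:])
        if list(predict(context, k)) == expected:
            hits += 1
        total += 1
    return hits / total if total else 0.0


def always_close_call(context: List[str], k: int) -> List[str]:
    """Hypothetical toy predictor: always guesses the tail of ') ;'."""
    return [")", ";"][-k:]
```

Running the same function with a larger `k` mirrors the harder setting described above, where more tokens must be guessed at once and exact matches become rarer.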

Given the positive results we achieved, we investigated the extent to which DL models tend to copy code from their training set when recommending code. This research question is particularly important because most DL-based code recommenders have been trained on the source code of open source repositories, and it is unclear whether the code they generate should be considered new or derivative work, with possible implications for license infringement. We showed that, depending on the size of the predicted code, ~10% to ~0.1% of the predictions generated by DL-based code recommenders are exact copies of instances in the training set, with long predictions being unlikely to be copied. This work has been presented at the MSR'22 conference.
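
As a rough illustration of what counting "exact copies" can look like, the following sketch compares each prediction against a set of training snippets after collapsing whitespace. It is a simplified assumption of ours, not the procedure used in the study: the helper names are hypothetical, and a real analysis would likely rely on a proper clone detector rather than string equality.

```python
from typing import Iterable, Sequence


def normalize(code: str) -> str:
    """Collapse runs of whitespace so layout differences do not hide a copy."""
    return " ".join(code.split())


def exact_copy_rate(predictions: Sequence[str], training_snippets: Iterable[str]) -> float:
    """Fraction of predictions that match some training snippet exactly
    (after whitespace normalization)."""
    if not predictions:
        return 0.0
    train = {normalize(snippet) for snippet in training_snippets}
    copies = sum(1 for pred in predictions if normalize(pred) in train)
    return copies / len(predictions)
```

Grouping the predictions by length before calling `exact_copy_rate` would give one rate per size bucket, which is the kind of breakdown the range reported above suggests.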

*Automating code-related tasks*
There are several tasks revolving around software writing. In the DEVINTA project we target their (partial) automation with the goal of saving software developers time. We are focusing on the usage of large pre-trained DL models. To better understand the idea behind these models, let's assume we are interested in training a DL model able to translate from Italian to English. Training would usually require providing the model with several examples of Italian sentences translated to English. Creating such a training dataset requires manual effort. For this reason, these datasets are usually limited in size, with consequences for the model's performance. The idea behind pre-training is to first "teach" the model basic features of the languages of interest without the need for a manually built dataset. For example, the model is given sentences in the language of interest (e.g. Italian sentences) with specific words masked, and it is required to guess the masked words. Only after this pre-training is the model "fine-tuned" to learn the specific task of interest (in our example, the translation task). Results from the natural language processing literature showed that the pre-training/fine-tuning procedure substantially boosts the performance of DL models. We showed that this finding also holds when training models for the automation of code-related tasks (e.g. bug-fixing). The work describing these findings has been published at the ICSE'21, ICSME'21, and ICSE'22 conferences and in the TSE journal.
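
The masking step of pre-training described above can be sketched in a few lines. The code below builds one self-supervised training example from a tokenized sentence (or code snippet) by hiding some tokens and recording what the model would have to guess. The `<MASK>` placeholder, the 15% default masking rate, and the function name are assumptions made for illustration; the source does not state which masking scheme was used.

```python
import random
from typing import Dict, List, Optional, Sequence, Tuple

MASK = "<MASK>"  # hypothetical placeholder token


def make_masked_example(
    tokens: Sequence[str],
    mask_rate: float = 0.15,  # assumed rate, not taken from the project
    rng: Optional[random.Random] = None,
) -> Tuple[List[str], Dict[int, str]]:
    """Return (masked_tokens, targets) for one token sequence.

    `targets` maps each masked position to the original token that the
    model would be asked to guess during pre-training.
    """
    rng = rng or random.Random()
    if not tokens:
        return [], {}
    n_mask = max(1, round(len(tokens) * mask_rate))  # mask at least one token
    positions = sorted(rng.sample(range(len(tokens)), n_mask))
    masked = list(tokens)
    targets: Dict[int, str] = {}
    for pos in positions:
        targets[pos] = masked[pos]
        masked[pos] = MASK
    return masked, targets
```

No manual labeling is needed to produce such examples, since the targets come from the input itself, which is why pre-training data can be much larger than a hand-built fine-tuning dataset.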

*Online code review*
Code review is the process of analyzing source code written by a teammate to judge whether it is of sufficient quality to be integrated into the software project. Recent studies provided evidence of the benefits of code review, which, however, do not come for free. Our long-term goal is to reduce the cost of code review by (partially) automating this time-consuming process. We presented the first approach in the literature that takes previously unseen code as input and recommends code changes as a reviewer would. These findings are detailed in works presented at the ICSE'21 and ICSE'22 conferences.

Our work on automating the implementation of source code through DL models shed some light on the limitations of these models when dealing with complex predictions. Before DEVINTA, DL models were mostly evaluated in scenarios in which they were asked to predict a single code token or, at most, a single statement. We pushed the evaluation of these models further, also systematically studying for the first time the extent to which they tend to copy from their training dataset when generating predictions. In the future, we will study strategies aimed at boosting the performance of DL-based code generators when dealing with complex coding scenarios.

Concerning the automation of code-related tasks, we showed how large pre-trained models can help in substantially boosting performance. We proposed novel approaches providing the most comprehensive support to date for specific tasks (e.g. log statement injection). The second part of the project will focus on integrating these approaches where possible, creating multi-task DL models that can reuse what they learned for a given task to also improve their performance on other tasks. We also plan to invest mostly in the support of program comprehension activities by pushing the boundaries of code summarization, namely the ability to describe a given piece of code in natural language.

DEVINTA, with our ICSE'21 paper on code review, started a research thread on the automation of non-trivial code review activities, such as the automated reporting of issues in code components via natural language sentences, as human reviewers would do. Since then, several research groups have joined this thread, and we are confident that major steps forward will be achieved in this direction by the end of the project. On our side, we will keep working on improving the automated support we can offer to developers during code review.
Schematic representation of our first approach to automate code review (ICSE'21)