An Artificial Assistant for Software Developers

Informations projet

DEVINTA

N° de convention de subvention: 851720

Site Web du projet

DOI

10.3030/851720

Projet clôturé

Date de signature de la CE 6 Septembre 2019

Date de début 1 Février 2020

Date de fin 31 Juillet 2025

Financé au titre de

EXCELLENT SCIENCE - European Research Council (ERC)

Coût total

€ 1 499 093,00

Contribution de l’UE

€ 1 499 093,00

1 499 093,00

Coordonné par

UNIVERSITA DELLA SVIZZERA ITALIANA
Switzerland

Periodic Reporting for period 4 - DEVINTA (An Artificial Assistant for Software Developers)

Période du rapport: 2024-08-01 au 2025-07-31

Software permeates every aspect of our lives, including transportation, communication, economy, and healthcare. While becoming more and more central to our lives, software systems are also becoming increasingly complex, with millions of components interacting to provide users with the expected features. While software complexity alone already poses several non-trivial challenges for software developers, these are even exacerbated by other contextual factors typical of the software industry, such as the low longevity of programming languages and frameworks used to create software. On top of that, the time to release software must be minimized as much as possible without, however, sacrificing the quality of the released product. This boils down to the need for increasing software developers’ productivity, maximizing the quality of the code they write.

The DEVINTA project aims at introducing models and techniques serving as the basis for the next generation of recommender systems supporting software developers. These recommenders are expected to support developers in comprehending unfamiliar code and in writing high-quality code faster, thus reducing the considerable costs of developing and maintaining complex software. In particular DEVINTA aims at providing support to developers in different phases of the software lifecycle, with three main challenges being tackled:

1. Support developers in program comprehension activities by translating a given code into a natural language text.

2. While the developer is implementing software, predicting the feature they are working on and suggest how to automatically complete it.

3. Provide support for online code review, meaning the ability to review in real time the code written by the developer, looking for possible quality issues.

*Automating the implementation of source code*
We exploited deep learning (DL) models to automatically recommend developers how to finalize an ongoing implementation task. DL models can be trained to "learn" how to deal with a specific task by looking at concrete examples (i.e. training set). We provide the DL model with millions of examples of source code written by developers. We showed that DL models can correctly guess the next few code tokens the developer is likely to write in ~70% of cases. When the prediction task becomes more complex (i.e. predicting dozens of tokens), the performance drops to ~30%, with the DL model still being able to generate quite complex code snippets. This work has been presented at the MSR'21 conference and in the TSE journal.

Given the positive results we achieved, we investigated the extent to which DL models tend to copy code from their training set when recommending code. Such a research question is particularly important considering the fact that most of DL-based code recommenders have been trained on the source code of open source repositories and it is unclear whether the code they generate should be considered as new or as derivative work, with possible implications on license infringements. We showed that ~10% to ~0.1% of the predictions generated by DL-based code recommenders are exact copies of instances in the training set (MSR'22 conference).

We also showed the importance of building a high-quality training set for DL models targeting code generation: Feeding low-quality code, even in small percentages, results in a major increase of low-quality code produced by the model (ICPC'25 conference).

*Automating code-related tasks*
There are several tasks revolving around software writing. In DEVINTA we targeted their (partial) automation with the goal of saving time to software developers. We focused on the usage of large pre-trained DL models. To better understand the idea behind these models, let's assume we are interested in training a DL model able to translate from Italian to English. The training would usually require to provide the model with several examples of Italian sentences translated to English. Creating such a training dataset requires manual effort. For this reason, these datasets are usually limited in size, with consequences on the model performance. The idea behind pre-training is to firstly "teach" the model basic features about the languages of interest without the need for a manually built dataset. For example, the model is provided as input with sentences of the language of interest (e.g. Italian sentences) having specific words masked and it is required to guess the masked words. Only after pre-training, the model is "fine-tuned" to learn the specific task of interest (in our example, the translation task). We showed that pre-training substantially boost performance in the automation of several code-related tasks (e.g. bug-fixing), with results published at the ICSE'21, ICSME'21, ICSE'22, ICSE'23, ICSE'24, and ICPC'24 conferences and in the TSE, JSS, and EMSE journals. We also documented 45 tasks which developers automate via DL, presenting our findings at MSR'24 and receiving a Distinguished Paper Award.

*Online code review*
Code review is the process of analyzing source code written by a teammate to judge whether it is of sufficient quality to be integrated into the software project. We presented the first approach in the literature taking as input a previously unseen code and recommending code changes as a reviewer would do. These findings are detailed in works presented at the ICSE'21 and ICSE'22 conferences, and have been quite impactful, resulting in many follow-up works on the same topic. Finally, we run a controlled experiment with developers to assess the extent to which AI-based code review actually helps them in finding more quality issues. We found out that, while the AI is able to find quality issues, it also impacts the developers' behavior: Developers experience a tunnel-vision effect, focusing only on parts of the code commented by the AI and missing quality issues in other parts of the code (ICSE'25 conference).

Our work on automating the implementation of source code through DL models shed some light on the limitations of these models when dealing with complex predictions. Before DEVINTA, DL models were mostly evaluated in the scenario in which they were asked to predict a single code token or, at most, a single statement. We pushed the evaluation of these models, also systematically studying for the first time the extent to which they tend to copy from their training dataset when generating predictions. We also provided a long-term research agenda for code recommenders, thanks to findings from our empirical studies.

Concerning the automation of code-related tasks, we showed how large pre-trained models can help in substantially boosting performance across a variety of tasks.

DEVINTA, with our ICSE'21 and ICSE'22 papers on code review, started a research thread on the automation of non-trivial code review activities, such as the automated reporting of issues in code components via natural language sentences, as human reviewers would do. Since then, several research groups joined on this thread and major steps in this direction have been done.

The final major contribution of DEVINTA has been one of the very first studies showing changes in developers’ behavior resulting from the usage of an AI assistant (see ICSE'25 paper on the impact of AI-based code review).

Schematic representation of our first approach to automate code review (ICSE'21)

Periodic Reporting for period 4 - DEVINTA (An Artificial Assistant for Software Developers)

Partager cette page Partager cette page sur les réseaux sociaux

Télécharger Télécharger le contenu de la page