*Automating the implementation of source code*
We exploited deep learning (DL) models to automatically recommend to developers how to complete an ongoing implementation task. DL models can be trained to "learn" how to perform a specific task by looking at concrete examples (i.e., the training set). We provided the DL model with millions of examples of source code written by developers. We showed that DL models can correctly guess the next few code tokens a developer is likely to write in ~70% of cases. When the prediction task becomes more complex (i.e., predicting dozens of tokens), performance drops to ~30%, although the DL model is still able to generate quite complex code snippets. This work was presented at the MSR'21 conference and published in the TSE journal.
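To make the evaluation setting above more concrete, here is a minimal sketch of how a code-completion instance might be built and scored: the last few tokens of a snippet are hidden, the model must produce them, and a prediction counts as correct only if it reproduces the hidden tokens exactly. The function names, the whitespace tokenization, and the exact-match criterion are illustrative assumptions, not the setup used in the cited works, and the DL model itself is left out.

```python
from typing import List, Tuple


def make_completion_instance(code: str, n_masked: int) -> Tuple[str, str]:
    """Split a snippet into (visible input, hidden target).

    The last `n_masked` tokens are hidden and become the target the model
    must predict. Tokenization is a naive whitespace split (an assumption
    made only for illustration; real code tokenizers are more refined).
    """
    tokens = code.split()
    if not 0 < n_masked < len(tokens):
        raise ValueError("n_masked must hide at least one token and leave one visible")
    visible, hidden = tokens[:-n_masked], tokens[-n_masked:]
    return " ".join(visible), " ".join(hidden)


def exact_match_accuracy(predictions: List[str], targets: List[str]) -> float:
    """Fraction of predictions whose tokens match the hidden target exactly."""
    if len(predictions) != len(targets):
        raise ValueError("predictions and targets must be aligned")
    if not targets:
        return 0.0
    hits = sum(p.split() == t.split() for p, t in zip(predictions, targets))
    return hits / len(targets)


if __name__ == "__main__":
    src, tgt = make_completion_instance("if ( x > 0 ) { return x ; }", n_masked=3)
    print(src)  # if ( x > 0 ) { return
    print(tgt)  # x ; }
    # One of the two hypothetical predictions matches its target exactly.
    print(exact_match_accuracy(["x ; }", "0 ; }"], [tgt, tgt]))  # 0.5
```

Raising `n_masked` from a few tokens to dozens is what turns the easier prediction setting into the harder one described above.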
Given the positive results we achieved, we investigated the extent to which DL models copy code from their training set when recommending code. This research question is particularly important because most DL-based code recommenders have been trained on the source code of open-source repositories, and it is unclear whether the code they generate should be considered new or derivative work, with possible implications for license infringement. We showed that between ~0.1% and ~10% of the predictions generated by DL-based code recommenders are exact copies of instances in the training set (MSR'22 conference).
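One simple way to check whether a recommendation is an exact copy of a training instance is to fingerprint every training snippet once and then look each prediction up in that index. The sketch below assumes that "exact copy" means identical code after collapsing whitespace; the helper names are hypothetical, and this is not necessarily the procedure followed in the MSR'22 study.

```python
import hashlib
from typing import Iterable, List, Set


def fingerprint(code: str) -> str:
    """Hash a snippet after collapsing whitespace, so layout alone cannot hide a copy."""
    normalized = " ".join(code.split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def build_index(training_snippets: Iterable[str]) -> Set[str]:
    """Fingerprint every training instance once; lookups are then O(1) on average."""
    return {fingerprint(s) for s in training_snippets}


def copy_rate(predictions: List[str], index: Set[str]) -> float:
    """Share of predictions that are exact copies of some training instance."""
    if not predictions:
        return 0.0
    copies = sum(fingerprint(p) in index for p in predictions)
    return copies / len(predictions)


if __name__ == "__main__":
    index = build_index(["int a = 0;", "return x ;"])
    # "return   x ;" differs only in whitespace, so it counts as a copy.
    print(copy_rate(["return   x ;", "int b = 1;"], index))  # 0.5
```

Because the check is exact, short and common snippets such as `return x ;` will match far more easily than long ones.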
We also showed the importance of building a high-quality training set for DL models targeting code generation: Feeding the model low-quality code, even in small percentages, results in a major increase in the amount of low-quality code it produces (ICPC'25 conference).
*Automating code-related tasks*
There are several tasks revolving around writing software. In DEVINTA, we targeted their (partial) automation with the goal of saving developers time. We focused on large pre-trained DL models. To illustrate the idea behind these models, suppose we want to train a DL model to translate from Italian to English. Training would usually require providing the model with many examples of Italian sentences and their English translations. Creating such a dataset requires manual effort; for this reason, these datasets are usually limited in size, which limits model performance. The idea behind pre-training is to first "teach" the model basic features of the languages of interest without the need for a manually built dataset. For example, the model receives as input sentences in the language of interest (e.g., Italian sentences) in which some words are masked, and it must guess the masked words. Only after pre-training is the model "fine-tuned" to learn the specific task of interest (in our example, translation). We showed that pre-training substantially boosts performance in the automation of several code-related tasks (e.g., bug-fixing), with results published at the ICSE'21, ICSME'21, ICSE'22, ICSE'23, ICSE'24, and ICPC'24 conferences and in the TSE, JSS, and EMSE journals. We also documented 45 tasks that developers automate via DL, presenting our findings at MSR'24, where the work received a Distinguished Paper Award.
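The masked-word pre-training objective described above can be sketched as a function that hides a random subset of tokens and keeps them as the targets the model must guess. The 15% masking rate, the `<MASK>` placeholder, and whitespace tokenization are illustrative assumptions, not the settings of the models used in DEVINTA.

```python
import random
from typing import List, Tuple

MASK = "<MASK>"  # hypothetical placeholder token


def mask_tokens(
    tokens: List[str], rate: float = 0.15, seed: int = 0
) -> Tuple[List[str], List[str]]:
    """Build one self-supervised example: (masked input, hidden tokens in order).

    No manual labeling is needed: the targets are taken from the input itself.
    """
    if not 0 < rate <= 1:
        raise ValueError("rate must be in (0, 1]")
    if not tokens:
        return [], []
    rng = random.Random(seed)  # seeded so the example is reproducible
    n = max(1, round(len(tokens) * rate))
    positions = sorted(rng.sample(range(len(tokens)), n))
    masked = list(tokens)
    targets = []
    for i in positions:
        targets.append(tokens[i])
        masked[i] = MASK
    return masked, targets


if __name__ == "__main__":
    sentence = "il gatto dorme sul divano".split()  # an Italian sentence
    masked, targets = mask_tokens(sentence, rate=0.2, seed=1)
    print(masked)   # the sentence with one word replaced by <MASK>
    print(targets)  # the word the model is asked to guess
```

Masked positions are sorted so that the targets come out in reading order, which makes it straightforward to pair each `<MASK>` with its answer.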
*Online code review*
Code review is the process of analyzing source code written by a teammate to judge whether it is of sufficient quality to be integrated into the software project. We presented the first approach in the literature that takes as input previously unseen code and recommends code changes as a reviewer would. These findings are detailed in works presented at the ICSE'21 and ICSE'22 conferences and have been quite impactful, inspiring many follow-up works on the same topic. Finally, we ran a controlled experiment with developers to assess the extent to which AI-based code review actually helps them find more quality issues. We found that, while the AI is able to find quality issues, it also affects the developers' behavior: Developers experience a tunnel-vision effect, focusing only on the parts of the code commented on by the AI and missing quality issues in other parts of the code (ICSE'25 conference).