Learning from Big Code: Probabilistic Models, Analysis and Synthesis

Machine learning to make programming more reliable, efficient and secure

Machine learning models trained on computer code could help solve some of the problems that have long plagued the software sector.

Every year, billions of euro are spent fixing software defects, many of which are the result of defective code. What is needed is more reliable, efficient and secure software. The challenge, however, is that creating such software is difficult – if not impossible – using traditional techniques. This is where the EU-funded BIGCODE project comes in. Combining programming languages and machine learning, the project is building new kinds of programming engines capable of providing statistically likely solutions to otherwise impossible problems. “The project created new kinds of machine learning models that can be trained on computer code,” says Martin Vechev, head of the Secure, Reliable, and Intelligent Systems Lab (SRI) at the Swiss Federal Institute of Technology Zurich (ETH Zurich) and BIGCODE project coordinator. “The goal is to use these models to solve various important tasks, such as automatically generating new code, detecting security violations in existing code, and translating between programming languages – to name only a few.”

New types of machine learning models

The key outcome of the project, which was supported by the European Research Council, was the development of new types of foundational machine learning models. “What makes these models so unique is that they can be trained on code and can solve challenges that were previously impossible,” explains Vechev. “Furthermore, our models are interpretable, meaning the decisions they make can be both understood and inspected by humans.” While developing the models, the researchers realised that computer programs have complex, formal semantics. To capture these semantics, training on code must integrate symbolic methods into the process. According to Vechev, this goes well beyond the methods used to train natural language- or image-based models. “As such, our advancements have turned out to have conceptual value beyond code, for example, in noisy program synthesis, natural language processing, and discovering new kinds of formal grammars,” notes Vechev. “In a sense, working in the code domain has been fruitful in terms of developing new methods that can be easily generalised.”

More reliable and secure software

The models developed during the BIGCODE project, which Vechev calls ‘AI for Code’, are already being used by hundreds of thousands of coders, resulting in more reliable and secure software. Based on the work done during the project, researchers also launched a successful start-up, DeepCode. The start-up has since been acquired by Snyk, a major cybersecurity company that has integrated DeepCode into many of its operations. “The BIGCODE project showed that it is possible to not only advance our knowledge and help shape new areas of research, but also use these advancements to create successful deep-tech companies that have tremendous economic value,” concludes Vechev.
