A Neuroscientific Foundation of Program Comprehension

Project Information

Brains On Code

Grant agreement ID: 101052182

DOI

10.3030/101052182

EC signature date 24 May 2022

Start date 1 October 2022

End date 30 September 2027

Funded under

European Research Council (ERC)

Total cost

€ 2 499 023,00

EU contribution

€ 2 499 023,00


Coordinated by

UNIVERSITAT DES SAARLANDES
Germany

Periodic Reporting for period 1 - Brains On Code (A Neuroscientific Foundation of Program Comprehension)

Reporting period: 2022-10-01 to 2025-03-31

The pivotal role of software in our modern world places strong demands on the quality, correctness, and reliability of software systems. In software development and maintenance, the ability to understand program artifacts plays a key role for programmers in meeting these demands. Despite significant progress, research on program comprehension has a fundamental limitation: program comprehension is a cognitive process that cannot be directly observed, which leaves considerable room for misinterpretation, uncertainty, and confounders. In Brains On Code, we will develop a neuroscientific foundation of program comprehension. Instead of merely observing whether there is a difference regarding program comprehension (e.g., between two programming methods), we aim to determine precisely and reliably the key factors that cause the difference. This is especially challenging because humans are the subjects of study, and interpersonal variance and other confounding factors obscure the results. The key idea of Brains On Code is to leverage established methods from cognitive neuroscience to obtain insights into the underlying processes and influential factors of program comprehension. Brains On Code will pursue a multimodal approach that integrates different neurophysiological measures as well as a cognitive computational modeling approach to establish the theoretical foundation. This way, Brains On Code will lay the foundations of measuring and modeling program comprehension and offer substantial feedback for programming methodology, language design, and education. Answers to longstanding foundational questions such as “How can we reliably measure program comprehension?”, “What makes a program hard to understand?”, and “What skills should programmers have?” will come within reach. Success in Brains On Code would not only help answer these questions, but also provide an outline for applying the methodology beyond program code (models, specifications, etc.).

We have achieved significant progress in understanding how programmers comprehend code, using neurophysiological and behavioral methods such as cognitive modeling, neuroimaging, and eye tracking.

We developed the first comprehensive computational model simulating how programmers retrieve variable values during code comprehension. This model accurately predicts which parts of the code are harder to understand or error-prone, and a web-based demo is in progress to make these insights accessible for real-world use.

We conducted a series of EEG and fMRI studies that have revealed how expertise and cognitive strategies influence programming. For instance, skilled programmers process code with less cognitive effort. Interestingly, commonly used experience measures do not predict programmer efficacy well, but self-estimation and indicators of learning eagerness are fairly accurate.

In a series of eye-tracking studies, we explored how visual cues such as comments and type annotations affect code comprehension. While programmers perceive comments as helpful, their actual impact on understanding is context-dependent. Similarly, type annotations often do not enhance comprehension as much as expected.

Most notably, we have discovered parallels between processing code and natural language, finding that programmers’ brains respond to confusing code elements in ways similar to unexpected words in sentences, suggesting shared cognitive mechanisms.

Using large language models (LLMs), we demonstrated their potential to mirror human judgments about "fuzzy" aspects of code, such as smells and clarity, where traditional tools struggle. This work also shows that LLMs improve when given the same kind of contextual information that benefits humans.

Additionally, we refined existing techniques for neuroimaging and eye-tracking studies. Notably, we identified better baselines for program comprehension tasks and called for standardization in eye-tracking data analysis to ensure robust and comparable results.

These insights pave the way for better programming tools, education, and metrics, fostering a deeper understanding of how we think and work with code.

Our project has achieved several results that go beyond the state of the art, with the potential to significantly impact programming research, education, and industry practices.

One of the most notable advances is the development of an ACT-R-based cognitive model for simulating how programmers understand simple code snippets. This model achieves a high level of predictive accuracy, enabling precise identification of error-prone or cognitively demanding sections of code. This breakthrough paves the way for developing empirically validated metrics to measure code comprehensibility, which could be applied in industry to enhance software quality and maintainability. While further research and refinement are needed to scale its application to complex code and real-world programming environments, the current model already establishes a new benchmark for understanding cognitive processes in programming.
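To illustrate the kind of mechanism an ACT-R-based model can build on, the following minimal sketch uses the standard ACT-R declarative memory equations (base-level learning, retrieval probability, and retrieval latency). It is not the project's model: the function names, parameter values, and the example variables `x` and `y` are assumptions chosen purely for illustration.

```python
import math

def base_level_activation(use_times, now, decay=0.5):
    """Standard ACT-R base-level learning: B = ln(sum((now - t) ** -decay)).

    use_times: times (in seconds) at which the memory chunk was created or used.
    """
    ages = [now - t for t in use_times if now > t]
    if not ages:
        return float("-inf")  # never encountered, so it cannot be retrieved
    return math.log(sum(age ** -decay for age in ages))

def retrieval_probability(activation, threshold=0.0, noise=0.4):
    """Probability that a chunk with this activation is retrieved successfully."""
    if activation == float("-inf"):
        return 0.0
    return 1.0 / (1.0 + math.exp((threshold - activation) / noise))

def retrieval_latency(activation, latency_factor=1.0):
    """Retrieval time grows exponentially as activation falls."""
    return latency_factor * math.exp(-activation)

# Hypothetical scenario: the value of `x` was seen twice (assignment and a
# later read), the value of `y` only once; both are needed at time 10.0.
now = 10.0
activation_x = base_level_activation([1.0, 2.0], now)
activation_y = base_level_activation([1.5], now)

for name, act in (("x", activation_x), ("y", activation_y)):
    print(name, round(retrieval_probability(act), 3), round(retrieval_latency(act), 3))
```

In this sketch, the variable encountered more often has the higher activation, so its value is predicted to be recalled more reliably and more quickly. A model of this kind could flag a line that depends on the less active variable as more error-prone.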

Another major advance is the adaptation of event-related and fixation-related potentials (ERP/FRP) from psycholinguistics to software engineering. This novel methodology revealed parallels between code comprehension and natural language processing, shedding light on how the brain handles confusing elements in code. These insights not only deepen our understanding of program comprehension but also demonstrate the potential of integrating neuroscience with software engineering. To ensure the continued uptake and success of this work, further studies are needed to expand its scope, refine the methodologies, and explore broader applications in education, hiring, and tool development. Supporting this effort will require ongoing collaboration, standardization of methodologies, and dissemination of findings to both academic and industry audiences.
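The core computation behind an event-related or fixation-related potential can be sketched as follows: cut a window out of the recorded signal around each event onset, subtract the mean of the pre-onset baseline, and average across events. This is a simplified illustration under assumed names and toy data, not the project's analysis pipeline. For an ERP the onsets would come from stimulus presentation; for an FRP they would come from fixation onsets reported by the eye tracker.

```python
def extract_epoch(signal, onset, pre, post):
    """Return samples in [onset - pre, onset + post), or None if out of range."""
    start, stop = onset - pre, onset + post
    if start < 0 or stop > len(signal):
        return None
    return signal[start:stop]

def baseline_correct(epoch, pre):
    """Subtract the mean of the pre-onset samples from every sample."""
    baseline = sum(epoch[:pre]) / pre
    return [value - baseline for value in epoch]

def average_potential(signal, onsets, pre, post):
    """Average baseline-corrected epochs time-locked to the given onsets."""
    epochs = []
    for onset in onsets:
        epoch = extract_epoch(signal, onset, pre, post)
        if epoch is not None:  # skip events too close to the recording edges
            epochs.append(baseline_correct(epoch, pre))
    if not epochs:
        raise ValueError("no complete epochs available")
    count = len(epochs)
    return [sum(samples) / count for samples in zip(*epochs)]

# Toy single-channel recording: a flat signal with a deflection one sample
# after each of two hypothetical fixation onsets (sample indices 5 and 12).
signal = [1.0] * 20
signal[6] = 3.0
signal[13] = 3.0

frp = average_potential(signal, onsets=[5, 12, 19], pre=2, post=3)
print(frp)
```

Averaging over many events is what makes the small, time-locked response visible against unrelated background activity. Real analyses additionally involve filtering, artifact rejection, and multiple channels, all of which are omitted here.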
