News 2025

AI, Computing and Society

When AI and Humans Stumble Over Program Code

New study shows that humans and large language models respond surprisingly similarly to confusing program code

Researchers from Saarland University and the Max Planck Institute for Software Systems have shown for the first time, by comparing the brain activity of study participants with model uncertainty, that humans and large language models (LLMs) react strikingly similarly to complex or misleading program code. Building on this finding, the team developed a data-driven method to automatically detect such confusing areas in code, a promising step toward better AI assistants for software development.

The team led by Sven Apel, Professor of Software Engineering at Saarland University, and Dr. Mariya Toneva, faculty member at the Max Planck Institute for Software Systems and head of the research group Bridging AI and Neuroscience, investigated how humans and large language models respond to confusing program code. Such patterns, known as atoms of confusion, are well studied: they are short, syntactically correct pieces of code that mislead humans and can throw even experienced developers off track.

To find out whether LLMs and humans stumble over the same parts of code, the research team took an interdisciplinary approach. On the one hand, they drew on data from an earlier study by Apel and colleagues, in which participants read confusing and clean code variants while their brain activity and attention were measured using electroencephalography (EEG) and eye tracking. On the other hand, they measured the “confusion” of LLMs via so-called perplexity values. Perplexity is an established metric for evaluating language models: it quantifies how uncertain a model is when predicting a sequence of text tokens, based on the probabilities it assigns to them.
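To illustrate the metric (this is a generic definition, not the study's specific pipeline): perplexity is the exponential of the average negative log-probability the model assigns to each token, so a confident model scores low and an uncertain model scores high. A minimal, model-independent sketch in Python:

```python
import math

def perplexity(token_probs):
    """Perplexity of a sequence, given the probability a language
    model assigned to each token: exp of the mean negative log-prob."""
    neg_log_likelihoods = [-math.log(p) for p in token_probs]
    return math.exp(sum(neg_log_likelihoods) / len(neg_log_likelihoods))

# A model that is confident about every token yields low perplexity,
# while hesitant predictions (low probabilities) yield high perplexity.
confident = perplexity([0.9, 0.8, 0.95])
uncertain = perplexity([0.2, 0.1, 0.3])
```

For intuition, a model assigning probability 0.5 to every token has perplexity exactly 2, as if it were choosing between two equally likely options at each step.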

The result: Wherever humans got stuck on code, the LLM also showed increased perplexity. EEG signals from participants (especially the so-called late frontal positivity, which language research associates with unexpected sentence endings) rose precisely where the language model’s uncertainty spiked. “We were astounded that the peaks in brain activity and model uncertainty showed significant correlations,” says Youssef Abdelsalam, who conducted the study as part of his doctoral research under the supervision of Toneva and Apel.

Based on this similarity, the researchers developed a data-driven method that automatically detects and highlights unclear parts of code. In more than 60 percent of cases, the algorithm successfully identified known, manually annotated confusing patterns in the test code and even discovered more than 150 new, previously unrecognized patterns that also coincided with increased brain activity.
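The article does not detail the detection algorithm itself, but the underlying idea of flagging code regions where model uncertainty spikes can be sketched as follows. This is an illustration only: the threshold, the per-token surprisal values, and the helper name are assumptions, not the researchers' actual method.

```python
def flag_confusing_spans(tokens, surprisals, threshold=4.0):
    """Group consecutive tokens whose surprisal (negative log-prob)
    exceeds a threshold into candidate 'confusing' spans."""
    spans, current = [], []
    for token, s in zip(tokens, surprisals):
        if s > threshold:
            current.append(token)
        elif current:
            spans.append(" ".join(current))
            current = []
    if current:
        spans.append(" ".join(current))
    return spans

# Hypothetical example: the double negation "!!x" is a known
# atom of confusion, and its tokens carry high surprisal here.
tokens = ["int", "v", "=", "!", "!", "x", ";"]
surprisals = [1.0, 2.5, 0.5, 6.0, 7.2, 5.1, 0.8]
flagged = flag_confusing_spans(tokens, surprisals)  # → ["! ! x"]
```

In this toy run, only the span covering the double negation exceeds the threshold and gets highlighted, mirroring how a perplexity-based detector would surface an atom of confusion.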

“With this work, we are taking a step toward a better understanding of the alignment between humans and machines,” says Max Planck researcher Mariya Toneva. “If we know when and why LLMs and humans stumble in the same places, we can develop tools that make code more understandable and significantly improve human–AI collaboration,” adds Professor Sven Apel.

Through their project, the researchers are building a bridge between neuroscience, software engineering, and artificial intelligence. The study, currently published as a preprint, was accepted for publication at the International Conference on Software Engineering (ICSE), one of the world’s leading conferences in the field of software development. The conference will take place in Rio de Janeiro in April 2026. The authors of the study are: Youssef Abdelsalam, Norman Peitek, Anna-Maria Maurer, Mariya Toneva, and Sven Apel.

Abhilasha Ravichander joins MPI-SWS as tenure-track faculty

September 2025
Abhilasha Ravichander will be joining MPI-SWS as tenure-track faculty. Abhilasha's research focuses on developing a scientific understanding of how AI models work, in order to improve their reliability and performance. She is actively seeking motivated students to join her team.

Prior to joining MPI, Abhilasha was a postdoctoral scholar at the University of Washington and the Allen Institute for Artificial Intelligence. She received her PhD from Carnegie Mellon University in 2022. Abhilasha’s work has been presented at several top NLP conferences, receiving Outstanding Paper Award at ACL 2025, Best Resource Paper Award at ACL 2024, Best Theme Paper Award at ACL 2024, and Area Chair Favorite Paper Award at COLING 2018. She has been recognized as a "Rising Star in Generative AI" (2024), "Rising Star in EECS" (2022), and "Rising Star in Data Science" (2021).

Mariya Toneva awarded ERC Starting Grant

September 2025
Mariya Toneva, head of the Bridging AI and Neuroscience group at MPI-SWS, has been awarded a 2025 ERC Starting Grant. Over the next five years, her project BrainAlign will receive funding of nearly 1.5 million euros for research on "brain-aligned language models for long-range language understanding and neuroscientific insight." Read more about the BrainAlign project below.

In addition, former MPI-SWS postdoctoral fellow Jiarui Gan, who is currently a lecturer at Oxford, has also received a 2025 ERC Starting Grant for his project "Algorithms of Stochastic Principal-Agent Coordination".

ERC grants are the most prestigious and most competitive European-level awards for ground-breaking scientific research. This year, fewer than 13% of all ERC Starting Grant applicants across all scientific disciplines received the award, with only 24 awardees in Computer Science across all of Europe and Israel.

These grants carry substantial research funding: each winner receives up to 1.5 million euros over five years to carry out their research. You can find more information about the 2025 ERC Starting Grants here: https://erc.europa.eu/news-events/news/starting-grants-2025-call-results

The BrainAlign Project

The BrainAlign project aims to revolutionize next-generation artificial intelligence (AI) models by aligning them closely with the way the human brain understands language. While AI systems for language understanding and generation have made great progress in recent years thanks to language models, they still face significant challenges, such as understanding human intent. Moreover, these successes have mostly stemmed from tremendous increases in model size, and continuing this trend demands unrealistic amounts of data, compute power, and energy.

One way forward is to look to the only system we trust to truly understand complex language: the human brain. Insights into brain function have long inspired AI, but such insights took years to consolidate and even longer to transfer to AI. For brain functions that are uniquely human, such as understanding complex natural language, the lack of suitable animal model organisms limits the mechanistic insights that can be applied to AI.

The BrainAlign project presents a novel, data-driven solution: it will develop brain-aligned language models by forcing their internal processing to closely reflect information sampled directly from the human brain as humans read and listen to large amounts of everyday language. By integrating machine learning techniques with human neuroimaging and behavioral data from novel experimental paradigms, BrainAlign will develop next-generation models with a deeper, human-like understanding of language. Additionally, innovative interpretability methods will allow these models to serve as model organisms, revealing mechanisms that mirror how the human brain processes language and greatly enhancing our scientific knowledge of language in the brain.
