Recent events

Uncovering the Mechanics of Vision AI Model Failures: Textures and Beyond

Blaine Hoak University of Wisconsin-Madison
(hosted by Christof Paar)
16 Feb 2026, 10:00 am - 11:00 am
Bochum building MPI-SP, room MB/1-84/90
CIS@MPG Colloquium
Artificial Intelligence (AI) models now serve as core components of a range of mature applications but remain vulnerable to a wide spectrum of attacks. Yet, the research community has yet to develop a systematic understanding of model vulnerability. In this talk, I approach uncovering the mechanics of model failure from two complementary perspectives: the design of attack techniques and the features models exploit. First, I introduce The Space of Adversarial Strategies, a robustness evaluation framework constructed through a decomposition and reformulation of current attacks. With this, I isolate the components that drive attack success and provide insights for future defenses. Motivated by the widespread failure observed, I then turn to the feature space, where I uncover differences between visual processing in AI models and the human visual system that explain failures in AI systems. My work reveals that textures, or repeated patterns, are a core mechanism for driving model generalization, yet are also a primary source of vulnerability. I present new methodologies to quantify a model’s bias toward texture, uncover learned associations between textures and objects, and identify textures in images. With this, I find that up to 90% of failures can be explained by mismatches in texture information, highlighting texture as an important, yet overlooked, influence on model robustness. I conclude by outlining future work for addressing trustworthiness issues in both classification and generative settings, with particular attention to (mis)alignment between biological and artificial intelligence.

Building Private, Secure and Transparent Digital Identity at Scale

Harjasleen Malvai University of Illinois Urbana–Champaign
(hosted by Peter Schwabe)
12 Feb 2026, 10:00 am - 11:00 am
Bochum building MPI-SP
CIS@MPG Colloquium
Digital identity controls access to many everyday essentials, from getting paid to accessing banking and benefits, and sits in the critical path of modern security. Even end-to-end encrypted messaging depends on authentic cryptographic identity, i.e., a trustworthy way to learn the cryptographic keys needed to encrypt messages to the right person. In practice, centralized identity providers become both a single point of failure and a single point of control: they decide what assertions are supported, and their compromise or coercion enables targeted attacks that are hard to detect. Yet, many strong proposals assume new ecosystems or significant user effort – assumptions that don’t hold for the systems and users we have today.

My thesis is that it is possible to make identity infrastructure more private, secure, and transparent at scale while designing for existing user and ecosystem constraints. I’ll present two case studies across the identity lifecycle.

First, I’ll discuss key transparency for end-to-end encrypted messaging: how to make centralized key directories auditable so that even a compromised server cannot quietly swap keys for targeted users. I’ll show how this line of work evolved from formal privacy and history guarantees (SEEMless) to a billion-user architecture (Parakeet) built for real operational constraints (such as distributed storage and long time horizons), which now underpins the key transparency deployments in WhatsApp and Facebook Messenger.
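The core intuition behind key transparency can be conveyed with a toy append-only directory. The sketch below is my own simplified illustration, not the SEEMless or Parakeet construction (which use authenticated dictionaries with privacy and efficiency guarantees): each epoch's commitment chains over the previous one, so a server that quietly rewrites an old key binding invalidates every later commitment and is caught by an auditor replaying the chain.

```python
import hashlib

def h(*parts: bytes) -> bytes:
    return hashlib.sha256(b"|".join(parts)).digest()

class TransparencyLog:
    """Toy append-only key directory: each epoch's commitment chains over the
    previous one, so rewriting an old (user, key) binding invalidates every
    later commitment."""
    def __init__(self):
        self.epochs = []   # list of (user, key, commitment)

    def publish(self, user: bytes, key: bytes) -> bytes:
        prev = self.epochs[-1][2] if self.epochs else b"\x00" * 32
        commitment = h(prev, user, key)
        self.epochs.append((user, key, commitment))
        return commitment

def audit(log: TransparencyLog) -> bool:
    # An auditor replays the whole chain and recomputes every commitment.
    prev = b"\x00" * 32
    for user, key, commitment in log.epochs:
        if commitment != h(prev, user, key):
            return False
        prev = commitment
    return True

log = TransparencyLog()
log.publish(b"alice", b"pk_alice")
log.publish(b"bob", b"pk_bob")
print(audit(log))   # True: the published history is consistent
```

A real deployment additionally needs efficient per-user lookup proofs and privacy for directory contents, which is where the authenticated data structures of this line of work come in.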

Second, I’ll briefly describe credential bootstrapping with accountability (CanDID): a path to establishing privacy-preserving credentials from existing web sources without assuming a fully mature verifiable-credential ecosystem, while supporting practical requirements like revocation, recovery, and compliance checks.

I’ll close by highlighting ongoing work and open problems motivated by these systems and sketch a research agenda for building auditable and privacy-preserving infrastructure at internet scale, for identity and beyond.

Beyond Static Alignment: Advancing Trustworthy and Socially Intelligent AI Assistant

Jieyu Zhao University of Southern California
(hosted by Abhilasha Ravichander, Asia Biega)
11 Feb 2026, 3:00 pm - 4:00 pm
Virtual talk
CIS@MPG Colloquium
Large language models have transformed how we interact with technology, but most deployed systems remain reactive and rely on static, one-size-fits-all alignment, limiting trust in real-world, high-stakes settings. This talk explores a path toward personalized, trustworthy AI assistants that can reason, continually adapt, and align with user values while remaining safe and socially appropriate. I will introduce Computer-Using Agents that combine GUI operations and code generation to efficiently complete real-world tasks, and present CoAct-1, a multi-agent system that coordinates planning and execution. I will then discuss SEA, a black-box auditing algorithm for uncovering LLM knowledge deficiencies and probing failure modes such as hallucination under limited query budgets. Next, I will present WildFeedback, a framework that learns in-situ user preferences from natural, multi-turn interactions, enabling continual personalization beyond lab-style preference data. Finally, I will highlight ongoing work on proactive social intelligence and culturally grounded evaluation, spanning intention understanding, reasoning consistency, and value-aligned collaboration. Together, these advances move us closer to AI systems that don’t just respond, but adapt responsibly and assist people in ways that are reliable, equitable, and context-aware.

Towards More Trustworthy and Efficient Systems

Hugo Lefeuvre University of British Columbia
(hosted by Gilles Barthe, Deepak Garg)
10 Feb 2026, 10:00 am - 11:00 am
Bochum building MPI-SP
CIS@MPG Colloquium
Software usage grows much faster than computing hardware. Paradoxically, modern software systems make inefficient use of computing resources: decades of feature creep have made them too generic to perform well on any specific task. These decades of growth have also made systems fragile -- and frankly insecure, glued together from countless components of diverse origins, critical or confidential, buggy, risky, AI-generated, or otherwise untrustworthy. This talk will take the audience on a journey at the intersection of systems and security. I will give an overview of my past and present work applying isolation and specialization techniques to make systems more trustworthy and more efficient (Unikraft, FlexOS, CHERIoT), demonstrating the shortcomings of these techniques and how to address them (CIVs, SoK), and getting these advances deployed to better the real world. I will conclude with a forward-looking perspective on my research and impact plans towards achieving this vision of more robust and efficient software systems.

Modern Challenges in Learning Theory

Nataly Brukhim Institute for Advanced Study and the Center for Discrete Mathematics and Theoretical Computer Science
(hosted by Derek Dreyer)
09 Feb 2026, 10:00 am - 11:00 am
Saarbrücken building E1 5, room 029
CIS@MPG Colloquium
Machine learning relies on its ability to generalize from limited data, yet a principled theoretical understanding of generalization remains incomplete. While binary classification is well understood in the classical PAC framework, even its natural extension to multiclass learning is substantially more challenging. In this talk, I will present recent progress in multiclass learning that characterizes when generalization is possible and how much data is required, resolving a long-standing open problem on extending the Vapnik–Chervonenkis (VC) dimension beyond the binary setting. I will then turn to complementary results on efficient learning via boosting. We extend boosting theory to multiclass classification, while maintaining computational and statistical efficiency even for unbounded label spaces. Lastly, I will discuss generalization in sequential learning settings, where a learner interacts with an environment over time. We introduce a new framework that subsumes classically studied settings (bandits and statistical queries) together with a combinatorial parameter that bounds the number of interactions required for learning.

Symmetry in Neural Network Parameter Spaces

Bo Zhao University of California, San Diego
(hosted by Bernt Schiele)
05 Feb 2026, 3:00 pm - 4:00 pm
Virtual talk
CIS@MPG Colloquium
In many neural networks, different parameter values can yield the same loss, often due to underlying symmetries in the parameter space. We introduce a general framework for continuous symmetries based on equivariance in activation functions, revealing a new set of nonlinear, data-dependent symmetries. Using these symmetries, we derive topological properties of the minima and identify conserved quantities in gradient flows. As a practical application of parameter space symmetries, we present an algorithm called symmetry teleportation, which leverages parameter symmetries to search loss level sets for points with desired properties. This approach leads to improvements in both convergence and generalization. We also discuss future work on foundational understanding of deep learning that reveals, characterizes, and leverages hidden structures to shape optimization and model behavior, enabling more efficient and interpretable AI systems.
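The kind of parameter-space symmetry described above can be seen in a minimal NumPy sketch (my own illustration, not code from the talk): in a two-layer ReLU network, rescaling each hidden unit's incoming weights by a positive constant and its outgoing weights by the inverse leaves the network function, and hence the loss, exactly unchanged, tracing out a continuous orbit of equivalent parameters.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=(5, 3))      # a small batch of inputs
W1 = rng.normal(size=(4, 3))     # first-layer weights
W2 = rng.normal(size=(2, 4))     # second-layer weights

def forward(W1, W2, x):
    # f(x) = W2 . relu(W1 . x): a two-layer ReLU network without biases.
    return np.maximum(x @ W1.T, 0) @ W2.T

# Positive per-neuron rescaling is a continuous symmetry: relu(c*z) = c*relu(z)
# for c > 0, so scaling row i of W1 by c_i and column i of W2 by 1/c_i leaves
# the network function (and hence the loss) exactly unchanged.
c = np.exp(rng.normal(size=4))   # arbitrary positive scale per hidden unit
W1s = c[:, None] * W1
W2s = W2 / c[None, :]

print(np.allclose(forward(W1, W2, x), forward(W1s, W2s, x)))   # True
```

Moving along such an orbit is the basic move behind symmetry teleportation: the loss stays fixed while other properties of the parameters (e.g., gradient magnitudes) change.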

The Skolem Problem: a century-old enigma at the heart of computation

Joël Ouaknine Max Planck Institute for Software Systems
04 Feb 2026, 12:15 pm - 1:15 pm
Saarbrücken building E1 5, room 002
Joint Lecture Series
It has been described as the most important problem whose decidability is still open: the Skolem Problem asks how to determine algorithmically whether a given integer linear recurrence sequence (such as the Fibonacci numbers) has a zero term. This deceptively simple question arises across a wide range of topics in computer science and mathematics, from program verification and automata theory to number theory and logic. This talk traces the history of the Skolem Problem: from the early 1930s to the current frontier of one of the most enduring open questions in computer science.
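To make the question concrete, here is a small Python sketch (my own illustration): brute-force search only semi-decides the problem, since it can find a zero term if one exists within the search bound but can never certify that none exists, which is precisely the gap a decision procedure for the Skolem Problem would have to close.

```python
def recurrence(coeffs, init):
    """Generate u(n) = coeffs[0]*u(n-1) + ... + coeffs[k-1]*u(n-k),
    starting from the initial values u(0), ..., u(k-1)."""
    window = list(init)
    yield from window
    while True:
        nxt = sum(c * u for c, u in zip(coeffs, reversed(window)))
        yield nxt
        window = window[1:] + [nxt]

def find_zero(coeffs, init, bound):
    """Search the first `bound` terms for a zero; return its index or None.
    This only semi-decides the question: failing to find a zero within the
    bound proves nothing, which is exactly where the difficulty lies."""
    gen = recurrence(coeffs, init)
    for n in range(bound):
        if next(gen) == 0:
            return n
    return None

print(find_zero([1, 1], [2, -2], 100))   # 2: the sequence 2, -2, 0, ... hits zero
print(find_zero([1, 1], [1, 1], 100))    # None: no zero among the first 100 Fibonacci terms
```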

Toward Practical and Scalable Systems Evaluation for Post-Moore Datacenters

Hejing Li Max Planck Institute for Software Systems
02 Feb 2026, 10:00 am - 11:00 am
Saarbrücken building E1 5, room 029
SWS Student Defense Talks - Thesis Proposal
Having a solid system evaluation under realistic workloads and environments is essential for datacenter network research. Modern datacenter systems are shaped by a wide range of factors - including hardware behavior, software components, and complex interactions between them - whose combined effects on end-to-end performance are often difficult to predict. However, evaluating such systems on physical testbeds is frequently infeasible due to scale, cost, limited experimental control, and the increasing reliance on specialized hardware that may be unavailable or still under development. As a result, researchers often turn to simulation. Existing simulators, however, typically focus on isolated components, such as network protocols, host architectures, or hardware RTL, making it challenging to conduct faithful end-to-end evaluations or to scale experiments to realistic datacenter sizes. The goal of this thesis is to provide researchers with practical and scalable tools and methodologies for conducting faithful end-to-end system evaluation targeting modern datacenters. To this end, this thesis is structured around three components. First, it introduces SimBricks, an end-to-end simulation framework that enables the modular composition of best-of-breed simulators, allowing unmodified hardware and software system implementations to be evaluated together within a single virtual testbed. Second, it presents SplitSim, a simulation framework designed to make large-scale end-to-end evaluation practical by supporting mixed-fidelity simulation, controlled decomposition, and efficient resource utilization. Finally, the thesis will include a set of case studies that apply SplitSim to the evaluation of large-scale networked systems, demonstrating a concrete evaluation workflow and distilling lessons on navigating trade-offs between fidelity, scalability, and simulation cost.

Towards a Privacy-First Future for Wearable AI

Shwetha Rajaram University of Michigan
(hosted by Carmela Troncoso)
02 Feb 2026, 10:00 am - 11:00 am
Bochum building MPI-SP, room MB/1-84/90
CIS@MPG Colloquium
Wearable AI systems, such as display-free smartglasses and augmented reality (AR) devices, are emerging as a new paradigm of personal computing. However, they introduce novel privacy harms for both users and bystanders, relying on multimodal data that reveals people's identities, activities, and surroundings. In this talk, I demonstrate that by building wearable AI systems for adaptation – enabling users and systems to dynamically adjust interactions across contexts – we can mitigate privacy risks while supporting meaningful use. First, through co-design studies with AR and security & privacy researchers, I establish a design space of alternatives to traditional interactions that reduce data exposure while maintaining functionality. To embed this privacy mindset in designers’ workflows, I develop interactive tools for critically analyzing risks in their own applications. Finally, I investigate how to automatically balance user experience and privacy for wearable AI users and bystanders, proposing an optimization framework for system-driven "negotiations" of sensing capabilities. I conclude by discussing open challenges and future research towards this vision of privacy-adaptive wearable AI, from technical mechanisms to implications for policy and regulation.

Pushing the Boundaries in Stateless Model Checking

Iason Marmanis Max Planck Institute for Software Systems
28 Jan 2026, 3:30 pm - 4:30 pm
Kaiserslautern building G26, room 111
SWS Student Defense Talks - Thesis Defense
Stateless model checking (SMC) verifies a concurrent program by systematically exploring its state space. To combat the state-space explosion problem, SMC is frequently combined with Dynamic Partial Order Reduction (DPOR), a technique that avoids exploring executions that are deemed equivalent to one another. Still, DPOR’s scalability is limited by the size of the input program.

This thesis improves scalability by (i) providing direct support for common coding patterns that would otherwise have to be handled inefficiently, and (ii) combining DPOR with other state-space reduction techniques. Key to our contributions is a DPOR framework that generalizes an existing state-of-the-art algorithm.

Building Robotics Foundation Models with Reasoning in the Loop

Jiafei Duan University of Washington
(hosted by Christian Theobalt)
28 Jan 2026, 10:00 am - 11:00 am
Saarbrücken building E1 5, room 029
CIS@MPG Colloquium
Recent advances in generative AI have demonstrated the power of scaling: large language and vision models trained on internet-scale data now exhibit remarkable capabilities in perception, generation, and reasoning. These successes have inspired growing interest in bringing foundation-model paradigms to robotics, with the goal of moving beyond task-specific autonomy in constrained environments toward general-purpose robots that can operate robustly in open-world settings. However, robotics fundamentally differs from language and vision. Robot learning cannot rely on passive internet data at scale, and collecting large-scale, high-quality embodied interaction data remains expensive and slow. As a result, simply scaling data and model parameters is insufficient. To build general-purpose and robust robotics foundation models, we must instead ask: how can robots learn more from less data—and continue to improve over time? In this talk, I argue that reasoning in the loop offers a promising path forward. Rather than treating reasoning as a downstream capability applied after learning, I show how reasoning can be integrated directly into the learning process itself. This enables robots to learn from structured feedback, temporal context, and failure, thereby compensating for data scarcity and improving generalization. I will present a unified research agenda along three axes. First, I introduce approaches for spatial reasoning, enabling robots to ground language in 3D space and reason about object relationships for precise manipulation. Second, I discuss temporal reasoning, focusing on memory-centric models that retain, query, and reason over past observations to support long-horizon, high-precision control. Third, I show how reasoning over failures allows robots to understand why actions fail and use that understanding to self-improve, increasing robustness without additional supervision. 
Together, these results reframe robotics foundation models as systems that learn through reasoning, closing the loop between perception, action, and structured inference to enable self-improving autonomy.

On Predictive Accuracy and Fairness in Human-AI Teams

Nastaran Okati Max Planck Institute for Software Systems
26 Jan 2026, 1:30 pm - 2:30 pm
Kaiserslautern building G26, room 111
SWS Student Defense Talks - Thesis Defense
Human-AI teams are increasingly deployed across high-stakes domains such as medical diagnosis, content moderation, and recruitment. These teams strive to leverage the strengths of both human decision-makers and AI models, while mitigating their respective weaknesses. For instance, such hybrid systems can combine the efficiency and quality of AI-generated decisions with human experience and domain knowledge. Moreover, they can be free of statistical biases in AI models and cognitive biases inherent to human decision-making. The combined system's performance can hence surpass that of the human or the AI model in isolation, achieving human-AI complementarity.

Despite the prevalence of such hybrid teams, most existing approaches remain heuristic-driven, ignore human users and their error models, and hence fail to optimize the human-AI team performance. My thesis strives to close these gaps and fully unlock this potential for complementarity by (i) optimizing for human-AI collaboration, rather than model-centric performance—ensuring these models best support and complement human decision-makers who utilize them—and (ii) designing efficient post-processing algorithms that ensure fairness of high-stakes decisions made by these teams—thereby supporting people who are affected by such decisions.
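As a toy illustration of human-AI complementarity (my own sketch, not the thesis's algorithms), even a simple confidence-based deferral rule can beat both the model alone and the human alone on synthetic data where their strengths differ: the model handles the easy cases it is confident about, and the human handles the rest.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 10_000
# Synthetic population: 30% "hard" cases where the human outperforms the model.
hard = rng.random(n) < 0.3
model_conf = np.where(hard, rng.uniform(0.5, 0.7, n), rng.uniform(0.8, 1.0, n))
model_right = rng.random(n) < model_conf                 # model accuracy tracks its confidence
human_right = rng.random(n) < np.where(hard, 0.9, 0.7)   # human is best on hard cases

def team_accuracy(threshold):
    # Defer to the human whenever model confidence falls below the threshold.
    defer = model_conf < threshold
    return np.mean(np.where(defer, human_right, model_right))

print(team_accuracy(0.0))    # model alone
print(team_accuracy(1.1))    # human alone
print(team_accuracy(0.75))   # team: higher than either alone
```

Learning when to defer, and doing so fairly, is of course far subtler than a fixed confidence threshold; this sketch only shows why the combined system can surpass either party in isolation.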

Where are all the fixed points?

Benjamin Kaminski Fachrichtung Informatik - Saarbrücken
14 Jan 2026, 12:15 pm - 1:15 pm
Saarbrücken building E1 5, room 002
Joint Lecture Series
Fixed points are a recurring theme in computer science and related fields: shortest paths, game equilibria, semantics and verification, social choice theory, or dynamical systems are only a few of many instances. Various fixed point theorems - e.g. the famous Kleene fixed point theorem - state that fixed points emerge as limits of suitably seeded fixed point iterations. 
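The Kleene-style iteration mentioned above can be sketched in a few lines of Python (my own illustration, not the AIC formalism): on a finite lattice, iterating a monotone function from the bottom element converges to its least fixed point.

```python
def kleene_lfp(f, bottom):
    """Iterate f from the bottom element until the value stabilizes. For a
    monotone f on a lattice without infinite ascending chains, Kleene's
    theorem says this limit is the least fixed point of f."""
    x = bottom
    while True:
        y = f(x)
        if y == x:
            return x
        x = y

# Example: reachability in a graph is the least fixed point of
# F(S) = {source} | successors(S), seeded at the empty set.
edges = {0: {1}, 1: {2}, 2: {0}, 3: {4}, 4: set()}
F = lambda S: {0} | {m for n in S for m in edges[n]}
print(kleene_lfp(F, set()))   # {0, 1, 2}: exactly the nodes reachable from 0
```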

I will showcase in this talk a purely algebraic way of reasoning about fixed points, which we call the algebra of iterative constructions (AIC). It led us to discover novel fixed point theorems as well as to prove well-established ones (e.g., Kleene, Tarski-Kantorovich, and k-induction) fully algebraically. As for the novel fixed point theorems, we obtain (in a suitable setting) a method that maps any point to two canonical corresponding fixed points of a function by way of a limit of a fixed point iteration.

Our algebra is mechanized in Isabelle/HOL. With the mechanized algebra available, Isabelle's sledgehammer tool is able to find proofs of fixed point theorems fully automatically, whereas it cannot find such proofs relying only on Isabelle's standard libraries.

From the audience I would like to learn: Do you encounter fixed points or fixed point problems in your work? Do you perhaps work on algorithms for solving fixed point problems?

I look forward to talking to you!

LaissezCloud: A Resource Exchange Platform for the Public Cloud

Tejas Harith Max Planck Institute for Software Systems
16 Dec 2025, 3:00 pm - 4:00 pm
Saarbrücken building E1 5, room 029
SWS Student Defense Talks - Qualifying Exam
Resource and carbon efficiency in public clouds is poor and costs are high. This affects operators and tenants alike. We argue that the current rigid cloud pricing interface is to blame. Improving efficiency requires dynamic coordination between operator and tenants, but also among tenants. While comparatively easy for clusters where operator and applications belong to a single administrative domain, the cloud setting makes this challenging with mutually untrusted tenants and operators and the broad set of workloads and applications. We address this with LaissezCloud, a new cloud resource management platform that enables continuous resource re-negotiation and re-allocation. Rather than agreeing to a fixed price on resource allocation, LaissezCloud operators and tenants continuously re-negotiate resource prices during execution. Our key insight is that pricing provides a narrow waist that enables the cloud to align incentives between operator and tenants as well as among tenants. Tenants decide the price they are willing to pay based on the current value of a resource during execution. Operators in turn price in current demand as well as infrastructure concerns, such as current power availability, cooling capacity, or carbon intensity. We demonstrate that LaissezCloud scales to typical cloud infrastructure, that applications are easy to adapt to dynamic resource negotiation, and that LaissezCloud improves resource efficiency.
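The price-as-narrow-waist idea can be illustrated with a toy market loop (my own sketch, not LaissezCloud's actual mechanism): the operator nudges a unit price until tenant demand matches available capacity, so the resource ends up with the tenants who currently value it most.

```python
# Toy market: each tenant wants one unit and has a private value for it.
values = [9.0, 7.5, 6.0, 4.0, 2.5, 1.0]
capacity = 3   # units the operator can supply

def demand(price):
    # A tenant bids whenever its current value for the resource meets the price.
    return sum(v >= price for v in values)

price, step = 0.0, 0.1
for _ in range(1000):
    excess = demand(price) - capacity
    if excess == 0:
        break
    # The operator nudges the unit price toward market clearing: up under
    # contention, down when capacity would sit idle.
    price += step if excess > 0 else -step

print(price, demand(price))   # clears with demand == capacity
```

In the real system, tenant values and operator costs change continuously (load, power, carbon intensity), which is why re-negotiation has to run during execution rather than once at allocation time.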

Collaborative Prediction via Tractable Agreement Protocols

Ira Globus-Harris Cornell University
(hosted by Manuel Gomez Rodriguez)
09 Dec 2025, 10:30 am
Kaiserslautern building G26, room 111
SWS Colloquium
Designing effective collaboration between humans and AI systems is crucial for leveraging their complementary abilities in complex decision tasks. But how should agents possessing unique knowledge, like a human expert and an AI model, interact to reach decisions better than either could alone? In this talk, I will introduce a collection of tools based in machine learning theory and algorithmic game theory that allow us to develop efficient "collaboration protocols": parties iteratively exchange only low-dimensional information (their current predictions or best-response actions), without needing to share underlying features, yet their final predictions are provably competitive with those of an optimal predictor with access to their joint features. Together, these results offer a new foundation for building systems that achieve the power of pooled knowledge through tractable interaction alone.

Illuminating Generative AI: Mapping Knowledge in Large Language Models

Abhilasha Ravichander Max Planck Institute for Software Systems
03 Dec 2025, 12:15 pm - 1:15 pm
Saarbrücken building E1 5, room 002
Joint Lecture Series
Millions of everyday users are interacting with technologies built with generative AI, such as voice assistants, search engines, and chatbots. While these AI-based systems are being increasingly integrated into modern life, they can also magnify risks, inequities, and dissatisfaction when providers deploy unreliable systems. A primary obstacle to building more reliable systems is the opacity of the underlying large language models—we lack a systematic understanding of how models work, where critical vulnerabilities may arise, why they arise, and how models must be redesigned to address them. In this talk, I will first describe my work on investigating large language models to illuminate when models acquire knowledge and capabilities. Then, I will describe my work on building methods that enable data transparency for large language models, allowing practitioners to make sense of the information available to models. Finally, I will describe work on understanding why large language models produce incorrect knowledge, and its implications for building the next generation of responsible AI systems.

Boosting — Empowering Citizens with Behavioral Science

Ralph Hertwig Max Planck Institute for Human Development
(hosted by Krishna Gummadi)
26 Nov 2025, 12:15 pm - 1:15 pm
Kaiserslautern building G26, room 111
AICS Distinguished Speaker Colloquium
Behavioral public policy came to the fore with the introduction of nudging, which aims to steer behavior while maintaining freedom of choice. Responding to critiques of nudging (e.g., that it does not promote agency and relies on benevolent choice architects), other behavioral policy approaches focus on empowering citizens. Here we review boosting, a behavioral policy approach that aims to foster people's agency, self-control, and ability to make informed decisions. It is grounded in evidence from behavioral science showing that human decision making is not as notoriously flawed as the nudging approach assumes. We argue that addressing the challenges of our time—such as climate change, pandemics, and the threats to liberal democracies and human autonomy posed by digital technologies and choice architectures—calls for fostering capable and engaged citizens as a first line of response to complement slower, systemic approaches. Boosts can be delivered through different means, one being digital tools — the talk will give a few illustrative examples.

Curriculum Design for Reinforcement Learning Agents

Georgios Tzannetos Max Planck Institute for Software Systems
24 Nov 2025, 2:30 pm - 3:30 pm
Saarbrücken building E1 5, room 029
SWS Student Defense Talks - Thesis Proposal
Reinforcement learning (RL) enables agents to learn complex behaviours and excel in various domains such as robotics, gaming, and large language models (LLMs). Despite these successes, RL algorithms remain inefficient, rendering the training process challenging and limiting their broader application in real-world settings. Motivated by the importance of curricula in pedagogical domains, there is a growing interest in leveraging curriculum strategies when training agents in challenging environments. However, existing methods for automatic curriculum design typically require domain-specific hyperparameter tuning, rely on expensive optimization procedures, or have limited theoretical underpinnings. To address these limitations, we design different curriculum strategies grounded in the pedagogical concept of Zone of Proximal Development. The theoretical and empirical analysis across multiple domains affirms the effectiveness of our strategies. In particular, our strategies are shown to improve the training efficiency of agents under different learning objectives, including uniform performance, target performance, and constrained performance. Finally, addressing a real-world LLM deployment scenario, we show how our curriculum strategy improves the inference-time efficiency of LLMs by compressing models’ chain-of-thought reasoning process.
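The Zone of Proximal Development idea can be caricatured as always training on tasks that are neither trivial nor hopeless for the current agent. The sketch below is hypothetical (the function name and the 0.5 target are invented here) and is not one of the thesis's curriculum strategies, only the intuition behind them.

```python
def zpd_pick(tasks, success_prob, target=0.5):
    """Toy Zone-of-Proximal-Development task selection: prefer the
    task whose estimated success probability for the current agent
    is closest to an intermediate target -- hard enough to learn
    from, easy enough to sometimes succeed. An illustrative
    caricature, not an actual curriculum algorithm.
    """
    return min(tasks, key=lambda t: abs(success_prob[t] - target))

# With these (made-up) success estimates, the mid-difficulty task wins.
probs = {'easy': 0.95, 'medium': 0.55, 'hard': 0.05}
chosen = zpd_pick(['easy', 'medium', 'hard'], probs)  # 'medium'
```

In practice the success probabilities would themselves be estimated during training and the selection rule tuned per objective, which is where the hyperparameter and theory questions the abstract mentions arise.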

How to Manage a Hotel Desk? Stable Perfect Hashing in the Incremental Setting

Guy Even MPI-INF - D1
05 Nov 2025, 12:15 pm - 1:15 pm
Saarbrücken building E1 5, room 002
Joint Lecture Series
Many modern applications—from large-scale databases to network routers and genome repositories—depend on maintaining large dynamic sets of elements. Efficient management of these sets requires data structures that can quickly support insertions and deletions, answer queries such as "Is this element in the set?" or "What is the value associated with this element?", and assign distinct short keys to elements as the set grows.

The field of data structures is concerned with specifying functionality, abstracting computational models, designing efficient representations, and analyzing the running time and memory requirements of algorithms over these representations. Classical data structures developed for representing sets include dictionaries, retrieval data structures, filters, and perfect hashing.

In this talk, I will explore these issues through the lens of perfect hashing, a method for assigning each element a distinct identifier, or hashcode, with no collisions. We will focus on how to simultaneously satisfy several competing design goals:

Small space: using near-optimal memory proportional to the set’s size.

Fast operations: supporting constant-time insertions, deletions, and queries.

Low redundancy: keeping the range of hashcodes close to the set’s size.

Stability: ensuring that each element’s hashcode remains unchanged while it stays in the set.

Extendability: adapting automatically to unknown or growing data sizes.

This talk is based on joint work with Ioana Bercea.
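The "hotel desk" metaphor in the title can be made concrete with a toy key assigner: hand each arriving element the smallest free room number and recycle numbers on departure. This sketch (class name and interface invented here) illustrates only the stability and low-redundancy goals; it makes no attempt at the near-optimal space of a real perfect-hashing structure.

```python
import heapq

class StableKeyAssigner:
    """Toy stable hashcode assignment ('hotel desk' style).

    Gives each element the smallest currently free integer key,
    keeps that key fixed while the element stays in the set
    (stability), and recycles keys of deleted elements so the key
    range stays close to the set's size (low redundancy). A real
    perfect-hashing structure achieves this in near-optimal space;
    this sketch stores an explicit dictionary instead.
    """
    def __init__(self):
        self.key_of = {}    # element -> its stable key
        self.free = []      # min-heap of recycled keys
        self.next_key = 0   # smallest never-used key

    def insert(self, x):
        if x in self.key_of:
            return self.key_of[x]
        k = heapq.heappop(self.free) if self.free else self._fresh()
        self.key_of[x] = k
        return k

    def _fresh(self):
        k = self.next_key
        self.next_key += 1
        return k

    def delete(self, x):
        heapq.heappush(self.free, self.key_of.pop(x))

    def query(self, x):
        return self.key_of.get(x)   # None if x is not in the set

desk = StableKeyAssigner()
desk.insert('alice')   # gets key 0
desk.insert('bob')     # gets key 1
desk.delete('alice')   # key 0 becomes free
desk.insert('carol')   # reuses key 0
```

The interesting part of the talk's setting is doing this with hashing in near-optimal memory while the set grows (extendability), which this dictionary-based toy sidesteps entirely.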

Accountable Multi-Agent Sequential Decision Making

Stelios Triantafyllou Max Planck Institute for Software Systems
30 Oct 2025, 11:00 am - 12:00 pm
Saarbrücken building E1 5, room 029
SWS Student Defense Talks - Thesis Proposal
As AI agents increasingly engage in high-stakes decision making, it is essential to assess their accountability in ways that are both fair and interpretable. This involves explaining expected or realized outcomes of multi-agent systems and attributing responsibility for those outcomes to the participating agents. Addressing these challenges is key to fostering societal trust and easing the adoption of AI decision makers. This thesis investigates accountability in multi-agent sequential decision making. We develop methods to attribute responsibility for observed outcomes and overall system performance, design efficient approximation algorithms for otherwise intractable attribution problems, and introduce causal tools to explain how agents’ decisions influence outcomes. Together, these contributions establish theoretical foundations and practical tools for accountable decision making, drawing on and integrating insights from causality, multi-agent reinforcement learning and game theory.

A Logical Foundation For Multi-Language Interoperability

Brigitte Pientka McGill University
(hosted by Derek Dreyer)
28 Oct 2025, 10:30 am - 11:30 am
Saarbrücken building E1 5, room 029
SWS Colloquium
Today’s software systems are complex and often made up of parts written in different programming languages with different computational and memory management strategies. This allows programmers to combine different languages and choose the most suitable one for a given problem. It also allows the gradual migration of existing projects from one language to another, or to reuse existing source code.

While this flexibility offers clear advantages, it also introduces significant challenges, as different programming languages may have fundamentally different implementations and may use different runtime environments, which are hard to combine. As a consequence, composing parts written in different languages often results in complex interfaces between languages, insufficient flexibility, poor performance, and hard-to-diagnose errors. This lack of interoperability support can lead to subtle bugs and security vulnerabilities, especially in large or long-lived systems.

Existing foundations for interoperability often assume that we compile languages into a common low-level language. This is, however, not always realistic. We propose a logical foundation for interoperability where we retain the static and operational semantics of each part. It is grounded in adjoint logic -- a logic that unifies a wide collection of logics through the up-shift and down-shift modalities. We give a Curry-Howard interpretation of this logic where we use the down-shift modality to model foreign function calls, and the up-shift modality to model runtime code generation and execution. Our system is parametric in a user-defined collection of languages and their accessibility relation, which controls how languages interact with each other, allowing the interoperability of various combinations of languages. We sketch the statics and an operational semantics together with properties such as accessibility safety, which ensures that languages respect their user-defined boundaries, alongside type safety. Finally, if time permits, we outline how we have used this foundation to reason about the interoperability between languages with fundamentally different runtime implementations, such as the interoperability between a quantum and a purely functional language.

Counterfactual Reasoning and Uncertainty Quantification for AI-Assisted Decision Making

Nina Corvelo Benz Max Planck Institute for Software Systems
15 Oct 2025, 4:00 pm - 5:00 pm
Kaiserslautern building G26, room 111
SWS Student Defense Talks - Thesis Defense
Artificial intelligence (AI) systems are increasingly being used to support human experts in various domains such as healthcare, education, and the judicial system. The aim of these systems is complementarity—leveraging the strengths of each side, human and AI, to compensate for the weaknesses of the other. In most such systems, the human expert makes decisions based on a prediction by the AI model and their own judgment. However, models designed for automated decision making are typically trained in isolation and do not take into account the human decision maker when making predictions. As a result, when these AI models are used in decision support systems, their predictions may not be helpful, undermining the human expert’s trust in the AI model and leading to no improvement in their decisions. To address this, the thesis focuses on the design of AI-based decision support systems that leverage the interaction with the expert through counterfactual reasoning and uncertainty quantification. It proposes decision support systems for three distinct decision-making contexts, where each one is based on a novel methodological approach and is evaluated with experiments using real-world data or a human subject study.

Can machine learning revolutionize biomarker discovery?

Karsten Borgwardt Max-Planck-Institut für Biochemie
(hosted by Manuel Gomez Rodriguez)
15 Oct 2025, 12:15 pm - 1:15 pm
Kaiserslautern building G26, room 111
AICS Distinguished Speaker Colloquium
Machine learning has transformed many areas of science and technology, including the life sciences, most prominently through its breakthrough impact on protein structure prediction, recognized by the 2024 Nobel Prize in Chemistry. An open question, however, is whether machine learning can have a similarly profound impact on biomarker discovery, that is, the identification of biological properties that predict system functions or phenotypes. Biomarker discovery is a key topic for advancing biology and medicine. In this talk, I will present our efforts to harness machine learning for biomarker discovery, summarize our algorithmic contributions, and discuss the opportunities and challenges in this field.

From Exploits to Defenses: Building Trustworthy Digital Systems

Thorsten Holz Max Planck Institute for Security and Privacy
(hosted by Krishna Gummadi)
17 Sep 2025, 12:15 pm - 1:15 pm
Kaiserslautern building G26, room 111
AICS Distinguished Speaker Colloquium
Building trustworthy software systems has become increasingly challenging as complexity grows across the hardware-software stack. Adversaries exploit sophisticated techniques such as return-oriented programming and timing side channels to bypass traditional defenses and compromise critical components. This talk examines these classes of low-level attacks and presents defenses we have developed, including control-flow integrity mechanisms and memory tagging. I will further discuss how automated approaches such as fuzzing can help us to systematically expose latent vulnerabilities and strengthen the design of security-critical systems, aiming for resilience against both current and emerging threats. I will conclude with an overview of future challenges.

The fine-grained complexity of NFA intersection emptiness

Neha Rino University of Warwick
12 Sep 2025, 10:00 am - 11:00 am
Kaiserslautern building G26, room 111
SWS Colloquium
Given some integer k, intersection emptiness of k Nondeterministic Finite Automata (NFA k-IE) is a fundamental problem in automata theory, with applications across computer science, from arithmetic theories and model checking to graph database queries. In this talk, I will discuss some results regarding the fine-grained complexity of NFA k-IE. Informally, what I mean by fine-grained complexity is that we want to know (up to logarithmic factors) the runtime complexity of our algorithms, and argue why they cannot be improved by relating them to the state of the art in solving long-standing hard problems like SAT.

I will demonstrate (what we believe to be) a new algorithm for solving NFA k-IE. If all the NFAs have n states and m transitions, our algorithm runs in time O(n^{k-1}m), compared to the O(m^k) runtime of the classic Cartesian product approach. I will also present a matching lower bound subject to the Combinatorial k-Clique hypothesis, and a barrier to tight SETH-based lower bounds. This is joint work with Dmitry Chistikov, at the University of Warwick, UK.
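For context, the classic Cartesian product approach mentioned above can be sketched as an on-the-fly breadth-first search over tuples of states, one component per automaton. This is an illustrative baseline (the dict-based NFA encoding is invented here), not the new O(n^{k-1}m) algorithm from the talk.

```python
from collections import deque
from itertools import product

def intersection_empty(nfas):
    """Return True iff the intersection of the given NFAs is empty.

    Each NFA is a dict: {'start': set, 'accept': set,
    'delta': {(state, symbol): set of successor states}}.
    Classic Cartesian-product approach: BFS over tuples of states,
    one per automaton; the worst-case cost grows with the product
    of the automata sizes (the O(m^k) baseline from the abstract).
    """
    alphabet = {sym for nfa in nfas for (_, sym) in nfa['delta']}
    frontier = deque(product(*(nfa['start'] for nfa in nfas)))
    seen = set(frontier)
    while frontier:
        states = frontier.popleft()
        if all(q in nfa['accept'] for q, nfa in zip(states, nfas)):
            return False  # a word accepted by all automata exists
        for sym in alphabet:
            # advance every automaton on the same symbol simultaneously
            for nxt in product(*(nfa['delta'].get((q, sym), ())
                                 for q, nfa in zip(states, nfas))):
                if nxt not in seen:
                    seen.add(nxt)
                    frontier.append(nxt)
    return True  # no accepting product state is reachable

# Example: A1 accepts words containing 'a'; A2 accepts words containing 'b'.
a1 = {'start': {0}, 'accept': {1},
      'delta': {(0, 'a'): {1}, (0, 'b'): {0}, (1, 'a'): {1}, (1, 'b'): {1}}}
a2 = {'start': {0}, 'accept': {1},
      'delta': {(0, 'b'): {1}, (0, 'a'): {0}, (1, 'a'): {1}, (1, 'b'): {1}}}
# intersection_empty([a1, a2]) is False, e.g. 'ab' is accepted by both
```

The talk's contribution is precisely that, for k automata with n states and m transitions each, one can beat this product construction's O(m^k) bound.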

Permissive Assumptions in Logical Controller Synthesis for Cyber-Physical Systems

Satya Prakash Nayak Max Planck Institute for Software Systems
09 Sep 2025, 1:00 pm - 2:00 pm
Kaiserslautern building G26, room 111
SWS Student Defense Talks - Thesis Proposal
The automatic construction of correct-by-design systems has emerged as a central challenge in cyber-physical systems (CPS). As CPS combine discrete logical decision-making with continuous physical processes, their correctness requires reasoning both about logical specifications and real-time physical behavior. A common approach is to abstract the physical dynamics into a discrete plant model and synthesize a logical controller for this plant that ensures the specification. Such approaches often rely on a set of assumptions about how the system interacts with its environment. However, these assumptions are typically overly restrictive and fail to capture the full range of behaviors that the environment can exhibit, leading to conservative or inflexible controllers. This thesis addresses these limitations by proposing frameworks for synthesizing controllers under more permissive assumptions.

The first part of the thesis considers permissiveness in the interactions between multiple discrete logical components, where assumptions restrict the behavior of other components. We propose new methods to compute permissive assumptions that capture all cooperative behavior of other components. Building on this, we propose a negotiation-based approach to compute assume-guarantee contracts between components, allowing components to retain flexibility while ensuring correctness in a distributed setting.

The second part of the thesis addresses permissiveness in the interaction between high-level logic and low-level physical dynamics, where assumptions on the plant model restrict the behavior of the physical environment. We develop a new class of assumptions that capture richer behaviors of low-level controllers, enabling logical controllers to adapt seamlessly to changes by the external environment. To ensure scalability for large plant models, we propose a universal controller framework where controller decisions are conditioned on future branching-time properties of the plant model, learned from a small set of representative plant models. Finally, we introduce a robust semantics for branching-time temporal logics that allows reasoning under uncertainty or partial violations of assumptions without increasing computational complexity.

Algorithmic Problems for Linear Recurrence Sequences

Joris Nieuwveld Max Planck Institute for Software Systems
05 Sep 2025, 3:00 pm - 4:00 pm
Saarbrücken building E1 5, room 029
SWS Student Defense Talks - Thesis Defense
Linear recurrence sequences (LRS) are among the most fundamental and easily definable classes of number sequences, encompassing many classical sequences such as polynomials, powers of two, and the Fibonacci numbers. They also describe the dynamics of iterated linear maps and arise naturally in numerous contexts within computer science, mathematics, and other quantitative sciences. However, despite their simplicity, many easy-to-state decision problems for LRS have stubbornly remained open for decades despite considerable and sustained attention. Chief among these are the Skolem problem and the Positivity problem, which ask to determine, for a given LRS, whether it contains a zero term and whether it contains only positive terms, respectively. For both problems, decidability, i.e., whether they are algorithmically solvable at all, is currently open.

In this thesis, we present the following results. For the Skolem problem, we introduce an algorithm for simple LRS whose correctness is unconditional but whose termination relies on two classical, widely believed number-theoretic conjectures. This algorithm is implementable in practice, and we report on experimental results. For the Positivity problem, we introduce the notion of reversible LRS, which enables us to carve out a large decidable class of sequences. We also examine various expansions of classical logics by predicates obtained from LRS. In particular, we study expansions of monadic second-order logic of the natural numbers with order and present major advances over the seminal results of Büchi, Elgot, and Rabin from the early 1960s. Finally, we investigate fragments of Presburger arithmetic, where, among others, we establish the decidability of the existential fragment of Presburger arithmetic expanded with powers of 2 and 3.
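For readers unfamiliar with LRS, a small sketch makes the objects and the Skolem question concrete. The helper names below are invented here, and the bounded zero search is only an illustration: the Skolem problem asks the unbounded question, whose decidability is what remains open.

```python
def lrs_terms(coeffs, init, n):
    """First n terms of the LRS defined by
    u_{t+d} = c_1 * u_{t+d-1} + ... + c_d * u_t,
    with coeffs = [c_1, ..., c_d] and init = [u_0, ..., u_{d-1}].
    """
    u = list(init)
    d = len(coeffs)
    while len(u) < n:
        u.append(sum(c * x for c, x in zip(coeffs, reversed(u[-d:]))))
    return u[:n]

def has_zero_up_to(coeffs, init, bound):
    """Bounded check for a zero term. The Skolem problem asks this
    with no bound, and no algorithm is known to decide it in general."""
    return any(t == 0 for t in lrs_terms(coeffs, init, bound))

# Fibonacci: u_{t+2} = u_{t+1} + u_t, starting 0, 1
fib = lrs_terms([1, 1], [0, 1], 10)   # [0, 1, 1, 2, 3, 5, 8, 13, 21, 34]
# u_{t+2} = 2*u_{t+1} - u_t with u_0 = -3, u_1 = -2 hits zero at t = 3
```

The subtlety is that term magnitudes can grow exponentially while zeros, if any, may appear arbitrarily late, so no finite bound suffices in general.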

Supporting Human-Human Communication: Towards a Proactive AI Paradigm

Christian Danescu-Niculescu-Mizil Cornell University
(hosted by Krishna Gummadi)
05 Sep 2025, 1:30 pm - 2:30 pm
Kaiserslautern building G26, room 607
AICS Distinguished Speaker Colloquium
Recent years have seen a gold rush towards replacing people with AI agents in communication: they can serve as your therapist, your tutor, your financial advisor, your interviewer. In this talk I will propose a contrasting vision: one where AI is used to support humans in their communication while preserving their agency. Achieving this vision requires moving beyond the transactional paradigm embodied by current generative AI systems, which are designed to fulfill the immediate goals of a single person, such as answering a question, solving a math problem, booking a flight, or (repeatedly) replying in character. To meaningfully support human-human communication without disrupting or supplanting it, an AI system must instead follow a proactive paradigm: it needs to decide when to intervene to offer support as the interaction unfolds, rather than wait to be explicitly prompted, as AI agents and chatbots do today. In this talk I will present initial progress on AI technologies that enable such a proactive mode of operation, and demonstrate communication support tools that embody it.

Strategic and counterfactual reasoning in AI-assisted decision making

Efstratios Tsirtsis Max Planck Institute for Software Systems
19 Aug 2025, 2:30 pm - 3:30 pm
Kaiserslautern building G26, room 111
SWS Student Defense Talks - Thesis Defense
From finance and healthcare to criminal justice and transportation, many domains that involve high-stakes decisions, traditionally made by humans, are increasingly integrating artificial intelligence (AI) systems into their decision making pipelines. While recent advances in machine learning and optimization have given rise to AI systems with unprecedented capabilities, fully automating such decisions is often undesirable. Instead, a promising direction lies in AI-assisted decision making, where AI informs or complements human decisions without completely removing human oversight. In this talk, I will present my PhD work on AI-assisted decision making in settings where humans rely on two core cognitive capabilities: strategic reasoning and counterfactual reasoning. First, I will introduce game-theoretic methods for supporting policy design in strategic environments, enabling a decision maker to allocate resources (e.g., loans) to individuals who adapt their behavior in response to transparency regarding the decision policy. Next, I will present methods to enhance a decision maker’s counterfactual reasoning process—identifying key past decisions (e.g., in clinical treatments) which, if changed, could have improved outcomes and, hence, serve as valuable learning signals. Finally, I will discuss a computational model of how people attribute responsibility between humans and AI systems in collaborative settings, such as semi-autonomous driving, evaluated through a human subject study. I will conclude with key takeaways and future directions for designing AI systems that effectively support and interact with humans.