Recent events

Foundation models of human cognition

Marcel Binz University of Marburg (hosted by Yixin Zou)
13 Feb 2025, 10:00 am - 11:00 am
Bochum building MPI-SP, room MB1SMMW106
CIS@MPG Colloquium
Most cognitive models are domain-specific, meaning that their scope is restricted to a single type of problem. The human mind, on the other hand, does not work like this -- it is a unified system whose processes are deeply intertwined. In this talk, I will present my ongoing work on foundation models of human cognition: models that can not only predict behavior in a single domain but instead offer a truly universal take on our mind. Furthermore, I will outline my vision for how to use such behaviorally predictive models to advance our understanding of human cognition, as well as how they can be scaled to naturalistic environments.

Measuring and Improving Fairness and Resilience Across the Blockchain Ecosystem

Lioba Heimbach ETH Zurich (hosted by Krishna Gummadi)
12 Feb 2025, 10:00 am - 11:00 am
Kaiserslautern building G26, room 111
CIS@MPG Colloquium
The multi-layer blockchain architecture presents unique challenges, as issues in one layer can amplify or cause problems in another. These layers include the network layer, a peer-to-peer (P2P) network responsible for information dissemination; the consensus layer, where nodes reach agreement on the blockchain’s state; and the application layer, which hosts decentralized finance (DeFi) applications. My research investigates the interactions between these layers and the resulting challenges, with the goal of enhancing fairness and resilience in the blockchain ecosystem. In the first part of this talk, I will explore how financial value originating at the application layer can threaten the consensus layer, focusing on non-atomic arbitrage — arbitrage between on-chain and off-chain exchanges. I will demonstrate how this value, despite originating at the application layer, introduces centralizing forces and security vulnerabilities in the consensus layer. In the second part, I will show how these dynamics operate in the opposite direction. Specifically, I will highlight privacy flaws in Ethereum’s P2P network that threaten the consensus layer by enabling attacks targeting application layer value. As I will demonstrate, the P2P network leaks validator locations. This vulnerability allows malicious validators (i.e., consensus layer participants) to launch targeted attacks on validators handling blocks with significant application layer value and scoop the value from those blocks.
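To make the notion of non-atomic arbitrage concrete, here is a minimal sketch of the value at stake, assuming hypothetical prices and costs; it only illustrates the two-legged trade, not the measurement methodology discussed in the talk.

```python
# Hypothetical prices for the same asset on an on-chain DEX and an off-chain CEX
# (all numbers are assumed for illustration).
dex_price = 3010.0     # asset price on the decentralized exchange, in USD
cex_price = 3050.0     # asset price on the centralized exchange, in USD
size = 10.0            # trade size in units of the asset
gas_and_fees = 120.0   # assumed total execution cost, in USD

# Non-atomic arbitrage: buy cheaply on-chain and sell off-chain in two separate legs.
# The on-chain leg's inclusion and ordering are valuable, so part of this profit can
# flow to block proposers -- the application-layer value that pressures the consensus layer.
profit = size * (cex_price - dex_price) - gas_and_fees
print(f"expected arbitrage profit: {profit:.2f} USD")
```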

How Do We Evaluate and Mitigate AI Risks?

Maksym Andriushchenko EPFL (hosted by Christof Paar)
11 Feb 2025, 10:00 am - 11:00 am
Bochum building MPI-SP, room MB1SMMW106
CIS@MPG Colloquium
AI has made remarkable progress in recent years, enabling groundbreaking applications but also raising serious safety concerns. This talk will explore the robustness challenges in deep learning and large language models (LLMs), demonstrating how seemingly minor perturbations can lead to critical failures. I will present my research on evaluating and mitigating AI risks, including adversarial robustness, LLM jailbreak vulnerabilities, and the broader implications of AI safety. By developing rigorous benchmarks, novel evaluation methods, and foundational theoretical insights, my work aims to provide effective safeguards for AI deployment. Ultimately, I advocate for a systematic approach to AI risk mitigation that integrates technical solutions with real-world considerations to ensure the safe and responsible use of AI systems.
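As a minimal illustration of how a seemingly minor perturbation can change a model's prediction, the sketch below applies the standard fast gradient sign method (FGSM) to a toy PyTorch classifier; the model, input, and label are assumed placeholders, not the benchmarks discussed in the talk.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Toy classifier standing in for any differentiable model (assumption).
model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10))
x = torch.rand(1, 1, 28, 28)   # a single "image"
label = torch.tensor([3])      # its (assumed) true class

def fgsm_perturb(model, x, label, eps=0.03):
    """One-step fast gradient sign method: a small, bounded perturbation of the input."""
    x_adv = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x_adv), label)
    loss.backward()
    # Step in the direction that increases the loss the most, then re-clip to valid pixels.
    return (x_adv + eps * x_adv.grad.sign()).clamp(0, 1).detach()

x_adv = fgsm_perturb(model, x, label)
print(model(x).argmax().item(), model(x_adv).argmax().item())  # predictions may differ
```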

Spatio-Temporal AI for Long-term Human-centric Robot Autonomy

Lukas Schmid Massachusetts Institute of Technology (hosted by Bernt Schiele)
10 Feb 2025, 10:00 am - 11:00 am
Saarbrücken building E1 5, room 029
CIS@MPG Colloquium
The ability to build an actionable representation of a robot's environment is crucial for autonomy and a prerequisite to a wide variety of applications, ranging from home, service, and consumer robots to social, care, and medical robotics to industrial, agricultural, and disaster-response applications. Notably, a large part of the promise of autonomous robots depends on long-term operation in domains shared with humans and other agents. These environments are typically highly complex, semantically rich, and highly dynamic, with agents frequently moving through and interacting with the scene. This talk presents an autonomy pipeline combining perception, prediction, and planning to address these challenges. We first present methods to detect and represent complex semantics, short-term motion, and long-term changes for real-time robot perception in a unified framework called Khronos. We then show how Dynamic Scene Graphs (DSGs) can represent semantic symbols in a task-driven fashion and facilitate reasoning about the scene, such as predicting likely future outcomes based on the data the robot has already collected. Lastly, we show how robots as embodied agents can leverage these actionable scene representations and predictions to complete tasks such as actively gathering data that helps them improve their world models, perception, and action capabilities fully autonomously over time. The presented methods are demonstrated on board fully autonomous aerial, legged, and wheeled robots, run in real-time on mobile hardware, and are available as open-source software.
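To give a flavor of what an actionable, time-aware scene representation might look like, here is a minimal sketch of a timestamped scene graph; it is a simplified illustration only, not the Khronos or DSG implementations presented in the talk, and all labels are assumed.

```python
from dataclasses import dataclass, field

@dataclass
class SceneNode:
    node_id: int
    label: str                 # semantic symbol, e.g. "mug" or "kitchen"
    layer: str                 # e.g. "object", "place", "room"
    last_seen: float = 0.0     # timestamp of the latest supporting observation
    attributes: dict = field(default_factory=dict)

@dataclass
class SceneGraph:
    nodes: dict = field(default_factory=dict)   # node_id -> SceneNode
    edges: set = field(default_factory=set)     # (parent_id, child_id): containment/support

    def observe(self, node: SceneNode, t: float):
        """Insert or refresh a node; nodes not re-observed for a long time
        can later be treated as long-term changes (e.g. moved objects)."""
        node.last_seen = t
        self.nodes[node.node_id] = node

g = SceneGraph()
g.observe(SceneNode(1, "kitchen", "room"), t=0.0)
g.observe(SceneNode(2, "mug", "object"), t=5.0)
g.edges.add((1, 2))   # the mug is currently in the kitchen
print(g.nodes[2].last_seen)
```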

Leveraging Sociotechnical Security and Privacy to Address Online Abuse

Miranda Wei University of Washington (hosted by Krishna Gummadi)
06 Feb 2025, 10:00 am - 11:00 am
Kaiserslautern building G26, room 111
CIS@MPG Colloquium
The prevalence and severity of online abuse are on the rise, from toxic content on social media to image-based sexual abuse, as new technologies are weaponized by people who do harm. Further, this abuse disproportionately harms people already marginalized in society, creating unacceptable disparities in safety and reinforcing oppression. Working in the areas of both computer security and privacy (S&P) and human-computer interaction (HCI), I address online abuse as the next frontier of S&P challenges. In this talk, I discuss my approach to sociotechnical threat modeling that (1) characterizes emerging S&P threats in digital safety, with particular attention to the technical and societal factors at play, (2) evaluates existing support for addressing online abuse, taking an ecosystem-level perspective, and (3) develops conceptual tools that bridge S&P and HCI towards societally informed S&P research. I conclude by outlining how sociotechnical security and privacy can work towards a world where all people using technology feel safe and connected.

Practical Privacy via New Systems and Abstractions

Kinan Dak Albab Brown University (hosted by Peter Schwabe)
05 Feb 2025, 1:00 pm - 2:00 pm
Virtual talk
CIS@MPG Colloquium
Data privacy has become a focal point for public discourse. In response, data protection and privacy regulations have been enacted across the world, including the GDPR and CCPA, and companies make various promises to end-users in their privacy policies. However, high-profile privacy violations remain commonplace, in part because complying with privacy regulations and policies is challenging for applications and developers. This talk demonstrates how we can help developers achieve privacy compliance by designing new privacy-conscious systems and abstractions. In particular, it focuses on my work on Sesame (SOSP24), my system for end-to-end compliance with privacy policies in web applications. To provide practical guarantees, Sesame combines new static analysis for data leakage with advances in memory-safe languages and lightweight sandboxing, as well as standard industry practices like code review. My work in this area also includes K9db (OSDI23), a privacy-compliant database that supports compliance-by-construction with GDPR-style subject access requests. By creating privacy abstractions at the systems level, we can offer applications privacy guarantees by design, in order to simplify compliance and improve end-user privacy.
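As a toy illustration of what answering a GDPR-style subject access request involves (this is not K9db's actual interface; every table, field, and name below is assumed), one can think of collecting every row a data subject owns across all tables:

```python
# Toy relational data: each table tags rows with the data subject that owns them.
tables = {
    "users":    [{"owner": "alice", "email": "alice@example.com"}],
    "comments": [{"owner": "alice", "text": "hi"}, {"owner": "bob", "text": "yo"}],
}

def subject_access_request(subject: str) -> dict:
    """Collect every row associated with one data subject across all tables."""
    return {name: [row for row in rows if row.get("owner") == subject]
            for name, rows in tables.items()}

print(subject_access_request("alice"))
```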

Pushing the Boundaries of Modern Application-Aware Computing Stacks

Christina Giannoula University of Toronto (hosted by Peter Druschel)
03 Feb 2025, 10:00 am - 11:00 am
Saarbrücken building E1 5, room 029
CIS@MPG Colloquium
Modern computing systems encounter significant challenges related to data movement in applications, such as data analytics and machine learning. Within a compute node, the physical separation of the processor from main memory necessitates retrieving data through a narrow memory bus. In big-data applications running across multiple nodes, data must be exchanged via narrow network interconnects. This movement of data—both within and across compute nodes—causes significant performance and energy overheads in modern and emerging applications. Moreover, today’s general-purpose computing stacks overlook the particular data needs of individual applications, missing crucial opportunities for performance optimization. In this talk, I will present a cross-stack approach to designing application-aware computing stacks for cutting-edge applications, enabling new synergies between algorithms, systems software, and hardware. Specifically, I will demonstrate how integrating fine-grained application characteristics—such as input features, data access and synchronization patterns—across the layers of general-purpose computing stacks allows for tailoring stack components to meet the application’s specific data needs. This integration enables the stack components to work synergistically to reduce unnecessary or redundant data movement during application execution. I will present a few of my research contributions that propose hardware and software solutions for emerging applications, such as deep learning, capitalizing on the emerging processing-in-memory paradigm. Finally, I will conclude by outlining my future plans to design application-adaptive and sustainable computing stacks to significantly enhance performance and energy efficiency in cutting-edge applications.

Legal Research in the Era of Large Language Models

Christoph Engel Max Planck Institute for Research on Collective Goods
15 Jan 2025, 3:00 pm - 4:00 pm
Kaiserslautern building G26, room 111
SWS Colloquium
In a profound sense, the law is an applied field. It exists because society needs rules to function. Even if these rules are seemingly "bright line", in limit cases they require interpretation. Even more so if the rule in question confines itself to enunciating a normative program and leaves its implementation to the administration and the judiciary. The traditional response is hermeneutical. The legislator translates the normative intention into words. That way, it implicitly delegates spelling out what these words mean to the parties and the authorities involved in resolving the concrete conflicts of life. This sketch of the law’s mission explains the traditional character of legal research. If a researcher adopts an "inside view", she engages in a division of labor with practicing lawyers. The quintessential product of this research is a "commentary". The researcher summarizes the state-of-the-art thinking about a statutory provision (and maybe proposes an alternative reading). Alternatively, the researcher adopts an "outside view". In the spirit of a social scientist, she treats the law as her object of study. Typical products are more precise definitions of and empirical investigations into a class of social problems that legal rules are meant to address, or attempts at finding traces of judicial policy in the jurisprudence of a court. Large language models have the potential to deeply affect all of these strands of legal research. As the potential is more easily discernible for the "outside view", the talk will only briefly illustrate in which ways LLMs are likely to fuel this strand of legal research. It will drill deeper into the "inside view" and explain how an important part of this research, the summarization of the jurisprudence on a statutory provision (the guarantee of freedom of assembly in the German Constitution), can already today be delegated to an LLM. It is not difficult to predict that, ten years from now, legal research will look radically different.

Safety Analysis of Parameterized Networks with Non-Blocking Rendez-Vous and Broadcasts

Lucie Guillou IRIF, Paris, France
11 Dec 2024, 2:00 pm - 4:00 pm
Kaiserslautern building G26, room 113
SWS Colloquium
I will present two recent joint works with Arnaud Sangnier and Nathalie Sznajder. We study networks of processes that all execute the same finite protocol and communicate synchronously in two different ways: a process can broadcast one message to all other processes or send it to at most one other process. We study two coverability problems with a parameterised number of processes (state coverability and configuration coverability). It is already known that these problems are Ackermann-hard (and decidable) in the general case. This already holds when the processes communicate only by broadcasts. We show that when forbidding broadcasts, the two problems are EXPSPACE-complete. We also study a restriction on the protocol: a Wait-Only protocol has no state from which a process can send and receive messages. We show that without broadcasts, both problems are PTIME-complete in this restriction, and with broadcasts, state coverability is PTIME-complete and configuration coverability is PSPACE-complete.

Mechanistic interpretability of neural networks

Jonas Fischer MPI-INF - D2
06 Nov 2024, 12:15 pm - 1:15 pm
Saarbrücken building E1 5, room 002
Joint Lecture Series
Modern machine learning (ML) has largely been driven by neural networks, delivering outstanding results not only in traditional ML applications, but also permeating into the classical sciences, solving open problems in physics, chemistry, and biology. This success came at a cost, as typical neural network architectures are inherently complex and their reasoning processes opaque. In domains where insight weighs more than prediction, such as the modeling of systems biology, or in high-stakes decision making, such as healthcare and finance, the reasoning process of a neural network, however, needs to be transparent. Recently, mechanistic interpretability has gained significant attention; it is concerned with understanding the *internal reasoning process* of a network, including what information particular neurons respond to and how these neurons are organized into larger circuits. In this talk, I will (1) gently introduce the topic of mechanistic interpretability in machine learning, (2) show how to discover mechanistic circuits within a neural network, (3) discuss the relevance of mechanistic interpretability in real-world applications, and (4) discuss what is still missing in the field.
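A common first step in mechanistic work is simply recording what individual units respond to. The sketch below registers PyTorch forward hooks on a toy network to capture per-layer activations; the network and input are assumed stand-ins, not the models analyzed in the talk.

```python
import torch
import torch.nn as nn

# A toy network standing in for the model being interpreted (assumption).
net = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 4))

activations = {}

def make_hook(name):
    def hook(module, inputs, output):
        activations[name] = output.detach()   # record what this layer computed
    return hook

# Attach a hook to every linear layer so its activations are captured on each forward pass.
for name, module in net.named_modules():
    if isinstance(module, nn.Linear):
        module.register_forward_hook(make_hook(name))

_ = net(torch.randn(1, 8))                     # one forward pass fills `activations`
top_unit = activations['0'].abs().argmax()     # most strongly responding unit in layer '0'
print(top_unit.item())
```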

Semantics of declassification in higher-order languages

Jan Menz Max Planck Institute for Software Systems
29 Oct 2024, 5:00 pm - 6:00 pm
Saarbrücken building E1 5, room 029
SWS Student Defense Talks - Thesis Proposal
Declassification allows a programmer to purposefully reveal secret information. In my thesis, I will argue that treating higher-order programs explicitly and compositionally is important when dealing with declassification, and that, using purpose-built logical-relations semantics, we can reason about declassification in new ways while fully supporting higher-order programs. To support this claim, I will study two different styles of declassification and their semantics. In the first part of my thesis, I will show how to reason about where-declassification, an intensional declassification policy, in the higher-order setting by adapting logical-relations techniques developed for noninterference. This approach leads to the first compositional semantics for an intensional declassification policy in the higher-order setting. In the second part of my thesis, I turn towards a style of declassification that has already been studied more extensively in the higher-order setting using logical-relations semantics: what-declassification. I will present a new approach to what-declassification in higher-order languages. Specifically, I will present a novel dependency analysis using a dependent type system and show that what-declassification guarantees can be obtained from this dependency analysis using logical-relations semantics.
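A classic example of deliberate declassification, for intuition only (the thesis treats this semantically, via logical relations, rather than through such code): a password check releases exactly one bit of information about the secret, and nothing more.

```python
import hmac

SECRET_PASSWORD = "hunter2"   # high-security data (assumed example secret)

def check_password(guess: str) -> bool:
    # The comparison depends on the secret, but only the single boolean outcome is
    # released: a deliberate, policy-sanctioned declassification, where *what* is
    # revealed is bounded to equality of the secret with the guess.
    return hmac.compare_digest(SECRET_PASSWORD, guess)

print(check_password("letmein"))   # False: one bit about the secret leaks, by design
```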

Strategic and counterfactual reasoning in AI-assisted decision making

Stratis Tsirtsis Max Planck Institute for Software Systems
10 Oct 2024, 5:00 pm - 6:00 pm
Virtual talk
SWS Student Defense Talks - Thesis Proposal
From finance and healthcare to criminal justice and transportation, various domains that require critical decisions, traditionally made by humans, have increasingly incorporated artificial intelligence (AI) systems into the decision making process. Despite unprecedented advances in AI capabilities, in most real-world contexts, AI systems do not make highly consequential decisions autonomously; instead, they operate in conjunction with a human decision maker. The promise of human-AI collaboration lies in combining the complementary strengths of each to reach better decisions overall. In this thesis, I present contributions on multiple dimensions of AI-assisted decision making that range from technical methodologies to empirical findings. I start by focusing on ways that AI systems can support strategic reasoning—the ability of humans to make decisions based on expectations about others’ behavior—and I introduce algorithmic methods that support a human decision maker in allocating resources to a population of other (strategic) humans. Then, I switch focus to counterfactual reasoning—the ability to reflect on how different decisions could have led past events to turn out differently. In that context, I propose algorithmic methods to help a human decision maker improve their future decisions by highlighting specific past decisions that could have led to better outcomes and may serve as a good learning signal. I conclude the thesis by proposing a computational model based on counterfactual simulations to predict the extent to which people hold a human and an AI agent responsible for an outcome in cases where the two agents are working together towards a common goal. The performance of this model is evaluated through an experiment with human participants conducted online.

Computing Graph Isomorphisms and Similarities

Daniel Neuen MPI-INF - D1
02 Oct 2024, 12:15 pm - 1:15 pm
Saarbrücken building E1 5, room 002
Joint Lecture Series
Deciding if two graphs are isomorphic, i.e., whether they are structurally identical, is a fundamental computational problem that has many important applications. Despite extensive efforts in the past decades, graph isomorphism (GI) has remained one of only a few natural problems contained in NP that are neither known to be NP-complete nor known to be contained in P. In 2016, Babai proved in a breakthrough result that GI can be solved in quasipolynomial time. In the first part of my talk, I will give an overview of recent developments in isomorphism testing. A closely related problem is that of determining the similarity between two graphs. Arguably, this problem is even more relevant for many applications such as machine learning tasks on graph-structured data. Natural approaches such as counting the edge modifications required to turn one graph into the other are computationally intractable even on very restricted input graphs. As a result, various other methods have been proposed to measure similarity between graphs, such as graph motif counts, i.e., measuring similarity based on counting certain patterns appearing in the graphs. In the second part of the talk, we will consider different ways of measuring similarity and discuss relations to standard heuristics for isomorphism testing that have been discovered in recent years.
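For intuition, the sketch below contrasts exact isomorphism testing with a crude motif-style similarity signal, using networkx's built-in (not quasipolynomial) check on assumed toy graphs; it illustrates the two notions, not the algorithms surveyed in the talk.

```python
import networkx as nx

# Two small graphs: a 4-cycle and a path, plus a relabelled copy of the cycle.
cycle = nx.cycle_graph(4)
path = nx.path_graph(4)
relabelled = nx.relabel_nodes(cycle, {0: "a", 1: "b", 2: "c", 3: "d"})

print(nx.is_isomorphic(cycle, relabelled))   # True: structurally identical
print(nx.is_isomorphic(cycle, path))         # False

# A crude similarity signal in the spirit of motif counts: compare triangle counts
# and degree sequences instead of demanding exact structural identity.
def motif_signature(g):
    return (sum(nx.triangles(g).values()) // 3, sorted(d for _, d in g.degree()))

print(motif_signature(cycle), motif_signature(path))
```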

Specifying and Fuzzing Machine-Learning Models

Hasan Eniser Max Planck Institute for Software Systems
25 Sep 2024, 9:00 am - 10:00 am
Kaiserslautern building G26, room 111
SWS Student Defense Talks - Thesis Proposal
Machine Learning (ML) models are now integral to many critical systems, from self-driving cars to aviation, where their reliability and safety are crucial. Validating that these models perform their intended functions without failure is essential to prevent catastrophic outcomes. This thesis introduces novel tools and approaches inspired by software testing to specify and fuzz ML models for their functional correctness. By leveraging fuzzing and metamorphic testing techniques, we address the challenges of generating test inputs and defining test oracles for ML models. We begin by focusing on sequential decision-making problems, developing techniques to test action policies for reliability. Our PI-fuzz framework identifies bugs by generating diverse test states and applying test oracles relying on metamorphic relations. We then formalize metamorphic relations as hyperproperties and show their generalization across diverse domains and ML models. This led to the development of NOMOS, a declarative, domain-agnostic specification language for expressing and testing these hyperproperties. NOMOS is shown to be effective in identifying property violations across various ML domains. Additionally, we extend NOMOS to support code translation models. We evaluate several state-of-the-art models against a range of hyperproperties, uncovering numerous violations. This work contributes to the field by providing a comprehensive framework for assessing the reliability and safety of ML models in various applications.
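To illustrate the core idea of a metamorphic test oracle in a generic form (not the NOMOS language or the PI-fuzz framework themselves; all names below are assumed): transform each input in a way that should not change the output, and flag the inputs where it does.

```python
import random

def check_metamorphic_relation(model, inputs, transform, equivalent):
    """Generic metamorphic test: for every input x, model(x) and model(transform(x))
    must stand in the relation `equivalent`; violating inputs are reported."""
    violations = []
    for x in inputs:
        if not equivalent(model(x), model(transform(x))):
            violations.append(x)
    return violations

# Hypothetical stand-ins: a "model" that sums a feature vector, and a relation saying
# the output must be unchanged when the features are shuffled (order-invariance).
model = lambda xs: sum(xs)
transform = lambda xs: random.sample(xs, len(xs))
inputs = [[random.random() for _ in range(5)] for _ in range(100)]

print(check_metamorphic_relation(model, inputs, transform,
                                 equivalent=lambda a, b: abs(a - b) < 1e-9))
```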

Learning Temporal Properties for Explainability and Verification

Rajarshi Roy Max Planck Institute for Software Systems
13 Sep 2024, 3:00 pm - 3:45 pm
Kaiserslautern building G26, room 111
SWS Student Defense Talks - Thesis Defense
We are witnessing the introduction of "intelligent" autonomous systems in many tasks that were traditionally performed by humans. Examples of such systems include autonomous vehicles, surveillance drones, and robots in warehouses, to name only a few. Often, these systems are developed using data-driven algorithms that result in them being "black-box" models. However, for the widespread usage of black-box systems in safety-critical applications, it is imperative to develop adequate trust in their behavior.

Toward the goal of developing trust in intelligent systems, in this thesis, we propose techniques to (i) explain the behavior of these systems in a human-interpretable manner, and (ii) facilitate their verification through formal methods. We are particularly interested in explaining and formalizing the temporal behavior of these systems. To this end, we design several techniques to automatically learn temporal properties from observed executions of systems. We explore several different problem settings, such as learning from noisy data or from only positive data, and consider several popular logical formalisms such as Linear Temporal Logic (LTL), Metric Temporal Logic (MTL), and Property Specification Language (PSL). In my thesis defense, I will give an overview of the various problem settings and techniques used to handle them. Further, I will demonstrate some of the empirical results obtained from the prototype implementation of the learning techniques.
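As a minimal illustration of learning a temporal property from executions (far simpler than the algorithms in the thesis), the sketch below enumerates instances of the LTL response template G(a -> F b) and keeps those consistent with assumed positive and negative traces.

```python
from itertools import product

# Traces: finite sequences of sets of atomic propositions observed from a system (assumed).
positive = [[{"req"}, {"req"}, {"grant"}], [{"grant"}]]
negative = [[{"req"}, set(), set()]]

def holds_response(trace, a, b):
    """Check the template G(a -> F b) on a finite trace: every a is eventually followed by b."""
    for i, step in enumerate(trace):
        if a in step and not any(b in later for later in trace[i:]):
            return False
    return True

# Enumerate candidate (a, b) pairs and keep the formulas consistent with the sample.
props = {"req", "grant"}
candidates = [(a, b) for a, b in product(props, props) if a != b]
learned = [(a, b) for a, b in candidates
           if all(holds_response(t, a, b) for t in positive)
           and all(not holds_response(t, a, b) for t in negative)]
print(learned)   # [('req', 'grant')], i.e. G(req -> F grant)
```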

Scaling Up Relaxed Memory Verification with Separation Logics

Hoang-Hai Dang Max Planck Institute for Software Systems
10 Sep 2024, 3:00 pm - 4:00 pm
Saarbrücken building E1 5, room 005
SWS Student Defense Talks - Thesis Defense
Reasoning about concurrency in a realistic, non-toy language like C/C++ or Rust, which encompasses many interweaving complex features, is very hard. Yet, realistic concurrency involves relaxed memory models, which are significantly harder to reason about than the simple, traditional concurrency model that is sequential consistency. To scale verification up to realistic concurrency, we need a few ingredients: (1) strong but abstract reasoning principles, so that we can avoid the overly tedious details of the underlying concurrency model; (2) modular reasoning, so that we can compose smaller verification results into larger ones; (3) reasoning extensibility, so that we can derive new reasoning principles for both complex language features and algorithms without rebuilding our logic from scratch; and (4) machine-checked proofs, so that we do not miss potential unsoundness in our verifications. With these ingredients in hand, a logic designer can flexibly accommodate the intricacy of relaxed memory features and the ingenuity of programmers who exploit those features. In this dissertation, I present how to develop strong, abstract, modular, extensible, and machine-checked separation logics for realistic relaxed memory concurrency in the Iris framework, using multiple layers of abstractions. I report two main applications of such logics: (i) the verification of the Rust type system with a relaxed memory model, where relaxed memory effects are encapsulated behind the safe interface of libraries and thus are not visible to clients, and (ii) the compositional specification and verification of relaxed memory libraries, in which relaxed memory effects are exposed to clients.

Reaching New Shores with Modern Separation Logic

Simon Spies Max Planck Institute for Software Systems
09 Sep 2024, 3:00 pm - 4:00 pm
Saarbrücken building E1 5, room 029
SWS Student Defense Talks - Thesis Proposal
The question of how to scalably verify large, stateful programs is one of the oldest—and still unsolved—questions of computer science. Over the last two decades, considerable progress has been made toward this goal with the advent of separation logic, a verification technique designed for modularly reasoning about stateful programs. While the original separation logic was only geared at verifying imperative, pointer-manipulating programs, modern versions of separation logic have become an essential tool in the toolbox of the working semanticist for modelling programming languages and verifying programs.

With my thesis, I will present a line of work that focuses on the weak spots of modern separation logic. Concretely, in the context of the separation logic framework Iris, I will target two broader areas: *step-indexing* and *automation*. Step-indexing is a powerful technique for recursive definitions that is crucial for handling many of the more advanced features of modern languages. But if one does not closely follow the path laid out by its inventors, perfectly natural proof strategies turn into dead ends. Automation, on the other hand, is important for reducing the overhead of verification, which all too often prevents verification from scaling to larger code bases.

Regarding step-indexing, the thesis will present two projects: Transfinite Iris and Later Credits. Transfinite Iris shows how to generalize step-indexing—which traditionally applies only to safety properties—to proving liveness properties. Later Credits leverage separation logic to develop an amortized form of step-indexing that enables more flexible proof patterns. Regarding automation, the thesis will present Quiver and, potentially, Daenerys. Quiver introduces a new form of guided specification inference to reduce the specification overhead of separation logic verification. Daenerys brings heap-dependent assertions—logic-level assertions containing program-level expressions—to separation logic. Unlike traditional separation-logic assertions about programs, these heap-dependent assertions validate first-order reasoning principles, thus laying the groundwork for SMT-solver-based automation.

Towards Better Foundations for Foundational Models: A Cognitivist Approach to Studying Large Language Models (LLMs)

Krishna Gummadi Max Planck Institute for Software Systems
04 Sep 2024, 12:15 pm - 1:15 pm
Saarbrücken building E1 5, room 002
Joint Lecture Series
I will begin by demoing and releasing an LLM-based assistant that allows scientists to convert their papers (with a simple drag and drop) into short podcasts for communicating their research to a general audience. While we built the tool, we can’t explain its unreasonable effectiveness, i.e., we don’t really understand why it works or when it might fail. So in the rest of the talk, I will present our investigations into some basic curiosity-driven questions about LLMs: specifically, how LLMs receive, process, organize, store, and retrieve information.

Our analysis is centered around engaging LLMs in two specific types of cognitive tasks: first, syntactically-rich (semantically-poor) tasks such as recognizing formal grammars, and next, semantically-rich (syntactically-poor) tasks such as answering factual knowledge questions about real-world entities. Using carefully designed experimental frameworks, we attempt to answer the following foundational questions: (a) how can we estimate what latent skills and knowledge a (pre-trained) LLM possesses? (b) (how) can we distinguish whether some LLM has learnt some training data by rote vs. by understanding? (c) what is the minimum amount of training data/costs needed for a (pre-trained) LLM to acquire a new skill or knowledge?  (d) when solving a task, is training over task examples better, worse, or similar to providing them as in-context demonstrations?
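As a sketch of what a syntactically-rich probing task might look like (the abstract does not spell out the exact experimental protocol, so the grammar and prompt format below are assumptions), one can generate membership queries for a tiny regular language and pose them to an LLM in context:

```python
import random

# A tiny regular language over {a, b}: strings of the form a^n b, n >= 1 (assumed example).
def sample_in_language(max_n=6):
    return "a" * random.randint(1, max_n) + "b"

def sample_out_of_language(max_n=6):
    return "b" * random.randint(1, max_n) + "a"   # clearly outside a^n b

def make_prompt(examples, query):
    """Build a few-shot membership-query prompt for an LLM (the model call itself is omitted)."""
    shots = "\n".join(f"String: {s}  Label: {'yes' if y else 'no'}" for s, y in examples)
    return f"{shots}\nString: {query}  Label:"

examples = [(sample_in_language(), True) if i % 2 == 0 else (sample_out_of_language(), False)
            for i in range(6)]
print(make_prompt(examples, sample_in_language()))
```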

I will present some initial empirical results from experimenting with a number of large open-source language models and argue that our findings have important implications for the privacy of training data (including potential for memorization), the reliability of generated outputs (including potential for hallucinations), and the robustness of LLM-based applications (including our podcast assistant for science communication).