Recent events

Legal Research in the Era of Large Language Models

Christoph Engel Max Planck Institute for Research on Collective Goods
15 Jan 2025, 3:00 pm - 4:00 pm
Kaiserslautern building G26, room 111
SWS Colloquium
In a profound sense, the law is an applied field. It exists because society needs rules to function. Even if these rules are seemingly "bright line", in limit cases they require interpretation. Even more so if the rule in question confines itself to enunciate a normative program, and leaves its implementation to the administration and the judiciary. The traditional response is hermeneutical. The legislator translates the normative intention into words. That way, it implicitly delegates spelling out what these words mean to the parties and the authorities involved in dissolving the concrete conflicts of life. ...
In a profound sense, the law is an applied field. It exists because society needs rules to function. Even if these rules are seemingly "bright line", in limit cases they require interpretation. Even more so if the rule in question confines itself to enunciate a normative program, and leaves its implementation to the administration and the judiciary. The traditional response is hermeneutical. The legislator translates the normative intention into words. That way, it implicitly delegates spelling out what these words mean to the parties and the authorities involved in dissolving the concrete conflicts of life. This sketch of the law’s mission explains the traditional character of legal research. If a researcher adopts an "inside view", she engages in a division of labor with practicing lawyers. The quintessential product of this research is a "commentary". The researcher summarizes the state of the art thinking about a statutory provision (and maybe proposes an alternative reading). Alternatively, the researcher adopts an "outside view". In the spirit of a social scientist, she treats the law as her object of study. Typical products are more precise definitions of and empirical investigations into a class of social problems that legal rules are meant to address; or attempts at finding traces of judicial policy in the jurisprudence of a court. Large language models have the potential to deeply affect all of these strands of legal research. As the potential is more easily discernible for the "outside view", the talk will only briefly illustrate in which ways LLMs are likely to fuel this strand of legal research. It will drill deeper into the "inside view", and explain how an important part of this research, the summarization of the jurisprudence on a statutory provision (the guarantee of freedom of assembly in the German Constitution) can already today be delegated to the LLM. It is not difficult to predict that, ten years from now, legal research will look radically different.
Read more

Safety Analysis of Parameterized Networks with Non-Blocking Rendez-Vous and Broadcasts

Lucie Guillou IRIF,Paris, France
11 Dec 2024, 2:00 pm - 4:00 pm
Kaiserslautern building G26, room 113
SWS Colloquium
I will present two recent joint works done with Arnaud Sangnier and Nathalie Sznajder. We study networks of processes that all execute the same finite protocol and communicate synchronously in two different ways: a process can broadcast one message to all other processes or send it to at most one other process. We study two coverability problems with a parameterised number of processes (state coverability and configuration coverability). It is already known that these problem are Ackermann-hard (and decidable) in the general case. ...
I will present two recent joint works done with Arnaud Sangnier and Nathalie Sznajder. We study networks of processes that all execute the same finite protocol and communicate synchronously in two different ways: a process can broadcast one message to all other processes or send it to at most one other process. We study two coverability problems with a parameterised number of processes (state coverability and configuration coverability). It is already known that these problem are Ackermann-hard (and decidable) in the general case. This already holds when the processes communicate only by broadcasts. We show that when forbidding broadcasts, the two problems are EXPSPACE-complete. We also study a restriction on the protocol: a Wait-Only protocol has no state from which a process can send and receive messages. We show that without broadcasts, both problems are PTIME-complete in this restriction, and with broadcasts, state coverability is PTIME-complete and configuration coverability PSPACE-complete.
Read more

Mechanistic interpretability of neural networks

Jonas Fischer MPI-INF - D2
06 Nov 2024, 12:15 pm - 1:15 pm
Saarbrücken building E1 5, room 002
Joint Lecture Series
Modern machine learning (ML) has largely been driven by neural networks, delivering outstanding results not only in traditional ML applications, but also permeating into classical sciences solving open problems in physics, chemistry, and biology. This success came at a cost, as typical neural network architectures are inherently complex and their reasoning processes opaque. In domains where insights weigh more than prediction, such as modeling of systems biology, or in high-stakes decision making, such as healthcare and finance, ...
Modern machine learning (ML) has largely been driven by neural networks, delivering outstanding results not only in traditional ML applications, but also permeating into classical sciences solving open problems in physics, chemistry, and biology. This success came at a cost, as typical neural network architectures are inherently complex and their reasoning processes opaque. In domains where insights weigh more than prediction, such as modeling of systems biology, or in high-stakes decision making, such as healthcare and finance, the reasoning process of a neural network however needs to be transparent. Recently, the mechanistic interpretation of this reasoning process has gained significant attention, which is concerned with understanding the *internal reasoning process* of a network, including what information particular neurons respond to and how these specific neurons are organized into larger circuits. In this talk, I will (1) gently introduce the topic of mechanistic interpretability in machine learning, (2) show how to discover mechanistic circuits within a neural network, (3) discuss the relevance of mechanistic interpretability in real-world applications, and (4) discuss what is still missing in the field.
Read more

Semantics of declassification in higher-order languages

Jan Menz Max Planck Institute for Software Systems
29 Oct 2024, 5:00 pm - 6:00 pm
Saarbrücken building E1 5, room 029
SWS Student Defense Talks - Thesis Proposal
Declassification allows a programmer to purposfully reveal secret information. In my thesis, I will argue that treating higher-order programs explicitly and compositionally is important when dealing with declassification, and using purpose-built logical-relations semantics, we can reason about declassification in new ways while fully supporting higher-order programs. To support this claim, I will study two different styles of declassification and their semantics. In the first part of my thesis, I will show how to reason about where declassification, ...
Declassification allows a programmer to purposfully reveal secret information. In my thesis, I will argue that treating higher-order programs explicitly and compositionally is important when dealing with declassification, and using purpose-built logical-relations semantics, we can reason about declassification in new ways while fully supporting higher-order programs. To support this claim, I will study two different styles of declassification and their semantics. In the first part of my thesis, I will show how to reason about where declassification, an intensional declassification policy, in the higher-order setting by adapting logical relations techniques developed for noninterference. This approach leads to the first compositional semantics for an intensional declassification policy in the higher-order setting. In the second part of my thesis, I turn towards a style of declassification that has already been more extensively studied in the higher-order setting using logical relation semantics: what declassification. I will present a new approach to what declassification in higher-order languages. Specifically, I will present a novel dependency analysis using a dependent type system and show that what-declassification guarantees can be obtained from this dependency analysis using logical relations semantics.
Read more

Strategic and counterfactual reasoning in AI-assisted decision making

Stratis Tsirtsis Max Planck Institute for Software Systems
10 Oct 2024, 5:00 pm - 6:00 pm
Virtual talk
SWS Student Defense Talks - Thesis Proposal
From finance and healthcare to criminal justice and transportation, various domains that require critical decisions, traditionally made by humans, have increasingly incorporated artificial intelligence (AI) systems into the decision making process. Despite unprecedented advances in AI capabilities, in most real-world contexts, AI systems do not make highly consequential decisions autonomously; instead, they operate in conjunction with a human decision maker. The promise of human-AI collaboration lies in the complementary strengths of each to lead to better decisions overall. ...
From finance and healthcare to criminal justice and transportation, various domains that require critical decisions, traditionally made by humans, have increasingly incorporated artificial intelligence (AI) systems into the decision making process. Despite unprecedented advances in AI capabilities, in most real-world contexts, AI systems do not make highly consequential decisions autonomously; instead, they operate in conjunction with a human decision maker. The promise of human-AI collaboration lies in the complementary strengths of each to lead to better decisions overall. In this thesis, I present contributions on multiple dimensions of AI-assisted decision making that range from technical methodologies to empirical findings. I start by focusing on ways that AI systems can support strategic reasoning—the ability of humans to make decisions based on expectations about others’ behavior—and I introduce algorithmic methods that support a human decision maker in allocating resources to a population of other (strategic) humans. Then, I switch focus to counterfactual reasoning—the ability to reflect on how different decisions could have led events of the past to turn out differently. In that context, I propose algorithmic methods to help a human decision maker improve their future decisions by highlighting specific past decisions that could have led to better outcomes and may serve as a good learning signal. I conclude the thesis by proposing a computational model based on counterfactual simulations to predict the extent to which people hold a human and an AI agent responsible for an outcome in cases where the two agents are working together towards a common goal. The performance of this model is evaluated through an experiment with human participants conducted online.
Read more

Computing Graph Isomorphisms and Similarities

Daniel Neuen MPI-INF - D1
02 Oct 2024, 12:15 pm - 1:15 pm
Saarbrücken building E1 5, room 002
Joint Lecture Series
Deciding if two graphs are isomorphic, i.e., whether they are structurally identical, is a fundamental computational problem that has many important applications. Despite extensive efforts in the past decades, GI has remained one of only few natural problems contained in NP that are neither known to be NP-complete nor known to be contained in P. In 20216, Babai proved in a breakthrough result that GI can be solved in quasipolynomial time. In the first part of my talk, ...
Deciding if two graphs are isomorphic, i.e., whether they are structurally identical, is a fundamental computational problem that has many important applications. Despite extensive efforts in the past decades, GI has remained one of only few natural problems contained in NP that are neither known to be NP-complete nor known to be contained in P. In 20216, Babai proved in a breakthrough result that GI can be solved in quasipolynomial time. In the first part of my talk, I will give an overview on recent developments on testing isomorphism. A closely related problem is that of determining the similarity between two graphs. Arguably, this problem is even more relevant for many applications such as machine learning tasks on graph-structured data. Natural approaches such as counting edge modifications required to turn one graph into the other are computationally intractable even on very restricted input graphs. As a result, various other methods have been proposed to measure similarity between graphs such as graph motif counts, i.e., measuring similarity based on counting certain patterns appearing in the graphs. In the second part of the talk, we will consider different ways of measuring similarity and discuss relations to standard heuristics for isomorphism testing that have been discovered in recent years.
Read more

Specifying and Fuzzing Machine-Learning Models

Hasan Eniser Max Planck Institute for Software Systems
25 Sep 2024, 9:00 am - 10:00 am
Kaiserslautern building G26, room 111
SWS Student Defense Talks - Thesis Proposal
Machine Learning (ML) models are now integral to many critical systems, from self-driving cars to aviation, where their reliability and safety are crucial. Validating that these models perform their intended functions without failure is essential to prevent catastrophic outcomes. This thesis introduces novel tools and approaches inspired by software testing to specify and fuzz ML models for their functional correctness. By leveraging fuzzing and metamorphic testing techniques, we address the challenges of generating test inputs and defining test oracles for ML models. ...
Machine Learning (ML) models are now integral to many critical systems, from self-driving cars to aviation, where their reliability and safety are crucial. Validating that these models perform their intended functions without failure is essential to prevent catastrophic outcomes. This thesis introduces novel tools and approaches inspired by software testing to specify and fuzz ML models for their functional correctness. By leveraging fuzzing and metamorphic testing techniques, we address the challenges of generating test inputs and defining test oracles for ML models. We begin by focusing on sequential decision-making problems, developing techniques to test action policies for reliability. Our PI-fuzz framework identifies bugs by generating diverse test states and applying test oracles relying on metamorphic relations. We then formalize metamorphic relations as hyperproperties and show their generalization across diverse domains and ML models. This led to the development of NOMOS, a declarative, domain-agnostic specification language for expressing and testing these hyperproperties. NOMOS is shown to be effective in identifying property violations across various ML domains. Additionally, we extend NOMOS to support code translation models. We evaluate several state-of-the-art models against a range of hyperproperties, uncovering numerous violations. This work contributes to the field by providing a comprehensive framework for assessing the reliability and safety of ML models in various applications.
Read more

Learning Temporal Properties for Explainability and Verification

Rajarshi Roy Max Planck Institute for Software Systems
13 Sep 2024, 3:00 pm - 3:45 pm
Kaiserslautern building G26, room 111
SWS Student Defense Talks - Thesis Defense
We are witnessing the introduction of "intelligent" autonomous systems in many tasks that were traditionally performed by humans. Examples of such systems include autonomous vehicles, surveillance drones, and robots in warehouses, to name only a few. Often, these systems are developed using data-driven algorithms that result in them being "black-box" models. However, for the widespread usage of black-box systems in safety-critical applications, it is imperative to develop adequate trust in their behavior.

Toward the goal of developing trust in intelligent systems, ...
We are witnessing the introduction of "intelligent" autonomous systems in many tasks that were traditionally performed by humans. Examples of such systems include autonomous vehicles, surveillance drones, and robots in warehouses, to name only a few. Often, these systems are developed using data-driven algorithms that result in them being "black-box" models. However, for the widespread usage of black-box systems in safety-critical applications, it is imperative to develop adequate trust in their behavior.

Toward the goal of developing trust in intelligent systems, in this thesis, we propose techniques to (i) explain the behavior of these systems in a human-interpretable manner, and (ii) facilitate their verification through formal methods. We are particularly interested in explaining and formalizing the temporal behavior of these systems. To this end, we design several techniques to automatically learn temporal properties from observed executions of systems. We explore several different problem settings, such as learning from noisy data or from only positive data, and consider several popular logical formalisms such as Linear Temporal Logic (LTL), Metric Temporal Logic (MTL), and Property Specification Language (PSL). In my thesis defense, I will give an overview of the various problem settings and techniques used to handle them. Further, I will demonstrate some of the empirical results obtained from the prototype implementation of the learning techniques.
Read more

Scaling Up Relaxed Memory Verification with Separation Logics

Hoang-Hai Dang Max Planck Institute for Software Systems
10 Sep 2024, 3:00 pm - 4:00 pm
Saarbrücken building E1 5, room 005
SWS Student Defense Talks - Thesis Defense
Reasoning about concurrency in a realistic, non-toy language like C/C++ or Rust, which encompasses many interweaving complex features, is very hard. Yet, realistic concurrency involves relaxed memory models, which are significantly harder to reason about than the simple, traditional concurrency model that is sequential consistency. To scale up verifications to realistic concurrency, we need a few ingredients: (1) strong but abstract reasoning principles so that we can avoid the too tedious details of the underlying concurrency model; ...
Reasoning about concurrency in a realistic, non-toy language like C/C++ or Rust, which encompasses many interweaving complex features, is very hard. Yet, realistic concurrency involves relaxed memory models, which are significantly harder to reason about than the simple, traditional concurrency model that is sequential consistency. To scale up verifications to realistic concurrency, we need a few ingredients: (1) strong but abstract reasoning principles so that we can avoid the too tedious details of the underlying concurrency model; (2) modular reasoning so that we can compose smaller verification results into larger ones; (3) reasoning extensibility so that we can derive new reasoning principles for both complex language features and algorithms without rebuilding our logic from scratch; and (4) machine-checked proofs so that we do not miss potential unsoundness in our verifications. With these ingredients in hand, a logic designer can flexibly accommodate the intricacy of relaxed memory features and the ingenuity of programmers who exploit those features. In this dissertation, I present how to develop strong, abstract, modular, extensible, and machine-checked separation logics for realistic relaxed memory concurrency in the Iris framework, using multiple layers of abstractions. I report two main applications of such logics: (i) the verification of the Rust type system with a relaxed memory model, where relaxed memory effects are encapsulated behind the safe interface of libraries and thus are not visible to clients, and (ii) the compositional specification and verification of relaxed memory libraries, in which relaxed memory effects are exposed to clients.
Read more

Reaching New Shores with Modern Separation Logic

Simon Spies Max Planck Institute for Software Systems
09 Sep 2024, 3:00 pm - 4:00 pm
Saarbrücken building E1 5, room 029
SWS Student Defense Talks - Thesis Proposal
The question of how to scalably verify large, stateful programs is one of the oldest—and still unsolved—questions of computer science. Over the last two decades, considerable progress has been made toward this goal with the advent of separation logic, a verification technique designed for modularly reasoning about stateful programs. While the original separation logic was only geared at verifying imperative, pointer-manipulating programs, modern versions of separation logic have become an essential tool in the toolbox of the working semanticist for modelling programming languages and verifying programs. ...
The question of how to scalably verify large, stateful programs is one of the oldest—and still unsolved—questions of computer science. Over the last two decades, considerable progress has been made toward this goal with the advent of separation logic, a verification technique designed for modularly reasoning about stateful programs. While the original separation logic was only geared at verifying imperative, pointer-manipulating programs, modern versions of separation logic have become an essential tool in the toolbox of the working semanticist for modelling programming languages and verifying programs.

With my thesis, I will present a line of work that focuses on the weak spots of modern separation logic. Concretely, in the context of the separation logic framework Iris, I will target two broader areas, *step-indexing* and *automation*. Step-indexing is a powerful technique for recursive definitions that is crucial for handling many of the more advanced features of modern languages. But if one does not closely follow the path laid out by its inventors, perfectly natural proof strategies turn into dead ends. Automation, on the other hand, is important for reducing the overhead of verification, which is all too often preventing verification from scaling to larger code bases.

Regarding step-indexing, the thesis will present two projects, Transfinite Iris and Later Credits. Transfinite Iris shows how to generalize step-indexing—which traditionally applies only to safety properties—to proving liveness properties. Later Credits leverage separation logic to develop an amortized form of step-indexing that enables more flexible proof patterns. Regarding automation, the thesis will present Quiver and, potentially, Daenerys. Quiver introduces a new form of guided specification inference to reduce the specification overhead of separation logic verification. Daenerys brings heap-dependent assertions— logic-level assertions containing program-level expressions—to separation logic. Unlike traditional separation-logic assertions about programs, these heap-dependent assertions validate first-order reasoning principles thus laying the groundwork for SMT-solver based automation.
Read more

Towards Better Foundations for Foundational Models: A Cognitivist Approach to Studying Large Language Models (LLMs)

Krishna Gummadi Max Planck Institute for Software Systems
04 Sep 2024, 12:15 pm - 1:15 pm
Saarbrücken building E1 5, room 002
Joint Lecture Series
I will begin by demoing and releasing an LLM-based assistant that allows scientists to convert their papers (with a simple drag and drop) into short podcasts for communicating their research to a general audience. While we built the tool, we can’t explain its unreasonable effectiveness, i.e., we don’t really understand why it works or when it might fail. So in the rest of the talk, I will present our investigations into some basic curiosity-driven questions about LLMs; ...
I will begin by demoing and releasing an LLM-based assistant that allows scientists to convert their papers (with a simple drag and drop) into short podcasts for communicating their research to a general audience. While we built the tool, we can’t explain its unreasonable effectiveness, i.e., we don’t really understand why it works or when it might fail. So in the rest of the talk, I will present our investigations into some basic curiosity-driven questions about LLMs; specifically, how do LLMs receive, process, organize, store, and retrieve information. 

Our analysis is centered around engaging LLMs in two specific types of cognitive tasks: first, syntactically-rich (semantically-poor) tasks such as recognizing formal grammars, and next, semantically-rich (syntactically-poor) tasks such as answering factual knowledge questions about real-world entities. Using carefully designed experimental frameworks, we attempt to answer the following foundational questions: (a) how can we estimate what latent skills and knowledge a (pre-trained) LLM possesses? (b) (how) can we distinguish whether some LLM has learnt some training data by rote vs. by understanding? (c) what is the minimum amount of training data/costs needed for a (pre-trained) LLM to acquire a new skill or knowledge?  (d) when solving a task, is training over task examples better, worse, or similar to providing them as in-context demonstrations?

I will present some initial empirical results from experimenting with a number of large open-source language models and argue that our findings they have important implications for the privacy of training data (including potential for memorization), the reliability of generated outputs (including potential for hallucinations), and the robustness of LLM-based applications (including our podcast assistant for science communication). 
Read more