Recent events

Scaling Up Relaxed Memory Verification with Separation Logics

Hoang-Hai Dang Max Planck Institute for Software Systems
10 Sep 2024, 3:00 pm - 4:00 pm
Saarbrücken building E1 5, room 005
SWS Student Defense Talks - Thesis Defense
Reasoning about concurrency in a realistic, non-toy language like C/C++ or Rust, which combines many complex, interacting features, is very hard. Moreover, realistic concurrency involves relaxed memory models, which are significantly harder to reason about than the simple, traditional concurrency model of sequential consistency. To scale verification up to realistic concurrency, we need a few ingredients: (1) strong but abstract reasoning principles, so that we can avoid the overly tedious details of the underlying concurrency model; (2) modular reasoning, so that we can compose smaller verification results into larger ones; (3) reasoning extensibility, so that we can derive new reasoning principles for both complex language features and algorithms without rebuilding our logic from scratch; and (4) machine-checked proofs, so that we do not miss potential unsoundness in our verifications. With these ingredients in hand, a logic designer can flexibly accommodate the intricacy of relaxed memory features and the ingenuity of programmers who exploit those features. In this dissertation, I present how to develop strong, abstract, modular, extensible, and machine-checked separation logics for realistic relaxed memory concurrency in the Iris framework, using multiple layers of abstractions. I report two main applications of such logics: (i) the verification of the Rust type system with a relaxed memory model, where relaxed memory effects are encapsulated behind the safe interface of libraries and thus are not visible to clients, and (ii) the compositional specification and verification of relaxed memory libraries, in which relaxed memory effects are exposed to clients.
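The modularity ingredient (2) is what separation logic's frame rule provides. As a reminder of the flavor of reasoning involved, here is the rule in its textbook Hoare-triple form (the standard statement, not the relaxed-memory variant developed in the thesis):

```latex
\[
\frac{\{P\}\ c\ \{Q\}}{\{P \ast R\}\ c\ \{Q \ast R\}}
\quad (\textsc{Frame})
\]
```

The frame R describes state untouched by c, so a component verified in isolation can be composed with any context owning disjoint resources.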

Reaching New Shores with Modern Separation Logic

Simon Spies Max Planck Institute for Software Systems
09 Sep 2024, 3:00 pm - 4:00 pm
Saarbrücken building E1 5, room 029
SWS Student Defense Talks - Thesis Proposal
The question of how to scalably verify large, stateful programs is one of the oldest—and still unsolved—questions of computer science. Over the last two decades, considerable progress has been made toward this goal with the advent of separation logic, a verification technique designed for modularly reasoning about stateful programs. While the original separation logic was only geared at verifying imperative, pointer-manipulating programs, modern versions of separation logic have become an essential tool in the toolbox of the working semanticist for modelling programming languages and verifying programs.

With my thesis, I will present a line of work that focuses on the weak spots of modern separation logic. Concretely, in the context of the separation logic framework Iris, I will target two broader areas, *step-indexing* and *automation*. Step-indexing is a powerful technique for recursive definitions that is crucial for handling many of the more advanced features of modern languages. But if one does not closely follow the path laid out by its inventors, perfectly natural proof strategies turn into dead ends. Automation, on the other hand, is important for reducing the overhead of verification, which all too often prevents verification from scaling to larger code bases.

Regarding step-indexing, the thesis will present two projects, Transfinite Iris and Later Credits. Transfinite Iris shows how to generalize step-indexing—which traditionally applies only to safety properties—to proving liveness properties. Later Credits leverage separation logic to develop an amortized form of step-indexing that enables more flexible proof patterns. Regarding automation, the thesis will present Quiver and, potentially, Daenerys. Quiver introduces a new form of guided specification inference to reduce the specification overhead of separation logic verification. Daenerys brings heap-dependent assertions—logic-level assertions containing program-level expressions—to separation logic. Unlike traditional separation-logic assertions about programs, these heap-dependent assertions validate first-order reasoning principles, thus laying the groundwork for SMT-solver-based automation.
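Step-indexing surfaces in Iris as the later modality ▷, and the proof patterns at issue revolve around rules like the following (standard Iris rules, quoted from memory rather than from the thesis):

```latex
\[
P \vdash \mathord{\triangleright} P
\qquad\qquad
\frac{\mathord{\triangleright} P \vdash P}{\ \vdash P\ }
\quad (\textsc{L\"ob})
\]
```

Löb induction justifies recursive definitions, but every ▷ introduced along the way must eventually be eliminated, typically by a program step; Later Credits instead treat that elimination right as a separable resource that can be saved and spent where the proof needs it.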

Towards Better Foundations for Foundational Models: A Cognitivist Approach to Studying Large Language Models (LLMs)

Krishna Gummadi Max Planck Institute for Software Systems
04 Sep 2024, 12:15 pm - 1:15 pm
Saarbrücken building E1 5, room 002
Joint Lecture Series
I will begin by demoing and releasing an LLM-based assistant that allows scientists to convert their papers (with a simple drag and drop) into short podcasts for communicating their research to a general audience. While we built the tool, we can’t explain its unreasonable effectiveness, i.e., we don’t really understand why it works or when it might fail. So in the rest of the talk, I will present our investigations into some basic curiosity-driven questions about LLMs; specifically, how do LLMs receive, process, organize, store, and retrieve information?

Our analysis is centered around engaging LLMs in two specific types of cognitive tasks: first, syntactically-rich (semantically-poor) tasks such as recognizing formal grammars, and next, semantically-rich (syntactically-poor) tasks such as answering factual knowledge questions about real-world entities. Using carefully designed experimental frameworks, we attempt to answer the following foundational questions: (a) how can we estimate what latent skills and knowledge a (pre-trained) LLM possesses? (b) (how) can we distinguish whether some LLM has learnt some training data by rote vs. by understanding? (c) what is the minimum amount of training data/costs needed for a (pre-trained) LLM to acquire a new skill or knowledge?  (d) when solving a task, is training over task examples better, worse, or similar to providing them as in-context demonstrations?
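A sketch of how such a syntactically-rich probe might be constructed; the Dyck-1 language of balanced parentheses and all function names here are my illustration, not necessarily the grammars used in the talk:

```python
import random

def in_language(s: str) -> bool:
    """Membership in the Dyck-1 language of balanced parentheses -- a
    syntactically-rich, semantically-poor task of the kind described above."""
    depth = 0
    for ch in s:
        depth += 1 if ch == "(" else -1
        if depth < 0:          # a ')' closed more than was ever opened
            return False
    return depth == 0          # everything opened was closed

def make_probe(n: int, length: int = 8, seed: int = 0):
    """Labeled (string, label) pairs, usable either as in-context
    demonstrations or as fine-tuning examples for an LLM under study."""
    rng = random.Random(seed)
    return [(s, in_language(s))
            for s in ("".join(rng.choice("()") for _ in range(length))
                      for _ in range(n))]
```

Comparing an LLM's accuracy on such pairs after fine-tuning versus after in-context prompting is one concrete instance of question (d) above.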

I will present some initial empirical results from experimenting with a number of large open-source language models and argue that our findings have important implications for the privacy of training data (including potential for memorization), the reliability of generated outputs (including potential for hallucinations), and the robustness of LLM-based applications (including our podcast assistant for science communication).

Foundational Verification of Low-Level Code: RefinedC and Beyond

Deepak Garg Max Planck Institute for Software Systems
03 Jul 2024, 12:15 pm - 1:15 pm
Saarbrücken building E1 5, room 002
Joint Lecture Series
Low-level code written in languages like C forms the bedrock of most computer systems today. Establishing the correctness of this code is important for the safety and security of our systems. The gold standard for code correctness is foundational verification -- the construction of mathematical proofs of program correctness verified in a proof assistant like Coq. However, foundational verification of low-level code is a tedious task that has, so far, largely relied on adapting code to suit the constraints of the verification framework, e.g., the programming idioms that are supported.

In this talk, I will present RefinedC, a new Coq-based C verification framework that aims to address the above limitation. By design, RefinedC is foundational (all proofs are mechanically verified within Coq), extensible (it can be extended to support new program properties, programming idioms and even new languages) and automation-friendly. Time permitting, I will also discuss projects that extend RefinedC's capabilities to reason about code written in other low-level languages like assembly, and code written in more than one language.

(This talk is based on joint work with several other research groups, notably the group of Derek Dreyer at MPI-SWS.)

Understanding LLMs Under the Hood

Ellie Pavlick Brown University
25 Jun 2024, 12:00 pm - 2:00 pm
Saarbrücken building E1 5, room 002
Joint Lecture Series
Large language models (LLMs) currently dominate AI. They exhibit impressive, often human-like, behavior across a wide range of linguistic and non-linguistic tasks. However, LLMs are the result of a combination of clever engineering and financial investment, rather than a result of rigorous scientific or theoretical study. Thus, very little is known about how or why LLMs exhibit the behaviors that they do. As a result, LLMs often seem mysterious and unpredictable. In this talk, I will discuss recent work which seeks to characterize the mechanisms that LLMs employ in terms of higher-level data structures and algorithms. Using results on a series of simple tasks, I will argue that LLMs are not as inscrutable as they seem, but rather make use of simple and often modular computational components to solve complex problems.

Formal Controller Synthesis for Dynamical Systems: Decidability and Scalability

Mahmoud Salamati Max Planck Institute for Software Systems
20 Jun 2024, 2:00 pm - 3:00 pm
Kaiserslautern building G26, room 111
SWS Student Defense Talks - Thesis Defense
In cyber-physical systems research, a significant challenge is synthesizing reliable controllers that meet general temporal specifications. These controllers must offer formal guarantees against various sources of disturbances. Efficient formal controller design opens up numerous exciting applications across different domains, including the autonomous vehicle industry, energy systems, and healthcare. In this talk, I will discuss formal controller synthesis for three different classes of dynamical systems. First, I will address our contributions to improving the scalability of abstraction-based controller design methods for dynamical systems with continuous state spaces and bounded uncertainty. These contributions include: (1) a data-driven scheme for relaxing the requirement of having an analytical model, (2) a neural abstraction method to lower the memory requirements of both synthesis and deployment, and (3) a scalable method for solving reach-avoid problems for multi-agent systems. Second, I will briefly describe our contributions to time-bounded model checking for continuous-time Markov decision processes, specifically: (1) a conditional decidability result, and (2) a systematic method for improving the scalability of numerical approximation methods. Finally, I will briefly discuss our results related to the decidability of reachability-related problems for linear dynamical systems, which include: (1) a hardness result for point-to-point reachability of linear dynamical systems with hyper-rectangular control sets, and (2) decidability of pseudo-reachability for hyperplane target sets.
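To give a flavour of abstraction-based synthesis, here is a toy backward-reachability controller for a deterministic reach-avoid problem on a finite grid. This is a drastically simplified sketch of my own: real abstractions must also soundly over-approximate continuous dynamics and bounded uncertainty, which this code does not attempt.

```python
from collections import deque

def synthesize_reach_avoid(n, target, avoid):
    """Backward reachability on an n-by-n grid abstraction: for each cell,
    pick an action guaranteed to make progress toward `target` while never
    entering `avoid`. Cells absent from the result are uncontrollable."""
    moves = {"R": (1, 0), "L": (-1, 0), "U": (0, 1), "D": (0, -1)}
    controller = {target: "stay"}
    frontier = deque([target])
    while frontier:
        cell = frontier.popleft()
        for act, (dx, dy) in moves.items():
            # predecessor that reaches `cell` by taking action `act`
            pred = (cell[0] - dx, cell[1] - dy)
            if (0 <= pred[0] < n and 0 <= pred[1] < n
                    and pred not in avoid and pred not in controller):
                controller[pred] = act
                frontier.append(pred)
    return controller
```

Because the controller is computed backwards from the target, following it from any covered cell reaches the target in finitely many steps without touching the avoid set.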

Hunting new antibiotics with computer science

Alexey Gurevich Fachrichtung Informatik - Saarbrücken
05 Jun 2024, 12:15 pm - 1:15 pm
Saarbrücken building E1 5, room 002
Joint Lecture Series
WHO describes antimicrobial resistance (AMR) as one of the top global public health and development threats, while the World Bank estimates its additional healthcare burden as US$ 1 trillion by 2050. One strategy to combat AMR is significantly speeding up and reducing the cost of discovering new antibiotics, particularly those existing in nature but evading all current attempts to find them. Surprisingly, computer scientists play an increasingly important role in this endeavor. Modern biotechnology allows us to amass vast amounts of data on natural antibiotics and their tiny producers. However, specialized software and algorithms are the key to unlocking this wealth of information and turning it into medically important discoveries.

In my talk, I will demonstrate how computer science methods, from classical graph and string algorithms to deep learning, help transform antibiotic discovery into a high-throughput technology and realize the promise of already collected and rapidly growing biological datasets. The particular focus will be on analyzing sequencing (strings over the DNA alphabet of {A, C, G, T} letters) and mass spectrometry data (two-dimensional arrays of floating point values). In both cases, we deal with noisy experimental data, often rely on heuristics to make the processing time reasonable, and finally, get biologically relevant findings suitable for verification by our wet-lab collaborators.
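As a toy illustration of the mass-spectrometry side, here is the shared-peaks count, the simplest peptide-spectrum match score. The residue masses below are approximate values for a handful of amino acids, and the function names are mine, not from the talk:

```python
# Approximate monoisotopic residue masses in daltons (illustrative subset).
RESIDUE_MASS = {"G": 57.02146, "A": 71.03711, "S": 87.03203, "V": 99.06841}

def prefix_masses(peptide: str):
    """Theoretical masses of the prefix fragments of a peptide."""
    masses, total = [], 0.0
    for aa in peptide:
        total += RESIDUE_MASS[aa]
        masses.append(total)
    return masses

def shared_peaks(peptide: str, spectrum, tol: float = 0.02) -> int:
    """Count observed peaks lying within `tol` Da of some theoretical
    prefix mass -- noisy data makes the tolerance essential."""
    theoretical = prefix_masses(peptide)
    return sum(any(abs(peak - m) <= tol for m in theoretical)
               for peak in spectrum)
```

Real tools score full fragment ladders (prefixes and suffixes, with charge states and modifications), but the same match-under-tolerance idea is at their core.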

The Frontiers of Packing

Karol Węgrzycki MPI-INF - D1
08 May 2024, 12:15 pm - 1:15 pm
Saarbrücken building E1 5, room 002
Joint Lecture Series
Packing problems are ubiquitous optimization problems in computer science, where the task is to pack some class of objects into some class of containers. Canonical examples of packing problems include Knapsack, Subset Sum, and Bin Packing, but also several graph problems can be expressed as packing problems.

Most packing problems are NP-hard, and thus are typically attacked by approximation algorithms and fixed-parameter tractability, the two most recognized algorithmic paradigms for dealing with NP-hardness in theoretical computer science. This talk surveys recent developments on these two perspectives for packing problems. More concretely, I will talk about the following flavours of packing:

- (packing in geometry): how to pack polygons on the plane.
- (packing numbers): how to pack items into a knapsack to maximize the value.
- (packing in vector addition systems): how to reach your destination without running out of gas and money.
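For the second flavour, the textbook dynamic program for 0/1 Knapsack is a useful reference point; this is the standard pseudo-polynomial algorithm, not the new results surveyed in the talk:

```python
def knapsack(items, capacity):
    """Classic 0/1 knapsack DP: items are (weight, value) pairs.
    Runs in O(n * capacity) time -- pseudo-polynomial, as expected
    for an NP-hard packing problem."""
    best = [0] * (capacity + 1)          # best[c] = max value within weight c
    for weight, value in items:
        # iterate capacities downwards so each item is used at most once
        for c in range(capacity, weight - 1, -1):
            best[c] = max(best[c], best[c - weight] + value)
    return best[capacity]
```

The dependence on `capacity` rather than on the input length is precisely what approximation schemes and parameterized algorithms for Knapsack try to tame.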

Towards Real-World Fact-Checking with Large Language Models

Iryna Gurevych Technical University of Darmstadt
03 May 2024, 10:00 am - 11:00 am
Kaiserslautern building G26, room 111
SWS Distinguished Lecture Series
Misinformation poses a growing threat to our society. It has a severe impact on public health by promoting fake cures or vaccine hesitancy, and it is used as a weapon during military conflicts to spread fear and distrust. Current natural language processing (NLP) fact-checking research focuses on identifying evidence and the veracity of a claim. People’s beliefs however often do not depend on the claim and the rational reasoning as such, but on credible content that makes the claim seem more reliable, such as scientific publications or visual content that was manipulated or stems from unrelated context. To combat misinformation we need to show (1) "Why was the claim believed to be true?", (2) "Why is the claim false?", (3) "Why is the alternative explanation correct?". In the talk, I will zoom into two critical aspects of such misinformation supported by credible though misleading content. First, I will present our efforts to dismantle misleading narratives based on fallacious interpretations of scientific publications. Second, I will show how we can use multimodal large language models to (1) detect misinformation based on visual content, (2) provide strong alternative explanations for the visual content.

On Fairness Concerns in the Blockchain Ecosystem

Johnnatan Messias Max Planck Institute for Software Systems
25 Apr 2024, 1:00 pm - 2:00 pm
Saarbrücken building E1 5, room 029
SWS Student Defense Talks - Thesis Defense
Blockchains revolutionized centralized sectors like banking and finance by promoting decentralization and transparency. In a blockchain, information is transmitted through transactions issued by participants or applications. Miners crucially select, order, and validate pending transactions for block inclusion, prioritizing those with higher incentives or fees. The order in which transactions are included can impact the blockchain's final state. Moreover, applications running on top of a blockchain often rely on governance protocols to decentralize the decision-making power to make changes to their core functionality. These changes can affect how participants interact with these applications. Since one token equals one vote, participants holding multiple tokens have a higher voting power to support or reject the proposed changes. How widely this voting power is distributed is questionable; if it is highly concentrated among a few holders, it can enable governance attacks. In this thesis, we audit the Bitcoin and Ethereum blockchains to investigate the norms followed by miners in determining the transaction prioritization. We also audit decentralized governance protocols such as Compound to evaluate whether the voting power is fairly distributed among the participants. Our findings have significant implications for future developments of blockchains and decentralized applications.
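A minimal sketch of the "one token, one vote" concentration concern; this is illustrative code of my own, not the audit methodology of the thesis:

```python
def voting_power(balances):
    """Token-weighted voting power shares, largest holder first."""
    total = sum(balances)
    return sorted((b / total for b in balances), reverse=True)

def smallest_controlling_coalition(balances, threshold=0.5):
    """How few holders suffice to push a proposal past `threshold`?
    A small answer signals that governance is effectively centralized."""
    acc, k = 0.0, 0
    for share in voting_power(balances):
        acc += share
        k += 1
        if acc > threshold:
            return k
    return len(balances)
```

For instance, if a single holder owns 60% of the tokens, the controlling coalition has size one, regardless of how many other participants vote.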

Making machine learning predictably reliable

Andrew Ilyas Massachusetts Institute of Technology
17 Apr 2024, 10:00 am - 11:00 am
Kaiserslautern building G26, room 111
CIS@MPG Colloquium
Despite ML models' impressive performance, training and deploying them is currently a somewhat messy endeavor. But does it have to be? In this talk, I overview my work on making ML "predictably reliable"---enabling developers to know when their models will work, when they will fail, and why.

To begin, we use a case study of adversarial inputs to show that human intuition can be a poor predictor of how ML models operate. Motivated by this, we present a line of work that aims to develop a precise understanding of the ML pipeline, combining statistical tools with large-scale experiments to characterize the role of each individual design choice: from how to collect data, to what dataset to train on, to what learning algorithm to use.

Hierarchical clustering beyond heuristics: Quality guarantees without speed sacrifices

Evangelos Kipouridis MPI-INF - D1
03 Apr 2024, 12:15 pm - 1:15 pm
Saarbrücken building E1 5, room 002
Joint Lecture Series
The algorithmic task of constructing hierarchical representations of data has been studied by various communities over many decades. Their applications range from statistics and databases to the analysis of complex networks and, more recently, machine learning, where they have proven useful for understanding text, images, graphs and multi-relational data. The reason why hierarchical representations are so ubiquitous is that many data sets stemming from nature or society are organized according to a latent hierarchy. Furthermore, in contrast to "flat" clustering techniques like k-means or k-median, which cannot capture fine-grained relationships among points, hierarchical clustering reveals the structure of a data set at multiple levels of granularity simultaneously.

Despite the plethora of applications, the theory behind hierarchical clustering is underdeveloped, and popular heuristics offer few formal guarantees. In this talk, I will present my work on algorithms with near-optimal quality guarantees; in fact, in certain cases the algorithms run in near-linear time, bridging the gap between theory and practice. Finally, I will discuss how to incorporate domain-specific knowledge, leading to semi-supervised hierarchical clustering, as opposed to the traditional view of hierarchical clustering as an unsupervised learning method.
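For contrast, here is the naive agglomerative heuristic in its simplest form: single linkage on 1-D points, recording the merge order that defines the dendrogram. This cubic-time sketch of mine is exactly the kind of guarantee-free baseline the talk improves on:

```python
def single_linkage(points):
    """Agglomerative hierarchical clustering with single linkage:
    repeatedly merge the two closest clusters (distance = closest pair
    of members). O(n^3) for clarity; points are numbers for brevity."""
    clusters = [[p] for p in points]
    merges = []                      # (merged cluster, merge distance)
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = min(abs(a - b) for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        merges.append((sorted(clusters[i] + clusters[j]), d))
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return merges
```

The sequence of merge distances is the multi-level structure mentioned above: cutting the dendrogram at any distance yields a flat clustering at that granularity.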

Expanding the Horizons of Finite-Precision Analysis

Debasmita Lohar Max Planck Institute for Software Systems
27 Mar 2024, 2:45 pm - 3:45 pm
Saarbrücken building E1 5, room 029
SWS Student Defense Talks - Thesis Defense
Finite-precision programs, prevalent in embedded systems, scientific computing, and machine learning, inherently introduce numerical uncertainties stemming from noise in the inputs and finite-precision errors. Furthermore, implementing these programs on hardware necessitates a trade-off between accuracy and efficiency. Therefore, it is crucial to ensure that numerical uncertainties remain acceptably small and to optimize implementations for accurate results tailored to specific applications. Existing analysis and optimization techniques for finite-precision programs face challenges in scalability and applicability to real-world scenarios. In this work, we expand the individual capabilities of these techniques by capturing the impact of uncertain inputs on discrete decisions and roundoff errors, by scaling floating-point verification for larger programs, and by specializing optimization for feed-forward deep neural networks.
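A tiny illustration of the roundoff errors in question, using only the standard library (unrelated to the specific tools of the thesis): summing 0.1 ten times in double precision does not give exactly 1.0, because 0.1 has no exact binary representation.

```python
import math

def naive_sum(xs):
    """Left-to-right accumulation: each addition rounds, and the
    roundoff errors accumulate."""
    total = 0.0
    for x in xs:
        total += x
    return total

xs = [0.1] * 10
print(naive_sum(xs))    # slightly below 1.0
print(math.fsum(xs))    # correctly rounded sum: exactly 1.0
```

Proving bounds on exactly this kind of accumulated error, automatically and for much larger programs, is what floating-point roundoff verification is about.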

On Fairness, Invariance and Memorization in Machine Decision and Deep Learning Algorithms

Till Speicher Max Planck Institute for Software Systems
26 Mar 2024, 12:00 pm - 1:00 pm
Saarbrücken building E1 5, room 005
SWS Student Defense Talks - Thesis Proposal
As learning algorithms become more and more capable, they are used to tackle an increasingly large spectrum of tasks. Their applications range from understanding images, speech and natural language to making socially impactful decisions, such as about people's eligibility for loans and jobs. Therefore, it is important to better understand both the consequences of algorithmic decisions as well as the mechanisms by which algorithms arrive at their outputs. Of particular interest in this regard are fairness when algorithmic decisions impact people's lives, as well as the behavior of deep learning algorithms, the most powerful but also opaque type of learning algorithm.

To this end, this thesis makes two contributions: First, we study fairness in algorithmic decision making. At a conceptual level, we introduce a metric for measuring unfairness in algorithmic decisions based on inequality indices from the economics literature. We show that this metric can be used to decompose the overall unfairness for a given set of users into between- and within-subgroup components and highlight potential tradeoffs between them as well as between fairness and accuracy. At an empirical level, we demonstrate the necessity for studying fairness in algorithmically controlled systems by exposing the potential for discrimination that is enabled by Facebook's advertising platform. In this context, we demonstrate how advertisers can target ads to exclude users belonging to protected sensitive groups, a practice that is illegal in domains such as housing, employment and finance, and highlight the necessity for better mitigation methods.
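The inequality-index idea can be sketched with the Theil index, a member of the generalized-entropy family whose overall value decomposes exactly into within-group plus between-group terms. This is my illustrative sketch; the thesis metric may differ in its details:

```python
import math

def theil(xs):
    """Theil index: 0 for perfect equality, larger = more unequal."""
    mu = sum(xs) / len(xs)
    return sum((x / mu) * math.log(x / mu) for x in xs) / len(xs)

def decompose(groups):
    """Split overall inequality into within- and between-group parts;
    for the Theil index the two components sum exactly to the total."""
    all_xs = [x for g in groups for x in g]
    mu, n = sum(all_xs) / len(all_xs), len(all_xs)
    # within: group Theil indices weighted by each group's share of the total
    within = sum((sum(g) / (n * mu)) * theil(g) for g in groups)
    # between: inequality among group means, as if each group were equal inside
    between = sum((sum(g) / (n * mu)) * math.log((sum(g) / len(g)) / mu)
                  for g in groups)
    return within, between
```

Applied to algorithmic decisions, the groups can be sensitive subgroups of users, making visible whether unfairness lives mostly inside groups or between them.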

The second contribution of this thesis is aimed at better understanding the mechanisms governing the behavior of deep learning algorithms. First, we study the role that invariance plays in learning useful representations. We show that the set of invariances possessed by representations is of critical importance in determining whether they are useful for downstream tasks, more important than many other factors commonly considered to determine transfer performance. Second, we investigate memorisation in large language models, which have recently become very popular. By training models to memorise random strings, we surface a rich and surprising set of dynamics during the memorisation process. We find that models undergo two phases during memorisation, that strings with lower entropy are harder to memorise, that the memorisation dynamics evolve during repeated memorisation and that models can recall tokens in random strings with only a very restricted amount of information.
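The entropy knob mentioned above is just empirical Shannon entropy; one might measure it for the random strings roughly as follows (my illustration, not the experimental code):

```python
import math
from collections import Counter

def char_entropy(s: str) -> float:
    """Empirical Shannon entropy of a string in bits per character:
    0 for a constant string, log2(k) for a uniform k-letter alphabet."""
    counts = Counter(s)
    n = len(s)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())
```

Strings drawn from a smaller or more skewed alphabet have lower entropy per character yet, per the finding above, are harder for models to memorise.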