Events

Recent events

Mechanistic interpretability of neural networks

Jonas Fischer MPI-INF - D2
06 Nov 2024, 12:15 pm - 1:15 pm
Saarbrücken building E1 5, room 002
Joint Lecture Series
Modern machine learning (ML) has largely been driven by neural networks, delivering outstanding results not only in traditional ML applications, but also permeating into classical sciences solving open problems in physics, chemistry, and biology. This success came at a cost, as typical neural network architectures are inherently complex and their reasoning processes opaque. In domains where insights weigh more than prediction, such as modeling of systems biology, or in high-stakes decision making, such as healthcare and finance, ...
Modern machine learning (ML) has largely been driven by neural networks, delivering outstanding results not only in traditional ML applications, but also permeating into classical sciences solving open problems in physics, chemistry, and biology. This success came at a cost, as typical neural network architectures are inherently complex and their reasoning processes opaque. In domains where insights weigh more than prediction, such as modeling of systems biology, or in high-stakes decision making, such as healthcare and finance, the reasoning process of a neural network however needs to be transparent. Recently, the mechanistic interpretation of this reasoning process has gained significant attention, which is concerned with understanding the *internal reasoning process* of a network, including what information particular neurons respond to and how these specific neurons are organized into larger circuits. In this talk, I will (1) gently introduce the topic of mechanistic interpretability in machine learning, (2) show how to discover mechanistic circuits within a neural network, (3) discuss the relevance of mechanistic interpretability in real-world applications, and (4) discuss what is still missing in the field.
Read more

Semantics of declassification in higher-order languages

Jan Menz Max Planck Institute for Software Systems
29 Oct 2024, 5:00 pm - 6:00 pm
Saarbrücken building E1 5, room 029
SWS Student Defense Talks - Thesis Proposal
Declassification allows a programmer to purposfully reveal secret information. In my thesis, I will argue that treating higher-order programs explicitly and compositionally is important when dealing with declassification, and using purpose-built logical-relations semantics, we can reason about declassification in new ways while fully supporting higher-order programs. To support this claim, I will study two different styles of declassification and their semantics. In the first part of my thesis, I will show how to reason about where declassification, ...
Declassification allows a programmer to purposfully reveal secret information. In my thesis, I will argue that treating higher-order programs explicitly and compositionally is important when dealing with declassification, and using purpose-built logical-relations semantics, we can reason about declassification in new ways while fully supporting higher-order programs. To support this claim, I will study two different styles of declassification and their semantics. In the first part of my thesis, I will show how to reason about where declassification, an intensional declassification policy, in the higher-order setting by adapting logical relations techniques developed for noninterference. This approach leads to the first compositional semantics for an intensional declassification policy in the higher-order setting. In the second part of my thesis, I turn towards a style of declassification that has already been more extensively studied in the higher-order setting using logical relation semantics: what declassification. I will present a new approach to what declassification in higher-order languages. Specifically, I will present a novel dependency analysis using a dependent type system and show that what-declassification guarantees can be obtained from this dependency analysis using logical relations semantics.
Read more

Strategic and counterfactual reasoning in AI-assisted decision making

Stratis Tsirtsis Max Planck Institute for Software Systems
10 Oct 2024, 5:00 pm - 6:00 pm
Virtual talk
SWS Student Defense Talks - Thesis Proposal
From finance and healthcare to criminal justice and transportation, various domains that require critical decisions, traditionally made by humans, have increasingly incorporated artificial intelligence (AI) systems into the decision making process. Despite unprecedented advances in AI capabilities, in most real-world contexts, AI systems do not make highly consequential decisions autonomously; instead, they operate in conjunction with a human decision maker. The promise of human-AI collaboration lies in the complementary strengths of each to lead to better decisions overall. ...
From finance and healthcare to criminal justice and transportation, various domains that require critical decisions, traditionally made by humans, have increasingly incorporated artificial intelligence (AI) systems into the decision making process. Despite unprecedented advances in AI capabilities, in most real-world contexts, AI systems do not make highly consequential decisions autonomously; instead, they operate in conjunction with a human decision maker. The promise of human-AI collaboration lies in the complementary strengths of each to lead to better decisions overall. In this thesis, I present contributions on multiple dimensions of AI-assisted decision making that range from technical methodologies to empirical findings. I start by focusing on ways that AI systems can support strategic reasoning—the ability of humans to make decisions based on expectations about others’ behavior—and I introduce algorithmic methods that support a human decision maker in allocating resources to a population of other (strategic) humans. Then, I switch focus to counterfactual reasoning—the ability to reflect on how different decisions could have led events of the past to turn out differently. In that context, I propose algorithmic methods to help a human decision maker improve their future decisions by highlighting specific past decisions that could have led to better outcomes and may serve as a good learning signal. I conclude the thesis by proposing a computational model based on counterfactual simulations to predict the extent to which people hold a human and an AI agent responsible for an outcome in cases where the two agents are working together towards a common goal. The performance of this model is evaluated through an experiment with human participants conducted online.
Read more

Archive