Mechanistic interpretability of neural networks
Jonas Fischer
MPI-INF - D2
06 Nov 2024, 12:15 pm - 1:15 pm
Saarbrücken building E1 5, room 002
Joint Lecture Series
Modern machine learning (ML) has largely been driven by neural networks,
delivering outstanding results not only in traditional ML applications, but
also permeating into the classical sciences, solving open problems in physics,
chemistry, and biology. This success came at a cost, as typical neural network
architectures are inherently complex and their reasoning processes opaque. In
domains where insights weigh more than predictions, such as modeling in
systems biology, or in high-stakes decision making, such as healthcare and
finance, however, the reasoning process of a neural network needs to be
transparent. Recently, the mechanistic interpretation of this reasoning
process has gained significant attention; it is concerned with understanding
the *internal reasoning process* of a network, including what information
particular neurons respond to and how these neurons are organized into larger
circuits.
In this talk, I will (1) gently introduce the topic of mechanistic
interpretability in machine learning, (2) show how to discover mechanistic
circuits within a neural network, (3) discuss the relevance of mechanistic
interpretability in real-world applications, and (4) outline what is still
missing in the field.