Events

Upcoming events

Cracking System Challenges in Optical Data Center Networks

Yiting Xia, MPI-INF (RG 2)
05 Mar 2025, 12:15 pm - 1:15 pm
Saarbrücken building E1 5, room 002
Joint Lecture Series
Optical data center networks (DCNs) are transforming cloud infrastructure, yet current architectures remain closed ecosystems tightly bound to specific optical hardware. In this talk, we unveil an innovative open framework that decouples software from hardware, empowering researchers and practitioners to freely explore and deploy diverse software solutions across multiple optical platforms. Building on this flexible foundation, we tackle three critical system challenges—time synchronization, routing, and transport protocols—to enable optical DCNs to achieve nanosecond precision, high throughput, and ultra-low latency. This presentation highlights the fundamental design shifts brought by optical DCNs and demonstrates how our breakthrough solutions surpass traditional DCN performance, setting new standards for future cloud networks.
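For context on the first of the three challenges the abstract names: below is a minimal Python sketch of the classic two-way offset/delay estimate used by synchronization protocols such as PTP. It is background only, not the speaker's design; the timestamps and the function name estimate_offset_and_delay are illustrative assumptions.

```python
def estimate_offset_and_delay(t1, t2, t3, t4):
    """Classic two-way time-sync estimate (as in PTP/NTP).

    t1: master sends sync message   (master clock)
    t2: slave receives it           (slave clock)
    t3: slave sends response        (slave clock)
    t4: master receives it          (master clock)
    Assumes a symmetric path delay; any asymmetry becomes offset error,
    which is one reason nanosecond-level timestamping is hard.
    """
    offset = ((t2 - t1) - (t4 - t3)) / 2.0  # slave clock minus master clock
    delay = ((t2 - t1) + (t4 - t3)) / 2.0   # one-way path delay estimate
    return offset, delay

# Hypothetical timestamps in nanoseconds: true offset +350 ns, true delay 800 ns.
offset, delay = estimate_offset_and_delay(1_000, 2_150, 2_400, 2_850)
print(f"estimated offset: {offset:.0f} ns, delay: {delay:.0f} ns")
```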

Efficient and Responsible Data Privacy

Tamalika Mukherjee, Purdue University (hosted by Yixin Zou)
10 Mar 2025, 10:00 am - 11:00 am
Bochum building MPI-SP, room MB1SMMW106
CIS@MPG Colloquium
Collecting user data is crucial for advancing machine learning, social science, and government policy, but the privacy of the users whose data is collected is a growing concern. Organizations often handle massive volumes of user data on a regular basis, and storing and analyzing such data is computationally expensive. Thus, developing algorithms that not only preserve formal privacy but also perform efficiently is both challenging and important. Since preserving privacy inherently involves some data distortion, which can sacrifice accuracy for smaller populations, a complementary challenge is to develop responsible privacy practices that ensure the resulting privacy implementations are equitable. My talk will focus on Differential Privacy (DP), a rigorous mathematical framework that preserves the privacy of individuals in the input dataset, and will explore the nuanced landscape of privacy-preserving algorithms through three interconnected perspectives: the systematic design of time-efficient private algorithms, the design of space-efficient private algorithms, and strategic approaches to creating equitable privacy practices.
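As background for the DP framework the talk centers on, here is a minimal sketch of the textbook Laplace mechanism for releasing a count under an (ε, 0)-DP guarantee. The dataset, the ε value, and the helper laplace_count are illustrative, not from the talk.

```python
import numpy as np

def laplace_count(data, predicate, epsilon, rng):
    """Release a count under epsilon-differential privacy.

    A counting query has L1 sensitivity 1 (adding or removing one
    person changes the count by at most 1), so Laplace noise with
    scale 1/epsilon gives an (epsilon, 0)-DP guarantee.
    """
    true_count = sum(1 for row in data if predicate(row))
    noise = rng.laplace(loc=0.0, scale=1.0 / epsilon)
    return true_count + noise

rng = np.random.default_rng(0)
ages = [23, 35, 41, 29, 62, 57, 33]  # illustrative dataset
noisy = laplace_count(ages, lambda a: a >= 40, epsilon=0.5, rng=rng)
print(f"noisy count of people aged 40+: {noisy:.2f}")
```

Smaller ε means stronger privacy and more noise, which is exactly the accuracy-for-privacy tradeoff the abstract notes hits small populations hardest.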

Improving Trustworthiness in Foundation Models: Assessing, Mitigating, and Analyzing ML Risks

Chulin Xie, University of Illinois Urbana-Champaign (hosted by Jana Hofmann)
12 Mar 2025, 10:00 am - 11:00 am
Bochum building MPI-SP, room MB1SMMW106
CIS@MPG Colloquium
As machine learning (ML) models continue to scale in size and capability, they expand the surface area for safety and privacy risks, raising concerns about model trustworthiness and responsible data use. My research uncovers and mitigates these risks. In this presentation, I will focus on the three cornerstones of trustworthy foundation models and agents: safety, privacy, and generalization. For safety, I will introduce our comprehensive benchmarks designed to evaluate trustworthiness risks in Large Language Models (LLMs) and LLM-based code agents. For privacy, I will present a solution for protecting data privacy with a synthetic text generation algorithm under differential privacy guarantees. The algorithm requires only LLM inference API access, without model training, enabling efficient and safe text sharing. For generalization, I will introduce our study on the interplay between memorization and generalization of LLMs in logical reasoning during the supervised fine-tuning (SFT) stage. Finally, I will conclude with my future research plans for assessing and improving trustworthiness in foundation-model-powered ML systems.
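The abstract mentions a DP synthetic-text algorithm that needs only inference API access. The sketch below shows one plausible private building block in that general style (a noisy nearest-neighbor vote over API-generated candidates, as in Private-Evolution-like methods); it is an assumption for illustration, not necessarily the speaker's algorithm, and the embeddings are random stand-ins for real text embeddings.

```python
import numpy as np

def dp_nearest_neighbor_votes(private_embs, candidate_embs, sigma, rng):
    """Sketch of a DP voting step used by some API-only synthetic-data
    methods: each private record votes for its nearest synthetic
    candidate. Each record touches exactly one bin, so the histogram
    has L2 sensitivity 1, and Gaussian noise of scale sigma privatizes
    it (sigma calibrated to the desired (epsilon, delta))."""
    # Squared Euclidean distance from every private record to every candidate.
    d2 = ((private_embs[:, None, :] - candidate_embs[None, :, :]) ** 2).sum(-1)
    votes = np.bincount(d2.argmin(axis=1), minlength=len(candidate_embs))
    return votes + rng.normal(0.0, sigma, size=votes.shape)

rng = np.random.default_rng(1)
private_embs = rng.normal(size=(100, 8))   # stand-ins for private text embeddings
candidate_embs = rng.normal(size=(10, 8))  # embeddings of LLM-generated candidates
noisy = dp_nearest_neighbor_votes(private_embs, candidate_embs, sigma=2.0, rng=rng)
print("keep/resample candidates with the highest noisy votes:", noisy.round(1))
```

Only the noisy vote counts cross the privacy boundary; the LLM is then asked (via its public API) to produce variations of the winning candidates, so no model training is needed.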

Recent events

Illuminating Generative AI: Mapping Knowledge in Large Language Models

Abhilasha Ravichander, University of Washington (hosted by Krishna Gummadi)
04 Mar 2025, 10:00 am - 11:00 am
Kaiserslautern building G26, room 111
CIS@MPG Colloquium
Millions of everyday users are interacting with technologies built on generative AI, such as voice assistants, search engines, and chatbots. While these AI-based systems are increasingly integrated into modern life, they can also magnify risks, inequities, and dissatisfaction when providers deploy unreliable systems. A primary obstacle to building reliable systems is the opacity of the underlying large language models: we lack a systematic understanding of how models work, where critical vulnerabilities may arise, why they are happening, and how models must be redesigned to address them. In this talk, I will first describe my work on investigating large language models to illuminate when and how they acquire knowledge and capabilities. Then, I will describe my work on building methods that enable greater data transparency for large language models, allowing stakeholders to make sense of the information available to models. Finally, I will describe my work on understanding how this information can get distorted in large language models, and its implications for building the next generation of robust AI systems.

Building the Tools to Program a Quantum Computer

Chenhui Yuan, MIT CSAIL (hosted by Catalin Hritcu)
24 Feb 2025, 10:00 pm - 11:00 pm
Bochum building MPI-SP, room MB1SMMW106
CIS@MPG Colloquium
Bringing the promise of quantum computation into reality requires not only building a quantum computer but also correctly programming it to run a quantum algorithm. To obtain asymptotic advantage over classical algorithms, quantum algorithms rely on the ability of data in quantum superposition to exhibit phenomena such as interference and entanglement. In turn, an implementation of the algorithm as a program must correctly orchestrate these phenomena in the states of qubits; otherwise, the algorithm yields incorrect outputs or loses its computational advantage. Given a quantum algorithm, what are the challenges and costs of realizing it as a program that can run on a physical quantum computer? In this talk, I answer this question by showing how basic programming abstractions upon which many quantum algorithms rely – such as data structures and control flow – can fail to work correctly or efficiently on a quantum computer. I then show how we can leverage insights from programming languages to reinvent the software stack of abstractions, libraries, and compilers to meet the demands of quantum algorithms. This approach holds out the promise of expressive and efficient tools to program a quantum computer and practically realize its computational advantage.
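For readers unfamiliar with the phenomena the abstract relies on, here is a minimal NumPy statevector sketch (purely illustrative, unrelated to the speaker's tooling) that builds an entangled Bell state and notes the control-flow pitfall the talk alludes to.

```python
import numpy as np

# Gates as unitary matrices over statevectors (qubit order |q0 q1>).
H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)   # Hadamard: creates superposition
CNOT = np.array([[1, 0, 0, 0],
                 [0, 1, 0, 0],
                 [0, 0, 0, 1],
                 [0, 0, 1, 0]])                 # flips q1 when q0 is |1>

ket00 = np.array([1, 0, 0, 0], dtype=complex)   # |00>
state = CNOT @ np.kron(H, np.eye(2)) @ ket00    # H on q0, then entangle with CNOT

# Result is the Bell state (|00> + |11>)/sqrt(2): the qubits are entangled,
# so no classical branch can read q0's "value" without collapsing the
# superposition; this is the kind of control-flow pitfall the talk targets.
print(np.round(state, 3))  # [0.707+0.j 0.+0.j 0.+0.j 0.707+0.j]
```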

On Fairness, Invariance and Memorization in Machine Decision and Deep Learning Algorithms

Till Speicher, Max Planck Institute for Software Systems
24 Feb 2025, 3:00 pm - 4:00 pm
Saarbrücken building E1 5, room 029
SWS Student Defense Talks - Thesis Defense
As learning algorithms become more capable, they are used to tackle an increasingly large spectrum of tasks. Their applications range from understanding images, speech, and natural language to making socially impactful decisions, such as determining people's eligibility for loans and jobs. It is therefore important to better understand both the consequences of algorithmic decisions and the mechanisms by which algorithms arrive at their outputs. Of particular interest in this regard are fairness, when algorithmic decisions impact people's lives, and the behavior of deep learning algorithms, the most powerful but also most opaque type of learning algorithm. To this end, this thesis makes two contributions. First, we study fairness in algorithmic decision-making. At a conceptual level, we introduce a metric for measuring unfairness in algorithmic decisions based on inequality indices from the economics literature. We show that this metric can be used to decompose the overall unfairness for a given set of users into between- and within-subgroup components, and we highlight potential tradeoffs between them, as well as between fairness and accuracy. At an empirical level, we demonstrate the necessity of studying fairness in algorithmically controlled systems by exposing the potential for discrimination enabled by Facebook's advertising platform. In this context, we demonstrate how advertisers can target ads so as to exclude users belonging to protected sensitive groups, a practice that is illegal in domains such as housing, employment, and finance, and we highlight the need for better mitigation methods.
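The inequality-index-based metric described above appears to be in the family of generalized entropy indices. As a hedged illustration of how such an index decomposes into between- and within-subgroup components, here is a minimal sketch with α = 2 and made-up benefit values; the benefit encoding is one common choice, not necessarily the thesis's exact definition.

```python
import numpy as np

def generalized_entropy(b, alpha=2):
    """Generalized entropy index of a benefit vector b.

    0 means everyone receives the same benefit; larger values
    mean more inequality across individuals.
    """
    b = np.asarray(b, dtype=float)
    mu = b.mean()
    return ((b / mu) ** alpha - 1).sum() / (len(b) * alpha * (alpha - 1))

def decompose(groups, alpha=2):
    """Split overall inequality into between- plus within-group parts.

    `groups` is a list of benefit arrays, one per subgroup. The two
    components sum to the overall index, which is what exposes
    tradeoffs between subgroup-level and overall unfairness.
    """
    all_b = np.concatenate(groups)
    n, mu = len(all_b), all_b.mean()
    # Between-group part: replace each member by their group's mean benefit.
    between = generalized_entropy(
        np.concatenate([np.full(len(g), np.mean(g)) for g in groups]), alpha)
    # Within-group part: weighted sum of each group's own inequality.
    within = sum((len(g) / n) * (np.mean(g) / mu) ** alpha
                 * generalized_entropy(g, alpha) for g in groups)
    return between, within

# Illustrative benefits: 1 = correct decision, 2 = undeserved favorable
# decision, 0 = undeserved unfavorable decision (one possible encoding).
g1, g2 = np.array([1, 1, 2, 1]), np.array([0, 1, 1, 0])
between, within = decompose([g1, g2])
print(f"overall={generalized_entropy(np.concatenate([g1, g2])):.4f}, "
      f"between={between:.4f}, within={within:.4f}")  # components sum to overall
```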

The second contribution of this thesis is aimed at better understanding the mechanisms governing the behavior of deep learning algorithms. First, we study the role that invariance plays in learning useful representations. We show that the set of invariances a representation possesses is of critical importance in determining whether it is useful for downstream tasks, more important than many other factors commonly considered to determine transfer performance. Second, we investigate memorization in large language models, which have recently become very popular. By training models to memorize random strings, we uncover a rich and surprising set of dynamics during the memorization process. We find that models undergo two phases during memorization, that strings with lower entropy are harder to memorize, that the memorization dynamics evolve over repeated memorization, and that models can recall tokens in random strings given only a very restricted amount of information.
