Alg-ML is a weekly machine learning theory seminar primarily attended by the research groups of
Prof. Sanjeev Arora, Prof. Elad Hazan, and Prof. Boris Hanin.
We discuss recent advances in algorithm design and theoretical machine learning.
Time: Tuesdays, 12:15–1:15 pm ET
Lunch: Usually at 12:00 pm
Location: CS 402
Open to all members of the Princeton community!
For spring 2026, the seminar is organized by Gon Buzaglo and Anand Brahmbhatt.
Subscribe to the alg-ml mailing list and the Google calendar.
Abstract: Training capable small language models is a central challenge, yet existing distillation methods treat teachers as static supervision sources. I argue that effective learning depends on how and when a small model learns from a larger one. I show that intermediate teacher checkpoints reveal implicit learning trajectories, and that aligning students to these trajectories yields provable sample-complexity benefits.
Abstract: Modern Large Language Models (LLMs) are typically based on Transformers and/or Structured State Space Models (SSMs), and tend to generalize well even under a distribution shift between training and test data. Conventional wisdom attributes this generalization to implicit biases induced by architectures and the gradient-based algorithms that train them. This talk will describe a series of works theoretically analyzing and empirically evaluating implicit biases in Transformers and SSMs. Beginning with Transformers, I will consider Reinforcement Learning with outcome-based supervision (as in, e.g., DeepSeek-R1), and show that on a graph traversal task, if training data includes simple examples then an implicit bias admits generalization via step-by-step reasoning (Chain-of-Thought), whereas if training data does not include simple examples then learning is intractable. Continuing to SSMs, I will consider a teacher-student setting, and show that if training data is generic then an implicit bias admits generalization, yet there are cleanly labeled examples whose inclusion in training entirely disrupts generalization. These findings carry a counterintuitive message: for both Transformers and SSMs, it is sometimes beneficial to deliberately introduce a distribution shift to training data. Further research into the potential benefits of distribution shifts for Transformers and SSMs may pave the way to more effective curricula for training modern LLMs.
Abstract: TBD
Abstract: TBD
Abstract: TBD
Abstract: TBD
Abstract: TBD
Abstract: TBD
Abstract: TBD
Abstract: TBD