Hi, I'm Hadi! I'm a second year Computer Science PhD student at Harvard. I am very fortunate to be advised by Prof. Flavio Calmon.
My research focuses on developing robust and scalable tools for AI post-training. I aim to re-engineer training and evaluation workflows to target complex capabilities directly, without relying on noisy proxies. By grounding these workflows in theoretical rigor, I hope to build AI systems capable of helping us address our most complex and important challenges.
Before starting my PhD, I was a research intern in the Economics Department at Harvard, where I worked on partial identification and algorithmic game theory. I also worked on machine learning for urban planning at the Beirut Urban Lab and on computational nuclear engineering at MIT as a Research Science Institute fellow.
I am always happy to chat about research. You can reach me at hadikhalaf [at] g dot harvard dot edu.

Selected News
09/25
Extremely happy to share that our work on reward hacking🔍 in large language models was accepted to NeurIPS 2025 as a Spotlight Paper! I am also grateful to have received the NeurIPS Scholar Award.
07/25
I am at ICML, presenting our work on reward hacking🔍 at the Models of Human Feedback for AI Alignment Workshop. Grateful to have been awarded the Hudson River Trading travel grant.
04/25
Our paper on discretion🔍 in AI alignment was accepted to ACM FAccT 2025!
03/25
I am at Yale, giving a talk on discretion🔍 in AI alignment. Happy to share that this work received the Best Paper Award at the New England NLP Workshop! You can find my slides here.
09/24
I joined Harvard as a PhD student in Flavio Calmon's group! Happy to be supported by the Harvard Graduate Prize Fellowship.
Publications
Neural Information Processing Systems (NeurIPS) 2025
Inference-Time Reward Hacking in Large Language Models
Spotlight Paper
HK, Claudio Mayrink Verdun, Alex Oesterling, Himabindu Lakkaraju, Flavio Calmon
TLDR We propose hedging as a lightweight and theoretically grounded strategy to mitigate reward hacking in inference-time alignment.
ACM Conference on Fairness, Accountability, and Transparency (FAccT) 2025
AI Alignment at Your Discretion
🏆 Best Paper Award at NENLP Workshop
Maarten Buyl, HK, Claudio Mayrink Verdun, Lucas Monteiro Paes, Caio C. Vieira Machado, Flavio Calmon
TLDR We risk deploying unsafe AI systems if we ignore their discretion in applying alignment objectives.
Projects
SafetyConflicts Dataset
TLDR We generate realistic user prompts that create conflicts and tradeoffs between principles in OpenAI's Model Spec. We also include reasoning traces and responses from three frontier reasoning models.