Publications
You can also find my articles on my Google Scholar profile.
Selected Publications
(♩ led the theoretical parts, ♫ co-led the empirical parts; * indicates equal contribution)
On the Ability of Transformers to Verify Plans. arXiv preprint 2026. [pdf]
Yash Sarrof♩, Yupei Du♫, Katharina Stein♫, Alexander Koller, Sylvie Thiébaux, and Michael Hahn.
TL;DR: We study which types of planning problems can be verified by Transformer language models in a length-generalizable way. Our results identify a large class of classical planning domains for which transformers can provably learn to verify long plans.
Reason to Rote: Rethinking Memorization in Reasoning. EMNLP 2025. [pdf]
Yupei Du, Philipp Mondorf, Silvia Casola, Yuekun Yao, Robert Litschko, and Barbara Plank.
TL;DR: We mechanistically study benign memorization in language models on reasoning tasks, and find that memorization does not replace generalization but rather builds on it.
Grokking ExPLAIND: Unifying Model, Data, and Training Attribution to Study Model Behavior. arXiv preprint 2025. [pdf]
Florian Eichin, Yupei Du, Philipp Mondorf, Barbara Plank, and Michael A. Hedderich.
TL;DR: We introduce ExPLAIND, an interpretability framework for jointly attributing model components, data, and training dynamics, and apply it to investigate grokking.
Language models can learn implicit multi-hop reasoning, but only if they have lots of training data. EMNLP 2025. [pdf]
Yuekun Yao, Yupei Du, Dawei Zhu, Michael Hahn*, and Alexander Koller*.
TL;DR: We study the implicit multi-hop reasoning capabilities of language models, and find that the amount of training data they require to perform well grows exponentially with reasoning depth, and that curriculum learning can substantially mitigate this.
FTFT: Efficient and Robust Fine-Tuning by Transferring Training Dynamics. COLING 2025. [pdf]
Yupei Du, Albert Gatt, and Dong Nguyen.
TL;DR: We show that the training dynamics of an efficient but weak model can be transferred to much more capable models to achieve better robustness and efficiency.
Understanding Gender Bias in Knowledge Base Embeddings. ACL 2022. [pdf]
Yupei Du, Qi Zheng, Yuanbin Wu, Man Lan, Yan Yang, and Meirong Ma.
TL;DR: We propose methods to both quantify gender bias in knowledge base embeddings and trace its origins, using a closed-form approximation of influence functions.
Exploring Human Gender Stereotypes with Word Association Test. EMNLP 2019. [pdf]
Yupei Du, Yuanbin Wu, and Man Lan.
TL;DR: We use label propagation to quantify and visualize how gender biases are transferred and reinforced through word associations, and we release a large-scale dataset of word-level gender bias scores.