publications

2025

Evaluating Intermediate Reasoning of Code-Assisted Large Language Models for Mathematics

Zena Al-Khalili, Nick Howell, and Dietrich Klakow

In Proceedings of the Workshop on Generation, Evaluation, and Metrics @ ACL 2025

arXiv