Evaluating Intermediate Reasoning of Code-Assisted Large Language Models for Mathematics. Zena Al-Khalili, Nick Howell, and Dietrich Klakow. In Proceedings of the Workshop on Generation, Evaluation, and Metrics @ ACL 2025, 2025.