Evaluating Grounded Reasoning by Code-Assisted Large Language Models for Mathematics. Zena Al-Khalili, Nick Howell, and Dietrich Klakow. arXiv, 2025.