Related papers: Which Data Attributes Stimulate Math and Code Reasoning? An Investigation via Influence Functions

Which Data Attributes Stimulate Math and Code Reasoning? An Investigation via Influence Functions

URL: http://arxiv.org/abs/2505.19949v1
Date: Mon, 26 May 2025 13:15:26 GMT
Title: Which Data Attributes Stimulate Math and Code Reasoning? An Investigation via Influence Functions
Authors: Siqi Kou, Qingyuan Tian, Hanwen Xu, Zihao Zeng, Zhijie Deng,
Abstract summary: Large language models (LLMs) have demonstrated remarkable reasoning capabilities in math and coding.<n>We leverage influence functions to attribute LLMs' reasoning ability on math and coding to individual training examples, sequences, and tokens.<n>High-difficulty math examples improve both math and code reasoning, while low-difficulty code tasks most effectively benefit code reasoning.
Score: 8.540135660509058
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Large language models (LLMs) have demonstrated remarkable reasoning capabilities in math and coding, often bolstered by post-training on the chain-of-thoughts (CoTs) generated by stronger models. However, existing strategies for curating such training data predominantly rely on heuristics, limiting generalizability and failing to capture subtleties underlying in data. To address these limitations, we leverage influence functions to systematically attribute LLMs' reasoning ability on math and coding to individual training examples, sequences, and tokens, enabling deeper insights into effective data characteristics. Our Influence-based Reasoning Attribution (Infra) uncovers nontrivial cross-domain effects across math and coding tasks: high-difficulty math examples improve both math and code reasoning, while low-difficulty code tasks most effectively benefit code reasoning. Based on these findings, we introduce a simple yet effective dataset reweighting strategy by flipping task difficulty, which doubles AIME24 accuracy from 10\% to 20\% and boosts LiveCodeBench accuracy from 33.8\% to 35.3\% for Qwen2.5-7B-Instruct. Moreover, our fine-grained attribution reveals that the sequence-level exploratory behaviors enhance reasoning performance in both math and code, and the token-level influence patterns are distinct for math and code reasoning: the former prefers natural language logic connectors and the latter emphasizes structural syntax.

Related papers

CodeReasoner: Enhancing the Code Reasoning Ability with Reinforcement Learning [8.197518276987989]
Code reasoning is a fundamental capability for large language models (LLMs) in the code domain.<n>Prior approaches mainly rely on supervised fine-tuning to improve performance in code reasoning tasks.<n>We argue this is due to two core issues: the low quality of training data and the limitations of supervised fine-tuning.<n>We propose CodeReasoner, a framework that spans both dataset construction and a two-stage training process.
arXiv Detail & Related papers (2025-07-23T14:26:58Z)
Improving LLMs' Generalized Reasoning Abilities by Graph Problems [31.07779603207159]
Graph Problem Reasoning (GPR) tasks require sophisticated logical and relational reasoning.<n>We introduce GraphPile, the first large-scale corpus specifically designed for CPT using GPR data.<n>We train GraphMind on popular base models Llama 3 and 3.1, as well as Gemma 2, achieving up to 4.9 percent higher accuracy in mathematical reasoning.
arXiv Detail & Related papers (2025-07-23T03:19:57Z)
Teaching LLM to Reason: Reinforcement Learning from Algorithmic Problems without Code [76.80306464249217]
We propose TeaR, which aims at teaching LLMs to reason better.<n>TeaR leverages careful data curation and reinforcement learning to guide models in discovering optimal reasoning paths through code-related tasks.<n>We conduct extensive experiments using two base models and three long-CoT distillation models, with model sizes ranging from 1.5 billion to 32 billion parameters, and across 17 benchmarks spanning Math, Knowledge, Code, and Logical Reasoning.
arXiv Detail & Related papers (2025-07-10T07:34:05Z)
Learning Efficient and Generalizable Graph Retriever for Knowledge-Graph Question Answering [75.12322966980003]
Large Language Models (LLMs) have shown strong inductive reasoning ability across various domains.<n>Most existing RAG pipelines rely on unstructured text, limiting interpretability and structured reasoning.<n>Recent studies have explored integrating knowledge graphs with LLMs for knowledge graph question answering.<n>We propose RAPL, a novel framework for efficient and effective graph retrieval in KGQA.
arXiv Detail & Related papers (2025-06-11T12:03:52Z)
ConciseRL: Conciseness-Guided Reinforcement Learning for Efficient Reasoning Models [14.403953640255823]
We introduce a novel score used as a reward signal within a reinforcement learning framework to guide models toward generating correct and concise reasoning traces.<n>This score is evaluated by a large language model acting as a judge, enabling dynamic, context-aware feedback beyond simple token length.<n>Our method achieves state-of-the-art efficiency-accuracy trade-offs on the MATH dataset, reducing token usage by up to 31x on simple problems while improving accuracy by 7%, and on the hardest problems, it outperforms full reasoning by +7.5% accuracy with up to 3.6x fewer tokens.
arXiv Detail & Related papers (2025-05-22T19:56:35Z)
Is Compression Really Linear with Code Intelligence? [60.123628177110206]
textitFormat Annealing is a lightweight, transparent training methodology designed to assess the intrinsic capabilities of pre-trained models equitably.<n>Our empirical results reveal a fundamental logarithmic relationship between measured code intelligence and bits-per-character (BPC)<n>Our work provides a more nuanced understanding of compression's role in developing code intelligence and contributes a robust evaluation framework in the code domain.
arXiv Detail & Related papers (2025-05-16T16:59:14Z)
Sketch-of-Thought: Efficient LLM Reasoning with Adaptive Cognitive-Inspired Sketching [60.04718679054704]
Chain-of-Thought prompting elicits step-by-step problem solving, but often at the cost of excessive verbosity in intermediate outputs.<n>We propose Sketch-of-Thought (SoT), a prompting framework that integrates cognitively inspired reasoning paradigms with linguistic constraints.<n>SoT achieves token reductions of up to 78% with minimal accuracy loss across 15 reasoning datasets.
arXiv Detail & Related papers (2025-03-07T06:57:17Z)
CoinMath: Harnessing the Power of Coding Instruction for Math LLMs [34.07259769892295]
Large Language Models (LLMs) have shown strong performance in solving mathematical problems.<n>Best practice to leverage coding instruction data to enhance mathematical reasoning remains underexplored.<n> CoinMath generates a variety of code-based rationales incorporating concise comments, descriptive naming conventions, and hardcoded solutions.
arXiv Detail & Related papers (2024-12-16T12:21:11Z)
LLM Critics Help Catch Bugs in Mathematics: Towards a Better Mathematical Verifier with Natural Language Feedback [71.95402654982095]
We propose Math-Minos, a natural language feedback-enhanced verifier. Our experiments reveal that a small set of natural language feedback can significantly boost the performance of the verifier.
arXiv Detail & Related papers (2024-06-20T06:42:27Z)
Data Interpreter: An LLM Agent For Data Science [43.13678782387546]
Large Language Model (LLM)-based agents have shown effectiveness across many applications. However, their use in data science scenarios requiring solving long-term interconnected tasks, dynamic data adjustments and domain expertise remains challenging. We present Data Interpreter, an LLM-based agent designed to automatically solve various data science problems end-to-end.
arXiv Detail & Related papers (2024-02-28T19:49:55Z)
When Do Program-of-Thoughts Work for Reasoning? [51.2699797837818]
We propose complexity-impacted reasoning score (CIRS) to measure correlation between code and reasoning abilities. Specifically, we use the abstract syntax tree to encode the structural information and calculate logical complexity. Code will be integrated into the EasyInstruct framework at https://github.com/zjunlp/EasyInstruct.
arXiv Detail & Related papers (2023-08-29T17:22:39Z)

This list is automatically generated from the titles and abstracts of the papers in this site.