Masked Thought: Simply Masking Partial Reasoning Steps Can Improve Mathematical Reasoning Learning of Language Models
- URL: http://arxiv.org/abs/2403.02178v2
- Date: Wed, 10 Jul 2024 19:15:24 GMT
- Title: Masked Thought: Simply Masking Partial Reasoning Steps Can Improve Mathematical Reasoning Learning of Language Models
- Authors: Changyu Chen, Xiting Wang, Ting-En Lin, Ang Lv, Yuchuan Wu, Xin Gao, Ji-Rong Wen, Rui Yan, Yongbin Li
- Abstract summary: In reasoning tasks, even a minor error can cascade into inaccurate results.
We develop a method that avoids external resources, relying instead on introducing perturbations to the input.
Our training approach randomly masks certain tokens within the chain of thought, a technique we found to be particularly effective for reasoning tasks.
- Score: 102.72940700598055
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In reasoning tasks, even a minor error can cascade into inaccurate results, leading to suboptimal performance of large language models in such domains. Earlier fine-tuning approaches sought to mitigate this by leveraging more precise supervisory signals from human labeling, larger models, or self-sampling, although at a high cost. Conversely, we develop a method that avoids external resources, relying instead on introducing perturbations to the input. Our training approach randomly masks certain tokens within the chain of thought, a technique we found to be particularly effective for reasoning tasks. When applied to fine-tuning with GSM8K on Llama-2-7B, this method achieves a 5% improvement in GSM8K accuracy and a 10% improvement in GSM-IC accuracy over standard supervised fine-tuning, with only a few lines of code modified. Furthermore, it is complementary to existing methods. When integrated with related explicit data-augmentation methods, it yields further improvements across five datasets covering various augmentation methods, as well as two different base models. We further investigate the mechanisms behind this improvement through case studies and quantitative analysis, suggesting that our approach may better support the model in capturing long-distance dependencies, particularly those related to the question. This enhancement could deepen the model's understanding of the premises in the question and in prior steps. Our code is available on GitHub.
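The core idea — randomly masking tokens inside the chain-of-thought portion of a training example while leaving the training targets untouched — can be sketched in a few lines. The snippet below is a minimal illustration under stated assumptions, not the authors' released code: the masking ratio, the choice of mask token, the helper names (mask_chain_of_thought, MaskedThoughtExample), and the assumption that the chain-of-thought span boundaries are known are all illustrative.

```python
import random
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class MaskedThoughtExample:
    input_ids: List[int]  # prompt + chain-of-thought + answer, with some CoT tokens masked
    labels: List[int]     # original tokens; prompt positions ignored with -100


def mask_chain_of_thought(
    input_ids: List[int],
    prompt_len: int,          # number of question/prompt tokens at the front
    cot_end: int,             # index one past the last chain-of-thought token
    mask_token_id: int,       # id of a [MASK]-style placeholder token (an assumption)
    mask_prob: float = 0.2,   # illustrative masking ratio (an assumption)
    seed: Optional[int] = None,
) -> MaskedThoughtExample:
    """Randomly replace a fraction of chain-of-thought tokens in the *input*
    with a mask token; the labels keep the original tokens, so the usual
    next-token cross-entropy objective is otherwise unchanged."""
    rng = random.Random(seed)
    masked = list(input_ids)
    for i in range(prompt_len, cot_end):
        if rng.random() < mask_prob:
            masked[i] = mask_token_id
    # Standard SFT convention: compute the loss only on the response portion.
    labels = [-100] * prompt_len + list(input_ids[prompt_len:])
    return MaskedThoughtExample(input_ids=masked, labels=labels)


# Example: tokens 0-3 are the question, 4-9 the reasoning steps, 10-11 the answer.
example = mask_chain_of_thought(
    input_ids=list(range(100, 112)),
    prompt_len=4,
    cot_end=10,
    mask_token_id=0,
    mask_prob=0.3,
    seed=7,
)
print(example.input_ids)
print(example.labels)
```

In a typical fine-tuning pipeline this logic would live in a data collator applied per batch, so each epoch sees a different random mask; the paper's exact masking ratio, schedule, and choice of maskable spans may differ from the illustrative values above.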
Related papers
- SRA-MCTS: Self-driven Reasoning Augmentation with Monte Carlo Tree Search for Code Generation [14.786100203787194]
Large language models demonstrate exceptional performance in simple code generation tasks but face challenges in tackling complex problems.
We propose a reasoning-augmented data generation process, SRA-MCTS, which guides the model to autonomously generate high-quality intermediate reasoning paths.
Our method operates entirely through the model itself without requiring additional supervision.
arXiv Detail & Related papers (2024-11-17T12:31:04Z) - Step Guided Reasoning: Improving Mathematical Reasoning using Guidance Generation and Step Reasoning [7.702162381335683]
Step-by-step Chain-of-Thought (CoT) inference has advanced the mathematical capabilities of large language models (LLMs).
We propose a novel method, Step Guidance Reasoning, which requires no further model fine-tuning.
Our method significantly improves math performance, raising accuracy on the AMC23 dataset from 30% to 57.5% (a relative improvement of 91.7%) and, on sampled level-5 problems from the MATH dataset, from 43% to 67% (a relative improvement of 55.8%).
arXiv Detail & Related papers (2024-10-18T01:38:24Z) - Enhancing Multi-Step Reasoning Abilities of Language Models through Direct Q-Function Optimization [50.485788083202124]
Reinforcement Learning (RL) plays a crucial role in aligning large language models with human preferences and improving their ability to perform complex tasks.
We introduce Direct Q-function Optimization (DQO), which formulates the response generation process as a Markov Decision Process (MDP) and utilizes the soft actor-critic (SAC) framework to optimize a Q-function directly parameterized by the language model.
Experimental results on two math problem-solving datasets, GSM8K and MATH, demonstrate that DQO outperforms previous methods, establishing it as a promising offline reinforcement learning approach for aligning language models.
arXiv Detail & Related papers (2024-10-11T23:29:20Z) - Building Math Agents with Multi-Turn Iterative Preference Learning [56.71330214021884]
This paper studies the complementary direct preference learning approach to further improve model performance.
Existing direct preference learning algorithms are originally designed for the single-turn chat task.
We introduce a multi-turn direct preference learning framework, tailored for this context.
arXiv Detail & Related papers (2024-09-04T02:41:04Z) - SIaM: Self-Improving Code-Assisted Mathematical Reasoning of Large Language Models [54.78329741186446]
We propose a novel paradigm that uses a code-based critic model to guide steps including question-code data construction, quality control, and complementary evaluation.
Experiments across both in-domain and out-of-domain benchmarks in English and Chinese demonstrate the effectiveness of the proposed paradigm.
arXiv Detail & Related papers (2024-08-28T06:33:03Z) - Large-Scale Meta-Learning with Continual Trajectory Shifting [76.29017270864308]
We show that allowing the meta-learners to take a larger number of inner gradient steps better captures the structure of heterogeneous and large-scale tasks.
In order to increase the frequency of meta-updates, we propose to estimate the required shift of the task-specific parameters.
We show that the algorithm largely outperforms the previous first-order meta-learning methods in terms of both generalization performance and convergence.
arXiv Detail & Related papers (2021-02-14T18:36:33Z) - Logic-Guided Data Augmentation and Regularization for Consistent Question Answering [55.05667583529711]
This paper addresses the problem of improving the accuracy and consistency of responses to comparison questions.
Our method leverages logical and linguistic knowledge to augment labeled training data and then uses a consistency-based regularizer to train the model.
arXiv Detail & Related papers (2020-04-21T17:03:08Z)