Critical Tokens Matter: Token-Level Contrastive Estimation Enhances LLM's Reasoning Capability
- URL: http://arxiv.org/abs/2411.19943v3
- Date: Mon, 13 Jan 2025 06:53:56 GMT
- Title: Critical Tokens Matter: Token-Level Contrastive Estimation Enhances LLM's Reasoning Capability
- Authors: Zicheng Lin, Tian Liang, Jiahao Xu, Qiuzhi Lin, Xing Wang, Ruilin Luo, Chufan Shi, Siheng Li, Yujiu Yang, Zhaopeng Tu
- Abstract summary: Critical tokens are elements within reasoning trajectories that significantly influence incorrect outcomes.
We present a novel framework for identifying these tokens through rollout sampling.
We show that identifying and replacing critical tokens significantly improves model accuracy.
- Score: 53.51560766150442
- Abstract: Mathematical reasoning tasks pose significant challenges for large language models (LLMs) because they require precise logical deduction and sequence analysis. In this work, we introduce the concept of critical tokens -- elements within reasoning trajectories that significantly influence incorrect outcomes. We present a novel framework for identifying these tokens through rollout sampling and demonstrate their substantial divergence from traditional error tokens. Through extensive experiments on datasets such as GSM8K and MATH500, we show that identifying and replacing critical tokens significantly improves model accuracy. We propose an efficient methodology for pinpointing these tokens in large-scale datasets using contrastive estimation and extend this framework to enhance model training processes with direct preference optimization (DPO). Experimental results on GSM8K and MATH500 benchmarks with the widely used models Llama-3 (8B and 70B) and Deepseek-math (7B) demonstrate the effectiveness of the proposed approach, cDPO. Our results underscore the potential of leveraging critical tokens to reduce errors in reasoning tasks, advancing the development of AI systems capable of robust logical deduction. Our code, annotated datasets, and trained models are available at https://github.com/chenzhiling9954/Critical-Tokens-Matter to support and encourage future research in this promising field.
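The sketch below is an illustrative reading of the abstract, not the authors' released implementation (which is at the linked GitHub repository). It shows (1) how rollout sampling could flag a critical token in an incorrect trajectory by measuring how sharply the chance of recovering a correct answer drops once that token is committed, and (2) how contrastive estimation between a positively and a negatively fine-tuned model could yield token-level weights for a cDPO-style objective. The `model`/`tokenizer` objects are assumed Hugging Face-style interfaces, `verifier` is an assumed dataset-specific answer checker, `pos_model`/`neg_model` are assumed copies of the base model fine-tuned on correct and incorrect trajectories, and the thresholds and sample counts are placeholders.

```python
# Illustrative sketch only; helper names and hyperparameters are assumptions,
# not the method as released by the authors.
import torch


def rollout_accuracy(model, tokenizer, question_ids, trajectory_ids, prefix_len,
                     verifier, num_rollouts=16, max_new_tokens=256):
    """Estimate P(correct | question + first `prefix_len` trajectory tokens)
    by sampling continuations and verifying their final answers."""
    prompt = torch.cat([question_ids, trajectory_ids[:prefix_len]]).unsqueeze(0)
    hits = 0
    for _ in range(num_rollouts):
        out = model.generate(prompt, do_sample=True, max_new_tokens=max_new_tokens)
        hits += int(verifier(tokenizer.decode(out[0], skip_special_tokens=True)))
    return hits / num_rollouts


def find_critical_tokens(model, tokenizer, question_ids, bad_trajectory_ids,
                         verifier, drop_threshold=0.25):
    """Flag positions in an incorrect trajectory where committing one more token
    sharply reduces the chance of still reaching a correct answer."""
    critical = []
    prev_acc = rollout_accuracy(model, tokenizer, question_ids,
                                bad_trajectory_ids, 0, verifier)
    for t in range(1, len(bad_trajectory_ids) + 1):
        acc = rollout_accuracy(model, tokenizer, question_ids,
                               bad_trajectory_ids, t, verifier)
        if prev_acc - acc >= drop_threshold:
            critical.append(t - 1)  # index of the token that caused the drop
        prev_acc = acc
    return critical


def contrastive_token_weights(pos_model, neg_model, trajectory_ids):
    """Token-level contrastive estimation: tokens that the negatively fine-tuned
    model prefers far more than the positively fine-tuned one receive larger
    weights, which can then scale the per-token terms of a DPO-style (cDPO) loss."""
    ids = trajectory_ids.unsqueeze(0)
    with torch.no_grad():
        pos_lp = pos_model(ids).logits.log_softmax(-1)
        neg_lp = neg_model(ids).logits.log_softmax(-1)
    targets = ids[:, 1:].unsqueeze(-1)                  # token predicted at each step
    pos_tok = pos_lp[:, :-1].gather(-1, targets).squeeze(-1)
    neg_tok = neg_lp[:, :-1].gather(-1, targets).squeeze(-1)
    return torch.softmax(neg_tok - pos_tok, dim=-1)     # higher = more likely critical
```

The rollout stage is what makes per-token annotation expensive at scale, which is why the abstract proposes contrastive estimation as the cheaper proxy for pinpointing critical tokens in large datasets before training with DPO.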
Related papers
- Improving the portability of predicting students' performance models by using ontologies [0.0]
One of the main current challenges in Educational Data Mining and Learning Analytics is the portability of predictive models.
We propose using an ontology built on a taxonomy of actions that summarises students' interactions with the Moodle learning management system.
Results indicate that the use of the proposed ontology improves the portability of the models in terms of predictive accuracy.
arXiv Detail & Related papers (2024-10-09T18:18:54Z) - SIaM: Self-Improving Code-Assisted Mathematical Reasoning of Large Language Models [54.78329741186446]
We propose a novel paradigm that uses a code-based critic model to guide steps including question-code data construction, quality control, and complementary evaluation.
Experiments across both in-domain and out-of-domain benchmarks in English and Chinese demonstrate the effectiveness of the proposed paradigm.
arXiv Detail & Related papers (2024-08-28T06:33:03Z) - Token-Supervised Value Models for Enhancing Mathematical Reasoning Capabilities of Large Language Models [35.29961848648335]
Large Language Models (LLMs) have demonstrated impressive problem-solving capabilities in mathematics through step-by-step reasoning chains.
They are susceptible to reasoning errors that degrade the quality of subsequent reasoning steps and the final answer, due to their autoregressive, token-by-token generation.
Recent works have proposed adopting external verifiers to guide the generation of reasoning paths, but these rely on models trained with step-by-step labels.
arXiv Detail & Related papers (2024-07-12T13:16:50Z) - MR-Ben: A Meta-Reasoning Benchmark for Evaluating System-2 Thinking in LLMs [55.20845457594977]
Large language models (LLMs) have shown increasing capability in problem-solving and decision-making.
We present a process-based benchmark MR-Ben that demands a meta-reasoning skill.
Our meta-reasoning paradigm is especially suited for system-2 slow thinking.
arXiv Detail & Related papers (2024-06-20T03:50:23Z) - Masked Thought: Simply Masking Partial Reasoning Steps Can Improve Mathematical Reasoning Learning of Language Models [102.72940700598055]
In reasoning tasks, even a minor error can cascade into inaccurate results.
We develop a method that avoids introducing external resources, relying instead on perturbations to the input.
Our training approach randomly masks certain tokens within the chain of thought, a technique we found to be particularly effective for reasoning tasks.
arXiv Detail & Related papers (2024-03-04T16:21:54Z) - MARIO: MAth Reasoning with code Interpreter Output -- A Reproducible Pipeline [12.186691561822256]
We postulate that the inherent nature of large language models (LLMs) presents challenges in modeling mathematical reasoning.
This paper introduces a novel math dataset, enhanced with a capability to utilize a Python code interpreter.
We propose a tentative, easily replicable protocol for the fine-tuning of math-specific LLMs.
arXiv Detail & Related papers (2024-01-16T08:08:01Z) - Let's reward step by step: Step-Level reward model as the Navigators for Reasoning [64.27898739929734]
Process-Supervised Reward Model (PRM) furnishes LLMs with step-by-step feedback during the training phase.
We propose a greedy search algorithm that employs the step-level feedback from PRM to optimize the reasoning pathways explored by LLMs.
To explore the versatility of our approach, we develop a novel method to automatically generate a step-level reward dataset for coding tasks and observe similarly improved performance on code generation tasks.
arXiv Detail & Related papers (2023-10-16T05:21:50Z) - Identifying and Mitigating Spurious Correlations for Improving Robustness in NLP Models [19.21465581259624]
Many problems can be attributed to models exploiting spurious correlations, or shortcuts between the training data and the task labels.
In this paper, we aim to automatically identify such spurious correlations in NLP models at scale.
We show that our proposed method can effectively and efficiently identify a scalable set of "shortcuts", and mitigating these leads to more robust models in multiple applications.
arXiv Detail & Related papers (2021-10-14T21:40:03Z) - ALT-MAS: A Data-Efficient Framework for Active Testing of Machine Learning Algorithms [58.684954492439424]
We propose a novel framework to efficiently test a machine learning model using only a small amount of labeled test data.
The idea is to estimate the metrics of interest for a model under test using a Bayesian neural network (BNN).
arXiv Detail & Related papers (2021-04-11T12:14:04Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences arising from its use.