Fine-Tuning with Divergent Chains of Thought Boosts Reasoning Through Self-Correction in Language Models
- URL: http://arxiv.org/abs/2407.03181v1
- Date: Wed, 3 Jul 2024 15:01:18 GMT
- Title: Fine-Tuning with Divergent Chains of Thought Boosts Reasoning Through Self-Correction in Language Models
- Authors: Haritz Puerto, Tilek Chubakov, Xiaodan Zhu, Harish Tayyar Madabushi, Iryna Gurevych,
- Abstract summary: We present a novel method of further improving performance by requiring models to compare multiple reasoning chains.
We find that instruction tuning on DCoT datasets boosts the performance of even smaller, and therefore more accessible, language models.
- Score: 63.36637269634553
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Requiring a Large Language Model to generate intermediary reasoning steps has been shown to be an effective way of boosting performance. In fact, it has been found that instruction tuning on these intermediary reasoning steps improves model performance. In this work, we present a novel method of further improving performance by requiring models to compare multiple reasoning chains before generating a solution in a single inference step. We call this method Divergent CoT (DCoT). We find that instruction tuning on DCoT datasets boosts the performance of even smaller, and therefore more accessible, LLMs. Through a rigorous set of experiments spanning a wide range of tasks that require various reasoning types, we show that fine-tuning on DCoT consistently improves performance over the CoT baseline across model families and scales (1.3B to 70B). Through a combination of empirical and manual evaluation, we additionally show that these performance gains stem from models generating multiple divergent reasoning chains in a single inference step, indicative of the enabling of self-correction in language models. Our code and data are publicly available at https://github.com/UKPLab/arxiv2024-divergent-cot.
Related papers
- SCOUT: Teaching Pre-trained Language Models to Enhance Reasoning via Flow Chain-of-Thought [37.53215651690168]
Chain of Thought (CoT) prompting improves the reasoning performance of large language models (LLMs) by encouraging step by step thinking.<n>While promising, CoT-based approaches often require costly pretraining and lack a principled framework for how reasoning should evolve.<n>We propose SCOUT, a lightweight fine tuning framework that enables Flow CoT style reasoning without the need for pretraining.
arXiv Detail & Related papers (2025-05-30T03:43:24Z) - Fractured Chain-of-Thought Reasoning [61.647243580650446]
We introduce Fractured Sampling, a unified inference-time strategy that interpolates between full CoT and solution-only sampling.<n>We show that Fractured Sampling consistently achieves superior accuracy-cost trade-offs, yielding steep log-linear scaling gains in Pass@k versus token budget.
arXiv Detail & Related papers (2025-05-19T11:30:41Z) - Platonic Grounding for Efficient Multimodal Language Models [22.715168904364756]
We motivate and propose a simple modification to existing multimodal frameworks that rely on aligning pretrained models.
Our work also has implications for combining pretrained models into larger systems efficiently.
arXiv Detail & Related papers (2025-04-27T18:56:26Z) - Enhanced OoD Detection through Cross-Modal Alignment of Multi-Modal Representations [2.992602379681373]
We show that multi-modal fine-tuning can achieve notable OoDD performance.
We propose a training objective that enhances cross-modal alignment by regularizing the distances between image and text embeddings of ID data.
arXiv Detail & Related papers (2025-03-24T16:00:21Z) - Will Pre-Training Ever End? A First Step Toward Next-Generation Foundation MLLMs via Self-Improving Systematic Cognition [89.50068130832635]
Self-Improving cognition (SIcog) is a self-learning framework for constructing next-generation foundation MLLMs by multimodal knowledge.<n>We propose Chain-of-Description for step-by-step visual understanding and integrate structured Chain-of-Thought (CoT) reasoning to support in-depth multimodal reasoning.<n>Experiments demonstrate SIcog's effectiveness in developing MLLMs with enhanced multimodal cognition.
arXiv Detail & Related papers (2025-03-16T00:25:13Z) - Don't Take Things Out of Context: Attention Intervention for Enhancing Chain-of-Thought Reasoning in Large Language Models [32.71672086718058]
Few-shot Chain-of-Thought (CoT) significantly enhances the reasoning capabilities of large language models (LLMs)<n>We observe that isolated segments, words, or tokens within CoT demonstrations can unexpectedly disrupt the generation process of LLMs.<n>We propose a Few-shot Attention Intervention method (FAI) that dynamically analyzes the attention patterns of demonstrations to accurately identify these tokens.
arXiv Detail & Related papers (2025-03-14T07:46:33Z) - Rethinking Chain-of-Thought from the Perspective of Self-Training [10.722453877596998]
Chain-of-thought (CoT) reasoning has emerged as an effective approach for activating latent capabilities in LLMs.<n>We propose a novel CoT framework to improve reasoning performance.<n>Our framework integrates two key components: (i) a task-specific prompt module that optimize the initial reasoning process, and (ii) an adaptive reasoning module that dynamically refines the reasoning process.
arXiv Detail & Related papers (2024-12-14T13:12:50Z) - DRPruning: Efficient Large Language Model Pruning through Distributionally Robust Optimization [61.492590008258986]
Large language models (LLMs) deliver impressive results but face challenges from increasing model sizes and computational costs.
We propose DRPruning, which incorporates distributionally robust optimization to restore balanced performance across domains.
arXiv Detail & Related papers (2024-11-21T12:02:39Z) - SRA-MCTS: Self-driven Reasoning Augmentation with Monte Carlo Tree Search for Code Generation [14.786100203787194]
Large language models demonstrate exceptional performance in simple code generation tasks but face challenges in tackling complex problems.
We propose a reasoning-augmented data generation process, SRA-MCTS, which guides the model to autonomously generate high-quality intermediate reasoning paths.
Our method operates entirely through the model itself without requiring additional supervision.
arXiv Detail & Related papers (2024-11-17T12:31:04Z) - Optimizing Chain-of-Thought Reasoning: Tackling Arranging Bottleneck via Plan Augmentation [34.042565099565934]
We propose a plan-based training and reasoning method that guides models to generate arranging steps through abstract plans.
Results show that compared to fine-tuning directly with CoT data, our approach achieves a better performance on alleviating arranging bottleneck.
arXiv Detail & Related papers (2024-10-22T08:38:50Z) - COrAL: Order-Agnostic Language Modeling for Efficient Iterative Refinement [80.18490952057125]
Iterative refinement has emerged as an effective paradigm for enhancing the capabilities of large language models (LLMs) on complex tasks.
We propose Context-Wise Order-Agnostic Language Modeling (COrAL) to overcome these challenges.
Our approach models multiple token dependencies within manageable context windows, enabling the model to perform iterative refinement internally.
arXiv Detail & Related papers (2024-10-12T23:56:19Z) - Strategic Chain-of-Thought: Guiding Accurate Reasoning in LLMs through Strategy Elicitation [16.350747493026432]
The Chain-of-Thought (CoT) paradigm has emerged as a critical approach for enhancing the reasoning capabilities of large language models (LLMs)
We propose the textbfStrategic Chain-of-Thought (SCoT) to refine LLM performance by integrating strategic knowledge prior to generating intermediate reasoning steps.
SCoT employs a two-stage approach within a single prompt: first eliciting an effective problem-solving strategy, which is then used to guide the generation of high-quality CoT paths and final answers.
arXiv Detail & Related papers (2024-09-05T06:28:05Z) - Pattern-Aware Chain-of-Thought Prompting in Large Language Models [26.641713417293538]
Chain-of-thought (CoT) prompting can guide language models to engage in complex multi-step reasoning.
We show that the underlying reasoning patterns play a more crucial role in such tasks.
We propose Pattern-Aware CoT, a prompting method that considers the diversity of demonstration patterns.
arXiv Detail & Related papers (2024-04-23T07:50:00Z) - ChainLM: Empowering Large Language Models with Improved Chain-of-Thought Prompting [124.69672273754144]
Chain-of-Thought (CoT) prompting can enhance the reasoning capabilities of large language models (LLMs)
Existing CoT approaches usually focus on simpler reasoning tasks and thus result in low-quality and inconsistent CoT prompts.
We introduce CoTGenius, a novel framework designed for the automatic generation of superior CoT prompts.
arXiv Detail & Related papers (2024-03-21T11:34:26Z) - DPPA: Pruning Method for Large Language Model to Model Merging [39.13317231533299]
We introduce a dual-stage method termed Dynamic Pruning Partition Amplification (DPPA) to tackle the challenge of merging complex fine-tuned models.
We show that our method maintains a mere 20% of domain-specific parameters and yet delivers a performance comparable to other methodologies.
Our method displays outstanding performance post-pruning, leading to a significant improvement of nearly 20% performance in model merging.
arXiv Detail & Related papers (2024-03-05T09:12:49Z) - Step-On-Feet Tuning: Scaling Self-Alignment of LLMs via Bootstrapping [53.454408491386886]
bootstrapping self-alignment markedly surpasses the single-round approach.
We propose Step-On-Feet Tuning (SOFT) which leverages model's continuously enhanced few-shot ability to boost zero or one-shot performance.
Based on easy-to-hard training recipe, we propose SOFT+ which further boost self-alignment's performance.
arXiv Detail & Related papers (2024-02-12T12:30:42Z) - CRaSh: Clustering, Removing, and Sharing Enhance Fine-tuning without
Full Large Language Model [22.870512676002463]
This paper focuses on Offsite-Tuning (OFT), a representative technique that transfers transformer blocks between centralized LLMs and downstream emulators.
Inspired by these observations, we propose CRaSh, involving Clustering, Removing, and Sharing, a training-free strategy to derive improved emulators from LLMs.
Our findings demonstrate a linear connectivity among these optima falling over the same basin, thereby highlighting the effectiveness of CRaSh and OFT.
arXiv Detail & Related papers (2023-10-24T03:08:58Z) - Amortizing intractable inference in large language models [56.92471123778389]
We use amortized Bayesian inference to sample from intractable posterior distributions.
We empirically demonstrate that this distribution-matching paradigm of LLM fine-tuning can serve as an effective alternative to maximum-likelihood training.
As an important application, we interpret chain-of-thought reasoning as a latent variable modeling problem.
arXiv Detail & Related papers (2023-10-06T16:36:08Z) - Differentiable Retrieval Augmentation via Generative Language Modeling
for E-commerce Query Intent Classification [8.59563091603226]
We propose Differentiable Retrieval Augmentation via Generative lANguage modeling(Dragan) to address this problem by a novel differentiable reformulation.
We demonstrate the effectiveness of our proposed method on a challenging NLP task in e-commerce search, namely query intent classification.
arXiv Detail & Related papers (2023-08-18T05:05:35Z) - Mixture-of-Experts Meets Instruction Tuning:A Winning Combination for
Large Language Models [125.91897197446379]
We find that MoE models benefit more from instruction tuning than dense models.
Our most powerful model, FLAN-MOE-32B, surpasses the performance of FLAN-PALM-62B on four benchmark tasks.
arXiv Detail & Related papers (2023-05-24T04:22:26Z) - Enhancing Chain-of-Thoughts Prompting with Iterative Bootstrapping in Large Language Models [81.01397924280612]
Large language models (LLMs) can achieve highly effective performance on various reasoning tasks by incorporating step-by-step chain-of-thought (CoT) prompting as demonstrations.
We introduce Iter-CoT (Iterative bootstrapping in Chain-of-Thoughts Prompting), an iterative bootstrapping approach for selecting exemplars and generating reasoning chains.
arXiv Detail & Related papers (2023-04-23T13:54:39Z) - DORE: Document Ordered Relation Extraction based on Generative Framework [56.537386636819626]
This paper investigates the root cause of the underwhelming performance of the existing generative DocRE models.
We propose to generate a symbolic and ordered sequence from the relation matrix which is deterministic and easier for model to learn.
Experimental results on four datasets show that our proposed method can improve the performance of the generative DocRE models.
arXiv Detail & Related papers (2022-10-28T11:18:10Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.