Related papers: Chain of Preference Optimization: Improving Chain-of-Thought Reasoning in LLMs

Chain of Preference Optimization: Improving Chain-of-Thought Reasoning in LLMs

URL: http://arxiv.org/abs/2406.09136v1
Date: Thu, 13 Jun 2024 14:07:02 GMT
Title: Chain of Preference Optimization: Improving Chain-of-Thought Reasoning in LLMs
Authors: Xuan Zhang, Chao Du, Tianyu Pang, Qian Liu, Wei Gao, Min Lin,
Abstract summary: Tree-of-thought (ToT) method employs tree-searching to extensively explore the reasoning space and find better reasoning paths that CoT decoding might overlook. Fine-tuning language models (LLMs) leveraging the search tree constructed by ToT allows CoT to achieve similar or better performance. This is achieved through Chain of Preference Optimization (CPO), where LLMs are fine-tuned to align each step of the CoT reasoning paths with those of ToT.
Score: 37.147529569445396
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: The recent development of chain-of-thought (CoT) decoding has enabled large language models (LLMs) to generate explicit logical reasoning paths for complex problem-solving. However, research indicates that these paths are not always deliberate and optimal. The tree-of-thought (ToT) method employs tree-searching to extensively explore the reasoning space and find better reasoning paths that CoT decoding might overlook. This deliberation, however, comes at the cost of significantly increased inference complexity. In this work, we demonstrate that fine-tuning LLMs leveraging the search tree constructed by ToT allows CoT to achieve similar or better performance, thereby avoiding the substantial inference burden. This is achieved through Chain of Preference Optimization (CPO), where LLMs are fine-tuned to align each step of the CoT reasoning paths with those of ToT using the inherent preference information in the tree-search process. Extensive experimental results show that CPO significantly improves LLM performance in solving a variety of complex problems, including question answering, fact verification, and arithmetic reasoning, demonstrating its effectiveness. Our code is available at https://github.com/sail-sg/CPO.

Related papers

Compressing Chain-of-Thought in LLMs via Step Entropy [12.576398947428988]
Large Language Models (LLMs) using Chain-of-Thought (CoT) prompting excel at complex reasoning but generate thought processes with considerable redundancy, leading to increased inference costs and reduced efficiency.<n>We introduce a novel CoT compression framework based on step entropy, a metric that quantifies the informational contribution of individual reasoning steps to identify redundancy.
arXiv Detail & Related papers (2025-08-05T11:48:18Z)
Fractured Chain-of-Thought Reasoning [61.647243580650446]
We introduce Fractured Sampling, a unified inference-time strategy that interpolates between full CoT and solution-only sampling.<n>We show that Fractured Sampling consistently achieves superior accuracy-cost trade-offs, yielding steep log-linear scaling gains in Pass@k versus token budget.
arXiv Detail & Related papers (2025-05-19T11:30:41Z)
Uncertainty-Guided Chain-of-Thought for Code Generation with LLMs [45.33160999781074]
Chain-of-Thought (CoT) reasoning has been demonstrated as an effective technique for improving the problem-solving capabilities of large language models (LLMs) We introduce UnCert-CoT, an approach designed to enhance code generation by incorporating an uncertainty-aware CoT reasoning mechanism.
arXiv Detail & Related papers (2025-03-19T15:40:45Z)
Dynamic Parallel Tree Search for Efficient LLM Reasoning [102.16694475391665]
Tree of Thoughts (ToT) enhances Large Language Model (LLM) reasoning by structuring problem-solving as a spanning tree. We propose Dynamic Parallel Tree Search (DPTS), a novel parallelism framework that aims to dynamically optimize the reasoning path in inference. Experiments on Qwen-2.5 and Llama-3 with Math500 and GSM8K datasets show that DPTS significantly improves efficiency by 2-4x on average.
arXiv Detail & Related papers (2025-02-22T14:13:37Z)
SoftCoT: Soft Chain-of-Thought for Efficient Reasoning with LLMs [48.28847964704554]
Chain-of-Thought (CoT) reasoning enables Large Language Models (LLMs) to solve complex reasoning tasks. We propose a novel approach for continuous-space reasoning that does not require modifying the underlying LLM.
arXiv Detail & Related papers (2025-02-17T18:52:29Z)
When More is Less: Understanding Chain-of-Thought Length in LLMs [53.77747102201451]
Chain-of-thought (CoT) reasoning enhances the multi-step reasoning capabilities of large language models (LLMs) However, for most models and tasks, does an increase in CoT length consistently lead to improved reasoning accuracy? In this paper, we observe a nuanced relationship: as the number of reasoning steps increases, performance initially improves but eventually decreases.
arXiv Detail & Related papers (2025-02-11T05:28:59Z)
Forest-of-Thought: Scaling Test-Time Compute for Enhancing LLM Reasoning [40.069109287947875]
We propose a novel reasoning framework called Forest-of-Thought (FoT) FoT integrates multiple reasoning trees to leverage collective decision-making for solving complex logical problems. FoT employs sparse activation strategies to select the most relevant reasoning paths, improving both efficiency and accuracy.
arXiv Detail & Related papers (2024-12-12T09:01:18Z)
LLMs Do Not Think Step-by-step In Implicit Reasoning [0.0]
Chain-of-Thought can remarkably enhance LLMs' performance on complex tasks. Many researches have attempted to use implicit CoT, which does not need LLMs to explicitly generate the intermediate steps. In this study, we probe the information of intermediate steps from the model's hidden states when it is performing implicit CoT.
arXiv Detail & Related papers (2024-11-24T14:38:59Z)
Unlocking the Capabilities of Thought: A Reasoning Boundary Framework to Quantify and Optimize Chain-of-Thought [61.588465852846646]
Chain-of-Thought (CoT) reasoning has emerged as a promising approach for enhancing the performance of large language models (LLMs) In this work, we introduce a novel reasoning boundary framework (RBF) to address these challenges.
arXiv Detail & Related papers (2024-10-08T05:26:28Z)
LLaMA-Berry: Pairwise Optimization for O1-like Olympiad-Level Mathematical Reasoning [56.273799410256075]
The framework combines Monte Carlo Tree Search (MCTS) with iterative Self-Refine to optimize the reasoning path. The framework has been tested on general and advanced benchmarks, showing superior performance in terms of search efficiency and problem-solving capability.
arXiv Detail & Related papers (2024-10-03T18:12:29Z)
To CoT or not to CoT? Chain-of-thought helps mainly on math and symbolic reasoning [55.52872152909785]
Chain-of-thought (CoT) via prompting is the de facto method for eliciting reasoning capabilities from large language models (LLMs) We show that CoT gives strong performance benefits primarily on tasks involving math or logic, with much smaller gains on other types of tasks.
arXiv Detail & Related papers (2024-09-18T17:55:00Z)
Aggregation of Reasoning: A Hierarchical Framework for Enhancing Answer Selection in Large Language Models [84.15513004135576]
Current research enhances the reasoning performance of Large Language Models (LLMs) by sampling multiple reasoning chains and ensembling based on the answer frequency. This approach fails in scenarios where the correct answers are in the minority. We introduce a hierarchical reasoning aggregation framework AoR, which selects answers based on the evaluation of reasoning chains.
arXiv Detail & Related papers (2024-05-21T17:12:19Z)
Chain of Thoughtlessness? An Analysis of CoT in Planning [17.329365493094542]
Large language model (LLM) performance on reasoning problems typically does not generalize out of distribution. This paper presents a case study of chain of thought on problems from Blocksworld, a classical planning domain. We find meaningful performance improvements from chain of thought prompts when those prompts are exceedingly specific to their problem class.
arXiv Detail & Related papers (2024-05-08T02:48:28Z)
Can Separators Improve Chain-of-Thought Prompting? [10.398343318429367]
Chain-of-thought (CoT) prompting is a simple and effective method for improving the reasoning capabilities of Large Language Models (LLMs) Inspired by human cognition, we introduce COT-SEP, a method that strategically employs separators at the end of each exemplar in CoT prompting.
arXiv Detail & Related papers (2024-02-16T12:46:16Z)
Chain-of-Thought Reasoning Without Prompting [40.92854235219315]
CoT reasoning paths can be elicited from pre-trained language models by simply altering the textitdecoding process. The presence of a CoT in the decoding path correlates with a higher confidence in the model's decoded answer.
arXiv Detail & Related papers (2024-02-15T18:55:41Z)
Enhancing Chain-of-Thoughts Prompting with Iterative Bootstrapping in Large Language Models [81.01397924280612]
Large language models (LLMs) can achieve highly effective performance on various reasoning tasks by incorporating step-by-step chain-of-thought (CoT) prompting as demonstrations. We introduce Iter-CoT (Iterative bootstrapping in Chain-of-Thoughts Prompting), an iterative bootstrapping approach for selecting exemplars and generating reasoning chains.
arXiv Detail & Related papers (2023-04-23T13:54:39Z)

This list is automatically generated from the titles and abstracts of the papers in this site.