Generating Chain-of-Thoughts with a Pairwise-Comparison Approach to Searching for the Most Promising Intermediate Thought
- URL: http://arxiv.org/abs/2402.06918v2
- Date: Wed, 26 Jun 2024 05:47:52 GMT
- Title: Generating Chain-of-Thoughts with a Pairwise-Comparison Approach to Searching for the Most Promising Intermediate Thought
- Authors: Zhen-Yu Zhang, Siwei Han, Huaxiu Yao, Gang Niu, Masashi Sugiyama,
- Abstract summary: Chain-of-thoughts (CoT) methods were proposed to guide large language models to reason step-by-step, enabling problem solving from simple to complex.
The evaluation from the large language model (LLMs) is typically noisy and unreliable, potentially misleading the generation process in selecting promising intermediate thoughts.
In this paper, motivated by Vapnik's principle, we use pairwise-comparison evaluation instead of point-wise scoring to search for promising intermediate thoughts.
- Score: 70.30423016640749
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: To improve the ability of the large language model (LLMs) to tackle complex reasoning problems, chain-of-thoughts (CoT) methods were proposed to guide LLMs to reason step-by-step, enabling problem solving from simple to complex. State-of-the-art methods for generating such a chain involve interactive collaboration, where the learner generates candidate intermediate thoughts, evaluated by the LLM, guiding the generation of subsequent thoughts. However, a widespread yet understudied problem is that the evaluation from the LLM is typically noisy and unreliable, potentially misleading the generation process in selecting promising intermediate thoughts. In this paper, motivated by Vapnik's principle, we use pairwise-comparison evaluation instead of point-wise scoring to search for promising intermediate thoughts with the noisy feedback from the LLM. In each round, we randomly pair intermediate thoughts and directly prompt the LLM to select the more promising one from each pair, allowing us to identify the most promising thoughts through an iterative process. To further alleviate the noise in the comparison, we incorporate techniques from ensemble learning and dueling bandits, proposing two variants of the algorithm. Experiments on three real-world tasks demonstrate the effectiveness of our proposed algorithm and verify the rationale of the pairwise comparison mechanism.
Related papers
- Hypothesis-Driven Theory-of-Mind Reasoning for Large Language Models [76.6028674686018]
We introduce thought-tracing, an inference-time reasoning algorithm to trace the mental states of agents.
Our algorithm is modeled after the Bayesian theory-of-mind framework.
We evaluate thought-tracing on diverse theory-of-mind benchmarks, demonstrating significant performance improvements.
arXiv Detail & Related papers (2025-02-17T15:08:50Z) - BRiTE: Bootstrapping Reinforced Thinking Process to Enhance Language Model Reasoning [78.63421517563056]
Large Language Models (LLMs) have demonstrated remarkable capabilities in complex reasoning tasks.
We present a unified probabilistic framework that formalizes LLM reasoning through a novel graphical model.
We introduce the Bootstrapping Reinforced Thinking Process (BRiTE) algorithm, which works in two steps.
arXiv Detail & Related papers (2025-01-31T02:39:07Z) - MyGO Multiplex CoT: A Method for Self-Reflection in Large Language Models via Double Chain of Thought Thinking [4.234183823376613]
We introduce Multiplex CoT (Chain of Thought), a method that enables LLMs to simulate a form of self-review while reasoning.
Multiplex CoT leverages the power of iterative reasoning, where the model generates an initial chain of thought and subsequently critiques and refines this reasoning.
arXiv Detail & Related papers (2025-01-20T12:54:57Z) - Hint Marginalization for Improved Reasoning in Large Language Models [24.67507932821155]
We present Marginalization, a novel and principled algorithmic framework to enhance the reasoning capabilities of Large Language Models (LLMs)
Our approach can be viewed as an iterative sampling strategy for forming a Monte Carlo approximation of an underlying distribution of answers.
Empirical evaluation on several benchmark datasets for arithmetic reasoning demonstrates the superiority of the proposed approach.
arXiv Detail & Related papers (2024-12-17T19:45:53Z) - Closer Look at Efficient Inference Methods: A Survey of Speculative Decoding [1.3479499607624648]
Speculative decoding addresses bottleneck by introducing a two-stage framework: drafting and verification.
A smaller, efficient model generates a preliminary draft, which is then refined by a larger, more sophisticated model.
This paper provides a comprehensive survey of speculative decoding methods, categorizing them into draft-centric and model-centric approaches.
arXiv Detail & Related papers (2024-11-20T09:46:30Z) - LLaMA-Berry: Pairwise Optimization for O1-like Olympiad-Level Mathematical Reasoning [56.273799410256075]
The framework combines Monte Carlo Tree Search (MCTS) with iterative Self-Refine to optimize the reasoning path.
The framework has been tested on general and advanced benchmarks, showing superior performance in terms of search efficiency and problem-solving capability.
arXiv Detail & Related papers (2024-10-03T18:12:29Z) - Iteration of Thought: Leveraging Inner Dialogue for Autonomous Large Language Model Reasoning [0.0]
Iterative human engagement is a common and effective means of leveraging the advanced language processing power of large language models (LLMs)
We propose the Iteration of Thought (IoT) framework for enhancing LLM responses by generating "thought"-provoking prompts.
Unlike static or semi-static approaches, IoT adapts its reasoning path dynamically, based on evolving context.
arXiv Detail & Related papers (2024-09-19T09:44:17Z) - Large Language Models as an Indirect Reasoner: Contrapositive and Contradiction for Automated Reasoning [74.90592233107712]
We propose a Direct-Indirect Reasoning (DIR) method, which considers Direct Reasoning (DR) and Indirect Reasoning (IR) as multiple parallel reasoning paths that are merged to derive the final answer.
Our DIR method is simple yet effective and can be straightforwardly integrated with existing variants of CoT methods.
arXiv Detail & Related papers (2024-02-06T03:41:12Z) - R$^3$ Prompting: Review, Rephrase and Resolve for Chain-of-Thought
Reasoning in Large Language Models under Noisy Context [12.475979274233458]
We propose a novel prompting method, namely R$3$ prompting, for Chain-of-Thought (CoT) reasoning under noisy context.
Our experiments show that R$3$ prompting significantly outperforms existing CoT prompting methods on five reasoning tasks under noisy context.
arXiv Detail & Related papers (2023-10-25T10:34:02Z) - Plan, Verify and Switch: Integrated Reasoning with Diverse X-of-Thoughts [65.15322403136238]
We propose XoT, an integrated problem solving framework by prompting LLMs with diverse reasoning thoughts.
For each question, XoT always begins with selecting the most suitable method then executes each method iteratively.
Within each iteration, XoT actively checks the validity of the generated answer and incorporates the feedback from external executors.
arXiv Detail & Related papers (2023-10-23T07:02:20Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.