Related papers: Generating Chain-of-Thoughts with a Pairwise-Comparison Approach to Searching for the Most Promising Intermediate Thought

URL: http://arxiv.org/abs/2402.06918v2
Date: Wed, 26 Jun 2024 05:47:52 GMT
Title: Generating Chain-of-Thoughts with a Pairwise-Comparison Approach to Searching for the Most Promising Intermediate Thought
Authors: Zhen-Yu Zhang, Siwei Han, Huaxiu Yao, Gang Niu, Masashi Sugiyama
Abstract summary: Chain-of-thoughts (CoT) methods were proposed to guide large language models (LLMs) to reason step-by-step, enabling problem solving from simple to complex. The evaluation from the LLM is typically noisy and unreliable, potentially misleading the generation process in selecting promising intermediate thoughts. In this paper, motivated by Vapnik's principle, we use pairwise-comparison evaluation instead of point-wise scoring to search for promising intermediate thoughts.
Score: 70.30423016640749
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: To improve the ability of large language models (LLMs) to tackle complex reasoning problems, chain-of-thoughts (CoT) methods were proposed to guide LLMs to reason step-by-step, enabling problem solving from simple to complex. State-of-the-art methods for generating such a chain involve interactive collaboration, where the learner generates candidate intermediate thoughts, evaluated by the LLM, guiding the generation of subsequent thoughts. However, a widespread yet understudied problem is that the evaluation from the LLM is typically noisy and unreliable, potentially misleading the generation process in selecting promising intermediate thoughts. In this paper, motivated by Vapnik's principle, we use pairwise-comparison evaluation instead of point-wise scoring to search for promising intermediate thoughts with the noisy feedback from the LLM. In each round, we randomly pair intermediate thoughts and directly prompt the LLM to select the more promising one from each pair, allowing us to identify the most promising thoughts through an iterative process. To further alleviate the noise in the comparison, we incorporate techniques from ensemble learning and dueling bandits, proposing two variants of the algorithm. Experiments on three real-world tasks demonstrate the effectiveness of our proposed algorithm and verify the rationale of the pairwise comparison mechanism.
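
The round-based selection described in the abstract (randomly pair candidates, keep the preferred one from each pair, repeat) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function name `pairwise_select`, the `prefer` callback standing in for the LLM comparison prompt, and the `votes` majority rule are all hypothetical. The majority rule is only one simple way to dampen noisy comparisons and is not claimed to match either of the paper's ensemble or dueling-bandit variants.

```python
import random
from typing import Callable, Optional, Sequence, TypeVar

T = TypeVar("T")


def pairwise_select(
    thoughts: Sequence[T],
    prefer: Callable[[T, T], T],
    votes: int = 1,
    rng: Optional[random.Random] = None,
) -> T:
    """Return one surviving candidate from repeated random pairwise comparisons.

    `prefer(a, b)` is a hypothetical stand-in for prompting an LLM to pick
    the more promising of two intermediate thoughts; it must return a or b.
    `votes` repeats each comparison and takes a majority (an assumption made
    here to illustrate noise reduction, not the paper's exact variant).
    """
    if not thoughts:
        raise ValueError("need at least one candidate thought")
    rng = rng or random.Random()
    pool = list(thoughts)
    while len(pool) > 1:
        rng.shuffle(pool)  # random pairing for this round
        survivors = []
        if len(pool) % 2 == 1:
            # With an odd pool, one candidate advances without a comparison.
            survivors.append(pool.pop())
        for a, b in zip(pool[0::2], pool[1::2]):
            wins_a = sum(1 for _ in range(votes) if prefer(a, b) is a)
            survivors.append(a if wins_a * 2 > votes else b)
        pool = survivors
    return pool[0]


if __name__ == "__main__":
    # Toy comparator: candidates are numeric "quality" values and the
    # comparison is correct only 80% of the time, mimicking noisy feedback.
    noise = random.Random(0)

    def noisy_prefer(a: float, b: float) -> float:
        better, worse = (a, b) if a >= b else (b, a)
        return better if noise.random() < 0.8 else worse

    candidates = [0.1, 0.4, 0.35, 0.9, 0.6, 0.2]
    print(pairwise_select(candidates, noisy_prefer, votes=5, rng=random.Random(1)))
```

With a noiseless comparator this reduces to a knockout tournament that always returns the best candidate; with a noisy one, raising `votes` trades extra comparisons for a lower chance of eliminating a good thought early.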

Related papers

Hypothesis-Driven Theory-of-Mind Reasoning for Large Language Models [76.6028674686018]
We introduce thought-tracing, an inference-time reasoning algorithm to trace the mental states of agents. Our algorithm is modeled after the Bayesian theory-of-mind framework. We evaluate thought-tracing on diverse theory-of-mind benchmarks, demonstrating significant performance improvements.
arXiv Detail & Related papers (2025-02-17T15:08:50Z)
BRiTE: Bootstrapping Reinforced Thinking Process to Enhance Language Model Reasoning [78.63421517563056]
Large Language Models (LLMs) have demonstrated remarkable capabilities in complex reasoning tasks. We present a unified probabilistic framework that formalizes LLM reasoning through a novel graphical model. We introduce the Bootstrapping Reinforced Thinking Process (BRiTE) algorithm, which works in two steps.
arXiv Detail & Related papers (2025-01-31T02:39:07Z)
MyGO Multiplex CoT: A Method for Self-Reflection in Large Language Models via Double Chain of Thought Thinking [4.234183823376613]
We introduce Multiplex CoT (Chain of Thought), a method that enables LLMs to simulate a form of self-review while reasoning. Multiplex CoT leverages the power of iterative reasoning, where the model generates an initial chain of thought and subsequently critiques and refines this reasoning.
arXiv Detail & Related papers (2025-01-20T12:54:57Z)
Refining Answer Distributions for Improved Large Language Model Reasoning [24.67507932821155]
We present Refined Answer Distributions, a novel and principled algorithmic framework to enhance the reasoning capabilities of Large Language Models (LLMs). Our approach can be viewed as an iterative sampling strategy for forming a Monte Carlo approximation of an underlying distribution of answers, with the goal of identifying the mode -- the most likely answer.
arXiv Detail & Related papers (2024-12-17T19:45:53Z)
Closer Look at Efficient Inference Methods: A Survey of Speculative Decoding [1.3479499607624648]
Speculative decoding addresses the inference bottleneck by introducing a two-stage framework: drafting and verification. A smaller, efficient model generates a preliminary draft, which is then refined by a larger, more sophisticated model. This paper provides a comprehensive survey of speculative decoding methods, categorizing them into draft-centric and model-centric approaches.
arXiv Detail & Related papers (2024-11-20T09:46:30Z)
LLaMA-Berry: Pairwise Optimization for O1-like Olympiad-Level Mathematical Reasoning [56.273799410256075]
The framework combines Monte Carlo Tree Search (MCTS) with iterative Self-Refine to optimize the reasoning path. The framework has been tested on general and advanced benchmarks, showing superior performance in terms of search efficiency and problem-solving capability.
arXiv Detail & Related papers (2024-10-03T18:12:29Z)
Iteration of Thought: Leveraging Inner Dialogue for Autonomous Large Language Model Reasoning [0.0]
Iterative human engagement is a common and effective means of leveraging the advanced language processing power of large language models (LLMs). We propose the Iteration of Thought (IoT) framework for enhancing LLM responses by generating "thought"-provoking prompts. Unlike static or semi-static approaches, IoT adapts its reasoning path dynamically, based on evolving context.
arXiv Detail & Related papers (2024-09-19T09:44:17Z)
Fine-Tuning on Diverse Reasoning Chains Drives Within-Inference CoT Refinement in LLMs [63.36637269634553]
We introduce a novel approach where LLMs are fine-tuned to generate a sequence of Diverse Chains of Thought (DCoT) within a single inference step. We show that fine-tuning on DCoT improves performance over the CoT baseline across model families and scales. Our work is also significant because both quantitative analyses and manual evaluations reveal the observed gains stem from the models' ability to refine an initial reasoning chain.
arXiv Detail & Related papers (2024-07-03T15:01:18Z)
Large Language Models as an Indirect Reasoner: Contrapositive and Contradiction for Automated Reasoning [74.90592233107712]
We propose a Direct-Indirect Reasoning (DIR) method, which considers Direct Reasoning (DR) and Indirect Reasoning (IR) as multiple parallel reasoning paths that are merged to derive the final answer. Our DIR method is simple yet effective and can be straightforwardly integrated with existing variants of CoT methods.
arXiv Detail & Related papers (2024-02-06T03:41:12Z)
Sentiment Analysis through LLM Negotiations [58.67939611291001]
A standard paradigm for sentiment analysis is to rely on a single LLM and make the decision in a single round. This paper introduces a multi-LLM negotiation framework for sentiment analysis.
arXiv Detail & Related papers (2023-11-03T12:35:29Z)
R$^3$ Prompting: Review, Rephrase and Resolve for Chain-of-Thought Reasoning in Large Language Models under Noisy Context [12.475979274233458]
We propose a novel prompting method, namely R$^3$ prompting, for Chain-of-Thought (CoT) reasoning under noisy context. Our experiments show that R$^3$ prompting significantly outperforms existing CoT prompting methods on five reasoning tasks under noisy context.
arXiv Detail & Related papers (2023-10-25T10:34:02Z)
Plan, Verify and Switch: Integrated Reasoning with Diverse X-of-Thoughts [65.15322403136238]
We propose XoT, an integrated problem solving framework by prompting LLMs with diverse reasoning thoughts. For each question, XoT always begins with selecting the most suitable method then executes each method iteratively. Within each iteration, XoT actively checks the validity of the generated answer and incorporates the feedback from external executors.
arXiv Detail & Related papers (2023-10-23T07:02:20Z)
Eliminating Reasoning via Inferring with Planning: A New Framework to Guide LLMs' Non-linear Thinking [40.22335733384235]
Chain-of-Thought (CoT) prompting and its variants explore equipping large language models with high-level reasoning abilities. We propose Inferential Exclusion Prompting (IEP), a novel prompting method that combines the principles of elimination and inference.
arXiv Detail & Related papers (2023-10-18T21:42:16Z)
Modeling Uncertainty and Using Post-fusion as Fallback Improves Retrieval Augmented Generation with LLMs [80.74263278847063]
The integration of retrieved passages and large language models (LLMs) has significantly contributed to improving open-domain question answering. This paper investigates different methods of combining retrieved passages with LLMs to enhance answer generation.
arXiv Detail & Related papers (2023-08-24T05:26:54Z)
Algorithm of Thoughts: Enhancing Exploration of Ideas in Large Language Models [17.059322033670124]
We propose a novel strategy that propels Large Language Models through algorithmic reasoning pathways. Our results suggest that instructing an LLM using an algorithm can lead to performance surpassing that of the algorithm itself.
arXiv Detail & Related papers (2023-08-20T22:36:23Z)
Encouraging Divergent Thinking in Large Language Models through Multi-Agent Debate [85.3444184685235]
We propose a Multi-Agent Debate (MAD) framework, in which multiple agents express their arguments in the state of "tit for tat" and a judge manages the debate process to obtain a final solution. Our framework encourages divergent thinking in LLMs which would be helpful for tasks that require deep levels of contemplation.
arXiv Detail & Related papers (2023-05-30T15:25:45Z)

This list is automatically generated from the titles and abstracts of the papers on this site.