Progressive-Hint Prompting Improves Reasoning in Large Language Models
- URL: http://arxiv.org/abs/2304.09797v6
- Date: Mon, 07 Oct 2024 04:28:04 GMT
- Title: Progressive-Hint Prompting Improves Reasoning in Large Language Models
- Authors: Chuanyang Zheng, Zhengying Liu, Enze Xie, Zhenguo Li, Yu Li
- Abstract summary: This paper proposes a new prompting method, named Progressive-Hint Prompting (PHP).
It enables multiple automatic interactions between users and Large Language Models (LLMs) by using previously generated answers as hints to progressively guide the model toward the correct answers.
We conducted extensive and comprehensive experiments on seven benchmarks. The results show that PHP significantly improves accuracy while remaining highly efficient.
- Score: 63.98629132836499
- License:
- Abstract: The performance of Large Language Models (LLMs) in reasoning tasks depends heavily on prompt design, with Chain-of-Thought (CoT) and self-consistency being critical methods that enhance this ability. However, these methods do not fully exploit the answers generated by the LLM to guide subsequent responses. This paper proposes a new prompting method, named Progressive-Hint Prompting (PHP), that enables automatic multiple interactions between users and LLMs by using previously generated answers as hints to progressively guide toward the correct answers. PHP is orthogonal to CoT and self-consistency, making it easy to combine with state-of-the-art techniques to further improve performance. We conducted extensive and comprehensive experiments on seven benchmarks. The results show that PHP significantly improves accuracy while remaining highly efficient. For instance, with text-davinci-003, we observed a 4.2% improvement on GSM8K with greedy decoding compared to Complex CoT, and a 46.17% reduction in sample paths with self-consistency. With GPT-4 and PHP, we achieve state-of-the-art performances on SVAMP (89.1% -> 91.9%), GSM8K (92% -> 95.5%), AQuA (76.4% -> 79.9%) and MATH (50.3% -> 53.9%).
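As a rough illustration of the interaction loop the abstract describes, the sketch below re-asks the question with previously extracted answers appended as hints and stops once two consecutive rounds agree. The `ask_llm` completion function, the answer extractor, the exact hint phrasing, and the stopping rule are assumptions for illustration, not the paper's exact implementation.

```python
# Illustrative sketch of a Progressive-Hint Prompting loop (not the paper's
# exact implementation): previously extracted answers are fed back as hints
# until the answer stabilizes across two consecutive rounds.
import re
from typing import Callable, List


def extract_answer(response: str) -> str:
    """Take the last number in the response as the candidate answer."""
    numbers = re.findall(r"-?\d+\.?\d*", response)
    return numbers[-1] if numbers else ""


def progressive_hint_prompting(question: str,
                               ask_llm: Callable[[str], str],
                               max_rounds: int = 5) -> str:
    hints: List[str] = []
    previous = ""
    for _ in range(max_rounds):
        if hints:
            prompt = (f"{question} (Hint: the answer is near to "
                      f"{', '.join(hints)}.) Let's think step by step.")
        else:
            prompt = f"{question} Let's think step by step."
        answer = extract_answer(ask_llm(prompt))
        if answer and answer == previous:  # two consecutive rounds agree
            return answer
        hints.append(answer)
        previous = answer
    return previous
```

Since the abstract notes that PHP is orthogonal to self-consistency, each round could in principle replace the single `ask_llm` call with a vote over several sampled paths before the hint is updated.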
Related papers
- Enhancing Mathematical Reasoning in LLMs by Stepwise Correction [39.67266805233599]
Best-of-N decoding methods instruct large language models (LLMs) to generate multiple solutions, score each using a scoring function, and select the highest-scoring one as the final answer to mathematical reasoning problems.
We propose a novel prompting method named Stepwise Correction (StepCo) that helps LLMs identify and revise incorrect steps in their generated reasoning paths.
The verify-then-revise process not only improves answer correctness but also reduces token consumption, since fewer paths need to be generated.
arXiv Detail & Related papers (2024-10-16T18:18:42Z) - Building Math Agents with Multi-Turn Iterative Preference Learning [56.71330214021884]
This paper studies the complementary direct preference learning approach to further improve model performance.
Existing direct preference learning algorithms are originally designed for the single-turn chat task.
We introduce a multi-turn direct preference learning framework, tailored for this context.
arXiv Detail & Related papers (2024-09-04T02:41:04Z) - Monte Carlo Tree Search Boosts Reasoning via Iterative Preference Learning [55.96599486604344]
We introduce an approach aimed at enhancing the reasoning capabilities of Large Language Models (LLMs) through an iterative preference learning process.
We use Monte Carlo Tree Search (MCTS) to iteratively collect preference data, utilizing its look-ahead ability to break down instance-level rewards into more granular step-level signals.
The proposed algorithm employs Direct Preference Optimization (DPO) to update the LLM policy using this newly generated step-level preference data.
arXiv Detail & Related papers (2024-05-01T11:10:24Z) - Large Language Models are Contrastive Reasoners [8.427805316635318]
We show how contrastive prompting significantly improves the ability of large language models to perform complex reasoning.
Experiments on various large language models show that zero-shot contrastive prompting improves performance on a range of arithmetic, commonsense, and symbolic reasoning tasks.
Our method not only surpasses zero-shot CoT and few-shot CoT in most arithmetic and commonsense reasoning tasks but also can seamlessly integrate with existing prompting methods.
arXiv Detail & Related papers (2024-03-13T03:15:05Z) - Benchmarking and Improving Generator-Validator Consistency of Language Models [82.73914625520686]
Inconsistency between generating and validating an answer is prevalent in language models (LMs).
Even GPT-4, a state-of-the-art LM, is GV-consistent only 76% of the time.
We find that this approach improves GV-consistency of Alpaca-30B from 60% to 93%.
arXiv Detail & Related papers (2023-10-03T07:23:22Z) - Toward Adversarial Training on Contextualized Language Representation [78.39805974043321]
This paper investigates adversarial training (AT) from the perspective of the contextualized language representation output by PLM encoders.
We propose Contextualized representation-Adversarial Training (CreAT), in which the attack is explicitly optimized to deviate the contextualized representation of the encoder.
CreAT produces consistent performance gains on a wider range of tasks and is proven to be more effective for language pre-training where only the encoder part is kept for downstream tasks.
arXiv Detail & Related papers (2023-05-08T08:56:51Z) - Self-Consistency Improves Chain of Thought Reasoning in Language Models [53.45015291520658]
We explore a simple ensemble strategy, self-consistency, that significantly improves the reasoning accuracy of large language models.
For arithmetic and commonsense reasoning benchmarks, we find that self-consistency yields significant accuracy improvements.
arXiv Detail & Related papers (2022-03-21T17:48:52Z)
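The self-consistency strategy referenced in this entry (and combined with PHP above) amounts to sampling several chain-of-thought completions and taking a majority vote over the extracted final answers. Below is a minimal sketch, again assuming a generic `ask_llm` completion function that samples with non-zero temperature; the prompt wording and answer-extraction rule are illustrative.

```python
# Minimal self-consistency sketch: sample several reasoning paths and
# majority-vote over the extracted final answers. ask_llm is assumed to
# sample with temperature > 0 so that paths differ between calls.
import re
from collections import Counter
from typing import Callable


def self_consistency(question: str,
                     ask_llm: Callable[[str], str],
                     num_paths: int = 10) -> str:
    prompt = f"{question} Let's think step by step."
    answers = []
    for _ in range(num_paths):
        response = ask_llm(prompt)
        numbers = re.findall(r"-?\d+\.?\d*", response)
        if numbers:
            answers.append(numbers[-1])  # last number = candidate answer
    votes = Counter(answers)
    return votes.most_common(1)[0][0] if votes else ""
```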