Active Prompting with Chain-of-Thought for Large Language Models
- URL: http://arxiv.org/abs/2302.12246v5
- Date: Sun, 21 Jul 2024 08:01:00 GMT
- Title: Active Prompting with Chain-of-Thought for Large Language Models
- Authors: Shizhe Diao, Pengcheng Wang, Yong Lin, Rui Pan, Xiang Liu, Tong Zhang
- Abstract summary: This paper proposes a new method, Active-Prompt, to adapt large language models (LLMs) to different tasks with task-specific example prompts annotated with human-designed chain-of-thought (CoT) reasoning.
Borrowing ideas from the related problem of uncertainty-based active learning, we introduce several metrics to characterize uncertainty and select the most uncertain questions for annotation.
Experimental results demonstrate the superiority of the proposed method, which achieves state-of-the-art results on eight complex reasoning tasks.
- Score: 26.5029080638055
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The increasing scale of large language models (LLMs) brings emergent abilities to various complex tasks requiring reasoning, such as arithmetic and commonsense reasoning. It is known that the effective design of task-specific prompts is critical for LLMs' ability to produce high-quality answers. In particular, an effective approach for complex question-and-answer tasks is example-based prompting with chain-of-thought (CoT) reasoning, which significantly improves the performance of LLMs. However, current CoT methods rely on a fixed set of human-annotated exemplars, which are not necessarily the most effective examples for different tasks. This paper proposes a new method, Active-Prompt, to adapt LLMs to different tasks with task-specific example prompts (annotated with human-designed CoT reasoning). For this purpose, we propose a solution to the key problem of determining which questions are the most important and helpful ones to annotate from a pool of task-specific queries. By borrowing ideas from the related problem of uncertainty-based active learning, we introduce several metrics to characterize the uncertainty so as to select the most uncertain questions for annotation. Experimental results demonstrate the superiority of our proposed method, achieving state-of-the-art on eight complex reasoning tasks. Further analyses of different uncertainty metrics, pool sizes, zero-shot learning, and accuracy-uncertainty relationship demonstrate the effectiveness of our method. Our code will be available at https://github.com/shizhediao/active-prompt.
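As a rough illustration of the selection step described above, the sketch below scores each unlabeled question with uncertainty metrics of the kind the abstract mentions, here answer disagreement and entropy over multiple sampled answers, and keeps the most uncertain questions for human CoT annotation. This is a minimal sketch under stated assumptions, not the authors' released code: `sample_answers` is a hypothetical stand-in for k stochastic LLM calls, and the defaults k=10 and n=8 are illustrative only.

```python
import math
from collections import Counter

def disagreement(answers):
    """Uncertainty as the fraction of distinct final answers among the samples."""
    return len(set(answers)) / len(answers)

def entropy(answers):
    """Uncertainty as the entropy of the empirical distribution over final answers."""
    counts = Counter(answers)
    total = len(answers)
    return -sum(c / total * math.log(c / total) for c in counts.values())

def select_for_annotation(questions, sample_answers, k=10, n=8, metric=disagreement):
    """Rank unlabeled questions by uncertainty; return the top-n for human CoT annotation.

    `sample_answers(question, k)` stands in for k stochastic LLM calls
    (e.g., temperature sampling) that each return a final answer string.
    """
    scored = [(metric(sample_answers(q, k)), q) for q in questions]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [question for _, question in scored[:n]]
```

Questions whose sampled chains of thought diverge most are treated as the most informative to annotate; the annotated exemplars then serve as the few-shot CoT prompt at inference time.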
Related papers
- EXPLORA: Efficient Exemplar Subset Selection for Complex Reasoning [5.172620636569522]
Large language models (LLMs) have enabled in-context learning (ICL), allowing LLMs to acquire proficiency in a specific task using only a few demonstration samples (exemplars).
A critical challenge in ICL is the selection of optimal exemplars, which can be either task-specific (static) or test-example-specific (dynamic).
arXiv Detail & Related papers (2024-11-06T12:48:04Z)
- Improving LLM Reasoning through Scaling Inference Computation with Collaborative Verification [52.095460362197336]
Large language models (LLMs) struggle with consistent and accurate reasoning.
LLMs are trained primarily on correct solutions, reducing their ability to detect and learn from errors.
We propose a novel collaborative method integrating Chain-of-Thought (CoT) and Program-of-Thought (PoT) solutions for verification.
arXiv Detail & Related papers (2024-10-05T05:21:48Z)
- ProcBench: Benchmark for Multi-Step Reasoning and Following Procedure [0.0]
We propose a benchmark that focuses on a specific aspect of reasoning ability: the direct evaluation of multi-step inference.
Our dataset comprises pairs of explicit instructions and corresponding questions, where the procedures necessary for solving the questions are entirely detailed within the instructions.
By constructing problems that require varying numbers of steps to solve and evaluating responses at each step, we enable a thorough assessment of state-of-the-art LLMs' ability to follow instructions.
arXiv Detail & Related papers (2024-10-04T03:21:24Z)
- Image First or Text First? Optimising the Sequencing of Modalities in Large Language Model Prompting and Reasoning Tasks [0.0]
This paper examines how the sequencing of images and text within multi-modal prompts influences the reasoning performance of large language models (LLMs).
For simpler tasks involving a single image, modality sequencing had a clear impact on accuracy.
In more complex tasks involving multiple images and intricate reasoning steps, the effect of sequencing diminished, likely due to the increased cognitive demands of the task.
arXiv Detail & Related papers (2024-10-04T00:55:15Z)
- Active Instruction Tuning: Improving Cross-Task Generalization by Training on Prompt Sensitive Tasks [101.40633115037983]
Instruction tuning (IT) achieves impressive zero-shot generalization results by training large language models (LLMs) on a massive number of diverse tasks with instructions.
How to select new tasks to improve the performance and generalizability of IT models remains an open question.
We propose active instruction tuning based on prompt uncertainty, a novel framework to identify informative tasks, and then actively tune the models on the selected tasks.
arXiv Detail & Related papers (2023-11-01T04:40:05Z)
- Thought Propagation: An Analogical Approach to Complex Reasoning with Large Language Models [62.96551299003463]
We propose Thought Propagation (TP) to enhance the complex reasoning ability of Large Language Models.
TP first prompts LLMs to propose and solve a set of analogous problems that are related to the input one.
TP reuses the results of analogous problems to directly yield a new solution or derive a knowledge-intensive plan for execution to amend the initial solution obtained from scratch.
arXiv Detail & Related papers (2023-10-06T01:40:09Z)
- Gotta: Generative Few-shot Question Answering by Prompt-based Cloze Data Augmentation [18.531941086922256]
Few-shot question answering (QA) aims at precisely discovering answers to a set of questions from context passages.
We develop Gotta, a Generative prOmpT-based daTa Augmentation framework.
Inspired by the human reasoning process, we propose to integrate the cloze task to enhance few-shot QA learning.
arXiv Detail & Related papers (2023-06-07T01:44:43Z)
- OverPrompt: Enhancing ChatGPT through Efficient In-Context Learning [49.38867353135258]
We propose OverPrompt, leveraging the in-context learning capability of LLMs to handle multiple task inputs.
Our experiments show that OverPrompt can achieve cost-efficient zero-shot classification without causing significant detriment to task performance.
arXiv Detail & Related papers (2023-05-24T10:08:04Z)
- SatLM: Satisfiability-Aided Language Models Using Declarative Prompting [68.40726892904286]
We propose a new satisfiability-aided language modeling (SatLM) approach for improving the reasoning capabilities of large language models (LLMs).
We use an LLM to generate a declarative task specification rather than an imperative program and leverage an off-the-shelf automated theorem prover to derive the final answer.
We evaluate SATLM on 8 different datasets and show that it consistently outperforms program-aided LMs in the imperative paradigm.
arXiv Detail & Related papers (2023-05-16T17:55:51Z)
- Decomposed Prompting: A Modular Approach for Solving Complex Tasks [55.42850359286304]
We propose Decomposed Prompting to solve complex tasks by decomposing them (via prompting) into simpler sub-tasks.
This modular structure allows each prompt to be optimized for its specific sub-task.
We show that the flexibility and modularity of Decomposed Prompting allow it to outperform prior work on few-shot prompting.
arXiv Detail & Related papers (2022-10-05T17:28:20Z)
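The last entry above describes a decompose-then-solve loop; a highly simplified sketch of that idea might look as follows. The prompt strings, the linear (non-recursive) control flow, and the `llm` callable are assumptions made for illustration, not the paper's actual implementation.

```python
from typing import Callable, List

# Stand-in for any text-completion call (e.g., an API client); not part of the paper's code.
LLM = Callable[[str], str]

def decompose(task: str, llm: LLM) -> List[str]:
    """Ask the model to split a complex task into numbered sub-questions."""
    prompt = (
        "Break the task into numbered sub-questions, one per line.\n"
        f"Task: {task}\nSub-questions:"
    )
    return [line.split(".", 1)[-1].strip() for line in llm(prompt).splitlines() if line.strip()]

def solve_decomposed(task: str, llm: LLM, sub_prompt: str) -> str:
    """Answer each sub-question with its own (separately tuned) prompt,
    feeding earlier answers forward as context; the final answer resolves the task."""
    context, answer = "", ""
    for sub_q in decompose(task, llm):
        answer = llm(sub_prompt.format(context=context, question=sub_q))
        context += f"\nQ: {sub_q}\nA: {answer}"
    return answer
```

The method itself also allows recursive decomposition and dedicated handlers per sub-task; the loop above only conveys the modular-prompt idea.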