ACING: Actor-Critic for Instruction Learning in Black-Box Large Language Models
- URL: http://arxiv.org/abs/2411.12736v1
- Date: Tue, 19 Nov 2024 18:58:03 GMT
- Title: ACING: Actor-Critic for Instruction Learning in Black-Box Large Language Models
- Authors: Salma Kharrat, Fares Fourati, Marco Canini
- Abstract summary: ACING is a task-specific prompt optimization approach framed as a stateless continuous-action Reinforcement Learning problem.
We validate ACING by optimizing prompts for ChatGPT on 30 instruction-based tasks.
ACING consistently outperforms baseline methods, achieving a median score improvement of 10 percentage points.
- Score: 4.890873355984701
- Abstract: The effectiveness of Large Language Models (LLMs) in solving tasks vastly depends on the quality of the instructions, which often require fine-tuning through extensive human effort. This highlights the need for automated instruction optimization; however, this optimization is particularly challenging when dealing with black-box LLMs, where model parameters and gradients remain inaccessible. We propose ACING, a task-specific prompt optimization approach framed as a stateless continuous-action Reinforcement Learning (RL) problem, known as the continuum bandit setting. ACING leverages an actor-critic-based method to optimize prompts, learning from non-differentiable reward signals. We validate ACING by optimizing prompts for ChatGPT on 30 instruction-based tasks. ACING consistently outperforms baseline methods, achieving a median score improvement of 10 percentage points. Furthermore, ACING not only recovers but also surpasses human-crafted expert instructions, achieving up to a 39 percentage point improvement against human benchmarks.
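The stateless continuum-bandit loop the abstract describes is simple enough to sketch. Below is a minimal, self-contained illustration, not the authors' implementation: `black_box_reward` stands in for decoding a continuous latent into an instruction and scoring the LLM's outputs, and the latent dimension and hyperparameters are placeholders. Because the problem is stateless, the critic reduces to a scalar baseline, and the actor is a Gaussian policy updated with REINFORCE-style gradients from the non-differentiable reward.
```python
# Minimal sketch (not the authors' code) of a stateless continuous-action
# actor-critic bandit. Assumption: instructions live in a continuous latent
# space; black_box_reward stands in for decoding a latent to text, querying
# the LLM, and scoring its outputs.
import numpy as np

rng = np.random.default_rng(0)
DIM = 16                        # latent instruction dimension (assumed)
z_star = rng.normal(size=DIM)   # hidden optimum, stands in for the best prompt

def black_box_reward(z):
    """Stand-in for: decode z -> instruction, query the LLM, score outputs."""
    return -np.linalg.norm(z - z_star)

mu = np.zeros(DIM)   # actor: mean of a Gaussian policy (stateless, no state input)
SIGMA = 0.3          # fixed exploration noise
baseline = 0.0       # critic: value estimate of the single bandit state
ALPHA, BETA = 0.02, 0.1

print("initial reward:", black_box_reward(mu))
for _ in range(2000):
    action = mu + SIGMA * rng.normal(size=DIM)   # sample a continuous action
    r = black_box_reward(action)                 # non-differentiable reward signal
    advantage = r - baseline                     # critic supplies the baseline
    # REINFORCE: grad_mu log N(a; mu, SIGMA^2 I) = (a - mu) / SIGMA^2
    mu += ALPHA * advantage * (action - mu) / SIGMA**2
    baseline += BETA * advantage                 # TD(0) update on the single state
print("final reward:", black_box_reward(mu))
```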
Related papers
- Eliciting Causal Abilities in Large Language Models for Reasoning Tasks [14.512834333917414]
We introduce the Self-Causal Instruction Enhancement (SCIE) method, which enables LLMs to generate a small quantity of high-quality observational data.
In SCIE, the instructions are treated as the treatment, and textual features are used to represent the natural-language data.
Our method effectively generates instructions that enhance reasoning performance with reduced training cost of prompts.
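As a toy illustration of the "instructions as treatment" framing (the paper's actual estimator is more involved, and the records below are invented), one can estimate the average effect of a textual feature on task score from logged prompt/score pairs:
```python
# Toy illustration only: the treatment is whether a prompt contains a given
# textual feature; we estimate its average effect on task score from
# hypothetical observational records.
def average_treatment_effect(records, feature):
    treated = [r["score"] for r in records if feature in r["prompt"]]
    control = [r["score"] for r in records if feature not in r["prompt"]]
    if not treated or not control:
        return None
    return sum(treated) / len(treated) - sum(control) / len(control)

records = [
    {"prompt": "Think step by step. Solve:", "score": 0.71},
    {"prompt": "Solve:", "score": 0.55},
    {"prompt": "Think step by step. Answer:", "score": 0.68},
    {"prompt": "Answer:", "score": 0.52},
]
print(average_treatment_effect(records, "step by step"))  # ~ +0.16
```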
arXiv Detail & Related papers (2024-12-19T17:03:02Z) - GReaTer: Gradients over Reasoning Makes Smaller Language Models Strong Prompt Optimizers [52.17222304851524]
We introduce GReaTer, a novel prompt optimization technique that directly incorporates gradient information over task-specific reasoning.
By utilizing task loss gradients, GReaTer enables self-optimization of prompts for open-source, lightweight language models.
GReaTer consistently outperforms previous state-of-the-art prompt optimization methods.
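A rough sketch of gradient-guided prompt-token search of this general kind (a HotFlip-style first-order swap rule on a toy differentiable model, not GReaTer's actual algorithm, which works over the model's reasoning): with an open-weight model, the task-loss gradient at each prompt-token embedding ranks candidate replacement tokens.
```python
# Toy gradient-guided prompt-token search. A tiny differentiable "model"
# stands in for an open-weight LM; swaps are ranked by the first-order
# predicted loss decrease grad_t . (e_t - e_v).
import torch

torch.manual_seed(0)
VOCAB, DIM = 50, 8
emb = torch.nn.Embedding(VOCAB, DIM)
head = torch.nn.Linear(DIM, 1)              # toy task head (assumed)

def task_loss(e):                           # e: (T, DIM) prompt embeddings
    return (1.0 - head(e.mean(0))).pow(2).sum()

prompt = torch.tensor([3, 17, 42])
for _ in range(10):
    e = emb(prompt).detach().requires_grad_(True)
    (grad,) = torch.autograd.grad(task_loss(e), e)   # d loss / d embeddings
    # gain[v, t]: first-order loss decrease from putting token v at position t
    gain = (e * grad).sum(1) - emb.weight @ grad.T   # (VOCAB, T)
    v, t = divmod(int(gain.argmax()), prompt.numel())
    if gain[v, t] <= 0:                              # no improving swap left
        break
    prompt = prompt.clone()
    prompt[t] = v
print(prompt.tolist(), float(task_loss(emb(prompt))))
```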
arXiv Detail & Related papers (2024-12-12T20:59:43Z) - Instruct or Interact? Exploring and Eliciting LLMs' Capability in Code Snippet Adaptation Through Prompt Engineering [19.019004855931676]
Large language models (LLMs) have confirmed their effectiveness in the code generation task with promising results.
Their performance on adaptation, a reuse-oriented and context-dependent code change prediction task, is still unclear.
We propose an interactive prompting approach to elicit LLMs' adaptation ability.
arXiv Detail & Related papers (2024-11-23T09:40:36Z) - Unlearning as multi-task optimization: A normalized gradient difference approach with an adaptive learning rate [105.86576388991713]
We introduce a normalized gradient difference (NGDiff) algorithm that enables better control over the trade-off between the unlearning and retention objectives.
We provide a theoretical analysis and empirically demonstrate the superior performance of NGDiff among state-of-the-art unlearning methods on the TOFU and MUSE datasets.
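One plausible reading of the normalized-gradient idea, sketched on a toy problem (the paper's exact update rule and adaptive learning-rate scheme differ in detail): normalize the forget-side and retain-side gradients before combining them, so that neither objective dominates the step.
```python
# Toy two-objective descent with per-objective gradient normalization.
# The losses and the decaying step size are placeholders, not the paper's.
import torch

torch.manual_seed(0)
theta = torch.randn(10, requires_grad=True)

def loss_retain(p):                         # keep utility (toy)
    return (p - 1.0).pow(2).mean()

def loss_forget(p):                         # decays as p leaves the "memorized" point (toy)
    return 1.0 / (1.0 + (p + 1.0).pow(2).mean())

for step in range(300):
    (g_r,) = torch.autograd.grad(loss_retain(theta), theta)
    (g_f,) = torch.autograd.grad(loss_forget(theta), theta)
    # normalize each objective's gradient so neither dominates the update
    d = g_r / (g_r.norm() + 1e-12) + g_f / (g_f.norm() + 1e-12)
    lr = 0.1 / (1.0 + 0.05 * step)          # simple decaying step size (assumed)
    with torch.no_grad():
        theta -= lr * d
print(float(loss_retain(theta)), float(loss_forget(theta)))
```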
arXiv Detail & Related papers (2024-10-29T14:41:44Z) - Think Beyond Size: Adaptive Prompting for More Effective Reasoning [0.0]
We introduce Adaptive Prompting, a dynamic and iterative framework designed to enhance reasoning by incorporating real-time adjustments to prompt structures and validation mechanisms.
Results demonstrate that Adaptive Prompting significantly improves performance on diverse reasoning benchmarks, including arithmetic reasoning (GSM8K, MultiArith), logical reasoning, and commonsense tasks.
Our approach enables smaller models to achieve competitive performance with larger counterparts, such as GPT-4, while maintaining computational efficiency.
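The adjust-and-validate loop such methods describe can be sketched as follows; `llm`, `validate`, and the revision template are placeholders, not the paper's implementation.
```python
# Sketch of an iterative adjust-and-validate prompting loop (assumed design).
def adaptive_prompting(question, llm, validate, max_rounds=3):
    prompt = f"{question}\nLet's think step by step."
    answer = None
    for _ in range(max_rounds):
        answer = llm(prompt)
        ok, feedback = validate(question, answer)   # e.g., a verifier or self-check
        if ok:
            break
        # adjust the prompt structure using the validator's feedback
        prompt = (f"{question}\nPrevious attempt: {answer}\n"
                  f"Issue found: {feedback}\nRevise the reasoning and answer.")
    return answer

# Toy stand-ins to show the control flow:
toy_llm = lambda p: "42" if "Revise" in p else "41"
toy_validate = lambda q, a: (a == "42", "arithmetic slip")
print(adaptive_prompting("What is 6 * 7?", toy_llm, toy_validate))  # -> 42
```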
arXiv Detail & Related papers (2024-10-10T17:14:36Z) - QPO: Query-dependent Prompt Optimization via Multi-Loop Offline Reinforcement Learning [58.767866109043055]
We introduce Query-dependent Prompt Optimization (QPO), which iteratively fine-tunes a small pretrained language model to generate optimal prompts tailored to the input queries.
We derive insights from offline prompting demonstration data, which already exists in large quantities as a by-product of benchmarking diverse prompts on open-sourced tasks.
Experiments on various LLM scales and diverse NLP and math tasks demonstrate the efficacy and cost-efficiency of our method in both zero-shot and few-shot scenarios.
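A structural sketch of such a multi-loop offline pipeline (the function names and the mining rule are assumptions, not the authors' code): mine the best-scoring prompt per query from logged data, fine-tune a small prompt writer on those pairs, roll it out to extend the dataset, and repeat.
```python
# Multi-loop offline prompt-writer training (structural sketch, assumed API).
def qpo_loop(logged, prompt_writer, finetune, score, queries, loops=3):
    """logged: list of (query, prompt, reward) triples from prior benchmarking."""
    data = list(logged)
    for _ in range(loops):
        best = {}                                # query -> (reward, prompt)
        for q, p, r in data:
            if q not in best or r > best[q][0]:
                best[q] = (r, p)
        pairs = [(q, p) for q, (_, p) in best.items()]
        prompt_writer = finetune(prompt_writer, pairs)   # supervised step on best pairs
        for q in queries:                        # roll out the improved writer
            p = prompt_writer(q)
            data.append((q, p, score(q, p)))     # grow the offline dataset
    return prompt_writer
```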
arXiv Detail & Related papers (2024-08-20T03:06:48Z) - PILLOW: Enhancing Efficient Instruction Fine-tuning via Prompt Matching [20.607323649079845]
Low-Rank Adaptation (LoRA) has become a promising alternative to instruction fine-tuning.
PILLOW aims to improve LoRA's performance through a discrimination-based prompt-matching method that leverages LLMs' in-context learning ability.
PILLOW achieves performance commensurate with typical instruction fine-tuning methods across various evaluation metrics.
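A sketch of what discrimination-based prompt matching could look like (the matcher, the embedding function, and the prompt pool are assumptions): rank a pool of candidate prompts by similarity to the user input and prepend the top matches before querying the LoRA-tuned model.
```python
# Prompt matching sketch: embed maps text to a vector, pool holds candidate
# prompts, lora_model is the LoRA-tuned LLM (all assumed components).
def match_prompts(user_input, pool, embed, k=2):
    def cosine(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        norm = (sum(x * x for x in a) * sum(y * y for y in b)) ** 0.5
        return dot / norm if norm else 0.0
    query = embed(user_input)
    ranked = sorted(pool, key=lambda p: cosine(query, embed(p)), reverse=True)
    return ranked[:k]                       # the "discriminated" best matches

def pillow_infer(user_input, pool, embed, lora_model, k=2):
    demos = match_prompts(user_input, pool, embed, k)
    return lora_model("\n".join(demos + [user_input]))
```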
arXiv Detail & Related papers (2023-12-09T17:38:39Z) - OverPrompt: Enhancing ChatGPT through Efficient In-Context Learning [49.38867353135258]
We propose OverPrompt, leveraging the in-context learning capability of LLMs to handle multiple task inputs.
Our experiments show that OverPrompt can achieve cost-efficient zero-shot classification without causing significant detriment to task performance.
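The batching idea is easy to illustrate: pack several task inputs into a single request so the shared instruction tokens are paid for once. The template and output format below are assumptions, not the paper's exact prompt.
```python
# Build one prompt covering many inputs (template assumed, not the paper's).
def build_overprompt(instruction, inputs):
    lines = [instruction, "Answer each item as '<id>: <label>'."]
    lines += [f"{i + 1}. {text}" for i, text in enumerate(inputs)]
    return "\n".join(lines)

prompt = build_overprompt(
    "Classify the sentiment of each review as positive or negative.",
    ["Great battery life.", "Screen died after a week.", "Does what it says."],
)
print(prompt)   # one request replaces three separate zero-shot calls
```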
arXiv Detail & Related papers (2023-05-24T10:08:04Z) - The Wisdom of Hindsight Makes Language Models Better Instruction Followers [84.9120606803906]
Reinforcement learning has seen wide success in fine-tuning large language models to better align with instructions via human feedback.
In this paper, we consider an alternative approach: converting feedback to instruction by relabeling the original one and training the model for better alignment in a supervised manner.
We propose Hindsight Instruction Relabeling (HIR), a novel algorithm for aligning language models with instructions.
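The core of hindsight relabeling, as summarized above, fits in a few lines; `relabel` and `sft_update` are placeholders for the paper's components.
```python
# One round of hindsight instruction relabeling (structural sketch).
def hir_round(model, instructions, relabel, sft_update):
    batch = []
    for instruction in instructions:
        output = model(instruction)                # sample with the current model
        hindsight = relabel(instruction, output)   # instruction the output DOES satisfy
        batch.append((hindsight, output))          # every pair is "correct" in hindsight
    return sft_update(model, batch)                # plain supervised fine-tuning step
```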
arXiv Detail & Related papers (2023-02-10T12:16:38Z) - Recall and Learn: Fine-tuning Deep Pretrained Language Models with Less Forgetting [66.45372974713189]
We propose a recall and learn mechanism, which adopts the idea of multi-task learning and jointly learns pretraining tasks and downstream tasks.
Experiments show that our method achieves state-of-the-art performance on the GLUE benchmark.
We provide the open-source RecAdam optimizer, which integrates the proposed mechanisms into Adam to facilitate the NLP community.
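A toy sketch of the recall-and-learn objective (the released RecAdam integrates it into Adam; here plain gradient descent, with an assumed sigmoid annealing schedule): jointly descend the downstream loss and a quadratic pull toward the pretrained weights, shifting emphasis from recalling to learning over time.
```python
# Recall-and-learn objective on a toy problem (schedule and losses assumed).
import math
import torch

torch.manual_seed(0)
theta0 = torch.randn(10)                    # "pretrained" weights (toy)
theta = theta0.clone().requires_grad_(True)

def task_loss(p):
    return (p - 2.0).pow(2).mean()          # downstream objective (toy)

K, T0, LR = 0.05, 100, 0.05
for t in range(300):
    lam = 1.0 / (1.0 + math.exp(-K * (t - T0)))   # anneal: recall -> learn
    # weighted sum of downstream loss and quadratic "recall" penalty
    loss = lam * task_loss(theta) + (1 - lam) * (theta - theta0).pow(2).mean()
    loss.backward()
    with torch.no_grad():
        theta -= LR * theta.grad
    theta.grad = None
print("task loss:", float(task_loss(theta)))
```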
arXiv Detail & Related papers (2020-04-27T08:59:57Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.