ACING: Actor-Critic for Instruction Learning in Black-Box Large Language Models
- URL: http://arxiv.org/abs/2411.12736v1
- Date: Tue, 19 Nov 2024 18:58:03 GMT
- Title: ACING: Actor-Critic for Instruction Learning in Black-Box Large Language Models
- Authors: Salma Kharrat, Fares Fourati, Marco Canini,
- Abstract summary: ACING is a task-specific prompt optimization approach framed as a stateless continuous-action Reinforcement Learning problem.
We validate ACING by optimizing prompts for ChatGPT on 30 instruction-based tasks.
ACING consistently outperforms baseline methods, achieving a median score improvement of 10 percentage points.
- Score: 4.890873355984701
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The effectiveness of Large Language Models (LLMs) in solving tasks vastly depends on the quality of the instructions, which often require fine-tuning through extensive human effort. This highlights the need for automated instruction optimization; however, this optimization is particularly challenging when dealing with black-box LLMs, where model parameters and gradients remain inaccessible. We propose ACING, a task-specific prompt optimization approach framed as a stateless continuous-action Reinforcement Learning (RL) problem, known as the continuum bandit setting. ACING leverages an actor-critic-based method to optimize prompts, learning from non-differentiable reward signals. We validate ACING by optimizing prompts for ChatGPT on 30 instruction-based tasks. ACING consistently outperforms baseline methods, achieving a median score improvement of 10 percentage points. Furthermore, ACING not only recovers but also surpasses human-crafted expert instructions, achieving up to a 39 percentage point improvement against human benchmarks.
Related papers
- Prompt-MII: Meta-Learning Instruction Induction for LLMs [40.37193886125562]
In this paper we propose a method to perform instruction induction, where we take training examples and reduce them to a compact but descriptive prompt.<n>We train on over 3,000 diverse classification datasets from the HuggingFace hub, and evaluate on 90 unseen tasks.<n> PROMPT-MII improves downstream model quality by 4-9 F1 points (10-20% relative), matching ICL performance while requiring 3-13x fewer tokens.
arXiv Detail & Related papers (2025-10-19T17:05:25Z) - UniErase: Towards Balanced and Precise Unlearning in Language Models [69.04923022755547]
Large language models (LLMs) require iterative updates to address the outdated information problem.<n>UniErase is a novel unlearning framework that demonstrates precision and balanced performances between knowledge unlearning and ability retaining.
arXiv Detail & Related papers (2025-05-21T15:53:28Z) - RAISE: Reinforenced Adaptive Instruction Selection For Large Language Models [48.63476198469349]
We propose a task-objective-driven instruction selection framework RAISE.
RAISE incorporates the entire instruction fine-tuning process into optimization.
It selects instruction at each step based on the expected impact of instruction on model performance improvement.
arXiv Detail & Related papers (2025-04-09T21:17:52Z) - A Sequential Optimal Learning Approach to Automated Prompt Engineering in Large Language Models [14.483240353801074]
This paper proposes an optimal learning framework for automated prompt engineering.<n>It is designed to sequentially identify effective prompt features while efficiently allocating a limited evaluation budget.<n>Our framework provides a solution to deploying automated prompt engineering in a wider range applications.
arXiv Detail & Related papers (2025-01-07T03:51:10Z) - Eliciting Causal Abilities in Large Language Models for Reasoning Tasks [14.512834333917414]
We introduce the Self-Causal Instruction Enhancement (SCIE) method, which enables LLMs to generate high-quality, low-quantity observational data.
In SCIE, the instructions are treated as the treatment, and textual features are used to process natural language.
Our method effectively generates instructions that enhance reasoning performance with reduced training cost of prompts.
arXiv Detail & Related papers (2024-12-19T17:03:02Z) - GReaTer: Gradients over Reasoning Makes Smaller Language Models Strong Prompt Optimizers [52.17222304851524]
We introduce GReaTer, a novel prompt optimization technique that directly incorporates gradient information over task-specific reasoning.
By utilizing task loss gradients, GReaTer enables self-optimization of prompts for open-source, lightweight language models.
GReaTer consistently outperforms previous state-of-the-art prompt optimization methods.
arXiv Detail & Related papers (2024-12-12T20:59:43Z) - Instruct or Interact? Exploring and Eliciting LLMs' Capability in Code Snippet Adaptation Through Prompt Engineering [19.019004855931676]
Large language models (LLMs) have confirmed their effectiveness in the code generation task with promising results.
Their performance on adaptation, a reuse-oriented and context-dependent code change prediction task, is still unclear.
We propose an interactive prompting approach to elicit LLMs' adaptation ability.
arXiv Detail & Related papers (2024-11-23T09:40:36Z) - Unlearning as multi-task optimization: A normalized gradient difference approach with an adaptive learning rate [105.86576388991713]
We introduce a normalized gradient difference (NGDiff) algorithm, enabling us to have better control over the trade-off between the objectives.
We provide a theoretical analysis and empirically demonstrate the superior performance of NGDiff among state-of-the-art unlearning methods on the TOFU and MUSE datasets.
arXiv Detail & Related papers (2024-10-29T14:41:44Z) - Parameter-Efficient Fine-Tuning of Large Language Models using Semantic Knowledge Tuning [0.08795040582681389]
Large Language Models (LLMs) are gaining significant popularity in recent years for specialized tasks using prompts.
We propose a novel method called Semantic Knowledge Tuning (SK-Tuning) for prompt and prefix tuning that employs meaningful words instead of random tokens.
Our experimental results show that SK-Tuning exhibits faster training times, fewer parameters, and superior performance on tasks such as text classification and understanding.
arXiv Detail & Related papers (2024-10-11T07:55:09Z) - Think Beyond Size: Adaptive Prompting for More Effective Reasoning [0.0]
We introduce Adaptive Prompting, a dynamic and iterative framework designed to enhance reasoning by incorporating real-time adjustments to prompt structures and validation mechanisms.
Results demonstrate that Adaptive Prompting significantly improves performance on diverse reasoning benchmarks, including arithmetic reasoning (GSM8K, MultiArithm), logical reasoning and commonsense tasks.
Our approach enables smaller models to achieve competitive performance with larger counterparts, such as GPT-4, while maintaining computational efficiency.
arXiv Detail & Related papers (2024-10-10T17:14:36Z) - QPO: Query-dependent Prompt Optimization via Multi-Loop Offline Reinforcement Learning [58.767866109043055]
We introduce Query-dependent Prompt Optimization (QPO), which iteratively fine-tune a small pretrained language model to generate optimal prompts tailored to the input queries.
We derive insights from offline prompting demonstration data, which already exists in large quantities as a by-product of benchmarking diverse prompts on open-sourced tasks.
Experiments on various LLM scales and diverse NLP and math tasks demonstrate the efficacy and cost-efficiency of our method in both zero-shot and few-shot scenarios.
arXiv Detail & Related papers (2024-08-20T03:06:48Z) - DELIA: Diversity-Enhanced Learning for Instruction Adaptation in Large Language Models [11.77848664657788]
We show that instruction tuning is primarily a process where the model fits to specific task formats, rather than acquiring new knowledge or capabilities.
We propose that this limitation stems from biased features learned during instruction tuning, which differ from ideal task-specfic features.
We use our novel data synthesis method, DELIA, to transform biased features in instruction tuning into approximations of ideal features.
arXiv Detail & Related papers (2024-08-19T17:56:06Z) - INTERS: Unlocking the Power of Large Language Models in Search with Instruction Tuning [59.07490387145391]
Large language models (LLMs) have demonstrated impressive capabilities in various natural language processing tasks.
Their application to information retrieval (IR) tasks is still challenging due to the infrequent occurrence of many IR-specific concepts in natural language.
We introduce a novel instruction tuning dataset, INTERS, encompassing 20 tasks across three fundamental IR categories.
arXiv Detail & Related papers (2024-01-12T12:10:28Z) - PILLOW: Enhancing Efficient Instruction Fine-tuning via Prompt Matching [20.607323649079845]
Low-Rank Adaptation (LoRA) has become a promising alternative to instruction fine-tuning.
PILLOW aims to improve LoRA's performance by a discrimination-based LLM ability.
PILLOW exhibits commensurate performance on various evaluation metrics compared with typical instruction fine-tuning methods.
arXiv Detail & Related papers (2023-12-09T17:38:39Z) - Auto-Instruct: Automatic Instruction Generation and Ranking for
Black-Box Language Models [91.02730155418699]
Large language models (LLMs) can perform a wide range of tasks by following natural language instructions.
We introduce Auto-Instruct, a novel method to automatically improve the quality of instructions provided to LLMs.
In experiments on 118 out-of-domain tasks, Auto-Instruct surpasses both human-written instructions and existing baselines of LLM-generated instructions.
arXiv Detail & Related papers (2023-10-19T19:52:55Z) - From Language Modeling to Instruction Following: Understanding the Behavior Shift in LLMs after Instruction Tuning [63.63840740526497]
We investigate how instruction tuning adjusts pre-trained models with a focus on intrinsic changes.
The impact of instruction tuning is then studied by comparing the explanations derived from the pre-trained and instruction-tuned models.
Our findings reveal three significant impacts of instruction tuning.
arXiv Detail & Related papers (2023-09-30T21:16:05Z) - OverPrompt: Enhancing ChatGPT through Efficient In-Context Learning [49.38867353135258]
We propose OverPrompt, leveraging the in-context learning capability of LLMs to handle multiple task inputs.
Our experiments show that OverPrompt can achieve cost-efficient zero-shot classification without causing significant detriment to task performance.
arXiv Detail & Related papers (2023-05-24T10:08:04Z) - Automatic Prompt Optimization with "Gradient Descent" and Beam Search [64.08364384823645]
Large Language Models (LLMs) have shown impressive performance as general purpose agents, but their abilities remain highly dependent on prompts.
We propose a simple and nonparametric solution to this problem, Automatic Prompt Optimization (APO)
APO uses minibatches of data to form natural language "gradients" that criticize the current prompt.
The gradients are then "propagated" into the prompt by editing the prompt in the opposite semantic direction of the gradient.
arXiv Detail & Related papers (2023-05-04T15:15:22Z) - The Wisdom of Hindsight Makes Language Models Better Instruction
Followers [84.9120606803906]
Reinforcement learning has seen wide success in finetuning large language models to better align with instructions via human feedback.
In this paper, we consider an alternative approach: converting feedback to instruction by relabeling the original one and training the model for better alignment in a supervised manner.
We propose Hindsight Instruction Relabeling (HIR), a novel algorithm for aligning language models with instructions.
arXiv Detail & Related papers (2023-02-10T12:16:38Z) - Finetuned Language Models Are Zero-Shot Learners [67.70352207685558]
We show that instruction tuning boosts zero-shot performance on unseen tasks.
We take a 137B parameter pretrained language model and instruction-tune it on over 60 NLP tasks verbalized via natural language instruction templates.
We evaluate this instruction-tuned model, which we call FLAN, on unseen task types.
arXiv Detail & Related papers (2021-09-03T17:55:52Z) - Recall and Learn: Fine-tuning Deep Pretrained Language Models with Less
Forgetting [66.45372974713189]
We propose a recall and learn mechanism, which adopts the idea of multi-task learning and jointly learns pretraining tasks and downstream tasks.
Experiments show that our method achieves state-of-the-art performance on the GLUE benchmark.
We provide open-source RecAdam, which integrates the proposed mechanisms into Adam to facility the NLP community.
arXiv Detail & Related papers (2020-04-27T08:59:57Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.