Related papers: Rethinking Prompt Optimization: Reinforcement, Diversification, and Migration in Blackbox LLMs


URL: http://arxiv.org/abs/2507.09839v1
Date: Mon, 14 Jul 2025 00:20:14 GMT
Title: Rethinking Prompt Optimization: Reinforcement, Diversification, and Migration in Blackbox LLMs
Authors: MohammadReza Davari, Utkarsh Garg, Weixin Cai, Eugene Belilovsky
Abstract summary: We propose a novel Automatic Prompt Optimization (APO) framework centered on enhancing the feedback mechanism. To mitigate the noise inherent in LLM-generated feedback, we introduce a technique called feedback diversification. Our approach consistently outperforms strong baselines, achieving significant accuracy improvements, faster convergence, and lower computational costs.
Score: 10.434732630519377
License: http://creativecommons.org/licenses/by/4.0/
Abstract: An increasing number of NLP applications interact with large language models (LLMs) through black-box APIs, making prompt engineering critical for controlling model outputs. Recent Automatic Prompt Optimization (APO) methods iteratively refine prompts using model-generated feedback ("textual gradients"), but they focus primarily on error correction and neglect valuable insights from correct predictions. This limits both their effectiveness and efficiency. In this paper, we propose a novel APO framework centered on enhancing the feedback mechanism. We reinterpret the textual gradient as a form of negative reinforcement and introduce complementary positive reinforcement to explicitly preserve beneficial prompt components identified through successful predictions. To mitigate the noise inherent in LLM-generated feedback, we introduce a technique called feedback diversification, which aggregates multiple feedback signals, emphasizing consistent, actionable advice while filtering out outliers. Motivated by the rapid evolution and diversity of available LLMs, we also formalize Continual Prompt Optimization (CPO), addressing the practical challenge of efficiently migrating optimized prompts between different model versions or API providers. Our experiments reveal that naive prompt migration often degrades performance due to loss of critical instructions. In contrast, our approach consistently outperforms strong baselines, achieving significant accuracy improvements, faster convergence, and lower computational costs in both standard and migration scenarios.
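
The abstract does not specify how feedback diversification aggregates signals. The following is a hypothetical sketch of one simple reading of "emphasizing consistent advice while filtering out outliers". It assumes that each LLM feedback call returns a list of suggestion strings and that agreement is measured by exact match after light normalization. The names `diversify_feedback`, `normalize`, and `min_support` are illustrative, not taken from the paper.

```python
from collections import Counter
from typing import Iterable, List


def normalize(suggestion: str) -> str:
    """Crude canonical form so trivially different phrasings collapse together.

    Assumption: real feedback would likely need semantic clustering (for
    example embedding similarity); lowercasing, whitespace folding and
    stripping a trailing period are only a stand-in.
    """
    return " ".join(suggestion.lower().split()).rstrip(".")


def diversify_feedback(samples: Iterable[List[str]], min_support: float = 0.5) -> List[str]:
    """Aggregate several independently sampled feedback lists (hypothetical).

    Each element of `samples` is the list of suggestions from one LLM call.
    A suggestion is kept only if it appears in at least `min_support` of the
    samples. Rarer suggestions are treated as outliers and dropped. Results
    are ordered by how many samples agree, most consistent first.
    """
    samples = list(samples)
    if not samples:
        return []
    counts = Counter()
    for suggestions in samples:
        # Count each distinct suggestion at most once per sample.
        counts.update({normalize(s) for s in suggestions})
    threshold = min_support * len(samples)
    kept = [(s, c) for s, c in counts.items() if c >= threshold]
    kept.sort(key=lambda item: (-item[1], item[0]))
    return [s for s, _ in kept]


samples = [
    ["Add an instruction to show work.", "Define ambiguous terms."],
    ["add an instruction to show work", "Use bullet points."],
    ["Add an instruction to show work.", "Define ambiguous terms."],
]
print(diversify_feedback(samples))
```

With three samples and `min_support=0.5`, a suggestion must appear in at least two of them. The one-off "Use bullet points." is therefore discarded as an outlier. A majority-style threshold was chosen here only because it is the simplest consistency filter. The paper's actual aggregation may differ.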

Related papers

Prompt Optimization Via Diffusion Language Models [73.9599434962714]
We propose a diffusion-based framework for prompt optimization. Our method enables flexible, span-level prompt updates without requiring access to, or modification of, the downstream language model. We show that moderate diffusion step counts provide the best balance between refinement quality and stability.
Date: 2026-01-30T00:00:54Z
MRO: Enhancing Reasoning in Diffusion Language Models via Multi-Reward Optimization [66.82303841930752]
Diffusion language models (DLMs) have emerged as a promising alternative to traditional autoregressive large language models (LLMs). However, DLMs still lag behind LLMs in reasoning performance, especially as the number of denoising steps decreases. We propose a Multi-Reward Optimization (MRO) approach, which encourages DLMs to consider token correlation during the denoising process.
Date: 2025-10-24T13:57:59Z
In-Place Feedback: A New Paradigm for Guiding LLMs in Multi-Turn Reasoning [10.138497038893096]
We introduce in-place feedback, a novel interaction paradigm in which users directly edit an LLM's previous response. Empirical evaluations on reasoning-intensive benchmarks reveal that in-place feedback achieves better performance than conventional multi-turn feedback.
Date: 2025-10-01T11:16:04Z
TopoSizing: An LLM-aided Framework of Topology-based Understanding and Sizing for AMS Circuits [7.615431299673158]
Traditional black-box optimization achieves sampling efficiency but lacks circuit understanding. We propose TopoSizing, an end-to-end framework that performs robust circuit understanding directly from raw netlists.
Date: 2025-09-17T16:52:46Z
Retrieval Enhanced Feedback via In-context Neural Error-book [8.862195491555575]
We propose REFINE: Retrieval-Enhanced Feedback via In-context Neural Error-book. REFINE systematically structures errors and provides targeted feedback. Our results demonstrate substantial speedup, reduced computational costs, and successful generalization.
Date: 2025-08-22T11:50:04Z
Can Prompt Difficulty be Online Predicted for Accelerating RL Finetuning of Reasoning Models? [41.69340422699651]
This work investigates iterative approximate evaluation for arbitrary prompts. It introduces Model Predictive Prompt Selection (MoPPS), a Bayesian risk-predictive framework. MoPPS reliably predicts prompt difficulty and accelerates training with significantly reduced rollouts.
Date: 2025-07-07T03:20:52Z
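
The summary describes MoPPS only as Bayesian and risk-predictive. The following is a hypothetical sketch of one common way to realize that idea, assumed here purely for illustration:

- Keep a Beta posterior over each prompt's success rate under the current policy.
- Update it from observed rollout outcomes.
- Sample from the posteriors to pick prompts of moderate predicted difficulty before spending any new rollouts.

The names `PromptPosterior`, `select_prompts`, `decay`, and `target=0.5` are illustrative assumptions, not the paper's API.

```python
import random
from dataclasses import dataclass


@dataclass
class PromptPosterior:
    """Beta(alpha, beta) belief over one prompt's success rate (hypothetical)."""
    alpha: float = 1.0  # pseudo-count of successful rollouts (uniform prior)
    beta: float = 1.0   # pseudo-count of failed rollouts

    def update(self, successes: int, failures: int, decay: float = 1.0) -> None:
        # A decay below 1 discounts old evidence, since the policy keeps changing.
        self.alpha = decay * self.alpha + successes
        self.beta = decay * self.beta + failures

    def mean(self) -> float:
        return self.alpha / (self.alpha + self.beta)


def select_prompts(posteriors, k, target=0.5, rng=random):
    """Thompson-style selection: draw one success rate per prompt from its
    posterior and return the k prompts whose draw is closest to `target`,
    i.e. neither trivially easy nor hopeless, without running any rollouts."""
    scored = []
    for prompt_id, post in posteriors.items():
        sampled_rate = rng.betavariate(post.alpha, post.beta)
        scored.append((abs(sampled_rate - target), prompt_id))
    scored.sort()
    return [prompt_id for _, prompt_id in scored[:k]]


posts = {
    "easy": PromptPosterior(500, 1),
    "hard": PromptPosterior(1, 500),
    "medium": PromptPosterior(200, 200),
}
print(select_prompts(posts, 1, rng=random.Random(0)))
```

Sampling from the posterior instead of ranking by the posterior mean keeps some exploration of prompts whose difficulty is still uncertain. This choice is part of the assumed design, not something the summary states.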
Expectation Confirmation Preference Optimization for Multi-Turn Conversational Recommendation Agent [24.134616865308985]
We introduce ECPO, a novel multi-turn preference optimization (MTPO) paradigm. We show that ECPO significantly enhances the interaction capabilities of conversational recommendation agents (CRAs), delivering notable improvements in both efficiency and effectiveness over existing MTPO methods.
Date: 2025-06-17T08:29:04Z
Perceptual Decoupling for Scalable Multi-modal Reasoning via Reward-Optimized Captioning [78.17782197231325]
We propose a reasoning-guided reinforcement learning strategy that aligns the extractor's captioning behavior with the reasoning objective. Experiments on multi-modal math and science benchmarks show that the proposed RACRO method achieves state-of-the-art average performance.
Date: 2025-06-05T02:28:07Z
VerIPO: Cultivating Long Reasoning in Video-LLMs via Verifier-Gudied Iterative Policy Optimization [59.39976343879587]
VerIPO aims to gradually improve video LLMs' capacity for generating deep, long-term reasoning chains. The training loop benefits from GRPO's expansive search and DPO's targeted optimization. Our trained models exceed the direct inference of large-scale instruction-tuned Video-LLMs.
Date: 2025-05-25T06:41:28Z
Modality-Balancing Preference Optimization of Large Multimodal Models by Adversarial Negative Mining [66.54211199959298]
We propose a novel preference learning framework, Modality-Balancing Preference Optimization (MBPO), to address the modality imbalance in LMMs. MBPO constructs a more effective offline preference dataset by generating hard negatives, i.e., rejected responses misled by LLM biases. It can enhance LMM performance on challenging vision-language tasks and effectively reduce hallucinations.
Date: 2025-05-20T03:59:05Z
Self-alignment of Large Video Language Models with Refined Regularized Preference Optimization [29.706347050700867]
Large Video Language Models (LVLMs) struggle with fine-grained temporal understanding, hallucinate, and often make mistakes even on simple video question-answering tasks. We propose a self-alignment framework that enables LVLMs to learn from their own errors.
Date: 2025-04-16T13:43:56Z
Auto-Prompt Generation is Not Robust: Prompt Optimization Driven by Pseudo Gradient [50.15090865963094]
We introduce PertBench, a comprehensive benchmark dataset that includes a wide range of input perturbations. Our analysis reveals substantial vulnerabilities in existing prompt generation strategies. We propose PGO, a gradient-free prompt generation framework that leverages perturbation types as pseudo-gradient signals.
Date: 2024-12-24T06:05:08Z
In-context Demonstration Matters: On Prompt Optimization for Pseudo-Supervision Refinement [71.60563181678323]
Large language models (LLMs) have achieved great success across diverse tasks, and fine-tuning is sometimes needed to further enhance generation quality. To handle these challenges, a direct solution is to generate "high-confidence" data from unsupervised downstream tasks. We propose a novel approach, the pseudo-supervised demonstrations aligned prompt optimization (PAPO) algorithm, which jointly refines both the prompt and the overall pseudo-supervision.
Date: 2024-10-04T03:39:28Z
Multi-Reference Preference Optimization for Large Language Models [56.84730239046117]
We introduce a novel closed-form formulation for direct preference optimization using multiple reference models. The resulting algorithm, Multi-Reference Preference Optimization (MRPO), leverages broader prior knowledge from diverse reference models. Our experiments demonstrate that LLMs finetuned with MRPO generalize better in various preference data, regardless of data scarcity or abundance.
Date: 2024-05-26T00:29:04Z
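
The MRPO summary says only that it is a closed-form direct preference optimization (DPO) variant using several reference models. The following hypothetical sketch shows one place where multiple references can enter a DPO-style objective. It takes the standard per-pair DPO loss, -log σ(β[(log π_θ(y_w) - log π_ref(y_w)) - (log π_θ(y_l) - log π_ref(y_l))]), and replaces the single reference log-probability with a normalized weighted average over reference models. This averaging rule is an assumption for illustration, not the closed form derived in the paper. The names `multi_ref_dpo_loss`, `weights`, and the default `beta=0.1` are also illustrative.

```python
import math


def log_sigmoid(x: float) -> float:
    # Numerically stable log(sigmoid(x)) for both signs of x.
    return -math.log1p(math.exp(-x)) if x >= 0 else x - math.log1p(math.exp(x))


def multi_ref_dpo_loss(policy_chosen, policy_rejected, refs_chosen, refs_rejected,
                       weights, beta=0.1):
    """DPO loss for one preference pair with several reference models (hypothetical).

    policy_*: log pi_theta(y|x) of the chosen / rejected response.
    refs_*:   per-reference log pi_ref_i(y|x), in the same order as `weights`.
    weights:  positive mixing weights, normalized here to sum to 1.
    Assumption: a weighted average of reference log-probs stands in for
    MRPO's published combination rule.
    """
    total = sum(weights)
    w = [wi / total for wi in weights]
    ref_chosen = sum(wi * lp for wi, lp in zip(w, refs_chosen))
    ref_rejected = sum(wi * lp for wi, lp in zip(w, refs_rejected))
    margin = beta * ((policy_chosen - ref_chosen) - (policy_rejected - ref_rejected))
    return -log_sigmoid(margin)


# Policy identical to the (averaged) references: zero margin, loss = log 2.
print(multi_ref_dpo_loss(-1.0, -2.0, [-1.0, -1.0], [-2.0, -2.0], [1.0, 1.0]))
```

With a single reference and weight 1, the function reduces to the ordinary DPO loss. A policy that raises the chosen response's likelihood relative to the references gets a positive margin and a loss below log 2.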
Unleashing the Potential of Large Language Models as Prompt Optimizers: Analogical Analysis with Gradient-based Model Optimizers [108.72225067368592]
We propose a novel perspective for investigating the design of large language model (LLM)-based prompt optimizers. We identify two pivotal factors in model parameter learning: update direction and update method. We develop a capable gradient-inspired LLM-based prompt optimizer called GPO.
Date: 2024-02-27T15:05:32Z
Adversarial Preference Optimization: Enhancing Your Alignment via RM-LLM Game [31.66896160733569]
We propose an Adversarial Preference Optimization (APO) framework for more efficient human preference optimization. We find that the proposed adversarial training framework further enhances existing alignment baselines in terms of LLM helpfulness and harmlessness.
Date: 2023-11-14T10:10:31Z
PREFER: Prompt Ensemble Learning via Feedback-Reflect-Refine [24.888093229577965]
We propose a simple, universal, and automatic method named PREFER to address the stated limitations. PREFER achieves state-of-the-art performance on multiple types of tasks by a significant margin.
Date: 2023-08-23T09:46:37Z

This list is automatically generated from the titles and abstracts of the papers on this site.