Inference-Aware Prompt Optimization for Aligning Black-Box Large Language Models
- URL: http://arxiv.org/abs/2508.10030v1
- Date: Fri, 08 Aug 2025 18:45:53 GMT
- Title: Inference-Aware Prompt Optimization for Aligning Black-Box Large Language Models
- Authors: Saaduddin Mahmud, Mason Nakamura, Kyle H. Wray, Shlomo Zilberstein
- Abstract summary: Existing prompt optimization approaches are inference strategy agnostic; that is, they optimize prompts without regard to the inference strategy employed during deployment. We introduce a novel unified framework named IAPO that jointly optimizes the prompt and inference scale while being aware of the inference budget and different task objectives. We develop a fixed-budget training algorithm for IAPO, which we call PSST, and analyze finite-budget guarantees on error probability.
- Score: 8.579682278783784
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Prompt optimization methods have demonstrated significant effectiveness in aligning black-box large language models (LLMs). In parallel, inference scaling strategies such as Best-of-N Sampling and Majority Voting have also proven to enhance alignment and performance by trading off computation. However, existing prompt optimization approaches are inference strategy agnostic; that is, they optimize prompts without regard to the inference strategy employed during deployment. This constitutes a significant methodological gap, as our empirical and theoretical analysis reveals a strong interdependence between these two paradigms. Moreover, we find that user preferences regarding trade-offs among multiple objectives and inference budgets substantially influence the choice of prompt and inference configuration. To address this gap, we introduce a unified novel framework named IAPO (Inference-Aware Prompt Optimization) that jointly optimizes the prompt and inference scale, while being aware of the inference budget and different task objectives. We then develop a fixed-budget training algorithm for IAPO, which we call PSST (Prompt Scaling via Sequential Trimming), and analyze finite-budget guarantees on error probability. Finally, we evaluate the effectiveness of PSST on six different tasks, including multi-objective text generation and reasoning, and demonstrate the critical role of incorporating inference-awareness when aligning black-box LLMs through prompt optimization.
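Concretely, the joint search that IAPO formalizes and the trimming procedure behind PSST can be pictured with a short sketch. The code below is a rough illustration only, not the authors' algorithm: the `llm` callable, the 0/1 reward, majority voting as the inference strategy, and the budget split across halving rounds are all assumptions made for the example.

```python
import math
import random
from collections import Counter

def majority_vote(answers):
    """Aggregate n sampled answers by majority voting."""
    return Counter(answers).most_common(1)[0][0]

def score_arm(llm, prompt, n, task):
    """Evaluate one (prompt, inference-scale) arm on one task instance:
    draw n completions and reward the majority-voted answer; costs n calls."""
    question, reference = task
    votes = [llm(prompt, question) for _ in range(n)]
    return float(majority_vote(votes) == reference)

def sequential_trimming(llm, prompts, scales, tasks, budget):
    """Fixed-budget, successive-halving-style search over joint
    (prompt, inference scale) arms; the budget is counted in LLM calls,
    so large-scale arms consume it faster."""
    arms = [(p, n) for p in prompts for n in scales]
    n_rounds = max(1, math.ceil(math.log2(len(arms))))
    spent = 0
    for _ in range(n_rounds):
        if len(arms) == 1 or spent >= budget:
            break
        share = max(1, (budget - spent) // (n_rounds * len(arms)))
        scores = {}
        for prompt, n in arms:
            pulls = max(1, share // n)  # evaluations this arm can afford
            rewards = [score_arm(llm, prompt, n, random.choice(tasks))
                       for _ in range(pulls)]
            spent += pulls * n
            scores[(prompt, n)] = sum(rewards) / pulls
        # Trim the weaker half of the arms and continue with the survivors.
        arms = sorted(arms, key=scores.get, reverse=True)[:max(1, len(arms) // 2)]
    return arms[0]
```

A call such as `sequential_trimming(llm, prompts, scales=[1, 4, 16], tasks=data, budget=10_000)` returns a single (prompt, n) pair, which makes the paper's central point concrete: the best prompt at n=1 need not be the best prompt at n=16.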
Related papers
- Deep Unfolding: Recent Developments, Theory, and Design Guidelines [99.63555420898554]
This article provides a tutorial-style overview of deep unfolding, a framework that transforms optimization algorithms into structured, trainable ML architectures.
We review the foundations of optimization for inference and for learning, introduce four representative design paradigms for deep unfolding, and discuss the distinctive training schemes that arise from their iterative nature.
arXiv Detail & Related papers (2025-12-03T13:16:35Z)
- Large Language Model Assisted Automated Algorithm Generation and Evolution via Meta-black-box optimization [9.184788298623062]
AwesomeDE is proposed, which leverages large language models (LLMs) as a meta-optimizer to generate update rules for constrained evolutionary algorithms without human intervention.
Key components, including prompt design and iterative refinement, are systematically analyzed to determine their impact on design quality.
Experimental results demonstrate that the proposed approach outperforms existing methods in terms of computational efficiency and solution accuracy.
arXiv Detail & Related papers (2025-09-16T17:02:24Z)
- Bounded Rationality for LLMs: Satisficing Alignment at Inference-Time [52.230936493691985]
We propose SITAlign, an inference framework that addresses the multifaceted nature of alignment by maximizing a primary objective while satisfying threshold-based constraints on secondary criteria.
We provide theoretical insights by deriving sub-optimality bounds of our satisficing-based inference alignment approach.
arXiv Detail & Related papers (2025-05-29T17:56:05Z)
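The threshold-based satisficing rule summarized above lends itself to a generic sketch. This is an assumption-laden illustration, not SITAlign itself: `candidates` stands in for N sampled responses, and `primary` and `secondary` are placeholder scoring callables.

```python
def satisficing_select(candidates, primary, secondary, thresholds):
    """Keep candidates whose secondary scores all clear their thresholds,
    then return the feasible one maximizing the primary objective;
    fall back to the full pool if nothing is feasible."""
    feasible = [c for c in candidates
                if all(s(c) >= t for s, t in zip(secondary, thresholds))]
    return max(feasible or candidates, key=primary)
```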
- Optimizing Anytime Reasoning via Budget Relative Policy Optimization [70.32755424260336]
We present a novel framework, AnytimeReasoner, to optimize anytime reasoning performance.
We truncate the complete thinking process to fit within sampled token budgets from a prior distribution.
We then optimize the thinking and summary policies in a decoupled manner to maximize the cumulative reward.
arXiv Detail & Related papers (2025-05-19T17:58:44Z)
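The budget-truncation step in the summary above is easy to picture; the fragment below is only a guess at its shape (the prior, `k`, and token-level truncation are illustrative assumptions, not the paper's recipe).

```python
import random

def budgeted_truncations(trace_tokens, budget_prior, k=4):
    """Sample k token budgets from a prior and truncate one complete
    reasoning trace to each, yielding (budget, prefix) pairs that an
    anytime learner could reward separately."""
    budgets = sorted(random.choices(budget_prior, k=k))
    return [(b, trace_tokens[:b]) for b in budgets]
```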
- Supervised Optimism Correction: Be Confident When LLMs Are Sure [91.7459076316849]
We establish a novel theoretical connection between supervised fine-tuning and offline reinforcement learning.
We show that the widely used beam search method suffers from unacceptable over-optimism.
We propose Supervised Optimism Correction, which introduces a simple yet effective auxiliary loss for token-level $Q$-value estimations.
arXiv Detail & Related papers (2025-04-10T07:50:03Z)
- Instruction-Oriented Preference Alignment for Enhancing Multi-Modal Comprehension Capability of MLLMs [40.57154850069878]
We propose a scalable framework designed to automatically construct alignment preferences grounded in instruction fulfillment efficacy.
Our method couples automated preference construction with a dedicated verification process.
Experiments conducted on Qwen2VL-7B demonstrate IPA's effectiveness across multiple benchmarks.
arXiv Detail & Related papers (2025-03-26T08:19:02Z)
- A Simple and Effective Reinforcement Learning Method for Text-to-Image Diffusion Fine-tuning [61.403275660120606]
Reinforcement learning (RL)-based fine-tuning has emerged as a powerful approach for aligning diffusion models with black-box objectives.
We propose leave-one-out PPO (LOOP), a novel RL method for diffusion fine-tuning.
Our results demonstrate that LOOP effectively improves diffusion models on various black-box objectives, and achieves a better balance between computational efficiency and performance.
arXiv Detail & Related papers (2025-03-02T13:43:53Z)
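The leave-one-out baseline that gives LOOP its name is a standard variance-reduction device; a minimal sketch of the advantage computation (generic, not the paper's exact PPO integration) follows.

```python
import numpy as np

def loo_advantages(rewards):
    """For K >= 2 sampled outputs of one prompt, baseline each reward with
    the mean reward of the other K-1 samples (leave-one-out), yielding
    advantage estimates without a learned critic."""
    r = np.asarray(rewards, dtype=float)
    return r - (r.sum() - r) / (r.size - 1)
```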
- Reward-aware Preference Optimization: A Unified Mathematical Framework for Model Alignment [45.45508377432791]
This paper introduces Reward-Aware Preference Optimization (RPO), a mathematical framework that unifies popular preference optimization techniques.
RPO provides a structured approach to disentangle and systematically study the impact of various design choices.
We propose a new experimental setup that enables the clean and direct ablation of such design choices.
arXiv Detail & Related papers (2025-01-31T22:39:04Z)
- Evolutionary Pre-Prompt Optimization for Mathematical Reasoning [45.461506988071534]
This paper explores the optimization of example selection for designing effective chain-of-thought pre-prompts.
It shows that the choice of the algorithm, typically in favor of comparison-based methods such as evolutionary computation, significantly enhances efficacy and feasibility.
arXiv Detail & Related papers (2024-12-05T16:12:06Z)
- In-context Demonstration Matters: On Prompt Optimization for Pseudo-Supervision Refinement [71.60563181678323]
Large language models (LLMs) have achieved great success across diverse tasks, and fine-tuning is sometimes needed to further enhance generation quality.
To handle these challenges, a direct solution is to generate "high-confidence" data from unsupervised downstream tasks.
We propose a novel approach, the pseudo-supervised demonstrations aligned prompt optimization (PAPO) algorithm, which jointly refines both the prompt and the overall pseudo-supervision.
arXiv Detail & Related papers (2024-10-04T03:39:28Z)
- SEE: Strategic Exploration and Exploitation for Cohesive In-Context Prompt Optimization [8.975505323004427]
We propose a novel Cohesive In-Context Prompt Optimization framework for Large Language Models (LLMs).
We introduce SEE, a scalable and efficient prompt optimization framework that adopts metaheuristic optimization principles and balances strategic exploration and exploitation.
SEE outperforms state-of-the-art baseline methods by a large margin, achieving an average performance gain of 13.94 while reducing computational costs by 58.67.
arXiv Detail & Related papers (2024-02-17T17:47:10Z)
- Ensemble Kalman Filtering Meets Gaussian Process SSM for Non-Mean-Field and Online Inference [47.460898983429374]
We introduce an ensemble Kalman filter (EnKF) into the non-mean-field (NMF) variational inference framework to approximate the posterior distribution of the latent states.
This novel marriage between EnKF and GPSSM not only eliminates the need for extensive parameterization in learning variational distributions, but also enables an interpretable, closed-form approximation of the evidence lower bound (ELBO).
We demonstrate that the resulting EnKF-aided online algorithm embodies a principled objective function by ensuring data-fitting accuracy while incorporating model regularizations to mitigate overfitting.
arXiv Detail & Related papers (2023-12-10T15:22:30Z)
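For readers unfamiliar with the filter being borrowed here, a generic stochastic-EnKF analysis step (standard textbook form, not this paper's GPSSM-specific update) looks as follows.

```python
import numpy as np

def enkf_analysis(X, y, H, R, rng=None):
    """One stochastic EnKF analysis step. X: (n_state, N) ensemble,
    y: observation vector, H: linear observation operator, R: observation
    noise covariance. Returns the updated (analysis) ensemble."""
    rng = rng or np.random.default_rng()
    y = np.asarray(y, dtype=float)
    N = X.shape[1]
    A = X - X.mean(axis=1, keepdims=True)          # ensemble anomalies
    P = A @ A.T / (N - 1)                          # sample state covariance
    K = P @ H.T @ np.linalg.inv(H @ P @ H.T + R)   # Kalman gain
    Y = y[:, None] + rng.multivariate_normal(
        np.zeros(len(y)), R, size=N).T             # perturbed observations
    return X + K @ (Y - H @ X)
```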
- Benchmarking PtO and PnO Methods in the Predictive Combinatorial Optimization Regime [59.27851754647913]
Predictive combinatorial optimization precisely models many real-world applications, including energy cost-aware scheduling and budget allocation in advertising.
We develop a modular framework to benchmark 11 existing PtO/PnO methods on 8 problems, including a new industrial dataset for advertising.
Our study shows that PnO approaches are better than PtO on 7 out of 8 benchmarks, but no single set of PnO design choices emerges as a silver bullet.
arXiv Detail & Related papers (2023-11-13T13:19:34Z)