Making Prompts First-Class Citizens for Adaptive LLM Pipelines
- URL: http://arxiv.org/abs/2508.05012v1
- Date: Thu, 07 Aug 2025 03:49:56 GMT
- Title: Making Prompts First-Class Citizens for Adaptive LLM Pipelines
- Authors: Ugur Cetintemel, Shu Chen, Alexander W. Lee, Deepti Raghavan,
- Abstract summary: We describe our vision and an initial design for SPEAR, a language and runtime that fills the prompt management gap. SPEAR defines a prompt algebra that governs how prompts are constructed and adapted within a pipeline. By treating prompt logic as structured data, SPEAR enables optimizations such as operator fusion, prefix caching, and view reuse.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Modern LLM pipelines increasingly resemble data-centric systems: they retrieve external context, compose intermediate outputs, validate results, and adapt based on runtime feedback. Yet, the central element guiding this process -- the prompt -- remains a brittle, opaque string, disconnected from the surrounding dataflow. This disconnect limits reuse, optimization, and runtime control. In this paper, we describe our vision and an initial design for SPEAR, a language and runtime that fills this prompt management gap by making prompts structured, adaptive, and first-class components of the execution model. SPEAR enables (1) runtime prompt refinement -- modifying prompts dynamically in response to execution-time signals such as confidence, latency, or missing context; and (2) structured prompt management -- organizing prompt fragments into versioned views with support for introspection and logging. SPEAR defines a prompt algebra that governs how prompts are constructed and adapted within a pipeline. It supports multiple refinement modes (manual, assisted, and automatic), giving developers a balance between control and automation. By treating prompt logic as structured data, SPEAR enables optimizations such as operator fusion, prefix caching, and view reuse. Preliminary experiments quantify the behavior of different refinement modes compared to static prompts and agentic retries, as well as the impact of prompt-level optimizations such as operator fusion.
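The abstract's two core ideas, a prompt algebra for composing fragments and runtime refinement driven by execution-time signals, can be sketched concretely. The paper's actual API is not public here, so every name below (`Fragment`, `Prompt`, `compose`, `refine`) is an illustrative assumption, not SPEAR's real interface:

```python
# Hypothetical sketch of a SPEAR-style prompt algebra: prompt fragments are
# structured, versioned values rather than opaque strings, so a pipeline can
# compose and refine them at runtime. All names are illustrative assumptions.
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class Fragment:
    role: str          # e.g. "system", "context", "task"
    text: str
    version: int = 1

@dataclass(frozen=True)
class Prompt:
    fragments: tuple = ()

    def compose(self, frag: Fragment) -> "Prompt":
        # Composition operator of the algebra: append a fragment.
        return Prompt(self.fragments + (frag,))

    def refine(self, role: str, new_text: str) -> "Prompt":
        # Runtime refinement: rewrite one fragment, bumping its version so
        # earlier states stay inspectable (a "versioned view" of the prompt).
        updated = tuple(
            replace(f, text=new_text, version=f.version + 1) if f.role == role else f
            for f in self.fragments
        )
        return Prompt(updated)

    def render(self) -> str:
        return "\n".join(f"[{f.role}] {f.text}" for f in self.fragments)

# Build a pipeline prompt, then adapt it in response to a runtime signal
# (a fake low-confidence flag standing in for an execution-time metric).
p = Prompt().compose(Fragment("system", "You are a SQL assistant."))
p = p.compose(Fragment("task", "Translate the question to SQL."))

low_confidence = True
if low_confidence:
    p = p.refine("task", "Translate the question to SQL. Explain each clause.")

print(p.render())
```

Because fragments are immutable structured data rather than concatenated strings, an optimizer could in principle cache shared prefixes or fuse adjacent operators, the optimizations the abstract names.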
Related papers
- PARALLELPROMPT: Extracting Parallelism from Large Language Model Queries
We introduce PARALLELPROMPT, the first benchmark for measuring intra-query parallelism in natural user prompts. Our dataset comprises over 37,000 real-world prompts from public LLM chat logs. We provide an execution suite that benchmarks serial vs. parallel strategies, measuring latency, structural adherence, and semantic fidelity.
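The serial-vs-parallel comparison the benchmark performs can be illustrated with a toy decomposition: a query asking for several independent items is split into sub-prompts and executed concurrently. The model call below is a stub with artificial latency; PARALLELPROMPT's actual harness and schema are not reproduced here.

```python
# Toy illustration of intra-query parallelism: independent sub-prompts from
# one user query are executed serially and then concurrently, and the two
# strategies are compared on latency. call_model is a stand-in, not a real LLM.
import time
from concurrent.futures import ThreadPoolExecutor

def call_model(sub_prompt: str) -> str:
    time.sleep(0.1)            # simulated per-call LLM latency
    return f"answer({sub_prompt})"

sub_prompts = [f"Summarize document {i}" for i in range(4)]

# Serial strategy: one call after another.
t0 = time.perf_counter()
serial = [call_model(p) for p in sub_prompts]
serial_time = time.perf_counter() - t0

# Parallel strategy: overlap the independent calls.
t0 = time.perf_counter()
with ThreadPoolExecutor(max_workers=4) as pool:
    parallel = list(pool.map(call_model, sub_prompts))
parallel_time = time.perf_counter() - t0

assert serial == parallel      # same answers, lower wall-clock latency
print(f"serial {serial_time:.2f}s vs parallel {parallel_time:.2f}s")
```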
arXiv Detail & Related papers (2025-06-23T15:05:54Z)
- Pangu Embedded: An Efficient Dual-system LLM Reasoner with Metacognition
Pangu Embedded is an efficient Large Language Model (LLM) reasoner developed on Ascend Neural Processing Units (NPUs). It addresses the significant computational costs and inference latency challenges prevalent in existing reasoning-optimized LLMs. It delivers rapid responses and state-of-the-art reasoning quality within a single, unified model architecture.
arXiv Detail & Related papers (2025-05-28T14:03:02Z)
- SIPDO: Closed-Loop Prompt Optimization via Synthetic Data Feedback
We introduce SIPDO (Self-Improving Prompts through Data-Augmented Optimization), a closed-loop framework for prompt learning. SIPDO couples a synthetic data generator with a prompt: the generator produces new examples that reveal current prompt weaknesses, and the prompt is refined in response. This feedback-driven loop enables systematic improvement of prompt performance without assuming access to external supervision or new tasks.
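The closed loop described above can be sketched schematically: generate probing examples, find failures, fold them back into the prompt, and repeat. The scorer and refiner below are toy stand-ins for illustration only, not SIPDO's actual components.

```python
# Schematic closed loop in the spirit of SIPDO: synthetic probes expose
# weaknesses in the current prompt, and failures drive a refinement step.
# All three components here are toy stand-ins.
def generate_examples(round_num: int) -> list:
    # Synthetic probes; a real generator would target observed weak spots.
    return [f"probe-{round_num}-{i}" for i in range(3)]

def score(prompt: str, example: str) -> bool:
    # Toy success criterion: the prompt "covers" a probe once it mentions
    # the round the probe came from.
    round_tag = example.split("-")[1]
    return f"round {round_tag}" in prompt

def refine(prompt: str, failures: list) -> str:
    # Toy refinement: fold the failing rounds back into the prompt text.
    rounds = sorted({e.split("-")[1] for e in failures})
    return prompt + " " + " ".join(f"(handle round {r} cases)" for r in rounds)

prompt = "Answer the question."
for round_num in range(3):
    examples = generate_examples(round_num)
    failures = [e for e in examples if not score(prompt, e)]
    if failures:
        prompt = refine(prompt, failures)

# After the loop, probes from earlier rounds now pass against the prompt.
assert all(score(prompt, e) for e in generate_examples(0))
```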
arXiv Detail & Related papers (2025-05-26T04:56:48Z)
- Fast Prompt Alignment for Text-to-Image Generation
This paper introduces Fast Prompt Alignment (FPA), a prompt optimization framework that leverages a one-pass approach. FPA uses large language models (LLMs) for single-iteration prompt paraphrasing, followed by fine-tuning or in-context learning with optimized prompts. FPA achieves competitive text-image alignment scores at a fraction of the processing time.
arXiv Detail & Related papers (2024-12-11T18:58:41Z)
- Generative Prompt Internalization
We propose Generative Prompt Internalization (GenPI), a lightweight method that employs a joint training approach. GenPI not only replicates the behavior of models with prompt inputs but also generates the content of the prompt. We demonstrate that our approach effectively internalizes complex prompts across various agent-based application scenarios.
arXiv Detail & Related papers (2024-11-24T17:32:20Z)
- IPO: Interpretable Prompt Optimization for Vision-Language Models
This paper introduces a simple but interpretable prompt optimization method (IPO).
IPO utilizes large language models (LLMs) to generate textual prompts dynamically.
We incorporate a large multimodal model (LMM) to condition on visual content by generating image descriptions.
arXiv Detail & Related papers (2024-10-20T14:10:22Z)
- In-context Demonstration Matters: On Prompt Optimization for Pseudo-Supervision Refinement
Large language models (LLMs) have achieved great success across diverse tasks, and fine-tuning is sometimes needed to further enhance generation quality. To handle these challenges, a direct solution is to generate "high-confidence" data from unsupervised downstream tasks. We propose a novel approach, the pseudo-supervised demonstrations aligned prompt optimization (PAPO) algorithm, which jointly refines both the prompt and the overall pseudo-supervision.
arXiv Detail & Related papers (2024-10-04T03:39:28Z)
- Query-Dependent Prompt Evaluation and Optimization with Offline Inverse RL
We aim to enhance the arithmetic reasoning ability of Large Language Models (LLMs) through zero-shot prompt optimization.
We identify a previously overlooked objective of query dependency in such optimization.
We introduce Prompt-OIRL, which harnesses offline inverse reinforcement learning to draw insights from offline prompting demonstration data.
arXiv Detail & Related papers (2023-09-13T01:12:52Z)
- IDPG: An Instance-Dependent Prompt Generation Method
Prompt tuning is a new, efficient NLP transfer learning paradigm that adds a task-specific prompt in each input instance during the model training stage.
We propose a conditional prompt generation method to generate prompts for each input instance.
arXiv Detail & Related papers (2022-04-09T15:45:27Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.