Related papers: DSPy: Compiling Declarative Language Model Calls into Self-Improving Pipelines


URL: http://arxiv.org/abs/2310.03714v1
Date: Thu, 5 Oct 2023 17:37:25 GMT
Title: DSPy: Compiling Declarative Language Model Calls into Self-Improving Pipelines
Authors: Omar Khattab, Arnav Singhvi, Paridhi Maheshwari, Zhiyuan Zhang, Keshav Santhanam, Sri Vardhamanan, Saiful Haq, Ashutosh Sharma, Thomas T. Joshi, Hanna Moazam, Heather Miller, Matei Zaharia, Christopher Potts
Abstract summary: We introduce DSPy, a programming model that abstracts LM pipelines as text transformation graphs. Within minutes of compiling, a few lines of DSPy allow GPT-3.5 and llama2-13b-chat to self-bootstrap pipelines.
Score: 44.772892598128784
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: The ML community is rapidly exploring techniques for prompting language models (LMs) and for stacking them into pipelines that solve complex tasks. Unfortunately, existing LM pipelines are typically implemented using hard-coded "prompt templates", i.e. lengthy strings discovered via trial and error. Toward a more systematic approach for developing and optimizing LM pipelines, we introduce DSPy, a programming model that abstracts LM pipelines as text transformation graphs, i.e. imperative computational graphs where LMs are invoked through declarative modules. DSPy modules are parameterized, meaning they can learn (by creating and collecting demonstrations) how to apply compositions of prompting, finetuning, augmentation, and reasoning techniques. We design a compiler that will optimize any DSPy pipeline to maximize a given metric. We conduct two case studies, showing that succinct DSPy programs can express and optimize sophisticated LM pipelines that reason about math word problems, tackle multi-hop retrieval, answer complex questions, and control agent loops. Within minutes of compiling, a few lines of DSPy allow GPT-3.5 and llama2-13b-chat to self-bootstrap pipelines that outperform standard few-shot prompting (generally by over 25% and 65%, respectively) and pipelines with expert-created demonstrations (by up to 5-46% and 16-40%, respectively). On top of that, DSPy programs compiled to open and relatively small LMs like 770M-parameter T5 and llama2-13b-chat are competitive with approaches that rely on expert-written prompt chains for proprietary GPT-3.5. DSPy is available at https://github.com/stanfordnlp/dspy
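The self-bootstrapping idea in the abstract (modules that learn by creating and collecting demonstrations, selected by a metric) can be illustrated with a toy sketch. This is not DSPy's API and not the authors' implementation: `toy_lm`, `bootstrap`, `evaluate`, and the arithmetic task are all hypothetical stand-ins, and the "LM" is a deterministic function that only answers harder questions correctly when its prompt contains demonstrations.

```python
from typing import Callable, List, Tuple

Example = Tuple[str, str]  # (question, answer)


def toy_lm(question: str, demos: List[Example]) -> str:
    """Hypothetical stand-in for an LM call on questions like '2+3'.

    Zero-shot, it only gets 'easy' sums (< 10) right; with at least one
    demonstration in the prompt it answers every question correctly.
    """
    a, b = (int(x) for x in question.split("+"))
    if demos or a + b < 10:
        return str(a + b)
    return str(a)  # deliberately wrong zero-shot answer


def exact_match(gold: str, pred: str) -> bool:
    """Metric used both to filter demonstrations and to score the program."""
    return gold == pred


def bootstrap(program: Callable[[str, List[Example]], str],
              trainset: List[Example],
              metric: Callable[[str, str], bool],
              max_demos: int = 2) -> List[Example]:
    """Run the program zero-shot and keep outputs that pass the metric."""
    demos: List[Example] = []
    for question, gold in trainset:
        pred = program(question, [])
        if metric(gold, pred):
            demos.append((question, pred))
        if len(demos) >= max_demos:
            break
    return demos


def evaluate(program: Callable[[str, List[Example]], str],
             demos: List[Example],
             devset: List[Example],
             metric: Callable[[str, str], bool]) -> float:
    """Fraction of dev examples the program answers correctly."""
    hits = sum(metric(gold, program(q, demos)) for q, gold in devset)
    return hits / len(devset)


trainset = [("12+30", "42"), ("2+3", "5"), ("4+4", "8")]
devset = [("10+15", "25"), ("7+8", "15"), ("1+2", "3"), ("20+22", "42")]

demos = bootstrap(toy_lm, trainset, exact_match)
before = evaluate(toy_lm, [], devset, exact_match)
after = evaluate(toy_lm, demos, devset, exact_match)
print(demos, before, after)
```

In this toy setting the "compiled" program is simply the original program paired with the collected demonstrations; the sketch only shows the filter-by-metric loop, not prompt construction, finetuning, or any of the other techniques the abstract mentions.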

Related papers

Text-to-Pipeline: Bridging Natural Language and Data Preparation Pipelines [23.421567721746765]
We introduce Text-to-Pipeline, a task that translates data preparation instructions into DP pipelines. We also develop a benchmark named PARROT to support systematic evaluation. Despite this improvement, there remains substantial room for progress on Text-to-Pipeline.
arXiv Detail & Related papers (2025-05-21T15:40:53Z)
PROMPTEVALS: A Dataset of Assertions and Guardrails for Custom Production Large Language Model Pipelines [0.8148009849453334]
Large language models (LLMs) are increasingly deployed in specialized production data processing pipelines across diverse domains. To improve reliability in these applications, creating assertions or guardrails for LLM outputs to run alongside the pipelines is essential. In this paper, we introduce PROMPTEVALS, a dataset of 2087 pipeline prompts with 12623 corresponding assertion criteria.
arXiv Detail & Related papers (2025-04-20T21:04:23Z)
Reference Trustable Decoding: A Training-Free Augmentation Paradigm for Large Language Models [79.41139393080736]
Large language models (LLMs) have rapidly advanced and demonstrated impressive capabilities. In-Context Learning (ICL) and Parameter-Efficient Fine-Tuning (PEFT) are currently two mainstream methods for augmenting LLMs to downstream tasks. We propose Reference Trustable Decoding (RTD), a paradigm that allows models to quickly adapt to new tasks without fine-tuning.
arXiv Detail & Related papers (2024-09-30T10:48:20Z)
Fine-Tuning and Prompt Optimization: Two Great Steps that Work Better Together [21.797319884895025]
We seek strategies to optimize both the module-level LM weights and the associated prompt templates of such systems to maximize a downstream task metric. We propose for the first time combining the weight and prompt optimization strategies to optimize a modular LM pipeline by alternating between the two to get the same LM to teach itself.
arXiv Detail & Related papers (2024-07-15T17:30:31Z)
Optimizing Instructions and Demonstrations for Multi-Stage Language Model Programs [40.159064885288245]
We study prompt optimization for Language Model Programs. We factorize our problem into optimizing the free-form instructions and few-shot demonstrations of every module. We develop MIPRO, a novel algorithm for optimizing LM programs.
arXiv Detail & Related papers (2024-06-17T16:12:03Z)
DSPy Assertions: Computational Constraints for Self-Refining Language Model Pipelines [41.779902953557425]
Chaining language model (LM) calls as composable modules is fueling a new way of programming. We introduce LM Assertions, a construct for expressing computational constraints that LMs should satisfy. We present new strategies that allow DSPy to compile programs with LM Assertions into more reliable and accurate systems.
arXiv Detail & Related papers (2023-12-20T19:13:26Z)
PPTC Benchmark: Evaluating Large Language Models for PowerPoint Task Completion [96.47420221442397]
We introduce the PowerPoint Task Completion benchmark to assess the ability of Large Language Models to finish multi-turn, multi-modal instructions. We also propose the PPTX-Match Evaluation System that evaluates if LLMs finish the instruction based on the prediction file rather than the label API sequence. The results show that GPT-4 outperforms other LLMs with 75.1% accuracy in single-turn dialogue testing but faces challenges in completing entire sessions, achieving just 6% session accuracy.
arXiv Detail & Related papers (2023-11-03T08:06:35Z)
Guiding Large Language Models via Directional Stimulus Prompting [114.84930073977672]
We introduce Directional Stimulus Prompting, a novel framework for guiding black-box large language models (LLMs) toward specific desired outputs. Instead of directly adjusting LLMs, our method employs a small tunable policy model to generate an auxiliary directional stimulus prompt for each input instance.
arXiv Detail & Related papers (2023-02-22T17:44:15Z)
Demonstrate-Search-Predict: Composing retrieval and language models for knowledge-intensive NLP [77.817293104436]
We propose a framework that relies on passing natural language texts in sophisticated pipelines between an LM and an RM. We have written novel DSP programs for answering questions in open-domain, multi-hop, and conversational settings.
arXiv Detail & Related papers (2022-12-28T18:52:44Z)
RLPrompt: Optimizing Discrete Text Prompts With Reinforcement Learning [84.75064077323098]
This paper proposes RLPrompt, an efficient discrete prompt optimization approach with reinforcement learning (RL). RLPrompt is flexibly applicable to different types of LMs, such as masked (e.g., BERT) and left-to-right models (e.g., GPTs). Experiments on few-shot classification and unsupervised text style transfer show superior performance over a wide range of existing finetuning or prompting methods.
arXiv Detail & Related papers (2022-05-25T07:50:31Z)

This list is automatically generated from the titles and abstracts of the papers in this site.
