DSPy: Compiling Declarative Language Model Calls into Self-Improving
Pipelines
- URL: http://arxiv.org/abs/2310.03714v1
- Date: Thu, 5 Oct 2023 17:37:25 GMT
- Title: DSPy: Compiling Declarative Language Model Calls into Self-Improving
Pipelines
- Authors: Omar Khattab, Arnav Singhvi, Paridhi Maheshwari, Zhiyuan Zhang, Keshav
Santhanam, Sri Vardhamanan, Saiful Haq, Ashutosh Sharma, Thomas T. Joshi,
Hanna Moazam, Heather Miller, Matei Zaharia, Christopher Potts
- Abstract summary: We introduce DSPy, a programming model that abstracts LM pipelines as text transformation graphs.
Within minutes of compiling, a few lines of DSPy allow GPT-3.5 and llama2-13b-chat to self-bootstrap pipelines.
- Score: 44.772892598128784
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The ML community is rapidly exploring techniques for prompting language
models (LMs) and for stacking them into pipelines that solve complex tasks.
Unfortunately, existing LM pipelines are typically implemented using hard-coded
"prompt templates", i.e. lengthy strings discovered via trial and error. Toward
a more systematic approach for developing and optimizing LM pipelines, we
introduce DSPy, a programming model that abstracts LM pipelines as text
transformation graphs, i.e. imperative computational graphs where LMs are
invoked through declarative modules. DSPy modules are parameterized, meaning
they can learn (by creating and collecting demonstrations) how to apply
compositions of prompting, finetuning, augmentation, and reasoning techniques.
We design a compiler that will optimize any DSPy pipeline to maximize a given
metric. We conduct two case studies, showing that succinct DSPy programs can
express and optimize sophisticated LM pipelines that reason about math word
problems, tackle multi-hop retrieval, answer complex questions, and control
agent loops. Within minutes of compiling, a few lines of DSPy allow GPT-3.5 and
llama2-13b-chat to self-bootstrap pipelines that outperform standard few-shot
prompting (generally by over 25% and 65%, respectively) and pipelines with
expert-created demonstrations (by up to 5-46% and 16-40%, respectively). On top
of that, DSPy programs compiled to open and relatively small LMs like
770M-parameter T5 and llama2-13b-chat are competitive with approaches that rely
on expert-written prompt chains for proprietary GPT-3.5. DSPy is available at
https://github.com/stanfordnlp/dspy
Related papers
- Fine-Tuning and Prompt Optimization: Two Great Steps that Work Better Together [21.797319884895025]
We evaluate approximate optimization strategies in which we bootstrap training labels for all pipeline stages and use these to optimize the pipeline's prompts and fine-tune its weights alternatingly.
Simple approaches for optimizing the prompts and weights together outperform directly optimizing weights alone and prompts alone by up to 65% and 5%, respectively, on average across LMs and tasks.
arXiv Detail & Related papers (2024-07-15T17:30:31Z) - Optimizing Instructions and Demonstrations for Multi-Stage Language Model Programs [40.159064885288245]
Language Model Programs, i.e. sophisticated pipelines of modular language model (LM) calls, are increasingly advancing NLP tasks.
We study prompt optimization for LM programs to maximize a downstream metric without access to module-level labels or gradients.
We develop MIPRO, a novel refine that outperforms baselines on five of six diverse LM programs using a best-in-class open-source model.
arXiv Detail & Related papers (2024-06-17T16:12:03Z) - TAT-LLM: A Specialized Language Model for Discrete Reasoning over
Tabular and Textual Data [77.66158066013924]
We consider harnessing the amazing power of language models (LLMs) to solve our task.
We develop a TAT-LLM language model by fine-tuning LLaMA 2 with the training data generated automatically from existing expert-annotated datasets.
arXiv Detail & Related papers (2024-01-24T04:28:50Z) - DSPy Assertions: Computational Constraints for Self-Refining Language
Model Pipelines [41.779902953557425]
Chaining language model (LM) calls as composable modules is fueling a new way of programming.
We introduce LM Assertions, a construct for expressing computational constraints that LMs should satisfy.
We present new strategies that allow DSPy to compile programs with LM Assertions into more reliable and accurate systems.
arXiv Detail & Related papers (2023-12-20T19:13:26Z) - PPTC Benchmark: Evaluating Large Language Models for PowerPoint Task
Completion [96.47420221442397]
We introduce the PowerPoint Task Completion benchmark to assess the ability of Large Language Models to finish multi-turn, multi-modal instructions.
We also propose the PPTX-Match Evaluation System that evaluates if LLMs finish the instruction based on the prediction file rather than the label API sequence.
The results show that GPT-4 outperforms other LLMs with 75.1% accuracy in single-turn dialogue testing but faces challenges in completing entire sessions, achieving just 6% session accuracy.
arXiv Detail & Related papers (2023-11-03T08:06:35Z) - Guiding Large Language Models via Directional Stimulus Prompting [114.84930073977672]
We introduce Directional Stimulus Prompting, a novel framework for guiding black-box large language models (LLMs) toward specific desired outputs.
Instead of directly adjusting LLMs, our method employs a small tunable policy model to generate an auxiliary directional stimulus prompt for each input instance.
arXiv Detail & Related papers (2023-02-22T17:44:15Z) - Demonstrate-Search-Predict: Composing retrieval and language models for
knowledge-intensive NLP [77.817293104436]
We propose a framework that relies on passing natural language texts in sophisticated pipelines between an LM and an RM.
We have written novel DSP programs for answering questions in open-domain, multi-hop, and conversational settings.
arXiv Detail & Related papers (2022-12-28T18:52:44Z) - RLPrompt: Optimizing Discrete Text Prompts With Reinforcement Learning [84.75064077323098]
This paper proposes RLPrompt, an efficient discrete prompt optimization approach with reinforcement learning (RL)
RLPrompt is flexibly applicable to different types of LMs, such as masked gibberish (e.g., grammaBERT) and left-to-right models (e.g., GPTs)
Experiments on few-shot classification and unsupervised text style transfer show superior performance over a wide range of existing finetuning or prompting methods.
arXiv Detail & Related papers (2022-05-25T07:50:31Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.