DSPy: Compiling Declarative Language Model Calls into Self-Improving
Pipelines
- URL: http://arxiv.org/abs/2310.03714v1
- Date: Thu, 5 Oct 2023 17:37:25 GMT
- Title: DSPy: Compiling Declarative Language Model Calls into Self-Improving
Pipelines
- Authors: Omar Khattab, Arnav Singhvi, Paridhi Maheshwari, Zhiyuan Zhang, Keshav
Santhanam, Sri Vardhamanan, Saiful Haq, Ashutosh Sharma, Thomas T. Joshi,
Hanna Moazam, Heather Miller, Matei Zaharia, Christopher Potts
- Abstract summary: We introduce DSPy, a programming model that abstracts LM pipelines as text transformation graphs.
Within minutes of compiling, a few lines of DSPy allow GPT-3.5 and llama2-13b-chat to self-bootstrap pipelines.
- Score: 44.772892598128784
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The ML community is rapidly exploring techniques for prompting language
models (LMs) and for stacking them into pipelines that solve complex tasks.
Unfortunately, existing LM pipelines are typically implemented using hard-coded
"prompt templates", i.e. lengthy strings discovered via trial and error. Toward
a more systematic approach for developing and optimizing LM pipelines, we
introduce DSPy, a programming model that abstracts LM pipelines as text
transformation graphs, i.e. imperative computational graphs where LMs are
invoked through declarative modules. DSPy modules are parameterized, meaning
they can learn (by creating and collecting demonstrations) how to apply
compositions of prompting, finetuning, augmentation, and reasoning techniques.
We design a compiler that will optimize any DSPy pipeline to maximize a given
metric. We conduct two case studies, showing that succinct DSPy programs can
express and optimize sophisticated LM pipelines that reason about math word
problems, tackle multi-hop retrieval, answer complex questions, and control
agent loops. Within minutes of compiling, a few lines of DSPy allow GPT-3.5 and
llama2-13b-chat to self-bootstrap pipelines that outperform standard few-shot
prompting (generally by over 25% and 65%, respectively) and pipelines with
expert-created demonstrations (by up to 5-46% and 16-40%, respectively). On top
of that, DSPy programs compiled to open and relatively small LMs like
770M-parameter T5 and llama2-13b-chat are competitive with approaches that rely
on expert-written prompt chains for proprietary GPT-3.5. DSPy is available at
https://github.com/stanfordnlp/dspy
Related papers
- Reference Trustable Decoding: A Training-Free Augmentation Paradigm for Large Language Models [79.41139393080736]
Large language models (LLMs) have rapidly advanced and demonstrated impressive capabilities.
In-Context Learning (ICL) and.
Efficient Fine-Tuning (PEFT) are currently two mainstream methods for augmenting.
LLMs to downstream tasks.
We propose Reference Trustable Decoding (RTD), a paradigm that allows models to quickly adapt to new tasks without fine-tuning.
arXiv Detail & Related papers (2024-09-30T10:48:20Z) - Fine-Tuning and Prompt Optimization: Two Great Steps that Work Better Together [21.797319884895025]
We seek strategies to optimize both the module-level LM weights and the associated prompt templates of such systems to maximize a downstream task metric.
We propose for the first time combining the weight and prompt optimization strategies to optimize a modular LM pipeline by alternating between the two to get the same LM to teach itself.
arXiv Detail & Related papers (2024-07-15T17:30:31Z) - Optimizing Instructions and Demonstrations for Multi-Stage Language Model Programs [40.159064885288245]
We study prompt optimization for Language Model Programs.
We factorize our problem into optimizing the free-form instructions and few-shot demonstrations of every module.
We develop MIPRO, a novel algorithm for optimizing LM programs.
arXiv Detail & Related papers (2024-06-17T16:12:03Z) - DSPy Assertions: Computational Constraints for Self-Refining Language
Model Pipelines [41.779902953557425]
Chaining language model (LM) calls as composable modules is fueling a new way of programming.
We introduce LM Assertions, a construct for expressing computational constraints that LMs should satisfy.
We present new strategies that allow DSPy to compile programs with LM Assertions into more reliable and accurate systems.
arXiv Detail & Related papers (2023-12-20T19:13:26Z) - PPTC Benchmark: Evaluating Large Language Models for PowerPoint Task
Completion [96.47420221442397]
We introduce the PowerPoint Task Completion benchmark to assess the ability of Large Language Models to finish multi-turn, multi-modal instructions.
We also propose the PPTX-Match Evaluation System that evaluates if LLMs finish the instruction based on the prediction file rather than the label API sequence.
The results show that GPT-4 outperforms other LLMs with 75.1% accuracy in single-turn dialogue testing but faces challenges in completing entire sessions, achieving just 6% session accuracy.
arXiv Detail & Related papers (2023-11-03T08:06:35Z) - Guiding Large Language Models via Directional Stimulus Prompting [114.84930073977672]
We introduce Directional Stimulus Prompting, a novel framework for guiding black-box large language models (LLMs) toward specific desired outputs.
Instead of directly adjusting LLMs, our method employs a small tunable policy model to generate an auxiliary directional stimulus prompt for each input instance.
arXiv Detail & Related papers (2023-02-22T17:44:15Z) - Demonstrate-Search-Predict: Composing retrieval and language models for
knowledge-intensive NLP [77.817293104436]
We propose a framework that relies on passing natural language texts in sophisticated pipelines between an LM and an RM.
We have written novel DSP programs for answering questions in open-domain, multi-hop, and conversational settings.
arXiv Detail & Related papers (2022-12-28T18:52:44Z) - RLPrompt: Optimizing Discrete Text Prompts With Reinforcement Learning [84.75064077323098]
This paper proposes RLPrompt, an efficient discrete prompt optimization approach with reinforcement learning (RL)
RLPrompt is flexibly applicable to different types of LMs, such as masked gibberish (e.g., grammaBERT) and left-to-right models (e.g., GPTs)
Experiments on few-shot classification and unsupervised text style transfer show superior performance over a wide range of existing finetuning or prompting methods.
arXiv Detail & Related papers (2022-05-25T07:50:31Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.