BatchPrompt: Accomplish more with less
- URL: http://arxiv.org/abs/2309.00384v3
- Date: Mon, 15 Jul 2024 05:42:34 GMT
- Title: BatchPrompt: Accomplish more with less
- Authors: Jianzhe Lin, Maurice Diesendruck, Liang Du, Robin Abraham
- Abstract summary: BatchPrompt is an efficient prompting strategy that batches data within the token limit.
To retain efficiency and overcome performance loss, we propose Batch Permutation and Ensembling.
This is the first work to technically improve the prompting efficiency of large language models.
- Score: 9.204837699571788
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: As the ever-increasing token limits of large language models (LLMs) have enabled long context as input, prompting with single data samples may no longer be an efficient way. A straightforward strategy for improving efficiency is to batch data within the token limit (e.g., 8k for gpt-3.5-turbo; 32k for GPT-4), which we call BatchPrompt. We have two initial observations for prompting with batched data. First, we find that prompting with batched data in longer contexts inevitably leads to worse performance than single-data prompting. Second, the performance of the language model is significantly correlated with the positions and order of the batched data, due to the corresponding change in decoder context. To retain efficiency and overcome performance loss, we propose Batch Permutation and Ensembling (BPE) and a novel Self-reflection-guided EArly Stopping (SEAS) technique. Our comprehensive experimental evaluation demonstrates that BPE boosts the performance of BatchPrompt by a striking margin on a range of popular NLP tasks, including question answering (Boolq), textual entailment (RTE), and duplicate-question identification (QQP). This performance is competitive with, or higher than, single-data prompting (SinglePrompt), while BatchPrompt requires far fewer LLM calls and input tokens (for SinglePrompt vs. BatchPrompt with batch size 32, using just 9%-16% of the LLM calls: Boolq accuracy 90.6% to 90.9% with 27.4% of the tokens; QQP accuracy 87.2% to 88.4% with 18.6% of the tokens; RTE accuracy 91.5% to 91.1% with 30.8% of the tokens). To the best of our knowledge, this is the first work to technically improve the prompting efficiency of large language models. We hope our simple yet effective approach will shed light on future research on large language models. The code will be released.
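The Batch Permutation and Ensembling idea described above can be sketched in a few lines: run the same batch through the model several times under different orderings, then take a majority vote per sample. This is an illustrative sketch, not the paper's released implementation; `ask_llm` is a hypothetical callable that formats one batched prompt and returns one label per sample, in order.

```python
import random
from collections import Counter

def batch_prompt_bpe(samples, ask_llm, rounds=3, seed=0):
    """Sketch of Batch Permutation and Ensembling (BPE).

    Each round shuffles the batch so every sample appears at a
    different position with different neighbors; final labels are
    decided by a majority vote over the rounds.
    """
    rng = random.Random(seed)
    votes = {i: Counter() for i in range(len(samples))}
    for _ in range(rounds):
        order = list(range(len(samples)))
        rng.shuffle(order)                      # permute positions within the batch
        answers = ask_llm([samples[i] for i in order])
        for pos, idx in enumerate(order):
            votes[idx][answers[pos]] += 1       # tally the vote for the original sample
    return [votes[i].most_common(1)[0][0] for i in range(len(samples))]
```

The SEAS component would additionally stop re-querying a sample early once its votes agree with high self-reported confidence; that stopping rule is omitted here for brevity.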
Related papers
- Length-MAX Tokenizer for Language Models [2.243087516606811]
We introduce a new tokenizer for language models that minimizes the average number of tokens per character. The Length-MAX tokenizer achieves 99.62% vocabulary coverage, and the out-of-vocabulary rate remains low at 0.12% on test sets.
arXiv Detail & Related papers (2025-11-25T20:56:56Z) - Fast Quiet-STaR: Thinking Without Thought Tokens [51.79231070632772]
Fast Quiet-STaR is a more efficient reasoning framework that preserves the benefits of token-level reasoning while reducing computational cost. Our method introduces a curriculum-learning-based training strategy that gradually reduces the number of thought tokens. Experiments on four benchmark datasets with Mistral 7B and Qwen2.5 7B demonstrate that Fast Quiet-STaR consistently outperforms Quiet-STaR in average accuracy.
arXiv Detail & Related papers (2025-05-23T11:14:12Z) - FastMem: Fast Memorization of Prompt Improves Context Awareness of Large Language Models [24.030755262499994]
FastMem is a novel method designed to enhance the context awareness of instruction-fine-tuned large language models.
It maximizes the likelihood of the prompt before inference by updating only the last Feed-Forward Network (FFN) module.
Our experiments demonstrate substantial gains in reading comprehension, text summarization and adherence to output structures.
arXiv Detail & Related papers (2024-06-23T10:36:35Z) - CliqueParcel: An Approach For Batching LLM Prompts That Jointly Optimizes Efficiency And Faithfulness [13.554160815699435]
CliqueParcel is designed to improve efficiency of large language models (LLMs) during the inference process.
CliqueParcel is tested on eight widely recognized datasets.
This work provides novel insights into inference efficiency and demonstrates promising performance.
arXiv Detail & Related papers (2024-02-17T22:37:17Z) - Revisiting the Power of Prompt for Visual Tuning [50.11465784194896]
This study explores how the correlation between prompts and patch tokens evolves during training.
Inspired by the observation that the prompt tokens tend to share high mutual information with patch tokens, we propose initializing prompts with downstream token prototypes.
Our method significantly advances the adaptation for self-supervised pretraining, achieving impressive task performance gains of at least 10% to 30%.
arXiv Detail & Related papers (2024-02-04T07:49:02Z) - MinPrompt: Graph-based Minimal Prompt Data Augmentation for Few-shot Question Answering [64.6741991162092]
We present MinPrompt, a minimal data augmentation framework for open-domain question answering.
We transform the raw text into a graph structure to build connections between different factual sentences.
We then apply graph algorithms to identify the minimal set of sentences needed to cover the most information in the raw text.
We generate QA pairs based on the identified sentence subset and train the model on the selected sentences to obtain the final model.
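The sentence-selection step described above is essentially a coverage problem. As a rough illustration (this is an assumed greedy approximation, not MinPrompt's exact graph algorithm), one can repeatedly pick the sentence that adds the most uncovered content:

```python
def minimal_cover(sentences, tokenize=str.split):
    """Greedy sketch of selecting a minimal sentence set that covers
    the most information. Each sentence 'covers' the words it
    contains; we repeatedly pick the sentence with the largest gain
    in uncovered words until nothing new can be added.
    """
    remaining = {w for s in sentences for w in tokenize(s)}
    chosen = []
    pool = list(sentences)
    while remaining and pool:
        best = max(pool, key=lambda s: len(set(tokenize(s)) & remaining))
        gain = set(tokenize(best)) & remaining
        if not gain:
            break                # no sentence adds new information
        chosen.append(best)
        remaining -= gain
        pool.remove(best)
    return chosen
```

The greedy heuristic gives the classic logarithmic approximation to set cover; the paper's graph-based formulation additionally uses connections between factual sentences when choosing the subset.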
arXiv Detail & Related papers (2023-10-08T04:44:36Z) - Progressive-Hint Prompting Improves Reasoning in Large Language Models [63.98629132836499]
This paper proposes a new prompting method, named Progressive-Hint Prompting (PHP)
It enables automatic multiple interactions between users and Large Language Models (LLMs) by using previously generated answers as hints to progressively guide toward the correct answers.
We conducted extensive and comprehensive experiments on seven benchmarks. The results show that PHP significantly improves accuracy while remaining highly efficient.
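The interaction loop PHP describes can be sketched as follows. This is a simplified reading of the method, with `ask_llm` a hypothetical single-turn LLM call taking a prompt string; the stopping rule (repeat until the answer stabilizes) follows the paper's progressive-hint idea.

```python
def progressive_hint(question, ask_llm, max_rounds=4):
    """Sketch of Progressive-Hint Prompting (PHP): feed previously
    generated answers back as hints until the answer stops changing.
    """
    hints = []
    answer = None
    for _ in range(max_rounds):
        hint = f" (Hint: the answer is near {', '.join(hints)}.)" if hints else ""
        new_answer = ask_llm(question + hint)
        if new_answer == answer:     # answer has converged; stop interacting
            return new_answer
        answer = new_answer
        hints.append(str(answer))
    return answer
```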
arXiv Detail & Related papers (2023-04-19T16:29:48Z) - Pre-trained Language Models Can be Fully Zero-Shot Learners [26.60008734311909]
We propose nonparametric prompting PLM (NPPrompt) for fully zero-shot language understanding.
NPPrompt uses only pre-trained language models and does not require any labeled data or additional raw corpus for further fine-tuning.
We evaluate NPPrompt against previous major few-shot and zero-shot learning methods on diverse NLP tasks.
arXiv Detail & Related papers (2022-12-14T00:03:52Z) - Ask Me Anything: A simple strategy for prompting language models [24.294416731247427]
Large language models (LLMs) transfer well to new tasks out-of-the-box simply given a natural language prompt.
We develop an understanding of the effective prompt formats, finding that question-answering (QA) prompts tend to outperform those that restrict the model outputs.
We apply the collected prompts to obtain several noisy votes for the input's true label.
We find that the prompts can have very different accuracies and complex dependencies.
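The vote-collection step above can be sketched as running one input through several reformatted prompts and combining the noisy answers. The paper aggregates with weak supervision to model each prompt's accuracy and dependencies; the plain majority vote below is a simplifying assumption, and `ask_llm` is a hypothetical LLM call.

```python
from collections import Counter

def aggregate_prompt_votes(x, prompts, ask_llm):
    """Sketch of the Ask Me Anything aggregation step: query several
    prompt templates for one input and majority-vote the answers."""
    votes = Counter(ask_llm(p.format(x)) for p in prompts)
    return votes.most_common(1)[0][0]
```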
arXiv Detail & Related papers (2022-10-05T17:59:45Z) - Instance-wise Prompt Tuning for Pretrained Language Models [72.74916121511662]
Instance-wise Prompt Tuning (IPT) is the first prompt learning paradigm that injects knowledge from the input data instances to the prompts.
IPT significantly outperforms task-based prompt learning methods, and achieves comparable performance to conventional finetuning with only 0.5% - 1.5% of tuned parameters.
arXiv Detail & Related papers (2022-06-04T10:08:50Z) - Prompt Consistency for Zero-Shot Task Generalization [118.81196556175797]
In this paper, we explore methods to utilize unlabeled data to improve zero-shot performance.
Specifically, we take advantage of the fact that multiple prompts can be used to specify a single task, and propose to regularize prompt consistency.
Our approach outperforms the state-of-the-art zero-shot learner, T0, on 9 out of 11 datasets across 4 NLP tasks by up to 10.6 absolute points in terms of accuracy.
arXiv Detail & Related papers (2022-04-29T19:18:37Z) - PERFECT: Prompt-free and Efficient Few-shot Learning with Language Models [67.3725459417758]
PERFECT is a simple and efficient method for few-shot fine-tuning of PLMs without relying on handcrafted prompts.
We show that manually engineered task prompts can be replaced with task-specific adapters that enable sample-efficient fine-tuning.
Experiments on a wide range of few-shot NLP tasks demonstrate that PERFECT, while being simple and efficient, also outperforms existing state-of-the-art few-shot learning methods.
arXiv Detail & Related papers (2022-04-03T22:31:25Z)
This list is automatically generated from the titles and abstracts of the papers in this site.