Joint Prompt Optimization of Stacked LLMs using Variational Inference
- URL: http://arxiv.org/abs/2306.12509v2
- Date: Mon, 4 Dec 2023 15:07:13 GMT
- Title: Joint Prompt Optimization of Stacked LLMs using Variational Inference
- Authors: Alessandro Sordoni, Xingdi Yuan, Marc-Alexandre Côté, Matheus
Pereira, Adam Trischler, Ziang Xiao, Arian Hosseini, Friederike Niedtner,
Nicolas Le Roux
- Abstract summary: Large language models (LLMs) can be seen as atomic units of computation mapping sequences to a distribution over sequences.
By stacking two such layers and feeding the output of one layer to the next, we obtain a Deep Language Network (DLN).
We show that DLN-2 can reach higher performance than a single layer, suggesting that it might reach performance comparable to GPT-4.
- Score: 66.04409787899583
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Large language models (LLMs) can be seen as atomic units of computation
mapping sequences to a distribution over sequences. Thus, they can be seen as
stochastic language layers in a language network, where the learnable
parameters are the natural language prompts at each layer. By stacking two such
layers and feeding the output of one layer to the next, we obtain a Deep
Language Network (DLN). We first show how to effectively perform prompt
optimization for a 1-Layer language network (DLN-1). Then, we present an
extension that applies to 2-layer DLNs (DLN-2), where two prompts must be
learned. The key idea is to consider the output of the first layer as a latent
variable, which requires inference, and prompts to be learned as the parameters
of the generative distribution. We first test the effectiveness of DLN-1 in
multiple reasoning and natural language understanding tasks. Then, we show that
DLN-2 can reach higher performance than a single layer, showing promise that we
might reach comparable performance to GPT-4, even when each LLM in the network
is smaller and less powerful.
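The abstract describes the DLN-2 forward pass only in words; the sketch below illustrates the structure it implies. This is a minimal, hypothetical sketch, not the authors' released code: `call_llm`, the prompt templates, and `num_latent_samples` are illustrative assumptions. The point it encodes is that the output of the first layer is a stochastic latent variable, while the two natural-language prompts are the learnable parameters.

```python
# Conceptual sketch of a two-layer Deep Language Network (DLN-2).
# `call_llm` is a hypothetical stand-in for any LLM API mapping a text
# prompt to one sampled completion; the template strings are assumptions.
from typing import Callable, List

LLM = Callable[[str], str]  # prompt in, sampled completion out


def dln2_forward(x: str, prompt1: str, prompt2: str, call_llm: LLM,
                 num_latent_samples: int = 4) -> List[str]:
    """Sample outputs from a 2-layer DLN for input x.

    The layer-1 output h is treated as a latent variable, so several
    samples of h are drawn; prompt1 and prompt2 are the learnable
    natural-language parameters of the two layers.
    """
    outputs = []
    for _ in range(num_latent_samples):
        # Layer 1: the learnable prompt `prompt1` conditions the LLM on the input x.
        h = call_llm(f"{prompt1}\n\nInput: {x}\nOutput:")
        # Layer 2: the learnable prompt `prompt2` conditions the LLM on the latent h.
        y = call_llm(f"{prompt2}\n\nInput: {h}\nOutput:")
        outputs.append(y)
    return outputs
```

In this reading, prompt optimization amounts to searching over candidate values of `prompt1` and `prompt2` (the "parameters of the generative distribution" in the abstract) and scoring them on training examples, with the sampled latents weighted through a variational bound rather than marginalized exactly.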
Related papers
- TasTe: Teaching Large Language Models to Translate through Self-Reflection [82.83958470745381]
Large language models (LLMs) have exhibited remarkable performance in various natural language processing tasks.
We propose the TasTe framework, which stands for translating through self-reflection.
The evaluation results in four language directions on the WMT22 benchmark reveal the effectiveness of our approach compared to existing methods.
arXiv Detail & Related papers (2024-06-12T17:21:21Z)
- LinkGPT: Teaching Large Language Models To Predict Missing Links [23.57145845001286]
Large Language Models (LLMs) have shown promising results on various language and vision tasks.
Recently, there has been growing interest in applying LLMs to graph-based tasks, particularly on Text-Attributed Graphs (TAGs).
arXiv Detail & Related papers (2024-06-07T04:54:36Z)
- Can we obtain significant success in RST discourse parsing by using Large Language Models? [32.94244684710954]
Decoder-only large language models (LLMs) have significantly impacted a wide range of natural language processing (NLP) tasks.
This paper explores how beneficial such LLMs are for Rhetorical Structure Theory (RST) discourse parsing.
Experimental results on three benchmark datasets, RST-DT, Instr-DT, and the GUM corpus, demonstrate that Llama 2 with 70 billion parameters, used with the bottom-up strategy, obtained state-of-the-art results by significant margins.
arXiv Detail & Related papers (2024-03-08T05:34:29Z)
- Sorted LLaMA: Unlocking the Potential of Intermediate Layers of Large Language Models for Dynamic Inference [32.62084449979531]
We extend SortedNet to generative NLP tasks by replacing Standard Fine-Tuning (SFT) with Sorted Fine-Tuning (SoFT).
Our approach boosts model efficiency, eliminating the need for multiple models for various scenarios during inference.
Our results show the superior performance of sub-models in comparison to Standard Fine-Tuning and SFT+ICT (Early-Exit).
arXiv Detail & Related papers (2023-09-16T11:58:34Z)
- Okapi: Instruction-tuned Large Language Models in Multiple Languages with Reinforcement Learning from Human Feedback [61.83548032416181]
We present Okapi, the first system with instruction-tuned LLMs based on RLHF for multiple languages.
Okapi introduces instruction and response-ranked data in 26 diverse languages to facilitate the experiments and development of future multilingual LLM research.
arXiv Detail & Related papers (2023-07-29T18:01:46Z)
- Cheap and Quick: Efficient Vision-Language Instruction Tuning for Large Language Models [77.2078051555533]
We propose a novel and affordable solution, MMA, for the effective vision-language (VL) adaptation of large language models (LLMs).
Instead of using large neural networks to connect the image encoder and LLM, MMA adopts lightweight modules, i.e., adapters.
MMA is also equipped with a routing algorithm to help LLMs achieve an automatic shift between single- and multi-modal instructions.
arXiv Detail & Related papers (2023-05-24T11:06:15Z)
- LLM-Pruner: On the Structural Pruning of Large Language Models [65.02607075556742]
Large language models (LLMs) have shown remarkable capabilities in language understanding and generation.
We tackle the compression of LLMs within the bound of two constraints: being task-agnostic and minimizing the reliance on the original training dataset.
Our method, named LLM-Pruner, adopts structural pruning that selectively removes non-critical coupled structures.
arXiv Detail & Related papers (2023-05-19T12:10:53Z)
- Investigating the Effectiveness of Task-Agnostic Prefix Prompt for Instruction Following [44.701091969256055]
We present our finding that prepending a Task-Agnostic Prefix Prompt (TAPP) to the input improves the instruction-following ability of various Large Language Models (LLMs) during inference.
We observe that both base LLMs (i.e., not fine-tuned to follow instructions) and instruction-tuned models benefit from TAPP, resulting in 34.58% and 12.26% improvement on average, respectively.
arXiv Detail & Related papers (2023-02-28T16:06:35Z)
- Improving Mandarin End-to-End Speech Recognition with Word N-gram Language Model [57.92200214957124]
External language models (LMs) are used to improve the recognition performance of end-to-end (E2E) automatic speech recognition (ASR) systems.
We propose a novel decoding algorithm where a word-level lattice is constructed on-the-fly to consider all possible word sequences.
Our method consistently outperforms subword-level LMs, including N-gram and neural network LMs.
arXiv Detail & Related papers (2022-01-06T10:04:56Z)