Joint Prompt Optimization of Stacked LLMs using Variational Inference
- URL: http://arxiv.org/abs/2306.12509v2
- Date: Mon, 4 Dec 2023 15:07:13 GMT
- Title: Joint Prompt Optimization of Stacked LLMs using Variational Inference
- Authors: Alessandro Sordoni, Xingdi Yuan, Marc-Alexandre Côté, Matheus
Pereira, Adam Trischler, Ziang Xiao, Arian Hosseini, Friederike Niedtner,
Nicolas Le Roux
- Abstract summary: Large language models (LLMs) can be seen as atomic units of computation mapping sequences to a distribution over sequences.
By stacking two such layers and feeding the output of one layer to the next, we obtain a Deep Language Network (DLN).
We show that DLN-2 can reach higher performance than a single layer, showing promise that we might reach performance comparable to that of GPT-4.
- Score: 66.04409787899583
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Large language models (LLMs) can be seen as atomic units of computation
mapping sequences to a distribution over sequences. Thus, they can be seen as
stochastic language layers in a language network, where the learnable
parameters are the natural language prompts at each layer. By stacking two such
layers and feeding the output of one layer to the next, we obtain a Deep
Language Network (DLN). We first show how to effectively perform prompt
optimization for a 1-Layer language network (DLN-1). Then, we present an
extension that applies to 2-layer DLNs (DLN-2), where two prompts must be
learned. The key idea is to consider the output of the first layer as a latent
variable, which requires inference, and prompts to be learned as the parameters
of the generative distribution. We first test the effectiveness of DLN-1 in
multiple reasoning and natural language understanding tasks. Then, we show that
DLN-2 can reach higher performance than a single layer, showing promise that we
might reach comparable performance to GPT-4, even when each LLM in the network
is smaller and less powerful.
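To make the layered view concrete, here is a minimal, illustrative sketch of a 2-layer DLN forward pass and a crude Monte-Carlo score for comparing candidate prompt pairs. The `call_llm` helper, the prompt-concatenation format, and the exact-match scoring rule are assumptions for illustration; this is not the authors' implementation or the paper's variational objective.

```python
def call_llm(prompt: str, text: str) -> str:
    """Placeholder for a stochastic LLM call mapping (prompt, input) to an output string."""
    raise NotImplementedError("plug in your preferred LLM API here")


def dln2_forward(p1: str, p2: str, x: str) -> tuple[str, str]:
    """Two stacked language layers: the first layer's output h is an intermediate
    piece of text (the latent variable) fed, together with the input, to layer 2."""
    h = call_llm(p1, x)               # layer 1: produce an intermediate "thought" h
    y = call_llm(p2, f"{x}\n{h}")     # layer 2: condition on both the input and h
    return h, y


def score_prompts(p1: str, p2: str, data: list[tuple[str, str]], n_samples: int = 3) -> float:
    """Monte-Carlo accuracy estimate under the stochastic layers; a simple stand-in
    for the objective a prompt-search procedure would use to rank prompt pairs."""
    correct = 0
    for x, target in data:
        for _ in range(n_samples):
            _, y = dln2_forward(p1, p2, x)
            correct += int(y.strip() == target)
    return correct / (len(data) * n_samples)
```

In the paper, the prompts p1 and p2 are the learnable parameters and the layer-1 output is treated as a latent variable during training; the sketch above only captures the forward pass and a naive selection score.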
Related papers
- LESA: Learnable LLM Layer Scaling-Up [57.0510934286449]
Training Large Language Models (LLMs) from scratch requires immense computational resources, making it prohibitively expensive.
Model scaling-up offers a promising solution by leveraging the parameters of smaller models to create larger ones.
We propose LESA, a novel learnable method for depth scaling-up.
arXiv Detail & Related papers (2025-02-19T14:58:48Z)
- LayAlign: Enhancing Multilingual Reasoning in Large Language Models via Layer-Wise Adaptive Fusion and Alignment Strategy [33.85811169010525]
Large language models (LLMs) exhibit suboptimal performance on low-resource languages.
Recent approaches have leveraged multilingual encoders alongside LLMs by introducing trainable parameters connecting the two models.
We propose LayAlign, a framework that integrates representations from all encoder layers.
arXiv Detail & Related papers (2025-02-17T03:45:03Z)
- How to Make LLMs Strong Node Classifiers? [70.14063765424012]
Language Models (LMs) are challenging the dominance of domain-specific models, such as Graph Neural Networks (GNNs) and Graph Transformers (GTs).
We propose a novel approach that empowers off-the-shelf LMs to achieve performance comparable to state-of-the-art (SOTA) GNNs on node classification tasks.
arXiv Detail & Related papers (2024-10-03T08:27:54Z)
- LinkGPT: Teaching Large Language Models To Predict Missing Links [23.57145845001286]
Large Language Models (LLMs) have shown promising results on various language and vision tasks.
Recently, there has been growing interest in applying LLMs to graph-based tasks, particularly on Text-Attributed Graphs (TAGs).
arXiv Detail & Related papers (2024-06-07T04:54:36Z)
- Can we obtain significant success in RST discourse parsing by using Large Language Models? [32.94244684710954]
Decoder-only large language models (LLMs) have significantly impacted a wide range of natural language processing (NLP) tasks.
This paper explores how beneficial such LLMs are for Rhetorical Structure Theory (RST) discourse parsing.
Experimental results on three benchmark datasets, RST-DT, Instr-DT, and the GUM corpus, demonstrate that Llama 2 with 70 billion parameters, used with the bottom-up strategy, obtains state-of-the-art results by a significant margin.
arXiv Detail & Related papers (2024-03-08T05:34:29Z)
- Cheap and Quick: Efficient Vision-Language Instruction Tuning for Large Language Models [77.2078051555533]
We propose a novel and affordable solution, Mixture-of-Modality Adaptation (MMA), for the effective vision-language (VL) adaptation of large language models (LLMs).
Instead of using large neural networks to connect the image encoder and LLM, MMA adopts lightweight modules, i.e., adapters.
MMA is also equipped with a routing algorithm to help LLMs achieve an automatic shift between single- and multi-modal instructions.
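For readers unfamiliar with adapters, the sketch below shows a generic bottleneck adapter of the kind such lightweight modules are typically built from; the layer sizes, activation, and placement are illustrative assumptions, not the MMA design itself.

```python
import torch
import torch.nn as nn


class BottleneckAdapter(nn.Module):
    """Generic lightweight adapter: down-project, non-linearity, up-project,
    plus a residual connection. Only these few parameters need training."""

    def __init__(self, hidden_dim: int = 4096, bottleneck_dim: int = 64):
        super().__init__()
        self.down = nn.Linear(hidden_dim, bottleneck_dim)
        self.act = nn.GELU()
        self.up = nn.Linear(bottleneck_dim, hidden_dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Residual connection keeps the frozen backbone's features intact.
        return x + self.up(self.act(self.down(x)))
```

A 4096→64→4096 adapter adds only about 0.5M parameters per insertion point, which is why such modules are cheap to train compared with full fine-tuning.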
arXiv Detail & Related papers (2023-05-24T11:06:15Z)
- LLM-Pruner: On the Structural Pruning of Large Language Models [65.02607075556742]
Large language models (LLMs) have shown remarkable capabilities in language understanding and generation.
We tackle the compression of LLMs within the bound of two constraints: being task-agnostic and minimizing the reliance on the original training dataset.
Our method, named LLM-Pruner, adopts structural pruning that selectively removes non-critical coupled structures.
arXiv Detail & Related papers (2023-05-19T12:10:53Z)
- Investigating the Effectiveness of Task-Agnostic Prefix Prompt for Instruction Following [44.701091969256055]
We present our finding that prepending a Task-Agnostic Prefix Prompt (TAPP) to the input improves the instruction-following ability of various Large Language Models (LLMs) during inference.
We observe that both base LLMs (i.e., not fine-tuned to follow instructions) and instruction-tuned models benefit from TAPP, with average improvements of 34.58% and 12.26%, respectively.
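As a rough illustration of the idea (the prefix text below is a hypothetical example, not the prompt studied in the paper), applying a task-agnostic prefix simply means prepending the same fixed string to every query:

```python
# Hypothetical prefix for illustration; the paper's actual TAPP text differs.
TASK_AGNOSTIC_PREFIX = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n\n"
)


def build_prompt(instruction: str, user_input: str = "") -> str:
    """Prepend the same fixed, task-agnostic prefix to every instruction."""
    body = f"Instruction: {instruction}\n"
    if user_input:
        body += f"Input: {user_input}\n"
    return TASK_AGNOSTIC_PREFIX + body + "Response:"
```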
arXiv Detail & Related papers (2023-02-28T16:06:35Z)
- Improving Mandarin End-to-End Speech Recognition with Word N-gram Language Model [57.92200214957124]
External language models (LMs) are used to improve the recognition performance of end-to-end (E2E) automatic speech recognition (ASR) systems.
We propose a novel decoding algorithm where a word-level lattice is constructed on-the-fly to consider all possible word sequences.
Our method consistently outperforms subword-level LMs, including N-gram LM and neural network LM.
arXiv Detail & Related papers (2022-01-06T10:04:56Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.