Sweeping Heterogeneity with Smart MoPs: Mixture of Prompts for LLM Task
Adaptation
- URL: http://arxiv.org/abs/2310.02842v2
- Date: Thu, 5 Oct 2023 21:13:11 GMT
- Title: Sweeping Heterogeneity with Smart MoPs: Mixture of Prompts for LLM Task
Adaptation
- Authors: Chen Dun, Mirian Hipolito Garcia, Guoqing Zheng, Ahmed Hassan
Awadallah, Anastasios Kyrillidis, Robert Sim
- Abstract summary: Large Language Models (LLMs) have the ability to solve a variety of tasks, such as text summarization and mathematical questions.
Due to high computational costs, the current trend is to use prompt instruction tuning to better adjust monolithic, pretrained LLMs for new -- but often individual -- downstream tasks.
MoPs can simultaneously mitigate prompt training "interference" in multi-task, multi-source scenarios.
- Score: 45.90925587972781
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Large Language Models (LLMs) have the ability to solve a variety of tasks,
such as text summarization and mathematical questions, just out of the box, but
they are often trained with a single task in mind. Due to high computational
costs, the current trend is to use prompt instruction tuning to better adjust
monolithic, pretrained LLMs for new -- but often individual -- downstream
tasks. Thus, how one would expand prompt tuning to handle -- concomitantly --
heterogeneous tasks and data distributions remains a largely open question. To
address this gap, we suggest the use of \emph{Mixture of Prompts}, or MoPs,
associated with smart gating functionality: the latter -- whose design is one
of the contributions of this paper -- can identify relevant skills embedded in
different groups of prompts and dynamically assign combined experts (i.e.,
collection of prompts), based on the target task. Additionally, MoPs are
empirically agnostic to any model compression technique applied -- for
efficiency reasons -- as well as instruction data source and task composition.
In practice, MoPs can simultaneously mitigate prompt training "interference" in
multi-task, multi-source scenarios (e.g., task and data heterogeneity across
sources), as well as possible implications from model approximations. As a
highlight, MoPs manage to decrease final perplexity by $\sim20\%$ up to
$\sim70\%$, as compared to baselines, in the federated scenario, and by
$\sim3\%$ up to $\sim30\%$ in the centralized scenario.
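
The gating mechanism described in the abstract lends itself to a compact illustration. Below is a minimal, hypothetical PyTorch sketch of a gated mixture of soft prompts, assuming a frozen backbone LLM operated on at the embedding level; the class and parameter names (MixtureOfPrompts, num_prompts, prompt_len) are our own assumptions, not the paper's actual implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MixtureOfPrompts(nn.Module):
    """Minimal sketch of a gated mixture of soft prompts (hypothetical names).

    A pool of trainable prompt "experts" is combined per input: a small
    gating network scores each prompt group from a task embedding, and the
    prompt prepended to the LLM input is the gate-weighted combination.
    """

    def __init__(self, num_prompts=8, prompt_len=20, hidden_dim=768):
        super().__init__()
        # Pool of soft prompts: (num_prompts, prompt_len, hidden_dim)
        self.prompts = nn.Parameter(
            torch.randn(num_prompts, prompt_len, hidden_dim) * 0.02)
        # Gating network: maps a task/input embedding to prompt scores.
        self.gate = nn.Linear(hidden_dim, num_prompts)

    def forward(self, task_emb, input_embs):
        # task_emb: (batch, hidden_dim) summary of the instance/task,
        # e.g. mean-pooled input embeddings (an assumption on our part).
        weights = F.softmax(self.gate(task_emb), dim=-1)   # (batch, num_prompts)
        # Weighted combination of the prompt pool, per example.
        mixed = torch.einsum("bn,nld->bld", weights, self.prompts)
        # Prepend the mixed prompt to the (frozen) LLM's input embeddings.
        return torch.cat([mixed, input_embs], dim=1)

# Usage: only self.prompts and self.gate receive gradients; the LLM stays frozen.
mop = MixtureOfPrompts()
x = torch.randn(4, 128, 768)          # fake input embeddings
out = mop(x.mean(dim=1), x)           # shape (4, 20 + 128, 768)
```

The design point mirrors the abstract: only the prompt pool and the gate are trained, so the same frozen (or compressed) LLM can serve heterogeneous tasks while the gate routes each input to the relevant prompt skills.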
Related papers
- MetaGPT: Merging Large Language Models Using Model Exclusive Task Arithmetic [6.46176287368784]
We propose Model Exclusive Task Arithmetic for merging GPT-scale models.
Our proposed MetaGPT is data-agnostic and bypasses the heavy search process, making it cost-effective and easy to implement for LLMs.
arXiv Detail & Related papers (2024-06-17T10:12:45Z)
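
For context on the MetaGPT entry above: task arithmetic merges fine-tuned checkpoints by adding their weight deltas ("task vectors") to a shared base model, and MetaGPT's contribution is deriving the scaling coefficients in closed form rather than searching for them. The sketch below shows only the generic merge step, with hypothetical names and hand-picked coefficients; it is not MetaGPT's closed-form solution.

```python
import torch

def merge_task_vectors(base_state, finetuned_states, lambdas):
    """Generic task-arithmetic merge (not MetaGPT's closed-form variant).

    merged = base + sum_i lambda_i * (theta_i - base), applied per tensor.
    """
    merged = {}
    for name, base_w in base_state.items():
        delta = sum(lam * (ft[name] - base_w)
                    for lam, ft in zip(lambdas, finetuned_states))
        merged[name] = base_w + delta
    return merged

# Toy usage with fake state dicts for two "tasks".
base = {"w": torch.zeros(2, 2)}
task_a = {"w": torch.ones(2, 2)}
task_b = {"w": 2 * torch.ones(2, 2)}
print(merge_task_vectors(base, [task_a, task_b], lambdas=[0.5, 0.25])["w"])
```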
- Lightweight In-Context Tuning for Multimodal Unified Models [57.10831399642176]
MultiModal In-conteXt Tuning (M$^2$IXT) is a lightweight module to enhance the ICL capabilities of multimodal unified models.
When tuned on as few as 50K multimodal examples, M$^2$IXT can significantly boost few-shot ICL performance.
arXiv Detail & Related papers (2023-10-08T10:47:24Z)
- TransPrompt v2: A Transferable Prompting Framework for Cross-task Text Classification [37.824031151922604]
We propose TransPrompt v2, a novel transferable prompting framework for few-shot learning across similar or distant text classification tasks.
For learning across similar tasks, we employ a multi-task meta-knowledge acquisition (MMA) procedure to train a meta-learner.
For learning across distant tasks, we inject the task type descriptions into the prompt, and capture the intra-type and inter-type prompt embeddings.
arXiv Detail & Related papers (2023-08-29T04:16:57Z)
- Scaling Distributed Multi-task Reinforcement Learning with Experience Sharing [38.883540444516605]
DARPA launched the ShELL program, which aims to explore how experience sharing can benefit distributed lifelong learning agents.
We conduct both theoretical and empirical research on distributed multi-task reinforcement learning (RL), where a group of $N$ agents collaboratively solves $M$ tasks.
We propose an algorithm called DistMT-LSVI, where each agent independently learns $\epsilon$-optimal policies for all $M$ tasks.
arXiv Detail & Related papers (2023-07-11T22:58:53Z)
- Diffusion Model is an Effective Planner and Data Synthesizer for Multi-Task Reinforcement Learning [101.66860222415512]
Multi-Task Diffusion Model (MTDiff) is a diffusion-based method that incorporates Transformer backbones and prompt learning for generative planning and data synthesis.
For generative planning, we find MTDiff outperforms state-of-the-art algorithms across 50 tasks on Meta-World and 8 maps on Maze2D.
arXiv Detail & Related papers (2023-05-29T05:20:38Z)
- Curriculum Modeling the Dependence among Targets with Multi-task Learning for Financial Marketing [26.80709680959278]
We propose a prior information merged model (PIMM) for multiple sequential dependence task learning.
During training, PIMM randomly selects either the true label information or the prior task's prediction, via a soft sampling strategy, to transfer to the downstream task.
The offline experimental results on both public and product datasets verify that PIMM outperforms state-of-the-art baselines.
arXiv Detail & Related papers (2023-04-25T07:55:16Z)
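
The soft sampling step in the PIMM entry above admits a short illustration: during training, the signal passed to the downstream task is drawn from either the upstream true label or the upstream prediction. The snippet below is a hedged reading of that idea; the function name, shapes, and the fixed sampling probability are assumptions.

```python
import torch

def soft_sample_transfer(true_label, prior_pred, p_true=0.5):
    """Hedged sketch of PIMM-style soft sampling (names are assumptions).

    With probability p_true the downstream task is fed the ground-truth
    signal of the upstream task; otherwise it receives the upstream
    model's own prediction. At inference time only prior_pred is available.
    """
    use_true = torch.rand(true_label.shape[0]) < p_true  # per-example coin flip
    use_true = use_true.unsqueeze(-1).to(true_label.dtype)
    return use_true * true_label + (1.0 - use_true) * prior_pred

# Toy usage: a batch of 4 upstream targets and predictions.
y = torch.tensor([[1.0], [0.0], [1.0], [1.0]])
p = torch.tensor([[0.9], [0.2], [0.6], [0.4]])
print(soft_sample_transfer(y, p, p_true=0.7))
```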
- Decomposed Prompting: A Modular Approach for Solving Complex Tasks [55.42850359286304]
We propose Decomposed Prompting to solve complex tasks by decomposing them (via prompting) into simpler sub-tasks.
This modular structure allows each prompt to be optimized for its specific sub-task.
We show that the flexibility and modularity of Decomposed Prompting allows it to outperform prior work on few-shot prompting.
arXiv Detail & Related papers (2022-10-05T17:28:20Z)
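
Decomposed Prompting, summarized above, is essentially a controller that routes sub-tasks to dedicated prompt handlers. A minimal sketch of that control flow follows; the decomposition scheme, prompt texts, and the call_llm placeholder are illustrative assumptions, not the paper's actual prompts or library.

```python
# Minimal sketch of decomposed prompting: a top-level decomposer emits
# sub-tasks, each dispatched to its own specialized prompt/handler.
# `call_llm` is a placeholder for any LLM completion API (an assumption).

def call_llm(prompt: str) -> str:
    raise NotImplementedError("plug in your LLM client here")

SUBTASK_PROMPTS = {  # each sub-task gets its own independently optimized prompt
    "split": "Split the following question into steps:\n{input}",
    "solve": "Solve this single step:\n{input}",
    "merge": "Combine these partial answers into one:\n{input}",
}

def decomposed_prompting(question: str) -> str:
    # Decompose, solve each sub-task with its own prompt, then recombine.
    steps = call_llm(SUBTASK_PROMPTS["split"].format(input=question)).splitlines()
    partials = [call_llm(SUBTASK_PROMPTS["solve"].format(input=s)) for s in steps]
    return call_llm(SUBTASK_PROMPTS["merge"].format(input="\n".join(partials)))
```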
- Instance-wise Prompt Tuning for Pretrained Language Models [72.74916121511662]
Instance-wise Prompt Tuning (IPT) is the first prompt learning paradigm that injects knowledge from the input data instances into the prompts.
IPT significantly outperforms task-based prompt learning methods, and achieves comparable performance to conventional finetuning with only 0.5% - 1.5% of tuned parameters.
arXiv Detail & Related papers (2022-06-04T10:08:50Z)
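
The IPT entry above hinges on the prompt being a function of the individual input rather than of the task alone. A hedged sketch follows, assuming embedding-level access to the model; the module and dimension names are our own, not IPT's published code.

```python
import torch
import torch.nn as nn

class InstancePromptGenerator(nn.Module):
    """Sketch of instance-wise prompting: the soft prompt is produced
    from each input's own representation (names are assumptions)."""

    def __init__(self, hidden_dim=768, prompt_len=10):
        super().__init__()
        self.prompt_len = prompt_len
        self.proj = nn.Linear(hidden_dim, prompt_len * hidden_dim)

    def forward(self, input_embs):
        # input_embs: (batch, seq, hidden). Summarize the instance...
        inst = input_embs.mean(dim=1)                # (batch, hidden)
        # ...and expand that summary into a per-instance soft prompt.
        prompt = self.proj(inst).view(-1, self.prompt_len, input_embs.size(-1))
        return torch.cat([prompt, input_embs], dim=1)

gen = InstancePromptGenerator()
print(gen(torch.randn(2, 16, 768)).shape)  # torch.Size([2, 26, 768])
```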
- Low Resource Multi-Task Sequence Tagging -- Revisiting Dynamic Conditional Random Fields [67.51177964010967]
We compare different models for low resource multi-task sequence tagging that leverage dependencies between label sequences for different tasks.
We find that explicit modeling of inter-dependencies between task predictions outperforms single-task as well as standard multi-task models.
arXiv Detail & Related papers (2020-05-01T07:11:34Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences.