One Prompt is not Enough: Automated Construction of a Mixture-of-Expert Prompts
- URL: http://arxiv.org/abs/2407.00256v1
- Date: Fri, 28 Jun 2024 23:05:08 GMT
- Title: One Prompt is not Enough: Automated Construction of a Mixture-of-Expert Prompts
- Authors: Ruochen Wang, Sohyun An, Minhao Cheng, Tianyi Zhou, Sung Ju Hwang, Cho-Jui Hsieh,
- Abstract summary: Large Language Models (LLMs) exhibit strong generalization capabilities when prompted with language instructions and in-context demos.
Various methods have been explored to automate the instruction design, but they restricted the searched prompt to one instruction.
We adopt the Mixture-of-Expert paradigm and divide the problem space into a set of sub-regions.
A two-phase process is developed to construct the specialized expert for each region.
A region-based joint search of an instruction per expert complements the demos assigned to it, yielding a synergistic effect.
- Score: 110.94724216491753
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Large Language Models (LLMs) exhibit strong generalization capabilities to novel tasks when prompted with language instructions and in-context demos. Since this ability sensitively depends on the quality of prompts, various methods have been explored to automate the instruction design. While these methods demonstrated promising results, they also restricted the searched prompt to one instruction. Such simplification significantly limits their capacity, as a single demo-free instruction might not be able to cover the entire complex problem space of the targeted task. To alleviate this issue, we adopt the Mixture-of-Expert paradigm and divide the problem space into a set of sub-regions; Each sub-region is governed by a specialized expert, equipped with both an instruction and a set of demos. A two-phase process is developed to construct the specialized expert for each region: (1) demo assignment: Inspired by the theoretical connection between in-context learning and kernel regression, we group demos into experts based on their semantic similarity; (2) instruction assignment: A region-based joint search of an instruction per expert complements the demos assigned to it, yielding a synergistic effect. The resulting method, codenamed Mixture-of-Prompts (MoP), achieves an average win rate of 81% against prior arts across several major benchmarks.
Related papers
- Complexity Experts are Task-Discriminative Learners for Any Image Restoration [80.46313715427928]
We introduce complexity experts" -- flexible expert blocks with varying computational complexity and receptive fields.
This preference effectively drives task-specific allocation, assigning tasks to experts with the appropriate complexity.
The proposed MoCE-IR model outperforms state-of-the-art methods, affirming its efficiency and practical applicability.
arXiv Detail & Related papers (2024-11-27T15:58:07Z) - Mixture-of-Instructions: Aligning Large Language Models via Mixture Prompting [7.103987978402038]
We introduce a novel technique termed Mixture-of-Instructions (MoI)
MoI employs a strategy of instruction packing combined with diverse system prompts to boost the alignment efficiency of language models.
Our methodology was applied to the open-source Qwen-7B-chat model, culminating in the development of Qwen-SFT-MoI.
arXiv Detail & Related papers (2024-04-29T03:58:12Z) - TEGEE: Task dEfinition Guided Expert Ensembling for Generalizable and Few-shot Learning [37.09785060896196]
We propose textbfTEGEE (Task Definition Guided Expert Ensembling), a method that explicitly extracts task definitions.
Our framework employs a dual 3B model approach, with each model assigned a distinct role.
Empirical evaluations show that TEGEE performs comparably to the larger LLaMA2-13B model.
arXiv Detail & Related papers (2024-03-07T05:26:41Z) - Skill Disentanglement for Imitation Learning from Suboptimal
Demonstrations [60.241144377865716]
We consider the imitation of sub-optimal demonstrations, with both a small clean demonstration set and a large noisy set.
We propose method by evaluating and imitating at the sub-demonstration level, encoding action primitives of varying quality into different skills.
arXiv Detail & Related papers (2023-06-13T17:24:37Z) - A Unified Framework for Multi-intent Spoken Language Understanding with
prompting [14.17726194025463]
We describe a Prompt-based Spoken Language Understanding (PromptSLU) framework, to intuitively unify two sub-tasks into the same form.
In detail, ID and SF are completed by concisely filling the utterance into task-specific prompt templates as input, and sharing output formats of key-value pairs sequence.
Experiment results show that our framework outperforms several state-of-the-art baselines on two public datasets.
arXiv Detail & Related papers (2022-10-07T05:58:05Z) - Decomposed Prompting: A Modular Approach for Solving Complex Tasks [55.42850359286304]
We propose Decomposed Prompting to solve complex tasks by decomposing them (via prompting) into simpler sub-tasks.
This modular structure allows each prompt to be optimized for its specific sub-task.
We show that the flexibility and modularity of Decomposed Prompting allows it to outperform prior work on few-shot prompting.
arXiv Detail & Related papers (2022-10-05T17:28:20Z) - Fast Inference and Transfer of Compositional Task Structures for
Few-shot Task Generalization [101.72755769194677]
We formulate it as a few-shot reinforcement learning problem where a task is characterized by a subtask graph.
Our multi-task subtask graph inferencer (MTSGI) first infers the common high-level task structure in terms of the subtask graph from the training tasks.
Our experiment results on 2D grid-world and complex web navigation domains show that the proposed method can learn and leverage the common underlying structure of the tasks for faster adaptation to the unseen tasks.
arXiv Detail & Related papers (2022-05-25T10:44:25Z) - Instance-aware Prompt Learning for Language Understanding and Generation [49.22899822734549]
We propose an instance-aware prompt learning method that learns a different prompt for each instance.
Our method achieves the state-of-the-art on the SuperGLUE few-shot learning benchmark.
arXiv Detail & Related papers (2022-01-18T17:03:25Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.