Related papers: One Prompt is not Enough: Automated Construction of a Mixture-of-Expert Prompts

One Prompt is not Enough: Automated Construction of a Mixture-of-Expert Prompts

URL: http://arxiv.org/abs/2407.00256v1
Date: Fri, 28 Jun 2024 23:05:08 GMT
Title: One Prompt is not Enough: Automated Construction of a Mixture-of-Expert Prompts
Authors: Ruochen Wang, Sohyun An, Minhao Cheng, Tianyi Zhou, Sung Ju Hwang, Cho-Jui Hsieh,
Abstract summary: Large Language Models (LLMs) exhibit strong generalization capabilities when prompted with language instructions and in-context demos. Various methods have been explored to automate the instruction design, but they restricted the searched prompt to one instruction. We adopt the Mixture-of-Expert paradigm and divide the problem space into a set of sub-regions. A two-phase process is developed to construct the specialized expert for each region. A region-based joint search of an instruction per expert complements the demos assigned to it, yielding a synergistic effect.
Score: 110.94724216491753
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Large Language Models (LLMs) exhibit strong generalization capabilities to novel tasks when prompted with language instructions and in-context demos. Since this ability sensitively depends on the quality of prompts, various methods have been explored to automate the instruction design. While these methods demonstrated promising results, they also restricted the searched prompt to one instruction. Such simplification significantly limits their capacity, as a single demo-free instruction might not be able to cover the entire complex problem space of the targeted task. To alleviate this issue, we adopt the Mixture-of-Expert paradigm and divide the problem space into a set of sub-regions; Each sub-region is governed by a specialized expert, equipped with both an instruction and a set of demos. A two-phase process is developed to construct the specialized expert for each region: (1) demo assignment: Inspired by the theoretical connection between in-context learning and kernel regression, we group demos into experts based on their semantic similarity; (2) instruction assignment: A region-based joint search of an instruction per expert complements the demos assigned to it, yielding a synergistic effect. The resulting method, codenamed Mixture-of-Prompts (MoP), achieves an average win rate of 81% against prior arts across several major benchmarks.

Related papers

Symbolic Mixture-of-Experts: Adaptive Skill-based Routing for Heterogeneous Reasoning [76.10639521319382]
We propose Symbolic-MoE, a symbolic, text-based, and gradient-free Mixture-of-Experts framework. We show that Symbolic-MoE's instance-level expert selection improves performance by a large margin but -- when implemented naively -- can introduce a high computational overhead.
arXiv Detail & Related papers (2025-03-07T18:03:13Z)
Complexity Experts are Task-Discriminative Learners for Any Image Restoration [80.46313715427928]
We introduce complexity experts" -- flexible expert blocks with varying computational complexity and receptive fields. This preference effectively drives task-specific allocation, assigning tasks to experts with the appropriate complexity. The proposed MoCE-IR model outperforms state-of-the-art methods, affirming its efficiency and practical applicability.
arXiv Detail & Related papers (2024-11-27T15:58:07Z)
Task Facet Learning: A Structured Approach to Prompt Optimization [14.223730629357178]
We propose an algorithm that learns multiple facets of a task from a set of training examples. The resulting algorithm, UniPrompt, consists of a generative model to generate initial candidates for each prompt section. Empirical evaluation on multiple datasets and a real-world task shows that prompts generated using UniPrompt obtain higher accuracy than human-tuned prompts.
arXiv Detail & Related papers (2024-06-15T04:54:26Z)
Mixture-of-Instructions: Aligning Large Language Models via Mixture Prompting [7.103987978402038]
We introduce a novel technique termed Mixture-of-Instructions (MoI) MoI employs a strategy of instruction packing combined with diverse system prompts to boost the alignment efficiency of language models. Our methodology was applied to the open-source Qwen-7B-chat model, culminating in the development of Qwen-SFT-MoI.
arXiv Detail & Related papers (2024-04-29T03:58:12Z)
TEGEE: Task dEfinition Guided Expert Ensembling for Generalizable and Few-shot Learning [37.09785060896196]
We propose textbfTEGEE (Task Definition Guided Expert Ensembling), a method that explicitly extracts task definitions. Our framework employs a dual 3B model approach, with each model assigned a distinct role. Empirical evaluations show that TEGEE performs comparably to the larger LLaMA2-13B model.
arXiv Detail & Related papers (2024-03-07T05:26:41Z)
Ada-Instruct: Adapting Instruction Generators for Complex Reasoning [14.456571495691561]
We introduce Ada-Instruct, an adaptive instruction generator developed through fine-tuning. We empirically validated Ada-Instruct's efficacy across different applications.
arXiv Detail & Related papers (2023-10-06T13:28:04Z)
Sweeping Heterogeneity with Smart MoPs: Mixture of Prompts for LLM Task Adaptation [45.90925587972781]
Large Language Models (LLMs) have the ability to solve a variety of tasks, such as text summarization and mathematical questions. Due to high computational costs, the current trend is to use prompt instruction tuning to better adjust monolithic, pretrained LLMs for new -- but often individual -- downstream tasks. MoPs can simultaneously mitigate prompt training "interference" in multi-task, multi-source scenarios.
arXiv Detail & Related papers (2023-10-04T14:11:12Z)
Skill Disentanglement for Imitation Learning from Suboptimal Demonstrations [60.241144377865716]
We consider the imitation of sub-optimal demonstrations, with both a small clean demonstration set and a large noisy set. We propose method by evaluating and imitating at the sub-demonstration level, encoding action primitives of varying quality into different skills.
arXiv Detail & Related papers (2023-06-13T17:24:37Z)
A Unified Framework for Multi-intent Spoken Language Understanding with prompting [14.17726194025463]
We describe a Prompt-based Spoken Language Understanding (PromptSLU) framework, to intuitively unify two sub-tasks into the same form. In detail, ID and SF are completed by concisely filling the utterance into task-specific prompt templates as input, and sharing output formats of key-value pairs sequence. Experiment results show that our framework outperforms several state-of-the-art baselines on two public datasets.
arXiv Detail & Related papers (2022-10-07T05:58:05Z)
Decomposed Prompting: A Modular Approach for Solving Complex Tasks [55.42850359286304]
We propose Decomposed Prompting to solve complex tasks by decomposing them (via prompting) into simpler sub-tasks. This modular structure allows each prompt to be optimized for its specific sub-task. We show that the flexibility and modularity of Decomposed Prompting allows it to outperform prior work on few-shot prompting.
arXiv Detail & Related papers (2022-10-05T17:28:20Z)
Fast Inference and Transfer of Compositional Task Structures for Few-shot Task Generalization [101.72755769194677]
We formulate it as a few-shot reinforcement learning problem where a task is characterized by a subtask graph. Our multi-task subtask graph inferencer (MTSGI) first infers the common high-level task structure in terms of the subtask graph from the training tasks. Our experiment results on 2D grid-world and complex web navigation domains show that the proposed method can learn and leverage the common underlying structure of the tasks for faster adaptation to the unseen tasks.
arXiv Detail & Related papers (2022-05-25T10:44:25Z)
Combining Modular Skills in Multitask Learning [149.8001096811708]
A modular design encourages neural models to disentangle and recombine different facets of knowledge to generalise more systematically to new tasks. In this work, we assume each task is associated with a subset of latent discrete skills from a (potentially small) inventory. We find that the modular design of a network significantly increases sample efficiency in reinforcement learning and few-shot generalisation in supervised learning.
arXiv Detail & Related papers (2022-02-28T16:07:19Z)
Instance-aware Prompt Learning for Language Understanding and Generation [49.22899822734549]
We propose an instance-aware prompt learning method that learns a different prompt for each instance. Our method achieves the state-of-the-art on the SuperGLUE few-shot learning benchmark.
arXiv Detail & Related papers (2022-01-18T17:03:25Z)

This list is automatically generated from the titles and abstracts of the papers in this site.