Related papers: Unplug and Play Language Models: Decomposing Experts in Language Models at Inference Time

Unplug and Play Language Models: Decomposing Experts in Language Models at Inference Time

URL: http://arxiv.org/abs/2404.11916v3
Date: Thu, 21 Aug 2025 14:04:54 GMT
Title: Unplug and Play Language Models: Decomposing Experts in Language Models at Inference Time
Authors: Nakyeong Yang, Jiwon Moon, Junseok Kim, Yunah Jang, Kyomin Jung,
Abstract summary: We introduce Decomposition of Experts (DoE), a novel framework that dynamically identifies and activates task-specific experts within a language model.<n>DoE achieves up to a x1.73 inference speed-up with a 65% pruning rate, without compromising accuracy.
Score: 15.695487920048816
License: http://creativecommons.org/licenses/by-nc-sa/4.0/
Abstract: Enabled by large-scale text corpora with huge parameters, pre-trained language models operate as multi-task experts using a single model architecture. However, recent studies have revealed that certain neurons play disproportionately important roles in solving specific tasks, suggesting that task-relevant substructures can be isolated and selectively activated for each task. Therefore, we introduce Decomposition of Experts (DoE), a novel framework that dynamically identifies and activates task-specific experts within a language model to reduce inference cost without sacrificing accuracy. We first define a task expert as a set of parameters that significantly influence the performance of a specific task and propose a four-step unplug-and-play process: (1) receiving a user request, (2) identifying the corresponding task expert, (3) performing inference using the expert-localized model, and (4) restoring the original model and waiting for the next task. Using attribution methods and prompt tuning, DoE isolates task-relevant neurons, minimizing computational overhead while maintaining task performance. We assume a setting where a language model receives user requests from five widely used natural language understanding benchmarks, processing one task at a time. In this setup, we demonstrate that DoE achieves up to a x1.73 inference speed-up with a 65% pruning rate, without compromising accuracy. Comparisons with various task expert localization methods reveal that DoE effectively identifies task experts, while ablation studies validate the importance of its components. Additionally, we analyze the effects of batch size, token count, and layer types on inference speed-up, providing practical insights for adopting DoE. The proposed framework is both practical and scalable, applicable to any transformer-based architecture, offering a robust solution for efficient task-specific inference.

Related papers

Multi-Robot Task Planning for Multi-Object Retrieval Tasks with Distributed On-Site Knowledge via Large Language Models [6.0783502693642495]
It is crucial to efficiently execute instructions such as "Find an apple and a banana" or "Get ready for a field trip"<n>This study addresses the challenging problem of determining which robot should be assigned to which part of a task when each robot possesses different situational on-site knowledge.<n>We propose a task planning framework that leverages large language models (LLMs) and spatial concepts to decompose natural language instructions into subtasks.
arXiv Detail & Related papers (2025-09-16T09:00:25Z)
Achieving More with Less: Additive Prompt Tuning for Rehearsal-Free Class-Incremental Learning [76.32953653161417]
Class-incremental learning enables models to learn new classes progressively while preserving knowledge of previously learned ones. Recent advances in this field have shifted towards parameter-efficient fine-tuning techniques. We present a novel prompt-based approach that addresses the limitation of current approaches.
arXiv Detail & Related papers (2025-03-11T02:27:37Z)
Parameter-Efficient Fine-Tuning of Large Language Models using Semantic Knowledge Tuning [0.08795040582681389]
Large Language Models (LLMs) are gaining significant popularity in recent years for specialized tasks using prompts. We propose a novel method called Semantic Knowledge Tuning (SK-Tuning) for prompt and prefix tuning that employs meaningful words instead of random tokens. Our experimental results show that SK-Tuning exhibits faster training times, fewer parameters, and superior performance on tasks such as text classification and understanding.
arXiv Detail & Related papers (2024-10-11T07:55:09Z)
RepVF: A Unified Vector Fields Representation for Multi-task 3D Perception [64.80760846124858]
This paper proposes a novel unified representation, RepVF, which harmonizes the representation of various perception tasks. RepVF characterizes the structure of different targets in the scene through a vector field, enabling a single-head, multi-task learning model. Building upon RepVF, we introduce RFTR, a network designed to exploit the inherent connections between different tasks.
arXiv Detail & Related papers (2024-07-15T16:25:07Z)
SpeechVerse: A Large-scale Generalizable Audio Language Model [40.81864091356371]
SpeechVerse is a robust multi-task training and curriculum learning framework.<n>It combines pre-trained speech and text foundation models via a small set of learnable parameters.<n>Our empirical experiments reveal that our multi-task SpeechVerse model is even superior to conventional task-specific baselines on 9 out of the 11 tasks.
arXiv Detail & Related papers (2024-05-14T03:33:31Z)
TEGEE: Task dEfinition Guided Expert Ensembling for Generalizable and Few-shot Learning [37.09785060896196]
We propose textbfTEGEE (Task Definition Guided Expert Ensembling), a method that explicitly extracts task definitions.<n>Our framework employs a dual 3B model approach, with each model assigned a distinct role.<n> Empirical evaluations show that TEGEE performs comparably to the larger LLaMA2-13B model.
arXiv Detail & Related papers (2024-03-07T05:26:41Z)
Specialist or Generalist? Instruction Tuning for Specific NLP Tasks [58.422495509760154]
We investigate whether incorporating broad-coverage generalist instruction tuning can contribute to building a specialist model. Our experiments assess four target tasks with distinct coverage levels. The effect is particularly pronounced when the amount of task-specific training data is limited.
arXiv Detail & Related papers (2023-10-23T19:46:48Z)
TEMPERA: Test-Time Prompting via Reinforcement Learning [57.48657629588436]
We propose Test-time Prompt Editing using Reinforcement learning (TEMPERA) In contrast to prior prompt generation methods, TEMPERA can efficiently leverage prior knowledge. Our method achieves 5.33x on average improvement in sample efficiency when compared to the traditional fine-tuning methods.
arXiv Detail & Related papers (2022-11-21T22:38:20Z)
Resource-Efficient Transfer Learning From Speech Foundation Model Using Hierarchical Feature Fusion [44.056153052137674]
We propose a novel hierarchical feature fusion method for resource-efficient transfer learning from speech foundation models. Experimental results show that the proposed method can achieve better performance on speech recognition task than existing algorithms.
arXiv Detail & Related papers (2022-11-04T19:03:45Z)
Effective Cross-Task Transfer Learning for Explainable Natural Language Inference with T5 [50.574918785575655]
We compare sequential fine-tuning with a model for multi-task learning in the context of boosting performance on two tasks. Our results show that while sequential multi-task learning can be tuned to be good at the first of two target tasks, it performs less well on the second and additionally struggles with overfitting.
arXiv Detail & Related papers (2022-10-31T13:26:08Z)
Prompt-Matched Semantic Segmentation [96.99924127527002]
The objective of this work is to explore how to effectively adapt pre-trained foundation models to various downstream tasks of image semantic segmentation. We propose a novel Inter-Stage Prompt-Matched Framework, which maintains the original structure of the foundation model while generating visual prompts adaptively for task-oriented tuning. A lightweight module termed Semantic-aware Prompt Matcher is then introduced to hierarchically interpolate between two stages to learn reasonable prompts for each specific task.
arXiv Detail & Related papers (2022-08-22T09:12:53Z)
Multi Task Learning For Zero Shot Performance Prediction of Multilingual Models [12.759281077118567]
Massively Multilingual Transformer based Language Models have been observed to be surprisingly effective on zero-shot transfer across languages. We build upon some of the existing techniques for predicting the zero-shot performance on a task, by modeling it as a multi-task learning problem.
arXiv Detail & Related papers (2022-05-12T14:47:03Z)
IDPG: An Instance-Dependent Prompt Generation Method [58.45110542003139]
Prompt tuning is a new, efficient NLP transfer learning paradigm that adds a task-specific prompt in each input instance during the model training stage. We propose a conditional prompt generation method to generate prompts for each input instance.
arXiv Detail & Related papers (2022-04-09T15:45:27Z)
Making Pre-trained Language Models End-to-end Few-shot Learners with Contrastive Prompt Tuning [41.15017636192417]
We present CP-Tuning, the first end-to-end Contrastive Prompt Tuning framework for fine-tuning Language Models. It is integrated with the task-invariant continuous prompt encoding technique with fully trainable prompt parameters. Experiments over a variety of language understanding tasks used in IR systems and different PLMs show that CP-Tuning outperforms state-of-the-art methods.
arXiv Detail & Related papers (2022-04-01T02:24:24Z)
An Exploration of Prompt Tuning on Generative Spoken Language Model for Speech Processing Tasks [112.1942546460814]
We report the first exploration of the prompt tuning paradigm for speech processing tasks based on Generative Spoken Language Model (GSLM) Experiment results show that the prompt tuning technique achieves competitive performance in speech classification tasks with fewer trainable parameters than fine-tuning specialized downstream models.
arXiv Detail & Related papers (2022-03-31T03:26:55Z)
Towards a Unified View of Parameter-Efficient Transfer Learning [108.94786930869473]
Fine-tuning large pre-trained language models on downstream tasks has become the de-facto learning paradigm in NLP. Recent work has proposed a variety of parameter-efficient transfer learning methods that only fine-tune a small number of (extra) parameters to attain strong performance. We break down the design of state-of-the-art parameter-efficient transfer learning methods and present a unified framework that establishes connections between them.
arXiv Detail & Related papers (2021-10-08T20:22:26Z)
Rethinking End-to-End Evaluation of Decomposable Tasks: A Case Study on Spoken Language Understanding [101.24748444126982]
Decomposable tasks are complex and comprise of a hierarchy of sub-tasks. Existing benchmarks, however, typically hold out examples for only the surface-level sub-task. We propose a framework to construct robust test sets using coordinate ascent over sub-task specific utility functions.
arXiv Detail & Related papers (2021-06-29T02:53:59Z)
XtremeDistilTransformers: Task Transfer for Task-agnostic Distillation [80.18830380517753]
We develop a new task-agnostic distillation framework XtremeDistilTransformers. We study the transferability of several source tasks, augmentation resources and model architecture for distillation.
arXiv Detail & Related papers (2021-06-08T17:49:33Z)
Exploring Relational Context for Multi-Task Dense Prediction [76.86090370115]
We consider a multi-task environment for dense prediction tasks, represented by a common backbone and independent task-specific heads. We explore various attention-based contexts, such as global and local, in the multi-task setting. We propose an Adaptive Task-Relational Context module, which samples the pool of all available contexts for each task pair.
arXiv Detail & Related papers (2021-04-28T16:45:56Z)

This list is automatically generated from the titles and abstracts of the papers in this site.