Related papers: Multi-Task Dense Prediction via Mixture of Low-Rank Experts

URL: http://arxiv.org/abs/2403.17749v2
Date: Mon, 27 May 2024 16:09:48 GMT
Title: Multi-Task Dense Prediction via Mixture of Low-Rank Experts
Authors: Yuqi Yang, Peng-Tao Jiang, Qibin Hou, Hao Zhang, Jinwei Chen, Bo Li
Abstract summary: We present a novel decoder-focused method for multi-task dense prediction, called Mixture-of-Low-Rank-Experts (MLoRE). To model the global task relationships, MLoRE adds a generic convolution path to the original MoE structure, where each task feature can go through this path for explicit parameter sharing. Our experiments show that MLoRE achieves superior performance compared to previous state-of-the-art methods on all metrics.
Score: 35.11968315125389
License: http://creativecommons.org/licenses/by-nc-sa/4.0/
Abstract: Previous multi-task dense prediction methods based on the Mixture of Experts (MoE) have achieved strong performance, but they neglect the importance of explicitly modeling the global relations among all tasks. In this paper, we present a novel decoder-focused method for multi-task dense prediction, called Mixture-of-Low-Rank-Experts (MLoRE). To model the global task relationships, MLoRE adds a generic convolution path to the original MoE structure, where each task feature can go through this path for explicit parameter sharing. Furthermore, to control the parameters and computational cost brought by the increase in the number of experts, we take inspiration from LoRA and propose to leverage the low-rank format of a vanilla convolution in the expert network. Since the low-rank experts have fewer parameters and can be dynamically parameterized into the generic convolution, the parameters and computational cost do not change much as the number of experts increases. Benefiting from this design, we increase the number of experts and their receptive field to enlarge the representation capacity, facilitating the learning of multiple dense tasks in a unified network. Extensive experiments on the PASCAL-Context and NYUD-v2 benchmarks show that MLoRE achieves superior performance compared to previous state-of-the-art methods on all metrics. Our code is available at https://github.com/YuqiYang213/MLoRE.
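The abstract's claim that low-rank experts "can be dynamically parameterized into the generic convolution" can be illustrated with a small numerical sketch. This is not the authors' implementation: it assumes 1x1 convolutions (so each convolution is a per-pixel matrix multiply), purely linear expert paths, and fixed scalar gate values. All names (`merge_experts`, `forward_separate`, the sizes) are hypothetical and chosen only for illustration.

```python
import random

def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def add_scaled(a, b, scale):
    """Return a + scale * b, elementwise."""
    return [[x + scale * y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def rand_matrix(rng, rows, cols):
    """Random Gaussian matrix, used here as stand-in weights and features."""
    return [[rng.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]

def forward_separate(x, w_generic, downs, ups, gates):
    """Run the generic path and every low-rank expert as separate branches.

    x is (c_in, n_pixels); a 1x1 convolution is a matrix multiply per pixel.
    """
    out = matmul(w_generic, x)
    for down, up, gate in zip(downs, ups, gates):
        out = add_scaled(out, matmul(up, matmul(down, x)), gate)
    return out

def merge_experts(w_generic, downs, ups, gates):
    """Fold the gated low-rank experts into one dense (c_out, c_in) weight."""
    merged = [row[:] for row in w_generic]
    for down, up, gate in zip(downs, ups, gates):
        merged = add_scaled(merged, matmul(up, down), gate)
    return merged

# Hypothetical sizes, chosen only to keep the example small.
rng = random.Random(0)
c_in, c_out, rank, n_experts, n_pixels = 16, 16, 2, 8, 10

x = rand_matrix(rng, c_in, n_pixels)
w_generic = rand_matrix(rng, c_out, c_in)
downs = [rand_matrix(rng, rank, c_in) for _ in range(n_experts)]  # down-projections
ups = [rand_matrix(rng, c_out, rank) for _ in range(n_experts)]   # up-projections
gates = [rng.random() for _ in range(n_experts)]                  # assumed router outputs

y_separate = forward_separate(x, w_generic, downs, ups, gates)
y_merged = matmul(merge_experts(w_generic, downs, ups, gates), x)

# The two outputs should agree up to floating-point rounding.
max_abs_diff = max(
    abs(a - b) for ra, rb in zip(y_separate, y_merged) for a, b in zip(ra, rb)
)

# Parameter count of the low-rank experts versus full-rank experts.
expert_params = n_experts * rank * (c_in + c_out)
dense_params = n_experts * c_in * c_out
```

Under these assumptions, the merged weight costs a single dense convolution at inference regardless of how many experts are folded in, and the low-rank experts hold `n_experts * rank * (c_in + c_out)` parameters instead of `n_experts * c_in * c_out`. The merge relies on the expert paths being linear; a non-linearity inside an expert would break the equivalence.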

Related papers

Multi-Task Dense Prediction Fine-Tuning with Mixture of Fine-Grained Experts [22.936728143586443]
Multi-task learning (MTL) for dense prediction has shown promising results but still faces challenges in balancing shared representations with task-specific specialization. We introduce a novel Fine-Grained Mixture of Experts architecture that explores MoE-based MTL models through a combination of three key innovations and fine-tuning.
arXiv Detail & Related papers (2025-07-25T08:59:30Z)
Multimodal Mixture of Low-Rank Experts for Sentiment Analysis and Emotion Recognition [16.14787920254091]
We present a novel Multimodal Mixture of Low-Rank Experts (MMoLRE) method for multimodal sentiment analysis (MSA) and multimodal emotion recognition (MER). MMoLRE utilizes shared and task-specific experts to distinctly model common and unique task characteristics, thereby avoiding parameter conflicts. Experiments on the CMU-MOSI and CMU-MOSEI benchmarks demonstrate that MMoLRE achieves state-of-the-art performance on the MSA task and competitive results on the MER task.
arXiv Detail & Related papers (2025-05-20T09:46:56Z)
Token-Level Prompt Mixture with Parameter-Free Routing for Federated Domain Generalization [51.562474873972086]
Federated domain generalization (FedDG) aims to learn a globally generalizable model from decentralized clients with heterogeneous data. Recent studies have introduced prompt learning to adapt vision-language models (VLMs) in FedDG by learning a single global prompt. We propose TRIP, a Token-level prompt mixture with parameter-free routing framework for FedDG.
arXiv Detail & Related papers (2025-04-29T11:06:03Z)
LLaVA-CMoE: Towards Continual Mixture of Experts for Large Vision-Language Models [21.888139819188105]
We present an innovative framework named LLaVA-CMoE, which is a continuous Mixture of Experts (MoE) architecture without any replay data. Specifically, we have developed a method called Probe-Guided Knowledge Extension (PGKE), which employs probe experts to assess whether additional knowledge is required. We also introduce a hierarchical routing algorithm called Probabilistic Task Locator (PTL), where high-level routing captures inter-task information and low-level routing focuses on intra-task details.
arXiv Detail & Related papers (2025-03-27T07:36:11Z)
Symbolic Mixture-of-Experts: Adaptive Skill-based Routing for Heterogeneous Reasoning [76.10639521319382]
We propose Symbolic-MoE, a symbolic, text-based, and gradient-free Mixture-of-Experts framework. We show that Symbolic-MoE's instance-level expert selection improves performance by a large margin but -- when implemented naively -- can introduce a high computational overhead.
arXiv Detail & Related papers (2025-03-07T18:03:13Z)
Mixture of Parrots: Experts improve memorization more than reasoning [72.445819694797]
We show that as we increase the number of experts, the memorization performance consistently increases while the reasoning capabilities saturate. We find that increasing the number of experts helps solve knowledge-intensive tasks, but fails to yield the same benefits for reasoning tasks.
arXiv Detail & Related papers (2024-10-24T17:54:41Z)
Upcycling Instruction Tuning from Dense to Mixture-of-Experts via Parameter Merging [36.0133566024214]
Upcycling Instruction Tuning (UpIT) is a data-efficient approach for tuning a dense pre-trained model into a MoE instruction model. To ensure each specialized expert in the MoE model works as expected, we select a small amount of seed data that each expert excels at to pre-optimize the router.
arXiv Detail & Related papers (2024-10-02T14:48:22Z)
FactorLLM: Factorizing Knowledge via Mixture of Experts for Large Language Models [50.331708897857574]
We introduce FactorLLM, a novel approach that decomposes well-trained dense FFNs into sparse sub-networks without requiring any further modifications. FactorLLM achieves comparable performance to the source model, securing up to 85% model performance while obtaining over a 30% increase in inference speed.
arXiv Detail & Related papers (2024-08-15T16:45:16Z)
MoDE: Effective Multi-task Parameter Efficient Fine-Tuning with a Mixture of Dyadic Experts [6.245113492272563]
Mixture of Dyadic Experts (MoDE) is a novel design for efficient multi-task adaptation. Our design allows for more fine-grained mixing, thereby increasing the model's ability to jointly handle multiple tasks.
arXiv Detail & Related papers (2024-08-02T18:05:10Z)
Diversifying the Expert Knowledge for Task-Agnostic Pruning in Sparse Mixture-of-Experts [75.85448576746373]
We propose a method of grouping and pruning similar experts to improve the model's parameter efficiency. We validate the effectiveness of our method by pruning three state-of-the-art MoE architectures. The evaluation shows that our method outperforms other model pruning methods on a range of natural language tasks.
arXiv Detail & Related papers (2024-07-12T17:25:02Z)
Intuition-aware Mixture-of-Rank-1-Experts for Parameter Efficient Finetuning [50.73666458313015]
Large Language Models (LLMs) have demonstrated significant potential in performing multiple tasks in multimedia applications. MoE has emerged as a promising solution with its sparse architecture for effective task decoupling. Intuition-MoR1E achieves superior efficiency and 2.15% overall accuracy improvement across 14 public datasets.
arXiv Detail & Related papers (2024-04-13T12:14:58Z)
Harder Tasks Need More Experts: Dynamic Routing in MoE Models [58.18526590138739]
We introduce a novel dynamic expert selection framework for Mixture of Experts (MoE) models. Our method dynamically selects experts based on the confidence level in expert selection for each input.
arXiv Detail & Related papers (2024-03-12T13:41:15Z)
Omni-SMoLA: Boosting Generalist Multimodal Models with Soft Mixture of Low-rank Experts [74.40198929049959]
Large multi-modal models (LMMs) exhibit remarkable performance across numerous tasks. Generalist LMMs often suffer from performance degradation when tuned over a large collection of tasks. We propose Omni-SMoLA, an architecture that uses the Soft MoE approach to mix many multimodal low-rank experts.
arXiv Detail & Related papers (2023-12-01T23:04:27Z)

This list is automatically generated from the titles and abstracts of the papers on this site.