Complexity Experts are Task-Discriminative Learners for Any Image Restoration
- URL: http://arxiv.org/abs/2411.18466v2
- Date: Thu, 13 Mar 2025 17:39:00 GMT
- Title: Complexity Experts are Task-Discriminative Learners for Any Image Restoration
- Authors: Eduard Zamfir, Zongwei Wu, Nancy Mehta, Yuedong Tan, Danda Pani Paudel, Yulun Zhang, Radu Timofte,
- Abstract summary: We introduce "complexity experts" -- flexible expert blocks with varying computational complexity and receptive fields. A simple bias toward lower complexity effectively drives task-specific allocation, assigning tasks to experts with the appropriate complexity. The proposed MoCE-IR model outperforms state-of-the-art methods, affirming its efficiency and practical applicability.
- Score: 80.46313715427928
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recent advancements in all-in-one image restoration models have revolutionized the ability to address diverse degradations through a unified framework. However, parameters tied to specific tasks often remain inactive for other tasks, making mixture-of-experts (MoE) architectures a natural extension. Despite this, MoEs often show inconsistent behavior, with some experts unexpectedly generalizing across tasks while others struggle within their intended scope. This hinders leveraging MoEs' computational benefits by bypassing irrelevant experts during inference. We attribute this undesired behavior to the uniform and rigid architecture of traditional MoEs. To address this, we introduce "complexity experts" -- flexible expert blocks with varying computational complexity and receptive fields. A key challenge is assigning tasks to each expert, as degradation complexity is unknown in advance. Thus, we execute tasks with a simple bias toward lower complexity. To our surprise, this preference effectively drives task-specific allocation, assigning tasks to experts with the appropriate complexity. Extensive experiments validate our approach, demonstrating the ability to bypass irrelevant experts during inference while maintaining superior performance. The proposed MoCE-IR model outperforms state-of-the-art methods, affirming its efficiency and practical applicability. The source code and models are publicly available at https://eduardzamfir.github.io/moceir/
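To make the routing idea concrete, below is a minimal PyTorch sketch under the assumptions stated in the abstract: experts differ in capacity (here, only channel width; the paper also varies receptive fields), and the router's logits are penalized in proportion to each expert's cost, realizing the bias toward lower complexity. All names (`ComplexityExpert`, `MoCELayerSketch`, `bias_strength`) are illustrative, not the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ComplexityExpert(nn.Module):
    """Illustrative expert whose capacity scales with `width`.
    The paper's complexity experts also vary receptive field; this
    sketch varies only channel width to stay short."""
    def __init__(self, dim: int, width: int):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(dim, width), nn.GELU(), nn.Linear(width, dim))
        self.cost = float(width)  # rough proxy for compute cost

    def forward(self, x):
        return self.body(x)

class MoCELayerSketch(nn.Module):
    """Hypothetical mixture-of-complexity-experts layer: routing logits
    are penalized in proportion to expert cost, so the router prefers
    the cheapest expert that still helps the task loss."""
    def __init__(self, dim: int, widths=(32, 64, 128, 256), bias_strength: float = 0.1):
        super().__init__()
        self.experts = nn.ModuleList(ComplexityExpert(dim, w) for w in widths)
        self.router = nn.Linear(dim, len(widths))
        costs = torch.tensor([e.cost for e in self.experts])
        self.register_buffer("cost_penalty", bias_strength * costs / costs.max())

    def forward(self, x):  # x: (batch, dim), e.g. pooled image features
        logits = self.router(x) - self.cost_penalty  # bias against costly experts
        weights = F.softmax(logits, dim=-1)
        idx = weights.argmax(dim=-1)  # top-1: unused experts are skipped entirely
        out = torch.zeros_like(x)
        for i, expert in enumerate(self.experts):
            mask = idx == i
            if mask.any():
                # Scaling by the gate weight keeps the router trainable.
                out[mask] = weights[mask, i:i+1] * expert(x[mask])
        return x + out  # residual connection
```

Because routing is top-1, inputs assigned to a light expert never execute the heavy ones, which is where the inference savings claimed in the abstract would come from.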
Related papers
- Phase-Aware Mixture of Experts for Agentic Reinforcement Learning [23.18318273534301]
A plausible remedy could be employing the Mixture-of-Experts (MoE) architecture in the policy network. MoE allows different parameters (experts) to specialize in different tasks, preventing simple tasks from dominating all parameters. We propose Phase-Aware Mixture of Experts (PA-MoE). It first features a lightweight phase router that learns latent phase boundaries directly from the RL objective without pre-defining phase categories. The phase router then makes temporally consistent assignments to the same expert, allowing experts to preserve phase-specific expertise.
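The summary only outlines the mechanism; one plausible reading is a lightweight router producing per-timestep phase probabilities, with a smoothness term that discourages phase flips between adjacent steps. The sketch below follows that assumption (`PhaseRouterSketch` is hypothetical, not the paper's code):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PhaseRouterSketch(nn.Module):
    """Hypothetical phase router: per-step states -> phase probabilities,
    plus a penalty on phase flips between consecutive timesteps (one
    reading of "temporally consistent assignments")."""
    def __init__(self, state_dim: int, num_phases: int):
        super().__init__()
        self.proj = nn.Linear(state_dim, num_phases)

    def forward(self, states):  # states: (T, state_dim), one trajectory
        probs = F.softmax(self.proj(states), dim=-1)       # (T, num_phases)
        consistency = (probs[1:] - probs[:-1]).abs().sum(-1).mean()
        return probs, consistency  # add `consistency` to the RL loss
```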
arXiv Detail & Related papers (2026-02-19T03:18:30Z)
- SAME: Stabilized Mixture-of-Experts for Multimodal Continual Instruction Tuning [83.66308307152808]
We propose StAbilized Mixture-of-Experts (SAME) for Multimodal Continual Instruction Tuning (MCIT). SAME stabilizes expert selection by decomposing routing dynamics into subspaces and updating only task-relevant directions. It also introduces adaptive expert activation to freeze selected experts during training, reducing redundancy and cross-task interference.
arXiv Detail & Related papers (2026-02-02T11:47:06Z)
- Split-on-Share: Mixture of Sparse Experts for Task-Agnostic Continual Learning [10.01449025634975]
Continual learning in Large Language Models (LLMs) is hindered by the plasticity-stability dilemma. We introduce SETA, a framework that resolves the plasticity-stability conflict by decomposing the model into modular subspaces. We show that SETA consistently outperforms state-of-the-art parameter-efficient fine-tuning-based continual learning methods.
arXiv Detail & Related papers (2026-01-24T22:39:22Z)
- MoE Pathfinder: Trajectory-driven Expert Pruning [19.790092938955336]
We propose an expert pruning approach based on the trajectory of activated experts across layers. Our approach achieves superior pruning performance on nearly all tasks compared with most existing approaches.
arXiv Detail & Related papers (2025-12-20T17:05:08Z)
- One-Prompt Strikes Back: Sparse Mixture of Experts for Prompt-based Continual Learning [52.966712416640085]
We propose SMoPE, a novel framework that integrates the benefits of both task-specific and shared prompt strategies. SMoPE consistently outperforms task-specific prompt methods and achieves performance competitive with state-of-the-art approaches.
arXiv Detail & Related papers (2025-09-29T08:54:58Z)
- MoTE: Mixture of Task-specific Experts for Pre-Trained Model-Based Class-incremental Learning [39.892628170627496]
Class-incremental learning (CIL) requires deep learning models to continuously acquire new knowledge from streaming data. Prompt-based approaches suffer from prompt overwriting, while adapter-based methods face challenges such as dimensional misalignment between tasks. We propose a mixture of task-specific experts (MoTE) framework that effectively mitigates the miscalibration caused by inconsistent output dimensions.
arXiv Detail & Related papers (2025-05-21T03:06:10Z)
- SEE: Continual Fine-tuning with Sequential Ensemble of Experts [25.96255683276355]
Continual fine-tuning of large language models (LLMs) suffers from catastrophic forgetting.
We introduce the Sequential Ensemble of Experts (SEE) framework.
SEE removes the need for an additional router, allowing each expert to independently decide whether a query should be handled.
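A minimal sketch of what router-free selection could look like, assuming each expert carries a small self-scoring head and the query goes to whichever expert reports the highest acceptance; `SelfSelectingExpert` and `accept_head` are illustrative names, not SEE's actual design:

```python
import torch
import torch.nn as nn

class SelfSelectingExpert(nn.Module):
    """Illustrative router-free expert: a small head scores the expert's
    own fitness for a query (hypothetical design, not SEE's code)."""
    def __init__(self, dim: int):
        super().__init__()
        self.accept_head = nn.Linear(dim, 1)  # "should I handle this query?"
        self.body = nn.Linear(dim, dim)       # stand-in for a fine-tuned expert

    def forward(self, x):
        return self.body(x)

    def accept_score(self, x):
        return self.accept_head(x).squeeze(-1)

def route_without_router(experts, query):  # query: (batch, dim)
    """Each expert scores itself; the highest bidder handles the query."""
    scores = torch.stack([e.accept_score(query) for e in experts], dim=-1)
    best = scores.argmax(dim=-1)
    out = torch.zeros_like(query)
    for i, expert in enumerate(experts):
        mask = best == i
        if mask.any():
            out[mask] = expert(query[mask])
    return out
```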
arXiv Detail & Related papers (2025-04-09T07:56:56Z)
- LLaVA-CMoE: Towards Continual Mixture of Experts for Large Vision-Language Models [21.888139819188105]
LLaVA-CMoE is a continual learning framework for large vision-language models. A Probe-Guided Knowledge Extension mechanism determines when and where new experts should be added. A Probabilistic Task Locator assigns each task a dedicated, lightweight router.
arXiv Detail & Related papers (2025-03-27T07:36:11Z)
- More Experts Than Galaxies: Conditionally-overlapping Experts With Biologically-Inspired Fixed Routing [5.846028298833611]
Conditionally Overlapping Mixture of ExperTs (COMET) is a general deep learning method that induces a modular, sparse architecture with an exponential number of overlapping experts.
We demonstrate the effectiveness of COMET on a range of tasks, including image classification, language modeling, and regression.
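One way to realize fixed, biologically-inspired routing is a frozen random projection whose top activations mask the hidden units, so every distinct mask acts as an implicit, overlapping expert. The sketch below follows that reading; it is an assumption, not COMET's published code:

```python
import torch
import torch.nn as nn

class FixedRoutingMLPSketch(nn.Module):
    """Fixed routing in the spirit of the abstract: a frozen random
    projection picks which hidden units fire, so each distinct binary
    mask acts as an implicit, overlapping expert (illustrative only)."""
    def __init__(self, dim: int, hidden: int, keep_frac: float = 0.25):
        super().__init__()
        self.fc1 = nn.Linear(dim, hidden)
        self.fc2 = nn.Linear(hidden, dim)
        self.register_buffer("route_w", torch.randn(dim, hidden))  # never trained
        self.k = max(1, int(keep_frac * hidden))

    def forward(self, x):  # x: (batch, dim)
        scores = x @ self.route_w                        # fixed routing scores
        topk = scores.topk(self.k, dim=-1).indices
        mask = torch.zeros_like(scores).scatter_(-1, topk, 1.0)
        return self.fc2(torch.relu(self.fc1(x)) * mask)  # only masked units active
```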
arXiv Detail & Related papers (2024-10-10T14:58:18Z)
- Beyond Parameter Count: Implicit Bias in Soft Mixture of Experts [44.09546603624385]
We introduce a notion of expert specialization for Soft MoE.
We show that when there are many small experts, the architecture is implicitly biased in a fashion that allows us to efficiently approximate the specialized expert subset.
arXiv Detail & Related papers (2024-09-02T00:39:00Z)
- Diversifying the Expert Knowledge for Task-Agnostic Pruning in Sparse Mixture-of-Experts [75.85448576746373]
We propose a method of grouping and pruning similar experts to improve the model's parameter efficiency.
We validate the effectiveness of our method by pruning three state-of-the-art MoE architectures.
The evaluation shows that our method outperforms other model pruning methods on a range of natural language tasks.
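A rough sketch of the grouping idea, assuming experts are compared by the cosine similarity of their flattened weights and the most similar pair is averaged until the target count is reached; the paper's actual grouping criterion may differ:

```python
import torch
import torch.nn.functional as F

def prune_similar_experts(expert_weights, keep: int):
    """Greedy sketch: repeatedly average the most cosine-similar pair of
    flattened expert weight vectors until `keep` experts remain. A
    simplification of the grouping idea, not the paper's algorithm."""
    experts = [w.clone() for w in expert_weights]
    while len(experts) > keep:
        W = F.normalize(torch.stack(experts), dim=-1)
        sim = W @ W.T
        sim.fill_diagonal_(-1.0)                  # ignore self-similarity
        i, j = divmod(int(sim.argmax()), sim.shape[1])
        merged = 0.5 * (experts[i] + experts[j])  # merge the similar pair
        experts = [w for k, w in enumerate(experts) if k not in (i, j)]
        experts.append(merged)
    return experts
```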
arXiv Detail & Related papers (2024-07-12T17:25:02Z)
- One Prompt is not Enough: Automated Construction of a Mixture-of-Expert Prompts [110.94724216491753]
Large Language Models (LLMs) exhibit strong generalization capabilities when prompted with language instructions and in-context demos.
Various methods have been explored to automate instruction design, but they restrict the searched prompt to a single instruction.
We adopt the Mixture-of-Experts paradigm and divide the problem space into a set of sub-regions.
A two-phase process is developed to construct the specialized expert for each region.
A region-based joint search of an instruction per expert complements the demos assigned to it, yielding a synergistic effect.
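The region-splitting step could be as simple as clustering example embeddings and dedicating one prompt expert per cluster. The toy k-means below illustrates that assumption; it is not the paper's construction procedure:

```python
import torch

def assign_regions(embeddings, num_regions: int, iters: int = 10):
    """Toy k-means that carves the problem space into sub-regions, one
    per prompt expert. Illustrative assumption, not the paper's actual
    partitioning procedure."""
    idx = torch.randperm(embeddings.shape[0])[:num_regions]
    centroids = embeddings[idx].clone()
    for _ in range(iters):
        assign = torch.cdist(embeddings, centroids).argmin(dim=-1)
        for r in range(num_regions):
            members = embeddings[assign == r]
            if len(members) > 0:                 # keep empty clusters as-is
                centroids[r] = members.mean(dim=0)
    return assign, centroids
```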
arXiv Detail & Related papers (2024-06-28T23:05:08Z)
- Generalization Error Analysis for Sparse Mixture-of-Experts: A Preliminary Study [65.11303133775857]
Mixture-of-Experts (MoE) computation amalgamates predictions from several specialized sub-models (referred to as experts).
Sparse MoE selectively engages only a limited number, or even just one expert, significantly reducing overhead while empirically preserving, and sometimes even enhancing, performance.
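For reference, the generic top-k gating mechanism the abstract analyzes looks roughly like this (a textbook sketch, not code from the paper):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopKGate(nn.Module):
    """Textbook sparse-MoE gate: route each token to its top-k experts
    and renormalize the gate weights over the selected set."""
    def __init__(self, dim: int, num_experts: int, k: int = 2):
        super().__init__()
        self.router = nn.Linear(dim, num_experts)
        self.k = k

    def forward(self, x):  # x: (tokens, dim)
        topk_vals, topk_idx = self.router(x).topk(self.k, dim=-1)
        weights = F.softmax(topk_vals, dim=-1)  # weights over chosen experts only
        return topk_idx, weights                # which experts to run, and how much
```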
arXiv Detail & Related papers (2024-03-26T05:48:02Z)
- Harder Tasks Need More Experts: Dynamic Routing in MoE Models [58.18526590138739]
We introduce a novel dynamic expert selection framework for Mixture of Experts (MoE) models.
Our method dynamically selects experts based on the confidence level in expert selection for each input.
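Confidence-driven selection is commonly implemented as top-p routing over the router distribution: confident tokens activate one expert, ambiguous ones recruit more. The sketch below assumes that formulation; the paper's exact rule may differ:

```python
import torch
import torch.nn.functional as F

def dynamic_top_p_routing(logits, p: float = 0.9):
    """Keep the smallest expert set whose cumulative router probability
    reaches `p`: confident tokens use one expert, ambiguous ("harder")
    tokens recruit more. Illustrative formulation only."""
    probs = F.softmax(logits, dim=-1)
    sorted_p, sorted_idx = probs.sort(dim=-1, descending=True)
    cum = sorted_p.cumsum(dim=-1)
    keep = cum - sorted_p < p  # include the expert that crosses the threshold
    mask = torch.zeros_like(probs, dtype=torch.bool).scatter(-1, sorted_idx, keep)
    return mask  # boolean: which experts each token activates
```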
arXiv Detail & Related papers (2024-03-12T13:41:15Z)
- Improving Expert Specialization in Mixture of Experts [0.7366405857677227]
Mixture of experts (MoE) is the simplest gated modular neural network architecture.
We show that the original MoE architecture and its training method do not guarantee intuitive task decompositions and good expert utilization.
We introduce a novel gating architecture, similar to attention, that improves performance and results in a lower entropy task decomposition.
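An attention-like gate can be sketched as learned per-expert keys attended to by the input's query, with an entropy penalty pushing assignments toward crispness; this is one reading of the abstract, and `AttentiveGateSketch` is hypothetical:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AttentiveGateSketch(nn.Module):
    """Attention-style gate: the input's query attends over learned
    per-expert keys; an entropy penalty pushes the gate toward crisp,
    low-entropy expert assignments (hypothetical reading)."""
    def __init__(self, dim: int, num_experts: int):
        super().__init__()
        self.query = nn.Linear(dim, dim)
        self.expert_keys = nn.Parameter(torch.randn(num_experts, dim) / dim ** 0.5)

    def forward(self, x):  # x: (batch, dim)
        attn = F.softmax(self.query(x) @ self.expert_keys.T / x.shape[-1] ** 0.5, dim=-1)
        entropy = -(attn * attn.clamp_min(1e-9).log()).sum(-1).mean()
        return attn, entropy  # add `entropy` to the loss as a regularizer
```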
arXiv Detail & Related papers (2023-02-28T16:16:45Z)
- MoEC: Mixture of Expert Clusters [93.63738535295866]
Sparsely Mixture of Experts (MoE) has received great interest due to its promising scaling capability with affordable computational overhead.
MoE converts dense layers into sparse experts, and utilizes a gated routing network to make experts conditionally activated.
However, as the number of experts grows, MoE with an outrageous number of parameters suffers from overfitting and sparse data allocation.
arXiv Detail & Related papers (2022-07-19T06:09:55Z)
- Towards Collaborative Question Answering: A Preliminary Study [63.91687114660126]
We propose CollabQA, a novel QA task in which several expert agents coordinated by a moderator work together to answer questions that cannot be answered with any single agent alone.
We construct a synthetic dataset of a large knowledge graph that can be distributed to experts.
We show that the problem can be challenging without introducing priors on the collaboration structure, unless experts are perfect and uniform.
arXiv Detail & Related papers (2022-01-24T14:27:00Z)
This list is automatically generated from the titles and abstracts of the papers on this site.