Related papers: M2Restore: Mixture-of-Experts-based Mamba-CNN Fusion Framework for All-in-One Image Restoration

M2Restore: Mixture-of-Experts-based Mamba-CNN Fusion Framework for All-in-One Image Restoration

URL: http://arxiv.org/abs/2506.07814v1
Date: Mon, 09 Jun 2025 14:43:39 GMT
Title: M2Restore: Mixture-of-Experts-based Mamba-CNN Fusion Framework for All-in-One Image Restoration
Authors: Yongzhen Wang, Yongjun Li, Zhuoran Zheng, Xiao-Ping Zhang, Mingqiang Wei,
Abstract summary: M2Restore is a novel Mixture-of-Experts (MoE)-based Mamba-CNN fusion framework for efficient and robust all-in-one image restoration.<n>We exploit a CLIP-guided MoE gating mechanism that fuses task-conditioned prompts with CLIP-derived semantic priors.<n>We design a dual-stream architecture that integrates the localized representational strength of CNNs with the long-range modeling efficiency of Mamba.
Score: 24.997494864875883
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Natural images are often degraded by complex, composite degradations such as rain, snow, and haze, which adversely impact downstream vision applications. While existing image restoration efforts have achieved notable success, they are still hindered by two critical challenges: limited generalization across dynamically varying degradation scenarios and a suboptimal balance between preserving local details and modeling global dependencies. To overcome these challenges, we propose M2Restore, a novel Mixture-of-Experts (MoE)-based Mamba-CNN fusion framework for efficient and robust all-in-one image restoration. M2Restore introduces three key contributions: First, to boost the model's generalization across diverse degradation conditions, we exploit a CLIP-guided MoE gating mechanism that fuses task-conditioned prompts with CLIP-derived semantic priors. This mechanism is further refined via cross-modal feature calibration, which enables precise expert selection for various degradation types. Second, to jointly capture global contextual dependencies and fine-grained local details, we design a dual-stream architecture that integrates the localized representational strength of CNNs with the long-range modeling efficiency of Mamba. This integration enables collaborative optimization of global semantic relationships and local structural fidelity, preserving global coherence while enhancing detail restoration. Third, we introduce an edge-aware dynamic gating mechanism that adaptively balances global modeling and local enhancement by reallocating computational attention to degradation-sensitive regions. This targeted focus leads to more efficient and precise restoration. Extensive experiments across multiple image restoration benchmarks validate the superiority of M2Restore in both visual quality and quantitative performance.

Related papers

AlignVAR: Towards Globally Consistent Visual Autoregression for Image Super-Resolution [16.90182090355781]
Visual autoregressive models offer stable training, non-iterative inference, and high-fidelity synthesis through next-scale prediction.<n>But their application remains underexplored and faces two critical challenges: locality-biased attention, and residual-only supervision.<n>We propose a globally consistent visual autoregressive framework tailored for image super-resolution.
arXiv Detail & Related papers (2026-02-28T10:39:06Z)
M3SR: Multi-Scale Multi-Perceptual Mamba for Efficient Spectral Reconstruction [47.507960245579106]
We propose a multi-scale, multi-perceptual Mamba architecture for the spectral reconstruction task, called M3SR.<n>Specifically, we design a multi-perceptual fusion block to enhance the ability of the model to comprehensively understand and analyze the input features.
arXiv Detail & Related papers (2026-01-13T07:33:38Z)
ClearAIR: A Human-Visual-Perception-Inspired All-in-One Image Restoration [40.50200240865161]
All-in-One Image Restoration (AiOIR) has advanced significantly, offering promising solutions for complex real-world degradations.<n>We propose ClearAIR, a novel AiOIR framework inspired by Human Visual Perception (HVP) and designed with a hierarchical, coarse-to-fine restoration strategy.<n> Experimental results show that ClearAIR achieves superior performance across diverse synthetic and real-world datasets.
arXiv Detail & Related papers (2026-01-06T06:55:08Z)
Mixture of Ranks with Degradation-Aware Routing for One-Step Real-World Image Super-Resolution [76.66229730098759]
In real-world image super-resolution (Real-ISR), existing approaches mainly rely on fine-tuning pre-trained diffusion models.<n>We propose a Mixture-of-Ranks (MoR) architecture for single-step image super-resolution.<n>We introduce a fine-grained expert partitioning strategy that treats each rank in LoRA as an independent expert.
arXiv Detail & Related papers (2025-11-20T04:11:44Z)
Modality Alignment with Multi-scale Bilateral Attention for Multimodal Recommendation [9.91438130100011]
MambaRec is a novel framework that integrates local feature alignment and global distribution regularization.<n>DREAM module captures hierarchical relationships and context-aware associations, improving cross-modal semantic modeling.<n>Experiments on real-world e-commerce datasets show that MambaRec outperforms existing methods in fusion quality, generalization, and efficiency.
arXiv Detail & Related papers (2025-09-11T02:52:26Z)
RD-UIE: Relation-Driven State Space Modeling for Underwater Image Enhancement [59.364418120895]
Underwater image enhancement (UIE) is a critical preprocessing step for marine vision applications.<n>We develop a novel relation-driven Mamba framework for effective UIE (RD-UIE)<n>Experiments on underwater enhancement benchmarks demonstrate RD-UIE outperforms the state-of-the-art approach WMamba.
arXiv Detail & Related papers (2025-05-02T12:21:44Z)
Any Image Restoration via Efficient Spatial-Frequency Degradation Adaptation [158.37640586809187]
Restoring any degraded image efficiently via just one model has become increasingly significant.<n>Our approach, termed AnyIR, takes a unified path that leverages inherent similarity across various degradations.<n>To fuse the degradation awareness and the contextualized attention, a spatial-frequency parallel fusion strategy is proposed.
arXiv Detail & Related papers (2025-04-19T09:54:46Z)
An Efficient and Mixed Heterogeneous Model for Image Restoration [71.85124734060665]
Current mainstream approaches are based on three architectural paradigms: CNNs, Transformers, and Mambas.<n>We propose RestorMixer, an efficient and general-purpose IR model based on mixed-architecture fusion.
arXiv Detail & Related papers (2025-04-15T08:19:12Z)
Contrast: A Hybrid Architecture of Transformers and State Space Models for Low-Level Vision [3.574664325523221]
We propose textbfContrast, a hybrid SR model that combines textbfConvolutional, textbfTransformer, and textbfState Space components.<n>By integrating transformer and state space mechanisms, textbfContrast compensates for the shortcomings of each approach, enhancing both global context modeling and pixel-level accuracy.
arXiv Detail & Related papers (2025-01-23T03:34:14Z)
Detail Matters: Mamba-Inspired Joint Unfolding Network for Snapshot Spectral Compressive Imaging [40.80197280147993]
We propose a Mamba-inspired Joint Unfolding Network (MiJUN) to overcome the inherent nonlinear and ill-posed characteristics of HSI reconstruction.<n>We introduce an accelerated unfolding network scheme, which reduces the reliance on initial optimization stages.<n>We refine the scanning strategy with Mamba by integrating the tensor mode-$k$ unfolding into the Mamba network.
arXiv Detail & Related papers (2025-01-02T13:56:23Z)
Mixed Degradation Image Restoration via Local Dynamic Optimization and Conditional Embedding [67.57487747508179]
Multiple-in-one image restoration (IR) has made significant progress, aiming to handle all types of single degraded image restoration with a single model. In this paper, we propose a novel multiple-in-one IR model that can effectively restore images with both single and mixed degradations.
arXiv Detail & Related papers (2024-11-25T09:26:34Z)
Preserving Multi-Modal Capabilities of Pre-trained VLMs for Improving Vision-Linguistic Compositionality [69.76121008898677]
Fine-grained Selective Calibrated CLIP integrates local hard negative loss and selective calibrated regularization. Our evaluations show that FSC-CLIP not only achieves compositionality on par with state-of-the-art models but also retains strong multi-modal capabilities.
arXiv Detail & Related papers (2024-10-07T17:16:20Z)
Restore Anything Model via Efficient Degradation Adaptation [129.38475243424563]
RAM takes a unified path that leverages inherent similarities across various degradations to enable efficient and comprehensive restoration.<n> RAM's SOTA performance confirms RAM's SOTA performance, reducing model complexity by approximately 82% in trainable parameters and 85% in FLOPs.
arXiv Detail & Related papers (2024-07-18T10:26:53Z)
IRSRMamba: Infrared Image Super-Resolution via Mamba-based Wavelet Transform Feature Modulation Model [7.842507196763463]
IRSRMamba is a novel framework integrating wavelet transform feature modulation for multi-scale adaptation.<n>IRSRMamba outperforms state-of-the-art methods in PSNR, SSIM, and perceptual quality.<n>This work establishes Mamba-based architectures as a promising direction for high-fidelity IR image enhancement.
arXiv Detail & Related papers (2024-05-16T07:49:24Z)
Modality-Collaborative Transformer with Hybrid Feature Reconstruction for Robust Emotion Recognition [35.15390769958969]
We propose a unified framework, Modality-Collaborative Transformer with Hybrid Feature Reconstruction (MCT-HFR) MCT-HFR consists of a novel attention-based encoder which concurrently extracts and dynamically balances the intra- and inter-modality relations. During model training, LFI leverages complete features as supervisory signals to recover local missing features, while GFA is designed to reduce the global semantic gap between pairwise complete and incomplete representations.
arXiv Detail & Related papers (2023-12-26T01:59:23Z)

This list is automatically generated from the titles and abstracts of the papers in this site.