Related papers: Q-Mamba: On First Exploration of Vision Mamba for Image Quality Assessment

URL: http://arxiv.org/abs/2406.09546v1
Date: Thu, 13 Jun 2024 19:21:01 GMT
Title: Q-Mamba: On First Exploration of Vision Mamba for Image Quality Assessment
Authors: Fengbin Guan, Xin Li, Zihao Yu, Yiting Lu, Zhibo Chen
Abstract summary: We take the first exploration of the recently popular foundation model, i.e., State Space Model/Mamba, in image quality assessment. We propose Q-Mamba by revisiting and adapting the Mamba model for three crucial IQA tasks. Our proposed StylePrompt enables better perception transfer capability with less computational cost.
Score: 15.320011514412437
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: In this work, we take the first exploration of the recently popular foundation model, i.e., State Space Model/Mamba, in image quality assessment, aiming at observing and excavating the perception potential in vision Mamba. A series of works on Mamba has shown its significant potential in various fields, e.g., segmentation and classification. However, the perception capability of Mamba has been under-explored. Consequently, we propose Q-Mamba by revisiting and adapting the Mamba model for three crucial IQA tasks, i.e., task-specific, universal, and transferable IQA, which reveals that the Mamba model has obvious advantages compared with existing foundational models, e.g., Swin Transformer, ViT, and CNNs, in terms of perception and computational cost for IQA. To increase the transferability of Q-Mamba, we propose the StylePrompt tuning paradigm, where the basic lightweight mean and variance prompts are injected to assist the task-adaptive transfer learning of pre-trained Q-Mamba for different downstream IQA tasks. Compared with existing prompt tuning strategies, our proposed StylePrompt enables better perception transfer capability with less computational cost. Extensive experiments on multiple synthetic, authentic IQA datasets, and cross IQA datasets have demonstrated the effectiveness of our proposed Q-Mamba.

Related papers

TransMamba: Fast Universal Architecture Adaption from Transformers to Mamba [88.31117598044725]
We explore cross-architecture training to transfer the ready knowledge in existing Transformer models to the alternative architecture Mamba, termed TransMamba. Our approach employs a two-stage strategy to expedite training new Mamba models, ensuring effectiveness across uni-modal and cross-modal tasks. For cross-modal learning, we propose a cross-Mamba module that integrates language awareness into Mamba's visual features, enhancing the cross-modal interaction capabilities of the Mamba architecture.
arXiv Detail & Related papers (2025-02-21T01:22:01Z)
From Markov to Laplace: How Mamba In-Context Learns Markov Chains [36.22373318908893]
We study in-context learning on Markov chains and uncover a surprising phenomenon. Unlike transformers, even a single-layer Mamba efficiently learns the in-context Laplacian smoothing estimator. These theoretical insights align strongly with empirical results and represent the first formal connection between Mamba and optimal statistical estimators.
arXiv Detail & Related papers (2025-02-14T14:13:55Z)
PTQ4VM: Post-Training Quantization for Visual Mamba [9.446971590056945]
We propose Post-Training Quantization for Visual Mamba (PTQ4VM), which introduces two key strategies: Per-Token Static (PTS) quantization and Joint Learning of Smoothing Scale and Step Size (JLSS). PTQ4VM can be applied to various Visual Mamba backbones, converting the pretrained model to a quantized format in under 15 minutes without notable quality degradation.
arXiv Detail & Related papers (2024-12-29T07:21:33Z)
Mamba-SEUNet: Mamba UNet for Monaural Speech Enhancement [54.427965535613886]
Mamba, as a novel state-space model (SSM), has gained widespread application in natural language processing and computer vision. In this work, we introduce Mamba-SEUNet, an innovative architecture that integrates Mamba with U-Net for SE tasks.
arXiv Detail & Related papers (2024-12-21T13:43:51Z)
ReMamba: Equip Mamba with Effective Long-Sequence Modeling [50.530839868893786]
We propose ReMamba, which enhances Mamba's ability to comprehend long contexts. ReMamba incorporates selective compression and adaptation techniques within a two-stage re-forward process.
arXiv Detail & Related papers (2024-08-28T02:47:27Z)
A Survey of Mamba [27.939712558507516]
Recently, a novel architecture named Mamba has emerged as a promising alternative for building foundation models. This study investigates the advancements of Mamba-based models, the techniques of adapting Mamba to diverse data, and the applications where Mamba can excel.
arXiv Detail & Related papers (2024-08-02T09:18:41Z)
MambaVision: A Hybrid Mamba-Transformer Vision Backbone [54.965143338206644]
We propose a novel hybrid Mamba-Transformer backbone, denoted as MambaVision, which is specifically tailored for vision applications. Our core contribution includes redesigning the Mamba formulation to enhance its capability for efficient modeling of visual features. We conduct a comprehensive ablation study on the feasibility of integrating Vision Transformers (ViT) with Mamba.
arXiv Detail & Related papers (2024-07-10T23:02:45Z)
Venturing into Uncharted Waters: The Navigation Compass from Transformer to Mamba [77.21394300708172]
Transformer, a deep neural network architecture, has long dominated the field of natural language processing and beyond. The recent introduction of Mamba challenges its supremacy, sparks considerable interest among researchers, and gives rise to a series of Mamba-based models that have exhibited notable potential. This survey paper orchestrates a comprehensive discussion, diving into essential research dimensions, covering: (i) the functioning of the Mamba mechanism and its foundation on the principles of structured state space models; (ii) the proposed improvements and the integration of Mamba with various networks, exploring its potential as a substitute for Transformers; (iii) the combination of
arXiv Detail & Related papers (2024-06-24T15:27:21Z)
Demystify Mamba in Vision: A Linear Attention Perspective [72.93213667713493]
Mamba is an effective state space model with linear computation complexity. We show that Mamba shares surprising similarities with linear attention Transformer. We propose a Mamba-Like Linear Attention (MLLA) model by incorporating the merits of these two key designs into linear attention.
arXiv Detail & Related papers (2024-05-26T15:31:09Z)
MambaOut: Do We Really Need Mamba for Vision? [70.60495392198686]
Mamba, an architecture with RNN-like token mixer of state space model (SSM), was recently introduced to address the quadratic complexity of the attention mechanism. This paper conceptually concludes that Mamba is ideally suited for tasks with long-sequence and autoregressive characteristics. We construct a series of models named MambaOut through stacking Mamba blocks while removing their core token mixer, SSM.
arXiv Detail & Related papers (2024-05-13T17:59:56Z)
An Investigation of Incorporating Mamba for Speech Enhancement [45.610243349192096]
We exploit a Mamba-based regression model to characterize speech signals and build an SE system upon Mamba, termed SEMamba. SEMamba demonstrates promising results and attains a PESQ score of 3.55 on the VoiceBank-DEMAND dataset.
arXiv Detail & Related papers (2024-05-10T16:18:49Z)
CLIP-Mamba: CLIP Pretrained Mamba Models with OOD and Hessian Evaluation [18.383760896304604]
This report introduces the first attempt to train a Mamba model utilizing contrastive language-image pretraining (CLIP). A Mamba model with 67 million parameters is on par with a 307-million-parameter Vision Transformer (ViT) model in zero-shot classification tasks.
arXiv Detail & Related papers (2024-04-30T09:40:07Z)
Visual Mamba: A Survey and New Outlooks [33.90213491829634]
Mamba, a recent selective structured state space model, excels in long sequence modeling. Since January 2024, Mamba has been actively applied to diverse computer vision tasks. This paper reviews visual Mamba approaches, analyzing over 200 papers.
arXiv Detail & Related papers (2024-04-29T16:51:30Z)
Swin-UMamba: Mamba-based UNet with ImageNet-based pretraining [85.08169822181685]
This paper introduces a novel Mamba-based model, Swin-UMamba, designed specifically for medical image segmentation tasks. Swin-UMamba demonstrates superior performance with a large margin compared to CNNs, ViTs, and latest Mamba-based models.
arXiv Detail & Related papers (2024-02-05T18:58:11Z)
Is Mamba Capable of In-Context Learning? [63.682741783013306]
State-of-the-art foundation models such as GPT-4 perform surprisingly well at in-context learning (ICL). This work provides empirical evidence that Mamba, a newly proposed state space model, has similar ICL capabilities.
arXiv Detail & Related papers (2024-02-05T16:39:12Z)

This list is automatically generated from the titles and abstracts of the papers on this site.