Vision Mamba Distillation for Low-resolution Fine-grained Image Classification
- URL: http://arxiv.org/abs/2411.17980v1
- Date: Wed, 27 Nov 2024 01:29:44 GMT
- Title: Vision Mamba Distillation for Low-resolution Fine-grained Image Classification
- Authors: Yao Chen, Jiabao Wang, Peichao Wang, Rui Zhang, Yang Li
- Abstract summary: We propose a Vision Mamba Distillation (ViMD) approach to enhance the effectiveness and efficiency of low-resolution fine-grained image classification.
ViMD outperforms similar methods while using fewer parameters and FLOPs, making it more suitable for embedded-device applications.
- Score: 11.636461046632183
- Abstract: Low-resolution fine-grained image classification has recently made significant progress, largely thanks to super-resolution techniques and knowledge distillation methods. However, these approaches lead to an exponential increase in the number of parameters and in the computational complexity of models. To solve this problem, in this letter we propose a Vision Mamba Distillation (ViMD) approach to enhance the effectiveness and efficiency of low-resolution fine-grained image classification. Concretely, a lightweight super-resolution vision Mamba classification network (SRVM-Net) is proposed, which improves visual feature extraction by redesigning the classification sub-network with Mamba modeling. Moreover, we design a novel multi-level Mamba knowledge distillation loss that boosts performance by transferring prior knowledge from a high-resolution vision Mamba classification network (HRVM-Net) teacher into the proposed SRVM-Net student. Extensive experiments on seven public fine-grained classification benchmark datasets confirm that ViMD achieves new state-of-the-art performance. While attaining higher accuracy, ViMD uses fewer parameters and FLOPs than similar methods, making it more suitable for embedded-device applications. Code is available at https://github.com/boa2004plaust/ViMD.
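The abstract does not spell out the multi-level distillation loss; a minimal sketch of what such teacher-to-student transfer could look like, assuming hidden states are matched at several Mamba stages with an MSE term plus a temperature-scaled KL term on logits (all names and weightings here are illustrative, not from the paper):

```python
import torch
import torch.nn.functional as F

def multi_level_distill_loss(student_feats, teacher_feats,
                             student_logits, teacher_logits,
                             alpha=0.5, tau=4.0):
    """Hypothetical multi-level distillation objective: align intermediate
    Mamba-stage features with MSE and soften the logits with a KL term."""
    # Feature-level transfer: one MSE term per matched stage.
    feat_loss = sum(F.mse_loss(s, t.detach())
                    for s, t in zip(student_feats, teacher_feats))
    # Logit-level transfer: temperature-scaled KL divergence.
    kd_loss = F.kl_div(F.log_softmax(student_logits / tau, dim=-1),
                       F.softmax(teacher_logits.detach() / tau, dim=-1),
                       reduction="batchmean") * tau * tau
    return alpha * feat_loss + (1.0 - alpha) * kd_loss
```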
Related papers
- MambaLiteSR: Image Super-Resolution with Low-Rank Mamba using Knowledge Distillation [0.5243460995467893]
MambaLiteSR is a novel lightweight image Super-Resolution (SR) model that utilizes the architecture of Vision Mamba.
We show that MambaLiteSR achieves performance comparable to both the baseline and other edge models while using 15% fewer parameters.
It also reduces power consumption by up to 58% compared to state-of-the-art SR edge models, all while maintaining low energy use during training.
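The abstract does not detail how the low-rank Mamba is built; one common way to cut parameters, sketched below under that assumption, is to factor each dense projection into two thin matrices of rank r much smaller than the layer width:

```python
import torch.nn as nn

class LowRankLinear(nn.Module):
    """Replace an m-by-n dense projection with an n->r and r->m pair,
    shrinking parameters from m*n to roughly r*(m+n). Illustrative only."""
    def __init__(self, in_features, out_features, rank):
        super().__init__()
        self.down = nn.Linear(in_features, rank, bias=False)   # n -> r
        self.up = nn.Linear(rank, out_features, bias=True)     # r -> m

    def forward(self, x):
        return self.up(self.down(x))
```

For example, with in_features = out_features = 256 and rank 32, the parameter count drops from 65,536 to about 16,640.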
arXiv Detail & Related papers (2025-02-19T20:32:03Z)
- Global Semantic-Guided Sub-image Feature Weight Allocation in High-Resolution Large Vision-Language Models [50.98559225639266]
Sub-images with higher semantic relevance to the entire image carry richer visual information and matter more for preserving the model's visual understanding ability.
The Global Semantic-guided Weight Allocator (GSWA) module allocates weights to sub-images based on their relative information density.
SleighVL, a lightweight yet high-performing model, outperforms models with comparable parameters and remains competitive with larger models.
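How GSWA measures relative information density is not given here; a plausible sketch, assuming each sub-image is scored by the cosine similarity of its embedding to the global-image embedding and the scores are normalized with a softmax:

```python
import torch
import torch.nn.functional as F

def allocate_subimage_weights(sub_embeds, global_embed, temperature=0.1):
    """Weight each sub-image by its semantic relevance to the whole image.
    sub_embeds: (num_subimages, dim); global_embed: (dim,). Illustrative."""
    sims = F.cosine_similarity(sub_embeds, global_embed.unsqueeze(0), dim=-1)
    weights = torch.softmax(sims / temperature, dim=0)   # sums to 1
    return weights  # scale each sub-image feature by its weight
```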
arXiv Detail & Related papers (2025-01-24T06:42:06Z)
- MambaVision: A Hybrid Mamba-Transformer Vision Backbone [54.965143338206644]
We propose a novel hybrid Mamba-Transformer backbone, denoted as MambaVision, which is specifically tailored for vision applications.
Our core contribution includes redesigning the Mamba formulation to enhance its capability for efficient modeling of visual features.
We conduct a comprehensive ablation study on the feasibility of integrating Vision Transformers (ViT) with Mamba.
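The abstract only states that the backbone mixes Mamba and Transformer blocks; a schematic stage under that reading might interleave sequence-mixer blocks with self-attention blocks. The mixer below is a simple stand-in (LayerNorm, Linear, SiLU), not the paper's actual Mamba formulation:

```python
import torch.nn as nn

class HybridStage(nn.Module):
    """Schematic hybrid stage: residual sequence-mixer blocks followed by
    residual self-attention blocks. Layout is illustrative only."""
    def __init__(self, dim, n_mixer=2, n_attn=2, heads=8):
        super().__init__()
        self.mixers = nn.ModuleList(
            nn.Sequential(nn.LayerNorm(dim), nn.Linear(dim, dim), nn.SiLU())
            for _ in range(n_mixer))
        self.attns = nn.ModuleList(
            nn.MultiheadAttention(dim, heads, batch_first=True)
            for _ in range(n_attn))
        self.norms = nn.ModuleList(nn.LayerNorm(dim) for _ in range(n_attn))

    def forward(self, x):           # x: (batch, tokens, dim)
        for blk in self.mixers:
            x = x + blk(x)          # residual mixer blocks
        for norm, attn in zip(self.norms, self.attns):
            h = norm(x)
            x = x + attn(h, h, h, need_weights=False)[0]  # residual attention
        return x
```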
arXiv Detail & Related papers (2024-07-10T23:02:45Z)
- Enhancing Global Sensitivity and Uncertainty Quantification in Medical Image Reconstruction with Monte Carlo Arbitrary-Masked Mamba [22.852768590511058]
We introduce MambaMIR, an Arbitrary-Masked Mamba-based model with wavelet decomposition for joint medical image reconstruction and uncertainty estimation.
A novel Arbitrary Scan Masking (ASM) mechanism "masks out" redundant information to introduce randomness for further uncertainty estimation.
For further texture preservation and better perceptual quality, we incorporate the wavelet transformation into MambaMIR and explore a variant based on the Generative Adversarial Network, namely MambaMIR-GAN.
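The ASM mechanism is not specified here beyond randomly masking redundant information; a minimal Monte-Carlo sketch under that assumption, where randomly masked forward passes yield a mean reconstruction and a per-pixel variance map (the model and mask ratio are placeholders):

```python
import torch

@torch.no_grad()
def mc_masked_inference(model, x, n_samples=8, mask_ratio=0.25):
    """Run n_samples stochastic passes with random masking and return the
    mean reconstruction plus a per-pixel uncertainty (variance) map."""
    outs = []
    for _ in range(n_samples):
        mask = (torch.rand_like(x) > mask_ratio).float()  # keep ~75% of x
        outs.append(model(x * mask))
    outs = torch.stack(outs)                  # (n_samples, ...)
    return outs.mean(dim=0), outs.var(dim=0)  # reconstruction, uncertainty
```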
arXiv Detail & Related papers (2024-05-27T21:04:43Z)
- VMambaCC: A Visual State Space Model for Crowd Counting [3.688427498755018]
We propose a novel VMambaCC (VMamba Crowd Counting) model.
VMambaCC inherits the merits of VMamba, namely global modeling of images and low computational cost.
We present a High-level Semantic Supervised Feature Pyramid Network (HS2PFN) that progressively integrates and enhances high-level semantic information with low-level semantic information.
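The abstract describes HS2PFN only as progressively injecting high-level semantics into low-level features; a compact top-down fusion sketch in that spirit (the channel handling and attention gate are assumptions):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SemanticGuidedFusion(nn.Module):
    """Gate a low-level feature map with attention derived from the
    upsampled high-level map, then merge. Illustrative layout only."""
    def __init__(self, high_ch, low_ch):
        super().__init__()
        self.gate = nn.Sequential(nn.Conv2d(high_ch, low_ch, 1), nn.Sigmoid())
        self.merge = nn.Conv2d(low_ch, low_ch, 3, padding=1)

    def forward(self, high, low):
        # Upsample high-level semantics to the low-level spatial size.
        high = F.interpolate(high, size=low.shape[-2:], mode="bilinear",
                             align_corners=False)
        return self.merge(low * self.gate(high) + low)
```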
arXiv Detail & Related papers (2024-05-07T03:30:57Z)
- DVMSR: Distillated Vision Mamba for Efficient Super-Resolution [7.551130027327461]
We propose DVMSR, a novel lightweight Image SR network that incorporates Vision Mamba and a distillation strategy.
Our proposed DVMSR outperforms state-of-the-art efficient SR methods in terms of model parameters while maintaining comparable reconstruction performance.
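Since the comparison is on model parameters, a quick way to tally them for any candidate network (standard PyTorch, nothing specific to DVMSR; the toy SR head is a made-up example):

```python
import torch.nn as nn

def count_parameters(model: nn.Module) -> int:
    """Total trainable parameter count, the metric used in such comparisons."""
    return sum(p.numel() for p in model.parameters() if p.requires_grad)

# Example: a toy 3-layer conv SR head with a 2x PixelShuffle upsampler.
toy = nn.Sequential(nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
                    nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(),
                    nn.Conv2d(32, 3 * 4, 3, padding=1), nn.PixelShuffle(2))
print(count_parameters(toy))  # prints 13612
```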
arXiv Detail & Related papers (2024-05-05T17:34:38Z)
- Multi-Modal Prompt Learning on Blind Image Quality Assessment [65.0676908930946]
Image Quality Assessment (IQA) models benefit significantly from semantic information, which allows them to treat different types of objects distinctly.
Traditional methods, hindered by a lack of sufficiently annotated data, have employed the CLIP image-text pretraining model as their backbone to gain semantic awareness.
Recent approaches have attempted to address the mismatch between CLIP's generic pretraining and the demands of IQA using prompt technology, but these solutions have shortcomings.
This paper introduces an innovative multi-modal prompt-based methodology for IQA.
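A common prompt-based IQA baseline (in the spirit of CLIP-IQA, not necessarily this paper's method) scores an image by contrasting antonym prompts; sketched below with Hugging Face's CLIP wrapper as an assumed dependency:

```python
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

def clip_quality_score(image: Image.Image) -> float:
    """Contrast antonym prompts; the softmax weight on the positive prompt
    serves as a zero-shot quality score (CLIP-IQA-style baseline)."""
    model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
    processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")
    inputs = processor(text=["a good photo.", "a bad photo."],
                       images=image, return_tensors="pt", padding=True)
    with torch.no_grad():
        logits = model(**inputs).logits_per_image  # shape (1, 2)
    return logits.softmax(dim=-1)[0, 0].item()
```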
arXiv Detail & Related papers (2024-04-23T11:45:32Z)
- MedMamba: Vision Mamba for Medical Image Classification [0.0]
Vision transformers (ViTs) and convolutional neural networks (CNNs) have been extensively studied and widely used in medical image classification tasks.
Recent studies have shown that state space models (SSMs) represented by Mamba can effectively model long-range dependencies.
We propose MedMamba, the first Vision Mamba for generalized medical image classification.
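The long-range modeling comes from the underlying state-space recurrence h_t = A h_{t-1} + B x_t, y_t = C h_t; a minimal, unbatched scan to illustrate the idea (Mamba additionally makes A, B, C input-dependent, which this sketch omits):

```python
import torch

def ssm_scan(x, A, B, C):
    """Minimal linear state-space scan over a sequence.
    x: (T, d_in); A: (d_state, d_state); B: (d_state, d_in);
    C: (d_out, d_state). Fixed matrices, i.e. a non-selective SSM."""
    h = torch.zeros(A.shape[0])
    ys = []
    for t in range(x.shape[0]):
        h = A @ h + B @ x[t]      # state update carries long-range context
        ys.append(C @ h)          # readout
    return torch.stack(ys)        # (T, d_out)
```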
arXiv Detail & Related papers (2024-03-06T16:49:33Z)
- Swin-UMamba: Mamba-based UNet with ImageNet-based pretraining [85.08169822181685]
This paper introduces a novel Mamba-based model, Swin-UMamba, designed specifically for medical image segmentation tasks.
Swin-UMamba demonstrates superior performance, outperforming CNNs, ViTs, and the latest Mamba-based models by a large margin.
arXiv Detail & Related papers (2024-02-05T18:58:11Z)
- Diffusion-based Visual Counterfactual Explanations -- Towards Systematic Quantitative Evaluation [64.0476282000118]
Latest methods for visual counterfactual explanations (VCE) harness the power of deep generative models to synthesize new examples of high-dimensional images of impressive quality.
It is currently difficult to compare the performance of these VCE methods, as evaluation procedures vary widely and often boil down to visual inspection of individual examples and small-scale user studies.
We propose a framework for systematic, quantitative evaluation of the VCE methods and a minimal set of metrics to be used.
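A minimal pair of metrics in the spirit of such a framework (the paper's exact metric set may differ): validity, i.e. whether the counterfactual flips the classifier to the target class, and proximity, the L1 distance to the original image:

```python
import torch

@torch.no_grad()
def vce_metrics(classifier, x, x_cf, target_class):
    """Two basic counterfactual-quality metrics: does x_cf reach the target
    class (validity), and how little did the image change (proximity)?"""
    validity = (classifier(x_cf).argmax(dim=-1) == target_class).float().mean()
    proximity = (x_cf - x).abs().mean()  # lower is better
    return validity.item(), proximity.item()
```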
arXiv Detail & Related papers (2023-08-11T12:22:37Z)
- Image-specific Convolutional Kernel Modulation for Single Image Super-resolution [85.09413241502209]
To address this issue, we propose a novel image-specific convolutional kernel modulation (IKM) method.
We exploit the global contextual information of image or feature to generate an attention weight for adaptively modulating the convolutional kernels.
Experiments on single image super-resolution show that the proposed methods achieve superior performances over state-of-the-art methods.
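A condensed sketch of the idea: pool global context into an attention vector and use it to modulate the convolution kernel per input image (the shapes and modulation form are assumptions, not the exact IKM formulation):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ImageSpecificConv(nn.Module):
    """Modulate a shared conv kernel with an attention weight derived from
    the input's global context, so each image gets its own kernel."""
    def __init__(self, channels, k=3):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(channels, channels, k, k) * 0.02)
        self.attn = nn.Sequential(nn.Linear(channels, channels), nn.Sigmoid())
        self.k = k

    def forward(self, x):               # x: (1, C, H, W), one image at a time
        ctx = x.mean(dim=(0, 2, 3))     # global average context, shape (C,)
        a = self.attn(ctx)              # per-output-channel attention, (C,)
        w = self.weight * a.view(-1, 1, 1, 1)    # modulate the shared kernel
        return F.conv2d(x, w, padding=self.k // 2)
```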
arXiv Detail & Related papers (2021-11-16T11:05:10Z)