CMIC: Content-Adaptive Mamba for Learned Image Compression
- URL: http://arxiv.org/abs/2508.02192v2
- Date: Tue, 05 Aug 2025 13:36:08 GMT
- Title: CMIC: Content-Adaptive Mamba for Learned Image Compression
- Authors: Yunuo Chen, Zezheng Lyu, Bing He, Hongwei Hu, Qi Wang, Yuan Tian, Li Song, Wenjun Zhang, Guo Lu
- Abstract summary: Recent learned image compression (LIC) leverages Mamba-style state-space models (SSMs) for global receptive fields with linear complexity. We introduce Content-Adaptive Mamba (CAM), a dynamic SSM that addresses two critical limitations. CAM employs content-aware token reorganization, clustering and reordering tokens based on content similarity to prioritize proximity in feature space over Euclidean space.
- Score: 28.348742499973493
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recent learned image compression (LIC) leverages Mamba-style state-space models (SSMs) for global receptive fields with linear complexity. However, vanilla Mamba is content-agnostic, relying on fixed and predefined selective scans, which restricts its ability to dynamically and fully exploit content dependencies. We introduce Content-Adaptive Mamba (CAM), a dynamic SSM that addresses two critical limitations. First, it employs content-aware token reorganization, clustering and reordering tokens based on content similarity to prioritize proximity in feature space over Euclidean space. Second, it integrates global priors into the SSM via a prompt dictionary, effectively mitigating the strict causality and long-range decay in the token interactions of Mamba. These innovations enable CAM to better capture global dependencies while preserving computational efficiency. Leveraging CAM, our Content-Adaptive Mamba-based LIC model (CMIC) achieves state-of-the-art rate-distortion performance, surpassing VTM-21.0 by -15.91%, -21.34%, and -17.58% BD-rate on the Kodak, Tecnick, and CLIC benchmarks, respectively.
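The content-aware token reorganization described in the abstract can be illustrated with a minimal sketch: cluster the flattened image tokens by feature similarity, reorder them so that similar tokens become adjacent in the scan sequence, and keep the inverse permutation so the original spatial order can be restored after SSM processing. The k-means clustering, token dimensions, and function names below are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def content_aware_reorder(tokens, num_clusters=4, iters=10, seed=0):
    """Hypothetical sketch of CAM-style token reorganization.

    Groups tokens by feature-space similarity (simple k-means here;
    the paper's clustering scheme may differ) so that a 1D selective
    scan visits similar content consecutively, rather than following
    raster order in Euclidean/pixel space.

    Returns the reordered tokens and the inverse permutation needed
    to restore the original order after sequence processing.
    """
    rng = np.random.default_rng(seed)
    n, _ = tokens.shape
    # Initialize cluster centers from random tokens.
    centers = tokens[rng.choice(n, num_clusters, replace=False)].copy()
    for _ in range(iters):
        # Assign each token to its nearest center in feature space.
        dists = np.linalg.norm(tokens[:, None] - centers[None], axis=-1)
        labels = dists.argmin(axis=1)
        # Update centers as the mean of their assigned tokens.
        for k in range(num_clusters):
            if (labels == k).any():
                centers[k] = tokens[labels == k].mean(axis=0)
    # Stable sort by cluster label: similar tokens become adjacent.
    perm = np.argsort(labels, kind="stable")
    inv_perm = np.argsort(perm)
    return tokens[perm], inv_perm

# Usage: 64 tokens with 8-dim features, e.g. a flattened 8x8 feature map.
tokens = np.random.default_rng(1).normal(size=(64, 8)).astype(np.float32)
reordered, inv = content_aware_reorder(tokens)
restored = reordered[inv]          # undo the permutation after scanning
assert np.allclose(restored, tokens)
```

Because the permutation is invertible, the SSM can scan the similarity-ordered sequence and the outputs can be mapped back to their original spatial positions losslessly.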
Related papers
- A2Mamba: Attention-augmented State Space Models for Visual Recognition [45.68176825375723]
We propose A2Mamba, a powerful Transformer-Mamba hybrid network architecture. A key step of A2SSM performs a variant of cross-attention by spatially aggregating the SSM's hidden states. Our A2Mamba outperforms all previous ConvNet-, Transformer-, and Mamba-based architectures in visual recognition tasks.
arXiv Detail & Related papers (2025-07-22T14:17:08Z) - MambaVSR: Content-Aware Scanning State Space Model for Video Super-Resolution [33.457410717030946]
We propose MambaVSR, the first state-space model framework for video super-resolution. MambaVSR enables dynamic interactions through the Shared Compass Construction (SCC) and the Content-Aware Sequentialization (CAS). Building upon the SCC, the CAS module effectively aligns and aggregates non-local similar content across multiple frames by interleaving temporal features along the learned spatial order.
arXiv Detail & Related papers (2025-06-13T13:22:28Z) - RD-UIE: Relation-Driven State Space Modeling for Underwater Image Enhancement [59.364418120895]
Underwater image enhancement (UIE) is a critical preprocessing step for marine vision applications. We develop a novel relation-driven Mamba framework for effective UIE (RD-UIE). Experiments on underwater enhancement benchmarks demonstrate that RD-UIE outperforms the state-of-the-art approach WMamba.
arXiv Detail & Related papers (2025-05-02T12:21:44Z) - HS-Mamba: Full-Field Interaction Multi-Groups Mamba for Hyperspectral Image Classification [1.9526430269580959]
We propose a full-field interaction multi-groups Mamba framework (HS-Mamba) for classification of hyperspectral images. HS-Mamba consists of a dual-channel spatial-spectral encoder (DCSS-encoder) module and a lightweight global inline attention (LGI-Att) branch. Extensive experiments demonstrate the superiority of the proposed HS-Mamba, outperforming state-of-the-art methods on four benchmark HSI datasets.
arXiv Detail & Related papers (2025-04-22T06:13:02Z) - DefMamba: Deformable Visual State Space Model [65.50381013020248]
We propose a novel visual foundation model called DefMamba. By combining a deformable scanning (DS) strategy, this model significantly improves its ability to learn image structures and detect changes in object details. Numerous experiments have shown that DefMamba achieves state-of-the-art performance in various visual tasks.
arXiv Detail & Related papers (2025-04-08T08:22:54Z) - MaIR: A Locality- and Continuity-Preserving Mamba for Image Restoration [24.66368406718623]
We propose a novel Mamba-based Image Restoration model (MaIR). MaIR consists of a Nested S-shaped Scanning strategy (NSS) and a Sequence Shuffle Attention block (SSA). Thanks to NSS and SSA, MaIR surpasses 40 baselines across 14 challenging datasets.
arXiv Detail & Related papers (2024-12-28T07:40:39Z) - Mamba-SEUNet: Mamba UNet for Monaural Speech Enhancement [54.427965535613886]
Mamba, as a novel state-space model (SSM), has gained widespread application in natural language processing and computer vision. In this work, we introduce Mamba-SEUNet, an innovative architecture that integrates Mamba with U-Net for SE tasks.
arXiv Detail & Related papers (2024-12-21T13:43:51Z) - Mamba-CL: Optimizing Selective State Space Model in Null Space for Continual Learning [54.19222454702032]
Continual Learning aims to equip AI models with the ability to learn a sequence of tasks over time, without forgetting previously learned knowledge. State Space Models (SSMs) have achieved notable success in computer vision. We introduce Mamba-CL, a framework that continuously fine-tunes the core SSMs of the large-scale Mamba foundation model.
arXiv Detail & Related papers (2024-11-23T06:36:16Z) - MambaIRv2: Attentive State Space Restoration [96.4452232356586]
Mamba-based image restoration backbones have recently demonstrated significant potential in balancing global reception and computational efficiency. We propose MambaIRv2, which equips Mamba with non-causal modeling ability similar to ViTs to reach an attentive state space restoration model.
arXiv Detail & Related papers (2024-11-22T12:45:12Z) - V2M: Visual 2-Dimensional Mamba for Image Representation Learning [68.51380287151927]
Mamba has garnered widespread attention due to its flexible design and efficient hardware performance to process 1D sequences.
Recent studies have attempted to apply Mamba to the visual domain by flattening 2D images into patches and then regarding them as a 1D sequence.
We propose a Visual 2-Dimensional Mamba model as a complete solution, which directly processes image tokens in the 2D space.
arXiv Detail & Related papers (2024-10-14T11:11:06Z) - GlobalMamba: Global Image Serialization for Vision Mamba [73.50475621164037]
Vision mambas have demonstrated strong performance with linear complexity to the number of vision tokens.
Most existing methods employ patch-based image tokenization and then flatten them into 1D sequences for causal processing.
We propose a global image serialization method to transform the image into a sequence of causal tokens.
arXiv Detail & Related papers (2024-10-14T09:19:05Z) - StableMamba: Distillation-free Scaling of Large SSMs for Images and Videos [27.604572990625144]
State-space models (SSMs) have introduced a novel context modeling method by integrating state-space techniques into deep learning. Mamba-based architectures are difficult to scale with respect to the number of parameters, which is a major limitation for vision applications. We propose a Mamba-Attention interleaved architecture that enhances scalability, robustness, and performance.
arXiv Detail & Related papers (2024-09-18T10:48:10Z) - MambaCSR: Dual-Interleaved Scanning for Compressed Image Super-Resolution With SSMs [14.42424591513825]
MambaCSR is a framework based on Mamba for the challenging compressed image super-resolution (CSR) task. We propose an efficient dual-interleaved scanning paradigm (DIS) for CSR, which is composed of two scanning strategies. Results on multiple benchmarks demonstrate the strong performance of our MambaCSR on the compressed image super-resolution task.
arXiv Detail & Related papers (2024-08-21T16:30:45Z) - SIGMA: Selective Gated Mamba for Sequential Recommendation [56.85338055215429]
Mamba, a recent advancement, has exhibited exceptional performance in time series prediction. We introduce a new framework named Selective Gated Mamba (SIGMA) for Sequential Recommendation. Our results indicate that SIGMA outperforms current models on five real-world datasets.
arXiv Detail & Related papers (2024-08-21T09:12:59Z) - GroupMamba: Efficient Group-Based Visual State Space Model [66.35608254724566]
State-space models (SSMs) have recently shown promise in capturing long-range dependencies with subquadratic computational complexity. However, purely SSM-based models face critical challenges related to stability and achieving state-of-the-art performance in computer vision tasks. Our paper addresses the challenges of scaling SSM-based models for computer vision, particularly the instability and inefficiency of large model sizes.
arXiv Detail & Related papers (2024-07-18T17:59:58Z) - MambaVC: Learned Visual Compression with Selective State Spaces [74.29217829932895]
We introduce MambaVC, a simple, strong and efficient compression network based on SSM.
MambaVC develops a visual state space (VSS) block with a 2D selective scanning (2DSS) module as the nonlinear activation function after each downsampling.
On compression benchmark datasets, MambaVC achieves superior rate-distortion performance with lower computational and memory overheads.
arXiv Detail & Related papers (2024-05-24T10:24:30Z) - Mamba-in-Mamba: Centralized Mamba-Cross-Scan in Tokenized Mamba Model for Hyperspectral Image Classification [4.389334324926174]
This study introduces the innovative Mamba-in-Mamba (MiM) architecture for HSI classification, the first attempt to deploy a State Space Model (SSM) in this task.
MiM model includes 1) A novel centralized Mamba-Cross-Scan (MCS) mechanism for transforming images into sequence-data, 2) A Tokenized Mamba (T-Mamba) encoder, and 3) A Weighted MCS Fusion (WMF) module.
Experimental results from three public HSI datasets demonstrate that our method outperforms existing baselines and state-of-the-art approaches.
arXiv Detail & Related papers (2024-05-20T13:19:02Z) - PlainMamba: Improving Non-Hierarchical Mamba in Visual Recognition [21.761988930589727]
PlainMamba is a simple non-hierarchical state space model (SSM) designed for general visual recognition.
We adapt the selective scanning process of Mamba to the visual domain, enhancing its ability to learn features from two-dimensional images.
Our architecture is designed to be easy to use and easy to scale, formed by stacking identical PlainMamba blocks.
arXiv Detail & Related papers (2024-03-26T13:35:10Z) - MISC: Ultra-low Bitrate Image Semantic Compression Driven by Large Multimodal Model [78.4051835615796]
This paper proposes a method called Multimodal Image Semantic Compression.
It consists of an LMM encoder for extracting the semantic information of the image, a map encoder to locate the region corresponding to the semantics, an image encoder that generates an extremely compressed bitstream, and a decoder that reconstructs the image based on the above information.
It can achieve optimal consistency and perception results while saving 50% of perceptual bitrate, which has strong potential applications in the next generation of storage and communication.
arXiv Detail & Related papers (2024-02-26T17:11:11Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.