InceptionMamba: An Efficient Hybrid Network with Large Band Convolution and Bottleneck Mamba
- URL: http://arxiv.org/abs/2506.08735v3
- Date: Wed, 23 Jul 2025 10:31:35 GMT
- Title: InceptionMamba: An Efficient Hybrid Network with Large Band Convolution and Bottleneck Mamba
- Authors: Yuhang Wang, Jun Li, Zhijian Wu, Jifeng Shen, Jianhua Xu, Wankou Yang
- Abstract summary: InceptionNeXt has shown excellent competitiveness in image classification and a number of downstream tasks. Built on parallel one-dimensional strip convolutions, however, InceptionNeXt suffers from a limited ability to capture spatial dependencies along different dimensions. We propose a novel backbone architecture termed InceptionMamba to overcome these limitations.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Within the family of convolutional neural networks, InceptionNeXt has shown excellent competitiveness in image classification and a number of downstream tasks. Built on parallel one-dimensional strip convolutions, however, it suffers from a limited ability to capture spatial dependencies along different dimensions and fails to fully exploit spatial modeling in local neighborhoods. Moreover, the inherent locality of convolution operations is detrimental to effective global context modeling. To overcome these limitations, we propose a novel backbone architecture termed InceptionMamba in this study. More specifically, the traditional one-dimensional strip convolutions are replaced by orthogonal band convolutions in our InceptionMamba to achieve cohesive spatial modeling. Furthermore, global contextual modeling is achieved via a bottleneck Mamba module, facilitating enhanced cross-channel information fusion and an enlarged receptive field. Extensive evaluations on classification and various downstream tasks demonstrate that the proposed InceptionMamba achieves state-of-the-art performance with superior parameter and computational efficiency. The source code will be available at https://github.com/Wake1021/InceptionMamba.
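Since the abstract contrasts one-dimensional strip convolutions with orthogonal band convolutions, the difference in spatial coverage can be illustrated with a minimal pure-Python sketch. This is an illustrative assumption, not the authors' implementation: the kernel shapes (1x5 strip vs. 3x5 band) and the helper `conv2d_same` are made up for the example.

```python
def conv2d_same(img, kernel):
    """Single-channel 2D convolution with zero padding ('same' output size)."""
    H, W = len(img), len(img[0])
    kh, kw = len(kernel), len(kernel[0])
    ph, pw = kh // 2, kw // 2
    out = [[0.0] * W for _ in range(H)]
    for y in range(H):
        for x in range(W):
            s = 0.0
            for dy in range(kh):
                for dx in range(kw):
                    iy, ix = y + dy - ph, x + dx - pw
                    if 0 <= iy < H and 0 <= ix < W:
                        s += img[iy][ix] * kernel[dy][dx]
            out[y][x] = s
    return out

# A 1x5 horizontal strip kernel mixes pixels along a single row only;
# a 3x5 band kernel additionally mixes across neighbouring rows.
strip = [[0.2] * 5]                      # shape (1, 5)
band  = [[1.0 / 15] * 5 for _ in range(3)]  # shape (3, 5)

img = [[0.0] * 7 for _ in range(7)]
img[3][3] = 1.0                          # single impulse at the centre

strip_out = conv2d_same(img, strip)
band_out  = conv2d_same(img, band)

# The strip response stays on row 3; the band response spreads to rows 2-4,
# which is the "cohesive spatial modeling" the abstract attributes to bands.
assert strip_out[2][3] == 0.0 and band_out[2][3] > 0.0
```

A rank-1 pair of orthogonal strips (1xk then kx1) only realises separable 2D kernels; a b x k band kernel covers a genuine 2D neighbourhood in one operation, which is the intuition behind replacing strips with bands.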
Related papers
- MVNet: Hyperspectral Remote Sensing Image Classification Based on Hybrid Mamba-Transformer Vision Backbone Architecture [12.168520751389622]
Hyperspectral image (HSI) classification faces challenges such as high-dimensional data, limited training samples, and spectral redundancy. This paper proposes a novel MVNet network architecture that integrates 3D-CNN's local feature extraction, Transformer's global modeling, and Mamba's linear sequence modeling capabilities. On the IN, UP, and KSC datasets, MVNet outperforms mainstream hyperspectral image classification methods in both classification accuracy and computational efficiency.
arXiv Detail & Related papers (2025-07-06T14:52:26Z)
- RD-UIE: Relation-Driven State Space Modeling for Underwater Image Enhancement [59.364418120895]
Underwater image enhancement (UIE) is a critical preprocessing step for marine vision applications. We develop a novel relation-driven Mamba framework for effective UIE (RD-UIE). Experiments on underwater enhancement benchmarks demonstrate that RD-UIE outperforms the state-of-the-art approach WMamba.
arXiv Detail & Related papers (2025-05-02T12:21:44Z)
- EventMamba: Enhancing Spatio-Temporal Locality with State Space Models for Event-Based Video Reconstruction [66.84997711357101]
EventMamba is a specialized model designed for event-based video reconstruction tasks. We show that EventMamba markedly improves speed while delivering superior visual quality compared to Transformer-based methods.
arXiv Detail & Related papers (2025-03-25T14:46:45Z)
- MambaFlow: A Novel and Flow-guided State Space Model for Scene Flow Estimation [5.369567679302849]
We propose MambaFlow, a novel scene flow estimation network with a Mamba-based decoder. Experiments on the Argoverse 2 benchmark demonstrate that MambaFlow achieves state-of-the-art performance with real-time inference speed among existing works.
arXiv Detail & Related papers (2025-02-24T07:05:49Z)
- TransMamba: Fast Universal Architecture Adaption from Transformers to Mamba [88.31117598044725]
We explore cross-architecture training to transfer knowledge from existing Transformer models to the alternative Mamba architecture, termed TransMamba. Our approach employs a two-stage strategy to expedite training new Mamba models, ensuring effectiveness across uni-modal and cross-modal tasks. For cross-modal learning, we propose a cross-Mamba module that integrates language awareness into Mamba's visual features, enhancing the cross-modal interaction capabilities of the Mamba architecture.
arXiv Detail & Related papers (2025-02-21T01:22:01Z)
- STNMamba: Mamba-based Spatial-Temporal Normality Learning for Video Anomaly Detection [48.997518615379995]
Video anomaly detection (VAD) has been extensively researched due to its potential for intelligent video systems. Most existing methods based on CNNs and Transformers still suffer from substantial computational burdens. We propose a lightweight and effective Mamba-based network named STNMamba to enhance the learning of spatial-temporal normality.
arXiv Detail & Related papers (2024-12-28T08:49:23Z)
- MobileMamba: Lightweight Multi-Receptive Visual Mamba Network [51.33486891724516]
Previous research on lightweight models has primarily focused on CNNs and Transformer-based designs.
We propose the MobileMamba framework, which balances efficiency and performance.
MobileMamba achieves up to 83.6% Top-1 accuracy, surpassing existing state-of-the-art methods.
arXiv Detail & Related papers (2024-11-24T18:01:05Z)
- PPMamba: A Pyramid Pooling Local Auxiliary SSM-Based Model for Remote Sensing Image Semantic Segmentation [1.5136939451642137]
This paper proposes a novel network called Pyramid Pooling Mamba (PPMamba), which integrates CNN and Mamba for semantic segmentation tasks.
PPMamba achieves competitive performance compared to state-of-the-art models.
arXiv Detail & Related papers (2024-09-10T08:08:50Z)
- OccMamba: Semantic Occupancy Prediction with State Space Models [24.697645636701797]
Inspired by the global modeling capability and linear complexity of the Mamba architecture, we present OccMamba, the first Mamba-based network for semantic occupancy prediction.
arXiv Detail & Related papers (2024-08-19T10:07:00Z)
- MambaDepth: Enhancing Long-range Dependency for Self-Supervised Fine-Structured Monocular Depth Estimation [0.0]
MambaDepth is a versatile network tailored for self-supervised depth estimation.
MambaDepth combines the U-Net's effectiveness in self-supervised depth estimation with the advanced capabilities of Mamba.
MambaDepth proves its superior generalization capacities on other datasets such as Make3D and Cityscapes.
arXiv Detail & Related papers (2024-06-06T22:08:48Z)
- LKM-UNet: Large Kernel Vision Mamba UNet for Medical Image Segmentation [9.862277278217045]
In this paper, we introduce a Large Kernel Vision Mamba U-shape Network, or LKM-UNet, for medical image segmentation.
A distinguishing feature of our LKM-UNet is its utilization of large Mamba kernels, excelling in local spatial modeling compared to small kernel-based CNNs and Transformers.
Comprehensive experiments demonstrate the feasibility and the effectiveness of using large-size Mamba kernels to achieve large receptive fields.
arXiv Detail & Related papers (2024-03-12T05:34:51Z)
- PointMamba: A Simple State Space Model for Point Cloud Analysis [65.59944745840866]
We propose PointMamba, transferring the success of Mamba, a recent representative state space model (SSM), from NLP to point cloud analysis tasks.
Unlike traditional Transformers, PointMamba employs a linear complexity algorithm, presenting global modeling capacity while significantly reducing computational costs.
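The linear-complexity claim shared by PointMamba and the other Mamba variants in this list rests on a sequential state-space recurrence: each token updates a hidden state once, giving O(L) cost instead of attention's O(L^2). A minimal scalar sketch (illustrative only; real models use vector states and input-dependent, "selective" parameters a_t, b_t, c_t) runs in a single pass over the sequence:

```python
def selective_scan(x, a, b, c):
    """Scalar SSM recurrence: h_t = a_t*h_{t-1} + b_t*x_t, y_t = c_t*h_t."""
    h, ys = 0.0, []
    for xt, at, bt, ct in zip(x, a, b, c):
        h = at * h + bt * xt   # one state update per token -> O(L) overall
        ys.append(ct * h)
    return ys

# Impulse input with constant decay a=0.5: the response at step t is 0.5**t,
# so every output token depends on the whole prefix despite the linear cost.
x = [1.0, 0.0, 0.0, 0.0]
a, b, c = [0.5] * 4, [1.0] * 4, [1.0] * 4
print(selective_scan(x, a, b, c))  # -> [1.0, 0.5, 0.25, 0.125]
```

Because the state carries information forward indefinitely, the model retains global (whole-sequence) context while touching each token exactly once, which is the efficiency argument these abstracts repeat.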
arXiv Detail & Related papers (2024-02-16T14:56:13Z)
- VMamba: Visual State Space Model [98.0517369083152]
We adapt Mamba, a state-space language model, into VMamba, a vision backbone with linear time complexity. At the core of VMamba is a stack of Visual State-Space (VSS) blocks with the 2D Selective Scan (SS2D) module.
arXiv Detail & Related papers (2024-01-18T17:55:39Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.