Related papers: MambaHash: Visual State Space Deep Hashing Model for Large-Scale Image Retrieval

URL: http://arxiv.org/abs/2506.16353v1
Date: Thu, 19 Jun 2025 14:30:55 GMT
Title: MambaHash: Visual State Space Deep Hashing Model for Large-Scale Image Retrieval
Authors: Chao He, Hongxi Wei
Abstract summary: Vision Mamba with linear time complexity has attracted extensive attention from researchers. We propose a visual state space hashing model, called MambaHash. We have conducted comprehensive experiments on three widely used datasets.
Score: 0.3880517371454968
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Deep image hashing aims to enable effective large-scale image retrieval by mapping input images to compact binary hash codes through deep neural networks. More recently, Vision Mamba, with its linear time complexity, has attracted extensive attention from researchers by achieving outstanding performance on various computer vision tasks. Nevertheless, the suitability of Mamba for large-scale image retrieval tasks still needs to be explored. To this end, we propose a visual state space hashing model, called MambaHash. Concretely, we propose a backbone network with a stage-wise architecture, in which a grouped Mamba operation is introduced to model local and global information by using Mamba to perform multi-directional scanning along different groups of channels. Subsequently, the proposed channel interaction attention module is used to enhance information communication across channels. Finally, we meticulously design an adaptive feature enhancement module to increase feature diversity and enhance the visual representation capability of the model. We have conducted comprehensive experiments on three widely used datasets: CIFAR-10, NUS-WIDE and IMAGENET. The experimental results demonstrate that, compared with state-of-the-art deep hashing methods, our proposed MambaHash achieves good efficiency and superior performance on large-scale image retrieval tasks. Source code is available at https://github.com/shuaichaochao/MambaHash.git
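Illustration: the abstract describes mapping images to binary hash codes but does not spell out how such codes are used at query time. The following is a minimal sketch of the usual deep hashing convention, assuming codes are obtained by thresholding real-valued network outputs at zero and that retrieval ranks database items by Hamming distance. The function names `binarize` and `hamming_rank` and the toy feature values are hypothetical and are not taken from the MambaHash source code.

```python
import numpy as np

def binarize(features):
    """Threshold real-valued outputs at zero to get {0, 1} hash codes (assumed convention)."""
    return (np.asarray(features) > 0).astype(np.uint8)

def hamming_rank(query_code, database_codes):
    """Rank database items by Hamming distance to the query code (nearest first)."""
    # Count differing bit positions between the query and each database row.
    distances = np.count_nonzero(database_codes != query_code, axis=1)
    # A stable sort keeps the original order among items at equal distance.
    order = np.argsort(distances, kind="stable")
    return order, distances

# Toy 4-bit example; these values stand in for the outputs of a hashing network.
database_features = np.array([
    [0.9, -0.2, 0.4, -0.7],
    [-0.5, -0.1, 0.3, 0.8],
    [0.2, 0.6, -0.9, -0.3],
])
database_codes = binarize(database_features)
query_code = binarize([0.7, -0.4, 0.1, 0.5])

order, distances = hamming_rank(query_code, database_codes)
print(order, distances)
```

Because comparing codes only requires counting differing bits, ranking stays cheap even for large databases, which is the efficiency argument usually made for hashing-based retrieval.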

Related papers

DefMamba: Deformable Visual State Space Model [65.50381013020248]
We propose a novel visual foundation model called DefMamba. By combining a deformable scanning (DS) strategy, this model significantly improves its ability to learn image structures and detect changes in object details. Numerous experiments have shown that DefMamba achieves state-of-the-art performance in various visual tasks.
arXiv Detail & Related papers (2025-04-08T08:22:54Z)
MobileMamba: Lightweight Multi-Receptive Visual Mamba Network [51.33486891724516]
Previous research on lightweight models has primarily focused on CNNs and Transformer-based designs. We propose the MobileMamba framework, which balances efficiency and performance. MobileMamba achieves up to 83.6% Top-1 accuracy, surpassing existing state-of-the-art methods.
arXiv Detail & Related papers (2024-11-24T18:01:05Z)
MambaVision: A Hybrid Mamba-Transformer Vision Backbone [54.965143338206644]
We propose a novel hybrid Mamba-Transformer backbone, MambaVision, specifically tailored for vision applications. We show that equipping the Mamba architecture with self-attention blocks in the final layers greatly improves its capacity to capture long-range spatial dependencies. For classification on the ImageNet-1K dataset, MambaVision variants achieve state-of-the-art (SOTA) performance in terms of both Top-1 accuracy and throughput.
arXiv Detail & Related papers (2024-07-10T23:02:45Z)
HybridHash: Hybrid Convolutional and Self-Attention Deep Hashing for Image Retrieval [0.3880517371454968]
We propose a hybrid convolutional and self-attention deep hashing method known as HybridHash. We have conducted comprehensive experiments on three widely used datasets: CIFAR-10, NUS-WIDE and IMAGENET. The experimental results demonstrate that the method proposed in this paper has superior performance with respect to state-of-the-art deep hashing methods.
arXiv Detail & Related papers (2024-05-13T07:45:20Z)
A Novel State Space Model with Local Enhancement and State Sharing for Image Fusion [14.293042131263924]
In image fusion tasks, images from different sources possess distinct characteristics. Mamba, as a state space model, has emerged in the field of natural language processing. Motivated by these challenges, we customize and improve the vision Mamba network designed for the image fusion task.
arXiv Detail & Related papers (2024-04-14T16:09:33Z)
ReMamber: Referring Image Segmentation with Mamba Twister [51.291487576255435]
ReMamber is a novel RIS architecture that integrates the power of Mamba with a multi-modal Mamba Twister block. The Mamba Twister explicitly models image-text interaction, and fuses textual and visual features through its unique channel and spatial twisting mechanism.
arXiv Detail & Related papers (2024-03-26T16:27:37Z)
Integrating Mamba Sequence Model and Hierarchical Upsampling Network for Accurate Semantic Segmentation of Multiple Sclerosis Legion [0.0]
We introduce Mamba HUNet, a novel architecture tailored for robust and efficient segmentation tasks. We first converted HUNet into a lighter version, maintaining performance parity and then integrated this lighter HUNet into Mamba HUNet, further enhancing its efficiency. Experimental results on publicly available Magnetic Resonance Imaging scans, notably in Multiple Sclerosis lesion segmentation, demonstrate Mamba HUNet's effectiveness across diverse segmentation tasks.
arXiv Detail & Related papers (2024-03-26T06:57:50Z)
MiM-ISTD: Mamba-in-Mamba for Efficient Infrared Small Target Detection [72.46396769642787]
We develop a nested structure, Mamba-in-Mamba (MiM-ISTD), for efficient infrared small target detection. MiM-ISTD is $8\times$ faster than the SOTA method and reduces GPU memory usage by 62.2% when testing on $2048 \times 2048$ images.
arXiv Detail & Related papers (2024-03-04T15:57:29Z)
VMamba: Visual State Space Model [98.0517369083152]
We adapt Mamba, a state-space language model, into VMamba, a vision backbone with linear time complexity. At the core of VMamba is a stack of Visual State-Space (VSS) blocks with the 2D Selective Scan (SS2D) module.
arXiv Detail & Related papers (2024-01-18T17:55:39Z)
Visual Search at Alibaba [38.106392977338146]
We take advantage of the large image collection of Alibaba and state-of-the-art deep learning techniques to perform visual search at scale. A model and search-based fusion approach is introduced to effectively predict categories. We propose a deep CNN model for joint detection and feature learning by mining user click behavior.
arXiv Detail & Related papers (2021-02-09T06:46:50Z)
Adaptive Context-Aware Multi-Modal Network for Depth Completion [107.15344488719322]
We propose to adopt the graph propagation to capture the observed spatial contexts. We then apply the attention mechanism on the propagation, which encourages the network to model the contextual information adaptively. Finally, we introduce the symmetric gated fusion strategy to exploit the extracted multi-modal features effectively. Our model, named Adaptive Context-Aware Multi-Modal Network (ACMNet), achieves the state-of-the-art performance on two benchmarks.
arXiv Detail & Related papers (2020-08-25T06:00:06Z)
Deep Reinforcement Learning with Label Embedding Reward for Supervised Image Hashing [85.84690941656528]
We introduce a novel decision-making approach for deep supervised hashing. We learn a deep Q-network with a novel label embedding reward defined by Bose-Chaudhuri-Hocquenghem codes. Our approach outperforms state-of-the-art supervised hashing methods under various code lengths.
arXiv Detail & Related papers (2020-08-10T09:17:20Z)

This list is automatically generated from the titles and abstracts of the papers on this site.