TransHash: Transformer-based Hamming Hashing for Efficient Image
Retrieval
- URL: http://arxiv.org/abs/2105.01823v1
- Date: Wed, 5 May 2021 01:35:53 GMT
- Title: TransHash: Transformer-based Hamming Hashing for Efficient Image
Retrieval
- Authors: Yongbiao Chen (1), Sheng Zhang (2), Fangxin Liu (1), Zhigang Chang
(1), Mang Ye (3), Zhengwei Qi (1) ((1) Shanghai Jiao Tong University, (2)
University of Southern California, (3) Wuhan University)
- Abstract summary: We present \textbf{TransHash}, a pure transformer-based framework for deep hashing learning.
We achieve 8.2%, 2.6%, and 12.7% performance gains in terms of average \textit{mAP} for different hash bit lengths on three public datasets.
- Score: 0.0
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Deep Hamming hashing has gained growing popularity in approximate nearest
neighbour search for large-scale image retrieval. Until now, the deep hashing
for the image retrieval community has been dominated by convolutional neural
network architectures, e.g. \texttt{Resnet}\cite{he2016deep}. In this paper,
inspired by the recent advancements of vision transformers, we present
\textbf{TransHash}, a pure transformer-based framework for deep hashing
learning. Concretely, our framework is composed of two major modules: (1) Based
on \textit{Vision Transformer} (ViT), we design a siamese vision transformer
backbone for image feature extraction. To learn fine-grained features, we
introduce a dual-stream feature learning module on top of the transformer to
learn discriminative global and local features. (2) In addition, we adopt a Bayesian
learning scheme with a dynamically constructed similarity matrix to learn
compact binary hash codes. The entire framework is jointly trained in an
end-to-end manner. To the best of our knowledge, this is the first work to
tackle deep hashing learning problems without convolutional neural networks
(\textit{CNNs}). We perform comprehensive experiments on three widely-studied
datasets: \textbf{CIFAR-10}, \textbf{NUSWIDE} and \textbf{IMAGENET}. The
experiments demonstrate the superiority of our method over existing
state-of-the-art deep hashing methods. Specifically, we achieve 8.2\%, 2.6\%,
and 12.7\% gains in average \textit{mAP} for different hash bit lengths on the
three public datasets, respectively.
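To make the two modules concrete, here is a minimal, self-contained PyTorch sketch of a TransHash-style pipeline. The abstract does not disclose the exact dual-stream design, similarity-matrix construction, or likelihood, so everything below is an assumption for illustration: a toy ViT stands in for the siamese backbone, the CLS token and pooled patch tokens stand in for the global and local streams, and the pairwise loss is the common Bayesian likelihood used by DPSH/HashNet-style methods, not the authors' exact formulation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyViT(nn.Module):
    """Toy ViT backbone: patch embedding + transformer encoder.
    An illustrative stand-in for the paper's siamese ViT backbone."""
    def __init__(self, img_size=32, patch=8, dim=128, depth=4, heads=4):
        super().__init__()
        self.patch_embed = nn.Conv2d(3, dim, kernel_size=patch, stride=patch)
        n_patches = (img_size // patch) ** 2
        self.cls = nn.Parameter(torch.zeros(1, 1, dim))
        self.pos = nn.Parameter(torch.zeros(1, n_patches + 1, dim))
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=heads,
                                           batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=depth)

    def forward(self, x):
        tok = self.patch_embed(x).flatten(2).transpose(1, 2)  # (B, N, dim)
        tok = torch.cat([self.cls.expand(len(x), -1, -1), tok], dim=1)
        out = self.encoder(tok + self.pos)
        return out[:, 0], out[:, 1:]        # global (CLS) / local (patch) tokens

class TransHashLike(nn.Module):
    """Dual-stream hash head: one stream from the CLS token ('global') and
    one from pooled patch tokens ('local'); an assumed interpretation."""
    def __init__(self, dim=128, bits=64):
        super().__init__()
        self.backbone = TinyViT(dim=dim)
        self.global_head = nn.Linear(dim, bits // 2)
        self.local_head = nn.Linear(dim, bits // 2)

    def forward(self, x):
        g, patches = self.backbone(x)
        l = patches.mean(dim=1)                          # pool local features
        h = torch.cat([self.global_head(g), self.local_head(l)], dim=1)
        return torch.tanh(h)                             # relaxed codes in (-1, 1)

def pairwise_bayesian_loss(codes, labels):
    """Standard pairwise likelihood loss used by many deep hashing methods:
    maximize p(s_ij | h_i, h_j) with p = sigmoid(<h_i, h_j> / 2)."""
    s = (labels[:, None] == labels[None, :]).float()  # batch similarity matrix
    inner = codes @ codes.t() / 2
    # negative log-likelihood: log(1 + exp(x)) - s * x, via stable softplus
    return (F.softplus(inner) - s * inner).mean()

# One toy optimization step on random data.
model = TransHashLike(bits=64)
opt = torch.optim.Adam(model.parameters(), lr=1e-4)
images, labels = torch.randn(8, 3, 32, 32), torch.randint(0, 10, (8,))
codes = model(images)
loss = pairwise_bayesian_loss(codes, labels)
loss.backward()
opt.step()
print(codes.shape, float(loss))   # torch.Size([8, 64]) <loss value>
```

At inference time, codes.sign() yields the binary hash codes, and retrieval reduces to ranking database items by Hamming distance from the query code.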
Related papers
- HybridHash: Hybrid Convolutional and Self-Attention Deep Hashing for Image Retrieval [0.3880517371454968]
We propose a hybrid convolutional and self-attention deep hashing method known as HybridHash.
We have conducted comprehensive experiments on three widely used datasets: CIFAR-10, NUS-WIDE and IMAGENET.
The experimental results demonstrate that the proposed method outperforms state-of-the-art deep hashing methods (a sketch of a hybrid convolution-plus-attention hashing backbone appears after this list).
arXiv Detail & Related papers (2024-05-13T07:45:20Z)
- DVHN: A Deep Hashing Framework for Large-scale Vehicle Re-identification [5.407157027628579]
We propose a deep hash-based vehicle re-identification framework, dubbed DVHN, which substantially reduces memory usage and improves retrieval efficiency.
DVHN directly learns discrete compact binary hash codes for each image by jointly optimizing the feature learning network and the hash code generating module (a toy sketch of this style of discrete-code learning appears after this list).
With $2048$-bit codes, \textbf{DVHN} achieves 13.94% and 10.21% accuracy improvements in terms of \textbf{mAP} and \textbf{Rank@1} on the \textbf{VehicleID} (800) dataset.
arXiv Detail & Related papers (2021-12-09T14:11:27Z)
- PHPQ: Pyramid Hybrid Pooling Quantization for Efficient Fine-Grained Image Retrieval [68.05570413133462]
We propose a Pyramid Hybrid Pooling Quantization (PHPQ) module to capture and preserve fine-grained semantic information from multi-level features.
Experiments on two widely-used public benchmarks, CUB-200-2011 and Stanford Dogs, demonstrate that PHPQ outperforms state-of-the-art methods.
arXiv Detail & Related papers (2021-09-11T07:21:02Z)
- Contextual Transformer Networks for Visual Recognition [103.79062359677452]
We design a novel Transformer-style module, the Contextual Transformer (CoT) block, for visual recognition.
This design fully capitalizes on the contextual information among input keys to guide the learning of the dynamic attention matrix.
Our CoT block is appealing in that it can readily replace each $3\times 3$ convolution in ResNet architectures.
arXiv Detail & Related papers (2021-07-26T16:00:21Z)
- Deep Reinforcement Learning with Label Embedding Reward for Supervised Image Hashing [85.84690941656528]
We introduce a novel decision-making approach for deep supervised hashing.
We learn a deep Q-network with a label embedding reward defined by Bose-Chaudhuri-Hocquenghem (BCH) codes.
Our approach outperforms state-of-the-art supervised hashing methods under various code lengths.
arXiv Detail & Related papers (2020-08-10T09:17:20Z)
- Unsupervised Deep Cross-modality Spectral Hashing [65.3842441716661]
The framework is a two-step hashing approach which decouples the optimization into binary optimization and hashing function learning.
We propose a novel spectral embedding-based algorithm to simultaneously learn single-modality and binary cross-modality representations.
We leverage powerful CNNs for the image modality and propose a CNN-based deep architecture to learn the text modality.
arXiv Detail & Related papers (2020-08-01T09:20:11Z)
- A survey on deep hashing for image retrieval [7.156209824590489]
I propose a Shadow Recurrent Hashing (SRH) method in an attempt to break through the bottleneck of existing hashing methods.
Specifically, I devise a CNN architecture to extract the semantic features of images and design a loss function that encourages similar images to be projected close to each other.
Several experiments on the CIFAR-10 dataset show the satisfactory performance of SRH.
arXiv Detail & Related papers (2020-06-10T03:01:59Z)
- Learning to Hash with Graph Neural Networks for Recommender Systems [103.82479899868191]
Graph representation learning has attracted much attention in supporting high quality candidate search at scale.
Despite its effectiveness in learning embedding vectors for objects in the user-item interaction network, the computational costs to infer users' preferences in continuous embedding space are tremendous.
We propose a simple yet effective discrete representation learning framework to jointly learn continuous and discrete codes.
arXiv Detail & Related papers (2020-03-04T06:59:56Z)
- Auto-Encoding Twin-Bottleneck Hashing [141.5378966676885]
This paper proposes an efficient and adaptive code-driven graph, which is updated by decoding in the context of an auto-encoder.
Experiments on benchmarked datasets clearly show the superiority of our framework over the state-of-the-art hashing methods.
arXiv Detail & Related papers (2020-02-27T05:58:12Z)
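Several entries above, HybridHash in particular, pair convolutions with self-attention before hashing. Below is a minimal sketch of what such a hybrid hashing backbone can look like; the conv-stem-then-transformer layout, layer sizes, and pooling choice are illustrative assumptions, not the HybridHash architecture.

```python
import torch
import torch.nn as nn

class HybridHashingNet(nn.Module):
    """Hypothetical hybrid backbone: a convolutional stem extracts local
    features, a transformer encoder models global interactions, and a
    linear + tanh head emits relaxed hash codes."""
    def __init__(self, dim=128, bits=48):
        super().__init__()
        # Conv stem: 32x32 RGB image -> 8x8 feature map of width `dim`.
        self.stem = nn.Sequential(
            nn.Conv2d(3, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, dim, 3, stride=2, padding=1), nn.ReLU(),
        )
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=4,
                                           batch_first=True)
        self.attn = nn.TransformerEncoder(layer, num_layers=2)
        self.hash_head = nn.Linear(dim, bits)

    def forward(self, x):
        f = self.stem(x)                      # (B, dim, 8, 8)
        tok = f.flatten(2).transpose(1, 2)    # (B, 64, dim) token sequence
        g = self.attn(tok).mean(dim=1)        # globally pooled representation
        return torch.tanh(self.hash_head(g))  # relaxed codes in (-1, 1)

net = HybridHashingNet()
codes = net(torch.randn(2, 3, 32, 32))
print(codes.shape)   # torch.Size([2, 48])
```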
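Likewise, DVHN above jointly optimizes a feature network with a hash code generating module, and the graph-based entry jointly learns continuous and discrete codes. A common way to realize such joint optimization, shown here purely as an illustrative stand-in (not code from any listed paper), is a straight-through sign estimator: the forward pass emits exact binary codes while gradients still reach the feature network.

```python
import torch
import torch.nn as nn

class StraightThroughSign(torch.autograd.Function):
    """Forward: exact sign(); backward: identity gradient. A common trick
    for training networks that must emit discrete {-1, +1} codes."""
    @staticmethod
    def forward(ctx, x):
        return torch.sign(x)  # note: sign maps 0 to 0; real systems break ties

    @staticmethod
    def backward(ctx, grad_out):
        return grad_out

class DeepHashNet(nn.Module):
    """Hypothetical feature network + hash code generating module."""
    def __init__(self, in_dim=2048, bits=48):
        super().__init__()
        self.feat = nn.Sequential(nn.Linear(in_dim, 512), nn.ReLU())
        self.hash = nn.Linear(512, bits)

    def forward(self, x):
        logits = self.hash(self.feat(x))           # real-valued hash logits
        return StraightThroughSign.apply(logits)   # binary codes in forward pass

net = DeepHashNet()
x = torch.randn(4, 2048)   # e.g. pre-extracted image features
codes = net(x)
# Hamming distance between +/-1 codes via inner product: d_H = (bits - <a, b>) / 2
dist = (codes.shape[1] - codes @ codes.t()) / 2
print(codes.shape, dist[0])
```

Because the backward pass ignores the non-differentiable sign(), the feature network still receives useful gradients while retrieval always operates on exact binary codes.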