Related papers: Unsupervised Deep Cross-modality Spectral Hashing

URL: http://arxiv.org/abs/2008.00223v3
Date: Tue, 18 Aug 2020 09:23:55 GMT
Title: Unsupervised Deep Cross-modality Spectral Hashing
Authors: Tuan Hoang and Thanh-Toan Do and Tam V. Nguyen and Ngai-Man Cheung
Abstract summary: The framework is a two-step hashing approach which decouples the optimization into binary optimization and hashing function learning. We propose a novel spectral embedding-based algorithm to simultaneously learn single-modality and binary cross-modality representations. We leverage the powerful CNN for images and propose a CNN-based deep architecture to learn text modality.
Score: 65.3842441716661
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: This paper presents a novel framework, namely Deep Cross-modality Spectral Hashing (DCSH), to tackle the unsupervised learning problem of binary hash codes for efficient cross-modal retrieval. The framework is a two-step hashing approach which decouples the optimization into (1) binary optimization and (2) hashing function learning. In the first step, we propose a novel spectral embedding-based algorithm to simultaneously learn single-modality and binary cross-modality representations. While the former is capable of well preserving the local structure of each modality, the latter reveals the hidden patterns from all modalities. In the second step, to learn mapping functions from informative data inputs (images and word embeddings) to binary codes obtained from the first step, we leverage the powerful CNN for images and propose a CNN-based deep architecture to learn text modality. Quantitative evaluations on three standard benchmark datasets demonstrate that the proposed DCSH method consistently outperforms other state-of-the-art methods.
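
To make the two-step structure described in the abstract concrete, here is a minimal NumPy sketch of a generic two-step hashing pipeline. It is not the DCSH algorithm: step 1 uses a plain k-nearest-neighbour graph Laplacian with median thresholding instead of the paper's spectral embedding-based optimization, and step 2 uses ridge regression as a stand-in for the CNN hashing functions. All function names and parameters (spectral_binary_codes, fit_linear_hash, hash_codes, n_neighbors, ridge) are hypothetical and chosen only for illustration.

```python
import numpy as np


def spectral_binary_codes(features, n_bits, n_neighbors=5):
    """Step 1 (illustrative): binary codes from a spectral embedding of a kNN graph."""
    n = features.shape[0]
    # Pairwise squared Euclidean distances between all samples.
    sq = np.sum(features ** 2, axis=1)
    dist = sq[:, None] + sq[None, :] - 2.0 * features @ features.T
    np.fill_diagonal(dist, np.inf)  # exclude self-matches
    # Symmetric 0/1 affinity matrix over the k nearest neighbours.
    idx = np.argsort(dist, axis=1)[:, :n_neighbors]
    affinity = np.zeros((n, n))
    rows = np.repeat(np.arange(n), n_neighbors)
    affinity[rows, idx.ravel()] = 1.0
    affinity = np.maximum(affinity, affinity.T)
    # Unnormalized graph Laplacian L = D - W.
    laplacian = np.diag(affinity.sum(axis=1)) - affinity
    # Eigenvectors in ascending eigenvalue order; skip the first one,
    # which is constant when the graph is connected.
    _, vecs = np.linalg.eigh(laplacian)
    embedding = vecs[:, 1:n_bits + 1]
    # Threshold each dimension at its median to get codes in {-1, +1}.
    return np.where(embedding > np.median(embedding, axis=0), 1.0, -1.0)


def fit_linear_hash(features, codes, ridge=1e-3):
    """Step 2 (illustrative): ridge regression from features to the fixed codes."""
    x = np.hstack([features, np.ones((features.shape[0], 1))])  # add bias column
    gram = x.T @ x + ridge * np.eye(x.shape[1])
    return np.linalg.solve(gram, x.T @ codes)


def hash_codes(features, weights):
    """Apply a learned linear hashing function and binarize with sign."""
    x = np.hstack([features, np.ones((features.shape[0], 1))])
    return np.where(x @ weights >= 0, 1.0, -1.0)
```

In a cross-modal setting, one would fit a separate hashing function per modality (for example, one on image features and one on text features) against the same step-1 codes, so that both modalities map into a shared Hamming space. The decoupling means the binary codes are fixed before any hashing function is trained.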

Related papers

PromptHash: Affinity-Prompted Collaborative Cross-Modal Learning for Adaptive Hashing Retrieval [6.5710696868737175]
Cross-modal hashing is a promising approach for efficient data retrieval and storage optimization. We present PromptHash, an innovative framework leveraging affinity prompt-aware collaborative learning for adaptive cross-modal hashing.
arXiv Detail & Related papers (2025-03-20T11:56:27Z)
Cross-Modal Few-Shot Learning with Second-Order Neural Ordinary Differential Equations [26.46540034821343]
We introduce SONO, a novel method leveraging Second-Order Neural Ordinary Differential Equations (Second-Order NODEs) to enhance cross-modal few-shot learning. Our second-order approach can approximate a broader class of functions, enhancing the model's expressive power and feature generalization capabilities. We utilize text-based image augmentation, exploiting CLIP's robust image-text correlation to enrich training data significantly.
arXiv Detail & Related papers (2024-12-20T11:42:41Z)
A Two-Stage Progressive Pre-training using Multi-Modal Contrastive Masked Autoencoders [5.069884983892437]
We propose a new progressive pre-training method for image understanding tasks which leverages RGB-D datasets. In the first stage, we pre-train the model using contrastive learning to learn cross-modal representations. In the second stage, we further pre-train the model using masked autoencoding and denoising/noise prediction. Our approach is scalable, robust and suitable for pre-training RGB-D datasets.
arXiv Detail & Related papers (2024-08-05T05:33:59Z)
Adaptive Convolutional Dictionary Network for CT Metal Artifact Reduction [62.691996239590125]
We propose an adaptive convolutional dictionary network (ACDNet) for metal artifact reduction. Our ACDNet can automatically learn the prior for artifact-free CT images via training data and adaptively adjust the representation kernels for each input CT image. Our method inherits the clear interpretability of model-based methods and maintains the powerful representation ability of learning-based methods.
arXiv Detail & Related papers (2022-05-16T06:49:36Z)
Unsupervised Contrastive Hashing for Cross-Modal Retrieval in Remote Sensing [1.6758573326215689]
Cross-modal text-image retrieval has attracted great attention in remote sensing. We introduce a novel unsupervised cross-modal contrastive hashing (DUCH) method for text-image retrieval in RS. Experimental results show that the proposed DUCH outperforms state-of-the-art methods.
arXiv Detail & Related papers (2022-04-19T07:25:25Z)
Efficient Cross-Modal Retrieval via Deep Binary Hashing and Quantization [5.799838997511804]
Cross-modal retrieval aims to search for data with similar semantic meanings across different content modalities. We propose a jointly learned deep hashing and quantization network (HQ) for cross-modal retrieval. Experimental results on the NUS-WIDE, MIR-Flickr, and Amazon datasets demonstrate that HQ achieves boosts of more than 7% in precision.
arXiv Detail & Related papers (2022-02-15T22:00:04Z)
Deep Unsupervised Contrastive Hashing for Large-Scale Cross-Modal Text-Image Retrieval in Remote Sensing [1.6758573326215689]
We introduce a novel deep unsupervised cross-modal contrastive hashing (DUCH) method for RS text-image retrieval. Experimental results show that the proposed DUCH outperforms state-of-the-art unsupervised cross-modal hashing methods. Our code is publicly available at https://git.tu-berlin.de/rsim/duch.
arXiv Detail & Related papers (2022-01-20T12:05:10Z)
Multi-Modal Mutual Information Maximization: A Novel Approach for Unsupervised Deep Cross-Modal Hashing [73.29587731448345]
We propose a novel method, dubbed Cross-Modal Info-Max Hashing (CMIMH). We learn informative representations that can preserve both intra- and inter-modal similarities. The proposed method consistently outperforms other state-of-the-art cross-modal retrieval methods.
arXiv Detail & Related papers (2021-12-13T08:58:03Z)
Adaptive Context-Aware Multi-Modal Network for Depth Completion [107.15344488719322]
We propose to adopt the graph propagation to capture the observed spatial contexts. We then apply the attention mechanism on the propagation, which encourages the network to model the contextual information adaptively. Finally, we introduce the symmetric gated fusion strategy to exploit the extracted multi-modal features effectively. Our model, named Adaptive Context-Aware Multi-Modal Network (ACMNet), achieves the state-of-the-art performance on two benchmarks.
arXiv Detail & Related papers (2020-08-25T06:00:06Z)
MetricUNet: Synergistic Image- and Voxel-Level Learning for Precise CT Prostate Segmentation via Online Sampling [66.01558025094333]
We propose a two-stage framework, with the first stage to quickly localize the prostate region and the second stage to precisely segment the prostate. We introduce a novel online metric learning module through voxel-wise sampling in the multi-task network. Our method can effectively learn more representative voxel-level features compared with the conventional learning methods with cross-entropy or Dice loss.
arXiv Detail & Related papers (2020-05-15T10:37:02Z)
Learning to Hash with Graph Neural Networks for Recommender Systems [103.82479899868191]
Graph representation learning has attracted much attention in supporting high quality candidate search at scale. Despite its effectiveness in learning embedding vectors for objects in the user-item interaction network, the computational costs to infer users' preferences in continuous embedding space are tremendous. We propose a simple yet effective discrete representation learning framework to jointly learn continuous and discrete codes.
arXiv Detail & Related papers (2020-03-04T06:59:56Z)

This list is automatically generated from the titles and abstracts of the papers in this site.