CLIP Multi-modal Hashing for Multimedia Retrieval
- URL: http://arxiv.org/abs/2410.07783v1
- Date: Thu, 10 Oct 2024 10:13:48 GMT
- Title: CLIP Multi-modal Hashing for Multimedia Retrieval
- Authors: Jian Zhu, Mingkai Sheng, Zhangmin Huang, Jingfei Chang, Jinling Jiang, Jian Long, Cheng Luo, Lei Liu,
- Abstract summary: We propose a novel CLIP Multi-modal Hashing (CLIPMH) method.
Our method employs the CLIP framework to extract both text and vision features and then fuses them to generate hash codes.
Compared with state-of-the-art unsupervised and supervised multi-modal hashing methods, experiments reveal that the proposed CLIPMH can significantly improve performance.
- Score: 7.2683522480676395
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Multi-modal hashing methods, which fuse multi-source data to generate binary hash codes, are widely used in multimedia retrieval. However, the individual backbone networks have limited feature expression capabilities and are not jointly pre-trained on large-scale unsupervised multi-modal data, which results in low retrieval accuracy. To address this issue, we propose a novel CLIP Multi-modal Hashing (CLIPMH) method. Our method employs the CLIP framework to extract both text and vision features and then fuses them to generate hash codes. Because the features of each modality are enhanced, our method greatly improves the retrieval performance of multi-modal hashing. Compared with state-of-the-art unsupervised and supervised multi-modal hashing methods, experiments reveal that the proposed CLIPMH can significantly improve performance (a maximum increase of 8.38% in mAP).
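The abstract describes the pipeline only at a high level: CLIP text and vision features are fused and then mapped to a binary hash code. The sketch below illustrates that data flow under explicit assumptions and is not the authors' method. The two embeddings are plain float lists (random stand-ins, not real CLIP outputs); fusion is assumed to be concatenation of L2-normalized vectors; and the learned hashing layer is replaced by fixed random Gaussian hyperplanes with a sign threshold (random-projection LSH). The names `fuse`, `random_hyperplanes`, and `hash_code` are hypothetical.

```python
import math
import random
from typing import List

Vector = List[float]


def l2_normalize(v: Vector) -> Vector:
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def fuse(image_feat: Vector, text_feat: Vector) -> Vector:
    """Assumed fusion: concatenate the L2-normalized image and text embeddings."""
    return l2_normalize(image_feat) + l2_normalize(text_feat)


def random_hyperplanes(in_dim: int, n_bits: int, seed: int = 0) -> List[Vector]:
    """Stand-in for a learned hashing layer: one random Gaussian hyperplane per bit."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(in_dim)] for _ in range(n_bits)]


def hash_code(fused: Vector, planes: List[Vector]) -> List[int]:
    """Bit k is 1 if the fused vector lies on the non-negative side of hyperplane k."""
    return [1 if sum(w * x for w, x in zip(p, fused)) >= 0 else 0 for p in planes]


if __name__ == "__main__":
    rng = random.Random(42)
    dim = 8  # toy size; for comparison, CLIP ViT-B/32 embeddings are 512-d
    image_feat = [rng.gauss(0.0, 1.0) for _ in range(dim)]  # stand-in for a CLIP image embedding
    text_feat = [rng.gauss(0.0, 1.0) for _ in range(dim)]   # stand-in for a CLIP text embedding
    planes = random_hyperplanes(2 * dim, n_bits=16)
    code = hash_code(fuse(image_feat, text_feat), planes)
    print(len(code), code)
```

A trained method would learn the fusion and projection instead of using random hyperplanes. The final step, thresholding real-valued outputs into bits, is typical of hashing methods at inference time.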
Related papers
- Multimodality Helps Few-Shot 3D Point Cloud Semantic Segmentation [61.91492500828508]
Few-shot 3D point cloud segmentation (FS-PCS) aims at generalizing models to segment novel categories with minimal support samples.
We introduce a cost-free multimodal FS-PCS setup, utilizing textual labels and the potentially available 2D image modality.
We propose a simple yet effective Test-time Adaptive Cross-modal Calibration (TACC) technique to mitigate training bias.
arXiv date: 2024-10-29T19:28:41Z
- Multi-modal Crowd Counting via a Broker Modality [64.5356816448361]
Multi-modal crowd counting involves estimating crowd density from both visual and thermal/depth images.
We propose a novel approach by introducing an auxiliary broker modality and frame the task as a triple-modal learning problem.
We devise a fusion-based method to generate this broker modality, leveraging a non-diffusion, lightweight counterpart of modern denoising diffusion-based fusion models.
arXiv date: 2024-07-10T10:13:11Z
- CLIP Multi-modal Hashing: A new baseline CLIPMH [4.057431980018267]
We propose a new baseline CLIP Multi-modal Hashing (CLIPMH) method.
It uses the CLIP model to extract text and image features and then fuses them to generate hash codes.
In comparison to state-of-the-art unsupervised and supervised multi-modal hashing methods, experiments reveal that the proposed CLIPMH can significantly enhance performance.
arXiv date: 2023-08-22T21:29:55Z
- Deep Metric Multi-View Hashing for Multimedia Retrieval [3.539519688102545]
We propose a novel deep metric multi-view hashing (DMMVH) method to address the mentioned problems.
On the MIR-Flickr25K, MS COCO, and NUS-WIDE datasets, our method outperforms the current state-of-the-art methods by a large margin.
arXiv date: 2023-04-13T09:25:35Z
- Asymmetric Scalable Cross-modal Hashing [51.309905690367835]
Cross-modal hashing is a successful approach to large-scale multimedia retrieval.
We propose a novel Asymmetric Scalable Cross-Modal Hashing (ASCMH) to address these issues.
Our ASCMH outperforms the state-of-the-art cross-modal hashing methods in terms of accuracy and efficiency.
arXiv date: 2022-07-26T04:38:47Z
- Online Enhanced Semantic Hashing: Towards Effective and Efficient Retrieval for Streaming Multi-Modal Data [21.157717777481572]
We propose a new model, termed Online enhAnced SemantIc haShing (OASIS).
We design a novel semantic-enhanced representation for data, which could help handle newly arriving classes.
Our method can outperform the state-of-the-art models.
arXiv date: 2021-09-09T13:30:31Z
- MOON: Multi-Hash Codes Joint Learning for Cross-Media Retrieval [30.77157852327981]
Cross-media hashing technique has attracted increasing attention for its high computation efficiency and low storage cost.
We develop a novel Multiple hash cOdes jOint learNing method (MOON) for cross-media retrieval.
arXiv date: 2021-08-17T14:47:47Z
- Unsupervised Deep Cross-modality Spectral Hashing [65.3842441716661]
The framework is a two-step hashing approach which decouples the optimization into binary optimization and hashing function learning.
We propose a novel spectral embedding-based algorithm to simultaneously learn single-modality and binary cross-modality representations.
We leverage a powerful CNN for images and propose a CNN-based deep architecture to learn text-modality representations.
arXiv date: 2020-08-01T09:20:11Z
- Creating Something from Nothing: Unsupervised Knowledge Distillation for Cross-Modal Hashing [132.22315429623575]
Cross-modal hashing (CMH) can map contents from different modalities, especially in vision and language, into the same space.
There are two main frameworks for CMH, differing from each other in whether semantic supervision is required.
In this paper, we propose a novel approach that enables guiding a supervised method using outputs produced by an unsupervised method.
arXiv date: 2020-04-01T08:32:15Z
- A Survey on Deep Hashing Methods [52.326472103233854]
Nearest neighbor search aims to retrieve the database samples with the smallest distances to the queries.
With the development of deep learning, deep hashing methods show more advantages than traditional methods.
Deep supervised hashing is categorized into pairwise methods, ranking-based methods, pointwise methods and quantization.
Deep unsupervised hashing is categorized into similarity reconstruction-based methods, pseudo-label-based methods and prediction-free self-supervised learning-based methods.
arXiv date: 2020-03-04T08:25:15Z
- Deep Multi-View Enhancement Hashing for Image Retrieval [40.974719473643724]
This paper proposes a supervised multi-view hash model which can enhance the multi-view information through neural networks.
The proposed method is systematically evaluated on the CIFAR-10, NUS-WIDE and MS-COCO datasets.
arXiv date: 2020-02-01T08:32:27Z
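The survey above frames hashing-based retrieval as nearest neighbor search, and the CLIPMH abstract reports its gains in mAP. The sketch below shows one common way to combine the two. Binary codes are packed into Python integers, the database is ranked exhaustively by Hamming distance (XOR, then count the set bits), and mAP@k is computed from the ranked lists. Several details are assumptions for illustration and may differ from any listed paper's protocol: the relevance rule (a database item is relevant if it shares at least one label with the query), the AP normalization (dividing by the number of relevant items found in the top k), and all function names.

```python
from typing import List, Sequence, Set


def pack_bits(bits: Sequence[int]) -> int:
    """Pack a {0,1} code into an int so XOR can compare all bits at once."""
    value = 0
    for i, b in enumerate(bits):
        if b:
            value |= 1 << i
    return value


def hamming_distance(a: int, b: int) -> int:
    """XOR marks the differing bits; counting the ones gives the Hamming distance."""
    return bin(a ^ b).count("1")


def hamming_rank(query: int, database: List[int]) -> List[int]:
    """Database indices sorted by Hamming distance to the query (ties broken by index)."""
    return sorted(range(len(database)),
                  key=lambda i: (hamming_distance(query, database[i]), i))


def average_precision(relevance: Sequence[int]) -> float:
    """AP of one ranked list, averaged over the relevant positions it contains."""
    hits, precision_sum = 0, 0.0
    for rank, rel in enumerate(relevance, start=1):
        if rel:
            hits += 1
            precision_sum += hits / rank  # precision at this relevant position
    return precision_sum / hits if hits else 0.0


def map_at_k(query_codes: List[int], query_labels: List[Set[str]],
             db_codes: List[int], db_labels: List[Set[str]], k: int) -> float:
    """mAP@k; a database item counts as relevant if it shares a label with the query."""
    aps = []
    for q_code, q_labels in zip(query_codes, query_labels):
        ranking = hamming_rank(q_code, db_codes)[:k]
        relevance = [1 if q_labels & db_labels[i] else 0 for i in ranking]
        aps.append(average_precision(relevance))
    return sum(aps) / len(aps) if aps else 0.0


if __name__ == "__main__":
    db_codes = [pack_bits(c) for c in ([1, 0, 1, 1], [0, 0, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1])]
    db_labels = [{"dog"}, {"car"}, {"dog", "grass"}, {"car"}]
    query = pack_bits([1, 0, 1, 1])
    print(hamming_rank(query, db_codes))  # → [0, 2, 1, 3]
    print(map_at_k([query], [{"dog"}], db_codes, db_labels, k=4))  # → 1.0
```

Exhaustive ranking is linear in the database size. The XOR-and-popcount comparison is what makes it cheap relative to comparing real-valued feature vectors.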
This list is automatically generated from the titles and abstracts of the papers on this site.