Deep Metric Multi-View Hashing for Multimedia Retrieval
- URL: http://arxiv.org/abs/2304.06358v1
- Date: Thu, 13 Apr 2023 09:25:35 GMT
- Title: Deep Metric Multi-View Hashing for Multimedia Retrieval
- Authors: Jian Zhu, Zhangmin Huang, Xiaohu Ruan, Yu Cui, Yongli Cheng, Lingfang Zeng
- Abstract summary: We propose a novel deep metric multi-view hashing (DMMVH) method to address the mentioned problems.
On the MIR-Flickr25K, MS COCO, and NUS-WIDE datasets, our method outperforms the current state-of-the-art methods by a large margin.
- Score: 3.539519688102545
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Learning the hash representation of multi-view heterogeneous data is an
important task in multimedia retrieval. However, existing methods fail to
effectively fuse the multi-view features and do not utilize the metric
information provided by dissimilar samples, which limits retrieval precision.
Current methods fuse the multi-view features by weighted sum or concatenation.
We argue that these fusion methods cannot capture the interaction among
different views. Furthermore, these methods ignore the information provided by
dissimilar samples. We propose a novel deep metric multi-view hashing (DMMVH)
method to address these problems. Extensive empirical evidence shows that
gate-based fusion outperforms typical fusion methods such as weighted sum and
concatenation. We introduce deep metric learning to the multi-view hashing
problem, which makes it possible to utilize the metric information of
dissimilar samples. On the MIR-Flickr25K, MS COCO, and NUS-WIDE datasets, our
method outperforms the current state-of-the-art methods by a large margin (up
to a 15.28 mean Average Precision (mAP) improvement).
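The two ideas the abstract leans on, gate-based fusion and a metric loss that exploits dissimilar samples, can be made concrete with a short sketch. This is a minimal illustration under assumed shapes and loss form, not the authors' DMMVH implementation; the module, the 512-d views, the 64-bit codes, and the margin value are all assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GatedFusion(nn.Module):
    """Fuse two view embeddings with a learned gate rather than a
    weighted sum or concatenation (hypothetical sketch)."""
    def __init__(self, dim: int, n_bits: int):
        super().__init__()
        self.gate = nn.Linear(2 * dim, dim)      # gate conditioned on both views
        self.hash_layer = nn.Linear(dim, n_bits)

    def forward(self, img_feat, txt_feat):
        g = torch.sigmoid(self.gate(torch.cat([img_feat, txt_feat], dim=-1)))
        fused = g * img_feat + (1.0 - g) * txt_feat   # element-wise view interaction
        return torch.tanh(self.hash_layer(fused))     # relaxed binary codes in (-1, 1)

def metric_loss(codes, labels, margin=4.0):
    """Contrastive-style objective: pull similar pairs together and push
    dissimilar pairs at least `margin` apart (assumed form of the loss)."""
    dist = torch.cdist(codes, codes)                # pairwise Euclidean distances
    sim = (labels @ labels.t() > 0).float()         # pairs sharing any label count as similar
    off_diag = 1.0 - torch.eye(len(codes))          # ignore self-pairs
    pos = sim * dist.pow(2)                         # similar pairs: shrink distance
    neg = (1 - sim) * F.relu(margin - dist).pow(2)  # dissimilar pairs: enforce the margin
    return ((pos + neg) * off_diag).mean()

# Toy usage: 8 samples, two 512-d views, 64-bit codes, 10 possible labels.
fusion = GatedFusion(dim=512, n_bits=64)
img, txt = torch.randn(8, 512), torch.randn(8, 512)
labels = (torch.rand(8, 10) > 0.8).float()
metric_loss(fusion(img, txt), labels).backward()
```

The gate makes each view's contribution input-dependent, which is the interaction a fixed weighted sum cannot express; the negative term is what puts the metric information of dissimilar samples into the gradient.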
Related papers
- A Global Depth-Range-Free Multi-View Stereo Transformer Network with Pose Embedding [76.44979557843367]
We propose a novel multi-view stereo (MVS) framework that gets rid of the depth range prior.
We introduce a Multi-view Disparity Attention (MDA) module to aggregate long-range context information.
For each pixel, we explicitly estimate the matching quality of the sampled points along the epipolar line of the source image.
arXiv Detail & Related papers (2024-11-04T08:50:16Z)
- CLIP Multi-modal Hashing for Multimedia Retrieval [7.2683522480676395]
We propose a novel CLIP Multi-modal Hashing (CLIPMH) method.
Our method employs the CLIP framework to extract both text and vision features and then fuses them to generate hash codes.
Experiments reveal that CLIPMH significantly outperforms state-of-the-art unsupervised and supervised multi-modal hashing methods (a rough sketch of this pipeline follows the entry).
arXiv Detail & Related papers (2024-10-10T10:13:48Z)
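A rough sketch of the pipeline this entry describes: extract text and vision features with a pretrained CLIP model, then fuse them into a hash code. The checkpoint name, the concatenation-plus-linear fusion head, the 64-bit code length, and the file name are illustrative assumptions, not details taken from the CLIPMH paper.

```python
import torch
import torch.nn as nn
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

clip = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

# Hypothetical fusion head: concatenate the two CLIP embeddings and project to 64 bits.
hash_head = nn.Linear(2 * 512, 64)  # ViT-B/32 projection embeddings are 512-d

image = Image.open("example.jpg")   # placeholder file name
inputs = processor(text=["a photo of a dog"], images=image,
                   return_tensors="pt", padding=True)

with torch.no_grad():
    txt = clip.get_text_features(input_ids=inputs["input_ids"],
                                 attention_mask=inputs["attention_mask"])
    img = clip.get_image_features(pixel_values=inputs["pixel_values"])

code = torch.sign(hash_head(torch.cat([img, txt], dim=-1)))  # binary code in {-1, +1}
```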
- MMFakeBench: A Mixed-Source Multimodal Misinformation Detection Benchmark for LVLMs [47.353720361676004]
Multimodal misinformation detection methods often assume a single source and type of forgery for each sample.
The lack of a benchmark for mixed-source misinformation has hindered progress in this field.
We introduce MMFakeBench, the first comprehensive benchmark for mixed-source MMD.
arXiv Detail & Related papers (2024-06-13T03:04:28Z)
- Adaptive Confidence Multi-View Hashing for Multimedia Retrieval [23.018331993442285]
Multi-view hashing converts heterogeneous data from multiple views into binary hash codes.
To conduct the confidence learning and eliminate unnecessary noise, we propose a novel Adaptive Confidence Multi-View Hashing (ACMVH) method.
arXiv Detail & Related papers (2023-12-12T14:43:09Z)
- Convolutional autoencoder-based multimodal one-class classification [80.52334952912808]
One-class classification refers to approaches that learn from data of a single class only.
We propose a deep learning one-class classification method suitable for multimodal data (a reconstruction-error sketch of this idea follows the entry).
arXiv Detail & Related papers (2023-09-25T12:31:18Z)
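Autoencoder-based one-class classification is commonly reduced to a reconstruction-error test: train the autoencoder on the single available class, then flag inputs it reconstructs poorly. A generic sketch under that assumption; the architecture, input size, and threshold are illustrative and not taken from the paper.

```python
import torch
import torch.nn as nn

class ConvAutoencoder(nn.Module):
    """Tiny convolutional autoencoder for 1x28x28 inputs (illustrative sizes)."""
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(1, 16, 3, stride=2, padding=1), nn.ReLU(),   # 28 -> 14
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),  # 14 -> 7
        )
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(32, 16, 3, stride=2, padding=1, output_padding=1), nn.ReLU(),
            nn.ConvTranspose2d(16, 1, 3, stride=2, padding=1, output_padding=1), nn.Sigmoid(),
        )

    def forward(self, x):
        return self.decoder(self.encoder(x))

def is_inlier(model, x, threshold):
    """Large reconstruction error suggests a sample from outside the training class."""
    with torch.no_grad():
        err = ((model(x) - x) ** 2).mean(dim=(1, 2, 3))  # per-sample MSE
    return err < threshold

model = ConvAutoencoder()  # would be trained on data from one class only
print(is_inlier(model, torch.rand(4, 1, 28, 28), threshold=0.1))
```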
- Central Similarity Multi-View Hashing for Multimedia Retrieval [14.766486538338498]
We present a novel Central Similarity Multi-View Hashing (CSMVH) method to address the mentioned problems.
On the MS COCO and NUS-WIDE datasets, the proposed CSMVH performs better than the state-of-the-art methods by a large margin (a sketch of central-similarity learning with hash centers follows the entry).
arXiv Detail & Related papers (2023-08-26T05:43:29Z)
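Central similarity methods typically assign each class a fixed hash center and train codes toward their centers; rows of a Hadamard matrix are a common choice because they are mutually far apart in Hamming space. The sketch below follows that common recipe, not necessarily CSMVH's exact formulation.

```python
import torch
import torch.nn.functional as F
from scipy.linalg import hadamard

def hash_centers(num_classes: int, n_bits: int) -> torch.Tensor:
    """One center per class from the rows of a Hadamard matrix; orthogonal rows
    sit n_bits / 2 apart in Hamming distance. Requires num_classes <= n_bits."""
    H = torch.from_numpy(hadamard(n_bits)).float()  # entries in {-1, +1}
    return H[:num_classes]

def central_similarity_loss(codes, targets, centers):
    """Pull each relaxed code (tanh output in (-1, 1)) toward its class center,
    here with a bitwise binary cross-entropy (assumed form)."""
    c = centers[targets]                               # each sample's center
    return F.binary_cross_entropy((codes + 1) / 2, (c + 1) / 2)

centers = hash_centers(num_classes=10, n_bits=64)
codes = torch.tanh(torch.randn(8, 64, requires_grad=True))
loss = central_similarity_loss(codes, torch.randint(0, 10, (8,)), centers)
```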
- A Comparative Assessment of Multi-view fusion learning for Crop Classification [3.883984493622102]
This work assesses different fusion strategies for crop classification in the CropHarvest dataset.
We present a comparison of multi-view fusion methods for three different datasets and show that, depending on the test region, different methods obtain the best performance.
arXiv Detail & Related papers (2023-08-10T08:03:58Z)
- Multi-Modal Mutual Information Maximization: A Novel Approach for Unsupervised Deep Cross-Modal Hashing [73.29587731448345]
We propose a novel method, dubbed Cross-Modal Info-Max Hashing (CMIMH).
We learn informative representations that can preserve both intra- and inter-modal similarities.
The proposed method consistently outperforms other state-of-the-art cross-modal retrieval methods (an InfoNCE-style sketch of the mutual-information objective follows the entry).
arXiv Detail & Related papers (2021-12-13T08:58:03Z)
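Mutual-information maximization across modalities is often approximated with an InfoNCE-style contrastive bound: matched image-text pairs are positives and everything else in the batch serves as negatives. A generic sketch of that estimator; CMIMH's actual objective may differ.

```python
import torch
import torch.nn.functional as F

def info_nce(img, txt, temperature=0.07):
    """Symmetric InfoNCE over a batch of paired image/text embeddings.
    Minimizing it maximizes a lower bound on cross-modal mutual information."""
    img = F.normalize(img, dim=-1)
    txt = F.normalize(txt, dim=-1)
    logits = img @ txt.t() / temperature   # all-pairs cosine similarities
    targets = torch.arange(img.size(0))    # the i-th image matches the i-th text
    return 0.5 * (F.cross_entropy(logits, targets)
                  + F.cross_entropy(logits.t(), targets))

loss = info_nce(torch.randn(16, 128), torch.randn(16, 128))
```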
- Multimodal Object Detection via Bayesian Fusion [59.31437166291557]
We study multimodal object detection with RGB and thermal cameras, since the latter can provide much stronger object signatures under poor illumination.
Our key contribution is a non-learned late-fusion method that fuses together bounding box detections from different modalities.
We apply our approach to benchmarks containing both aligned (KAIST) and unaligned (FLIR) multimodal sensor data (a sketch of probabilistic score fusion follows the entry).
arXiv Detail & Related papers (2021-04-07T04:03:20Z)
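Score-level late fusion of overlapping RGB and thermal detections can be sketched with a naive-Bayes-style rule: if the two detectors are treated as conditionally independent, their class probabilities combine multiplicatively, i.e. their log-odds add. This shows the flavor of non-learned fusion only; it is not the paper's exact rule, and box matching across modalities is omitted.

```python
import math

def fuse_scores(p_rgb: float, p_thermal: float) -> float:
    """Combine two detectors' confidences for the same matched box under a
    conditional-independence assumption: log-odds simply add."""
    logit = lambda p: math.log(p / (1.0 - p))
    fused = logit(p_rgb) + logit(p_thermal)
    return 1.0 / (1.0 + math.exp(-fused))   # back to a probability

# Two moderately confident detections of the same object reinforce each other.
print(fuse_scores(0.7, 0.8))  # ~0.90, higher than either input score
```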
- Multi-Scale Positive Sample Refinement for Few-Shot Object Detection [61.60255654558682]
Few-shot object detection (FSOD) helps detectors adapt to unseen classes with few training instances.
We propose a Multi-scale Positive Sample Refinement (MPSR) approach to enrich object scales in FSOD.
MPSR generates multi-scale positive samples as object pyramids and refines the prediction at various scales.
arXiv Detail & Related papers (2020-07-18T09:48:29Z)
- Deep Multi-View Enhancement Hashing for Image Retrieval [40.974719473643724]
This paper proposes a supervised multi-view hashing model that enhances the multi-view information through neural networks.
The proposed method is systematically evaluated on the CIFAR-10, NUS-WIDE and MS-COCO datasets.
arXiv Detail & Related papers (2020-02-01T08:32:27Z)