Adaptive Confidence Multi-View Hashing for Multimedia Retrieval
- URL: http://arxiv.org/abs/2312.07327v2
- Date: Tue, 16 Jan 2024 08:40:21 GMT
- Title: Adaptive Confidence Multi-View Hashing for Multimedia Retrieval
- Authors: Jian Zhu, Yu Cui, Zhangmin Huang, Xingyu Li, Lei Liu, Lingfang Zeng,
Li-Rong Dai
- Abstract summary: The multi-view hash method converts heterogeneous data from multiple views into binary hash codes.
To conduct confidence learning and eliminate unnecessary noise, we propose a novel Adaptive Confidence Multi-View Hashing (ACMVH) method.
- Score: 23.018331993442285
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The multi-view hash method converts heterogeneous data from multiple
views into binary hash codes and is one of the critical technologies in
multimedia retrieval. However, current methods mainly explore the
complementarity among multiple views while lacking confidence learning and
fusion. Moreover, in practical application scenarios, single-view data contain
redundant noise. To conduct confidence learning and eliminate unnecessary
noise, we propose a novel Adaptive Confidence Multi-View Hashing (ACMVH)
method. First, a confidence network is developed to extract useful information
from each single-view feature and remove noise. Furthermore, an adaptive
confidence multi-view network is employed to measure the confidence of each
view and then fuse the multi-view features through a weighted summation.
Lastly, a dilation network is designed to further enhance the representation of
the fused features. To the best of our knowledge, we pioneer the application of
confidence learning to the field of multimedia retrieval. Extensive experiments
on two public datasets show that the proposed ACMVH outperforms
state-of-the-art methods (by up to 3.24%). The source code is available at
https://github.com/HackerHyper/ACMVH.
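The abstract describes three stages: per-view confidence filtering, adaptive confidence-weighted fusion, and a dilation network over the fused feature. Below is a minimal PyTorch sketch of such a pipeline; the module names, layer sizes, and the exact gating and expansion layouts are illustrative assumptions, not the authors' implementation (see the linked repository for the official code).

```python
# Minimal sketch of an ACMVH-style pipeline, based only on the abstract.
# All architectural details here are assumptions for illustration.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ConfidenceGate(nn.Module):
    """Per-view confidence network: keep useful feature channels, damp noise."""
    def __init__(self, dim):
        super().__init__()
        self.gate = nn.Sequential(nn.Linear(dim, dim), nn.Sigmoid())

    def forward(self, x):
        return x * self.gate(x)  # element-wise gating of a single view

class ACMVHSketch(nn.Module):
    def __init__(self, view_dims, hidden=512, code_len=64):
        super().__init__()
        self.proj = nn.ModuleList(nn.Linear(d, hidden) for d in view_dims)
        self.gates = nn.ModuleList(ConfidenceGate(hidden) for _ in view_dims)
        self.score = nn.Linear(hidden, 1)           # per-view confidence score
        self.dilate = nn.Sequential(                # "dilation" = expand then project
            nn.Linear(hidden, 2 * hidden), nn.ReLU(),
            nn.Linear(2 * hidden, hidden))
        self.hash = nn.Linear(hidden, code_len)

    def forward(self, views):
        # views: list of (B, view_dims[i]) tensors, one per view
        feats = [g(F.relu(p(v))) for p, g, v in zip(self.proj, self.gates, views)]
        stacked = torch.stack(feats, dim=1)               # (B, V, hidden)
        conf = torch.softmax(self.score(stacked), dim=1)  # adaptive view weights
        fused = (conf * stacked).sum(dim=1)               # weighted summation
        fused = self.dilate(fused)                        # enhance fused features
        return torch.tanh(self.hash(fused))               # relaxed binary codes
```

At retrieval time the tanh outputs would typically be binarized with sign(.); the supervised training objectives used in the paper are omitted here.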
Related papers
- Detecting Misinformation in Multimedia Content through Cross-Modal Entity Consistency: A Dual Learning Approach [10.376378437321437]
We propose a Multimedia Misinformation Detection (MultiMD) framework for detecting misinformation in video content by leveraging cross-modal entity consistency.
Our results demonstrate that MultiMD outperforms state-of-the-art baseline models.
arXiv Detail & Related papers (2024-08-16T16:14:36Z)
- Contextual Cross-Modal Attention for Audio-Visual Deepfake Detection and Localization [3.9440964696313485]
In the digital age, the emergence of deepfakes and synthetic media presents a significant threat to societal and political integrity.
Deepfakes based on multi-modal manipulation, such as combined audio-visual forgeries, are more realistic and pose a greater threat.
We propose a novel multi-modal attention framework based on recurrent neural networks (RNNs) that leverages contextual information for audio-visual deepfake detection.
arXiv Detail & Related papers (2024-08-02T18:45:01Z)
- Towards General Visual-Linguistic Face Forgery Detection [95.73987327101143]
Deepfakes are realistic face manipulations that can pose serious threats to security, privacy, and trust.
Existing methods mostly treat this task as binary classification, which uses digital labels or mask signals to train the detection model.
We propose a novel paradigm named Visual-Linguistic Face Forgery Detection (VLFFD), which uses fine-grained sentence-level prompts as annotations.
arXiv Detail & Related papers (2023-07-31T10:22:33Z)
- Multi-View Class Incremental Learning [57.14644913531313]
Multi-view learning (MVL) has gained great success in integrating information from multiple perspectives of a dataset to improve downstream task performance.
This paper investigates a novel paradigm called multi-view class incremental learning (MVCIL), where a single model incrementally classifies new classes from a continual stream of views.
arXiv Detail & Related papers (2023-06-16T08:13:41Z)
- Deep Metric Multi-View Hashing for Multimedia Retrieval [3.539519688102545]
We propose a novel deep metric multi-view hashing (DMMVH) method to address the mentioned problems.
On the MIR-Flickr25K, MS COCO, and NUS-WIDE datasets, our method outperforms the current state-of-the-art methods by a large margin.
arXiv Detail & Related papers (2023-04-13T09:25:35Z)
- Exploring Graph-aware Multi-View Fusion for Rumor Detection on Social Media [23.231289922442414]
We propose a novel multi-view fusion framework for rumor representation learning and classification.
It encodes the multiple views with Graph Convolutional Networks (GCNs) and leverages Convolutional Neural Networks (CNNs) for fusion.
Experimental results on two public datasets demonstrate that our method outperforms state-of-the-art approaches.
arXiv Detail & Related papers (2022-11-08T13:27:43Z)
- Variational Distillation for Multi-View Learning [104.17551354374821]
We design several variational information bottlenecks to exploit two key characteristics for multi-view representation learning.
Under rigorous theoretical guarantees, our approach enables the information bottleneck (IB) to grasp the intrinsic correlation between observations and semantic labels.
arXiv Detail & Related papers (2022-06-20T03:09:46Z)
- Trusted Multi-View Classification with Dynamic Evidential Fusion [73.35990456162745]
We propose a novel multi-view classification algorithm, termed trusted multi-view classification (TMC).
TMC provides a new paradigm for multi-view learning by dynamically integrating different views at an evidence level (a minimal sketch of this kind of evidential combination appears after this list).
Both theoretical and experimental results validate the effectiveness of the proposed model in accuracy, robustness, and trustworthiness.
arXiv Detail & Related papers (2022-04-25T03:48:49Z)
- MDMMT-2: Multidomain Multimodal Transformer for Video Retrieval, One More Step Towards Generalization [65.09758931804478]
Three different data sources are combined: weakly-supervised videos, crowd-labeled text-image pairs and text-video pairs.
A careful analysis of available pre-trained networks helps to select the best ones as sources of prior knowledge.
arXiv Detail & Related papers (2022-03-14T13:15:09Z)
- Depth Guided Adaptive Meta-Fusion Network for Few-shot Video Recognition [86.31412529187243]
Few-shot video recognition aims at learning new actions with only very few labeled samples.
We propose a depth-guided Adaptive Meta-Fusion Network for few-shot video recognition, termed AMeFu-Net.
arXiv Detail & Related papers (2020-10-20T03:06:20Z)
- Deep Multi-View Enhancement Hashing for Image Retrieval [40.974719473643724]
This paper proposes a supervised multi-view hashing model that enhances multi-view information through neural networks.
The proposed method is systematically evaluated on the CIFAR-10, NUS-WIDE and MS-COCO datasets.
arXiv Detail & Related papers (2020-02-01T08:32:27Z)
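The TMC entry above describes fusing views "at an evidence level". As a rough illustration of what such evidential combination can look like, here is a hedged sketch in the spirit of TMC's Dirichlet-evidence formulation with a reduced Dempster combination rule; function names, shapes, and the epsilon guard are illustrative assumptions, not the paper's exact code.

```python
# Sketch of evidence-level fusion of two views, in the spirit of TMC
# (Trusted Multi-View Classification). Illustrative, not the authors' code.
import torch

def evidence_to_belief(evidence):
    """Map non-negative per-class evidence (B, K) to Dirichlet-style masses.

    alpha = evidence + 1; belief_k = evidence_k / S; uncertainty = K / S,
    where S = sum_k alpha_k. Beliefs plus uncertainty sum to 1.
    """
    alpha = evidence + 1.0
    S = alpha.sum(dim=1, keepdim=True)
    K = evidence.shape[1]
    return evidence / S, K / S  # (per-class belief, uncertainty mass)

def dempster_combine(b1, u1, b2, u2):
    """Combine beliefs b (B, K) and uncertainties u (B, 1) of two views."""
    # Conflict C: mass the two views assign to *different* classes.
    conflict = b1.sum(1, keepdim=True) * b2.sum(1, keepdim=True) \
               - (b1 * b2).sum(1, keepdim=True)
    scale = 1.0 / (1.0 - conflict).clamp_min(1e-8)  # guard against division by ~0
    b = scale * (b1 * b2 + b1 * u2 + b2 * u1)       # fused per-class beliefs
    u = scale * (u1 * u2)                           # fused uncertainty shrinks
    return b, u
```

More than two views would be fused by folding this combination over the views pairwise, which is what makes evidence-level integration dynamic: a confident view (low u) dominates the fused opinion, while an uncertain one contributes little.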