Consistent and Invariant Generalization Learning for Short-video Misinformation Detection
- URL: http://arxiv.org/abs/2507.04061v2
- Date: Wed, 06 Aug 2025 06:38:14 GMT
- Title: Consistent and Invariant Generalization Learning for Short-video Misinformation Detection
- Authors: Hanghui Guo, Weijie Shi, Mengze Li, Juncheng Li, Hao Chen, Yue Cui, Jiajie Xu, Jia Zhu, Jiawei Shen, Zhangze Chen, Sirui Han
- Abstract summary: Short-video misinformation detection has attracted wide attention in the multi-modal domain. Current models often exhibit unsatisfactory performance on unseen domains due to domain gaps. We propose a new DOmain generalization model via ConsisTency and invariance learning for shORt-video misinformation detection.
- Score: 10.402862106017965
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Short-video misinformation detection has attracted wide attention in the multi-modal domain, aiming to accurately identify misinformation in the video format accompanied by the corresponding audio. Despite significant advancements, current models in this field, trained on particular domains (source domains), often exhibit unsatisfactory performance on unseen domains (target domains) due to domain gaps. To effectively realize such domain generalization on the short-video misinformation detection task, we develop deep insights into the characteristics of different domains: (1) The detection on various domains may mainly rely on different modalities (i.e., mainly focusing on videos or audios). To enhance domain generalization, it is crucial to achieve optimal model performance on all modalities simultaneously. (2) For some domains focusing on cross-modal joint fraud, a comprehensive analysis relying on cross-modal fusion is necessary. However, domain biases located in each modality (especially in each frame of videos) will be accumulated in this fusion process, which may seriously damage the final identification of misinformation. To address these issues, we propose a new DOmain generalization model via ConsisTency and invariance learning for shORt-video misinformation detection (named DOCTOR), which contains two characteristic modules: (1) We involve cross-modal feature interpolation to map multiple modalities into a shared space and interpolation distillation to synchronize multi-modal learning; (2) We design a diffusion model that adds noise to retain the core features of each modality and enhances domain-invariant features through cross-modal guided denoising. Extensive experiments demonstrate the effectiveness of our proposed DOCTOR model. Our code is publicly available at https://github.com/ghh1125/DOCTOR.
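The first DOCTOR module, cross-modal feature interpolation into a shared space, can be illustrated with a minimal mixup-style sketch. The linear projections, feature dimensions, and Beta-distributed mixing coefficient below are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def project(x, W):
    """Linearly map a modality-specific feature into the shared space."""
    return x @ W

def cross_modal_interpolate(video_shared, audio_shared, lam):
    """Mixup-style interpolation between two modalities in the shared space."""
    return lam * video_shared + (1.0 - lam) * audio_shared

# Toy dimensions (assumed): video features 8-d, audio features 6-d, shared space 4-d.
W_video = rng.normal(size=(8, 4))
W_audio = rng.normal(size=(6, 4))

v = project(rng.normal(size=(2, 8)), W_video)  # batch of 2 video features
a = project(rng.normal(size=(2, 6)), W_audio)  # batch of 2 audio features

lam = float(rng.beta(0.4, 0.4))                # mixup coefficient
mixed = cross_modal_interpolate(v, a, lam)
```

In the paper's setting, the interpolated features would then feed an interpolation-distillation objective to keep the two modalities learning in sync; that training loop is omitted here.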
Related papers
- Detecting Misinformation in Multimedia Content through Cross-Modal Entity Consistency: A Dual Learning Approach [10.376378437321437]
We propose a Multimedia Misinformation Detection (MultiMD) framework for detecting misinformation from video content by leveraging cross-modal entity consistency.
Our results demonstrate that MultiMD outperforms state-of-the-art baseline models.
arXiv Detail & Related papers (2024-08-16T16:14:36Z) - POND: Multi-Source Time Series Domain Adaptation with Information-Aware Prompt Tuning [40.197245493051526]
Time series domain adaptation stands as a pivotal and intricate challenge with diverse applications.
We introduce PrOmpt-based domaiN Discrimination (POND), the first framework to utilize prompts for time series domain adaptation.
Our proposed POND model outperforms all state-of-the-art comparison methods by up to 66% on the F1-score.
arXiv Detail & Related papers (2023-12-19T15:57:37Z) - Unified Domain Adaptive Semantic Segmentation [96.74199626935294]
Unsupervised Domain Adaptive Semantic Segmentation (UDA-SS) aims to transfer the supervision from a labeled source domain to an unlabeled target domain. We propose a Quad-directional Mixup (QuadMix) method, characterized by tackling distinct point attributes and feature inconsistencies. Our method outperforms the state-of-the-art works by large margins on four challenging UDA-SS benchmarks.
arXiv Detail & Related papers (2023-11-22T09:18:49Z) - UOD: Universal One-shot Detection of Anatomical Landmarks [16.360644135635333]
We develop a domain-adaptive one-shot landmark detection framework for handling multi-domain medical images, named Universal One-shot Detection (UOD).
UOD consists of two stages and two corresponding universal models which are designed as combinations of domain-specific modules and domain-shared modules.
We investigate both qualitatively and quantitatively the proposed UOD on three widely-used public X-ray datasets in different anatomical domains.
arXiv Detail & Related papers (2023-06-13T08:19:14Z) - Causality-based Dual-Contrastive Learning Framework for Domain Generalization [16.81075442901155]
Domain Generalization (DG) is essentially a sub-branch of out-of-distribution generalization.
In this paper, we propose a Dual-Contrastive Learning (DCL) module on feature and prototype contrast.
We also introduce a Similarity-based Hard-pair Mining (SHM) strategy to leverage information on diversity shift.
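The Similarity-based Hard-pair Mining idea can be sketched as selecting, for each sample, the most similar sample carrying a different label. The cosine-similarity criterion and toy embeddings below are assumptions for illustration, not the paper's exact procedure.

```python
import numpy as np

def cosine_sim(X):
    """Pairwise cosine similarity between row vectors."""
    Xn = X / np.linalg.norm(X, axis=1, keepdims=True)
    return Xn @ Xn.T

def mine_hard_negatives(X, labels):
    """For each sample, pick the most similar sample with a DIFFERENT label."""
    S = cosine_sim(X)
    hard = []
    for i in range(len(X)):
        cand = np.where(labels != labels[i])[0]   # other-class candidates
        hard.append(cand[np.argmax(S[i, cand])])  # most similar = hardest
    return np.array(hard)

# Toy embeddings: two classes in 2-d.
X = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]])
labels = np.array([0, 0, 1, 1])
hard = mine_hard_negatives(X, labels)
```

The mined hard pairs would then feed a contrastive loss; here only the mining step is shown.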
arXiv Detail & Related papers (2023-01-22T13:07:24Z) - Attention Diversification for Domain Generalization [92.02038576148774]
Convolutional neural networks (CNNs) have demonstrated gratifying results at learning discriminative features.
When applied to unseen domains, state-of-the-art models are usually prone to errors due to domain shift.
We propose a novel Attention Diversification framework, in which Intra-Model and Inter-Model Attention Diversification Regularizations collaborate.
arXiv Detail & Related papers (2022-10-09T09:15:21Z) - Multi-Modal Cross-Domain Alignment Network for Video Moment Retrieval [55.122020263319634]
Video moment retrieval (VMR) aims to localize the target moment from an untrimmed video according to a given language query.
In this paper, we focus on a novel task: cross-domain VMR, where fully-annotated datasets are available in one domain but the domain of interest only contains unannotated datasets.
We propose a novel Multi-Modal Cross-Domain Alignment network to transfer the annotation knowledge from the source domain to the target domain.
arXiv Detail & Related papers (2022-09-23T12:58:20Z) - INDIGO: Intrinsic Multimodality for Domain Generalization [26.344372409315177]
We study how multimodal information can be leveraged in an "intrinsic" way to make systems generalize under unseen domains.
We propose IntriNsic multimodality for DomaIn GeneralizatiOn (INDIGO).
arXiv Detail & Related papers (2022-06-13T05:41:09Z) - Compound Domain Generalization via Meta-Knowledge Encoding [55.22920476224671]
We introduce Style-induced Domain-specific Normalization (SDNorm) to re-normalize the multi-modal underlying distributions.
We harness the prototype representations, the centroids of classes, to perform relational modeling in the embedding space.
Experiments on four standard Domain Generalization benchmarks reveal that COMEN exceeds the state-of-the-art performance without the need for domain supervision.
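Prototype representations as class centroids, as used above, can be sketched in a few lines. The nearest-prototype assignment and toy data are illustrative assumptions; the paper performs richer relational modeling on top of the centroids.

```python
import numpy as np

def class_prototypes(X, labels):
    """One prototype (centroid) per class in the embedding space."""
    classes = np.unique(labels)
    protos = np.stack([X[labels == c].mean(axis=0) for c in classes])
    return classes, protos

def assign_by_prototype(x, protos):
    """Index of the nearest prototype (Euclidean distance)."""
    return int(np.argmin(np.linalg.norm(protos - x, axis=1)))

# Toy embeddings: two well-separated classes.
X = np.array([[0.0, 0.0], [0.0, 2.0], [5.0, 5.0], [7.0, 5.0]])
labels = np.array([0, 0, 1, 1])
classes, protos = class_prototypes(X, labels)
```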
arXiv Detail & Related papers (2022-03-24T11:54:59Z) - Domain Generalization via Frequency-based Feature Disentanglement and Interaction [23.61154228837516]
Domain generalization aims at mining domain-irrelevant knowledge from multiple source domains.
We introduce (i) an encoder-decoder structure for high-frequency and low-frequency feature disentangling, (ii) an information interaction mechanism that ensures helpful knowledge from both parts can cooperate effectively.
The proposed method obtains state-of-the-art results on three widely used domain generalization benchmarks.
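The high-/low-frequency disentangling step can be sketched with an ideal low-pass filter in the Fourier domain. The hard radius threshold is an assumption for illustration; the paper learns the split with an encoder-decoder rather than a fixed filter.

```python
import numpy as np

def frequency_disentangle(img, radius):
    """Split an image into low- and high-frequency components using an
    ideal (hard-threshold) filter in the 2-D Fourier domain."""
    F = np.fft.fftshift(np.fft.fft2(img))
    h, w = img.shape
    yy, xx = np.ogrid[:h, :w]
    dist = np.hypot(yy - h // 2, xx - w // 2)  # distance from spectrum center
    low_mask = dist <= radius
    low = np.fft.ifft2(np.fft.ifftshift(F * low_mask)).real
    high = np.fft.ifft2(np.fft.ifftshift(F * ~low_mask)).real
    return low, high

rng = np.random.default_rng(1)
img = rng.normal(size=(16, 16))
low, high = frequency_disentangle(img, radius=4)
```

Because the two masks are complementary, the split is lossless: the low- and high-frequency parts sum back to the original image.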
arXiv Detail & Related papers (2022-01-20T07:42:12Z) - TAL: Two-stream Adaptive Learning for Generalizable Person Re-identification [115.31432027711202]
We argue that both domain-specific and domain-invariant features are crucial for improving the generalization ability of re-id models.
We propose two-stream adaptive learning (TAL) to simultaneously model these two kinds of information.
Our framework can be applied to both single-source and multi-source domain generalization tasks.
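A toy version of the two-stream idea: one branch models domain-specific features, another models domain-invariant features, and the two are blended into a single representation. The tanh branches and the blending weight `alpha` are illustrative assumptions, not the TAL architecture.

```python
import numpy as np

def two_stream(x, W_specific, W_invariant, alpha=0.5):
    """Blend a domain-specific and a domain-invariant branch."""
    spec = np.tanh(x @ W_specific)   # domain-specific stream
    inv = np.tanh(x @ W_invariant)   # domain-invariant stream
    return alpha * spec + (1.0 - alpha) * inv

rng = np.random.default_rng(2)
x = rng.normal(size=(3, 5))          # batch of 3 input features
W_s = rng.normal(size=(5, 4))
W_i = rng.normal(size=(5, 4))
out = two_stream(x, W_s, W_i, alpha=0.6)
```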
arXiv Detail & Related papers (2021-11-29T01:27:42Z) - Adversarial Bipartite Graph Learning for Video Domain Adaptation [50.68420708387015]
Domain adaptation techniques, which focus on adapting models between distributionally different domains, are rarely explored in the video recognition area.
Recent works on visual domain adaptation, which leverage adversarial learning to unify the source and target video representations, are not highly effective on videos.
This paper proposes an Adversarial Bipartite Graph (ABG) learning framework which directly models the source-target interactions.
arXiv Detail & Related papers (2020-07-31T03:48:41Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.