A Multimodal Deviation Perceiving Framework for Weakly-Supervised Temporal Forgery Localization
- URL: http://arxiv.org/abs/2507.16596v2
- Date: Mon, 04 Aug 2025 08:10:14 GMT
- Title: A Multimodal Deviation Perceiving Framework for Weakly-Supervised Temporal Forgery Localization
- Authors: Wenbo Xu, Junyan Wu, Wei Lu, Xiangyang Luo, Qian Wang
- Abstract summary: We present a framework for weakly-supervised temporal forgery localization. It aims to identify temporal partial forged segments using only video-level annotations. Extensive experiments demonstrate the effectiveness of the proposed framework.
- Score: 21.13433908232578
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Current research on Deepfake forensics often treats detection as a classification task or a temporal forgery localization problem, approaches that are usually restrictive, time-consuming, and hard to scale to large datasets. To resolve these issues, we present a multimodal deviation perceiving framework for weakly-supervised temporal forgery localization (MDP), which aims to identify temporally partial forged segments using only video-level annotations. MDP proposes a novel multimodal interaction mechanism (MI) and an extensible deviation perceiving loss to perceive multimodal deviation, achieving refined localization of the start and end timestamps of forged segments. Specifically, MI introduces a temporal-property-preserving cross-modal attention to measure the relevance between the visual and audio modalities in a probabilistic embedding space. It can identify inter-modality deviation and construct comprehensive video features for temporal forgery localization. To further exploit temporal deviation for weakly-supervised learning, an extensible deviation perceiving loss is proposed, aiming to enlarge the deviation between adjacent segments of forged samples and reduce that of genuine samples. Extensive experiments demonstrate the effectiveness of the proposed framework, which achieves results comparable to fully-supervised approaches on several evaluation metrics.
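A minimal sketch of what such a deviation perceiving loss could look like, assuming per-segment embeddings and a simple margin-based formulation; the paper's exact loss is not reproduced here, and the shapes, margin, and distance choice are illustrative assumptions:

```python
import torch


def deviation_perceiving_loss(features, is_forged, margin=1.0):
    """Hedged sketch of a deviation perceiving loss.

    features:  (B, T, D) per-segment embeddings of each video.
    is_forged: (B,) video-level labels (1 = contains forged segments).

    The sketch enlarges the feature deviation between adjacent temporal
    segments for forged videos (up to `margin`) and shrinks it toward
    zero for genuine videos, matching the stated weak-supervision idea.
    """
    # Deviation between each pair of adjacent temporal segments.
    dev = (features[:, 1:] - features[:, :-1]).norm(dim=-1)  # (B, T-1)
    mean_dev = dev.mean(dim=1)                               # (B,)

    forged = is_forged.float()
    # Forged: push mean deviation up toward at least `margin`.
    # Genuine: pull mean deviation down toward zero.
    loss = forged * torch.relu(margin - mean_dev) + (1.0 - forged) * mean_dev
    return loss.mean()
```

Only video-level labels enter the loss, which is what makes the formulation compatible with weakly-supervised training.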
Related papers
- Weakly Supervised Multimodal Temporal Forgery Localization via Multitask Learning [17.800327873103885]
Deepfake videos have caused a trust crisis and impaired social stability. We propose a novel weakly supervised multimodal temporal forgery localization method via multitask learning. Extensive experiments demonstrate the effectiveness of multitask learning for WS-MTFL.
arXiv Detail & Related papers (2025-08-04T08:22:39Z)
- Context-aware TFL: A Universal Context-aware Contrastive Learning Framework for Temporal Forgery Localization [60.73623588349311]
We propose a universal context-aware contrastive learning framework (UniCaCLF) for temporal forgery localization. Our approach leverages supervised contrastive learning to discover and identify forged instants by means of anomaly detection. An efficient context-aware contrastive coding is introduced to further push the limit of instant feature distinguishability between genuine and forged instants.
arXiv Detail & Related papers (2025-06-10T06:40:43Z) - Regularized Contrastive Partial Multi-view Outlier Detection [76.77036536484114]
We propose a novel method named Regularized Contrastive Partial Multi-view Outlier Detection (RCPMOD).
In this framework, we utilize contrastive learning to learn view-consistent information and distinguish outliers by the degree of consistency.
Experimental results on four benchmark datasets demonstrate that our proposed approach could outperform state-of-the-art competitors.
arXiv Detail & Related papers (2024-08-02T14:34:27Z)
- Coarse-to-Fine Proposal Refinement Framework for Audio Temporal Forgery Detection and Localization [60.899082019130766]
We introduce a frame-level detection network (FDN) and a proposal refinement network (PRN) for audio temporal forgery detection and localization.
FDN aims to mine informative inconsistency cues between real and fake frames to obtain discriminative features that are beneficial for roughly indicating forgery regions.
PRN is responsible for predicting confidence scores and regression offsets to refine the coarse-grained proposals derived from the FDN.
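The refinement step described above can be sketched as follows; the offset parameterization, confidence threshold, and array shapes are illustrative assumptions, not the paper's exact formulation:

```python
import numpy as np


def refine_proposals(proposals, offsets, scores, keep_thresh=0.5):
    """Hedged sketch of PRN-style proposal refinement.

    proposals: (N, 2) coarse [start, end] times from frame-level cues.
    offsets:   (N, 2) predicted regression offsets for start and end.
    scores:    (N,)   predicted confidence per proposal.

    The sketch simply adds the regressed offsets to the coarse
    boundaries and keeps proposals whose confidence clears a threshold.
    """
    refined = proposals + offsets
    # Clamp so start <= end even after regression moves the boundaries.
    refined[:, 0] = np.minimum(refined[:, 0], refined[:, 1])
    keep = scores >= keep_thresh
    return refined[keep], scores[keep]
```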
arXiv Detail & Related papers (2024-07-23T15:07:52Z)
- DIR-AS: Decoupling Individual Identification and Temporal Reasoning for Action Segmentation [84.78383981697377]
Fully supervised action segmentation works on frame-wise action recognition with dense annotations and often suffers from the over-segmentation issue.
We develop a novel local-global attention mechanism with temporal pyramid dilation and temporal pyramid pooling for efficient multi-scale attention.
We achieve state-of-the-art accuracy, e.g., 82.8% (+2.6%) on GTEA and 74.7% (+1.2%) on Breakfast, demonstrating the effectiveness of our proposed method.
arXiv Detail & Related papers (2023-04-04T20:27:18Z)
- Spatio-temporal predictive tasks for abnormal event detection in videos [60.02503434201552]
We propose new constrained pretext tasks to learn object level normality patterns.
Our approach consists in learning a mapping between down-scaled visual queries and their corresponding normal appearance and motion characteristics.
Experiments on several benchmark datasets demonstrate the effectiveness of our approach to localize and track anomalies.
arXiv Detail & Related papers (2022-10-27T19:45:12Z)
- An Unsupervised Short- and Long-Term Mask Representation for Multivariate Time Series Anomaly Detection [2.387411589813086]
This paper proposes an anomaly detection method based on unsupervised Short- and Long-term Mask Representation learning (SLMR).
Experiments show that the performance of our method outperforms other state-of-the-art models on three real-world datasets.
arXiv Detail & Related papers (2022-08-19T09:34:11Z)
- Semi-Supervised Temporal Action Detection with Proposal-Free Masking [134.26292288193298]
We propose a novel Semi-supervised Temporal action detection model based on PropOsal-free Temporal mask (SPOT).
SPOT outperforms state-of-the-art alternatives, often by a large margin.
arXiv Detail & Related papers (2022-07-14T16:58:47Z)
- Temporal Attention Unit: Towards Efficient Spatiotemporal Predictive Learning [42.22064610886404]
We present a general framework of predictive learning, in which the encoder and decoder capture intra-frame features and the middle temporal module catches inter-frame dependencies.
To parallelize the temporal module, we propose the Temporal Attention Unit (TAU), which decomposes temporal attention into intra-frame statical attention and inter-frame dynamical attention.
arXiv Detail & Related papers (2022-06-24T07:43:50Z)
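The TAU decomposition above can be sketched loosely; the layer choices here (a large-kernel depthwise convolution for the statical branch, a squeeze-and-excitation-style pooled gate for the dynamical branch) are illustrative assumptions, not the paper's exact implementation:

```python
import torch
import torch.nn as nn


class TAUSketch(nn.Module):
    """Hedged sketch of a TAU-style attention block.

    Input frames are assumed stacked on the batch axis: x is
    (B*T, C, H, W). The statical branch attends within each frame
    over spatial positions; the dynamical branch produces per-frame
    channel weights from globally pooled features.
    """

    def __init__(self, channels):
        super().__init__()
        # Intra-frame statical attention: large-kernel depthwise conv.
        self.statical = nn.Conv2d(channels, channels, kernel_size=7,
                                  padding=3, groups=channels)
        # Inter-frame dynamical attention: channel gate from pooled features.
        self.dynamical = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels, kernel_size=1),
            nn.Sigmoid(),
        )

    def forward(self, x):
        # Combine both attention maps multiplicatively with the input.
        return self.statical(x) * self.dynamical(x) * x
```

Because both branches are convolutional and gate-like rather than recurrent, all frames can be processed in parallel, which matches the stated motivation for TAU.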
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed papers (including all information) and is not responsible for any consequences.