DDNet: A Dual-Stream Graph Learning and Disentanglement Framework for Temporal Forgery Localization
- URL: http://arxiv.org/abs/2601.01784v1
- Date: Mon, 05 Jan 2026 04:35:39 GMT
- Title: DDNet: A Dual-Stream Graph Learning and Disentanglement Framework for Temporal Forgery Localization
- Authors: Boyang Zhao, Xin Liao, Jiaxin Chen, Xiaoshuai Wu, Yufeng Wu,
- Abstract summary: AIGC technology can mislead viewers by tampering with only small segments of a video. Temporal forgery localization (TFL) aims to precisely pinpoint tampered segments. We propose a dual-stream graph learning and disentanglement framework for temporal forgery localization (DDNet).
- Score: 28.183875836729484
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The rapid evolution of AIGC technology enables misleading viewers by tampering mere small segments within a video, rendering video-level detection inaccurate and unpersuasive. Consequently, temporal forgery localization (TFL), which aims to precisely pinpoint tampered segments, becomes critical. However, existing methods are often constrained by local view, failing to capture global anomalies. To address this, we propose a dual-stream graph learning and disentanglement framework for temporal forgery localization (DDNet). By coordinating a Temporal Distance Stream for local artifacts and a Semantic Content Stream for long-range connections, DDNet prevents global cues from being drowned out by local smoothness. Furthermore, we introduce Trace Disentanglement and Adaptation (TDA) to isolate generic forgery fingerprints, alongside Cross-Level Feature Embedding (CLFE) to construct a robust feature foundation via deep fusion of hierarchical features. Experiments on ForgeryNet and TVIL benchmarks demonstrate that our method outperforms state-of-the-art approaches by approximately 9% in AP@0.95, with significant improvements in cross-domain robustness.
Related papers
- Spatio-Temporal Token Pruning for Efficient High-Resolution GUI Agents [10.559617160878227]
GUIPruner is a training-free framework tailored for high-resolution GUI navigation.
It synergizes Temporal-temporal Resolution (TAR) and Stratified Structure-aware Pruning (SSP).
It consistently achieves state-of-the-art performance, effectively preventing the collapse observed in large-scale models under high-resolution compression.
arXiv Detail & Related papers (2026-02-26T17:12:40Z) - AdaSpot: Spend Resolution Where It Matters for Precise Event Spotting [59.31340724915079]
Event Spotting is a key task for applications in sports analytics, robotics, and autonomous systems.
AdaSpot achieves state-of-the-art performance under strict evaluation metrics.
arXiv Detail & Related papers (2026-02-25T16:24:48Z) - Deep Global Clustering for Hyperspectral Image Segmentation: Concepts, Applications, and Open Challenges [1.9116784879310027]
Hyperspectral imaging (HSI) analysis faces computational bottlenecks due to massive data volumes that exceed available memory.
This report presents Deep Global Clustering (DGC), a conceptual framework for memory-efficient HSI segmentation.
DGC operates on small patches with overlapping regions to enforce consistency, enabling training in under 30 minutes on consumer hardware.
arXiv Detail & Related papers (2025-12-30T12:10:43Z) - UAGLNet: Uncertainty-Aggregated Global-Local Fusion Network with Cooperative CNN-Transformer for Building Extraction [83.48950950780554]
Building extraction from remote sensing images is a challenging task due to the complex structure variations of buildings.
Existing methods employ convolutional or self-attention blocks to capture the multi-scale features in the segmentation models.
We present an Uncertainty-Aggregated Global-Local Fusion Network (UAGLNet) to exploit high-quality global-local visual semantics.
arXiv Detail & Related papers (2025-12-15T02:59:16Z) - Graph Unlearning Meets Influence-aware Negative Preference Optimization [45.33243345077153]
In this paper, we introduce INPO, an Influence-aware Negative Preference Optimization framework.
We first analyze that NPO has slower divergence speed and theoretically propose that unlearning high-influence edges can reduce impact of unlearning.
Experiments conducted on five real-world datasets demonstrate that INPO-based model achieves state-of-the-art performance on all forget quality metrics.
arXiv Detail & Related papers (2025-10-22T11:18:00Z) - SpecXNet: A Dual-Domain Convolutional Network for Robust Deepfake Detection [25.04992532067041]
We propose the Spectral Cross-Attentional Network (SpecXNet), a dual-domain architecture for robust deepfake detection.
Built atop a modified XceptionNet backbone, we embed the DDFC and DFA modules within a separable convolution block.
Our results highlight the effectiveness of unified spatial-spectral learning for robust and general deepfake detection.
arXiv Detail & Related papers (2025-09-26T08:51:59Z) - TD3Net: A temporal densely connected multi-dilated convolutional network for lipreading [5.768165707140847]
We propose TD3Net, a temporal densely connected multi-dilated convolutional network that combines dense skip connections and temporal convolutions as the backend architecture.
Experimental results on a word-level lipreading task using two large publicly available datasets, Lip Reading in the Wild (LRW) and LRW-1000, indicate that the proposed method achieves performance comparable to state-of-the-art methods.
arXiv Detail & Related papers (2025-06-19T06:55:03Z) - ELGC-Net: Efficient Local-Global Context Aggregation for Remote Sensing Change Detection [65.59969454655996]
We propose an efficient change detection framework, ELGC-Net, which leverages rich contextual information to precisely estimate change regions.
Our proposed ELGC-Net sets a new state-of-the-art performance in remote sensing change detection benchmarks.
We also introduce ELGC-Net-LW, a lighter variant with significantly reduced computational complexity, suitable for resource-constrained settings.
arXiv Detail & Related papers (2024-03-26T17:46:25Z) - FedSpeed: Larger Local Interval, Less Communication Round, and Higher Generalization Accuracy [84.45004766136663]
Federated learning is an emerging distributed machine learning framework.
It suffers from the non-vanishing biases introduced by the local inconsistent optimal and the rugged client-drifts by the local over-fitting.
We propose a novel and practical method, FedSpeed, to alleviate the negative impacts posed by these problems.
arXiv Detail & Related papers (2023-02-21T03:55:29Z) - Flattening-Net: Deep Regular 2D Representation for 3D Point Cloud Analysis [66.49788145564004]
We present an unsupervised deep neural architecture called Flattening-Net to represent irregular 3D point clouds of arbitrary geometry and topology.
Our methods perform favorably against the current state-of-the-art competitors.
arXiv Detail & Related papers (2022-12-17T15:05:25Z) - Temporal Transductive Inference for Few-Shot Video Object Segmentation [27.140141181513425]
Few-shot object segmentation (FS-VOS) aims at segmenting video frames using a few labelled examples of classes not seen during initial training.
Key to our approach is the use of both global and local temporal constraints.
Empirically, our model outperforms state-of-the-art meta-learning approaches in terms of mean intersection over union on YouTube-VIS by 2.8%.
arXiv Detail & Related papers (2022-03-27T14:08:30Z) - SOE-Net: A Self-Attention and Orientation Encoding Network for Point Cloud based Place Recognition [50.9889997200743]
We tackle the problem of place recognition from point cloud data with a self-attention and orientation encoding network (SOE-Net).
SOE-Net fully explores the relationship between points and incorporates long-range context into point-wise local descriptors.
Experiments on various benchmark datasets demonstrate superior performance of the proposed network over the current state-of-the-art approaches.
arXiv Detail & Related papers (2020-11-24T22:28:25Z) - Image Fine-grained Inpainting [89.17316318927621]
We present a one-stage model that utilizes dense combinations of dilated convolutions to obtain larger and more effective receptive fields.
To better train this efficient generator, except for frequently-used VGG feature matching loss, we design a novel self-guided regression loss.
We also employ a discriminator with local and global branches to ensure local-global contents consistency.
arXiv Detail & Related papers (2020-02-07T03:45:25Z)
This list is automatically generated from the titles and abstracts of the papers in this site.