Related papers: DDNet: A Dual-Stream Graph Learning and Disentanglement Framework for Temporal Forgery Localization

URL: http://arxiv.org/abs/2601.01784v1
Date: Mon, 05 Jan 2026 04:35:39 GMT
Title: DDNet: A Dual-Stream Graph Learning and Disentanglement Framework for Temporal Forgery Localization
Authors: Boyang Zhao, Xin Liao, Jiaxin Chen, Xiaoshuai Wu, Yufeng Wu
Abstract summary: AIGC technology makes it possible to mislead viewers by tampering with only small segments of a video. Temporal forgery localization (TFL) aims to precisely pinpoint tampered segments. We propose a dual-stream graph learning and disentanglement framework for temporal forgery localization (DDNet).
Score: 28.183875836729484
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: The rapid evolution of AIGC technology makes it possible to mislead viewers by tampering with only small segments of a video, rendering video-level detection inaccurate and unpersuasive. Consequently, temporal forgery localization (TFL), which aims to precisely pinpoint tampered segments, becomes critical. However, existing methods are often constrained by a local view, failing to capture global anomalies. To address this, we propose a dual-stream graph learning and disentanglement framework for temporal forgery localization (DDNet). By coordinating a Temporal Distance Stream for local artifacts and a Semantic Content Stream for long-range connections, DDNet prevents global cues from being drowned out by local smoothness. Furthermore, we introduce Trace Disentanglement and Adaptation (TDA) to isolate generic forgery fingerprints, alongside Cross-Level Feature Embedding (CLFE) to construct a robust feature foundation via deep fusion of hierarchical features. Experiments on the ForgeryNet and TVIL benchmarks demonstrate that our method outperforms state-of-the-art approaches by approximately 9% in AP@0.95, with significant improvements in cross-domain robustness.
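
For readers unfamiliar with the AP@0.95 figure quoted in the abstract: in temporal localization, a predicted segment usually counts as correct only when its temporal intersection-over-union (tIoU) with a ground-truth segment reaches the stated threshold, here 0.95. The following is a minimal sketch of that matching criterion only, not the paper's evaluation code; the function names and the (start, end) representation in seconds are assumptions made for illustration.

```python
def temporal_iou(pred, truth):
    """Temporal IoU of two (start, end) segments, e.g. in seconds."""
    # Overlap length; zero when the segments do not intersect.
    inter = max(0.0, min(pred[1], truth[1]) - max(pred[0], truth[0]))
    # Union = sum of both lengths minus the overlap counted twice.
    union = (pred[1] - pred[0]) + (truth[1] - truth[0]) - inter
    return inter / union if union > 0 else 0.0


def is_hit(pred, truth, threshold=0.95):
    """True when the prediction matches the ground truth at the given tIoU threshold."""
    return temporal_iou(pred, truth) >= threshold


# A 2.0 s prediction against a 2.2 s ground-truth segment falls short of 0.95,
# which shows how strict the threshold is for short tampered segments.
print(is_hit((2.0, 4.0), (2.0, 4.2)))
```

Average precision is then computed over the ranked predictions using this hit/miss decision; that ranking step is omitted here.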

Related papers

Spatio-Temporal Token Pruning for Efficient High-Resolution GUI Agents [10.559617160878227]
GUIPruner is a training-free framework tailored for high-resolution GUI navigation. It synergizes Temporal-temporal Resolution (TAR) and Stratified Structure-aware Pruning (SSP). It consistently achieves state-of-the-art performance, effectively preventing the collapse observed in large-scale models under high-resolution compression.
arXiv Detail & Related papers (2026-02-26T17:12:40Z)
AdaSpot: Spend Resolution Where It Matters for Precise Event Spotting [59.31340724915079]
Event Spotting is a key task for applications in sports analytics, robotics, and autonomous systems. AdaSpot achieves state-of-the-art performance under strict evaluation metrics.
arXiv Detail & Related papers (2026-02-25T16:24:48Z)
Deep Global Clustering for Hyperspectral Image Segmentation: Concepts, Applications, and Open Challenges [1.9116784879310027]
Hyperspectral imaging (HSI) analysis faces computational bottlenecks due to massive data volumes that exceed available memory. This report presents Deep Global Clustering (DGC), a conceptual framework for memory-efficient HSI segmentation. DGC operates on small patches with overlapping regions to enforce consistency, enabling training in under 30 minutes on consumer hardware.
arXiv Detail & Related papers (2025-12-30T12:10:43Z)
UAGLNet: Uncertainty-Aggregated Global-Local Fusion Network with Cooperative CNN-Transformer for Building Extraction [83.48950950780554]
Building extraction from remote sensing images is a challenging task due to the complex structure variations of buildings. Existing methods employ convolutional or self-attention blocks to capture the multi-scale features in the segmentation models. We present an Uncertainty-Aggregated Global-Local Fusion Network (UAGLNet) to exploit high-quality global-local visual semantics.
arXiv Detail & Related papers (2025-12-15T02:59:16Z)
Graph Unlearning Meets Influence-aware Negative Preference Optimization [45.33243345077153]
In this paper, we introduce INPO, an Influence-aware Negative Preference Optimization framework. We first show that NPO has a slower divergence speed and theoretically propose that unlearning high-influence edges can reduce the impact of unlearning. Experiments conducted on five real-world datasets demonstrate that the INPO-based model achieves state-of-the-art performance on all forget quality metrics.
arXiv Detail & Related papers (2025-10-22T11:18:00Z)
SpecXNet: A Dual-Domain Convolutional Network for Robust Deepfake Detection [25.04992532067041]
We propose the Spectral Cross-Attentional Network (SpecXNet), a dual-domain architecture for robust deepfake detection. Built atop a modified XceptionNet backbone, we embed the DDFC and DFA modules within a separable convolution block. Our results highlight the effectiveness of unified spatial-spectral learning for robust and general deepfake detection.
arXiv Detail & Related papers (2025-09-26T08:51:59Z)
TD3Net: A temporal densely connected multi-dilated convolutional network for lipreading [5.768165707140847]
We propose TD3Net, a temporal densely connected multi-dilated convolutional network that combines dense skip connections and temporal convolutions as the backend architecture. Experimental results on a word-level lipreading task using two large publicly available datasets, Lip Reading in the Wild (LRW) and LRW-1000, indicate that the proposed method achieves performance comparable to state-of-the-art methods.
arXiv Detail & Related papers (2025-06-19T06:55:03Z)
ELGC-Net: Efficient Local-Global Context Aggregation for Remote Sensing Change Detection [65.59969454655996]
We propose an efficient change detection framework, ELGC-Net, which leverages rich contextual information to precisely estimate change regions. Our proposed ELGC-Net sets a new state-of-the-art performance in remote sensing change detection benchmarks. We also introduce ELGC-Net-LW, a lighter variant with significantly reduced computational complexity, suitable for resource-constrained settings.
arXiv Detail & Related papers (2024-03-26T17:46:25Z)
FedSpeed: Larger Local Interval, Less Communication Round, and Higher Generalization Accuracy [84.45004766136663]
Federated learning is an emerging distributed machine learning framework. It suffers from the non-vanishing biases introduced by the local inconsistent optimal and the rugged client-drifts by the local over-fitting. We propose a novel and practical method, FedSpeed, to alleviate the negative impacts posed by these problems.
arXiv Detail & Related papers (2023-02-21T03:55:29Z)
Flattening-Net: Deep Regular 2D Representation for 3D Point Cloud Analysis [66.49788145564004]
We present an unsupervised deep neural architecture called Flattening-Net to represent irregular 3D point clouds of arbitrary geometry and topology. Our methods perform favorably against the current state-of-the-art competitors.
arXiv Detail & Related papers (2022-12-17T15:05:25Z)
Temporal Transductive Inference for Few-Shot Video Object Segmentation [27.140141181513425]
Few-shot video object segmentation (FS-VOS) aims at segmenting video frames using a few labelled examples of classes not seen during initial training. Key to our approach is the use of both global and local temporal constraints. Empirically, our model outperforms state-of-the-art meta-learning approaches in terms of mean intersection over union on YouTube-VIS by 2.8%.
arXiv Detail & Related papers (2022-03-27T14:08:30Z)
SOE-Net: A Self-Attention and Orientation Encoding Network for Point Cloud based Place Recognition [50.9889997200743]
We tackle the problem of place recognition from point cloud data with a self-attention and orientation encoding network (SOE-Net). SOE-Net fully explores the relationship between points and incorporates long-range context into point-wise local descriptors. Experiments on various benchmark datasets demonstrate superior performance of the proposed network over the current state-of-the-art approaches.
arXiv Detail & Related papers (2020-11-24T22:28:25Z)
Image Fine-grained Inpainting [89.17316318927621]
We present a one-stage model that utilizes dense combinations of dilated convolutions to obtain larger and more effective receptive fields. To better train this efficient generator, in addition to the frequently used VGG feature matching loss, we design a novel self-guided regression loss. We also employ a discriminator with local and global branches to ensure local-global content consistency.
arXiv Detail & Related papers (2020-02-07T03:45:25Z)

This list is automatically generated from the titles and abstracts of the papers on this site.