Related papers: Counteracting temporal attacks in Video Copy Detection


URL: http://arxiv.org/abs/2501.11171v1
Date: Sun, 19 Jan 2025 21:16:39 GMT
Title: Counteracting temporal attacks in Video Copy Detection
Authors: Katarzyna Fojcik, Piotr Syga
Abstract summary: The META AI Challenge on video copy detection provided a benchmark for evaluating state-of-the-art methods. Our analysis reveals significant limitations in the Video Editing Detection (VED) component, particularly in its ability to handle exact copies. We propose an improved frame selection strategy based on local maxima of interframe differences.
Score: 1.0742675209112622
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Video Copy Detection (VCD) plays a crucial role in copyright protection and content verification by identifying duplicates and near-duplicates in large-scale video databases. The META AI Challenge on video copy detection provided a benchmark for evaluating state-of-the-art methods, with the Dual-level detection approach emerging as a winning solution. This method integrates Video Editing Detection (VED) and Frame Scene Detection (FSD) to handle adversarial transformations and large datasets efficiently. However, our analysis reveals significant limitations in the VED component, particularly in its ability to handle exact copies. Moreover, Dual-level detection is vulnerable to temporal attacks. To address this, we propose an improved frame selection strategy based on local maxima of interframe differences, which enhances robustness against adversarial temporal modifications while significantly reducing computational overhead. Our method is 1.4 to 5.8 times more efficient than the standard 1 FPS sampling approach. Compared with the Dual-level detection method, our approach maintains comparable micro-average precision ($\mu$AP) while also demonstrating improved robustness against temporal attacks. With a 56% smaller representation and more than 2 times faster inference, our approach is better suited to resource-constrained real-world settings.
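The abstract names the core idea (keep frames at local maxima of interframe differences) but not its details. The following is a minimal Python sketch under assumed choices that are not taken from the paper: frames are flattened grayscale images of equal size, the difference measure is the mean absolute pixel difference between consecutive frames, a peak is a value strictly greater than its immediate neighbours, `min_diff` is a hypothetical noise threshold, and the first frame is always kept. All function names are hypothetical.

```python
from typing import List, Sequence


def interframe_differences(frames: Sequence[Sequence[float]]) -> List[float]:
    """Mean absolute pixel difference between consecutive frames (assumed metric).

    diffs[i] compares frames[i] and frames[i + 1]; frames are assumed to be
    flattened grayscale images of equal length.
    """
    diffs = []
    for prev, curr in zip(frames, frames[1:]):
        total = sum(abs(a - b) for a, b in zip(prev, curr))
        diffs.append(total / len(curr))
    return diffs


def local_maxima(values: Sequence[float], min_value: float = 0.0) -> List[int]:
    """Indices whose value is strictly above both neighbours and above min_value.

    Endpoints are compared only against their single existing neighbour.
    Flat plateaus produce no peak under this strict test.
    """
    peaks = []
    n = len(values)
    for i, v in enumerate(values):
        left = values[i - 1] if i > 0 else float("-inf")
        right = values[i + 1] if i < n - 1 else float("-inf")
        if v > left and v > right and v > min_value:
            peaks.append(i)
    return peaks


def select_keyframes(frames: Sequence[Sequence[float]], min_diff: float = 0.0) -> List[int]:
    """Return indices of frames to keep (hypothetical selection rule).

    A peak at diffs[i] marks the largest local change between frames i and
    i + 1, so frame i + 1 (the first frame after the change) is kept.
    Frame 0 is always kept so that even a static clip yields one frame.
    """
    if not frames:
        return []
    diffs = interframe_differences(frames)
    return [0] + [i + 1 for i in local_maxima(diffs, min_diff)]


if __name__ == "__main__":
    # Toy clip: two abrupt content changes (after frame 1 and after frame 4).
    clip = [[0, 0], [0, 0], [10, 10], [10, 10], [10, 10], [50, 50], [52, 52]]
    print(select_keyframes(clip))  # [0, 2, 5]
```

On the toy clip, only the first frame and the frame right after each abrupt change are kept, whereas fixed-rate sampling would keep frames regardless of content. Because selection is driven by where content changes rather than by a fixed clock, static stretches contribute few frames, which is one plausible source of the reduced representation size reported in the abstract.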

Related papers

VideoPure: Diffusion-based Adversarial Purification for Video Recognition [21.317424798634086]
We propose the first diffusion-based video purification framework to improve video recognition models' adversarial robustness: VideoPure. We employ temporal DDIM inversion to transform the input distribution into a temporally consistent and trajectory-defined distribution, covering adversarial noise while preserving more video structure. We investigate the defense performance of our method against black-box, gray-box, and adaptive attacks on benchmark datasets and models.
arXiv Detail & Related papers (2025-01-25T00:24:51Z)
Practical Video Object Detection via Feature Selection and Aggregation [18.15061460125668]
Video object detection (VOD) must cope with high across-frame variation in object appearance and with diverse deterioration in some frames. Most contemporary aggregation methods are tailored to two-stage detectors and suffer from high computational costs. This study introduces a simple yet effective feature selection and aggregation strategy that gains significant accuracy at marginal computational expense.
arXiv Detail & Related papers (2024-07-29T02:12:11Z)
UniForensics: Face Forgery Detection via General Facial Representation [60.5421627990707]
High-level semantic features are less susceptible to perturbations and not limited to forgery-specific artifacts, thus having stronger generalization. We introduce UniForensics, a novel deepfake detection framework that leverages a transformer-based video network, with a meta-functional face classification for enriched facial representation.
arXiv Detail & Related papers (2024-07-26T20:51:54Z)
UVL2: A Unified Framework for Video Tampering Localization [0.0]
Malicious video tampering can lead to public misunderstanding, property losses, and legal disputes. This paper proposes an effective video tampering localization network that significantly improves the detection performance of video inpainting and splicing.
arXiv Detail & Related papers (2023-09-28T03:13:09Z)
A Dual-level Detection Method for Video Copy Detection [13.517933749704866]
Meta AI held the Video Similarity Challenge at CVPR 2023 to push the technology forward. We propose a dual-level detection method with Video Editing Detection (VED) and Frame Scenes Detection (FSD) to tackle the core challenges of Video Copy Detection.
arXiv Detail & Related papers (2023-05-21T06:19:08Z)
DOAD: Decoupled One Stage Action Detection Network [77.14883592642782]
Localizing people and recognizing their actions from videos is a challenging task towards high-level video understanding. Existing methods are mostly two-stage, with one stage for person bounding box generation and the other for action recognition. We present a decoupled one-stage network dubbed DOAD to improve the efficiency of spatio-temporal action detection.
arXiv Detail & Related papers (2023-04-01T08:06:43Z)
NSNet: Non-saliency Suppression Sampler for Efficient Video Recognition [89.84188594758588]
A novel Non-saliency Suppression Network (NSNet) is proposed to suppress the responses of non-salient frames. NSNet achieves the state-of-the-art accuracy-efficiency trade-off and presents a significantly faster (2.4 to 4.3 times) practical inference speed than state-of-the-art methods.
arXiv Detail & Related papers (2022-07-21T09:41:22Z)
Temporal Early Exits for Efficient Video Object Detection [1.1470070927586016]
We propose temporal early exits to reduce the computational complexity of per-frame video object detection. Our method significantly reduces the computational complexity and execution of per-frame video object detection by up to 34 times compared to existing methods.
arXiv Detail & Related papers (2021-06-21T15:49:46Z)
Semi-Supervised Action Recognition with Temporal Contrastive Learning [50.08957096801457]
We learn a two-pathway temporal contrastive model using unlabeled videos at two different speeds. We considerably outperform video extensions of sophisticated state-of-the-art semi-supervised image recognition methods.
arXiv Detail & Related papers (2021-02-04T17:28:35Z)
Robust Unsupervised Video Anomaly Detection by Multi-Path Frame Prediction [61.17654438176999]
We propose a novel and robust unsupervised video anomaly detection method by frame prediction with proper design. Our proposed method obtains the frame-level AUROC score of 88.3% on the CUHK Avenue dataset.
arXiv Detail & Related papers (2020-11-05T11:34:12Z)
Uncertainty-Aware Weakly Supervised Action Detection from Untrimmed Videos [82.02074241700728]
In this paper, we present a spatio-temporal action recognition model that is trained with only video-level labels. Our method leverages per-person detectors that have been trained on large image datasets within a Multiple Instance Learning framework. We show how we can apply our method in cases where the standard Multiple Instance Learning assumption, that each bag contains at least one instance with the specified label, is invalid.
arXiv Detail & Related papers (2020-07-21T10:45:05Z)
Multiple Instance-Based Video Anomaly Detection using Deep Temporal Encoding-Decoding [5.255783459833821]
We propose a weakly supervised deep temporal encoding-decoding solution for anomaly detection in surveillance videos. The proposed approach uses both abnormal and normal video clips during the training phase. The results show that the proposed method performs similar to or better than the state-of-the-art solutions for anomaly detection in video surveillance applications.
arXiv Detail & Related papers (2020-07-03T08:22:42Z)

This list is automatically generated from the titles and abstracts of the papers on this site.