Beyond Real versus Fake Towards Intent-Aware Video Analysis
- URL: http://arxiv.org/abs/2511.22455v1
- Date: Thu, 27 Nov 2025 13:44:06 GMT
- Title: Beyond Real versus Fake Towards Intent-Aware Video Analysis
- Authors: Saurabh Atreya, Nabyl Quignon, Baptiste Chopin, Abhijit Das, Antitza Dantcheva
- Abstract summary: We introduce IntentHQ: a new benchmark for human-centered intent analysis. IntentHQ consists of 5168 videos annotated with 23 fine-grained intent-categories. We perform intent recognition with supervised and self-supervised multi-modality models.
- Score: 9.47027153139396
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: The rapid advancement of generative models has led to increasingly realistic deepfake videos, posing significant societal and security risks. While existing detection methods focus on distinguishing real from fake videos, such approaches fail to address a fundamental question: What is the intent behind a manipulated video? Towards addressing this question, we introduce IntentHQ: a new benchmark for human-centered intent analysis, shifting the paradigm from authenticity verification to contextual understanding of videos. IntentHQ consists of 5168 videos that have been meticulously collected and annotated with 23 fine-grained intent-categories, including "Financial fraud", "Indirect marketing", "Political propaganda", as well as "Fear mongering". We perform intent recognition with supervised and self-supervised multi-modality models that integrate spatio-temporal video features, audio processing, and text analysis to infer underlying motivations and goals behind videos. Our proposed model is streamlined to differentiate between a wide range of intent-categories.
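The abstract describes multi-modality models that integrate spatio-temporal video features, audio processing, and text analysis to predict one of 23 intent categories. A minimal late-fusion sketch of that idea is shown below; the embedding dimensions, feature names, and the simple linear head are illustrative assumptions, not details from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical per-modality embeddings (dimensions are assumptions):
# spatio-temporal video features, audio features, and text features.
video_feat = rng.standard_normal(512)
audio_feat = rng.standard_normal(128)
text_feat = rng.standard_normal(256)

NUM_INTENTS = 23  # fine-grained intent categories in IntentHQ


def late_fusion_logits(feats, weight, bias):
    """Concatenate modality features, then apply a linear classifier head."""
    fused = np.concatenate(feats)
    return fused @ weight + bias


def softmax(z):
    z = z - z.max()  # numerical stability
    e = np.exp(z)
    return e / e.sum()


dim = 512 + 128 + 256
W = rng.standard_normal((dim, NUM_INTENTS)) * 0.01  # untrained, for shape only
b = np.zeros(NUM_INTENTS)

logits = late_fusion_logits([video_feat, audio_feat, text_feat], W, b)
probs = softmax(logits)
print(probs.shape)  # one probability per intent category
```

In practice each modality would be encoded by its own pretrained backbone before fusion; the sketch only illustrates the late-fusion classification step over the 23 intent categories.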
Related papers
- Video-BrowseComp: Benchmarking Agentic Video Research on Open Web [64.53060049124961]
Video-BrowseComp is a benchmark comprising 210 questions tailored for open-web agentic video reasoning. It enforces a mandatory dependency on temporal visual evidence, ensuring answers cannot be derived solely through text search. As the first open-web video research benchmark, Video-BrowseComp advances the field beyond passive perception toward proactive video reasoning.
arXiv Detail & Related papers (2025-12-28T19:08:27Z) - Video-CoM: Interactive Video Reasoning via Chain of Manipulations [78.64256470920166]
We introduce Interactive Video Reasoning, enabling models to "think with videos". Our model, Video-CoM, reasons through a Chain of Manipulations (CoM), performing iterative visual actions to gather and refine evidence. Video-CoM achieves strong results across nine video reasoning benchmarks, improving average performance by 3.6 percent over recent state-of-the-art models.
arXiv Detail & Related papers (2025-11-28T18:59:57Z) - Seeing Through Deception: Uncovering Misleading Creator Intent in Multimodal News with Vision-Language Models [65.23999399834638]
We introduce DeceptionDecoded, a benchmark of 12,000 image-caption pairs grounded in trustworthy reference articles. The dataset captures both misleading and non-misleading cases, spanning manipulations across visual and textual modalities. It supports three intent-centric tasks: misleading intent detection, misleading source attribution, and creator desire inference.
arXiv Detail & Related papers (2025-05-21T13:14:32Z) - How Far are AI-generated Videos from Simulating the 3D Visual World: A Learned 3D Evaluation Approach [46.85336335756483]
Learned 3D Evaluation (L3DE) is a method for assessing AI-generated videos' ability to simulate the real world in terms of 3D visual qualities and consistencies. Confidence scores quantify the gap between real and synthetic videos in terms of 3D visual coherence. L3DE extends to broader applications: benchmarking video generation models, serving as a deepfake detector, and enhancing video synthesis by inpainting flagged inconsistencies.
arXiv Detail & Related papers (2024-06-27T23:03:58Z) - Turns Out I'm Not Real: Towards Robust Detection of AI-Generated Videos [16.34393937800271]
Advances in generative models for creating high-quality videos have raised concerns about digital integrity and privacy vulnerabilities.
Recent work on combating deepfake videos has developed detectors that are highly accurate at identifying GAN-generated samples.
We propose a novel framework for detecting videos synthesized from multiple state-of-the-art (SOTA) generative models.
arXiv Detail & Related papers (2024-06-13T21:52:49Z) - Learning an Actionable Discrete Diffusion Policy via Large-Scale Actionless Video Pre-Training [69.54948297520612]
Learning a generalist embodied agent poses challenges, primarily stemming from the scarcity of action-labeled robotic datasets.
We introduce a novel framework to tackle these challenges, which leverages a unified discrete diffusion to combine generative pre-training on human videos and policy fine-tuning on a small number of action-labeled robot videos.
Our method generates high-fidelity future videos for planning and enhances the fine-tuned policies compared to previous state-of-the-art approaches.
arXiv Detail & Related papers (2024-02-22T09:48:47Z) - VGMShield: Mitigating Misuse of Video Generative Models [7.1819804607793705]
VGMShield is a set of straightforward but effective mitigations through the lifecycle of fake video generation. We start from fake video detection, trying to understand whether there is uniqueness in generated videos. Then, we investigate the fake video source tracing problem, which maps a fake video back to the model that generated it.
arXiv Detail & Related papers (2024-02-20T16:39:23Z) - Video Manipulations Beyond Faces: A Dataset with Human-Machine Analysis [60.13902294276283]
We present VideoSham, a dataset consisting of 826 videos (413 real and 413 manipulated).
Many of the existing deepfake datasets focus exclusively on two types of facial manipulations -- swapping with a different subject's face or altering the existing face.
Our analysis shows that state-of-the-art manipulation detection algorithms only work for a few specific attacks and do not scale well on VideoSham.
arXiv Detail & Related papers (2022-07-26T17:39:04Z) - Watch Those Words: Video Falsification Detection Using Word-Conditioned Facial Motion [82.06128362686445]
We propose a multi-modal semantic forensic approach to handle both cheapfakes and visually persuasive deepfakes.
We leverage the idea of attribution to learn person-specific biometric patterns that distinguish a given speaker from others.
Unlike existing person-specific approaches, our method is also effective against attacks that focus on lip manipulation.
arXiv Detail & Related papers (2021-12-21T01:57:04Z) - Detecting Deepfake Videos Using Euler Video Magnification [1.8506048493564673]
Deepfake videos are videos manipulated using advanced machine learning techniques.
In this paper, we examine a technique for possible identification of deepfake videos.
Our approach uses features extracted via the Euler video magnification technique to train three models that classify videos as counterfeit or unaltered.
arXiv Detail & Related papers (2021-01-27T17:37:23Z) - Cost Sensitive Optimization of Deepfake Detector [6.427063076424032]
We argue that the deepfake detection task should be viewed as a screening task, where the user will screen a large number of videos daily.
It is clear then that only a small fraction of the uploaded videos are deepfakes, so the detection performance needs to be measured in a cost-sensitive way.
arXiv Detail & Related papers (2020-12-08T04:06:02Z)
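The cost-sensitive screening argument above can be made concrete: with deepfakes forming only a small fraction of uploads, the expected per-video cost depends on prevalence, detector operating point, and asymmetric error costs. The sketch below illustrates this; the specific prevalence and cost values are illustrative assumptions, not numbers from the paper.

```python
def expected_cost(prevalence, tpr, fpr, c_fn, c_fp):
    """Expected per-video cost of a screening detector.

    prevalence -- fraction of uploads that are actually deepfakes
    tpr, fpr   -- detector true-positive and false-positive rates
    c_fn, c_fp -- cost of a missed fake vs. a wrongly flagged real video
    """
    miss = prevalence * (1 - tpr) * c_fn          # undetected deepfakes
    false_alarm = (1 - prevalence) * fpr * c_fp   # real videos flagged
    return miss + false_alarm


# Illustrative assumption: 1 fake per 1000 uploads, and missing a fake
# costs 100x as much as a false alarm.
cost = expected_cost(prevalence=0.001, tpr=0.95, fpr=0.01, c_fn=100.0, c_fp=1.0)
print(round(cost, 5))
```

Under these assumed numbers the false alarms on the overwhelmingly real traffic dominate the total cost, which is why accuracy measured on a balanced test set can be misleading for screening deployments.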
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this content (including all information) and is not responsible for any consequences.