The Deepfake Detective: Interpreting Neural Forensics Through Sparse Features and Manifolds
- URL: http://arxiv.org/abs/2512.21670v1
- Date: Thu, 25 Dec 2025 13:27:56 GMT
- Title: The Deepfake Detective: Interpreting Neural Forensics Through Sparse Features and Manifolds
- Authors: Subramanyam Sahoo, Jared Junkin,
- Abstract summary: We present a mechanistic interpretability framework for deepfake detection applied to a vision-language model.<n>Our approach combines a sparse autoencoder analysis of internal network representations with a novel forensic manifold analysis.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Deepfake detection models have achieved high accuracy in identifying synthetic media, but their decision processes remain largely opaque. In this paper we present a mechanistic interpretability framework for deepfake detection applied to a vision-language model. Our approach combines a sparse autoencoder (SAE) analysis of internal network representations with a novel forensic manifold analysis that probes how the model's features respond to controlled forensic artifact manipulations. We demonstrate that only a small fraction of latent features are actively used in each layer, and that the geometric properties of the model's feature manifold, including intrinsic dimensionality, curvature, and feature selectivity, vary systematically with different types of deepfake artifacts. These insights provide a first step toward opening the "black box" of deepfake detectors, allowing us to identify which learned features correspond to specific forensic artifacts and to guide the development of more interpretable and robust models.
Related papers
- Propose and Rectify: A Forensics-Driven MLLM Framework for Image Manipulation Localization [49.71303998618939]
This paper presents a novel Propose-Rectify framework that bridges semantic reasoning with forensic-specific analysis.<n>Our framework ensures that initial semantic proposals are systematically validated and enhanced through concrete technical evidence, resulting in comprehensive detection accuracy and localization precision.
arXiv Detail & Related papers (2025-08-25T12:43:53Z) - FAME: A Lightweight Spatio-Temporal Network for Model Attribution of Face-Swap Deepfakes [9.462613446025001]
Face-fake Deepfake videos pose growing risks to digital security, privacy, and media integrity.<n>FAME is a framework designed to capture subtle artifacts specific to different face-generative models.<n>Results show that FAME consistently outperforms existing methods in both accuracy and runtime.
arXiv Detail & Related papers (2025-06-13T05:47:09Z) - Navigating the Latent Space Dynamics of Neural Models [45.503229142378814]
We present an alternative interpretation of neural models as dynamical systems acting on the latent manifold.<n>We show that autoencoder models implicitly define a latent vector field on the manifold, derived by iteratively applying the encoding-decoding map.<n>We propose to leverage the vector field as a representation for the network, providing a novel tool to analyze the properties of the model and the data.
arXiv Detail & Related papers (2025-05-28T18:57:41Z) - UniForensics: Face Forgery Detection via General Facial Representation [60.5421627990707]
High-level semantic features are less susceptible to perturbations and not limited to forgery-specific artifacts, thus having stronger generalization.
We introduce UniForensics, a novel deepfake detection framework that leverages a transformer-based video network, with a meta-functional face classification for enriched facial representation.
arXiv Detail & Related papers (2024-07-26T20:51:54Z) - Rethinking the Up-Sampling Operations in CNN-based Generative Network
for Generalizable Deepfake Detection [86.97062579515833]
We introduce the concept of Neighboring Pixel Relationships(NPR) as a means to capture and characterize the generalized structural artifacts stemming from up-sampling operations.
A comprehensive analysis is conducted on an open-world dataset, comprising samples generated by tft28 distinct generative models.
This analysis culminates in the establishment of a novel state-of-the-art performance, showcasing a remarkable tft11.6% improvement over existing methods.
arXiv Detail & Related papers (2023-12-16T14:27:06Z) - DeepFidelity: Perceptual Forgery Fidelity Assessment for Deepfake
Detection [67.3143177137102]
Deepfake detection refers to detecting artificially generated or edited faces in images or videos.
We propose a novel Deepfake detection framework named DeepFidelity to adaptively distinguish real and fake faces.
arXiv Detail & Related papers (2023-12-07T07:19:45Z) - GazeForensics: DeepFake Detection via Gaze-guided Spatial Inconsistency
Learning [63.547321642941974]
We introduce GazeForensics, an innovative DeepFake detection method that utilizes gaze representation obtained from a 3D gaze estimation model.
Experiment results reveal that our proposed GazeForensics outperforms the current state-of-the-art methods.
arXiv Detail & Related papers (2023-11-13T04:48:33Z) - ContraFeat: Contrasting Deep Features for Semantic Discovery [102.4163768995288]
StyleGAN has shown strong potential for disentangled semantic control.
Existing semantic discovery methods on StyleGAN rely on manual selection of modified latent layers to obtain satisfactory manipulation results.
We propose a model that automates this process and achieves state-of-the-art semantic discovery performance.
arXiv Detail & Related papers (2022-12-14T15:22:13Z) - Dynamic texture analysis for detecting fake faces in video sequences [6.1356022122903235]
This work explores the analysis of texture-temporal dynamics of the video signal.
The goal is to characterizing and distinguishing real fake sequences.
We propose to build multiple binary decision on the joint analysis of temporal segments.
arXiv Detail & Related papers (2020-07-30T07:21:24Z) - Interpretable and Trustworthy Deepfake Detection via Dynamic Prototypes [20.358053429294458]
We propose a novel human-centered approach for detecting forgery in face images, using dynamic prototypes as a form of visual explanations.
Extensive experimental results show that DPNet achieves competitive predictive performance, even on unseen testing datasets.
arXiv Detail & Related papers (2020-06-28T00:25:34Z) - DeepFake Detection by Analyzing Convolutional Traces [0.0]
We focus on the analysis of Deepfakes of human faces with the objective of creating a new detection method.
The proposed technique, by means of an Expectation Maximization (EM) algorithm, extracts a set of local features specifically addressed to model the underlying convolutional generative process.
Results demonstrated the effectiveness of the technique in distinguishing the different architectures and the corresponding generation process.
arXiv Detail & Related papers (2020-04-22T09:02:55Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.