Related papers: How Effective Are Publicly Accessible Deepfake Detection Tools? A Comparative Evaluation of Open-Source and Free-to-Use Platforms

URL: http://arxiv.org/abs/2603.04456v1
Date: Mon, 02 Mar 2026 14:31:51 GMT
Title: How Effective Are Publicly Accessible Deepfake Detection Tools? A Comparative Evaluation of Open-Source and Free-to-Use Platforms
Authors: Michael Rettinger, Ben Beaumont, Nhien-An Le-Khac, Hong-Hanh Nguyen-Le
Abstract summary: Deepfake imagery poses escalating challenges for practitioners tasked with verifying digital media authenticity. This paper presents the first cross-paradigm evaluation of six tools, spanning two complementary detection approaches.
Score: 1.189955933770711
License: http://creativecommons.org/licenses/by-nc-sa/4.0/
Abstract: The proliferation of deepfake imagery poses escalating challenges for practitioners tasked with verifying digital media authenticity. While detection algorithm research is abundant, empirical evaluations of publicly accessible tools that practitioners actually use remain scarce. This paper presents the first cross-paradigm evaluation of six tools, spanning two complementary detection approaches: forensic analysis tools (InVID & WeVerify, FotoForensics, Forensically) and AI-based classifiers (DecopyAI, FaceOnLive, Bitmind). Both tool categories were evaluated by professional investigators with law enforcement experience using blinded protocols across datasets comprising authentic, tampered, and AI-generated images sourced from DF40, CelebDF, and CASIA-v2. We report three principal findings: forensic tools exhibit high recall but poor specificity, while AI classifiers demonstrate the inverse pattern; human evaluators substantially outperform all automated tools; and human-AI disagreement is asymmetric, with human judgment prevailing in the vast majority of discordant cases. We discuss implications for practitioner workflows and identify critical gaps in current detection capabilities.
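Illustration: the recall/specificity trade-off named in the abstract can be made concrete with a small sketch. The counts below are hypothetical and chosen only to show the two patterns; they are not results from the paper. The names ConfusionCounts, forensic_like, and classifier_like are likewise assumptions introduced for this example, with "positive" taken to mean "flagged as fake or tampered".

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Binary detection outcomes, where 'positive' means flagged as fake."""
    tp: int  # fake images correctly flagged as fake
    fn: int  # fake images missed (passed as authentic)
    tn: int  # authentic images correctly passed as authentic
    fp: int  # authentic images wrongly flagged as fake


def recall(c: ConfusionCounts) -> float:
    """Share of fake images that were caught: TP / (TP + FN)."""
    total = c.tp + c.fn
    return c.tp / total if total else float("nan")


def specificity(c: ConfusionCounts) -> float:
    """Share of authentic images that were cleared: TN / (TN + FP)."""
    total = c.tn + c.fp
    return c.tn / total if total else float("nan")


# Hypothetical counts, NOT taken from the paper.
# A tool that flags almost everything: high recall, poor specificity.
forensic_like = ConfusionCounts(tp=45, fn=5, tn=10, fp=40)
# A tool that rarely flags anything: poor recall, high specificity.
classifier_like = ConfusionCounts(tp=20, fn=30, tn=46, fp=4)

for name, counts in [("forensic-like", forensic_like),
                     ("classifier-like", classifier_like)]:
    print(f"{name}: recall={recall(counts):.2f}, "
          f"specificity={specificity(counts):.2f}")
```

Under these assumed counts, the first tool catches most fakes but also flags most authentic images, while the second clears most authentic images but misses many fakes. Reporting both metrics side by side keeps either failure mode from being hidden by a single accuracy figure.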

Related papers

Unveiling Perceptual Artifacts: A Fine-Grained Benchmark for Interpretable AI-Generated Image Detection [95.08316274158165]
X-AIGD provides pixel-level, categorized annotations of perceptual artifacts, spanning low-level distortions, high-level semantics, and cognitive-level counterfactuals. Existing AIGI detectors demonstrate negligible reliance on perceptual artifacts, even at the most basic distortion level. Explicitly aligning model attention with artifact regions can increase the interpretability and generalization of detectors.
arXiv Detail & Related papers (2026-01-27T10:09:17Z)
Code-in-the-Loop Forensics: Agentic Tool Use for Image Forgery Detection [59.04089915447622]
ForenAgent is an interactive IFD framework that enables MLLMs to autonomously generate, execute, and refine Python-based low-level tools around the detection objective. Inspired by human reasoning, we design a dynamic reasoning loop comprising global perception, local focusing, iterative probing, and holistic adjudication. Experiments show that ForenAgent exhibits emergent tool-use competence and reflective reasoning on challenging IFD tasks.
arXiv Detail & Related papers (2025-12-18T08:38:44Z)
Will AI also replace inspectors? Investigating the potential of generative AIs in usability inspection [0.0]
This study examines the performance of generative AIs in identifying usability problems, comparing them to those of experienced human inspectors. While inspectors achieved the highest levels of precision and overall coverage, the AIs demonstrated high individual performance and discovered many novel defects, but with a higher rate of false positives and redundant reports. These findings suggest that AI, in its current stage, cannot replace human inspectors but can serve as a valuable augmentation tool to improve efficiency and expand defect coverage.
arXiv Detail & Related papers (2025-10-19T23:59:15Z)
RAID: A Dataset for Testing the Adversarial Robustness of AI-Generated Image Detectors [57.81012948133832]
We present RAID (Robust evaluation of AI-generated image Detectors), a dataset of 72k diverse and highly transferable adversarial examples. Our methodology generates adversarial images that transfer with a high success rate to unseen detectors. Our findings indicate that current state-of-the-art AI-generated image detectors can be easily deceived by adversarial examples.
arXiv Detail & Related papers (2025-06-04T14:16:00Z)
Detecting Dataset Bias in Medical AI: A Generalized and Modality-Agnostic Auditing Framework [8.017827642932746]
Generalized Attribute Utility and Detectability-Induced bias Testing (G-AUDIT) for datasets is a modality-agnostic dataset auditing framework. Our method examines the relationship between task-level annotations and data properties including patient attributes. G-AUDIT successfully identifies subtle biases commonly overlooked by traditional qualitative methods.
arXiv Detail & Related papers (2025-03-13T02:16:48Z)
Adversarial Robustness of AI-Generated Image Detectors in the Real World [13.52355280061187]
We show that current state-of-the-art classifiers are vulnerable to adversarial examples under real-world conditions. Most attacks remain effective even when images are degraded during the upload to, e.g., social media platforms. In a case study, we demonstrate that these robustness challenges are also found in commercial tools by conducting black-box attacks on HIVE.
arXiv Detail & Related papers (2024-10-02T14:11:29Z)
UniForensics: Face Forgery Detection via General Facial Representation [60.5421627990707]
High-level semantic features are less susceptible to perturbations and not limited to forgery-specific artifacts, thus having stronger generalization. We introduce UniForensics, a novel deepfake detection framework that leverages a transformer-based video network, with a meta-functional face classification for enriched facial representation.
arXiv Detail & Related papers (2024-07-26T20:51:54Z)
Individualized Deepfake Detection Exploiting Traces Due to Double Neural-Network Operations [29.59765394512256]
This study focuses on the deepfake detection of facial images of individual public figures. We demonstrate that the detection performance can be improved by exploiting the idempotency property of neural networks. Experimental results show that the proposed method improves the area under the curve (AUC) from 0.92 to 0.94 and reduces its standard deviation by 17%.
arXiv Detail & Related papers (2023-12-13T10:21:00Z)
CrossDF: Improving Cross-Domain Deepfake Detection with Deep Information Decomposition [53.860796916196634]
We propose a Deep Information Decomposition (DID) framework to enhance the performance of Cross-dataset Deepfake Detection (CrossDF). Unlike most existing deepfake detection methods, our framework prioritizes high-level semantic features over specific visual artifacts. It adaptively decomposes facial features into deepfake-related and irrelevant information, only using the intrinsic deepfake-related information for real/fake discrimination.
arXiv Detail & Related papers (2023-09-30T12:30:25Z)
Improving Object Detection in Medical Image Analysis through Multiple Expert Annotators: An Empirical Investigation [0.3670422696827525]
The work discusses the use of machine learning algorithms for anomaly detection in medical image analysis. We introduce a simple and effective approach that aggregates annotations from multiple annotators with varying levels of expertise. We then aim to improve the efficiency of predictive models in abnormal detection tasks by estimating hidden labels from multiple annotations and using a re-weighted loss function to improve detection performance.
arXiv Detail & Related papers (2023-03-29T07:34:20Z)
Human-in-the-Loop Disinformation Detection: Stance, Sentiment, or Something Else? [93.91375268580806]
Both politics and pandemics have recently provided ample motivation for the development of machine learning-enabled disinformation (a.k.a. fake news) detection algorithms. Existing literature has focused primarily on the fully-automated case, but the resulting techniques cannot reliably detect disinformation on the varied topics, sources, and time scales required for military applications. By leveraging an already-available analyst as a human-in-the-loop, canonical machine learning techniques of sentiment analysis, aspect-based sentiment analysis, and stance detection become plausible methods to use for a partially-automated disinformation detection system.
arXiv Detail & Related papers (2021-11-09T13:30:34Z)

This list is automatically generated from the titles and abstracts of the papers on this site.