When Deepfakes Look Real: Detecting AI-Generated Faces with Unlabeled Data due to Annotation Challenges
- URL: http://arxiv.org/abs/2508.09022v2
- Date: Wed, 13 Aug 2025 03:21:16 GMT
- Title: When Deepfakes Look Real: Detecting AI-Generated Faces with Unlabeled Data due to Annotation Challenges
- Authors: Zhiqiang Yang, Renshuai Tao, Xiaolong Zheng, Guodong Yang, Chunjie Zhang
- Abstract summary: As AI-generated content becomes increasingly realistic, even human annotators struggle to distinguish between deepfakes and authentic images. There is a growing demand for approaches that can effectively utilize large-scale unlabeled data from online social networks. In this paper, we introduce the Dual-Path Guidance Network (DPGNet) to tackle two key challenges: (1) bridging the domain gap between faces from different generation models, and (2) utilizing unlabeled image samples.
- Score: 16.185033263537317
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Existing deepfake detection methods depend heavily on labeled training data. However, as AI-generated content becomes increasingly realistic, even human annotators struggle to distinguish between deepfakes and authentic images, which makes the labeling process both time-consuming and less reliable. Consequently, there is a growing demand for approaches that can effectively utilize large-scale unlabeled data from online social networks. Unlike typical unsupervised learning tasks, where categories are distinct, AI-generated faces closely mimic real image distributions and share strong similarities with them, causing performance drops in conventional strategies. In this paper, we introduce the Dual-Path Guidance Network (DPGNet) to tackle two key challenges: (1) bridging the domain gap between faces from different generation models, and (2) utilizing unlabeled image samples. The method features two core modules: text-guided cross-domain alignment, which uses learnable prompts to unify visual and textual embeddings into a domain-invariant feature space, and curriculum-driven pseudo-label generation, which dynamically exploits the more informative unlabeled samples. To prevent catastrophic forgetting, we also bridge domains via cross-domain knowledge distillation. Extensive experiments on 11 popular datasets show that DPGNet outperforms SoTA approaches by 6.3%, highlighting its effectiveness in leveraging unlabeled data to address the annotation challenges posed by the increasing realism of deepfakes.
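The abstract does not include code, but the curriculum idea behind its pseudo-label module can be sketched in a few lines. The function below is a generic, hypothetical illustration (names, thresholds, and the linear schedule are assumptions, not DPGNet's actual implementation): unlabeled predictions are admitted as pseudo-labels only when their confidence clears a threshold that starts strict and is relaxed as training progresses, so easier samples are exploited first.

```python
def curriculum_pseudo_labels(probs, epoch, max_epochs, tau_start=0.95, tau_end=0.70):
    """Select pseudo-labelled samples from unlabeled predictions.

    `probs` is a list of per-sample class-probability lists. The
    confidence threshold `tau` is relaxed linearly from `tau_start`
    to `tau_end` over training, so high-confidence (easy) samples
    enter the training pool first and harder ones are admitted later.
    Returns the current threshold and a list of (index, label) pairs.
    """
    frac = min(epoch / max_epochs, 1.0)
    tau = tau_start + (tau_end - tau_start) * frac
    selected = []
    for idx, p in enumerate(probs):
        conf = max(p)
        if conf >= tau:
            selected.append((idx, p.index(conf)))  # (sample index, pseudo-label)
    return tau, selected
```

With predictions `[[0.97, 0.03], [0.80, 0.20], [0.55, 0.45]]`, only the first sample qualifies at epoch 0 (tau = 0.95), while the second joins once the schedule has relaxed to tau = 0.70.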
Related papers
- SiamSeg: Self-Training with Contrastive Learning for Unsupervised Domain Adaptation Semantic Segmentation in Remote Sensing [13.549403813487022]
Unsupervised domain adaptation (UDA) enables models to learn from unlabeled target domain data while leveraging labeled source domain data. We propose integrating contrastive learning into UDA, enhancing the model's ability to capture semantic information in the target domain. Our method, SiamSeg, outperforms existing approaches, achieving state-of-the-art results.
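The snippet above mentions contrastive learning without detail; a minimal sketch of the standard InfoNCE-style objective such methods typically build on may help (this is a generic formulation in pure Python, not SiamSeg's actual loss, and all names here are hypothetical):

```python
import math

def info_nce_loss(anchor, positive, negatives, tau=0.1):
    """InfoNCE contrastive loss for one anchor.

    Pulls the anchor toward its positive (e.g. another augmented view
    of the same region) and pushes it away from negatives, using
    cosine similarity with temperature `tau`.
    """
    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))

    def normalize(v):
        n = math.sqrt(dot(v, v))
        return [x / n for x in v]

    a = normalize(anchor)
    sims = [dot(a, normalize(positive))] + [dot(a, normalize(n)) for n in negatives]
    exps = [math.exp(s / tau) for s in sims]
    # Cross-entropy with the positive pair as the "correct class".
    return -math.log(exps[0] / sum(exps))
```

When the anchor matches its positive and the negatives are orthogonal, the loss is near zero; swapping the positive and a negative makes it large, which is the gradient signal that aligns the two augmented views.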
arXiv Detail & Related papers (2024-10-17T11:59:39Z) - Towards General Visual-Linguistic Face Forgery Detection [95.73987327101143]
Deepfakes are realistic face manipulations that can pose serious threats to security, privacy, and trust.
Existing methods mostly treat this task as binary classification, which uses digital labels or mask signals to train the detection model.
We propose a novel paradigm named Visual-Linguistic Face Forgery Detection (VLFFD), which uses fine-grained sentence-level prompts as the annotation.
arXiv Detail & Related papers (2023-07-31T10:22:33Z) - Dual-level Interaction for Domain Adaptive Semantic Segmentation [0.0]
We propose a dual-level interaction for domain adaptation (DIDA) in semantic segmentation.
Explicitly, we encourage the different augmented views of the same pixel to have similar class prediction.
Our method outperforms the state-of-the-art by a notable margin, especially on confusing and long-tailed classes.
arXiv Detail & Related papers (2023-07-16T07:51:18Z) - Adaptive Face Recognition Using Adversarial Information Network [57.29464116557734]
Face recognition models often degenerate when training data are different from testing data.
We propose a novel adversarial information network (AIN) to address it.
arXiv Detail & Related papers (2023-05-23T02:14:11Z) - Cluster-level pseudo-labelling for source-free cross-domain facial expression recognition [94.56304526014875]
We propose the first Source-Free Unsupervised Domain Adaptation (SFUDA) method for Facial Expression Recognition (FER).
Our method exploits self-supervised pretraining to learn good feature representations from the target data.
We validate the effectiveness of our method in four adaptation setups, proving that it consistently outperforms existing SFUDA methods when applied to FER.
arXiv Detail & Related papers (2022-10-11T08:24:50Z) - Feature Representation Learning for Unsupervised Cross-domain Image Retrieval [73.3152060987961]
Current supervised cross-domain image retrieval methods can achieve excellent performance.
The cost of data collection and labeling imposes an intractable barrier to practical deployment in real applications.
We introduce a new cluster-wise contrastive learning mechanism to help extract class semantic-aware features.
arXiv Detail & Related papers (2022-07-20T07:52:14Z) - Unsupervised Domain Adaptive Salient Object Detection Through Uncertainty-Aware Pseudo-Label Learning [104.00026716576546]
We propose to learn saliency from synthetic but clean labels, which naturally has higher pixel-labeling quality without the effort of manual annotations.
We show that our proposed method outperforms the existing state-of-the-art deep unsupervised SOD methods on several benchmark datasets.
arXiv Detail & Related papers (2022-02-26T16:03:55Z) - Semantic Segmentation with Generative Models: Semi-Supervised Learning and Strong Out-of-Domain Generalization [112.68171734288237]
We propose a novel framework for discriminative pixel-level tasks using a generative model of both images and labels.
We learn a generative adversarial network that captures the joint image-label distribution and is trained efficiently using a large set of unlabeled images.
We demonstrate strong in-domain performance compared to several baselines, and are the first to showcase extreme out-of-domain generalization.
arXiv Detail & Related papers (2021-04-12T21:41:25Z) - SSKD: Self-Supervised Knowledge Distillation for Cross Domain Adaptive Person Re-Identification [25.96221714337815]
Domain adaptive person re-identification (re-ID) is a challenging task due to the large discrepancy between the source domain and the target domain.
Existing methods mainly attempt to generate pseudo labels for unlabeled target images by clustering algorithms.
We propose a Self-Supervised Knowledge Distillation (SSKD) technique containing two modules, the identity learning and the soft label learning.
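The SSKD snippet names a soft label learning module without elaborating. As a rough, generic sketch of the soft-label (knowledge-distillation) loss such modules are typically built on (function and parameter names here are hypothetical, not SSKD's actual API):

```python
import math

def softmax(logits, temperature=1.0):
    """Numerically stable temperature-scaled softmax."""
    m = max(logits)
    exps = [math.exp((z - m) / temperature) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def soft_label_distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL divergence from the teacher's softened distribution to the
    student's, scaled by T^2 as in standard knowledge distillation,
    so soft labels carry inter-class similarity information."""
    p = softmax(teacher_logits, temperature)  # teacher soft labels
    q = softmax(student_logits, temperature)  # student predictions
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))
    return temperature ** 2 * kl
```

A student that matches the teacher's logits incurs zero loss; the further its distribution drifts from the teacher's soft labels, the larger the penalty.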
arXiv Detail & Related papers (2020-07-21T19:49:06Z) - Exploiting Temporal Coherence for Self-Supervised One-shot Video Re-identification [44.9767103065442]
One-shot re-identification is a potential candidate for reducing the labeling effort that fully supervised re-identification requires.
Current one-shot re-identification methods function by modeling the inter-relationships amongst the labeled and the unlabeled data.
We propose a new framework named Temporal Consistency Progressive Learning, which uses temporal coherence as a novel self-supervised auxiliary task.
arXiv Detail & Related papers (2020-07-21T19:49:06Z) - Text Recognition in Real Scenarios with a Few Labeled Samples [55.07859517380136]
Scene text recognition (STR) is still a hot research topic in the computer vision field.
This paper proposes a few-shot adversarial sequence domain adaptation (FASDA) approach to build sequence adaptation.
Our approach can maximize the character-level confusion between the source domain and the target domain.
arXiv Detail & Related papers (2020-06-22T13:03:01Z) - Extreme Consistency: Overcoming Annotation Scarcity and Domain Shifts [2.707399740070757]
Supervised learning has proved effective for medical image analysis.
However, it can utilize only the small labeled portion of the data and fails to leverage the large amounts of unlabeled data that are often available in medical image datasets.
arXiv Detail & Related papers (2020-01-30T11:45:44Z) - Person Re-identification: Implicitly Defining the Receptive Fields of Deep Learning Classification Frameworks [5.123298347655088]
This paper describes a solution for implicitly driving the inference of the networks' receptive fields.
We use a segmentation module to distinguish between the foreground (important)/background (irrelevant) parts of each learning instance.
This strategy typically drives the networks to early convergence and appropriate solutions, where the identity and descriptions are not correlated.
arXiv Detail & Related papers (2020-01-30T11:45:44Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.