Empowering Morphing Attack Detection using Interpretable Image-Text Foundation Model
- URL: http://arxiv.org/abs/2508.10110v1
- Date: Wed, 13 Aug 2025 18:06:29 GMT
- Title: Empowering Morphing Attack Detection using Interpretable Image-Text Foundation Model
- Authors: Sushrut Patwardhan, Raghavendra Ramachandra, Sushma Venkatesh
- Abstract summary: We present a multimodal learning approach that can provide a textual description of morphing attack detection. We first show that zero-shot evaluation of the proposed framework can not only yield generalizable morphing attack detection but also predict the most relevant text snippet.
- Score: 3.013675405024281
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Morphing attack detection has become an essential component of face recognition systems for ensuring a reliable verification scenario. In this paper, we present a multimodal learning approach that can provide a textual description of morphing attack detection. We first show that zero-shot evaluation of the proposed framework using Contrastive Language-Image Pretraining (CLIP) can not only yield generalizable morphing attack detection but also predict the most relevant text snippet. We present an extensive analysis of ten different textual prompts, including both short and long prompts, engineered to be human-understandable textual snippets. Extensive experiments were performed on a face morphing dataset that was developed using a publicly available face biometric dataset. We present an evaluation of SOTA pre-trained neural networks together with the proposed framework in the zero-shot evaluation of five different morphing generation techniques that are captured in three different mediums.
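The zero-shot scoring step the abstract describes can be sketched as follows. This is a minimal numpy illustration of CLIP-style zero-shot classification, not the paper's actual pipeline: the embeddings below are toy stand-ins for the outputs of CLIP's image and text encoders, and the two prompts are hypothetical examples of the kind of textual snippets the paper engineers.

```python
import numpy as np

def cosine_scores(image_emb, text_embs):
    # L2-normalize both sides; the dot product is then cosine similarity.
    img = image_emb / np.linalg.norm(image_emb)
    txt = text_embs / np.linalg.norm(text_embs, axis=1, keepdims=True)
    return txt @ img

def zero_shot_predict(image_emb, text_embs, prompts, temperature=100.0):
    # CLIP-style zero-shot: scaled cosine similarities -> softmax over prompts.
    logits = temperature * cosine_scores(image_emb, text_embs)
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    return prompts[int(np.argmax(probs))], probs

# Hypothetical prompts; the paper evaluates ten variants of different lengths.
prompts = ["a bona fide face photograph",
           "a morphed face image combining two identities"]

# Toy embeddings standing in for CLIP encoder outputs.
text_embs = np.array([[1.0, 0.0, 0.0],
                      [0.0, 1.0, 0.0]])
image_emb = np.array([0.2, 0.9, 0.1])  # lies closer to the "morph" prompt

label, probs = zero_shot_predict(image_emb, text_embs, prompts)
```

With these toy vectors the image embedding aligns with the second prompt, so the predicted snippet is the "morphed face" description; in the real framework the same argmax simultaneously yields the detection decision and the most relevant textual explanation.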
Related papers
- Human Texts Are Outliers: Detecting LLM-generated Texts via Out-of-distribution Detection [71.59834293521074]
We develop a framework to distinguish between human-authored and machine-generated text. Our method achieves 98.3% AUROC and AUPR with only 8.9% FPR95 on the DeepFake dataset. Code, pretrained weights, and demo will be released.
arXiv Detail & Related papers (2025-10-07T08:14:45Z) - Generalized Single-Image-Based Morphing Attack Detection Using Deep Representations from Vision Transformer [13.21801650767302]
Face morphing attacks have posed severe threats to Face Recognition Systems (FRS), which operate in border control and passport issuance use cases. We propose a generalized single-image-based MAD (S-MAD) algorithm by learning the encoding from the Vision Transformer (ViT) architecture. Experiments are carried out on face morphing datasets generated using publicly available FRGC face datasets.
arXiv Detail & Related papers (2025-01-16T20:09:19Z) - Revisiting Tampered Scene Text Detection in the Era of Generative AI [33.38946428507517]
We present open-set tampered scene text detection, which evaluates forensics models on their ability to identify both seen and unseen forgery types. We introduce a novel and effective training paradigm that subtly alters the texture of selected texts within an image and trains the model to identify these regions. We also present DAF, a framework that improves open-set generalization by distinguishing between the features of authentic and tampered text.
arXiv Detail & Related papers (2024-07-31T08:17:23Z) - Text Grouping Adapter: Adapting Pre-trained Text Detector for Layout Analysis [52.34110239735265]
We present Text Grouping Adapter (TGA), a module that can enable the utilization of various pre-trained text detectors to learn layout analysis.
Our comprehensive experiments demonstrate that, even with frozen pre-trained models, incorporating our TGA into various pre-trained text detectors and text spotters can achieve superior layout analysis performance.
arXiv Detail & Related papers (2024-05-13T05:48:35Z) - Towards General Visual-Linguistic Face Forgery Detection [95.73987327101143]
Deepfakes are realistic face manipulations that can pose serious threats to security, privacy, and trust.
Existing methods mostly treat this task as binary classification, which uses digital labels or mask signals to train the detection model.
We propose a novel paradigm named Visual-Linguistic Face Forgery Detection (VLFFD), which uses fine-grained sentence-level prompts as the annotation.
arXiv Detail & Related papers (2023-07-31T10:22:33Z) - Holistic Visual-Textual Sentiment Analysis with Prior Models [64.48229009396186]
We propose a holistic method that achieves robust visual-textual sentiment analysis.
The proposed method consists of four parts: (1) a visual-textual branch to learn features directly from data for sentiment analysis, (2) a visual expert branch with a set of pre-trained "expert" encoders to extract selected semantic visual features, (3) a CLIP branch to implicitly model visual-textual correspondence, and (4) a multimodal feature fusion network based on BERT to fuse multimodal features and make sentiment predictions.
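The four-branch design above amounts to late fusion: each branch produces a feature vector, and a fusion head combines them for the final prediction. A minimal numpy sketch of that pattern, assuming made-up feature dimensions and a linear head in place of the paper's BERT-based fusion network:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy feature vectors standing in for each branch's output
# (dimensions are arbitrary assumptions, not the paper's).
visual_textual = rng.standard_normal(16)  # branch (1): learned from data
expert         = rng.standard_normal(8)   # branch (2): pre-trained "expert" encoders
clip_branch    = rng.standard_normal(12)  # branch (3): CLIP correspondence features

# Branch (4): fuse by concatenation, then a linear head over 3 sentiment classes
# (a stand-in for the BERT-based fusion network).
fused = np.concatenate([visual_textual, expert, clip_branch])
W = rng.standard_normal((3, fused.size))
logits = W @ fused
probs = np.exp(logits - logits.max())
probs /= probs.sum()
pred = int(np.argmax(probs))
```

The design choice here is that each branch can fail independently (e.g. CLIP features degrade on out-of-domain images) while the fusion head learns which sources to trust.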
arXiv Detail & Related papers (2022-11-23T14:40:51Z) - Deepfake Text Detection: Limitations and Opportunities [4.283184763765838]
We collect deepfake text from 4 online services powered by Transformer-based tools to evaluate the generalization ability of the defenses on content in the wild.
We develop several low-cost adversarial attacks, and investigate the robustness of existing defenses against an adaptive attacker.
Our evaluation shows that tapping into the semantic information in the text content is a promising approach for improving the robustness and generalization performance of deepfake text detection schemes.
arXiv Detail & Related papers (2022-10-17T20:40:14Z) - Vision-Language Pre-Training for Boosting Scene Text Detectors [57.08046351495244]
We specifically adapt vision-language joint learning for scene text detection.
We propose to learn contextualized, joint representations through vision-language pre-training.
The pre-trained model is able to produce more informative representations with richer semantics.
arXiv Detail & Related papers (2022-04-29T03:53:54Z) - Identifying Adversarial Attacks on Text Classifiers [32.958568467774704]
In this paper, we analyze adversarial text to determine which methods were used to create it.
Our first contribution is an extensive dataset for attack detection and labeling.
As our second contribution, we use this dataset to develop and benchmark a number of classifiers for attack identification.
arXiv Detail & Related papers (2022-01-21T06:16:04Z) - Asymmetric Modality Translation For Face Presentation Attack Detection [55.09300842243827]
Face presentation attack detection (PAD) is an essential measure to protect face recognition systems from being spoofed by malicious users.
We propose a novel framework based on asymmetric modality translation for PAD in bi-modality scenarios.
Our method achieves state-of-the-art performance under different evaluation protocols.
arXiv Detail & Related papers (2021-10-18T08:59:09Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information above and is not responsible for any consequences of its use.