Emergent Morphing Attack Detection in Open Multi-modal Large Language Models
- URL: http://arxiv.org/abs/2602.15461v1
- Date: Tue, 17 Feb 2026 09:56:33 GMT
- Title: Emergent Morphing Attack Detection in Open Multi-modal Large Language Models
- Authors: Marija Ivanovska, Vitomir Štruc
- Abstract summary: Face morphing attacks threaten biometric verification. Most morphing attack detection (MAD) systems require task-specific training and generalize poorly to unseen attack types. We present the first systematic zero-shot evaluation of open-source multimodal large language models (MLLMs) for single-image MAD.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Face morphing attacks threaten biometric verification, yet most morphing attack detection (MAD) systems require task-specific training and generalize poorly to unseen attack types. Meanwhile, open-source multimodal large language models (MLLMs) have demonstrated strong visual-linguistic reasoning, but their potential in biometric forensics remains underexplored. In this paper, we present the first systematic zero-shot evaluation of open-source MLLMs for single-image MAD, using publicly available weights and a standardized, reproducible protocol. Across diverse morphing techniques, many MLLMs show non-trivial discriminative ability without any fine-tuning or domain adaptation, and LLaVA1.6-Mistral-7B achieves state-of-the-art performance, surpassing highly competitive task-specific MAD baselines by at least 23% in terms of equal error rate (EER). The results indicate that multimodal pretraining can implicitly encode fine-grained facial inconsistencies indicative of morphing artifacts, enabling zero-shot forensic sensitivity. Our findings position open-source MLLMs as reproducible, interpretable, and competitive foundations for biometric security and forensic image analysis. This emergent capability also highlights new opportunities to develop state-of-the-art MAD systems through targeted fine-tuning or lightweight adaptation, further improving accuracy and efficiency while preserving interpretability. To support future research, all code and evaluation protocols will be released upon publication.
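The equal error rate (EER) used to compare systems above is the operating point where the false acceptance rate (morphs accepted as bona fide) equals the false rejection rate (bona fide images flagged as morphs). A minimal sketch of computing it from detection scores, assuming the convention that higher scores mean "more bona fide" (the function name and score convention are illustrative, not the paper's released code):

```python
import numpy as np

def compute_eer(genuine_scores, morph_scores):
    """Return the equal error rate: sweep all candidate thresholds and
    take the point where FAR (morphs accepted) and FRR (bona fide
    rejected) are closest, reporting their average there."""
    thresholds = np.sort(np.concatenate([genuine_scores, morph_scores]))
    best_gap, eer = np.inf, 1.0
    for t in thresholds:
        far = np.mean(morph_scores >= t)    # morphs passing as bona fide
        frr = np.mean(genuine_scores < t)   # bona fide wrongly rejected
        gap = abs(far - frr)
        if gap < best_gap:
            best_gap, eer = gap, (far + frr) / 2
    return eer

# Perfectly separated scores give an EER of 0.0.
genuine = np.array([0.9, 0.8, 0.7, 0.95])
morphs = np.array([0.1, 0.2, 0.3, 0.05])
print(compute_eer(genuine, morphs))  # → 0.0
```

In practice, libraries such as scikit-learn can derive the same quantity from a full ROC curve; the loop above just makes the FAR/FRR trade-off explicit.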
Related papers
- Explainable and Fine-Grained Safeguarding of LLM Multi-Agent Systems via Bi-Level Graph Anomaly Detection [76.91230292971115]
Large language model (LLM)-based multi-agent systems (MAS) have shown strong capabilities in solving complex tasks. XG-Guard is an explainable and fine-grained safeguarding framework for detecting malicious agents in MAS.
arXiv Detail & Related papers (2025-12-21T13:46:36Z) - A Comprehensive Study on Visual Token Redundancy for Discrete Diffusion-based Multimodal Large Language Models [85.30893355216486]
We study how visual token redundancy evolves with different dMLLM architectures and tasks. Our study reveals that visual redundancy emerges only in from-scratch dMLLMs while handling long-answer tasks. Layer-skipping is promising for accelerating AR-to-diffusion dMLLMs, whereas progressive or late-step pruning is more effective for from-scratch dMLLMs.
arXiv Detail & Related papers (2025-11-19T04:13:36Z) - Unlocking the Forgery Detection Potential of Vanilla MLLMs: A Novel Training-Free Pipeline [5.740204096484276]
We propose Foresee, a training-free MLLM-based pipeline tailored for image forgery analysis. Foresee employs a type-prior-driven strategy and utilizes a Flexible Feature Detector module to handle copy-move manipulations. Our approach simultaneously achieves superior localization accuracy and provides more comprehensive textual explanations.
arXiv Detail & Related papers (2025-11-17T14:49:57Z) - MADPromptS: Unlocking Zero-Shot Morphing Attack Detection with Multiple Prompt Aggregation [8.045296450065019]
Face Morphing Attack Detection (MAD) is a critical challenge in face recognition security. This work explores a pure zero-shot approach to MAD by leveraging CLIP without any additional training or fine-tuning. By aggregating the embeddings of diverse prompts, we better align the model's internal representations with the MAD task.
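The prompt-aggregation idea summarized above can be sketched for a CLIP-style encoder: L2-normalize each prompt's text embedding, average them into one class prototype per label, and score an image by cosine similarity against the prototypes. A minimal NumPy illustration, where the function names and the softmax scoring are assumptions, not the paper's implementation:

```python
import numpy as np

def aggregate_prompt_embeddings(embeddings):
    """Average L2-normalized prompt embeddings into a single class
    prototype, then renormalize (standard prompt ensembling for
    CLIP-style models)."""
    e = np.asarray(embeddings, dtype=float)
    e = e / np.linalg.norm(e, axis=1, keepdims=True)
    proto = e.mean(axis=0)
    return proto / np.linalg.norm(proto)

def zero_shot_morph_score(image_emb, bona_fide_proto, morph_proto):
    """Softmax over cosine similarities to the two class prototypes;
    returns the probability assigned to the 'morph' class."""
    img = np.asarray(image_emb, dtype=float)
    img = img / np.linalg.norm(img)
    sims = np.array([img @ bona_fide_proto, img @ morph_proto])
    probs = np.exp(sims) / np.exp(sims).sum()
    return probs[1]
```

In a real pipeline the embeddings would come from a CLIP text and image encoder; the aggregation step is what lets several phrasings of "a morphed face" contribute to one decision boundary.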
arXiv Detail & Related papers (2025-08-12T13:47:27Z) - FUDOKI: Discrete Flow-based Unified Understanding and Generation via Kinetic-Optimal Velocities [76.46448367752944]
Multimodal large language models (MLLMs) unify visual understanding and image generation within a single framework. Most existing MLLMs rely on autoregressive (AR) architectures, which impose inherent limitations on future development. We introduce FUDOKI, a unified multimodal model purely based on discrete flow matching.
arXiv Detail & Related papers (2025-05-26T15:46:53Z) - Exploring Gradient-Guided Masked Language Model to Detect Textual Adversarial Attacks [50.53590930588431]
Adversarial examples pose serious threats to natural language processing systems. Recent studies suggest that adversarial texts deviate from the underlying manifold of normal texts, whereas masked language models can approximate the manifold of normal data. We first introduce Masked Language Model-based Detection (MLMD), which leverages the mask-and-unmask operations of the masked language modeling (MLM) objective.
arXiv Detail & Related papers (2025-04-08T14:10:57Z) - ChatGPT Encounters Morphing Attack Detection: Zero-Shot MAD with Multi-Modal Large Language Models and General Vision Models [13.21801650767302]
Face Recognition Systems (FRS) are increasingly vulnerable to face-morphing attacks, prompting the development of Morphing Attack Detection (MAD) algorithms. A key challenge in MAD lies in its limited generalizability to unseen data and its lack of explainability, which is critical for practical application environments. This work explores a novel approach to MAD using zero-shot learning leveraged on Large Language Models (LLMs).
arXiv Detail & Related papers (2025-03-13T22:53:24Z) - Towards General Visual-Linguistic Face Forgery Detection(V2) [90.6600794602029]
Face manipulation techniques have achieved significant advances, presenting serious challenges to security and social trust. Recent works demonstrate that leveraging multimodal models can enhance the generalization and interpretability of face forgery detection. We propose Face Forgery Text Generator (FFTG), a novel annotation pipeline that generates accurate text descriptions by leveraging forgery masks for initial region and type identification.
arXiv Detail & Related papers (2025-02-28T04:15:36Z) - Palisade -- Prompt Injection Detection Framework [0.9620910657090188]
Large Language Models are vulnerable to malicious prompt injection attacks.
This paper proposes a novel NLP based approach for prompt injection detection.
It emphasizes accuracy and optimization through a layered input screening process.
arXiv Detail & Related papers (2024-10-28T15:47:03Z) - Making LLaMA SEE and Draw with SEED Tokenizer [69.1083058794092]
We introduce SEED, an elaborate image tokenizer that empowers Large Language Models with the ability to SEE and Draw.
With SEED tokens, LLM is able to perform scalable multimodal autoregression under its original training recipe.
SEED-LLaMA has exhibited compositional emergent abilities such as multi-turn in-context multimodal generation.
arXiv Detail & Related papers (2023-10-02T14:03:02Z) - Fusion-based Few-Shot Morphing Attack Detection and Fingerprinting [37.161842673434705]
Face recognition systems are vulnerable to morphing attacks.
Most existing morphing attack detection methods require a large amount of training data and have only been tested on a few predefined attack models.
We propose to extend MAD from supervised learning to few-shot learning and from binary detection to multiclass fingerprinting.
arXiv Detail & Related papers (2022-10-27T14:46:53Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.