SHIELD : An Evaluation Benchmark for Face Spoofing and Forgery Detection
with Multimodal Large Language Models
- URL: http://arxiv.org/abs/2402.04178v1
- Date: Tue, 6 Feb 2024 17:31:36 GMT
- Title: SHIELD : An Evaluation Benchmark for Face Spoofing and Forgery Detection
with Multimodal Large Language Models
- Authors: Yichen Shi, Yuhao Gao, Yingxin Lai, Hongyang Wang, Jun Feng, Lei He,
Jun Wan, Changsheng Chen, Zitong Yu, Xiaochun Cao
- Abstract summary: We introduce a new benchmark, namely SHIELD, to evaluate the ability of MLLMs on face spoofing and forgery detection.
We design true/false and multiple-choice questions to evaluate multimodal face data in these two face security tasks.
The results indicate that MLLMs hold substantial potential in the face security domain.
- Score: 63.946809247201905
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Multimodal large language models (MLLMs) have demonstrated remarkable
problem-solving capabilities in various vision fields (e.g., generic object
recognition and grounding) based on strong visual semantic representation and
language reasoning ability. However, whether MLLMs are sensitive to subtle
visual spoof/forged clues and how they perform in the domain of face attack
detection (e.g., face spoofing and forgery detection) is still unexplored. In
this paper, we introduce a new benchmark, namely SHIELD, to evaluate the
ability of MLLMs on face spoofing and forgery detection. Specifically, we
design true/false and multiple-choice questions to evaluate multimodal face
data in these two face security tasks. For the face anti-spoofing task, we
evaluate three different modalities (i.e., RGB, infrared, depth) under four
types of presentation attacks (i.e., print attack, replay attack, rigid mask,
paper mask). For the face forgery detection task, we evaluate GAN-based and
diffusion-based data with both visual and acoustic modalities. Each question is
subjected to both zero-shot and few-shot tests under standard and chain of
thought (COT) settings. The results indicate that MLLMs hold substantial
potential in the face security domain, offering advantages over traditional
specific models in terms of interpretability, multimodal flexible reasoning,
and joint face spoof and forgery detection. Additionally, we develop a novel
Multi-Attribute Chain of Thought (MA-COT) paradigm for describing and judging
various task-specific and task-irrelevant attributes of face images, which
provides rich task-related knowledge for subtle spoof/forged clue mining.
Extensive experiments in separate face anti-spoofing, separate face forgery
detection, and joint detection tasks demonstrate the effectiveness of the
proposed MA-COT. The project is available at
https://github.com/laiyingxin2/SHIELD
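The evaluation grid described in the abstract can be sketched as a small enumeration. This is a minimal illustration, not the authors' code: the modality, attack, question-type, and setting names come from the abstract, while the function names and the assumption that every combination is tested (a full cross-product) are hypothetical and not confirmed by the source.

```python
from itertools import product

# Values taken from the abstract; everything else here is an illustrative assumption.
FAS_MODALITIES = ["RGB", "infrared", "depth"]
FAS_ATTACKS = ["print attack", "replay attack", "rigid mask", "paper mask"]
QUESTION_TYPES = ["true/false", "multiple-choice"]
SHOT_SETTINGS = ["zero-shot", "few-shot"]
PROMPT_SETTINGS = ["standard", "COT"]


def question_settings():
    """Enumerate (question type, shot setting, prompt setting) combinations.

    Hypothetical helper: assumes each question type is run under every
    shot setting and every prompt setting.
    """
    return list(product(QUESTION_TYPES, SHOT_SETTINGS, PROMPT_SETTINGS))


def fas_conditions():
    """Enumerate (modality, attack type) pairs for face anti-spoofing.

    Hypothetical helper: assumes a full cross-product, which the abstract
    does not explicitly state.
    """
    return list(product(FAS_MODALITIES, FAS_ATTACKS))


if __name__ == "__main__":
    for setting in question_settings():
        print(setting)
    print(len(question_settings()), "question settings")
    print(len(fas_conditions()), "anti-spoofing conditions")
```

Under these assumptions the sketch yields 8 question settings and 12 anti-spoofing conditions. The actual benchmark may cover only a subset of these combinations.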
Related papers
- A Hitchhikers Guide to Fine-Grained Face Forgery Detection Using Common Sense Reasoning [9.786907179872815]
The potential of vision and language remains underexplored in face forgery detection.
There is a need for a methodology that converts face forgery detection to a Visual Question Answering (VQA) task.
We propose a multi-staged approach that diverges from the traditional binary decision paradigm to address this gap.
arXiv Detail & Related papers (2024-10-01T08:16:40Z)
- Pluralistic Salient Object Detection [108.74650817891984]
We introduce pluralistic salient object detection (PSOD), a novel task aimed at generating multiple plausible salient segmentation results for a given input image.
We present two new SOD datasets "DUTS-MM" and "DUTS-MQ", along with newly designed evaluation metrics.
arXiv Detail & Related papers (2024-09-04T01:38:37Z)
- COMICS: End-to-end Bi-grained Contrastive Learning for Multi-face Forgery Detection [56.7599217711363]
Most face forgery recognition methods can only process one face at a time.
We propose COMICS, an end-to-end framework for multi-face forgery detection.
arXiv Detail & Related papers (2023-08-03T03:37:13Z)
- Towards General Visual-Linguistic Face Forgery Detection [95.73987327101143]
Deepfakes are realistic face manipulations that can pose serious threats to security, privacy, and trust.
Existing methods mostly treat this task as binary classification, which uses digital labels or mask signals to train the detection model.
We propose a novel paradigm named Visual-Linguistic Face Forgery Detection (VLFFD), which uses fine-grained sentence-level prompts as the annotation.
arXiv Detail & Related papers (2023-07-31T10:22:33Z)
- Masked Language Model Based Textual Adversarial Example Detection [14.734863175424797]
Adversarial attacks are a serious threat to the reliable deployment of machine learning models in safety-critical applications.
We propose a novel textual adversarial example detection method, namely Masked Language Model-based Detection (MLMD).
arXiv Detail & Related papers (2023-04-18T06:52:14Z)
- MAFER: a Multi-resolution Approach to Facial Expression Recognition [9.878384185493623]
We propose a two-step learning procedure, named MAFER, to train Deep Learning models tasked with recognizing facial expressions.
A relevant feature of MAFER is that it is task-agnostic, i.e., it can be used complementarily to other objective-related techniques.
arXiv Detail & Related papers (2021-05-06T07:26:58Z)
- Face Anti-Spoofing with Human Material Perception [76.4844593082362]
Face anti-spoofing (FAS) plays a vital role in securing face recognition systems against presentation attacks.
We rephrase face anti-spoofing as a material recognition problem and combine it with classical human material perception.
We propose Bilateral Convolutional Networks (BCN), which are able to capture intrinsic material-based patterns.
arXiv Detail & Related papers (2020-07-04T18:25:53Z)
- Deep Spatial Gradient and Temporal Depth Learning for Face Anti-spoofing [61.82466976737915]
Depth-supervised learning has proven to be one of the most effective methods for face anti-spoofing.
We propose a new approach to detect presentation attacks from multiple frames based on two insights.
The proposed approach achieves state-of-the-art results on five benchmark datasets.
arXiv Detail & Related papers (2020-03-18T06:11:20Z)
This list is automatically generated from the titles and abstracts of the papers in this site.