Fugu-MT Paper Translation (Abstract): FaceShield: Explainable Face Anti-Spoofing with Multimodal Large Language Models

Paper overview: FaceShield: Explainable Face Anti-Spoofing with Multimodal Large Language Models

arxiv url: http://arxiv.org/abs/2505.09415v1
Date: Wed, 14 May 2025 14:10:43 GMT
Status: Translation complete
Last updated in system: 2025-05-15 21:44:09.48812
Title: FaceShield: Explainable Face Anti-Spoofing with Multimodal Large Language Models
Title (reference translation): FaceShield: Explainable Face Anti-Spoofing with Multimodal Large Language Models
Authors: Hongyang Wang, Yichen Shi, Zhuofu Tao, Yuhao Gao, Liepiao Zhang, Xun Lin, Jun Feng, Xiaochen Yuan, Zitong Yu, Xiaochun Cao
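Abstract summary: Face anti-spoofing (FAS) is crucial for protecting facial recognition systems from presentation attacks. Currently, there is no universal and comprehensive MLLM or dataset designed specifically for the FAS task. The authors propose FaceShield, an MLLM for FAS, along with the corresponding pre-training and supervised fine-tuning datasets. The instruction datasets, protocols, and code will be released soon.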
Reference score (independently computed attention score): 51.858371492494456
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Face anti-spoofing (FAS) is crucial for protecting facial recognition systems from presentation attacks. Previous methods approached this task as a classification problem, lacking interpretability and reasoning behind the predicted results. Recently, multimodal large language models (MLLMs) have shown strong capabilities in perception, reasoning, and decision-making in visual tasks. However, there is currently no universal and comprehensive MLLM or dataset specifically designed for the FAS task. To address this gap, we propose FaceShield, an MLLM for FAS, along with the corresponding pre-training and supervised fine-tuning (SFT) datasets, FaceShield-pre10K and FaceShield-sft45K. FaceShield is capable of determining the authenticity of faces, identifying types of spoofing attacks, providing reasoning for its judgments, and detecting attack areas. Specifically, we employ spoof-aware vision perception (SAVP), which incorporates both the original image and auxiliary information based on prior knowledge. We then use a prompt-guided vision token masking (PVTM) strategy to randomly mask vision tokens, thereby improving the model's generalization ability. We conducted extensive experiments on three benchmark datasets, demonstrating that FaceShield significantly outperforms previous deep learning models and general MLLMs on four FAS tasks, i.e., coarse-grained classification, fine-grained classification, reasoning, and attack localization. Our instruction datasets, protocols, and code will be released soon.
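Abstract (reference translation): Face anti-spoofing (FAS) is essential for protecting facial recognition systems from presentation attacks. Conventional methods treated this task as a classification problem and lacked interpretability and reasoning behind the predicted results. In recent years, multimodal large language models (MLLMs) have shown strong capabilities in perception, reasoning, and decision-making in visual tasks. However, there is currently no universal and comprehensive MLLM or dataset designed specifically for the FAS task. To close this gap, the authors propose FaceShield, an MLLM for FAS, together with the corresponding SFT datasets, FaceShield-pre10K and FaceShield-sft45K. FaceShield can judge the authenticity of a face, identify the type of attack, give reasons for its judgment, and detect the attack area. Specifically, it adopts spoof-aware vision perception (SAVP), which incorporates both the original image and auxiliary information based on prior knowledge. It then uses a prompt-guided vision token masking (PVTM) strategy to randomly mask vision tokens, improving the model's generalization ability. The results show that FaceShield significantly outperforms previous deep learning models and general MLLMs on four FAS tasks, namely coarse-grained classification, fine-grained classification, reasoning, and attack localization. The instruction datasets, protocols, and code will be released soon.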

Related papers