Fugu-MT Paper Translation (Abstract): MLLM-based Textual Explanations for Face Comparison

Paper abstract: MLLM-based Textual Explanations for Face Comparison

arxiv url: http://arxiv.org/abs/2603.16629v1
Date: Tue, 17 Mar 2026 15:01:00 GMT
Status: Translation complete
System update date: 2026-03-18 17:42:07.3585
Title: MLLM-based Textual Explanations for Face Comparison
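Title (reference translation): A Text Description Method for Face Comparison Using MLLMs
Authors: Redwan Sony, Anil K Jain, Ross Arun
Abstract summary: This study systematically analyzes MLLM-generated explanations for the face verification task. The results indicate that even when an MLLM makes the correct verification decision, the accompanying explanation often relies on non-verifiable or hallucinated facial attributes.
Reference score (independently computed attention measure): 9.423763930383755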
License: http://creativecommons.org/licenses/by-nc-nd/4.0/
Abstract: Multimodal Large Language Models (MLLMs) have recently been proposed as a means to generate natural-language explanations for face recognition decisions. While such explanations facilitate human interpretability, their reliability on unconstrained face images remains underexplored. In this work, we systematically analyze MLLM-generated explanations for the unconstrained face verification task on the challenging IJB-S dataset, with a particular focus on extreme pose variation and surveillance imagery. Our results show that even when MLLMs produce correct verification decisions, the accompanying explanations frequently rely on non-verifiable or hallucinated facial attributes that are not supported by visual evidence. We further study the effect of incorporating information from traditional face recognition systems, viz., scores and decisions, alongside the input images. Although such information improves categorical verification performance, it does not consistently lead to faithful explanations. To evaluate the explanations beyond decision accuracy, we introduce a likelihood-ratio-based framework that measures the evidential strength of textual explanations. Our findings highlight fundamental limitations of current MLLMs for explainable face recognition and underscore the need for a principled evaluation of reliable and trustworthy explanations in biometric applications. Code is available at https://github.com/redwankarimsony/LR-MLLMFR-Explainability.
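Abstract (reference translation): Multimodal large language models (MLLMs) have recently been proposed as a means of generating natural-language explanations for face recognition decisions. Such explanations aid human interpretability, but their reliability on unconstrained face images remains underexplored. This study systematically analyzes MLLM-generated explanations for the face verification task on the challenging IJB-S dataset, focusing on extreme pose variation and surveillance imagery. The results indicate that even when MLLMs make correct verification decisions, the explanations often rely on non-verifiable or hallucinated facial attributes that are not supported by visual evidence. The study further examines the effect of incorporating information from traditional face recognition systems, namely scores and decisions, alongside the input images. Such information improves categorical verification performance but does not consistently lead to faithful explanations. To evaluate explanations beyond decision accuracy, a likelihood-ratio-based framework is proposed that measures the evidential strength of textual explanations. The study highlights fundamental limitations of current MLLMs and the need for principled evaluation of reliable and trustworthy explanations in biometric applications. Code is available at https://github.com/redwankarimsony/LR-MLLMFR-Explainability.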
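
Note on the likelihood ratio: the abstract does not say how the likelihood ratio is computed or how a textual explanation is mapped to a numeric quantity. As a generic illustration only, and not the authors' framework, a likelihood ratio for a scalar score s is commonly defined as LR = p(s | same identity) / p(s | different identity), where values above 1 support the same-identity hypothesis. The sketch below assumes Gaussian score distributions; the function names and the sample scores are hypothetical and made up for illustration.

```python
from statistics import NormalDist


def fit_gaussian(scores):
    """Fit a normal distribution to a list of scores (an assumed model)."""
    return NormalDist.from_samples(scores)


def likelihood_ratio(score, genuine, impostor):
    """Return p(score | same identity) / p(score | different identity)."""
    return genuine.pdf(score) / impostor.pdf(score)


# Hypothetical calibration scores, invented for this example only.
genuine_scores = [0.82, 0.75, 0.90, 0.68, 0.79]
impostor_scores = [0.21, 0.35, 0.15, 0.40, 0.28]

genuine = fit_gaussian(genuine_scores)
impostor = fit_gaussian(impostor_scores)

# A score near the genuine cluster should give LR > 1,
# a score near the impostor cluster should give LR < 1.
lr_high = likelihood_ratio(0.80, genuine, impostor)
lr_low = likelihood_ratio(0.25, genuine, impostor)
print(f"LR at 0.80: {lr_high:.3g}")
print(f"LR at 0.25: {lr_low:.3g}")
```

In practice the two distributions would be estimated from labeled same-identity and different-identity pairs, and a Gaussian fit is only one possible choice of density model.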
