Fugu-MT Paper Translation (Summary): ATTN-FIQA: Interpretable Attention-based Face Image Quality Assessment with Vision Transformers

Paper summary: ATTN-FIQA: Interpretable Attention-based Face Image Quality Assessment with Vision Transformers

arxiv url: http://arxiv.org/abs/2604.22841v1
Date: Tue, 21 Apr 2026 12:46:16 GMT
Status: Translation complete
Last updated in system: 2026-04-28 17:12:06.979465
Title: ATTN-FIQA: Interpretable Attention-based Face Image Quality Assessment with Vision Transformers
Title (reference translation): ATTN-FIQA: Attention-based Face Image Quality Assessment with Vision Transformers
Authors: Guray Ozgur, Tahar Chettaoui, Eduarda Caldeira, Jan Niklas Kolf, Marco Huber, Andrea Atzori, Naser Damer, Fadi Boutros
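Abstract summary: Face Image Quality Assessment (FIQA) aims to assess the recognition utility of face samples and is essential for reliable face recognition (FR) systems. Recent studies have highlighted that Vision Transformer architectures inherently function as saliency learners whose attention patterns naturally encode spatial importance. ATTN-FIQA is a novel training-free approach that investigates whether pre-softmax attention scores from pre-trained Vision Transformer-based face recognition models can serve as quality indicators.
Reference score (independently computed attention level): 19.095360516976847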
License: http://creativecommons.org/licenses/by-nc-sa/4.0/
Abstract: Face Image Quality Assessment (FIQA) aims to assess the recognition utility of face samples and is essential for reliable face recognition (FR) systems. Existing approaches require computationally expensive procedures such as multiple forward passes, backpropagation, or additional training, and only recent work has focused on the use of Vision Transformers. Recent studies have highlighted that these architectures inherently function as saliency learners, with attention patterns naturally encoding spatial importance. This work proposes ATTN-FIQA, a novel training-free approach that investigates whether pre-softmax attention scores from pre-trained Vision Transformer-based face recognition models can serve as quality indicators. We hypothesize that attention magnitudes intrinsically encode quality: high-quality images with discriminative facial features enable strong query-key alignments producing focused, high-magnitude attention patterns, while degraded images generate diffuse, low-magnitude patterns. ATTN-FIQA extracts pre-softmax attention matrices from the final transformer block, aggregates multi-head attention information across all patches, and computes image-level quality scores through simple averaging, requiring only a single forward pass through pre-trained models without architectural modifications, backpropagation, or additional training. Through comprehensive evaluation across eight benchmark datasets and four FR models, this work demonstrates that attention-based quality scores correlate effectively with face image quality and provide spatial interpretability, revealing which facial regions contribute most to quality determination.
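Abstract (reference translation): Face Image Quality Assessment (FIQA) aims to assess the recognition utility of face samples and is essential for reliable face recognition (FR) systems. Existing approaches require computationally expensive procedures such as multiple forward passes, backpropagation, or additional training, and only recent work has focused on the use of Vision Transformers. Recent studies have highlighted that these architectures inherently function as saliency learners whose attention patterns naturally encode spatial importance. ATTN-FIQA is a novel training-free approach that investigates whether pre-softmax attention scores from pre-trained Vision Transformer-based face recognition models can serve as quality indicators. The authors hypothesize that high-quality images with discriminative facial features enable strong query-key alignments that produce focused, high-magnitude attention patterns, whereas degraded images produce diffuse, low-magnitude patterns. ATTN-FIQA extracts pre-softmax attention matrices from the final transformer block, aggregates multi-head attention information across all patches, and computes an image-level quality score by simple averaging. Comprehensive evaluation across eight benchmark datasets and four FR models shows that attention-based quality scores correlate effectively with face image quality and provide spatial interpretability, revealing which facial regions contribute most to quality determination.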
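The scoring pipeline described in the abstract (pre-softmax attention from the final block, aggregation over heads and patches, then simple averaging) can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation. The function names `pre_softmax_attention` and `attention_quality` are hypothetical. The random tokens and projection weights stand in for the patch tokens and query/key weights of a pre-trained ViT face recognition model's final block, which in practice would be captured during a single forward pass (for example with a forward hook). The exact aggregation order (mean over heads, then over key patches, then over query patches) is also an assumption, because the abstract only states that multi-head information is aggregated across all patches and averaged.

```python
import numpy as np


def pre_softmax_attention(tokens: np.ndarray, w_q: np.ndarray, w_k: np.ndarray,
                          num_heads: int) -> np.ndarray:
    """Scaled query-key logits Q K^T / sqrt(d_head) per head, taken BEFORE softmax.

    tokens: (N, D) patch tokens entering the final transformer block
            (assumed already layer-normalised; no class token assumed).
    w_q, w_k: (D, D) query/key projection weights; biases omitted for brevity.
    Returns an array of shape (num_heads, N, N).
    """
    n, d = tokens.shape
    if d % num_heads != 0:
        raise ValueError("embedding dim must be divisible by num_heads")
    d_head = d // num_heads
    # Project, then split the channel axis into heads: (num_heads, N, d_head).
    q = (tokens @ w_q).reshape(n, num_heads, d_head).transpose(1, 0, 2)
    k = (tokens @ w_k).reshape(n, num_heads, d_head).transpose(1, 0, 2)
    # Batched matmul over heads gives (num_heads, N, N) logits; no softmax applied.
    return q @ k.transpose(0, 2, 1) / np.sqrt(d_head)


def attention_quality(logits: np.ndarray) -> tuple[float, np.ndarray]:
    """Hypothetical aggregation by simple averaging (assumed, not from the paper).

    1. Average over heads -> (N, N) head-aggregated logits.
    2. Average each query patch's row over all key patches -> (N,) per-patch map.
    3. Average the per-patch map -> one image-level quality score.
    """
    head_avg = logits.mean(axis=0)
    per_patch = head_avg.mean(axis=1)
    return float(per_patch.mean()), per_patch


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n_patches, dim, heads = 16, 32, 4          # toy sizes: a 4x4 patch grid
    tokens = rng.standard_normal((n_patches, dim))
    w_q = rng.standard_normal((dim, dim)) / np.sqrt(dim)
    w_k = rng.standard_normal((dim, dim)) / np.sqrt(dim)

    logits = pre_softmax_attention(tokens, w_q, w_k, heads)
    score, patch_map = attention_quality(logits)
    print(logits.shape)                   # (4, 16, 16)
    print(patch_map.reshape(4, 4))        # spatial map over the patch grid
    print(f"image-level score: {score:.4f}")
```

Reshaping the per-patch vector onto the patch grid is one way to obtain the kind of spatial map the abstract associates with interpretability. Whether raw logits or their absolute values better capture the "magnitude" the authors refer to is left open in this sketch.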


Related papers