Fugu-MT paper translation (abstract): PhaseWin: An Efficient Search Algorithm for Faithful Visual Attribution

Paper summary: PhaseWin: An Efficient Search Algorithm for Faithful Visual Attribution

arxiv url: http://arxiv.org/abs/2606.18008v1
Date: Tue, 16 Jun 2026 14:53:30 GMT
Status: translation complete
Last updated in system: 2026-06-17 17:15:32.496805
Title: PhaseWin: An Efficient Search Algorithm for Faithful Visual Attribution
Title (reference translation): PhaseWin: An Efficient Search Algorithm for Faithful Visual Attribution
Authors: Zihan Gu, Ruoyu Chen, Junchi Zhang, Li Liu, Xiaochun Cao, Hua Zhang
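Abstract summary: We propose PhaseWin, an efficient subset-search algorithm for faithful visual attribution. PhaseWin reorganizes greedy region selection into a phased window-search procedure that alternates between global candidate screening, adaptive pruning, and localized window refinement.
Reference score (independently computed attention score): 47.17749653856941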
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Visual attribution is a fundamental tool for interpreting modern vision and vision-language models, particularly when their decisions must be inspected, diagnosed, or audited. Its goal is to explain how a model's decision depends on local regions of the visual input, typically by assigning an importance ordering over candidate image regions. Given an image partitioned into $n$ regions, faithful attribution can be cast as an ordered subset-search problem, in which progressively inserting the selected regions should recover the target model response as early as possible. Exhaustive search over region subsets incurs exponential cost, while the widely used greedy search still requires a quadratic number of model evaluations, because every selection step rescores all remaining candidates. We propose PhaseWin, an efficient subset-search algorithm for faithful visual attribution. PhaseWin reorganizes greedy region selection into a phased window-search procedure: rather than re-evaluating the full candidate set at every step, it alternates between global candidate screening, adaptive pruning, and localized window refinement, while preserving the essential region-ranking behavior of greedy search. We analyze PhaseWin under monotone evidence-accumulation conditions and show that, under feature-level structural assumptions, it attains controllable linear evaluation complexity together with near-greedy faithfulness guarantees. Extensive experiments on image classification, object detection, visual grounding, and image captioning show that, among all compared attribution methods, PhaseWin reaches high faithfulness with the fewest forward passes, empirically realizing the predicted reduction from $O(n^2)$ to $O(n)$. The code is available at https://github.com/Qihuai27/phasewin-va.
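Abstract (reference translation): Visual attribution is a fundamental tool for interpreting modern vision and vision-language models, particularly when their decisions must be inspected, diagnosed, or audited. Its goal is to explain how a model's decision depends on local regions of the visual input. Given an image partitioned into $n$ regions, faithful attribution can be cast as an ordered subset-search problem in which progressively inserting the selected regions should recover the target model response as early as possible. Exhaustive search over region subsets incurs exponential cost, while the widely used greedy search still requires a quadratic number of model evaluations because every selection step rescores all remaining candidates. We propose PhaseWin, an efficient subset-search algorithm for faithful visual attribution. PhaseWin reorganizes greedy region selection into a phased window-search procedure: instead of re-evaluating the full candidate set at every step, it alternates between global candidate screening, adaptive pruning, and localized window refinement while preserving the essential region-ranking behavior of greedy search. We analyze PhaseWin under monotone evidence-accumulation conditions and show that, under feature-level structural assumptions, it attains controllable linear evaluation complexity together with near-greedy faithfulness guarantees. Extensive experiments on image classification, object detection, visual grounding, and image captioning show that PhaseWin reaches high faithfulness with the fewest forward passes, empirically realizing the predicted reduction from $O(n^2)$ to $O(n)$. The code is available at https://github.com/Qihuai27/phasewin-va.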
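
The quadratic cost of the greedy baseline described in the abstract can be illustrated with a small sketch. This is not the PhaseWin algorithm and not code from the authors' repository; it only shows why rescoring all remaining candidates at every step leads to n(n+1)/2 evaluations. The names `greedy_attribution`, `toy_score`, and `WEIGHTS` are hypothetical, and the additive toy score is an assumed stand-in for a real model response.

```python
from typing import Callable, FrozenSet, List, Tuple


def greedy_attribution(
    n: int, score: Callable[[FrozenSet[int]], float]
) -> Tuple[List[int], int]:
    """Order n regions greedily; return (ordering, number of score calls)."""
    selected: List[int] = []
    remaining = set(range(n))
    evaluations = 0
    while remaining:
        best_region, best_value = -1, float("-inf")
        # Every step rescores all remaining candidates: the quadratic part.
        for region in sorted(remaining):
            value = score(frozenset(selected + [region]))
            evaluations += 1
            if value > best_value:
                best_region, best_value = region, value
        selected.append(best_region)
        remaining.remove(best_region)
    return selected, evaluations


# Assumed toy data: one importance weight per region (not from the paper).
WEIGHTS = [0.1, 0.5, 0.2, 0.9, 0.3]


def toy_score(regions: FrozenSet[int]) -> float:
    """Monotone, additive stand-in for a model's response to inserted regions."""
    return sum(WEIGHTS[r] for r in regions)


order, calls = greedy_attribution(len(WEIGHTS), toy_score)
print(order)  # [3, 1, 4, 2, 0]
print(calls)  # 15, i.e. 5 + 4 + 3 + 2 + 1
```

With n = 5 regions the loop makes 15 score calls, and the count grows as n(n+1)/2. In a real setting each call is a forward pass of the model, which is the cost the abstract says PhaseWin aims to reduce to linear in n.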


Related papers