Fugu-MT Paper Translation (Abstract): UNBOX: Unveiling Black-box visual models with Natural-language

Paper overview: UNBOX: Unveiling Black-box visual models with Natural-language

arxiv url: http://arxiv.org/abs/2603.08639v1
Date: Mon, 09 Mar 2026 17:16:39 GMT
Status: Translation complete
Last updated in system: 2026-03-10 15:13:16.597916
Title: UNBOX: Unveiling Black-box visual models with Natural-language
Title (reference translation): UNBOX: Unveiling black-box visual models with natural language
Authors: Simone Carnemolla, Chiara Russo, Simone Palazzo, Quentin Bouniot, Daniela Giordano, Zeynep Akata, Matteo Pennisi, Concetto Spampinato
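Abstract summary: We introduce UNBOX, a framework for class-wise model dissection under fully data-free, gradient-free, and backpropagation-free constraints. We evaluate UNBOX on ImageNet-1K, Waterbirds, and CelebA through semantic fidelity tests, visual-feature correlation analyses, and slice-discovery auditing.
Reference score (independently computed attention score): 50.433977345786055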
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Ensuring trustworthiness in open-world visual recognition requires models that are interpretable, fair, and robust to distribution shifts. Yet modern vision systems are increasingly deployed as proprietary black-box APIs, exposing only output probabilities and hiding architecture, parameters, gradients, and training data. This opacity prevents meaningful auditing, bias detection, and failure analysis. Existing explanation methods assume white- or gray-box access or knowledge of the training distribution, making them unusable in these real-world settings. We introduce UNBOX, a framework for class-wise model dissection under fully data-free, gradient-free, and backpropagation-free constraints. UNBOX leverages Large Language Models and text-to-image diffusion models to recast activation maximization as a purely semantic search driven by output probabilities. The method produces human-interpretable text descriptors that maximally activate each class, revealing the concepts a model has implicitly learned, the training distribution it reflects, and potential sources of bias. We evaluate UNBOX on ImageNet-1K, Waterbirds, and CelebA through semantic fidelity tests, visual-feature correlation analyses and slice-discovery auditing. Despite operating under the strictest black-box constraints, UNBOX performs competitively with state-of-the-art white-box interpretability methods. This demonstrates that meaningful insight into a model's internal reasoning can be recovered without any internal access, enabling more trustworthy and accountable visual recognition systems.
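Abstract (reference translation): Ensuring trustworthiness in open-world visual recognition requires models that are interpretable, fair, and robust to distribution shifts. However, modern vision systems are deployed as proprietary black-box APIs that expose only output probabilities and hide the architecture, parameters, gradients, and training data. This opacity prevents meaningful auditing, bias detection, and failure analysis. Existing explanation methods assume white-box or gray-box access or knowledge of the training distribution, so they cannot be used in real-world settings. We introduce UNBOX, a framework for class-wise model dissection under fully data-free, gradient-free, and backpropagation-free constraints. UNBOX uses large language models and text-to-image diffusion models to recast activation maximization as a purely semantic search driven by output probabilities. The method generates human-interpretable text descriptors that maximally activate each class, revealing the concepts the model has implicitly learned, the training distribution it reflects, and potential sources of bias. We evaluate UNBOX on ImageNet-1K, Waterbirds, and CelebA through semantic fidelity tests, visual-feature correlation analyses, and slice-discovery auditing. Despite operating under the strictest black-box constraints, UNBOX performs competitively with state-of-the-art white-box interpretability methods. This shows that meaningful insight into a model's internal reasoning can be recovered without internal access, enabling more trustworthy and accountable visual recognition systems.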
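
Illustrative sketch (not the authors' implementation): the abstract describes recasting activation maximization as a semantic search driven only by output probabilities. The toy Python below shows one way such a loop could look under strong simplifying assumptions. Everything in it is hypothetical: `VOCAB`, `render` (a stand-in for a text-to-image diffusion model), `black_box` (a stand-in for a proprietary classifier API), `propose` (a stand-in for an LLM suggesting descriptor edits), and the greedy hill-climbing strategy are all invented for illustration and are not taken from the paper.

```python
import math
from typing import Callable, FrozenSet, List, Tuple

# Hypothetical toy concept vocabulary; a real system would use free-form LLM text.
VOCAB = ["water", "forest", "beak", "feathers", "bamboo", "sky", "car", "street"]


def render(descriptor: List[str]) -> FrozenSet[str]:
    """Stand-in for a text-to-image model: the 'image' is just the concept set."""
    return frozenset(descriptor)


def black_box(image: FrozenSet[str]) -> List[float]:
    """Toy two-class classifier that exposes only output probabilities."""
    weights = (
        {"water": 2.0, "beak": 1.0, "feathers": 1.0},  # class 0 (assumed)
        {"car": 2.0, "street": 1.0},                   # class 1 (assumed)
    )
    logits = [sum(w.get(concept, 0.0) for concept in image) for w in weights]
    exps = [math.exp(value) for value in logits]
    total = sum(exps)
    return [value / total for value in exps]  # softmax


def propose(descriptor: List[str]) -> List[List[str]]:
    """Stand-in for an LLM: every single-word substitution of the descriptor."""
    return [
        descriptor[:i] + [word] + descriptor[i + 1:]
        for i in range(len(descriptor))
        for word in VOCAB
        if word != descriptor[i]
    ]


def semantic_search(
    target: int,
    start: List[str],
    query: Callable[[FrozenSet[str]], List[float]] = black_box,
    max_rounds: int = 20,
) -> Tuple[List[str], float]:
    """Greedy search for a descriptor that maximizes the target-class probability.

    Only `query` outputs are used: no gradients, weights, or training data.
    """
    best = list(start)
    best_p = query(render(best))[target]
    for _ in range(max_rounds):
        scored = [(query(render(c))[target], c) for c in propose(best)]
        cand_p, cand = max(scored, key=lambda pair: pair[0])
        if cand_p <= best_p:  # no candidate improves the score: stop
            break
        best, best_p = cand, cand_p
    return best, best_p


if __name__ == "__main__":
    descriptor, prob = semantic_search(target=0, start=["car", "street", "sky"])
    print(descriptor, round(prob, 3))
```

In this toy setting the search only ever sees probabilities returned by `black_box`, which mirrors the access constraint stated in the abstract. The resulting descriptor lists the concepts that most strongly drive the target class in the toy model.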


Related papers