Fugu-MT Paper Translation (Abstract): Capability $\neq$ Interpretability: Human Interpretability of Vision Foundation Models

Paper summary: Capability $\neq$ Interpretability: Human Interpretability of Vision Foundation Models

arxiv url: http://arxiv.org/abs/2605.20337v1
Date: Tue, 19 May 2026 18:00:06 GMT
Status: Translation complete
Last updated in system: 2026-05-21 19:19:56.302742
Title: Capability $\neq$ Interpretability: Human Interpretability of Vision Foundation Models
Title (reference translation): Capability $\neq$ Interpretability: Human Interpretability of Vision Foundation Models
Authors: Julien Colin, Lore Goetschalckx, Nuria Oliver, Thomas Serre
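Abstract summary: We build a framework for measuring and comparing the human interpretability of vision models. A chance-anchored scoring function places every model on a common scale. Foundation models are consistently *less* interpretable than their supervised counterparts.
Reference score (attention score computed by this site): 24.250649573452666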
License: http://creativecommons.org/licenses/by/4.0/
Abstract: How interpretable are the features of leading vision models? The question is increasingly pressing as these models move from research benchmarks into high-stakes deployments, yet existing methods cannot answer it reliably. We close this gap with a framework for measuring and comparing the human interpretability of vision models, built around two complementary psychophysics protocols: (1) localizability -- can an observer predict where a feature fires on a novel image? -- and (2) nameability -- can an observer accurately describe what the feature represents? Features are recovered via sparse autoencoders, and a chance-anchored scoring function places every model on a common scale. Applying the framework to six vision transformers -- two supervised ViTs and four foundation models (DINOv2, DINOv3, CLIP, SigLIP) -- we collected more than $15{,}000$ behavioral responses, analyzing the $13{,}400$ responses from the $377$ participants who passed our pre-specified quality checks. Foundation models are consistently *less* interpretable than their supervised counterparts, and the gap is not a capability tradeoff: interpretability does not correlate with downstream task performance on any benchmark we examine. What does correlate is the locality of a feature's activations and coarse-grained semantic alignment with humans -- models with focal activations and representations that reflect the world's broad categorical structure produce more interpretable features, whereas fine-grained perceptual alignment does not. The two protocols yield strongly correlated rankings and share the same predictors, establishing interpretability as an independent, measurable dimension of representation quality -- and, surprisingly, one on which every foundation model we tested falls below the supervised baselines that came before. Capability alone cannot close that gap; locality and coarse-grained alignment can.
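Abstract (reference translation): How interpretable are the features of leading vision models? The question grows more pressing as these models move from research benchmarks into high-stakes deployments. We address it with a framework for measuring the human interpretability of vision models, built around two protocols: (1) localizability: can an observer predict where a feature fires on a novel image? (2) nameability: can an observer accurately describe what the feature represents? Features are recovered via sparse autoencoders, and a chance-anchored scoring function places every model on a common scale. Applying the framework to six vision transformers, two supervised ViTs and four foundation models (DINOv2, DINOv3, CLIP, SigLIP), we collected more than $15{,}000$ behavioral responses and analyzed the $13{,}400$ responses from the $377$ participants who passed our pre-specified quality checks. Foundation models are *less* interpretable than supervised models, and the gap is not a capability tradeoff. What matters is the locality of a feature's activations and coarse-grained semantic alignment with humans: models with focal activations and representations that reflect the world's broad categorical structure produce more interpretable features, whereas fine-grained perceptual alignment does not. The two protocols yield strongly correlated rankings and share the same predictors, establishing interpretability as an independent, measurable dimension of representation quality. Capability alone cannot close that gap.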
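The abstract mentions a chance-anchored scoring function and strongly correlated rankings across the two protocols, but gives no formulas. As a rough illustration only, the Python sketch below assumes the common chance-correction form (accuracy - chance) / (1 - chance) and a Spearman rank correlation. Neither is confirmed as the authors' method, and the function names and every number in the demo are hypothetical.

```python
from typing import List, Sequence


def chance_anchored_score(accuracy: float, chance: float) -> float:
    """Rescale accuracy so that 0.0 is chance level and 1.0 is perfect.

    ASSUMPTION: this is a generic chance correction, not the paper's formula.
    """
    if not 0.0 <= chance < 1.0:
        raise ValueError("chance must lie in [0, 1)")
    return (accuracy - chance) / (1.0 - chance)


def average_ranks(values: Sequence[float]) -> List[float]:
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman correlation: Pearson correlation computed on average ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with >= 2 items")
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    if var_x == 0.0 or var_y == 0.0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (var_x * var_y) ** 0.5


if __name__ == "__main__":
    # HYPOTHETICAL raw accuracies for three unnamed models; not paper data.
    localizability_raw = [0.80, 0.65, 0.70]  # assumed chance level: 0.50
    nameability_raw = [0.60, 0.35, 0.45]     # assumed chance level: 0.25
    loc = [chance_anchored_score(a, 0.50) for a in localizability_raw]
    name = [chance_anchored_score(a, 0.25) for a in nameability_raw]
    print("localizability scores:", loc)
    print("nameability scores:", name)
    print("rank correlation:", spearman(loc, name))
```

Anchoring to chance matters here because the two protocols need not share a chance level, so raw accuracies from one task are not directly comparable with those from the other.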

Related papers

Epistemic Observability in Language Models [0.0]
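We find that some models can report high confidence while fabricating. Under formal assumptions, this is an observational gap rather than a capability gap. We build a tensor interface that escapes this irrationality by exporting computational byproducts.
Paper reference translation (metadata) (2026-03-20T21:59:34Z)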
Distributional Clarity: The Hidden Driver of RL-Friendliness in Large Language Models [50.99097734404912]
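We show that in RL-friendly models, intra-class compactness and inter-class separation appear in the probabilities assigned to correct and incorrect responses. Experiments on six math benchmarks show consistent improvements across all model families, with gains of up to 5.9 points on AIME24.
Paper reference translation (metadata) (2026-01-11T13:34:44Z)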
Automated Detection of Visual Attribute Reliance with a Self-Reflective Agent [58.90049897180927]
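We propose an automated framework for detecting unintended reliance on visual features in vision models. A self-reflective agent generates and tests hypotheses about the visual attributes a model may rely on. We evaluate our approach on a new benchmark of 130 models designed to exhibit a diverse range of visual attributes.
Paper reference translation (metadata) (2025-10-24T17:59:02Z)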
Test-Time Consistency in Vision Language Models [26.475993408532304]
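Vision-Language Models (VLMs) achieve strong performance on a variety of multimodal tasks. Recent benchmarks such as MM-R3 highlight that even state-of-the-art VLMs produce divergent predictions across semantically equivalent inputs. We propose a simple and effective test-time consistency framework that improves semantic consistency without supervised retraining.
Paper reference translation (metadata) (2025-06-27T17:09:44Z)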
Unsupervised Model Diagnosis [49.36194740479798]
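We propose Unsupervised Model Diagnosis (UMO) to generate semantic counterfactual explanations without user guidance. The method identifies and visualizes changes in semantics, then matches those changes to attributes drawn from a wide range of text sources.
Paper reference translation (metadata) (2024-10-08T17:59:03Z)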
Intrinsic User-Centric Interpretability through Global Mixture of Experts [31.738009841932374]
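InterpretCC is a family of intrinsically interpretable neural networks that optimizes explanations for ease of human understanding and faithfulness. We show that InterpretCC explanations are more actionable and useful than those of other intrinsically interpretable approaches.
Paper reference translation (metadata) (2024-02-05T11:55:50Z)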
DualFair: Fair Representation Learning at Both Group and Individual Levels via Contrastive Self-supervision [73.80009454050858]
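This work presents DualFair, a self-supervised model that can debias sensitive attributes such as gender and race from learned representations. The model jointly optimizes two fairness criteria: group fairness and counterfactual fairness.
Paper reference translation (metadata) (2023-03-15T07:13:54Z)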
Exploring the Trade-off between Plausibility, Change Intensity and Adversarial Power in Counterfactual Explanations using Multi-objective Optimization [73.89239820192894]
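Automated counterfactual generation should take several aspects of the generated counterfactual instances into account. We propose a new framework for generating counterfactual examples.
Paper reference translation (metadata) (2022-05-20T15:02:53Z)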
DoLFIn: Distributions over Latent Features for Interpretability [8.807587076209568]
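We propose a new strategy for achieving interpretability in neural network models. Our approach builds on the success of using probability as the central quantity. DoLFIn not only provides interpretable solutions but also slightly outperforms classical CNN and BiLSTM text classifiers.
Paper reference translation (metadata) (2020-11-10T18:32:53Z)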
Learning What Makes a Difference from Counterfactual Examples and Gradient Supervision [57.14468881854616]
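We propose an auxiliary training objective that improves the generalization ability of neural networks. We use pairs of minimally different examples with different labels, i.e. counterfactual or contrastive examples, which provide a signal indicating the causal structure underlying the task. Models trained with this technique show improved performance on out-of-distribution test sets.
Paper reference translation (metadata) (2020-04-20T02:47:49Z)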

The related papers list is generated automatically from the titles and abstracts of papers on this site.

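This page shows information for the specified paper.
The operator of this site does not guarantee the quality of the site (including all information and translations) and accepts no responsibility for any outcome arising from its use.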