Fugu-MT Paper Translation (Abstract): From Weights to Concepts: Data-Free Interpretability of CLIP via Singular Vector Decomposition

Paper overview: From Weights to Concepts: Data-Free Interpretability of CLIP via Singular Vector Decomposition

arxiv url: http://arxiv.org/abs/2603.24653v1
Date: Wed, 25 Mar 2026 17:59:57 GMT
Status: Translation complete
Last updated in system: 2026-03-27 20:52:47.916856
Title: From Weights to Concepts: Data-Free Interpretability of CLIP via Singular Vector Decomposition
Title (reference translation): From Weights to Concepts: Data-Free Interpretability of CLIP via Singular Vector Decomposition
Authors: Francesco Gentile, Nicola Dall'Asen, Francesco Tonini, Massimiliano Mancini, Lorenzo Vaquero, Elisa Ricci
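Abstract summary: SITH is a fully data-free, training-free framework that analyzes CLIP's vision transformer in weight space. For each attention head, it decomposes the value-output matrix into singular vectors and interprets each one via COMP. SITH yields coherent, faithful intra-head explanations, validated through reconstruction fidelity and interpretability experiments.
Reference score (independently computed attention metric): 33.4228178732749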
License: http://creativecommons.org/licenses/by/4.0/
Abstract: As vision-language models are deployed at scale, understanding their internal mechanisms becomes increasingly critical. Existing interpretability methods predominantly rely on activations, making them dataset-dependent, vulnerable to data bias, and often restricted to coarse head-level explanations. We introduce SITH (Semantic Inspection of Transformer Heads), a fully data-free, training-free framework that directly analyzes CLIP's vision transformer in weight space. For each attention head, we decompose its value-output matrix into singular vectors and interpret each one via COMP (Coherent Orthogonal Matching Pursuit), a new algorithm that explains them as sparse, semantically coherent combinations of human-interpretable concepts. We show that SITH yields coherent, faithful intra-head explanations, validated through reconstruction fidelity and interpretability experiments. This allows us to use SITH for precise, interpretable weight-space model edits that amplify or suppress specific concepts, improving downstream performance without retraining. Furthermore, we use SITH to study model adaptation, showing how fine-tuning primarily reweights a stable semantic basis rather than learning entirely new features.
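Abstract (reference translation): As vision-language models are deployed at scale, understanding their internal mechanisms becomes increasingly important. Existing interpretability methods rely mainly on activations, which makes them dataset-dependent and vulnerable to data bias. SITH (Semantic Inspection of Transformer Heads) is a data-free, training-free framework that analyzes CLIP's vision transformer directly in weight space. For each attention head, it decomposes the value-output matrix into singular vectors and interprets them via COMP (Coherent Orthogonal Matching Pursuit), an algorithm that explains them as sparse, semantically coherent combinations of human-interpretable concepts. SITH yields coherent, faithful intra-head explanations, validated through reconstruction fidelity and interpretability experiments. This makes it possible to use SITH for precise, interpretable weight-space model edits that amplify or suppress specific concepts, improving downstream performance without retraining. In addition, SITH is used to study model adaptation, showing that fine-tuning mainly reweights a stable semantic basis rather than learning entirely new features.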
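
The weight-space pipeline described in the abstract (factor a head's value-output matrix into singular vectors, then explain each vector as a sparse combination of concept directions) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. The weights are random, the concept dictionary is a random stand-in for whatever concept embeddings the paper uses, and the sparse step is plain Orthogonal Matching Pursuit, not COMP, whose coherence mechanism is not specified in the abstract. All function and variable names are hypothetical.

```python
import numpy as np


def head_singular_vectors(w_v, w_o):
    """SVD of one head's value-output product W_V @ W_O.

    w_v: (d_model, d_head), w_o: (d_head, d_model).
    The product has rank at most d_head, so only that many components are kept.
    """
    w_vo = w_v @ w_o
    u, s, vt = np.linalg.svd(w_vo, full_matrices=False)
    rank = w_v.shape[1]
    return u[:, :rank], s[:rank], vt[:rank]


def omp(target, dictionary, n_nonzero):
    """Plain Orthogonal Matching Pursuit (a stand-in for the paper's COMP).

    Greedily picks rows of `dictionary` (assumed unit-norm) and refits all
    selected coefficients by least squares after every pick.
    """
    residual = target.copy()
    selected = []
    coeffs = np.zeros(0)
    for _ in range(n_nonzero):
        scores = np.abs(dictionary @ residual)
        if selected:
            scores[selected] = -np.inf  # never pick the same atom twice
        selected.append(int(np.argmax(scores)))
        atoms = dictionary[selected].T  # (d_model, k)
        coeffs, *_ = np.linalg.lstsq(atoms, target, rcond=None)
        residual = target - atoms @ coeffs
    return selected, coeffs, residual


# Toy setup: random weights and a random "concept" dictionary (assumptions).
rng = np.random.default_rng(0)
d_model, d_head, n_concepts = 64, 16, 200
w_v = rng.standard_normal((d_model, d_head))
w_o = rng.standard_normal((d_head, d_model))
concepts = rng.standard_normal((n_concepts, d_model))
concepts /= np.linalg.norm(concepts, axis=1, keepdims=True)

u, s, vt = head_singular_vectors(w_v, w_o)
direction = vt[0]  # top right singular vector (unit norm)
selected, coeffs, residual = omp(direction, concepts, n_nonzero=5)

# Reconstruction fidelity: cosine similarity between the sparse fit and the target.
recon = concepts[selected].T @ coeffs
fidelity = float(recon @ direction / np.linalg.norm(recon))
print("selected concept indices:", selected)
print("reconstruction cosine:", round(fidelity, 3))
```

With random data the selected indices carry no meaning. In a real setting the dictionary rows would be embeddings of named concepts, so the selected indices and coefficients would read as a short weighted list of concepts per singular vector.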
