Fugu-MT Paper Translation (Abstract): Learning Encoding-Decoding Direction Pairs to Unveil Concepts of Influence in Deep Vision Networks

Paper abstract: Learning Encoding-Decoding Direction Pairs to Unveil Concepts of Influence in Deep Vision Networks

arxiv url: http://arxiv.org/abs/2509.23926v1
Date: Sun, 28 Sep 2025 15:02:34 GMT
Status: Translation complete
Last updated in system: 2025-09-30 22:32:19.537027
Title: Learning Encoding-Decoding Direction Pairs to Unveil Concepts of Influence in Deep Vision Networks
Title (reference translation): Learning Encoding-Decoding Direction Pairs in Deep Vision Networks
Authors: Alexandros Doumanoglou, Kurt Driessens, Dimitrios Zarpalas
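Abstract summary: Empirical evidence shows that deep vision networks represent concepts as directions in latent space, vectors called concept embeddings. For a given patch, multiple latent factors are encoded into a compact representation by linearly combining the concept embeddings, with the factors as coefficients. A latent factor can be recovered via the inner product with a filter, a vector called a decoding direction.
Reference score (independently computed attention score): 43.473390101413166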
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Empirical evidence shows that deep vision networks represent concepts as directions in latent space, vectors we call concept embeddings. Each concept has a latent factor, a scalar indicating its presence in an input patch. For a given patch, multiple latent factors are encoded into a compact representation by linearly combining concept embeddings, with the factors as coefficients. Since these embeddings enable such encoding, we call them encoding directions. A latent factor can be recovered via the inner product with a filter, a vector we call a decoding direction. These encoding-decoding direction pairs are not directly accessible, but recovering them helps open the black box of deep networks, enabling understanding, debugging, and improving models. Decoding directions attribute meaning to latent codes, while encoding directions assess concept influence on predictions, with both enabling model correction by unlearning irrelevant concepts. Unlike prior matrix decomposition, autoencoder, or dictionary learning methods that rely on feature reconstruction, we propose a new perspective: decoding directions are identified via directional clustering of activations, and encoding directions are estimated with signal vectors under a probabilistic view. We further leverage network weights through a novel technique, Uncertainty Region Alignment, which reveals interpretable directions affecting predictions. Our analysis shows that (a) on synthetic data, our method recovers ground-truth direction pairs; (b) on real data, decoding directions map to monosemantic, interpretable concepts and outperform unsupervised baselines; and (c) signal vectors faithfully estimate encoding directions, validated via activation maximization. Finally, we demonstrate applications in understanding global model behavior, explaining individual predictions, and intervening to produce counterfactuals or correct errors.
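Abstract (reference translation): Empirical evidence shows that deep vision networks represent concepts as directions in latent space, that is, as vectors called concept embeddings. Each concept has a latent factor, a scalar, that indicates its presence in an input patch. For a given patch, multiple latent factors are encoded into a compact representation by linearly combining the concept embeddings, with the factors as coefficients. Because these embeddings make such encoding possible, they are called encoding directions. A latent factor can be recovered via the inner product with a filter, a vector called a decoding direction. These encoding-decoding direction pairs cannot be accessed directly, but recovering them opens the black box of deep networks and makes it possible to understand, debug, and improve models. Decoding directions give meaning to latent codes, encoding directions assess the influence of concepts on predictions, and both enable model correction by unlearning irrelevant concepts. Unlike earlier matrix decomposition, autoencoder, and dictionary learning methods, which rely on feature reconstruction, decoding directions are identified by directional clustering of activations, and encoding directions are estimated with signal vectors under a probabilistic view. Network weights are further exploited through Uncertainty Region Alignment, a new technique that reveals interpretable directions affecting predictions. The analysis shows that (a) on synthetic data, the method recovers the ground-truth direction pairs; (b) on real data, decoding directions map to monosemantic, interpretable concepts and outperform unsupervised baselines; and (c) signal vectors faithfully estimate encoding directions, as validated by activation maximization. Finally, applications are demonstrated in understanding global model behavior, explaining individual predictions, and intervening to produce counterfactuals or correct errors.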
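
The encoding and decoding relationship described in the abstract can be illustrated on synthetic data. The following is a minimal sketch, not the authors' method: it assumes the encoding directions are known and linearly independent, builds decoding directions from the pseudo-inverse (an assumption made only so the pair relationship can be checked), and then estimates encoding directions with a covariance-based signal estimator of the kind used in earlier linear-model interpretation work. The abstract does not give the paper's exact estimator, its directional clustering step, or Uncertainty Region Alignment, so none of those are reproduced here. All names (`E`, `D`, `Z`, `X`, `E_hat`) and sizes are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
dim, n_concepts, n_patches = 8, 3, 5000  # hypothetical sizes

# Hypothetical encoding directions (concept embeddings), one unit vector per column.
E = rng.normal(size=(dim, n_concepts))
E /= np.linalg.norm(E, axis=0, keepdims=True)

# Latent factors: one scalar per concept and patch, drawn independently.
Z = rng.exponential(size=(n_patches, n_concepts))

# Encoding: each patch representation is a linear combination of the
# embeddings, with the latent factors as coefficients.
X = Z @ E.T  # shape (n_patches, dim)

# Decoding directions (assumption: rows of the pseudo-inverse of E).
# They satisfy D @ E = I, so each filter reads out exactly one factor.
D = np.linalg.pinv(E)  # shape (n_concepts, dim)

# Decoding: inner product of each representation with each filter.
Z_rec = X @ D.T

# Signal-vector style estimate of the encoding directions from the
# decoded factors alone: cov(x, z_i) / var(z_i) for each concept i.
Xc = X - X.mean(axis=0)
Yc = Z_rec - Z_rec.mean(axis=0)
E_hat = (Xc.T @ Yc) / (Yc ** 2).sum(axis=0)  # shape (dim, n_concepts)

print("max factor recovery error:", np.abs(Z_rec - Z).max())
print("max encoding direction error:", np.abs(E_hat - E).max())
```

In this sketch the decoding direction for a concept generally differs from its encoding direction, because the embeddings are not orthogonal; the covariance estimate approaches `E` only because the synthetic factors are drawn independently.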

Related papers