Fugu-MT Paper Translation (Summary): Probing the Representational Power of Sparse Autoencoders in Vision Models

Paper overview: Probing the Representational Power of Sparse Autoencoders in Vision Models

arxiv url: http://arxiv.org/abs/2508.11277v1
Date: Fri, 15 Aug 2025 07:29:42 GMT
Status: Translation complete
Last updated in system: 2025-08-18 14:51:23.780232
Title: Probing the Representational Power of Sparse Autoencoders in Vision Models
Title (reference translation): Exploring the Representational Power of Sparse Autoencoders in Vision Models
Authors: Matthew Lyle Olson, Musashi Hinck, Neale Ratzlaff, Changbai Li, Phillip Howard, Vasudev Lal, Shao-Yen Tseng,
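Abstract summary: Sparse autoencoders (SAEs) have emerged as a popular tool for interpreting the hidden states of large language models (LLMs). Despite their popularity with language models, SAEs remain understudied in the visual domain. We extensively evaluate the representational power of SAEs for vision models using a broad range of image-based tasks.
Reference score (independently computed attention score): 6.7161402871287645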
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Sparse Autoencoders (SAEs) have emerged as a popular tool for interpreting the hidden states of large language models (LLMs). By learning to reconstruct activations from a sparse bottleneck layer, SAEs discover interpretable features from the high-dimensional internal representations of LLMs. Despite their popularity with language models, SAEs remain understudied in the visual domain. In this work, we provide an extensive evaluation of the representational power of SAEs for vision models using a broad range of image-based tasks. Our experimental results demonstrate that SAE features are semantically meaningful, improve out-of-distribution generalization, and enable controllable generation across three vision model architectures: vision embedding models, multi-modal LMMs, and diffusion models. In vision embedding models, we find that learned SAE features can be used for OOD detection and provide evidence that they recover the ontological structure of the underlying model. For diffusion models, we demonstrate that SAEs enable semantic steering through text encoder manipulation and develop an automated pipeline for discovering human-interpretable attributes. Finally, we conduct exploratory experiments on multi-modal LLMs, finding evidence that SAE features reveal shared representations across vision and language modalities. Our study provides a foundation for SAE evaluation in vision models, highlighting their strong potential for improving interpretability, generalization, and steerability in the visual domain.
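Abstract (reference translation): Sparse autoencoders (SAEs) have emerged as a popular tool for interpreting the hidden states of large language models (LLMs). By reconstructing activations from a sparse bottleneck layer, SAEs discover interpretable features in the high-dimensional internal representations of LLMs. Despite their popularity with language models, SAEs remain understudied in the visual domain. In this work, we extensively evaluate the representational power of SAEs for vision models using a broad range of image-based tasks. Our experimental results show that SAE features are semantically meaningful, improve out-of-distribution generalization, and enable controllable generation across three architectures: vision embedding models, multimodal LMMs, and diffusion models. In vision embedding models, learned SAE features can be used for OOD detection, and we provide evidence that they recover the ontological structure of the underlying model. For diffusion models, we demonstrate that SAEs enable semantic steering through text encoder manipulation, and we develop an automated pipeline for discovering human-interpretable attributes. Finally, we conduct exploratory experiments on multimodal LLMs and find evidence that SAE features reveal shared representations across the vision and language modalities. This study provides a foundation for SAE evaluation in vision models and highlights their potential to improve interpretability, generalization, and steerability in the visual domain.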
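The abstract describes SAEs as learning to reconstruct model activations through a sparse bottleneck. A minimal NumPy sketch of that idea follows, assuming the common formulation from the SAE literature: a ReLU encoder into an overcomplete dictionary, a linear decoder, and a loss of reconstruction error plus an L1 sparsity penalty. The paper's actual architecture, training procedure, and hyperparameters may differ. The class `SparseAutoencoder`, the helper `steer`, and parameters such as `l1_coeff` and `alpha` are hypothetical names for illustration. Random vectors stand in for real hidden activations. `steer` illustrates one generic way to push activations along a learned feature (adding a normalized decoder direction); it is not necessarily how the paper performs text-encoder steering.

```python
import numpy as np


class SparseAutoencoder:
    """Hypothetical minimal SAE: ReLU encoder into an overcomplete dictionary, linear decoder."""

    def __init__(self, d_model: int, d_dict: int, seed: int = 0):
        rng = np.random.default_rng(seed)
        # d_dict > d_model gives an overcomplete set of candidate feature directions.
        self.W_enc = rng.normal(0.0, 0.1, size=(d_model, d_dict))
        self.b_enc = np.zeros(d_dict)
        self.W_dec = rng.normal(0.0, 0.1, size=(d_dict, d_model))
        self.b_dec = np.zeros(d_model)

    def encode(self, x: np.ndarray) -> np.ndarray:
        # Feature activations; ReLU zeroes negative pre-activations, so most entries can be 0.
        return np.maximum(0.0, (x - self.b_dec) @ self.W_enc + self.b_enc)

    def decode(self, f: np.ndarray) -> np.ndarray:
        # Reconstruction = weighted sum of decoder rows (feature directions) plus a bias.
        return f @ self.W_dec + self.b_dec

    def loss(self, x: np.ndarray, l1_coeff: float = 1e-3):
        # Training objective: squared reconstruction error plus L1 penalty on activations.
        f = self.encode(x)
        x_hat = self.decode(f)
        mse = float(np.mean(np.sum((x - x_hat) ** 2, axis=-1)))
        l1 = float(np.mean(np.sum(np.abs(f), axis=-1)))
        return mse + l1_coeff * l1, mse, l1


def steer(x: np.ndarray, sae: SparseAutoencoder, feature_idx: int, alpha: float) -> np.ndarray:
    # Hypothetical steering: shift activations along one feature's unit-norm decoder direction.
    direction = sae.W_dec[feature_idx]
    direction = direction / np.linalg.norm(direction)
    return x + alpha * direction


# Usage with random stand-in activations (8 tokens or patches, 16-dim hidden states).
rng = np.random.default_rng(1)
acts = rng.normal(size=(8, 16))
sae = SparseAutoencoder(d_model=16, d_dict=64)
features = sae.encode(acts)
total, mse, l1 = sae.loss(acts)
steered = steer(acts, sae, feature_idx=3, alpha=2.0)
```

In practice the weights would be fitted by gradient descent on `loss` over activations collected from a vision model. Each decoder row then serves as a candidate interpretable feature direction.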
