Probing the Representational Power of Sparse Autoencoders in Vision Models
- URL: http://arxiv.org/abs/2508.11277v2
- Date: Thu, 18 Sep 2025 16:45:16 GMT
- Title: Probing the Representational Power of Sparse Autoencoders in Vision Models
- Authors: Matthew Lyle Olson, Musashi Hinck, Neale Ratzlaff, Changbai Li, Phillip Howard, Vasudev Lal, Shao-Yen Tseng
- Abstract summary: Sparse Autoencoders (SAEs) have emerged as a popular tool for interpreting the hidden states of large language models (LLMs). Despite their popularity with language models, SAEs remain understudied in the visual domain. We provide an extensive evaluation of the representational power of SAEs for vision models using a broad range of image-based tasks.
- Score: 16.82204018033778
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Sparse Autoencoders (SAEs) have emerged as a popular tool for interpreting the hidden states of large language models (LLMs). By learning to reconstruct activations from a sparse bottleneck layer, SAEs discover interpretable features from the high-dimensional internal representations of LLMs. Despite their popularity with language models, SAEs remain understudied in the visual domain. In this work, we provide an extensive evaluation of the representational power of SAEs for vision models using a broad range of image-based tasks. Our experimental results demonstrate that SAE features are semantically meaningful, improve out-of-distribution generalization, and enable controllable generation across three vision model architectures: vision embedding models, multi-modal LLMs, and diffusion models. In vision embedding models, we find that learned SAE features can be used for OOD detection and provide evidence that they recover the ontological structure of the underlying model. For diffusion models, we demonstrate that SAEs enable semantic steering through text encoder manipulation and develop an automated pipeline for discovering human-interpretable attributes. Finally, we conduct exploratory experiments on multi-modal LLMs, finding evidence that SAE features reveal shared representations across vision and language modalities. Our study provides a foundation for SAE evaluation in vision models, highlighting their strong potential for improving interpretability, generalization, and steerability in the visual domain.
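As a concrete illustration of the reconstruct-from-a-sparse-bottleneck setup the abstract describes, below is a minimal SAE sketch in PyTorch. The dimensions, expansion factor, L1 coefficient, and stand-in activations are illustrative assumptions, not details from the paper.

```python
# Minimal sparse autoencoder (SAE) sketch: reconstruct cached model
# activations through a sparse bottleneck. All hyperparameters here
# are assumed for illustration, not taken from the paper.
import torch
import torch.nn as nn

class SparseAutoencoder(nn.Module):
    def __init__(self, d_model: int, d_hidden: int):
        super().__init__()
        self.encoder = nn.Linear(d_model, d_hidden)
        self.decoder = nn.Linear(d_hidden, d_model)

    def forward(self, x: torch.Tensor):
        z = torch.relu(self.encoder(x))  # sparse feature activations
        x_hat = self.decoder(z)          # reconstruction of the input
        return x_hat, z

# One training step: MSE reconstruction loss plus an L1 penalty that
# pushes most feature activations toward zero.
sae = SparseAutoencoder(d_model=768, d_hidden=8 * 768)  # hypothetical sizes
opt = torch.optim.Adam(sae.parameters(), lr=1e-4)
l1_coeff = 1e-3  # assumed sparsity coefficient

acts = torch.randn(64, 768)  # stand-in for cached vision-model activations
x_hat, z = sae(acts)
loss = ((x_hat - acts) ** 2).mean() + l1_coeff * z.abs().mean()
opt.zero_grad()
loss.backward()
opt.step()
```

In practice the random tensor would be replaced by activations cached from a vision model's hidden layer, and the learned columns of the decoder serve as the interpretable feature directions.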
Related papers
- Attention Guided Alignment in Efficient Vision-Language Models [56.20286899428444]
Large Vision-Language Models (VLMs) rely on effective multimodal alignment between pre-trained vision encoders and Large Language Models (LLMs). This paper presents a comprehensive analysis of attention patterns in efficient VLMs. We introduce Attention-Guided Efficient Vision-Language Models (AGE-VLM), a novel framework that enhances visual grounding through interleaved cross-attention layers.
arXiv Detail & Related papers (2025-11-21T21:36:48Z) - Can World Models Benefit VLMs for World Dynamics? [59.73433292793044]
We investigate the capabilities that emerge when world model priors are transferred into Vision-Language Models. We name our best-performing variant Dynamic Vision Aligner (DyVA). We find that DyVA surpasses both open-source and proprietary baselines, achieving state-of-the-art or comparable performance.
arXiv Detail & Related papers (2025-10-01T13:07:05Z) - ProtSAE: Disentangling and Interpreting Protein Language Models via Semantically-Guided Sparse Autoencoders [30.219733023958188]
The Sparse Autoencoder (SAE) has emerged as a powerful tool for mechanistic interpretability of large language models. We propose a semantically-guided SAE, called ProtSAE. We show that ProtSAE learns more biologically relevant and interpretable hidden features than previous methods.
arXiv Detail & Related papers (2025-08-26T11:20:31Z) - Sparse Autoencoders Learn Monosemantic Features in Vision-Language Models [50.587868616659826]
We introduce a comprehensive framework for evaluating monosemanticity at the neuron level in vision representations. Our experimental results reveal that SAEs trained on Vision-Language Models significantly enhance the monosemanticity of individual neurons.
arXiv Detail & Related papers (2025-04-03T17:58:35Z) - Sparse Autoencoders for Scientifically Rigorous Interpretation of Vision Models [27.806966289284528]
We present a unified framework using sparse autoencoders (SAEs) to discover human-interpretable visual features. We show that SAEs can reliably identify and manipulate interpretable visual features without model re-training.
arXiv Detail & Related papers (2025-02-10T18:32:41Z) - Toward universal steering and monitoring of AI models [16.303681959333883]
We develop a scalable approach for extracting linear representations of general concepts in large-scale AI models. We show how these representations enable model steering, through which we expose vulnerabilities and misaligned behaviors and improve model capabilities.
arXiv Detail & Related papers (2025-02-06T01:41:48Z) - Cambrian-1: A Fully Open, Vision-Centric Exploration of Multimodal LLMs [61.143381152739046]
We introduce Cambrian-1, a family of multimodal LLMs (MLLMs) designed with a vision-centric approach. Our study uses LLMs and visual instruction tuning as an interface to evaluate various visual representations. We provide model weights, code, supporting tools, datasets, and detailed instruction-tuning and evaluation recipes.
arXiv Detail & Related papers (2024-06-24T17:59:42Z) - Multi-modal Auto-regressive Modeling via Visual Words [96.25078866446053]
We propose the concept of visual words, which maps visual features to probability distributions over the Large Multi-modal Model's vocabulary.
We further explore the distribution of visual features in the semantic space within LMMs and the possibility of using text embeddings to represent visual information.
arXiv Detail & Related papers (2024-03-12T14:58:52Z) - RefSAM: Efficiently Adapting Segmenting Anything Model for Referring Video Object Segmentation [53.4319652364256]
This paper presents the RefSAM model, which explores the potential of SAM for referring video object segmentation.
Our proposed approach adapts the original SAM model to enhance cross-modality learning by employing a lightweight Cross-Modal MLP.
We employ a parameter-efficient tuning strategy to align and fuse the language and vision features effectively.
arXiv Detail & Related papers (2023-07-03T13:21:58Z)