Sparse Autoencoders are Topic Models
- URL: http://arxiv.org/abs/2511.16309v1
- Date: Thu, 20 Nov 2025 12:37:54 GMT
- Title: Sparse Autoencoders are Topic Models
- Authors: Leander Girrbach, Zeynep Akata
- Abstract summary: We show that sparse autoencoders (SAEs) can be naturally understood as topic models. We introduce SAE-TM, a topic modeling framework that trains an SAE to learn reusable topic atoms. We analyze thematic structure in image datasets and trace topic changes over time in Japanese woodblock prints.
- Score: 47.62628339598771
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Sparse autoencoders (SAEs) are used to analyze embeddings, but their role and practical value are debated. We propose a new perspective on SAEs by demonstrating that they can be naturally understood as topic models. We extend Latent Dirichlet Allocation to embedding spaces and derive the SAE objective as a maximum a posteriori estimator under this model. This view implies SAE features are thematic components rather than steerable directions. Based on this, we introduce SAE-TM, a topic modeling framework that: (1) trains an SAE to learn reusable topic atoms, (2) interprets them as word distributions on downstream data, and (3) merges them into any number of topics without retraining. SAE-TM yields more coherent topics than strong baselines on text and image datasets while maintaining diversity. Finally, we analyze thematic structure in image datasets and trace topic changes over time in Japanese woodblock prints. Our work positions SAEs as effective tools for large-scale thematic analysis across modalities. Code and data will be released upon publication.
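To make the claimed correspondence concrete, here is a minimal sketch of how a MAP estimate under an embedding-space mixture model recovers an SAE-style objective. The Gaussian likelihood and Laplace-style sparsity prior below are illustrative assumptions; the paper's actual derivation extends LDA and may use different priors.

```latex
% Sketch: a document embedding x is modeled as a sparse nonnegative
% combination of K topic atoms (the columns of a dictionary D).
%   likelihood: x | a ~ N(D a, sigma^2 I)
%   prior:      p(a) \propto exp(-lambda ||a||_1),  a >= 0
% The MAP estimate of the topic weights a is then
\begin{aligned}
\hat{a} &= \arg\max_{a \ge 0}\; \log p(x \mid a) + \log p(a) \\
        &= \arg\min_{a \ge 0}\; \frac{1}{2\sigma^2}\,\lVert x - D a \rVert_2^2
           + \lambda \lVert a \rVert_1,
\end{aligned}
% i.e. the reconstruction-plus-sparsity loss of a (linear) SAE, with the
% features a read as topic proportions rather than steering directions.
```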
Related papers
- Interpretable Embeddings with Sparse Autoencoders: A Data Analysis Toolkit [16.056849135589324]
Analyzing large-scale text corpora is a core challenge in machine learning. We propose using sparse autoencoders (SAEs) to create SAE embeddings. We show that SAE embeddings are more cost-effective and reliable than LLMs and more controllable than dense embeddings.
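As a rough illustration of what an "SAE embedding" is (not the toolkit's actual code), the sketch below encodes dense document embeddings into sparse codes with a top-k SAE encoder; `W_enc`, `b_enc`, and `k` are hypothetical placeholders for trained parameters.

```python
import numpy as np

def sae_encode(X, W_enc, b_enc, k=32):
    """Map dense embeddings X (n, d) to sparse codes (n, m).

    Minimal top-k SAE encoder: linear map + ReLU, then keep only the
    k largest activations per row (ties may keep a few extra). The
    surviving dimensions serve as interpretable per-document features.
    """
    A = np.maximum(X @ W_enc + b_enc, 0.0)                    # (n, m)
    kth = np.partition(A, A.shape[1] - k, axis=1)[:, [A.shape[1] - k]]
    return np.where(A >= kth, A, 0.0)

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 768))             # 100 dense document embeddings
W = rng.normal(size=(768, 4096)) * 0.02     # stand-in for trained weights
Z = sae_encode(X, W, np.zeros(4096))
print(Z.shape, (Z != 0).mean())             # sparse (100, 4096) codes
```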
arXiv Detail & Related papers (2025-12-10T21:26:24Z) - ProtSAE: Disentangling and Interpreting Protein Language Models via Semantically-Guided Sparse Autoencoders [30.219733023958188]
Sparse Autoencoder (SAE) has emerged as a powerful tool for mechanistic interpretability of large language models. We propose a semantically-guided SAE, called ProtSAE. We show that ProtSAE learns more biologically relevant and interpretable hidden features compared to previous methods.
arXiv Detail & Related papers (2025-08-26T11:20:31Z) - Segment Any Vehicle: Semantic and Visual Context Driven SAM and A Benchmark [12.231630639022335]
We propose SAV, a novel framework comprising three core components: a SAM-based encoder-decoder, a vehicle part knowledge graph, and a context sample retrieval encoding module. The knowledge graph explicitly models the spatial and geometric relationships among vehicle parts through a structured ontology, effectively encoding prior structural knowledge. We introduce a new large-scale benchmark dataset for vehicle part segmentation, named VehicleSeg10K, which contains 11,665 high-quality pixel-level annotations.
arXiv Detail & Related papers (2025-08-06T09:46:49Z) - FaithfulSAE: Towards Capturing Faithful Features with Sparse Autoencoders without External Dataset Dependencies [3.709351921096894]
We propose FaithfulSAE, a method that trains SAEs on the model's own synthetic dataset. We demonstrate that training SAEs on less-OOD instruction datasets results in SAEs being more stable across seeds.
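A rough sketch of the idea of training on the model's own generations, assuming a HuggingFace causal LM; the model, layer index, and prompt below are arbitrary placeholders, not the paper's setup.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

with torch.no_grad():
    # 1) Sample a "faithful" dataset from the model itself, so the SAE's
    #    training activations are on-distribution for the model under study.
    ids = tok("The", return_tensors="pt").input_ids
    gen = model.generate(ids, do_sample=True, max_new_tokens=64,
                         num_return_sequences=8,
                         pad_token_id=tok.eos_token_id)
    # 2) Collect hidden activations on the self-generated text.
    out = model(gen, output_hidden_states=True)
    acts = out.hidden_states[6].reshape(-1, model.config.n_embd)

# 3) `acts` (tokens x d) would then replace an external web corpus as the
#    training set for an ordinary reconstruction + sparsity SAE objective.
print(acts.shape)
```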
arXiv Detail & Related papers (2025-06-21T10:18:25Z) - Llama Scope: Extracting Millions of Features from Llama-3.1-8B with Sparse Autoencoders [115.34050914216665]
Sparse Autoencoders (SAEs) have emerged as a powerful unsupervised method for extracting sparse representations from language models.
We introduce a suite of 256 SAEs, trained on each layer and sublayer of the Llama-3.1-8B-Base model, with 32K and 128K features.
We assess the generalizability of SAEs trained on base models to longer contexts and fine-tuned models.
arXiv Detail & Related papers (2024-10-27T17:33:49Z) - Interactive Topic Models with Optimal Transport [75.26555710661908]
We present EdTM, an approach for label-name-supervised topic modeling.
EdTM casts topic modeling as an assignment problem while leveraging LM/LLM-based document-topic affinities.
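A toy illustration of the assignment view (not EdTM itself): given LM-derived document-topic affinities, entropic optimal transport yields a soft, marginal-constrained assignment of documents to topics. The uniform marginals and regularization weight below are arbitrary assumptions.

```python
import numpy as np

def sinkhorn(cost, a, b, reg=0.1, iters=200):
    """Entropic OT: soft assignment P with row sums a and column sums b."""
    K = np.exp(-cost / reg)                 # Gibbs kernel
    u = np.ones_like(a)
    for _ in range(iters):                  # alternating marginal scaling
        v = b / (K.T @ u)
        u = a / (K @ v)
    return u[:, None] * K * v[None, :]      # transport plan (docs x topics)

rng = np.random.default_rng(0)
affinity = rng.random((6, 3))               # stand-in doc-topic affinities
P = sinkhorn(-affinity,                     # higher affinity -> lower cost
             a=np.full(6, 1 / 6), b=np.full(3, 1 / 3))
print(P.round(3))                           # each row: soft topic assignment
```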
arXiv Detail & Related papers (2024-06-28T13:57:27Z) - Rank2Tell: A Multimodal Driving Dataset for Joint Importance Ranking and Reasoning [19.43430577960824]
This paper introduces a novel dataset, Rank2Tell, a multi-modal ego-centric dataset for Ranking the importance level and Telling the reason for the importance.
Using various closed and open-ended visual question answering, the dataset provides dense annotations of the semantic, spatial, temporal, and relational attributes of important objects in complex traffic scenarios.
arXiv Detail & Related papers (2023-09-12T20:51:07Z) - RefSAM: Efficiently Adapting Segmenting Anything Model for Referring Video Object Segmentation [53.4319652364256]
This paper presents the RefSAM model, which explores the potential of SAM for referring video object segmentation.
Our proposed approach adapts the original SAM model to enhance cross-modality learning by employing a lightweight Cross-Modal MLP.
We employ a parameter-efficient tuning strategy to align and fuse the language and vision features effectively.
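A generic sketch of such a lightweight cross-modal projector: text features are mapped into the visual feature space and fused with the frozen backbone's features. The dimensions and the multiplicative fusion here are assumptions, not RefSAM's published design.

```python
import torch
import torch.nn as nn

class CrossModalMLP(nn.Module):
    """Lightweight projector from text features to visual feature space.

    In a parameter-efficient setup, only this small module is trained
    while the large vision backbone (e.g. SAM) stays frozen.
    """
    def __init__(self, text_dim=768, vis_dim=256, hidden=512):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(text_dim, hidden), nn.GELU(),
            nn.Linear(hidden, vis_dim),
        )

    def forward(self, text_feats, vis_feats):
        q = self.net(text_feats)             # (B, vis_dim) projected query
        return vis_feats * q[:, None, :]     # modulate (B, N, vis_dim) tokens

fused = CrossModalMLP()(torch.randn(2, 768), torch.randn(2, 196, 256))
print(fused.shape)                           # torch.Size([2, 196, 256])
```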
arXiv Detail & Related papers (2023-07-03T13:21:58Z) - FETA: Towards Specializing Foundation Models for Expert Task Applications [49.57393504125937]
Foundation Models (FMs) have demonstrated unprecedented capabilities including zero-shot learning, high-fidelity data synthesis, and out-of-domain generalization.
We show in this paper that FMs still have poor out-of-the-box performance on expert tasks.
We propose FETA, a first-of-its-kind benchmark built around the task of teaching FMs to understand technical documentation.
arXiv Detail & Related papers (2022-09-08T08:47:57Z) - Incorporating Dynamic Semantics into Pre-Trained Language Model for Aspect-based Sentiment Analysis [67.41078214475341]
We propose Dynamic Re-weighting BERT (DR-BERT) to learn dynamic aspect-oriented semantics for ABSA.
Specifically, we first take the Stack-BERT layers as a primary encoder to grasp the overall semantics of the sentence.
We then fine-tune it by incorporating a lightweight Dynamic Re-weighting Adapter (DRA).
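A generic sketch of the re-weighting idea (not the authors' DRA): score each token state against an aspect embedding and pool the sentence accordingly. The scoring function and dimensions are assumptions.

```python
import torch
import torch.nn as nn

class DynamicReweightingAdapter(nn.Module):
    """Toy aspect-conditioned re-weighting over encoder token states."""
    def __init__(self, dim=768):
        super().__init__()
        self.score = nn.Linear(dim, 1)       # per-token relevance scorer

    def forward(self, hidden, aspect):
        # hidden: (B, T, d) token states; aspect: (B, d) aspect embedding
        w = torch.softmax(self.score(hidden * aspect[:, None, :]), dim=1)
        return (w * hidden).sum(dim=1)       # (B, d) aspect-aware sentence vec

pooled = DynamicReweightingAdapter()(torch.randn(2, 16, 768),
                                     torch.randn(2, 768))
print(pooled.shape)                          # torch.Size([2, 768])
```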
arXiv Detail & Related papers (2022-03-30T14:48:46Z)