Evaluating Sparse Autoencoders: From Shallow Design to Matching Pursuit
- URL: http://arxiv.org/abs/2506.05239v1
- Date: Thu, 05 Jun 2025 16:57:58 GMT
- Title: Evaluating Sparse Autoencoders: From Shallow Design to Matching Pursuit
- Authors: Valérie Costa, Thomas Fel, Ekdeep Singh Lubana, Bahareh Tolooshams, Demba Ba
- Abstract summary: Sparse autoencoders (SAEs) have recently become central tools for interpretability. This paper evaluates SAEs in a controlled setting using MNIST. We introduce a multi-iteration SAE by unrolling Matching Pursuit (MP-SAE).
- Score: 16.996218963146788
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Sparse autoencoders (SAEs) have recently become central tools for interpretability, leveraging dictionary learning principles to extract sparse, interpretable features from neural representations whose underlying structure is typically unknown. This paper evaluates SAEs in a controlled setting using MNIST, which reveals that current shallow architectures implicitly rely on a quasi-orthogonality assumption that limits the ability to extract correlated features. To move beyond this, we introduce a multi-iteration SAE by unrolling Matching Pursuit (MP-SAE), enabling the residual-guided extraction of correlated features that arise in hierarchical settings such as handwritten digit generation while guaranteeing monotonic improvement of the reconstruction as more atoms are selected.
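As a concrete illustration of the unrolled encoder described in the abstract, here is a minimal Matching Pursuit sketch in Python. The dictionary `D`, the iteration count, and the function name are illustrative assumptions, not the authors' implementation; each step greedily selects the atom most correlated with the current residual, which is what guarantees the monotonic reconstruction improvement mentioned above.

```python
import numpy as np

def mp_sae_encode(x, D, n_iters):
    """Unrolled Matching Pursuit encoder (illustrative sketch).

    x : (d,) input activation vector
    D : (d, m) dictionary of unit-norm atoms (columns)
    n_iters : number of unrolled MP iterations (atoms selected)

    Returns a sparse code z with at most n_iters nonzero entries.
    """
    residual = x.copy()
    z = np.zeros(D.shape[1])
    for _ in range(n_iters):
        corr = D.T @ residual            # correlate residual with every atom
        j = np.argmax(np.abs(corr))      # greedily pick the best-matching atom
        z[j] += corr[j]                  # accumulate its coefficient
        residual -= corr[j] * D[:, j]    # remove its contribution from the residual
    return z

# Reconstruction is x_hat = D @ z; with unit-norm atoms, the residual norm
# is non-increasing in n_iters, hence the monotonic improvement.
```

Because each selection is made against the residual rather than the raw input, later iterations are free to pick atoms correlated with earlier ones, which is how correlated features can be recovered without the quasi-orthogonality assumption of shallow SAEs.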
Related papers
- Unveiling Decision-Making in LLMs for Text Classification : Extraction of influential and interpretable concepts with Sparse Autoencoders [0.0]
We present a novel SAE-based architecture tailored for text classification. We benchmark this architecture against established methods such as ConceptShap, Independent Component Analysis, and other SAE-based concept extraction techniques. Our empirical results show that our architecture improves both the causality and interpretability of the extracted features.
arXiv Detail & Related papers (2025-06-30T15:18:50Z) - From Flat to Hierarchical: Extracting Sparse Representations with Matching Pursuit [16.996218963146788]
We show that MP-SAE unrolls its encoder into a sequence of residual-guided steps, allowing it to capture hierarchical and nonlinearly accessible features. We also show that the sequential encoder principle of MP-SAE affords an additional benefit of adaptive sparsity at inference time.
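The sequential encoder makes adaptive sparsity straightforward to sketch: instead of fixing the number of iterations, stop once the residual is small enough, so easy inputs get short codes and hard inputs get longer ones. A hedged sketch (the threshold `eps`, the iteration cap, and the function name are illustrative, not the paper's code):

```python
import numpy as np

def mp_sae_encode_adaptive(x, D, eps=1e-2, max_iters=64):
    """Adaptive-sparsity Matching Pursuit encoder (illustrative sketch).

    Selects atoms until the residual energy drops below eps, so the
    number of active latents varies per input at inference time.
    """
    residual = x.copy()
    z = np.zeros(D.shape[1])
    for _ in range(max_iters):
        if np.linalg.norm(residual) < eps:
            break                        # input is already well explained
        corr = D.T @ residual
        j = np.argmax(np.abs(corr))
        z[j] += corr[j]
        residual -= corr[j] * D[:, j]
    return z
```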
arXiv Detail & Related papers (2025-06-03T17:24:55Z) - Decoding Dense Embeddings: Sparse Autoencoders for Interpreting and Discretizing Dense Retrieval [13.31210969917096]
We propose a novel interpretability framework for Dense Passage Retrieval (DPR) models. We generate natural language descriptions for each latent concept, enabling human interpretations of both the dense embeddings and the query-document similarity scores of DPR models. We show that Concept-Level Sparse Retrieval (CL-SR) achieves high index-space and computational efficiency while maintaining robust performance across vocabulary and semantic mismatches.
arXiv Detail & Related papers (2025-05-28T02:50:17Z) - Interpreting CLIP with Hierarchical Sparse Autoencoders [8.692675181549117]
Matryoshka SAE (MSAE) learns hierarchical representations at multiple granularities simultaneously. MSAE establishes a new state-of-the-art frontier between reconstruction quality and sparsity for CLIP.
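The summary does not spell out MSAE's objective, but "multiple granularities simultaneously" is commonly realized as nested reconstruction losses over growing prefixes of the latent code, in the spirit of Matryoshka representation learning. A sketch under that assumption (the prefix sizes and names are illustrative):

```python
import numpy as np

def matryoshka_sae_loss(x, z, W_dec, prefix_sizes=(64, 256, 1024)):
    """Nested reconstruction loss over prefixes of the code (sketch).

    x : (batch, d) inputs
    z : (batch, m) sparse codes
    W_dec : (m, d) decoder dictionary
    """
    loss = 0.0
    for k in prefix_sizes:
        x_hat_k = z[:, :k] @ W_dec[:k, :]    # decode using only the first k latents
        loss += np.mean((x - x_hat_k) ** 2)  # coarse-to-fine reconstruction terms
    return loss
```

Summing the terms forces the earliest latents to carry a coarse reconstruction on their own, with later latents adding progressively finer detail.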
arXiv Detail & Related papers (2025-02-27T22:39:13Z) - The Complexity of Learning Sparse Superposed Features with Feedback [0.9838799448847586]
We investigate whether the underlying learned features of a model can be efficiently retrieved through feedback from an agent. We analyze the feedback complexity associated with learning a feature matrix in sparse settings. Our results establish tight bounds when the agent is permitted to construct activations and demonstrate strong upper bounds in sparse scenarios.
arXiv Detail & Related papers (2025-02-08T01:54:23Z) - Learning Multi-Aspect Item Palette: A Semantic Tokenization Framework for Generative Recommendation [55.99632509895994]
We introduce LAMIA, a novel approach for multi-aspect semantic tokenization. Unlike RQ-VAE, which uses a single embedding, LAMIA learns an "item palette": a collection of independent and semantically parallel embeddings. Our results demonstrate significant improvements in recommendation accuracy over existing methods.
arXiv Detail & Related papers (2024-09-11T13:49:48Z) - Scalable Learning of Latent Language Structure With Logical Offline Cycle Consistency [71.42261918225773]
Conceptually, LOCCO can be viewed as a form of self-learning where the semantic parser being trained is used to generate annotations for unlabeled text.
As an added bonus, the annotations produced by LOCCO can be trivially repurposed to train a neural text generation model.
arXiv Detail & Related papers (2023-05-31T16:47:20Z) - Equivariant Transduction through Invariant Alignment [71.45263447328374]
We introduce a novel group-equivariant architecture that incorporates a group-invariant hard alignment mechanism.
We find that our network's structure allows it to develop stronger equivariant properties than existing group-equivariant approaches.
We additionally find that it outperforms previous group-equivariant networks empirically on the SCAN task.
arXiv Detail & Related papers (2022-09-22T11:19:45Z) - Hierarchical Sketch Induction for Paraphrase Generation [79.87892048285819]
We introduce Hierarchical Refinement Quantized Variational Autoencoders (HRQ-VAE), a method for learning decompositions of dense encodings.
We use HRQ-VAE to encode the syntactic form of an input sentence as a path through the hierarchy, allowing us to more easily predict syntactic sketches at test time.
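One plausible reading of "a path through the hierarchy" is residual quantization with one codebook per level: quantize the encoding with the coarsest codebook, then quantize what remains with the next, and so on. A hedged sketch under that assumption (codebook shapes and names are illustrative, not the paper's code):

```python
import numpy as np

def hierarchical_residual_quantize(e, codebooks):
    """Encode a dense vector as a path through a codebook hierarchy (sketch).

    e : (d,) dense sentence encoding
    codebooks : list of (n_codes, d) arrays, ordered coarse to fine
    Returns the per-level code indices (the 'path') and the quantized sum.
    """
    residual = e.copy()
    path, quantized = [], np.zeros_like(e)
    for C in codebooks:
        dists = np.linalg.norm(C - residual, axis=1)  # distance to every code
        idx = int(np.argmin(dists))                   # nearest code at this level
        path.append(idx)
        quantized += C[idx]                           # accumulate the approximation
        residual -= C[idx]                            # remainder goes to the next level
    return path, quantized
```

Each level only has to explain what the levels above it missed, which is what makes the resulting index sequence behave like a coarse-to-fine syntactic sketch.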
arXiv Detail & Related papers (2022-03-07T15:28:36Z) - Representation Learning for Sequence Data with Deep Autoencoding Predictive Components [96.42805872177067]
We propose a self-supervised representation learning method for sequence data, based on the intuition that useful representations of sequence data should exhibit a simple structure in the latent space.
We encourage this latent structure by maximizing an estimate of predictive information of latent feature sequences, which is the mutual information between past and future windows at each time step.
We demonstrate that our method recovers the latent space of noisy dynamical systems, extracts predictive features for forecasting tasks, and improves automatic speech recognition when used to pretrain the encoder on large amounts of unlabeled data.
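The predictive-information objective becomes concrete under a Gaussian assumption on windowed latents, where mutual information reduces to log-determinants of covariance blocks. A sketch of that estimator (the window length `T` and names are illustrative; the paper's exact estimator may differ):

```python
import numpy as np

def gaussian_predictive_information(z, T):
    """Mutual information between length-T past and future latent windows,
    assuming the stacked windows are jointly Gaussian (illustrative sketch).

    z : (n_steps, d) latent feature sequence
    T : window length; needs n_steps large relative to 2*T*d for a
        well-conditioned covariance estimate.
    """
    d = z.shape[1]
    # Stack each past window together with the future window that follows it.
    pairs = np.array([np.concatenate([z[t - T:t].ravel(), z[t:t + T].ravel()])
                      for t in range(T, z.shape[0] - T + 1)])
    cov = np.cov(pairs, rowvar=False)
    k = T * d
    _, logdet_p = np.linalg.slogdet(cov[:k, :k])   # past marginal
    _, logdet_f = np.linalg.slogdet(cov[k:, k:])   # future marginal
    _, logdet_j = np.linalg.slogdet(cov)           # joint
    return 0.5 * (logdet_p + logdet_f - logdet_j)  # nats
```

Maximizing this quantity over the encoder's parameters rewards latent sequences whose future is predictable from their past, the "simple structure" the entry refers to.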
arXiv Detail & Related papers (2020-10-07T03:34:01Z) - Improve Variational Autoencoder for Text Generation with Discrete Latent Bottleneck [52.08901549360262]
Variational autoencoders (VAEs) are essential tools in end-to-end representation learning.
VAEs with a strong auto-regressive decoder tend to ignore their latent variables.
We propose a principled approach to enforce an implicit latent feature matching in a more compact latent space.
arXiv Detail & Related papers (2020-04-22T14:41:37Z) - Generating diverse and natural text-to-speech samples using a quantized fine-grained VAE and auto-regressive prosody prior [53.69310441063162]
This paper proposes a sequential prior in a discrete latent space which can generate more naturally sounding samples.
We evaluate the approach using listening tests, objective metrics of automatic speech recognition (ASR) performance, and measurements of prosody attributes.
arXiv Detail & Related papers (2020-02-06T12:35:50Z)