Enforcing Orderedness to Improve Feature Consistency
- URL: http://arxiv.org/abs/2512.02194v1
- Date: Mon, 01 Dec 2025 20:39:19 GMT
- Title: Enforcing Orderedness to Improve Feature Consistency
- Authors: Sophie L. Wang, Alex Quach, Nithin Parsan, John J. Yang
- Abstract summary: We introduce Ordered Sparse Autoencoders (OSAE), which extend Matryoshka SAEs by establishing a strict ordering of latent features and deterministically using every feature dimension. We show that OSAEs resolve permutation non-identifiability in settings of sparse dictionary learning where solutions are unique (up to natural symmetries).
- Score: 0.3499870393443268
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Sparse autoencoders (SAEs) have been widely used for interpretability of neural networks, but their learned features often vary across seeds and hyperparameter settings. We introduce Ordered Sparse Autoencoders (OSAE), which extend Matryoshka SAEs by (1) establishing a strict ordering of latent features and (2) deterministically using every feature dimension, avoiding the sampling-based approximations of prior nested SAE methods. Theoretically, we show that OSAEs resolve permutation non-identifiability in settings of sparse dictionary learning where solutions are unique (up to natural symmetries). Empirically on Gemma2-2B and Pythia-70M, we show that OSAEs can help improve consistency compared to Matryoshka baselines.
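The core mechanism is easy to state in code. Below is a minimal sketch, not the authors' implementation: latent dimensions carry a fixed rank, and the loss reconstructs the input from every prefix of the code, so earlier latents must explain the most variance. The class name, the plain ReLU encoder, and the uniform prefix weighting are illustrative assumptions; the paper's architecture and loss weighting may differ.

```python
import torch
import torch.nn as nn

class OrderedSAE(nn.Module):
    """Toy ordered SAE: every prefix of the latent code must reconstruct x.

    Illustrative sketch, not the paper's implementation: the plain ReLU
    encoder and uniform prefix weighting are simplifying assumptions.
    """

    def __init__(self, d_model: int, d_latent: int):
        super().__init__()
        self.enc = nn.Linear(d_model, d_latent)
        self.dec = nn.Linear(d_latent, d_model)

    def forward(self, x: torch.Tensor):
        z = torch.relu(self.enc(x))                  # (B, m) nonnegative code
        # Column j of the decoder weight is dictionary atom j, so this is
        # each latent's individual contribution to the reconstruction.
        parts = z.unsqueeze(-1) * self.dec.weight.T  # (B, m, d)
        # Cumulative sums give the reconstruction from every prefix of the
        # code, deterministically: no sampling of nested sub-dictionaries.
        prefix_recon = parts.cumsum(dim=1) + self.dec.bias
        # Penalizing all prefixes forces earlier latents to explain more
        # variance, which induces a strict ordering of the features.
        loss = ((prefix_recon - x.unsqueeze(1)) ** 2).mean()
        return prefix_recon[:, -1], z, loss

sae = OrderedSAE(d_model=64, d_latent=256)
x = torch.randn(8, 64)
recon, code, loss = sae(x)
loss.backward()
```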
Related papers
- Group Equivariance Meets Mechanistic Interpretability: Equivariant Sparse Autoencoders [3.7894019466201274]
Sparse autoencoders (SAEs) have proven useful in disentangling the opaque activations of neural networks.
We show that incorporating group symmetries into SAEs yields features that are more useful in downstream tasks; a toy equivariance constraint is sketched after this entry.
arXiv Detail & Related papers (2025-11-12T15:48:38Z)
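A minimal way to read "incorporating group symmetries into the SAE" is an equivariance constraint f(g . x) = g . f(x) on the encoder. The sketch below imposes it softly, as a penalty under a cyclic-shift group; the group, its action on the latent space, and the soft penalty (rather than architectural weight tying) are all illustrative assumptions, not the paper's construction.

```python
import torch
import torch.nn as nn

# Toy equivariance penalty for an SAE encoder under a cyclic-shift group.
# Assumptions for illustration: the group acts on both input and latent
# space by circular shifts, and equivariance is encouraged via a penalty.

d = 32
enc = nn.Linear(d, d)

def encode(x):
    return torch.relu(enc(x))

x = torch.randn(16, d)
shift = 5  # one sampled group element g
# Equivariance target: encoding a shifted input should equal the shifted
# encoding, i.e. f(g . x) = g . f(x).
lhs = encode(torch.roll(x, shifts=shift, dims=-1))
rhs = torch.roll(encode(x), shifts=shift, dims=-1)
equiv_penalty = ((lhs - rhs) ** 2).mean()
equiv_penalty.backward()  # added to the usual reconstruction + sparsity loss
```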
- SymMaP: Improving Computational Efficiency in Linear Solvers through Symbolic Preconditioning [5.546260420622416]
Symbolic Matrix Preconditioning (SymMaP) learns efficient symbolic expressions for preconditioning parameters.
We employ a neural network to search the high-dimensional discrete space for expressions that can accurately predict the optimal parameters; a toy version of this search is sketched after this entry.
Experimental results show that SymMaP consistently outperforms traditional strategies across various benchmarks.
arXiv Detail & Related papers (2025-10-28T08:25:03Z)
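A toy version of the idea: score candidate symbolic expressions for a preconditioning parameter against known-good values and keep the best. Here the single feature (mesh size h), the target (the classical SOR-style parameter 2/(1+sin(pi*h))), and the brute-force enumeration standing in for SymMaP's neural-network-guided search over a much larger expression space are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
h = rng.uniform(0.01, 0.1, size=100)            # feature: mesh size
omega_opt = 2 / (1 + np.sin(np.pi * h))         # "optimal" SOR-style parameter

# A tiny discrete space of candidate symbolic expressions.
candidates = {
    "2/(1+pi*h)":      lambda h: 2 / (1 + np.pi * h),
    "2/(1+sin(pi*h))": lambda h: 2 / (1 + np.sin(np.pi * h)),
    "1+h":             lambda h: 1 + h,
    "2-2*h":           lambda h: 2 - 2 * h,
}

def score(expr):
    # How well does the expression predict the optimal parameter?
    return np.mean((expr(h) - omega_opt) ** 2)

best = min(candidates, key=lambda name: score(candidates[name]))
print(best, score(candidates[best]))  # the exact formula wins with ~0 error
```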
- Taming Polysemanticity in LLMs: Provable Feature Recovery via Sparse Autoencoders [50.52694757593443]
Existing SAE training algorithms often lack rigorous mathematical guarantees and suffer from practical limitations.
We first propose a novel statistical framework for the feature recovery problem, which includes a new notion of feature identifiability.
We introduce a new SAE training algorithm based on "bias adaptation", a technique that adaptively adjusts neural network bias parameters to ensure appropriate activation sparsity; a hypothetical version of this loop is sketched after this entry.
arXiv Detail & Related papers (2025-06-16T20:58:05Z)
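The phrase "adaptively adjusts neural network bias parameters to ensure appropriate activation sparsity" suggests a simple control loop; the sketch below is one hypothetical realization, with the update rule and step size invented for illustration rather than taken from the paper.

```python
import torch
import torch.nn as nn

# Toy bias-adaptation loop: nudge the encoder bias so that the average
# fraction of active latents matches a target sparsity level.

d, m, target_rate = 64, 512, 0.02  # target: ~2% of latents active
enc = nn.Linear(d, m)

for _ in range(100):
    x = torch.randn(32, d)
    z = torch.relu(enc(x))
    act_rate = (z > 0).float().mean(dim=0)   # per-latent activation rate
    with torch.no_grad():
        # Latents firing too often get their bias pushed down; rarely
        # firing latents get it pushed up, steering toward target_rate.
        enc.bias += 0.1 * (target_rate - act_rate)
```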
- Evaluating Sparse Autoencoders: From Shallow Design to Matching Pursuit [23.806945495163774]
Sparse autoencoders (SAEs) have recently become central tools for interpretability.
This paper evaluates SAEs in a controlled setting using MNIST.
We compare them with an iterative SAE that unrolls Matching Pursuit (MP-SAE); the greedy inner loop is sketched after this entry.
arXiv Detail & Related papers (2025-06-05T16:57:58Z)
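The greedy loop that MP-SAE unrolls is classical matching pursuit: repeatedly pick the dictionary atom most correlated with the residual and explain it away. The sketch below shows that inner loop; the random dictionary, step count, and absence of learned parameters are simplifications.

```python
import torch
import torch.nn.functional as F

def matching_pursuit(x, D, n_steps=8):
    """x: (d,) input; D: (d, m) dictionary with unit-norm columns."""
    residual = x.clone()
    z = torch.zeros(D.shape[1])
    for _ in range(n_steps):
        scores = D.T @ residual                     # correlation with each atom
        j = scores.abs().argmax()                   # greedy atom selection
        z[j] += scores[j]                           # accumulate coefficient
        residual = residual - scores[j] * D[:, j]   # explain away that atom
    return z, residual

d, m = 64, 256
D = F.normalize(torch.randn(d, m), dim=0)
x = torch.randn(d)
z, r = matching_pursuit(x, D)
print(z.nonzero().numel(), r.norm())  # at most n_steps active atoms
```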
- Interpreting CLIP with Hierarchical Sparse Autoencoders [8.692675181549117]
Matryoshka SAE (MSAE) learns hierarchical representations at multiple granularities simultaneously; the nested objective is sketched after this entry.
MSAE establishes a new state-of-the-art frontier between reconstruction quality and sparsity for CLIP.
arXiv Detail & Related papers (2025-02-27T22:39:13Z)
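The Matryoshka objective can be contrasted with the OSAE sketch above: instead of penalizing every prefix, reconstruct from a small fixed set of nested sub-dictionaries. The granularity sizes and plain ReLU architecture below are illustrative assumptions, not MSAE's exact configuration.

```python
import torch
import torch.nn as nn

d, m = 64, 1024
granularities = [64, 256, 1024]          # nested dictionary sizes
enc, dec = nn.Linear(d, m), nn.Linear(m, d)

x = torch.randn(8, d)
z = torch.relu(enc(x))
loss = torch.tensor(0.0)
for k in granularities:
    mask = torch.zeros(1, m)
    mask[:, :k] = 1.0                    # keep only the first k latents
    # Each granularity must reconstruct x on its own, so coarse levels
    # capture coarse structure and finer levels refine it.
    loss = loss + ((dec(z * mask) - x) ** 2).mean()
loss.backward()
```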
- On the Expressiveness and Length Generalization of Selective State-Space Models on Regular Languages [56.22289522687125]
Selective state-space models (SSMs) are an emerging alternative to the Transformer.
We analyze their expressiveness and length generalization performance on regular language tasks.
We introduce the Selective Dense State-Space Model (SD-SSM), the first selective SSM that exhibits perfect length generalization; a minimal selective recurrence is sketched after this entry.
arXiv Detail & Related papers (2024-12-26T20:53:04Z)
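One way to realize a selective SSM with dense transitions, per our reading of the abstract: each token softmax-selects a mixture from a small dictionary of dense transition matrices, which then updates the hidden state. The dimensions, the missing readout and normalization, and the exact selection rule below are assumptions for illustration.

```python
import torch
import torch.nn as nn

d_in, d_h, n_mats = 8, 16, 4
A = nn.Parameter(torch.randn(n_mats, d_h, d_h) / d_h ** 0.5)  # dense dictionary
select = nn.Linear(d_in, n_mats)     # per-token selection logits
proj = nn.Linear(d_in, d_h)          # input projection

x = torch.randn(1, 20, d_in)         # (batch, time, features)
h = torch.zeros(1, d_h)
for t in range(x.shape[1]):
    w = torch.softmax(select(x[:, t]), dim=-1)   # (1, n_mats) selection
    A_t = torch.einsum("bk,kij->bij", w, A)      # input-dependent dense A
    h = torch.einsum("bij,bj->bi", A_t, h) + proj(x[:, t])
print(h.shape)  # final state, e.g. fed to a linear readout for classification
```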
- SA-DVAE: Improving Zero-Shot Skeleton-Based Action Recognition by Disentangled Variational Autoencoders [7.618223798662929]
We propose SA-DVAE (Semantic Alignment via Disentangled Variational Autoencoders).
We implement this idea via a pair of modality-specific variational autoencoders coupled with a total correlation penalty; a two-branch sketch follows this entry.
Experiments show that SA-DVAE produces improved performance over existing methods.
arXiv Detail & Related papers (2024-07-18T12:35:46Z)
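A two-branch sketch of the disentanglement idea: each modality's encoder splits its latent into a semantic part and a modality-specific part, and only the semantic halves are aligned. The dimensions are illustrative, and the VAE machinery (reparameterization, KL term, and the total correlation penalty itself) is elided to keep the sketch short.

```python
import torch
import torch.nn as nn

d_skel, d_text, d_sem, d_irr = 75, 300, 32, 32
enc_skel = nn.Linear(d_skel, d_sem + d_irr)   # skeleton-modality encoder
enc_text = nn.Linear(d_text, d_sem + d_irr)   # text-modality encoder

skel = torch.randn(16, d_skel)   # skeleton-sequence features
text = torch.randn(16, d_text)   # class-description embeddings
zs = enc_skel(skel)
zt = enc_text(text)
sem_s, _irr_s = zs[:, :d_sem], zs[:, d_sem:]
sem_t, _irr_t = zt[:, :d_sem], zt[:, d_sem:]
# Alignment acts only on the semantic halves, so modality-specific noise
# can be absorbed by the irrelevant halves instead of corrupting alignment.
align_loss = ((sem_s - sem_t) ** 2).mean()
align_loss.backward()
```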
- Learning Layer-wise Equivariances Automatically using Gradients [66.81218780702125]
Convolutions encode equivariance symmetries into neural networks, leading to better generalisation performance.
However, such symmetries impose fixed hard constraints on the functions a network can represent: they must be specified in advance and cannot be adapted.
Our goal is to allow flexible symmetry constraints that can be learned automatically from data using gradients.
arXiv Detail & Related papers (2023-10-09T20:22:43Z)
- Mutual Exclusivity Training and Primitive Augmentation to Induce Compositionality [84.94877848357896]
Recent datasets expose the lack of systematic generalization ability in standard sequence-to-sequence models.
We analyze this behavior of seq2seq models and identify two contributing factors: a lack of mutual exclusivity bias and the tendency to memorize whole examples.
We show substantial empirical improvements using standard sequence-to-sequence models on two widely-used compositionality datasets.
arXiv Detail & Related papers (2022-11-28T17:36:41Z)
- Tree ensemble kernels for Bayesian optimization with known constraints over mixed-feature spaces [54.58348769621782]
Tree ensembles can be well-suited for black-box optimization tasks such as algorithm tuning and neural architecture search.
Two well-known challenges in using tree ensembles for black-box optimization are (i) effectively quantifying model uncertainty for exploration and (ii) optimizing over the piecewise-constant acquisition function; a common kernel construction is sketched after this entry.
Our framework performs as well as state-of-the-art methods for unconstrained black-box optimization over continuous/discrete features and outperforms competing methods for problems combining mixed-variable feature spaces and known input constraints.
arXiv Detail & Related papers (2022-07-02T16:59:37Z)
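One standard construction of a kernel, and hence of model uncertainty, from a tree ensemble is co-leaf agreement: k(x, x') is the fraction of trees in which x and x' land in the same leaf. Whether this matches the paper's exact kernel is an assumption; the sketch uses scikit-learn's `RandomForestRegressor.apply` to extract leaf indices.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
X = rng.uniform(-2, 2, size=(200, 3))
y = np.sin(X[:, 0]) + X[:, 1] ** 2 + rng.normal(0, 0.1, size=200)

forest = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)

def tree_kernel(A, B):
    la = forest.apply(A)   # (n_a, n_trees) leaf index of each point per tree
    lb = forest.apply(B)   # (n_b, n_trees)
    # Co-leaf agreement averaged over trees yields a PSD kernel matrix.
    return (la[:, None, :] == lb[None, :, :]).mean(axis=2)

K = tree_kernel(X[:5], X[:5])
print(np.round(K, 2))  # diagonal is 1.0; off-diagonal measures similarity
```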
- Compositional ADAM: An Adaptive Compositional Solver [69.31447856853833]
C-ADAM is the first adaptive solver for compositional problems involving a non-linear functional nesting of expected values; the problem form is written out after this entry.
We prove that C-ADAM converges to a stationary point in $\mathcal{O}(\delta^{-2.25})$ with $\delta$ being a precision parameter.
arXiv Detail & Related papers (2020-02-10T14:00:45Z)
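For reference, the compositional problem class referred to above is usually written as a nested expectation; this is the standard form from the compositional optimization literature, and the paper's exact assumptions may differ:

$$\min_{x \in \mathbb{R}^d} \; F(x) = \mathbb{E}_{\nu}\!\left[ f_{\nu}\big( \mathbb{E}_{\omega}\left[ g_{\omega}(x) \right] \big) \right]$$

The difficulty is that the nonlinear outer function $f_{\nu}$ wraps an inner expectation, so naive stochastic gradients of $F$ are biased; the $\mathcal{O}(\delta^{-2.25})$ rate above bounds the work needed to reach a $\delta$-accurate stationary point.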
This list is automatically generated from the titles and abstracts of the papers on this site.