Adaptive Sparse Allocation with Mutual Choice & Feature Choice Sparse Autoencoders
- URL: http://arxiv.org/abs/2411.02124v2
- Date: Thu, 07 Nov 2024 21:36:54 GMT
- Title: Adaptive Sparse Allocation with Mutual Choice & Feature Choice Sparse Autoencoders
- Authors: Kola Ayonrinde
- Abstract summary: Sparse autoencoders (SAEs) are a promising approach to extracting features from neural networks.
We propose two novel SAE variants, Feature Choice SAEs and Mutual Choice SAEs.
Our methods result in SAEs with fewer dead features and improved reconstruction loss at equivalent sparsity levels.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Sparse autoencoders (SAEs) are a promising approach to extracting features from neural networks, enabling model interpretability as well as causal interventions on model internals. SAEs generate sparse feature representations using a sparsifying activation function that implicitly defines a set of token-feature matches. We frame the token-feature matching as a resource allocation problem constrained by a total sparsity upper bound. For example, TopK SAEs solve this allocation problem with the additional constraint that each token matches with at most $k$ features. In TopK SAEs, the $k$ active features per token constraint is the same across tokens, despite some tokens being more difficult to reconstruct than others. To address this limitation, we propose two novel SAE variants, Feature Choice SAEs and Mutual Choice SAEs, which each allow for a variable number of active features per token. Feature Choice SAEs solve the sparsity allocation problem under the additional constraint that each feature matches with at most $m$ tokens. Mutual Choice SAEs solve the unrestricted allocation problem where the total sparsity budget can be allocated freely between tokens and features. Additionally, we introduce a new auxiliary loss function, $\mathtt{aux\_zipf\_loss}$, which generalises the $\mathtt{aux\_k\_loss}$ to mitigate dead and underutilised features. Our methods result in SAEs with fewer dead features and improved reconstruction loss at equivalent sparsity levels as a result of the inherent adaptive computation. More accurate and scalable feature extraction methods provide a path towards better understanding and more precise control of foundation models.
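The three allocation schemes described in the abstract can be sketched as selection rules over a matrix of SAE pre-activations (tokens × features). The following is a speculative NumPy sketch, not the authors' implementation; the function names, shapes, and budget parameterisation are assumptions for illustration:

```python
import numpy as np

def topk_select(acts, k):
    """TopK SAE: every token (row) keeps exactly its k largest
    activations, regardless of how hard the token is to reconstruct."""
    mask = np.zeros_like(acts, dtype=bool)
    top = np.argpartition(-acts, k - 1, axis=1)[:, :k]
    np.put_along_axis(mask, top, True, axis=1)
    return np.where(mask, acts, 0.0)

def feature_choice_select(acts, m):
    """Feature Choice SAE: every feature (column) keeps its m largest
    activations across tokens, so individual tokens may end up with a
    variable number of active features."""
    mask = np.zeros_like(acts, dtype=bool)
    top = np.argpartition(-acts, m - 1, axis=0)[:m, :]
    np.put_along_axis(mask, top, True, axis=0)
    return np.where(mask, acts, 0.0)

def mutual_choice_select(acts, total_budget):
    """Mutual Choice SAE: the total sparsity budget is spent on the
    largest activations anywhere in the batch, with no per-token or
    per-feature cap."""
    flat = acts.ravel()
    top = np.argpartition(-flat, total_budget - 1)[:total_budget]
    out = np.zeros_like(flat)
    out[top] = flat[top]
    return out.reshape(acts.shape)
```

Under a matched total budget (batch size × $k$ for TopK versus `total_budget` for Mutual Choice), the latter two schemes can shift capacity toward tokens that are harder to reconstruct, which is the adaptive-computation property the abstract attributes to the proposed variants.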
Related papers
- Multiple-play Stochastic Bandits with Prioritized Arm Capacity Sharing [52.124267908936396]
The model is composed of $M$ arms and $K$ plays. Each arm has a number of capacities, and each unit of capacity is associated with a reward function. When multiple plays compete for an arm's capacity, the capacity is allocated to plays with larger priority weights first.
arXiv Detail & Related papers (2025-12-25T11:19:09Z) - AbsTopK: Rethinking Sparse Autoencoders For Bidirectional Features [19.58274892471746]
Sparse autoencoders (SAEs) have emerged as powerful techniques for interpretability of large language models. We introduce such a framework by unrolling the proximal gradient method for sparse coding. We show that a single-step update naturally recovers common SAE variants, including ReLU, JumpReLU, and TopK.
arXiv Detail & Related papers (2025-10-01T01:29:31Z) - Beyond Softmax: A Natural Parameterization for Categorical Random Variables [61.709831225296305]
We introduce the $\textit{catnat}$ function, a function composed of a sequence of hierarchical binary splits. A rich set of experiments shows that the proposed function improves learning efficiency and yields models with consistently higher test performance.
arXiv Detail & Related papers (2025-09-29T12:55:50Z) - Distribution-Aware Feature Selection for SAEs [1.2396474483677118]
TopK SAE reconstructs each token from its $K$ most active latents. BatchTopK addresses this limitation by selecting top activations across a batch of tokens. This improves average reconstruction but risks an "activation lottery".
arXiv Detail & Related papers (2025-08-29T04:42:17Z) - Foundations of Top-$k$ Decoding For Language Models [19.73575905188064]
We develop a theoretical framework that both explains and generalizes top-$k$ decoding. We show how to optimize it efficiently for a large class of divergences.
arXiv Detail & Related papers (2025-05-25T23:46:34Z) - SAND: One-Shot Feature Selection with Additive Noise Distortion [3.5976830118932583]
We introduce a novel, non-intrusive feature selection layer that automatically identifies and selects the $k$ most informative features during neural network training. Our method is uniquely simple, requiring no alterations to the loss function, network architecture, or post-selection retraining. Our work demonstrates that simplicity and performance are not mutually exclusive, offering a powerful yet straightforward tool for feature selection in machine learning.
arXiv Detail & Related papers (2025-05-06T18:59:35Z) - Evaluating and Designing Sparse Autoencoders by Approximating Quasi-Orthogonality [3.9230690073443166]
We introduce a novel activation function, top-AFA, which builds upon our formulation of approximate feature activation (AFA). By training SAEs on three intermediate layers to reconstruct GPT2 hidden embeddings for over 80 million tokens from the OpenWebText dataset, we demonstrate the empirical merits of this approach.
arXiv Detail & Related papers (2025-03-31T16:22:11Z) - Neural Discrete Token Representation Learning for Extreme Token Reduction in Video Large Language Models [50.214593234229255]
We introduce the novel task of Extreme Short Token Reduction, which aims to represent entire videos using a minimal set of discrete tokens. On the Extreme Short Token Reduction task, our VQToken compresses sequences to just 0.07 percent of their original length while incurring only a 0.66 percent drop in accuracy on the NextQA-MC benchmark.
arXiv Detail & Related papers (2025-03-21T09:46:31Z) - When Less is Enough: Adaptive Token Reduction for Efficient Image Representation [2.2120851074630177]
We introduce a new method for determining feature utility based on the idea that less valuable features can be reconstructed from more valuable ones.
We implement this concept by integrating an autoencoder with a Gumbel-Softmax selection mechanism.
Our results highlight a promising direction towards adaptive and efficient multimodal pruning.
arXiv Detail & Related papers (2025-03-20T19:17:08Z) - Arbitrary-Threshold Fully Homomorphic Encryption with Lower Complexity [8.228450733641122]
We develop a new primitive called approximate secret sharing (ApproxSS).
We prove the correctness and security of AThFHE on top of arbitrary-threshold (ATh)-ApproxSS's properties.
ATASSES achieves a speedup of $3.83\times$ to $15.4\times$ over baselines.
arXiv Detail & Related papers (2025-01-20T02:46:08Z) - Training a neural netwok for data reduction and better generalization [7.545668088790516]
The motivation for sparse learners is to compress the inputs (features) by selecting only the ones needed for good generalization.
We show a remarkable phase transition from ignoring irrelevant features to retrieving them well, thanks to the choice of artificial features.
This approach can be seen as a form of compressed sensing, distilling high-dimensional data into a compact, interpretable subset of meaningful features.
arXiv Detail & Related papers (2024-11-26T07:41:15Z) - Features that Make a Difference: Leveraging Gradients for Improved Dictionary Learning [4.051777802443125]
Sparse Autoencoders (SAEs) are a promising approach for extracting neural network representations.
We introduce Gradient SAEs, which modify the $k$-sparse autoencoder architecture by augmenting the TopK activation function.
We find evidence that g-SAEs learn latents that are on average more effective at steering models in arbitrary contexts.
arXiv Detail & Related papers (2024-11-15T18:03:52Z) - S-CFE: Simple Counterfactual Explanations [21.975560789792073]
We tackle the problem of finding manifold-aligned counterfactual explanations for sparse data.
Our approach effectively produces sparse, manifold-aligned counterfactual explanations.
arXiv Detail & Related papers (2024-10-21T07:42:43Z) - The Balanced-Pairwise-Affinities Feature Transform [2.3020018305241337]
The BPA feature transform is designed to upgrade the features of a set of input items to facilitate downstream matching or grouping tasks.
A particular min-cost-max-flow fractional matching problem leads to a transform which is efficient, differentiable, equivariant, parameterless and probabilistically interpretable.
Empirically, the transform is highly effective and flexible in its use and consistently improves networks it is inserted into, in a variety of tasks and training schemes.
arXiv Detail & Related papers (2024-06-25T14:28:05Z) - Token Fusion: Bridging the Gap between Token Pruning and Token Merging [71.84591084401458]
Vision Transformers (ViTs) have emerged as powerful backbones in computer vision, outperforming many traditional CNNs.
However, the computational overhead, largely attributed to the self-attention mechanism, makes deployment on resource-constrained edge devices challenging.
We introduce "Token Fusion" (ToFu), a method that amalgamates the benefits of both token pruning and token merging.
arXiv Detail & Related papers (2023-12-02T04:29:19Z) - STEERER: Resolving Scale Variations for Counting and Localization via Selective Inheritance Learning [74.2343877907438]
Scale variation is a deep-rooted problem in object counting, which has not been effectively addressed by existing scale-aware algorithms.
We propose a novel method termed STEERER that addresses the issue of scale variations in object counting.
STEERER selects the most suitable scale for patch objects to boost feature extraction and only inherits discriminative features from lower to higher resolution progressively.
arXiv Detail & Related papers (2023-08-21T05:09:07Z) - On the Interplay Between Misspecification and Sub-optimality Gap in Linear Contextual Bandits [76.2262680277608]
We study linear contextual bandits in the misspecified setting, where the expected reward function can be approximated by a linear function class.
We show that our algorithm enjoys the same gap-dependent regret bound $\tilde{O}(d^2/\Delta)$ as in the well-specified setting up to logarithmic factors.
arXiv Detail & Related papers (2023-03-16T15:24:29Z) - Multi-block-Single-probe Variance Reduced Estimator for Coupled Compositional Optimization [49.58290066287418]
We propose a novel method named Multi-block-Single-probe Variance Reduced (MSVR) estimator to alleviate the complexity of compositional problems.
Our results improve upon prior ones in several aspects, including the order of sample complexities and the dependence on the strong convexity parameter.
arXiv Detail & Related papers (2022-07-18T12:03:26Z) - Can contrastive learning avoid shortcut solutions? [88.249082564465]
Implicit feature modification (IFM) is a method for altering positive and negative samples in order to guide contrastive models towards capturing a wider variety of predictive features.
IFM reduces feature suppression, and as a result improves performance on vision and medical imaging tasks.
arXiv Detail & Related papers (2021-06-21T16:22:43Z) - Thresholded Lasso Bandit [70.17389393497125]
Thresholded Lasso bandit is an algorithm that estimates the vector defining the reward function as well as its sparse support.
We establish non-asymptotic regret upper bounds scaling as $\mathcal{O}(\log d + \sqrt{T})$ in general, and as $\mathcal{O}(\log d + \log T)$ under the so-called margin condition.
arXiv Detail & Related papers (2020-10-22T19:14:37Z) - Interpretable feature subset selection: A Shapley value based approach [1.511944009967492]
We introduce the notion of classification game, a cooperative game with features as players and hinge loss based characteristic function.
Our major contribution is ($\star$) to show that for any dataset the threshold 0 on the SVEA value identifies a feature subset whose joint interactions for label prediction are significant.
arXiv Detail & Related papers (2020-01-12T16:27:08Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences.