Related papers: Features that Make a Difference: Leveraging Gradients for Improved Dictionary Learning

URL: http://arxiv.org/abs/2411.10397v1
Date: Fri, 15 Nov 2024 18:03:52 GMT
Title: Features that Make a Difference: Leveraging Gradients for Improved Dictionary Learning
Authors: Jeffrey Olmo, Jared Wilson, Max Forsey, Bryce Hepner, Thomas Vin Howe, David Wingate
Abstract summary: Sparse Autoencoders (SAEs) are a promising approach for extracting neural network representations. We introduce Gradient SAEs, which modify the $k$-sparse autoencoder architecture by augmenting the TopK activation function. We find evidence that g-SAEs learn latents that are on average more effective at steering models in arbitrary contexts.
Score: 4.051777802443125
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Sparse Autoencoders (SAEs) are a promising approach for extracting neural network representations by learning a sparse and overcomplete decomposition of the network's internal activations. However, SAEs are traditionally trained considering only activation values and not the effect those activations have on downstream computations. This limits the information available to learn features, and biases the autoencoder towards neglecting features which are represented with small activation values but strongly influence model outputs. To address this, we introduce Gradient SAEs (g-SAEs), which modify the $k$-sparse autoencoder architecture by augmenting the TopK activation function to rely on the gradients of the input activation when selecting the $k$ elements. For a given sparsity level, g-SAEs produce reconstructions that are more faithful to original network performance when propagated through the network. Additionally, we find evidence that g-SAEs learn latents that are on average more effective at steering models in arbitrary contexts. By considering the downstream effects of activations, our approach leverages the dual nature of neural network features as both $\textit{representations}$, retrospectively, and $\textit{actions}$, prospectively. While previous methods have approached the problem of feature discovery primarily focused on the former aspect, g-SAEs represent a step towards accounting for the latter as well.
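
The abstract describes the selection step only at a high level, so the following is a minimal sketch of the idea rather than the authors' implementation. It contrasts a plain TopK encoder with a gradient-aware variant. The scoring rule `|z * (W_dec.T @ grad_x)|` (a first-order estimate of each latent's effect on a downstream loss), the function names, and the toy weights are all assumptions made for illustration; the paper's exact formula may differ.

```python
import numpy as np

def topk_mask(scores, k):
    """Boolean mask marking the k largest entries of a 1-D score vector."""
    idx = np.argpartition(scores, -k)[-k:]
    mask = np.zeros(scores.shape, dtype=bool)
    mask[idx] = True
    return mask

def plain_topk_encode(x, W_enc, b_enc, k):
    """Standard k-sparse encoding: keep the k largest pre-activations."""
    z = W_enc @ x + b_enc
    return np.where(topk_mask(z, k), z, 0.0)

def gradient_topk_encode(x, grad_x, W_enc, b_enc, W_dec, k):
    """Gradient-aware selection (assumed scoring rule, not the paper's).

    grad_x is the gradient of a downstream loss with respect to the
    input activation x. W_dec.T @ grad_x estimates how much the loss
    changes per unit of each latent, so a latent with a small value
    but a large downstream effect can still be selected.
    """
    z = W_enc @ x + b_enc
    effect = W_dec.T @ grad_x
    scores = np.abs(z * effect)
    return np.where(topk_mask(scores, k), z, 0.0)

# Toy example: 2-dimensional activation, 3 latents, k = 1.
W_enc = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
b_enc = np.zeros(3)
W_dec = np.array([[1.0, 0.0, 1.0], [0.0, 1.0, 0.0]])  # columns are decoder directions
x = np.array([2.0, 0.1])
grad_x = np.array([0.0, 100.0])  # the loss is sensitive only to the second coordinate

z_plain = plain_topk_encode(x, W_enc, b_enc, k=1)
z_grad = gradient_topk_encode(x, grad_x, W_enc, b_enc, W_dec, k=1)
```

In this toy setting the plain encoder keeps the latent with the largest pre-activation (index 2), while the gradient-aware encoder keeps the small latent at index 1, because it is the only one whose decoder direction aligns with the gradient. This is the kind of small-activation, high-influence feature that the abstract says activation-only selection tends to neglect.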

Related papers

Self-Supervised Learning via Flow-Guided Neural Operator on Time-Series Data [57.85958428020496]
Flow-Guided Neural Operator (FGNO) is a novel framework combining operator learning with flow matching for SSL training. FGNO learns mappings in functional spaces by using Short-Time Fourier Transform to unify different time resolutions. Unlike prior generative SSL methods that use noisy inputs during inference, we propose using clean inputs for representation extraction while learning representations with noise.
arXiv Detail & Related papers (2026-02-12T18:54:57Z)
Active Learning Using Aggregated Acquisition Functions: Accuracy and Sustainability Analysis [14.398823059302279]
Active learning (AL) is a machine learning approach that strategically selects the most informative samples for annotation during training. This strategy not only reduces labeling expenses but also results in energy savings during neural network training. We implement and evaluate various state-of-the-art acquisition functions, analyzing their accuracy and computational costs.
arXiv Detail & Related papers (2026-02-07T08:42:12Z)
ActVAR: Activating Mixtures of Weights and Tokens for Efficient Visual Autoregressive Generation [24.639936266140385]
Existing static pruning methods degrade performance by permanently removing weights or tokens. We propose ActVAR, a dynamic activation framework that introduces dual sparsity across model weights and token sequences. Experiments on the ImageNet $256\times 256$ benchmark demonstrate that ActVAR achieves up to $21.2\%$ FLOPs reduction with minimal performance degradation.
arXiv Detail & Related papers (2025-11-17T02:28:06Z)
Beyond Softmax: A Natural Parameterization for Categorical Random Variables [61.709831225296305]
We introduce the $\textit{catnat}$ function, a function composed of a sequence of hierarchical binary splits. A rich set of experiments shows that the proposed function improves learning efficiency and yields models with consistently higher test performance.
arXiv Detail & Related papers (2025-09-29T12:55:50Z)
R-Sparse: Rank-Aware Activation Sparsity for Efficient LLM Inference [77.47238561728459]
R-Sparse is a training-free activation sparsity approach capable of achieving high sparsity levels in advanced LLMs. Experiments on Llama-2/3 and Mistral models across ten diverse tasks demonstrate that R-Sparse achieves comparable performance at 50% model-level sparsity.
arXiv Detail & Related papers (2025-04-28T03:30:32Z)
Global Convergence and Rich Feature Learning in $L$-Layer Infinite-Width Neural Networks under $μ$P Parametrization [66.03821840425539]
In this paper, we investigate the training dynamics of $L$-layer neural networks using the tensor program (TP) framework. We show that stochastic gradient descent (SGD) enables these networks to learn linearly independent features that substantially deviate from their initial values. This rich feature space captures relevant data information and ensures that any convergent point of the training process is a global minimum.
arXiv Detail & Related papers (2025-03-12T17:33:13Z)
Learnable polynomial, trigonometric, and tropical activations [1.534667887016089]
This paper investigates scalable neural networks with learnable activation functions based on function bases and tropicals. We propose a scheme that preserves unitary variance in transformers and convolutional networks, ensuring stable gradient flow even in deep architectures. Experiments demonstrate that networks with Hermite, Fourier, and Tropical-based learnable activations significantly improve over GPT-2 and ConvNeXt networks in terms of accuracy and perplexity in train and test.
arXiv Detail & Related papers (2025-02-03T11:13:58Z)
Sparsing Law: Towards Large Language Models with Greater Activation Sparsity [62.09617609556697]
Activation sparsity denotes the existence of substantial weakly-contributed elements within activation outputs that can be eliminated. We propose PPL-$p\%$ sparsity, a precise and performance-aware activation sparsity metric. We show that ReLU is more efficient as the activation function than SiLU and can leverage more training data to improve activation sparsity.
arXiv Detail & Related papers (2024-11-04T17:59:04Z)
How Neural Networks Learn the Support is an Implicit Regularization Effect of SGD [2.05602972069314]
We investigate the ability of deep neural networks to identify the support of the target function. Mini-batch SGD effectively learns the support in the first layer of the network by shrinking to zero the weights associated with irrelevant components of input.
arXiv Detail & Related papers (2024-06-17T00:19:16Z)
Manipulating Feature Visualizations with Gradient Slingshots [53.94925202421929]
Feature Visualization (FV) is a widely used technique for interpreting the concepts learned by Deep Neural Networks (DNNs). We introduce a novel method, Gradient Slingshots, that enables manipulation of FV without modifying the model architecture or significantly degrading its performance.
arXiv Detail & Related papers (2024-01-11T18:57:17Z)
Globally Optimal Training of Neural Networks with Threshold Activation Functions [63.03759813952481]
We study weight decay regularized training problems of deep neural networks with threshold activations. We derive a simplified convex optimization formulation when the dataset can be shattered at a certain layer of the network.
arXiv Detail & Related papers (2023-03-06T18:59:13Z)
Data-aware customization of activation functions reduces neural network error [0.35172332086962865]
We show that data-aware customization of activation functions can result in striking reductions in neural network error. A simple substitution with the "seagull" activation function in an already-refined neural network can lead to an order-of-magnitude reduction in error.
arXiv Detail & Related papers (2023-01-16T23:38:37Z)
Adaptive Recursive Circle Framework for Fine-grained Action Recognition [95.51097674917851]
How to model fine-grained spatial-temporal dynamics in videos has been a challenging problem for action recognition. Most existing methods generate features of a layer in a pure feedforward manner. We propose an Adaptive Recursive Circle framework, a fine-grained decorator for pure feedforward layers.
arXiv Detail & Related papers (2021-07-25T14:24:29Z)
SCARF: Self-Supervised Contrastive Learning using Random Feature Corruption [72.35532598131176]
We propose SCARF, a technique for contrastive learning, where views are formed by corrupting a random subset of features. We show that SCARF complements existing strategies and outperforms alternatives like autoencoders.
arXiv Detail & Related papers (2021-06-29T08:08:33Z)
A Use of Even Activation Functions in Neural Networks [0.35172332086962865]
We propose an alternative approach to integrate existing knowledge or hypotheses of data structure by constructing custom activation functions. We show that using an even activation function in one of the fully connected layers improves neural network performance.
arXiv Detail & Related papers (2020-11-23T20:33:13Z)
Lightweight Single-Image Super-Resolution Network with Attentive Auxiliary Feature Learning [73.75457731689858]
We develop a computationally efficient yet accurate network based on the proposed attentive auxiliary features (A$^2$F) for SISR. Experimental results on large-scale datasets demonstrate the effectiveness of the proposed model against state-of-the-art (SOTA) SR methods.
arXiv Detail & Related papers (2020-11-13T06:01:46Z)
Self-Challenging Improves Cross-Domain Generalization [81.99554996975372]
Convolutional Neural Networks (CNN) conduct image classification by activating dominant features that correlate with labels. We introduce a simple training heuristic, Representation Self-Challenging (RSC), that significantly improves the generalization of CNN to out-of-domain data. RSC iteratively challenges the dominant features activated on the training data, and forces the network to activate the remaining features that correlate with labels.
arXiv Detail & Related papers (2020-07-05T21:42:26Z)
Soft-Root-Sign Activation Function [21.716884634290516]
"Soft-Root-Sign" (SRS) is smooth, non-monotonic, and bounded. In contrast to ReLU, SRS can adaptively adjust the output by a pair of independent trainable parameters. Our SRS matches or exceeds models with ReLU and other state-of-the-art nonlinearities.
arXiv Detail & Related papers (2020-03-01T18:38:11Z)
Investigating the interaction between gradient-only line searches and different activation functions [0.0]
Gradient-only line searches (GOLS) adaptively determine step sizes along search directions for discontinuous loss functions in neural network training. We find that GOLS are robust for a range of activation functions, but sensitive to the Rectified Linear Unit (ReLU) activation function in standard feedforward architectures.
arXiv Detail & Related papers (2020-02-23T12:28:27Z)

This list is automatically generated from the titles and abstracts of the papers on this site.