Features that Make a Difference: Leveraging Gradients for Improved Dictionary Learning
- URL: http://arxiv.org/abs/2411.10397v1
- Date: Fri, 15 Nov 2024 18:03:52 GMT
- Title: Features that Make a Difference: Leveraging Gradients for Improved Dictionary Learning
- Authors: Jeffrey Olmo, Jared Wilson, Max Forsey, Bryce Hepner, Thomas Vin Howe, David Wingate,
- Abstract summary: Sparse Autoencoders (SAEs) are a promising approach for extracting neural network representations.
We introduce Gradient SAEs, which modify the $k$-sparse autoencoder architecture by augmenting the TopK activation function.
We find evidence that g-SAEs learn latents that are on average more effective at steering models in arbitrary contexts.
- Score: 4.051777802443125
- License:
- Abstract: Sparse Autoencoders (SAEs) are a promising approach for extracting neural network representations by learning a sparse and overcomplete decomposition of the network's internal activations. However, SAEs are traditionally trained considering only activation values and not the effect those activations have on downstream computations. This limits the information available to learn features, and biases the autoencoder towards neglecting features which are represented with small activation values but strongly influence model outputs. To address this, we introduce Gradient SAEs (g-SAEs), which modify the $k$-sparse autoencoder architecture by augmenting the TopK activation function to rely on the gradients of the input activation when selecting the $k$ elements. For a given sparsity level, g-SAEs produce reconstructions that are more faithful to original network performance when propagated through the network. Additionally, we find evidence that g-SAEs learn latents that are on average more effective at steering models in arbitrary contexts. By considering the downstream effects of activations, our approach leverages the dual nature of neural network features as both $\textit{representations}$, retrospectively, and $\textit{actions}$, prospectively. While previous methods have approached the problem of feature discovery primarily focused on the former aspect, g-SAEs represent a step towards accounting for the latter as well.
Related papers
- Sparsing Law: Towards Large Language Models with Greater Activation Sparsity [62.09617609556697]
Activation sparsity denotes the existence of substantial weakly-contributed elements within activation outputs that can be eliminated.
We propose PPL-$p%$ sparsity, a precise and performance-aware activation sparsity metric.
We show that ReLU is more efficient as the activation function than SiLU and can leverage more training data to improve activation sparsity.
arXiv Detail & Related papers (2024-11-04T17:59:04Z) - How Neural Networks Learn the Support is an Implicit Regularization Effect of SGD [2.05602972069314]
We investigate the ability of deep neural networks to identify the support of the target function.
Mini-batch SGD effectively learns the support in the first layer of the network by shrinking to zero the weights associated with irrelevant components of input.
arXiv Detail & Related papers (2024-06-17T00:19:16Z) - Globally Optimal Training of Neural Networks with Threshold Activation
Functions [63.03759813952481]
We study weight decay regularized training problems of deep neural networks with threshold activations.
We derive a simplified convex optimization formulation when the dataset can be shattered at a certain layer of the network.
arXiv Detail & Related papers (2023-03-06T18:59:13Z) - Data-aware customization of activation functions reduces neural network
error [0.35172332086962865]
We show that data-aware customization of activation functions can result in striking reductions in neural network error.
A simple substitution with the seagull'' activation function in an already-refined neural network can lead to an order-of-magnitude reduction in error.
arXiv Detail & Related papers (2023-01-16T23:38:37Z) - Adaptive Recursive Circle Framework for Fine-grained Action Recognition [95.51097674917851]
How to model fine-grained spatial-temporal dynamics in videos has been a challenging problem for action recognition.
Most existing methods generate features of a layer in a pure feedforward manner.
We propose an Adaptive Recursive Circle framework, a fine-grained decorator for pure feedforward layers.
arXiv Detail & Related papers (2021-07-25T14:24:29Z) - SCARF: Self-Supervised Contrastive Learning using Random Feature
Corruption [72.35532598131176]
We propose SCARF, a technique for contrastive learning, where views are formed by corrupting a random subset of features.
We show that SCARF complements existing strategies and outperforms alternatives like autoencoders.
arXiv Detail & Related papers (2021-06-29T08:08:33Z) - A Use of Even Activation Functions in Neural Networks [0.35172332086962865]
We propose an alternative approach to integrate existing knowledge or hypotheses of data structure by constructing custom activation functions.
We show that using an even activation function in one of the fully connected layers improves neural network performance.
arXiv Detail & Related papers (2020-11-23T20:33:13Z) - Lightweight Single-Image Super-Resolution Network with Attentive
Auxiliary Feature Learning [73.75457731689858]
We develop a computation efficient yet accurate network based on the proposed attentive auxiliary features (A$2$F) for SISR.
Experimental results on large-scale dataset demonstrate the effectiveness of the proposed model against the state-of-the-art (SOTA) SR methods.
arXiv Detail & Related papers (2020-11-13T06:01:46Z) - Self-Challenging Improves Cross-Domain Generalization [81.99554996975372]
Convolutional Neural Networks (CNN) conduct image classification by activating dominant features that correlated with labels.
We introduce a simple training, Self-Challenging Representation (RSC), that significantly improves the generalization of CNN to the out-of-domain data.
RSC iteratively challenges the dominant features activated on the training data, and forces the network to activate remaining features that correlates with labels.
arXiv Detail & Related papers (2020-07-05T21:42:26Z) - Soft-Root-Sign Activation Function [21.716884634290516]
"Soft-Root-Sign" (SRS) is smooth, non-monotonic, and bounded.
In contrast to ReLU, SRS can adaptively adjust the output by a pair of independent trainable parameters.
Our SRS matches or exceeds models with ReLU and other state-of-the-art nonlinearities.
arXiv Detail & Related papers (2020-03-01T18:38:11Z) - Investigating the interaction between gradient-only line searches and
different activation functions [0.0]
Gradient-only line searches (GOLS) adaptively determine step sizes along search directions for discontinuous loss functions in neural network training.
We find that GOLS are robust for a range of activation functions, but sensitive to the Rectified Linear Unit (ReLU) activation function in standard feedforward architectures.
arXiv Detail & Related papers (2020-02-23T12:28:27Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.