Related papers: Compute Optimal Inference and Provable Amortisation Gap in Sparse Autoencoders

Compute Optimal Inference and Provable Amortisation Gap in Sparse Autoencoders

URL: http://arxiv.org/abs/2411.13117v1
Date: Wed, 20 Nov 2024 08:21:53 GMT
Title: Compute Optimal Inference and Provable Amortisation Gap in Sparse Autoencoders
Authors: Charles O'Neill, David Klindt,
Abstract summary: We investigate sparse inference and learning in SAEs through the lens of sparse coding. We show that SAEs perform amortised sparse inference with a computationally restricted encoder. We empirically explore conditions where more sophisticated sparse inference methods outperform traditional SAE encoders.
Score: 0.0
License: http://creativecommons.org/licenses/by/4.0/
Abstract: A recent line of work has shown promise in using sparse autoencoders (SAEs) to uncover interpretable features in neural network representations. However, the simple linear-nonlinear encoding mechanism in SAEs limits their ability to perform accurate sparse inference. In this paper, we investigate sparse inference and learning in SAEs through the lens of sparse coding. Specifically, we show that SAEs perform amortised sparse inference with a computationally restricted encoder and, using compressed sensing theory, we prove that this mapping is inherently insufficient for accurate sparse inference, even in solvable cases. Building on this theory, we empirically explore conditions where more sophisticated sparse inference methods outperform traditional SAE encoders. Our key contribution is the decoupling of the encoding and decoding processes, which allows for a comparison of various sparse encoding strategies. We evaluate these strategies on two dimensions: alignment with true underlying sparse features and correct inference of sparse codes, while also accounting for computational costs during training and inference. Our results reveal that substantial performance gains can be achieved with minimal increases in compute cost. We demonstrate that this generalises to SAEs applied to large language models (LLMs), where advanced encoders achieve similar interpretability. This work opens new avenues for understanding neural network representations and offers important implications for improving the tools we use to analyse the activations of large language models.

Related papers

Scaling LLM Speculative Decoding: Non-Autoregressive Forecasting in Large-Batch Scenarios [76.85739138203014]
We present SpecFormer, a novel architecture that accelerates unidirectional and attention mechanisms.<n>We demonstrate that SpecFormer achieves lower training demands and reduced computational costs.
arXiv Detail & Related papers (2025-11-25T14:20:08Z)
Hyperdimensional Probe: Decoding LLM Representations via Vector Symbolic Architectures [12.466522376751811]
Hyperdimensional Probe is a novel paradigm for decoding information from the Large Language Models vector space.<n>It combines ideas from symbolic representations and neural probing to project the model's residual stream into interpretable concepts.<n>Our work advances information decoding in LLM vector space, enabling extracting more informative, interpretable, and structured features from neural representations.
arXiv Detail & Related papers (2025-09-29T16:59:07Z)
Train Sparse Autoencoders Efficiently by Utilizing Features Correlation [3.588453140011797]
We propose KronSAE, a novel architecture that factorizes the latent representation via Kronecker product decomposition.<n>We also introduce mAND, a differentiable activation function approximating the binary AND operation, which improves interpretability and performance.
arXiv Detail & Related papers (2025-05-28T11:41:11Z)
SplInterp: Improving our Understanding and Training of Sparse Autoencoders [10.800240155402417]
Sparse autoencoders (SAEs) have received considerable recent attention as tools for mechanistic interpretability.<n>There have been recent doubts about the true utility of SAEs.<n>We develop a novel proximal alternating method SGD (PAM-SGD) algorithm for training SAEs.
arXiv Detail & Related papers (2025-05-17T04:51:26Z)
A Theoretical Perspective for Speculative Decoding Algorithm [60.79447486066416]
One effective way to accelerate inference is emphSpeculative Decoding, which employs a small model to sample a sequence of draft tokens and a large model to validate. This paper tackles this gap by conceptualizing the decoding problem via markov chain abstraction and studying the key properties, emphoutput quality and inference acceleration, from a theoretical perspective.
arXiv Detail & Related papers (2024-10-30T01:53:04Z)
Interpretability as Compression: Reconsidering SAE Explanations of Neural Activations with MDL-SAEs [0.0]
We present an information-theoretic framework for interpreting SAEs as lossy compression algorithms. We argue that using MDL rather than sparsity may avoid potential pitfalls with naively maximising sparsity.
arXiv Detail & Related papers (2024-10-15T01:38:03Z)
Sample what you cant compress [6.24979299238534]
We show how to learn a continuous encoder and decoder under a diffusion-based loss. This approach yields better reconstruction quality as compared to GAN-based autoencoders. We also show that the resulting representation is easier to model with a latent diffusion model as compared to the representation obtained from a state-of-the-art GAN-based loss.
arXiv Detail & Related papers (2024-09-04T08:42:42Z)
Disentangling Dense Embeddings with Sparse Autoencoders [0.0]
Sparse autoencoders (SAEs) have shown promise in extracting interpretable features from complex neural networks. We present one of the first applications of SAEs to dense text embeddings from large language models. We show that the resulting sparse representations maintain semantic fidelity while offering interpretability.
arXiv Detail & Related papers (2024-08-01T15:46:22Z)
Speculative Contrastive Decoding [55.378200871224074]
Large language models(LLMs) exhibit exceptional performance in language tasks, yet their auto-regressive inference is limited due to high computational requirements and is sub-optimal due to the exposure bias. Inspired by speculative decoding and contrastive decoding, we introduce Speculative Contrastive Decoding(SCD), a straightforward yet powerful decoding approach.
arXiv Detail & Related papers (2023-11-15T14:15:30Z)
Symmetric Equilibrium Learning of VAEs [56.56929742714685]
We view variational autoencoders (VAEs) as decoder-encoder pairs, which map distributions in the data space to distributions in the latent space and vice versa. We propose a Nash equilibrium learning approach, which is symmetric with respect to the encoder and decoder and allows learning VAEs in situations where both the data and the latent distributions are accessible only by sampling.
arXiv Detail & Related papers (2023-07-19T10:27:34Z)
In-context Autoencoder for Context Compression in a Large Language Model [70.7621953091318]
We propose the In-context Autoencoder (ICAE) to compress a long context into short compact memory slots. ICAE is first pretrained using both autoencoding and language modeling objectives on massive text data.
arXiv Detail & Related papers (2023-07-13T17:59:21Z)
Improving Deep Representation Learning via Auxiliary Learnable Target Coding [69.79343510578877]
This paper introduces a novel learnable target coding as an auxiliary regularization of deep representation learning. Specifically, a margin-based triplet loss and a correlation consistency loss on the proposed target codes are designed to encourage more discriminative representations.
arXiv Detail & Related papers (2023-05-30T01:38:54Z)
Machine Learning-Aided Efficient Decoding of Reed-Muller Subcodes [59.55193427277134]
Reed-Muller (RM) codes achieve the capacity of general binary-input memoryless symmetric channels. RM codes only admit limited sets of rates. Efficient decoders are available for RM codes at finite lengths.
arXiv Detail & Related papers (2023-01-16T04:11:14Z)
Fundamental Limits of Two-layer Autoencoders, and Achieving Them with Gradient Methods [91.54785981649228]
This paper focuses on non-linear two-layer autoencoders trained in the challenging proportional regime. Our results characterize the minimizers of the population risk, and show that such minimizers are achieved by gradient methods. For the special case of a sign activation function, our analysis establishes the fundamental limits for the lossy compression of Gaussian sources via (shallow) autoencoders.
arXiv Detail & Related papers (2022-12-27T12:37:34Z)
Benign Autoencoders [0.0]
We formalize the problem of finding the optimal encoder-decoder pair and characterize its solution, which we name the "benign autoencoder" (BAE) We prove that BAE projects data onto a manifold whose dimension is the optimal compressibility dimension of the generative problem. As an illustration, we show how BAE can find optimal, low-dimensional latent representations that improve the performance of a discriminator under a distribution shift.
arXiv Detail & Related papers (2022-10-02T21:36:27Z)
Variational Sparse Coding with Learned Thresholding [6.737133300781134]
We propose a new approach to variational sparse coding that allows us to learn sparse distributions by thresholding samples. We first evaluate and analyze our method by training a linear generator, showing that it has superior performance, statistical efficiency, and gradient estimation.
arXiv Detail & Related papers (2022-05-07T14:49:50Z)
Adversarial Neural Networks for Error Correcting Codes [76.70040964453638]
We introduce a general framework to boost the performance and applicability of machine learning (ML) models. We propose to combine ML decoders with a competing discriminator network that tries to distinguish between codewords and noisy words. Our framework is game-theoretic, motivated by generative adversarial networks (GANs)
arXiv Detail & Related papers (2021-12-21T19:14:44Z)
Dynamic Neural Representational Decoders for High-Resolution Semantic Segmentation [98.05643473345474]
We propose a novel decoder, termed dynamic neural representational decoder (NRD) As each location on the encoder's output corresponds to a local patch of the semantic labels, in this work, we represent these local patches of labels with compact neural networks. This neural representation enables our decoder to leverage the smoothness prior in the semantic label space, and thus makes our decoder more efficient.
arXiv Detail & Related papers (2021-07-30T04:50:56Z)
Variational Autoencoders: A Harmonic Perspective [79.49579654743341]
We study Variational Autoencoders (VAEs) from the perspective of harmonic analysis. We show that the encoder variance of a VAE controls the frequency content of the functions parameterised by the VAE encoder and decoder neural networks.
arXiv Detail & Related papers (2021-05-31T10:39:25Z)
The Interpretable Dictionary in Sparse Coding [4.205692673448206]
In our work, we illustrate that an ANN, trained using sparse coding under specific sparsity constraints, yields a more interpretable model than the standard deep learning model. The dictionary learned by sparse coding can be more easily understood and the activations of these elements creates a selective feature output.
arXiv Detail & Related papers (2020-11-24T00:26:40Z)
A New Modal Autoencoder for Functionally Independent Feature Extraction [6.690183908967779]
A new modal autoencoder (MAE) is proposed by othogonalising the columns of the readout weight matrix. The results were validated on the MNIST variations and USPS classification benchmark suite. The new MAE introduces a very simple training principle for autoencoders and could be promising for the pre-training of deep neural networks.
arXiv Detail & Related papers (2020-06-25T13:25:10Z)
MetaSDF: Meta-learning Signed Distance Functions [85.81290552559817]
Generalizing across shapes with neural implicit representations amounts to learning priors over the respective function space. We formalize learning of a shape space as a meta-learning problem and leverage gradient-based meta-learning algorithms to solve this task.
arXiv Detail & Related papers (2020-06-17T05:14:53Z)

This list is automatically generated from the titles and abstracts of the papers in this site.