Evolution of SAE Features Across Layers in LLMs
- URL: http://arxiv.org/abs/2410.08869v1
- Date: Fri, 11 Oct 2024 14:46:49 GMT
- Title: Evolution of SAE Features Across Layers in LLMs
- Authors: Daniel Balcells, Benjamin Lerner, Michael Oesterle, Ediz Ucar, Stefan Heimersheim
- Abstract summary: We analyze statistical relationships between features in adjacent layers to understand how features evolve through a forward pass.
We provide a graph visualization interface for features and their most similar next-layer neighbors, and build communities of related features across layers.
- Score: 1.5728609542259502
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Sparse Autoencoders for transformer-based language models are typically defined independently per layer. In this work we analyze statistical relationships between features in adjacent layers to understand how features evolve through a forward pass. We provide a graph visualization interface for features and their most similar next-layer neighbors, and build communities of related features across layers. We find that a considerable number of features are passed through from a previous layer, some features can be expressed as quasi-boolean combinations of previous features, and some features become more specialized in later layers.
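As a rough illustration of the adjacent-layer comparison the abstract describes, the sketch below ranks, for each SAE feature in layer l, its most similar features in layer l+1. The variable names, shapes, and the choice of cosine similarity between decoder directions as the comparison statistic are assumptions made for illustration; the paper analyzes statistical relationships between features and may use a different (e.g. activation-based) statistic.

```python
import torch
import torch.nn.functional as F

# Hypothetical inputs: decoder weight matrices of two independently trained SAEs.
#   W_dec_l  has shape (n_features_l,  d_model)  -- layer l decoder directions
#   W_dec_l1 has shape (n_features_l1, d_model)  -- layer l+1 decoder directions
# Names, shapes, and the use of decoder directions are illustrative assumptions.

def nearest_next_layer_neighbors(W_dec_l: torch.Tensor,
                                 W_dec_l1: torch.Tensor,
                                 top_k: int = 5):
    """For each layer-l feature, return its top-k most similar layer-(l+1) features."""
    a = F.normalize(W_dec_l, dim=-1)            # unit-norm layer-l directions
    b = F.normalize(W_dec_l1, dim=-1)           # unit-norm layer-(l+1) directions
    sims = a @ b.T                              # (n_features_l, n_features_l1) cosine sims
    top_vals, top_idx = sims.topk(top_k, dim=-1)
    return top_vals, top_idx

# Toy usage with random matrices standing in for real SAE decoders.
if __name__ == "__main__":
    torch.manual_seed(0)
    W_l, W_l1 = torch.randn(1024, 512), torch.randn(1024, 512)
    vals, idx = nearest_next_layer_neighbors(W_l, W_l1)
    print(idx[0].tolist(), vals[0].tolist())    # neighbors of layer-l feature 0
```

The resulting top-k pairs can be read as edges of a cross-layer feature graph; thresholding the similarities and running an off-the-shelf community-detection routine over that graph would give cross-layer communities of related features in the spirit of the abstract.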
Related papers
- Mechanistic Permutability: Match Features Across Layers [4.2056926734482065]
We introduce SAE Match, a novel, data-free method for aligning SAE features across different layers of a neural network.
Our work advances the understanding of feature dynamics in neural networks and provides a new tool for mechanistic interpretability studies (a minimal, hedged alignment sketch appears after this list).
arXiv Detail & Related papers (2024-10-10T06:55:38Z) - A Pure Transformer Pretraining Framework on Text-attributed Graphs [50.833130854272774]
We introduce a feature-centric pretraining perspective by treating graph structure as a prior.
Our framework, Graph Sequence Pretraining with Transformer (GSPT), samples node contexts through random walks.
GSPT can be easily adapted to both node classification and link prediction, demonstrating promising empirical success on various datasets.
arXiv Detail & Related papers (2024-06-19T22:30:08Z) - The geometry of hidden representations of large transformer models [43.16765170255552]
Large transformers are powerful architectures used for self-supervised data analysis across various data types.
We show that the semantic structure of the dataset emerges from a sequence of transformations between one representation and the next.
We show that the semantic information of the dataset is better expressed at the end of the first peak of the intrinsic dimension across layers, and this phenomenon can be observed across many models trained on diverse datasets.
arXiv Detail & Related papers (2023-02-01T07:50:26Z) - WLD-Reg: A Data-dependent Within-layer Diversity Regularizer [98.78384185493624]
Neural networks are composed of multiple layers arranged in a hierarchical structure and jointly trained with gradient-based optimization.
We propose to complement this traditional 'between-layer' feedback with additional 'within-layer' feedback to encourage the diversity of the activations within the same layer.
We present an extensive empirical study confirming that the proposed approach enhances the performance of several state-of-the-art neural network models in multiple tasks.
arXiv Detail & Related papers (2023-01-03T20:57:22Z) - Simplifying approach to Node Classification in Graph Neural Networks [7.057970273958933]
We decouple the node feature aggregation step from the depth of the graph neural network, and empirically analyze how different aggregated features contribute to prediction performance.
We show that not all features generated via aggregation steps are useful, and often using these less informative features can be detrimental to the performance of the GNN model.
We present a simple and shallow model, Feature Selection Graph Neural Network (FSGNN), and show empirically that the proposed model achieves comparable or even higher accuracy than state-of-the-art GNN models.
arXiv Detail & Related papers (2021-11-12T14:53:22Z) - EigenGAN: Layer-Wise Eigen-Learning for GANs [84.33920839885619]
EigenGAN mines interpretable and controllable dimensions from different generator layers in an unsupervised manner.
By traversing the coefficient of a specific eigen-dimension, the generator can produce samples with continuous changes corresponding to a specific semantic attribute.
arXiv Detail & Related papers (2021-04-26T11:14:37Z) - Learning to Compose Hypercolumns for Visual Correspondence [57.93635236871264]
We introduce a novel approach to visual correspondence that dynamically composes effective features by leveraging relevant layers conditioned on the images to match.
The proposed method, dubbed Dynamic Hyperpixel Flow, learns to compose hypercolumn features on the fly by selecting a small number of relevant layers from a deep convolutional neural network.
arXiv Detail & Related papers (2020-07-21T04:03:22Z) - Sequential Hierarchical Learning with Distribution Transformation for
Image Super-Resolution [83.70890515772456]
We build a sequential hierarchical learning super-resolution network (SHSR) for effective image SR.
We consider the inter-scale correlations of features, and devise a sequential multi-scale block (SMB) to progressively explore the hierarchical information.
Experiment results show SHSR achieves superior quantitative performance and visual quality to state-of-the-art methods.
arXiv Detail & Related papers (2020-07-19T01:35:53Z) - GSTO: Gated Scale-Transfer Operation for Multi-Scale Feature Learning in
Pixel Labeling [92.90448357454274]
We propose the Gated Scale-Transfer Operation (GSTO) to properly transfer spatially supervised features to another scale.
By plugging GSTO into HRNet, we get a more powerful backbone for pixel labeling.
Experiment results demonstrate that GSTO can also significantly boost the performance of multi-scale feature aggregation modules.
arXiv Detail & Related papers (2020-05-27T13:46:58Z) - Associating Multi-Scale Receptive Fields for Fine-grained Recognition [5.079292308180334]
We propose a novel cross-layer non-local (CNL) module to associate multi-scale receptive fields by two operations.
CNL computes correlations between features of a query layer and all response layers.
Our model builds spatial dependencies among multi-level layers and learns more discriminative features.
arXiv Detail & Related papers (2020-05-19T01:16:31Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.