Evolution of SAE Features Across Layers in LLMs
- URL: http://arxiv.org/abs/2410.08869v2
- Date: Sun, 17 Nov 2024 22:45:45 GMT
- Title: Evolution of SAE Features Across Layers in LLMs
- Authors: Daniel Balcells, Benjamin Lerner, Michael Oesterle, Ediz Ucar, Stefan Heimersheim
- Abstract summary: We analyze statistical relationships between features in adjacent layers to understand how features evolve through a forward pass.
We provide a graph visualization interface for features and their most similar next-layer neighbors, and build communities of related features across layers.
- Score: 1.5728609542259502
- Abstract: Sparse Autoencoders for transformer-based language models are typically defined independently per layer. In this work we analyze statistical relationships between features in adjacent layers to understand how features evolve through a forward pass. We provide a graph visualization interface for features and their most similar next-layer neighbors (https://stefanhex.com/spar-2024/feature-browser/), and build communities of related features across layers. We find that a considerable number of features are passed through from a previous layer, some features can be expressed as quasi-boolean combinations of previous features, and some features become more specialized in later layers.
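A minimal sketch of the kind of adjacent-layer comparison the abstract describes: for each feature in a layer's SAE, find its most similar next-layer neighbors by cosine similarity between decoder directions. The array shapes and the use of decoder rows as feature vectors are illustrative assumptions, not the paper's exact procedure.

```python
import numpy as np

def nearest_next_layer_features(W_dec_l, W_dec_l1, top_k=3):
    """For each feature (decoder row) of the layer-l SAE, find the most
    similar features of the layer-(l+1) SAE by cosine similarity.

    W_dec_l:  (n_features_l,  d_model) decoder matrix of the layer-l SAE
    W_dec_l1: (n_features_l1, d_model) decoder matrix of the layer-(l+1) SAE
    """
    a = W_dec_l / np.linalg.norm(W_dec_l, axis=1, keepdims=True)
    b = W_dec_l1 / np.linalg.norm(W_dec_l1, axis=1, keepdims=True)
    sims = a @ b.T                              # (n_features_l, n_features_l1)
    top = np.argsort(-sims, axis=1)[:, :top_k]  # indices of nearest neighbors
    return top, np.take_along_axis(sims, top, axis=1)

# Toy example with random "decoder matrices"; real ones would come from trained SAEs.
rng = np.random.default_rng(0)
W_l, W_l1 = rng.normal(size=(512, 64)), rng.normal(size=(512, 64))
neighbors, scores = nearest_next_layer_features(W_l, W_l1)
print(neighbors[0], scores[0])  # top next-layer neighbors of feature 0
```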
Related papers
- The Representation and Recall of Interwoven Structured Knowledge in LLMs: A Geometric and Layered Analysis [0.0]
Large language models (LLMs) represent and recall multi-associated attributes across transformer layers.
Intermediate layers encode factual knowledge by superimposing related attributes in overlapping spaces.
Later layers refine linguistic patterns and progressively separate attribute representations.
arXiv Detail & Related papers (2025-02-15T18:08:51Z)
- Layer by Layer: Uncovering Hidden Representations in Language Models [28.304269706993942]
We show that intermediate layers can encode even richer representations, often improving performance on a wide range of downstream tasks.
Our framework highlights how each model layer balances information compression and signal preservation.
These findings challenge the standard focus on final-layer embeddings and open new directions for model analysis and optimization.
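A hedged sketch of how per-layer representations can be pulled out of a Hugging Face model for this kind of layer-wise comparison. The model choice and mean pooling are illustrative, not the paper's setup.

```python
import torch
from transformers import AutoModel, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
tok.pad_token = tok.eos_token  # gpt2 has no pad token by default
model = AutoModel.from_pretrained("gpt2").eval()

texts = ["The quick brown fox jumps over the lazy dog.",
         "Intermediate layers often carry surprisingly rich information."]
batch = tok(texts, return_tensors="pt", padding=True)

with torch.no_grad():
    out = model(**batch, output_hidden_states=True)

# out.hidden_states is a tuple: the embedding layer plus one tensor per block,
# each of shape (batch, seq_len, d_model).
mask = batch["attention_mask"].unsqueeze(-1)
for i, h in enumerate(out.hidden_states):
    pooled = (h * mask).sum(1) / mask.sum(1)   # masked mean pooling
    print(f"layer {i}: pooled shape {tuple(pooled.shape)}")
# These per-layer pooled vectors are what a downstream probe would be trained on.
```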
arXiv Detail & Related papers (2025-02-04T05:03:42Z)
- Optimizing Speech Multi-View Feature Fusion through Conditional Computation [51.23624575321469]
Self-supervised learning (SSL) features provide lightweight and versatile multi-view speech representations.
SSL features conflict with traditional spectral features like FBanks in terms of update directions.
We propose a novel generalized feature fusion framework grounded in conditional computation.
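A sketch of one simple gated fusion of two feature views (e.g., SSL features and FBanks) in PyTorch. The module layout and dimensions are assumptions for illustration, not the proposed framework.

```python
import torch
import torch.nn as nn

class GatedFusion(nn.Module):
    """Illustrative gated fusion of two feature views of the same utterance."""
    def __init__(self, ssl_dim: int, fbank_dim: int, hidden: int):
        super().__init__()
        self.proj_ssl = nn.Linear(ssl_dim, hidden)
        self.proj_fbank = nn.Linear(fbank_dim, hidden)
        self.gate = nn.Linear(2 * hidden, hidden)

    def forward(self, ssl_feats, fbank_feats):
        a, b = self.proj_ssl(ssl_feats), self.proj_fbank(fbank_feats)
        g = torch.sigmoid(self.gate(torch.cat([a, b], dim=-1)))
        return g * a + (1.0 - g) * b   # per-dimension convex combination of views

# Toy shapes: (batch, frames, dim)
fusion = GatedFusion(ssl_dim=768, fbank_dim=80, hidden=256)
out = fusion(torch.randn(2, 100, 768), torch.randn(2, 100, 80))
print(out.shape)  # torch.Size([2, 100, 256])
```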
arXiv Detail & Related papers (2025-01-14T12:12:06Z)
- Multi-field Visualization: Trait design and trait-induced merge trees [2.862576303934634]
Feature level sets (FLS) have shown significant potential in the analysis of multi-field data by using traits defined in attribute space to specify features.
In this work, we address key challenges in the practical use of FLS: trait design and feature selection for rendering.
We propose a decomposition of traits into simpler components, making the process more intuitive and computationally efficient.
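A small numpy sketch of the basic FLS ingredient: a distance field from each sample's attribute vector to a trait, here a single point in attribute space. The trait shape and threshold are placeholders, not the paper's trait design.

```python
import numpy as np

def trait_distance_field(attributes, trait_point):
    """Euclidean distance from each sample's attribute vector to a point trait.
    attributes: (n_samples, n_attributes); trait_point: (n_attributes,)"""
    return np.linalg.norm(attributes - trait_point, axis=1)

# Toy multi-field data: two attributes (e.g., pressure, temperature) per grid point.
rng = np.random.default_rng(1)
attrs = rng.uniform(0.0, 1.0, size=(10_000, 2))
trait = np.array([0.7, 0.2])      # hypothetical trait in attribute space
dist = trait_distance_field(attrs, trait)
feature_mask = dist < 0.1         # an example level of the feature level set
print(feature_mask.sum(), "grid points selected")
```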
arXiv Detail & Related papers (2025-01-08T10:13:32Z)
- Mechanistic Permutability: Match Features Across Layers [4.2056926734482065]
We introduce SAE Match, a novel, data-free method for aligning SAE features across different layers of a neural network.
Our work advances the understanding of feature dynamics in neural networks and provides a new tool for mechanistic interpretability studies.
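A sketch of one data-free way to align two layers' SAE features: one-to-one matching of decoder directions with the Hungarian algorithm. Matching on raw decoder cosine similarity is an illustrative simplification, not necessarily the exact SAE Match procedure.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def match_sae_features(W_dec_a, W_dec_b):
    """One-to-one alignment of two SAEs' features (same feature count)
    by maximizing total cosine similarity between decoder directions."""
    a = W_dec_a / np.linalg.norm(W_dec_a, axis=1, keepdims=True)
    b = W_dec_b / np.linalg.norm(W_dec_b, axis=1, keepdims=True)
    sims = a @ b.T
    row, col = linear_sum_assignment(-sims)   # negate to maximize similarity
    return col, sims[row, col]                # permutation and matched scores

rng = np.random.default_rng(0)
W_a = rng.normal(size=(256, 64))
W_b = W_a[rng.permutation(256)] + 0.01 * rng.normal(size=(256, 64))
perm, scores = match_sae_features(W_a, W_b)
print(scores.mean())  # close to 1.0: the permutation is recovered
```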
arXiv Detail & Related papers (2024-10-10T06:55:38Z)
- A Pure Transformer Pretraining Framework on Text-attributed Graphs [50.833130854272774]
We introduce a feature-centric pretraining perspective by treating graph structure as a prior.
Our framework, Graph Sequence Pretraining with Transformer (GSPT), samples node contexts through random walks.
GSPT can be easily adapted to both node classification and link prediction, demonstrating promising empirical success on various datasets.
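A small sketch of random-walk context sampling on an adjacency list, the sampling step the GSPT summary mentions. The walk length and toy graph are illustrative only.

```python
import random

def random_walk(adj, start, length, rng=random):
    """Sample a node-context sequence by a simple unbiased random walk."""
    walk = [start]
    for _ in range(length - 1):
        neighbors = adj[walk[-1]]
        if not neighbors:        # dead end: stop early
            break
        walk.append(rng.choice(neighbors))
    return walk

# Toy graph as an adjacency list.
adj = {0: [1, 2], 1: [0, 2, 3], 2: [0, 1], 3: [1]}
random.seed(0)
contexts = [random_walk(adj, node, length=5) for node in adj]
print(contexts)  # each walk is a "sentence" of node IDs that a transformer could consume
```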
arXiv Detail & Related papers (2024-06-19T22:30:08Z)
- WLD-Reg: A Data-dependent Within-layer Diversity Regularizer [98.78384185493624]
Neural networks are composed of multiple layers arranged in a hierarchical structure jointly trained with a gradient-based optimization.
We propose to complement this traditional 'between-layer' feedback with additional 'within-layer' feedback to encourage the diversity of the activations within the same layer.
We present an extensive empirical study confirming that the proposed approach enhances the performance of several state-of-the-art neural network models in multiple tasks.
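A hedged PyTorch sketch of one way to penalize redundancy among units of the same layer: the mean squared off-diagonal correlation of their activations over a batch. The published data-dependent regularizer may differ in form.

```python
import torch

def within_layer_diversity_penalty(acts: torch.Tensor) -> torch.Tensor:
    """acts: (batch, n_units) activations of one layer.
    Penalizes high pairwise correlation between units across the batch."""
    z = acts - acts.mean(dim=0, keepdim=True)
    z = z / (z.norm(dim=0, keepdim=True) + 1e-8)
    corr = z.T @ z                                   # (n_units, n_units)
    off_diag = corr - torch.diag(torch.diag(corr))
    return (off_diag ** 2).mean()

# Illustrative use inside a training step: loss = task_loss + lam * penalty
acts = torch.randn(32, 128, requires_grad=True)
penalty = within_layer_diversity_penalty(acts)
penalty.backward()
print(float(penalty))
```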
arXiv Detail & Related papers (2023-01-03T20:57:22Z)
- Improving Semantic Segmentation in Transformers using Hierarchical Inter-Level Attention [68.7861229363712]
Hierarchical Inter-Level Attention (HILA) is an attention-based method that captures Bottom-Up and Top-Down Updates between features of different levels.
HILA extends hierarchical vision transformer architectures by adding local connections between features of higher and lower levels to the backbone encoder.
We show notable improvements in accuracy in semantic segmentation with fewer parameters and FLOPS.
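A compact sketch of cross-attention between a higher-level and a lower-level feature map, the kind of inter-level update HILA describes. Showing only one attention direction and these token counts is a simplification of the published design.

```python
import torch
import torch.nn as nn

class InterLevelAttention(nn.Module):
    """Top-down update: higher-level tokens attend to lower-level tokens."""
    def __init__(self, dim: int, num_heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, high, low):
        # high: (B, N_high, dim) coarse features; low: (B, N_low, dim) fine features
        updated, _ = self.attn(query=high, key=low, value=low)
        return self.norm(high + updated)    # residual update of the higher level

block = InterLevelAttention(dim=256)
high = torch.randn(2, 49, 256)    # e.g. 7x7 coarse tokens
low = torch.randn(2, 196, 256)    # e.g. 14x14 fine tokens
print(block(high, low).shape)     # torch.Size([2, 49, 256])
```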
arXiv Detail & Related papers (2022-07-05T15:47:31Z)
- Sequential Hierarchical Learning with Distribution Transformation for Image Super-Resolution [83.70890515772456]
We build a sequential hierarchical learning super-resolution network (SHSR) for effective image SR.
We consider the inter-scale correlations of features, and devise a sequential multi-scale block (SMB) to progressively explore the hierarchical information.
Experiment results show SHSR achieves superior quantitative performance and visual quality to state-of-the-art methods.
arXiv Detail & Related papers (2020-07-19T01:35:53Z)
- GSTO: Gated Scale-Transfer Operation for Multi-Scale Feature Learning in Pixel Labeling [92.90448357454274]
We propose the Gated Scale-Transfer Operation (GSTO) to properly transit spatial-supervised features to another scale.
By plugging GSTO into HRNet, we get a more powerful backbone for pixel labeling.
Experiment results demonstrate that GSTO can also significantly boost the performance of multi-scale feature aggregation modules.
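A minimal sketch of a gated transfer of features from one spatial scale to another: a learned gate modulates the features before resampling. The kernel size and gating form are assumptions, not the published operation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GatedScaleTransfer(nn.Module):
    """Gate features with a learned spatial mask, then resample to a target scale."""
    def __init__(self, channels: int):
        super().__init__()
        self.gate = nn.Conv2d(channels, 1, kernel_size=1)

    def forward(self, x, scale_factor: float):
        g = torch.sigmoid(self.gate(x))            # (B, 1, H, W) spatial gate
        return F.interpolate(g * x, scale_factor=scale_factor,
                             mode="bilinear", align_corners=False)

op = GatedScaleTransfer(channels=64)
x = torch.randn(2, 64, 32, 32)
print(op(x, scale_factor=2.0).shape)   # torch.Size([2, 64, 64, 64])
```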
arXiv Detail & Related papers (2020-05-27T13:46:58Z)
- Associating Multi-Scale Receptive Fields for Fine-grained Recognition [5.079292308180334]
We propose a novel cross-layer non-local (CNL) module to associate multi-scale receptive fields by two operations.
CNL computes correlations between features of a query layer and all response layers.
Our model builds spatial dependencies among multi-level layers and learns more discriminative features.
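A sketch of a cross-layer non-local step: flatten a query layer's feature map and attend over a response layer's features. The projections and scaling here are illustrative, not the exact CNL module.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CrossLayerNonLocal(nn.Module):
    """Non-local attention from a query layer to features of a response layer."""
    def __init__(self, q_channels: int, r_channels: int, dim: int = 128):
        super().__init__()
        self.q = nn.Conv2d(q_channels, dim, 1)
        self.k = nn.Conv2d(r_channels, dim, 1)
        self.v = nn.Conv2d(r_channels, dim, 1)

    def forward(self, query_feat, response_feat):
        B, _, H, W = query_feat.shape
        q = self.q(query_feat).flatten(2).transpose(1, 2)       # (B, HW, dim)
        k = self.k(response_feat).flatten(2)                    # (B, dim, H'W')
        v = self.v(response_feat).flatten(2).transpose(1, 2)    # (B, H'W', dim)
        attn = F.softmax(q @ k / k.shape[1] ** 0.5, dim=-1)     # cross-layer correlations
        out = attn @ v                                          # (B, HW, dim)
        return out.transpose(1, 2).reshape(B, -1, H, W)

cnl = CrossLayerNonLocal(q_channels=256, r_channels=512)
out = cnl(torch.randn(2, 256, 14, 14), torch.randn(2, 512, 28, 28))
print(out.shape)   # torch.Size([2, 128, 14, 14])
```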
arXiv Detail & Related papers (2020-05-19T01:16:31Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it presents and is not responsible for any consequences of its use.