Towards Universality: Studying Mechanistic Similarity Across Language Model Architectures
- URL: http://arxiv.org/abs/2410.06672v2
- Date: Thu, 10 Oct 2024 16:51:42 GMT
- Title: Towards Universality: Studying Mechanistic Similarity Across Language Model Architectures
- Authors: Junxuan Wang, Xuyang Ge, Wentao Shu, Qiong Tang, Yunhua Zhou, Zhengfu He, Xipeng Qiu,
- Abstract summary: We investigate two mainstream architectures for language modeling, namely Transformers and Mambas, to explore the extent of their mechanistic similarity.
We propose to use Sparse Autoencoders (SAEs) to isolate interpretable features from these models and show that most features are similar in these two models.
- Score: 49.24097977047392
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: The hypothesis of Universality in interpretability suggests that different neural networks may converge to implement similar algorithms on similar tasks. In this work, we investigate two mainstream architectures for language modeling, namely Transformers and Mambas, to explore the extent of their mechanistic similarity. We propose to use Sparse Autoencoders (SAEs) to isolate interpretable features from these models and show that most features are similar in these two models. We also validate the correlation between feature similarity and Universality. We then delve into the circuit-level analysis of Mamba models and find that the induction circuits in Mamba are structurally analogous to those in Transformers. We also identify a nuanced difference we call \emph{Off-by-One motif}: The information of one token is written into the SSM state in its next position. Whilst interaction between tokens in Transformers does not exhibit such trend.
Related papers
- Transformers to SSMs: Distilling Quadratic Knowledge to Subquadratic Models [92.36510016591782]
We present a method that is able to distill a pretrained Transformer architecture into alternative architectures such as state space models (SSMs)
Our method, called MOHAWK, is able to distill a Mamba-2 variant based on the Phi-1.5 architecture using only 3B tokens and a hybrid version (Hybrid Phi-Mamba) using 5B tokens.
Despite using less than 1% of the training data typically used to train models from scratch, Phi-Mamba boasts substantially stronger performance compared to all past open-source non-Transformer models.
arXiv Detail & Related papers (2024-08-19T17:48:11Z) - Explaining Text Similarity in Transformer Models [52.571158418102584]
Recent advances in explainable AI have made it possible to mitigate limitations by leveraging improved explanations for Transformers.
We use BiLRP, an extension developed for computing second-order explanations in bilinear similarity models, to investigate which feature interactions drive similarity in NLP models.
Our findings contribute to a deeper understanding of different semantic similarity tasks and models, highlighting how novel explainable AI methods enable in-depth analyses and corpus-level insights.
arXiv Detail & Related papers (2024-05-10T17:11:31Z) - RankMamba: Benchmarking Mamba's Document Ranking Performance in the Era of Transformers [2.8554857235549753]
Transformer architecture's core mechanism -- attention requires $O(n2)$ time complexity in training and $O(n)$ time complexity in inference.
A notable model structure -- Mamba, which is based on state space models, has achieved transformer-equivalent performance in sequence modeling tasks.
We find that Mamba models achieve competitive performance compared to transformer-based models with the same training recipe.
arXiv Detail & Related papers (2024-03-27T06:07:05Z) - Merging Text Transformer Models from Different Initializations [7.768975909119287]
We investigate the extent to which separate Transformer minima learn similar features.
We propose a model merging technique to investigate the relationship between these minima in the loss landscape.
Our results show that the minima of these models are less sharp and isolated than previously understood.
arXiv Detail & Related papers (2024-03-01T21:16:29Z) - Mamba: Linear-Time Sequence Modeling with Selective State Spaces [31.985243136674146]
Foundation models are almost universally based on the Transformer architecture and its core attention module.
We identify that a key weakness of such models is their inability to perform content-based reasoning.
We integrate these selective SSMs into a simplified end-to-end neural network architecture without attention or even blocks (Mamba)
As a general sequence model backbone, Mamba achieves state-of-the-art performance across several modalities such as language, audio, and genomics.
arXiv Detail & Related papers (2023-12-01T18:01:34Z) - STMT: A Spatial-Temporal Mesh Transformer for MoCap-Based Action Recognition [50.064502884594376]
We study the problem of human action recognition using motion capture (MoCap) sequences.
We propose a novel Spatial-Temporal Mesh Transformer (STMT) to directly model the mesh sequences.
The proposed method achieves state-of-the-art performance compared to skeleton-based and point-cloud-based models.
arXiv Detail & Related papers (2023-03-31T16:19:27Z) - Characterizing Intrinsic Compositionality in Transformers with Tree
Projections [72.45375959893218]
neural models like transformers can route information arbitrarily between different parts of their input.
We show that transformers for three different tasks become more treelike over the course of training.
These trees are predictive of model behavior, with more tree-like models generalizing better on tests of compositional generalization.
arXiv Detail & Related papers (2022-11-02T17:10:07Z) - What do Toothbrushes do in the Kitchen? How Transformers Think our World
is Structured [137.83584233680116]
We investigate what extent transformer-based language models allow for extracting knowledge about object relations.
We show that the models combined with the different similarity measures differ greatly in terms of the amount of knowledge they allow for extracting.
Surprisingly, static models perform almost as well as contextualized models -- in some cases even better.
arXiv Detail & Related papers (2022-04-12T10:00:20Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.