Related papers: Beyond the Permutation Symmetry of Transformers: The Role of Rotation for Model Fusion

Beyond the Permutation Symmetry of Transformers: The Role of Rotation for Model Fusion

URL: http://arxiv.org/abs/2502.00264v1
Date: Sat, 01 Feb 2025 01:44:55 GMT
Title: Beyond the Permutation Symmetry of Transformers: The Role of Rotation for Model Fusion
Authors: Binchi Zhang, Zaiyi Zheng, Zhengzhang Chen, Jundong Li,
Abstract summary: We introduce rotation symmetry, a novel form of parameter space symmetry for transformers.<n>Unlike permutation symmetry, rotation symmetry operates in a continuous domain, thereby significantly expanding the equivalence set for transformers.<n>We propose a theoretically optimal matching algorithm as a plug-and-play module to enhance model fusion.
Score: 43.299430093251736
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Symmetry in the parameter space of deep neural networks (DNNs) has proven beneficial for various deep learning applications. A well-known example is the permutation symmetry in Multi-Layer Perceptrons (MLPs), where permuting the rows of weight matrices in one layer and applying the inverse permutation to adjacent layers yields a functionally equivalent model. While permutation symmetry fully characterizes the equivalence set for MLPs, its discrete nature limits its utility for transformers. In this paper, we introduce rotation symmetry, a novel form of parameter space symmetry for transformers that generalizes permutation symmetry by rotating parameter matrices in self-attention layers. Unlike permutation symmetry, rotation symmetry operates in a continuous domain, thereby significantly expanding the equivalence set for transformers. Based on this property, we propose a theoretically optimal parameter matching algorithm as a plug-and-play module to enhance model fusion. We evaluate our approach using pre-trained transformers across diverse natural language and vision tasks. Experimental results demonstrate that our rotation symmetry-based matching algorithm substantially improves model fusion, highlighting the potential of parameter space symmetry to facilitate model fusion. Our code is available on https://github.com/zhengzaiyi/RotationSymmetry.

Related papers

Generalized Linear Mode Connectivity for Transformers [87.32299363530996]
A striking phenomenon is linear mode connectivity (LMC), where independently trained models can be connected by low- or zero-loss paths.<n>Prior work has predominantly focused on neuron re-ordering through permutations, but such approaches are limited in scope.<n>We introduce a unified framework that captures four symmetry classes: permutations, semi-permutations, transformations, and general invertible maps.<n>This generalization enables, for the first time, the discovery of low- and zero-barrier linear paths between independently trained Vision Transformers and GPT-2 models.
arXiv Detail & Related papers (2025-06-28T01:46:36Z)
Permutation Equivariant Neural Networks for Symmetric Tensors [0.0]
We present two different characterisations of all linear permutation equivariant functions between symmetric power spaces of $mathbbRn$. We show that these functions are highly efficient compared to standard tensors and have potential to generalise well to symmetrics of different sizes.
arXiv Detail & Related papers (2025-03-14T10:33:13Z)
Approximation of Permutation Invariant Polynomials by Transformers: Efficient Construction in Column-Size [6.9060054915724]
Transformers are a type of neural network that have demonstrated remarkable performance across various domains. In this study, we investigate the ability of transformers to approximate column-symmetrics.
arXiv Detail & Related papers (2025-02-17T05:56:11Z)
Learning functions on symmetric matrices and point clouds via lightweight invariant features [26.619014249559942]
We present a formulation for machine learning of functions on symmetric matrices that are invariant with respect to the action of permutations. We show that these invariant features can separate all distinct orbits of symmetric matrices except for a measure zero set. For point clouds in a fixed dimension, we prove that the number of invariant features can be reduced, generically without losing expressivity.
arXiv Detail & Related papers (2024-05-13T18:24:03Z)
Towards Understanding Inductive Bias in Transformers: A View From Infinity [9.00214539845063]
We argue transformers tend to be biased towards more permutation symmetric functions in sequence space. We show that the representation theory of the symmetric group can be used to give quantitative analytical predictions. We argue WikiText dataset, does indeed possess a degree of permutation symmetry.
arXiv Detail & Related papers (2024-02-07T19:00:01Z)
Transformers as Support Vector Machines [54.642793677472724]
We establish a formal equivalence between the optimization geometry of self-attention and a hard-margin SVM problem. We characterize the implicit bias of 1-layer transformers optimized with gradient descent. We believe these findings inspire the interpretation of transformers as a hierarchy of SVMs that separates and selects optimal tokens.
arXiv Detail & Related papers (2023-08-31T17:57:50Z)
Learning Probabilistic Symmetrization for Architecture Agnostic Equivariance [16.49488981364657]
We present a novel framework to overcome the limitations of equivariant architectures in learning functions with group symmetries. We use an arbitrary base model such as anvariant or a transformer and symmetrize it to be equivariant to the given group. Empirical tests show competitive results against tailored equivariant architectures.
arXiv Detail & Related papers (2023-06-05T13:40:54Z)
Oracle-Preserving Latent Flows [58.720142291102135]
We develop a methodology for the simultaneous discovery of multiple nontrivial continuous symmetries across an entire labelled dataset. The symmetry transformations and the corresponding generators are modeled with fully connected neural networks trained with a specially constructed loss function. The two new elements in this work are the use of a reduced-dimensionality latent space and the generalization to transformations invariant with respect to high-dimensional oracles.
arXiv Detail & Related papers (2023-02-02T00:13:32Z)
Deep Learning Symmetries and Their Lie Groups, Algebras, and Subalgebras from First Principles [55.41644538483948]
We design a deep-learning algorithm for the discovery and identification of the continuous group of symmetries present in a labeled dataset. We use fully connected neural networks to model the transformations symmetry and the corresponding generators. Our study also opens the door for using a machine learning approach in the mathematical study of Lie groups and their properties.
arXiv Detail & Related papers (2023-01-13T16:25:25Z)
Learning Symmetric Embeddings for Equivariant World Models [9.781637768189158]
We propose learning symmetric embedding networks (SENs) that encode an input space (e.g. images) This network can be trained end-to-end with an equivariant task network to learn an explicitly symmetric representation. Our experiments demonstrate that SENs facilitate the application of equivariant networks to data with complex symmetry representations.
arXiv Detail & Related papers (2022-04-24T22:31:52Z)
Meta-Learning Symmetries by Reparameterization [63.85144439337671]
We present a method for learning and encoding equivariances into networks by learning corresponding parameter sharing patterns from data. Our experiments suggest that it can automatically learn to encode equivariances to common transformations used in image processing tasks.
arXiv Detail & Related papers (2020-07-06T17:59:54Z)
Inverse Learning of Symmetries [71.62109774068064]
We learn the symmetry transformation with a model consisting of two latent subspaces. Our approach is based on the deep information bottleneck in combination with a continuous mutual information regulariser. Our model outperforms state-of-the-art methods on artificial and molecular datasets.
arXiv Detail & Related papers (2020-02-07T13:48:52Z)

This list is automatically generated from the titles and abstracts of the papers in this site.