Permutation Equivariance of Transformers and Its Applications
- URL: http://arxiv.org/abs/2304.07735v3
- Date: Sun, 31 Mar 2024 11:35:46 GMT
- Title: Permutation Equivariance of Transformers and Its Applications
- Authors: Hengyuan Xu, Liyao Xiang, Hangyu Ye, Dixi Yao, Pengzhi Chu, Baochun Li
- Abstract summary: Transformer-based models are known to be robust to shuffling, but only with respect to inter-token permutation in the forward propagation.
We propose permutation equivariance, a broader concept covering both inter- and intra-token permutation in the forward and backward propagation of neural networks.
As a proof of concept, we explore how real-world applications, including privacy-enhancing split learning and model authorization, could exploit the permutation equivariance property.
- Score: 25.666783258054465
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Revolutionizing the field of deep learning, Transformer-based models have achieved remarkable performance on many tasks. Recent research has recognized that these models are robust to shuffling, but only with respect to inter-token permutation in the forward propagation. In this work, we propose our definition of permutation equivariance, a broader concept covering both inter- and intra-token permutation in the forward and backward propagation of neural networks. We rigorously prove that this permutation equivariance property is satisfied by most vanilla Transformer-based models with almost no adaptation. We examine the property over a range of state-of-the-art models, including ViT, BERT, GPT, and others, with experimental validation. Further, as a proof of concept, we explore how real-world applications, including privacy-enhancing split learning and model authorization, could exploit the permutation equivariance property, which suggests wider, intriguing application scenarios.
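To make the abstract's two notions concrete, here is a minimal sketch, assuming PyTorch and toy shapes chosen only for illustration (it is not code from the paper): a positional-encoding-free Transformer encoder commutes with inter-token permutation of its input, and a two-layer MLP illustrates the flavor of intra-token (feature-dimension) permutation, where shuffling hidden features together with the matching weight columns and rows leaves the output unchanged. The paper's full construction for Transformer weights and backward propagation is more involved than this check.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Inter-token permutation: without positional encodings, a Transformer
# encoder commutes with shuffling along the token (sequence) dimension.
layer = nn.TransformerEncoderLayer(d_model=16, nhead=4, batch_first=True)
encoder = nn.TransformerEncoder(layer, num_layers=2).eval()  # eval() disables dropout

x = torch.randn(2, 10, 16)   # (batch, tokens, features)
perm = torch.randperm(10)    # a random inter-token permutation

with torch.no_grad():
    out = encoder(x)
    out_perm = encoder(x[:, perm, :])
print(torch.allclose(out[:, perm, :], out_perm, atol=1e-5))  # True

# Intra-token (feature-dimension) permutation, on a toy two-layer MLP:
# permuting the hidden features together with the matching weight
# columns/rows leaves the function unchanged.
W1, b1 = torch.randn(16, 32), torch.randn(32)
W2, b2 = torch.randn(32, 4), torch.randn(4)
p = torch.randperm(32)       # permutation of the hidden feature dimension

def mlp(z, W1, b1, W2, b2):
    return torch.relu(z @ W1 + b1) @ W2 + b2

z = torch.randn(5, 16)
print(torch.allclose(mlp(z, W1, b1, W2, b2),
                     mlp(z, W1[:, p], b1[p], W2[p, :], b2),
                     atol=1e-5))  # True
```

Intuitively, this is the kind of symmetry the split-learning and model-authorization use cases rely on: computation can proceed on permuted features or weights while the permutation itself is kept private.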
Related papers
- Probabilistic Topic Modelling with Transformer Representations [0.9999629695552195]
We propose the Transformer-Representation Neural Topic Model (TNTM).
This approach unifies the powerful and versatile notion of topics based on transformer embeddings with fully probabilistic modelling.
Experimental results show that our proposed model achieves results on par with various state-of-the-art approaches in terms of embedding coherence.
arXiv Detail & Related papers (2024-03-06T14:27:29Z) - All Roads Lead to Rome? Exploring the Invariance of Transformers' Representations [69.3461199976959]
We propose a model based on invertible neural networks, BERT-INN, to learn the Bijection Hypothesis.
We show the advantage of BERT-INN both theoretically and through extensive experiments.
arXiv Detail & Related papers (2023-05-23T22:30:43Z) - Self-Supervised Learning for Group Equivariant Neural Networks [75.62232699377877]
Group equivariant neural networks are models whose structure is restricted to commute with transformations of the input (a minimal translation-equivariance check is sketched after this list).
We propose two concepts for self-supervised tasks: equivariant pretext labels and invariant contrastive loss.
Experiments on standard image recognition benchmarks demonstrate that the equivariant neural networks exploit the proposed self-supervised tasks.
arXiv Detail & Related papers (2023-03-08T08:11:26Z) - Deep Neural Networks with Efficient Guaranteed Invariances [77.99182201815763]
We address the problem of improving the performance and, in particular, the sample complexity of deep neural networks.
Group-equivariant convolutions are a popular approach to obtain equivariant representations.
We propose a multi-stream architecture, where each stream is invariant to a different transformation.
arXiv Detail & Related papers (2023-03-02T20:44:45Z) - GAMMT: Generative Ambiguity Modeling Using Multiple Transformers [0.0]
We introduce GAMMT (Generative Ambiguity Models using Multiple Transformers), a novel model for sequential data that is based on sets of probabilities.
Our approach acknowledges that the data generation process of a sequence is not deterministic, but rather ambiguous and influenced by a set of probabilities.
arXiv Detail & Related papers (2022-11-16T06:24:26Z) - Improving the Sample-Complexity of Deep Classification Networks with Invariant Integration [77.99182201815763]
Leveraging prior knowledge of intra-class variance due to transformations is a powerful method to improve the sample complexity of deep neural networks.
We propose a novel monomial selection algorithm based on pruning methods to allow application to more complex problems.
We demonstrate the improved sample complexity on the Rotated-MNIST, SVHN and CIFAR-10 datasets.
arXiv Detail & Related papers (2022-02-08T16:16:11Z) - Topographic VAEs learn Equivariant Capsules [84.33745072274942]
We introduce the Topographic VAE: a novel method for efficiently training deep generative models with topographically organized latent variables.
We show that such a model indeed learns to organize its activations according to salient characteristics such as digit class, width, and style on MNIST.
We demonstrate approximate equivariance to complex transformations, expanding upon the capabilities of existing group equivariant neural networks.
arXiv Detail & Related papers (2021-09-03T09:25:57Z) - Variational Transformers for Diverse Response Generation [71.53159402053392]
The Variational Transformer (VT) is a variational self-attentive feed-forward sequence model.
VT combines the parallelizability and global receptive field of the Transformer with the variational nature of the CVAE.
We explore two types of VT: 1) modeling discourse-level diversity with a global latent variable; and 2) augmenting the Transformer decoder with a sequence of fine-grained latent variables.
arXiv Detail & Related papers (2020-03-28T07:48:02Z)
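As a generic illustration of the group-equivariance definition in the "Self-Supervised Learning for Group Equivariant Neural Networks" entry above, here is a minimal check, assuming PyTorch and using cyclic shifts as the transformation group; it is not code from any of the cited papers. A 1-D convolution with circular padding and stride 1 commutes with rolling its input.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# A convolution with circular padding and stride 1 is equivariant to
# cyclic shifts: conv(roll(x)) == roll(conv(x)).
conv = nn.Conv1d(in_channels=3, out_channels=8, kernel_size=3,
                 padding=1, padding_mode="circular").eval()

x = torch.randn(1, 3, 20)    # (batch, channels, length)
shift = 5                    # group element: cyclic translation by 5 positions

with torch.no_grad():
    shift_then_conv = conv(torch.roll(x, shifts=shift, dims=-1))
    conv_then_shift = torch.roll(conv(x), shifts=shift, dims=-1)
print(torch.allclose(shift_then_conv, conv_then_shift, atol=1e-6))  # True
```

The same commutation requirement, with richer groups such as rotations, is what the group-equivariant architectures in the entries above restrict their layers to satisfy.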
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.