EulerFormer: Sequential User Behavior Modeling with Complex Vector Attention
- URL: http://arxiv.org/abs/2403.17729v2
- Date: Thu, 4 Apr 2024 14:29:34 GMT
- Title: EulerFormer: Sequential User Behavior Modeling with Complex Vector Attention
- Authors: Zhen Tian, Wayne Xin Zhao, Changwang Zhang, Xin Zhao, Zhongrui Ma, Ji-Rong Wen
- Abstract summary: We propose a novel transformer variant with complex vector attention, named EulerFormer.
It provides a unified theoretical framework to formulate both semantic difference and positional difference.
It is more robust to semantic variations and, in principle, has superior theoretical properties.
- Score: 88.45459681677369
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: To capture user preference, transformer models have been widely applied to model sequential user behavior data. The core of the transformer architecture lies in the self-attention mechanism, which computes the pairwise attention scores in a sequence. Because self-attention is permutation-equivariant, positional encoding is used to enhance the attention between token representations. In this setting, the pairwise attention scores can be derived from both semantic difference and positional difference. However, prior studies often model the two kinds of difference measurements in different ways, which potentially limits the expressive capacity of sequence modeling. To address this issue, this paper proposes a novel transformer variant with complex vector attention, named EulerFormer, which provides a unified theoretical framework to formulate both semantic difference and positional difference. EulerFormer involves two key technical improvements. First, it employs a new transformation function for efficiently transforming the sequence tokens into polar-form complex vectors using Euler's formula, enabling the unified modeling of both semantic and positional information in a complex rotation form. Second, it develops a differential rotation mechanism, where the semantic rotation angles can be controlled by an adaptation function, enabling the adaptive integration of the semantic and positional information according to the semantic contexts. Furthermore, a phase contrastive learning task is proposed to improve the isotropy of contextual representations in EulerFormer. Our theoretical framework possesses a high degree of completeness and generality. It is more robust to semantic variations and, in principle, has superior theoretical properties. Extensive experiments conducted on four public datasets demonstrate the effectiveness and efficiency of our approach.
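The polar-form idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a token vector is split into two halves (r, s) that are read as the complex vector r + i*s, and that position is injected as a rotary-style rotation by angle position * freqs. The function names and the `freqs` schedule are hypothetical, and the adaptation function and phase contrastive task are omitted. Under these assumptions, the real part of the complex inner product reduces to a sum of modulus products times the cosine of (semantic phase difference + positional phase difference), so both differences appear in a single rotation angle.

```python
import numpy as np

def euler_transform(x):
    """Read the two halves (r, s) of a real vector as r + i*s and return its polar form."""
    r, s = np.split(x, 2, axis=-1)
    modulus = np.sqrt(r ** 2 + s ** 2)   # lambda in lambda * exp(i * theta)
    phase = np.arctan2(s, r)             # theta, the "semantic" angle
    return modulus, phase

def polar_score(q, k, pos_q, pos_k, freqs):
    """Attention score written with explicit semantic and positional phase differences."""
    lam_q, theta_q = euler_transform(q)
    lam_k, theta_k = euler_transform(k)
    semantic_diff = theta_q - theta_k
    positional_diff = (pos_q - pos_k) * freqs   # assumed rotary-style position angle
    return np.sum(lam_q * lam_k * np.cos(semantic_diff + positional_diff))

def complex_score(q, k, pos_q, pos_k, freqs):
    """The same score computed directly as Re(<z_q, conj(z_k)>) after rotating by position."""
    rq, sq = np.split(q, 2, axis=-1)
    rk, sk = np.split(k, 2, axis=-1)
    z_q = (rq + 1j * sq) * np.exp(1j * pos_q * freqs)
    z_k = (rk + 1j * sk) * np.exp(1j * pos_k * freqs)
    return np.real(np.sum(z_q * np.conj(z_k)))

# Usage with hypothetical sizes: 8-dimensional tokens, so 4 complex components.
rng = np.random.default_rng(0)
q, k = rng.normal(size=8), rng.normal(size=8)
freqs = 10000.0 ** (-np.arange(4) / 4)   # assumed frequency schedule
print(polar_score(q, k, 3, 1, freqs), complex_score(q, k, 3, 1, freqs))
```

The two functions compute the same quantity in different forms. When both tokens share a position, the positional term vanishes and the score falls back to the ordinary dot product of q and k.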
Related papers
- Structural adaptation via directional regularity: rate accelerated estimation in multivariate functional data [0.0]
Directional regularity is a new definition of anisotropy for multivariate functional data.
We show that faster rates of convergence can be obtained through a change-of-basis.
We discuss two possible applications of the directional regularity approach.
arXiv Detail & Related papers (2024-09-01T19:09:00Z)
- EqMotion: Equivariant Multi-agent Motion Prediction with Invariant Interaction Reasoning [83.11657818251447]
We propose EqMotion, an efficient equivariant motion prediction model with invariant interaction reasoning.
We conduct experiments for the proposed model on four distinct scenarios: particle dynamics, molecule dynamics, human skeleton motion prediction and pedestrian trajectory prediction.
Our method achieves state-of-the-art prediction performances on all the four tasks, improving by 24.0/30.1/8.6/9.2%.
arXiv Detail & Related papers (2023-03-20T05:23:46Z)
- DIFFormer: Scalable (Graph) Transformers Induced by Energy Constrained Diffusion [66.21290235237808]
We introduce an energy constrained diffusion model which encodes a batch of instances from a dataset into evolutionary states.
We provide rigorous theory that implies closed-form optimal estimates for the pairwise diffusion strength among arbitrary instance pairs.
Experiments highlight the wide applicability of our model as a general-purpose encoder backbone with superior performance in various tasks.
arXiv Detail & Related papers (2023-01-23T15:18:54Z)
- Topographic VAEs learn Equivariant Capsules [84.33745072274942]
We introduce the Topographic VAE: a novel method for efficiently training deep generative models with topographically organized latent variables.
We show that such a model indeed learns to organize its activations according to salient characteristics such as digit class, width, and style on MNIST.
We demonstrate approximate equivariance to complex transformations, expanding upon the capabilities of existing group equivariant neural networks.
arXiv Detail & Related papers (2021-09-03T09:25:57Z)
- Structured Reordering for Modeling Latent Alignments in Sequence Transduction [86.94309120789396]
We present an efficient dynamic programming algorithm performing exact marginal inference of separable permutations.
The resulting seq2seq model exhibits better systematic generalization than standard models on synthetic problems and NLP tasks.
arXiv Detail & Related papers (2021-06-06T21:53:54Z)
- Consistency Regularization for Variational Auto-Encoders [14.423556966548544]
Variational auto-encoders (VAEs) are a powerful approach to unsupervised learning.
We propose a regularization method to enforce consistency in VAEs.
arXiv Detail & Related papers (2021-05-31T10:26:32Z)
- Building powerful and equivariant graph neural networks with structural message-passing [74.93169425144755]
We propose a powerful and equivariant message-passing framework based on two ideas.
First, we propagate a one-hot encoding of the nodes, in addition to the features, in order to learn a local context matrix around each node.
Second, we propose methods for the parametrization of the message and update functions that ensure permutation equivariance.
arXiv Detail & Related papers (2020-06-26T17:15:16Z)
- Autoencoding Pixies: Amortised Variational Inference with Graph Convolutions for Functional Distributional Semantics [12.640283469603355]
Pixie Autoencoder augments the generative model of Functional Distributional Semantics with a graph-convolutional neural network to perform amortised variational inference.
arXiv Detail & Related papers (2020-05-06T17:46:40Z)
- The general theory of permutation equivarant neural networks and higher order graph variational encoders [6.117371161379209]
We derive formulae for general permutation equivariant layers, including the case where the layer acts on matrices by permuting their rows and columns simultaneously.
This case arises naturally in graph learning and relation learning applications.
We present a second order graph variational encoder, and show that the latent distribution of equivariant generative models must be exchangeable.
arXiv Detail & Related papers (2020-04-08T13:29:56Z)
This list is automatically generated from the titles and abstracts of the papers in this site.