An Attentive Inductive Bias for Sequential Recommendation beyond the Self-Attention
- URL: http://arxiv.org/abs/2312.10325v2
- Date: Sat, 17 Feb 2024 23:49:20 GMT
- Title: An Attentive Inductive Bias for Sequential Recommendation beyond the Self-Attention
- Authors: Yehjin Shin, Jeongwhan Choi, Hyowon Wi, Noseong Park
- Abstract summary: We present pioneering investigations that reveal the low-pass filtering nature of self-attention in sequential recommendation (SR) models.
We propose a novel method called BSARec, which injects an inductive bias by considering fine-grained sequential patterns.
Our discovery shows significant advancements in the SR domain and is expected to bridge the gap for existing Transformer-based SR models.
- Score: 23.610204672115195
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Sequential recommendation (SR) models based on Transformers have achieved
remarkable successes. The self-attention mechanism of Transformers for computer
vision and natural language processing suffers from the oversmoothing problem,
i.e., hidden representations of tokens becoming similar to one another. In the
SR domain, we,
for the first time, show that the same problem occurs. We present pioneering
investigations that reveal the low-pass filtering nature of self-attention in
SR, which causes oversmoothing. To address this, we propose a novel method
called $\textbf{B}$eyond $\textbf{S}$elf-$\textbf{A}$ttention for Sequential
$\textbf{Rec}$ommendation (BSARec), which leverages the Fourier transform to i)
inject an inductive bias by considering fine-grained sequential patterns and
ii) integrate low and high-frequency information to mitigate oversmoothing. Our
discovery shows significant advancements in the SR domain and is expected to
bridge the gap for existing Transformer-based SR models. We test our proposed
approach through extensive experiments on 6 benchmark datasets. The
experimental results demonstrate that our model outperforms 7 baseline methods
in terms of recommendation performance. Our code is available at
https://github.com/yehjin-shin/BSARec.
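The abstract describes using the Fourier transform to blend low- and high-frequency components of the item-sequence representations so that fine-grained patterns are not smoothed away. Below is a minimal, illustrative sketch of that frequency-splitting idea only, not the authors' implementation (the real layer lives in the linked repository); the module name FrequencyRescaler, the cutoff, and the mixing weight alpha are assumptions made for this example.

```python
# Hypothetical sketch of frequency-domain low/high splitting for sequence
# hidden states; NOT the BSARec layer itself (see the authors' repository).
import torch
import torch.nn as nn


class FrequencyRescaler(nn.Module):
    def __init__(self, cutoff: int = 4):
        super().__init__()
        self.cutoff = cutoff                           # number of low-frequency bins kept in the low-pass branch (assumed)
        self.alpha = nn.Parameter(torch.tensor(0.5))   # learnable low/high mixing weight (assumed)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, hidden) hidden states of an item sequence
        freq = torch.fft.rfft(x, dim=1)                # spectrum along the sequence axis
        low = torch.zeros_like(freq)
        low[:, : self.cutoff] = freq[:, : self.cutoff]
        high = freq - low                              # complementary high-frequency residue
        x_low = torch.fft.irfft(low, n=x.size(1), dim=1)
        x_high = torch.fft.irfft(high, n=x.size(1), dim=1)
        # Re-weighting the high-frequency branch keeps fine-grained sequential
        # patterns from being washed out by purely low-pass (oversmoothing) behavior.
        return self.alpha * x_low + (1.0 - self.alpha) * x_high


if __name__ == "__main__":
    states = torch.randn(2, 50, 64)                    # toy batch: 2 users, 50 items, 64-dim embeddings
    print(FrequencyRescaler()(states).shape)           # torch.Size([2, 50, 64])
```

Under this sketch, driving alpha toward 1 yields a purely low-pass view (analogous to what the paper argues self-attention already provides), while smaller values reintroduce high-frequency detail.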
Related papers
- Continuous Speculative Decoding for Autoregressive Image Generation [33.05392461723613]
Continuous-valued Autoregressive (AR) image generation models have demonstrated notable superiority over their discrete-token counterparts.
Speculative decoding has proven effective in accelerating Large Language Models (LLMs).
This work generalizes the speculative decoding algorithm from discrete tokens to continuous space.
arXiv Detail & Related papers (2024-11-18T09:19:15Z)
- On Mesa-Optimization in Autoregressively Trained Transformers: Emergence and Capability [34.43255978863601]
Several works suggest that transformers learn a mesa-optimizer during autoregressive training.
We show that a stronger assumption on the moments of the data is a sufficient and necessary condition for the learned mesa-optimizer to perform well.
arXiv Detail & Related papers (2024-05-27T05:41:06Z)
- Boot and Switch: Alternating Distillation for Zero-Shot Dense Retrieval [50.47192086219752]
$\texttt{ABEL}$ is a simple but effective unsupervised method to enhance passage retrieval in zero-shot settings.
By either fine-tuning $\texttt{ABEL}$ on labelled data or integrating it with existing supervised dense retrievers, we achieve state-of-the-art results.
arXiv Detail & Related papers (2023-11-27T06:22:57Z)
- Transformers as Support Vector Machines [54.642793677472724]
We establish a formal equivalence between the optimization geometry of self-attention and a hard-margin SVM problem.
We characterize the implicit bias of 1-layer transformers optimized with gradient descent.
We believe these findings inspire the interpretation of transformers as a hierarchy of SVMs that separates and selects optimal tokens.
arXiv Detail & Related papers (2023-08-31T17:57:50Z)
- Conditional Denoising Diffusion for Sequential Recommendation [62.127862728308045]
Two prominent generative models, Generative Adversarial Networks (GANs) and Variational AutoEncoders (VAEs), have well-known limitations: GANs suffer from unstable optimization, while VAEs are prone to posterior collapse and over-smoothed generations.
We present a conditional denoising diffusion model, which includes a sequence encoder, a cross-attentive denoising decoder, and a step-wise diffuser.
arXiv Detail & Related papers (2023-04-22T15:32:59Z)
- Transformers meet Stochastic Block Models: Attention with Data-Adaptive Sparsity and Cost [53.746169882193456]
Recent works have proposed various sparse attention modules to overcome the quadratic cost of self-attention.
We propose a model that resolves both problems by endowing each attention head with a mixed-membership Stochastic Block Model.
Our model outperforms previous efficient variants as well as the original Transformer with full attention.
arXiv Detail & Related papers (2022-10-27T15:30:52Z)
- Sequential Recommendation via Stochastic Self-Attention [68.52192964559829]
Transformer-based approaches embed items as vectors and use dot-product self-attention to measure the relationship between items.
We propose a novel $\textbf{STO}$chastic $\textbf{S}$elf-$\textbf{A}$ttention (STOSA) model to overcome these issues.
We devise a novel Wasserstein Self-Attention module to characterize item-item position-wise relationships in sequences.
arXiv Detail & Related papers (2022-01-16T12:38:45Z)
- Augmenting Sequential Recommendation with Pseudo-Prior Items via Reversely Pre-training Transformer [61.818320703583126]
Sequential Recommendation characterizes the evolving patterns by modeling item sequences chronologically.
Recent developments of transformer inspire the community to design effective sequence encoders.
We introduce a new framework for $\textbf{A}$ugmenting $\textbf{S}$equential $\textbf{Re}$commendation with $\textbf{P}$seudo-prior items (ASReP).
arXiv Detail & Related papers (2021-05-02T18:06:23Z)
- Output-Weighted Sampling for Multi-Armed Bandits with Extreme Payoffs [11.1546439770774]
We present a new type of acquisition functions for online decision making in bandit problems with extreme payoffs.
We formulate a novel type of upper confidence bound (UCB) acquisition function that guides exploration towards the bandits that are deemed most relevant.
arXiv Detail & Related papers (2021-02-19T18:36:03Z)
- Stratified Rule-Aware Network for Abstract Visual Reasoning [46.015682319351676]
The Raven's Progressive Matrices (RPM) test is typically used to examine the capability of abstract reasoning.
Recent studies, taking advantage of Convolutional Neural Networks (CNNs), have made encouraging progress on the RPM test.
We propose a Stratified Rule-Aware Network (SRAN) to generate the rule embeddings for two input sequences.
arXiv Detail & Related papers (2020-02-17T08:44:05Z)