Why Generate When You Can Transform? Unleashing Generative Attention for Dynamic Recommendation
- URL: http://arxiv.org/abs/2508.02050v1
- Date: Mon, 04 Aug 2025 04:33:26 GMT
- Title: Why Generate When You Can Transform? Unleashing Generative Attention for Dynamic Recommendation
- Authors: Yuli Liu, Wenjun Kong, Cheng Luo, Weizhi Ma
- Abstract summary: Sequential Recommendation (SR) focuses on personalizing user experiences by predicting future preferences based on historical interactions. Transformer models, with their attention mechanisms, have become the dominant architecture in SR tasks. We introduce two generative attention models for SR, each grounded in the principles of Variational Autoencoders (VAE) and Diffusion Models (DMs).
- Score: 9.365893765448366
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Sequential Recommendation (SR) focuses on personalizing user experiences by predicting future preferences based on historical interactions. Transformer models, with their attention mechanisms, have become the dominant architecture in SR tasks due to their ability to capture dependencies in user behavior sequences. However, traditional attention mechanisms, where attention weights are computed through query-key transformations, are inherently linear and deterministic. This fixed approach limits their ability to account for the dynamic and non-linear nature of user preferences, leading to challenges in capturing evolving interests and subtle behavioral patterns. Given that generative models excel at capturing non-linearity and probabilistic variability, we argue that generating attention distributions offers a more flexible and expressive alternative to traditional attention mechanisms. To support this claim, we present a theoretical proof demonstrating that generative attention mechanisms offer greater expressiveness and stochasticity than traditional deterministic approaches. Building upon this theoretical foundation, we introduce two generative attention models for SR, grounded respectively in the principles of Variational Autoencoders (VAE) and Diffusion Models (DMs). These models are designed specifically to generate adaptive attention distributions that better align with variable user preferences. Extensive experiments on real-world datasets show that our models significantly outperform state-of-the-art methods in both accuracy and diversity.
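To make the contrast concrete, here is a minimal PyTorch sketch of the idea: rather than the deterministic softmax(QK^T/sqrt(d)) map, a small VAE-style encoder-decoder samples attention logits from a learned latent Gaussian. All layer names, shapes, and the pairwise interaction encoding are illustrative assumptions, not the paper's exact architecture.

```python
# Minimal sketch of VAE-style generative attention (illustrative only):
# attention logits are sampled from a learned latent distribution instead
# of being computed deterministically from query-key dot products.
import torch
import torch.nn as nn
import torch.nn.functional as F

class GenerativeAttention(nn.Module):
    def __init__(self, d_model: int, d_latent: int = 32):
        super().__init__()
        self.q = nn.Linear(d_model, d_model)
        self.k = nn.Linear(d_model, d_model)
        self.v = nn.Linear(d_model, d_model)
        # Encoder maps pairwise query-key interactions to a latent Gaussian.
        self.to_mu = nn.Linear(d_model, d_latent)
        self.to_logvar = nn.Linear(d_model, d_latent)
        # Decoder maps latent samples back to attention logits.
        self.decode = nn.Linear(d_latent, 1)

    def forward(self, x):                        # x: (batch, seq, d_model)
        q, k, v = self.q(x), self.k(x), self.v(x)
        # Pairwise interaction features: (batch, seq, seq, d_model).
        inter = q.unsqueeze(2) * k.unsqueeze(1)
        mu, logvar = self.to_mu(inter), self.to_logvar(inter)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)  # reparameterize
        logits = self.decode(z).squeeze(-1)       # (batch, seq, seq)
        attn = F.softmax(logits, dim=-1)          # stochastic attention weights
        kl = -0.5 * (1 + logvar - mu.pow(2) - logvar.exp()).mean()
        return attn @ v, kl                       # KL regularizes the latent space
```

The KL term plays the usual VAE role of keeping the latent attention space smooth; sampling z is what makes repeated forward passes yield the probabilistic variability the abstract argues for.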
Related papers
- Detecting and Pruning Prominent but Detrimental Neurons in Large Language Models [68.57424628540907]
Large language models (LLMs) often develop learned mechanisms specialized to specific datasets. We introduce a fine-tuning approach designed to enhance generalization by identifying and pruning neurons associated with dataset-specific mechanisms. Our method employs Integrated Gradients to quantify each neuron's influence on high-confidence predictions, pinpointing those that disproportionately contribute to dataset-specific performance.
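A hand-rolled sketch of the attribution step, assuming a scalar-output model and a zero baseline (the function name and hyperparameters are hypothetical; the paper's exact pruning criterion is not reproduced here):

```python
# Integrated Gradients (illustrative): attribute a model's confident
# prediction to individual units by integrating gradients along the
# straight-line path from a baseline to the actual activation.
import torch

def integrated_gradients(f, x, baseline=None, steps=50):
    """Approximate IG for a scalar output f(x); x is a 1-D tensor of activations."""
    baseline = torch.zeros_like(x) if baseline is None else baseline
    total_grad = torch.zeros_like(x)
    for alpha in torch.linspace(0.0, 1.0, steps):
        point = (baseline + alpha * (x - baseline)).requires_grad_(True)
        f(point).backward()                      # accumulate d f / d point
        total_grad += point.grad
    return (x - baseline) * total_grad / steps   # per-unit attribution

# Units with disproportionately large attributions on high-confidence
# examples are candidates for pruning, per the paper's selection idea.
```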
arXiv Detail & Related papers (2025-07-12T08:10:10Z)
- Learning Time-Aware Causal Representation for Model Generalization in Evolving Domains [50.66049136093248]
We develop a time-aware structural causal model (SCM) that incorporates dynamic causal factors and causal mechanism drift. We show that our method can yield the optimal causal predictor for each time domain. Results on both synthetic and real-world datasets show that SYNC achieves superior temporal generalization performance.
arXiv Detail & Related papers (2025-06-21T14:05:37Z)
- Optimizing Sequential Recommendation Models with Scaling Laws and Approximate Entropy [104.48511402784763]
The Performance Law for SR models aims to theoretically investigate and model the relationship between model performance and data quality. We propose Approximate Entropy (ApEn) to assess data quality, presenting a more nuanced approach compared to traditional data-quantity metrics.
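Approximate Entropy itself is a standard measure (Pincus, 1991) and is short to state in code; below is a direct NumPy implementation. Its use as an SR data-quality score is the paper's contribution and is not reproduced here.

```python
# Approximate Entropy (ApEn), standard definition: higher values indicate
# a more irregular, less predictable sequence.
import numpy as np

def apen(series, m=2, r=0.2):
    """ApEn of a 1-D sequence with window length m and tolerance r * std."""
    x = np.asarray(series, dtype=float)
    r = r * x.std()                          # tolerance scaled by variability

    def phi(m):
        n = len(x) - m + 1
        emb = np.array([x[i:i + m] for i in range(n)])    # m-length windows
        # Chebyshev distance between every pair of windows.
        dist = np.abs(emb[:, None, :] - emb[None, :, :]).max(axis=2)
        c = (dist <= r).mean(axis=1)         # fraction of similar windows
        return np.log(c).mean()

    return phi(m) - phi(m + 1)

print(apen([1, 2, 1, 2, 1, 2, 1, 2, 1, 2]))  # regular series -> low ApEn
```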
arXiv Detail & Related papers (2024-11-30T10:56:30Z)
- Breaking Determinism: Fuzzy Modeling of Sequential Recommendation Using Discrete State Space Diffusion Model [66.91323540178739]
Sequential recommendation (SR) aims to predict items that users may be interested in based on their historical behavior.
We revisit SR from a novel information-theoretic perspective and find that sequential modeling methods fail to adequately capture the randomness and unpredictability of user behavior.
Inspired by fuzzy information processing theory, this paper introduces fuzzy sets of interaction sequences to overcome these limitations and better capture the evolution of users' real interests.
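As a toy illustration of that information-theoretic point (not the paper's fuzzy-set machinery), the Shannon entropy of a user's interaction distribution already quantifies irreducible randomness that a deterministic sequence model cannot explain away:

```python
# Shannon entropy of a user's item-interaction distribution (bits).
import numpy as np
from collections import Counter

def interaction_entropy(item_ids):
    counts = np.array(list(Counter(item_ids).values()), dtype=float)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())

print(interaction_entropy(["a", "a", "a", "b"]))   # low entropy: focused user
print(interaction_entropy(["a", "b", "c", "d"]))   # high entropy: exploratory user
```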
arXiv Detail & Related papers (2024-10-31T14:52:01Z)
- WAVE: Weighted Autoregressive Varying Gate for Time Series Forecasting [9.114664059026767]
We propose a weighted Autoregressive Varying gatE attention mechanism equipped with both Autoregressive (AR) and Moving-average (MA) components. It can adapt to various attention mechanisms, enhancing and decoupling their ability to capture long-range and local temporal patterns in time series data.
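A rough sketch of how AR and MA terms might be grafted onto an attention output; the recurrence below is the textbook ARMA form, while the actual WAVE wiring and gating are the paper's own design and only approximated here.

```python
# Illustrative ARMA correction applied to an attention output: the AR term
# feeds back past corrected outputs, the MA term feeds back past innovations.
import torch
import torch.nn as nn

class ARMAGate(nn.Module):
    def __init__(self, p: int = 2, q: int = 2):
        super().__init__()
        self.ar = nn.Parameter(torch.zeros(p))   # AR weights over past outputs
        self.ma = nn.Parameter(torch.zeros(q))   # MA weights over past residuals

    def forward(self, attn_out):                 # attn_out: (batch, seq, d_model)
        ys, resids = [], []
        for t in range(attn_out.size(1)):
            ar = sum(self.ar[i] * ys[t - 1 - i]
                     for i in range(len(self.ar)) if t - 1 - i >= 0)
            ma = sum(self.ma[j] * resids[t - 1 - j]
                     for j in range(len(self.ma)) if t - 1 - j >= 0)
            pred = ar + ma                        # ARMA correction from history
            ys.append(attn_out[:, t] + pred)
            resids.append(attn_out[:, t] - pred)  # innovation term for the MA part
        return torch.stack(ys, dim=1)
```

With zero-initialized weights the gate starts as an identity map, so it can only help the underlying attention mechanism it wraps.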
arXiv Detail & Related papers (2024-10-04T05:45:50Z)
- Vision Augmentation Prediction Autoencoder with Attention Design (VAPAAD) [0.0]
This study introduces the Vision Augmentation Prediction Autoencoder with Attention Design (VAPAAD), an innovative approach that integrates attention mechanisms into sequence prediction.
VAPAAD combines data augmentation, ConvLSTM2D layers, and a custom-built self-attention mechanism to effectively focus on salient features within a sequence, enhancing predictive accuracy and context-aware analysis.
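A Keras sketch in that spirit, using the built-in ConvLSTM2D and MultiHeadAttention layers as stand-ins (frame size, depths, and the pooling head are illustrative; the paper builds its own self-attention rather than using the stock layer, and its augmentation pipeline is omitted):

```python
# ConvLSTM2D features + attention over time steps, then next-frame prediction.
import tensorflow as tf
from tensorflow.keras import layers

frames = tf.keras.Input(shape=(10, 64, 64, 1))   # ten 64x64 grayscale frames
x = layers.ConvLSTM2D(32, 3, padding="same",
                      return_sequences=True)(frames)          # spatiotemporal features
x = layers.TimeDistributed(layers.GlobalAveragePooling2D())(x)  # (10, 32)
x = layers.MultiHeadAttention(num_heads=4, key_dim=16)(x, x)  # weigh salient time steps
x = layers.GlobalAveragePooling1D()(x)
out = layers.Dense(64 * 64, activation="sigmoid")(x)
out = layers.Reshape((64, 64, 1))(out)           # predicted next frame
model = tf.keras.Model(frames, out)
```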
arXiv Detail & Related papers (2024-04-15T19:06:58Z)
- On the Optimization and Generalization of Multi-head Attention [28.33164313549433]
We investigate the potential optimization and generalization advantages of using multiple attention heads.
We derive convergence and generalization guarantees for gradient-descent training of a single-layer multi-head self-attention model.
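The analyzed object is easy to write down; a minimal PyTorch rendering of a single-layer multi-head self-attention classifier (dimensions and the pooling head are arbitrary choices for illustration, and the paper's training analysis is not reproduced):

```python
# Single-layer multi-head self-attention model, the object of the paper's
# convergence and generalization guarantees.
import torch
import torch.nn as nn

class SingleLayerMHSA(nn.Module):
    def __init__(self, d_model=64, n_heads=4, n_classes=2):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.head = nn.Linear(d_model, n_classes)

    def forward(self, x):                    # x: (batch, seq, d_model)
        h, _ = self.attn(x, x, x)            # all heads attend jointly
        return self.head(h.mean(dim=1))      # pool tokens, then classify
```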
arXiv Detail & Related papers (2023-10-19T12:18:24Z)
- Refined Mechanism Design for Approximately Structured Priors via Active Regression [50.71772232237571]
We consider the problem of a revenue-maximizing seller with a large number of items for sale to $n$ strategic bidders.
It is well-known that optimal and even approximately-optimal mechanisms for this setting are notoriously difficult to characterize or compute.
arXiv Detail & Related papers (2023-10-11T20:34:17Z)
- Variational Classification [51.2541371924591]
We derive a variational objective to train the model, analogous to the evidence lower bound (ELBO) used to train variational autoencoders.
Treating the inputs to the softmax layer as samples of a latent variable, this abstracted perspective reveals a potential inconsistency.
We induce a chosen latent distribution in place of the implicit assumption made by a standard softmax layer.
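A loose sketch of that idea: treat the pre-softmax vector as a sample from an input-conditioned Gaussian and regularize it toward a chosen prior (here N(0, I) as a stand-in; the paper's actual choice of latent distribution and objective may differ):

```python
# Latent-softmax classifier: sample z ~ q(z|x), classify from z, and pull
# q(z|x) toward a chosen prior with a KL term, giving an ELBO-style loss.
import torch
import torch.nn as nn
import torch.nn.functional as F

class VariationalClassifier(nn.Module):
    def __init__(self, d_in, d_z, n_classes):
        super().__init__()
        self.mu = nn.Linear(d_in, d_z)
        self.logvar = nn.Linear(d_in, d_z)
        self.cls = nn.Linear(d_z, n_classes)

    def forward(self, x, y):                 # returns the training loss directly
        mu, logvar = self.mu(x), self.logvar(x)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)   # q(z|x) sample
        ce = F.cross_entropy(self.cls(z), y)                      # E_q[-log p(y|z)]
        kl = -0.5 * (1 + logvar - mu.pow(2) - logvar.exp()).mean()  # KL to N(0, I)
        return ce + kl
```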
arXiv Detail & Related papers (2023-05-17T17:47:19Z)
- ER: Equivariance Regularizer for Knowledge Graph Completion [107.51609402963072]
We propose a new regularizer, namely the Equivariance Regularizer (ER).
ER can enhance the generalization ability of the model by employing the semantic equivariance between the head and tail entities.
The experimental results indicate a clear and substantial improvement over the state-of-the-art relation prediction methods.
arXiv Detail & Related papers (2022-06-24T08:18:05Z)
- Learning Consistent Deep Generative Models from Sparse Data via Prediction Constraints [16.48824312904122]
We develop a new framework for learning variational autoencoders and other deep generative models.
We show that its two contributions -- prediction constraints and consistency constraints -- lead to promising image classification performance.
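A compact sketch of what a prediction-constrained objective can look like: the standard ELBO terms plus a weighted supervised loss on label predictions made from the latent code (the weight `lam` and the MSE reconstruction are illustrative choices; the consistency constraint is omitted):

```python
# Prediction-constrained VAE loss: ELBO terms plus a supervised constraint
# requiring that the latent code predicts labels well.
import torch.nn.functional as F

def pc_vae_loss(recon, x, mu, logvar, y_logits, y, lam=5.0):
    recon_loss = F.mse_loss(recon, x)                            # reconstruction
    kl = -0.5 * (1 + logvar - mu.pow(2) - logvar.exp()).mean()   # ELBO KL term
    pred = F.cross_entropy(y_logits, y)      # prediction constraint on latents
    return recon_loss + kl + lam * pred      # lam trades generation vs. prediction
```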
arXiv Detail & Related papers (2020-12-12T04:18:50Z)
- Deep Neural Dynamic Bayesian Networks applied to EEG sleep spindles modeling [0.0]
We propose a generative model for single-channel EEG that incorporates the constraints experts actively enforce during visual scoring.
We derive algorithms for exact, tractable inference as a special case of Generalized Expectation Maximization.
We validate the model on three public datasets and provide evidence that more complex models can surpass state-of-the-art detectors.
arXiv Detail & Related papers (2020-10-16T21:48:29Z)