FAIR: Focused Attention Is All You Need for Generative Recommendation
- URL: http://arxiv.org/abs/2512.11254v2
- Date: Wed, 17 Dec 2025 02:57:06 GMT
- Title: FAIR: Focused Attention Is All You Need for Generative Recommendation
- Authors: Longtao Xiao, Haolin Zhang, Guohao Cai, Jieming Zhu, Yifan Wang, Heng Chang, Zhenhua Dong, Xiu Li, Ruixuan Li
- Abstract summary: We propose the first generative recommendation framework with focused attention, which enhances attention scores to relevant context while suppressing those to irrelevant ones. Specifically, we propose (1) a focused attention mechanism integrated into the standard Transformer, which learns two separate sets of Q and K attention weights and computes their difference as the final attention scores. We validate the effectiveness of FAIR on four public benchmarks, demonstrating its superior performance compared to existing methods.
- Score: 43.65370600297507
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recently, transformer-based generative recommendation has garnered significant attention for user behavior modeling. However, it often requires discretizing items into multi-code representations (e.g., typically four code tokens or more), which sharply increases the length of the original item sequence. This expansion poses challenges to transformer-based models for modeling user behavior sequences with inherent noise, since they tend to over-allocate attention to irrelevant or noisy context. To mitigate this issue, we propose FAIR, the first generative recommendation framework with focused attention, which enhances attention scores to relevant context while suppressing those to irrelevant ones. Specifically, we propose (1) a focused attention mechanism integrated into the standard Transformer, which learns two separate sets of Q and K attention weights and computes their difference as the final attention scores to eliminate attention noise while focusing on relevant contexts; (2) a noise-robustness objective, which encourages the model to maintain stable attention patterns under stochastic perturbations, preventing undesirable shifts toward irrelevant context due to noise; and (3) a mutual information maximization objective, which guides the model to identify contexts that are most informative for next-item prediction. We validate the effectiveness of FAIR on four public benchmarks, demonstrating its superior performance compared to existing methods.
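The subtractive Q/K mechanism in point (1) can be illustrated with a minimal numpy sketch: two attention maps are computed from separate Q/K projections, and their difference is used to weight the values so that attention mass shared by both maps (treated here as noise) cancels out. All names, the single-head shapes, and the `lam` weighting are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def focused_attention(x, Wq1, Wk1, Wq2, Wk2, Wv, lam=0.5):
    """Sketch of a focused (difference-of-attention) head.

    Two Q/K projections produce two attention maps; subtracting the
    second (scaled by `lam`, a hypothetical hyperparameter) from the
    first is meant to suppress attention given to noisy context.
    """
    d = Wq1.shape[1]
    q1, k1 = x @ Wq1, x @ Wk1
    q2, k2 = x @ Wq2, x @ Wk2
    a1 = softmax(q1 @ k1.T / np.sqrt(d))
    a2 = softmax(q2 @ k2.T / np.sqrt(d))
    scores = a1 - lam * a2  # difference of the two attention maps
    return scores @ (x @ Wv)
```

With `lam=0` this reduces to standard single-head attention, which is one way to see the mechanism as a strict generalization of the vanilla Transformer layer it is integrated into.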
Related papers
- Enhancing guidance for missing data in diffusion-based sequential recommendation [10.673207423895747]
We propose a novel Counterfactual Attention Regulation Diffusion model (CARD). CARD focuses on amplifying the signal from key interest-turning-point items while concurrently identifying and suppressing noise within the user sequence. Our method works well on real-world data without being computationally expensive.
arXiv Detail & Related papers (2026-01-22T05:55:21Z) - Lost in the Noise: How Reasoning Models Fail with Contextual Distractors [57.31788955167306]
Recent advances in reasoning models and agentic AI systems have led to an increased reliance on diverse external information. We introduce NoisyBench, a comprehensive benchmark that systematically evaluates model robustness across 11 datasets in RAG, reasoning, alignment, and tool-use tasks. Our evaluation reveals a catastrophic performance drop of up to 80% in state-of-the-art models when faced with contextual distractors.
arXiv Detail & Related papers (2026-01-12T05:43:51Z) - Indirect Attention: Turning Context Misalignment into a Feature [2.3425919199730694]
This work explores a less conventional scenario in which keys and values originate from different sequences or modalities. We first analyze the attention mechanism's behavior under noisy value features, establishing a critical noise threshold. We then model context (key, value) misalignment as an effective form of structured noise within the value features, demonstrating that the noise induced by such misalignment can substantially exceed this critical threshold. Motivated by this, we introduce Indirect Attention, a modified attention mechanism that infers relevance indirectly in scenarios with misaligned context.
arXiv Detail & Related papers (2025-09-30T09:44:00Z) - Semantic Item Graph Enhancement for Multimodal Recommendation [49.66272783945571]
Multimodal recommendation systems have attracted increasing attention for their improved performance by leveraging items' multimodal information. Prior methods often build modality-specific item-item semantic graphs from raw modality features. These semantic graphs suffer from semantic deficiencies, including insufficient modeling of collaborative signals among items.
arXiv Detail & Related papers (2025-08-08T09:20:50Z) - Mitigating Attention Hacking in Preference-Based Reward Modeling via Interaction Distillation [62.14692332209628]
"Interaction Distillation" is a novel training framework for more adequate preference modeling through attention-level optimization. It provides more stable and generalizable reward signals compared to state-of-the-art RM optimization methods.
arXiv Detail & Related papers (2025-08-04T17:06:23Z) - Focus What Matters: Matchability-Based Reweighting for Local Feature Matching [6.361840891399624]
We propose a novel attention reweighting mechanism that incorporates a learnable bias term into the attention logits. Experiments conducted on three benchmark datasets validate the effectiveness of our method.
arXiv Detail & Related papers (2025-05-04T15:50:28Z) - Long-Sequence Recommendation Models Need Decoupled Embeddings [49.410906935283585]
We identify and characterize a neglected deficiency in existing long-sequence recommendation models. A single set of embeddings struggles with learning both attention and representation, leading to interference between these two processes. We propose the Decoupled Attention and Representation Embeddings (DARE) model, where two distinct embedding tables are learned separately to fully decouple attention and representation.
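The decoupling described in the DARE summary can be sketched in a few lines: one embedding table scores the relevance of history items against the target, while a second table supplies the vectors that are actually pooled. The table names, the dot-product scoring, and the softmax pooling are illustrative assumptions, not the paper's exact architecture.

```python
import numpy as np

def dare_attention(seq_ids, target_id, E_attn, E_repr):
    """Sketch of decoupled attention/representation embeddings.

    E_attn is used only to compute attention weights over the history;
    E_repr is used only to build the pooled user representation, so the
    two roles no longer interfere in a shared table.
    """
    scores = E_attn[seq_ids] @ E_attn[target_id]     # attention logits
    w = np.exp(scores - scores.max())                # stable softmax
    w /= w.sum()
    return w @ E_repr[seq_ids]                       # pooled representation
```

Using a single table for both roles would amount to setting `E_repr = E_attn`, which is exactly the interference the decoupled design avoids.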
arXiv Detail & Related papers (2024-10-03T15:45:15Z) - Calibrating Undisciplined Over-Smoothing in Transformer for Weakly Supervised Semantic Segmentation [51.14107156747967]
Weakly supervised semantic segmentation (WSSS) has attracted considerable attention because it requires fewer annotations than fully supervised approaches. We propose an Adaptive Re-Activation Mechanism (AReAM) that controls deep-level attention to rein in undisciplined over-smoothing. AReAM substantially improves segmentation performance compared with existing WSSS methods, reducing noise while sharpening focus on relevant semantic regions.
arXiv Detail & Related papers (2023-05-04T19:11:33Z)
This list is automatically generated from the titles and abstracts of the papers in this site.