DiffuReason: Bridging Latent Reasoning and Generative Refinement for Sequential Recommendation
- URL: http://arxiv.org/abs/2602.09744v2
- Date: Thu, 12 Feb 2026 07:47:35 GMT
- Title: DiffuReason: Bridging Latent Reasoning and Generative Refinement for Sequential Recommendation
- Authors: Jie Jiang, Yang Wu, Qian Li, Yuling Xiong, Yihang Su, Junbang Huo, Longfei Lu, Jun Zhang, Huan Yu,
- Abstract summary: We propose DiffuReason, a unified "Think-then-Diffuse" framework for sequential recommendation. It integrates multi-step Thinking Tokens for latent reasoning, diffusion-based refinement for denoising intermediate representations, and end-to-end Group Relative Policy Optimization. Experiments on four benchmarks demonstrate that DiffuReason consistently improves diverse backbone architectures.
- Score: 20.756497463882763
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Latent reasoning has emerged as a promising paradigm for sequential recommendation, enabling models to capture complex user intent through multi-step deliberation. Yet existing approaches often rely on deterministic latent chains that accumulate noise and overlook the uncertainty inherent in user intent, and they are typically trained in staged pipelines that hinder joint optimization and exploration. To address these challenges, we propose DiffuReason, a unified "Think-then-Diffuse" framework for sequential recommendation. It integrates multi-step Thinking Tokens for latent reasoning, diffusion-based refinement for denoising intermediate representations, and end-to-end Group Relative Policy Optimization (GRPO) alignment to optimize for ranking performance. In the Think stage, the model generates Thinking Tokens that reason over user history to form an initial intent hypothesis. In the Diffuse stage, rather than treating this hypothesis as the final output, we refine it through a diffusion process that models user intent as a probabilistic distribution, providing iterative denoising against reasoning noise. Finally, GRPO-based reinforcement learning enables the reasoning and refinement modules to co-evolve throughout training, without the constraints of staged optimization. Extensive experiments on four benchmarks demonstrate that DiffuReason consistently improves diverse backbone architectures. Online A/B tests on a large-scale industrial platform further validate its practical effectiveness.
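The abstract does not give DiffuReason's equations. As a rough illustration only, the sketch below shows two standard building blocks that the abstract names:

- DDPM-style ancestral sampling that refines an intent vector conditioned on an initial hypothesis (the Diffuse stage).
- The group-relative advantage used in GRPO.

The linear noise schedule, `toy_eps_predictor` (a hypothetical stand-in for a learned denoiser), the example hypothesis, and the reward values are all assumptions. This is not the authors' implementation.

```python
import math
import random
import statistics


def linear_betas(T, beta_min=1e-4, beta_max=0.02):
    """Standard linear DDPM noise schedule (assumed; the paper's schedule is not given here)."""
    return [beta_min + (beta_max - beta_min) * t / (T - 1) for t in range(T)]


def toy_eps_predictor(x_t, alpha_bar_t, cond):
    """HYPOTHETICAL stand-in for a learned denoiser eps_theta(x_t, t, cond).
    It assumes the clean intent equals `cond` and returns the exact noise."""
    s = math.sqrt(alpha_bar_t)
    return [(x - s * c) / math.sqrt(1.0 - alpha_bar_t) for x, c in zip(x_t, cond)]


def diffuse_refine(cond, T=50, rng=None):
    """DDPM ancestral sampling from pure noise, conditioned on an intent hypothesis."""
    rng = rng or random.Random(0)
    betas = linear_betas(T)
    alphas = [1.0 - b for b in betas]
    alpha_bars, prod = [], 1.0
    for a in alphas:  # cumulative products alpha_bar_t
        prod *= a
        alpha_bars.append(prod)
    x = [rng.gauss(0.0, 1.0) for _ in cond]  # x_T ~ N(0, I)
    for t in reversed(range(T)):
        eps = toy_eps_predictor(x, alpha_bars[t], cond)
        coef = betas[t] / math.sqrt(1.0 - alpha_bars[t])
        # Posterior mean: (x_t - beta_t / sqrt(1 - alpha_bar_t) * eps) / sqrt(alpha_t)
        mean = [(xi - coef * ei) / math.sqrt(alphas[t]) for xi, ei in zip(x, eps)]
        if t > 0:
            sigma = math.sqrt(betas[t])  # the sigma_t^2 = beta_t variance choice
            x = [m + sigma * rng.gauss(0.0, 1.0) for m in mean]
        else:
            x = mean  # no noise is added on the final step
    return x


def grpo_advantages(rewards, eps=1e-8):
    """Group-relative advantage: (r_i - mean(r)) / (std(r) + eps) within one sampled group."""
    mu = statistics.fmean(rewards)
    sd = statistics.pstdev(rewards)
    return [(r - mu) / (sd + eps) for r in rewards]


# Usage: refine one hypothesis, then score a hypothetical group of G = 4 candidates.
hypothesis = [0.2, -0.5, 1.0]   # hypothetical output of the Think stage
refined = diffuse_refine(hypothesis)
rewards = [0.9, 0.1, 0.4, 0.6]  # hypothetical per-candidate ranking rewards
print([round(a, 3) for a in grpo_advantages(rewards)])  # → [1.372, -1.372, -0.343, 0.343]
```

The toy predictor returns the exact noise for a clean vector equal to the hypothesis, so the chain lands back on the hypothesis. That makes the sampler easy to check. In a trained model the denoiser would be learned, so the refined output would generally differ from its input.

Within a group, the advantages sum to zero. Candidates that score above the group mean are therefore pushed up and those below it are pushed down, without a separate value network.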
Related papers
- PROMISE: Process Reward Models Unlock Test-Time Scaling Laws in Generative Recommendations [52.67948063133533]
Generative Recommendation has emerged as a promising paradigm, reformulating recommendation as a sequence-to-sequence generation task over hierarchical Semantic IDs. Existing methods suffer from a critical issue we term Semantic Drift, where errors in early, high-level tokens irreversibly divert the generation trajectory into irrelevant semantic subspaces. We propose Promise, a novel framework that integrates dense, step-by-step verification into generative models.
(2026-01-08T07:38:46Z)
- Listwise Preference Diffusion Optimization for User Behavior Trajectories Prediction [41.53271688465831]
We formulate User Behavior Trajectory Prediction (UBTP) as a new task setting that explicitly models long-term user preferences. We introduce Listwise Preference Diffusion Optimization (LPDO), a diffusion-based training framework that directly optimizes structured preferences over entire item sequences. To rigorously evaluate multi-step prediction quality, we propose the task-specific metric Sequential Match (SeqMatch), which measures exact trajectory agreement, and adopt Perplexity (PPL), which assesses probabilistic fidelity.
(2025-11-01T12:16:24Z)
- Latent Chain-of-Thought for Visual Reasoning [53.541579327424046]
Chain-of-thought (CoT) reasoning is critical for improving the interpretability and reliability of Large Vision-Language Models (LVLMs). We reformulate reasoning in LVLMs as posterior inference and propose a scalable training algorithm based on amortized variational inference. We empirically demonstrate that the proposed method enhances state-of-the-art LVLMs on seven reasoning benchmarks.
(2025-10-27T23:10:06Z)
- Step-Aware Policy Optimization for Reasoning in Diffusion Large Language Models [57.42778606399764]
Diffusion language models (dLLMs) offer a promising, non-autoregressive paradigm for text generation. Current reinforcement learning approaches often rely on sparse, outcome-based rewards. We argue that this stems from a fundamental mismatch with the natural structure of reasoning.
(2025-10-02T00:34:15Z)
- SPREAD: Sampling-based Pareto front Refinement via Efficient Adaptive Diffusion [0.8594140167290097]
SPREAD is a generative framework based on Denoising Diffusion Probabilistic Models (DDPMs). It learns a conditional diffusion process over points sampled from the decision space. It refines candidates via a sampling scheme that uses an adaptive update, inspired by multiple gradient descent, for fast convergence.
(2025-09-25T12:09:37Z)
- REG4Rec: Reasoning-Enhanced Generative Model for Large-Scale Recommendation Systems [25.59169452367297]
Sequential recommendation aims to predict a user's next action in large-scale recommender systems. Recent studies have introduced a reasoning process into generative recommendation, significantly improving recommendation performance. These approaches are constrained by the singularity of item semantic representations. We introduce REG4Rec, a reasoning-enhanced generative model that constructs multiple dynamic semantic reasoning paths.
(2025-08-21T07:02:51Z)
- ConciseHint: Boosting Efficient Reasoning via Continuous Concise Hints during Generation [74.37307916314407]
We propose a framework dubbed ConciseHint, which continuously encourages the reasoning model to speak concisely. Experiments on state-of-the-art LRMs, including the DeepSeek-R1 and Qwen-3 series, demonstrate that our method can effectively produce concise reasoning.
(2025-06-23T16:20:44Z)
- Extended Inductive Reasoning for Personalized Preference Inference from Behavioral Signals [45.019257216564036]
This paper investigates extended inductive reasoning in large language models (LLMs). We propose AlignXplore, a model that enables systematic preference inference from behavioral signals in users' interaction histories. We show that AlignXplore improves over the backbone model by an average of 15.49% on in-domain and out-of-domain benchmarks.
(2025-05-23T16:16:46Z)
- LARES: Latent Reasoning for Sequential Recommendation [96.26996622771593]
We present LARES, a novel and scalable LAtent REasoning framework for Sequential recommendation. Our proposed approach employs a recurrent architecture that allows flexible expansion of reasoning depth without increasing parameter complexity. Our framework exhibits seamless compatibility with existing advanced models, further improving their recommendation performance.
(2025-05-22T16:22:54Z)
- Diffusion Generative Recommendation with Continuous Tokens [21.222713476105195]
ContRec is a framework that seamlessly integrates continuous tokens into LLM-based RecSys. We show that ContRec consistently outperforms both traditional and SOTA LLM-based recommender systems. Our results highlight the potential of continuous tokenization and generative modeling for advancing the next generation of recommender systems.
(2025-04-16T12:01:03Z)
- Slow Thinking for Sequential Recommendation [88.46598279655575]
We present a novel slow thinking recommendation model, named STREAM-Rec. Our approach is capable of analyzing historical user behavior, generating a multi-step, deliberative reasoning process, and delivering personalized recommendations. In particular, we focus on two key challenges: (1) identifying the suitable reasoning patterns in recommender systems, and (2) exploring how to effectively stimulate the reasoning capabilities of traditional recommenders.
(2025-04-13T15:53:30Z)
This list is automatically generated from the titles and abstracts of the papers on this site.