OxygenREC: An Instruction-Following Generative Framework for E-commerce Recommendation
- URL: http://arxiv.org/abs/2512.22386v2
- Date: Wed, 31 Dec 2025 14:32:49 GMT
- Title: OxygenREC: An Instruction-Following Generative Framework for E-commerce Recommendation
- Authors: Xuegang Hao, Ming Zhang, Alex Li, Xiangyu Qian, Zhi Ma, Yanlong Zang, Shijie Yang, Zhongxuan Han, Xiaolong Ma, Jinguang Liu, Zhen Li, Zhida Jiang, Shusheng Wang, Ning Tang, Yanchen Qiao, Chenxiang Yang, Chen Sun, Jincheng Yuan, Chunhua Peng, Heng Hu, Peijun Yang, Baopeng Yuan, Caiyun Qiu, Zhaolong Xing, Haofei Yuan, Haipeng Zhang, Yuzhang Guo, Weijie Ding, Jiahua Gao, Hao Huang, Zhen Chen, Tongxuan Liu, Pinghua Gong,
- Abstract summary: We present OxygenREC, an industrial recommendation system that leverages Fast-Slow Thinking to deliver deep reasoning with strict latency and multi-scenario requirements.<n>We introduce a semantic alignment mechanism with Instruction-Guided Retrieval (IGR) to filter intent-relevant historical behaviors.<n>We transform scenario information into controllable instructions, using unified reward mapping and Soft Adaptive Group Clip Policy Optimization.
- Score: 19.602480244527182
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Traditional recommendation systems suffer from inconsistency in multi-stage optimization objectives. Generative Recommendation (GR) mitigates them through an end-to-end framework; however, existing methods still rely on matching mechanisms based on inductive patterns. Although responsive, they lack the ability to uncover complex user intents that require deductive reasoning based on world knowledge. Meanwhile, LLMs show strong deep reasoning capabilities, but their latency and computational costs remain challenging for industrial applications. More critically, there are performance bottlenecks in multi-scenario scalability: as shown in Figure 1, existing solutions require independent training and deployment for each scenario, leading to low resource utilization and high maintenance costs-a challenge unaddressed in GR literature. To address these, we present OxygenREC, an industrial recommendation system that leverages Fast-Slow Thinking to deliver deep reasoning with strict latency and multi-scenario requirements of real-world environments. First, we adopt a Fast-Slow Thinking architecture. Slow thinking uses a near-line LLM pipeline to synthesize Contextual Reasoning Instructions, while fast thinking employs a high-efficiency encoder-decoder backbone for real-time generation. Second, to ensure reasoning instructions effectively enhance recommendation generation, we introduce a semantic alignment mechanism with Instruction-Guided Retrieval (IGR) to filter intent-relevant historical behaviors and use a Query-to-Item (Q2I) loss for instruction-item consistency. Finally, to resolve multi-scenario scalability, we transform scenario information into controllable instructions, using unified reward mapping and Soft Adaptive Group Clip Policy Optimization (SA-GCPO) to align policies with diverse business objectives, realizing a train-once-deploy-everywhere paradigm.
Related papers
- Is Agentic RAG worth it? An experimental comparison of RAG approaches [0.07777489763207261]
"Retrieval-Augmented Generation" systems are usually defined by the combination of a generator and a retrieval component.<n>These shortcomings have motivated the development of "Enhanced" RAG.<n>The growing self-reflective capabilities of Large Language Models have enabled a new paradigm, which we refer to as "Agentic" RAG.
arXiv Detail & Related papers (2026-01-12T16:43:44Z) - MAESTRO: Meta-learning Adaptive Estimation of Scalarization Trade-offs for Reward Optimization [56.074760766965085]
Group-Relative Policy Optimization has emerged as an efficient paradigm for aligning Large Language Models (LLMs)<n>We propose MAESTRO, which treats reward scalarization as a dynamic latent policy, leveraging the model's terminal hidden states as a semantic bottleneck.<n>We formulate this as a contextual bandit problem within a bi-level optimization framework, where a lightweight Conductor network co-evolves with the policy by utilizing group-relative advantages as a meta-reward signal.
arXiv Detail & Related papers (2026-01-12T05:02:48Z) - Dynamic Generation of Multi-LLM Agents Communication Topologies with Graph Diffusion Models [99.85131798240808]
We introduce a novel generative framework called textitGuided Topology Diffusion (GTD)<n>Inspired by conditional discrete graph diffusion models, GTD formulates topology synthesis as an iterative construction process.<n>At each step, the generation is steered by a lightweight proxy model that predicts multi-objective rewards.<n>Experiments show that GTD can generate highly task-adaptive, sparse, and efficient communication topologies.
arXiv Detail & Related papers (2025-10-09T05:28:28Z) - STARec: An Efficient Agent Framework for Recommender Systems via Autonomous Deliberate Reasoning [54.28691219536054]
We introduce STARec, a slow-thinking augmented agent framework that endows recommender systems with autonomous deliberative reasoning capabilities.<n>We develop anchored reinforcement training - a two-stage paradigm combining structured knowledge distillation from advanced reasoning models with preference-aligned reward shaping.<n>Experiments on MovieLens 1M and Amazon CDs benchmarks demonstrate that STARec achieves substantial performance gains compared with state-of-the-art baselines.
arXiv Detail & Related papers (2025-08-26T08:47:58Z) - RecLLM-R1: A Two-Stage Training Paradigm with Reinforcement Learning and Chain-of-Thought v1 [20.92548890511589]
This paper introduces RecLLM-R1, a novel recommendation framework leveraging Large Language Models (LLMs)<n> RecLLM-R1 significantly surpasses existing baseline methods across a spectrum of evaluation metrics, including accuracy, diversity, and novelty.
arXiv Detail & Related papers (2025-06-24T01:39:34Z) - Reinforced Latent Reasoning for LLM-based Recommendation [92.56166822197919]
Large Language Models (LLMs) have demonstrated impressive reasoning capabilities in complex problem-solving tasks.<n>Existing methods typically rely on fine-tuning with explicit chain-of-thought (CoT) data.<n>In this work, we explore an alternative approach that shifts from explicit CoT reasoning to compact, information-dense latent reasoning.
arXiv Detail & Related papers (2025-05-25T11:03:45Z) - AdaReasoner: Adaptive Reasoning Enables More Flexible Thinking in Large Language Models [38.052380726704335]
AdaReasoner is an LLM-agnostic plugin designed for any LLM to automate adaptive reasoning configurations.<n>AdaReasoner is trained using a reinforcement learning (RL) framework, combining a factorized action space with a targeted exploration strategy.<n>It consistently outperforms standard baselines, preserves out-of-distribution robustness, and yield gains on knowledge-intensive tasks through tailored prompts.
arXiv Detail & Related papers (2025-05-22T22:06:11Z) - Fast or Better? Balancing Accuracy and Cost in Retrieval-Augmented Generation with Flexible User Control [52.405085773954596]
Retrieval-Augmented Generation has emerged as a powerful approach to mitigate large language model hallucinations.<n>Existing RAG frameworks often apply retrieval indiscriminately,leading to inefficiencies-over-retrieving.<n>We introduce a novel user-controllable RAG framework that enables dynamic adjustment of the accuracy-cost trade-off.
arXiv Detail & Related papers (2025-02-17T18:56:20Z) - Agentic Retrieval-Augmented Generation: A Survey on Agentic RAG [0.8463972278020965]
Large Language Models (LLMs) have revolutionized artificial intelligence (AI) by enabling human like text generation and natural language understanding.<n>Retrieval Augmented Generation (RAG) has emerged as a solution, enhancing LLMs by integrating real time data retrieval to provide contextually relevant responses.<n>Agentic Retrieval-Augmented Generation (RAG) transcends these limitations by embedding autonomous AI agents into the RAG pipeline.
arXiv Detail & Related papers (2025-01-15T20:40:25Z) - Reference Trustable Decoding: A Training-Free Augmentation Paradigm for Large Language Models [79.41139393080736]
Large language models (LLMs) have rapidly advanced and demonstrated impressive capabilities.
In-Context Learning (ICL) and.
Efficient Fine-Tuning (PEFT) are currently two mainstream methods for augmenting.
LLMs to downstream tasks.
We propose Reference Trustable Decoding (RTD), a paradigm that allows models to quickly adapt to new tasks without fine-tuning.
arXiv Detail & Related papers (2024-09-30T10:48:20Z) - LLM4MSR: An LLM-Enhanced Paradigm for Multi-Scenario Recommendation [52.55639178180821]
The study on multi-scenario recommendation (MSR) has attracted much attention, which uses the data from all scenarios to simultaneously improve their recommendation performance.<n>Existing methods tend to integrate insufficient scenario knowledge and neglect learning personalized cross-scenario preferences, thus leading to sub-optimal performance.<n>We propose a large language model (LLM)-enhanced paradigm LLM4MSR to fill these gaps.
arXiv Detail & Related papers (2024-06-18T11:59:36Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.