Learning User Interests via Reasoning and Distillation for Cross-Domain News Recommendation
- URL: http://arxiv.org/abs/2602.15005v1
- Date: Mon, 16 Feb 2026 18:45:40 GMT
- Title: Learning User Interests via Reasoning and Distillation for Cross-Domain News Recommendation
- Authors: Mengdan Zhu, Yufan Zhao, Tao Di, Yulan Yan, Liang Zhao,
- Abstract summary: News recommendation plays a critical role in online news platforms by helping users discover relevant content.<n>Cross-domain news recommendation further requires inferring user's underlying information needs from heterogeneous signals.<n>We present a reinforcement learning framework that trains large language models to generate high-quality lists of interest-driven news search queries.
- Score: 7.070021001906444
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: News recommendation plays a critical role in online news platforms by helping users discover relevant content. Cross-domain news recommendation further requires inferring user's underlying information needs from heterogeneous signals that often extend beyond direct news consumption. A key challenge lies in moving beyond surface-level behaviors to capture deeper, reusable user interests while maintaining scalability in large-scale production systems. In this paper, we present a reinforcement learning framework that trains large language models to generate high-quality lists of interest-driven news search queries from cross-domain user signals. We formulate query-list generation as a policy optimization problem and employ GRPO with multiple reward signals. We systematically study two compute dimensions: inference-time sampling and model capacity, and empirically observe consistent improvements with increased compute that exhibit scaling-like behavior. Finally, we perform on-policy distillation to transfer the learned policy from a large, compute-intensive teacher to a compact student model suitable for scalable deployment. Extensive offline experiments, ablation studies and large-scale online A/B tests in a production news recommendation system demonstrate consistent gains in both interest modeling quality and downstream recommendation performance.
Related papers
- CroPS: Improving Dense Retrieval with Cross-Perspective Positive Samples in Short-Video Search [10.310885252492925]
CroPS (Cross-Perspective Positive Samples) is a novel retrieval data engine.<n>It enhances training with positive signals derived from user query reformulation behavior.<n>CroPS is now fully deployed in Kuaishou Search, serving hundreds of millions of users daily.
arXiv Detail & Related papers (2025-11-19T13:57:40Z) - From Data-Centric to Sample-Centric: Enhancing LLM Reasoning via Progressive Optimization [7.531052649961168]
Reinforcement learning with verifiable rewards (RLVR) has recently advanced the reasoning capabilities of large language models (LLMs)<n>We investigate RLVR from a sample-centric perspective and introduce LPPO, a framework of progressive optimization techniques.<n>Our work addresses a critical question: how to best leverage a small set of trusted, high-quality demonstrations, rather than simply scaling up data volume.
arXiv Detail & Related papers (2025-07-09T06:05:28Z) - Scalable In-Context Q-Learning [68.9917436397079]
We propose textbfScalable textbfIn-textbfContext textbfQ-textbfLearning (textbfSICQL) to steer in-context reinforcement learning.<n>textbfSICQL harnesses dynamic programming and world modeling to steer ICRL toward efficient reward and task generalization.
arXiv Detail & Related papers (2025-06-02T04:21:56Z) - Exploring Training and Inference Scaling Laws in Generative Retrieval [50.82554729023865]
Generative retrieval reformulates retrieval as an autoregressive generation task, where large language models generate target documents directly from a query.<n>We systematically investigate training and inference scaling laws in generative retrieval, exploring how model size, training data scale, and inference-time compute jointly influence performance.
arXiv Detail & Related papers (2025-03-24T17:59:03Z) - A Recommendation Model Utilizing Separation Embedding and Self-Attention for Feature Mining [7.523158123940574]
Recommendation systems provide users with content that meets their needs.
Traditional click-through rate prediction and TOP-K recommendation mechanisms are unable to meet the recommendations needs.
This paper proposes a recommendations system model based on a separation embedding cross-network.
arXiv Detail & Related papers (2024-10-19T07:49:21Z) - Self-Supervised Hypergraph Transformer for Recommender Systems [25.07482350586435]
Self-Supervised Hypergraph Transformer (SHT)
Self-Supervised Hypergraph Transformer (SHT)
Cross-view generative self-supervised learning component is proposed for data augmentation over the user-item interaction graph.
arXiv Detail & Related papers (2022-07-28T18:40:30Z) - CCLF: A Contrastive-Curiosity-Driven Learning Framework for
Sample-Efficient Reinforcement Learning [56.20123080771364]
We develop a model-agnostic Contrastive-Curiosity-Driven Learning Framework (CCLF) for reinforcement learning.
CCLF fully exploit sample importance and improve learning efficiency in a self-supervised manner.
We evaluate this approach on the DeepMind Control Suite, Atari, and MiniGrid benchmarks.
arXiv Detail & Related papers (2022-05-02T14:42:05Z) - SURF: Semi-supervised Reward Learning with Data Augmentation for
Feedback-efficient Preference-based Reinforcement Learning [168.89470249446023]
We present SURF, a semi-supervised reward learning framework that utilizes a large amount of unlabeled samples with data augmentation.
In order to leverage unlabeled samples for reward learning, we infer pseudo-labels of the unlabeled samples based on the confidence of the preference predictor.
Our experiments demonstrate that our approach significantly improves the feedback-efficiency of the preference-based method on a variety of locomotion and robotic manipulation tasks.
arXiv Detail & Related papers (2022-03-18T16:50:38Z) - Generative Adversarial Reward Learning for Generalized Behavior Tendency
Inference [71.11416263370823]
We propose a generative inverse reinforcement learning for user behavioral preference modelling.
Our model can automatically learn the rewards from user's actions based on discriminative actor-critic network and Wasserstein GAN.
arXiv Detail & Related papers (2021-05-03T13:14:25Z) - Self-supervised Learning for Large-scale Item Recommendations [18.19202958502061]
Large scale recommender models find most relevant items from huge catalogs.
With millions to billions of items in the corpus, users tend to provide feedback for a very small set of them.
We propose a multi-task self-supervised learning framework for large-scale item recommendations.
arXiv Detail & Related papers (2020-07-25T06:21:43Z) - Self-Supervised Reinforcement Learning for Recommender Systems [77.38665506495553]
We propose self-supervised reinforcement learning for sequential recommendation tasks.
Our approach augments standard recommendation models with two output layers: one for self-supervised learning and the other for RL.
Based on such an approach, we propose two frameworks namely Self-Supervised Q-learning(SQN) and Self-Supervised Actor-Critic(SAC)
arXiv Detail & Related papers (2020-06-10T11:18:57Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.