Related papers: Learning User Interests via Reasoning and Distillation for Cross-Domain News Recommendation

URL: http://arxiv.org/abs/2602.15005v1
Date: Mon, 16 Feb 2026 18:45:40 GMT
Title: Learning User Interests via Reasoning and Distillation for Cross-Domain News Recommendation
Authors: Mengdan Zhu, Yufan Zhao, Tao Di, Yulan Yan, Liang Zhao
Abstract summary: News recommendation plays a critical role in online news platforms by helping users discover relevant content. Cross-domain news recommendation further requires inferring a user's underlying information needs from heterogeneous signals. We present a reinforcement learning framework that trains large language models to generate high-quality lists of interest-driven news search queries.
Score: 7.070021001906444
License: http://creativecommons.org/licenses/by/4.0/
Abstract: News recommendation plays a critical role in online news platforms by helping users discover relevant content. Cross-domain news recommendation further requires inferring a user's underlying information needs from heterogeneous signals that often extend beyond direct news consumption. A key challenge lies in moving beyond surface-level behaviors to capture deeper, reusable user interests while maintaining scalability in large-scale production systems. In this paper, we present a reinforcement learning framework that trains large language models to generate high-quality lists of interest-driven news search queries from cross-domain user signals. We formulate query-list generation as a policy optimization problem and employ GRPO with multiple reward signals. We systematically study two compute dimensions, inference-time sampling and model capacity, and empirically observe consistent improvements with increased compute that exhibit scaling-like behavior. Finally, we perform on-policy distillation to transfer the learned policy from a large, compute-intensive teacher to a compact student model suitable for scalable deployment. Extensive offline experiments, ablation studies, and large-scale online A/B tests in a production news recommendation system demonstrate consistent gains in both interest modeling quality and downstream recommendation performance.
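The abstract says query-list generation is optimized with GRPO using multiple reward signals, but it does not name those signals. As a minimal sketch under that description, the snippet below aggregates several per-sample rewards with a weighted sum and then computes GRPO's group-relative advantage, standardizing each reward against the other samples drawn for the same prompt. The reward names (`relevance`, `diversity`, `format`), their weights, and the weighted-sum aggregation are hypothetical assumptions, not the paper's configuration.

```python
import statistics
from typing import Dict, List

# Hypothetical reward names and weights; the paper's actual signals are not listed here.
WEIGHTS: Dict[str, float] = {"relevance": 0.6, "diversity": 0.3, "format": 0.1}


def combine_rewards(signals: Dict[str, float], weights: Dict[str, float]) -> float:
    """Aggregate several reward signals for one sampled query list (assumed weighted sum)."""
    return sum(weights[name] * value for name, value in signals.items())


def group_relative_advantages(rewards: List[float], eps: float = 1e-6) -> List[float]:
    """GRPO-style advantage: standardize each reward within its own sampling group."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)  # population std over the group
    return [(r - mean) / (std + eps) for r in rewards]


if __name__ == "__main__":
    # One prompt (a user's cross-domain signals) with four sampled query lists.
    group = [
        {"relevance": 0.9, "diversity": 0.5, "format": 1.0},
        {"relevance": 0.4, "diversity": 0.8, "format": 1.0},
        {"relevance": 0.7, "diversity": 0.2, "format": 0.0},
        {"relevance": 0.2, "diversity": 0.1, "format": 1.0},
    ]
    rewards = [combine_rewards(s, WEIGHTS) for s in group]
    advantages = group_relative_advantages(rewards)
    for r, a in zip(rewards, advantages):
        print(f"reward={r:.2f} advantage={a:+.2f}")
```

Because advantages are relative to the group, a query list is pushed up only if it beats its siblings, which removes the need for a learned value baseline. The advantages would then weight the clipped policy-gradient update, which is not shown here.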

Related papers

CroPS: Improving Dense Retrieval with Cross-Perspective Positive Samples in Short-Video Search [10.310885252492925]
CroPS (Cross-Perspective Positive Samples) is a novel retrieval data engine. It enhances training with positive signals derived from user query reformulation behavior. CroPS is now fully deployed in Kuaishou Search, serving hundreds of millions of users daily.
arXiv (2025-11-19T13:57:40Z)
From Data-Centric to Sample-Centric: Enhancing LLM Reasoning via Progressive Optimization [7.531052649961168]
Reinforcement learning with verifiable rewards (RLVR) has recently advanced the reasoning capabilities of large language models (LLMs). We investigate RLVR from a sample-centric perspective and introduce LPPO, a framework of progressive optimization techniques. Our work addresses a critical question: how to best leverage a small set of trusted, high-quality demonstrations, rather than simply scaling up data volume.
arXiv (2025-07-09T06:05:28Z)
Scalable In-Context Q-Learning [68.9917436397079]
We propose Scalable In-Context Q-Learning (SICQL) to steer in-context reinforcement learning (ICRL). SICQL harnesses dynamic programming and world modeling to steer ICRL toward efficient reward and task generalization.
arXiv (2025-06-02T04:21:56Z)
Exploring Training and Inference Scaling Laws in Generative Retrieval [50.82554729023865]
Generative retrieval reformulates retrieval as an autoregressive generation task, where large language models generate target documents directly from a query. We systematically investigate training and inference scaling laws in generative retrieval, exploring how model size, training data scale, and inference-time compute jointly influence performance.
arXiv (2025-03-24T17:59:03Z)
A Recommendation Model Utilizing Separation Embedding and Self-Attention for Feature Mining [7.523158123940574]
Recommendation systems provide users with content that meets their needs. Traditional click-through rate prediction and TOP-K recommendation mechanisms are unable to meet these recommendation needs. This paper proposes a recommendation system model based on a separation embedding cross-network.
arXiv (2024-10-19T07:49:21Z)
Self-Supervised Hypergraph Transformer for Recommender Systems [25.07482350586435]
In the Self-Supervised Hypergraph Transformer (SHT), a cross-view generative self-supervised learning component is proposed for data augmentation over the user-item interaction graph.
arXiv (2022-07-28T18:40:30Z)
CCLF: A Contrastive-Curiosity-Driven Learning Framework for Sample-Efficient Reinforcement Learning [56.20123080771364]
We develop a model-agnostic Contrastive-Curiosity-Driven Learning Framework (CCLF) for reinforcement learning. CCLF fully exploits sample importance and improves learning efficiency in a self-supervised manner. We evaluate this approach on the DeepMind Control Suite, Atari, and MiniGrid benchmarks.
arXiv (2022-05-02T14:42:05Z)
SURF: Semi-supervised Reward Learning with Data Augmentation for Feedback-efficient Preference-based Reinforcement Learning [168.89470249446023]
We present SURF, a semi-supervised reward learning framework that utilizes a large amount of unlabeled samples with data augmentation. In order to leverage unlabeled samples for reward learning, we infer pseudo-labels of the unlabeled samples based on the confidence of the preference predictor. Our experiments demonstrate that our approach significantly improves the feedback-efficiency of the preference-based method on a variety of locomotion and robotic manipulation tasks.
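The SURF summary says pseudo-labels are inferred for unlabeled samples from the confidence of the preference predictor. A minimal sketch of that selection step follows, assuming the predictor outputs the probability that the first of two behavior segments is preferred and that a fixed confidence threshold (0.9 here, a hypothetical value) decides which pairs receive a pseudo-label. The function names are illustrative, and the data augmentation component is not shown.

```python
from typing import List, Optional, Tuple


def pseudo_label(prob_first_preferred: float, threshold: float = 0.9) -> Optional[int]:
    """Return 0 if segment A is confidently preferred, 1 if segment B is, else None."""
    confidence = max(prob_first_preferred, 1.0 - prob_first_preferred)
    if confidence < threshold:
        return None  # too uncertain: leave this pair unlabeled
    return 0 if prob_first_preferred >= 0.5 else 1


def select_pseudo_labels(probs: List[float], threshold: float = 0.9) -> List[Tuple[int, int]]:
    """Keep (pair_index, label) only for unlabeled pairs the predictor is confident about."""
    selected = []
    for i, p in enumerate(probs):
        label = pseudo_label(p, threshold)
        if label is not None:
            selected.append((i, label))
    return selected


if __name__ == "__main__":
    # Hypothetical predictor outputs P(segment A preferred over B) for four unlabeled pairs.
    predicted = [0.95, 0.60, 0.03, 0.50]
    print(select_pseudo_labels(predicted, threshold=0.9))  # [(0, 0), (2, 1)]
```

The selected pairs would then be mixed into the human-labeled preference data when training the reward model; raising the threshold trades fewer pseudo-labels for less label noise.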
arXiv (2022-03-18T16:50:38Z)
Generative Adversarial Reward Learning for Generalized Behavior Tendency Inference [71.11416263370823]
We propose a generative inverse reinforcement learning approach for user behavioral preference modelling. Our model can automatically learn rewards from users' actions based on a discriminative actor-critic network and a Wasserstein GAN.
arXiv (2021-05-03T13:14:25Z)
Self-supervised Learning for Large-scale Item Recommendations [18.19202958502061]
Large scale recommender models find most relevant items from huge catalogs. With millions to billions of items in the corpus, users tend to provide feedback for a very small set of them. We propose a multi-task self-supervised learning framework for large-scale item recommendations.
arXiv (2020-07-25T06:21:43Z)
Self-Supervised Reinforcement Learning for Recommender Systems [77.38665506495553]
We propose self-supervised reinforcement learning for sequential recommendation tasks. Our approach augments standard recommendation models with two output layers: one for self-supervised learning and the other for RL. Based on such an approach, we propose two frameworks, namely Self-Supervised Q-learning (SQN) and Self-Supervised Actor-Critic (SAC).
arXiv (2020-06-10T11:18:57Z)

This list is automatically generated from the titles and abstracts of the papers in this site.