LLM-Enhanced Reinforcement Learning for Long-Term User Satisfaction in Interactive Recommendation
- URL: http://arxiv.org/abs/2601.19585v1
- Date: Tue, 27 Jan 2026 13:22:30 GMT
- Title: LLM-Enhanced Reinforcement Learning for Long-Term User Satisfaction in Interactive Recommendation
- Authors: Chongjun Xia, Yanchun Peng, Xianzhi Wang
- Abstract summary: We propose LLM-Enhanced Reinforcement Learning (LERL), a novel hierarchical recommendation framework. LERL consists of a high-level LLM-based planner that selects semantically diverse content categories, and a low-level RL policy that recommends personalized items. LERL significantly improves long-term user satisfaction when compared with state-of-the-art baselines.
- Score: 3.247395557141079
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Interactive recommender systems can dynamically adapt to user feedback, but often suffer from content homogeneity and filter bubble effects due to overfitting short-term user preferences. While recent efforts aim to improve content diversity, they predominantly operate in static or one-shot settings, neglecting the long-term evolution of user interests. Reinforcement learning provides a principled framework for optimizing long-term user satisfaction by modeling sequential decision-making processes. However, its application in recommendation is hindered by sparse, long-tailed user-item interactions and limited semantic planning capabilities. In this work, we propose LLM-Enhanced Reinforcement Learning (LERL), a novel hierarchical recommendation framework that integrates the semantic planning power of LLM with the fine-grained adaptability of RL. LERL consists of a high-level LLM-based planner that selects semantically diverse content categories, and a low-level RL policy that recommends personalized items within the selected semantic space. This hierarchical design narrows the action space, enhances planning efficiency, and mitigates overexposure to redundant content. Extensive experiments on real-world datasets demonstrate that LERL significantly improves long-term user satisfaction when compared with state-of-the-art baselines. The implementation of LERL is available at https://anonymous.4open.science/r/code3-18D3/.
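The abstract describes a two-level control loop: a high-level LLM planner chooses semantically diverse categories, and a low-level RL policy picks items within the chosen category, narrowing the action space. A minimal sketch of that loop, with all names (`llm_plan_categories`, `rl_select_item`, etc.) being illustrative assumptions rather than the paper's API; the LLM planner is stubbed with a least-exposed-category heuristic and the RL policy with a greedy scored lookup:

```python
# Illustrative sketch of a hierarchical planner/policy recommendation step.
# The category catalog, stubs, and function names are assumptions for
# demonstration only, not the LERL implementation.

CATALOG = {
    "sports":  ["s1", "s2", "s3"],
    "music":   ["m1", "m2"],
    "cooking": ["c1", "c2", "c3"],
}

def llm_plan_categories(history, k=2):
    """High-level planner: pick k categories the user has seen least,
    a stand-in for the LLM's semantic-diversity planning."""
    counts = {c: sum(1 for item in history if item in CATALOG[c])
              for c in CATALOG}
    return sorted(CATALOG, key=lambda c: counts[c])[:k]

def rl_select_item(category, history, item_scores):
    """Low-level policy: recommend the highest-scored unseen item
    inside the planner's chosen category (greedy stand-in for RL)."""
    candidates = [i for i in CATALOG[category] if i not in history]
    if not candidates:
        return None
    return max(candidates, key=lambda i: item_scores.get(i, 0.0))

def recommend(history, item_scores):
    """One interaction step: plan diverse categories, then personalize
    within the narrowed action space."""
    for category in llm_plan_categories(history):
        item = rl_select_item(category, history, item_scores)
        if item is not None:
            return category, item
    return None, None

history = ["s1", "s2"]                       # user has only seen sports
scores = {"m1": 0.9, "m2": 0.4, "c1": 0.7}   # stub for learned item scores
print(recommend(history, scores))            # → ('music', 'm1')
```

The key property the sketch mirrors is that the policy never scores the full item space: the planner's category choice bounds the candidate set, which is how the paper argues planning efficiency and reduced overexposure are achieved.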
Related papers
- Balancing Fine-tuning and RAG: A Hybrid Strategy for Dynamic LLM Recommendation Updates [11.974496007403694]
Large Language Models (LLMs) empower recommendation systems through their advanced reasoning and planning capabilities. This paper investigates strategies for updating LLM-powered recommenders, focusing on the trade-offs between ongoing fine-tuning and Retrieval-Augmented Generation (RAG). We propose a hybrid update strategy that combines the long-term knowledge adaptation of periodic fine-tuning with the agility of low-cost RAG.
arXiv Detail & Related papers (2025-10-23T06:31:00Z)
- Using LLMs to Capture Users' Temporal Context for Recommendation [3.719862246745416]
This paper presents an assessment of Large Language Models (LLMs) for generating semantically rich, time-aware user profiles. We do not propose a novel end-to-end recommendation architecture; our core contribution is a systematic investigation into the degree of LLM effectiveness. The evaluation across the Movies&TV and Video Games domains suggests that while LLM-generated profiles offer semantic depth and temporal structure, their effectiveness for context-aware recommendations is notably contingent on the richness of user interaction histories.
arXiv Detail & Related papers (2025-08-11T22:48:31Z)
- Temporal User Profiling with LLMs: Balancing Short-Term and Long-Term Preferences for Recommendations [3.719862246745416]
We propose a novel method for user profiling that explicitly models short-term and long-term preferences. LLM-TUP achieves substantial improvements over several baselines.
arXiv Detail & Related papers (2025-08-11T20:28:24Z)
- DeepRec: Towards a Deep Dive Into the Item Space with Large Language Model Based Recommendation [83.21140655248624]
Large language models (LLMs) have been introduced into recommender systems (RSs). We propose DeepRec, a novel LLM-based RS that enables autonomous multi-turn interactions between LLMs and TRMs for deep exploration of the item space. Experiments on public datasets demonstrate that DeepRec significantly outperforms both traditional and LLM-based baselines.
arXiv Detail & Related papers (2025-05-22T15:49:38Z)
- User Feedback Alignment for LLM-powered Exploration in Large-scale Recommendation Systems [26.652050105571206]
Exploration, the act of broadening user experiences beyond their established preferences, is challenging in large-scale recommendation systems. This paper introduces a novel approach that combines hierarchical planning with LLM inference-time scaling. We show significant gains in both user satisfaction (measured by watch activity and active user counts) and exploration diversity.
arXiv Detail & Related papers (2025-04-07T21:44:12Z)
- Option Discovery Using LLM-guided Semantic Hierarchical Reinforcement Learning [16.654435148168172]
Large Language Models (LLMs) have shown remarkable promise in reasoning and decision-making. We propose an LLM-guided hierarchical RL framework, termed LDSC, to enhance sample efficiency, generalization, and multi-task adaptability.
arXiv Detail & Related papers (2025-03-24T15:49:56Z)
- Hierarchical Reinforcement Learning for Temporal Abstraction of Listwise Recommendation [51.06031200728449]
We propose a novel framework called mccHRL to provide different levels of temporal abstraction for listwise recommendation. Within the hierarchical framework, the high-level agent studies the evolution of user perception, while the low-level agent produces the item selection policy. Results show significant performance improvements by our method compared with several well-known baselines.
arXiv Detail & Related papers (2024-09-11T17:01:06Z)
- Beyond Inter-Item Relations: Dynamic Adaption for Enhancing LLM-Based Sequential Recommendation [83.87767101732351]
Sequential recommender systems (SRS) predict the next items that users may prefer based on user historical interaction sequences.
Inspired by the rise of large language models (LLMs) in various AI applications, there is a surge of work on LLM-based SRS.
We propose DARec, a sequential recommendation model built on top of coarse-grained adaption for capturing inter-item relations.
arXiv Detail & Related papers (2024-08-14T10:03:40Z)
- Lifelong Personalized Low-Rank Adaptation of Large Language Models for Recommendation [50.837277466987345]
We focus on the field of large language models (LLMs) for recommendation.
We propose RecLoRA, which incorporates a Personalized LoRA module that maintains independent LoRAs for different users.
We also design a Few2Many Learning Strategy, using a conventional recommendation model as a lens to magnify small training spaces to full spaces.
arXiv Detail & Related papers (2024-08-07T04:20:28Z)
- LLM4MSR: An LLM-Enhanced Paradigm for Multi-Scenario Recommendation [52.55639178180821]
The study of multi-scenario recommendation (MSR), which uses data from all scenarios to simultaneously improve their recommendation performance, has attracted much attention. Existing methods tend to integrate insufficient scenario knowledge and neglect learning personalized cross-scenario preferences, leading to sub-optimal performance. We propose a large language model (LLM)-enhanced paradigm, LLM4MSR, to fill these gaps.
arXiv Detail & Related papers (2024-06-18T11:59:36Z)
- Large Language Models are Learnable Planners for Long-Term Recommendation [59.167795967630305]
Planning for both immediate and long-term benefits becomes increasingly important in recommendation.
Existing methods apply Reinforcement Learning to learn planning capacity by maximizing cumulative reward for long-term recommendation.
We propose to leverage the remarkable planning capabilities of Large Language Models over sparse data for long-term recommendation.
arXiv Detail & Related papers (2024-02-29T13:49:56Z)
- Entropy-Regularized Token-Level Policy Optimization for Language Agent Reinforcement [67.1393112206885]
Large Language Models (LLMs) have shown promise as intelligent agents in interactive decision-making tasks.
We introduce Entropy-Regularized Token-level Policy Optimization (ETPO), an entropy-augmented RL method tailored for optimizing LLMs at the token level.
We assess the effectiveness of ETPO within a simulated environment that models data science code generation as a series of multi-step interactive tasks.
arXiv Detail & Related papers (2024-02-09T07:45:26Z)