Diagnostic-Guided Dynamic Profile Optimization for LLM-based User Simulators in Sequential Recommendation
- URL: http://arxiv.org/abs/2508.12645v3
- Date: Wed, 20 Aug 2025 04:07:07 GMT
- Title: Diagnostic-Guided Dynamic Profile Optimization for LLM-based User Simulators in Sequential Recommendation
- Authors: Hongyang Liu, Zhu Sun, Tianjun Wei, Yan Wang, Jiajie Zhu, Xinghua Qu
- Abstract summary: DGDPO is a novel framework that constructs user profiles through a dynamic and iterative optimization process. Unlike existing LLM-based user simulators that are limited to single-round interactions, we are the first to integrate DGDPO with sequential recommenders.
- Score: 15.61963892566877
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recent advances in large language models (LLMs) have enabled realistic user simulators for developing and evaluating recommender systems (RSs). However, existing LLM-based simulators for RSs face two major limitations: (1) static and single-step prompt-based inference that leads to inaccurate and incomplete user profile construction; (2) an unrealistic and single-round recommendation-feedback interaction pattern that fails to capture real-world scenarios. To address these limitations, we propose DGDPO (Diagnostic-Guided Dynamic Profile Optimization), a novel framework that constructs user profiles through a dynamic and iterative optimization process to enhance simulation fidelity. Specifically, DGDPO incorporates two core modules within each optimization loop: first, a specialized LLM-based diagnostic module, calibrated through our novel training strategy, accurately identifies specific defects in the user profile. Subsequently, a generalized LLM-based treatment module analyzes the diagnosed defect and generates targeted suggestions to refine the profile. Furthermore, unlike existing LLM-based user simulators that are limited to single-round interactions, we are the first to integrate DGDPO with sequential recommenders, enabling a bidirectional evolution where user profiles and recommendation strategies adapt to each other over multi-round interactions. Extensive experiments conducted on three real-world datasets demonstrate the effectiveness of our proposed framework.
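Read as pseudocode, the abstract's diagnose-then-treat loop could look like the minimal sketch below; the helper callables `diagnostic_llm` and `treatment_llm` and their prompts are illustrative assumptions, not the paper's actual interfaces.

```python
# Minimal sketch of the DGDPO profile-optimization loop described above.
# `diagnostic_llm` and `treatment_llm` are hypothetical stand-ins for the
# calibrated diagnostic module and the generalized treatment module.

def optimize_profile(profile, interactions, diagnostic_llm, treatment_llm,
                     max_rounds=5):
    """Iteratively diagnose defects in a user profile and refine it."""
    for _ in range(max_rounds):
        # Diagnostic module: pinpoint one concrete defect in the profile.
        defect = diagnostic_llm(
            f"Interaction history: {interactions}\nProfile: {profile}\n"
            "Identify one specific defect in this profile, or reply NONE."
        )
        if defect.strip() == "NONE":
            break  # the profile is judged consistent with observed behavior
        # Treatment module: turn the diagnosis into a targeted refinement.
        profile = treatment_llm(
            f"Profile: {profile}\nDiagnosed defect: {defect}\n"
            "Rewrite the profile to repair exactly this defect."
        )
    return profile
```

In the multi-round setting the abstract describes, this loop would run between interaction rounds, so the refined profile conditions the simulator's next feedback while the recommender adapts to that feedback in turn.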
Related papers
- SOCRATES: Simulation Optimization with Correlated Replicas and Adaptive Trajectory Evaluations [25.18297372152296]
SOCRATES is a novel two-stage procedure that automates the design of tailored SO algorithms. An ensemble of digital replicas of the real system is used as a testbed to evaluate a set of baseline SO algorithms. An LLM acts as a meta-optimizer, analyzing the performance trajectories of these algorithms to iteratively revise and compose a final, hybrid optimization schedule.
arXiv Detail & Related papers (2025-11-01T19:57:38Z)
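As a rough illustration of SOCRATES's two-stage procedure, the sketch below benchmarks baseline SO algorithms on replica systems and then asks an LLM to compose a hybrid schedule; the `algo.run` and `llm` interfaces are assumptions for exposition.

```python
# Hypothetical sketch of SOCRATES's two stages: evaluate baseline
# simulation-optimization (SO) algorithms on digital replicas, then let an
# LLM meta-optimizer compose a hybrid schedule from their trajectories.

def compose_schedule(baselines, replicas, llm, budget=100):
    trajectories = {}
    for algo in baselines:
        # Stage 1: record each baseline's best-objective-so-far curve
        # (its performance trajectory) on every replica.
        trajectories[algo.name] = [algo.run(replica, budget) for replica in replicas]
    # Stage 2: the LLM analyzes the trajectories and revises the plan.
    return llm(
        f"Performance trajectories per algorithm: {trajectories}\n"
        "Compose a hybrid schedule, e.g. 'run algo A for 40 evaluations, "
        "then switch to algo B', that exploits their complementary strengths."
    )
```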
- Sample-Efficient Online Learning in LM Agents via Hindsight Trajectory Rewriting [92.57796055887995]
We introduce ECHO, a prompting framework that adapts hindsight experience replay from reinforcement learning for language model agents. ECHO generates optimized trajectories for alternative goals that could have been achieved during failed attempts. We evaluate ECHO on stateful versions of XMiniGrid, a text-based navigation and planning benchmark, and PeopleJoinQA, a collaborative information-gathering enterprise simulation.
arXiv Detail & Related papers (2025-10-11T18:11:09Z)
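A hedged sketch of ECHO's hindsight-rewriting step: a failed trajectory is relabeled with an alternative goal it did accomplish and rewritten into a clean demonstration for future prompts. Function names and prompts below are illustrative, not from the paper.

```python
# Illustrative sketch of hindsight trajectory rewriting in the spirit of ECHO.

def hindsight_rewrite(failed_trajectory, llm):
    transcript = "\n".join(failed_trajectory)
    # Ask which goal the failed attempt achieved anyway (the hindsight goal).
    alt_goal = llm(
        f"Trajectory:\n{transcript}\n"
        "Name one goal that these actions actually accomplished."
    )
    # Rewrite the trajectory as an optimized demonstration for that goal.
    demo = llm(
        f"Goal: {alt_goal}\nTrajectory:\n{transcript}\n"
        "Rewrite this as a concise, successful trajectory for the goal."
    )
    return alt_goal, demo  # stored and replayed as an in-context example
```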
- Mirroring Users: Towards Building Preference-aligned User Simulator with User Feedback in Recommendation [18.40619735445983]
User simulation is increasingly vital for developing and evaluating recommender systems (RSs). A vast yet underutilized resource for enhancing preference alignment is the extensive user feedback inherent in RSs. We introduce a novel data construction framework that leverages user feedback in RSs with advanced LLM capabilities to generate high-quality simulation data.
arXiv Detail & Related papers (2025-08-25T15:51:24Z)
- When Relevance Meets Novelty: Dual-Stable Periodic Optimization for Exploratory Recommendation [6.663356205396985]
Large language models (LLMs) demonstrate potential with their diverse content generation capabilities. Existing LLM-enhanced dual-model frameworks face two major limitations. First, they overlook long-term preferences driven by group identity, leading to biased interest modeling. Second, they suffer from static optimization flaws, as a one-time alignment process fails to leverage incremental user data for closed-loop optimization.
arXiv Detail & Related papers (2025-08-01T09:10:56Z)
- A Novel Self-Evolution Framework for Large Language Models [18.62332474172811]
We propose a novel Dual-Phase Self-Evolution (DPSE) framework to jointly optimize user preference adaptation and domain-specific competence. Experiments across general NLP benchmarks and long-term dialogue tasks demonstrate that DPSE consistently outperforms Supervised Fine-Tuning, Preference Optimization, and Memory-Augmented baselines.
arXiv Detail & Related papers (2025-07-21T06:30:39Z)
- RecLLM-R1: A Two-Stage Training Paradigm with Reinforcement Learning and Chain-of-Thought [20.92548890511589]
This paper introduces RecLLM-R1, a novel recommendation framework leveraging Large Language Models (LLMs). RecLLM-R1 significantly surpasses existing baseline methods across a spectrum of evaluation metrics, including accuracy, diversity, and novelty.
arXiv Detail & Related papers (2025-06-24T01:39:34Z)
- Direct Retrieval-augmented Optimization: Synergizing Knowledge Selection and Language Models [83.8639566087953]
We propose a direct retrieval-augmented optimization framework, named DRO, that enables end-to-end training of two key components. DRO alternates between two phases: (i) document permutation estimation and (ii) re-weighted maximization, progressively improving RAG components. Our theoretical analysis reveals that DRO is analogous to policy-gradient methods in reinforcement learning.
arXiv Detail & Related papers (2025-05-05T23:54:53Z)
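Under a very loose reading of DRO's alternation, one step might resemble the sketch below: weight candidate document orderings by how well the generator answers under them, then make re-weighted updates to both components. The `log_prob`/`update` interfaces, and the tiny exhaustive candidate set, are assumptions for illustration only.

```python
import itertools
import math

# Loose sketch of DRO-style alternation: (i) estimate weights over document
# permutations, (ii) re-weighted updates to retriever and generator.

def dro_step(query, docs, answer, retriever, generator, k=3):
    # Phase (i): score candidate orderings under the current generator.
    # (Exhaustive enumeration is only feasible for this toy candidate set.)
    perms = list(itertools.permutations(docs, k))
    scores = [math.exp(generator.log_prob(answer, query, perm)) for perm in perms]
    total = sum(scores)
    weights = [s / total for s in scores]
    # Phase (ii): re-weighted updates; each permutation acts like an "action"
    # credited by how well it supports the answer, as in policy gradients.
    for perm, weight in zip(perms, weights):
        retriever.update(query, perm, weight=weight)
        generator.update(query, perm, answer, weight=weight)
```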
- Large Language Model Empowered Recommendation Meets All-domain Continual Pre-Training [60.38082979765664]
CPRec is an All-domain Continual Pre-Training framework for Recommendation. It holistically aligns LLMs with universal user behaviors through the continual pre-training paradigm. We conduct experiments on five real-world datasets from two distinct platforms.
arXiv Detail & Related papers (2025-04-11T20:01:25Z)
- IMPROVE: Iterative Model Pipeline Refinement and Optimization Leveraging LLM Experts [40.98057887166546]
Large language model (LLM) agents have emerged as a promising solution for automating machine learning workflows. We introduce Iterative Refinement, a novel strategy for LLM-driven ML pipeline design inspired by how human ML experts iteratively refine models. By systematically updating individual components based on real training feedback, Iterative Refinement improves overall model performance.
arXiv Detail & Related papers (2025-02-25T01:52:37Z)
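The component-wise idea behind Iterative Refinement could be sketched as below: propose a change to one pipeline component at a time and keep it only if real training feedback improves the score. The pipeline and LLM interfaces are hypothetical.

```python
# Illustrative sketch of Iterative Refinement: revise one pipeline component
# at a time, keeping a proposal only when measured feedback improves.

def iterative_refinement(pipeline, train_and_eval, llm, rounds=10):
    best_score = train_and_eval(pipeline)
    for _ in range(rounds):
        for name in list(pipeline):  # e.g. "augmentation", "model", "optimizer"
            proposal = llm(
                f"Pipeline: {pipeline}\nValidation score: {best_score}\n"
                f"Propose an improved '{name}' component only."
            )
            candidate = {**pipeline, name: proposal}
            score = train_and_eval(candidate)
            if score > best_score:  # accept only on real improvement
                pipeline, best_score = candidate, score
    return pipeline
```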
- LLM-based Bi-level Multi-interest Learning Framework for Sequential Recommendation [54.396000434574454]
We propose a novel multi-interest SR framework combining implicit behavioral and explicit semantic perspectives. It includes two modules: the Implicit Behavioral Interest Module and the Explicit Semantic Interest Module. Experiments on four real-world datasets validate the framework's effectiveness and practicality.
arXiv Detail & Related papers (2024-11-14T13:00:23Z)
- Self-Augmented Preference Optimization: Off-Policy Paradigms for Language Model Alignment [104.18002641195442]
We introduce Self-Augmented Preference Optimization (SAPO), an effective and scalable training paradigm that does not require existing paired data.
Building on the self-play concept, which autonomously generates negative responses, we further incorporate an off-policy learning pipeline to enhance data exploration and exploitation.
arXiv Detail & Related papers (2024-05-31T14:21:04Z)
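One way to picture SAPO's self-play plus off-policy replay, as a sketch with assumed interfaces (`model.generate`, a DPO-style `preference_update`):

```python
import random

# Hedged sketch of SAPO-style training: the model's own generations act as
# rejected responses against gold ones, and pairs are replayed off-policy.

def sapo_round(gold_data, model, replay_buffer, preference_update,
               buffer_cap=10_000, batch_size=64):
    for prompt, chosen in gold_data:
        # Self-play: a self-generated response serves as the negative side,
        # so no pre-collected paired preference data is required.
        rejected = model.generate(prompt)
        replay_buffer.append((prompt, chosen, rejected))
    del replay_buffer[:-buffer_cap]  # keep only the most recent pairs
    # Off-policy step: sample pairs from past rounds, not just the newest
    # generations, trading off data exploration and exploitation.
    batch = random.sample(replay_buffer, k=min(batch_size, len(replay_buffer)))
    preference_update(model, batch)
```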
- Self-Exploring Language Models: Active Preference Elicitation for Online Alignment [88.56809269990625]
We propose a bilevel objective optimistically biased towards potentially high-reward responses to actively explore out-of-distribution regions.
Our experimental results demonstrate that when fine-tuned on Zephyr-7B-SFT and Llama-3-8B-Instruct models, Self-Exploring Language Models (SELM) significantly boost performance on instruction-following benchmarks.
arXiv Detail & Related papers (2024-05-29T17:59:07Z)
- Maximize to Explore: One Objective Function Fusing Estimation, Planning, and Exploration [87.53543137162488]
We propose an easy-to-implement online reinforcement learning (online RL) framework called MEX. MEX integrates estimation and planning components while balancing exploration and exploitation automatically. It can outperform baselines by a stable margin in various MuJoCo environments with sparse rewards.
arXiv Detail & Related papers (2023-05-29T17:25:26Z)
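As a toy rendering of "one objective fusing estimation, planning, and exploration": score each hypothesis by its planned value minus a weighted estimation loss and pick the maximizer. The interface and the trade-off weight below are illustrative, not the paper's exact formulation.

```python
# Toy sketch of a MEX-style single objective: optimistic value minus a
# weighted fit-to-data loss, selected jointly rather than in separate
# estimation and planning stages.

def mex_select(hypotheses, dataset, eta=0.1):
    """Pick the hypothesis maximizing value() - eta * loss(dataset)."""
    def objective(hypothesis):
        # value(): return promised by planning under this hypothesis
        # (favoring it drives exploration of optimistic models);
        # loss(): estimation error on observed data (anchoring to evidence).
        return hypothesis.value() - eta * hypothesis.loss(dataset)
    return max(hypotheses, key=objective)
```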
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of this list (including all information) and is not responsible for any consequences.