HF4Rec: Human-Like Feedback-Driven Optimization Framework for Explainable Recommendation
- URL: http://arxiv.org/abs/2504.14147v1
- Date: Sat, 19 Apr 2025 02:46:10 GMT
- Title: HF4Rec: Human-Like Feedback-Driven Optimization Framework for Explainable Recommendation
- Authors: Jiakai Tang, Jingsen Zhang, Zihang Tian, Xueyang Feng, Lei Wang, Xu Chen,
- Abstract summary: We propose a novel human-like feedback-driven optimization framework for explainable recommendations.<n>This framework employs a dynamic interactive optimization mechanism for achieving human-centered explainable requirements without incurring high labor costs.<n>In particular, we propose to utilize large language models (LLMs) as human simulators to predict human-like feedback for guiding the learning process.
- Score: 8.532115411106068
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent advancements in explainable recommendation have greatly bolstered user experience by elucidating the decision-making rationale. However, the existing methods actually fail to provide effective feedback signals for potentially better or worse generated explanations due to their reliance on traditional supervised learning paradigms in sparse interaction data. To address these issues, we propose a novel human-like feedback-driven optimization framework. This framework employs a dynamic interactive optimization mechanism for achieving human-centered explainable requirements without incurring high labor costs. Specifically, we propose to utilize large language models (LLMs) as human simulators to predict human-like feedback for guiding the learning process. To enable the LLMs to deeply understand the task essence and meet user's diverse personalized requirements, we introduce a human-induced customized reward scoring method, which helps stimulate the language understanding and logical reasoning capabilities of LLMs. Furthermore, considering the potential conflicts between different perspectives of explanation quality, we introduce a principled Pareto optimization that transforms the multi-perspective quality enhancement task into a multi-objective optimization problem for improving explanation performance. At last, to achieve efficient model training, we design an off-policy optimization pipeline. By incorporating a replay buffer and addressing the data distribution biases, we can effectively improve data utilization and enhance model generality. Extensive experiments on four datasets demonstrate the superiority of our approach.
Related papers
- Active Learning for Direct Preference Optimization [59.84525302418018]
Direct preference optimization (DPO) is a form of reinforcement learning from human feedback.<n>We propose an active learning framework for DPO, which can be applied to collect human feedback online or to choose the most informative subset of already collected feedback offline.
arXiv Detail & Related papers (2025-03-03T00:36:31Z) - Enhancing Recommendation Explanations through User-Centric Refinement [7.640281193938638]
We propose a novel paradigm that refines initial explanations generated by existing explainable recommender models.<n>Specifically, we introduce a multi-agent collaborative refinement framework based on large language models.
arXiv Detail & Related papers (2025-02-17T12:08:18Z) - Reason4Rec: Large Language Models for Recommendation with Deliberative User Preference Alignment [69.11529841118671]
We propose a new Deliberative Recommendation task, which incorporates explicit reasoning about user preferences as an additional alignment goal.<n>We then introduce the Reasoning-powered Recommender framework for deliberative user preference alignment.
arXiv Detail & Related papers (2025-02-04T07:17:54Z) - Low-Rank Contextual Reinforcement Learning from Heterogeneous Human Feedback [12.373566593905792]
Reinforcement learning from human feedback (RLHF) has become a cornerstone for aligning large language models with human preferences.<n>We propose a Low-rank Contextual RLHF framework that integrates contextual information to better model heterogeneous feedback.<n>Our approach builds on a contextual preference model, leveraging the intrinsic low-rank structure of the interaction between user contexts and query-answer pairs.
arXiv Detail & Related papers (2024-12-27T04:02:46Z) - MotionRL: Align Text-to-Motion Generation to Human Preferences with Multi-Reward Reinforcement Learning [99.09906827676748]
We introduce MotionRL, the first approach to utilize Multi-Reward Reinforcement Learning (RL) for optimizing text-to-motion generation tasks.
Our novel approach uses reinforcement learning to fine-tune the motion generator based on human preferences prior knowledge of the human perception model.
In addition, MotionRL introduces a novel multi-objective optimization strategy to approximate optimality between text adherence, motion quality, and human preferences.
arXiv Detail & Related papers (2024-10-09T03:27:14Z) - Self-supervised Preference Optimization: Enhance Your Language Model with Preference Degree Awareness [27.43137305486112]
We propose a novel Self-supervised Preference Optimization (SPO) framework, which constructs a self-supervised preference degree loss combined with the alignment loss.
The results demonstrate that SPO can be seamlessly integrated with existing preference optimization methods to achieve state-of-the-art performance.
arXiv Detail & Related papers (2024-09-26T12:37:26Z) - A Unified Linear Programming Framework for Offline Reward Learning from Human Demonstrations and Feedback [6.578074497549894]
Inverse Reinforcement Learning (IRL) and Reinforcement Learning from Human Feedback (RLHF) are pivotal methodologies in reward learning.
This paper introduces a novel linear programming (LP) framework tailored for offline reward learning.
arXiv Detail & Related papers (2024-05-20T23:59:26Z) - Query-Dependent Prompt Evaluation and Optimization with Offline Inverse
RL [62.824464372594576]
We aim to enhance arithmetic reasoning ability of Large Language Models (LLMs) through zero-shot prompt optimization.
We identify a previously overlooked objective of query dependency in such optimization.
We introduce Prompt-OIRL, which harnesses offline inverse reinforcement learning to draw insights from offline prompting demonstration data.
arXiv Detail & Related papers (2023-09-13T01:12:52Z) - Unlocking the Potential of User Feedback: Leveraging Large Language
Model as User Simulator to Enhance Dialogue System [65.93577256431125]
We propose an alternative approach called User-Guided Response Optimization (UGRO) to combine it with a smaller task-oriented dialogue model.
This approach uses LLM as annotation-free user simulator to assess dialogue responses, combining them with smaller fine-tuned end-to-end TOD models.
Our approach outperforms previous state-of-the-art (SOTA) results.
arXiv Detail & Related papers (2023-06-16T13:04:56Z) - Provable Reward-Agnostic Preference-Based Reinforcement Learning [61.39541986848391]
Preference-based Reinforcement Learning (PbRL) is a paradigm in which an RL agent learns to optimize a task using pair-wise preference-based feedback over trajectories.
We propose a theoretical reward-agnostic PbRL framework where exploratory trajectories that enable accurate learning of hidden reward functions are acquired.
arXiv Detail & Related papers (2023-05-29T15:00:09Z) - Visualizing the Relationship Between Encoded Linguistic Information and
Task Performance [53.223789395577796]
We study the dynamic relationship between the encoded linguistic information and task performance from the viewpoint of Pareto Optimality.
We conduct experiments on two popular NLP tasks, i.e., machine translation and language modeling, and investigate the relationship between several kinds of linguistic information and task performances.
Our empirical findings suggest that some syntactic information is helpful for NLP tasks whereas encoding more syntactic information does not necessarily lead to better performance.
arXiv Detail & Related papers (2022-03-29T19:03:10Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.