Personalized Reward Learning with Interaction-Grounded Learning (IGL)
- URL: http://arxiv.org/abs/2211.15823v1
- Date: Mon, 28 Nov 2022 23:18:10 GMT
- Title: Personalized Reward Learning with Interaction-Grounded Learning (IGL)
- Authors: Jessica Maghakian, Paul Mineiro, Kishan Panaganti, Mark Rucker,
Akanksha Saran, Cheng Tan
- Abstract summary: Modern recommender systems typically optimize for the same fixed combination of implicit feedback signals across all users.
We propose applying the recent Interaction Grounded Learning paradigm to address the challenge of learning representations of diverse user communication modalities.
- Score: 7.898208662809734
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In an era of countless content offerings, recommender systems alleviate
information overload by providing users with personalized content suggestions.
Due to the scarcity of explicit user feedback, modern recommender systems
typically optimize for the same fixed combination of implicit feedback signals
across all users. However, this approach disregards a growing body of work
highlighting that (i) implicit signals can be used by users in diverse ways,
signaling anything from satisfaction to active dislike, and (ii) different
users communicate preferences in different ways. We propose applying the recent
Interaction Grounded Learning (IGL) paradigm to address the challenge of
learning representations of diverse user communication modalities. Rather than
taking a fixed, human-designed reward function, IGL is able to learn
personalized reward functions for different users and then optimize directly
for the latent user satisfaction. We demonstrate the success of IGL with
experiments using simulations as well as with real-world production traces.
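To make the core idea concrete, here is a minimal sketch, assuming a linear contextual bandit paired with a per-user feedback decoder that maps implicit signals (e.g., clicks, dwell time, dismissals) to an estimated latent satisfaction. All names (FeedbackDecoder, LinearBandit, etc.) are hypothetical, and this is not the authors' implementation; notably, IGL grounds the reward decoder without explicit labels by exploiting structural assumptions, whereas this sketch substitutes an occasional explicit satisfaction proxy for brevity.

```python
import numpy as np

rng = np.random.default_rng(0)
N_ACTIONS, D_CONTEXT, D_FEEDBACK = 5, 8, 3  # items, context dim, feedback signals


class FeedbackDecoder:
    """Hypothetical per-user reward decoder: maps implicit feedback
    signals to an estimated latent satisfaction."""
    def __init__(self):
        self.w = np.zeros(D_FEEDBACK)  # one learned weight per feedback signal

    def decode(self, feedback):
        return float(self.w @ feedback)  # estimated latent reward

    def update(self, feedback, satisfaction_proxy, lr=0.1):
        # Gradient step toward a rarely observed satisfaction proxy
        # (stand-in for IGL's label-free grounding assumptions).
        err = satisfaction_proxy - self.decode(feedback)
        self.w += lr * err * feedback


class LinearBandit:
    """Simple epsilon-greedy contextual bandit with per-action linear models."""
    def __init__(self, eps=0.1):
        self.W = np.zeros((N_ACTIONS, D_CONTEXT))
        self.eps = eps

    def act(self, x):
        if rng.random() < self.eps:
            return int(rng.integers(N_ACTIONS))  # explore
        return int(np.argmax(self.W @ x))        # exploit

    def update(self, x, a, decoded_reward, lr=0.05):
        self.W[a] += lr * (decoded_reward - self.W[a] @ x) * x


# Interaction loop for one user: the decoder personalizes how feedback is
# read, and the bandit optimizes the decoded (latent) satisfaction.
decoder, bandit = FeedbackDecoder(), LinearBandit()
for t in range(1000):
    x = rng.normal(size=D_CONTEXT)       # user/content context
    a = bandit.act(x)                    # recommend an item
    feedback = rng.random(D_FEEDBACK)    # observed implicit signals
    bandit.update(x, a, decoder.decode(feedback))
    if t % 50 == 0:                      # occasional grounding signal
        decoder.update(feedback, satisfaction_proxy=rng.random())
```

The separation is the point: the bandit never consumes raw feedback directly, only the decoded reward, so two users emitting the same implicit signals can induce different learned policies.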
Related papers
- Learning Pluralistic User Preferences through Reinforcement Learning Fine-tuned Summaries [13.187789731783095]
We present a novel framework that learns text-based summaries of each user's preferences, characteristics, and past conversations. These summaries condition the reward model, enabling it to make personalized predictions about the types of responses valued by each user. We show that our method is robust to new users and diverse conversation topics.
arXiv Detail & Related papers (2025-07-17T23:48:51Z)
- Search-Based Interaction For Conversation Recommendation via Generative Reward Model Based Simulated User [117.82681846559909]
Conversational recommendation systems (CRSs) use multi-turn interaction to capture user preferences and provide personalized recommendations.
We propose a generative reward model based simulated user, named GRSU, for automatic interaction with CRSs.
arXiv Detail & Related papers (2025-04-29T06:37:30Z)
- LLM-Augmented Graph Neural Recommenders: Integrating User Reviews [2.087411180679868]
We propose a framework that employs a Graph Neural Network (GNN)-based model and a large language model (LLM) to produce review-aware representations.
Our approach balances user-item interactions against text-derived features, ensuring that both the behavioral and linguistic signals of users are effectively captured.
arXiv Detail & Related papers (2025-04-03T00:40:09Z)
- FSPO: Few-Shot Preference Optimization of Synthetic Preference Data in LLMs Elicits Effective Personalization to Real Users [111.56469697145519]
We propose Few-Shot Preference Optimization, which reframes reward modeling as a meta-learning problem.
Under this framework, an LLM learns to quickly adapt to a user via a few labeled preferences from that user, constructing a personalized reward function for them (a minimal sketch of this adaptation loop appears after this list).
We generate over 1M synthetic personalized preferences using publicly available LLMs.
We evaluate FSPO on personalized open-ended generation for up to 1,500 synthetic users across three domains: movie reviews, pedagogical adaptation based on educational background, and general question answering, along with a controlled human study.
arXiv Detail & Related papers (2025-02-26T17:08:46Z)
- Unveiling User Preferences: A Knowledge Graph and LLM-Driven Approach for Conversational Recommendation [55.5687800992432]
We propose a plug-and-play framework that synergizes Large Language Models (LLMs) and Knowledge Graphs (KGs) to unveil user preferences.
This enables the LLM to transform KG entities into concise natural language descriptions, allowing it to comprehend domain-specific knowledge.
arXiv Detail & Related papers (2024-11-16T11:47:21Z)
- Async Learned User Embeddings for Ads Delivery Optimization [24.104745716074262]
In recommendation systems, high-quality user embeddings can capture subtle preferences, enable precise similarity calculations, and adapt to changing preferences over time to maintain relevance.
We propose to asynchronously learn high-fidelity user embeddings for billions of users each day from sequence-based multimodal user activities through a Transformer-like large-scale feature learning module.
arXiv Detail & Related papers (2024-06-09T19:35:20Z)
- Explainable Active Learning for Preference Elicitation [0.0]
We employ Active Learning (AL) to solve the addressed problem with the objective of maximizing information acquisition with minimal user effort.
AL selects informative data from a large unlabeled set and queries an oracle to label them.
It harvests user feedback, given in response to the system's explanations of the presented items, over these informative samples to update an underlying machine learning (ML) model.
arXiv Detail & Related papers (2023-09-01T09:22:33Z)
- Latent User Intent Modeling for Sequential Recommenders [92.66888409973495]
Sequential recommender models learn to predict the next items a user is likely to interact with based on their interaction history on the platform.
Most sequential recommenders, however, lack a higher-level understanding of user intents, which often drive user behaviors online.
Intent modeling is thus critical for understanding users and optimizing long-term user experience.
arXiv Detail & Related papers (2022-11-17T19:00:24Z)
- RESUS: Warm-Up Cold Users via Meta-Learning Residual User Preferences in CTR Prediction [14.807495564177252]
Click-Through Rate (CTR) prediction on cold users is a challenging task in recommender systems.
We propose a novel and efficient approach named RESUS, which decouples the learning of global preference knowledge contributed by collective users from the learning of residual preferences for individual users.
Our approach is efficient and effective in improving CTR prediction accuracy on cold users, compared with various state-of-the-art methods.
arXiv Detail & Related papers (2022-10-28T11:57:58Z)
- Intent Contrastive Learning for Sequential Recommendation [86.54439927038968]
We introduce a latent variable to represent users' intents and learn the distribution function of the latent variable via clustering.
We propose to incorporate the learned intents into SR models via contrastive SSL, which maximizes the agreement between a view of a sequence and its corresponding intent.
Experiments conducted on four real-world datasets demonstrate the superiority of the proposed learning paradigm.
arXiv Detail & Related papers (2022-02-05T09:24:13Z)
- Learning from a Learning User for Optimal Recommendations [43.2268992294178]
We formalize a model to capture "learning users" and design an efficient system-side learning solution.
We prove that the regret of RAES deteriorates gracefully as the convergence rate of user learning becomes worse.
Our study provides a novel perspective on modeling the feedback loop in recommendation problems.
arXiv Detail & Related papers (2022-02-03T22:45:12Z)
- Hyper Meta-Path Contrastive Learning for Multi-Behavior Recommendation [61.114580368455236]
User purchasing prediction with multi-behavior information remains a challenging problem for current recommendation systems.
We introduce the concept of hyper meta-paths, constructing hyper meta-paths or hyper meta-graphs to explicitly capture the dependencies among a user's different behaviors.
Building on the recent success of graph contrastive learning, we leverage it to adaptively learn embeddings of user behavior patterns, rather than assigning a fixed scheme for understanding the dependencies among different behaviors.
arXiv Detail & Related papers (2021-09-07T04:28:09Z)
- Optimizing Interactive Systems via Data-Driven Objectives [70.3578528542663]
We propose an approach that infers the objective directly from observed user interactions.
These inferences can be made regardless of prior knowledge and across different types of user behavior.
We introduce the Interactive System Optimizer (ISO), a novel algorithm that uses these inferred objectives for optimization.
arXiv Detail & Related papers (2020-06-19T20:49:14Z)
- Empowering Active Learning to Jointly Optimize System and User Demands [70.66168547821019]
We propose a new active learning approach that jointly optimizes the active learning system (training efficiently) and the user (receiving useful instances).
We study our approach in an educational application, which particularly benefits from this technique as the system needs to rapidly learn to predict the appropriateness of an exercise to a particular user.
We evaluate multiple learning strategies and user types with data from real users and find that our joint approach better satisfies both objectives when alternative methods lead to many unsuitable exercises for end users.
arXiv Detail & Related papers (2020-05-09T16:02:52Z)
- Mining Implicit Entity Preference from User-Item Interaction Data for Knowledge Graph Completion via Adversarial Learning [82.46332224556257]
We propose a novel adversarial learning approach by leveraging user interaction data for the Knowledge Graph Completion task.
Our generator is isolated from user interaction data, and serves to improve the performance of the discriminator.
To discover implicit entity preferences of users, we design an elaborate collaborative learning algorithm based on graph neural networks.
arXiv Detail & Related papers (2020-03-28T05:47:33Z)
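As a companion to the FSPO entry above, the following sketch illustrates reframing reward modeling as meta-learning: a shared pairwise (Bradley-Terry style) reward model is adapted to each user with a few gradient steps on that user's labeled preference pairs. All names, the tiny linear reward model, and the toy data are hypothetical assumptions for illustration, not the paper's implementation.

```python
import numpy as np

rng = np.random.default_rng(1)
D = 16  # feature dimension of a (prompt, response) pair


def pref_loss_grad(w, chosen, rejected):
    """Gradient of the pairwise preference loss
    -log sigmoid(r(chosen) - r(rejected)) for a linear reward r(x) = w @ x."""
    diff = chosen - rejected
    p = 1.0 / (1.0 + np.exp(-(w @ diff)))  # P(chosen preferred)
    return (p - 1.0) * diff                # gradient of the loss w.r.t. w


def adapt_to_user(w_shared, prefs, inner_lr=0.5, steps=3):
    """Few-shot adaptation: a few gradient steps on one user's labeled
    preference pairs turn the shared reward model into a personalized one."""
    w = w_shared.copy()
    for _ in range(steps):
        for chosen, rejected in prefs:
            w -= inner_lr * pref_loss_grad(w, chosen, rejected)
    return w


# Toy usage: a shared model plus four labeled pairs from one user.
w_shared = rng.normal(scale=0.1, size=D)
user_prefs = [(rng.normal(size=D), rng.normal(size=D)) for _ in range(4)]
w_user = adapt_to_user(w_shared, user_prefs)
chosen, rejected = user_prefs[0]
print("personalized margin:", w_user @ (chosen - rejected))  # > 0 after adaptation
```

In a full meta-learning setup, w_shared would itself be trained so that this inner adaptation works well across many users; the sketch shows only the per-user adaptation step.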
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.