A Hybrid Bandit Framework for Diversified Recommendation
- URL: http://arxiv.org/abs/2012.13245v1
- Date: Thu, 24 Dec 2020 13:24:40 GMT
- Title: A Hybrid Bandit Framework for Diversified Recommendation
- Authors: Qinxu Ding, Yong Liu, Chunyan Miao, Fei Cheng, Haihong Tang
- Abstract summary: We propose the Linear Modular Dispersion Bandit (LMDB) framework for optimizing a combination of modular functions and dispersion functions.
Specifically, LMDB employs modular functions to model the relevance properties of each item, and dispersion functions to describe the diversity properties of an item set.
We also develop a learning algorithm, called Linear Modular Dispersion Hybrid (LMDH), to solve the LMDB problem and derive a gap-free bound on its n-step regret.
- Score: 42.516774050676254
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Interactive recommender systems involve users in the recommendation
procedure by receiving timely user feedback to update the recommendation
policy. Therefore, they are widely used in real application scenarios. Previous
interactive recommendation methods primarily focus on learning users'
personalized preferences on the relevance properties of an item set. However,
users' personalized preferences on the diversity properties of an item set are
usually ignored. To address this problem, we
propose the Linear Modular Dispersion Bandit (LMDB) framework, which is an
online learning setting for optimizing a combination of modular functions and
dispersion functions. Specifically, LMDB employs modular functions to model the
relevance properties of each item, and dispersion functions to describe the
diversity properties of an item set. We also develop a learning algorithm,
called Linear Modular Dispersion Hybrid (LMDH), to solve the LMDB problem and
derive a gap-free bound on its n-step regret. Extensive experiments
on real datasets are performed to demonstrate the effectiveness of the proposed
LMDB framework in balancing the recommendation accuracy and diversity.
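To make the abstract's objective concrete: it combines a modular (per-item, linear) relevance term with a dispersion term defined over the whole recommended set, and learns the relevance parameters online from user feedback. The sketch below is an illustrative reconstruction of that idea, not the authors' implementation; the pairwise-distance dispersion function, the UCB-style exploration bonus, and all names and constants are assumptions made for illustration.

```python
# Illustrative sketch only: a hybrid set-scoring objective (linear relevance +
# dispersion-based diversity) and a greedy selection step with an optimism bonus.
# The dispersion choice, bonus form, and all names are assumptions, not the paper's code.
import numpy as np

def dispersion(items, features):
    """Sum of pairwise Euclidean distances among the chosen items (one possible dispersion function)."""
    if len(items) < 2:
        return 0.0
    total = 0.0
    for a in range(len(items)):
        for b in range(a + 1, len(items)):
            total += np.linalg.norm(features[items[a]] - features[items[b]])
    return total

def greedy_select(features, theta_hat, A_inv, k, lam=0.5, alpha=1.0):
    """Greedily pick k items maximizing (relevance UCB) + lam * marginal dispersion gain."""
    chosen = []
    candidates = set(range(len(features)))
    for _ in range(k):
        best_item, best_gain = None, -np.inf
        base = dispersion(chosen, features)
        for i in candidates:
            x = features[i]
            # Linear (modular) relevance estimate plus an exploration bonus for the bandit setting.
            relevance = float(x @ theta_hat) + alpha * np.sqrt(float(x @ A_inv @ x))
            gain = relevance + lam * (dispersion(chosen + [i], features) - base)
            if gain > best_gain:
                best_item, best_gain = i, gain
        chosen.append(best_item)
        candidates.remove(best_item)
    return chosen

# Toy usage: 20 items with 5-dimensional features, recommend a list of 4.
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 5))
theta_hat = rng.normal(size=5)   # current relevance-parameter estimate
A_inv = np.eye(5)                # inverse design matrix (untrained here)
print(greedy_select(X, theta_hat, A_inv, k=4))
```

Because the dispersion term couples the selected items, the set value is no longer modular and items cannot be ranked independently; a greedy pass over marginal gains is a standard way to handle such hybrid objectives, and the exploration bonus reflects the online setting in which the relevance parameters are only estimated from interaction feedback.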
Related papers
- Large Language Model Empowered Embedding Generator for Sequential Recommendation [57.49045064294086]
Large Language Models (LLMs) have the potential to understand the semantic connections between items, regardless of their popularity.
We present LLMEmb, an innovative technique that harnesses LLM to create item embeddings that bolster the performance of Sequential Recommender Systems.
arXiv Detail & Related papers (2024-09-30T03:59:06Z)
- Laser: Parameter-Efficient LLM Bi-Tuning for Sequential Recommendation with Collaborative Information [76.62949982303532]
We propose a parameter-efficient Large Language Model Bi-Tuning framework for sequential recommendation with collaborative information (Laser).
In our Laser, the prefix is utilized to incorporate user-item collaborative information and adapt the LLM to the recommendation task, while the suffix converts the output embeddings of the LLM from the language space to the recommendation space for the follow-up item recommendation.
M-Former is a lightweight MoE-based querying transformer that uses a set of query experts to integrate diverse user-specific collaborative information encoded by frozen ID-based sequential recommender systems.
arXiv Detail & Related papers (2024-09-03T04:55:03Z)
- Customizing Language Models with Instance-wise LoRA for Sequential Recommendation [28.667247613039965]
Sequential recommendation systems predict the next interaction item based on users' past interactions, aligning recommendations with individual preferences.
We propose Instance-wise LoRA (iLoRA) as a form of multi-task learning, integrating LoRA with the Mixture of Experts (MoE) framework.
iLoRA achieves an average relative improvement of 11.4% over basic LoRA in the hit ratio metric, with less than a 1% relative increase in trainable parameters.
arXiv Detail & Related papers (2024-08-19T17:09:32Z)
- Self-Augmented Preference Optimization: Off-Policy Paradigms for Language Model Alignment [104.18002641195442]
We introduce Self-Augmented Preference Optimization (SAPO), an effective and scalable training paradigm that does not require existing paired data.
Building on the self-play concept, which autonomously generates negative responses, we further incorporate an off-policy learning pipeline to enhance data exploration and exploitation.
arXiv Detail & Related papers (2024-05-31T14:21:04Z)
- Relative Preference Optimization: Enhancing LLM Alignment through Contrasting Responses across Identical and Diverse Prompts [95.09994361995389]
Relative Preference Optimization (RPO) is designed to discern between more and less preferred responses derived from both identical and related prompts.
RPO has demonstrated a superior ability to align large language models with user preferences and to improve their adaptability during the training process.
arXiv Detail & Related papers (2024-02-12T22:47:57Z)
- Generative Slate Recommendation with Reinforcement Learning [49.75985313698214]
Reinforcement learning (RL) algorithms can be used to optimize user engagement in recommender systems.
However, RL approaches are intractable in the slate recommendation scenario.
In that setting, an action corresponds to a slate that may contain any combination of items.
In this work we propose to encode slates in a continuous, low-dimensional latent space learned by a variational auto-encoder.
We are able to (i) relax assumptions required by previous work, and (ii) improve the quality of the action selection by modeling full slates.
arXiv Detail & Related papers (2023-01-20T15:28:09Z)
- Bayesian preference elicitation for multiobjective combinatorial optimization [12.96855751244076]
We introduce a new incremental preference elicitation procedure able to deal with noisy responses of a Decision Maker (DM).
We assume that the preferences of the DM are represented by an aggregation function whose parameters are unknown and that the uncertainty about them is represented by a density function on the parameter space.
arXiv Detail & Related papers (2020-07-29T12:28:37Z)