A Hybrid Bandit Framework for Diversified Recommendation
- URL: http://arxiv.org/abs/2012.13245v1
- Date: Thu, 24 Dec 2020 13:24:40 GMT
- Title: A Hybrid Bandit Framework for Diversified Recommendation
- Authors: Qinxu Ding, Yong Liu, Chunyan Miao, Fei Cheng, Haihong Tang
- Abstract summary: We propose the Linear Modular Dispersion Bandit (LMDB) framework for optimizing a combination of modular functions and dispersion functions.
Specifically, LMDB employs modular functions to model the relevance properties of each item, and dispersion functions to describe the diversity properties of an item set.
We also develop a learning algorithm, called Linear Modular Dispersion Hybrid (LMDH), to solve the LMDB problem and derive a gap-free bound on its n-step regret.
- Score: 42.516774050676254
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Interactive recommender systems involve users in the recommendation
procedure by receiving timely user feedback to update the recommendation
policy. Therefore, they are widely used in real application scenarios. Previous
interactive recommendation methods primarily focus on learning users'
personalized preferences on the relevance properties of an item set. However,
users' personalized preferences on the diversity properties of an item set are
usually ignored. To address this problem, we
propose the Linear Modular Dispersion Bandit (LMDB) framework, which is an
online learning setting for optimizing a combination of modular functions and
dispersion functions. Specifically, LMDB employs modular functions to model the
relevance properties of each item, and dispersion functions to describe the
diversity properties of an item set. We also develop a learning algorithm,
called Linear Modular Dispersion Hybrid (LMDH), to solve the LMDB problem and
derive a gap-free bound on its n-step regret. Extensive experiments
on real datasets are performed to demonstrate the effectiveness of the proposed
LMDB framework in balancing the recommendation accuracy and diversity.
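To make the abstract's objective concrete: it combines a modular (per-item, linear) relevance term with a dispersion term defined over the whole recommended set, and learns the relevance parameters online from user feedback. The sketch below is an illustrative reconstruction of that idea, not the authors' implementation; the pairwise-distance dispersion function, the UCB-style exploration bonus, and all names and constants are assumptions made for illustration.

```python
# Illustrative sketch only: a hybrid set-scoring objective (linear relevance +
# dispersion-based diversity) and a greedy selection step with an optimism bonus.
# The dispersion choice, bonus form, and all names are assumptions, not the paper's code.
import numpy as np

def dispersion(items, features):
    """Sum of pairwise Euclidean distances among the chosen items (one possible dispersion function)."""
    if len(items) < 2:
        return 0.0
    total = 0.0
    for a in range(len(items)):
        for b in range(a + 1, len(items)):
            total += np.linalg.norm(features[items[a]] - features[items[b]])
    return total

def greedy_select(features, theta_hat, A_inv, k, lam=0.5, alpha=1.0):
    """Greedily pick k items maximizing (relevance UCB) + lam * marginal dispersion gain."""
    chosen = []
    candidates = set(range(len(features)))
    for _ in range(k):
        best_item, best_gain = None, -np.inf
        base = dispersion(chosen, features)
        for i in candidates:
            x = features[i]
            # Linear (modular) relevance estimate plus an exploration bonus for the bandit setting.
            relevance = float(x @ theta_hat) + alpha * np.sqrt(float(x @ A_inv @ x))
            gain = relevance + lam * (dispersion(chosen + [i], features) - base)
            if gain > best_gain:
                best_item, best_gain = i, gain
        chosen.append(best_item)
        candidates.remove(best_item)
    return chosen

# Toy usage: 20 items with 5-dimensional features, recommend a list of 4.
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 5))
theta_hat = rng.normal(size=5)   # current relevance-parameter estimate
A_inv = np.eye(5)                # inverse design matrix (untrained here)
print(greedy_select(X, theta_hat, A_inv, k=4))
```

Because the dispersion term couples the selected items, the set value is no longer modular and items cannot be ranked independently; a greedy pass over marginal gains is a standard way to handle such hybrid objectives, and the exploration bonus reflects the online setting in which the relevance parameters are only estimated from interaction feedback.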
Related papers
- Large Language Model Empowered Embedding Generator for Sequential Recommendation [57.49045064294086]
Large Language Models (LLMs) have the potential to understand the semantic connections between items, regardless of their popularity.
We present LLMEmb, an innovative technique that harnesses LLM to create item embeddings that bolster the performance of Sequential Recommender Systems.
arXiv Detail & Related papers (2024-09-30T03:59:06Z)
- Laser: Parameter-Efficient LLM Bi-Tuning for Sequential Recommendation with Collaborative Information [76.62949982303532]
We propose a parameter-efficient Large Language Model Bi-Tuning framework for sequential recommendation with collaborative information (Laser).
In our Laser, the prefix is utilized to incorporate user-item collaborative information and adapt the LLM to the recommendation task, while the suffix converts the output embeddings of the LLM from the language space to the recommendation space for the follow-up item recommendation.
M-Former is a lightweight MoE-based querying transformer that uses a set of query experts to integrate diverse user-specific collaborative information encoded by frozen ID-based sequential recommender systems.
arXiv Detail & Related papers (2024-09-03T04:55:03Z)
- Customizing Language Models with Instance-wise LoRA for Sequential Recommendation [28.667247613039965]
Sequential recommendation systems predict the next interaction item based on users' past interactions, aligning recommendations with individual preferences.
We propose Instance-wise LoRA (iLoRA) as a form of multi-task learning, integrating LoRA with the Mixture of Experts (MoE) framework.
iLoRA achieves an average relative improvement of 11.4% over basic LoRA in the hit ratio metric, with less than a 1% relative increase in trainable parameters.
arXiv Detail & Related papers (2024-08-19T17:09:32Z)
- Self-Augmented Preference Optimization: Off-Policy Paradigms for Language Model Alignment [104.18002641195442]
We introduce Self-Augmented Preference Optimization (SAPO), an effective and scalable training paradigm that does not require existing paired data.
Building on the self-play concept, which autonomously generates negative responses, we further incorporate an off-policy learning pipeline to enhance data exploration and exploitation.
arXiv Detail & Related papers (2024-05-31T14:21:04Z)
- Relative Preference Optimization: Enhancing LLM Alignment through Contrasting Responses across Identical and Diverse Prompts [95.09994361995389]
Relative Preference Optimization (RPO) is designed to discern between more and less preferred responses derived from both identical and related prompts.
RPO has demonstrated a superior ability to align large language models with user preferences and to improve their adaptability during the training process.
arXiv Detail & Related papers (2024-02-12T22:47:57Z)
- Generative Slate Recommendation with Reinforcement Learning [49.75985313698214]
Reinforcement learning (RL) algorithms can be used to optimize user engagement in recommender systems.
However, RL approaches are intractable in the slate recommendation scenario.
In that setting, an action corresponds to a slate that may contain any combination of items.
In this work we propose to encode slates in a continuous, low-dimensional latent space learned by a variational auto-encoder.
We are able to (i) relax assumptions required by previous work, and (ii) improve the quality of the action selection by modeling full slates.
arXiv Detail & Related papers (2023-01-20T15:28:09Z)
- Bayesian preference elicitation for multiobjective combinatorial optimization [12.96855751244076]
We introduce a new incremental preference elicitation procedure able to deal with noisy responses of a Decision Maker (DM).
We assume that the preferences of the DM are represented by an aggregation function whose parameters are unknown and that the uncertainty about them is represented by a density function on the parameter space.
arXiv Detail & Related papers (2020-07-29T12:28:37Z)