Diversified Recommendations for Agents with Adaptive Preferences
- URL: http://arxiv.org/abs/2210.07773v1
- Date: Tue, 20 Sep 2022 16:12:22 GMT
- Title: Diversified Recommendations for Agents with Adaptive Preferences
- Authors: Arpit Agarwal, William Brown
- Abstract summary: We consider a problem where an Agent visits a platform recommending a menu of content to select from; their choice of item depends not only on fixed preferences, but also on their prior engagements with the platform.
The Recommender presents a menu of $k$ items to the Agent, who selects one item in the menu according to their unknown preference model.
The Recommender then observes the Agent's chosen item and receives bandit feedback of the item's reward.
In addition to optimizing reward from selected items, the Recommender must also ensure that the total distribution of chosen items has sufficiently high entropy.
- Score: 9.578114969867258
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: When an Agent visits a platform recommending a menu of content to select
from, their choice of item depends not only on fixed preferences, but also on
their prior engagements with the platform. The Recommender's primary objective
is typically to encourage content consumption which optimizes some reward, such
as ad revenue, but they often also aim to ensure that a wide variety of content
is consumed by the Agent over time. We formalize this problem as an adversarial
bandit task. At each step, the Recommender presents a menu of $k$ (out of $n$)
items to the Agent, who selects one item in the menu according to their unknown
preference model, which maps their history of past items to relative selection
probabilities. The Recommender then observes the Agent's chosen item and
receives bandit feedback of the item's reward. In addition to optimizing reward
from selected items, the Recommender must also ensure that the total
distribution of chosen items has sufficiently high entropy.
We define a class of preference models which are locally learnable, i.e.,
behavior over the entire domain can be estimated by observing behavior only in
a small region; this includes models representable by bounded-degree
polynomials as well as functions with a sparse Fourier basis. For this class,
we give an algorithm for the Recommender which obtains $\tilde{O}(T^{3/4})$
regret against all item distributions satisfying two conditions: they are
sufficiently diversified, and they are instantaneously realizable at any
history by some distribution over menus. We show that these conditions are
closely connected: all sufficiently high-entropy distributions are
instantaneously realizable at any item history. We also give a set of negative
results justifying our assumptions, in the form of a runtime lower bound for
non-local learning and linear regret lower bounds for alternate benchmarks.
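As a minimal illustration of the interaction protocol above, the following sketch simulates one Recommender-Agent loop and measures the entropy of the chosen-item distribution. The softmax Agent model, the recency bonus, the fixed rewards, and the entropy target `gamma` are all illustrative assumptions; the paper's setting is adversarial and its algorithm is more involved.

```python
import numpy as np

rng = np.random.default_rng(0)
n, k, T = 10, 3, 1000          # n items, menus of size k, T rounds
gamma = 0.8 * np.log(n)        # assumed entropy threshold (illustrative)

# Hypothetical Agent: selection probabilities depend on fixed base
# preferences plus a bonus for recently chosen items (the "adaptive" part).
base_prefs = rng.normal(size=n)

def agent_choice(menu, history):
    recency = np.array([history[-20:].count(i) for i in menu])
    logits = base_prefs[menu] + 0.5 * recency
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    return rng.choice(menu, p=probs)

rewards = rng.uniform(size=n)  # fixed rewards for simplicity (paper: adversarial)
history, counts, total_reward = [], np.zeros(n), 0.0

for t in range(T):
    menu = rng.choice(n, size=k, replace=False)  # placeholder Recommender policy
    i = agent_choice(menu, history)
    total_reward += rewards[i]                   # bandit feedback: only the chosen item's reward
    history.append(i)
    counts[i] += 1

p_hat = counts / T
entropy = -np.sum(p_hat[p_hat > 0] * np.log(p_hat[p_hat > 0]))
print(f"avg reward {total_reward / T:.3f}, empirical entropy {entropy:.3f} (target >= {gamma:.3f})")
```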
Related papers
- Preference Diffusion for Recommendation [50.8692409346126]
We propose PreferDiff, a tailored optimization objective for DM-based recommenders.
PreferDiff transforms BPR into a log-likelihood ranking objective to better capture user preferences.
It is the first personalized ranking loss designed specifically for DM-based recommenders.
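As a hedged sketch of the "BPR to log-likelihood ranking" idea (not PreferDiff's exact objective, which is tailored to diffusion-based recommenders), compare pairwise BPR with a softmax log-likelihood over one positive and several negatives:

```python
import numpy as np

def bpr_loss(s_pos, s_neg):
    # Classic BPR: -log sigmoid(positive score - negative score)
    return -np.log(1.0 / (1.0 + np.exp(-(s_pos - s_neg))))

def log_likelihood_ranking_loss(s_pos, s_negs):
    # Softmax log-likelihood of the positive beating a set of negatives;
    # with a single negative this reduces to the BPR term above.
    scores = np.concatenate(([s_pos], s_negs))
    return -(s_pos - np.log(np.sum(np.exp(scores))))
```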
arXiv Detail & Related papers (2024-10-17T01:02:04Z)
- Robust Preference Optimization through Reward Model Distillation [68.65844394615702]
Language model (LM) post-training involves maximizing a reward function that is derived from preference annotations.
DPO is a popular offline alignment method that trains a policy directly on preference data without the need to train a reward model or apply reinforcement learning.
We analyze this phenomenon and propose distillation to get a better proxy for the true preference distribution over generation pairs.
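For context, the standard DPO objective on a single preference pair looks like the sketch below; the paper's distillation idea would instead fit the policy to soft targets from an explicit reward model, which is not shown here.

```python
import math

def dpo_loss(logp_w, logp_l, logp_ref_w, logp_ref_l, beta=0.1):
    # Standard DPO: -log sigmoid(beta * implicit reward margin),
    # where the implicit reward is the policy/reference log-ratio.
    margin = (logp_w - logp_ref_w) - (logp_l - logp_ref_l)
    return -math.log(1.0 / (1.0 + math.exp(-beta * margin)))
```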
arXiv Detail & Related papers (2024-05-29T17:39:48Z)
- Proxy-based Item Representation for Attribute and Context-aware Recommendation [8.669754546617293]
We propose a proxy-based item representation that allows each item to be expressed as a weighted sum of learnable proxy embeddings.
The proxy-based method calculates the item representations compositionally, ensuring each representation resides inside a well-trained simplex.
Our proposed method is a plug-and-play model that can replace the item encoding layer of any neural network-based recommendation model.
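A minimal sketch of the compositional idea: each item's representation is a convex combination (softmax weights) of shared proxy embeddings, so it lies inside the simplex spanned by the proxies. Names and dimensions below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n_items, n_proxies, dim = 1000, 32, 64

proxies = rng.normal(size=(n_proxies, dim))           # learnable proxy embeddings
item_logits = rng.normal(size=(n_items, n_proxies))   # learnable per-item weights

def item_representations():
    w = np.exp(item_logits - item_logits.max(axis=1, keepdims=True))
    w /= w.sum(axis=1, keepdims=True)                 # softmax -> weights on the simplex
    return w @ proxies                                # convex combination of proxies
```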
arXiv Detail & Related papers (2023-12-11T06:22:34Z)
- Thou Shalt not Pick all Items if Thou are First: of Strategyproof and Fair Picking Sequences [7.2834950390171205]
We study how to balance priority in the sequence and number of items received.
For several meaningful choices of parameters, we show that the optimal sequence can be computed in a simple way.
arXiv Detail & Related papers (2023-01-11T13:04:51Z)
- Recommendation Systems with Distribution-Free Reliability Guarantees [83.80644194980042]
We show how to return a set of items rigorously guaranteed to contain mostly good items.
Our procedure endows any ranking model with rigorous finite-sample control of the false discovery rate.
We evaluate our methods on the Yahoo! Learning to Rank and MSMarco datasets.
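One hedged illustration of finite-sample false-discovery-rate control in this spirit: grow the returned set in score order while an estimate of the false discovery proportion stays below a target level alpha. The estimator below (1 minus a calibrated probability of relevance) is an assumption for illustration, not the paper's exact procedure.

```python
import numpy as np

def select_with_fdr(scores, alpha=0.1):
    # scores[i]: calibrated probability that item i is "good".
    order = np.argsort(-scores)
    selected, expected_bad = [], 0.0
    for i in order:
        expected_bad += 1.0 - scores[i]   # estimated count of bad items if i is included
        if expected_bad / (len(selected) + 1) > alpha:
            break
        selected.append(i)
    return selected
```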
arXiv Detail & Related papers (2022-07-04T17:49:25Z)
- Models of human preference for learning reward functions [80.39289349661364]
We learn the reward function from human-generated preferences between pairs of trajectory segments.
Prior work assumes such preferences follow each segment's partial return; we find this assumption to be flawed and instead propose modeling human preferences as informed by each segment's regret.
Our proposed regret preference model better predicts real human preferences and also learns reward functions from these preferences that lead to policies that are better human-aligned.
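Schematically, both the common partial-return model and the proposed regret-based model can be written as logistic (Bradley-Terry-style) choice rules over a per-segment statistic; the temperature and the exact regret definition below are assumptions for illustration:

```python
import math

def pref_prob_partial_return(ret1, ret2, temp=1.0):
    # Common assumption: preference follows the segments' summed rewards.
    return 1.0 / (1.0 + math.exp(-(ret1 - ret2) / temp))

def pref_prob_regret(regret1, regret2, temp=1.0):
    # Regret-based alternative: lower-regret segments are preferred.
    return 1.0 / (1.0 + math.exp(-(regret2 - regret1) / temp))
```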
arXiv Detail & Related papers (2022-06-05T17:58:02Z)
- Set2setRank: Collaborative Set to Set Ranking for Implicit Feedback based Recommendation [59.183016033308014]
In this paper, we explore the unique characteristics of the implicit feedback and propose Set2setRank framework for recommendation.
Our proposed framework is model-agnostic and can be easily applied to most recommendation prediction approaches.
arXiv Detail & Related papers (2021-05-16T08:06:22Z)
- Dynamic-K Recommendation with Personalized Decision Boundary [41.70842736417849]
We develop a dynamic-K recommendation task as a joint learning problem with both ranking and classification objectives.
We extend two state-of-the-art ranking-based recommendation methods, i.e., BPRMF and HRM, to the corresponding dynamic-K versions.
Our experimental results on two datasets show that the dynamic-K models are more effective than the original fixed-N recommendation methods.
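The core idea, sketched with an assumed per-user threshold (the paper learns this decision boundary jointly with the ranking objective): instead of returning a fixed top-N list, return every item whose score clears the personalized boundary, so the list length K varies per user.

```python
import numpy as np

def dynamic_k_recommend(scores, user_threshold):
    # scores: ranking scores for all candidate items for one user.
    # user_threshold: learned per-user decision boundary (assumed given here).
    order = np.argsort(-scores)
    return [i for i in order if scores[i] >= user_threshold]
```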
arXiv Detail & Related papers (2020-12-25T13:02:57Z)
- Adaptive Cascade Submodular Maximization [19.29174615532181]
We study the cascade submodular problem under the adaptive setting.
Our objective is to identify the best sequence of selecting items so as to maximize the expected utility of the selected items.
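For intuition, the standard adaptive greedy heuristic for this kind of problem selects, at each position, the item with the largest expected marginal gain given the outcomes observed so far. This is a generic sketch, not necessarily the paper's algorithm, and both oracles are assumptions:

```python
def adaptive_greedy(items, expected_marginal_gain, observe, budget):
    # expected_marginal_gain(item, observations): expected utility increase (assumed oracle).
    # observe(item): realized outcome revealed after selecting the item (assumed oracle).
    chosen, observations = [], {}
    for _ in range(budget):
        remaining = [i for i in items if i not in chosen]
        if not remaining:
            break
        best = max(remaining, key=lambda i: expected_marginal_gain(i, observations))
        chosen.append(best)
        observations[best] = observe(best)
    return chosen
```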
arXiv Detail & Related papers (2020-07-07T16:21:56Z)
- SetRank: A Setwise Bayesian Approach for Collaborative Ranking from Implicit Feedback [50.13745601531148]
We propose a novel setwise Bayesian approach for collaborative ranking, namely SetRank, to accommodate the characteristics of implicit feedback in recommender system.
Specifically, SetRank aims at maximizing the posterior probability of novel setwise preference comparisons.
We also present a theoretical analysis of SetRank showing that the excess risk bound is proportional to $\sqrt{M/N}$.
arXiv Detail & Related papers (2020-02-23T06:40:48Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this information and is not responsible for any consequences of its use.