Diversified Recommendations for Agents with Adaptive Preferences
- URL: http://arxiv.org/abs/2210.07773v1
- Date: Tue, 20 Sep 2022 16:12:22 GMT
- Title: Diversified Recommendations for Agents with Adaptive Preferences
- Authors: Arpit Agarwal, William Brown
- Abstract summary: We consider a problem where an Agent visits a platform recommending a menu of content to select from; their choice of item depends not only on fixed preferences, but also on their prior engagements with the platform.
The Recommender presents a menu of $k$ items to the Agent, who selects one item in the menu according to their unknown preference model.
The Recommender then observes the Agent's chosen item and receives bandit feedback of the item's reward.
In addition to optimizing reward from selected items, the Recommender must also ensure that the total distribution of chosen items has sufficiently high entropy.
- Score: 9.578114969867258
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: When an Agent visits a platform recommending a menu of content to select
from, their choice of item depends not only on fixed preferences, but also on
their prior engagements with the platform. The Recommender's primary objective
is typically to encourage content consumption which optimizes some reward, such
as ad revenue, but they often also aim to ensure that a wide variety of content
is consumed by the Agent over time. We formalize this problem as an adversarial
bandit task. At each step, the Recommender presents a menu of $k$ (out of $n$)
items to the Agent, who selects one item in the menu according to their unknown
preference model, which maps their history of past items to relative selection
probabilities. The Recommender then observes the Agent's chosen item and
receives bandit feedback of the item's reward. In addition to optimizing reward
from selected items, the Recommender must also ensure that the total
distribution of chosen items has sufficiently high entropy.
We define a class of preference models which are locally learnable, i.e.,
behavior over the entire domain can be estimated by observing behavior only in
a small region; this includes models representable by bounded-degree
polynomials as well as functions with a sparse Fourier basis. For this class,
we give an algorithm for the Recommender which obtains $\tilde{O}(T^{3/4})$
regret against all item distributions satisfying two conditions: they are
sufficiently diversified, and they are instantaneously realizable at any
history by some distribution over menus. We show that these conditions are
closely connected: all sufficiently high-entropy distributions are
instantaneously realizable at any item history. We also give a set of negative
results justifying our assumptions, in the form of a runtime lower bound for
non-local learning and linear regret lower bounds for alternate benchmarks.
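As a minimal illustration of the interaction protocol above, the following sketch simulates one Recommender-Agent loop and measures the entropy of the chosen-item distribution. The softmax Agent model, the recency bonus, the fixed rewards, and the entropy target `gamma` are all illustrative assumptions; the paper's setting is adversarial and its algorithm is more involved.

```python
import numpy as np

rng = np.random.default_rng(0)
n, k, T = 10, 3, 1000          # n items, menus of size k, T rounds
gamma = 0.8 * np.log(n)        # assumed entropy threshold (illustrative)

# Hypothetical Agent: selection probabilities depend on fixed base
# preferences plus a bonus for recently chosen items (the "adaptive" part).
base_prefs = rng.normal(size=n)

def agent_choice(menu, history):
    recency = np.array([history[-20:].count(i) for i in menu])
    logits = base_prefs[menu] + 0.5 * recency
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    return rng.choice(menu, p=probs)

rewards = rng.uniform(size=n)  # fixed rewards for simplicity (paper: adversarial)
history, counts, total_reward = [], np.zeros(n), 0.0

for t in range(T):
    menu = rng.choice(n, size=k, replace=False)  # placeholder Recommender policy
    i = agent_choice(menu, history)
    total_reward += rewards[i]                   # bandit feedback: only the chosen item's reward
    history.append(i)
    counts[i] += 1

p_hat = counts / T
entropy = -np.sum(p_hat[p_hat > 0] * np.log(p_hat[p_hat > 0]))
print(f"avg reward {total_reward / T:.3f}, empirical entropy {entropy:.3f} (target >= {gamma:.3f})")
```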
Related papers
- Preference Diffusion for Recommendation [50.8692409346126]
We propose PreferDiff, a tailored optimization objective for DM-based recommenders.
PreferDiff transforms BPR into a log-likelihood ranking objective to better capture user preferences.
It is the first personalized ranking loss designed specifically for DM-based recommenders.
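As a hedged sketch of the "BPR to log-likelihood ranking" idea (not PreferDiff's exact objective, which is tailored to diffusion-based recommenders), compare pairwise BPR with a softmax log-likelihood over one positive and several negatives:

```python
import numpy as np

def bpr_loss(s_pos, s_neg):
    # Classic BPR: -log sigmoid(positive score - negative score)
    return -np.log(1.0 / (1.0 + np.exp(-(s_pos - s_neg))))

def log_likelihood_ranking_loss(s_pos, s_negs):
    # Softmax log-likelihood of the positive beating a set of negatives;
    # with a single negative this reduces to the BPR term above.
    scores = np.concatenate(([s_pos], s_negs))
    return -(s_pos - np.log(np.sum(np.exp(scores))))
```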
arXiv Detail & Related papers (2024-10-17T01:02:04Z)
- Robust Preference Optimization through Reward Model Distillation [68.65844394615702]
Language model (LM) post-training involves maximizing a reward function that is derived from preference annotations.
DPO is a popular offline alignment method that trains a policy directly on preference data without the need to train a reward model or apply reinforcement learning.
We analyze this phenomenon and propose distillation to get a better proxy for the true preference distribution over generation pairs.
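For context, the standard DPO objective on a single preference pair looks like the sketch below; the paper's distillation idea would instead fit the policy to soft targets from an explicit reward model, which is not shown here.

```python
import math

def dpo_loss(logp_w, logp_l, logp_ref_w, logp_ref_l, beta=0.1):
    # Standard DPO: -log sigmoid(beta * implicit reward margin),
    # where the implicit reward is the policy/reference log-ratio.
    margin = (logp_w - logp_ref_w) - (logp_l - logp_ref_l)
    return -math.log(1.0 / (1.0 + math.exp(-beta * margin)))
```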
arXiv Detail & Related papers (2024-05-29T17:39:48Z)
- Proxy-based Item Representation for Attribute and Context-aware Recommendation [8.669754546617293]
We propose a proxy-based item representation that allows each item to be expressed as a weighted sum of learnable proxy embeddings.
The proxy-based method calculates the item representations compositionally, ensuring each representation resides inside a well-trained simplex.
Our proposed method is a plug-and-play model that can replace the item encoding layer of any neural network-based recommendation model.
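A minimal sketch of the compositional idea: each item's representation is a convex combination (softmax weights) of shared proxy embeddings, so it lies inside the simplex spanned by the proxies. Names and dimensions below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n_items, n_proxies, dim = 1000, 32, 64

proxies = rng.normal(size=(n_proxies, dim))           # learnable proxy embeddings
item_logits = rng.normal(size=(n_items, n_proxies))   # learnable per-item weights

def item_representations():
    w = np.exp(item_logits - item_logits.max(axis=1, keepdims=True))
    w /= w.sum(axis=1, keepdims=True)                 # softmax -> weights on the simplex
    return w @ proxies                                # convex combination of proxies
```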
arXiv Detail & Related papers (2023-12-11T06:22:34Z)
- Thou Shalt not Pick all Items if Thou are First: of Strategyproof and Fair Picking Sequences [7.2834950390171205]
We study how to balance priority in the sequence and number of items received.
For several meaningful choices of parameters, we show that the optimal sequence can be computed in a simple way.
arXiv Detail & Related papers (2023-01-11T13:04:51Z)
- Recommendation Systems with Distribution-Free Reliability Guarantees [83.80644194980042]
We show how to return a set of items rigorously guaranteed to contain mostly good items.
Our procedure endows any ranking model with rigorous finite-sample control of the false discovery rate.
We evaluate our methods on the Yahoo! Learning to Rank and MSMarco datasets.
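One hedged illustration of finite-sample false-discovery-rate control in this spirit: grow the returned set in score order while an estimate of the false discovery proportion stays below a target level alpha. The estimator below (1 minus a calibrated probability of relevance) is an assumption for illustration, not the paper's exact procedure.

```python
import numpy as np

def select_with_fdr(scores, alpha=0.1):
    # scores[i]: calibrated probability that item i is "good".
    order = np.argsort(-scores)
    selected, expected_bad = [], 0.0
    for i in order:
        expected_bad += 1.0 - scores[i]   # estimated count of bad items if i is included
        if expected_bad / (len(selected) + 1) > alpha:
            break
        selected.append(i)
    return selected
```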
arXiv Detail & Related papers (2022-07-04T17:49:25Z)
- Models of human preference for learning reward functions [80.39289349661364]
We learn the reward function from human-generated preferences between pairs of trajectory segments.
Prior work assumes such preferences follow each segment's partial return; we find this assumption to be flawed and instead propose modeling human preferences as informed by each segment's regret.
Our proposed regret preference model better predicts real human preferences and also learns reward functions from these preferences that lead to policies that are better human-aligned.
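Schematically, both the common partial-return model and the proposed regret-based model can be written as logistic (Bradley-Terry-style) choice rules over a per-segment statistic; the temperature and the exact regret definition below are assumptions for illustration:

```python
import math

def pref_prob_partial_return(ret1, ret2, temp=1.0):
    # Common assumption: preference follows the segments' summed rewards.
    return 1.0 / (1.0 + math.exp(-(ret1 - ret2) / temp))

def pref_prob_regret(regret1, regret2, temp=1.0):
    # Regret-based alternative: lower-regret segments are preferred.
    return 1.0 / (1.0 + math.exp(-(regret2 - regret1) / temp))
```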
arXiv Detail & Related papers (2022-06-05T17:58:02Z)
- Set2setRank: Collaborative Set to Set Ranking for Implicit Feedback based Recommendation [59.183016033308014]
In this paper, we explore the unique characteristics of the implicit feedback and propose Set2setRank framework for recommendation.
Our proposed framework is model-agnostic and can be easily applied to most recommendation prediction approaches.
arXiv Detail & Related papers (2021-05-16T08:06:22Z)
- Dynamic-K Recommendation with Personalized Decision Boundary [41.70842736417849]
We develop a dynamic-K recommendation task as a joint learning problem with both ranking and classification objectives.
We extend two state-of-the-art ranking-based recommendation methods, i.e., BPRMF and HRM, to the corresponding dynamic-K versions.
Our experimental results on two datasets show that the dynamic-K models are more effective than the original fixed-N recommendation methods.
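The core idea, sketched with an assumed per-user threshold (the paper learns this decision boundary jointly with the ranking objective): instead of returning a fixed top-N list, return every item whose score clears the personalized boundary, so the list length K varies per user.

```python
import numpy as np

def dynamic_k_recommend(scores, user_threshold):
    # scores: ranking scores for all candidate items for one user.
    # user_threshold: learned per-user decision boundary (assumed given here).
    order = np.argsort(-scores)
    return [i for i in order if scores[i] >= user_threshold]
```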
arXiv Detail & Related papers (2020-12-25T13:02:57Z)
- Adaptive Cascade Submodular Maximization [19.29174615532181]
We study the cascade submodular problem under the adaptive setting.
Our objective is to identify the best sequence of selecting items so as to maximize the expected utility of the selected items.
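For intuition, the standard adaptive greedy heuristic for this kind of problem selects, at each position, the item with the largest expected marginal gain given the outcomes observed so far. This is a generic sketch, not necessarily the paper's algorithm, and both oracles are assumptions:

```python
def adaptive_greedy(items, expected_marginal_gain, observe, budget):
    # expected_marginal_gain(item, observations): expected utility increase (assumed oracle).
    # observe(item): realized outcome revealed after selecting the item (assumed oracle).
    chosen, observations = [], {}
    for _ in range(budget):
        remaining = [i for i in items if i not in chosen]
        if not remaining:
            break
        best = max(remaining, key=lambda i: expected_marginal_gain(i, observations))
        chosen.append(best)
        observations[best] = observe(best)
    return chosen
```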
arXiv Detail & Related papers (2020-07-07T16:21:56Z)
- SetRank: A Setwise Bayesian Approach for Collaborative Ranking from Implicit Feedback [50.13745601531148]
We propose a novel setwise Bayesian approach for collaborative ranking, namely SetRank, to accommodate the characteristics of implicit feedback in recommender system.
Specifically, SetRank aims at maximizing the posterior probability of novel setwise preference comparisons.
We also present a theoretical analysis of SetRank showing that the excess risk bound is proportional to $\sqrt{M/N}$.
arXiv Detail & Related papers (2020-02-23T06:40:48Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this information and is not responsible for any consequences of its use.