Learning with Exposure Constraints in Recommendation Systems
- URL: http://arxiv.org/abs/2302.01377v2
- Date: Fri, 10 Nov 2023 09:59:16 GMT
- Title: Learning with Exposure Constraints in Recommendation Systems
- Authors: Omer Ben-Porat and Rotem Torkan
- Abstract summary: We propose a contextual multi-armed bandit setting to model the dependency of content providers on exposure.
We develop algorithms with sub-linear regret, as well as a lower bound that demonstrates that our algorithms are optimal up to logarithmic factors.
- Score: 7.397067779113841
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recommendation systems are dynamic economic systems that balance the needs of
multiple stakeholders. A recent line of work studies incentives from the
content providers' point of view. Content providers, e.g., vloggers and
bloggers, contribute fresh content and rely on user engagement to create
revenue and finance their operations. In this work, we propose a contextual
multi-armed bandit setting to model the dependency of content providers on
exposure. In our model, the system receives a user context in every round and
has to select one of the arms. Every arm is a content provider who must receive
a minimum number of pulls every fixed time period (e.g., a month) to remain
viable in later rounds; otherwise, the arm departs and is no longer available.
The system aims to maximize the users' (content consumers) welfare. To that
end, it should learn which arms are vital and ensure they remain viable by
subsidizing arm pulls if needed. We develop algorithms with sub-linear regret,
as well as a lower bound that demonstrates that our algorithms are optimal up
to logarithmic factors.
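The page carries no code, but the setting in the abstract is straightforward to simulate. Below is a minimal, illustrative sketch, not the authors' algorithm: user contexts are omitted, rewards are Bernoulli with unknown means, each arm must receive MIN_PULLS pulls per PERIOD rounds to stay viable, and a UCB-style learner subsidizes pulls whenever the rounds left in the period barely cover the outstanding quotas. All constants (K, T, PERIOD, MIN_PULLS) are assumed for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

K, T = 5, 10_000                  # arms (content providers) and horizon -- toy values
PERIOD, MIN_PULLS = 500, 40       # assumed exposure constraint: pulls needed per period

mu = rng.uniform(0.2, 0.8, size=K)    # unknown mean rewards
active = np.ones(K, dtype=bool)       # providers still on the platform
pulls = np.zeros(K)
rewards = np.zeros(K)
pulls_this_period = np.zeros(K)

for t in range(1, T + 1):
    # UCB1-style index over active arms (the paper's algorithms differ in detail).
    ucb = np.full(K, -np.inf)
    for a in np.flatnonzero(active):
        ucb[a] = np.inf if pulls[a] == 0 else (
            rewards[a] / pulls[a] + np.sqrt(2 * np.log(t) / pulls[a]))

    # Subsidize: once the rounds left in the period barely cover the total
    # outstanding quota, force a pull for the most under-exposed arm.
    rounds_left = PERIOD - (t - 1) % PERIOD
    deficit = np.where(active, np.maximum(MIN_PULLS - pulls_this_period, 0), 0)
    a = int(np.argmax(deficit)) if deficit.sum() >= rounds_left else int(np.argmax(ucb))

    r = float(rng.random() < mu[a])            # Bernoulli reward
    pulls[a] += 1; rewards[a] += r; pulls_this_period[a] += 1

    if t % PERIOD == 0:                        # end of period: under-exposed arms depart
        active &= pulls_this_period >= MIN_PULLS
        pulls_this_period[:] = 0

print("surviving arms:", np.flatnonzero(active), "| avg reward:", rewards.sum() / T)
```

With this subsidy rule every arm stays viable by construction; the paper's harder question, which the sketch does not address, is learning which arms are vital enough to be worth subsidizing at all.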
Related papers
- Algorithmic Content Selection and the Impact of User Disengagement [19.14804091327051]
We introduce a model for the content selection problem where dissatisfied users may disengage.
We show that when each arm's expected reward and its effect on user satisfaction are linearly related, an optimal content selection policy can be computed efficiently.
arXiv Detail & Related papers (2024-10-17T00:43:06Z)
- Multi-agent Multi-armed Bandits with Stochastic Sharable Arm Capacities [69.34646544774161]
We formulate a new variant of the multi-player multi-armed bandit (MAB) model that captures the arrival of requests to each arm and the policy for allocating requests to players.
The challenge is how to design a distributed learning algorithm such that players select arms according to the optimal arm pulling profile.
We design an iterative distributed algorithm, which guarantees that players can arrive at a consensus on the optimal arm pulling profile in only M rounds.
arXiv Detail & Related papers (2024-08-20T13:57:00Z)
- Preferences Evolve And So Should Your Bandits: Bandits with Evolving States for Online Platforms [12.368291979686122]
We propose a model for learning with bandit feedback while accounting for deterministically evolving and unobservable states.
The workhorse applications of our model are learning for recommendation systems and learning for online ads.
arXiv Detail & Related papers (2023-07-21T15:43:32Z)
- Incentive-Aware Recommender Systems in Two-Sided Markets [49.692453629365204]
We propose a novel recommender system that aligns with agents' incentives while achieving myopically optimal performance.
Our framework models this incentive-aware system as a multi-agent bandit problem in two-sided markets.
The proposed algorithms satisfy an ex-post fairness criterion, which protects agents from over-exploitation.
arXiv Detail & Related papers (2022-11-23T22:20:12Z)
- Information-Gathering in Latent Bandits [79.6953033727455]
We present a method for information-gathering in latent bandits.
We show that greedily choosing the arm that appears best under the agent's beliefs about the latent states incurs higher regret than a policy that also gathers information about the state.
We also show that by choosing arms carefully, we obtain an improved estimation of the state distribution.
arXiv Detail & Related papers (2022-07-08T01:15:12Z)
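As a hedged illustration of the idea in this entry (not the paper's method), the sketch below keeps a Bayesian belief over two latent user states with known per-state Bernoulli arm means. It plays the myopically best arm under the belief, except that while the belief is still uncertain it pulls the arm whose reward distribution separates the states most. The state/arm matrix mu and the entropy threshold are assumed toy values.

```python
import numpy as np

rng = np.random.default_rng(1)

# Assumed toy model: 2 latent user states, 3 arms, known per-state Bernoulli means.
mu = np.array([[0.7, 0.5, 0.2],
               [0.6, 0.5, 0.9]])     # mu[s, a]
true_state = 1
belief = np.array([0.5, 0.5])        # prior over latent states

def entropy(p):
    p = p[p > 0]
    return -(p * np.log(p)).sum()

for t in range(200):
    if entropy(belief) > 0.4:
        # Information-gathering step: pick the arm whose reward distribution
        # differs most across states (most informative about the latent state).
        a = int(np.argmax(mu.max(axis=0) - mu.min(axis=0)))
    else:
        # Exploit: best arm under the current belief.
        a = int(np.argmax(belief @ mu))

    r = rng.random() < mu[true_state, a]
    # Bayesian update of the belief from the Bernoulli observation.
    like = mu[:, a] if r else 1 - mu[:, a]
    belief = belief * like
    belief /= belief.sum()

print("final belief:", belief.round(3), "-> best arm:", int(np.argmax(belief @ mu)))
```

Under the uniform prior the myopic choice is arm 0, which is suboptimal in state 1; the information-gathering pulls on arm 2 resolve the state quickly, which is the entry's point.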
- Modeling Content Creator Incentives on Algorithm-Curated Platforms [76.53541575455978]
We study how algorithmic choices affect the existence and character of (Nash) equilibria in exposure games.
We propose tools for numerically finding equilibria in exposure games, and illustrate results of an audit on the MovieLens and LastFM datasets.
arXiv Detail & Related papers (2022-06-27T08:16:59Z)
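A rough numerical sketch in the spirit of this entry, with every detail assumed for illustration: producers choose unit-norm content vectors in the plane, each user splits exposure across producers by a softmax over inner products, and each producer maximizes total exposure. Best-response dynamics over a discretized strategy set stop when no producer wants to deviate, i.e., at an approximate pure Nash equilibrium of the exposure game.

```python
import numpy as np

rng = np.random.default_rng(2)

# Assumed toy exposure game: producers choose unit vectors in R^2;
# user j grants producer i exposure proportional to softmax_i(TAU * <u_j, s_i>).
U = rng.normal(size=(50, 2)); U /= np.linalg.norm(U, axis=1, keepdims=True)
TAU, N_PROD = 5.0, 3
angles = rng.uniform(0, 2 * np.pi, N_PROD)       # producer strategies (angles)

def exposure(angles):
    S = np.stack([np.cos(angles), np.sin(angles)], axis=1)   # (N_PROD, 2)
    logits = TAU * U @ S.T                                   # (users, producers)
    w = np.exp(logits - logits.max(axis=1, keepdims=True))
    return (w / w.sum(axis=1, keepdims=True)).sum(axis=0)    # exposure per producer

grid = np.linspace(0, 2 * np.pi, 360, endpoint=False)
for _ in range(50):                       # best-response dynamics
    moved = False
    for i in range(N_PROD):
        payoffs = []
        for a in grid:                    # grid search over producer i's strategy
            trial = angles.copy(); trial[i] = a
            payoffs.append(exposure(trial)[i])
        best = grid[int(np.argmax(payoffs))]
        if abs(best - angles[i]) > 1e-9:
            angles[i], moved = best, True
    if not moved:                         # no profitable deviation: approx. Nash
        break

print("equilibrium angles (deg):", np.degrees(angles).round(1))
```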
- Incentivizing Combinatorial Bandit Exploration [87.08827496301839]
Consider a bandit algorithm that recommends actions to self-interested users in a recommendation system.
Users are free to choose other actions and need to be incentivized to follow the algorithm's recommendations.
While the users prefer to exploit, the algorithm can incentivize them to explore by leveraging the information collected from the previous users.
arXiv Detail & Related papers (2022-06-01T13:46:25Z)
- Incentivized Bandit Learning with Self-Reinforcing User Preferences [9.233886766950054]
We investigate a new multi-armed bandit (MAB) online learning model that considers real-world phenomena in many recommender systems.
We propose two MAB policies termed "At-Least-$n$ Explore-Then-Commit" and "UCB-List".
We prove that both policies achieve $O(\log T)$ expected regret with $O(\log T)$ expected payment over a time horizon $T$.
arXiv Detail & Related papers (2021-05-19T01:06:32Z)
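The entry names the policies without their details; below is a minimal sketch of the generic "explore every arm at least $n$ times, then commit" idea under plain Bernoulli rewards, with the incentive payments and the self-reinforcing preference dynamics omitted. Setting n proportional to log T mirrors the $O(\log T)$ bounds quoted above but is otherwise an assumption.

```python
import math
import numpy as np

rng = np.random.default_rng(3)

K, T = 4, 5000
mu = rng.uniform(0.1, 0.9, size=K)        # unknown Bernoulli means (toy)
n = max(1, math.ceil(math.log(T)))        # pulls per arm in the explore phase (assumed)

pulls = np.zeros(K); sums = np.zeros(K)
commit = None
for t in range(T):
    if t < K * n:
        a = t % K                         # explore: every arm at least n times
    else:
        if commit is None:                # then commit to the empirically best arm
            commit = int(np.argmax(sums / pulls))
        a = commit
    r = float(rng.random() < mu[a])
    pulls[a] += 1; sums[a] += r

print("true best arm:", int(np.argmax(mu)), "| committed arm:", commit)
```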
- Incentivizing Exploration in Linear Bandits under Information Gap [50.220743323750035]
We study the problem of incentivizing exploration for myopic users in linear bandits.
In order to maximize the long-term reward, the system offers compensation to incentivize the users to pull the exploratory arms.
arXiv Detail & Related papers (2021-04-08T16:01:56Z)
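A toy sketch of the compensation mechanic this entry describes, not the paper's algorithm: a ridge-regression estimate drives both the user's myopic choice and the system's exploratory target (here, the arm with the largest confidence width), and the system pays the estimated reward gap whenever it asks the user to deviate from the myopic arm. The dimensions, noise level, and exploration threshold are assumptions.

```python
import numpy as np

rng = np.random.default_rng(4)

d, K, T, EPS = 3, 10, 2000, 0.02           # toy sizes; EPS = exploration threshold
theta = rng.normal(size=d); theta /= np.linalg.norm(theta)   # unknown parameter
X = rng.normal(size=(K, d))                                  # fixed arm features

A = np.eye(d); b = np.zeros(d)             # ridge-regression statistics
paid = 0.0
for t in range(T):
    A_inv = np.linalg.inv(A)
    est = X @ (A_inv @ b)                              # estimated rewards per arm
    width = np.einsum('kd,dc,kc->k', X, A_inv, X)      # confidence widths x^T A^{-1} x
    if width.max() > EPS:
        # Ask for an exploratory pull; compensate the myopic user for the
        # estimated reward they give up by not pulling their favourite arm.
        a = int(np.argmax(width))
        paid += max(0.0, est.max() - est[a])
    else:
        a = int(np.argmax(est))            # estimates are tight: recommend myopically
    r = X[a] @ theta + rng.normal(scale=0.1)
    A += np.outer(X[a], X[a]); b += r * X[a]

print(f"total compensation paid: {paid:.2f}")
```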
- Optimizing Long-term Social Welfare in Recommender Systems: A Constrained Matching Approach [36.54379845220444]
We study settings in which content providers cannot remain viable unless they receive a certain level of user engagement.
Our model ensures the system reaches an equilibrium with maximal social welfare supported by a sufficiently diverse set of viable providers.
We draw connections to various notions of user regret and fairness, arguing that these outcomes are fairer in a utilitarian sense.
arXiv Detail & Related papers (2020-07-31T22:40:47Z)
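This last entry is the closest precursor to the headline paper. As a hedged illustration of the "constrained matching" viewpoint (sizes and utilities below are assumptions, not from the paper), the one-shot welfare problem can be written as a linear program: split each user's engagement across providers to maximize total utility, subject to every provider receiving the minimum engagement it needs to remain viable.

```python
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(5)

N_USERS, N_PROV, MIN_ENGAGEMENT = 30, 4, 5     # assumed toy sizes (feasible: 30 >= 4*5)
u = rng.uniform(size=(N_USERS, N_PROV))        # user-provider utilities

# Variables x[i, j] (flattened): fraction of user i's engagement given to provider j.
c = -u.ravel()                                 # linprog minimizes, so negate utility

# Each user's engagement sums to 1.
A_eq = np.zeros((N_USERS, N_USERS * N_PROV))
for i in range(N_USERS):
    A_eq[i, i * N_PROV:(i + 1) * N_PROV] = 1.0
b_eq = np.ones(N_USERS)

# Each provider receives at least MIN_ENGAGEMENT total engagement (viability).
A_ub = np.zeros((N_PROV, N_USERS * N_PROV))
for j in range(N_PROV):
    A_ub[j, j::N_PROV] = -1.0
b_ub = -MIN_ENGAGEMENT * np.ones(N_PROV)

res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=(0, 1))
x = res.x.reshape(N_USERS, N_PROV)
print("welfare:", -res.fun, "| engagement per provider:", x.sum(axis=0).round(2))
```

Making the viability constraints optional (e.g., with integer variables selecting a provider set) recovers the question of which providers are worth supporting at all.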
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information it provides and is not responsible for any consequences arising from its use.