Learning with Exposure Constraints in Recommendation Systems
- URL: http://arxiv.org/abs/2302.01377v2
- Date: Fri, 10 Nov 2023 09:59:16 GMT
- Title: Learning with Exposure Constraints in Recommendation Systems
- Authors: Omer Ben-Porat and Rotem Torkan
- Abstract summary: We propose a contextual multi-armed bandit setting to model the dependency of content providers on exposure.
We develop algorithms with sub-linear regret, as well as a lower bound that demonstrates that our algorithms are optimal up to logarithmic factors.
- Score: 7.397067779113841
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recommendation systems are dynamic economic systems that balance the needs of
multiple stakeholders. A recent line of work studies incentives from the
content providers' point of view. Content providers, e.g., vloggers and
bloggers, contribute fresh content and rely on user engagement to create
revenue and finance their operations. In this work, we propose a contextual
multi-armed bandit setting to model the dependency of content providers on
exposure. In our model, the system receives a user context in every round and
has to select one of the arms. Every arm is a content provider who must receive
a minimum number of pulls every fixed time period (e.g., a month) to remain
viable in later rounds; otherwise, the arm departs and is no longer available.
The system aims to maximize the users' (content consumers) welfare. To that
end, it should learn which arms are vital and ensure they remain viable by
subsidizing arm pulls if needed. We develop algorithms with sub-linear regret,
as well as a lower bound that demonstrates that our algorithms are optimal up
to logarithmic factors.
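The page carries no code, but the setting in the abstract is straightforward to simulate. Below is a minimal, illustrative sketch, not the authors' algorithm: user contexts are omitted, rewards are Bernoulli with unknown means, each arm must receive MIN_PULLS pulls per PERIOD rounds to stay viable, and a UCB-style learner subsidizes pulls whenever the rounds left in the period barely cover the outstanding quotas. All constants (K, T, PERIOD, MIN_PULLS) are assumed for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

K, T = 5, 10_000                  # arms (content providers) and horizon -- toy values
PERIOD, MIN_PULLS = 500, 40       # assumed exposure constraint: pulls needed per period

mu = rng.uniform(0.2, 0.8, size=K)    # unknown mean rewards
active = np.ones(K, dtype=bool)       # providers still on the platform
pulls = np.zeros(K)
rewards = np.zeros(K)
pulls_this_period = np.zeros(K)

for t in range(1, T + 1):
    # UCB1-style index over active arms (the paper's algorithms differ in detail).
    ucb = np.full(K, -np.inf)
    for a in np.flatnonzero(active):
        ucb[a] = np.inf if pulls[a] == 0 else (
            rewards[a] / pulls[a] + np.sqrt(2 * np.log(t) / pulls[a]))

    # Subsidize: once the rounds left in the period barely cover the total
    # outstanding quota, force a pull for the most under-exposed arm.
    rounds_left = PERIOD - (t - 1) % PERIOD
    deficit = np.where(active, np.maximum(MIN_PULLS - pulls_this_period, 0), 0)
    a = int(np.argmax(deficit)) if deficit.sum() >= rounds_left else int(np.argmax(ucb))

    r = float(rng.random() < mu[a])            # Bernoulli reward
    pulls[a] += 1; rewards[a] += r; pulls_this_period[a] += 1

    if t % PERIOD == 0:                        # end of period: under-exposed arms depart
        active &= pulls_this_period >= MIN_PULLS
        pulls_this_period[:] = 0

print("surviving arms:", np.flatnonzero(active), "| avg reward:", rewards.sum() / T)
```

With this subsidy rule every arm stays viable by construction; the paper's harder question, which the sketch does not address, is learning which arms are vital enough to be worth subsidizing at all.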
Related papers
- Algorithmic Content Selection and the Impact of User Disengagement [19.14804091327051]
We introduce a model for the content selection problem where dissatisfied users may disengage.
We show that when each arm's expected reward and its effect on user satisfaction are linearly related, an optimal content selection policy can be computed efficiently.
arXiv Detail & Related papers (2024-10-17T00:43:06Z)
- Multi-agent Multi-armed Bandits with Stochastic Sharable Arm Capacities [69.34646544774161]
We formulate a new variant of the multi-player multi-armed bandit (MAB) model that captures the arrival of requests to each arm and the policy for allocating requests to players.
The challenge is how to design a distributed learning algorithm such that players select arms according to the optimal arm pulling profile.
We design an iterative distributed algorithm, which guarantees that players can arrive at a consensus on the optimal arm pulling profile in only M rounds.
arXiv Detail & Related papers (2024-08-20T13:57:00Z)
- Preferences Evolve And So Should Your Bandits: Bandits with Evolving States for Online Platforms [12.368291979686122]
We propose a model for learning with bandit feedback while accounting for deterministically evolving and unobservable states.
The workhorse applications of our model are learning for recommendation systems and learning for online ads.
arXiv Detail & Related papers (2023-07-21T15:43:32Z)
- Incentive-Aware Recommender Systems in Two-Sided Markets [49.692453629365204]
We propose a novel recommender system that aligns with agents' incentives while achieving myopically optimal performance.
Our framework models this incentive-aware system as a multi-agent bandit problem in two-sided markets.
The proposed algorithms satisfy an ex-post fairness criterion, which protects agents from over-exploitation.
arXiv Detail & Related papers (2022-11-23T22:20:12Z)
- Information-Gathering in Latent Bandits [79.6953033727455]
We present a method for information-gathering in latent bandits.
We show that greedily choosing the arm that appears best under the agent's beliefs about the latent states incurs higher regret than a policy that also gathers information about the state.
We also show that by choosing arms carefully, we obtain an improved estimation of the state distribution.
arXiv Detail & Related papers (2022-07-08T01:15:12Z)
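As a hedged illustration of the idea in this entry (not the paper's method), the sketch below keeps a Bayesian belief over two latent user states with known per-state Bernoulli arm means. It plays the myopically best arm under the belief, except that while the belief is still uncertain it pulls the arm whose reward distribution separates the states most. The state/arm matrix mu and the entropy threshold are assumed toy values.

```python
import numpy as np

rng = np.random.default_rng(1)

# Assumed toy model: 2 latent user states, 3 arms, known per-state Bernoulli means.
mu = np.array([[0.7, 0.5, 0.2],
               [0.6, 0.5, 0.9]])     # mu[s, a]
true_state = 1
belief = np.array([0.5, 0.5])        # prior over latent states

def entropy(p):
    p = p[p > 0]
    return -(p * np.log(p)).sum()

for t in range(200):
    if entropy(belief) > 0.4:
        # Information-gathering step: pick the arm whose reward distribution
        # differs most across states (most informative about the latent state).
        a = int(np.argmax(mu.max(axis=0) - mu.min(axis=0)))
    else:
        # Exploit: best arm under the current belief.
        a = int(np.argmax(belief @ mu))

    r = rng.random() < mu[true_state, a]
    # Bayesian update of the belief from the Bernoulli observation.
    like = mu[:, a] if r else 1 - mu[:, a]
    belief = belief * like
    belief /= belief.sum()

print("final belief:", belief.round(3), "-> best arm:", int(np.argmax(belief @ mu)))
```

Under the uniform prior the myopic choice is arm 0, which is suboptimal in state 1; the information-gathering pulls on arm 2 resolve the state quickly, which is the entry's point.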
- Modeling Content Creator Incentives on Algorithm-Curated Platforms [76.53541575455978]
We study how algorithmic choices affect the existence and character of (Nash) equilibria in exposure games.
We propose tools for numerically finding equilibria in exposure games, and illustrate results of an audit on the MovieLens and LastFM datasets.
arXiv Detail & Related papers (2022-06-27T08:16:59Z)
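A rough numerical sketch in the spirit of this entry, with every detail assumed for illustration: producers choose unit-norm content vectors in the plane, each user splits exposure across producers by a softmax over inner products, and each producer maximizes total exposure. Best-response dynamics over a discretized strategy set stop when no producer wants to deviate, i.e., at an approximate pure Nash equilibrium of the exposure game.

```python
import numpy as np

rng = np.random.default_rng(2)

# Assumed toy exposure game: producers choose unit vectors in R^2;
# user j grants producer i exposure proportional to softmax_i(TAU * <u_j, s_i>).
U = rng.normal(size=(50, 2)); U /= np.linalg.norm(U, axis=1, keepdims=True)
TAU, N_PROD = 5.0, 3
angles = rng.uniform(0, 2 * np.pi, N_PROD)       # producer strategies (angles)

def exposure(angles):
    S = np.stack([np.cos(angles), np.sin(angles)], axis=1)   # (N_PROD, 2)
    logits = TAU * U @ S.T                                   # (users, producers)
    w = np.exp(logits - logits.max(axis=1, keepdims=True))
    return (w / w.sum(axis=1, keepdims=True)).sum(axis=0)    # exposure per producer

grid = np.linspace(0, 2 * np.pi, 360, endpoint=False)
for _ in range(50):                       # best-response dynamics
    moved = False
    for i in range(N_PROD):
        payoffs = []
        for a in grid:                    # grid search over producer i's strategy
            trial = angles.copy(); trial[i] = a
            payoffs.append(exposure(trial)[i])
        best = grid[int(np.argmax(payoffs))]
        if abs(best - angles[i]) > 1e-9:
            angles[i], moved = best, True
    if not moved:                         # no profitable deviation: approx. Nash
        break

print("equilibrium angles (deg):", np.degrees(angles).round(1))
```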
- Incentivizing Combinatorial Bandit Exploration [87.08827496301839]
Consider a bandit algorithm that recommends actions to self-interested users in a recommendation system.
Users are free to choose other actions and need to be incentivized to follow the algorithm's recommendations.
While the users prefer to exploit, the algorithm can incentivize them to explore by leveraging the information collected from the previous users.
arXiv Detail & Related papers (2022-06-01T13:46:25Z)
- Incentivized Bandit Learning with Self-Reinforcing User Preferences [9.233886766950054]
We investigate a new multi-armed bandit (MAB) online learning model that considers real-world phenomena in many recommender systems.
We propose two MAB policies termed "At-Least-$n$ Explore-Then-Commit" and "UCB-List".
We prove that both policies achieve $O(\log T)$ expected regret with $O(\log T)$ expected payment over a time horizon $T$.
arXiv Detail & Related papers (2021-05-19T01:06:32Z)
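The entry names the policies without their details; below is a minimal sketch of the generic "explore every arm at least $n$ times, then commit" idea under plain Bernoulli rewards, with the incentive payments and the self-reinforcing preference dynamics omitted. Setting n proportional to log T mirrors the $O(\log T)$ bounds quoted above but is otherwise an assumption.

```python
import math
import numpy as np

rng = np.random.default_rng(3)

K, T = 4, 5000
mu = rng.uniform(0.1, 0.9, size=K)        # unknown Bernoulli means (toy)
n = max(1, math.ceil(math.log(T)))        # pulls per arm in the explore phase (assumed)

pulls = np.zeros(K); sums = np.zeros(K)
commit = None
for t in range(T):
    if t < K * n:
        a = t % K                         # explore: every arm at least n times
    else:
        if commit is None:                # then commit to the empirically best arm
            commit = int(np.argmax(sums / pulls))
        a = commit
    r = float(rng.random() < mu[a])
    pulls[a] += 1; sums[a] += r

print("true best arm:", int(np.argmax(mu)), "| committed arm:", commit)
```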
- Incentivizing Exploration in Linear Bandits under Information Gap [50.220743323750035]
We study the problem of incentivizing exploration for myopic users in linear bandits.
In order to maximize the long-term reward, the system offers compensation to incentivize the users to pull the exploratory arms.
arXiv Detail & Related papers (2021-04-08T16:01:56Z)
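A toy sketch of the compensation mechanic this entry describes, not the paper's algorithm: a ridge-regression estimate drives both the user's myopic choice and the system's exploratory target (here, the arm with the largest confidence width), and the system pays the estimated reward gap whenever it asks the user to deviate from the myopic arm. The dimensions, noise level, and exploration threshold are assumptions.

```python
import numpy as np

rng = np.random.default_rng(4)

d, K, T, EPS = 3, 10, 2000, 0.02           # toy sizes; EPS = exploration threshold
theta = rng.normal(size=d); theta /= np.linalg.norm(theta)   # unknown parameter
X = rng.normal(size=(K, d))                                  # fixed arm features

A = np.eye(d); b = np.zeros(d)             # ridge-regression statistics
paid = 0.0
for t in range(T):
    A_inv = np.linalg.inv(A)
    est = X @ (A_inv @ b)                              # estimated rewards per arm
    width = np.einsum('kd,dc,kc->k', X, A_inv, X)      # confidence widths x^T A^{-1} x
    if width.max() > EPS:
        # Ask for an exploratory pull; compensate the myopic user for the
        # estimated reward they give up by not pulling their favourite arm.
        a = int(np.argmax(width))
        paid += max(0.0, est.max() - est[a])
    else:
        a = int(np.argmax(est))            # estimates are tight: recommend myopically
    r = X[a] @ theta + rng.normal(scale=0.1)
    A += np.outer(X[a], X[a]); b += r * X[a]

print(f"total compensation paid: {paid:.2f}")
```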
- Optimizing Long-term Social Welfare in Recommender Systems: A Constrained Matching Approach [36.54379845220444]
We study settings in which content providers cannot remain viable unless they receive a certain level of user engagement.
Our model ensures the system reaches an equilibrium with maximal social welfare supported by a sufficiently diverse set of viable providers.
We draw connections to various notions of user regret and fairness, arguing that these outcomes are fairer in a utilitarian sense.
arXiv Detail & Related papers (2020-07-31T22:40:47Z)
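This last entry is the closest precursor to the headline paper. As a hedged illustration of the "constrained matching" viewpoint (sizes and utilities below are assumptions, not from the paper), the one-shot welfare problem can be written as a linear program: split each user's engagement across providers to maximize total utility, subject to every provider receiving the minimum engagement it needs to remain viable.

```python
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(5)

N_USERS, N_PROV, MIN_ENGAGEMENT = 30, 4, 5     # assumed toy sizes (feasible: 30 >= 4*5)
u = rng.uniform(size=(N_USERS, N_PROV))        # user-provider utilities

# Variables x[i, j] (flattened): fraction of user i's engagement given to provider j.
c = -u.ravel()                                 # linprog minimizes, so negate utility

# Each user's engagement sums to 1.
A_eq = np.zeros((N_USERS, N_USERS * N_PROV))
for i in range(N_USERS):
    A_eq[i, i * N_PROV:(i + 1) * N_PROV] = 1.0
b_eq = np.ones(N_USERS)

# Each provider receives at least MIN_ENGAGEMENT total engagement (viability).
A_ub = np.zeros((N_PROV, N_USERS * N_PROV))
for j in range(N_PROV):
    A_ub[j, j::N_PROV] = -1.0
b_ub = -MIN_ENGAGEMENT * np.ones(N_PROV)

res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=(0, 1))
x = res.x.reshape(N_USERS, N_PROV)
print("welfare:", -res.fun, "| engagement per provider:", x.sum(axis=0).round(2))
```

Making the viability constraints optional (e.g., with integer variables selecting a provider set) recovers the question of which providers are worth supporting at all.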
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information it provides and is not responsible for any consequences arising from its use.