Leveraging heterogeneous spillover in maximizing contextual bandit rewards
- URL: http://arxiv.org/abs/2310.10259v2
- Date: Fri, 24 Jan 2025 18:30:45 GMT
- Title: Leveraging heterogeneous spillover in maximizing contextual bandit rewards
- Authors: Ahmed Sayeed Faruk, Elena Zheleva,
- Abstract summary: We present a framework that allows contextual multi-armed bandits to account for such heterogeneous spillovers.
Our framework leads to significantly higher rewards than existing state-of-the-art solutions.
- Score: 10.609670658904562
- License:
- Abstract: Recommender systems relying on contextual multi-armed bandits continuously improve relevant item recommendations by taking into account the contextual information. The objective of bandit algorithms is to learn the best arm (e.g., best item to recommend) for each user and thus maximize the cumulative rewards from user engagement with the recommendations. The context that these algorithms typically consider are the user and item attributes. However, in the context of social networks where $\textit{the action of one user can influence the actions and rewards of other users,}$ neighbors' actions are also a very important context, as they can have not only predictive power but also can impact future rewards through spillover. Moreover, influence susceptibility can vary for different people based on their preferences and the closeness of ties to other users which leads to heterogeneity in the spillover effects. Here, we present a framework that allows contextual multi-armed bandits to account for such heterogeneous spillovers when choosing the best arm for each user. Our experiments on several semi-synthetic and real-world datasets show that our framework leads to significantly higher rewards than existing state-of-the-art solutions that ignore the network information and potential spillover.
Related papers
- Envious Explore and Exploit [8.029049649310213]
We study the societal effects of explore-and-exploit mechanisms using the economic notion of envy.
We present a multi-armed bandit-like model in which every round consists of several sessions, and rewards are realized once per round.
On the downside, doing so also generates envy, as late-to-arrive users enjoy the information gathered by early-to-arrive users.
arXiv Detail & Related papers (2025-02-18T12:00:35Z) - Online Clustering of Dueling Bandits [59.09590979404303]
We introduce the first "clustering of dueling bandit algorithms" to enable collaborative decision-making based on preference feedback.
We propose two novel algorithms: (1) Clustering of Linear Dueling Bandits (COLDB) which models the user reward functions as linear functions of the context vectors, and (2) Clustering of Neural Dueling Bandits (CONDB) which uses a neural network to model complex, non-linear user reward functions.
arXiv Detail & Related papers (2025-02-04T07:55:41Z) - Neural Dueling Bandits [58.90189511247936]
We use a neural network to estimate the reward function using preference feedback for the previously selected arms.
We then extend our theoretical results to contextual bandit problems with binary feedback, which is in itself a non-trivial contribution.
arXiv Detail & Related papers (2024-07-24T09:23:22Z) - $\alpha$-Fair Contextual Bandits [10.74025233418392]
Contextual bandit algorithms are at the core of many applications, including recommender systems, clinical trials, and optimal portfolio selection.
One of the most popular problems studied in the contextual bandit literature is to maximize the sum of the rewards in each round.
In this paper, we consider the $alpha$-Fairtextual Con Bandits problem, where the objective is to maximize the global $alpha$-fair utility function.
arXiv Detail & Related papers (2023-10-22T03:42:59Z) - Incentive-Aware Recommender Systems in Two-Sided Markets [49.692453629365204]
We propose a novel recommender system that aligns with agents' incentives while achieving myopically optimal performance.
Our framework models this incentive-aware system as a multi-agent bandit problem in two-sided markets.
Both algorithms satisfy an ex-post fairness criterion, which protects agents from over-exploitation.
arXiv Detail & Related papers (2022-11-23T22:20:12Z) - Selectively Contextual Bandits [11.438194383787604]
We propose a new online learning algorithm that preserves benefits of personalization while increasing the commonality in treatments across users.
Our approach selectively interpolates between a contextual bandit algorithm and a context-free multi-arm bandit.
We evaluate our approach in a classification setting using public datasets and show the benefits of the hybrid policy.
arXiv Detail & Related papers (2022-05-09T19:47:46Z) - Modeling Attrition in Recommender Systems with Departing Bandits [84.85560764274399]
We propose a novel multi-armed bandit setup that captures policy-dependent horizons.
We first address the case where all users share the same type, demonstrating that a recent UCB-based algorithm is optimal.
We then move forward to the more challenging case, where users are divided among two types.
arXiv Detail & Related papers (2022-03-25T02:30:54Z) - Coordinated Attacks against Contextual Bandits: Fundamental Limits and
Defense Mechanisms [75.17357040707347]
Motivated by online recommendation systems, we propose the problem of finding the optimal policy in contextual bandits.
The goal is to robustly learn the policy that maximizes rewards for good users with as few user interactions as possible.
We show we can achieve an $tildeO(min(S,A)cdot alpha/epsilon2)$ upper-bound, by employing efficient robust mean estimators.
arXiv Detail & Related papers (2022-01-30T01:45:13Z) - Fairness-Aware Explainable Recommendation over Knowledge Graphs [73.81994676695346]
We analyze different groups of users according to their level of activity, and find that bias exists in recommendation performance between different groups.
We show that inactive users may be more susceptible to receiving unsatisfactory recommendations, due to insufficient training data for the inactive users.
We propose a fairness constrained approach via re-ranking to mitigate this problem in the context of explainable recommendation over knowledge graphs.
arXiv Detail & Related papers (2020-06-03T05:04:38Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.