Leveraging heterogeneous spillover in maximizing contextual bandit rewards
- URL: http://arxiv.org/abs/2310.10259v2
- Date: Fri, 24 Jan 2025 18:30:45 GMT
- Title: Leveraging heterogeneous spillover in maximizing contextual bandit rewards
- Authors: Ahmed Sayeed Faruk, Elena Zheleva
- Abstract summary: We present a framework that allows contextual multi-armed bandits to account for such heterogeneous spillovers. Our framework leads to significantly higher rewards than existing state-of-the-art solutions.
- Score: 10.609670658904562
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recommender systems relying on contextual multi-armed bandits continuously improve relevant item recommendations by taking contextual information into account. The objective of bandit algorithms is to learn the best arm (e.g., the best item to recommend) for each user and thus maximize the cumulative rewards from user engagement with the recommendations. The context these algorithms typically consider consists of user and item attributes. However, in social networks, where the action of one user can influence the actions and rewards of other users, neighbors' actions are also an important context: they not only have predictive power but can also impact future rewards through spillover. Moreover, influence susceptibility varies across people based on their preferences and the closeness of their ties to other users, which leads to heterogeneity in the spillover effects. Here, we present a framework that allows contextual multi-armed bandits to account for such heterogeneous spillovers when choosing the best arm for each user. Our experiments on several semi-synthetic and real-world datasets show that our framework leads to significantly higher rewards than existing state-of-the-art solutions that ignore the network information and potential spillover.
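The paper's own implementation is not reproduced on this page; as a minimal sketch of the core idea, the snippet below (assumptions: LinUCB as the base learner, and mean/max aggregation of neighbors' recent per-arm actions as spillover features) augments the usual user-item context with network information so the bandit can condition on potential spillover.

```python
import numpy as np

class LinUCB:
    """Minimal LinUCB base learner; the spillover features are the only
    network-specific part and enter through the context vector x."""
    def __init__(self, n_arms, dim, alpha=1.0):
        self.alpha = alpha
        self.A = [np.eye(dim) for _ in range(n_arms)]    # per-arm Gram matrices
        self.b = [np.zeros(dim) for _ in range(n_arms)]  # per-arm reward sums

    def select(self, x):
        scores = []
        for A, b in zip(self.A, self.b):
            A_inv = np.linalg.inv(A)
            theta = A_inv @ b
            scores.append(theta @ x + self.alpha * np.sqrt(x @ A_inv @ x))
        return int(np.argmax(scores))

    def update(self, arm, x, reward):
        self.A[arm] += np.outer(x, x)
        self.b[arm] += reward * x

def build_context(user_feats, item_feats, neighbor_actions):
    # Hypothetical spillover features: mean and max of the neighbors'
    # recent per-arm engagement; the aggregation choice is an assumption.
    spill = np.concatenate([neighbor_actions.mean(axis=0),
                            neighbor_actions.max(axis=0)])
    return np.concatenate([user_feats, item_feats, spill])
```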
Related papers
- Envious Explore and Exploit [8.029049649310213]
We study the societal effects of explore-and-exploit mechanisms using the economic notion of envy.
We present a multi-armed bandit-like model in which every round consists of several sessions, and rewards are realized once per round.
On the downside, doing so also generates envy, as late-to-arrive users enjoy the information gathered by early-to-arrive users.
arXiv Detail & Related papers (2025-02-18T12:00:35Z) - Online Clustering of Dueling Bandits [59.09590979404303]
We introduce the first "clustering of dueling bandit algorithms" to enable collaborative decision-making based on preference feedback.
We propose two novel algorithms: (1) Clustering of Linear Dueling Bandits (COLDB) which models the user reward functions as linear functions of the context vectors, and (2) Clustering of Neural Dueling Bandits (CONDB) which uses a neural network to model complex, non-linear user reward functions.
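A hedged illustration of the linear model COLDB is said to use: in a dueling setting, a logistic (Bradley-Terry) link over linear utilities is the standard choice, assumed here; the paper's exact formulation may differ.

```python
import numpy as np

def preference_prob(theta, x_i, x_j):
    """P(arm i beats arm j) under a linear utility with a logistic
    (Bradley-Terry) link; users in one cluster share the same theta."""
    return 1.0 / (1.0 + np.exp(-(theta @ (x_i - x_j))))

# Toy usage: theta would be estimated from the cluster's past duels.
theta = np.array([0.5, -0.2, 1.0])
x_i, x_j = np.array([1.0, 0.0, 0.3]), np.array([0.2, 0.4, 0.1])
print(preference_prob(theta, x_i, x_j))  # ~0.66
```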
arXiv Detail & Related papers (2025-02-04T07:55:41Z) - Learning Recommender Systems with Soft Target: A Decoupled Perspective [49.83787742587449]
We propose a novel decoupled soft label optimization framework to consider the objectives as two aspects by leveraging soft labels.
We present a sensible soft-label generation algorithm that models a label propagation algorithm to explore users' latent interests in unobserved feedback via neighbors.
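The soft-label generation is described as a label-propagation scheme over neighbors; a generic propagation iteration (an illustrative assumption, not the paper's exact algorithm) looks like:

```python
import numpy as np

def propagate_soft_labels(W, Y, alpha=0.8, iters=20):
    """Diffuse observed feedback Y (users x items, 0/1) over a user-user
    affinity matrix W to produce soft labels for unobserved feedback.
    Assumes every user has at least one neighbor (nonzero row in W)."""
    P = W / W.sum(axis=1, keepdims=True)        # row-stochastic transitions
    F = Y.astype(float)
    for _ in range(iters):
        F = alpha * (P @ F) + (1 - alpha) * Y   # mix propagated and observed
    return F
```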
arXiv Detail & Related papers (2024-10-09T04:20:15Z) - The Nah Bandit: Modeling User Non-compliance in Recommendation Systems [2.421459418045937]
Expert with Clustering (EWC) is a hierarchical approach that incorporates feedback from both recommended and non-recommended options to accelerate user preference learning.
EWC outperforms both supervised learning and traditional contextual bandit approaches.
This work lays the foundation for future research in Nah Bandit, providing a robust framework for more effective recommendation systems.
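EWC is only sketched at a high level here; one plausible reading of "experts with clustering" is Hedge-style exponential weighting over per-cluster preference models, as in the following assumed sketch.

```python
import numpy as np

class HedgeOverClusters:
    """Exponential weights over per-cluster 'experts'; the weights track
    which cluster best explains the user's observed choices, including
    choices of non-recommended options."""
    def __init__(self, n_clusters, eta=0.5):
        self.w = np.ones(n_clusters)
        self.eta = eta

    def recommend(self, expert_scores):
        # expert_scores: (n_clusters, n_arms) per-cluster arm scores
        p = self.w / self.w.sum()
        return int(np.argmax(p @ expert_scores))

    def update(self, losses):
        # losses: per-expert loss in [0, 1] given the user's actual choice
        self.w *= np.exp(-self.eta * np.asarray(losses))
```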
arXiv Detail & Related papers (2024-08-15T03:01:02Z) - Relevance meets Diversity: A User-Centric Framework for Knowledge Exploration through Recommendations [15.143224593682012]
We propose a novel recommendation strategy that combines relevance and diversity by a copula function.
We use diversity as a surrogate of the amount of knowledge obtained by the user while interacting with the system.
Our strategy outperforms several state-of-the-art competitors.
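The abstract does not say which copula family is used; as one concrete, hedged instance, a Clayton copula can combine normalized relevance and diversity scores:

```python
import numpy as np

def clayton_copula(u, v, theta=1.0):
    """Clayton copula C(u, v) with dependence parameter theta > 0;
    the family choice is an assumption."""
    return (u**(-theta) + v**(-theta) - 1.0)**(-1.0 / theta)

def combined_score(relevance, diversity, theta=1.0):
    # Inputs assumed pre-normalized to (0, 1], e.g. via empirical CDFs.
    return clayton_copula(relevance, diversity, theta)

print(combined_score(0.9, 0.4))  # rewards items strong on both axes
```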
arXiv Detail & Related papers (2024-08-07T13:48:24Z) - Neural Dueling Bandits: Preference-Based Optimization with Human Feedback [58.90189511247936]
We use a neural network to estimate the reward function using preference feedback for the previously selected arms.
We also extend our theoretical results to contextual bandit problems with binary feedback, which is in itself a non-trivial contribution.
arXiv Detail & Related papers (2024-07-24T09:23:22Z) - Beyond Item Dissimilarities: Diversifying by Intent in Recommender Systems [20.04619904064599]
We develop a probabilistic intent-based whole-page diversification framework for the final stage of a recommender system.
Live experiments on a diverse set of intents show that our framework increases Daily Active Users and overall user enjoyment.
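As a hedged approximation of probabilistic intent-based whole-page diversification (the framework's details are not given here), an IA-Select-style greedy re-ranker weights each item's marginal gain by the probability that an intent is still unmet:

```python
import numpy as np

def diversify_page(intent_probs, coverage, page_size):
    """Greedy whole-page selection. intent_probs[k] = P(intent k | user);
    coverage[i, k] = P(item i satisfies intent k). A generic sketch."""
    residual = intent_probs.astype(float).copy()  # P(intent still unmet)
    chosen = []
    for _ in range(page_size):
        gains = coverage @ residual               # expected marginal gain
        gains[chosen] = -np.inf                   # no repeats
        best = int(np.argmax(gains))
        chosen.append(best)
        residual *= 1.0 - coverage[best]          # discount covered intents
    return chosen
```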
arXiv Detail & Related papers (2024-05-20T18:52:33Z) - Contrastive Learning Method for Sequential Recommendation based on Multi-Intention Disentanglement [5.734747179463411]
We propose a Contrastive Learning sequential recommendation method based on Multi-Intention Disentanglement (MIDCL).
In our work, intentions are recognized as dynamic and diverse, and user behaviors are often driven by current multi-intentions.
We propose two types of contrastive learning paradigms for finding the user's most relevant interaction intention and for maximizing the mutual information of positive sample pairs.
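Maximizing the mutual information of positive pairs is commonly implemented with an InfoNCE-style objective; the sketch below shows that standard form (an assumption, since MIDCL's exact loss may differ).

```python
import numpy as np

def info_nce_loss(anchors, positives, temperature=0.1):
    """InfoNCE: row i of `positives` is the positive for row i of
    `anchors`; all other rows serve as in-batch negatives."""
    a = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    p = positives / np.linalg.norm(positives, axis=1, keepdims=True)
    logits = (a @ p.T) / temperature                  # pairwise similarities
    logits -= logits.max(axis=1, keepdims=True)       # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -float(np.mean(np.diag(log_prob)))         # CE on matched pairs
```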
arXiv Detail & Related papers (2024-04-28T15:13:36Z) - $\alpha$-Fair Contextual Bandits [10.74025233418392]
Contextual bandit algorithms are at the core of many applications, including recommender systems, clinical trials, and optimal portfolio selection.
One of the most popular problems studied in the contextual bandit literature is to maximize the sum of the rewards in each round.
In this paper, we consider the $\alpha$-Fair Contextual Bandits problem, where the objective is to maximize the global $\alpha$-fair utility function.
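For reference, the standard $\alpha$-fair utility from this literature is $u_\alpha(x) = x^{1-\alpha}/(1-\alpha)$ for $\alpha \ge 0$, $\alpha \neq 1$, and $u_\alpha(x) = \log x$ for $\alpha = 1$; $\alpha = 0$ recovers the utilitarian sum, and larger $\alpha$ enforces stronger fairness. A direct transcription:

```python
import numpy as np

def alpha_fair_utility(x, alpha):
    """Sum of alpha-fair utilities over per-user rewards x (> 0).
    alpha = 0 is the utilitarian sum; alpha -> 1 gives log utility;
    larger alpha enforces stronger fairness."""
    x = np.asarray(x, dtype=float)
    if np.isclose(alpha, 1.0):
        return float(np.sum(np.log(x)))
    return float(np.sum(x**(1.0 - alpha) / (1.0 - alpha)))
```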
arXiv Detail & Related papers (2023-10-22T03:42:59Z) - Incentive-Aware Recommender Systems in Two-Sided Markets [49.692453629365204]
We propose a novel recommender system that aligns with agents' incentives while achieving myopically optimal performance.
Our framework models this incentive-aware system as a multi-agent bandit problem in two-sided markets.
Both algorithms satisfy an ex-post fairness criterion, which protects agents from over-exploitation.
arXiv Detail & Related papers (2022-11-23T22:20:12Z) - Selectively Contextual Bandits [11.438194383787604]
We propose a new online learning algorithm that preserves the benefits of personalization while increasing the commonality in treatments across users.
Our approach selectively interpolates between a contextual bandit algorithm and a context-free multi-armed bandit.
We evaluate our approach in a classification setting using public datasets and show the benefits of the hybrid policy.
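One hedged reading of "selectively interpolates" is a per-decision mixture between the personalized and the shared policy; the fixed mixing probability below is an illustrative assumption, not the paper's actual selection rule.

```python
import numpy as np

def select_arm(rng, contextual_scores, global_scores, mix=0.5):
    """With probability `mix`, follow the personalized contextual policy;
    otherwise fall back to the shared context-free policy."""
    if rng.random() < mix:
        return int(np.argmax(contextual_scores))
    return int(np.argmax(global_scores))

rng = np.random.default_rng(0)
print(select_arm(rng, np.array([0.2, 0.9]), np.array([0.6, 0.4])))
```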
arXiv Detail & Related papers (2022-05-09T19:47:46Z) - Modeling Attrition in Recommender Systems with Departing Bandits [84.85560764274399]
We propose a novel multi-armed bandit setup that captures policy-dependent horizons.
We first address the case where all users share the same type, demonstrating that a recent UCB-based algorithm is optimal.
We then move forward to the more challenging case, where users are divided among two types.
arXiv Detail & Related papers (2022-03-25T02:30:54Z) - Coordinated Attacks against Contextual Bandits: Fundamental Limits and
Defense Mechanisms [75.17357040707347]
Motivated by online recommendation systems, we formulate the problem of finding the optimal policy in contextual bandits.
The goal is to robustly learn the policy that maximizes rewards for good users with as few user interactions as possible.
We show we can achieve an $\tilde{O}(\min(S,A)\cdot\alpha/\epsilon^2)$ upper bound by employing efficient robust mean estimators.
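"Efficient robust mean estimators" typically refers to estimators such as median-of-means, shown below as one plausible instantiation (an assumption, not necessarily the estimator the paper analyzes).

```python
import numpy as np

def median_of_means(samples, n_blocks=8):
    """Split samples into blocks, average each block, return the median
    of the block means; robust to a fraction of corrupted rewards."""
    samples = np.asarray(samples, dtype=float)
    n_blocks = max(1, min(n_blocks, len(samples)))
    blocks = np.array_split(samples, n_blocks)
    return float(np.median([b.mean() for b in blocks]))
```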
arXiv Detail & Related papers (2022-01-30T01:45:13Z) - BanditMF: Multi-Armed Bandit Based Matrix Factorization Recommender
System [0.0]
Multi-armed bandits (MAB) provide a principled online learning approach to attain the balance between exploration and exploitation.
Collaborative filtering (CF) is arguably the earliest and most influential method in recommender systems.
BanditMF is designed to address two challenges in the multi-armed bandits algorithm and collaborative filtering.
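The exact coupling of matrix factorization and bandits in BanditMF is not detailed here; a minimal assumed sketch is epsilon-greedy exploration over MF score estimates.

```python
import numpy as np

def recommend(rng, U, V, user, epsilon=0.1):
    """Epsilon-greedy over matrix-factorization predictions: exploit the
    item maximizing U[user] @ V.T, or explore uniformly at random."""
    scores = U[user] @ V.T                  # predicted ratings for all items
    if rng.random() < epsilon:
        return int(rng.integers(len(scores)))
    return int(np.argmax(scores))
```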
arXiv Detail & Related papers (2021-06-21T07:35:39Z) - Partial Bandit and Semi-Bandit: Making the Most Out of Scarce Users'
Feedback [62.997667081978825]
We present a novel approach for considering user feedback and evaluate it using three distinct strategies.
Despite the limited amount of feedback returned by users (as low as 20% of the total), our approach obtains results similar to those of state-of-the-art approaches.
arXiv Detail & Related papers (2020-09-16T07:32:51Z) - Fairness-Aware Explainable Recommendation over Knowledge Graphs [73.81994676695346]
We analyze different groups of users according to their level of activity, and find that bias exists in recommendation performance between different groups.
We show that inactive users may be more susceptible to receiving unsatisfactory recommendations, due to insufficient training data for the inactive users.
We propose a fairness constrained approach via re-ranking to mitigate this problem in the context of explainable recommendation over knowledge graphs.
arXiv Detail & Related papers (2020-06-03T05:04:38Z) - Reward Constrained Interactive Recommendation with Natural Language
Feedback [158.8095688415973]
We propose a novel constraint-augmented reinforcement learning (RL) framework to efficiently incorporate user preferences over time.
Specifically, we leverage a discriminator to detect recommendations violating user historical preference.
Our proposed framework is general and is further extended to the task of constrained text generation.
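A common way to realize a discriminator-based constraint in RL is to charge the discriminator's violation estimate against the environment reward; the shaping below is a generic sketch of that idea, not the paper's exact formulation.

```python
def constrained_reward(env_reward, violation_prob, penalty=1.0):
    """Shaped reward: the discriminator's estimated probability that a
    recommendation violates the user's historical preference is charged
    as a penalty; `penalty` acts like a Lagrange multiplier."""
    return env_reward - penalty * violation_prob
```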
arXiv Detail & Related papers (2020-05-04T16:23:34Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the generated summaries (or any information on the site) and is not responsible for any consequences of their use.