Uplifting Bandits
- URL: http://arxiv.org/abs/2206.04091v1
- Date: Wed, 8 Jun 2022 18:00:56 GMT
- Title: Uplifting Bandits
- Authors: Yu-Guan Hsieh, Shiva Prasad Kasiviswanathan, Branislav Kveton
- Abstract summary: We introduce a multi-armed bandit model where the reward is a sum of multiple random variables, and each action only alters the distributions of some of them.
This model is motivated by marketing campaigns and recommender systems, where the variables represent outcomes on individual customers.
We propose UCB-style algorithms that estimate the uplifts of the actions over a baseline.
- Score: 23.262188897812475
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We introduce a multi-armed bandit model where the reward is a sum of multiple
random variables, and each action only alters the distributions of some of
them. After each action, the agent observes the realizations of all the
variables. This model is motivated by marketing campaigns and recommender
systems, where the variables represent outcomes on individual customers, such
as clicks. We propose UCB-style algorithms that estimate the uplifts of the
actions over a baseline. We study multiple variants of the problem, including
when the baseline and affected variables are unknown, and prove sublinear
regret bounds for all of these. We also provide lower bounds that justify the
necessity of our modeling assumptions. Experiments on synthetic and real-world
datasets show the benefit of methods that estimate the uplifts over policies
that do not use this structure.
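The abstract leaves the algorithmic details to the paper, but the model is concrete enough for a minimal sketch. The Python code below is an illustrative UCB-style uplift agent, not the paper's exact construction: the class name `UpliftUCB`, the assumption that each action's affected variable set is known, and the Hoeffding-style confidence bonus are all assumptions made for this sketch.
```python
import numpy as np

class UpliftUCB:
    """Sketch of a UCB-style agent that ranks actions by estimated uplift
    over a baseline action (index 0), using only the variables each action
    affects. Illustrative only; the paper's estimators and confidence
    widths may differ."""

    def __init__(self, num_actions, num_vars, affected_sets, horizon):
        self.K = num_actions
        self.m = num_vars
        self.affected = affected_sets                  # affected_sets[a]: variable indices altered by action a
        self.T = horizon
        self.counts = np.zeros(num_actions)            # pulls per action
        self.sums = np.zeros((num_actions, num_vars))  # per-variable outcome sums per action

    def select_action(self, t):
        if t < self.K:                                 # play each action once first
            return t
        index = np.empty(self.K)
        for a in range(self.K):
            S = np.asarray(self.affected[a], dtype=int)
            mean_a = self.sums[a, S] / self.counts[a]  # affected-variable means under action a
            mean_0 = self.sums[0, S] / self.counts[0]  # same variables under the baseline
            uplift = float(np.sum(mean_a - mean_0))
            # Hoeffding-style optimism bonus; its width scales with |S_a|, not m.
            bonus = len(S) * (np.sqrt(2 * np.log(self.T) / self.counts[a])
                              + np.sqrt(2 * np.log(self.T) / self.counts[0]))
            index[a] = uplift + bonus
        return int(np.argmax(index))

    def update(self, action, outcomes):
        # outcomes: realized values of all m variables, observed after every action
        self.counts[action] += 1
        self.sums[action] += np.asarray(outcomes, dtype=float)
```
The intended point of scoring actions this way is that both the uplift estimate and its confidence width involve only the affected variables rather than all $m$ of them, which is the kind of structure the paper's uplift-estimating methods are designed to exploit.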
Related papers
- Optimal Classification under Performative Distribution Shift [13.508249764979075]
We propose a novel view in which performative effects are modelled as push-forward measures.
We prove the convexity of the performative risk under a new set of assumptions.
We also establish a connection with adversarially robust classification by reformulating the minimization of the performative risk as a min-max variational problem.
arXiv Detail & Related papers (2024-11-04T12:20:13Z)
- Leveraging sparse and shared feature activations for disentangled representation learning [112.22699167017471]
We propose to leverage knowledge extracted from a diversified set of supervised tasks to learn a common disentangled representation.
We validate our approach on six real world distribution shift benchmarks, and different data modalities.
arXiv Detail & Related papers (2023-04-17T01:33:24Z)
- Combinatorial Causal Bandits [25.012065471684025]
In causal bandits, the learning agent chooses at most $K$ variables in each round to intervene, with the goal of minimizing expected regret on the target variable $Y$.
We study this problem in the context of binary generalized linear models (BGLMs), which admit a succinct parametric representation of the causal models.
We present the algorithm BGLM-OFU for Markovian BGLMs based on the maximum likelihood estimation method, and show that it achieves $O(\sqrt{T} \log T)$ regret, where $T$ is the time horizon.
arXiv Detail & Related papers (2022-06-04T14:14:58Z)
- A Lagrangian Duality Approach to Active Learning [119.36233726867992]
We consider the batch active learning problem, where only a subset of the training data is labeled.
We formulate the learning problem using constrained optimization, where each constraint bounds the performance of the model on labeled samples.
We show, via numerical experiments, that our proposed approach performs similarly to or better than state-of-the-art active learning methods.
arXiv Detail & Related papers (2022-02-08T19:18:49Z)
- Contextual Bandits for Advertising Campaigns: A Diffusion-Model Independent Approach (Extended Version) [73.59962178534361]
We study an influence problem in which little is assumed to be known about the diffusion network or about the model that determines how information may propagate.
In this setting, an explore-exploit approach can be used to learn the key underlying diffusion parameters while running the campaign.
We describe and compare two contextual multi-armed bandit methods that maintain upper-confidence bounds on the remaining potential of influencers.
arXiv Detail & Related papers (2022-01-13T22:06:10Z)
- Using Non-Stationary Bandits for Learning in Repeated Cournot Games with Non-Stationary Demand [11.935419090901524]
In this paper, we model repeated Cournot games with non-stationary demand.
The set of arms/actions that an agent can choose from represents discrete production quantities.
We propose a novel algorithm, Adaptive with Weighted Exploration (AWE) $\epsilon$-greedy, which is loosely based on the well-known $\epsilon$-greedy approach; a sketch of that baseline appears after this list.
arXiv Detail & Related papers (2022-01-03T05:51:47Z)
- A Twin Neural Model for Uplift [59.38563723706796]
Uplift is a particular case of conditional treatment effect modeling.
We propose a new loss function defined by leveraging a connection with the Bayesian interpretation of the relative risk.
We show our proposed method is competitive with the state of the art in simulated settings and on real data from large-scale randomized experiments.
arXiv Detail & Related papers (2021-05-11T16:02:39Z)
- Adapting Neural Networks for Uplift Models [0.0]
Uplift is estimated using either i) conditional mean regression or ii) transformed outcome regression.
Most existing approaches are adaptations of classification and regression trees for the uplift case.
Here we propose a new method using neural networks.
arXiv Detail & Related papers (2020-10-30T18:42:56Z)
- Optimal Off-Policy Evaluation from Multiple Logging Policies [77.62012545592233]
We study off-policy evaluation from multiple logging policies, each generating a dataset of fixed size, i.e., stratified sampling.
We derive the OPE estimator for multiple loggers with minimum variance on any instance, i.e., the efficient one.
arXiv Detail & Related papers (2020-10-21T13:43:48Z)
- Influence Diagram Bandits: Variational Thompson Sampling for Structured Bandit Problems [40.957688390621385]
Our framework captures complex statistical dependencies between actions, latent variables, and observations.
We develop novel online learning algorithms that learn to act efficiently in our models.
arXiv Detail & Related papers (2020-07-09T16:25:40Z)
- AvgOut: A Simple Output-Probability Measure to Eliminate Dull Responses [97.50616524350123]
We build dialogue models that are dynamically aware of which utterances or tokens are dull, without any feature engineering.
The first model, MinAvgOut, directly maximizes the diversity score through the output distributions of each batch.
The second model, Label Fine-Tuning (LFT), prepends to the source sequence a label continuously scaled by the diversity score to control the diversity level.
The third model, RL, adopts Reinforcement Learning and treats the diversity score as a reward signal.
arXiv Detail & Related papers (2020-01-15T18:32:06Z)
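The AWE algorithm's specifics are not given in the Cournot-games summary above, so the sketch below shows only the plain $\epsilon$-greedy baseline it builds on, with arms indexed by discrete production quantities; the `pull` callback and the fixed exploration rate are illustrative assumptions.
```python
import numpy as np

def epsilon_greedy(pull, num_arms, num_rounds, epsilon=0.1, seed=0):
    """Plain epsilon-greedy; `pull(arm)` returns the observed profit for
    producing the quantity indexed by `arm` in one round."""
    rng = np.random.default_rng(seed)
    counts = np.zeros(num_arms)
    means = np.zeros(num_arms)
    history = []
    for t in range(num_rounds):
        if t < num_arms:                 # play every arm once first
            arm = t
        elif rng.random() < epsilon:     # explore uniformly at random
            arm = int(rng.integers(num_arms))
        else:                            # exploit the best empirical mean
            arm = int(np.argmax(means))
        profit = pull(arm)
        counts[arm] += 1
        means[arm] += (profit - means[arm]) / counts[arm]  # running average
        history.append((arm, profit))
    return means, history
```
With non-stationary demand, a fixed $\epsilon$ and a plain running average adapt poorly to shifting payoffs, which is the gap that adaptive weighted-exploration schemes such as AWE aim to close.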
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information it presents and is not responsible for any consequences arising from its use.