Uplifting Bandits
- URL: http://arxiv.org/abs/2206.04091v1
- Date: Wed, 8 Jun 2022 18:00:56 GMT
- Title: Uplifting Bandits
- Authors: Yu-Guan Hsieh, Shiva Prasad Kasiviswanathan, Branislav Kveton
- Abstract summary: We introduce a multi-armed bandit model where the reward is a sum of multiple random variables, and each action only alters the distributions of some of them.
This model is motivated by marketing campaigns and recommender systems, where the variables represent outcomes on individual customers.
We propose UCB-style algorithms that estimate the uplifts of the actions over a baseline.
- Score: 23.262188897812475
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We introduce a multi-armed bandit model where the reward is a sum of multiple
random variables, and each action only alters the distributions of some of
them. After each action, the agent observes the realizations of all the
variables. This model is motivated by marketing campaigns and recommender
systems, where the variables represent outcomes on individual customers, such
as clicks. We propose UCB-style algorithms that estimate the uplifts of the
actions over a baseline. We study multiple variants of the problem, including
when the baseline and affected variables are unknown, and prove sublinear
regret bounds for all of these. We also provide lower bounds that justify the
necessity of our modeling assumptions. Experiments on synthetic and real-world
datasets show the benefit of methods that estimate the uplifts over policies
that do not use this structure.
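The abstract leaves the algorithmic details to the paper, but the model is concrete enough for a minimal sketch. The Python code below is an illustrative UCB-style uplift agent, not the paper's exact construction: the class name `UpliftUCB`, the assumption that each action's affected variable set is known, and the Hoeffding-style confidence bonus are all assumptions made for this sketch.
```python
import numpy as np

class UpliftUCB:
    """Sketch of a UCB-style agent that ranks actions by estimated uplift
    over a baseline action (index 0), using only the variables each action
    affects. Illustrative only; the paper's estimators and confidence
    widths may differ."""

    def __init__(self, num_actions, num_vars, affected_sets, horizon):
        self.K = num_actions
        self.m = num_vars
        self.affected = affected_sets                  # affected_sets[a]: variable indices altered by action a
        self.T = horizon
        self.counts = np.zeros(num_actions)            # pulls per action
        self.sums = np.zeros((num_actions, num_vars))  # per-variable outcome sums per action

    def select_action(self, t):
        if t < self.K:                                 # play each action once first
            return t
        index = np.empty(self.K)
        for a in range(self.K):
            S = np.asarray(self.affected[a], dtype=int)
            mean_a = self.sums[a, S] / self.counts[a]  # affected-variable means under action a
            mean_0 = self.sums[0, S] / self.counts[0]  # same variables under the baseline
            uplift = float(np.sum(mean_a - mean_0))
            # Hoeffding-style optimism bonus; its width scales with |S_a|, not m.
            bonus = len(S) * (np.sqrt(2 * np.log(self.T) / self.counts[a])
                              + np.sqrt(2 * np.log(self.T) / self.counts[0]))
            index[a] = uplift + bonus
        return int(np.argmax(index))

    def update(self, action, outcomes):
        # outcomes: realized values of all m variables, observed after every action
        self.counts[action] += 1
        self.sums[action] += np.asarray(outcomes, dtype=float)
```
The intended point of scoring actions this way is that both the uplift estimate and its confidence width involve only the affected variables rather than all $m$ of them, which is the kind of structure the paper's uplift-estimating methods are designed to exploit.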
Related papers
- Optimal Classification under Performative Distribution Shift [13.508249764979075]
We propose a novel view in which performative effects are modelled as push-forward measures.
We prove the convexity of the performative risk under a new set of assumptions.
We also establish a connection with adversarially robust classification by reformulating the minimization of the performative risk as a min-max variational problem.
arXiv Detail & Related papers (2024-11-04T12:20:13Z)
- Leveraging sparse and shared feature activations for disentangled representation learning [112.22699167017471]
We propose to leverage knowledge extracted from a diversified set of supervised tasks to learn a common disentangled representation.
We validate our approach on six real world distribution shift benchmarks, and different data modalities.
arXiv Detail & Related papers (2023-04-17T01:33:24Z)
- Combinatorial Causal Bandits [25.012065471684025]
In causal bandits, the learning agent chooses at most $K$ variables in each round to intervene, with the goal of minimizing expected regret on the target variable $Y$.
We study this problem in the context of binary generalized linear models (BGLMs), which admit a succinct parametric representation of the causal models.
We present the algorithm BGLM-OFU for Markovian BGLMs based on the maximum likelihood estimation method, and show that it achieves $O(\sqrt{T} \log T)$ regret, where $T$ is the time horizon.
arXiv Detail & Related papers (2022-06-04T14:14:58Z)
- A Lagrangian Duality Approach to Active Learning [119.36233726867992]
We consider the batch active learning problem, where only a subset of the training data is labeled.
We formulate the learning problem using constrained optimization, where each constraint bounds the performance of the model on labeled samples.
We show, via numerical experiments, that our proposed approach performs similarly to or better than state-of-the-art active learning methods.
arXiv Detail & Related papers (2022-02-08T19:18:49Z)
- Contextual Bandits for Advertising Campaigns: A Diffusion-Model Independent Approach (Extended Version) [73.59962178534361]
We study an influence problem in which little is assumed to be known about the diffusion network or about the model that determines how information may propagate.
In this setting, an explore-exploit approach can be used to learn the key underlying diffusion parameters while running the campaign.
We describe and compare two contextual multi-armed bandit methods that maintain upper-confidence bounds on the remaining potential of influencers.
arXiv Detail & Related papers (2022-01-13T22:06:10Z)
- Using Non-Stationary Bandits for Learning in Repeated Cournot Games with Non-Stationary Demand [11.935419090901524]
In this paper, we model repeated Cournot games with non-stationary demand.
The set of arms/actions that an agent can choose from represents discrete production quantities.
We propose a novel algorithm, Adaptive with Weighted Exploration (AWE) $\epsilon$-greedy, which is loosely based on the well-known $\epsilon$-greedy approach; a sketch of that baseline appears after this list.
arXiv Detail & Related papers (2022-01-03T05:51:47Z)
- A Twin Neural Model for Uplift [59.38563723706796]
Uplift is a particular case of conditional treatment effect modeling.
We propose a new loss function defined by leveraging a connection with the Bayesian interpretation of the relative risk.
We show our proposed method is competitive with the state of the art in simulated settings and on real data from large-scale randomized experiments.
arXiv Detail & Related papers (2021-05-11T16:02:39Z)
- Adapting Neural Networks for Uplift Models [0.0]
Uplift is estimated using either i) conditional mean regression or ii) transformed outcome regression.
Most existing approaches are adaptations of classification and regression trees for the uplift case.
Here we propose a new method using neural networks.
arXiv Detail & Related papers (2020-10-30T18:42:56Z)
- Optimal Off-Policy Evaluation from Multiple Logging Policies [77.62012545592233]
We study off-policy evaluation from multiple logging policies, each generating a dataset of fixed size, i.e., stratified sampling.
We derive the OPE estimator for multiple loggers with minimum variance on any instance, i.e., the efficient one.
arXiv Detail & Related papers (2020-10-21T13:43:48Z)
- Influence Diagram Bandits: Variational Thompson Sampling for Structured Bandit Problems [40.957688390621385]
Our framework captures complex statistical dependencies between actions, latent variables, and observations.
We develop novel online learning algorithms that learn to act efficiently in our models.
arXiv Detail & Related papers (2020-07-09T16:25:40Z)
- AvgOut: A Simple Output-Probability Measure to Eliminate Dull Responses [97.50616524350123]
We build dialogue models that are dynamically aware of which utterances or tokens are dull, without any feature engineering.
The first model, MinAvgOut, directly maximizes the diversity score through the output distributions of each batch.
The second model, Label Fine-Tuning (LFT), prepends to the source sequence a label continuously scaled by the diversity score to control the diversity level.
The third model, RL, adopts Reinforcement Learning and treats the diversity score as a reward signal.
arXiv Detail & Related papers (2020-01-15T18:32:06Z)
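The AWE algorithm's specifics are not given in the Cournot-games summary above, so the sketch below shows only the plain $\epsilon$-greedy baseline it builds on, with arms indexed by discrete production quantities; the `pull` callback and the fixed exploration rate are illustrative assumptions.
```python
import numpy as np

def epsilon_greedy(pull, num_arms, num_rounds, epsilon=0.1, seed=0):
    """Plain epsilon-greedy; `pull(arm)` returns the observed profit for
    producing the quantity indexed by `arm` in one round."""
    rng = np.random.default_rng(seed)
    counts = np.zeros(num_arms)
    means = np.zeros(num_arms)
    history = []
    for t in range(num_rounds):
        if t < num_arms:                 # play every arm once first
            arm = t
        elif rng.random() < epsilon:     # explore uniformly at random
            arm = int(rng.integers(num_arms))
        else:                            # exploit the best empirical mean
            arm = int(np.argmax(means))
        profit = pull(arm)
        counts[arm] += 1
        means[arm] += (profit - means[arm]) / counts[arm]  # running average
        history.append((arm, profit))
    return means, history
```
With non-stationary demand, a fixed $\epsilon$ and a plain running average adapt poorly to shifting payoffs, which is the gap that adaptive weighted-exploration schemes such as AWE aim to close.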
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information it presents and is not responsible for any consequences arising from its use.