Online Batch Decision-Making with High-Dimensional Covariates
- URL: http://arxiv.org/abs/2002.09438v2
- Date: Thu, 27 Feb 2020 20:43:39 GMT
- Title: Online Batch Decision-Making with High-Dimensional Covariates
- Authors: Chi-Hua Wang, Guang Cheng
- Abstract summary: We propose and investigate a class of new algorithms for sequential decision making that interact with textita batch of users simultaneously instead of textita user at each decision epoch.
We deliver a solution, named textitTeamwork LASSO Bandit algorithm, that resolves a batch version of explore-exploit dilemma via switching between stage and selfish stage during the whole decision process.
- Score: 20.06690325969748
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We propose and investigate a class of new algorithms for sequential decision
making that interacts with \textit{a batch of users} simultaneously instead of
\textit{a user} at each decision epoch. This type of batch models is motivated
by interactive marketing and clinical trial, where a group of people are
treated simultaneously and the outcomes of the whole group are collected before
the next stage of decision. In such a scenario, our goal is to allocate a batch
of treatments to maximize treatment efficacy based on observed high-dimensional
user covariates. We deliver a solution, named \textit{Teamwork LASSO Bandit
algorithm}, that resolves a batch version of explore-exploit dilemma via
switching between teamwork stage and selfish stage during the whole decision
process. This is made possible based on statistical properties of LASSO
estimate of treatment efficacy that adapts to a sequence of batch observations.
In general, a rate of optimal allocation condition is proposed to delineate the
exploration and exploitation trade-off on the data collection scheme, which is
sufficient for LASSO to identify the optimal treatment for observed user
covariates. An upper bound on expected cumulative regret of the proposed
algorithm is provided.
Related papers
- Optimizing Treatment Allocation in the Presence of Interference [6.404584255185188]
In Influence Maximization (IM), the objective is to select the optimal set of entities in a network to target with a treatment.
In Uplift Modeling (UM), entities are ranked according to estimated treatment effect, and the top entities are allocated treatment.
We show how a causal estimator is trained to predict treatment effects in a network setting and integrated into classic IM algorithms.
arXiv Detail & Related papers (2024-09-30T15:48:22Z) - Training Greedy Policy for Proposal Batch Selection in Expensive Multi-Objective Combinatorial Optimization [52.80408805368928]
We introduce a novel greedy-style subset selection algorithm for batch acquisition.
Our experiments on the red fluorescent proteins show that our proposed method achieves the baseline performance in 1.69x fewer queries.
arXiv Detail & Related papers (2024-06-21T05:57:08Z) - A new fuzzy multi-attribute group decision-making method based on TOPSIS
and optimization models [3.697049647195136]
A new method is proposed for multi-attribute group decision-making in interval-valued intuitionistic fuzzy sets.
By minimizing the sum of differences between individual evaluations and the overallconsistent evaluations of all experts, a new optimization model is established for determining expert weights.
The complete fuzzy multi-attribute group decision-making algorithm is formulated, which can give full play to the advantages of subjective and objective weighting methods.
arXiv Detail & Related papers (2023-11-27T15:41:30Z) - Thompson sampling for zero-inflated count outcomes with an application to the Drink Less mobile health study [1.5936659933030128]
Mobile health interventions aim to improve distal outcomes, such as clinical conditions, by optimizing proximal outcomes through just-in-time adaptive interventions.
Contextual bandits provide a suitable framework for customizing such interventions according to individual time-varying contexts.
The current work addresses this challenge by leveraging count data models into online decision-making approaches.
arXiv Detail & Related papers (2023-11-24T09:02:24Z) - Provably Efficient UCB-type Algorithms For Learning Predictive State
Representations [55.00359893021461]
The sequential decision-making problem is statistically learnable if it admits a low-rank structure modeled by predictive state representations (PSRs)
This paper proposes the first known UCB-type approach for PSRs, featuring a novel bonus term that upper bounds the total variation distance between the estimated and true models.
In contrast to existing approaches for PSRs, our UCB-type algorithms enjoy computational tractability, last-iterate guaranteed near-optimal policy, and guaranteed model accuracy.
arXiv Detail & Related papers (2023-07-01T18:35:21Z) - TCFimt: Temporal Counterfactual Forecasting from Individual Multiple
Treatment Perspective [50.675845725806724]
We propose a comprehensive framework of temporal counterfactual forecasting from an individual multiple treatment perspective (TCFimt)
TCFimt constructs adversarial tasks in a seq2seq framework to alleviate selection and time-varying bias and designs a contrastive learning-based block to decouple a mixed treatment effect into separated main treatment effects and causal interactions.
The proposed method shows satisfactory performance in predicting future outcomes with specific treatments and in choosing optimal treatment type and timing than state-of-the-art methods.
arXiv Detail & Related papers (2022-12-17T15:01:05Z) - Optimal Clustering with Bandit Feedback [57.672609011609886]
This paper considers the problem of online clustering with bandit feedback.
It includes a novel stopping rule for sequential testing that circumvents the need to solve any NP-hard weighted clustering problem as its subroutines.
We show through extensive simulations on synthetic and real-world datasets that BOC's performance matches the lower boundally, and significantly outperforms a non-adaptive baseline algorithm.
arXiv Detail & Related papers (2022-02-09T06:05:05Z) - Local policy search with Bayesian optimization [73.0364959221845]
Reinforcement learning aims to find an optimal policy by interaction with an environment.
Policy gradients for local search are often obtained from random perturbations.
We develop an algorithm utilizing a probabilistic model of the objective function and its gradient.
arXiv Detail & Related papers (2021-06-22T16:07:02Z) - Multicriteria Group Decision-Making Under Uncertainty Using Interval
Data and Cloud Models [0.0]
We propose a multicriteria group decision making (MCGDM) algorithm under uncertainty where data is collected as intervals.
The proposed MCGDM algorithm aggregates the data, determines the optimal weights for criteria and ranks alternatives with no further input.
The proposed MCGDM algorithm is implemented on a case study of a cybersecurity problem to illustrate its feasibility and effectiveness.
arXiv Detail & Related papers (2020-12-01T06:34:48Z) - Stochastic Optimization Forests [60.523606291705214]
We show how to train forest decision policies by growing trees that choose splits to directly optimize the downstream decision quality, rather than splitting to improve prediction accuracy as in the standard random forest algorithm.
We show that our approximate splitting criteria can reduce running time hundredfold, while achieving performance close to forest algorithms that exactly re-optimize for every candidate split.
arXiv Detail & Related papers (2020-08-17T16:56:06Z) - Robust Grouped Variable Selection Using Distributionally Robust
Optimization [11.383869751239166]
We propose a Distributionally Robust Optimization (DRO) formulation with a Wasserstein-based uncertainty set for selecting grouped variables under perturbations.
We prove probabilistic bounds on the out-of-sample loss and the estimation bias, and establish the grouping effect of our estimator.
We show that our formulation produces an interpretable and parsimonious model that encourages sparsity at a group level.
arXiv Detail & Related papers (2020-06-10T22:32:52Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.