Online Batch Decision-Making with High-Dimensional Covariates
- URL: http://arxiv.org/abs/2002.09438v2
- Date: Thu, 27 Feb 2020 20:43:39 GMT
- Title: Online Batch Decision-Making with High-Dimensional Covariates
- Authors: Chi-Hua Wang, Guang Cheng
- Abstract summary: We propose and investigate a class of new algorithms for sequential decision making that interact with textita batch of users simultaneously instead of textita user at each decision epoch.
We deliver a solution, named textitTeamwork LASSO Bandit algorithm, that resolves a batch version of explore-exploit dilemma via switching between stage and selfish stage during the whole decision process.
- Score: 20.06690325969748
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We propose and investigate a class of new algorithms for sequential decision
making that interacts with \textit{a batch of users} simultaneously instead of
\textit{a user} at each decision epoch. This type of batch models is motivated
by interactive marketing and clinical trial, where a group of people are
treated simultaneously and the outcomes of the whole group are collected before
the next stage of decision. In such a scenario, our goal is to allocate a batch
of treatments to maximize treatment efficacy based on observed high-dimensional
user covariates. We deliver a solution, named \textit{Teamwork LASSO Bandit
algorithm}, that resolves a batch version of explore-exploit dilemma via
switching between teamwork stage and selfish stage during the whole decision
process. This is made possible based on statistical properties of LASSO
estimate of treatment efficacy that adapts to a sequence of batch observations.
In general, a rate of optimal allocation condition is proposed to delineate the
exploration and exploitation trade-off on the data collection scheme, which is
sufficient for LASSO to identify the optimal treatment for observed user
covariates. An upper bound on expected cumulative regret of the proposed
algorithm is provided.
Related papers
- Pareto Optimization with Robust Evaluation for Noisy Subset Selection [34.83487850400559]
Subset selection is a fundamental problem in optimization, which has a wide range of applications such as influence and sparse regression.
Previous algorithms, including the greedy algorithm and evolutionary evolutionary POSS, either struggle in noisy environments or consume excessive computational resources.
We propose a novel approach based on Pareto Optimization with Robust Evaluation for noisy subset selection (PORE), which maximizes a robust evaluation function and minimizes the subset size simultaneously.
arXiv Detail & Related papers (2025-01-12T14:04:20Z) - Uplift modeling with continuous treatments: A predict-then-optimize approach [4.132346971686944]
The goal of uplift modeling is to recommend actions that optimize specific outcomes by determining which entities should receive treatment.
While uplift modeling typically focuses on binary treatments, many real-world applications are characterized by continuousvalued treatments.
This paper presents a predictthenoptimize framework to allow for continuous treatments in uplift modeling.
arXiv Detail & Related papers (2024-12-12T12:43:42Z) - Time-Series-Informed Closed-loop Learning for Sequential Decision Making and Control [0.0]
Traditional Bayesian optimization approaches treat the learning problem as a black box, ignoring valuable information and knowledge about the structure of the underlying problem.
We propose a time-series-informed optimization framework that incorporates intermediate performance evaluations from early iterations of each experimental episode into the learning procedure.
We show that our approach achieves baseline performance with approximately half the resources and outperforms the baseline in terms of final closed-loop performance.
arXiv Detail & Related papers (2024-12-03T12:38:53Z) - Training Greedy Policy for Proposal Batch Selection in Expensive Multi-Objective Combinatorial Optimization [52.80408805368928]
We introduce a novel greedy-style subset selection algorithm for batch acquisition.
Our experiments on the red fluorescent proteins show that our proposed method achieves the baseline performance in 1.69x fewer queries.
arXiv Detail & Related papers (2024-06-21T05:57:08Z) - A new fuzzy multi-attribute group decision-making method based on TOPSIS
and optimization models [3.697049647195136]
A new method is proposed for multi-attribute group decision-making in interval-valued intuitionistic fuzzy sets.
By minimizing the sum of differences between individual evaluations and the overallconsistent evaluations of all experts, a new optimization model is established for determining expert weights.
The complete fuzzy multi-attribute group decision-making algorithm is formulated, which can give full play to the advantages of subjective and objective weighting methods.
arXiv Detail & Related papers (2023-11-27T15:41:30Z) - Thompson sampling for zero-inflated count outcomes with an application to the Drink Less mobile health study [1.5936659933030128]
Mobile health interventions aim to improve distal outcomes, such as clinical conditions, by optimizing proximal outcomes through just-in-time adaptive interventions.
Contextual bandits provide a suitable framework for customizing such interventions according to individual time-varying contexts.
The current work addresses this challenge by leveraging count data models into online decision-making approaches.
arXiv Detail & Related papers (2023-11-24T09:02:24Z) - Provably Efficient UCB-type Algorithms For Learning Predictive State
Representations [55.00359893021461]
The sequential decision-making problem is statistically learnable if it admits a low-rank structure modeled by predictive state representations (PSRs)
This paper proposes the first known UCB-type approach for PSRs, featuring a novel bonus term that upper bounds the total variation distance between the estimated and true models.
In contrast to existing approaches for PSRs, our UCB-type algorithms enjoy computational tractability, last-iterate guaranteed near-optimal policy, and guaranteed model accuracy.
arXiv Detail & Related papers (2023-07-01T18:35:21Z) - TCFimt: Temporal Counterfactual Forecasting from Individual Multiple
Treatment Perspective [50.675845725806724]
We propose a comprehensive framework of temporal counterfactual forecasting from an individual multiple treatment perspective (TCFimt)
TCFimt constructs adversarial tasks in a seq2seq framework to alleviate selection and time-varying bias and designs a contrastive learning-based block to decouple a mixed treatment effect into separated main treatment effects and causal interactions.
The proposed method shows satisfactory performance in predicting future outcomes with specific treatments and in choosing optimal treatment type and timing than state-of-the-art methods.
arXiv Detail & Related papers (2022-12-17T15:01:05Z) - Optimal Clustering with Bandit Feedback [57.672609011609886]
This paper considers the problem of online clustering with bandit feedback.
It includes a novel stopping rule for sequential testing that circumvents the need to solve any NP-hard weighted clustering problem as its subroutines.
We show through extensive simulations on synthetic and real-world datasets that BOC's performance matches the lower boundally, and significantly outperforms a non-adaptive baseline algorithm.
arXiv Detail & Related papers (2022-02-09T06:05:05Z) - Local policy search with Bayesian optimization [73.0364959221845]
Reinforcement learning aims to find an optimal policy by interaction with an environment.
Policy gradients for local search are often obtained from random perturbations.
We develop an algorithm utilizing a probabilistic model of the objective function and its gradient.
arXiv Detail & Related papers (2021-06-22T16:07:02Z) - Stochastic Optimization Forests [60.523606291705214]
We show how to train forest decision policies by growing trees that choose splits to directly optimize the downstream decision quality, rather than splitting to improve prediction accuracy as in the standard random forest algorithm.
We show that our approximate splitting criteria can reduce running time hundredfold, while achieving performance close to forest algorithms that exactly re-optimize for every candidate split.
arXiv Detail & Related papers (2020-08-17T16:56:06Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.