Regret Analysis of the Stochastic Direct Search Method for Blind
Resource Allocation
- URL: http://arxiv.org/abs/2210.05222v1
- Date: Tue, 11 Oct 2022 07:40:45 GMT
- Title: Regret Analysis of the Stochastic Direct Search Method for Blind
Resource Allocation
- Authors: Juliette Achddou (PSL, DI-ENS), Olivier Cappe (CNRS, DI-ENS, PSL),
Aurélien Garivier (UMPA-ENSL, CNRS)
- Abstract summary: We study direct search (aka pattern search) methods for linearly constrained and derivative-free optimization in the presence of noise.
We provide a regret upper-bound of order $T^{2/3}$ in the general case.
Our mathematical analysis also establishes, as a by-product, time-independent regret bounds in the deterministic, unconstrained case.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Motivated by programmatic advertising optimization, we consider the task of
sequentially allocating budget across a set of resources. At every time step, a
feasible allocation is chosen and only a corresponding random return is
observed. The goal is to maximize the cumulative expected sum of returns. This
is a realistic model for budget allocation across subdivisions of marketing
campaigns, when the objective is to maximize the number of conversions. We
study direct search (aka pattern search) methods for linearly constrained and
derivative-free optimization in the presence of noise. Those algorithms are
easy to implement and particularly suited to constrained optimization. They
have not yet been analyzed from the perspective of cumulative regret. We
provide a regret upper-bound of order $T^{2/3}$ in the general case. Our
mathematical analysis also establishes, as a by-product, time-independent
regret bounds in the deterministic, unconstrained case. We also propose an
improved version of the method relying on sequential tests to accelerate the
identification of descent directions.
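The following is a minimal sketch, under stated assumptions, of the kind of stochastic direct search step the abstract describes: poll budget-preserving transfer directions, average a few noisy evaluations per candidate, accept a move only on sufficient increase, and halve the step size after an unsuccessful poll. Here the regret over $T$ evaluations is the gap $R_T = T \max_x f(x) - \sum_{t=1}^{T} \mathbb{E}[f(x_t)]$. The oracle `noisy_return`, the fixed `n_samples`, and the `alpha ** 2` forcing term are illustrative assumptions, not the paper's exact algorithm or tuning.

```python
import numpy as np

def stochastic_direct_search(noisy_return, d, budget=1.0, horizon=10_000,
                             alpha0=None, n_samples=8):
    """Sketch of stochastic direct search on the budget simplex
    {x >= 0, sum(x) = budget}; illustrative, not the paper's algorithm."""
    x = np.full(d, budget / d)                 # start from uniform allocation
    alpha = alpha0 if alpha0 is not None else budget / (2 * d)
    t = 0
    while t < horizon:
        # Average a few noisy evaluations at the incumbent allocation.
        f_x = np.mean([noisy_return(x) for _ in range(n_samples)])
        t += n_samples
        improved = False
        # Poll transfer directions e_j - e_i: moving mass alpha from
        # resource i to resource j preserves the budget constraint,
        # so feasibility only requires x[i] >= alpha.
        for i in range(d):
            if x[i] < alpha:
                continue
            for j in range(d):
                if i == j or t >= horizon:
                    continue
                y = x.copy()
                y[i] -= alpha
                y[j] += alpha
                f_y = np.mean([noisy_return(y) for _ in range(n_samples)])
                t += n_samples
                if f_y > f_x + alpha ** 2:     # sufficient-increase test
                    x, f_x, improved = y, f_y, True
                    break
            if improved:
                break
        if not improved:
            alpha /= 2                         # unsuccessful poll: shrink step
    return x

# Toy usage: 3 resources, concave expected return, Gaussian observation noise.
theta = np.array([1.0, 2.0, 0.5])
expected_return = lambda x: float(np.sqrt(x) @ theta)
noisy = lambda x: expected_return(x) + 0.1 * np.random.randn()
x_hat = stochastic_direct_search(noisy, d=3, horizon=20_000)
```

The improved variant mentioned in the abstract would replace the fixed-size averaging above with a sequential test that stops sampling a polled direction as soon as it is confidently better or worse than the incumbent.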
Related papers
- Constrained Online Two-stage Stochastic Optimization: Algorithm with
(and without) Predictions [19.537289123577022]
We consider an online two-stage optimization with long-term constraints over a finite horizon of $T$ periods.
We develop online algorithms for the online two-stage problem from adversarial learning algorithms.
arXiv Detail & Related papers (2024-01-02T07:46:33Z)
- Random Exploration in Bayesian Optimization: Order-Optimal Regret and
Computational Efficiency [18.17090625880964]
We study the methodology of exploring the domain using random samples drawn from a distribution.
We show that this random exploration approach achieves the optimal error rates.
arXiv Detail & Related papers (2023-10-23T20:30:44Z)
- Constrained Online Two-stage Stochastic Optimization: Near Optimal Algorithms via Adversarial Learning [1.994307489466967]
We consider an online two-stage optimization with long-term constraints over a finite horizon of $T$ periods.
We develop online algorithms for the online two-stage problem from adversarial learning algorithms.
arXiv Detail & Related papers (2023-02-02T10:33:09Z)
- Generalizing Bayesian Optimization with Decision-theoretic Entropies [102.82152945324381]
We consider a generalization of Shannon entropy from work in statistical decision theory.
We first show that special cases of this entropy lead to popular acquisition functions used in BO procedures.
We then show how alternative choices for the loss yield a flexible family of acquisition functions.
arXiv Detail & Related papers (2022-10-04T04:43:58Z)
- Breaking the Sample Complexity Barrier to Regret-Optimal Model-Free
Reinforcement Learning [52.76230802067506]
A novel model-free algorithm is proposed to minimize regret in episodic reinforcement learning.
The proposed algorithm employs an early-settled reference update rule, with the aid of two Q-learning sequences.
The design principle of our early-settled variance reduction method might be of independent interest to other RL settings.
arXiv Detail & Related papers (2021-10-09T21:13:48Z)
- Local policy search with Bayesian optimization [73.0364959221845]
Reinforcement learning aims to find an optimal policy by interaction with an environment.
Policy gradients for local search are often obtained from random perturbations.
We develop an algorithm utilizing a probabilistic model of the objective function and its gradient.
arXiv Detail & Related papers (2021-06-22T16:07:02Z)
- Navigating to the Best Policy in Markov Decision Processes [68.8204255655161]
We investigate the active pure exploration problem in Markov Decision Processes.
The agent sequentially selects actions and, from the resulting system trajectory, aims to identify the best policy as fast as possible.
arXiv Detail & Related papers (2021-06-05T09:16:28Z)
- Adaptive Sampling for Best Policy Identification in Markov Decision
Processes [79.4957965474334]
We investigate the problem of best-policy identification in discounted Markov Decision Processes (MDPs) when the learner has access to a generative model.
The advantages of state-of-the-art algorithms are discussed and illustrated.
arXiv Detail & Related papers (2020-09-28T15:22:24Z)
- Alternating Direction Method of Multipliers for Quantization [15.62692130672419]
We study the performance of the Alternating Direction Method of Multipliers for Quantization ($\texttt{ADMM-Q}$) algorithm.
We develop a few variants of $\texttt{ADMM-Q}$ that can handle inexact update rules.
We empirically evaluate the efficacy of our proposed approaches.
arXiv Detail & Related papers (2020-09-08T01:58:02Z)
- Stochastic Optimization Forests [60.523606291705214]
We show how to train forest decision policies by growing trees that choose splits to directly optimize the downstream decision quality, rather than splitting to improve prediction accuracy as in the standard random forest algorithm.
We show that our approximate splitting criteria can reduce running time hundredfold, while achieving performance close to forest algorithms that exactly re-optimize for every candidate split.
arXiv Detail & Related papers (2020-08-17T16:56:06Z)
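To illustrate the split-criterion idea in the Stochastic Optimization Forests entry above: instead of scoring a candidate split by prediction error, score it by the downstream cost of the decision each child node would induce. This is a schematic sketch, not the paper's criterion; `decision_cost` is a hypothetical downstream cost function.

```python
import numpy as np

def mse_split_score(y_left, y_right):
    """Standard random-forest criterion: within-child squared error."""
    return np.var(y_left) * len(y_left) + np.var(y_right) * len(y_right)

def decision_split_score(y_left, y_right, decision_cost):
    """Decision-aware criterion (schematic): score a split by the cost of
    the decision each child induces; `decision_cost(z, y)` is a hypothetical
    cost of acting on decision z when outcomes are y."""
    cost = 0.0
    for y in (y_left, y_right):
        z = y.mean()                  # decision the child node commits to
        cost += decision_cost(z, y)   # downstream decision cost, not fit error
    return cost

# Toy usage: newsvendor-style cost of stocking z units against demands y.
newsvendor = lambda z, y: float(np.mean(2.0 * np.maximum(y - z, 0)
                                        + 1.0 * np.maximum(z - y, 0)))
left, right = np.array([1.0, 2.0, 3.0]), np.array([8.0, 9.0])
print(mse_split_score(left, right),
      decision_split_score(left, right, newsvendor))
```

A tree grower would pick the candidate split minimizing the decision-aware score rather than the squared-error one; the paper's contribution is making such criteria fast enough to evaluate at every candidate split.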