Regret Analysis of the Stochastic Direct Search Method for Blind
Resource Allocation
- URL: http://arxiv.org/abs/2210.05222v1
- Date: Tue, 11 Oct 2022 07:40:45 GMT
- Title: Regret Analysis of the Stochastic Direct Search Method for Blind
Resource Allocation
- Authors: Juliette Achddou (PSL, DI-ENS), Olivier Cappé (CNRS, DI-ENS, PSL),
Aurélien Garivier (UMPA-ENSL, CNRS)
- Abstract summary: We study direct search (aka pattern search) methods for linearly constrained and derivative-free optimization in the presence of noise.
We provide a regret upper bound of order $T^{2/3}$ in the general case.
Our mathematical analysis also establishes, as a by-product, time-independent regret bounds in the deterministic, unconstrained case.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Motivated by programmatic advertising optimization, we consider the task of
sequentially allocating budget across a set of resources. At every time step, a
feasible allocation is chosen and only a corresponding random return is
observed. The goal is to maximize the cumulative expected sum of returns. This
is a realistic model for budget allocation across subdivisions of marketing
campaigns, when the objective is to maximize the number of conversions. We
study direct search (aka pattern search) methods for linearly constrained and
derivative-free optimization in the presence of noise. Those algorithms are
easy to implement and particularly suited to constrained optimization. They
have not yet been analyzed from the perspective of cumulative regret. We
provide a regret upper bound of order $T^{2/3}$ in the general case. Our
mathematical analysis also establishes, as a by-product, time-independent
regret bounds in the deterministic, unconstrained case. We also propose an
improved version of the method relying on sequential tests to accelerate the
identification of descent directions.
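To ground the setup, here is a minimal Python sketch of a noisy pattern-search loop on the probability simplex, in the spirit of the method studied here. It is not the authors' exact algorithm: the polling directions, the sample averaging, the sufficient-increase test, and all constants are illustrative assumptions.

```python
import numpy as np

def stochastic_direct_search(noisy_f, x0, budget_T, h0=0.25, n_samples=8, shrink=0.5):
    """Pattern search over the simplex {x >= 0, sum(x) = 1} with noisy returns.

    noisy_f(x) returns a random payoff whose mean is the unknown objective.
    Polling directions of the form e_i - e_j move budget between resources
    and therefore preserve the sum constraint.
    """
    x = np.asarray(x0, dtype=float)
    h, t = h0, 0

    def avg_return(point):
        nonlocal t
        vals = [noisy_f(point) for _ in range(n_samples)]  # average to damp noise
        t += n_samples
        return float(np.mean(vals))

    f_x = avg_return(x)
    while t < budget_T:
        improved = False
        for i in range(len(x)):          # poll feasible coordinate-exchange moves
            for j in range(len(x)):
                if i == j:
                    continue
                cand = x.copy()
                cand[i] += h
                cand[j] -= h
                if cand[j] < 0:          # would leave the simplex: skip
                    continue
                f_cand = avg_return(cand)
                if f_cand > f_x + h ** 2:     # sufficient-increase test
                    x, f_x, improved = cand, f_cand, True
                    break
            if improved or t >= budget_T:
                break
        if not improved:
            h *= shrink                  # no ascent direction found: refine mesh
    return x

# Illustrative use: noisy concave objective peaked at (0.5, 0.3, 0.2).
rng = np.random.default_rng(0)
peak = np.array([0.5, 0.3, 0.2])
f = lambda x: -np.sum((x - peak) ** 2) + rng.normal(scale=0.05)
x_hat = stochastic_direct_search(f, x0=np.ones(3) / 3, budget_T=5000)
```

The improved variant mentioned in the abstract would replace the fixed-size averaging in avg_return with a sequential test that stops sampling a direction as soon as it is confidently improving or not.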
Related papers
- Maximum-Likelihood Inverse Reinforcement Learning with Finite-Time Guarantees [56.848265937921354]
Inverse reinforcement learning (IRL) aims to recover the reward function and the associated optimal policy.
Many algorithms for IRL have an inherently nested structure.
We develop a novel single-loop algorithm for IRL that does not compromise reward estimation accuracy.
arXiv Detail & Related papers (2022-10-04T17:13:45Z) - Structural Estimation of Markov Decision Processes in High-Dimensional
State Space with Finite-Time Guarantees [39.287388288477096]
We consider the task of estimating a structural model of dynamic decisions by a human agent based upon the observable history of implemented actions and visited states.
This problem has an inherent nested structure: in the inner problem, an optimal policy for a given reward function is identified, while in the outer problem a measure of fit is maximized (a schematic of this nesting is sketched after this entry).
We propose a single-loop estimation algorithm with finite time guarantees that is equipped to deal with high-dimensional state spaces.
arXiv Detail & Related papers (2022-10-04T00:11:38Z) - Budgeted Classification with Rejection: An Evolutionary Method with
Multiple Objectives [0.0]
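As a hedged illustration of that nested structure (a generic bilevel sketch, not the paper's single-loop estimator), the outer loop scores candidate reward vectors while the inner loop re-solves for an optimal policy each time; all names and the crude fit measure are assumptions.

```python
import numpy as np

def inner_optimal_policy(P, r, gamma=0.9, iters=200):
    """Inner problem: value iteration for a fixed reward vector r."""
    Q = np.zeros((P.shape[0], P.shape[1]))
    for _ in range(iters):
        Q = r[:, None] + gamma * P @ Q.max(axis=1)   # P has shape (nS, nA, nS)
    return Q.argmax(axis=1)                          # greedy policy

def outer_fit(P, demos, reward_grid):
    """Outer problem: maximize a fit measure over candidate rewards."""
    best_r, best_score = None, -np.inf
    for r in reward_grid:               # every candidate pays for a full inner solve
        pi = inner_optimal_policy(P, r)
        score = sum(pi[s] == a for s, a in demos)    # agreement with demonstrations
        if score > best_score:
            best_r, best_score = r, score
    return best_r
```

The repeated full inner solve is the cost that a single-loop algorithm avoids by updating the policy and the reward estimate together.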
- Budgeted Classification with Rejection: An Evolutionary Method with Multiple Objectives [0.0]
Budgeted, sequential classifiers (BSCs) process inputs through a sequence of partial feature acquisition and evaluation steps.
This enables efficient evaluation of inputs by stopping before unneeded features are acquired.
We propose a problem-specific genetic algorithm to build budgeted, sequential classifiers with confidence-based reject options (a minimal cascade of this kind is sketched after this entry).
arXiv Detail & Related papers (2022-05-01T22:05:16Z) - Online Allocation with Two-sided Resource Constraints [44.5635910908944]
- Online Allocation with Two-sided Resource Constraints [44.5635910908944]
We consider an online allocation problem subject to lower and upper resource constraints, where the requests arrive sequentially.
We propose a new algorithm that obtains a $1-O(\frac{\epsilon}{\alpha-\epsilon})$-competitive ratio against the offline benchmark that knows the entire request sequence ahead of time.
arXiv Detail & Related papers (2021-12-28T02:21:06Z) - Local policy search with Bayesian optimization [73.0364959221845]
Reinforcement learning aims to find an optimal policy by interaction with an environment.
Policy gradients for local search are often obtained from random perturbations.
We develop an algorithm utilizing a probabilistic model of the objective function and its gradient (one way to realize such a model is sketched after this entry).
arXiv Detail & Related papers (2021-06-22T16:07:02Z) - Navigating to the Best Policy in Markov Decision Processes [68.8204255655161]
- Navigating to the Best Policy in Markov Decision Processes [68.8204255655161]
We investigate the active pure exploration problem in Markov Decision Processes.
The agent sequentially selects actions and, from the resulting system trajectory, aims to identify the best policy as fast as possible.
arXiv Detail & Related papers (2021-06-05T09:16:28Z) - Variance Penalized On-Policy and Off-Policy Actor-Critic [60.06593931848165]
We propose on-policy and off-policy actor-critic algorithms that optimize a performance criterion involving both mean and variance in the return.
Our approach not only performs on par with actor-critic and prior variance-penalization baselines in terms of expected return, but also generates trajectories with lower return variance (a standard form of the penalized criterion is written out after this entry).
arXiv Detail & Related papers (2021-02-03T10:06:16Z) - Online Stochastic Optimization with Wasserstein Based Non-stationarity [12.91020811577007]
- Online Stochastic Optimization with Wasserstein Based Non-stationarity [12.91020811577007]
We consider a general online optimization problem with multiple budget constraints over a horizon of finite time periods.
The objective of the decision maker is to maximize the cumulative reward subject to the budget constraints.
This formulation captures a wide range of applications including online linear programming and network revenue management.
arXiv Detail & Related papers (2020-12-13T04:47:37Z) - An Asymptotically Optimal Primal-Dual Incremental Algorithm for
Contextual Linear Bandits [129.1029690825929]
We introduce a novel algorithm improving over the state-of-the-art along multiple dimensions.
We establish minimax optimality for any learning horizon in the special case of non-contextual linear bandits.
arXiv Detail & Related papers (2020-10-23T09:12:47Z) - Adaptive Sampling for Best Policy Identification in Markov Decision
Processes [79.4957965474334]
We investigate the problem of best-policy identification in discounted Markov Decision Processes (MDPs) when the learner has access to a generative model.
The advantages of state-of-the-art algorithms are discussed and illustrated.
arXiv Detail & Related papers (2020-09-28T15:22:24Z) - Stochastic Adaptive Line Search for Differentially Private Optimization [6.281099620056346]
The performance of private gradient-based optimization algorithms is highly dependent on the choice of step size (or learning rate).
We introduce a variant of the classic line search algorithm that adjusts the step size according to the reliability of the noisy gradient.
We show that the adaptively chosen step sizes allow the proposed algorithm to use the privacy budget efficiently and to perform competitively against existing private gradient-based methods (a noise-aware line search step is sketched after this entry).
arXiv Detail & Related papers (2020-08-18T15:18:47Z)
This list is automatically generated from the titles and abstracts of the papers on this site.