Instance-Optimality in Interactive Decision Making: Toward a
Non-Asymptotic Theory
- URL: http://arxiv.org/abs/2304.12466v1
- Date: Mon, 24 Apr 2023 21:51:58 GMT
- Title: Instance-Optimality in Interactive Decision Making: Toward a
Non-Asymptotic Theory
- Authors: Andrew Wagenmaker, Dylan J. Foster
- Abstract summary: We aim for instance-optimality, a strong notion of adaptivity which asserts that, on any particular problem instance, the algorithm under consideration outperforms all consistent algorithms.
In this paper, we take the first step toward developing a non-asymptotic theory of instance-optimal decision making with general function approximation.
- Score: 30.061707627742766
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We consider the development of adaptive, instance-dependent algorithms for
interactive decision making (bandits, reinforcement learning, and beyond) that,
rather than only performing well in the worst case, adapt to favorable
properties of real-world instances for improved performance. We aim for
instance-optimality, a strong notion of adaptivity which asserts that, on any
particular problem instance, the algorithm under consideration outperforms all
consistent algorithms. Instance-optimality enjoys a rich asymptotic theory
originating from the work of
\citet{lai1985asymptotically,graves1997asymptotically}, but non-asymptotic
guarantees have remained elusive outside of certain special cases. Even for
problems as simple as tabular reinforcement learning, existing algorithms do
not attain instance-optimal performance until the number of rounds of
interaction is doubly exponential in the number of states.
In this paper, we take the first step toward developing a non-asymptotic
theory of instance-optimal decision making with general function approximation.
We introduce a new complexity measure, the Allocation-Estimation Coefficient
(AEC), and provide a new algorithm, $\mathsf{AE}^2$, which attains
non-asymptotic instance-optimal performance at a rate controlled by the AEC.
Our results recover the best known guarantees for well-studied problems such as
finite-armed and linear bandits and, when specialized to tabular reinforcement
learning, attain the first instance-optimal regret bounds with polynomial
dependence on all problem parameters, improving over prior work exponentially.
We complement these results with lower bounds that show that i) existing
notions of statistical complexity are insufficient to derive non-asymptotic
guarantees, and ii) under certain technical conditions, boundedness of the AEC
is necessary to learn an instance-optimal allocation of decisions in finite
time.
Related papers
- Asymptotic and Non-Asymptotic Convergence of AdaGrad for Non-Convex Optimization via Novel Stopping Time-based Analysis [17.34603953600226]
We introduce an innovative comprehensive analysis of Ada, filling the existing gaps in the literature.
We derive almost sure and mean-prodd in expectation for AdaGrad.
In addition, we demonstrate the average non-a-bpt-d gradient measured by the rate-prod in expectation, which is potentially independent of the existing results.
arXiv Detail & Related papers (2024-09-08T08:29:51Z) - Learning to Optimize with Stochastic Dominance Constraints [103.26714928625582]
In this paper, we develop a simple yet efficient approach for the problem of comparing uncertain quantities.
We recast inner optimization in the Lagrangian as a learning problem for surrogate approximation, which bypasses apparent intractability.
The proposed light-SD demonstrates superior performance on several representative problems ranging from finance to supply chain management.
arXiv Detail & Related papers (2022-11-14T21:54:31Z) - Faster Algorithm and Sharper Analysis for Constrained Markov Decision
Process [56.55075925645864]
The problem of constrained decision process (CMDP) is investigated, where an agent aims to maximize the expected accumulated discounted reward subject to multiple constraints.
A new utilities-dual convex approach is proposed with novel integration of three ingredients: regularized policy, dual regularizer, and Nesterov's gradient descent dual.
This is the first demonstration that nonconcave CMDP problems can attain the lower bound of $mathcal O (1/epsilon)$ for all complexity optimization subject to convex constraints.
arXiv Detail & Related papers (2021-10-20T02:57:21Z) - Non-asymptotic estimates for TUSLA algorithm for non-convex learning
with applications to neural networks with ReLU activation function [3.5044892799305956]
We provide a non-asymptotic analysis for the tamed un-adjusted Langevin algorithm (TUSLA) introduced in Lovas et al.
In particular, we establish non-asymptotic error bounds for the TUSLA algorithm in Wassersteinstein-1-2.
We show that the TUSLA algorithm converges rapidly to the optimal solution.
arXiv Detail & Related papers (2021-07-19T07:13:02Z) - An Asymptotically Optimal Primal-Dual Incremental Algorithm for
Contextual Linear Bandits [129.1029690825929]
We introduce a novel algorithm improving over the state-of-the-art along multiple dimensions.
We establish minimax optimality for any learning horizon in the special case of non-contextual linear bandits.
arXiv Detail & Related papers (2020-10-23T09:12:47Z) - Fast Objective & Duality Gap Convergence for Non-Convex Strongly-Concave
Min-Max Problems with PL Condition [52.08417569774822]
This paper focuses on methods for solving smooth non-concave min-max problems, which have received increasing attention due to deep learning (e.g., deep AUC)
arXiv Detail & Related papers (2020-06-12T00:32:21Z) - Beyond Worst-Case Analysis in Stochastic Approximation: Moment
Estimation Improves Instance Complexity [58.70807593332932]
We study oracle complexity of gradient based methods for approximation problems.
We focus on instance-dependent complexity instead of worst case complexity.
Our proposed algorithm and its analysis provide a theoretical justification for the success of moment estimation.
arXiv Detail & Related papers (2020-06-08T09:25:47Z) - Is Temporal Difference Learning Optimal? An Instance-Dependent Analysis [102.29671176698373]
We address the problem of policy evaluation in discounted decision processes, and provide Markov-dependent guarantees on the $ell_infty$error under a generative model.
We establish both and non-asymptotic versions of local minimax lower bounds for policy evaluation, thereby providing an instance-dependent baseline by which to compare algorithms.
arXiv Detail & Related papers (2020-03-16T17:15:28Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.