Asymptotic Instance-Optimal Algorithms for Interactive Decision Making
- URL: http://arxiv.org/abs/2206.02326v2
- Date: Mon, 12 Jun 2023 01:51:06 GMT
- Title: Asymptotic Instance-Optimal Algorithms for Interactive Decision Making
- Authors: Kefan Dong, Tengyu Ma
- Abstract summary: In this paper, we design the first asymptotic instance-optimal algorithm for general interactive decision making problems with a finite number of decisions under mild conditions.
Our results, instantiated on concrete problems, recover the classical gap-dependent bounds for multi-armed bandits and prior works on linear bandits.
- Score: 35.76184529520015
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Past research on interactive decision making problems (bandits, reinforcement
learning, etc.) mostly focuses on the minimax regret that measures the
algorithm's performance on the hardest instance. However, an ideal algorithm
should adapt to the complexity of a particular problem instance and incur
smaller regrets on easy instances than worst-case instances. In this paper, we
design the first asymptotic instance-optimal algorithm for general interactive
decision making problems with a finite number of decisions under mild conditions.
On every instance $f$, our algorithm outperforms all consistent algorithms
(those achieving non-trivial regrets on all instances), and has asymptotic
regret $\mathcal{C}(f) \ln n$, where $\mathcal{C}(f)$ is an exact
characterization of the complexity of $f$. The key step of the algorithm
involves hypothesis testing with active data collection. It computes the most
economical decisions with which the algorithm collects observations to test
whether an estimated instance is indeed correct; thus, the complexity
$\mathcal{C}(f)$ is the minimum cost to test the instance $f$ against other
instances. Our results, instantiated on concrete problems, recover the
classical gap-dependent bounds for multi-armed bandits [Lai and Robbins, 1985]
and prior works on linear bandits [Lattimore and Szepesvari, 2017], and improve
upon the previous best instance-dependent upper bound [Xu et al., 2021] for
reinforcement learning.
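For multi-armed bandits, the complexity $\mathcal{C}(f)$ above specializes to the classical Lai-Robbins gap-dependent constant $\sum_{a:\Delta_a>0} \Delta_a / \mathrm{KL}(\mu_a, \mu^*)$, so the asymptotic regret is this constant times $\ln n$. A minimal sketch of that instantiation (function names are illustrative, and unit-variance Gaussian arms are assumed so the KL divergence has a closed form):

```python
import math

def gaussian_kl(mu1, mu2, sigma=1.0):
    # KL divergence between unit-variance Gaussians N(mu1, sigma^2) and N(mu2, sigma^2)
    return (mu1 - mu2) ** 2 / (2 * sigma ** 2)

def lai_robbins_complexity(means):
    """Gap-dependent constant C(f) = sum over suboptimal arms a of
    Delta_a / KL(mu_a, mu*), so asymptotic regret behaves as C(f) * ln(n)."""
    mu_star = max(means)
    c = 0.0
    for mu in means:
        gap = mu_star - mu          # suboptimality gap Delta_a
        if gap > 0:
            c += gap / gaussian_kl(mu, mu_star)
    return c

# Three Gaussian arms: for unit variance each term reduces to 2 / Delta_a
print(lai_robbins_complexity([1.0, 0.5, 0.0]))  # 2/0.5 + 2/1.0 = 6.0
```

The paper's general $\mathcal{C}(f)$ is defined via an optimization over data-collection (testing) costs against all alternative instances; the closed form above is only what it reduces to in the bandit special case.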
Related papers
- The Limits of Assumption-free Tests for Algorithm Performance [6.7171902258864655]
How well does an algorithm perform at a given modeling task, and which algorithm performs best?
We make a distinction between two questions: how good is an algorithm $A$ at the problem of learning from a training set of size $n$, versus, how good is a particular fitted model produced by running $A$ on a particular training data set of size $n$?
arXiv Detail & Related papers (2024-02-12T03:19:30Z) - Provably Efficient Reinforcement Learning via Surprise Bound [66.15308700413814]
We propose a provably efficient reinforcement learning algorithm (both computationally and statistically) with general value function approximations.
Our algorithm achieves reasonable regret bounds when applied to both the linear setting and the sparse high-dimensional linear setting.
arXiv Detail & Related papers (2023-02-22T20:21:25Z) - Differentially-Private Hierarchical Clustering with Provable
Approximation Guarantees [79.59010418610625]
We study differentially private approximation algorithms for hierarchical clustering.
We show strong lower bounds for the problem: any $\epsilon$-DP algorithm must exhibit $\Omega(|V|^2/\epsilon)$ additive error for an input dataset.
We propose a private $1+o(1)$ approximation algorithm which also recovers the blocks exactly.
arXiv Detail & Related papers (2023-01-31T19:14:30Z) - Instance-Dependent Regret Analysis of Kernelized Bandits [19.252319300590653]
We study the kernelized bandit problem, which involves designing an adaptive strategy for querying a noisy zeroth-order oracle.
We derive instance-dependent regret lower bounds for algorithms with uniformly (over the function class) vanishing normalized cumulative regret.
arXiv Detail & Related papers (2022-03-12T00:53:59Z) - Breaking the Sample Complexity Barrier to Regret-Optimal Model-Free
Reinforcement Learning [52.76230802067506]
A novel model-free algorithm is proposed to minimize regret in episodic reinforcement learning.
The proposed algorithm employs an early-settled reference update rule, with the aid of two Q-learning sequences.
The design principle of our early-settled variance reduction method might be of independent interest to other RL settings.
arXiv Detail & Related papers (2021-10-09T21:13:48Z) - Conic Blackwell Algorithm: Parameter-Free Convex-Concave Saddle-Point
Solving [21.496093093434357]
We develop a new simple regret minimizer, the Conic Blackwell Algorithm$+$ (CBA$+$).
We show how to implement CBA$+$ for the simplex, $\ell_p$ norm balls, and ellipsoidal confidence regions in the simplex.
We present numerical experiments for solving matrix games and distributionally robust optimization problems.
arXiv Detail & Related papers (2021-05-27T14:50:31Z) - Online Model Selection for Reinforcement Learning with Function
Approximation [50.008542459050155]
We present a meta-algorithm that adapts to the optimal complexity with $\tilde{O}(L^{5/6} T^{2/3})$ regret.
We also show that the meta-algorithm automatically admits significantly improved instance-dependent regret bounds.
arXiv Detail & Related papers (2020-11-19T10:00:54Z) - Adaptive Sampling for Best Policy Identification in Markov Decision
Processes [79.4957965474334]
We investigate the problem of best-policy identification in discounted Markov Decision Processes (MDPs) when the learner has access to a generative model.
The advantages of state-of-the-art algorithms are discussed and illustrated.
arXiv Detail & Related papers (2020-09-28T15:22:24Z) - Beyond Worst-Case Analysis in Stochastic Approximation: Moment
Estimation Improves Instance Complexity [58.70807593332932]
We study the oracle complexity of gradient-based methods for stochastic approximation problems.
We focus on instance-dependent complexity instead of worst case complexity.
Our proposed algorithm and its analysis provide a theoretical justification for the success of moment estimation.
arXiv Detail & Related papers (2020-06-08T09:25:47Z) - Learning Sparse Classifiers: Continuous and Mixed Integer Optimization
Perspectives [10.291482850329892]
Mixed integer programming (MIP) can be used to solve (to optimality) $\ell_0$-regularized regression problems.
We propose two classes of scalable algorithms: an exact algorithm that can handle $p \approx 50{,}000$ features in a few minutes, and approximate algorithms that can address instances with $p \approx 10^6$.
In addition, we present new estimation error bounds for $\ell_0$-regularized problems.
arXiv Detail & Related papers (2020-01-17T18:47:02Z)
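As a toy illustration of the $\ell_0$-regularized regression problem in the last entry, exact best-subset selection can be brute-forced for tiny $p$ (a minimal sketch with illustrative names; the MIP formulations in that paper are what make much larger $p$ tractable):

```python
import itertools
import numpy as np

def best_subset_ols(X, y, k):
    """Exact l0-constrained least squares by enumerating all size-k supports.
    Only feasible for tiny p; MIP solvers scale this idea up dramatically."""
    n, p = X.shape
    best_rss, best_support, best_beta = np.inf, None, None
    for support in itertools.combinations(range(p), k):
        Xs = X[:, support]
        # Ordinary least squares restricted to the candidate support
        beta, *_ = np.linalg.lstsq(Xs, y, rcond=None)
        rss = np.sum((y - Xs @ beta) ** 2)
        if rss < best_rss:
            best_rss, best_support, best_beta = rss, support, beta
    return best_support, best_beta

# Recover a 2-sparse signal among p = 6 features (noiseless for clarity)
rng = np.random.default_rng(0)
X = rng.standard_normal((50, 6))
y = 3.0 * X[:, 1] - 2.0 * X[:, 4]
support, beta = best_subset_ols(X, y, k=2)
print(support)  # (1, 4)
```

The enumeration visits all $\binom{p}{k}$ supports, which is exactly the combinatorial explosion that the branch-and-bound machinery inside MIP solvers prunes.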
This list is automatically generated from the titles and abstracts of the papers in this site.