Online Learning in Dynamically Changing Environments
- URL: http://arxiv.org/abs/2302.00103v1
- Date: Tue, 31 Jan 2023 21:10:03 GMT
- Title: Online Learning in Dynamically Changing Environments
- Authors: Changlong Wu, Ananth Grama, Wojciech Szpankowski
- Abstract summary: We study the problem of online learning and online regret minimization when samples are drawn from a general unknown non-stationary process.
We prove a tight (up to a $\sqrt{\log T}$ factor) bound $O(\sqrt{KT\cdot\mathsf{VC}(\mathcal{H})\log T})$ for the expected worst-case regret of any finite VC-dimensional class.
We extend these results to general smooth adversary processes with unknown reference measure.
- Score: 11.731001328350983
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We study the problem of online learning and online regret minimization when
samples are drawn from a general unknown non-stationary process. We introduce
the concept of a dynamic changing process with cost $K$, where the conditional
marginals of the process can vary arbitrarily, while the number of different
conditional marginals is bounded by $K$ over $T$ rounds. For such processes we
prove a tight (up to a $\sqrt{\log T}$ factor) bound
$O(\sqrt{KT\cdot\mathsf{VC}(\mathcal{H})\log T})$ for the expected worst-case
regret of any finite VC-dimensional class $\mathcal{H}$ under absolute loss
(i.e., the expected misclassification loss). We then improve this bound for
general mixable losses, by establishing a tight (up to a $\log^3 T$ factor)
regret bound $O(K\cdot\mathsf{VC}(\mathcal{H})\log^3 T)$. We extend these
results to general smooth adversary processes with unknown reference measure by
showing a sub-linear regret bound for $1$-dimensional threshold functions under
a general bounded convex loss. Our results can be viewed as a first step
towards regret analysis with non-stationary samples in the distribution-blind
(universal) regime. This also brings a new viewpoint that shifts the study from
the complexity of hypothesis classes to the complexity of the processes
generating the data.
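To make the abstract's quantities concrete, below is a minimal formal sketch of the regret notion and the cost-$K$ condition being described. The notation ($x_t$, $y_t$, $\hat{y}_t$, $P_t$, $\mathrm{Reg}_T$) is assumed for illustration; the paper's exact definitions may place the infimum and expectation slightly differently.

```latex
% Illustrative sketch only: notation assumed, not quoted from the paper.
% Expected regret of a learner predicting \hat{y}_t under absolute loss,
% measured against the best fixed hypothesis in \mathcal{H}:
\[
  \mathrm{Reg}_T(\mathcal{H})
  = \mathbb{E}\Bigl[\sum_{t=1}^{T} \lvert \hat{y}_t - y_t \rvert\Bigr]
  - \inf_{h \in \mathcal{H}} \mathbb{E}\Bigl[\sum_{t=1}^{T} \lvert h(x_t) - y_t \rvert\Bigr].
\]
% A process has cost K when its conditional marginals
% P_t(\,\cdot \mid \text{past}\,) take at most K distinct values over the T rounds;
% the abstract's main bound for a class of VC dimension \mathsf{VC}(\mathcal{H}) then reads
\[
  \mathrm{Reg}_T(\mathcal{H})
  = O\!\Bigl(\sqrt{K\,T\cdot \mathsf{VC}(\mathcal{H})\,\log T}\Bigr),
\]
% which is tight up to a \sqrt{\log T} factor, and improves to
% O(K \cdot \mathsf{VC}(\mathcal{H}) \log^3 T) for general mixable losses.
```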
Related papers
- Projection by Convolution: Optimal Sample Complexity for Reinforcement Learning in Continuous-Space MDPs [56.237917407785545]
We consider the problem of learning an $\varepsilon$-optimal policy in a general class of continuous-space Markov decision processes (MDPs) having smooth Bellman operators.
Key to our solution is a novel projection technique based on ideas from harmonic analysis.
Our result bridges the gap between two popular but conflicting perspectives on continuous-space MDPs.
arXiv Detail & Related papers (2024-05-10T09:58:47Z) - Sample Efficient Reinforcement Learning with Partial Dynamics Knowledge [0.704590071265998]
We study the sample complexity of online Q-learning methods when some prior knowledge about the dynamics is available or can be learned efficiently.
We present an optimistic Q-learning algorithm that achieves $\tilde{\mathcal{O}}(\mathrm{Poly}(H)\sqrt{SAT})$ regret under perfect knowledge of $f$.
arXiv Detail & Related papers (2023-12-19T19:53:58Z) - Settling the Sample Complexity of Online Reinforcement Learning [92.02082223856479]
We show how to achieve minimax-optimal regret without incurring any burn-in cost.
We extend our theory to unveil the influences of problem-dependent quantities like the optimal value/cost and certain variances.
arXiv Detail & Related papers (2023-07-25T15:42:11Z) - Horizon-Free and Variance-Dependent Reinforcement Learning for Latent
Markov Decision Processes [62.90204655228324]
We study regret minimization for reinforcement learning (RL) in Latent Markov Decision Processes (LMDPs) with context in hindsight.
We design a novel model-based algorithmic framework which can be instantiated with both a model-optimistic and a value-optimistic solver.
arXiv Detail & Related papers (2022-10-20T21:32:01Z) - The Best of Both Worlds: Reinforcement Learning with Logarithmic Regret
and Policy Switches [84.54669549718075]
We study the problem of regret minimization for episodic Reinforcement Learning (RL).
We focus on learning with general function classes and general model classes.
We show that a logarithmic regret bound is realizable by algorithms with $O(log T)$ switching cost.
arXiv Detail & Related papers (2022-03-03T02:55:55Z) - Breaking the Sample Complexity Barrier to Regret-Optimal Model-Free
Reinforcement Learning [52.76230802067506]
A novel model-free algorithm is proposed to minimize regret in episodic reinforcement learning.
The proposed algorithm employs an early-settled reference update rule, with the aid of two Q-learning sequences.
The design principle of our early-settled variance reduction method might be of independent interest to other RL settings.
arXiv Detail & Related papers (2021-10-09T21:13:48Z) - Localization, Convexity, and Star Aggregation [0.0]
Offset Rademacher complexities have been shown to imply sharp, linear-dependent upper bounds for the square loss.
We show that in the statistical setting, the offset bound can be generalized to any loss satisfying certain uniform convexity conditions.
arXiv Detail & Related papers (2021-05-19T00:47:59Z) - Upper Confidence Primal-Dual Reinforcement Learning for CMDP with
Adversarial Loss [145.54544979467872]
We consider online learning for episodic constrained Markov decision processes (CMDPs).
We propose a new upper confidence primal-dual algorithm, which only requires the trajectories sampled from the transition model.
Our analysis incorporates a new high-probability drift analysis of Lagrange multiplier processes into the celebrated regret analysis of upper confidence reinforcement learning.
arXiv Detail & Related papers (2020-03-02T05:02:23Z)