Sharp Analysis of Smoothed Bellman Error Embedding
- URL: http://arxiv.org/abs/2007.03749v1
- Date: Tue, 7 Jul 2020 19:27:09 GMT
- Title: Sharp Analysis of Smoothed Bellman Error Embedding
- Authors: Ahmed Touati and Pascal Vincent
- Abstract summary: We study the theoretical behavior of SBEED in batch-mode reinforcement learning.
We prove a near-optimal performance guarantee that depends on the representation power of the used function classes.
- Score: 17.296084954104415
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The \textit{Smoothed Bellman Error Embedding} algorithm~\citep{dai2018sbeed},
known as SBEED, was proposed as a provably convergent reinforcement learning
algorithm with general nonlinear function approximation. It has been
successfully implemented with neural networks and achieved strong empirical
results. In this work, we study the theoretical behavior of SBEED in batch-mode
reinforcement learning. We prove a near-optimal performance guarantee that
depends on the representation power of the used function classes and a tight
notion of the distribution shift. Our results improve upon prior guarantees for
SBEED in ~\citet{dai2018sbeed} in terms of the dependence on the planning
horizon and on the sample size. Our analysis builds on the recent work of
~\citet{Xie2020} which studies a related algorithm MSBO, that could be
interpreted as a \textit{non-smooth} counterpart of SBEED.
Related papers
- Sparsest Models Elude Pruning: An Exposé of Pruning's Current Capabilities [4.842973374883628]
Pruning has emerged as a promising approach for compressing large-scale models, yet its effectiveness in recovering the sparsest of models has not yet been explored.
We conducted an extensive series of 485,838 experiments, applying a range of state-of-the-art pruning algorithms to a synthetic dataset we created, named the Cubist Spiral.
Our findings reveal a significant gap in performance compared to ideal sparse networks, which we identified through a novel search algorithm.
arXiv Detail & Related papers (2024-07-04T17:33:15Z) - Regularized Q-Learning with Linear Function Approximation [3.10770247120758]
We consider a single-loop algorithm for minimizing the projected Bellman error with finite time convergence guarantees.
We show that, under certain assumptions, the proposed algorithm converges to a stationary point in the presence of Markovian noise.
arXiv Detail & Related papers (2024-01-26T20:45:40Z) - Equation Discovery with Bayesian Spike-and-Slab Priors and Efficient Kernels [57.46832672991433]
We propose a novel equation discovery method based on Kernel learning and BAyesian Spike-and-Slab priors (KBASS)
We use kernel regression to estimate the target function, which is flexible, expressive, and more robust to data sparsity and noises.
We develop an expectation-propagation expectation-maximization algorithm for efficient posterior inference and function estimation.
arXiv Detail & Related papers (2023-10-09T03:55:09Z) - SHOT: Suppressing the Hessian along the Optimization Trajectory for
Gradient-Based Meta-Learning [28.26143547479141]
We introduce an algorithm called SHOT (Suppressing the Hessian along the Optimization Trajectory)
SHOT does not increase the computational complexity of the baseline model much.
We confirm our hypothesis empirically and demonstrate that SHOT outperforms the corresponding baseline.
arXiv Detail & Related papers (2023-10-04T11:43:08Z) - Provable Guarantees for Generative Behavior Cloning: Bridging Low-Level
Stability and High-Level Behavior [51.60683890503293]
We propose a theoretical framework for studying behavior cloning of complex expert demonstrations using generative modeling.
We show that pure supervised cloning can generate trajectories matching the per-time step distribution of arbitrary expert trajectories.
arXiv Detail & Related papers (2023-07-27T04:27:26Z) - Temporal Difference Learning with Compressed Updates: Error-Feedback meets Reinforcement Learning [47.904127007515925]
We study a variant of the classical temporal difference (TD) learning algorithm with a perturbed update direction.
We prove that compressed TD algorithms, coupled with an error-feedback mechanism used widely in optimization, exhibit the same non-asymptotic approximation guarantees as their counterparts.
Notably, these are the first finite-time results in RL that account for general compression operators and error-feedback in tandem with linear function approximation and Markovian sampling.
arXiv Detail & Related papers (2023-01-03T04:09:38Z) - Fast and Provably Convergent Algorithms for Gromov-Wasserstein in Graph
Learning [37.89640056739607]
Two proposed algorithms, called Bregman Alternating Projected Gradient (BAPG) and hybrid Bregman Proximal Gradient (hBPG) are proven to be (linearly) convergent.
We provide comprehensive experiments to validate the effectiveness of our methods on a host of tasks, including graph alignment, graph partition, and shape matching.
arXiv Detail & Related papers (2022-05-17T06:26:54Z) - Proxy Convexity: A Unified Framework for the Analysis of Neural Networks
Trained by Gradient Descent [95.94432031144716]
We propose a unified non- optimization framework for the analysis of a learning network.
We show that existing guarantees can be trained unified through gradient descent.
arXiv Detail & Related papers (2021-06-25T17:45:00Z) - Provably Efficient Reward-Agnostic Navigation with Linear Value
Iteration [143.43658264904863]
We show how iteration under a more standard notion of low inherent Bellman error, typically employed in least-square value-style algorithms, can provide strong PAC guarantees on learning a near optimal value function.
We present a computationally tractable algorithm for the reward-free setting and show how it can be used to learn a near optimal policy for any (linear) reward function.
arXiv Detail & Related papers (2020-08-18T04:34:21Z) - Communication-Efficient Distributed Stochastic AUC Maximization with
Deep Neural Networks [50.42141893913188]
We study a distributed variable for large-scale AUC for a neural network as with a deep neural network.
Our model requires a much less number of communication rounds and still a number of communication rounds in theory.
Our experiments on several datasets show the effectiveness of our theory and also confirm our theory.
arXiv Detail & Related papers (2020-05-05T18:08:23Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.