Related papers: Sharp Analysis of Smoothed Bellman Error Embedding

Sharp Analysis of Smoothed Bellman Error Embedding

URL: http://arxiv.org/abs/2007.03749v1
Date: Tue, 7 Jul 2020 19:27:09 GMT
Title: Sharp Analysis of Smoothed Bellman Error Embedding
Authors: Ahmed Touati and Pascal Vincent
Abstract summary: We study the theoretical behavior of SBEED in batch-mode reinforcement learning. We prove a near-optimal performance guarantee that depends on the representation power of the used function classes.
Score: 17.296084954104415
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: The \textit{Smoothed Bellman Error Embedding} algorithm~\citep{dai2018sbeed}, known as SBEED, was proposed as a provably convergent reinforcement learning algorithm with general nonlinear function approximation. It has been successfully implemented with neural networks and achieved strong empirical results. In this work, we study the theoretical behavior of SBEED in batch-mode reinforcement learning. We prove a near-optimal performance guarantee that depends on the representation power of the used function classes and a tight notion of the distribution shift. Our results improve upon prior guarantees for SBEED in ~\citet{dai2018sbeed} in terms of the dependence on the planning horizon and on the sample size. Our analysis builds on the recent work of ~\citet{Xie2020} which studies a related algorithm MSBO, that could be interpreted as a \textit{non-smooth} counterpart of SBEED.

Related papers

BAPE: Learning an Explicit Bayes Classifier for Long-tailed Visual Recognition [78.70453964041718]
Current deep learning algorithms usually solve for the optimal classifier by emphimplicitly estimating the posterior probabilities.<n>This simple methodology has been proven effective for meticulously balanced academic benchmark datasets.<n>However, it is not applicable to the long-tailed data distributions in the real world.<n>This paper presents a novel approach (BAPE) that provides a more precise theoretical estimation of the data distributions.
arXiv Detail & Related papers (2025-06-29T15:12:50Z)
Stochastic Forward-Forward Learning through Representational Dimensionality Compression [7.847900313045352]
Forward-Forward (FF) algorithm provides a bottom-up alternative to backpropagation (BP) for training neural networks.<n>We propose a novel goodness function termed dimensionality compression that uses the effective dimensionality (ED) of fluctuating neural responses to incorporate second-order statistical structure.
arXiv Detail & Related papers (2025-05-22T13:19:29Z)
Offline Reinforcement Learning via Inverse Optimization [3.0586855806896054]
We propose a novel offline Reinforcement Learning (ORL) algorithm for continuous state and action spaces. To mitigate the distribution shift commonly observed in ORL problems, we employ a robust and non-causal Model Predictive Control expert. Unlike the existing literature, our robust MPC expert enjoys an exact and tractable convex reformulation.
arXiv Detail & Related papers (2025-02-27T12:11:44Z)
Towards Robust Out-of-Distribution Generalization: Data Augmentation and Neural Architecture Search Approaches [4.577842191730992]
We study ways toward robust OoD generalization for deep learning. We first propose a novel and effective approach to disentangle the spurious correlation between features that are not essential for recognition. We then study the problem of strengthening neural architecture search in OoD scenarios.
arXiv Detail & Related papers (2024-10-25T20:50:32Z)
Sparsest Models Elude Pruning: An Exposé of Pruning's Current Capabilities [4.842973374883628]
Pruning has emerged as a promising approach for compressing large-scale models, yet its effectiveness in recovering the sparsest of models has not yet been explored. We conducted an extensive series of 485,838 experiments, applying a range of state-of-the-art pruning algorithms to a synthetic dataset we created, named the Cubist Spiral. Our findings reveal a significant gap in performance compared to ideal sparse networks, which we identified through a novel search algorithm.
arXiv Detail & Related papers (2024-07-04T17:33:15Z)
Equation Discovery with Bayesian Spike-and-Slab Priors and Efficient Kernels [57.46832672991433]
We propose a novel equation discovery method based on Kernel learning and BAyesian Spike-and-Slab priors (KBASS) We use kernel regression to estimate the target function, which is flexible, expressive, and more robust to data sparsity and noises. We develop an expectation-propagation expectation-maximization algorithm for efficient posterior inference and function estimation.
arXiv Detail & Related papers (2023-10-09T03:55:09Z)
SHOT: Suppressing the Hessian along the Optimization Trajectory for Gradient-Based Meta-Learning [28.26143547479141]
We introduce an algorithm called SHOT (Suppressing the Hessian along the Optimization Trajectory) SHOT does not increase the computational complexity of the baseline model much. We confirm our hypothesis empirically and demonstrate that SHOT outperforms the corresponding baseline.
arXiv Detail & Related papers (2023-10-04T11:43:08Z)
Provable Guarantees for Generative Behavior Cloning: Bridging Low-Level Stability and High-Level Behavior [51.60683890503293]
We propose a theoretical framework for studying behavior cloning of complex expert demonstrations using generative modeling. We show that pure supervised cloning can generate trajectories matching the per-time step distribution of arbitrary expert trajectories.
arXiv Detail & Related papers (2023-07-27T04:27:26Z)
Temporal Difference Learning with Compressed Updates: Error-Feedback meets Reinforcement Learning [47.904127007515925]
We study a variant of the classical temporal difference (TD) learning algorithm with a perturbed update direction. We prove that compressed TD algorithms, coupled with an error-feedback mechanism used widely in optimization, exhibit the same non-asymptotic approximation guarantees as their counterparts. Notably, these are the first finite-time results in RL that account for general compression operators and error-feedback in tandem with linear function approximation and Markovian sampling.
arXiv Detail & Related papers (2023-01-03T04:09:38Z)
Fast and Provably Convergent Algorithms for Gromov-Wasserstein in Graph Learning [37.89640056739607]
Two proposed algorithms, called Bregman Alternating Projected Gradient (BAPG) and hybrid Bregman Proximal Gradient (hBPG) are proven to be (linearly) convergent. We provide comprehensive experiments to validate the effectiveness of our methods on a host of tasks, including graph alignment, graph partition, and shape matching.
arXiv Detail & Related papers (2022-05-17T06:26:54Z)
Proxy Convexity: A Unified Framework for the Analysis of Neural Networks Trained by Gradient Descent [95.94432031144716]
We propose a unified non- optimization framework for the analysis of a learning network. We show that existing guarantees can be trained unified through gradient descent.
arXiv Detail & Related papers (2021-06-25T17:45:00Z)
Provably Efficient Reward-Agnostic Navigation with Linear Value Iteration [143.43658264904863]
We show how iteration under a more standard notion of low inherent Bellman error, typically employed in least-square value-style algorithms, can provide strong PAC guarantees on learning a near optimal value function. We present a computationally tractable algorithm for the reward-free setting and show how it can be used to learn a near optimal policy for any (linear) reward function.
arXiv Detail & Related papers (2020-08-18T04:34:21Z)
Communication-Efficient Distributed Stochastic AUC Maximization with Deep Neural Networks [50.42141893913188]
We study a distributed variable for large-scale AUC for a neural network as with a deep neural network. Our model requires a much less number of communication rounds and still a number of communication rounds in theory. Our experiments on several datasets show the effectiveness of our theory and also confirm our theory.
arXiv Detail & Related papers (2020-05-05T18:08:23Z)

This list is automatically generated from the titles and abstracts of the papers in this site.