NeoRL: Efficient Exploration for Nonepisodic RL
- URL: http://arxiv.org/abs/2406.01175v4
- Date: Tue, 11 Feb 2025 13:35:23 GMT
- Title: NeoRL: Efficient Exploration for Nonepisodic RL
- Authors: Bhavya Sukhija, Lenart Treven, Florian Dörfler, Stelian Coros, Andreas Krause
- Abstract summary: We study the problem of nonepisodic reinforcement learning (RL) for nonlinear dynamical systems. We propose Nonepisodic Optimistic RL (NeoRL), an approach based on the principle of optimism in the face of uncertainty.
- Score: 50.67294735645895
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We study the problem of nonepisodic reinforcement learning (RL) for nonlinear dynamical systems, where the system dynamics are unknown and the RL agent has to learn from a single trajectory, i.e., without resets. We propose Nonepisodic Optimistic RL (NeoRL), an approach based on the principle of optimism in the face of uncertainty. NeoRL uses well-calibrated probabilistic models and plans optimistically w.r.t. the epistemic uncertainty about the unknown dynamics. Under continuity and bounded energy assumptions on the system, we provide a first-of-its-kind regret bound of $O(\Gamma_T \sqrt{T})$ for general nonlinear systems with Gaussian process dynamics. We compare NeoRL to other baselines on several deep RL environments and empirically demonstrate that NeoRL achieves the optimal average cost while incurring the least regret.
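The optimistic planning step can be sketched concretely: alongside the real action, the planner also chooses a hallucinated control that selects any transition inside the model's confidence set, so each plan is evaluated under the most favorable plausible dynamics. Below is a minimal sketch under assumed names; `mean`, `std`, `cost`, and the random-shooting planner are illustrative stand-ins, not the paper's implementation.
```python
import numpy as np

# Minimal sketch of planning optimistically w.r.t. epistemic uncertainty
# via hallucinated controls. `mean`, `std`, and `cost` are assumed
# callables (a calibrated dynamics model and a one-step cost), and the
# random-shooting planner is an illustrative stand-in.

def optimistic_rollout_cost(s0, actions, etas, mean, std, cost, beta=2.0):
    """Cost of a plan under the most favorable plausible dynamics.

    actions: (H, da) real control sequence
    etas:    (H, ds) hallucinated controls in [-1, 1]^ds that select a
             transition inside the confidence set mean +/- beta * std
    """
    s, total = s0, 0.0
    for a, eta in zip(actions, etas):
        total += cost(s, a)
        s = mean(s, a) + beta * std(s, a) * eta  # optimistic transition
    return total

def plan_optimistically(s0, mean, std, cost, horizon=20, da=1, ds=2,
                        n_candidates=512, seed=0):
    # Jointly search over real actions and hallucinated controls and
    # keep the cheapest candidate plan.
    rng = np.random.default_rng(seed)
    best_actions, best_cost = None, np.inf
    for _ in range(n_candidates):
        actions = rng.uniform(-1.0, 1.0, (horizon, da))
        etas = rng.uniform(-1.0, 1.0, (horizon, ds))
        c = optimistic_rollout_cost(s0, actions, etas, mean, std, cost)
        if c < best_cost:
            best_actions, best_cost = actions, c
    # Nonepisodic setting: execute one action, then replan on the single
    # ever-growing trajectory (no resets).
    return best_actions[0]
```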
Related papers
- Dynamic Regret via Discounted-to-Dynamic Reduction with Applications to Curved Losses and Adam Optimizer [72.0797062226335]
We develop dynamic regret minimization methods via a discounted-to-dynamic reduction. We focus on two representative curved losses: linear regression and logistic regression. Our method yields new dynamic regret guarantees for online logistic regression.
arXiv Detail & Related papers (2026-02-09T08:10:53Z)
- Generalised Linear Models in Deep Bayesian RL with Learnable Basis Functions [4.605026772972944]
We introduce a novel deep BRL method, Generalised Linear Models in Deep Bayesian RL with Learnable Basis Functions (GLiBRL). On challenging MetaWorld ML10/45 benchmarks, GLiBRL improves the success rate of one of the state-of-the-art deep BRL methods, VariBAD, by up to 2.7x.
arXiv Detail & Related papers (2025-12-24T06:00:51Z)
- SOMBRL: Scalable and Optimistic Model-Based RL [78.3360288726531]
We propose an approach based on the principle of optimism in the face of uncertainty. We show that SOMBRL offers a flexible and scalable solution for principled exploration. We also evaluate SOMBRL on dynamic RC car hardware and show that SOMBRL outperforms the state of the art.
arXiv Detail & Related papers (2025-11-25T08:39:21Z)
- Sample-efficient and Scalable Exploration in Continuous-Time RL [39.99126118024949]
We study the problem of continuous-time reinforcement learning, where the unknown system dynamics are represented using nonlinear ordinary differential equations. We leverage probabilistic models, such as Gaussian processes and Bayesian neural networks, to learn an uncertainty-aware model of the underlying ODE. This yields a scalable and sample-efficient approach to continuous-time model-based RL.
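As a toy stand-in for the uncertainty-aware dynamics models this line of work builds on (bootstrapped polynomial regressors instead of GPs or Bayesian neural networks; everything here is illustrative), an ensemble already yields an epistemic error bar on the unknown ODE right-hand side:
```python
import numpy as np

# Toy illustration (not the paper's GP/BNN machinery): a bootstrapped
# ensemble of polynomial regressors gives an epistemic error bar on the
# unknown ODE right-hand side dx/dt = f(x) from noisy derivative data.
rng = np.random.default_rng(0)
f_true = lambda x: -x + 0.5 * np.sin(3.0 * x)     # unknown dynamics
X = rng.uniform(-2.0, 2.0, size=40)
Y = f_true(X) + 0.05 * rng.normal(size=X.shape)   # noisy observations

def fit_member(idx):
    # Fit one ensemble member on a bootstrap resample of the data.
    return np.poly1d(np.polyfit(X[idx], Y[idx], deg=5))

ensemble = [fit_member(rng.integers(0, len(X), len(X))) for _ in range(20)]

x_query = np.linspace(-2.0, 2.0, 9)
preds = np.stack([m(x_query) for m in ensemble])  # (members, points)
mean, epistemic_std = preds.mean(axis=0), preds.std(axis=0)
for x, m, s in zip(x_query, mean, epistemic_std):
    print(f"x={x:+.2f}  f_hat={m:+.3f} +/- {s:.3f}")
```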
arXiv Detail & Related papers (2025-10-28T14:54:12Z)
- Statistical and Algorithmic Foundations of Reinforcement Learning [45.707617428078585]
Reinforcement learning (RL) has received a flurry of attention in recent years. We aim to introduce several important developments in RL, highlighting the connections between new ideas and classical topics.
arXiv Detail & Related papers (2025-07-19T02:42:41Z)
- Beyond Accuracy: Dissecting Mathematical Reasoning for LLMs Under Reinforcement Learning [93.00629872970364]
Reinforcement learning (RL) has become the dominant paradigm for improving the performance of language models on complex reasoning tasks. We introduce SPARKLE, a fine-grained analytic framework to dissect the effects of RL across three key dimensions. We study whether difficult problems -- those yielding no RL signals and mixed-quality reasoning traces -- can still be effectively used for training.
arXiv Detail & Related papers (2025-06-05T07:53:59Z)
- Hybrid Reinforcement Learning Breaks Sample Size Barriers in Linear MDPs [15.033410073144939]
A crucial question posed by Xie et al. (2022) is whether hybrid RL can improve upon the existing lower bounds established in purely offline and purely online RL.
We develop computationally efficient algorithms for both PAC and regret-minimizing RL with linear function approximation, without single-policy concentrability.
arXiv Detail & Related papers (2024-08-08T15:26:18Z)
- Adaptive $Q$-Network: On-the-fly Target Selection for Deep Reinforcement Learning [18.579378919155864]
We propose Adaptive $Q$-Network (AdaQN) to take into account the non-stationarity of the optimization procedure without requiring additional samples.
AdaQN is theoretically sound, and we empirically validate it on MuJoCo control problems and Atari 2600 games.
arXiv Detail & Related papers (2024-05-25T11:57:43Z)
- Look Beneath the Surface: Exploiting Fundamental Symmetry for Sample-Efficient Offline RL [29.885978495034703]
Offline reinforcement learning (RL) offers an appealing approach to real-world tasks by learning policies from pre-collected datasets.
However, the performance of existing offline RL algorithms heavily depends on the scale and state-action space coverage of datasets.
We provide a new insight that leveraging the fundamental symmetry of system dynamics can substantially enhance offline RL performance under small datasets.
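One hypothetical instance of the general recipe (the specific symmetry exploited by the paper may differ): if the dynamics are equivariant under state and action reflection, every logged transition yields a mirrored transition for free, enlarging a small offline dataset without any new interaction.
```python
import numpy as np

# Hypothetical illustration: if the dynamics satisfy f(-s, -a) = -f(s, a)
# and the reward is reflection-invariant, every logged transition yields a
# mirrored transition that is also dynamically feasible.

def augment_with_reflection(transitions):
    """transitions: iterable of (s, a, r, s_next) with numpy arrays s, a, s_next."""
    augmented = list(transitions)
    for s, a, r, s_next in transitions:
        augmented.append((-s, -a, r, -s_next))  # the mirrored transition
    return augmented

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    data = [(rng.normal(size=3), rng.normal(size=1), 0.5, rng.normal(size=3))
            for _ in range(4)]
    print(len(augment_with_reflection(data)))  # 8: dataset doubled for free
```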
arXiv Detail & Related papers (2023-06-07T07:51:05Z)
- One-Step Distributional Reinforcement Learning [10.64435582017292]
We present the simpler one-step distributional reinforcement learning (OS-DistrRL) framework.
We show that our approach comes with a unified theory for both policy evaluation and control.
We propose two OS-DistrRL algorithms for which we provide an almost sure convergence analysis.
arXiv Detail & Related papers (2023-04-27T06:57:00Z)
- Hyperbolic Deep Reinforcement Learning [8.983647543608226]
We propose a new class of deep reinforcement learning algorithms that model latent representations in hyperbolic space.
We empirically validate our framework by applying it to popular on-policy and off-policy RL algorithms on the Procgen and Atari 100K benchmarks.
arXiv Detail & Related papers (2022-10-04T12:03:04Z)
- Reinforcement Learning for Classical Planning: Viewing Heuristics as Dense Reward Generators [54.6441336539206]
We propose to leverage domain-independent heuristic functions commonly used in the classical planning literature to improve the sample efficiency of RL.
These classical heuristics act as dense reward generators to alleviate the sparse-reward issue and enable our RL agent to learn domain-specific value functions as residuals.
We demonstrate on several classical planning domains that using classical heuristics for RL allows for good sample efficiency compared to sparse-reward RL.
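A standard way to turn a planning heuristic $h(s)$ into dense rewards is potential-based shaping with potential $\Phi(s) = -h(s)$, which preserves the optimal policy (Ng et al., 1999). The sketch below illustrates this generic recipe, not the paper's exact construction, which additionally learns value functions as residuals on top of the heuristic.
```python
# Potential-based shaping with a classical planning heuristic h(s):
# Phi(s) = -h(s), so progress under the heuristic becomes immediate reward
# while the optimal policy is provably unchanged (Ng et al., 1999).

def shaped_reward(r, s, s_next, h, gamma=0.99, done=False):
    phi = lambda state: -h(state)
    phi_next = 0.0 if done else phi(s_next)
    return r + gamma * phi_next - phi(s)

# Toy usage with a "distance to goal" heuristic on integer states:
h = lambda s: abs(10 - s)
print(shaped_reward(r=0.0, s=4, s_next=5, h=h))  # positive: moved closer
print(shaped_reward(r=0.0, s=4, s_next=3, h=h))  # negative: moved away
```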
arXiv Detail & Related papers (2021-09-30T03:36:01Z)
- Combining Pessimism with Optimism for Robust and Efficient Model-Based Deep Reinforcement Learning [56.17667147101263]
In real-world tasks, reinforcement learning agents encounter situations that are not present during training time.
To ensure reliable performance, the RL agents need to exhibit robustness against worst-case situations.
We propose the Robust Hallucinated Upper-Confidence RL (RH-UCRL) algorithm to provably solve this problem.
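Roughly, the robust-optimistic objective pairs optimism over the set $\mathcal{M}_t$ of dynamics still consistent with the data against pessimism over an adversary policy $\bar{\pi}$ (a paraphrase of the idea, not the paper's exact formulation):

$$\pi_t \in \arg\max_{\pi}\; \min_{\bar{\pi}}\; \max_{f \in \mathcal{M}_t}\; J(f, \pi, \bar{\pi})$$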
arXiv Detail & Related papers (2021-03-18T16:50:17Z)
- Maximum Entropy RL (Provably) Solves Some Robust RL Problems [94.80212602202518]
We prove theoretically that standard maximum entropy RL is robust to some disturbances in the dynamics and the reward function.
Our results suggest that MaxEnt RL by itself is robust to certain disturbances, without requiring any additional modifications.
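For reference, standard MaxEnt RL augments the discounted return with a policy-entropy bonus weighted by a temperature $\alpha$:

$$J_{\mathrm{MaxEnt}}(\pi) \;=\; \mathbb{E}_{\pi}\Big[\sum_{t} \gamma^{t}\,\big(r(s_t, a_t) + \alpha\,\mathcal{H}(\pi(\cdot \mid s_t))\big)\Big]$$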
arXiv Detail & Related papers (2021-03-10T18:45:48Z)
- Efficient Model-Based Reinforcement Learning through Optimistic Policy Search and Planning [93.1435980666675]
We show how optimistic exploration can be easily combined with state-of-the-art reinforcement learning algorithms.
Our experiments demonstrate that optimistic exploration significantly speeds up learning when there are penalties on actions.
arXiv Detail & Related papers (2020-06-15T18:37:38Z)
- MOPO: Model-based Offline Policy Optimization [183.6449600580806]
Offline reinforcement learning (RL) refers to the problem of learning policies entirely from a large batch of previously collected data.
We show that an existing model-based RL algorithm already produces significant gains in the offline setting.
We propose to modify existing model-based RL methods by applying them to rewards artificially penalized by the uncertainty of the dynamics.
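Concretely, the penalized reward takes the form $\tilde{r}(s,a) = \hat{r}(s,a) - \lambda\, u(s,a)$ for an uncertainty estimate $u$ and penalty weight $\lambda$; the sketch below uses ensemble disagreement as an assumed stand-in for $u$.
```python
import numpy as np

# Sketch of an uncertainty-penalized reward, r_tilde = r_hat - lam * u.
# MOPO calls for an admissible error estimator u(s, a); using ensemble
# disagreement for u, as done here, is a common practical assumption.

def penalized_reward(s, a, reward_model, ensemble, lam=1.0):
    r_hat = reward_model(s, a)
    preds = np.stack([m(s, a) for m in ensemble])  # next-state predictions
    u = np.linalg.norm(preds.std(axis=0))          # model disagreement
    return r_hat - lam * u
```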
arXiv Detail & Related papers (2020-05-27T08:46:41Z)