Accelerated Online Reinforcement Learning using Auxiliary Start State Distributions
- URL: http://arxiv.org/abs/2507.04606v1
- Date: Mon, 07 Jul 2025 01:54:05 GMT
- Title: Accelerated Online Reinforcement Learning using Auxiliary Start State Distributions
- Authors: Aman Mehra, Alexandre Capone, Jeff Schneider
- Abstract summary: Most approaches to efficient exploration fail to leverage expert demonstrations and simulators that can reset to arbitrary states. We find that using a notion of safety to inform the choice of this auxiliary distribution significantly accelerates learning.
- Score: 50.44719434877687
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: A long-standing problem in online reinforcement learning (RL) is that of ensuring sample efficiency, which stems from an inability to explore environments efficiently. Most attempts at efficient exploration tackle this problem in a setting where learning begins from scratch, without prior information available to bootstrap learning. However, such approaches fail to leverage expert demonstrations and simulators that can reset to arbitrary states. These affordances are valuable resources that offer enormous potential to guide exploration and speed up learning. In this paper, we explore how a small number of expert demonstrations and a simulator allowing arbitrary resets can accelerate learning during online RL. We find that training with a suitable choice of an auxiliary start state distribution that may differ from the true start state distribution of the underlying Markov Decision Process can significantly improve sample efficiency. We find that using a notion of safety to inform the choice of this auxiliary distribution significantly accelerates learning. By using episode length information as a way to operationalize this notion, we demonstrate state-of-the-art sample efficiency on a sparse-reward hard-exploration environment.
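The abstract's idea of resetting to an auxiliary start state distribution informed by episode length can be illustrated with a minimal sketch. This is not the authors' algorithm: the class name `AuxiliaryStartSampler`, the parameters `mix_prob` and `max_episode_len`, and the weighting rule are all hypothetical. In particular, the sketch assumes that a short episode after a reset marks the start state as less safe and therefore worth sampling more often, which the abstract does not spell out.

```python
import random


class AuxiliaryStartSampler:
    """Hypothetical sampler that mixes true starts with resets to demo states."""

    def __init__(self, demo_states, max_episode_len, mix_prob=0.5, seed=0):
        self.demo_states = list(demo_states)
        self.max_episode_len = max_episode_len
        self.mix_prob = mix_prob  # probability of resetting to a demo state
        self.rng = random.Random(seed)
        # Running mean of episode lengths observed after resetting to each state.
        self.mean_len = [0.0] * len(self.demo_states)
        self.counts = [0] * len(self.demo_states)

    def weights(self):
        # Assumption: a short episode signals an "unsafe" start state that
        # needs more practice, so its sampling weight is larger.
        eps = 1e-3  # keeps every state reachable
        return [
            1.0 - min(m, self.max_episode_len) / self.max_episode_len + eps
            for m in self.mean_len
        ]

    def sample(self):
        """Return (index, state); (None, None) means use the true start distribution."""
        if self.rng.random() >= self.mix_prob:
            return None, None
        indices = range(len(self.demo_states))
        idx = self.rng.choices(indices, weights=self.weights(), k=1)[0]
        return idx, self.demo_states[idx]

    def update(self, idx, episode_len):
        """Record the length of an episode that started from demo state idx."""
        self.counts[idx] += 1
        self.mean_len[idx] += (episode_len - self.mean_len[idx]) / self.counts[idx]


# Usage: the episode length would come from an actual rollout in the simulator.
sampler = AuxiliaryStartSampler(["s0", "s1", "s2"], max_episode_len=100)
idx, state = sampler.sample()
if idx is not None:
    sampler.update(idx, episode_len=40)
```

In a training loop, a `(None, None)` result would trigger an ordinary environment reset, while any other result would reset the simulator to the returned demonstration state. Unvisited states start with the largest weight, so each one tends to be tried before the sampler concentrates on states that still produce short episodes.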
Related papers
- Search-Based Adversarial Estimates for Improving Sample Efficiency in Off-Policy Reinforcement Learning [0.0]
We propose to use Adversarial Estimates as a new, simple and efficient approach to mitigate this problem. Our approach leverages latent similarity search from a small set of human-collected trajectories to boost learning. The results of our study show algorithms trained with Adversarial Estimates converge faster than their original version.
arXiv Detail & Related papers (2025-02-03T17:41:02Z) - Skill-Enhanced Reinforcement Learning Acceleration from Demonstrations [23.15178050525514]
We propose a two-stage method dubbed Skill-enhanced Reinforcement Learning Acceleration (SeRLA). SeRLA introduces a skill-level adversarial Positive-Unlabeled (PU) learning model to extract useful skill prior knowledge. It then deploys a skill-based soft actor-critic algorithm to leverage this acquired prior knowledge in the downstream online RL stage.
arXiv Detail & Related papers (2024-12-09T04:58:14Z) - A Unified Framework for Neural Computation and Learning Over Time [56.44910327178975]
Hamiltonian Learning is a novel unified framework for learning with neural networks "over time".
It is based on differential equations that: (i) can be integrated without the need of external software solvers; (ii) generalize the well-established notion of gradient-based learning in feed-forward and recurrent networks; (iii) open to novel perspectives.
arXiv Detail & Related papers (2024-09-18T14:57:13Z) - Demonstration-free Autonomous Reinforcement Learning via Implicit and Bidirectional Curriculum [22.32327908453603]
We propose a demonstration-free reinforcement learning algorithm via Implicit and Bi-directional Curriculum (IBC).
With an auxiliary agent that is conditionally activated upon learning progress and a bidirectional goal curriculum based on optimal transport, our method outperforms previous methods.
arXiv Detail & Related papers (2023-05-17T04:31:36Z) - Actively Learning Costly Reward Functions for Reinforcement Learning [56.34005280792013]
We show that it is possible to train agents in complex real-world environments orders of magnitude faster.
By enabling the application of reinforcement learning methods to new domains, we show that we can find interesting and non-trivial solutions.
arXiv Detail & Related papers (2022-11-23T19:17:20Z) - Learning Dense Reward with Temporal Variant Self-Supervision [5.131840233837565]
Complex real-world robotic applications lack explicit and informative descriptions that can directly be used as rewards.
Previous efforts have shown that it is possible to algorithmically extract dense rewards directly from multimodal observations.
This paper proposes a more efficient and robust way of sampling and learning.
arXiv Detail & Related papers (2022-05-20T20:30:57Z) - Reinforcement Learning in the Wild: Scalable RL Dispatching Algorithm Deployed in Ridehailing Marketplace [12.298997392937876]
This study proposes a real-time dispatching algorithm based on reinforcement learning.
It is deployed online in multiple cities under DiDi's operation for A/B testing and is launched in one of the major international markets.
The deployed algorithm shows over 1.3% improvement in total driver income from A/B testing.
arXiv Detail & Related papers (2022-02-10T16:07:17Z) - TRAIL: Near-Optimal Imitation Learning with Suboptimal Data [100.83688818427915]
We present training objectives that use offline datasets to learn a factored transition model.
Our theoretical analysis shows that the learned latent action space can boost the sample-efficiency of downstream imitation learning.
To learn the latent action space in practice, we propose TRAIL (Transition-Reparametrized Actions for Imitation Learning), an algorithm that learns an energy-based transition model.
arXiv Detail & Related papers (2021-10-27T21:05:00Z) - MURAL: Meta-Learning Uncertainty-Aware Rewards for Outcome-Driven Reinforcement Learning [65.52675802289775]
We show that an uncertainty-aware classifier can solve challenging reinforcement learning problems.
We propose a novel method for computing the normalized maximum likelihood (NML) distribution.
We show that the resulting algorithm has a number of intriguing connections to both count-based exploration methods and prior algorithms for learning reward functions.
arXiv Detail & Related papers (2021-07-15T08:19:57Z) - Bayesian active learning for production, a systematic study and a reusable library [85.32971950095742]
In this paper, we analyse the main drawbacks of current active learning techniques.
We do a systematic study on the effects of the most common issues of real-world datasets on the deep active learning process.
We derive two techniques that can speed up the active learning loop: partial uncertainty sampling and larger query size.
arXiv Detail & Related papers (2020-06-17T14:51:11Z)
This list is automatically generated from the titles and abstracts of the papers on this site.