Shaped Policy Search for Evolutionary Strategies using Waypoints
- URL: http://arxiv.org/abs/2105.14639v2
- Date: Mon, 3 Jul 2023 06:09:53 GMT
- Title: Shaped Policy Search for Evolutionary Strategies using Waypoints
- Authors: Kiran Lekkala, Laurent Itti
- Abstract summary: We aim to improve exploration in black-box methods, particularly Evolution Strategies (ES), when applied to RL problems where intermediate waypoints/subgoals are available.
We use the state-action pairs from the trajectories obtained during rollouts/evaluations to learn the dynamics of the agent.
The learnt dynamics are then used in the optimization procedure to speed up training.
- Score: 17.8055398673228
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In this paper, we try to improve exploration in black-box methods,
particularly Evolution Strategies (ES), when applied to Reinforcement Learning
(RL) problems where intermediate waypoints/subgoals are available. Since
Evolution Strategies are highly parallelizable, instead of extracting just a
scalar cumulative reward, we use the state-action pairs from the trajectories
obtained during rollouts/evaluations to learn the dynamics of the agent. The
learnt dynamics are then used in the optimization procedure to speed up
training. Lastly, we show that our proposed approach is universally applicable
by presenting results from experiments conducted on the CARLA driving and UR5
robotic-arm simulators.
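The loop described in the abstract can be illustrated with a minimal sketch (not the authors' implementation): an OpenAI-style ES update that, besides the scalar return, collects the state-action pairs from every rollout so a dynamics model could later be fit to them. The linear policy, toy environment interface, and all hyperparameters below are illustrative assumptions.

```python
import numpy as np

def rollout(theta, env_step, s0, horizon=50):
    """Evaluate policy parameters theta; also collect (state, action) pairs."""
    s, total_reward, pairs = s0, 0.0, []
    for _ in range(horizon):
        a = np.tanh(theta @ s)               # simple linear policy (assumption)
        pairs.append((s, a))                 # raw material for dynamics learning
        s, r = env_step(s, a)
        total_reward += r
    return total_reward, pairs

def es_update(theta, env_step, s0, sigma=0.1, lr=0.02, pop=32, rng=None):
    """One OpenAI-style ES step; returns new theta plus all collected pairs."""
    rng = rng or np.random.default_rng(0)
    eps = rng.standard_normal((pop,) + theta.shape)
    returns, all_pairs = np.zeros(pop), []
    for i in range(pop):
        returns[i], pairs = rollout(theta + sigma * eps[i], env_step, s0)
        all_pairs.extend(pairs)              # rollouts are "free" data for dynamics
    adv = (returns - returns.mean()) / (returns.std() + 1e-8)
    grad = np.tensordot(adv, eps, axes=1) / (pop * sigma)
    return theta + lr * grad, all_pairs
```

The `all_pairs` buffer is where a dynamics model would be trained in the paper's scheme; here it is simply returned.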
Related papers
- Evolutionary Optimization of Deep Learning Agents for Sparrow Mahjong [0.0]
We present Evo-Sparrow, a deep learning-based agent for AI decision-making in Sparrow Mahjong.
Our model evaluates board states and optimizes decision policies in a non-deterministic, partially observable game environment.
arXiv Detail & Related papers (2025-08-11T00:53:52Z)
- Preference-Guided Reinforcement Learning for Efficient Exploration [7.83845308102632]
We introduce LOPE: Learning Online with trajectory Preference guidancE, an end-to-end preference-guided RL framework.
Our intuition is that LOPE directly adjusts the focus of online exploration by considering human feedback as guidance.
LOPE outperforms several state-of-the-art methods regarding convergence rate and overall performance.
arXiv Detail & Related papers (2024-07-09T02:11:12Z)
- Learning Optimal Deterministic Policies with Stochastic Policy Gradients [62.81324245896716]
Policy gradient (PG) methods are successful approaches to deal with continuous reinforcement learning (RL) problems.
In common practice, stochastic (hyper)policies are learned only to deploy their deterministic version.
We show how to tune the exploration level used for learning to optimize the trade-off between the sample complexity and the performance of the deployed deterministic policy.
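The pattern this paper studies (learn a stochastic policy, deploy its deterministic version) can be sketched with a linear Gaussian policy; the class name and interface below are invented for illustration, not taken from the paper.

```python
import numpy as np

class GaussianPolicy:
    """Stochastic Gaussian policy; its deterministic version acts with the mean."""
    def __init__(self, weights, log_std):
        self.weights, self.log_std = weights, log_std

    def act_stochastic(self, state, rng):
        # Training-time action: mean plus exploration noise scaled by exp(log_std).
        mean = self.weights @ state
        return mean + np.exp(self.log_std) * rng.standard_normal(mean.shape)

    def act_deterministic(self, state):
        # Deployment: drop the exploration noise, keep the mean action.
        return self.weights @ state
```

Tuning `log_std`, as the paper discusses, trades sample complexity during learning against the quality of the deployed deterministic policy.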
arXiv Detail & Related papers (2024-05-03T16:45:15Z)
- Trial and Error: Exploration-Based Trajectory Optimization for LLM Agents [49.85633804913796]
We present an exploration-based trajectory optimization approach, referred to as ETO.
This learning method is designed to enhance the performance of open LLM agents.
Our experiments on three complex tasks demonstrate that ETO consistently surpasses baseline performance by a large margin.
arXiv Detail & Related papers (2024-03-04T21:50:29Z) - Discovering Behavioral Modes in Deep Reinforcement Learning Policies
Using Trajectory Clustering in Latent Space [0.0]
We introduce a new approach for investigating the behavior modes of DRL policies.
Specifically, we use Pairwise Controlled Manifold Approximation Projection (PaCMAP) for dimensionality reduction and TRACLUS for trajectory clustering.
Our methodology helps identify diverse behavior patterns and suboptimal choices by the policy, thus allowing for targeted improvements.
arXiv Detail & Related papers (2024-02-20T11:50:50Z)
- Efficient Reinforcement Learning via Decoupling Exploration and Utilization [6.305976803910899]
Reinforcement Learning (RL) has achieved remarkable success across multiple fields and applications, including gaming, robotics, and autonomous vehicles.
In this work, we aim to train agents efficiently by decoupling exploration and utilization, so that the agent can escape the conundrum of suboptimal solutions.
The above idea is implemented in the proposed OPARL (Optimistic and Pessimistic Actor Reinforcement Learning) algorithm.
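A rough sketch of the optimistic/pessimistic idea behind OPARL (the ensemble-over-critics mechanics here are an illustrative assumption, not the paper's exact algorithm): use an upper bound over critic Q-estimates to drive exploration and a lower bound to drive exploitation.

```python
import numpy as np

def ensemble_bounds(qs):
    """qs: array of shape (n_critics, n_actions) with per-critic Q estimates."""
    optimistic = qs.max(axis=0)   # upper bound -> encourages exploration
    pessimistic = qs.min(axis=0)  # lower bound -> conservative exploitation
    return optimistic, pessimistic

def select_action(qs, explore):
    """Greedy action under the optimistic or the pessimistic value estimate."""
    opt, pes = ensemble_bounds(qs)
    return int(np.argmax(opt if explore else pes))
```

Decoupling the two bounds lets the exploring actor visit uncertain actions while the deployed actor stays conservative.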
arXiv Detail & Related papers (2023-12-26T09:03:23Z)
- Reparameterized Policy Learning for Multimodal Trajectory Optimization [61.13228961771765]
We investigate the challenge of parametrizing policies for reinforcement learning in high-dimensional continuous action spaces.
We propose a principled framework that models the continuous RL policy as a generative model of optimal trajectories.
We present a practical model-based RL method, which leverages the multimodal policy parameterization and learned world model.
arXiv Detail & Related papers (2023-07-20T09:05:46Z)
- Variance-Reduced Gradient Estimation via Noise-Reuse in Online Evolution Strategies [50.10277748405355]
Noise-Reuse Evolution Strategies (NRES) is a general class of unbiased online evolution strategies methods.
We show NRES results in faster convergence than existing AD and ES methods in terms of wall-clock time and number of steps across a variety of applications.
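The noise-reuse idea can be sketched as an online antithetic ES over a sequence of loss segments, where each perturbation is reused for K consecutive segments before a fresh one is drawn. This is a simplified illustration; the unbiasedness machinery of NRES is not reproduced here, and all names and hyperparameters are assumptions.

```python
import numpy as np

def online_es_noise_reuse(theta, loss_segment, segments, K=4, sigma=0.1,
                          lr=0.05, rng=None):
    """Online antithetic ES that reuses each perturbation for K segments."""
    rng = rng or np.random.default_rng(0)
    eps = None
    for t, seg in enumerate(segments):
        if t % K == 0:                      # draw fresh noise every K segments
            eps = rng.standard_normal(theta.shape)
        # antithetic evaluation of the current segment
        lp = loss_segment(theta + sigma * eps, seg)
        lm = loss_segment(theta - sigma * eps, seg)
        grad = (lp - lm) / (2 * sigma) * eps
        theta = theta - lr * grad           # update online, mid-unroll
    return theta
```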
arXiv Detail & Related papers (2023-04-21T17:53:05Z)
- Jump-Start Reinforcement Learning [68.82380421479675]
We present a meta algorithm that can use offline data, demonstrations, or a pre-existing policy to initialize an RL policy.
In particular, we propose Jump-Start Reinforcement Learning (JSRL), an algorithm that employs two policies to solve tasks.
We show via experiments that JSRL is able to significantly outperform existing imitation and reinforcement learning algorithms.
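JSRL's two-policy structure can be sketched as a rollout that acts under the guide policy for the first steps and then hands control to the exploration policy (function names and the switch mechanism below are assumptions for illustration; the paper chooses the switch point via a curriculum).

```python
def jsrl_rollout(guide_policy, explore_policy, env_step, s0,
                 switch_step, horizon=20):
    """Guide policy acts for the first `switch_step` steps, then the
    exploration policy takes over for the rest of the episode."""
    s, trace = s0, []
    for t in range(horizon):
        policy = guide_policy if t < switch_step else explore_policy
        a = policy(s)
        s = env_step(s, a)
        trace.append((t, a))
    return s, trace
```

Shrinking `switch_step` over training gradually transfers the whole episode to the learned exploration policy.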
arXiv Detail & Related papers (2022-04-05T17:25:22Z)
- GPU-Accelerated Policy Optimization via Batch Automatic Differentiation of Gaussian Processes for Real-World Control [8.720903734757627]
We develop a policy optimization method by leveraging fast predictive sampling methods to process batches of trajectories in every forward pass.
We demonstrate the effectiveness of our approach in training policies on a set of reference-tracking control experiments with a heavy-duty machine.
arXiv Detail & Related papers (2022-02-28T09:31:15Z)
- Improving the Exploration of Deep Reinforcement Learning in Continuous Domains using Planning for Policy Search [6.088695984060244]
We propose to integrate a kinodynamic planner in the exploration strategy and to learn a control policy in an offline fashion from generated environment interactions.
This generates training data that helps PPS discover better policies.
We compare PPS with state-of-the-art D-RL methods in typical RL settings, including underactuated systems.
arXiv Detail & Related papers (2020-10-24T20:19:06Z)
- Discrete Action On-Policy Learning with Action-Value Critic [72.20609919995086]
Reinforcement learning (RL) in discrete action space is ubiquitous in real-world applications, but its complexity grows exponentially with the action-space dimension.
We construct a critic to estimate action-value functions, apply it to correlated actions, and combine these critic-estimated action values to control the variance of gradient estimation.
These efforts result in a new discrete action on-policy RL algorithm that empirically outperforms related on-policy algorithms relying on variance control techniques.
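A minimal sketch of using critic-estimated action values over all discrete actions as a baseline for the policy gradient (illustrative only, not the paper's exact estimator):

```python
import numpy as np

def softmax(logits):
    z = np.exp(logits - logits.max())
    return z / z.sum()

def pg_logit_grad(logits, q_estimates, action):
    """Policy gradient w.r.t. softmax logits for one sampled discrete action,
    using critic Q-estimates of all actions as a variance-reducing baseline."""
    probs = softmax(logits)
    baseline = float(probs @ q_estimates)        # V(s) = E_a[Q(s, a)]
    advantage = q_estimates[action] - baseline   # centered -> lower variance
    grad_logp = -probs
    grad_logp[action] += 1.0                     # d log pi(a|s) / d logits
    return advantage * grad_logp
```

Subtracting the probability-weighted mean of the critic's estimates leaves the gradient unbiased while shrinking its variance.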
arXiv Detail & Related papers (2020-02-10T04:23:09Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences of its use.