Refined Continuous Control of DDPG Actors via Parametrised Activation
- URL: http://arxiv.org/abs/2006.02818v1
- Date: Thu, 4 Jun 2020 12:27:46 GMT
- Title: Refined Continuous Control of DDPG Actors via Parametrised Activation
- Authors: Mohammed Hossny, Julie Iskander, Mohammed Attia, Khaled Saleh
- Abstract summary: The proposed method allows the reinforcement learning actor to produce more robust actions that accommodate the discrepancy in the actuators' response functions.
This is particularly useful for real-life scenarios where actuators exhibit different response functions depending on the load and the interaction with the environment.
- Score: 3.32399229114419
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper, we propose enhancing actor-critic reinforcement learning
agents by parametrising the final actor layer, which produces the actions, in
order to accommodate the behaviour discrepancy of different actuators under
different load conditions during interaction with the environment. We propose
branching the action-producing layer in the actor to learn the tuning parameter
controlling the activation layer (e.g. Tanh and Sigmoid). The learned
parameters are then used to create tailored activation functions for each
actuator. We ran experiments on three OpenAI Gym environments, namely
Pendulum-v0, LunarLanderContinuous-v2 and BipedalWalker-v2. Results show
average increases of 23.15% and 33.80% in total episode reward on the
LunarLanderContinuous-v2 and BipedalWalker-v2 environments, respectively. There
was no significant improvement in the Pendulum-v0 environment, but the proposed
method produced a more stable actuation signal than the state-of-the-art
method. The proposed method allows the reinforcement learning actor to produce
more robust actions that accommodate the discrepancy in the actuators' response
functions. This is particularly useful for real-life scenarios where actuators
exhibit different response functions depending on the load and the interaction
with the environment. This also simplifies the transfer learning problem by
fine-tuning the parametrised activation layers instead of retraining the
entire policy every time an actuator is replaced. Finally, the proposed method
would allow better accommodation of biological actuators (e.g. muscles) in
biomechanical systems.
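To make the branching idea concrete, below is a minimal PyTorch sketch of an actor output head with a parametrised Tanh activation. The hypothetical `ParametrisedTanhActor`, and in particular the per-actuator slope `k` learned by a second branch and kept positive via softplus, are illustrative assumptions; the paper's exact parametrisation of the activation (and its Sigmoid variant) may differ.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ParametrisedTanhActor(nn.Module):
    """DDPG-style actor with a branched output layer: one branch emits
    pre-activation action values, the other a per-actuator tuning
    parameter that shapes the final activation (a sketch, not the
    paper's exact formulation)."""

    def __init__(self, obs_dim: int, act_dim: int, hidden: int = 256):
        super().__init__()
        self.trunk = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        self.action_head = nn.Linear(hidden, act_dim)  # raw action values
        self.param_head = nn.Linear(hidden, act_dim)   # activation tuning parameters

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        h = self.trunk(obs)
        raw = self.action_head(h)
        # Keep the per-actuator slope positive; k = 1 recovers plain tanh.
        k = F.softplus(self.param_head(h)) + 1e-3
        return torch.tanh(k * raw)  # tailored activation per actuator
```

Under this scheme, replacing an actuator could be handled by freezing the trunk and fine-tuning only `param_head`, in the spirit of the transfer-learning point above.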
Related papers
- Adapt On-the-Go: Behavior Modulation for Single-Life Robot Deployment [92.48012013825988]
We study the problem of adapting on-the-fly to novel scenarios during deployment.
Our approach, RObust Autonomous Modulation (ROAM), introduces a mechanism based on the perceived value of pre-trained behaviors.
We demonstrate that ROAM enables a robot to adapt rapidly to changes in dynamics both in simulation and on a real Go1 quadruped.
arXiv Detail & Related papers (2023-11-02T08:22:28Z)
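The value-based modulation described above can be sketched very simply. The hypothetical `select_behavior` helper below assumes each pre-trained behavior ships with its own value estimator; this captures only the gist of ROAM, not its actual mechanism.

```python
import torch

def select_behavior(obs, policies, value_fns):
    """Pick the pre-trained behavior whose perceived value is highest
    for the current observation, then act with it (illustrative only)."""
    values = torch.stack([v(obs) for v in value_fns])  # one estimate per behavior
    best = int(torch.argmax(values))
    return policies[best](obs), best
```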
- Latent Exploration for Reinforcement Learning [87.42776741119653]
In Reinforcement Learning, agents learn policies by exploring and interacting with the environment.
We propose LATent TIme-Correlated Exploration (Lattice), a method to inject temporally-correlated noise into the latent state of the policy network.
arXiv Detail & Related papers (2023-05-31T17:40:43Z)
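As a rough illustration of temporally-correlated noise, the sketch below perturbs a latent vector with an Ornstein-Uhlenbeck process. OU noise is an assumption here; Lattice's own noise model and injection point may differ.

```python
import torch

class OULatentNoise:
    """Ornstein-Uhlenbeck noise applied to a policy network's latent
    state, yielding temporally-correlated exploration (a sketch in the
    spirit of Lattice, not its actual noise model)."""

    def __init__(self, dim: int, theta: float = 0.15, sigma: float = 0.2):
        self.theta, self.sigma = theta, sigma
        self.state = torch.zeros(dim)

    def __call__(self, latent: torch.Tensor) -> torch.Tensor:
        # Mean-reverting update keeps successive noise samples correlated.
        self.state += -self.theta * self.state + self.sigma * torch.randn_like(self.state)
        return latent + self.state
```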
- Action Sensitivity Learning for Temporal Action Localization [35.65086250175736]
We propose an Action Sensitivity Learning framework (ASL) to tackle the task of temporal action localization.
We first introduce a lightweight Action Sensitivity Evaluator to learn action sensitivity at both the class level and the instance level.
Based on the action sensitivity of each frame, we design an Action Sensitive Contrastive Loss to enhance features, where action-aware frames are sampled as positive pairs and action-irrelevant frames are pushed away.
arXiv Detail & Related papers (2023-05-25T04:19:14Z)
- Moving Forward by Moving Backward: Embedding Action Impact over Action Semantics [57.671493865825255]
We propose to model the impact of actions on-the-fly using latent embeddings.
By combining these latent action embeddings with a novel, transformer-based, policy head, we design an Action Adaptive Policy.
We show that our AAP is highly performant even when faced, at inference time, with missing actions and a previously unseen, perturbed action space.
arXiv Detail & Related papers (2023-04-24T17:35:47Z)
- OSCAR: Data-Driven Operational Space Control for Adaptive and Robust Robot Manipulation [50.59541802645156]
Operational Space Control (OSC) has been used as an effective task-space controller for manipulation.
We propose OSC for Adaptation and Robustness (OSCAR), a data-driven variant of OSC that compensates for modeling errors.
We evaluate our method on a variety of simulated manipulation problems, and find substantial improvements over an array of controller baselines.
arXiv Detail & Related papers (2021-10-02T01:21:38Z)
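For reference, the textbook OSC law that OSCAR builds on maps a desired task-space acceleration to joint torques; the hypothetical `osc_torques` below shows only that classical controller, while OSCAR's actual contribution, data-driven compensation of modeling errors, is not reproduced.

```python
import numpy as np

def osc_torques(x_dd_des, J, M, h):
    """Classical Operational Space Control: map a desired task-space
    acceleration x_dd_des to joint torques, given the Jacobian J, the
    joint-space mass matrix M and bias forces h (Coriolis + gravity).
    The J_dot @ q_dot term is omitted for brevity."""
    M_inv = np.linalg.inv(M)
    lam = np.linalg.inv(J @ M_inv @ J.T)  # task-space inertia matrix
    F = lam @ x_dd_des                    # task-space force
    return J.T @ F + h                    # joint torques plus bias compensation

# x_dd_des would typically come from a PD law:
# x_dd_des = kp * (x_goal - x) - kd * x_dot
```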
- Neural Dynamic Policies for End-to-End Sensorimotor Learning [51.24542903398335]
The current dominant paradigm in sensorimotor control, whether imitation or reinforcement learning, is to train policies directly in raw action spaces.
We propose Neural Dynamic Policies (NDPs) that make predictions in trajectory distribution space.
NDPs outperform the prior state-of-the-art in terms of either efficiency or performance across several robotic control tasks.
arXiv Detail & Related papers (2020-12-04T18:59:32Z)
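Trajectory-space prediction is commonly realised with Dynamic Movement Primitives. The sketch below rolls out a 1-D DMP whose goal `g` and forcing weights a policy like NDP would output; the RBF forcing term and gains are textbook choices and assumptions for illustration, not NDP specifics.

```python
import numpy as np

def dmp_rollout(y0, g, weights, T=100, alpha=25.0, beta=6.25, tau=1.0):
    """Roll out a 1-D Dynamic Movement Primitive: the kind of
    trajectory-space prediction the summary above refers to."""
    n = len(weights)
    centers = np.exp(-np.linspace(0, 3, n))        # RBF centers along the phase
    widths = n / centers
    y, yd, x = y0, 0.0, 1.0                        # position, velocity, phase
    dt = 1.0 / T
    traj = []
    for _ in range(T):
        psi = np.exp(-widths * (x - centers) ** 2) # RBF activations
        f = (psi @ weights) / (psi.sum() + 1e-8) * x * (g - y0)
        ydd = (alpha * (beta * (g - y) - yd) + f) / tau
        yd += ydd * dt
        y += yd * dt
        x += -2.0 * x * dt / tau                   # canonical system decay
        traj.append(y)
    return np.asarray(traj)
```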
- ReLMoGen: Leveraging Motion Generation in Reinforcement Learning for Mobile Manipulation [99.2543521972137]
ReLMoGen is a framework that combines a learned policy to predict subgoals and a motion generator to plan and execute the motion needed to reach these subgoals.
Our method is benchmarked on a diverse set of seven robotics tasks in photo-realistic simulation environments.
ReLMoGen shows outstanding transferability between different motion generators at test time, indicating a great potential to transfer to real robots.
arXiv Detail & Related papers (2020-08-18T08:05:15Z)
- CARL: Controllable Agent with Reinforcement Learning for Quadruped Locomotion [0.0]
We present CARL, a quadruped agent that can be controlled with high-level directives and reacts naturally to dynamic environments.
We use Generative Adversarial Networks to adapt high-level controls, such as speed and heading, to action distributions that correspond to the original animations.
Further fine-tuning through deep reinforcement learning enables the agent to recover from unseen external perturbations while producing smooth transitions.
arXiv Detail & Related papers (2020-05-07T07:18:57Z)