Refined Continuous Control of DDPG Actors via Parametrised Activation
- URL: http://arxiv.org/abs/2006.02818v1
- Date: Thu, 4 Jun 2020 12:27:46 GMT
- Title: Refined Continuous Control of DDPG Actors via Parametrised Activation
- Authors: Mohammed Hossny, Julie Iskander, Mohammed Attia, Khaled Saleh
- Abstract summary: The proposed method allows the reinforcement learning actor to produce more robust actions that accommodate the discrepancy in the actuators' response functions.
This is particularly useful for real-life scenarios where actuators exhibit different response functions depending on the load and the interaction with the environment.
- Score: 3.32399229114419
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper, we propose enhancing actor-critic reinforcement learning
agents by parametrising the final actor layer, which produces the actions, in
order to accommodate the behaviour discrepancy of different actuators under
different load conditions during interaction with the environment. We propose
branching the action-producing layer in the actor to learn the tuning parameter
controlling the activation layer (e.g. Tanh and Sigmoid). The learned
parameters are then used to create tailored activation functions for each
actuator. We ran experiments on three OpenAI Gym environments, namely
Pendulum-v0, LunarLanderContinuous-v2 and BipedalWalker-v2. Results show
average increases of 23.15% and 33.80% in total episode reward on the
LunarLanderContinuous-v2 and BipedalWalker-v2 environments, respectively. There
was no significant improvement in the Pendulum-v0 environment, but the proposed
method produced a more stable actuation signal than the state-of-the-art
method. The proposed method allows the reinforcement learning actor to produce
more robust actions that accommodate the discrepancy in the actuators' response
functions. This is particularly useful for real-life scenarios where actuators
exhibit different response functions depending on the load and the interaction
with the environment. This also simplifies the transfer learning problem by
fine-tuning the parametrised activation layers instead of retraining the
entire policy every time an actuator is replaced. Finally, the proposed method
would allow better accommodation of biological actuators (e.g. muscles) in
biomechanical systems.
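To make the branching idea concrete, below is a minimal PyTorch sketch of an actor output head with a parametrised Tanh activation. The hypothetical `ParametrisedTanhActor`, and in particular the per-actuator slope `k` learned by a second branch and kept positive via softplus, are illustrative assumptions; the paper's exact parametrisation of the activation (and its Sigmoid variant) may differ.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ParametrisedTanhActor(nn.Module):
    """DDPG-style actor with a branched output layer: one branch emits
    pre-activation action values, the other a per-actuator tuning
    parameter that shapes the final activation (a sketch, not the
    paper's exact formulation)."""

    def __init__(self, obs_dim: int, act_dim: int, hidden: int = 256):
        super().__init__()
        self.trunk = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        self.action_head = nn.Linear(hidden, act_dim)  # raw action values
        self.param_head = nn.Linear(hidden, act_dim)   # activation tuning parameters

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        h = self.trunk(obs)
        raw = self.action_head(h)
        # Keep the per-actuator slope positive; k = 1 recovers plain tanh.
        k = F.softplus(self.param_head(h)) + 1e-3
        return torch.tanh(k * raw)  # tailored activation per actuator
```

Under this scheme, replacing an actuator could be handled by freezing the trunk and fine-tuning only `param_head`, in the spirit of the transfer-learning point above.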
Related papers
- Adapt On-the-Go: Behavior Modulation for Single-Life Robot Deployment [92.48012013825988]
We study the problem of adapting on-the-fly to novel scenarios during deployment.
Our approach, RObust Autonomous Modulation (ROAM), introduces a mechanism based on the perceived value of pre-trained behaviors.
We demonstrate that ROAM enables a robot to adapt rapidly to changes in dynamics both in simulation and on a real Go1 quadruped.
arXiv Detail & Related papers (2023-11-02T08:22:28Z)
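The value-based modulation described above can be sketched very simply. The hypothetical `select_behavior` helper below assumes each pre-trained behavior ships with its own value estimator; this captures only the gist of ROAM, not its actual mechanism.

```python
import torch

def select_behavior(obs, policies, value_fns):
    """Pick the pre-trained behavior whose perceived value is highest
    for the current observation, then act with it (illustrative only)."""
    values = torch.stack([v(obs) for v in value_fns])  # one estimate per behavior
    best = int(torch.argmax(values))
    return policies[best](obs), best
```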
- Latent Exploration for Reinforcement Learning [87.42776741119653]
In Reinforcement Learning, agents learn policies by exploring and interacting with the environment.
We propose LATent TIme-Correlated Exploration (Lattice), a method to inject temporally-correlated noise into the latent state of the policy network.
arXiv Detail & Related papers (2023-05-31T17:40:43Z)
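As a rough illustration of temporally-correlated noise, the sketch below perturbs a latent vector with an Ornstein-Uhlenbeck process. OU noise is an assumption here; Lattice's own noise model and injection point may differ.

```python
import torch

class OULatentNoise:
    """Ornstein-Uhlenbeck noise applied to a policy network's latent
    state, yielding temporally-correlated exploration (a sketch in the
    spirit of Lattice, not its actual noise model)."""

    def __init__(self, dim: int, theta: float = 0.15, sigma: float = 0.2):
        self.theta, self.sigma = theta, sigma
        self.state = torch.zeros(dim)

    def __call__(self, latent: torch.Tensor) -> torch.Tensor:
        # Mean-reverting update keeps successive noise samples correlated.
        self.state += -self.theta * self.state + self.sigma * torch.randn_like(self.state)
        return latent + self.state
```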
- Action Sensitivity Learning for Temporal Action Localization [35.65086250175736]
We propose an Action Sensitivity Learning framework (ASL) to tackle the task of temporal action localization.
We first introduce a lightweight Action Sensitivity Evaluator to learn action sensitivity at both the class level and the instance level.
Based on the action sensitivity of each frame, we design an Action Sensitive Contrastive Loss to enhance features, where action-aware frames are sampled as positive pairs and action-irrelevant frames are pushed away.
arXiv Detail & Related papers (2023-05-25T04:19:14Z)
- Moving Forward by Moving Backward: Embedding Action Impact over Action Semantics [57.671493865825255]
We propose to model the impact of actions on-the-fly using latent embeddings.
By combining these latent action embeddings with a novel, transformer-based, policy head, we design an Action Adaptive Policy.
We show that our AAP is highly performant even when faced, at inference time, with missing actions and a previously unseen, perturbed action space.
arXiv Detail & Related papers (2023-04-24T17:35:47Z)
- OSCAR: Data-Driven Operational Space Control for Adaptive and Robust Robot Manipulation [50.59541802645156]
Operational Space Control (OSC) has been used as an effective task-space controller for manipulation.
We propose OSC for Adaptation and Robustness (OSCAR), a data-driven variant of OSC that compensates for modeling errors.
We evaluate our method on a variety of simulated manipulation problems, and find substantial improvements over an array of controller baselines.
arXiv Detail & Related papers (2021-10-02T01:21:38Z)
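For reference, the textbook OSC law that OSCAR builds on maps a desired task-space acceleration to joint torques; the hypothetical `osc_torques` below shows only that classical controller, while OSCAR's actual contribution, data-driven compensation of modeling errors, is not reproduced.

```python
import numpy as np

def osc_torques(x_dd_des, J, M, h):
    """Classical Operational Space Control: map a desired task-space
    acceleration x_dd_des to joint torques, given the Jacobian J, the
    joint-space mass matrix M and bias forces h (Coriolis + gravity).
    The J_dot @ q_dot term is omitted for brevity."""
    M_inv = np.linalg.inv(M)
    lam = np.linalg.inv(J @ M_inv @ J.T)  # task-space inertia matrix
    F = lam @ x_dd_des                    # task-space force
    return J.T @ F + h                    # joint torques plus bias compensation

# x_dd_des would typically come from a PD law:
# x_dd_des = kp * (x_goal - x) - kd * x_dot
```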
- Neural Dynamic Policies for End-to-End Sensorimotor Learning [51.24542903398335]
The current dominant paradigm in sensorimotor control, whether imitation or reinforcement learning, is to train policies directly in raw action spaces.
We propose Neural Dynamic Policies (NDPs) that make predictions in trajectory distribution space.
NDPs outperform the prior state-of-the-art in terms of either efficiency or performance across several robotic control tasks.
arXiv Detail & Related papers (2020-12-04T18:59:32Z)
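Trajectory-space prediction is commonly realised with Dynamic Movement Primitives. The sketch below rolls out a 1-D DMP whose goal `g` and forcing weights a policy like NDP would output; the RBF forcing term and gains are textbook choices and assumptions for illustration, not NDP specifics.

```python
import numpy as np

def dmp_rollout(y0, g, weights, T=100, alpha=25.0, beta=6.25, tau=1.0):
    """Roll out a 1-D Dynamic Movement Primitive: the kind of
    trajectory-space prediction the summary above refers to."""
    n = len(weights)
    centers = np.exp(-np.linspace(0, 3, n))        # RBF centers along the phase
    widths = n / centers
    y, yd, x = y0, 0.0, 1.0                        # position, velocity, phase
    dt = 1.0 / T
    traj = []
    for _ in range(T):
        psi = np.exp(-widths * (x - centers) ** 2) # RBF activations
        f = (psi @ weights) / (psi.sum() + 1e-8) * x * (g - y0)
        ydd = (alpha * (beta * (g - y) - yd) + f) / tau
        yd += ydd * dt
        y += yd * dt
        x += -2.0 * x * dt / tau                   # canonical system decay
        traj.append(y)
    return np.asarray(traj)
```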
- ReLMoGen: Leveraging Motion Generation in Reinforcement Learning for Mobile Manipulation [99.2543521972137]
ReLMoGen is a framework that combines a learned policy to predict subgoals and a motion generator to plan and execute the motion needed to reach these subgoals.
Our method is benchmarked on a diverse set of seven robotics tasks in photo-realistic simulation environments.
ReLMoGen shows outstanding transferability between different motion generators at test time, indicating a great potential to transfer to real robots.
arXiv Detail & Related papers (2020-08-18T08:05:15Z)
- CARL: Controllable Agent with Reinforcement Learning for Quadruped Locomotion [0.0]
We present CARL, a quadruped agent that can be controlled with high-level directives and reacts naturally to dynamic environments.
We use Generative Adversarial Networks to adapt high-level controls, such as speed and heading, to action distributions that correspond to the original animations.
Further fine-tuning through deep reinforcement learning enables the agent to recover from unseen external perturbations while producing smooth transitions.
arXiv Detail & Related papers (2020-05-07T07:18:57Z)