DEP-RL: Embodied Exploration for Reinforcement Learning in Overactuated
and Musculoskeletal Systems
- URL: http://arxiv.org/abs/2206.00484v2
- Date: Thu, 27 Apr 2023 12:53:39 GMT
- Title: DEP-RL: Embodied Exploration for Reinforcement Learning in Overactuated
and Musculoskeletal Systems
- Authors: Pierre Schumacher, Daniel H\"aufle, Dieter B\"uchler, Syn Schmitt,
Georg Martius
- Abstract summary: Reinforcement learning on large musculoskeletal models has so far been unable to match the movement performance of muscle-actuated organisms.
We conjecture that ineffective exploration in large overactuated action spaces is a key problem.
By integrating DEP into RL, we achieve fast learning of reaching and locomotion in musculoskeletal systems.
- Score: 14.295720603503806
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Muscle-actuated organisms are capable of learning an unparalleled diversity
of dexterous movements despite their vast number of muscles. Reinforcement
learning (RL) on large musculoskeletal models, however, has not been able to
show similar performance. We conjecture that ineffective exploration in large
overactuated action spaces is a key problem. This is supported by the finding
that common exploration noise strategies are inadequate in synthetic examples
of overactuated systems. We identify differential extrinsic plasticity (DEP), a
method from the domain of self-organization, as being able to induce
state-space covering exploration within seconds of interaction. By integrating
DEP into RL, we achieve fast learning of reaching and locomotion in
musculoskeletal systems, outperforming current approaches in all considered
tasks in sample efficiency and robustness.
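The core claim, that uncorrelated per-actuator noise explores poorly in overactuated systems, can be illustrated with a toy sketch. The snippet below is not the authors' DEP implementation; it substitutes a simple Ornstein-Uhlenbeck process as a stand-in for temporally correlated exploration and drives a 1-D point mass through the pooled output of many redundant actuators. All names and parameters are illustrative:

```python
import numpy as np

def rollout(n_steps, n_actuators, noise_fn, rng, dt=0.05):
    """Integrate a 1-D point mass driven by the mean of many actuator signals."""
    x, a = 0.0, np.zeros(n_actuators)
    for _ in range(n_steps):
        a = noise_fn(a, rng)
        x += dt * a.mean()  # redundant actuators pool into one net force
    return x

def iid_noise(a, rng, sigma=0.3):
    # Fresh Gaussian noise each step: uncorrelated in time and across actuators,
    # so the pooled force largely averages out.
    return rng.normal(0.0, sigma, a.shape)

def ou_noise(a, rng, theta=0.05, sigma=0.3):
    # Ornstein-Uhlenbeck-style update: strongly correlated across time steps.
    return (1.0 - theta) * a + sigma * rng.normal(0.0, 1.0, a.shape)

rng = np.random.default_rng(0)
finals_iid = [rollout(200, 40, iid_noise, rng) for _ in range(50)]
finals_ou = [rollout(200, 40, ou_noise, rng) for _ in range(50)]
spread_iid, spread_ou = np.std(finals_iid), np.std(finals_ou)
print(spread_iid, spread_ou)  # correlated noise covers far more of the state space
```

Under these toy dynamics, the temporally correlated rollouts reach a spread of final positions more than an order of magnitude wider than i.i.d. Gaussian noise, which mirrors the paper's argument for why state-covering exploration matters in overactuated action spaces.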
Related papers
- Exciting Action: Investigating Efficient Exploration for Learning Musculoskeletal Humanoid Locomotion [16.63152794060493]
We demonstrate that adversarial imitation learning can address the exploration problem by analyzing its key failure modes and providing solutions.
We validate our methodology by learning walking and running gaits on a simulated humanoid model with 16 degrees of freedom and 92 muscle-tendon units.
arXiv Detail & Related papers (2024-07-16T12:27:55Z)
- Aquatic Navigation: A Challenging Benchmark for Deep Reinforcement Learning [53.3760591018817]
We propose a new benchmarking environment for aquatic navigation using recent advances in the integration between game engines and Deep Reinforcement Learning.
Specifically, we focus on PPO, one of the most widely accepted algorithms, and we propose advanced training techniques.
Our empirical evaluation shows that a well-designed combination of these ingredients can achieve promising results.
arXiv Detail & Related papers (2024-05-30T23:20:23Z)
- Emergence of Chemotactic Strategies with Multi-Agent Reinforcement Learning [1.9253333342733674]
We investigate whether reinforcement learning can provide insights into biological systems when trained to perform chemotaxis.
We run simulations covering a range of agent shapes, sizes, and swim speeds to determine whether the physical constraints on biological swimmers, chiefly Brownian motion, lead to regions of this parameter space where reinforcement-learning training fails.
We find that RL agents can perform chemotaxis as soon as it is physically possible and, in some cases, even before the active swimming overpowers the environment.
arXiv Detail & Related papers (2024-04-02T14:42:52Z)
- SAR: Generalization of Physiological Agility and Dexterity via Synergistic Action Representation [10.349135207285464]
We show that modular control via muscle synergies enables organisms to learn muscle control in a simplified and generalizable action space.
We use physiologically accurate human hand and leg models as a testbed for determining the extent to which a Synergistic Action Representation (SAR) acquired from simpler tasks facilitates learning more complex tasks.
We find in both cases that SAR-exploiting policies significantly outperform end-to-end reinforcement learning.
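As a rough illustration of the synergy idea (not the authors' SAR pipeline), a low-dimensional action representation can be extracted from activation data via PCA, after which a policy only needs to act in synergy space. The data, dimensions, and helper below are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical training data: 500 timesteps of activations for 30 muscles that
# in fact vary along only 3 underlying coordination patterns (plus small noise).
latent = rng.normal(size=(500, 3))
mixing = rng.normal(size=(3, 30))
activations = latent @ mixing + 0.01 * rng.normal(size=(500, 30))

# Extract synergies as the top principal components of the activation data.
centered = activations - activations.mean(axis=0)
_, _, vt = np.linalg.svd(centered, full_matrices=False)
synergies = vt[:3]  # (3, 30): each row is one synergy

def to_muscle_space(synergy_action):
    # A policy now outputs only 3 synergy weights; this expands them to muscles.
    return synergy_action @ synergies  # (3,) -> (30,)

# Reconstruction check: the 3 synergies explain nearly all of the data.
recon = centered @ synergies.T @ synergies
rel_err = np.linalg.norm(centered - recon) / np.linalg.norm(centered)
print(rel_err)  # small: the low-dimensional representation loses little
```

The design point is that the downstream RL problem shrinks from a 30-dimensional to a 3-dimensional action space, which is the kind of simplification the paper credits for faster learning of complex tasks.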
arXiv Detail & Related papers (2023-07-07T17:07:41Z)
- Latent Exploration for Reinforcement Learning [87.42776741119653]
In Reinforcement Learning, agents learn policies by exploring and interacting with the environment.
We propose LATent TIme-Correlated Exploration (Lattice), a method to inject temporally-correlated noise into the latent state of the policy network.
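A minimal sketch of the across-actuator half of this idea (Lattice additionally correlates the noise over time; the toy policy and its dimensions below are assumptions, not the paper's implementation): perturbing the latent layer instead of each action independently confines the action noise to a low-dimensional, correlated subspace.

```python
import numpy as np

rng = np.random.default_rng(2)
obs_dim, latent_dim, act_dim = 8, 4, 16

# A toy two-layer policy: obs -> latent -> actions (linear action head).
w1 = rng.normal(size=(obs_dim, latent_dim)) / np.sqrt(obs_dim)
w2 = rng.normal(size=(latent_dim, act_dim)) / np.sqrt(latent_dim)

def act(obs, latent_noise=None):
    h = np.tanh(obs @ w1)
    if latent_noise is not None:
        h = h + latent_noise  # perturb the latent state, not each action
    return h @ w2

obs = rng.normal(size=obs_dim)
base = act(obs)
perturbed = np.stack([act(obs, rng.normal(0.0, 0.1, latent_dim)) - base
                      for _ in range(2000)])

# Latent noise yields action noise in a latent_dim-dimensional subspace,
# i.e. correlated across actions; independent per-action noise is full rank.
cov = np.cov(perturbed.T)
indep = rng.normal(0.0, 0.1, size=(2000, act_dim))
print(np.linalg.matrix_rank(cov), np.linalg.matrix_rank(np.cov(indep.T)))  # 4 16
```

The rank gap is the point: all sixteen actions move together along a few shared directions, which is a plausible prior for muscles that must coordinate, rather than jittering independently.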
arXiv Detail & Related papers (2023-05-31T17:40:43Z)
- Demonstration-Guided Reinforcement Learning with Efficient Exploration for Task Automation of Surgical Robot [54.80144694888735]
We introduce Demonstration-guided EXploration (DEX), an efficient reinforcement learning algorithm.
Our method assigns higher value estimates to expert-like behaviors to encourage productive interactions.
Experiments on 10 surgical manipulation tasks from SurRoL, a comprehensive surgical simulation platform, demonstrate significant improvements.
arXiv Detail & Related papers (2023-02-20T05:38:54Z)
- Learning with Muscles: Benefits for Data-Efficiency and Robustness in Anthropomorphic Tasks [13.545245521356218]
Humans are able to outperform robots in terms of robustness, versatility, and learning of new tasks in a wide variety of movements.
We hypothesize that highly nonlinear muscle dynamics play a large role in providing inherent stability, which is favorable to learning.
arXiv Detail & Related papers (2022-07-08T15:16:38Z)
- Accelerated Policy Learning with Parallel Differentiable Simulation [59.665651562534755]
We present a differentiable simulator and a new policy learning algorithm (SHAC).
Our algorithm alleviates problems with local minima through a smooth critic function.
We show substantial improvements in sample efficiency and wall-clock time over state-of-the-art RL and differentiable simulation-based algorithms.
arXiv Detail & Related papers (2022-04-14T17:46:26Z)
- Provable RL with Exogenous Distractors via Multistep Inverse Dynamics [85.52408288789164]
Real-world applications of reinforcement learning (RL) require the agent to deal with high-dimensional observations such as those generated from a megapixel camera.
Prior work has addressed such problems with representation learning, through which the agent can provably extract endogenous, latent state information from raw observations.
However, such approaches can fail in the presence of temporally correlated noise in the observations.
arXiv Detail & Related papers (2021-10-17T15:21:27Z)
- Towards Understanding the Adversarial Vulnerability of Skeleton-based Action Recognition [133.35968094967626]
Skeleton-based action recognition has attracted increasing attention due to its strong adaptability to dynamic circumstances.
With the help of deep learning techniques, it has also witnessed substantial progress and currently achieves around 90% accuracy in benign environments.
Research on the vulnerability of skeleton-based action recognition under different adversarial settings remains scant.
arXiv Detail & Related papers (2020-05-14T17:12:52Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.