DrM: Mastering Visual Reinforcement Learning through Dormant Ratio
Minimization
- URL: http://arxiv.org/abs/2310.19668v2
- Date: Wed, 14 Feb 2024 03:56:25 GMT
- Title: DrM: Mastering Visual Reinforcement Learning through Dormant Ratio
Minimization
- Authors: Guowei Xu, Ruijie Zheng, Yongyuan Liang, Xiyao Wang, Zhecheng Yuan,
Tianying Ji, Yu Luo, Xiaoyu Liu, Jiaxin Yuan, Pu Hua, Shuzhen Li, Yanjie Ze,
Hal Daumé III, Furong Huang, Huazhe Xu
- Abstract summary: Visual reinforcement learning has shown promise in continuous control tasks.
Current algorithms are still unsatisfactory in virtually every aspect of the performance.
DrM is the first model-free algorithm that consistently solves tasks in both the Dog and Manipulator domains.
- Score: 43.60484692738197
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Visual reinforcement learning (RL) has shown promise in continuous control
tasks. Despite its progress, current algorithms are still unsatisfactory in
virtually every aspect of the performance such as sample efficiency, asymptotic
performance, and their robustness to the choice of random seeds. In this paper,
we identify a major shortcoming in existing visual RL methods: the agents
often exhibit sustained inactivity during early training, thereby
limiting their ability to explore effectively. Expanding upon this crucial
observation, we additionally unveil a significant correlation between the
agents' inclination towards motorically inactive exploration and the absence of
neuronal activity within their policy networks. To quantify this inactivity, we
adopt dormant ratio as a metric to measure inactivity in the RL agent's
network. Empirically, we also recognize that the dormant ratio can act as a
standalone indicator of an agent's activity level, regardless of the received
reward signals. Leveraging the aforementioned insights, we introduce DrM, a
method that uses three core mechanisms to guide agents'
exploration-exploitation trade-offs by actively minimizing the dormant ratio.
Experiments demonstrate that DrM achieves significant improvements in sample
efficiency and asymptotic performance with no broken seeds (76 seeds in total)
across three continuous control benchmark environments, including DeepMind
Control Suite, MetaWorld, and Adroit. Most importantly, DrM is the first
model-free algorithm that consistently solves tasks in both the Dog and
Manipulator domains from the DeepMind Control Suite as well as three dexterous
hand manipulation tasks without demonstrations in Adroit, all based on pixel
observations.
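The abstract names the dormant ratio as its activity metric but does not spell out the computation. The sketch below is one plausible reading, assuming the definition commonly used in the dormant-neuron literature: a neuron counts as dormant when its mean absolute activation, divided by the mean over all neurons in its layer, is at most a threshold `tau`, and the dormant ratio is the fraction of such neurons across layers. The function name `dormant_ratio`, the nested-list input layout, the default `tau`, and the handling of an all-zero layer are illustrative assumptions, not details taken from the paper.

```python
from typing import Sequence


def dormant_ratio(
    layer_activations: Sequence[Sequence[Sequence[float]]],
    tau: float = 0.0,
) -> float:
    """Fraction of neurons whose normalized activation score is <= tau.

    layer_activations[l][b][i] is the activation of neuron i in layer l
    for input sample b (hypothetical layout chosen for this sketch).
    """
    dormant = 0
    total = 0
    for acts in layer_activations:
        n_samples = len(acts)
        n_neurons = len(acts[0])
        # Mean absolute activation of each neuron over the batch.
        mean_abs = [
            sum(abs(row[i]) for row in acts) / n_samples
            for i in range(n_neurons)
        ]
        # Layer-wide average used to normalize each neuron's score.
        layer_mean = sum(mean_abs) / n_neurons
        if layer_mean == 0.0:
            # Assumption: a completely silent layer is counted as fully dormant.
            dormant += n_neurons
        else:
            dormant += sum(1 for m in mean_abs if m / layer_mean <= tau)
        total += n_neurons
    return dormant / total


# Toy example: two of the four neurons in layer_1 never fire.
layer_1 = [[1.0, 0.0, 2.0, 0.0], [3.0, 0.0, 2.0, 0.0]]
layer_2 = [[0.5, 0.5]]
ratio_one_layer = dormant_ratio([layer_1])
ratio_two_layers = dormant_ratio([layer_1, layer_2])
```

Because this quantity depends only on network activations, it can be tracked without reference to rewards, which is consistent with the abstract's description of it as a standalone indicator of agent activity.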
Related papers
- Reinforcement Learning with Action Sequence for Data-Efficient Robot Learning [62.3886343725955]
We introduce a novel RL algorithm that learns a critic network that outputs Q-values over a sequence of actions.
By explicitly training the value functions to learn the consequence of executing a series of current and future actions, our algorithm allows for learning useful value functions from noisy trajectories.
arXiv Detail & Related papers (2024-11-19T01:23:52Z)
- MENTOR: Mixture-of-Experts Network with Task-Oriented Perturbation for Visual Reinforcement Learning [17.437573206368494]
Visual deep reinforcement learning (RL) enables robots to acquire skills from visual input for unstructured tasks.
Current algorithms suffer from low sample efficiency, limiting their practical applicability.
We present MENTOR, a method that improves both the architecture and optimization of RL agents.
arXiv Detail & Related papers (2024-10-19T04:31:54Z)
- Deep Reinforcement Learning Empowered Activity-Aware Dynamic Health Monitoring Systems [69.41229290253605]
Existing monitoring approaches were designed on the premise that medical devices track several health metrics concurrently.
This means that they report all relevant health values within that scope, which can result in excess resource use and the gathering of extraneous data.
We propose Dynamic Activity-Aware Health Monitoring strategy (DActAHM) for striking a balance between optimal monitoring performance and cost efficiency.
arXiv Detail & Related papers (2024-01-19T16:26:35Z)
- Self-Supervised Neuron Segmentation with Multi-Agent Reinforcement Learning [53.00683059396803]
Mask image model (MIM) has been widely used due to its simplicity and effectiveness in recovering original information from masked images.
We propose a decision-based MIM that utilizes reinforcement learning (RL) to automatically search for optimal image masking ratio and masking strategy.
Our approach has a significant advantage over alternative self-supervised methods on the task of neuron segmentation.
arXiv Detail & Related papers (2023-10-06T10:40:46Z)
- Sequential Action-Induced Invariant Representation for Reinforcement Learning [1.2046159151610263]
How to accurately learn task-relevant state representations from high-dimensional observations with visual distractions is a challenging problem in visual reinforcement learning.
We propose a Sequential Action-induced invariant Representation (SAR) method, in which the encoder is optimized by an auxiliary learner to only preserve the components that follow the control signals of sequential actions.
arXiv Detail & Related papers (2023-09-22T05:31:55Z)
- Task-Agnostic Continual Reinforcement Learning: Gaining Insights and Overcoming Challenges [27.474011433615317]
Continual learning (CL) enables the development of models and agents that learn from a sequence of tasks.
We investigate the factors that contribute to the performance differences between task-agnostic CL and multi-task (MTL) agents.
arXiv Detail & Related papers (2022-05-28T17:59:00Z)
- Continuous Control with Action Quantization from Demonstrations [35.44893918778709]
In Reinforcement Learning (RL), discrete actions, as opposed to continuous actions, result in less complex exploration problems.
We propose a novel method: Action Quantization from Demonstrations (AQuaDem) to learn a discretization of continuous action spaces.
We evaluate the proposed method on three different setups: RL with demonstrations, RL with play data --demonstrations of a human playing in an environment but not solving any specific task-- and Imitation Learning.
arXiv Detail & Related papers (2021-10-19T17:59:04Z)
- Continuous Decoding of Daily-Life Hand Movements from Forearm Muscle Activity for Enhanced Myoelectric Control of Hand Prostheses [78.120734120667]
We introduce a novel method, based on a long short-term memory (LSTM) network, to continuously map forearm EMG activity onto hand kinematics.
Ours is the first reported work on the prediction of hand kinematics that uses this challenging dataset.
Our results suggest that the presented method is suitable for the generation of control signals for the independent and proportional actuation of the multiple DOFs of state-of-the-art hand prostheses.
arXiv Detail & Related papers (2021-04-29T00:11:32Z)
- Discrete Action On-Policy Learning with Action-Value Critic [72.20609919995086]
Reinforcement learning (RL) in discrete action space is ubiquitous in real-world applications, but its complexity grows exponentially with the action-space dimension.
We construct a critic to estimate action-value functions, apply it on correlated actions, and combine these critic estimated action values to control the variance of gradient estimation.
These efforts result in a new discrete action on-policy RL algorithm that empirically outperforms related on-policy algorithms relying on variance control techniques.
arXiv Detail & Related papers (2020-02-10T04:23:09Z)
This list is automatically generated from the titles and abstracts of the papers in this site.