Chaos-based reinforcement learning with TD3
- URL: http://arxiv.org/abs/2405.09086v1
- Date: Wed, 15 May 2024 04:47:31 GMT
- Title: Chaos-based reinforcement learning with TD3
- Authors: Toshitaka Matsuki, Yusuke Sakemi, Kazuyuki Aihara,
- Abstract summary: Chaos-based reinforcement learning (CBRL) is a method in which the agent's internal chaotic dynamics drives exploration.
This study introduced Twin Delayed Deep Deterministic Policy Gradients (TD3), which is one of the state-of-the-art deep reinforcement learning algorithms.
- Score: 3.04503073434724
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Chaos-based reinforcement learning (CBRL) is a method in which the agent's internal chaotic dynamics drives exploration. This approach offers a model for considering how the biological brain can create variability in its behavior and learn in an exploratory manner. At the same time, it is a learning model that has the ability to automatically switch between exploration and exploitation modes and the potential to realize higher explorations that reflect what it has learned so far. However, the learning algorithms in CBRL have not been well-established in previous studies and have yet to incorporate recent advances in reinforcement learning. This study introduced Twin Delayed Deep Deterministic Policy Gradients (TD3), which is one of the state-of-the-art deep reinforcement learning algorithms that can treat deterministic and continuous action spaces, to CBRL. The validation results provide several insights. First, TD3 works as a learning algorithm for CBRL in a simple goal-reaching task. Second, CBRL agents with TD3 can autonomously suppress their exploratory behavior as learning progresses and resume exploration when the environment changes. Finally, examining the effect of the agent's chaoticity on learning shows that extremely strong chaos negatively impacts the flexible switching between exploration and exploitation.
Related papers
- Reinforcement Learning with Action Sequence for Data-Efficient Robot Learning [62.3886343725955]
We introduce a novel RL algorithm that learns a critic network that outputs Q-values over a sequence of actions.
By explicitly training the value functions to learn the consequence of executing a series of current and future actions, our algorithm allows for learning useful value functions from noisy trajectories.
arXiv Detail & Related papers (2024-11-19T01:23:52Z) - Quick and Accurate Affordance Learning [2.2499166814992444]
Infants learn actively in their environments, shaping their own learning curricula.
Here we model this type of behavior by means of a deep learning architecture.
Inference processes actively move the simulated agent towards regions where they expect affordance-related knowledge gain.
arXiv Detail & Related papers (2024-05-13T14:58:57Z) - Demonstration-free Autonomous Reinforcement Learning via Implicit and
Bidirectional Curriculum [22.32327908453603]
We propose a demonstration-free reinforcement learning algorithm via Implicit and Bi-directional Curriculum (IBC)
With an auxiliary agent that is conditionally activated upon learning progress and a bidirectional goal curriculum based on optimal transport, our method outperforms previous methods.
arXiv Detail & Related papers (2023-05-17T04:31:36Z) - Bridging Declarative, Procedural, and Conditional Metacognitive
Knowledge Gap Using Deep Reinforcement Learning [7.253181280137071]
In deductive domains, three metacognitive knowledge types in ascending order are declarative, procedural, and conditional learning.
This work leverages Deep Reinforcement Learning (DRL) in providing adaptive metacognitive interventions to bridge the gap between the three knowledge types.
Our results show that on both ITSs, DRL bridged the metacognitive knowledge gap between students and significantly improved their learning performance over their control peers.
arXiv Detail & Related papers (2023-04-23T20:07:07Z) - What deep reinforcement learning tells us about human motor learning and
vice-versa [24.442174952832108]
We show how recent deep RL methods correspond to the dominant motor learning framework in neuroscience, error-based learning.
We introduce a novel deep RL algorithm: model-based deterministic policy gradients (MB-DPG)
MB-DPG draws inspiration from error-based learning by explicitly relying on the observed outcome of actions.
arXiv Detail & Related papers (2022-08-23T11:56:49Z) - Learning Dynamics and Generalization in Reinforcement Learning [59.530058000689884]
We show theoretically that temporal difference learning encourages agents to fit non-smooth components of the value function early in training.
We show that neural networks trained using temporal difference algorithms on dense reward tasks exhibit weaker generalization between states than randomly networks and gradient networks trained with policy methods.
arXiv Detail & Related papers (2022-06-05T08:49:16Z) - Training and Evaluation of Deep Policies using Reinforcement Learning
and Generative Models [67.78935378952146]
GenRL is a framework for solving sequential decision-making problems.
It exploits the combination of reinforcement learning and latent variable generative models.
We experimentally determine the characteristics of generative models that have most influence on the performance of the final policy training.
arXiv Detail & Related papers (2022-04-18T22:02:32Z) - Autonomous Reinforcement Learning: Formalism and Benchmarking [106.25788536376007]
Real-world embodied learning, such as that performed by humans and animals, is situated in a continual, non-episodic world.
Common benchmark tasks in RL are episodic, with the environment resetting between trials to provide the agent with multiple attempts.
This discrepancy presents a major challenge when attempting to take RL algorithms developed for episodic simulated environments and run them on real-world platforms.
arXiv Detail & Related papers (2021-12-17T16:28:06Z) - Discovering Reinforcement Learning Algorithms [53.72358280495428]
Reinforcement learning algorithms update an agent's parameters according to one of several possible rules.
This paper introduces a new meta-learning approach that discovers an entire update rule.
It includes both 'what to predict' (e.g. value functions) and 'how to learn from it' by interacting with a set of environments.
arXiv Detail & Related papers (2020-07-17T07:38:39Z) - PBCS : Efficient Exploration and Exploitation Using a Synergy between
Reinforcement Learning and Motion Planning [8.176152440971897]
"Plan, Backplay, Chain Skills" combines motion planning and reinforcement learning to solve hard exploration environments.
We show that this method outperforms state-of-the-art RL algorithms in 2D maze environments of various sizes.
arXiv Detail & Related papers (2020-04-24T11:37:09Z) - AutoML-Zero: Evolving Machine Learning Algorithms From Scratch [76.83052807776276]
We show that it is possible to automatically discover complete machine learning algorithms just using basic mathematical operations as building blocks.
We demonstrate this by introducing a novel framework that significantly reduces human bias through a generic search space.
We believe these preliminary successes in discovering machine learning algorithms from scratch indicate a promising new direction in the field.
arXiv Detail & Related papers (2020-03-06T19:00:04Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.