Discovering Hierarchical Achievements in Reinforcement Learning via Contrastive Learning
- URL: http://arxiv.org/abs/2307.03486v3
- Date: Thu, 2 Nov 2023 06:20:50 GMT
- Title: Discovering Hierarchical Achievements in Reinforcement Learning via Contrastive Learning
- Authors: Seungyong Moon, Junyoung Yeom, Bumsoo Park, Hyun Oh Song
- Abstract summary: We introduce a novel contrastive learning method, called achievement distillation, which strengthens the agent's ability to predict the next achievement.
Our method exhibits a strong capacity for discovering hierarchical achievements and shows state-of-the-art performance on the challenging Crafter environment.
- Score: 17.28280896937486
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Discovering achievements with a hierarchical structure in procedurally
generated environments presents a significant challenge. This requires an agent
to possess a broad range of abilities, including generalization and long-term
reasoning. Many prior methods have been built upon model-based or hierarchical
approaches, with the belief that an explicit module for long-term planning
would be advantageous for learning hierarchical dependencies. However, these
methods demand an excessive number of environment interactions or large model
sizes, limiting their practicality. In this work, we demonstrate that proximal
policy optimization (PPO), a simple yet versatile model-free algorithm,
outperforms previous methods when optimized with recent implementation
practices. Moreover, we find that the PPO agent can predict the next
achievement to be unlocked to some extent, albeit with limited confidence.
Based on this observation, we introduce a novel contrastive learning method,
called achievement distillation, which strengthens the agent's ability to
predict the next achievement. Our method exhibits a strong capacity for
discovering hierarchical achievements and shows state-of-the-art performance on
the challenging Crafter environment in a sample-efficient manner while
utilizing fewer model parameters.
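The abstract describes achievement distillation only at a high level, so below is a minimal, hypothetical sketch of what an InfoNCE-style next-achievement contrastive objective could look like. All names here (achievement_table, temperature, the shape conventions) are illustrative assumptions, not the paper's actual implementation.

```python
import torch
import torch.nn.functional as F

def achievement_contrastive_loss(state_emb, next_achievement_ids,
                                 achievement_table, temperature=0.1):
    """state_emb: (B, D) encoded observations from the agent's network.
    next_achievement_ids: (B,) long tensor, index of the achievement
    unlocked next in each trajectory.
    achievement_table: (A, D) learnable embedding, one row per achievement."""
    state_emb = F.normalize(state_emb, dim=-1)
    ach_emb = F.normalize(achievement_table, dim=-1)
    logits = state_emb @ ach_emb.T / temperature  # (B, A) similarities
    # Pull each state toward its next achievement, push it from the rest.
    return F.cross_entropy(logits, next_achievement_ids)
```

In training, a loss of this kind would presumably be optimized alongside the PPO objective, so that the agent's representation becomes predictive of the next achievement to unlock.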
Related papers
- From Novice to Expert: LLM Agent Policy Optimization via Step-wise Reinforcement Learning [62.54484062185869]
We introduce StepAgent, which utilizes step-wise rewards to optimize the agent's reinforcement learning process (see the sketch below).
We propose implicit-reward and inverse reinforcement learning techniques to facilitate agent reflection and policy adjustment.
arXiv Detail & Related papers (2024-11-06T10:35:11Z)
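The summary does not specify how StepAgent computes its step-wise rewards; the following is a loose, inverse-RL-flavored sketch in which per-step rewards come from agreement with expert actions. The function name, the 1.0 matching reward, and the non-empty-sequence assumption are all illustrative, not StepAgent's interface.

```python
def stepwise_rewards(agent_actions, expert_actions, final_return):
    """Give a small reward at every step where the agent matched the
    expert's action, and add the original task return at the end.
    Assumes both action sequences are non-empty and aligned."""
    rewards = [1.0 if a == e else 0.0
               for a, e in zip(agent_actions, expert_actions)]
    rewards[-1] += final_return  # keep the original task signal
    return rewards
```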
- Model-Free Active Exploration in Reinforcement Learning [53.786439742572995]
We study the problem of exploration in Reinforcement Learning and present a novel model-free solution.
Our strategy identifies efficient policies faster than state-of-the-art exploration approaches.
arXiv Detail & Related papers (2024-06-30T19:00:49Z)
- Learning Off-policy with Model-based Intrinsic Motivation For Active Online Exploration [15.463313629574111]
This paper investigates how to achieve sample-efficient exploration in continuous control tasks.
We introduce an RL algorithm that incorporates a predictive model and off-policy learning elements.
We derive an intrinsic reward without incurring additional parameter overhead (see the sketch below).
arXiv Detail & Related papers (2024-03-31T11:39:11Z)
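One common way to obtain an intrinsic reward with no extra parameters is to reuse the predictive model's own error, which matches the spirit of the summary above. This is a generic sketch, not the paper's derivation; dynamics_model is a stand-in name.

```python
import torch

def intrinsic_reward(dynamics_model, state, action, next_state, scale=0.1):
    """Reward transitions the model predicts poorly, encouraging the
    agent to visit states where the model is still uncertain."""
    with torch.no_grad():
        predicted = dynamics_model(state, action)             # (B, D)
        error = (predicted - next_state).pow(2).mean(dim=-1)  # (B,)
    return scale * error
```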
- Simple Hierarchical Planning with Diffusion [54.48129192534653]
Diffusion-based generative methods have proven effective in modeling trajectories with offline datasets.
We introduce the Hierarchical Diffuser, a fast yet surprisingly effective planning method that combines the advantages of hierarchical and diffusion-based planning.
Our model adopts a "jumpy" planning strategy at the higher level, which gives it a larger receptive field at a lower computational cost (sketched below).
arXiv Detail & Related papers (2024-01-05T05:28:40Z)
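To make the "jumpy" strategy concrete, here is a hedged sketch of hierarchical planning with sparse high-level waypoints. Both planner callables are placeholders for illustration, not the Hierarchical Diffuser's actual API.

```python
def jumpy_plan(high_level_planner, low_level_planner, start, goal,
               horizon=64, jump=8):
    """The high level plans only horizon // jump sparse waypoints, so its
    effective receptive field spans the full horizon at a fraction of the
    cost of planning every step."""
    waypoints = high_level_planner(start, goal, steps=horizon // jump)
    trajectory, prev = [], start
    for wp in waypoints:
        # The low level fills in the `jump` steps between waypoints.
        trajectory.extend(low_level_planner(prev, wp, steps=jump))
        prev = wp
    return trajectory
```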
- Let's reward step by step: Step-Level reward model as the Navigators for Reasoning [64.27898739929734]
Process-Supervised Reward Model (PRM) furnishes LLMs with step-by-step feedback during the training phase.
We propose a greedy search algorithm that uses step-level feedback from the PRM to optimize the reasoning paths explored by LLMs (sketched below).
To explore the versatility of our approach, we develop a novel method to automatically generate a step-level reward dataset for coding tasks and observe similar performance improvements in code generation.
arXiv Detail & Related papers (2023-10-16T05:21:50Z)
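The greedy search described above can be sketched generically: sample candidate next steps, score each with the PRM, keep the best. Here propose_steps and prm_score are hypothetical stand-ins for the LLM sampler and the trained reward model.

```python
def greedy_prm_search(question, propose_steps, prm_score,
                      max_steps=8, n_candidates=4):
    """At each step, sample candidate next reasoning steps and keep the
    one the process reward model scores highest."""
    path = []
    for _ in range(max_steps):
        options = propose_steps(question, path, n=n_candidates)
        if not options:
            break
        path.append(max(options, key=lambda s: prm_score(question, path, s)))
    return path
```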
- REX: Rapid Exploration and eXploitation for AI Agents [103.68453326880456]
We propose an enhanced approach for Rapid Exploration and eXploitation for AI Agents called REX.
REX introduces an additional layer of rewards and integrates concepts similar to Upper Confidence Bound (UCB) scores (sketched below), leading to more robust and efficient agent performance.
arXiv Detail & Related papers (2023-07-18T04:26:33Z)
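The UCB-style scoring REX alludes to is presumably a variant of the textbook UCB1 bonus; since REX's exact formula is not given here, this shows only the standard version.

```python
import math

def ucb_score(mean_reward, action_count, total_count, c=1.4):
    """Textbook UCB1: mean value plus an optimism bonus that shrinks
    as an action is tried more often."""
    if action_count == 0:
        return float("inf")  # always try untested actions first
    return mean_reward + c * math.sqrt(math.log(total_count) / action_count)
```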
- Sample Efficient Reinforcement Learning via Model-Ensemble Exploration and Exploitation [3.728946517493471]
MEEE is a model-ensemble method that combines optimistic exploration with weighted exploitation (see the sketch below).
Our approach outperforms other model-free and model-based state-of-the-art methods, especially in sample complexity.
arXiv Detail & Related papers (2021-07-05T07:18:20Z)
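A standard way to realize optimistic exploration with a model ensemble is to add a disagreement bonus to the mean estimate. This generic sketch (the names and the 0.5 weight are assumptions) illustrates the idea, not MEEE's actual weighting scheme.

```python
import numpy as np

def optimistic_value(ensemble_predictions, bonus_weight=0.5):
    """ensemble_predictions: shape (n_models,), each model's value
    estimate for the same state-action pair."""
    preds = np.asarray(ensemble_predictions, dtype=float)
    # Mean is the exploitation term; disagreement (std) is the optimism bonus.
    return preds.mean() + bonus_weight * preds.std()
```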
- Online reinforcement learning with sparse rewards through an active inference capsule [62.997667081978825]
This paper introduces an active inference agent that minimizes a novel objective, the free energy of the expected future (see the sketch below).
Our model is capable of solving sparse-reward problems with a very high sample efficiency.
We also introduce a novel method for approximating the prior model from the reward function, which simplifies the expression of complex objectives.
arXiv Detail & Related papers (2021-06-04T10:03:36Z)
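Active inference objectives of this kind typically trade off extrinsic value against expected information gain. The sketch below shows only that generic decomposition, not the paper's specific "free energy of the expected future".

```python
def expected_free_energy(expected_reward, prior_entropy, posterior_entropy):
    """Score a policy by extrinsic value plus epistemic value (expected
    information gain); lower free energy is better, hence the negation."""
    info_gain = prior_entropy - posterior_entropy
    return -(expected_reward + info_gain)
```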
- Deep Model-Based Reinforcement Learning for High-Dimensional Problems, a Survey [1.2031796234206134]
Model-based reinforcement learning creates an explicit model of the environment dynamics to reduce the need for environment samples.
A challenge for deep model-based methods is to achieve high predictive power while maintaining low sample complexity.
We propose a taxonomy based on three approaches: using explicit planning on given transitions, using explicit planning on learned transitions, and end-to-end learning of both planning and transitions.
arXiv Detail & Related papers (2020-08-11T08:49:04Z)