Learning user-defined sub-goals using memory editing in reinforcement learning
- URL: http://arxiv.org/abs/2205.00399v1
- Date: Sun, 1 May 2022 05:19:51 GMT
- Title: Learning user-defined sub-goals using memory editing in reinforcement learning
- Authors: GyeongTaek Lee
- Abstract summary: The aim of reinforcement learning (RL) is to allow the agent to achieve the final goal.
I propose a methodology that uses memory editing to let the agent achieve user-defined sub-goals as well as the final goal.
I expect this methodology to be useful in fields that need to control the agent across a variety of scenarios.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The aim of reinforcement learning (RL) is to allow the agent to achieve the
final goal. Most RL studies have focused on improving the efficiency of
learning so that the final goal is reached faster. However, it is very
difficult to make an RL model modify its intermediate route on the way to the
final goal; that is, in existing studies the agent cannot be controlled to
achieve other sub-goals. If the agent could pass through sub-goals on the way
to its destination, RL could be applied and studied in a wider range of
fields. In this study, I propose a methodology that uses memory editing to
achieve user-defined sub-goals as well as the final goal. Memory editing is
performed to generate various sub-goals and to give the agent an additional
reward, and the sub-goals are learned separately from the final goal. I set up
two simple environments and tested various scenarios in them. As a result, the
agent successfully passed the sub-goals as well as the final goal under
control in almost all cases. Moreover, the agent could be induced to visit
novel states indirectly in the environments. I expect this methodology to be
useful in fields that need to control the agent across a variety of scenarios.
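The abstract stops short of implementation details, but the memory-editing step it describes, relabelling stored experience with user-defined sub-goals and granting an additional reward, can be sketched roughly as below. The buffer layout, the function edit_memory, and the constant SUBGOAL_BONUS are illustrative assumptions (a HER-style relabelling), not details taken from the paper.

```python
import random
from collections import namedtuple

# Hypothetical transition layout; the paper does not specify its buffer format.
Transition = namedtuple("Transition", "state action reward next_state goal done")

SUBGOAL_BONUS = 1.0  # assumed size of the additional reward for a sub-goal


def edit_memory(trajectory, user_subgoals, k=4):
    """Relabel stored transitions so user-defined sub-goals also yield reward.

    trajectory:    list of Transition collected while pursuing the final goal
    user_subgoals: states the user wants the agent to pass through
    k:             edited copies sampled per transition (HER-style relabelling)

    The edited transitions are meant for a separate sub-goal buffer, so the
    sub-goals can be learned apart from the final goal, as the abstract states.
    """
    edited = []
    for t in trajectory:
        for _ in range(k):
            subgoal = random.choice(user_subgoals)
            reached = t.next_state == subgoal
            reward = t.reward + (SUBGOAL_BONUS if reached else 0.0)
            edited.append(
                Transition(t.state, t.action, reward, t.next_state, subgoal, reached)
            )
    return edited
```

Training would then draw final-goal batches from the original buffer and sub-goal batches from the edited one, keeping the two objectives separate as the abstract describes.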
Related papers
- NoMaD: Goal Masked Diffusion Policies for Navigation and Exploration [57.15811390835294]
This paper describes how we can train a single unified diffusion policy to handle both goal-directed navigation and goal-agnostic exploration.
We show that this unified policy results in better overall performance when navigating to visually indicated goals in novel environments.
Our experiments, conducted on a real-world mobile robot platform, show effective navigation in unseen environments in comparison with five alternative methods.
arXiv Detail & Related papers (2023-10-11T21:07:14Z)
- HIQL: Offline Goal-Conditioned RL with Latent States as Actions [81.67963770528753]
We propose a hierarchical algorithm for goal-conditioned RL from offline data.
We show how this hierarchical decomposition makes our method robust to noise in the estimated value function.
Our method can solve long-horizon tasks that stymie prior methods, can scale to high-dimensional image observations, and can readily make use of action-free data.
arXiv Detail & Related papers (2023-07-22T00:17:36Z)
- Discrete Factorial Representations as an Abstraction for Goal-Conditioned Reinforcement Learning [99.38163119531745]
We show that applying a discretizing bottleneck can improve performance in goal-conditioned RL setups.
We experimentally show that this improves the expected return on out-of-distribution goals, while still allowing goals with expressive structure to be specified.
arXiv Detail & Related papers (2022-11-01T03:31:43Z)
- Goal Exploration Augmentation via Pre-trained Skills for Sparse-Reward Long-Horizon Goal-Conditioned Reinforcement Learning [6.540225358657128]
Reinforcement learning (RL) often struggles to accomplish a sparse-reward long-horizon task in a complex environment.
Goal-conditioned reinforcement learning (GCRL) has been employed to tackle this difficult problem via a curriculum of easy-to-reach sub-goals.
In GCRL, exploring novel sub-goals is essential for the agent to ultimately find the pathway to the desired goal.
arXiv Detail & Related papers (2022-10-28T11:11:04Z)
- A Fully Controllable Agent in the Path Planning using Goal-Conditioned Reinforcement Learning [0.0]
In path planning, routes may vary depending on a number of variables, so it is important for the agent to be able to reach various goals.
I propose a novel reinforcement learning framework for a fully controllable agent in path planning.
arXiv Detail & Related papers (2022-05-20T05:18:03Z)
- Bisimulation Makes Analogies in Goal-Conditioned Reinforcement Learning [71.52722621691365]
Building generalizable goal-conditioned agents from rich observations is key to reinforcement learning (RL) solving real-world problems.
We propose a new form of state abstraction called goal-conditioned bisimulation.
We learn this representation using a metric form of this abstraction, and show its ability to generalize to new goals in simulated manipulation tasks.
arXiv Detail & Related papers (2022-04-27T17:00:11Z)
- Automatic Curriculum Learning through Value Disagreement [95.19299356298876]
Continually solving new, unsolved tasks is the key to learning diverse behaviors.
In the multi-task domain, where an agent needs to reach multiple goals, the choice of training goals can largely affect sample efficiency.
We propose setting up an automatic curriculum for goals that the agent needs to solve.
We evaluate our method across 13 multi-goal robotic tasks and 5 navigation tasks, and demonstrate performance gains over current state-of-the-art methods.
arXiv Detail & Related papers (2020-06-17T03:58:25Z)
- Meta Reinforcement Learning with Autonomous Inference of Subtask Dependencies [57.27944046925876]
We propose and address a novel few-shot RL problem, where a task is characterized by a subtask graph.
Instead of directly learning a meta-policy, we develop a Meta-learner with Subtask Graph Inference.
Our experimental results on two grid-world domains and StarCraft II environments show that the proposed method is able to accurately infer the latent task parameter.
arXiv Detail & Related papers (2020-01-01T17:34:00Z)