MapGo: Model-Assisted Policy Optimization for Goal-Oriented Tasks
- URL: http://arxiv.org/abs/2105.06350v1
- Date: Thu, 13 May 2021 15:07:23 GMT
- Title: MapGo: Model-Assisted Policy Optimization for Goal-Oriented Tasks
- Authors: Menghui Zhu, Minghuan Liu, Jian Shen, Zhicheng Zhang, Sheng Chen,
Weinan Zhang, Deheng Ye, Yong Yu, Qiang Fu, Wei Yang
- Abstract summary: In goal-oriented reinforcement learning, relabeling the raw goals in past experience to provide agents with hindsight ability is a major solution to the reward sparsity problem.
We develop FGI (Foresight Goal Inference), a new relabeling strategy that relabels the goals by looking into the future with a learned dynamics model.
To improve sample efficiency, we propose to use the dynamics model to generate simulated trajectories for policy training.
- Score: 37.529217646431825
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In goal-oriented reinforcement learning, relabeling the raw goals in past
experience to provide agents with hindsight ability is a major solution to the
reward sparsity problem. In this paper, to enhance the diversity of relabeled
goals, we develop FGI (Foresight Goal Inference), a new relabeling strategy
that relabels the goals by looking into the future with a learned dynamics
model. In addition, to improve sample efficiency, we propose to use the dynamics
model to generate simulated trajectories for policy training. By integrating
these two improvements, we introduce the MapGo framework (Model-Assisted Policy
Optimization for Goal-oriented tasks). In our experiments, we first show the
effectiveness of the FGI strategy compared with the hindsight one, and then
show that the MapGo framework achieves higher sample efficiency when compared
to model-free baselines on a set of complicated tasks.
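To make the two ingredients in the abstract concrete, below is a minimal, self-contained Python sketch of foresight-style goal relabeling and model-generated rollouts. It is not the authors' implementation: the linear LearnedDynamics, the random GoalConditionedPolicy, the achieved_goal mapping, the horizons, and the sparse-reward tolerance are all illustrative assumptions standing in for a learned dynamics model, a trained goal-conditioned policy, and a task-specific goal space.

```python
import numpy as np

STATE_DIM, ACTION_DIM, GOAL_DIM = 4, 2, 2

class LearnedDynamics:
    """Stand-in for a learned one-step dynamics model; here a fixed linear map."""
    def __init__(self, rng):
        self.A = np.eye(STATE_DIM) + 0.1 * rng.standard_normal((STATE_DIM, STATE_DIM))
        self.B = 0.1 * rng.standard_normal((STATE_DIM, ACTION_DIM))

    def predict(self, s, a):
        return self.A @ s + self.B @ a

class GoalConditionedPolicy:
    """Stand-in goal-conditioned policy; here it just samples random actions."""
    def __init__(self, rng):
        self.rng = rng

    def act(self, s, g):
        return self.rng.uniform(-1.0, 1.0, size=ACTION_DIM)

def achieved_goal(s):
    # Assumption for this sketch: the goal space is the first GOAL_DIM state coordinates.
    return s[:GOAL_DIM]

def foresight_goal_inference(s, policy, model, horizon=5):
    """FGI-style relabeling (sketch): roll the learned dynamics model forward
    from s under the current policy and return the goal achieved at the end of
    the imagined rollout, rather than a goal reached later in the same real
    trajectory as hindsight relabeling would."""
    for _ in range(horizon):
        # Conditioning on the currently achieved goal is an illustrative choice.
        a = policy.act(s, achieved_goal(s))
        s = model.predict(s, a)
    return achieved_goal(s)

def model_rollout(s, g, policy, model, horizon=10, tol=0.05):
    """Generate one simulated trajectory with the learned model (the
    sample-efficiency side of MapGo); sparse reward = 1 when near the goal."""
    traj = []
    for _ in range(horizon):
        a = policy.act(s, g)
        s_next = model.predict(s, a)
        r = float(np.linalg.norm(achieved_goal(s_next) - g) < tol)
        traj.append((s, a, r, s_next, g))
        s = s_next
    return traj

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    model, policy = LearnedDynamics(rng), GoalConditionedPolicy(rng)
    s0 = rng.standard_normal(STATE_DIM)
    g = foresight_goal_inference(s0, policy, model)   # relabeled (foresight) goal
    simulated = model_rollout(s0, g, policy, model)   # imagined experience
    print("relabeled goal:", g, "| simulated transitions:", len(simulated))
```

In a full MapGo-style training loop, both the relabeled goals and the simulated transitions would be pushed into the replay buffer used for off-policy updates; the sketch only shows how each piece could be produced.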
Related papers
- MGDA: Model-based Goal Data Augmentation for Offline Goal-conditioned Weighted Supervised Learning [23.422157931057498]
State-of-the-art algorithms, known as Goal-Conditioned Weighted Supervised Learning (GCWSL) methods, have been introduced to tackle challenges in offline goal-conditioned reinforcement learning (RL).
GCWSL has demonstrated outstanding performance across diverse goal-reaching tasks, providing a simple, effective, and stable solution.
However, prior research has identified a critical limitation of GCWSL: the lack of trajectory stitching capabilities.
We propose a Model-based Goal Data Augmentation (MGDA) approach, which leverages a learned dynamics model to sample more suitable augmented goals.
arXiv Detail & Related papers (2024-12-16T03:25:28Z)
- Parameter-Efficient Active Learning for Foundational models [7.799711162530711]
Foundational vision transformer models have shown impressive few-shot performance on many vision tasks.
This research presents a novel investigation into the application of parameter-efficient fine-tuning methods within an active learning (AL) framework.
arXiv Detail & Related papers (2024-06-13T16:30:32Z)
- Let's reward step by step: Step-Level reward model as the Navigators for Reasoning [64.27898739929734]
Process-Supervised Reward Model (PRM) furnishes LLMs with step-by-step feedback during the training phase.
We propose a greedy search algorithm that employs the step-level feedback from PRM to optimize the reasoning pathways explored by LLMs.
To explore the versatility of our approach, we develop a novel method to automatically generate a step-level reward dataset for coding tasks and observe similar performance improvements in code generation tasks.
arXiv Detail & Related papers (2023-10-16T05:21:50Z)
- Reparameterized Policy Learning for Multimodal Trajectory Optimization [61.13228961771765]
We investigate the challenge of parametrizing policies for reinforcement learning in high-dimensional continuous action spaces.
We propose a principled framework that models the continuous RL policy as a generative model of optimal trajectories.
We present a practical model-based RL method, which leverages the multimodal policy parameterization and learned world model.
arXiv Detail & Related papers (2023-07-20T09:05:46Z)
- Imitating Graph-Based Planning with Goal-Conditioned Policies [72.61631088613048]
We present a self-imitation scheme which distills a subgoal-conditioned policy into the target-goal-conditioned policy.
We empirically show that our method can significantly boost the sample-efficiency of the existing goal-conditioned RL methods.
arXiv Detail & Related papers (2023-03-20T14:51:10Z)
- DST: Dynamic Substitute Training for Data-free Black-box Attack [79.61601742693713]
We propose a novel dynamic substitute training attack method to encourage the substitute model to learn better and faster from the target model.
We introduce a task-driven graph-based structure information learning constraint to improve the quality of generated training data.
arXiv Detail & Related papers (2022-04-03T02:29:11Z)
- Efficient Reinforced Feature Selection via Early Stopping Traverse Strategy [36.890295071860166]
We propose a single-agent Monte Carlo based reinforced feature selection (MCRFS) method.
We also propose two efficiency-improvement strategies: an early stopping (ES) strategy and a reward-level interactive (RI) strategy.
arXiv Detail & Related papers (2021-09-29T03:51:13Z)
- Model-based Meta Reinforcement Learning using Graph Structured Surrogate Models [40.08137765886609]
We show that our model, called a graph structured surrogate model (GSSM), outperforms state-of-the-art methods in predicting environment dynamics.
Our approach is able to obtain high returns, while allowing fast execution during deployment by avoiding test time policy gradient optimization.
arXiv Detail & Related papers (2021-02-16T17:21:55Z)
- PlanGAN: Model-based Planning With Sparse Rewards and Multiple Goals [14.315501760755609]
PlanGAN is a model-based algorithm for solving multi-goal tasks in environments with sparse rewards.
Our studies indicate that PlanGAN can achieve comparable performance while being around 4-8 times more sample-efficient.
arXiv Detail & Related papers (2020-06-01T12:53:09Z)
- Model-Augmented Actor-Critic: Backpropagating through Paths [81.86992776864729]
Current model-based reinforcement learning approaches use the model simply as a learned black-box simulator.
We show how to make more effective use of the model by exploiting its differentiability.
arXiv Detail & Related papers (2020-05-16T19:18:10Z)