MapGo: Model-Assisted Policy Optimization for Goal-Oriented Tasks
- URL: http://arxiv.org/abs/2105.06350v1
- Date: Thu, 13 May 2021 15:07:23 GMT
- Title: MapGo: Model-Assisted Policy Optimization for Goal-Oriented Tasks
- Authors: Menghui Zhu, Minghuan Liu, Jian Shen, Zhicheng Zhang, Sheng Chen,
Weinan Zhang, Deheng Ye, Yong Yu, Qiang Fu, Wei Yang
- Abstract summary: In goal-oriented reinforcement learning, relabeling the raw goals in past experience to provide agents with hindsight ability is a major solution to the reward sparsity problem.
We develop FGI (Foresight Goal Inference), a new relabeling strategy that relabels the goals by looking into the future with a learned dynamics model.
To improve sample efficiency, we propose to use the dynamics model to generate simulated trajectories for policy training.
- Score: 37.529217646431825
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In goal-oriented reinforcement learning, relabeling the raw goals in past
experience to provide agents with hindsight ability is a major solution to the
reward sparsity problem. In this paper, to enhance the diversity of relabeled
goals, we develop FGI (Foresight Goal Inference), a new relabeling strategy
that relabels the goals by looking into the future with a learned dynamics
model. In addition, to improve sample efficiency, we propose to use the dynamics
model to generate simulated trajectories for policy training. By integrating
these two improvements, we introduce the MapGo framework (Model-Assisted Policy
Optimization for Goal-oriented tasks). In our experiments, we first show the
effectiveness of the FGI strategy compared with the hindsight one, and then
show that the MapGo framework achieves higher sample efficiency when compared
to model-free baselines on a set of complicated tasks.
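To make the two ingredients in the abstract concrete, below is a minimal, self-contained Python sketch of foresight-style goal relabeling and model-generated rollouts. It is not the authors' implementation: the linear LearnedDynamics, the random GoalConditionedPolicy, the achieved_goal mapping, the horizons, and the sparse-reward tolerance are all illustrative assumptions standing in for a learned dynamics model, a trained goal-conditioned policy, and a task-specific goal space.

```python
import numpy as np

STATE_DIM, ACTION_DIM, GOAL_DIM = 4, 2, 2

class LearnedDynamics:
    """Stand-in for a learned one-step dynamics model; here a fixed linear map."""
    def __init__(self, rng):
        self.A = np.eye(STATE_DIM) + 0.1 * rng.standard_normal((STATE_DIM, STATE_DIM))
        self.B = 0.1 * rng.standard_normal((STATE_DIM, ACTION_DIM))

    def predict(self, s, a):
        return self.A @ s + self.B @ a

class GoalConditionedPolicy:
    """Stand-in goal-conditioned policy; here it just samples random actions."""
    def __init__(self, rng):
        self.rng = rng

    def act(self, s, g):
        return self.rng.uniform(-1.0, 1.0, size=ACTION_DIM)

def achieved_goal(s):
    # Assumption for this sketch: the goal space is the first GOAL_DIM state coordinates.
    return s[:GOAL_DIM]

def foresight_goal_inference(s, policy, model, horizon=5):
    """FGI-style relabeling (sketch): roll the learned dynamics model forward
    from s under the current policy and return the goal achieved at the end of
    the imagined rollout, rather than a goal reached later in the same real
    trajectory as hindsight relabeling would."""
    for _ in range(horizon):
        # Conditioning on the currently achieved goal is an illustrative choice.
        a = policy.act(s, achieved_goal(s))
        s = model.predict(s, a)
    return achieved_goal(s)

def model_rollout(s, g, policy, model, horizon=10, tol=0.05):
    """Generate one simulated trajectory with the learned model (the
    sample-efficiency side of MapGo); sparse reward = 1 when near the goal."""
    traj = []
    for _ in range(horizon):
        a = policy.act(s, g)
        s_next = model.predict(s, a)
        r = float(np.linalg.norm(achieved_goal(s_next) - g) < tol)
        traj.append((s, a, r, s_next, g))
        s = s_next
    return traj

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    model, policy = LearnedDynamics(rng), GoalConditionedPolicy(rng)
    s0 = rng.standard_normal(STATE_DIM)
    g = foresight_goal_inference(s0, policy, model)   # relabeled (foresight) goal
    simulated = model_rollout(s0, g, policy, model)   # imagined experience
    print("relabeled goal:", g, "| simulated transitions:", len(simulated))
```

In a full MapGo-style training loop, both the relabeled goals and the simulated transitions would be pushed into the replay buffer used for off-policy updates; the sketch only shows how each piece could be produced.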
Related papers
- MGDA: Model-based Goal Data Augmentation for Offline Goal-conditioned Weighted Supervised Learning [23.422157931057498]
State-of-the-art algorithms, known as Goal-Conditioned Weighted Supervised Learning (GCWSL) methods, have been introduced to tackle challenges in offline goal-conditioned reinforcement learning (RL).
GCWSL has demonstrated outstanding performance across diverse goal-reaching tasks, providing a simple, effective, and stable solution.
However, prior research has identified a critical limitation of GCWSL: the lack of trajectory stitching capabilities.
We propose a Model-based Goal Data Augmentation (MGDA) approach, which leverages a learned dynamics model to sample more suitable augmented goals.
arXiv Detail & Related papers (2024-12-16T03:25:28Z)
- Parameter-Efficient Active Learning for Foundational models [7.799711162530711]
Foundational vision transformer models have shown impressive few-shot performance on many vision tasks.
This research presents a novel investigation into the application of parameter-efficient fine-tuning methods within an active learning (AL) framework.
arXiv Detail & Related papers (2024-06-13T16:30:32Z)
- Let's reward step by step: Step-Level reward model as the Navigators for Reasoning [64.27898739929734]
Process-Supervised Reward Model (PRM) furnishes LLMs with step-by-step feedback during the training phase.
We propose a greedy search algorithm that employs the step-level feedback from PRM to optimize the reasoning pathways explored by LLMs.
To explore the versatility of our approach, we develop a novel method to automatically generate a step-level reward dataset for coding tasks and observe similar performance improvements in code generation tasks.
arXiv Detail & Related papers (2023-10-16T05:21:50Z)
- Reparameterized Policy Learning for Multimodal Trajectory Optimization [61.13228961771765]
We investigate the challenge of parametrizing policies for reinforcement learning in high-dimensional continuous action spaces.
We propose a principled framework that models the continuous RL policy as a generative model of optimal trajectories.
We present a practical model-based RL method, which leverages the multimodal policy parameterization and learned world model.
arXiv Detail & Related papers (2023-07-20T09:05:46Z)
- Imitating Graph-Based Planning with Goal-Conditioned Policies [72.61631088613048]
We present a self-imitation scheme which distills a subgoal-conditioned policy into the target-goal-conditioned policy.
We empirically show that our method can significantly boost the sample-efficiency of the existing goal-conditioned RL methods.
arXiv Detail & Related papers (2023-03-20T14:51:10Z)
- DST: Dynamic Substitute Training for Data-free Black-box Attack [79.61601742693713]
We propose a novel dynamic substitute training attack method to encourage the substitute model to learn better and faster from the target model.
We introduce a task-driven graph-based structure information learning constraint to improve the quality of generated training data.
arXiv Detail & Related papers (2022-04-03T02:29:11Z)
- Efficient Reinforced Feature Selection via Early Stopping Traverse Strategy [36.890295071860166]
We propose a single-agent Monte Carlo based reinforced feature selection (MCRFS) method.
We also propose two efficiency-improvement strategies: an early stopping (ES) strategy and a reward-level interactive (RI) strategy.
arXiv Detail & Related papers (2021-09-29T03:51:13Z)
- Model-based Meta Reinforcement Learning using Graph Structured Surrogate Models [40.08137765886609]
We show that our model, called a graph structured surrogate model (GSSM), outperforms state-of-the-art methods in predicting environment dynamics.
Our approach is able to obtain high returns, while allowing fast execution during deployment by avoiding test time policy gradient optimization.
arXiv Detail & Related papers (2021-02-16T17:21:55Z)
- PlanGAN: Model-based Planning With Sparse Rewards and Multiple Goals [14.315501760755609]
PlanGAN is a model-based algorithm for solving multi-goal tasks in environments with sparse rewards.
Our studies indicate that PlanGAN can achieve comparable performance while being around 4-8 times more sample-efficient.
arXiv Detail & Related papers (2020-06-01T12:53:09Z)
- Model-Augmented Actor-Critic: Backpropagating through Paths [81.86992776864729]
Current model-based reinforcement learning approaches use the model simply as a learned black-box simulator.
We show how to make more effective use of the model by exploiting its differentiability.
arXiv Detail & Related papers (2020-05-16T19:18:10Z)