Value Summation: A Novel Scoring Function for MPC-based Model-based
Reinforcement Learning
- URL: http://arxiv.org/abs/2209.08169v2
- Date: Wed, 19 Jul 2023 16:45:18 GMT
- Title: Value Summation: A Novel Scoring Function for MPC-based Model-based
Reinforcement Learning
- Authors: Mehran Raisi, Amirhossein Noohian, Luc Mccutcheon, Saber Fallah
- Abstract summary: This paper proposes a novel scoring function for the planning module of MPC-based reinforcement learning methods.
The proposed method enhances the learning efficiency of existing MPC-based MBRL methods by scoring trajectories with the discounted sum of values.
The results demonstrate that the proposed method outperforms the current state-of-the-art algorithms in terms of learning efficiency and average reward return.
- Score: 4.473327661758546
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper proposes a novel scoring function for the planning module of
MPC-based reinforcement learning methods to address the inherent bias of using
the reward function to score trajectories. The proposed method enhances the
learning efficiency of existing MPC-based MBRL methods using the discounted sum
of values. The method utilizes optimal trajectories to guide policy learning
and updates its state-action value function based on real-world and augmented
onboard data. The learning efficiency of the proposed method is evaluated in
selected MuJoCo Gym environments as well as in learning locomotion skills for a
simulated model of the Cassie robot. The results demonstrate that the proposed
method outperforms the current state-of-the-art algorithms in terms of learning
efficiency and average reward return.
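The abstract includes no pseudocode, but the core idea is straightforward to sketch: in a sampling-based MPC planner, rank imagined trajectories by the discounted sum of learned state values rather than the discounted sum of rewards. The sketch below is illustrative only; `dynamics_model`, `policy_prior`, and `value_fn` are hypothetical stand-ins for the learned components, and the shooting-style planner is one common choice rather than necessarily the paper's exact planner.

```python
import numpy as np

def score_trajectory(states, value_fn, gamma=0.99):
    """Score a planned trajectory by the discounted sum of learned values.

    Replaces the usual discounted reward sum used by sampling-based MPC
    (e.g. random shooting or CEM) as the trajectory-ranking criterion.
    """
    return sum(gamma**t * value_fn(s) for t, s in enumerate(states))

def plan(state, dynamics_model, policy_prior, value_fn,
         horizon=10, n_candidates=256, gamma=0.99):
    """Sample candidate action sequences, roll them out in the learned
    model, and keep the sequence with the highest value-summation score."""
    best_score, best_actions = -np.inf, None
    for _ in range(n_candidates):
        s, states, actions = state, [], []
        for _ in range(horizon):
            a = policy_prior(s)         # sampled candidate action
            s = dynamics_model(s, a)    # imagined next state
            states.append(s)
            actions.append(a)
        score = score_trajectory(states, value_fn, gamma)
        if score > best_score:
            best_score, best_actions = score, actions
    return best_actions[0]              # MPC: execute only the first action
```

Because the value function already summarizes return-to-go at every step, summing values along the horizon can mitigate the short-horizon bias of pure reward sums that the abstract refers to.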
Related papers
- Getting More Juice Out of the SFT Data: Reward Learning from Human Demonstration Improves SFT for LLM Alignment [65.15914284008973]
State-of-the-art techniques such as Reinforcement Learning from Human Feedback (RLHF) often consist of two stages.
1) Supervised fine-tuning (SFT), where the model is fine-tuned on human demonstration data.
2) Preference learning, where preference data is used to learn a reward model, which a reinforcement learning step then uses to further fine-tune the model; a sketch of this reward-learning step follows below.
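As a point of reference for stage 2, reward models in this pipeline are commonly fit with a Bradley-Terry pairwise loss; the following is a minimal sketch under that assumption (the paper may use a different formulation), with `reward_model`, `chosen`, and `rejected` as hypothetical placeholders.

```python
import torch.nn.functional as F

def bradley_terry_loss(reward_model, chosen, rejected):
    """Stage-2 preference learning: fit a scalar reward model so the
    human-chosen response outscores the rejected one (Bradley-Terry
    likelihood over pairwise preferences).
    """
    r_chosen = reward_model(chosen)      # scalar score per preferred response
    r_rejected = reward_model(rejected)  # scalar score per rejected response
    # -log sigmoid(r_w - r_l): minimized when chosen outscores rejected
    return -F.logsigmoid(r_chosen - r_rejected).mean()
```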
arXiv Detail & Related papers (2024-05-28T07:11:05Z) - Maximum Entropy Reinforcement Learning via Energy-Based Normalizing Flow [14.681645502417215]
We introduce a new MaxEnt RL framework modeled using Energy-Based Normalizing Flows (EBFlow).
This framework integrates the policy evaluation steps and the policy improvement steps, resulting in a single objective training process.
Our method achieves superior performance compared to widely-adopted representative baselines.
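For context on the MaxEnt RL objective such frameworks optimize: with a discrete action set, the entropy-augmented (soft) value and the Boltzmann policy it induces take a standard closed form. The sketch below shows only that standard form; EBFlow itself represents these quantities with normalizing flows, which is not reproduced here.

```python
import torch

def soft_value(q_values, alpha=0.2):
    """Soft state value in MaxEnt RL: V(s) = alpha * logsumexp(Q(s,.)/alpha).
    `q_values` holds Q(s, a) over a discrete action set; `alpha` weights
    the entropy bonus."""
    return alpha * torch.logsumexp(q_values / alpha, dim=-1)

def maxent_policy(q_values, alpha=0.2):
    """Optimal MaxEnt policy: pi(a|s) proportional to exp(Q(s,a)/alpha)."""
    return torch.softmax(q_values / alpha, dim=-1)
```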
arXiv Detail & Related papers (2024-05-22T13:26:26Z) - Monte Carlo Tree Search Boosts Reasoning via Iterative Preference Learning [55.96599486604344]
We introduce an approach aimed at enhancing the reasoning capabilities of Large Language Models (LLMs) through an iterative preference learning process.
We use Monte Carlo Tree Search (MCTS) to iteratively collect preference data, utilizing its look-ahead ability to break down instance-level rewards into more granular step-level signals.
The proposed algorithm employs Direct Preference Optimization (DPO) to update the LLM policy using this newly generated step-level preference data.
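The DPO update itself has a standard closed form; a minimal sketch follows, applied here to the step-level preference pairs the abstract describes. Tensor names (`logp_w`, `ref_logp_w`, etc.) are hypothetical placeholders.

```python
import torch.nn.functional as F

def dpo_loss(logp_w, logp_l, ref_logp_w, ref_logp_l, beta=0.1):
    """Direct Preference Optimization on preference pairs.

    logp_* are the policy's log-probs of the preferred (w) and dispreferred
    (l) completions; ref_logp_* come from the frozen reference policy.
    """
    # Each completion's implicit reward is its log-ratio vs. the reference
    margin = beta * ((logp_w - ref_logp_w) - (logp_l - ref_logp_l))
    return -F.logsigmoid(margin).mean()
```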
arXiv Detail & Related papers (2024-05-01T11:10:24Z) - MeanAP-Guided Reinforced Active Learning for Object Detection [34.19741444116433]
This paper introduces MeanAP-Guided Reinforced Active Learning for Object Detection (MAGRAL).
Built upon LSTM architecture, the agent efficiently explores and selects subsequent training instances.
We assess MAGRAL's efficacy across popular benchmarks, PASCAL VOC and MS COCO.
arXiv Detail & Related papers (2023-10-12T14:59:22Z) - The Virtues of Laziness in Model-based RL: A Unified Objective and
Algorithms [37.025378882978714]
We propose a novel approach to addressing two fundamental challenges in Model-based Reinforcement Learning (MBRL).
Our "lazy" method leverages a novel unified objective, Performance Difference via Advantage in Model, to capture the performance difference between the learned policy and expert policy.
We present two no-regret algorithms to optimize the proposed objective, and demonstrate their statistical and computational gains.
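The objective's name points back to the classical performance difference lemma; for reference, the classical statement is below (the paper's variant evaluates the advantage in the learned model, so its exact form may differ).

```latex
% Classical performance difference lemma (discounted, infinite-horizon):
J(\pi) - J(\pi') = \frac{1}{1-\gamma}\,
  \mathbb{E}_{s \sim d^{\pi}}\, \mathbb{E}_{a \sim \pi(\cdot \mid s)}
  \bigl[ A^{\pi'}(s, a) \bigr]
```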
arXiv Detail & Related papers (2023-03-01T17:42:26Z) - Model Predictive Control via On-Policy Imitation Learning [28.96122879515294]
We develop new sample complexity results and performance guarantees for data-driven Model Predictive Control.
Our algorithm uses the structure of constrained linear MPC, and our analysis uses the properties of the explicit MPC solution to theoretically bound the number of online MPC trajectories needed to achieve optimal performance.
arXiv Detail & Related papers (2022-10-17T16:06:06Z) - MACE: An Efficient Model-Agnostic Framework for Counterfactual
Explanation [132.77005365032468]
We propose a novel framework for Model-Agnostic Counterfactual Explanation (MACE).
In our MACE approach, we propose a novel RL-based method for finding good counterfactual examples and a gradient-less descent method for improving proximity.
Experiments on public datasets validate the effectiveness with better validity, sparsity and proximity.
arXiv Detail & Related papers (2022-05-31T04:57:06Z) - On Effective Scheduling of Model-based Reinforcement Learning [53.027698625496015]
In this paper, we first theoretically analyze the role of real data in policy training, which suggests that gradually increasing the ratio of real data yields better performance.
We then propose a framework named AutoMBPO to automatically schedule the real data ratio.
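AutoMBPO learns this schedule automatically; as a minimal illustration of the mechanism being scheduled, the sketch below uses a fixed linear ramp of the real-data ratio instead. All names (`real_buffer`, `model_buffer`, `sample`) are hypothetical.

```python
def real_data_ratio(step, total_steps, start=0.1, end=0.9):
    """Linearly anneal the fraction of real (vs. model-generated)
    transitions per training batch, reflecting the finding that gradually
    increasing the real-data ratio yields better performance."""
    frac = min(step / total_steps, 1.0)
    return start + frac * (end - start)

def mixed_batch(real_buffer, model_buffer, batch_size, ratio):
    """Draw a batch mixing real and model-rollout transitions.
    Assumes buffers expose a list-returning sample(n) method."""
    n_real = int(batch_size * ratio)
    return real_buffer.sample(n_real) + model_buffer.sample(batch_size - n_real)
```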
arXiv Detail & Related papers (2021-11-16T15:24:59Z) - Evaluating model-based planning and planner amortization for continuous
control [79.49319308600228]
We take a hybrid approach, combining model predictive control (MPC) with a learned model and model-free policy learning.
We find that well-tuned model-free agents are strong baselines even for high-DoF control problems.
We show that it is possible to distil a model-based planner into a policy that amortizes the planning without any loss of performance.
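One common recipe for such distillation is to regress a feedforward policy onto the planner's chosen actions at visited states; a minimal sketch under that assumption follows (the paper's exact amortization procedure may differ).

```python
import torch

def distill_planner(policy, optimizer, states, planner_actions):
    """Amortize an MPC planner into a feedforward policy by regressing
    the policy's action onto the planner's action at each visited state."""
    pred = policy(states)                           # policy's proposed actions
    loss = torch.mean((pred - planner_actions) ** 2)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```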
arXiv Detail & Related papers (2021-10-07T12:00:40Z) - MM-KTD: Multiple Model Kalman Temporal Differences for Reinforcement
Learning [36.14516028564416]
This paper proposes an innovative Multiple Model Kalman Temporal Difference (MM-KTD) framework to learn optimal control policies.
An active learning method is proposed to enhance the sampling efficiency of the system.
Experimental results show superiority of the MM-KTD framework in comparison to its state-of-the-art counterparts.
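The single-filter core of Kalman Temporal Differences is compact enough to sketch: treat the weights of a linear value function as the hidden state of a Kalman filter and the reward as a noisy observation of the TD relation. MM-KTD runs multiple such filters with different settings; the sketch below shows one filter only, with hypothetical noise parameters.

```python
import numpy as np

def ktd_update(theta, P, phi_s, phi_s_next, reward,
               gamma=0.99, process_var=1e-4, obs_var=1.0):
    """One Kalman Temporal Difference step for a linear value function
    V(s) = theta^T phi(s): the reward is modeled as a noisy observation
    r ~ (phi(s) - gamma * phi(s'))^T theta of the random-walk weights."""
    P = P + process_var * np.eye(len(theta))  # predict: random-walk weights
    h = phi_s - gamma * phi_s_next            # linear observation vector
    innovation = reward - h @ theta           # TD error
    S = h @ P @ h + obs_var                   # innovation variance
    K = P @ h / S                             # Kalman gain
    theta = theta + K * innovation            # correct the weights
    P = P - np.outer(K, h) @ P                # update weight covariance
    return theta, P
```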
arXiv Detail & Related papers (2020-05-30T06:39:55Z) - Model-Augmented Actor-Critic: Backpropagating through Paths [81.86992776864729]
Current model-based reinforcement learning approaches use the model simply as a learned black-box simulator.
We show how to make more effective use of the model by exploiting its differentiability.
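A minimal sketch of what exploiting differentiability means in practice: roll the policy out inside a differentiable learned model and backpropagate the imagined return through the state path. The names (`model`, `reward_fn`) are hypothetical, and the paper's full method also incorporates critic terminal values, which this sketch omits.

```python
import torch

def pathwise_policy_loss(policy, model, reward_fn, states,
                         horizon=5, gamma=0.99):
    """Policy loss that backpropagates through a differentiable learned
    model: imagined rewards stay connected to the policy parameters via
    the reparameterized actions and the model's state path."""
    s, total, discount = states, 0.0, 1.0
    for _ in range(horizon):
        a = policy(s)            # reparameterized action, keeps gradients
        r = reward_fn(s, a)      # differentiable reward model
        s = model(s, a)          # differentiable dynamics model
        total = total + discount * r.mean()
        discount *= gamma
    return -total                # minimize negative imagined return
```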
arXiv Detail & Related papers (2020-05-16T19:18:10Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.