Bisimulation metric for Model Predictive Control
- URL: http://arxiv.org/abs/2410.04553v1
- Date: Sun, 6 Oct 2024 17:12:10 GMT
- Title: Bisimulation metric for Model Predictive Control
- Authors: Yutaka Shimizu, Masayoshi Tomizuka
- Abstract summary: Bisimulation Metric for Model Predictive Control (BS-MPC) is a novel approach that incorporates bisimulation metric loss in its objective function to directly optimize the encoder.
BS-MPC improves training stability, robustness against input noise, and computational efficiency by reducing training time.
We evaluate BS-MPC on both continuous control and image-based tasks from the DeepMind Control Suite.
- Score: 44.301098448479195
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Model-based reinforcement learning has shown promise for improving sample efficiency and decision-making in complex environments. However, existing methods face challenges in training stability, robustness to noise, and computational efficiency. In this paper, we propose Bisimulation Metric for Model Predictive Control (BS-MPC), a novel approach that incorporates bisimulation metric loss in its objective function to directly optimize the encoder. This time-step-wise direct optimization enables the learned encoder to extract intrinsic information from the original state space while discarding irrelevant details and preventing the gradients and errors from diverging. BS-MPC improves training stability, robustness against input noise, and computational efficiency by reducing training time. We evaluate BS-MPC on both continuous control and image-based tasks from the DeepMind Control Suite, demonstrating superior performance and robustness compared to state-of-the-art baseline methods.
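The abstract does not spell out the bisimulation metric loss itself. The following is a minimal sketch that assumes a DBC-style (Deep Bisimulation for Control) objective, not BS-MPC's exact formulation. For a pair of states (i, j), the encoder's L1 latent distance is regressed onto the target |r_i - r_j| + gamma * W2(P_i, P_j). Here W2 is the 2-Wasserstein distance between the two predicted next-latent distributions, which the sketch models as diagonal Gaussians. The helper names (`gaussian_w2`, `bisimulation_loss`), the pairing of samples through a batch permutation `perm`, and the Gaussian transition model are all hypothetical choices made for illustration.

```python
import numpy as np


def gaussian_w2(mu_a, sigma_a, mu_b, sigma_b):
    """2-Wasserstein distance between diagonal Gaussians, computed row-wise.

    For N(mu_a, diag(sigma_a**2)) and N(mu_b, diag(sigma_b**2)) the squared
    distance is ||mu_a - mu_b||^2 + ||sigma_a - sigma_b||^2.
    """
    return np.sqrt(np.sum((mu_a - mu_b) ** 2 + (sigma_a - sigma_b) ** 2, axis=-1))


def bisimulation_loss(z, reward, mu_next, sigma_next, perm, gamma=0.99):
    """DBC-style bisimulation metric loss (hypothetical helper, not BS-MPC's code).

    z:          (B, D) latent codes phi(s) for a batch of states
    reward:     (B,)   rewards observed at those states
    mu_next:    (B, D) mean of the predicted next-latent Gaussian
    sigma_next: (B, D) std of the predicted next-latent Gaussian
    perm:       (B,)   permutation pairing each sample i with sample perm[i]
    """
    # Distance the encoder currently assigns to each pair (L1 in latent space).
    z_dist = np.sum(np.abs(z - z[perm]), axis=-1)
    # Bisimulation target: reward gap plus discounted transition gap.
    # In an autodiff framework this target would be wrapped in a stop-gradient.
    r_dist = np.abs(reward - reward[perm])
    t_dist = gaussian_w2(mu_next, sigma_next, mu_next[perm], sigma_next[perm])
    target = r_dist + gamma * t_dist
    # Mean squared error between latent distance and target metric.
    return float(np.mean((z_dist - target) ** 2))


# Two states, paired with each other by swapping indices.
z = np.array([[0.0, 0.0], [1.0, 1.0]])
reward = np.array([0.0, 1.0])
mu_next = np.zeros((2, 2))
sigma_next = np.ones((2, 2))
print(bisimulation_loss(z, reward, mu_next, sigma_next, perm=np.array([1, 0])))  # 1.0
```

In training, `perm` would typically be a fresh random permutation of the batch at each step (for example `np.random.default_rng().permutation(B)`), and the loss would be added to the model's other objectives. The loss weighting and the exact distance choices used in BS-MPC may differ from this sketch.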
Related papers
- Prior Constraints-based Reward Model Training for Aligning Large Language Models [58.33118716810208]
This paper proposes a Prior Constraints-based Reward Model (namely PCRM) training method to mitigate this problem.
PCRM incorporates prior constraints, specifically, length ratio and cosine similarity between outputs of each comparison pair, during reward model training to regulate optimization magnitude and control score margins.
Experimental results demonstrate that PCRM significantly improves alignment performance by effectively constraining reward score scaling.
Date: 2024-04-01T07:49:11Z
- PID Control-Based Self-Healing to Improve the Robustness of Large Language Models [23.418411870842178]
Minor perturbations can significantly reduce the performance of well-trained language models.
We construct a computationally efficient self-healing process to correct undesired model behavior.
The proposed PID control-based self-healing is a low-cost framework that improves the robustness of pre-trained large language models.
Date: 2024-03-31T23:46:51Z
- Towards Stable Machine Learning Model Retraining via Slowly Varying Sequences [6.067007470552307]
We propose a methodology for finding sequences of machine learning models that are stable across retraining iterations.
We develop a mixed-integer optimization formulation that is guaranteed to recover optimal models.
Our method shows stronger stability than greedily trained models with a small, controllable sacrifice in predictive power.
Date: 2024-03-28T22:45:38Z
- Towards Continual Learning Desiderata via HSIC-Bottleneck Orthogonalization and Equiangular Embedding [55.107555305760954]
We propose a conceptually simple yet effective method that attributes forgetting to layer-wise parameter overwriting and the resulting decision boundary distortion.
Our method achieves competitive accuracy while requiring no exemplar buffer and a model only 1.02x the size of the base model.
Date: 2024-01-17T09:01:29Z
- Actively Learning Reinforcement Learning: A Stochastic Optimal Control Approach [3.453622106101339]
We propose a framework towards achieving two intertwined objectives: (i) equipping reinforcement learning with active exploration and deliberate information gathering, and (ii) overcoming the computational intractability of the optimal control law.
We approach both objectives by using reinforcement learning to compute the optimal control law.
Unlike methods with a fixed exploration-exploitation balance, the controller applies caution and probing automatically in real time, even after the learning process has terminated.
Date: 2023-09-18T18:05:35Z
- MAPS: A Noise-Robust Progressive Learning Approach for Source-Free Domain Adaptive Keypoint Detection [76.97324120775475]
Existing cross-domain keypoint detection methods require access to the source data during adaptation.
This paper considers source-free domain adaptive keypoint detection, where only the well-trained source model is provided to the target domain.
Date: 2023-02-09T12:06:08Z
- Adaptive Stochastic MPC under Unknown Noise Distribution [19.03553854357296]
We address the MPC problem for linear systems, subject to chance state constraints and hard input constraints, under unknown noise distribution.
We design a distributionally robust and robustly stable benchmark SMPC algorithm for the ideal setting of known noise statistics.
We employ this benchmark controller to derive a novel adaptive SMPC scheme that learns the necessary noise statistics online.
Date: 2022-04-03T16:35:18Z
- Logistic Q-Learning [87.00813469969167]
We propose a new reinforcement learning algorithm derived from a regularized linear-programming formulation of optimal control in MDPs.
The main feature of our algorithm is a convex loss function for policy evaluation that serves as a theoretically sound alternative to the widely used squared Bellman error.
Date: 2020-10-21T17:14:31Z
- Tracking Performance of Online Stochastic Learners [57.14673504239551]
Online algorithms are popular in large-scale learning settings due to their ability to compute updates on the fly, without the need to store and process data in large batches.
When a constant step-size is used, these algorithms also have the ability to adapt to drifts in problem parameters, such as data or model properties, and track the optimal solution with reasonable accuracy.
We establish a link between steady-state performance derived under stationarity assumptions and the tracking performance of online learners under random walk models.
Date: 2020-04-04T14:16:27Z
- Neural Lyapunov Model Predictive Control: Learning Safe Global Controllers from Sub-optimal Examples [4.777323087050061]
In many real-world and industrial applications, an existing control strategy is typically available, for instance, one executed by a human operator.
The objective of this work is to improve upon this unknown, safe but suboptimal policy by learning a new controller that retains safety and stability.
The proposed algorithm alternately learns the terminal cost and updates the MPC parameters according to a stability metric.
Date: 2020-02-21T16:57:38Z