Related papers: Synthesis of Model Predictive Control and Reinforcement Learning: Survey and Classification

Synthesis of Model Predictive Control and Reinforcement Learning: Survey and Classification

URL: http://arxiv.org/abs/2502.02133v1
Date: Tue, 04 Feb 2025 09:06:07 GMT
Title: Synthesis of Model Predictive Control and Reinforcement Learning: Survey and Classification
Authors: Rudolf Reiter, Jasper Hoffmann, Dirk Reinhardt, Florian Messerer, Katrin Baumgärtner, Shamburaj Sawant, Joschka Boedecker, Moritz Diehl, Sebastien Gros,
Abstract summary: This work illuminates the differences, similarities, and fundamentals that allow for different combination algorithms.<n>We focus on the versatile actor-critic RL approach as a basis for our categorization and examine how the online optimization approach of MPC can be used to improve the overall closed-loop performance of a policy.
Score: 5.260523686933724
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: The fields of MPC and RL consider two successful control techniques for Markov decision processes. Both approaches are derived from similar fundamental principles, and both are widely used in practical applications, including robotics, process control, energy systems, and autonomous driving. Despite their similarities, MPC and RL follow distinct paradigms that emerged from diverse communities and different requirements. Various technical discrepancies, particularly the role of an environment model as part of the algorithm, lead to methodologies with nearly complementary advantages. Due to their orthogonal benefits, research interest in combination methods has recently increased significantly, leading to a large and growing set of complex ideas leveraging MPC and RL. This work illuminates the differences, similarities, and fundamentals that allow for different combination algorithms and categorizes existing work accordingly. Particularly, we focus on the versatile actor-critic RL approach as a basis for our categorization and examine how the online optimization approach of MPC can be used to improve the overall closed-loop performance of a policy.

Related papers

A Review of Online Diffusion Policy RL Algorithms for Scalable Robotic Control [21.22244612145334]
Diffusion policies have emerged as a powerful approach for robotic control.<n>Online Diffusion Policy Reinforcement Learning (Online DPRL) algorithms for scalable robotic control systems are studied.
arXiv Detail & Related papers (2026-01-05T05:19:23Z)
Part I: Tricks or Traps? A Deep Dive into RL for LLM Reasoning [53.85659415230589]
This paper systematically reviews widely adoptedReinforcement learning techniques.<n>We present clear guidelines for selecting RL techniques tailored to specific setups.<n>We also reveal that a minimalist combination of two techniques can unlock the learning capability of critic-free policies.
arXiv Detail & Related papers (2025-08-11T17:39:45Z)
Preference Optimization for Combinatorial Optimization Problems [54.87466279363487]
Reinforcement Learning (RL) has emerged as a powerful tool for neural optimization, enabling models learns that solve complex problems without requiring expert knowledge.<n>Despite significant progress, existing RL approaches face challenges such as diminishing reward signals and inefficient exploration in vast action spaces.<n>We propose Preference Optimization, a novel method that transforms quantitative reward signals into qualitative preference signals via statistical comparison modeling.
arXiv Detail & Related papers (2025-05-13T16:47:00Z)
A Survey of Frontiers in LLM Reasoning: Inference Scaling, Learning to Reason, and Agentic Systems [93.8285345915925]
Reasoning is a fundamental cognitive process that enables logical inference, problem-solving, and decision-making. With the rapid advancement of large language models (LLMs), reasoning has emerged as a key capability that distinguishes advanced AI systems. We categorize existing methods along two dimensions: (1) Regimes, which define the stage at which reasoning is achieved; and (2) Architectures, which determine the components involved in the reasoning process.
arXiv Detail & Related papers (2025-04-12T01:27:49Z)
A Comprehensive Survey on Multi-Agent Cooperative Decision-Making: Scenarios, Approaches, Challenges and Perspectives [6.277211882332452]
Multi-agent cooperative decision-making involves multiple agents working together to complete established tasks and achieve specific objectives. These techniques are widely applicable in real-world scenarios such as autonomous driving, drone navigation, disaster rescue, and simulated military confrontations.
arXiv Detail & Related papers (2025-03-17T17:45:46Z)
Efficient and Scalable Deep Reinforcement Learning for Mean Field Control Games [16.62770187749295]
Mean Field Control Games (MFCGs) provide a powerful theoretical framework for analyzing systems of infinitely many interacting agents.<n>This paper presents a scalable deep Reinforcement Learning (RL) approach to approximate equilibrium solutions of MFCGs.
arXiv Detail & Related papers (2024-12-28T02:04:53Z)
Comparison of Model Predictive Control and Proximal Policy Optimization for a 1-DOF Helicopter System [0.7499722271664147]
This study conducts a comparative analysis of Model Predictive Control (MPC) and Proximal Policy Optimization (PPO), a Deep Reinforcement Learning (DRL) algorithm, applied to a Quanser Aero 2 system. PPO excels in rise-time and adaptability, making it a promising approach for applications requiring rapid response and adaptability.
arXiv Detail & Related papers (2024-08-28T08:35:34Z)
Learning Reward and Policy Jointly from Demonstration and Preference Improves Alignment [58.049113055986375]
We develop a single stage approach named Alignment with Integrated Human Feedback (AIHF) to train reward models and the policy.<n>The proposed approach admits a suite of efficient algorithms, which can easily reduce to, and leverage, popular alignment algorithms.<n>We demonstrate the efficiency of the proposed solutions with extensive experiments involving alignment problems in LLMs and robotic control problems in MuJoCo.
arXiv Detail & Related papers (2024-06-11T01:20:53Z)
Towards an Information Theoretic Framework of Context-Based Offline Meta-Reinforcement Learning [48.79569442193824]
We show that COMRL algorithms are essentially optimizing the same mutual information objective between the task variable $M$ and its latent representation $Z$ by implementing various approximate bounds. As demonstrations, we propose a supervised and a self-supervised implementation of $I(Z; M)$, and empirically show that the corresponding optimization algorithms exhibit remarkable generalization across a broad spectrum of RL benchmarks. This work lays the information theoretic foundation for COMRL methods, leading to a better understanding of task representation learning in the context of reinforcement learning.
arXiv Detail & Related papers (2024-02-04T09:58:42Z)
Personalized Reinforcement Learning with a Budget of Policies [9.846353643883443]
Personalization in machine learning (ML) tailors models' decisions to the individual characteristics of users. We propose a novel framework termed represented Markov Decision Processes (r-MDPs) that is designed to balance the need for personalization with the regulatory constraints. In an r-MDP, we cater to a diverse user population, each with unique preferences, through interaction with a small set of representative policies. We develop two deep reinforcement learning algorithms that efficiently solve r-MDPs.
arXiv Detail & Related papers (2024-01-12T11:27:55Z)
Stepsize Learning for Policy Gradient Methods in Contextual Markov Decision Processes [35.889129338603446]
Policy-based algorithms are among the most widely adopted techniques in model-free RL. They tend to struggle when asked to accomplish a series of heterogeneous tasks. We introduce a new formulation, known as meta-MDP, that can be used to solve any hyper parameter selection problem in RL.
arXiv Detail & Related papers (2023-06-13T12:58:12Z)
Enforcing the consensus between Trajectory Optimization and Policy Learning for precise robot control [75.28441662678394]
Reinforcement learning (RL) and trajectory optimization (TO) present strong complementary advantages. We propose several improvements on top of these approaches to learn global control policies quicker.
arXiv Detail & Related papers (2022-09-19T13:32:09Z)
Efficient Model-Based Multi-Agent Mean-Field Reinforcement Learning [89.31889875864599]
We propose an efficient model-based reinforcement learning algorithm for learning in multi-agent systems. Our main theoretical contributions are the first general regret bounds for model-based reinforcement learning for MFC. We provide a practical parametrization of the core optimization problem.
arXiv Detail & Related papers (2021-07-08T18:01:02Z)
Probabilistic Mixture-of-Experts for Efficient Deep Reinforcement Learning [7.020079427649125]
We show that grasping distinguishable skills for some tasks with non-unique optima can be essential for further improving its learning efficiency and performance. We propose a probabilistic mixture-of-experts (PMOE) for multimodal policy, together with a novel gradient estimator for the indifferentiability problem.
arXiv Detail & Related papers (2021-04-19T08:21:56Z)
Imitation Learning from MPC for Quadrupedal Multi-Gait Control [63.617157490920505]
We present a learning algorithm for training a single policy that imitates multiple gaits of a walking robot. We use and extend MPC-Net, which is an Imitation Learning approach guided by Model Predictive Control. We validate our approach on hardware and show that a single learned policy can replace its teacher to control multiple gaits.
arXiv Detail & Related papers (2021-03-26T08:48:53Z)

This list is automatically generated from the titles and abstracts of the papers in this site.