Enforcing the consensus between Trajectory Optimization and Policy Learning for precise robot control
- URL: http://arxiv.org/abs/2209.09006v1
- Date: Mon, 19 Sep 2022 13:32:09 GMT
- Title: Enforcing the consensus between Trajectory Optimization and Policy Learning for precise robot control
- Authors: Quentin Le Lidec, Wilson Jallet, Ivan Laptev, Cordelia Schmid, Justin Carpentier
- Abstract summary: Reinforcement learning (RL) and trajectory optimization (TO) present strong complementary advantages.
We propose several improvements on top of these approaches to learn global control policies more quickly.
- Score: 75.28441662678394
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Reinforcement learning (RL) and trajectory optimization (TO) present strong
complementary advantages. On one hand, RL approaches are able to learn global
control policies directly from data, but generally require large sample sizes
to properly converge towards feasible policies. On the other hand, TO methods
are able to exploit gradient-based information extracted from simulators to
quickly converge towards a locally optimal control trajectory which is only
valid within the vicinity of the solution. Over the past decade, several
approaches have sought to combine the two classes of methods and obtain the
best of both worlds. Following this line of research, we propose several
improvements on top of these approaches to learn global control policies more
quickly, notably by leveraging sensitivity information from TO methods via
Sobolev learning, and by using augmented Lagrangian techniques to enforce
consensus between TO and policy learning. We evaluate the benefits of these
improvements on various classical tasks in robotics through comparison with
existing approaches in the literature.
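The two ingredients named in the abstract can be made concrete. Below is a minimal, illustrative PyTorch sketch, not the authors' implementation: a policy is regressed onto locally optimal controls u* produced by a TO solver, a Sobolev term additionally matches the policy's Jacobian to the solver's sensitivities du*/dx, and an augmented Lagrangian term with multipliers lam and penalty mu enforces the consensus constraint pi(x) = u*. All names (PolicyNet, sobolev_al_loss, w_grad, ...) are hypothetical.

```python
# Minimal sketch (not the authors' code) of Sobolev learning on TO
# sensitivities plus an augmented Lagrangian consensus term. Assumes PyTorch.
import torch
import torch.nn as nn


class PolicyNet(nn.Module):
    """Small MLP policy pi_theta(x) -> u."""

    def __init__(self, state_dim: int, ctrl_dim: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, 64), nn.Tanh(),
            nn.Linear(64, ctrl_dim),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)


def sobolev_al_loss(policy, x, u_star, du_star_dx, lam, mu, w_grad=1.0):
    """Loss over a batch of TO solutions.

    x          : (B, n) states sampled along optimized trajectories
    u_star     : (B, m) locally optimal controls from the TO solver
    du_star_dx : (B, m, n) sensitivities of u* w.r.t. x (solver by-product)
    lam        : (B, m) augmented Lagrangian multipliers for pi(x) = u*
    mu         : scalar penalty weight
    """
    x = x.requires_grad_(True)
    u = policy(x)
    r = u - u_star  # consensus constraint residual pi(x) - u*

    # Sobolev term: match the policy Jacobian to the TO sensitivities,
    # so the policy also reproduces the local feedback behaviour of TO.
    jac = torch.stack(
        [torch.autograd.grad(u[:, j].sum(), x, create_graph=True)[0]
         for j in range(u.shape[1])],
        dim=1,
    )  # (B, m, n)
    sobolev = (jac - du_star_dx).pow(2).mean()

    # Augmented Lagrangian term on the consensus constraint.
    al = (lam * r).sum(-1).mean() + 0.5 * mu * r.pow(2).sum(-1).mean()
    return al + w_grad * sobolev, r.detach()


# Outer loop (standard augmented Lagrangian updates): after each inner
# minimization over theta, set lam <- lam + mu * (pi(x) - u*) and
# optionally increase mu until the consensus residual is small.
```

The multiplier update lets the consensus constraint hold increasingly tightly without driving mu to extreme values, which is the usual motivation for augmented Lagrangian methods over a pure quadratic penalty.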
Related papers
- Joint Demonstration and Preference Learning Improves Policy Alignment with Human Feedback [58.049113055986375]
We develop a single-stage approach named Alignment with Integrated Human Feedback (AIHF) to jointly train the reward model and the policy.
The proposed approach admits a suite of efficient algorithms, which can easily reduce to, and leverage, popular alignment algorithms.
We demonstrate the efficiency of the proposed solutions with extensive experiments involving alignment problems in LLMs and robotic control problems in MuJoCo.
arXiv Detail & Related papers (2024-06-11T01:20:53Z) - How Can LLM Guide RL? A Value-Based Approach [68.55316627400683]
Reinforcement learning (RL) has become the de facto standard practice for sequential decision-making problems by improving future acting policies with feedback.
Recent developments in large language models (LLMs) have showcased impressive capabilities in language understanding and generation, yet they fall short in exploration and self-improvement capabilities.
We develop an algorithm named LINVIT that incorporates LLM guidance as a regularization factor in value-based RL, leading to significant reductions in the amount of data needed for learning (see the sketch after this list).
arXiv Detail & Related papers (2024-02-25T20:07:13Z) - Enabling Efficient, Reliable Real-World Reinforcement Learning with
Approximate Physics-Based Models [10.472792899267365]
We focus on developing efficient and reliable policy optimization strategies for robot learning with real-world data.
In this paper we introduce a novel policy gradient-based policy optimization framework.
We show that our approach can learn precise control strategies reliably and with only minutes of real-world data.
arXiv Detail & Related papers (2023-07-16T22:36:36Z) - Towards a Theoretical Foundation of Policy Optimization for Learning
Control Policies [26.04704565406123]
Gradient-based methods have been widely used for system design and optimization in diverse application domains.
Recently, there has been a renewed interest in studying theoretical properties of these methods in the context of control and reinforcement learning.
This article surveys some of the recent developments on policy optimization, a gradient-based iterative approach for feedback control synthesis.
arXiv Detail & Related papers (2022-10-10T16:13:34Z) - Global Convergence Using Policy Gradient Methods for Model-free
Markovian Jump Linear Quadratic Control [8.98732207994362]
We study the global convergence of gradient-based policy optimization methods for control of discrete-time and model-free Markovian jump linear systems.
We show global convergence of the policy using gradient descent and natural policy gradient methods.
arXiv Detail & Related papers (2021-11-30T09:26:26Z) - Domain Adaptive Person Re-Identification via Coupling Optimization [58.567492812339566]
Domain adaptive person Re-Identification (ReID) is challenging owing to the domain gap and shortage of annotations on target scenarios.
This paper proposes a coupling optimization method comprising the Domain-Invariant Mapping (DIM) method and Global-Local distance Optimization (GLO).
GLO is designed to train the ReID model in an unsupervised setting on the target domain.
arXiv Detail & Related papers (2020-11-06T14:01:03Z) - Localized active learning of Gaussian process state space models [63.97366815968177]
A globally accurate model is not required to achieve good performance in many common control applications.
We propose an active learning strategy for Gaussian process state space models that aims to obtain an accurate model on a bounded subset of the state-action space.
By employing model predictive control, the proposed technique integrates information collected during exploration and adaptively improves its exploration strategy.
arXiv Detail & Related papers (2020-05-04T05:35:02Z) - Lane-Merging Using Policy-based Reinforcement Learning and
Post-Optimization [0.0]
We combine policy-based reinforcement learning with local optimization to draw on the strengths of the two methodologies.
We evaluate the proposed method using lane-change scenarios with a varying number of vehicles.
arXiv Detail & Related papers (2020-03-06T12:57:25Z) - Efficient Deep Reinforcement Learning via Adaptive Policy Transfer [50.51637231309424]
A Policy Transfer Framework (PTF) is proposed to accelerate Reinforcement Learning (RL).
Our framework learns when and which source policy is the best to reuse for the target policy and when to terminate it.
Experimental results show it significantly accelerates the learning process and surpasses state-of-the-art policy transfer methods.
arXiv Detail & Related papers (2020-02-19T07:30:57Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this information and is not responsible for any consequences arising from its use.