Optimization Algorithm for Feedback and Feedforward Policies towards
Robot Control Robust to Sensing Failures
- URL: http://arxiv.org/abs/2104.00385v1
- Date: Thu, 1 Apr 2021 10:41:42 GMT
- Title: Optimization Algorithm for Feedback and Feedforward Policies towards
Robot Control Robust to Sensing Failures
- Authors: Taisuke Kobayashi, Kenta Yoshizawa
- Abstract summary: We propose a new optimization problem for optimizing both the FB/FF policies simultaneously.
In numerical simulations and a robot experiment, we verified that the proposed method can stably optimize the composed policy even with a learning law different from that of traditional RL.
- Score: 1.7970523486905976
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Model-free or learning-based control, in particular reinforcement
learning (RL), is expected to be applied to complex robotic tasks. Traditional
RL requires the policy to be optimized to be state-dependent; that is, the
policy is a kind of feedback (FB) controller. Because such an FB controller
relies on correct state observation, it is sensitive to sensing failures. To
alleviate this drawback of FB controllers, feedback error learning integrates
an FB controller with a feedforward (FF) controller. RL can be improved by
handling the FB/FF policies together, but to the best of our knowledge, a
methodology for learning them in a unified manner has not been developed. In
this paper, we propose a new optimization problem for optimizing both the
FB/FF policies simultaneously. Inspired by control as inference, the
optimization problem minimizes/maximizes the divergences between the
trajectory predicted by the composed policy under a stochastic dynamics model
and the optimal/non-optimal trajectories, respectively. By approximating the
stochastic dynamics model with a variational method, we naturally derive a
regularization between the FB/FF policies. In numerical simulations and a
robot experiment, we verified that the proposed method can stably optimize the
composed policy even with a learning law different from that of traditional
RL. In addition, we demonstrated that the FF policy is robust to sensing
failures and can hold the optimal motion. A supplementary video is also
available on YouTube:
https://youtu.be/zLL4uXIRmrE
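As a rough illustration of the idea (a minimal sketch, not the authors' implementation), the code below composes a state-dependent FB policy with a time-dependent FF policy and keeps only the FF term when sensing fails; the network sizes, the zeroing of the FB term under failure, and the simple L2 penalty standing in for the variationally derived regularization are assumptions made for this example.

```python
import torch
import torch.nn as nn

class ComposedPolicy(nn.Module):
    """Sum of a feedback (state -> action) and a feedforward (time -> action) policy."""
    def __init__(self, state_dim, action_dim, hidden=64):
        super().__init__()
        self.fb = nn.Sequential(nn.Linear(state_dim, hidden), nn.Tanh(),
                                nn.Linear(hidden, action_dim))
        self.ff = nn.Sequential(nn.Linear(1, hidden), nn.Tanh(),
                                nn.Linear(hidden, action_dim))

    def forward(self, state, t, sensing_ok=True):
        u_ff = self.ff(t)
        # When the state observation is unavailable, only the FF term acts,
        # which is what makes the composed controller robust to sensing failures.
        u_fb = self.fb(state) if sensing_ok else torch.zeros_like(u_ff)
        return u_fb + u_ff

def fb_ff_regularizer(policy, state, t, weight=1e-2):
    # Illustrative stand-in for the regularization between the FB/FF policies
    # derived in the paper: penalize disagreement between the two terms.
    return weight * (policy.fb(state) - policy.ff(t)).pow(2).mean()

# Example: one action under normal sensing and one under a sensing failure.
pi = ComposedPolicy(state_dim=4, action_dim=1)
s = torch.randn(1, 4)
t = torch.tensor([[0.5]])
u_normal = pi(s, t, sensing_ok=True)
u_failure = pi(s, t, sensing_ok=False)  # FF policy alone holds the nominal motion
```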
Related papers
- Autonomous Vehicle Controllers From End-to-End Differentiable Simulation [60.05963742334746]
We propose a differentiable simulator and design an analytic policy gradients (APG) approach to training AV controllers.
Our proposed framework brings the differentiable simulator into an end-to-end training loop, where gradients of environment dynamics serve as a useful prior to help the agent learn a more grounded policy.
We find significant improvements in performance and robustness to noise in the dynamics, as well as overall more intuitive human-like handling.
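A minimal sketch of the analytic-policy-gradient idea summarized above: roll out a differentiable simulator and backpropagate the cost through the dynamics into the policy. The dynamics, cost, and horizon here are placeholder assumptions, not the paper's simulator.

```python
import torch

def apg_loss(policy, dynamics, x0, horizon=50):
    """Accumulate a cost along a rollout whose gradient flows through the
    differentiable environment dynamics back into the policy parameters."""
    x, loss = x0, torch.tensor(0.0)
    for _ in range(horizon):
        u = policy(x)
        x = dynamics(x, u)                                     # differentiable step
        loss = loss + (x ** 2).sum() + 1e-3 * (u ** 2).sum()   # placeholder cost
    return loss

# Toy usage with a linear policy and a hand-written differentiable dynamics.
policy = torch.nn.Linear(2, 1)
dynamics = lambda x, u: x + 0.1 * torch.cat([x[:, 1:], u], dim=-1)
x0 = torch.randn(1, 2)
loss = apg_loss(policy, dynamics, x0, horizon=10)
loss.backward()  # simulator gradients drive the policy update
```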
arXiv Detail & Related papers (2024-09-12T11:50:06Z)
- Comparison of Model Predictive Control and Proximal Policy Optimization for a 1-DOF Helicopter System [0.7499722271664147]
This study conducts a comparative analysis of Model Predictive Control (MPC) and Proximal Policy Optimization (PPO), a Deep Reinforcement Learning (DRL) algorithm, applied to a Quanser Aero 2 system.
PPO excels in rise-time and adaptability, making it a promising approach for applications requiring rapid response and adaptability.
arXiv Detail & Related papers (2024-08-28T08:35:34Z)
- Stochastic Optimal Control Matching [53.156277491861985]
Our work introduces Stochastic Optimal Control Matching (SOCM), a novel Iterative Diffusion Optimization (IDO) technique for stochastic optimal control.
The control is learned via a least squares problem by trying to fit a matching vector field.
Experimentally, our algorithm achieves lower error than all the existing IDO techniques for optimal control.
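A toy illustration of the least-squares flavor mentioned above: regress a parametric vector field onto target samples. The data, model, and target field are assumptions for illustration, not SOCM's actual matching construction.

```python
import torch
import torch.nn as nn

# Hypothetical data: states x_i paired with target vector-field values v_i.
x = torch.randn(256, 2)
v = torch.stack([-x[:, 1], x[:, 0]], dim=1)  # example target field (a rotation)

field = nn.Sequential(nn.Linear(2, 64), nn.Tanh(), nn.Linear(64, 2))
opt = torch.optim.Adam(field.parameters(), lr=1e-3)

for _ in range(200):
    opt.zero_grad()
    loss = ((field(x) - v) ** 2).mean()  # least-squares matching of the field
    loss.backward()
    opt.step()
```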
arXiv Detail & Related papers (2023-12-04T16:49:43Z)
- Policy Search for Model Predictive Control with Application to Agile Drone Flight [56.24908013905407]
We propose a policy-search-for-model-predictive-control framework that combines policy search with MPC.
Specifically, we formulate the MPC as a parameterized controller, where the hard-to-optimize decision variables are represented as high-level policies.
Experiments show that our controller achieves robust and real-time control performance in both simulation and the real world.
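One way to picture the formulation above, where a high-level policy supplies the hard-to-optimize decision variables to an MPC solver; the observation size, single decision variable, and interface are illustrative assumptions.

```python
import torch
import torch.nn as nn

class HighLevelPolicy(nn.Module):
    """Maps an observation to the hard-to-optimize decision variables that are
    then handed to an MPC solver as fixed parameters of its problem."""
    def __init__(self, obs_dim, n_decision_vars, hidden=32):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(obs_dim, hidden), nn.Tanh(),
                                 nn.Linear(hidden, n_decision_vars), nn.Softplus())

    def forward(self, obs):
        return self.net(obs)

# Usage sketch: the decision variables z parameterize the MPC problem, and
# policy search tunes the network from the resulting closed-loop performance.
pi_hi = HighLevelPolicy(obs_dim=8, n_decision_vars=1)
z = pi_hi(torch.randn(1, 8))
```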
arXiv Detail & Related papers (2021-12-07T17:39:24Z)
- Robust Value Iteration for Continuous Control Tasks [99.00362538261972]
When transferring a control policy from simulation to a physical system, the policy needs to be robust to variations in the dynamics to perform well.
We present Robust Fitted Value Iteration, which uses dynamic programming to compute the optimal value function on the compact state domain.
We show that robust value iteration is more robust than deep reinforcement learning algorithms and the non-robust version of the algorithm.
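A bare-bones fitted value iteration loop over a discretized compact state domain with a worst-case minimum over perturbed dynamics parameters; the grid, toy dynamics, reward, and perturbation set are simplifying assumptions, not the paper's algorithm.

```python
import numpy as np

# Discretize a 1-D compact state domain and a small action set (assumptions).
states = np.linspace(-1.0, 1.0, 101)
actions = np.array([-0.1, 0.0, 0.1])
params = np.array([0.8, 1.0, 1.2])        # perturbed dynamics parameters
gamma, V = 0.95, np.zeros_like(states)

def step(s, a, p):
    return np.clip(p * s + a, -1.0, 1.0)  # toy dynamics, kept on the domain

for _ in range(200):                       # dynamic programming sweeps
    V_new = np.empty_like(V)
    for i, s in enumerate(states):
        # max over actions of the worst case (min) over dynamics parameters
        q = [min(-s**2 - a**2 + gamma * np.interp(step(s, a, p), states, V)
                 for p in params) for a in actions]
        V_new[i] = max(q)
    V = V_new
```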
arXiv Detail & Related papers (2021-05-25T19:48:35Z)
- Pareto Deterministic Policy Gradients and Its Application in 5G Massive MIMO Networks [32.099949375036495]
We consider jointly optimizing cell load balance and network throughput via a reinforcement learning (RL) approach.
Our rationale behind using RL is to circumvent the challenges of analytically modeling user mobility and network dynamics.
To accomplish this joint optimization, we integrate vector rewards into the RL value network and conduct RL action via a separate policy network.
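A minimal sketch of the actor-critic split described above, with a critic that outputs one value per reward component (here, throughput and load balance); the dimensions and architectures are assumptions.

```python
import torch
import torch.nn as nn

n_obs, n_act, n_rewards = 16, 4, 2   # e.g., throughput and load-balance objectives

# Separate policy (actor) network producing the RL action.
actor = nn.Sequential(nn.Linear(n_obs, 64), nn.ReLU(),
                      nn.Linear(64, n_act), nn.Tanh())

# Value (critic) network taking state and action and emitting a vector of
# values, one per reward component, instead of a single scalar.
critic = nn.Sequential(nn.Linear(n_obs + n_act, 64), nn.ReLU(),
                       nn.Linear(64, n_rewards))

obs = torch.randn(1, n_obs)
act = actor(obs)
q_vec = critic(torch.cat([obs, act], dim=-1))  # shape: (1, n_rewards)
```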
arXiv Detail & Related papers (2020-12-02T15:35:35Z)
- PFPN: Continuous Control of Physically Simulated Characters using Particle Filtering Policy Network [0.9137554315375919]
We propose a framework that considers a particle-based action policy as a substitute for Gaussian policies.
We demonstrate the applicability of our approach on various motion capture imitation tasks.
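A toy version of a particle-based action policy for one action dimension: the network outputs weights over a fixed bank of particles (candidate action values) and the action is sampled from that categorical distribution. The particle count, placement, and lack of particle adaptation are simplifying assumptions.

```python
import torch
import torch.nn as nn

class ParticlePolicy(nn.Module):
    """Action distribution represented by weights over fixed particles instead
    of a Gaussian mean/variance."""
    def __init__(self, obs_dim, n_particles=32):
        super().__init__()
        self.particles = nn.Parameter(torch.linspace(-1.0, 1.0, n_particles),
                                      requires_grad=False)
        self.logits = nn.Sequential(nn.Linear(obs_dim, 64), nn.Tanh(),
                                    nn.Linear(64, n_particles))

    def forward(self, obs):
        dist = torch.distributions.Categorical(logits=self.logits(obs))
        idx = dist.sample()
        return self.particles[idx], dist.log_prob(idx)

pi = ParticlePolicy(obs_dim=8)
action, logp = pi(torch.randn(1, 8))
```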
arXiv Detail & Related papers (2020-03-16T00:35:36Z)
- Guided Constrained Policy Optimization for Dynamic Quadrupedal Robot Locomotion [78.46388769788405]
We introduce guided constrained policy optimization (GCPO), an RL framework based upon our implementation of constrained proximal policy optimization (CPPO).
We show that guided constrained RL offers faster convergence close to the desired optimum resulting in an optimal, yet physically feasible, robotic control behavior without the need for precise reward function tuning.
arXiv Detail & Related papers (2020-02-22T10:15:53Z)
- Information Theoretic Model Predictive Q-Learning [64.74041985237105]
We present a novel theoretical connection between information theoretic MPC and entropy regularized RL.
We develop a Q-learning algorithm that can leverage biased models.
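For reference, the standard entropy-regularized (soft) Q-learning target that underlies the MPC/entropy-regularized-RL connection mentioned above; the temperature value and discrete-action setting are assumptions for illustration.

```python
import torch

def soft_bellman_target(reward, next_q, gamma=0.99, alpha=0.1):
    """Soft Q-learning target over a discrete action set:
    r + gamma * alpha * logsumexp(Q(s', .) / alpha)."""
    soft_value = alpha * torch.logsumexp(next_q / alpha, dim=-1)
    return reward + gamma * soft_value

# Example with three next-state action values.
target = soft_bellman_target(torch.tensor(1.0), torch.tensor([0.2, 0.5, -0.1]))
```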
arXiv Detail & Related papers (2019-12-31T00:29:22Z)