Local Policy Optimization for Trajectory-Centric Reinforcement Learning
- URL: http://arxiv.org/abs/2001.08092v1
- Date: Wed, 22 Jan 2020 15:56:00 GMT
- Title: Local Policy Optimization for Trajectory-Centric Reinforcement Learning
- Authors: Patrik Kolaric, Devesh K. Jha, Arvind U. Raghunathan, Frank L. Lewis,
Mouhacine Benosman, Diego Romeres and Daniel Nikovski
- Abstract summary: Many robotic manipulation tasks are trajectory-centric and thus do not require a global model or policy.
We present a method for simultaneous trajectory and local stabilizing policy optimization to generate local policies for trajectory-centric model-based reinforcement learning.
- Score: 31.495672846638346
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The goal of this paper is to present a method for simultaneous
trajectory and local stabilizing policy optimization to generate local policies
for trajectory-centric model-based reinforcement learning (MBRL). This is
motivated by the fact that global policy optimization for non-linear systems
can be very challenging both algorithmically and numerically. However, many
robotic manipulation tasks are trajectory-centric and thus do not require a
global model or policy. Because of inaccuracies in the learned model estimates,
open-loop trajectory optimization often performs very poorly when deployed on
the real system. Motivated by these problems, we formulate trajectory
optimization and local policy synthesis as a single optimization problem, which
is then solved simultaneously as an instance of nonlinear programming. We
provide analysis of the proposed technique, as well as its achieved
performance, under simplifying assumptions.
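As a rough illustration of this formulation (a sketch under simplifying assumptions, not the authors' implementation), the following jointly optimizes an open-loop control sequence and time-varying feedback gains as one nonlinear program for a toy pendulum-like system; the dynamics, cost weights, and sampled-deviation penalty are all illustrative assumptions.

```python
# Illustrative sketch: simultaneous trajectory and local-policy optimization
# posed as a single nonlinear program (toy system; not the paper's code).
import numpy as np
from scipy.optimize import minimize

T, n, m, dt = 20, 2, 1, 0.05
x0 = np.array([np.pi, 0.0])          # start far from the target state x = 0

def f(x, u):                         # toy pendulum-like dynamics (assumed known)
    th, om = x
    return np.array([th + dt * om,
                     om + dt * (-9.81 * np.sin(th) + u[0])])

def unpack(z):                       # decision variables: controls and gains
    return z[:T * m].reshape(T, m), z[T * m:].reshape(T, m, n)

def cost(z, sigma=0.1, n_samples=8):
    u, K = unpack(z)
    rng = np.random.default_rng(0)   # fixed samples keep the NLP deterministic
    xs = [x0]
    for t in range(T):               # nominal (open-loop) rollout
        xs.append(f(xs[-1], u[t]))
    xbar = np.array(xs)
    total = sum(x @ x + 0.01 * (u[t] @ u[t]) for t, x in enumerate(xbar[:-1]))
    total += 10.0 * xbar[-1] @ xbar[-1]
    for _ in range(n_samples):       # perturbed rollouts under the local policy
        x = x0 + sigma * rng.standard_normal(n)
        for t in range(T):
            v = u[t] + K[t] @ (x - xbar[t])   # u = ubar + K (x - xbar)
            x = f(x, v)
            total += 0.1 * (x - xbar[t + 1]) @ (x - xbar[t + 1])
    return total

z0 = 1e-3 * np.random.default_rng(1).standard_normal(T * (m + m * n))
res = minimize(cost, z0, method="L-BFGS-B", options={"maxiter": 200})
u_opt, K_opt = unpack(res.x)         # nominal controls and stabilizing gains
```

Optimizing the gains together with the trajectory, rather than adding feedback after the fact, is what couples the two subproblems into a single program.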
Related papers
- Generalization Bounds of Surrogate Policies for Combinatorial Optimization Problems [61.580419063416734]
A recent stream of structured learning approaches has improved the practical state of the art for a range of optimization problems.
The key idea is to exploit the statistical distribution over instances instead of dealing with instances separately.
In this article, we investigate methods that smooth the risk by perturbing the policy, which eases optimization and reduces the generalization error.
arXiv Detail & Related papers (2024-07-24T12:00:30Z)
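As a toy illustration of the smoothing-by-perturbation idea above (a generic perturbed-argmax construction assumed for this sketch, not the paper's setting): perturbing the policy's scores turns a piecewise-constant risk into a smooth one.

```python
# Sketch: smoothing a piecewise-constant risk by perturbing the policy.
import numpy as np

rng = np.random.default_rng(0)

def smoothed_risk(theta, loss, sigma=0.5, n=1000):
    # E_Z[loss(argmax(theta + sigma * Z))] is smooth in theta,
    # while loss(argmax(theta)) is piecewise constant.
    z = rng.standard_normal((n, theta.size))
    picks = np.argmax(theta + sigma * z, axis=1)
    return np.mean([loss(p) for p in picks])

loss = lambda k: float(k != 2)       # 0/1 loss; this instance's correct item is 2
theta = np.array([0.1, 0.0, 0.2, -0.3])
print(smoothed_risk(theta, loss))    # Monte-Carlo estimate of the smoothed risk
```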
- Towards Efficient Exact Optimization of Language Model Alignment [93.39181634597877]
Direct preference optimization (DPO) was proposed to directly optimize the policy from preference data.
We show that DPO, derived from the optimal solution of the problem, leads to a compromised mean-seeking approximation of the optimal solution in practice.
We propose efficient exact optimization (EXO) of the alignment objective.
arXiv Detail & Related papers (2024-02-01T18:51:54Z)
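For context on the summary above, here is a minimal sketch of the standard DPO loss it refers to, evaluated on precomputed sequence log-probabilities; the numbers are made up, and this is the DPO baseline rather than the proposed EXO method.

```python
# Sketch: the standard DPO loss on a (chosen, rejected) preference pair.
import numpy as np

def dpo_loss(logp_w, logp_l, ref_logp_w, ref_logp_l, beta=0.1):
    # margin of the policy over a frozen reference on preferred vs dispreferred
    margin = beta * ((logp_w - ref_logp_w) - (logp_l - ref_logp_l))
    return -np.log(1.0 / (1.0 + np.exp(-margin)))   # -log(sigmoid(margin))

print(dpo_loss(-12.3, -15.0, -13.0, -14.2))         # made-up log-probabilities
```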
- Analyzing and Enhancing the Backward-Pass Convergence of Unrolled Optimization [50.38518771642365]
The integration of constrained optimization models as components in deep networks has led to promising advances on many specialized learning tasks.
A central challenge in this setting is backpropagation through the solution of an optimization problem, which often lacks a closed form.
This paper provides theoretical insights into the backward pass of unrolled optimization, showing that it is equivalent to the solution of a linear system by a particular iterative method.
A system called Folded Optimization is proposed to construct more efficient backpropagation rules from unrolled solver implementations.
arXiv Detail & Related papers (2023-12-28T23:15:18Z)
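The stated equivalence can be checked numerically on a toy quadratic (an assumption for this sketch): the Jacobian accumulated by unrolling gradient descent converges to the solution of a linear system.

```python
# Sketch: the backward pass of unrolled gradient descent on
# f(x) = 0.5 x^T A x - theta^T x converges to J with A J = I.
import numpy as np

A = np.array([[3.0, 1.0], [1.0, 2.0]])
alpha, steps = 0.1, 200

J = np.zeros((2, 2))                  # accumulates dx_k/dtheta
for _ in range(steps):
    # from the update x_{k+1} = x_k - alpha * (A x_k - theta)
    J = (np.eye(2) - alpha * A) @ J + alpha * np.eye(2)

print(J)                              # unrolled backward pass
print(np.linalg.inv(A))               # direct solution of the linear system
```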
- Acceleration in Policy Optimization [50.323182853069184]
We work towards a unifying paradigm for accelerating policy optimization methods in reinforcement learning (RL) by integrating foresight into the policy improvement step via optimistic and adaptive updates.
We define optimism as predictive modelling of the future behavior of a policy, and adaptivity as taking immediate and anticipatory corrective actions to mitigate errors from overshooting predictions or delayed responses to change.
We design an optimistic policy gradient algorithm, adaptive via meta-gradient learning, and empirically highlight several design choices pertaining to acceleration, in an illustrative task.
arXiv Detail & Related papers (2023-06-18T15:50:57Z)
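The paper's algorithm is adaptive via meta-gradients, but the underlying notion of optimism can be illustrated with a generic extrapolated-gradient step (an assumption for this sketch, not the paper's method):

```python
# Sketch: an optimistic (extrapolated) gradient step predicts the next
# gradient from the last two, a simple form of "foresight".
import numpy as np

def grad(theta):                       # gradient of a toy concave objective
    return -(theta - 3.0)              # maximizes -(theta - 3)^2 / 2

theta, g_prev = 0.0, 0.0
for _ in range(50):
    g = grad(theta)
    theta += 0.1 * (2.0 * g - g_prev)  # optimistic update
    g_prev = g
print(theta)                           # approaches the optimum at 3.0
```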
- The Power of Learned Locally Linear Models for Nonlinear Policy Optimization [26.45568696453259]
This paper conducts a rigorous analysis of a simplified variant of this strategy for general nonlinear systems.
We analyze an algorithm which iterates between estimating local linear models of nonlinear system dynamics and performing $\mathtt{iLQR}$-like policy updates.
arXiv Detail & Related papers (2023-05-16T17:13:00Z)
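A minimal sketch of one such iteration under toy assumed dynamics: fit a local linear model by least squares, then compute a stabilizing gain for it (plain LQR here as a simplification of the paper's $\mathtt{iLQR}$-like update):

```python
# Sketch: estimate a local linear model x' ~ A x + B u, then do an LQR update.
import numpy as np

rng = np.random.default_rng(0)

def true_dynamics(x, u):              # unknown to the learner; queried only
    return np.array([x[0] + 0.1 * x[1],
                     x[1] + 0.1 * (np.sin(x[0]) + u[0])])

X, U, Xn = [], [], []                 # transitions near the operating point
for _ in range(200):
    x, u = 0.1 * rng.standard_normal(2), 0.1 * rng.standard_normal(1)
    X.append(x); U.append(u); Xn.append(true_dynamics(x, u))
Phi = np.hstack([np.array(X), np.array(U)])
W, *_ = np.linalg.lstsq(Phi, np.array(Xn), rcond=None)
A, B = W[:2].T, W[2:].T               # fitted local model

Q, R, P = np.eye(2), 0.1 * np.eye(1), np.eye(2)
for _ in range(100):                  # Riccati recursion for the fitted model
    K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
    P = Q + A.T @ P @ (A - B @ K)
print(K)                              # local feedback u = -K x
```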
- Convex Optimization-based Policy Adaptation to Compensate for Distributional Shifts [0.991395455012393]
We show that we can learn policies that track the optimal trajectory with much better error performance and faster computation times.
We demonstrate the efficacy of our approach on tracking an optimal path using a Dubins car model, and on collision avoidance using both a linear and a nonlinear model for adaptive cruise control.
arXiv Detail & Related papers (2023-04-05T09:26:59Z)
- Backpropagation of Unrolled Solvers with Folded Optimization [55.04219793298687]
The integration of constrained optimization models as components in deep networks has led to promising advances on many specialized learning tasks.
One typical strategy is algorithm unrolling, which relies on automatic differentiation through the operations of an iterative solver.
This paper provides theoretical insights into the backward pass of unrolled optimization, leading to a system for generating efficiently solvable analytical models of backpropagation.
arXiv Detail & Related papers (2023-01-28T01:50:42Z)
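As background for the summary above, plain algorithm unrolling can be sketched in a few lines (PyTorch is an assumption here); folded optimization then replaces this tape-based backward pass with a cheaper analytical rule.

```python
# Sketch: differentiate through an unrolled iterative solver with autodiff.
import torch

theta = torch.tensor([1.0, -2.0], requires_grad=True)
A = torch.tensor([[3.0, 1.0], [1.0, 2.0]])

x = torch.zeros(2)
for _ in range(200):                  # every iteration stays on the autodiff tape
    x = x - 0.1 * (A @ x - theta)     # gradient step on 0.5 x^T A x - theta^T x

x.sum().backward()                    # backprop through all 200 iterations
print(theta.grad)                     # ~ row sums of A^{-1}
```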
- An Actor-Critic Method for Simulation-Based Optimization [6.261751912603047]
We focus on a simulation-based optimization problem of choosing the best design from the feasible space.
We formulate the sampling process as a policy search problem and give a solution from the perspective of reinforcement learning (RL).
Experiments are designed to validate the effectiveness of the proposed algorithms.
arXiv Detail & Related papers (2021-10-31T09:04:23Z)
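A hedged sketch of the general formulation above, using a REINFORCE-style actor with a running-average baseline as critic (the paper's actual algorithm may differ): sampling designs is treated as a stochastic policy, improved from noisy simulation returns.

```python
# Sketch: design selection as policy search with an actor and a simple critic.
import numpy as np

rng = np.random.default_rng(0)
true_value = np.array([0.2, 0.5, 0.9, 0.4])        # unknown simulator means

logits, baseline = np.zeros(4), 0.0
for _ in range(2000):
    p = np.exp(logits - logits.max()); p /= p.sum()
    a = rng.choice(4, p=p)                          # sample a design
    r = true_value[a] + 0.1 * rng.standard_normal() # one simulation run
    baseline += 0.01 * (r - baseline)               # critic: running value
    g = -p; g[a] += 1.0                             # d log pi(a) / d logits
    logits += 0.1 * (r - baseline) * g              # actor update
print(np.argmax(logits))                            # best design (index 2)
```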
- Learning to Optimize Under Constraints with Unsupervised Deep Neural Networks [0.0]
We propose a machine learning (ML) method to learn how to solve a generic constrained continuous optimization problem.
Specifically, we propose an unsupervised deep learning (DL) solution for solving constrained optimization problems in real time.
arXiv Detail & Related papers (2021-01-04T02:58:37Z)
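A minimal sketch of the unsupervised idea on an assumed toy problem (minimize ||x - c||^2 subject to sum(x) = 1): train a network to map instances to solutions by minimizing the objective plus a constraint penalty, with no labeled solutions (PyTorch assumed).

```python
# Sketch: unsupervised training for a parametric constrained problem.
import torch

net = torch.nn.Sequential(torch.nn.Linear(3, 32), torch.nn.ReLU(),
                          torch.nn.Linear(32, 3))
opt = torch.optim.Adam(net.parameters(), lr=1e-3)

for _ in range(2000):
    c = torch.rand(64, 3)                 # a batch of problem instances
    x = net(c)                            # candidate solutions
    obj = ((x - c) ** 2).sum(dim=1)       # objective, no labels needed
    viol = (x.sum(dim=1) - 1.0) ** 2      # penalized equality constraint
    loss = (obj + 10.0 * viol).mean()
    opt.zero_grad(); loss.backward(); opt.step()

c = torch.tensor([[0.2, 0.3, 0.9]])
print(net(c), net(c).sum())               # output should nearly satisfy sum = 1
```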
- Optimizing Wireless Systems Using Unsupervised and Reinforced-Unsupervised Deep Learning [96.01176486957226]
Resource allocation and transceivers in wireless networks are usually designed by solving optimization problems.
In this article, we introduce unsupervised and reinforced-unsupervised learning frameworks for solving both variable and functional optimization problems.
arXiv Detail & Related papers (2020-01-03T11:01:52Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the content (including all information) and is not responsible for any consequences of its use.