Convex Optimization-based Policy Adaptation to Compensate for
Distributional Shifts
- URL: http://arxiv.org/abs/2304.02324v1
- Date: Wed, 5 Apr 2023 09:26:59 GMT
- Title: Convex Optimization-based Policy Adaptation to Compensate for
Distributional Shifts
- Authors: Navid Hashemi, Justin Ruths, Jyotirmoy V. Deshmukh
- Abstract summary: We show that we can learn policies that track the optimal trajectory with much lower error and faster computation times.
We demonstrate the efficacy of our approach on tracking an optimal path using a Dubins car model, and on collision avoidance using both linear and nonlinear models for adaptive cruise control.
- Score: 0.991395455012393
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Many real-world systems involve physical components or operating
environments with highly nonlinear and uncertain dynamics. A number of
different control algorithms can be used to design optimal controllers for such
systems, assuming a reasonably high-fidelity model of the actual system.
However, the assumptions made on the stochastic dynamics of the model when
designing the optimal controller may no longer be valid when the system is
deployed in the real world. The problem addressed by this paper is the
following: given an optimal trajectory obtained by solving a control problem in
the training environment, how do we ensure that the real-world system
trajectory tracks this optimal trajectory with minimal error in the deployment
environment? In other words, we want to learn how to adapt a trained optimal
policy to distribution shifts in the environment. Distribution
shifts are problematic in safety-critical systems, where a trained policy may
lead to unsafe outcomes during deployment. We show that this problem can be
cast as a nonlinear optimization problem that could be solved using heuristic
methods such as particle swarm optimization (PSO). However, if we instead
consider a convex relaxation of this problem, we can learn policies that track
the optimal trajectory with much lower error and faster computation times. We
demonstrate the efficacy of our approach on tracking an optimal path using a
Dubins car model, and on collision avoidance using both linear and nonlinear
models for adaptive cruise control.
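To make the convex-relaxation idea concrete, below is a minimal, illustrative sketch in Python using cvxpy. It is not the paper's formulation: the double-integrator "adaptive cruise control" surrogate, the horizon, the shifted dynamics matrices, and the cost weights are all hypothetical, chosen only to show how tracking a training-time optimal trajectory under shifted linear dynamics reduces to a convex quadratic program.

```python
# Illustrative sketch only (hypothetical model, not the paper's formulation):
# track a training-time reference trajectory under shifted linear dynamics
# by solving a convex QP over the deployment inputs.
import cvxpy as cp
import numpy as np

T, dt = 30, 0.1

# Nominal (training-environment) dynamics: position/velocity double integrator.
A_train = np.array([[1.0, dt], [0.0, 1.0]])
B_train = np.array([[0.0], [dt]])

# Deployment dynamics after a distribution shift (extra drag, weaker actuator).
A_deploy = np.array([[1.0, dt], [0.0, 0.97]])
B_deploy = np.array([[0.0], [0.8 * dt]])

# "Optimal" reference trajectory computed in the training environment
# (accelerate for one second, then cruise).
x_ref = np.zeros((2, T + 1))
for t in range(T):
    u_ref = 1.0 if t < 10 else 0.0
    x_ref[:, t + 1] = A_train @ x_ref[:, t] + B_train[:, 0] * u_ref

# Convex QP: choose deployment inputs so the deployed trajectory tracks
# the training-time reference as closely as possible.
x = cp.Variable((2, T + 1))
u = cp.Variable((1, T))
cost = cp.sum_squares(x - x_ref) + 1e-2 * cp.sum_squares(u)
constraints = [x[:, 0] == x_ref[:, 0], cp.abs(u) <= 2.0]
for t in range(T):
    constraints.append(x[:, t + 1] == A_deploy @ x[:, t] + B_deploy @ u[:, t])

cp.Problem(cp.Minimize(cost), constraints).solve()
print("mean tracking error:", np.mean(np.linalg.norm(x.value - x_ref, axis=0)))
```

In this toy setting the QP recovers inputs that keep the deployed trajectory close to the training-time reference; a heuristic such as PSO could be run on the same objective, but the convex program is solved to global optimality by standard QP solvers, which mirrors the error and runtime advantage the abstract claims for the convex relaxation.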
Related papers
- Optimal Exploration for Model-Based RL in Nonlinear Systems [14.540210895533937]
Learning to control unknown nonlinear dynamical systems is a fundamental problem in reinforcement learning and control theory.
We develop an algorithm able to efficiently explore the system to reduce uncertainty in a task-dependent metric.
Our algorithm relies on a general reduction from policy optimization to optimal experiment design in arbitrary systems, and may be of independent interest.
arXiv Detail & Related papers (2023-06-15T15:47:50Z)
- Neural ODEs as Feedback Policies for Nonlinear Optimal Control [1.8514606155611764]
We use Neural ordinary differential equations (Neural ODEs) to model continuous time dynamics as differential equations parametrized with neural networks.
We propose the use of a neural control policy posed as a Neural ODE to solve general nonlinear optimal control problems (see the illustrative rollout sketch after this list).
arXiv Detail & Related papers (2022-10-20T13:19:26Z)
- Safe and Efficient Model-free Adaptive Control via Bayesian Optimization [39.962395119933596]
We propose a purely data-driven, model-free approach for adaptive control.
However, tuning low-level controllers based solely on system data raises concerns about the safety and computational performance of the underlying algorithm.
We numerically demonstrate for several types of disturbances that our approach is sample efficient, outperforms constrained Bayesian optimization in terms of safety, and achieves the performance optima computed by grid evaluation.
arXiv Detail & Related papers (2021-01-19T19:15:00Z)
- Robust Optimal Transport with Applications in Generative Modeling and Domain Adaptation [120.69747175899421]
Optimal Transport (OT) distances such as the Wasserstein distance have been used in several areas, including GANs and domain adaptation.
We propose a computationally-efficient dual form of the robust OT optimization that is amenable to modern deep learning applications.
Our approach can train state-of-the-art GAN models on noisy datasets corrupted with outlier distributions.
arXiv Detail & Related papers (2020-10-12T17:13:40Z)
- Chance-Constrained Trajectory Optimization for Safe Exploration and Learning of Nonlinear Systems [81.7983463275447]
Learning-based control algorithms require data collection with abundant supervision for training.
We present a new approach for optimal motion planning with safe exploration that integrates chance-constrained optimal control with dynamics learning and feedback control.
arXiv Detail & Related papers (2020-05-09T05:57:43Z)
- Adaptive Control and Regret Minimization in Linear Quadratic Gaussian (LQG) Setting [91.43582419264763]
We propose LqgOpt, a novel reinforcement learning algorithm based on the principle of optimism in the face of uncertainty.
LqgOpt efficiently explores the system dynamics, estimates the model parameters up to their confidence interval, and deploys the controller of the most optimistic model.
arXiv Detail & Related papers (2020-03-12T19:56:38Z)
- Guided Constrained Policy Optimization for Dynamic Quadrupedal Robot Locomotion [78.46388769788405]
We introduce guided constrained policy optimization (GCPO), an RL framework based upon our implementation of constrained policy optimization (CPPO).
We show that guided constrained RL offers faster convergence close to the desired optimum, resulting in optimal yet physically feasible robotic control behavior without the need for precise reward function tuning.
arXiv Detail & Related papers (2020-02-22T10:15:53Z)
- Local Policy Optimization for Trajectory-Centric Reinforcement Learning [31.495672846638346]
Many robotic manipulation tasks are trajectory-centric, and thus do not require a global model or policy.
We present a method for simultaneous trajectory and local stabilizing policy optimization to generate local policies for trajectory-centric model-based reinforcement learning.
arXiv Detail & Related papers (2020-01-22T15:56:00Z)
- Optimizing Wireless Systems Using Unsupervised and Reinforced-Unsupervised Deep Learning [96.01176486957226]
Resource allocation schemes and transceivers in wireless networks are usually designed by solving optimization problems.
In this article, we introduce unsupervised and reinforced-unsupervised learning frameworks for solving both variable and functional optimization problems.
arXiv Detail & Related papers (2020-01-03T11:01:52Z)
- Information Theoretic Model Predictive Q-Learning [64.74041985237105]
We present a novel theoretical connection between information theoretic MPC and entropy regularized RL.
We develop a Q-learning algorithm that can leverage biased models.
arXiv Detail & Related papers (2019-12-31T00:29:22Z)
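The information-theoretic MPC family that this last entry builds on is typically implemented as an MPPI-style sampling controller. The sketch below is a generic, minimal MPPI update (not the paper's MPC-Q algorithm): the point-mass dynamics, cost, horizon, and temperature are placeholder assumptions used only to illustrate the exponentially weighted averaging of sampled control perturbations.

```python
# Generic MPPI-style control update (illustrative; toy dynamics and cost).
import numpy as np

rng = np.random.default_rng(0)
H, K, lam, sigma = 20, 256, 1.0, 0.5   # horizon, samples, temperature, noise scale

def step(x, u):
    # toy point-mass: state = [position, velocity]
    return np.array([x[0] + 0.1 * x[1], x[1] + 0.1 * u])

def cost(x, u):
    return x[0] ** 2 + 0.1 * x[1] ** 2 + 0.01 * u ** 2

def mppi_step(x0, u_nom):
    noise = sigma * rng.standard_normal((K, H))
    total = np.zeros(K)
    for k in range(K):
        x = x0.copy()
        for t in range(H):
            u = u_nom[t] + noise[k, t]
            total[k] += cost(x, u)
            x = step(x, u)
    w = np.exp(-(total - total.min()) / lam)
    w /= w.sum()
    return u_nom + w @ noise           # exponentially weighted perturbations

x, u_nom = np.array([2.0, 0.0]), np.zeros(H)
for _ in range(50):
    u_nom = mppi_step(x, u_nom)
    x = step(x, u_nom[0])
    u_nom = np.roll(u_nom, -1); u_nom[-1] = 0.0
print("final state:", x)
```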
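For the "Neural ODEs as Feedback Policies" entry above, the following is the minimal closed-loop rollout sketch referenced there: a tiny, untrained MLP policy u = pi(x) is composed with hand-written pendulum-like dynamics and integrated with RK4. Training the policy (e.g., via the adjoint method or backpropagation through the solver) is outside this sketch, and all layer sizes and dynamics are hypothetical rather than taken from the paper.

```python
# Closed-loop rollout of a neural feedback policy inside an ODE integrator
# (illustrative only; untrained policy, toy dynamics).
import numpy as np

rng = np.random.default_rng(1)
W1, b1 = 0.1 * rng.standard_normal((16, 2)), np.zeros(16)
W2, b2 = 0.1 * rng.standard_normal((1, 16)), np.zeros(1)

def policy(x):
    return (W2 @ np.tanh(W1 @ x + b1) + b2)[0]       # scalar control

def f(x):
    # pendulum-like continuous-time dynamics under the neural feedback policy
    theta, omega = x
    return np.array([omega, -np.sin(theta) + policy(x)])

def rk4(x, dt):
    k1 = f(x)
    k2 = f(x + 0.5 * dt * k1)
    k3 = f(x + 0.5 * dt * k2)
    k4 = f(x + dt * k3)
    return x + dt / 6.0 * (k1 + 2 * k2 + 2 * k3 + k4)

x, dt = np.array([1.0, 0.0]), 0.05
for _ in range(200):
    x = rk4(x, dt)
print("final state:", x)
```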
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of this list (including all information) and is not responsible for any consequences of its use.