Conformal Policy Learning for Sensorimotor Control Under Distribution
Shifts
- URL: http://arxiv.org/abs/2311.01457v1
- Date: Thu, 2 Nov 2023 17:59:30 GMT
- Title: Conformal Policy Learning for Sensorimotor Control Under Distribution
Shifts
- Authors: Huang Huang, Satvik Sharma, Antonio Loquercio, Anastasios
Angelopoulos, Ken Goldberg, Jitendra Malik
- Abstract summary: This paper focuses on the problem of detecting and reacting to changes in the distribution of a sensorimotor controller's observables.
The key idea is the design of switching policies that can take conformal quantiles as input.
We show how to design such policies by using conformal quantiles to switch between base policies with different characteristics.
- Score: 61.929388479847525
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This paper focuses on the problem of detecting and reacting to changes in the
distribution of a sensorimotor controller's observables. The key idea is the
design of switching policies that can take conformal quantiles as input, which
we define as conformal policy learning, that allows robots to detect
distribution shifts with formal statistical guarantees. We show how to design
such policies by using conformal quantiles to switch between base policies with
different characteristics, e.g. safety or speed, or directly augmenting a
policy observation with a quantile and training it with reinforcement learning.
Theoretically, we show that such policies achieve the formal convergence
guarantees in finite time. In addition, we thoroughly evaluate their advantages
and limitations on two compelling use cases: simulated autonomous driving and
active perception with a physical quadruped. Empirical results demonstrate that
our approach outperforms five baselines. It is also the simplest of the
baseline strategies besides one ablation. Being easy to use, flexible, and with
formal guarantees, our work demonstrates how conformal prediction can be an
effective tool for sensorimotor learning under uncertainty.
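The recipe in the abstract reduces to two steps: calibrate a conformal quantile on nonconformity scores collected in-distribution, then switch to a conservative base policy whenever a new observation's score exceeds that quantile. A minimal sketch follows, assuming a scalar score function; the names nonconformity, fast_policy, and safe_policy are illustrative placeholders rather than the paper's actual interface.

```python
# Minimal sketch of conformal policy switching. The score function and the
# two base policies are stand-ins, not the paper's API.
import numpy as np

def conformal_quantile(cal_scores, alpha=0.1):
    """Split-conformal threshold: with probability at least 1 - alpha, a new
    in-distribution score falls at or below this value."""
    n = len(cal_scores)
    level = min(1.0, np.ceil((n + 1) * (1 - alpha)) / n)
    return np.quantile(cal_scores, level, method="higher")

def switching_policy(obs, q, nonconformity, fast_policy, safe_policy):
    """Fall back to the conservative base policy whenever the observation's
    nonconformity exceeds the calibrated quantile (a flagged shift)."""
    return safe_policy(obs) if nonconformity(obs) > q else fast_policy(obs)

# Calibrate once on in-distribution data, then switch online.
cal_scores = np.random.rand(500)  # placeholder calibration scores
q = conformal_quantile(cal_scores, alpha=0.1)
action = switching_policy(
    obs=np.zeros(4), q=q,
    nonconformity=lambda o: float(np.random.rand()),  # e.g. an ensemble disagreement score
    fast_policy=lambda o: "fast_action",
    safe_policy=lambda o: "safe_action",
)
```

The paper's second variant instead appends the quantile (or the running score) to the policy's observation and trains with reinforcement learning; the calibration step is the same.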
Related papers
- How Generalizable Is My Behavior Cloning Policy? A Statistical Approach to Trustworthy Performance Evaluation [17.638831964639834]
Behavior cloning policies are increasingly successful at solving complex tasks by learning from human demonstrations.
We present a framework that provides a tight lower bound on robot performance in an arbitrary environment.
In experiments we evaluate policies for visuomotor manipulation in both simulation and hardware.
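As a rough illustration of what a statistical lower bound on policy performance looks like, one can compute a one-sided Clopper-Pearson bound on the success rate of a finite set of binary rollouts. This is a generic textbook bound, not the bound derived in the paper.

```python
# Generic illustration of a distribution-free lower confidence bound on a
# policy's success rate from finite rollouts (one-sided Clopper-Pearson).
# NOT the paper's bound; it only illustrates the idea.
from scipy.stats import beta

def success_rate_lower_bound(successes, trials, delta=0.05):
    """Return L such that P(true success probability >= L) >= 1 - delta."""
    if successes == 0:
        return 0.0
    return float(beta.ppf(delta, successes, trials - successes + 1))

# e.g. 45 successful rollouts out of 50 at 95% one-sided confidence
print(success_rate_lower_bound(successes=45, trials=50, delta=0.05))
```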
arXiv Detail & Related papers (2024-05-08T22:00:35Z)
- Learning Optimal Deterministic Policies with Stochastic Policy Gradients [62.81324245896716]
Policy gradient (PG) methods are successful approaches to deal with continuous reinforcement learning (RL) problems.
In common practice, stochastic (hyper)policies are learned during training, but only their deterministic version is deployed.
We show how to tune the exploration level used for learning to optimize the trade-off between the sample complexity and the performance of the deployed deterministic policy.
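The train-stochastic, deploy-deterministic pattern referred to here can be sketched with a Gaussian policy whose learnable standard deviation sets the exploration level and whose mean is the deployed deterministic policy. This is a generic sketch, not the paper's algorithm or parameterization.

```python
# Generic Gaussian policy: sample during training, deploy the mean.
import math
import torch
import torch.nn as nn

class GaussianPolicy(nn.Module):
    def __init__(self, obs_dim, act_dim, init_std=0.5):
        super().__init__()
        self.mean = nn.Sequential(nn.Linear(obs_dim, 64), nn.Tanh(),
                                  nn.Linear(64, act_dim))
        # Exploration level: a learnable, state-independent log-std.
        self.log_std = nn.Parameter(torch.full((act_dim,), math.log(init_std)))

    def forward(self, obs, deterministic=False):
        mu = self.mean(obs)
        if deterministic:             # deployment: the deterministic version
            return mu
        std = self.log_std.exp()      # training: sample for policy-gradient estimation
        return torch.distributions.Normal(mu, std).rsample()

deployed_action = GaussianPolicy(4, 2)(torch.zeros(1, 4), deterministic=True)
```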
arXiv Detail & Related papers (2024-05-03T16:45:15Z)
- Off-Policy Evaluation for Large Action Spaces via Policy Convolution [60.6953713877886]
The Policy Convolution (PC) family of estimators uses latent structure within actions to strategically convolve the logging and target policies.
Experiments on synthetic and benchmark datasets demonstrate remarkable mean squared error (MSE) improvements when using PC.
arXiv Detail & Related papers (2023-10-24T01:00:01Z)
- Enabling Efficient, Reliable Real-World Reinforcement Learning with Approximate Physics-Based Models [10.472792899267365]
We focus on developing efficient and reliable policy optimization strategies for robot learning with real-world data.
In this paper we introduce a novel policy gradient-based policy optimization framework.
We show that our approach can learn precise control strategies reliably and with only minutes of real-world data.
arXiv Detail & Related papers (2023-07-16T22:36:36Z)
- A Regularized Implicit Policy for Offline Reinforcement Learning [54.7427227775581]
Offline reinforcement learning enables learning from a fixed dataset, without further interactions with the environment.
We propose a framework that supports learning a flexible yet well-regularized fully-implicit policy.
Experiments and ablation study on the D4RL dataset validate our framework and the effectiveness of our algorithmic designs.
arXiv Detail & Related papers (2022-02-19T20:22:04Z)
- Verified Probabilistic Policies for Deep Reinforcement Learning [6.85316573653194]
We tackle the problem of verifying probabilistic policies for deep reinforcement learning.
We propose an abstraction approach, based on interval Markov decision processes, that yields guarantees on a policy's execution.
We present techniques to build and solve these models using abstract interpretation, mixed-integer linear programming, entropy-based refinement and probabilistic model checking.
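The core computation behind interval-MDP guarantees is a worst-case Bellman backup over transition probabilities constrained to intervals. The sketch below shows that single step in isolation; the paper's full pipeline (abstract interpretation, MILP, entropy-based refinement, probabilistic model checking) goes well beyond it.

```python
# Generic pessimistic backup for one state-action pair of an interval MDP.
# Textbook robust value iteration step, not the paper's abstraction pipeline.
def pessimistic_expectation(values, lo, hi):
    """Minimise sum_i p_i * values[i] over lo[i] <= p_i <= hi[i], sum_i p_i = 1."""
    p = list(lo)
    remaining = 1.0 - sum(lo)
    # Greedily push the free probability mass onto the lowest-value successors.
    for i in sorted(range(len(values)), key=lambda i: values[i]):
        extra = min(hi[i] - lo[i], remaining)
        p[i] += extra
        remaining -= extra
    return sum(pi * v for pi, v in zip(p, values))

# Successor values and transition-probability intervals (illustrative numbers).
print(pessimistic_expectation(values=[0.0, 1.0, 5.0],
                              lo=[0.1, 0.2, 0.1],
                              hi=[0.7, 0.6, 0.5]))
```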
arXiv Detail & Related papers (2022-01-10T23:55:04Z)
- Strictly Batch Imitation Learning by Energy-based Distribution Matching [104.33286163090179]
Consider learning a policy purely on the basis of demonstrated behavior -- that is, with no access to reinforcement signals, no knowledge of transition dynamics, and no further interaction with the environment.
One solution is simply to retrofit existing algorithms for apprenticeship learning to work in the offline setting.
But such an approach leans heavily on off-policy evaluation or offline model estimation, and can be indirect and inefficient.
We argue that a good solution should be able to explicitly parameterize a policy, implicitly learn from rollout dynamics, and operate in an entirely offline fashion.
arXiv Detail & Related papers (2020-06-25T03:27:59Z)
- Guided Uncertainty-Aware Policy Optimization: Combining Learning and Model-Based Strategies for Sample-Efficient Policy Learning [75.56839075060819]
Traditional robotic approaches rely on an accurate model of the environment, a detailed description of how to perform the task, and a robust perception system to keep track of the current state.
Reinforcement learning approaches can operate directly from raw sensory inputs with only a reward signal to describe the task, but are extremely sample-inefficient and brittle.
In this work, we combine the strengths of model-based methods with the flexibility of learning-based methods to obtain a general method that is able to overcome inaccuracies in the robotics perception/actuation pipeline.
arXiv Detail & Related papers (2020-05-21T19:47:05Z)
- Learning Constrained Adaptive Differentiable Predictive Control Policies With Guarantees [1.1086440815804224]
We present differentiable predictive control (DPC), a method for learning constrained neural control policies for linear systems.
We employ automatic differentiation to obtain direct policy gradients by backpropagating the model predictive control (MPC) loss function and constraints penalties through a differentiable closed-loop system dynamics model.
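A hedged sketch of that training loop: roll a neural policy through a known (here linear) model, accumulate an MPC-style cost plus constraint penalties, and backpropagate through the closed-loop rollout to obtain direct policy gradients. The dynamics matrices, horizon, and weights below are placeholders, not values from the paper.

```python
# Illustrative DPC-style loop: differentiate an MPC-like loss through a
# differentiable closed-loop rollout of x_{t+1} = A x_t + B u_t.
import torch
import torch.nn as nn

A = torch.tensor([[1.0, 0.1], [0.0, 1.0]])   # placeholder linear dynamics
B = torch.tensor([[0.0], [0.1]])
policy = nn.Sequential(nn.Linear(2, 32), nn.ReLU(), nn.Linear(32, 1))
opt = torch.optim.Adam(policy.parameters(), lr=1e-3)

for step in range(200):
    x = torch.randn(64, 2)                    # batch of initial states
    loss = torch.tensor(0.0)
    for t in range(20):                       # differentiable closed-loop rollout
        u = policy(x)
        x = x @ A.T + u @ B.T
        loss = loss + (x ** 2).sum(-1).mean() + 0.1 * (u ** 2).sum(-1).mean()
        loss = loss + 10.0 * torch.relu(u.abs() - 1.0).mean()  # input-constraint penalty
    opt.zero_grad()
    loss.backward()                           # direct policy gradients through the model
    opt.step()
```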
arXiv Detail & Related papers (2020-04-23T14:24:44Z)
This list is automatically generated from the titles and abstracts of the papers in this site.