Out-of-Distribution Recovery with Object-Centric Keypoint Inverse Policy for Visuomotor Imitation Learning
- URL: http://arxiv.org/abs/2411.03294v3
- Date: Thu, 20 Mar 2025 18:03:15 GMT
- Title: Out-of-Distribution Recovery with Object-Centric Keypoint Inverse Policy for Visuomotor Imitation Learning
- Authors: George Jiayuan Gao, Tianyu Li, Nadia Figueroa,
- Abstract summary: We propose an object-centric recovery (OCR) framework to address the challenges of out-of-distribution (OOD) scenarios in visuomotor policy learning.<n>We show OCR's capacity to autonomously collect demonstrations for continual learning.
- Score: 2.6696199945489534
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We propose an object-centric recovery (OCR) framework to address the challenges of out-of-distribution (OOD) scenarios in visuomotor policy learning. Previous behavior cloning (BC) methods rely heavily on a large amount of labeled data coverage, failing in unfamiliar spatial states. Without relying on extra data collection, our approach learns a recovery policy constructed by an inverse policy inferred from the object keypoint manifold gradient in the original training data. The recovery policy serves as a simple add-on to any base visuomotor BC policy, agnostic to a specific method, guiding the system back towards the training distribution to ensure task success even in OOD situations. We demonstrate the effectiveness of our object-centric framework in both simulation and real robot experiments, achieving an improvement of 77.7\% over the base policy in OOD. Furthermore, we show OCR's capacity to autonomously collect demonstrations for continual learning. Overall, we believe this framework represents a step toward improving the robustness of visuomotor policies in real-world settings. Project Website: https://sites.google.com/view/ocr-penn
Related papers
- Zero-Shot Whole-Body Humanoid Control via Behavioral Foundation Models [71.34520793462069]
Unsupervised reinforcement learning (RL) aims at pre-training agents that can solve a wide range of downstream tasks in complex environments.
We introduce a novel algorithm regularizing unsupervised RL towards imitating trajectories from unlabeled behavior datasets.
We demonstrate the effectiveness of this new approach in a challenging humanoid control problem.
arXiv Detail & Related papers (2025-04-15T10:41:11Z) - Diverse Policies Recovering via Pointwise Mutual Information Weighted Imitation Learning [24.600297554387108]
This paper presents a new principled method in recovering diverse polices from expert data.
In particular, after inferring or assigning a latent style for a trajectory, we enhance the vanilla behavioral cloning by incorporating a weighting mechanism.
We provide theoretical justifications for our new objective, and extensive empirical evaluations confirm the effectiveness of our method.
arXiv Detail & Related papers (2024-10-21T11:33:14Z) - Statistically Efficient Variance Reduction with Double Policy Estimation
for Off-Policy Evaluation in Sequence-Modeled Reinforcement Learning [53.97273491846883]
We propose DPE: an RL algorithm that blends offline sequence modeling and offline reinforcement learning with Double Policy Estimation.
We validate our method in multiple tasks of OpenAI Gym with D4RL benchmarks.
arXiv Detail & Related papers (2023-08-28T20:46:07Z) - Hindsight-DICE: Stable Credit Assignment for Deep Reinforcement Learning [11.084321518414226]
We adapt existing importance-sampling ratio estimation techniques for off-policy evaluation to drastically improve the stability and efficiency of so-called hindsight policy methods.
Our hindsight distribution correction facilitates stable, efficient learning across a broad range of environments where credit assignment plagues baseline methods.
arXiv Detail & Related papers (2023-07-21T20:54:52Z) - Coherent Soft Imitation Learning [17.345411907902932]
Imitation learning methods seek to learn from an expert either through behavioral cloning (BC) of the policy or inverse reinforcement learning (IRL) of the reward.
This work derives an imitation method that captures the strengths of both BC and IRL.
arXiv Detail & Related papers (2023-05-25T21:54:22Z) - Goal-Conditioned Imitation Learning using Score-based Diffusion Policies [3.49482137286472]
We propose a new policy representation based on score-based diffusion models (SDMs)
We apply our new policy representation in the domain of Goal-Conditioned Imitation Learning (GCIL)
We show how BESO can even be used to learn a goal-independent policy from play-data usingintuitive-free guidance.
arXiv Detail & Related papers (2023-04-05T15:52:34Z) - Offline Reinforcement Learning with Instrumental Variables in Confounded
Markov Decision Processes [93.61202366677526]
We study the offline reinforcement learning (RL) in the face of unmeasured confounders.
We propose various policy learning methods with the finite-sample suboptimality guarantee of finding the optimal in-class policy.
arXiv Detail & Related papers (2022-09-18T22:03:55Z) - Back to the Manifold: Recovering from Out-of-Distribution States [20.36024602311382]
We propose a recovery policy that brings the agent back to the training manifold whenever it steps out of the in-distribution states.
We demonstrate the effectiveness of the proposed method through several manipulation experiments on a real robotic platform.
arXiv Detail & Related papers (2022-07-18T15:10:58Z) - Latent-Variable Advantage-Weighted Policy Optimization for Offline RL [70.01851346635637]
offline reinforcement learning methods hold the promise of learning policies from pre-collected datasets without the need to query the environment for new transitions.
In practice, offline datasets are often heterogeneous, i.e., collected in a variety of scenarios.
We propose to leverage latent-variable policies that can represent a broader class of policy distributions.
Our method improves the average performance of the next best-performing offline reinforcement learning methods by 49% on heterogeneous datasets.
arXiv Detail & Related papers (2022-03-16T21:17:03Z) - A Regularized Implicit Policy for Offline Reinforcement Learning [54.7427227775581]
offline reinforcement learning enables learning from a fixed dataset, without further interactions with the environment.
We propose a framework that supports learning a flexible yet well-regularized fully-implicit policy.
Experiments and ablation study on the D4RL dataset validate our framework and the effectiveness of our algorithmic designs.
arXiv Detail & Related papers (2022-02-19T20:22:04Z) - Offline Reinforcement Learning with Implicit Q-Learning [85.62618088890787]
Current offline reinforcement learning methods need to query the value of unseen actions during training to improve the policy.
We propose an offline RL method that never needs to evaluate actions outside of the dataset.
This method enables the learned policy to improve substantially over the best behavior in the data through generalization.
arXiv Detail & Related papers (2021-10-12T17:05:05Z) - Supervised Off-Policy Ranking [145.3039527243585]
Off-policy evaluation (OPE) leverages data generated by other policies to evaluate a target policy.
We propose supervised off-policy ranking that learns a policy scoring model by correctly ranking training policies with known performance.
Our method outperforms strong baseline OPE methods in terms of both rank correlation and performance gap between the truly best and the best of the ranked top three policies.
arXiv Detail & Related papers (2021-07-03T07:01:23Z) - Evolutionary Stochastic Policy Distillation [139.54121001226451]
We propose a new method called Evolutionary Policy Distillation (ESPD) to solve GCRS tasks.
ESPD enables a target policy to learn from a series of its variants through the technique of policy distillation (PD)
The experiments based on the MuJoCo control suite show the high learning efficiency of the proposed method.
arXiv Detail & Related papers (2020-04-27T16:19:25Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.