From Imitation to Refinement -- Residual RL for Precise Assembly
- URL: http://arxiv.org/abs/2407.16677v2
- Date: Mon, 4 Nov 2024 18:54:23 GMT
- Title: From Imitation to Refinement -- Residual RL for Precise Assembly
- Authors: Lars Ankile, Anthony Simeonov, Idan Shenfeld, Marcel Torne, Pulkit Agrawal,
- Abstract summary: Recent advances in behavior cloning (BC), like action-chunking and diffusion, have led to impressive progress.
Our key insight is that chunked BC policies function as trajectory planners, enabling long-horizon tasks.
We present ResiP (Residual for Precise Manipulation), that sidesteps these challenges by augmenting a frozen, chunked BC model with a fully closed-loop residual policy trained with RL.
- Score: 19.9786629249219
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recent advances in behavior cloning (BC), like action-chunking and diffusion, have led to impressive progress. Still, imitation alone remains insufficient for tasks requiring reliable and precise movements, such as aligning and inserting objects. Our key insight is that chunked BC policies function as trajectory planners, enabling long-horizon tasks. Conversely, as they execute action chunks open-loop, they lack the fine-grained reactivity necessary for reliable execution. Further, we find that the performance of BC policies saturates despite increasing data. Reinforcement learning (RL) is a natural way to overcome this, but it is not straightforward to apply directly to action-chunked models like diffusion policies. We present a simple yet effective method, ResiP (Residual for Precise Manipulation), that sidesteps these challenges by augmenting a frozen, chunked BC model with a fully closed-loop residual policy trained with RL. The residual policy is trained via on-policy RL, addressing distribution shifts and introducing reactivity without altering the BC trajectory planner. Evaluation on high-precision manipulation tasks demonstrates strong performance of ResiP over BC methods and direct RL fine-tuning. Videos, code, and data are available at \url{https://residual-assembly.github.io}.
Related papers
- Diffusion Predictive Control with Constraints [51.91057765703533]
Diffusion predictive control with constraints (DPCC)
An algorithm for diffusion-based control with explicit state and action constraints that can deviate from those in the training data.
We show through simulations of a robot manipulator that DPCC outperforms existing methods in satisfying novel test-time constraints while maintaining performance on the learned control task.
arXiv Detail & Related papers (2024-12-12T15:10:22Z) - Learning Model Predictive Control Parameters via Bayesian Optimization for Battery Fast Charging [0.0]
tuning parameters in model predictive control (MPC) presents significant challenges, particularly when there is a notable discrepancy between the controller's predictions and the behavior of the closed-loop plant.
We apply Bayesian optimization for efficient learning of unknown model parameters and parameterized constraint backoff terms, aiming to improve closed-loop performance of battery fast charging.
arXiv Detail & Related papers (2024-04-09T08:49:41Z) - Action-Quantized Offline Reinforcement Learning for Robotic Skill
Learning [68.16998247593209]
offline reinforcement learning (RL) paradigm provides recipe to convert static behavior datasets into policies that can perform better than the policy that collected the data.
In this paper, we propose an adaptive scheme for action quantization.
We show that several state-of-the-art offline RL methods such as IQL, CQL, and BRAC improve in performance on benchmarks when combined with our proposed discretization scheme.
arXiv Detail & Related papers (2023-10-18T06:07:10Z) - Towards Robust Continual Learning with Bayesian Adaptive Moment Regularization [51.34904967046097]
Continual learning seeks to overcome the challenge of catastrophic forgetting, where a model forgets previously learnt information.
We introduce a novel prior-based method that better constrains parameter growth, reducing catastrophic forgetting.
Results show that BAdam achieves state-of-the-art performance for prior-based methods on challenging single-headed class-incremental experiments.
arXiv Detail & Related papers (2023-09-15T17:10:51Z) - Improving TD3-BC: Relaxed Policy Constraint for Offline Learning and
Stable Online Fine-Tuning [7.462336024223669]
Key challenge is overcoming overestimation bias for actions not present in data.
One simple method to reduce this bias is to introduce a policy constraint via behavioural cloning (BC)
We demonstrate that by continuing to train a policy offline while reducing the influence of the BC component we can produce refined policies.
arXiv Detail & Related papers (2022-11-21T19:10:27Z) - Let Offline RL Flow: Training Conservative Agents in the Latent Space of
Normalizing Flows [58.762959061522736]
offline reinforcement learning aims to train a policy on a pre-recorded and fixed dataset without any additional environment interactions.
We build upon recent works on learning policies in latent action spaces and use a special form of Normalizing Flows for constructing a generative model.
We evaluate our method on various locomotion and navigation tasks, demonstrating that our approach outperforms recently proposed algorithms.
arXiv Detail & Related papers (2022-11-20T21:57:10Z) - Boosting Offline Reinforcement Learning via Data Rebalancing [104.3767045977716]
offline reinforcement learning (RL) is challenged by the distributional shift between learning policies and datasets.
We propose a simple yet effective method to boost offline RL algorithms based on the observation that resampling a dataset keeps the distribution support unchanged.
We dub our method ReD (Return-based Data Rebalance), which can be implemented with less than 10 lines of code change and adds negligible running time.
arXiv Detail & Related papers (2022-10-17T16:34:01Z) - ConserWeightive Behavioral Cloning for Reliable Offline Reinforcement
Learning [27.322942155582687]
The goal of offline reinforcement learning (RL) is to learn near-optimal policies from static logged datasets, thus sidestepping expensive online interactions.
Behavioral cloning (BC) provides a straightforward solution to offline RL by mimicking offline trajectories via supervised learning.
We propose ConserWeightive Behavioral Cloning (CWBC) to improve the performance of conditional BC for offline RL.
arXiv Detail & Related papers (2022-10-11T05:37:22Z) - Improving the Efficiency of Off-Policy Reinforcement Learning by
Accounting for Past Decisions [20.531576904743282]
Off-policy estimation bias is corrected in a per-decision manner.
Off-policy algorithms such as Tree Backup and Retrace rely on this mechanism.
We propose a multistep operator that permits arbitrary past-dependent traces.
arXiv Detail & Related papers (2021-12-23T00:07:28Z) - Evaluating Prediction-Time Batch Normalization for Robustness under
Covariate Shift [81.74795324629712]
We call prediction-time batch normalization, which significantly improves model accuracy and calibration under covariate shift.
We show that prediction-time batch normalization provides complementary benefits to existing state-of-the-art approaches for improving robustness.
The method has mixed results when used alongside pre-training, and does not seem to perform as well under more natural types of dataset shift.
arXiv Detail & Related papers (2020-06-19T05:08:43Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.