CARP: Visuomotor Policy Learning via Coarse-to-Fine Autoregressive Prediction
- URL: http://arxiv.org/abs/2412.06782v2
- Date: Sat, 21 Dec 2024 09:32:08 GMT
- Title: CARP: Visuomotor Policy Learning via Coarse-to-Fine Autoregressive Prediction
- Authors: Zhefei Gong, Pengxiang Ding, Shangke Lyu, Siteng Huang, Mingyang Sun, Wei Zhao, Zhaoxin Fan, Donglin Wang,
- Abstract summary: Coarse-to-Fine AutoRegressive Policy (CARP) is a novel paradigm for visuomotor policy learning.<n>It redefines the autoregressive action generation process as a coarse-to-fine, next-scale approach.<n>CARP achieves competitive success rates, with up to a 10% improvement, and delivers 10x faster inference compared to state-of-the-art policies.
- Score: 28.761494362934087
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In robotic visuomotor policy learning, diffusion-based models have achieved significant success in improving the accuracy of action trajectory generation compared to traditional autoregressive models. However, they suffer from inefficiency due to multiple denoising steps and limited flexibility from complex constraints. In this paper, we introduce Coarse-to-Fine AutoRegressive Policy (CARP), a novel paradigm for visuomotor policy learning that redefines the autoregressive action generation process as a coarse-to-fine, next-scale approach. CARP decouples action generation into two stages: first, an action autoencoder learns multi-scale representations of the entire action sequence; then, a GPT-style transformer refines the sequence prediction through a coarse-to-fine autoregressive process. This straightforward and intuitive approach produces highly accurate and smooth actions, matching or even surpassing the performance of diffusion-based policies while maintaining efficiency on par with autoregressive policies. We conduct extensive evaluations across diverse settings, including single-task and multi-task scenarios on state-based and image-based simulation benchmarks, as well as real-world tasks. CARP achieves competitive success rates, with up to a 10% improvement, and delivers 10x faster inference compared to state-of-the-art policies, establishing a high-performance, efficient, and flexible paradigm for action generation in robotic tasks.
Related papers
- Fast Adaptation with Behavioral Foundation Models [82.34700481726951]
Unsupervised zero-shot reinforcement learning has emerged as a powerful paradigm for pretraining behavioral foundation models.
Despite promising results, zero-shot policies are often suboptimal due to errors induced by the unsupervised training process.
We propose fast adaptation strategies that search in the low-dimensional task-embedding space of the pre-trained BFM to rapidly improve the performance of its zero-shot policies.
arXiv Detail & Related papers (2025-04-10T16:14:17Z) - Dita: Scaling Diffusion Transformer for Generalist Vision-Language-Action Policy [56.424032454461695]
We present Dita, a scalable framework that leverages Transformer architectures to directly denoise continuous action sequences.
Dita employs in-context conditioning -- enabling fine-grained alignment between denoised actions and raw visual tokens from historical observations.
Dita effectively integrates cross-embodiment datasets across diverse camera perspectives, observation scenes, tasks, and action spaces.
arXiv Detail & Related papers (2025-03-25T15:19:56Z) - Dense Policy: Bidirectional Autoregressive Learning of Actions [51.60428100831717]
This paper introduces a bidirectionally expanded learning approach, termed Dense Policy, to establish a new paradigm for autoregressive policies in action prediction.
It employs a lightweight encoder-only architecture to iteratively unfold the action sequence from an initial single frame into the target sequence in a coarse-to-fine manner.
Experiments validate that our dense policy has superior autoregressive learning capabilities and can surpass existing holistic generative policies.
arXiv Detail & Related papers (2025-03-17T14:28:08Z) - ROCM: RLHF on consistency models [8.905375742101707]
We propose a reward optimization framework for applying RLHF to consistency models.
We investigate various $f$-divergences as regularization strategies, striking a balance between reward and model consistency.
arXiv Detail & Related papers (2025-03-08T11:19:48Z) - AdaWM: Adaptive World Model based Planning for Autonomous Driving [34.57859869929471]
World model based reinforcement learning (RL) has emerged as a promising approach for autonomous driving.
The pretrain-finetune paradigm is often used, where online RL is performance by a pretrained model and a policy learned offline.
We introduce AdaWM, an Adaptive World Model based planning method, featuring two key steps: (a) mismatch identification, which quantifies the mismatches and informs the finetuning strategy, and (b) alignment-driven finetuning, which selectively updates either the policy or the model as needed.
arXiv Detail & Related papers (2025-01-22T18:34:51Z) - One-Step Diffusion Policy: Fast Visuomotor Policies via Diffusion Distillation [80.71541671907426]
OneStep Diffusion Policy (OneDP) is a novel approach that distills knowledge from pre-trained diffusion policies into a single-step action generator.
OneDP significantly accelerates response times for robotic control tasks.
arXiv Detail & Related papers (2024-10-28T17:54:31Z) - Diffusion Transformer Policy [48.50988753948537]
We propose a large multi-modal diffusion transformer, dubbed as Diffusion Transformer Policy, to model continuous end-effector actions.
By leveraging the scaling capability of transformers, the proposed approach can effectively model continuous end-effector actions across large diverse robot datasets.
arXiv Detail & Related papers (2024-10-21T12:43:54Z) - Guided Reinforcement Learning for Robust Multi-Contact Loco-Manipulation [12.377289165111028]
Reinforcement learning (RL) often necessitates a meticulous Markov Decision Process (MDP) design tailored to each task.
This work proposes a systematic approach to behavior synthesis and control for multi-contact loco-manipulation tasks.
We define a task-independent MDP to train RL policies using only a single demonstration per task generated from a model-based trajectory.
arXiv Detail & Related papers (2024-10-17T17:46:27Z) - Autoregressive Action Sequence Learning for Robotic Manipulation [32.9580007141312]
Existing autoregressive architectures generate end-effector waypoints sequentially as word tokens in language modeling.
We extend causal transformers' single-token prediction to support predicting a variable number of tokens in a single step.
We propose the Autoregressive Policy architecture, which solves manipulation tasks by generating hybrid action sequences.
arXiv Detail & Related papers (2024-10-04T04:07:15Z) - Dynamic Corrective Self-Distillation for Better Fine-Tuning of
Pretrained Models [0.9217021281095907]
We tackle the issue of aggressive fine-tuning encountered during the process of transfer learning of pre-trained language models (PLMs)
Inspired by the adaptive boosting method in traditional machine learning, we present an effective dynamic corrective self-distillation approach to improve the fine-tuning of the PLMs.
Our technique involves performing a self-distillation mechanism where, at each iteration, the student model actively adapts and corrects itself by dynamically adjusting the weights assigned to individual data points.
arXiv Detail & Related papers (2023-12-12T07:26:36Z) - Theoretically Guaranteed Policy Improvement Distilled from Model-Based
Planning [64.10794426777493]
Model-based reinforcement learning (RL) has demonstrated remarkable successes on a range of continuous control tasks.
Recent practices tend to distill optimized action sequences into an RL policy during the training phase.
We develop an approach to distill from model-based planning to the policy.
arXiv Detail & Related papers (2023-07-24T16:52:31Z) - Training and Evaluation of Deep Policies using Reinforcement Learning
and Generative Models [67.78935378952146]
GenRL is a framework for solving sequential decision-making problems.
It exploits the combination of reinforcement learning and latent variable generative models.
We experimentally determine the characteristics of generative models that have most influence on the performance of the final policy training.
arXiv Detail & Related papers (2022-04-18T22:02:32Z) - Improving Non-autoregressive Generation with Mixup Training [51.61038444990301]
We present a non-autoregressive generation model based on pre-trained transformer models.
We propose a simple and effective iterative training method called MIx Source and pseudo Target.
Our experiments on three generation benchmarks including question generation, summarization and paraphrase generation, show that the proposed framework achieves the new state-of-the-art results.
arXiv Detail & Related papers (2021-10-21T13:04:21Z) - Semi-On-Policy Training for Sample Efficient Multi-Agent Policy
Gradients [51.749831824106046]
We introduce semi-on-policy (SOP) training as an effective and computationally efficient way to address the sample inefficiency of on-policy policy gradient methods.
We show that our methods perform as well or better than state-of-the-art value-based methods on a variety of SMAC tasks.
arXiv Detail & Related papers (2021-04-27T19:37:01Z) - Discrete Action On-Policy Learning with Action-Value Critic [72.20609919995086]
Reinforcement learning (RL) in discrete action space is ubiquitous in real-world applications, but its complexity grows exponentially with the action-space dimension.
We construct a critic to estimate action-value functions, apply it on correlated actions, and combine these critic estimated action values to control the variance of gradient estimation.
These efforts result in a new discrete action on-policy RL algorithm that empirically outperforms related on-policy algorithms relying on variance control techniques.
arXiv Detail & Related papers (2020-02-10T04:23:09Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.