Crossway Diffusion: Improving Diffusion-based Visuomotor Policy via
Self-supervised Learning
- URL: http://arxiv.org/abs/2307.01849v3
- Date: Thu, 11 Jan 2024 18:42:15 GMT
- Title: Crossway Diffusion: Improving Diffusion-based Visuomotor Policy via
Self-supervised Learning
- Authors: Xiang Li, Varun Belagali, Jinghuan Shang, Michael S. Ryoo
- Abstract summary: Diffusion models have been adopted for behavioral cloning in a sequence modeling fashion.
We propose Crossway Diffusion, a simple yet effective method to enhance diffusion-based visuomotor policy learning.
Our experiments demonstrate the effectiveness of Crossway Diffusion in various simulated and real-world robot tasks.
- Score: 42.009856923352864
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Sequence modeling approaches have shown promising results in robot imitation
learning. Recently, diffusion models have been adopted for behavioral cloning
in a sequence modeling fashion, benefiting from their exceptional capabilities
in modeling complex data distributions. The standard diffusion-based policy
iteratively generates action sequences from random noise conditioned on the
input states. Nonetheless, the model for diffusion policy can be further
improved in terms of visual representations. In this work, we propose Crossway
Diffusion, a simple yet effective method to enhance diffusion-based visuomotor
policy learning via a carefully designed state decoder and an auxiliary
self-supervised learning (SSL) objective. The state decoder reconstructs raw
image pixels and other state information from the intermediate representations
of the reverse diffusion process. The whole model is jointly optimized by the
SSL objective and the original diffusion loss. Our experiments demonstrate the
effectiveness of Crossway Diffusion in various simulated and real-world robot
tasks, confirming its consistent advantages over the standard diffusion-based
policy and substantial improvements over the baselines.
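The joint objective described in the abstract (the original diffusion loss plus an auxiliary SSL loss from a state decoder that reconstructs state information out of intermediate reverse-diffusion features) can be illustrated with a minimal NumPy sketch. Everything here is a hypothetical stand-in, not the paper's architecture: the toy denoiser, the linear state decoder, the dimensions, and the simplistic noise schedule are all assumptions for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy dimensions (not taken from the paper).
ACTION_DIM, STATE_DIM, HIDDEN = 4, 8, 16

def denoiser(noisy_action, t, state_feat, W):
    """Toy noise-prediction network conditioned on state features.

    Returns the predicted noise and the intermediate representation
    that the state decoder will read from.
    """
    h = np.tanh(noisy_action @ W["in"] + state_feat @ W["cond"] + t)
    return h @ W["out"], h

def crossway_style_loss(action, state_feat, W, T=10):
    """One training-step loss: diffusion loss + SSL reconstruction loss."""
    t = int(rng.integers(1, T + 1))
    alpha = 1.0 - t / (T + 1)              # simplistic noise schedule (assumption)
    eps = rng.standard_normal(ACTION_DIM)
    noisy = np.sqrt(alpha) * action + np.sqrt(1.0 - alpha) * eps
    eps_hat, h = denoiser(noisy, t, state_feat, W)
    diffusion_loss = np.mean((eps_hat - eps) ** 2)
    # State decoder: reconstruct state information from intermediate features.
    state_recon = h @ W["dec"]
    ssl_loss = np.mean((state_recon - state_feat) ** 2)
    # The whole model would be jointly optimized on this combined objective.
    return diffusion_loss + ssl_loss

W = {
    "in":   rng.standard_normal((ACTION_DIM, HIDDEN)) * 0.1,
    "cond": rng.standard_normal((STATE_DIM, HIDDEN)) * 0.1,
    "out":  rng.standard_normal((HIDDEN, ACTION_DIM)) * 0.1,
    "dec":  rng.standard_normal((HIDDEN, STATE_DIM)) * 0.1,
}
loss = crossway_style_loss(rng.standard_normal(ACTION_DIM),
                           rng.standard_normal(STATE_DIM), W)
print(loss)
```

In the actual method the state decoder reconstructs raw image pixels (and other state information), and the denoiser is a full diffusion policy network; the sketch only shows how the two loss terms share the intermediate representation.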
Related papers
- Diffusion Features to Bridge Domain Gap for Semantic Segmentation [2.8616666231199424]
This paper investigates an approach that leverages sampling and fusion techniques to harness the features of diffusion models efficiently.
Leveraging the strength of text-to-image generation, we introduce a new training framework designed to implicitly learn posterior knowledge from the diffusion model.
arXiv Detail & Related papers (2024-06-02T15:33:46Z)
- Self-Play Fine-Tuning of Diffusion Models for Text-to-Image Generation [59.184980778643464]
Fine-tuning diffusion models remains an underexplored frontier in generative artificial intelligence (GenAI).
In this paper, we introduce an innovative technique called self-play fine-tuning for diffusion models (SPIN-Diffusion).
Our approach offers an alternative to conventional supervised fine-tuning and RL strategies, significantly improving both model performance and alignment.
arXiv Detail & Related papers (2024-02-15T18:59:18Z)
- Guided Diffusion from Self-Supervised Diffusion Features [49.78673164423208]
Guidance serves as a key concept in diffusion models, yet its effectiveness is often limited by the need for extra data annotation or pretraining.
We propose a framework to extract guidance from, and specifically for, diffusion models.
arXiv Detail & Related papers (2023-12-14T11:19:11Z)
- Score Regularized Policy Optimization through Diffusion Behavior [25.926641622408752]
Recent developments in offline reinforcement learning have uncovered the immense potential of diffusion modeling.
We propose to extract an efficient deterministic inference policy from critic models and pretrained diffusion behavior models.
Our method boosts action sampling speed by more than 25 times compared with various leading diffusion-based methods in locomotion tasks.
arXiv Detail & Related papers (2023-10-11T08:31:26Z)
- CamoDiffusion: Camouflaged Object Detection via Conditional Diffusion Models [72.93652777646233]
Camouflaged Object Detection (COD) is a challenging task in computer vision due to the high similarity between camouflaged objects and their surroundings.
We propose a new paradigm that treats COD as a conditional mask-generation task leveraging diffusion models.
Our method, dubbed CamoDiffusion, employs the denoising process of diffusion models to iteratively reduce the noise of the mask.
arXiv Detail & Related papers (2023-05-29T07:49:44Z)
- Training Diffusion Models with Reinforcement Learning [82.29328477109826]
Diffusion models are trained with an approximation to the log-likelihood objective.
In this paper, we investigate reinforcement learning methods for directly optimizing diffusion models for downstream objectives.
We describe how posing denoising as a multi-step decision-making problem enables a class of policy gradient algorithms.
arXiv Detail & Related papers (2023-05-22T17:57:41Z)
- Diffusion Models as Masked Autoencoders [52.442717717898056]
We revisit generatively pre-training visual representations in light of recent interest in denoising diffusion models.
While directly pre-training with diffusion models does not produce strong representations, we condition diffusion models on masked input and formulate diffusion models as masked autoencoders (DiffMAE).
We perform a comprehensive study on the pros and cons of design choices and build connections between diffusion models and masked autoencoders.
arXiv Detail & Related papers (2023-04-06T17:59:56Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.