Crossway Diffusion: Improving Diffusion-based Visuomotor Policy via
Self-supervised Learning
- URL: http://arxiv.org/abs/2307.01849v3
- Date: Thu, 11 Jan 2024 18:42:15 GMT
- Title: Crossway Diffusion: Improving Diffusion-based Visuomotor Policy via
Self-supervised Learning
- Authors: Xiang Li, Varun Belagali, Jinghuan Shang, Michael S. Ryoo
- Abstract summary: Diffusion models have been adopted for behavioral cloning in a sequence modeling fashion.
We propose Crossway Diffusion, a simple yet effective method to enhance diffusion-based visuomotor policy learning.
Our experiments demonstrate the effectiveness of Crossway Diffusion in various simulated and real-world robot tasks.
- Score: 42.009856923352864
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Sequence modeling approaches have shown promising results in robot imitation
learning. Recently, diffusion models have been adopted for behavioral cloning
in a sequence modeling fashion, benefiting from their exceptional capabilities
in modeling complex data distributions. The standard diffusion-based policy
iteratively generates action sequences from random noise conditioned on the
input states. Nonetheless, the model for diffusion policy can be further
improved in terms of visual representations. In this work, we propose Crossway
Diffusion, a simple yet effective method to enhance diffusion-based visuomotor
policy learning via a carefully designed state decoder and an auxiliary
self-supervised learning (SSL) objective. The state decoder reconstructs raw
image pixels and other state information from the intermediate representations
of the reverse diffusion process. The whole model is jointly optimized by the
SSL objective and the original diffusion loss. Our experiments demonstrate the
effectiveness of Crossway Diffusion in various simulated and real-world robot
tasks, confirming its consistent advantages over the standard diffusion-based
policy and substantial improvements over the baselines.
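To make the joint objective concrete, here is a minimal sketch (the module names `DenoiseNet` and `StateDecoder`, the layer sizes, and the noise schedule are illustrative assumptions, not the authors' implementation): the denoising network predicts the noise added to an action sequence, the state decoder reconstructs raw pixels from an intermediate representation of that same network, and the two losses are summed.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Illustrative modules only; the paper's actual architecture differs in detail.
class DenoiseNet(nn.Module):
    def __init__(self, act_dim=7, horizon=16, obs_dim=512, hidden=256):
        super().__init__()
        self.obs_proj = nn.Linear(obs_dim, hidden)
        self.encoder = nn.Linear(act_dim * horizon + hidden + 1, hidden)
        self.head = nn.Linear(hidden, act_dim * horizon)

    def forward(self, noisy_actions, t, obs_feat):
        x = torch.cat([noisy_actions.flatten(1), self.obs_proj(obs_feat),
                       t.float().unsqueeze(1)], dim=1)
        h = torch.relu(self.encoder(x))            # intermediate representation
        return self.head(h).view_as(noisy_actions), h

class StateDecoder(nn.Module):
    """Auxiliary decoder: reconstructs the (flattened) image from `h`."""
    def __init__(self, hidden=256, img_dim=3 * 64 * 64):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(hidden, 512), nn.ReLU(),
                                 nn.Linear(512, img_dim))

    def forward(self, h):
        return self.net(h)

def training_step(model, decoder, actions, obs_feat, image, T=100):
    """One joint update: diffusion loss + SSL reconstruction loss."""
    B = actions.shape[0]
    t = torch.randint(0, T, (B,))
    noise = torch.randn_like(actions)
    alpha = 1.0 - t.float().view(B, 1, 1) / T       # toy noise schedule
    noisy = alpha.sqrt() * actions + (1 - alpha).sqrt() * noise
    eps_pred, h = model(noisy, t, obs_feat)
    diffusion_loss = F.mse_loss(eps_pred, noise)
    ssl_loss = F.mse_loss(decoder(h), image.flatten(1))  # pixel reconstruction
    return diffusion_loss + ssl_loss                 # jointly optimized
```

Because both losses flow through the shared representation `h`, the auxiliary reconstruction shapes the visual features the policy relies on, which is the intuition behind the reported gains.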
Related papers
- Training-free Diffusion Model Alignment with Sampling Demons [15.400553977713914]
We propose an optimization approach, dubbed Demon, to guide the denoising process at inference time without backpropagation through reward functions or model retraining.
Our approach works by controlling noise distribution in denoising steps to concentrate density on regions corresponding to high rewards through optimization.
To the best of our knowledge, the proposed approach is the first inference-time, backpropagation-free preference alignment method for diffusion models.
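One concrete way to read "controlling noise distribution in denoising steps to concentrate density on high-reward regions" is best-of-k noise selection at each reverse step; the sketch below illustrates that reading only and is not the paper's exact optimizer (`denoise_step` and `reward_fn` are assumed callables).

```python
import torch

def demon_style_step(x_t, t, denoise_step, reward_fn, k=8):
    """One reverse-diffusion step with reward-guided noise selection.
    Sample k candidate noises, apply the step with each, and keep the
    candidate that scores highest under the reward. Illustrative only:
    Demon optimizes the noise distribution rather than taking an argmax,
    but like this sketch it never backpropagates through the reward."""
    candidates = [denoise_step(x_t, t, torch.randn_like(x_t))
                  for _ in range(k)]
    rewards = torch.stack([reward_fn(c) for c in candidates])
    return candidates[int(rewards.argmax())]
```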
arXiv Detail & Related papers (2024-10-08T07:33:49Z)
- Diffusion Imitation from Observation [4.205946699819021]
Adversarial imitation learning approaches learn a generator agent policy that produces state transitions a discriminator cannot distinguish from expert ones.
Motivated by the recent success of diffusion models in generative modeling, we propose to integrate a diffusion model into the adversarial imitation learning from observation framework.
arXiv Detail & Related papers (2024-10-07T18:49:55Z)
- LDFaceNet: Latent Diffusion-based Network for High-Fidelity Deepfake Generation [6.866014367868788]
This paper proposes a novel face-swapping module, termed LDFaceNet (Latent Diffusion based Face Swapping Network).
It is based on a guided latent diffusion model that utilizes facial segmentation and facial recognition modules for a conditioned denoising process.
The results of this study demonstrate that the proposed method can generate extremely realistic and coherent images.
arXiv Detail & Related papers (2024-08-04T16:09:04Z)
- Self-Play Fine-Tuning of Diffusion Models for Text-to-Image Generation [59.184980778643464]
Fine-tuning diffusion models remains an underexplored frontier in generative artificial intelligence (GenAI).
In this paper, we introduce an innovative technique called self-play fine-tuning for diffusion models (SPIN-Diffusion).
Our approach offers an alternative to conventional supervised fine-tuning and RL strategies, significantly improving both model performance and alignment.
arXiv Detail & Related papers (2024-02-15T18:59:18Z)
- Guided Diffusion from Self-Supervised Diffusion Features [49.78673164423208]
Guidance serves as a key concept in diffusion models, yet its effectiveness is often limited by the need for extra data annotation or pretraining.
We propose a framework to extract guidance from, and specifically for, diffusion models.
arXiv Detail & Related papers (2023-12-14T11:19:11Z)
- CamoDiffusion: Camouflaged Object Detection via Conditional Diffusion Models [72.93652777646233]
Camouflaged Object Detection (COD) is a challenging task in computer vision due to the high similarity between camouflaged objects and their surroundings.
We propose a new paradigm that treats COD as a conditional mask-generation task leveraging diffusion models.
Our method, dubbed CamoDiffusion, employs the denoising process of diffusion models to iteratively reduce the noise of the mask.
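Treating COD as conditional mask generation amounts to running a reverse-diffusion loop over the mask while conditioning on the image; a toy sketch of that loop follows (the `model` signature and the update rule are assumptions, not CamoDiffusion's actual sampler).

```python
import torch

@torch.no_grad()
def generate_mask(model, image, T=50, shape=(1, 1, 256, 256)):
    """Iteratively denoise a random mask conditioned on the input image.
    `model(mask, t, image)` is assumed to predict the clean mask; this
    toy update simply interpolates toward that prediction each step."""
    mask = torch.randn(shape)
    for t in reversed(range(T)):
        pred = model(mask, torch.tensor([t]), image)
        w = 1.0 / (t + 1)                    # crude schedule, for illustration
        mask = (1 - w) * mask + w * pred
    return mask.sigmoid()                    # per-pixel probabilities
```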
arXiv Detail & Related papers (2023-05-29T07:49:44Z)
- Training Diffusion Models with Reinforcement Learning [82.29328477109826]
Diffusion models are trained with an approximation to the log-likelihood objective.
In this paper, we investigate reinforcement learning methods for directly optimizing diffusion models for downstream objectives.
We describe how posing denoising as a multi-step decision-making problem enables a class of policy gradient algorithms.
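Once denoising is posed as a multi-step decision problem, each reverse step has a log-probability and the final sample has a reward, so a REINFORCE-style gradient applies; below is a minimal sketch of that estimator (the mean baseline is an assumption, not the paper's exact algorithm).

```python
import torch

def policy_gradient_loss(step_log_probs, rewards):
    """REINFORCE-style loss over denoising trajectories.
    step_log_probs: (B, T) log-probability of each reverse step taken.
    rewards:        (B,)   reward of each final generated sample.
    Rewards are detached: they act as fixed weights on the trajectory
    log-likelihood, as in standard policy gradients."""
    advantage = (rewards - rewards.mean()).detach()    # simple baseline
    return -(step_log_probs.sum(dim=1) * advantage).mean()
```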
arXiv Detail & Related papers (2023-05-22T17:57:41Z)
- Diffusion Models as Masked Autoencoders [52.442717717898056]
We revisit generatively pre-training visual representations in light of recent interest in denoising diffusion models.
While directly pre-training with diffusion models does not produce strong representations, we condition them on masked input and formulate diffusion models as masked autoencoders (DiffMAE).
We perform a comprehensive study on the pros and cons of design choices and build connections between diffusion models and masked autoencoders.
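The masked-conditioning idea can be sketched as: leave visible patches clean, noise only the masked patches, and supervise the model on recovering them (the `model` signature, noise schedule, and masking scheme below are illustrative assumptions).

```python
import torch
import torch.nn.functional as F

def diffmae_style_loss(model, patches, t, mask_ratio=0.75, T=100):
    """patches: (B, N, D) patch embeddings; t: (B,) integer timesteps.
    Noise only the masked patches, keep visible patches clean as the
    condition, and regress the original content of the masked patches."""
    B, N, _ = patches.shape
    mask = (torch.rand(B, N, 1) < mask_ratio).float()  # 1 = masked patch
    alpha = 1.0 - t.float().view(B, 1, 1) / T          # toy noise schedule
    noised = alpha.sqrt() * patches + (1 - alpha).sqrt() * torch.randn_like(patches)
    noisy = mask * noised + (1 - mask) * patches       # visible patches stay clean
    pred = model(noisy, t, mask)                       # assumed signature
    return F.mse_loss(pred * mask, patches * mask)     # loss only on masked patches
```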
arXiv Detail & Related papers (2023-04-06T17:59:56Z)