Related papers: ET-SEED: Efficient Trajectory-Level SE(3) Equivariant Diffusion Policy

URL: http://arxiv.org/abs/2411.03990v1
Date: Wed, 06 Nov 2024 15:30:42 GMT
Title: ET-SEED: Efficient Trajectory-Level SE(3) Equivariant Diffusion Policy
Authors: Chenrui Tie, Yue Chen, Ruihai Wu, Boxuan Dong, Zeyi Li, Chongkai Gao, Hao Dong
Abstract summary: We propose ET-SEED, an efficient trajectory-level SE(3) equivariant diffusion model for generating action sequences in complex robot manipulation tasks. We theoretically extend equivariant Markov kernels and simplify the condition of the equivariant diffusion process. Experiments demonstrate superior data efficiency and manipulation proficiency of our proposed method, as well as its ability to generalize to unseen configurations.
Score: 11.454229873419697
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Imitation learning, e.g., diffusion policy, has been proven effective in various robotic manipulation tasks. However, extensive demonstrations are required for policy robustness and generalization. To reduce the reliance on demonstrations, we leverage spatial symmetry and propose ET-SEED, an efficient trajectory-level SE(3) equivariant diffusion model for generating action sequences in complex robot manipulation tasks. Further, previous equivariant diffusion models require per-step equivariance in the Markov process, making it difficult to learn a policy under such strong constraints. We theoretically extend equivariant Markov kernels and simplify the condition of the equivariant diffusion process, thereby significantly improving training efficiency for trajectory-level SE(3) equivariant diffusion policy in an end-to-end manner. We evaluate ET-SEED on representative robotic manipulation tasks involving rigid-body, articulated, and deformable objects. Experiments demonstrate superior data efficiency and manipulation proficiency of our proposed method, as well as its ability to generalize to unseen configurations with only a few demonstrations. Website: https://et-seed.github.io/

Related papers

Dita: Scaling Diffusion Transformer for Generalist Vision-Language-Action Policy [56.424032454461695]
We present Dita, a scalable framework that leverages Transformer architectures to directly denoise continuous action sequences. Dita employs in-context conditioning -- enabling fine-grained alignment between denoised actions and raw visual tokens from historical observations. Dita effectively integrates cross-embodiment datasets across diverse camera perspectives, observation scenes, tasks, and action spaces.
arXiv Detail & Related papers (2025-03-25T15:19:56Z)
DiffMoE: Dynamic Token Selection for Scalable Diffusion Transformers [86.5541501589166]
DiffMoE is a batch-level global token pool that enables experts to access global token distributions during training. It achieves state-of-the-art performance among diffusion models on ImageNet benchmark. The effectiveness of our approach extends beyond class-conditional generation to more challenging tasks such as text-to-image generation.
arXiv Detail & Related papers (2025-03-18T17:57:07Z)
Large Language-Geometry Model: When LLM meets Equivariance [53.8505081745406]
We propose EquiLLM, a novel framework for representing 3D physical systems. We show that EquiLLM delivers significant improvements over previous methods across molecular dynamics simulation, human motion simulation, and antibody design.
arXiv Detail & Related papers (2025-02-16T14:50:49Z)
Variational Schrödinger Diffusion Models [14.480273869571468]
Schrödinger bridge (SB) has emerged as the go-to method for optimizing transportation plans in diffusion models. We leverage variational inference to linearize the forward score functions (variational scores) of SB. We propose the variational Schrödinger diffusion model (VSDM), where the forward process is a multivariate diffusion and the variational scores are adaptively optimized for efficient transport.
arXiv Detail & Related papers (2024-05-08T04:01:40Z)
Equivariant Spatio-Temporal Self-Supervision for LiDAR Object Detection [37.142470149311904]
We propose a spatio-temporal equivariant learning framework by considering both spatial and temporal augmentations jointly. We show that our pre-training method for 3D object detection outperforms existing equivariant and invariant approaches in many settings.
arXiv Detail & Related papers (2024-04-17T20:41:49Z)
SE(3) Diffusion Model-based Point Cloud Registration for Robust 6D Object Pose Estimation [66.16525145765604]
We introduce an SE(3) diffusion model-based point cloud registration framework for 6D object pose estimation in real-world scenarios. Our approach formulates the 3D registration task as a denoising diffusion process, which progressively refines the pose of the source point cloud. Experiments demonstrate that our diffusion registration framework presents outstanding pose estimation performance on the real-world TUD-L, LINEMOD, and Occluded-LINEMOD datasets.
arXiv Detail & Related papers (2023-10-26T12:47:26Z)
Diffusion-EDFs: Bi-equivariant Denoising Generative Modeling on SE(3) for Visual Robotic Manipulation [5.11432473998551]
Diffusion-EDFs is a novel SE(3)-equivariant diffusion-based approach for visual robotic manipulation tasks. We show that our proposed method achieves remarkable data efficiency, requiring only 5 to 10 human demonstrations for effective end-to-end training in less than an hour.
arXiv Detail & Related papers (2023-09-06T03:42:20Z)
Distributionally Robust Model-based Reinforcement Learning with Large State Spaces [55.14361269378122]
Three major challenges in reinforcement learning are the complex dynamical systems with large state spaces, the costly data acquisition processes, and the deviation of real-world dynamics from the training environment deployment. We study distributionally robust Markov decision processes with continuous state spaces under the widely used Kullback-Leibler, chi-square, and total variation uncertainty sets. We propose a model-based approach that utilizes Gaussian Processes and the maximum variance reduction algorithm to efficiently learn multi-output nominal transition dynamics.
arXiv Detail & Related papers (2023-09-05T13:42:11Z)
EqMotion: Equivariant Multi-agent Motion Prediction with Invariant Interaction Reasoning [83.11657818251447]
We propose EqMotion, an efficient equivariant motion prediction model with invariant interaction reasoning. We conduct experiments for the proposed model on four distinct scenarios: particle dynamics, molecule dynamics, human skeleton motion prediction and pedestrian trajectory prediction. Our method achieves state-of-the-art prediction performances on all the four tasks, improving by 24.0/30.1/8.6/9.2%.
arXiv Detail & Related papers (2023-03-20T05:23:46Z)
Modiff: Action-Conditioned 3D Motion Generation with Denoising Diffusion Probabilistic Models [58.357180353368896]
We propose a conditional paradigm that benefits from the denoising diffusion probabilistic model (DDPM) to tackle the problem of realistic and diverse action-conditioned 3D skeleton-based motion generation. This is a pioneering attempt to use DDPM to synthesize a variable number of motion sequences conditioned on a categorical action.
arXiv Detail & Related papers (2023-01-10T13:15:42Z)
Optimizing Training Trajectories in Variational Autoencoders via Latent Bayesian Optimization Approach [0.0]
Unsupervised and semi-supervised ML methods have become widely adopted across multiple areas of physics, chemistry, and materials sciences. We propose a latent Bayesian optimization (zBO) approach for the hyper parameter trajectory optimization for the unsupervised and semi-supervised ML. We demonstrate an application of this method for finding joint discrete and continuous rotationally invariant representations for MNIST and experimental data of a plasmonic nanoparticles material system.
arXiv Detail & Related papers (2022-06-30T23:41:47Z)
MACE: An Efficient Model-Agnostic Framework for Counterfactual Explanation [132.77005365032468]
We propose a novel framework of Model-Agnostic Counterfactual Explanation (MACE). In our MACE approach, we propose a novel RL-based method for finding good counterfactual examples and a gradient-less descent method for improving proximity. Experiments on public datasets validate the effectiveness with better validity, sparsity and proximity.
arXiv Detail & Related papers (2022-05-31T04:57:06Z)
Strictly Batch Imitation Learning by Energy-based Distribution Matching [104.33286163090179]
Consider learning a policy purely on the basis of demonstrated behavior -- that is, with no access to reinforcement signals, no knowledge of transition dynamics, and no further interaction with the environment. One solution is simply to retrofit existing algorithms for apprenticeship learning to work in the offline setting. But such an approach leans heavily on off-policy evaluation or offline model estimation, and can be indirect and inefficient. We argue that a good solution should be able to explicitly parameterize a policy, implicitly learn from rollout dynamics, and operate in an entirely offline fashion.
arXiv Detail & Related papers (2020-06-25T03:27:59Z)
Stabilizing Training of Generative Adversarial Nets via Langevin Stein Variational Gradient Descent [11.329376606876101]
We propose to stabilize GAN training via a novel particle-based variational inference -- Langevin Stein variational gradient descent (LSVGD). We show that LSVGD dynamics has an implicit regularization which is able to enhance particles' spread-out and diversity.
arXiv Detail & Related papers (2020-04-22T11:20:04Z)

This list is automatically generated from the titles and abstracts of the papers on this site.