GraspGen: A Diffusion-based Framework for 6-DOF Grasping with On-Generator Training
- URL: http://arxiv.org/abs/2507.13097v1
- Date: Thu, 17 Jul 2025 13:09:28 GMT
- Title: GraspGen: A Diffusion-based Framework for 6-DOF Grasping with On-Generator Training
- Authors: Adithyavairavan Murali, Balakumar Sundaralingam, Yu-Wei Chao, Wentao Yuan, Jun Yamada, Mark Carlson, Fabio Ramos, Stan Birchfield, Dieter Fox, Clemens Eppner
- Abstract summary: We build upon the recent success in modeling the object-centric grasp generation process as an iterative diffusion process. Our proposed framework, GraspGen, consists of a DiffusionTransformer architecture that enhances grasp generation, paired with an efficient discriminator to score and filter sampled grasps. To scale GraspGen to both objects and grippers, we release a new simulated dataset consisting of over 53 million grasps.
- Score: 53.25060512131128
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Grasping is a fundamental robot skill, yet despite significant research advancements, learning-based 6-DOF grasping approaches are still not turnkey and struggle to generalize across different embodiments and in-the-wild settings. We build upon the recent success in modeling the object-centric grasp generation process as an iterative diffusion process. Our proposed framework, GraspGen, consists of a DiffusionTransformer architecture that enhances grasp generation, paired with an efficient discriminator to score and filter sampled grasps. We introduce a novel and performant on-generator training recipe for the discriminator. To scale GraspGen to both objects and grippers, we release a new simulated dataset consisting of over 53 million grasps. We demonstrate that GraspGen outperforms prior methods in simulations with singulated objects across different grippers, achieves state-of-the-art performance on the FetchBench grasping benchmark, and performs well on a real robot with noisy visual observations.
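The generate-then-score pipeline the abstract describes can be sketched as follows. This is a minimal toy illustration, not GraspGen's implementation: the denoising model, the 6-DOF pose representation (flat 6-vectors here), and the discriminator are all hypothetical stand-ins.

```python
import numpy as np

rng = np.random.default_rng(0)

def denoise_step(grasps, t, total_steps):
    # Toy stand-in for one reverse-diffusion update: nudge the noisy
    # poses toward a nominal pose while injecting a shrinking amount
    # of noise. A real model would predict the update with a learned
    # network (a Diffusion Transformer in the paper's case).
    noise_scale = (total_steps - t) / total_steps
    return grasps * 0.9 + rng.normal(scale=0.1 * noise_scale, size=grasps.shape)

def generate_grasps(num_grasps=64, steps=10):
    # Start from pure noise and iteratively denoise, mirroring the
    # object-centric iterative diffusion process.
    grasps = rng.normal(size=(num_grasps, 6))
    for t in range(steps):
        grasps = denoise_step(grasps, t, steps)
    return grasps

def discriminator_score(grasps):
    # Hypothetical scorer: rate each sampled grasp; here, by
    # proximity to the zero pose. The paper instead trains a
    # discriminator on the generator's own samples.
    return -np.linalg.norm(grasps, axis=1)

def sample_and_filter(top_k=8):
    grasps = generate_grasps()
    scores = discriminator_score(grasps)
    keep = np.argsort(scores)[::-1][:top_k]  # best-scoring grasps first
    return grasps[keep], scores[keep]

best_grasps, best_scores = sample_and_filter()
print(best_grasps.shape)  # (8, 6)
```

The key design point the abstract emphasizes is the split of responsibilities: a generative model proposes diverse candidate grasps, and a separate discriminator filters them, with the discriminator trained on the generator's own outputs ("on-generator training").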
Related papers
- Exploring Training and Inference Scaling Laws in Generative Retrieval [50.82554729023865]
Generative retrieval reformulates retrieval as an autoregressive generation task, where large language models generate target documents directly from a query. We systematically investigate training and inference scaling laws in generative retrieval, exploring how model size, training data scale, and inference-time compute jointly influence performance.
arXiv Detail & Related papers (2025-03-24T17:59:03Z) - HybridGen: VLM-Guided Hybrid Planning for Scalable Data Generation of Imitation Learning [2.677995462843075]
HybridGen is an automated framework that integrates a Vision-Language Model with hybrid planning. It generates a large volume of training data without requiring specific data formats. In the most challenging task variants, HybridGen achieves significant improvement, reaching a 59.7% average success rate.
arXiv Detail & Related papers (2025-03-17T13:49:43Z) - Self-Guidance: Boosting Flow and Diffusion Generation on Their Own [32.91402070439289]
Self-Guidance (SG) improves the image quality by suppressing the generation of low-quality samples. We conduct experiments on text-to-image and text-to-video generation with different architectures.
arXiv Detail & Related papers (2024-12-08T06:32:27Z) - One-Step Diffusion Distillation through Score Implicit Matching [74.91234358410281]
We present Score Implicit Matching (SIM), a new approach to distilling pre-trained diffusion models into single-step generator models.
SIM shows strong empirical performances for one-step generators.
By applying SIM to a leading transformer-based diffusion model, we distill a single-step generator for text-to-image generation.
arXiv Detail & Related papers (2024-10-22T08:17:20Z) - QDGset: A Large Scale Grasping Dataset Generated with Quality-Diversity [2.095923926387536]
Quality-Diversity (QD) algorithms have been proven to make grasp sampling significantly more efficient.
We extend QDG-6DoF, a QD framework for generating object-centric grasps, to scale up the production of synthetic grasping datasets.
arXiv Detail & Related papers (2024-10-03T08:56:14Z) - Is Tokenization Needed for Masked Particle Modelling? [8.79008927474707]
Masked particle modeling (MPM) is a self-supervised learning scheme for constructing expressive representations of unordered sets.
We improve MPM by addressing inefficiencies in the implementation and incorporating a more powerful decoder.
We show that these new methods outperform the tokenized learning objective from the original MPM on a new test bed for foundation models for jets.
arXiv Detail & Related papers (2024-09-19T09:12:29Z) - Derivative-Free Guidance in Continuous and Discrete Diffusion Models with Soft Value-Based Decoding [84.3224556294803]
Diffusion models excel at capturing the natural design spaces of images, molecules, DNA, RNA, and protein sequences.
We aim to optimize downstream reward functions while preserving the naturalness of these design spaces.
Our algorithm integrates soft value functions, which look ahead to how intermediate noisy states lead to high rewards in the future.
arXiv Detail & Related papers (2024-08-15T16:47:59Z) - Accurate generation of stochastic dynamics based on multi-model Generative Adversarial Networks [0.0]
Generative Adversarial Networks (GANs) have shown immense potential in fields such as text and image generation.
Here we quantitatively test this approach by applying it to a prototypical process on a lattice.
Importantly, the discreteness of the model is retained despite the noise.
arXiv Detail & Related papers (2023-05-25T10:41:02Z) - Joint Generator-Ranker Learning for Natural Language Generation [99.16268050116717]
JGR is a novel joint training algorithm that integrates the generator and the ranker in a single framework.
By iteratively updating the generator and the ranker, JGR can effectively harmonize their learning and enhance their quality jointly.
arXiv Detail & Related papers (2022-06-28T12:58:30Z) - Improving Non-autoregressive Generation with Mixup Training [51.61038444990301]
We present a non-autoregressive generation model based on pre-trained transformer models.
We propose a simple and effective iterative training method called MIx Source and pseudo Target (MIST).
Our experiments on three generation benchmarks, including question generation, summarization, and paraphrase generation, show that the proposed framework achieves new state-of-the-art results.
arXiv Detail & Related papers (2021-10-21T13:04:21Z) - Multi-FinGAN: Generative Coarse-To-Fine Sampling of Multi-Finger Grasps [46.316638161863025]
We present Multi-FinGAN, a fast generative multi-finger grasp sampling method that synthesizes high quality grasps directly from RGB-D images in about a second.
We experimentally validate and benchmark our method against a standard grasp-sampling method on 790 grasps in simulation and 20 grasps on a real Franka Emika Panda.
Remarkably, our approach is up to 20-30 times faster than the baseline, a significant improvement that opens the door to feedback-based grasp re-planning and task informative grasping.
arXiv Detail & Related papers (2020-12-17T16:08:18Z) - Mixup-Transformer: Dynamic Data Augmentation for NLP Tasks [75.69896269357005]
Mixup is a recent data augmentation technique that linearly interpolates input examples and the corresponding labels.
In this paper, we explore how to apply mixup to natural language processing tasks.
We incorporate mixup into a transformer-based pre-trained architecture, named "mixup-transformer", for a wide range of NLP tasks.
arXiv Detail & Related papers (2020-10-05T23:37:30Z)
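The interpolation at the core of mixup, mentioned in the entry above, is simple to state: given two examples, draw a mixing coefficient lambda from a Beta(alpha, alpha) distribution and blend both the inputs and their one-hot labels. The sketch below is a generic illustration, not the mixup-transformer code; the paper applies the same idea to transformer representations rather than raw vectors, and the toy "embeddings" here are made up.

```python
import numpy as np

def mixup(x1, y1, x2, y2, alpha=0.2, rng=None):
    # Draw lambda ~ Beta(alpha, alpha), then form the convex
    # combinations x = lam*x1 + (1-lam)*x2 and likewise for the
    # one-hot labels, as in the original mixup formulation.
    rng = rng or np.random.default_rng(0)
    lam = rng.beta(alpha, alpha)
    x_mixed = lam * x1 + (1 - lam) * x2
    y_mixed = lam * y1 + (1 - lam) * y2
    return x_mixed, y_mixed, lam

# Mix two toy "sentence embeddings" together with their one-hot labels.
x_mixed, y_mixed, lam = mixup(
    np.array([1.0, 0.0, 0.0]), np.array([1.0, 0.0]),
    np.array([0.0, 1.0, 0.0]), np.array([0.0, 1.0]),
)
```

Because the labels are convex combinations of one-hot vectors, the mixed label still sums to one and can be used directly with a soft cross-entropy loss.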
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences arising from its use.