DexGraspNet 2.0: Learning Generative Dexterous Grasping in Large-scale Synthetic Cluttered Scenes
- URL: http://arxiv.org/abs/2410.23004v1
- Date: Wed, 30 Oct 2024 13:30:39 GMT
- Title: DexGraspNet 2.0: Learning Generative Dexterous Grasping in Large-scale Synthetic Cluttered Scenes
- Authors: Jialiang Zhang, Haoran Liu, Danshi Li, Xinqiang Yu, Haoran Geng, Yufei Ding, Jiayi Chen, He Wang
- Abstract summary: We present a large-scale synthetic benchmark, encompassing 1319 objects, 8270 scenes, and 427 million grasps.
We also propose a novel two-stage grasping method that learns efficiently from data by using a diffusion model that conditions on local geometry.
With the aid of test-time depth restoration, our method demonstrates zero-shot sim-to-real transfer, attaining a 90.7% real-world dexterous grasping success rate in cluttered scenes.
- Score: 18.95051035812627
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Grasping in cluttered scenes remains highly challenging for dexterous hands due to the scarcity of data. To address this problem, we present a large-scale synthetic benchmark, encompassing 1319 objects, 8270 scenes, and 427 million grasps. Beyond benchmarking, we also propose a novel two-stage grasping method that learns efficiently from data by using a diffusion model that conditions on local geometry. Our proposed generative method outperforms all baselines in simulation experiments. Furthermore, with the aid of test-time depth restoration, our method demonstrates zero-shot sim-to-real transfer, attaining a 90.7% real-world dexterous grasping success rate in cluttered scenes.
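The abstract describes the method only at a high level; the following is a minimal, hedged sketch of what a diffusion model that denoises grasp poses conditioned on local geometry can look like. All module names, dimensions, and the noise schedule are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch: DDPM-style generation of dexterous grasp poses
# conditioned on a feature of the geometry around a candidate grasp point.
import torch
import torch.nn as nn

GRASP_DIM = 25   # assumed: 3D translation + rotation + hand joint angles
FEAT_DIM = 128   # assumed size of the local-geometry feature
T_STEPS = 100    # assumed number of diffusion steps

class GeometryEncoder(nn.Module):
    """Encodes points near the grasp center into a conditioning feature."""
    def __init__(self):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(3, 64), nn.ReLU(),
                                 nn.Linear(64, FEAT_DIM))

    def forward(self, local_points):                      # (B, N, 3)
        return self.mlp(local_points).max(dim=1).values   # (B, FEAT_DIM)

class GraspDenoiser(nn.Module):
    """Predicts the noise added to a grasp pose, given geometry and timestep."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(GRASP_DIM + FEAT_DIM + 1, 256), nn.ReLU(),
            nn.Linear(256, GRASP_DIM))

    def forward(self, noisy_grasp, cond, t):
        t = t.float().unsqueeze(-1) / T_STEPS             # normalized timestep
        return self.net(torch.cat([noisy_grasp, cond, t], dim=-1))

betas = torch.linspace(1e-4, 0.02, T_STEPS)               # standard DDPM schedule
alphas_bar = torch.cumprod(1.0 - betas, dim=0)

@torch.no_grad()
def sample_grasps(denoiser, cond, n):
    """Ancestral DDPM sampling of n grasp poses for one scene feature."""
    x = torch.randn(n, GRASP_DIM)
    cond = cond.expand(n, -1)
    for t in reversed(range(T_STEPS)):
        eps = denoiser(x, cond, torch.full((n,), t))
        x = (x - betas[t] / torch.sqrt(1.0 - alphas_bar[t]) * eps) \
            / torch.sqrt(1.0 - betas[t])
        if t > 0:                                         # add noise except at t=0
            x = x + torch.sqrt(betas[t]) * torch.randn_like(x)
    return x

# Usage under the assumed shapes: encode 512 points around a candidate
# grasp center, then sample 32 grasp hypotheses for that neighborhood.
enc, den = GeometryEncoder(), GraspDenoiser()
cond = enc(torch.randn(1, 512, 3))
grasps = sample_grasps(den, cond, n=32)
```

Conditioning on local rather than whole-scene geometry is plausibly what lets such a model transfer across cluttered scenes: the denoiser only ever sees the neighborhood of a candidate grasp.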
Related papers
- Event-based Visual Deformation Measurement [76.25283405575108]
Visual Deformation Measurement aims to recover dense deformation fields by tracking surface motion from camera observations.
Traditional image-based methods rely on minimal inter-frame motion to constrain the correspondence search space.
We propose an event-frame fusion framework that exploits events for temporally dense motion cues and frames for spatially dense, precise estimation.
arXiv Detail & Related papers (2026-02-16T01:04:48Z)
- Faithful Contouring: Near-Lossless 3D Voxel Representation Free from Iso-surface [49.742538510885]
We propose Faithful Contouring, a voxelized representation that achieves near-lossless fidelity for 3D meshes.
The proposed method also offers flexibility and improvements over existing representations.
arXiv Detail & Related papers (2025-11-06T03:56:12Z)
- Cross-Embodiment Dexterous Hand Articulation Generation via Morphology-Aware Learning [82.63833405368159]
Existing end-to-end methods require training on large-scale datasets for specific hands.
We propose an eigengrasp-based, end-to-end framework for cross-embodiment grasp generation (a minimal sketch of the eigengrasp idea follows this entry).
arXiv Detail & Related papers (2025-10-07T15:57:00Z)
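The eigengrasp idea referenced in the entry above is commonly realized as PCA over hand joint configurations: a few principal components ("synergies") span most grasp variation, so grasps can be generated in a low-dimensional space and decoded to full articulations. Below is a minimal sketch under assumed shapes (22 joints, 5 components); it illustrates the general technique, not the paper's morphology-aware framework.

```python
# Minimal eigengrasp sketch: PCA over a set of hand joint configurations.
import numpy as np

def fit_eigengrasps(joint_configs, k=5):
    """joint_configs: (num_grasps, num_joints) array of articulations."""
    mean = joint_configs.mean(axis=0)
    # Right singular vectors of the centered data are the eigengrasps.
    _, _, vt = np.linalg.svd(joint_configs - mean, full_matrices=False)
    return mean, vt[:k]                    # (num_joints,), (k, num_joints)

def decode(coeffs, mean, eigengrasps):
    """Map low-dimensional synergy coefficients back to joint angles."""
    return mean + coeffs @ eigengrasps

# Example with random data standing in for a real grasp dataset.
data = np.random.rand(1000, 22)            # 1000 grasps, 22 assumed DoF
mean, basis = fit_eigengrasps(data, k=5)
pose = decode(np.array([0.3, -0.1, 0.0, 0.2, 0.05]), mean, basis)
```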
- GraspClutter6D: A Large-scale Real-world Dataset for Robust Perception and Grasping in Cluttered Scenes [5.289647064481469]
We present GraspClutter6D, a large-scale real-world grasping dataset featuring 1,000 cluttered scenes with dense arrangements.
We benchmark state-of-the-art segmentation, object pose estimation, and grasping detection methods to provide key insights into challenges in cluttered environments.
We validate the dataset's effectiveness as a training resource, demonstrating that grasping networks trained on GraspClutter6D significantly outperform those trained on existing datasets in both simulation and real-world experiments.
arXiv Detail & Related papers (2025-04-09T13:15:46Z)
- Diffusion Suction Grasping with Large-Scale Parcel Dataset [6.112197264635304]
We present Parcel-Suction-Dataset, a large-scale synthetic dataset containing 25 thousand cluttered scenes with 410 million precision-annotated suction grasp poses.
This dataset is generated through our novel geometric sampling algorithm that enables efficient generation of optimal suction grasps.
We also propose Diffusion-Suction, an innovative framework that reformulates suction grasp prediction as a conditional generation task.
arXiv Detail & Related papers (2025-02-11T04:09:11Z)
- HybridNeRF: Efficient Neural Rendering via Adaptive Volumetric Surfaces [71.1071688018433]
Neural radiance fields provide state-of-the-art view synthesis quality but tend to be slow to render.
We propose a method, HybridNeRF, that leverages the strengths of both representations by rendering most objects as surfaces.
We improve error rates by 15-30% while achieving real-time framerates (at least 36 FPS) at virtual-reality resolutions (2K×2K).
arXiv Detail & Related papers (2023-12-05T22:04:49Z)
- DNS SLAM: Dense Neural Semantic-Informed SLAM [92.39687553022605]
DNS SLAM is a novel neural RGB-D semantic SLAM approach featuring a hybrid representation.
Our method integrates multi-view geometry constraints with image-based feature extraction to improve appearance details.
Our method achieves state-of-the-art tracking performance on both synthetic and real-world data.
arXiv Detail & Related papers (2023-11-30T21:34:44Z)
- Randomize to Generalize: Domain Randomization for Runway FOD Detection [1.4249472316161877]
Tiny Object Detection is challenging due to small size, low resolution, occlusion, background clutter, lighting conditions and small object-to-image ratio.
We propose a novel two-stage methodology, Synthetic Randomized Image Augmentation (SRIA), to enhance the generalization capabilities of models trained on 2D datasets.
We report that detection accuracy improved from an initial 41% to 92% on an out-of-distribution (OOD) test set.
arXiv Detail & Related papers (2023-09-23T05:02:31Z)
- Generalizing Event-Based Motion Deblurring in Real-World Scenarios [62.995994797897424]
Event-based motion deblurring has shown promising results by exploiting low-latency events.
We propose a scale-aware network that allows flexible input spatial scales and enables learning from different temporal scales of motion blur.
A two-stage self-supervised learning scheme is then developed to fit the real-world data distribution.
arXiv Detail & Related papers (2023-08-11T04:27:29Z)
- Sim-Suction: Learning a Suction Grasp Policy for Cluttered Environments Using a Synthetic Benchmark [8.025760743074066]
Sim-Suction is a robust object-aware suction grasp policy for mobile manipulation platforms with dynamic camera viewpoints.
Sim-Suction-Dataset comprises 500 cluttered environments with 3.2 million annotated suction grasp poses.
Sim-Suction-Pointnet generates robust 6D suction grasp poses by learning point-wise affordances from the Sim-Suction-Dataset.
arXiv Detail & Related papers (2023-05-25T15:31:08Z)
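The Sim-Suction entry above describes learning point-wise affordances. A hedged sketch of that general pattern follows: score every scene point with a small per-point head, then read 6D suction poses off the top-scoring points and their surface normals. Names, shapes, and the pose convention are assumptions, not the Sim-Suction-Pointnet architecture.

```python
# Hypothetical point-wise suction affordance prediction.
import torch
import torch.nn as nn

class AffordanceHead(nn.Module):
    """Per-point MLP mapping point features to a suction-success score."""
    def __init__(self, feat_dim=64):
        super().__init__()
        self.score = nn.Sequential(
            nn.Linear(feat_dim, 32), nn.ReLU(),
            nn.Linear(32, 1), nn.Sigmoid())

    def forward(self, point_feats):                   # (N, feat_dim)
        return self.score(point_feats).squeeze(-1)    # (N,) scores in [0, 1]

def top_suction_poses(points, normals, scores, k=10):
    """A 6D suction pose: contact point plus approach along the inward normal."""
    idx = torch.topk(scores, k).indices
    return points[idx], -normals[idx]                 # positions, approaches

# Usage with random stand-ins for backbone features and scene geometry.
head = AffordanceHead()
feats = torch.randn(2048, 64)
pts, nrms = torch.randn(2048, 3), torch.randn(2048, 3)
positions, approaches = top_suction_poses(pts, nrms, head(feats), k=10)
```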
- Re-Evaluating LiDAR Scene Flow for Autonomous Driving [80.37947791534985]
Popular benchmarks for self-supervised LiDAR scene flow have unrealistic rates of dynamic motion, unrealistic correspondences, and unrealistic sampling patterns.
We evaluate a suite of top methods on several real-world datasets.
We show that despite the emphasis placed on learning, most performance gains are caused by pre- and post-processing steps.
arXiv Detail & Related papers (2023-04-04T22:45:50Z)
- Minimizing the Accumulated Trajectory Error to Improve Dataset Distillation [151.70234052015948]
We propose a novel approach that encourages the optimization algorithm to seek a flat trajectory.
We show that weights trained on synthetic data are robust against accumulated error perturbations when regularized towards a flat trajectory.
Our method, called Flat Trajectory Distillation (FTD), is shown to boost the performance of gradient-matching methods by up to 4.7%.
arXiv Detail & Related papers (2022-11-20T15:49:11Z)
- Delving into High-Quality Synthetic Face Occlusion Segmentation Datasets [83.749895930242]
We propose two techniques for producing high-quality naturalistic synthetic occluded faces.
We empirically show the effectiveness and robustness of both methods, even for unseen occlusions.
We present two high-resolution real-world occluded face datasets with fine-grained annotations, RealOcc and RealOcc-Wild.
arXiv Detail & Related papers (2022-05-12T17:03:57Z)
- SuctionNet-1Billion: A Large-Scale Benchmark for Suction Grasping [47.221326169627666]
We propose a new physical model to analytically evaluate the seal formation and wrench resistance of a suction grasp (a hedged sketch follows this entry).
A two-step methodology is adopted to generate annotations on a large-scale dataset collected in real-world cluttered scenarios.
A standard online evaluation system is proposed to evaluate suction poses in continuous operation space.
arXiv Detail & Related papers (2021-03-23T05:02:52Z)
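The SuctionNet-1Billion entry above mentions analytic evaluation of seal formation and wrench resistance. Below is a hedged sketch of what such scores can look like: seal quality from how flat the surface is under the cup ring, and wrench resistance from the gravity torque the seal must counter. The paper's actual formulation differs; the length scale and resisting-moment proxy here are assumptions.

```python
# Hypothetical analytic suction-quality scores.
import numpy as np

def seal_score(ring_points, center, normal):
    """Flatter geometry under the suction-cup ring -> better vacuum seal."""
    deviations = (ring_points - center) @ normal      # distance to cup plane
    return float(np.exp(-np.abs(deviations).max() / 2e-3))  # 2 mm scale, assumed

def wrench_score(center, com, mass, cup_radius=0.01):
    """Gravity torque about the contact vs. a crude resisting-moment proxy."""
    g = np.array([0.0, 0.0, -9.81])
    torque = np.linalg.norm(np.cross(com - center, mass * g))
    limit = 0.5 * mass * 9.81 * cup_radius            # assumed seal limit
    return float(np.clip(1.0 - torque / (limit + 1e-9), 0.0, 1.0))

# Usage: a 1 kg object grasped 2 mm off its center of mass.
ring = np.random.randn(64, 3) * 1e-3
print(seal_score(ring, np.zeros(3), np.array([0.0, 0.0, 1.0])))
print(wrench_score(np.zeros(3), np.array([0.002, 0.0, 0.0]), mass=1.0))
```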
- Monocular Real-Time Volumetric Performance Capture [28.481131687883256]
We present the first approach to volumetric performance capture and novel-view rendering at real-time speed from monocular video.
Our system reconstructs a fully textured 3D human from each frame by leveraging Pixel-Aligned Implicit Function (PIFu).
We also introduce an Online Hard Example Mining (OHEM) technique that effectively suppresses failure modes due to the rare occurrence of challenging examples.
arXiv Detail & Related papers (2020-07-28T04:45:13Z)
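The entry above mentions Online Hard Example Mining (OHEM). In its standard form, OHEM backpropagates only the highest-loss samples in each batch so rare, difficult examples dominate the gradient; a minimal sketch follows (the keep ratio is an assumed hyperparameter).

```python
# Minimal OHEM: average the loss over only the hardest k samples per batch.
import torch

def ohem_loss(per_sample_loss, keep_ratio=0.25):
    """per_sample_loss: (B,) unreduced losses; returns mean over hardest k."""
    k = max(1, int(per_sample_loss.numel() * keep_ratio))
    hard, _ = torch.topk(per_sample_loss, k)
    return hard.mean()

# Usage with an unreduced criterion:
pred, target = torch.randn(32, 10), torch.randn(32, 10)
losses = torch.nn.functional.mse_loss(pred, target, reduction="none").mean(dim=1)
loss = ohem_loss(losses)   # backpropagate this instead of losses.mean()
```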
This list is automatically generated from the titles and abstracts of the papers on this site.