Multi-fingered Robotic Hand Grasping in Cluttered Environments through Hand-object Contact Semantic Mapping
- URL: http://arxiv.org/abs/2404.08844v2
- Date: Sun, 22 Sep 2024 11:46:14 GMT
- Title: Multi-fingered Robotic Hand Grasping in Cluttered Environments through Hand-object Contact Semantic Mapping
- Authors: Lei Zhang, Kaixin Bai, Guowen Huang, Zhenshan Bing, Zhaopeng Chen, Alois Knoll, Jianwei Zhang
- Abstract summary: We develop a method for generating multi-fingered hand grasp samples in cluttered settings through contact semantic maps.
We also propose a multi-modal multi-fingered grasping dataset generation method.
- Score: 14.674925349389179
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Deep learning models have significantly advanced dexterous manipulation techniques for multi-fingered hand grasping. However, contact-information-guided grasping in cluttered environments remains largely underexplored. To address this gap, we have developed a method for generating multi-fingered hand grasp samples in cluttered settings through contact semantic maps. We introduce a contact semantic conditional variational autoencoder network (CoSe-CVAE) for creating comprehensive contact semantic maps from object point clouds. We use a grasp detection method to estimate hand grasp poses from the contact semantic map. Finally, a unified grasp evaluation model is designed to assess grasp quality and collision probability, substantially improving the reliability of identifying optimal grasps in cluttered scenarios. Our grasp generation method outperforms state-of-the-art methods by at least 4.65%, with an 81.0% average grasping success rate in real-world single-object environments and a 75.3% success rate in cluttered scenes. We also propose a multi-modal multi-fingered grasping dataset generation method. Our multi-fingered hand grasping dataset surpasses previous datasets in scene and modality diversity. The dataset, code, and supplementary materials can be found at https://sites.google.com/view/ffh-cluttered-grasping.
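As a loose illustration of the final step described in the abstract (selecting the most reliable grasp once each candidate has a quality score and a collision probability), here is a minimal Python sketch. The function names, the data layout, and the linear quality-minus-collision scoring rule are assumptions for illustration, not the paper's actual unified evaluation model.

```python
def select_best_grasp(candidates, collision_weight=1.0):
    """candidates: list of dicts with 'quality' and 'collision_prob' in [0, 1].
    Returns the candidate that maximizes quality while penalizing collisions."""
    def score(g):
        return g["quality"] - collision_weight * g["collision_prob"]
    return max(candidates, key=score)

grasps = [
    {"name": "g1", "quality": 0.9, "collision_prob": 0.6},
    {"name": "g2", "quality": 0.7, "collision_prob": 0.1},
]
best = select_best_grasp(grasps)
# g2 scores 0.6 versus g1's 0.3, so g2 is selected despite lower raw quality
```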
Related papers
- A Paradigm Shift in Mouza Map Vectorization: A Human-Machine Collaboration Approach [2.315458677488431]
Current manual digitization methods are time-consuming and labor-intensive.
Our study proposes a semi-automated approach to streamline the digitization process, saving both time and human resources.
arXiv Detail & Related papers (2024-10-21T12:47:36Z) - HandDiff: 3D Hand Pose Estimation with Diffusion on Image-Point Cloud [60.47544798202017]
Hand pose estimation is a critical task in various human-computer interaction applications.
This paper proposes HandDiff, a diffusion-based hand pose estimation model that iteratively denoises accurate hand pose conditioned on hand-shaped image-point clouds.
Experimental results demonstrate that the proposed HandDiff significantly outperforms the existing approaches on four challenging hand pose benchmark datasets.
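As a toy illustration of the iterative-denoising idea behind diffusion-based pose estimation: start from a noisy pose and repeatedly move it toward the model's conditional prediction of the clean pose. The `predict_clean` function here is a stand-in; the real HandDiff conditions a learned network on image-point cloud features.

```python
def iterative_denoise(noisy_pose, predict_clean, steps=10):
    """Iteratively refine a pose (list of floats) toward the model's
    prediction of the clean pose, with step size growing as t -> 0."""
    pose = list(noisy_pose)
    for t in range(steps, 0, -1):
        target = predict_clean(pose)  # conditional estimate of the clean pose
        alpha = 1.0 / t               # take larger steps late in the schedule
        pose = [p + alpha * (c - p) for p, c in zip(pose, target)]
    return pose
```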
arXiv Detail & Related papers (2024-04-04T02:15:16Z) - Towards General Visual-Linguistic Face Forgery Detection [95.73987327101143]
Deepfakes are realistic face manipulations that can pose serious threats to security, privacy, and trust.
Existing methods mostly treat this task as binary classification, which uses digital labels or mask signals to train the detection model.
We propose a novel paradigm named Visual-Linguistic Face Forgery Detection(VLFFD), which uses fine-grained sentence-level prompts as the annotation.
arXiv Detail & Related papers (2023-07-31T10:22:33Z) - Simultaneous prediction of hand gestures, handedness, and hand keypoints using thermal images [0.6087960723103347]
We propose a technique for simultaneous hand gesture classification, handedness detection, and hand keypoints localization using thermal data captured by an infrared camera.
Our method uses a novel deep multi-task learning architecture that includes shared encoder-decoder layers followed by three branches, one dedicated to each of the three tasks.
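A schematic of the shared-encoder, multi-branch layout described above, with trivial stand-in functions in place of real network layers; all names and the toy feature computation are illustrative assumptions.

```python
def shared_encoder(thermal_image):
    # Stand-in for the shared encoder-decoder layers:
    # here, just simple summary statistics of the pixel values.
    mean = sum(thermal_image) / len(thermal_image)
    peak = max(thermal_image)
    return (mean, peak)

# One lightweight head per task, all consuming the same shared features.
TASK_HEADS = {
    "gesture": lambda f: "open_palm" if f[1] > 0.8 else "fist",
    "handedness": lambda f: "left" if f[0] > 0.5 else "right",
    "keypoints": lambda f: [(f[0], f[1])] * 21,  # 21 hand keypoints
}

def multi_task_forward(thermal_image):
    features = shared_encoder(thermal_image)
    return {task: head(features) for task, head in TASK_HEADS.items()}
```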
arXiv Detail & Related papers (2023-03-02T19:25:40Z) - SurfEmb: Dense and Continuous Correspondence Distributions for Object Pose Estimation with Learnt Surface Embeddings [2.534402217750793]
We present an approach to learn dense, continuous 2D-3D correspondence distributions over the surface of objects from data.
We also present a new method for 6D pose estimation of rigid objects using the learnt distributions to sample, score and refine pose hypotheses.
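The sample-score part of the pattern described above, reduced to a toy 2D-translation problem: propose pose hypotheses, score each by how well it explains the observed correspondences, and keep the best. The residual-based scoring function is an illustrative assumption; SurfEmb scores hypotheses against learned correspondence distributions.

```python
def score_pose(translation, model_pts, observed_pts):
    """Negative sum of squared residuals between the transformed model
    points and the observed points: higher is better."""
    err = 0.0
    for (mx, my), (ox, oy) in zip(model_pts, observed_pts):
        dx = mx + translation[0] - ox
        dy = my + translation[1] - oy
        err += dx * dx + dy * dy
    return -err

def best_pose(hypotheses, model_pts, observed_pts):
    return max(hypotheses, key=lambda t: score_pose(t, model_pts, observed_pts))
```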
arXiv Detail & Related papers (2021-11-26T13:39:38Z) - ProxyFAUG: Proximity-based Fingerprint Augmentation [81.15016852963676]
ProxyFAUG is a rule-based, proximity-based method of fingerprint augmentation.
The best performing positioning method on this dataset is improved by 40% in terms of median error and 6% in terms of mean error, with the use of the augmented dataset.
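One way to sketch the proximity-based augmentation idea in Python: blend each fingerprint's signal-strength vector with those of neighbours recorded nearby to create synthetic fingerprints. The averaging rule and data layout are assumptions for illustration; ProxyFAUG's actual generation rules differ.

```python
def augment_by_proximity(samples, radius=1.0):
    """samples: list of (position, rssi_vector) pairs with 2D positions.
    For each sample, blend its RSSI vector with each neighbour's vector
    within `radius` to create synthetic fingerprints at that position."""
    synthetic = []
    for i, (pos, rssi) in enumerate(samples):
        for j, (p, other) in enumerate(samples):
            if j == i:
                continue
            if (p[0] - pos[0]) ** 2 + (p[1] - pos[1]) ** 2 <= radius ** 2:
                blended = [(a + b) / 2 for a, b in zip(rssi, other)]
                synthetic.append((pos, blended))
    return synthetic
```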
arXiv Detail & Related papers (2021-02-04T15:59:30Z) - Multi-FinGAN: Generative Coarse-To-Fine Sampling of Multi-Finger Grasps [46.316638161863025]
We present Multi-FinGAN, a fast generative multi-finger grasp sampling method that synthesizes high quality grasps directly from RGB-D images in about a second.
We experimentally validate and benchmark our method against a standard grasp-sampling method on 790 grasps in simulation and 20 grasps on a real Franka Emika Panda.
Remarkably, our approach is up to 20-30 times faster than the baseline, a significant improvement that opens the door to feedback-based grasp re-planning and task informative grasping.
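The coarse-to-fine pattern named in the title can be sketched as: generate many cheap coarse candidates, keep the top-k by score, and spend refinement effort only on those. The generic callbacks below are stand-ins for Multi-FinGAN's generative stages.

```python
def coarse_to_fine(sample_coarse, refine, score, n_coarse=100, k=5):
    """Generate n_coarse cheap candidates, keep the k best by `score`,
    and refine only those -- the coarse-to-fine sampling pattern."""
    coarse = [sample_coarse() for _ in range(n_coarse)]
    top_k = sorted(coarse, key=score, reverse=True)[:k]
    return [refine(g) for g in top_k]
```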
arXiv Detail & Related papers (2020-12-17T16:08:18Z) - MVHM: A Large-Scale Multi-View Hand Mesh Benchmark for Accurate 3D Hand Pose Estimation [32.12879364117658]
Estimating 3D hand poses from a single RGB image is challenging because depth ambiguity makes the problem ill-posed.
We design a spin match algorithm that enables a rigid mesh model matching with any target mesh ground truth.
We present a multi-view hand pose estimation approach to verify that training a hand pose estimator with our generated dataset greatly enhances the performance.
arXiv Detail & Related papers (2020-04-28T12:03:14Z) - Leveraging Photometric Consistency over Time for Sparsely Supervised Hand-Object Reconstruction [118.21363599332493]
We present a method to leverage photometric consistency across time when annotations are only available for a sparse subset of frames in a video.
Our model is trained end-to-end on color images to jointly reconstruct hands and objects in 3D by inferring their poses.
We achieve state-of-the-art results on 3D hand-object reconstruction benchmarks and demonstrate that our approach allows us to improve the pose estimation accuracy.
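The training setup described above can be sketched as a combined objective: a supervised pose term evaluated only on the sparse annotated frames, plus a photometric term on every frame comparing the rendered reconstruction against the observed image. The scalar-pose and list-as-image representations below are toy assumptions, not the paper's formulation.

```python
def mse(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

def total_loss(frames, sparse_gt, predict_pose, render, photo_weight=0.1):
    """frames: list of images (lists of floats).
    sparse_gt: dict mapping annotated frame index -> ground-truth pose."""
    poses = [predict_pose(f) for f in frames]
    # Supervised term only where annotations exist (the sparse subset).
    sup = sum((poses[i] - gt) ** 2 for i, gt in sparse_gt.items())
    # Photometric term on every frame: the rendered hand-object image
    # should be consistent with the observed pixels.
    photo = sum(mse(render(p), f) for p, f in zip(poses, frames))
    return sup + photo_weight * photo
```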
arXiv Detail & Related papers (2020-04-28T12:03:14Z) - JHU-CROWD++: Large-Scale Crowd Counting Dataset and A Benchmark Method [92.15895515035795]
We introduce a new large-scale unconstrained crowd counting dataset (JHU-CROWD++) that contains 4,372 images with 1.51 million annotations.
We propose a novel crowd counting network that progressively generates crowd density maps via residual error estimation.
arXiv Detail & Related papers (2020-04-07T14:59:35Z)
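The progressive residual refinement described in the JHU-CROWD++ entry can be sketched as: start from a coarse density map and let each stage add a predicted residual correction. The list-based maps and callback stages below are toy stand-ins for the paper's convolutional stages.

```python
def progressive_density(coarse_map, residual_stages):
    """coarse_map: initial density estimate (flat list of floats).
    Each stage predicts a residual correction added to the current map."""
    density = list(coarse_map)
    for predict_residual in residual_stages:
        residual = predict_residual(density)
        density = [d + r for d, r in zip(density, residual)]
    return density

def crowd_count(density_map):
    # Total count is the integral (here, sum) of the density map.
    return sum(density_map)
```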
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.