Related papers: Multi-fingered Robotic Hand Grasping in Cluttered Environments through Hand-object Contact Semantic Mapping

Multi-fingered Robotic Hand Grasping in Cluttered Environments through Hand-object Contact Semantic Mapping

URL: http://arxiv.org/abs/2404.08844v1
Date: Fri, 12 Apr 2024 23:11:36 GMT
Title: Multi-fingered Robotic Hand Grasping in Cluttered Environments through Hand-object Contact Semantic Mapping
Authors: Lei Zhang, Kaixin Bai, Guowen Huang, Zhaopeng Chen, Jianwei Zhang,
Abstract summary: We develop a novel method for generating five-fingered hand grasp samples in cluttered settings. A key aspect of our approach is our data generation method, capable of estimating contact spatial and semantic representations. We introduce a unique grasp detection technique that efficiently formulates mechanical hand grasp poses from these maps.
Score: 8.11121483911344
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: The integration of optimization method and generative models has significantly advanced dexterous manipulation techniques for five-fingered hand grasping. Yet, the application of these techniques in cluttered environments is a relatively unexplored area. To address this research gap, we have developed a novel method for generating five-fingered hand grasp samples in cluttered settings. This method emphasizes simulated grasp quality and the nuanced interaction between the hand and surrounding objects. A key aspect of our approach is our data generation method, capable of estimating contact spatial and semantic representations and affordance grasps based on object affordance information. Furthermore, our Contact Semantic Conditional Variational Autoencoder (CoSe-CVAE) network is adept at creating comprehensive contact maps from point clouds, incorporating both spatial and semantic data. We introduce a unique grasp detection technique that efficiently formulates mechanical hand grasp poses from these maps. Additionally, our evaluation model is designed to assess grasp quality and collision probability, significantly improving the practicality of five-fingered hand grasping in complex scenarios. Our data generation method outperforms previous datasets in grasp diversity, scene diversity, modality diversity. Our grasp generation method has demonstrated remarkable success, outperforming established baselines with 81.0% average success rate in real-world single-object grasping and 75.3% success rate in multi-object grasping. The dataset and supplementary materials can be found at https://sites.google.com/view/ffh-clutteredgrasping, and we will release the code upon publication.

Related papers

A Paradigm Shift in Mouza Map Vectorization: A Human-Machine Collaboration Approach [2.315458677488431]
Current manual digitization methods are time-consuming and labor-intensive. Our study proposes a semi-automated approach to streamline the digitization process, saving both time and human resources.
arXiv Detail & Related papers (2024-10-21T12:47:36Z)
HandDiff: 3D Hand Pose Estimation with Diffusion on Image-Point Cloud [60.47544798202017]
Hand pose estimation is a critical task in various human-computer interaction applications. This paper proposes HandDiff, a diffusion-based hand pose estimation model that iteratively denoises accurate hand pose conditioned on hand-shaped image-point clouds. Experimental results demonstrate that the proposed HandDiff significantly outperforms the existing approaches on four challenging hand pose benchmark datasets.
arXiv Detail & Related papers (2024-04-04T02:15:16Z)
HandBooster: Boosting 3D Hand-Mesh Reconstruction by Conditional Synthesis and Sampling of Hand-Object Interactions [68.28684509445529]
We present HandBooster, a new approach to uplift the data diversity and boost the 3D hand-mesh reconstruction performance. First, we construct versatile content-aware conditions to guide a diffusion model to produce realistic images with diverse hand appearances, poses, views, and backgrounds. Then, we design a novel condition creator based on our similarity-aware distribution sampling strategies to deliberately find novel and realistic interaction poses that are distinctive from the training set.
arXiv Detail & Related papers (2024-03-27T13:56:08Z)
Towards General Visual-Linguistic Face Forgery Detection [95.73987327101143]
Deepfakes are realistic face manipulations that can pose serious threats to security, privacy, and trust. Existing methods mostly treat this task as binary classification, which uses digital labels or mask signals to train the detection model. We propose a novel paradigm named Visual-Linguistic Face Forgery Detection(VLFFD), which uses fine-grained sentence-level prompts as the annotation.
arXiv Detail & Related papers (2023-07-31T10:22:33Z)
Simultaneous prediction of hand gestures, handedness, and hand keypoints using thermal images [0.6087960723103347]
We propose a technique for simultaneous hand gesture classification, handedness detection, and hand keypoints localization using thermal data captured by an infrared camera. Our method uses a novel deep multi-task learning architecture that includes shared encoderdecoder layers followed by three branches dedicated for each mentioned task.
arXiv Detail & Related papers (2023-03-02T19:25:40Z)
Interacting Hand-Object Pose Estimation via Dense Mutual Attention [97.26400229871888]
3D hand-object pose estimation is the key to the success of many computer vision applications. We propose a novel dense mutual attention mechanism that is able to model fine-grained dependencies between the hand and the object. Our method is able to produce physically plausible poses with high quality and real-time inference speed.
arXiv Detail & Related papers (2022-11-16T10:01:33Z)
S$^2$Contact: Graph-based Network for 3D Hand-Object Contact Estimation with Semi-Supervised Learning [70.72037296392642]
We propose a novel semi-supervised framework that allows us to learn contact from monocular images. Specifically, we leverage visual and geometric consistency constraints in large-scale datasets for generating pseudo-labels. We show benefits from using a contact map that rules hand-object interactions to produce more accurate reconstructions.
arXiv Detail & Related papers (2022-08-01T14:05:23Z)
SurfEmb: Dense and Continuous Correspondence Distributions for Object Pose Estimation with Learnt Surface Embeddings [2.534402217750793]
We present an approach to learn dense, continuous 2D-3D correspondence distributions over the surface of objects from data. We also present a new method for 6D pose estimation of rigid objects using the learnt distributions to sample, score and refine pose hypotheses.
arXiv Detail & Related papers (2021-11-26T13:39:38Z)
Greedy Offset-Guided Keypoint Grouping for Human Pose Estimation [31.468003041368814]
We employ an Hourglass Network to infer all the keypoints from different persons indiscriminately. We greedily group the candidate keypoints into multiple human poses, utilizing the predicted guiding offsets. Our approach is comparable to the state of the art on the challenging COCO dataset under fair conditions.
arXiv Detail & Related papers (2021-07-07T09:32:01Z)
ProxyFAUG: Proximity-based Fingerprint Augmentation [81.15016852963676]
ProxyFAUG is a rule-based, proximity-based method of fingerprint augmentation. The best performing positioning method on this dataset is improved by 40% in terms of median error and 6% in terms of mean error, with the use of the augmented dataset.
arXiv Detail & Related papers (2021-02-04T15:59:30Z)
Multi-FinGAN: Generative Coarse-To-Fine Sampling of Multi-Finger Grasps [46.316638161863025]
We present Multi-FinGAN, a fast generative multi-finger grasp sampling method that synthesizes high quality grasps directly from RGB-D images in about a second. We experimentally validate and benchmark our method against a standard grasp-sampling method on 790 grasps in simulation and 20 grasps on a real Franka Emika Panda. Remarkably, our approach is up to 20-30 times faster than the baseline, a significant improvement that opens the door to feedback-based grasp re-planning and task informative grasping.
arXiv Detail & Related papers (2020-12-17T16:08:18Z)
MVHM: A Large-Scale Multi-View Hand Mesh Benchmark for Accurate 3D Hand Pose Estimation [32.12879364117658]
Estimating 3D hand poses from a single RGB image is challenging because depth ambiguity leads the problem ill-posed. We design a spin match algorithm that enables a rigid mesh model matching with any target mesh ground truth. We present a multi-view hand pose estimation approach to verify that training a hand pose estimator with our generated dataset greatly enhances the performance.
arXiv Detail & Related papers (2020-12-06T07:55:08Z)
Leveraging Photometric Consistency over Time for Sparsely Supervised Hand-Object Reconstruction [118.21363599332493]
We present a method to leverage photometric consistency across time when annotations are only available for a sparse subset of frames in a video. Our model is trained end-to-end on color images to jointly reconstruct hands and objects in 3D by inferring their poses. We achieve state-of-the-art results on 3D hand-object reconstruction benchmarks and demonstrate that our approach allows us to improve the pose estimation accuracy.
arXiv Detail & Related papers (2020-04-28T12:03:14Z)
JHU-CROWD++: Large-Scale Crowd Counting Dataset and A Benchmark Method [92.15895515035795]
We introduce a new large scale unconstrained crowd counting dataset (JHU-CROWD++) that contains "4,372" images with "1.51 million" annotations. We propose a novel crowd counting network that progressively generates crowd density maps via residual error estimation.
arXiv Detail & Related papers (2020-04-07T14:59:35Z)

This list is automatically generated from the titles and abstracts of the papers in this site.