RegionGrasp: A Novel Task for Contact Region Controllable Hand Grasp Generation
- URL: http://arxiv.org/abs/2410.07995v1
- Date: Thu, 10 Oct 2024 14:52:30 GMT
- Title: RegionGrasp: A Novel Task for Contact Region Controllable Hand Grasp Generation
- Authors: Yilin Wang, Chuan Guo, Li Cheng, Hai Jiang
- Abstract summary: RegionGrasp-CVAE is proposed to generate plausible hand grasps of 3D objects.
ConditionNet, a condition encoder built around the object encoder O-Enc, is used; O-Enc is pretrained by masking and restoring point patches.
HOINet is introduced to encode hand-object interaction features.
- Score: 35.11194409871017
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Can a machine automatically generate multiple distinct and natural hand grasps, given a specific contact region of an object in 3D? This motivates us to consider a novel task of Region Controllable Hand Grasp Generation (RegionGrasp), as follows: given as input a 3D object, together with a specific surface area selected as the intended contact region, generate a diverse set of plausible hand grasps of the object, where the thumb fingertip touches the object surface on the contact region. To address this task, RegionGrasp-CVAE is proposed, which consists of two main parts. First, to enable contact-region awareness, we propose ConditionNet as the condition encoder, which includes a transformer-backboned object encoder, O-Enc; a pretraining strategy is adopted for O-Enc, where point patches of the object surface are randomly masked off and subsequently restored, to further capture the surface geometry of the object. Second, to realize interaction awareness, HOINet is introduced to encode hand-object interaction features by entangling high-level hand features with embedded object features through geometry-aware multi-head cross attention. Empirical evaluations demonstrate the effectiveness of our approach qualitatively and quantitatively, where it compares favorably with state-of-the-art methods.
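The multi-head cross attention used to entangle hand and object features can be sketched as follows. This is a minimal plain-NumPy illustration with randomly initialized projections and hypothetical token counts (21 hand tokens, 256 object patches, dimension 64), not the authors' implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_cross_attention(hand_feats, obj_feats, num_heads=4):
    """Hand tokens query object tokens: (Nh, D), (No, D) -> (Nh, D)."""
    Nh, D = hand_feats.shape
    No, _ = obj_feats.shape
    dh = D // num_heads
    # Random projections stand in for learned weight matrices in this sketch.
    Wq, Wk, Wv = (rng.normal(size=(D, D)) / np.sqrt(D) for _ in range(3))
    Q = (hand_feats @ Wq).reshape(Nh, num_heads, dh).transpose(1, 0, 2)
    K = (obj_feats @ Wk).reshape(No, num_heads, dh).transpose(1, 0, 2)
    V = (obj_feats @ Wv).reshape(No, num_heads, dh).transpose(1, 0, 2)
    scores = Q @ K.transpose(0, 2, 1) / np.sqrt(dh)   # (heads, Nh, No)
    attn = softmax(scores, axis=-1)                   # each hand token's weights over object patches
    out = (attn @ V).transpose(1, 0, 2).reshape(Nh, D)
    return out, attn

hand = rng.normal(size=(21, 64))    # e.g. one token per hand joint
obj = rng.normal(size=(256, 64))    # embedded object point patches
fused, attn = multi_head_cross_attention(hand, obj)
```

The geometry-aware variant in the paper additionally conditions these attention weights on spatial relations between hand and object points; that conditioning is omitted here.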
Related papers
- Neural Attention Field: Emerging Point Relevance in 3D Scenes for One-Shot Dexterous Grasping [34.98831146003579]
One-shot transfer of dexterous grasps to novel scenes with object and context variations has been a challenging problem.
We propose the neural attention field for representing semantic-aware dense feature fields in 3D space.
arXiv Detail & Related papers (2024-10-30T14:06:51Z)
- Boosting Cross-Domain Point Classification via Distilling Relational Priors from 2D Transformers [59.0181939916084]
Traditional 3D networks mainly focus on local geometric details and ignore the topological structure between local geometries.
We propose a novel Relational Priors Distillation (RPD) method to extract priors from well-trained transformers on massive images.
Experiments on the PointDA-10 and the Sim-to-Real datasets verify that the proposed method consistently achieves the state-of-the-art performance of UDA for point cloud classification.
arXiv Detail & Related papers (2024-07-26T06:29:09Z)
- GEARS: Local Geometry-aware Hand-object Interaction Synthesis [38.75942505771009]
We introduce a novel joint-centered sensor designed to reason about local object geometry near potential interaction regions.
As an important step towards mitigating the learning complexity, we transform the points from global frame to template hand frame and use a shared module to process sensor features of each individual joint.
This is followed by a perceptual-temporal transformer network aimed at capturing correlation among the joints in different dimensions.
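The global-to-hand-frame transform mentioned above is a standard rigid-body change of coordinates. A minimal sketch (with an illustrative rotation and translation, not GEARS code):

```python
import numpy as np

def to_hand_frame(points, R, t):
    """Map global-frame points (N, 3) into the hand frame whose pose in the
    global frame is rotation R (3, 3) and origin t (3,):
    p_local = R^T (p_global - t), written here in row-vector form."""
    return (points - t) @ R

# Example: hand frame rotated 90 degrees about z and shifted along x.
R = np.array([[0.0, -1.0, 0.0],
              [1.0,  0.0, 0.0],
              [0.0,  0.0, 1.0]])
t = np.array([1.0, 0.0, 0.0])
local = to_hand_frame(np.array([[1.0, 1.0, 0.0]]), R, t)
```

Expressing object points in the template hand frame this way makes the sensor features pose-invariant, which is what lets a single shared module process every joint.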
arXiv Detail & Related papers (2024-04-02T09:18:52Z)
- Text2HOI: Text-guided 3D Motion Generation for Hand-Object Interaction [8.253265795150401]
This paper introduces the first text-guided work for generating the sequence of hand-object interaction in 3D.
For contact generation, a VAE-based network takes as input a text and an object mesh, and generates the probability of contacts between the surfaces of hands and the object.
For motion generation, a Transformer-based diffusion model utilizes this 3D contact map as a strong prior for generating physically plausible hand-object motion.
arXiv Detail & Related papers (2024-03-31T04:56:30Z)
- NCRF: Neural Contact Radiance Fields for Free-Viewpoint Rendering of Hand-Object Interaction [19.957593804898064]
We present a novel free-viewpoint rendering framework, Neural Contact Radiance Field (NCRF), to reconstruct hand-object interactions from a sparse set of videos.
We jointly learn these key components where they mutually help and regularize each other with visual and geometric constraints.
Our approach outperforms the current state-of-the-art in terms of both rendering quality and pose estimation accuracy.
arXiv Detail & Related papers (2024-02-08T10:09:12Z) - Decaf: Monocular Deformation Capture for Face and Hand Interactions [77.75726740605748]
This paper introduces the first method that allows tracking human hands interacting with human faces in 3D from single monocular RGB videos.
We model hands as articulated objects inducing non-rigid face deformations during an active interaction.
Our method relies on a new hand-face motion and interaction capture dataset with realistic face deformations acquired with a markerless multi-view camera system.
arXiv Detail & Related papers (2023-09-28T17:59:51Z) - Learning Explicit Contact for Implicit Reconstruction of Hand-held
Objects from Monocular Images [59.49985837246644]
We show how to model contacts in an explicit way to benefit the implicit reconstruction of hand-held objects.
In the first part, we propose a new subtask of directly estimating 3D hand-object contacts from a single image.
In the second part, we introduce a novel method to diffuse estimated contact states from the hand mesh surface to nearby 3D space.
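Diffusing estimated contact states from the mesh surface into nearby 3D space can be approximated by distance-weighted interpolation. The sketch below uses a simple Gaussian kernel with a made-up bandwidth; it illustrates the idea only and is not the paper's actual diffusion method:

```python
import numpy as np

def diffuse_contact(verts, contact, queries, sigma=0.01):
    """Spread per-vertex contact probabilities contact (M,) from mesh
    vertices verts (M, 3) to arbitrary query points queries (N, 3),
    weighting each vertex by a Gaussian of its distance to the query."""
    d2 = ((queries[:, None, :] - verts[None, :, :]) ** 2).sum(-1)  # (N, M)
    w = np.exp(-d2 / (2 * sigma ** 2))
    return (w * contact).sum(-1) / (w.sum(-1) + 1e-12)

# Two vertices, one in contact; query at each vertex recovers its state.
verts = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]])
contact = np.array([1.0, 0.0])
queries = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]])
field = diffuse_contact(verts, contact, queries)
```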
arXiv Detail & Related papers (2023-05-31T17:59:26Z)
- NeurOCS: Neural NOCS Supervision for Monocular 3D Object Localization [80.3424839706698]
We present NeurOCS, a framework that uses instance masks and 3D boxes as input to learn 3D object shapes by means of differentiable rendering.
Our approach rests on insights in learning a category-level shape prior directly from real driving scenes.
We make critical design choices to learn object coordinates more effectively from an object-centric view.
arXiv Detail & Related papers (2023-05-28T16:18:41Z)
- Grounding 3D Object Affordance from 2D Interactions in Images [128.6316708679246]
Grounding 3D object affordance seeks to locate objects' "action possibilities" regions in 3D space.
Humans possess the ability to perceive object affordances in the physical world through demonstration images or videos.
We devise an Interaction-driven 3D Affordance Grounding Network (IAG), which aligns the region feature of objects from different sources.
arXiv Detail & Related papers (2023-03-18T15:37:35Z)
- 3D Object Detection on Point Clouds using Local Ground-aware and Adaptive Representation of scenes' surface [1.9336815376402714]
A novel, adaptive ground-aware, and cost-effective 3D Object Detection pipeline is proposed.
It achieves new state-of-the-art 3D object detection performance among two-stage Lidar object detection pipelines.
arXiv Detail & Related papers (2020-02-02T05:42:23Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences.