RobotFingerPrint: Unified Gripper Coordinate Space for Multi-Gripper Grasp Synthesis and Transfer
- URL: http://arxiv.org/abs/2409.14519v2
- Date: Mon, 03 Mar 2025 00:51:41 GMT
- Title: RobotFingerPrint: Unified Gripper Coordinate Space for Multi-Gripper Grasp Synthesis and Transfer
- Authors: Ninad Khargonkar, Luis Felipe Casas, Balakrishnan Prabhakaran, Yu Xiang
- Abstract summary: We introduce a novel grasp representation named the Unified Gripper Coordinate Space for grasp synthesis and grasp transfer. Our representation leverages spherical coordinates to create a shared coordinate space across different robot grippers.
- Score: 3.84876707968786
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We introduce a novel grasp representation named the Unified Gripper Coordinate Space (UGCS) for grasp synthesis and grasp transfer. Our representation leverages spherical coordinates to create a shared coordinate space across different robot grippers, enabling it to synthesize and transfer grasps for both novel objects and previously unseen grippers. The strength of this representation lies in its ability to map the palm and fingers of a gripper onto the unified coordinate space. Grasp synthesis is formulated as predicting the unified spherical coordinates on object surface points via a conditional variational autoencoder. The predicted unified gripper coordinates establish exact correspondences between gripper and object points, which are used to optimize the grasp pose and joint values. Grasp transfer is facilitated through the point-to-point correspondence between any two (potentially unseen) grippers and solved via a similar optimization. Extensive simulation and real-world experiments showcase the efficacy of the unified grasp representation for grasp synthesis in generating stable and diverse grasps. We also demonstrate real-world grasp transfer from human demonstrations across different objects.
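The abstract's pipeline suggests a compact sketch. The Python snippet below is only an illustration under stated assumptions, not the authors' implementation: gripper surface points are assumed to live in a canonical palm-centered frame, `spherical_uv` stands in for the unified gripper coordinate mapping, and the grasp pose is recovered from UGCS-matched point pairs with a least-squares (Kabsch) fit; the paper's conditional VAE and joint-value optimization are omitted.

```python
# Minimal sketch (not the authors' code): spherical "unified" coordinates for
# gripper surface points, plus a rigid pose fit from UGCS-matched correspondences.
import numpy as np

def spherical_uv(points, center=None):
    """Map 3D surface points to (azimuth, polar) unit-sphere coordinates.

    `center` is an assumed palm-centered origin; the paper's exact
    parameterization of the unified gripper coordinate space may differ.
    """
    p = points - (np.zeros(3) if center is None else center)
    p = p / (np.linalg.norm(p, axis=1, keepdims=True) + 1e-9)
    azimuth = np.arctan2(p[:, 1], p[:, 0])          # in [-pi, pi]
    polar = np.arccos(np.clip(p[:, 2], -1.0, 1.0))  # in [0, pi]
    return np.stack([azimuth, polar], axis=1)

def fit_rigid_pose(gripper_pts, object_pts):
    """Least-squares rigid transform (Kabsch) aligning gripper points to the
    object points they correspond to via matching unified coordinates."""
    mu_g, mu_o = gripper_pts.mean(0), object_pts.mean(0)
    H = (gripper_pts - mu_g).T @ (object_pts - mu_o)
    U, _, Vt = np.linalg.svd(H)
    S = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])
    R = Vt.T @ S @ U.T
    t = mu_o - R @ mu_g
    return R, t

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    gripper = rng.normal(size=(256, 3))      # stand-in gripper surface points
    uv = spherical_uv(gripper)               # unified coordinates, one per point
    Q, _ = np.linalg.qr(rng.normal(size=(3, 3)))
    R_true = Q * np.sign(np.linalg.det(Q))   # a proper rotation (det = +1)
    obj = gripper @ R_true.T + np.array([0.1, 0.2, 0.3])
    R, t = fit_rigid_pose(gripper, obj)      # recover the grasp pose
    print(np.allclose(R, R_true), np.allclose(t, [0.1, 0.2, 0.3]))
```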
Related papers
- SIGHT: Single-Image Conditioned Generation of Hand Trajectories for Hand-Object Interaction [86.54738165527502]
We introduce a novel task of generating realistic and diverse 3D hand trajectories given a single image of an object.
Hand-object interaction trajectory priors can greatly benefit applications in robotics, embodied AI, augmented reality and related fields.
arXiv Detail & Related papers (2025-03-28T20:53:20Z)
- PoseLess: Depth-Free Vision-to-Joint Control via Direct Image Mapping with VLM [0.0]
PoseLess is a novel framework for robot hand control that eliminates the need for explicit pose estimation by directly mapping 2D images to joint angles using projected representations.
Our approach leverages synthetic training data generated through randomized joint configurations, enabling zero-shot generalization to real-world scenarios and cross-morphology transfer from robotic to human hands.
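As a rough illustration of the "randomized joint configurations" idea, the sketch below samples joint angles within hypothetical limits and pairs them with a placeholder `render` call; the joint limits, image size, and renderer are assumptions, not details from the paper.

```python
# Toy sketch of synthetic data from randomized joint configurations
# (joint limits and the render() placeholder are hypothetical, not from the paper).
import numpy as np

JOINT_LIMITS = np.array([        # hypothetical (min, max) per joint, in radians
    [0.0, 1.6], [0.0, 1.6], [0.0, 1.6], [0.0, 1.6], [-0.5, 0.5],
])

def sample_joint_config(rng):
    lo, hi = JOINT_LIMITS[:, 0], JOINT_LIMITS[:, 1]
    return lo + rng.random(len(JOINT_LIMITS)) * (hi - lo)

def render(joints):
    """Placeholder for a simulator/renderer producing an RGB image of the hand."""
    return np.zeros((224, 224, 3), dtype=np.uint8)

def make_dataset(n, seed=0):
    rng = np.random.default_rng(seed)
    pairs = []
    for _ in range(n):
        q = sample_joint_config(rng)         # random joint angles
        pairs.append((render(q), q))         # (image, joint-angle label)
    return pairs

dataset = make_dataset(4)
print(len(dataset), dataset[0][1].shape)
```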
arXiv Detail & Related papers (2025-03-10T09:34:05Z)
- GEARS: Local Geometry-aware Hand-object Interaction Synthesis [38.75942505771009]
We introduce a novel joint-centered sensor designed to reason about local object geometry near potential interaction regions.
As an important step towards mitigating the learning complexity, we transform the points from global frame to template hand frame and use a shared module to process sensor features of each individual joint.
This is followed by a perceptual-temporal transformer network aimed at capturing correlation among the joints in different dimensions.
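The global-to-hand-frame transform mentioned above can be sketched as follows; the joint pose `(R, t)` and the neighborhood radius are assumed values, and the joint-centered sensor itself is not reproduced.

```python
# Minimal sketch: express object points near a joint in that joint's local frame
# (the sensor design itself is not reproduced; R, t are an assumed joint pose).
import numpy as np

def to_local_frame(points_world, R_joint, t_joint):
    """Transform Nx3 world-frame points into the joint/hand frame whose pose in
    the world is (R_joint, t_joint)."""
    return (points_world - t_joint) @ R_joint   # row-vector form of R^T (p - t)

def neighborhood(points_local, radius=0.03):
    """Keep points within `radius` of the joint origin (a crude 'sensor' region)."""
    return points_local[np.linalg.norm(points_local, axis=1) < radius]

rng = np.random.default_rng(0)
pts = rng.normal(scale=0.05, size=(1000, 3))        # stand-in object points
R, t = np.eye(3), np.array([0.0, 0.0, 0.02])        # assumed joint pose
print(neighborhood(to_local_frame(pts, R, t)).shape)
```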
arXiv Detail & Related papers (2024-04-02T09:18:52Z)
- Gaze-guided Hand-Object Interaction Synthesis: Dataset and Method [61.19028558470065]
We present GazeHOI, the first dataset to capture simultaneous 3D modeling of gaze, hand, and object interactions.
To tackle these issues, we propose a stacked gaze-guided hand-object interaction diffusion model, named GHO-Diffusion.
We also introduce HOI-Manifold Guidance during the sampling stage of GHO-Diffusion, enabling fine-grained control over generated motions.
arXiv Detail & Related papers (2024-03-24T14:24:13Z)
- SC-GS: Sparse-Controlled Gaussian Splatting for Editable Dynamic Scenes [59.23385953161328]
Novel view synthesis for dynamic scenes is still a challenging problem in computer vision and graphics.
We propose a new representation that explicitly decomposes the motion and appearance of dynamic scenes into sparse control points and dense Gaussians.
Our method can enable user-controlled motion editing while retaining high-fidelity appearances.
arXiv Detail & Related papers (2023-12-04T11:57:14Z)
- Towards a Unified Transformer-based Framework for Scene Graph Generation and Human-object Interaction Detection [116.21529970404653]
We introduce SG2HOI+, a unified one-step model based on the Transformer architecture.
Our approach employs two interactive hierarchical Transformers to seamlessly unify the tasks of SGG and HOI detection.
Our approach achieves competitive performance when compared to state-of-the-art HOI methods.
arXiv Detail & Related papers (2023-11-03T07:25:57Z)
- Fast and Expressive Gesture Recognition using a Combination-Homomorphic Electromyogram Encoder [21.25126610043744]
We study the task of gesture recognition from electromyography (EMG).
We define combination gestures consisting of a direction component and a modifier component.
New subjects only demonstrate the single component gestures.
We extrapolate to unseen combination gestures by combining the feature vectors of real single gestures to produce synthetic training data.
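The extrapolation to unseen combinations can be sketched as below; plain vector addition is used as an illustrative stand-in for the learned combination operator of the combination-homomorphic encoder.

```python
# Sketch: synthesize features for unseen combination gestures from single-gesture
# features. Vector addition is an illustrative stand-in for the learned
# combination operator in the paper's combination-homomorphic encoder.
import numpy as np

def synthesize_combination(direction_feat, modifier_feat):
    """Combine a direction-gesture feature with a modifier-gesture feature."""
    return direction_feat + modifier_feat

rng = np.random.default_rng(0)
directions = {d: rng.normal(size=64) for d in ["up", "down", "left", "right"]}
modifiers = {m: rng.normal(size=64) for m in ["pinch", "fist"]}

# Synthetic training examples for every unseen (direction, modifier) combination.
synthetic = {
    (d, m): synthesize_combination(fd, fm)
    for d, fd in directions.items()
    for m, fm in modifiers.items()
}
print(len(synthetic), synthetic[("up", "pinch")].shape)
```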
arXiv Detail & Related papers (2023-10-30T20:03:34Z)
- Variational Barycentric Coordinates [18.752506994498845]
We propose a variational technique to optimize for generalized barycentric coordinates.
We directly parameterize the continuous function that maps any coordinate in a polytope's interior to its barycentric coordinates using a neural field.
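A minimal sketch of that parameterization: a tiny untrained network whose softmax output is nonnegative and sums to one by construction, with the reproduction error that a training loss would minimize shown for a sample point (the network size and polytope vertices are illustrative, not the paper's).

```python
# Sketch: a tiny "neural field" x -> barycentric weights over polytope vertices.
# Softmax guarantees nonnegativity and partition of unity; training would also
# enforce reproduction sum_i w_i(x) v_i = x (weights here are random, untrained).
import numpy as np

rng = np.random.default_rng(0)
V = np.array([[0.0, 0.0], [1.0, 0.0], [1.0, 1.0], [0.0, 1.0]])  # square's vertices

W1, b1 = rng.normal(size=(2, 32)), np.zeros(32)
W2, b2 = rng.normal(size=(32, len(V))), np.zeros(len(V))

def barycentric(x):
    h = np.tanh(x @ W1 + b1)
    logits = h @ W2 + b2
    w = np.exp(logits - logits.max())
    return w / w.sum()                       # nonnegative, sums to 1

x = np.array([0.3, 0.6])                     # a point inside the square
w = barycentric(x)
print(w.sum())                               # 1.0 by construction
print(np.linalg.norm(w @ V - x))             # reproduction error a loss would drive to 0
```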
arXiv Detail & Related papers (2023-10-05T19:45:06Z)
- NDC-Scene: Boost Monocular 3D Semantic Scene Completion in Normalized Device Coordinates Space [77.6067460464962]
Monocular 3D Semantic Scene Completion (SSC) has garnered significant attention in recent years due to its potential to predict complex semantics and geometry shapes from a single image, requiring no 3D inputs.
We identify several critical issues in current state-of-the-art methods, including the Feature Ambiguity of projected 2D features in the ray to the 3D space, the Pose Ambiguity of the 3D convolution, and the Imbalance in the 3D convolution across different depth levels.
We devise a novel Normalized Device Coordinates scene completion network (NDC-Scene) that directly extends the 2D feature map to the normalized device coordinates space.
arXiv Detail & Related papers (2023-09-26T02:09:52Z)
- Coordinate Quantized Neural Implicit Representations for Multi-view Reconstruction [28.910183274743872]
We introduce neural implicit representations with quantized coordinates, which reduces the uncertainty and ambiguity in the field during optimization.
We use discrete coordinates and their positional encodings to learn implicit functions through volume rendering.
Our evaluations under the widely used benchmarks show our superiority over the state-of-the-art.
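The quantization-plus-positional-encoding step can be sketched as below; the grid resolution and number of frequencies are assumed values, and the volume-rendering part is omitted.

```python
# Sketch: quantize continuous 3D coordinates to a discrete grid, then apply a
# standard sinusoidal positional encoding (resolution and #frequencies assumed).
import numpy as np

def quantize(coords, resolution=256, lo=-1.0, hi=1.0):
    """Snap coordinates in [lo, hi] to the centers of a resolution^3 grid."""
    idx = np.clip(((coords - lo) / (hi - lo) * resolution).astype(int), 0, resolution - 1)
    return lo + (idx + 0.5) * (hi - lo) / resolution

def positional_encoding(coords, num_freqs=6):
    """gamma(p) = (sin(2^k pi p), cos(2^k pi p)) for k = 0..num_freqs-1."""
    freqs = (2.0 ** np.arange(num_freqs)) * np.pi
    scaled = coords[..., None] * freqs                     # (..., 3, num_freqs)
    enc = np.concatenate([np.sin(scaled), np.cos(scaled)], axis=-1)
    return enc.reshape(*coords.shape[:-1], -1)

pts = np.random.default_rng(0).uniform(-1, 1, size=(4, 3))
q = quantize(pts)
print(q.shape, positional_encoding(q).shape)   # (4, 3) (4, 36)
```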
arXiv Detail & Related papers (2023-08-21T20:27:33Z)
- Auxiliary Tasks Benefit 3D Skeleton-based Human Motion Prediction [106.06256351200068]
This paper introduces a model learning framework with auxiliary tasks.
In our auxiliary tasks, partial body joints' coordinates are corrupted by either masking or adding noise.
We propose a novel auxiliary-adapted transformer, which can handle incomplete, corrupted motion data.
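A minimal sketch of the joint corruption used for the auxiliary tasks; the mask ratio and noise scale are assumed, and the paper corrupts by masking or noising rather than necessarily both at once.

```python
# Sketch: corrupt a (frames, joints, 3) motion clip for auxiliary masked-
# reconstruction / denoising tasks. Mask ratio and noise scale are assumed;
# the paper uses masking *or* noising, both are shown here for brevity.
import numpy as np

def corrupt(motion, mask_ratio=0.2, noise_std=0.02, seed=0):
    rng = np.random.default_rng(seed)
    corrupted = motion.copy()
    T, J, _ = motion.shape
    mask = rng.random((T, J)) < mask_ratio          # which joints to mask
    corrupted[mask] = 0.0                           # masking auxiliary task
    corrupted += rng.normal(scale=noise_std, size=motion.shape)  # noising task
    return corrupted, mask

clip = np.random.default_rng(1).normal(size=(60, 22, 3))   # stand-in motion clip
noisy, mask = corrupt(clip)
print(noisy.shape, mask.mean())                     # roughly the mask ratio
```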
arXiv Detail & Related papers (2023-08-17T12:26:11Z)
- AdaptivePose++: A Powerful Single-Stage Network for Multi-Person Pose Regression [66.39539141222524]
We propose to represent the human parts as adaptive points and introduce a fine-grained body representation method.
With the proposed body representation, we deliver a compact single-stage multi-person pose regression network, termed as AdaptivePose.
We employ AdaptivePose for both 2D/3D multi-person pose estimation tasks to verify the effectiveness of AdaptivePose.
arXiv Detail & Related papers (2022-10-08T12:54:20Z)
- PolarMOT: How Far Can Geometric Relations Take Us in 3D Multi-Object Tracking? [62.997667081978825]
We encode 3D detections as nodes in a graph, where spatial and temporal pairwise relations among objects are encoded via localized polar coordinates on graph edges.
This allows our graph neural network to learn to effectively encode temporal and spatial interactions.
We establish a new state-of-the-art on nuScenes dataset and, more importantly, show that our method, PolarMOT, generalizes remarkably well across different locations.
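The localized polar edge encoding can be sketched in 2D (bird's-eye view) as below; the exact edge-feature layout in PolarMOT is not reproduced.

```python
# Sketch: encode the relation between two BEV detections as localized polar
# coordinates (range, bearing) in the source detection's own frame.
import math

def polar_edge(src_xy, src_yaw, dst_xy):
    """Return (range, bearing) of `dst` as seen from `src`'s local frame."""
    dx, dy = dst_xy[0] - src_xy[0], dst_xy[1] - src_xy[1]
    rng = math.hypot(dx, dy)
    bearing = math.atan2(dy, dx) - src_yaw                       # relative to src heading
    bearing = math.atan2(math.sin(bearing), math.cos(bearing))   # wrap to [-pi, pi]
    return rng, bearing

print(polar_edge((0.0, 0.0), math.pi / 2, (0.0, 5.0)))  # 5 m straight ahead -> (5.0, 0.0)
```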
arXiv Detail & Related papers (2022-08-03T10:06:56Z)
- A Dual-Masked Auto-Encoder for Robust Motion Capture with Spatial-Temporal Skeletal Token Completion [13.88656793940129]
We propose an adaptive, identity-aware triangulation module to reconstruct 3D joints and identify the missing joints for each identity.
We then propose a Dual-Masked Auto-Encoder (D-MAE) which encodes the joint status with both skeletal-structural and temporal position encoding for trajectory completion.
In order to demonstrate the proposed model's capability in dealing with severe data loss scenarios, we contribute a high-accuracy and challenging motion capture dataset.
arXiv Detail & Related papers (2022-07-15T10:00:43Z)
- NeuralGrasps: Learning Implicit Representations for Grasps of Multiple Robotic Hands [15.520158510964757]
We introduce a neural implicit representation for grasps of objects from multiple robotic hands.
Different grasps across multiple robotic hands are encoded into a shared latent space.
Grasp transfer has the potential to share grasping skills between robots and enable robots to learn grasping skills from humans.
arXiv Detail & Related papers (2022-07-06T20:33:32Z)
- MoDi: Unconditional Motion Synthesis from Diverse Data [51.676055380546494]
We present MoDi, an unconditional generative model that synthesizes diverse motions.
Our model is trained in a completely unsupervised setting from a diverse, unstructured and unlabeled motion dataset.
We show that despite the lack of any structure in the dataset, the latent space can be semantically clustered.
arXiv Detail & Related papers (2022-06-16T09:06:25Z)
- Learning High-DOF Reaching-and-Grasping via Dynamic Representation of Gripper-Object Interaction [21.03434784990944]
We propose an effective representation of grasping state characterizing the spatial interaction between the gripper and the target object.
This interaction bisector surface (IBS) is surprisingly effective as a state representation, since it informs the fine-grained control of each finger about its spatial relation to the target object.
Experiments show that it generates high-quality dexterous grasp for complex shapes with smooth grasping motions.
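The IBS idea can be approximated crudely by sampling points that are nearly equidistant to the two point clouds, as in the brute-force sketch below; the sample count and tolerance are arbitrary, and the paper's dynamic representation and learned policy are not reproduced.

```python
# Crude sketch: approximate the interaction bisector surface (IBS) between a
# gripper and an object as sampled points nearly equidistant to both point clouds.
import numpy as np

def min_dist(queries, cloud):
    """Distance from each query point to its nearest neighbor in `cloud`."""
    return np.linalg.norm(queries[:, None, :] - cloud[None, :, :], axis=-1).min(axis=1)

def approx_ibs(gripper_pts, object_pts, n_samples=5000, tol=0.01, seed=0):
    rng = np.random.default_rng(seed)
    lo = np.minimum(gripper_pts.min(0), object_pts.min(0))
    hi = np.maximum(gripper_pts.max(0), object_pts.max(0))
    samples = rng.uniform(lo, hi, size=(n_samples, 3))
    d_g, d_o = min_dist(samples, gripper_pts), min_dist(samples, object_pts)
    return samples[np.abs(d_g - d_o) < tol]         # near-bisector samples

rng = np.random.default_rng(1)
gripper = rng.normal(loc=[0.1, 0, 0], scale=0.02, size=(300, 3))
obj = rng.normal(loc=[-0.1, 0, 0], scale=0.02, size=(300, 3))
print(approx_ibs(gripper, obj).shape)
```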
arXiv Detail & Related papers (2022-04-03T07:03:54Z)
- Speech2AffectiveGestures: Synthesizing Co-Speech Gestures with Generative Adversarial Affective Expression Learning [52.73083137245969]
We present a generative adversarial network to synthesize 3D pose sequences of co-speech upper-body gestures with appropriate affective expressions.
Our network consists of two components: a generator to synthesize gestures from a joint embedding space of features encoded from the input speech and the seed poses, and a discriminator to distinguish between the synthesized pose sequences and real 3D pose sequences.
arXiv Detail & Related papers (2021-07-31T15:13:39Z)
- HandsFormer: Keypoint Transformer for Monocular 3D Pose Estimation of Hands and Object in Interaction [33.661745138578596]
We propose a robust and accurate method for estimating the 3D poses of two hands in close interaction from a single color image.
Our method starts by extracting a set of potential 2D locations for the joints of both hands as extrema of a heatmap.
We use appearance and spatial encodings of these locations as input to a transformer, and leverage the attention mechanisms to sort out the correct configuration of the joints.
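The first step (joint candidates as heatmap extrema) can be sketched with a simple local-maximum test; the 3x3 window and the 0.5 threshold are illustrative choices.

```python
# Sketch: extract 2D joint candidates as local maxima of a heatmap
# (3x3 neighborhood and the 0.5 threshold are illustrative choices).
import numpy as np

def heatmap_peaks(hm, threshold=0.5):
    """Return (row, col) positions that are 3x3 local maxima above `threshold`."""
    H, W = hm.shape
    padded = np.pad(hm, 1, constant_values=-np.inf)
    neighborhoods = np.stack([
        padded[di:di + H, dj:dj + W]
        for di in range(3) for dj in range(3)
    ])
    is_peak = (hm >= neighborhoods.max(axis=0)) & (hm > threshold)
    return np.argwhere(is_peak)

hm = np.zeros((64, 64))
hm[20, 30], hm[40, 10] = 0.9, 0.8                  # two synthetic joint peaks
print(heatmap_peaks(hm))                           # [[20 30] [40 10]]
```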
arXiv Detail & Related papers (2021-04-29T20:19:20Z)
- Generalized Grasping for Mechanical Grippers for Unknown Objects with Partial Point Cloud Representations [4.196869541965447]
We use point clouds to discover grasp pose solutions for multiple grasp types, executed by a mechanical gripper, in near real-time.
We show via simulations and experiments that 1) grasp poses for three grasp types can be found in near real-time, 2) grasp pose solutions are consistent with respect to voxel resolution changes for both partial and complete point cloud scans, and 3) a planned grasp is executed with a mechanical gripper.
arXiv Detail & Related papers (2020-06-23T00:34:05Z)
- Orientation Attentive Robotic Grasp Synthesis with Augmented Grasp Map Representation [62.79160608266713]
Morphological characteristics of objects may offer a wide range of plausible grasping orientations that obfuscate the visual learning of robotic grasping.
Existing grasp generation approaches are cursed to construct discontinuous grasp maps by aggregating annotations for drastically different orientations per grasping point.
We propose a novel augmented grasp map representation, suitable for pixel-wise synthesis, that locally disentangles grasping orientations by partitioning the angle space into multiple bins.
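The angle-space partitioning can be sketched as binning a pi-periodic grasp angle into discrete orientation channels; the number of bins is an assumed value.

```python
# Sketch: partition grasp orientation into discrete bins so that per-pixel grasp
# maps stay locally continuous (number of bins is an assumed value).
import numpy as np

NUM_BINS = 6                                   # assumed partition of [0, pi)

def angle_to_bin(theta):
    """Map grasp angles (antipodal grasps are pi-periodic) to a bin index."""
    theta = np.mod(theta, np.pi)               # fold into [0, pi)
    return np.minimum((theta / np.pi * NUM_BINS).astype(int), NUM_BINS - 1)

def binned_grasp_maps(angle_map, quality_map):
    """Split a dense quality map into one channel per orientation bin."""
    bins = angle_to_bin(angle_map)
    maps = np.zeros((NUM_BINS,) + angle_map.shape, dtype=quality_map.dtype)
    for b in range(NUM_BINS):
        maps[b][bins == b] = quality_map[bins == b]
    return maps

rng = np.random.default_rng(0)
angles = rng.uniform(-np.pi, np.pi, size=(32, 32))
quality = rng.random((32, 32))
print(binned_grasp_maps(angles, quality).shape)    # (6, 32, 32)
```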
arXiv Detail & Related papers (2020-06-09T08:54:54Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences arising from its use.