Leveraging CVAE for Joint Configuration Estimation of Multifingered Grippers from Point Cloud Data
- URL: http://arxiv.org/abs/2511.17276v1
- Date: Fri, 21 Nov 2025 14:31:39 GMT
- Title: Leveraging CVAE for Joint Configuration Estimation of Multifingered Grippers from Point Cloud Data
- Authors: Julien Merand, Boris Meden, Mathieu Grossard
- Abstract summary: This paper presents an efficient approach for determining the joint configuration of a multifingered gripper solely from the point cloud data of its poly-articulated chain. We use a Conditional Variational Auto-Encoder (CVAE), which takes point cloud data of key structural elements as input and reconstructs the corresponding joint configurations. We validate our approach on the MultiDex grasping dataset using the Allegro Hand, operating within 0.05 milliseconds and achieving accuracy comparable to state-of-the-art methods.
- Score: 1.3124513975412255
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This paper presents an efficient approach for determining the joint configuration of a multifingered gripper solely from the point cloud data of its poly-articulated chain, as generated by visual sensors, simulations or even generative neural networks. Well-known inverse kinematics (IK) techniques can provide mathematically exact solutions (when they exist) for joint configuration determination based solely on the fingertip pose, but often require post-hoc decision-making by considering the positions of all intermediate phalanges in the gripper's fingers, or rely on algorithms to numerically approximate solutions for more complex kinematics. In contrast, our method leverages machine learning to implicitly overcome these challenges. This is achieved through a Conditional Variational Auto-Encoder (CVAE), which takes point cloud data of key structural elements as input and reconstructs the corresponding joint configurations. We validate our approach on the MultiDex grasping dataset using the Allegro Hand, operating within 0.05 milliseconds and achieving accuracy comparable to state-of-the-art methods. This highlights the effectiveness of our pipeline for joint configuration estimation within the broader context of AI-driven techniques for grasp planning.
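The conditioning idea in the abstract can be illustrated with a toy example. The following is a minimal sketch, not the authors' implementation: it uses single untrained linear layers in pure Python, and the names `TinyCVAE`, `COND_DIM`, `LATENT_DIM`, and the fixed-length condition vector standing in for point cloud features are all hypothetical. The 16-joint count is an assumption about how the Allegro Hand is commonly modeled. The loss is the standard CVAE objective (reconstruction error plus KL divergence to a unit Gaussian prior), which the paper may weight or modify differently.

```python
import math
import random

N_JOINTS = 16   # assumption: Allegro Hand commonly modeled with 16 joints
COND_DIM = 12   # hypothetical: e.g. 4 key-element centroids x 3 coordinates
LATENT_DIM = 2  # hypothetical latent size


def linear(weights, bias, x):
    """Dense layer on list-based vectors: returns W @ x + b."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def init_layer(rng, n_out, n_in):
    """Small random weights and zero bias (untrained, illustration only)."""
    weights = [[rng.gauss(0.0, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


class TinyCVAE:
    def __init__(self, seed=0):
        self.rng = random.Random(seed)
        # Encoder q(z | joints, condition): outputs mean and log-variance.
        self.enc_mu = init_layer(self.rng, LATENT_DIM, N_JOINTS + COND_DIM)
        self.enc_logvar = init_layer(self.rng, LATENT_DIM, N_JOINTS + COND_DIM)
        # Decoder p(joints | z, condition).
        self.dec = init_layer(self.rng, N_JOINTS, LATENT_DIM + COND_DIM)

    def encode(self, joints, cond):
        x = joints + cond  # list concatenation of joint angles and condition
        return linear(*self.enc_mu, x), linear(*self.enc_logvar, x)

    def reparameterize(self, mu, logvar):
        # z = mu + sigma * eps with eps ~ N(0, 1)
        return [m + math.exp(0.5 * lv) * self.rng.gauss(0.0, 1.0)
                for m, lv in zip(mu, logvar)]

    def decode(self, z, cond):
        return linear(*self.dec, z + cond)

    def loss(self, joints, cond):
        """Standard CVAE objective: mean squared error plus KL(q || N(0, I))."""
        mu, logvar = self.encode(joints, cond)
        recon = self.decode(self.reparameterize(mu, logvar), cond)
        mse = sum((r - j) ** 2 for r, j in zip(recon, joints)) / N_JOINTS
        kl = -0.5 * sum(1 + lv - m * m - math.exp(lv)
                        for m, lv in zip(mu, logvar))
        return mse + kl

    def estimate(self, cond):
        # Inference uses only the decoder, here fed the prior mean z = 0.
        return self.decode([0.0] * LATENT_DIM, cond)
```

At inference time only `estimate` is called, so a single decoder pass maps the condition vector to a joint configuration; this is one plausible reason such a model can run quickly, though the sketch makes no claim about the timing reported in the abstract.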
Related papers
- Self-Supervised Scene Flow Estimation with Point-Voxel Fusion and Surface Representation [30.355128117680444]
Scene flow estimation aims to generate the 3D motion field of points between two consecutive frames of point clouds. Existing point-based methods ignore the irregularity of point clouds and have difficulty capturing long-range dependencies. We propose a point-voxel fusion method, where we utilize a voxel branch based on sparse grid attention and the shifted window strategy to capture long-range dependencies.
arXiv Detail & Related papers (2024-10-17T09:05:15Z) - Multiway Point Cloud Mosaicking with Diffusion and Global Optimization [74.3802812773891]
We introduce a novel framework for multiway point cloud mosaicking (named Wednesday).
At the core of our approach is ODIN, a learned pairwise registration algorithm that identifies overlaps and refines attention scores.
Tested on four diverse, large-scale datasets, our method achieves state-of-the-art pairwise and rotation registration results by a large margin on all benchmarks.
arXiv Detail & Related papers (2024-03-30T17:29:13Z) - Generalized Correspondence Matching via Flexible Hierarchical Refinement and Patch Descriptor Distillation [13.802788788420175]
Correspondence matching plays a crucial role in numerous robotics applications.
This paper addresses the limitations of deep feature matching (DFM), a state-of-the-art (SoTA) plug-and-play correspondence matching approach.
Our proposed method achieves an overall performance in terms of mean matching accuracy of 0.68, 0.92, and 0.95 with respect to the tolerances of 1, 3, and 5 pixels, respectively.
arXiv Detail & Related papers (2024-03-08T15:32:18Z) - A Weighted K-Center Algorithm for Data Subset Selection [70.49696246526199]
Subset selection is a fundamental problem that can play a key role in identifying smaller portions of the training data.
We develop a novel factor 3-approximation algorithm to compute subsets based on the weighted sum of both k-center and uncertainty sampling objective functions.
arXiv Detail & Related papers (2023-12-17T04:41:07Z) - Object Segmentation by Mining Cross-Modal Semantics [68.88086621181628]
We propose a novel approach by mining the Cross-Modal Semantics to guide the fusion and decoding of multimodal features.
Specifically, we propose a novel network, termed XMSNet, consisting of (1) all-round attentive fusion (AF), (2) coarse-to-fine decoder (CFD), and (3) cross-layer self-supervision.
arXiv Detail & Related papers (2023-05-17T14:30:11Z) - The Wyner Variational Autoencoder for Unsupervised Multi-Layer Wireless Fingerprinting [6.632671046812309]
We propose a multi-layer fingerprinting framework that jointly considers the multi-layer signatures for improved identification performance.
In contrast to previous works, by leveraging the recent multi-view machine learning paradigm, our method can cluster the device information shared among the multi-layer features without supervision.
Our empirical results show that the proposed method outperforms the state-of-the-art baselines in both supervised and unsupervised settings.
arXiv Detail & Related papers (2023-03-28T10:05:06Z) - Faster Adaptive Federated Learning [84.38913517122619]
Federated learning has attracted increasing attention with the emergence of distributed data.
In this paper, we propose an efficient adaptive algorithm (i.e., FAFED) based on a momentum-based variance-reduction technique in cross-silo FL.
arXiv Detail & Related papers (2022-12-02T05:07:50Z) - Task-Oriented Sensing, Computation, and Communication Integration for Multi-Device Edge AI [108.08079323459822]
This paper studies a new multi-device edge artificial-intelligence (AI) system, which jointly exploits AI model split inference and integrated sensing and communication (ISAC).
We measure the inference accuracy by adopting an approximate but tractable metric, namely discriminant gain.
arXiv Detail & Related papers (2022-07-03T06:57:07Z) - Real-Time Scene Text Detection with Differentiable Binarization and Adaptive Scale Fusion [62.269219152425556]
Segmentation-based scene text detection methods have drawn extensive attention in the scene text detection field.
We propose a Differentiable Binarization (DB) module that integrates the binarization process into a segmentation network.
An efficient Adaptive Scale Fusion (ASF) module is proposed to improve the scale robustness by fusing features of different scales adaptively.
arXiv Detail & Related papers (2022-02-21T15:30:14Z) - DFC: Deep Feature Consistency for Robust Point Cloud Registration [0.4724825031148411]
We present a novel learning-based alignment network for complex alignment scenes.
We validate our approach on the 3DMatch dataset and the KITTI odometry dataset.
arXiv Detail & Related papers (2021-11-15T08:27:21Z) - Deep Magnification-Flexible Upsampling over 3D Point Clouds [103.09504572409449]
We propose a novel end-to-end learning-based framework to generate dense point clouds.
We first formulate the problem explicitly, which boils down to determining the weights and high-order approximation errors.
Then, we design a lightweight neural network to adaptively learn unified and sorted weights as well as the high-order refinements.
arXiv Detail & Related papers (2020-11-25T14:00:18Z)
This list is automatically generated from the titles and abstracts of the papers on this site.