Geometric Retargeting: A Principled, Ultrafast Neural Hand Retargeting Algorithm
- URL: http://arxiv.org/abs/2503.07541v1
- Date: Mon, 10 Mar 2025 17:10:21 GMT
- Title: Geometric Retargeting: A Principled, Ultrafast Neural Hand Retargeting Algorithm
- Authors: Zhao-Heng Yin, Changhao Wang, Luis Pineda, Krishna Bodduluri, Tingfan Wu, Pieter Abbeel, Mustafa Mukadam
- Abstract summary: We introduce Geometric Retargeting (GeoRT), an ultrafast, principled hand retargeting algorithm for teleoperation. GeoRT converts human finger keypoints to robot hand keypoints at 1KHz, achieving state-of-the-art speed and accuracy.
- Score: 50.52643618961044
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We introduce Geometric Retargeting (GeoRT), an ultrafast and principled neural hand retargeting algorithm for teleoperation, developed as part of our recent Dexterity Gen (DexGen) system. GeoRT converts human finger keypoints to robot hand keypoints at 1KHz, achieving state-of-the-art speed and accuracy with significantly fewer hyperparameters. This high-speed capability enables flexible postprocessing, such as leveraging a foundational controller like DexGen for action correction. GeoRT is trained in an unsupervised manner, eliminating the need for manual annotation of hand pairs. The core of GeoRT lies in novel geometric objective functions that capture the essence of retargeting: preserving motion fidelity, ensuring configuration space (C-space) coverage, maintaining uniform response through high flatness, ensuring pinch correspondence, and preventing self-collisions. This approach is free from intensive test-time optimization, offering a more scalable and practical solution for real-time hand retargeting.
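As a rough illustration of what a geometric "motion fidelity" objective could look like, the sketch below penalizes misalignment between the direction a human fingertip moves and the direction the corresponding robot fingertip moves. This is an assumption made for illustration, not the loss defined in the GeoRT paper: the function name `direction_alignment_loss`, the cosine-based form, and the toy data are all hypothetical.

```python
import numpy as np

def direction_alignment_loss(human_delta, robot_delta, eps=1e-8):
    """Mean (1 - cosine similarity) over paired displacement vectors.

    Hypothetical stand-in for a motion-fidelity term: the value is near 0
    when each robot fingertip moves in the same direction as the human
    fingertip, and near 2 when it moves in the opposite direction.
    """
    # Normalize each displacement to unit length (eps avoids division by zero).
    h = human_delta / (np.linalg.norm(human_delta, axis=-1, keepdims=True) + eps)
    r = robot_delta / (np.linalg.norm(robot_delta, axis=-1, keepdims=True) + eps)
    # Row-wise dot product of unit vectors is the cosine similarity.
    cosine = np.sum(h * r, axis=-1)
    return float(np.mean(1.0 - cosine))

# Toy data: two fingertip displacements in 3D (illustrative values only).
human = np.array([[1.0, 0.0, 0.0], [0.0, 2.0, 0.0]])
aligned = 0.5 * human                 # same directions, different scale
flipped = -human                      # opposite directions
orthogonal = np.array([[0.0, 1.0, 0.0], [1.0, 0.0, 0.0]])

loss_aligned = direction_alignment_loss(human, aligned)
loss_flipped = direction_alignment_loss(human, flipped)
loss_orthogonal = direction_alignment_loss(human, orthogonal)
print(loss_aligned, loss_orthogonal, loss_flipped)
```

The loss is scale-invariant by construction, which fits the setting where human and robot hands differ in size; the remaining objectives named in the abstract (C-space coverage, flatness, pinch correspondence, collision avoidance) are not modeled here.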
Related papers
- ReCoM: Realistic Co-Speech Motion Generation with Recurrent Embedded Transformer [58.49950218437718]
We present ReCoM, an efficient framework for generating high-fidelity and generalizable human body motions synchronized with speech.
The core innovation lies in the Recurrent Embedded Transformer (RET), which integrates Dynamic Embedding Regularization (DER) into a Vision Transformer (ViT) core architecture.
To enhance model robustness, we incorporate the proposed DER strategy, which equips the model with dual capabilities of noise resistance and cross-domain generalization.
arXiv Detail & Related papers (2025-03-27T16:39:40Z) - Split-Boost Neural Networks [1.1549572298362787]
We propose an innovative training strategy for feed-forward architectures - called split-boost.
Such a novel approach ultimately allows us to avoid explicitly modeling the regularization term.
The proposed strategy is tested on a real-world (anonymized) dataset within a benchmark medical insurance design problem.
arXiv Detail & Related papers (2023-09-06T17:08:57Z) - EventTransAct: A video transformer-based framework for Event-camera based action recognition [52.537021302246664]
Event cameras offer new opportunities compared to standard action recognition in RGB videos.
In this study, we employ a computationally efficient model, namely the video transformer network (VTN), which initially acquires spatial embeddings per event-frame.
To better adapt the VTN to the sparse and fine-grained nature of event data, we design an Event-Contrastive Loss ($\mathcal{L}_{EC}$) and event-specific augmentations.
arXiv Detail & Related papers (2023-08-25T23:51:07Z) - GIFD: A Generative Gradient Inversion Method with Feature Domain Optimization [52.55628139825667]
Federated Learning (FL) has emerged as a promising distributed machine learning framework to preserve clients' privacy.
Recent studies find that an attacker can invert the shared gradients and recover sensitive data against an FL system by leveraging pre-trained generative adversarial networks (GAN) as prior knowledge.
We propose Gradient Inversion over Feature Domains (GIFD), which disassembles the GAN model and searches the feature domains of the intermediate layers.
arXiv Detail & Related papers (2023-08-09T04:34:21Z) - A Distance-Geometric Method for Recovering Robot Joint Angles From an RGB Image [7.971699294672282]
We present a novel method for retrieving the joint angles of a robot manipulator using only a single RGB image of its current configuration.
Our approach, based on a distance-geometric representation of the configuration space, exploits the knowledge of a robot's kinematic model.
arXiv Detail & Related papers (2023-01-05T12:57:45Z) - FPGA-optimized Hardware acceleration for Spiking Neural Networks [69.49429223251178]
This work presents the development of a hardware accelerator for an SNN, with off-line training, applied to an image recognition task.
The design targets a Xilinx Artix-7 FPGA, using around 40% of the available hardware resources in total.
It reduces the classification time by three orders of magnitude, with a small 4.5% impact on accuracy, compared to its full-precision software counterpart.
arXiv Detail & Related papers (2022-01-18T13:59:22Z) - Real-time Pose and Shape Reconstruction of Two Interacting Hands With a Single Depth Camera [79.41374930171469]
We present a novel method for real-time pose and shape reconstruction of two strongly interacting hands.
Our approach combines an extensive list of favorable properties, namely it is marker-less.
We show state-of-the-art results in scenes that exceed the complexity level demonstrated by previous work.
arXiv Detail & Related papers (2021-06-15T11:39:49Z) - Parameter-free Locally Accelerated Conditional Gradients [91.19349793915615]
We introduce a novel Parameter-Free Locally accelerated CG (PF-LaCG) algorithm, for which we provide rigorous convergence guarantees.
Our theoretical results are complemented by numerical experiments, which demonstrate local acceleration and showcase the practical improvements of PF-LaCG over non-accelerated algorithms.
arXiv Detail & Related papers (2021-02-12T22:50:01Z) - Randomized Automatic Differentiation [22.95414996614006]
We develop a general framework and approach for randomized automatic differentiation (RAD).
RAD can allow unbiased estimates to be computed with reduced memory in return for variance.
We show that RAD converges in fewer iterations than using a small batch size for feedforward networks, and in a similar number for recurrent networks.
arXiv Detail & Related papers (2020-07-20T19:03:44Z) - Overlap Local-SGD: An Algorithmic Approach to Hide Communication Delays in Distributed SGD [32.03967072200476]
We propose an algorithmic approach named Overlap-Local-SGD (and its momentum variant).
We achieve this by adding an anchor model on each node.
After multiple local updates, locally trained models will be pulled back towards the anchor model rather than communicating with others.
arXiv Detail & Related papers (2020-02-21T20:33:49Z) - LE-HGR: A Lightweight and Efficient RGB-based Online Gesture Recognition Network for Embedded AR Devices [8.509059894058947]
We propose a lightweight and computationally efficient HGR framework, namely LE-HGR, to enable real-time gesture recognition on embedded devices with low computing power.
We show that the proposed method is accurate and robust, reaching high-end performance in a variety of complicated interaction environments.
arXiv Detail & Related papers (2020-01-16T05:23:24Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it presents and is not responsible for any consequences of its use.