Grasp-HGN: Grasping the Unexpected
- URL: http://arxiv.org/abs/2508.07648v1
- Date: Mon, 11 Aug 2025 05:58:28 GMT
- Title: Grasp-HGN: Grasping the Unexpected
- Authors: Mehrshad Zandigohar, Mallesham Dasari, Gunar Schirner
- Abstract summary: Current grasp models perform poorly on unseen objects, negatively affecting users' independence and quality of life. We propose Grasp-LLaVA, a Grasp Vision Language Model enabling human-like reasoning to infer the suitable grasp type based on the object's physical characteristics. Finally, we propose the Hybrid Grasp Network (HGN), an edge-cloud deployment infrastructure enabling fast grasp estimation on the edge.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: For transradial amputees, robotic prosthetic hands promise to restore the capability to perform activities of daily living. To advance next-generation prosthetic hand control design, it is crucial to address current shortcomings in robustness to out-of-lab artifacts and generalizability to new environments. Because existing datasets contain a fixed number of objects to interact with, in contrast to the virtually infinite variety of objects encountered in the real world, current grasp models perform poorly on unseen objects, negatively affecting users' independence and quality of life. To address this: (i) we define semantic projection, the ability of a model to generalize to unseen object types, and show that conventional models like YOLO, despite 80% training accuracy, drop to 15% on unseen objects; (ii) we propose Grasp-LLaVA, a Grasp Vision Language Model enabling human-like reasoning to infer the suitable grasp type from the object's physical characteristics, reaching 50.2% accuracy on unseen object types compared to 36.7% for an SOTA grasp estimation model. Lastly, to bridge the performance-latency gap, we propose the Hybrid Grasp Network (HGN), an edge-cloud deployment infrastructure enabling fast grasp estimation on the edge with accurate cloud inference as a fail-safe, effectively expanding the latency vs. accuracy Pareto frontier. HGN with confidence calibration (DC) enables dynamic switching between edge and cloud models, improving semantic projection accuracy by 5.6% (to 42.3%) with a 3.5x speedup on unseen object types. Over a real-world sample mix, it reaches 86% average accuracy (a 12.2% gain over edge-only) and 2.2x faster inference than Grasp-LLaVA alone.
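The routing logic the abstract describes for HGN — run a fast edge model and escalate to a slower, more accurate cloud model only when calibrated confidence is low — can be sketched as follows. This is a minimal illustration, not the paper's implementation: the models, grasp labels, and threshold are hypothetical stand-ins.

```python
# Illustrative sketch of confidence-gated edge/cloud routing, as described
# for HGN. Models, labels, and the threshold are hypothetical stand-ins.

def edge_model(image):
    # Stand-in for a fast on-device grasp classifier returning
    # (predicted grasp type, calibrated confidence in [0, 1]).
    return "power", 0.55

def cloud_model(image):
    # Stand-in for a slower but more accurate model (e.g. a grasp VLM).
    return "precision", 0.90

def hybrid_grasp(image, confidence_threshold=0.7):
    """Return a grasp estimate, escalating to the cloud when the
    edge model's calibrated confidence falls below the threshold."""
    grasp, conf = edge_model(image)
    if conf >= confidence_threshold:
        return grasp, "edge"
    grasp, _ = cloud_model(image)
    return grasp, "cloud"

print(hybrid_grasp(None))       # edge confidence 0.55 < 0.7 -> cloud
print(hybrid_grasp(None, 0.5))  # edge confidence 0.55 >= 0.5 -> edge
```

The threshold trades latency against accuracy: lowering it keeps more queries on the edge (faster), raising it escalates more often (more accurate), which is exactly the Pareto frontier the abstract refers to.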
Related papers
- DD-MDN: Human Trajectory Forecasting with Diffusion-Based Dual Mixture Density Networks and Uncertainty Self-Calibration
Human Trajectory Forecasting (HTF) predicts future human movements from past trajectories and environmental context. We propose DD-MDN, an end-to-end probabilistic HTF model that combines high positional accuracy, calibrated uncertainty, and robustness to short observations. Experiments on the ETH/UCY, SDD, inD, and IMPTC datasets demonstrate state-of-the-art accuracy, robustness at short observation intervals, and reliable uncertainty modeling.
arXiv Detail & Related papers (2026-02-11T08:59:33Z)
- Evaluating the Performance of Deep Learning Models in Whole-body Dynamic 3D Posture Prediction During Load-reaching Activities
Two time-series models were trained using bidirectional long short-term memory (BLSTM) and transformer architectures. The results indicated that the new cost function decreased the prediction error of the models by approximately 8% and 21% for the arm and leg models, respectively.
arXiv Detail & Related papers (2025-11-25T18:40:48Z) - WALDO: Where Unseen Model-based 6D Pose Estimation Meets Occlusion [13.760323763554448]
We propose four novel extensions to model-based 6D pose estimation methods.<n>We show that our proposed approach achieves more than 5% improvement in accuracy on ICBIN and more than 2% on BOP dataset benchmarks.
arXiv Detail & Related papers (2025-11-19T21:01:49Z) - Social LSTM with Dynamic Occupancy Modeling for Realistic Pedestrian Trajectory Prediction [2.1711205684359247]
This paper proposes a novel deep learning model that enhances the Social LSTM with a new Dynamic Occupied Space loss function.<n>This loss function guides Social LSTM in learning to avoid realistic collisions without increasing displacement error across different crowd densities.<n>The proposed model consistently outperforms several state-of-the-art deep learning models across most test sets.
arXiv Detail & Related papers (2025-11-12T20:49:58Z)
- Efficient Federated Learning with Heterogeneous Data and Adaptive Dropout
Federated Learning (FL) is a promising distributed machine learning approach that enables collaborative training of a global model using multiple edge devices. We propose the FedDHAD FL framework, which comes with two novel methods: Dynamic Heterogeneous model aggregation (FedDH) and Adaptive Dropout (FedAD). The combination of these two methods makes FedDHAD significantly outperform state-of-the-art solutions in terms of accuracy (up to 6.7% higher), efficiency (up to 2.02x faster), and cost (up to 15.0% smaller).
arXiv Detail & Related papers (2025-07-14T16:19:00Z)
- VST-Pose: A Velocity-Integrated Spatiotemporal Attention Network for Human WiFi Pose Estimation
VST-Pose is a novel framework for accurate pose estimation using WiFi channel state information. Our method achieves 92.2% accuracy on the PCK@50 metric, outperforming existing methods by 8.3% in PCK@50 on the self-collected dataset. The proposed system provides a reliable and privacy-aware solution for continuous human motion analysis in indoor environments.
arXiv Detail & Related papers (2025-07-13T15:11:18Z)
- RoHOI: Robustness Benchmark for Human-Object Interaction Detection
Human-Object Interaction (HOI) detection is crucial for robot-human assistance, enabling context-aware support. We introduce the first robustness benchmark for HOI detection, evaluating model resilience under diverse challenges. Our benchmark, RoHOI, includes 20 corruption types based on the HICO-DET and V-COCO datasets and a new robustness-focused metric.
arXiv Detail & Related papers (2025-07-12T01:58:04Z)
- Efficient Brain Tumor Classification with Lightweight CNN Architecture: A Novel Approach
Brain tumor classification using MRI images is critical in medical diagnostics, where early and accurate detection significantly impacts patient outcomes. Recent advancements in deep learning (DL) have shown promise, but many models struggle to balance accuracy and computational efficiency. We propose a novel model architecture integrating separable convolutions and squeeze-and-excitation (SE) blocks, designed to enhance feature extraction while maintaining computational efficiency.
arXiv Detail & Related papers (2025-02-01T21:06:42Z)
- Uncertainty-boosted Robust Video Activity Anticipation
Video activity anticipation aims to predict what will happen in the future, embracing a broad application prospect ranging from robot vision to autonomous driving.
Despite the recent progress, the data uncertainty issue, reflected in the content evolution process and dynamic correlations in event labels, has been largely ignored.
We propose an uncertainty-boosted robust video activity anticipation framework, which generates uncertainty values to indicate the credibility of the anticipation results.
arXiv Detail & Related papers (2024-04-29T12:31:38Z)
- Quantifying Uncertainty in Motion Prediction with Variational Bayesian Mixture
Safety and robustness are crucial factors in developing trustworthy autonomous vehicles.
We propose the Sequential Neural Variational Agent (SeNeVA), a generative model that describes the distribution of future trajectories for a single moving object.
Our approach can distinguish Out-of-Distribution data while quantifying uncertainty and achieving competitive performance.
arXiv Detail & Related papers (2024-04-04T20:04:12Z)
- Exploring the Effectiveness of Dataset Synthesis: An application of Apple Detection in Orchards
We explore the usability of Stable Diffusion 2.1-base for generating synthetic datasets of apple trees for object detection.
We train a YOLOv5m object detection model to predict apples in a real-world apple detection dataset.
Results demonstrate that the model trained on generated data is slightly underperforming compared to a baseline model trained on real-world images.
arXiv Detail & Related papers (2023-06-20T09:46:01Z)
- A Computer Vision Enabled damage detection model with improved YOLOv5 based on Transformer Prediction Head
Current state-of-the-art deep learning (DL)-based damage detection models often lack superior feature extraction capability in complex and noisy environments.
DenseSPH-YOLOv5 is a real-time DL-based high-performance damage detection model where DenseNet blocks have been integrated with the backbone.
DenseSPH-YOLOv5 obtains a mean average precision (mAP) of 85.25%, an F1-score of 81.18%, and a precision (P) of 89.51%, outperforming current state-of-the-art models.
arXiv Detail & Related papers (2023-03-07T22:53:36Z)
- Modeling Object Dissimilarity for Deep Saliency Prediction
We introduce a detection-guided saliency prediction network that explicitly models the differences between multiple objects.
Our approach is general, allowing us to fuse our object dissimilarities with features extracted by any deep saliency prediction network.
arXiv Detail & Related papers (2021-04-08T16:10:37Z)
- Triple Wins: Boosting Accuracy, Robustness and Efficiency Together by Enabling Input-Adaptive Inference
Deep networks have recently been suggested to face a trade-off between accuracy (on clean natural images) and robustness (on adversarially perturbed images). This paper studies multi-exit networks with input-adaptive inference, showing their strong promise in achieving a "sweet point" in co-optimizing model accuracy, robustness, and efficiency.
arXiv Detail & Related papers (2020-02-24T00:40:22Z)
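The multi-exit idea in the last entry — attach classifiers at intermediate depths and stop as soon as a prediction is confident enough — is closely related to HGN's edge/cloud switching and can be sketched framework-agnostically. All stages, exit heads, and the threshold below are illustrative assumptions, not the paper's code.

```python
# Minimal sketch of input-adaptive, multi-exit inference: intermediate
# classifiers ("exits") let easy inputs leave the network early.
# Stages, exit heads, and the threshold are hypothetical stand-ins.

def make_stage(scale):
    # Stand-in for a block of network layers.
    return lambda x: [v * scale for v in x]

def make_exit(confidence):
    # Stand-in for an exit classifier returning (label, confidence).
    return lambda x: ("label", confidence)

STAGES = [make_stage(2.0), make_stage(0.5), make_stage(1.0)]
EXITS = [make_exit(0.4), make_exit(0.8), make_exit(0.95)]

def adaptive_infer(x, threshold=0.7):
    """Run stages in order; answer at the first exit whose confidence
    clears the threshold (the final exit always answers)."""
    for i, (stage, exit_head) in enumerate(zip(STAGES, EXITS)):
        x = stage(x)
        label, conf = exit_head(x)
        if conf >= threshold or i == len(STAGES) - 1:
            return label, conf, i  # i = index of the exit used
```

Easy inputs exit early (saving compute), hard inputs traverse the full depth, which is how such networks co-optimize accuracy and efficiency per input.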