MultiGraspNet: A Multitask 3D Vision Model for Multi-gripper Robotic Grasping
- URL: http://arxiv.org/abs/2602.06504v1
- Date: Fri, 06 Feb 2026 08:56:21 GMT
- Title: MultiGraspNet: A Multitask 3D Vision Model for Multi-gripper Robotic Grasping
- Authors: Stephany Ortuno-Chanelo, Paolo Rabino, Enrico Civitelli, Tatiana Tommasi, Raffaello Camoriano
- Abstract summary: MultiGraspNet is a novel multitask 3D deep learning method that predicts feasible poses simultaneously for parallel and vacuum grippers within a unified framework. We run real-world experiments on a single-arm multi-gripper robotic setup showing that our approach outperforms the vacuum baseline.
- Score: 8.558823208942277
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Vision-based models for robotic grasping automate critical, repetitive, and draining industrial tasks. Existing approaches are typically limited in two ways: they either target a single gripper and are potentially applied on costly dual-arm setups, or rely on custom hybrid grippers that require ad-hoc learning procedures with logic that cannot be transferred across tasks, restricting their general applicability. In this work, we present MultiGraspNet, a novel multitask 3D deep learning method that predicts feasible poses simultaneously for parallel and vacuum grippers within a unified framework, enabling a single robot to handle multiple end effectors. The model is trained on the richly annotated GraspNet-1Billion and SuctionNet-1Billion datasets, which have been aligned for the purpose, and generates graspability masks quantifying the suitability of each scene point for successful grasps. By sharing early-stage features while maintaining gripper-specific refiners, MultiGraspNet effectively leverages complementary information across grasping modalities, enhancing robustness and adaptability in cluttered scenes. We characterize MultiGraspNet's performance with an extensive experimental analysis, demonstrating its competitiveness with single-task models on relevant benchmarks. We run real-world experiments on a single-arm multi-gripper robotic setup showing that our approach outperforms the vacuum baseline, grasping 16% more seen objects and 32% more novel ones, while obtaining competitive results for the parallel task.
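The architectural idea in the abstract (a shared early-stage feature extractor feeding gripper-specific refiners that output per-point graspability masks) can be pictured with a minimal sketch. The code below is a hypothetical illustration only: the module names, layer sizes, and the plain per-point MLP encoder are assumptions for readability, not the authors' architecture, which is trained on the aligned GraspNet-1Billion and SuctionNet-1Billion annotations.

```python
# Hypothetical sketch of a shared-encoder, two-head multitask model producing
# per-point graspability scores for two gripper types. Names and layer sizes
# are illustrative assumptions, not the paper's implementation.
import torch
import torch.nn as nn


class SharedPointEncoder(nn.Module):
    """Per-point MLP encoder shared by both gripper tasks."""
    def __init__(self, in_dim=3, feat_dim=128):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(in_dim, 64), nn.ReLU(),
            nn.Linear(64, feat_dim), nn.ReLU(),
        )

    def forward(self, points):          # points: (B, N, 3)
        return self.mlp(points)         # (B, N, feat_dim)


class GraspabilityHead(nn.Module):
    """Gripper-specific refiner: per-point graspability score in [0, 1]."""
    def __init__(self, feat_dim=128):
        super().__init__()
        self.head = nn.Sequential(
            nn.Linear(feat_dim, 64), nn.ReLU(),
            nn.Linear(64, 1),
        )

    def forward(self, feats):           # feats: (B, N, feat_dim)
        return torch.sigmoid(self.head(feats)).squeeze(-1)  # (B, N)


class MultiGripperNet(nn.Module):
    """Shared early-stage features, separate heads per gripper modality."""
    def __init__(self):
        super().__init__()
        self.encoder = SharedPointEncoder()
        self.parallel_head = GraspabilityHead()   # parallel-jaw gripper
        self.vacuum_head = GraspabilityHead()     # vacuum (suction) gripper

    def forward(self, points):
        feats = self.encoder(points)
        return {
            "parallel": self.parallel_head(feats),
            "vacuum": self.vacuum_head(feats),
        }


if __name__ == "__main__":
    cloud = torch.rand(2, 1024, 3)                # two scenes, 1024 points each
    masks = MultiGripperNet()(cloud)
    print(masks["parallel"].shape, masks["vacuum"].shape)  # (2, 1024) each
```

Training both heads jointly on top of a shared encoder is what lets gradients from the parallel and vacuum tasks shape common features, which is the mechanism the abstract credits for leveraging complementary information across grasping modalities.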
Related papers
- RoboGene: Boosting VLA Pre-training via Diversity-Driven Agentic Framework for Real-World Task Generation [37.52152452548065]
RoboGene is an agentic framework designed to automate the generation of diverse, physically plausible manipulation tasks. We conduct extensive quantitative analysis and large-scale real-world experiments, collecting datasets of 18k trajectories. Results demonstrate that RoboGene significantly outperforms state-of-the-art foundation models.
arXiv Detail & Related papers (2026-02-18T13:29:43Z)
- EchoMimicV3: 1.3B Parameters are All You Need for Unified Multi-Modal and Multi-Task Human Animation [8.214084596349744]
EchoMimicV3 is an efficient framework that unifies multi-task and multi-modal human animation. With a minimal model size of 1.3 billion parameters, EchoMimicV3 achieves competitive performance in both quantitative and qualitative evaluations.
arXiv Detail & Related papers (2025-07-05T05:36:26Z)
- SM3Det: A Unified Model for Multi-Modal Remote Sensing Object Detection [79.58755811919366]
This paper introduces a new task called Multi-Modal datasets and Multi-Task Object Detection (M2Det) for remote sensing. It is designed to accurately detect horizontal or oriented objects from any sensor modality. This task poses challenges due to 1) the trade-offs involved in managing multi-modal modelling and 2) the complexities of multi-task optimization.
arXiv Detail & Related papers (2024-12-30T02:47:51Z)
- RepVF: A Unified Vector Fields Representation for Multi-task 3D Perception [64.80760846124858]
This paper proposes a novel unified representation, RepVF, which harmonizes the representation of various perception tasks.
RepVF characterizes the structure of different targets in the scene through a vector field, enabling a single-head, multi-task learning model.
Building upon RepVF, we introduce RFTR, a network designed to exploit the inherent connections between different tasks.
arXiv Detail & Related papers (2024-07-15T16:25:07Z)
- SAM-E: Leveraging Visual Foundation Model with Sequence Imitation for Embodied Manipulation [62.58480650443393]
SAM-E leverages Segment Anything (SAM), a vision foundation model, for generalizable scene understanding and combines it with sequence imitation for embodied manipulation.
We develop a novel multi-channel heatmap that enables the prediction of the action sequence in a single pass.
arXiv Detail & Related papers (2024-05-30T00:32:51Z)
- DMFC-GraspNet: Differentiable Multi-Fingered Robotic Grasp Generation in Cluttered Scenes [22.835683657191936]
Multi-fingered robotic grasping can potentially perform complex object manipulation.
Current techniques for multi-fingered robotic grasping frequently predict only a single grasp per inference.
This paper proposes a differentiable multi-fingered grasp generation network (DMFC-GraspNet) with three main contributions to address this challenge.
arXiv Detail & Related papers (2023-08-01T11:21:07Z)
- An Efficient General-Purpose Modular Vision Model via Multi-Task Heterogeneous Training [79.78201886156513]
We present a model that can perform multiple vision tasks and can be adapted to other downstream tasks efficiently.
Our approach achieves comparable results to single-task state-of-the-art models and demonstrates strong generalization on downstream tasks.
arXiv Detail & Related papers (2023-06-29T17:59:57Z)
- Transferring Foundation Models for Generalizable Robotic Manipulation [82.12754319808197]
We propose a novel paradigm that effectively leverages language-reasoning segmentation mask generated by internet-scale foundation models. Our approach can effectively and robustly perceive object pose and enable sample-efficient generalization learning. Demos can be found in our submitted video, and more comprehensive ones can be found in link1 or link2.
arXiv Detail & Related papers (2023-06-09T07:22:12Z)
- Towards Practical Multi-Robot Hybrid Tasks Allocation for Autonomous Cleaning [40.715435411065336]
We formulate multi-robot hybrid-task allocation under an uncertain cleaning environment as a robust optimization problem (a generic form of such a problem is sketched below).
We establish a dataset of 100 instances made from floor plans, each of which has 2D manually-labeled images and a 3D model.
We provide comprehensive results on the collected dataset using three traditional optimization approaches and a deep reinforcement learning-based solver.
arXiv Detail & Related papers (2023-03-12T01:15:08Z)
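The cleaning-allocation entry above casts task assignment as a robust optimization problem. Purely as a hedged illustration (the summary does not state the authors' objective, constraints, or uncertainty set), a generic robust allocation problem has the min-max form:

```latex
% Generic robust optimization template, not the paper's exact model:
% x   - allocation decision (e.g., assignment of cleaning tasks to robots)
% \xi - uncertain environment parameters, drawn from an uncertainty set \mathcal{U}
\min_{x \in \mathcal{X}} \; \max_{\xi \in \mathcal{U}} \; c(x, \xi)
\qquad \text{s.t.} \quad g(x, \xi) \le 0 \quad \forall \, \xi \in \mathcal{U}
```

That is, the allocation is chosen to perform well under the worst-case realization of the uncertainty, which is the standard reading of "robust" in this setting.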
- MulT: An End-to-End Multitask Learning Transformer [66.52419626048115]
We propose an end-to-end Multitask Learning Transformer framework, named MulT, to simultaneously learn multiple high-level vision tasks.
Our framework encodes the input image into a shared representation and makes predictions for each vision task using task-specific transformer-based decoder heads.
arXiv Detail & Related papers (2022-05-17T13:03:18Z)
- Controllable Dynamic Multi-Task Architectures [92.74372912009127]
We propose a controllable multi-task network that dynamically adjusts its architecture and weights to match the desired task preference as well as the resource constraints.
We propose disentangled training of two hypernetworks that exploits task affinity and a novel branching-regularized loss to take input preferences and predict tree-structured models with adapted weights accordingly (a minimal hypernetwork sketch follows below).
arXiv Detail & Related papers (2022-03-28T17:56:40Z)
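To make the hypernetwork idea in the last entry more concrete, here is a minimal, hypothetical sketch: a small network maps a task-preference vector to the weights of a task head. It deliberately omits the paper's tree-structured branching, resource constraints, and disentangled two-hypernetwork training; all names and sizes are illustrative assumptions.

```python
# Hypothetical hypernetwork sketch: map a task-preference vector to the weights
# of a small task head. Omits the paper's tree-structured branching and
# disentangled two-hypernetwork training; names and sizes are illustrative.
import torch
import torch.nn as nn
import torch.nn.functional as F


class HyperNet(nn.Module):
    """Predicts the weights and bias of a linear task head from task preferences."""
    def __init__(self, num_tasks=3, feat_dim=64, out_dim=10):
        super().__init__()
        self.feat_dim, self.out_dim = feat_dim, out_dim
        self.generator = nn.Sequential(
            nn.Linear(num_tasks, 128), nn.ReLU(),
            nn.Linear(128, feat_dim * out_dim + out_dim),
        )

    def forward(self, preference):                     # preference: (num_tasks,)
        params = self.generator(preference)
        w = params[: self.feat_dim * self.out_dim].view(self.out_dim, self.feat_dim)
        b = params[self.feat_dim * self.out_dim:]
        return w, b


if __name__ == "__main__":
    hyper = HyperNet()
    features = torch.randn(8, 64)                      # batch of backbone features
    # Emphasize task 0 twice as much as tasks 1 and 2.
    pref = torch.tensor([0.5, 0.25, 0.25])
    w, b = hyper(pref)
    logits = F.linear(features, w, b)                  # task head with predicted weights
    print(logits.shape)                                # torch.Size([8, 10])
```

Because the head's weights are produced at inference time from the preference vector, a single trained hypernetwork can serve many task trade-offs without retraining, which is the appeal of the controllable multi-task setup described above.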