MISCGrasp: Leveraging Multiple Integrated Scales and Contrastive Learning for Enhanced Volumetric Grasping
- URL: http://arxiv.org/abs/2507.02672v1
- Date: Thu, 03 Jul 2025 14:36:45 GMT
- Title: MISCGrasp: Leveraging Multiple Integrated Scales and Contrastive Learning for Enhanced Volumetric Grasping
- Authors: Qingyu Fan, Yinghao Cai, Chao Li, Chunting Jiao, Xudong Zheng, Tao Lu, Bin Liang, Shuo Wang,
- Abstract summary: MISCGrasp is a volumetric grasping method that integrates multi-scale feature extraction with contrastive feature enhancement for self-adaptive grasping.<n>We propose a query-based interaction between high-level and low-level features through the Insight Transformer, while the Empower Transformer selectively attends to the highest-level features.<n>Extensive experiments in both simulated and real-world environments demonstrate that MISCGrasp outperforms baseline and variant methods in tabletop decluttering tasks.
- Score: 15.127239823566194
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Robotic grasping faces challenges in adapting to objects with varying shapes and sizes. In this paper, we introduce MISCGrasp, a volumetric grasping method that integrates multi-scale feature extraction with contrastive feature enhancement for self-adaptive grasping. We propose a query-based interaction between high-level and low-level features through the Insight Transformer, while the Empower Transformer selectively attends to the highest-level features, which synergistically strikes a balance between focusing on fine geometric details and overall geometric structures. Furthermore, MISCGrasp utilizes multi-scale contrastive learning to exploit similarities among positive grasp samples, ensuring consistency across multi-scale features. Extensive experiments in both simulated and real-world environments demonstrate that MISCGrasp outperforms baseline and variant methods in tabletop decluttering tasks. More details are available at https://miscgrasp.github.io/.
Related papers
- Learnable Multi-Scale Wavelet Transformer: A Novel Alternative to Self-Attention [0.0]
Learnable Multi-Scale Wavelet Transformer (LMWT) is a novel architecture that replaces the standard dot-product self-attention.<n>We present the detailed mathematical formulation of the learnable Haar wavelet module and its integration into the transformer framework.<n>Our results indicate that the LMWT achieves competitive performance while offering substantial computational advantages.
arXiv Detail & Related papers (2025-04-08T22:16:54Z) - Mixture of Attention Yields Accurate Results for Tabular Data [21.410818837489973]
We propose MAYA, an encoder-decoder transformer-based framework.<n>In the encoder, we design a Mixture of Attention (MOA) that constructs multiple parallel attention branches.<n>We employ collaborative learning with a dynamic consistency weight constraint to produce more robust representations.
arXiv Detail & Related papers (2025-02-18T03:43:42Z) - Symmetries-enhanced Multi-Agent Reinforcement Learning [25.383183391244373]
Multi-agent reinforcement learning has emerged as a powerful framework for enabling agents to learn complex, coordinated behaviors.<n>Recent advancements have sought to alleviate those issues by embedding intrinsic symmetries of the systems in the policy.<n>This paper presents a novel framework for embedding extrinsic symmetries in multi-agent system dynamics.
arXiv Detail & Related papers (2025-01-02T08:41:31Z) - Modality Prompts for Arbitrary Modality Salient Object Detection [57.610000247519196]
This paper delves into the task of arbitrary modality salient object detection (AM SOD)
It aims to detect salient objects from arbitrary modalities, eg RGB images, RGB-D images, and RGB-D-T images.
A novel modality-adaptive Transformer (MAT) will be proposed to investigate two fundamental challenges of AM SOD.
arXiv Detail & Related papers (2024-05-06T11:02:02Z) - Multi-Scale Implicit Transformer with Re-parameterize for
Arbitrary-Scale Super-Resolution [2.4865475189445405]
Multi-Scale Implicit Transformer (MSIT)
MSIT consists of an Multi-scale Neural Operator (MSNO) and Multi-Scale Self-Attention (MSSA)
arXiv Detail & Related papers (2024-03-11T09:23:20Z) - Scale Equalization for Multi-Level Feature Fusion [8.541075075344438]
We find that multi-level features from parallel branches are on different scales.
The scale disequilibrium is a universal and unwanted flaw that leads to detrimental gradient descent.
We propose injecting scale equalizers to achieve scale equilibrium across multi-level features after bilinear upsampling.
arXiv Detail & Related papers (2024-02-02T05:25:51Z) - Rotated Multi-Scale Interaction Network for Referring Remote Sensing Image Segmentation [63.15257949821558]
Referring Remote Sensing Image (RRSIS) is a new challenge that combines computer vision and natural language processing.
Traditional Referring Image (RIS) approaches have been impeded by the complex spatial scales and orientations found in aerial imagery.
We introduce the Rotated Multi-Scale Interaction Network (RMSIN), an innovative approach designed for the unique demands of RRSIS.
arXiv Detail & Related papers (2023-12-19T08:14:14Z) - RGM: A Robust Generalizable Matching Model [49.60975442871967]
We propose a deep model for sparse and dense matching, termed RGM (Robust Generalist Matching)
To narrow the gap between synthetic training samples and real-world scenarios, we build a new, large-scale dataset with sparse correspondence ground truth.
We are able to mix up various dense and sparse matching datasets, significantly improving the training diversity.
arXiv Detail & Related papers (2023-10-18T07:30:08Z) - Exploiting Modality-Specific Features For Multi-Modal Manipulation
Detection And Grounding [54.49214267905562]
We construct a transformer-based framework for multi-modal manipulation detection and grounding tasks.
Our framework simultaneously explores modality-specific features while preserving the capability for multi-modal alignment.
We propose an implicit manipulation query (IMQ) that adaptively aggregates global contextual cues within each modality.
arXiv Detail & Related papers (2023-09-22T06:55:41Z) - Learning Multiscale Consistency for Self-supervised Electron Microscopy
Instance Segmentation [48.267001230607306]
We propose a pretraining framework that enhances multiscale consistency in EM volumes.
Our approach leverages a Siamese network architecture, integrating strong and weak data augmentations.
It effectively captures voxel and feature consistency, showing promise for learning transferable representations for EM analysis.
arXiv Detail & Related papers (2023-08-19T05:49:13Z) - Deformable Mixer Transformer with Gating for Multi-Task Learning of
Dense Prediction [126.34551436845133]
CNNs and Transformers have their own advantages and both have been widely used for dense prediction in multi-task learning (MTL)
We present a novel MTL model by combining both merits of deformable CNN and query-based Transformer with shared gating for multi-task learning of dense prediction.
arXiv Detail & Related papers (2023-08-10T17:37:49Z) - Coarse-to-Fine Embedded PatchMatch and Multi-Scale Dynamic Aggregation
for Reference-based Super-Resolution [48.093500219958834]
We propose an Accelerated Multi-Scale Aggregation network (AMSA) for Reference-based Super-Resolution.
The proposed AMSA achieves superior performance over state-of-the-art approaches on both quantitative and qualitative evaluations.
arXiv Detail & Related papers (2022-01-12T08:40:23Z) - Multi-scale Interactive Network for Salient Object Detection [91.43066633305662]
We propose the aggregate interaction modules to integrate the features from adjacent levels.
To obtain more efficient multi-scale features, the self-interaction modules are embedded in each decoder unit.
Experimental results on five benchmark datasets demonstrate that the proposed method without any post-processing performs favorably against 23 state-of-the-art approaches.
arXiv Detail & Related papers (2020-07-17T15:41:37Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.