Related papers: Sim-to-Real Grasp Detection with Global-to-Local RGB-D Adaptation

URL: http://arxiv.org/abs/2403.11511v1
Date: Mon, 18 Mar 2024 06:42:38 GMT
Title: Sim-to-Real Grasp Detection with Global-to-Local RGB-D Adaptation
Authors: Haoxiang Ma, Ran Qin, Modi Shi, Boyang Gao, Di Huang
Abstract summary: This paper focuses on the sim-to-real issue of RGB-D grasp detection and formulates it as a domain adaptation problem. We present a global-to-local method to address hybrid domain gaps in RGB and depth data and insufficient multi-modal feature alignment.
Score: 19.384129689848294
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: This paper focuses on the sim-to-real issue of RGB-D grasp detection and formulates it as a domain adaptation problem. In this case, we present a global-to-local method to address hybrid domain gaps in RGB and depth data and insufficient multi-modal feature alignment. First, a self-supervised rotation pre-training strategy is adopted to deliver robust initialization for RGB and depth networks. We then propose a global-to-local alignment pipeline with individual global domain classifiers for scene features of RGB and depth images as well as a local one specifically working for grasp features in the two modalities. In particular, we propose a grasp prototype adaptation module, which aims to facilitate fine-grained local feature alignment by dynamically updating and matching the grasp prototypes from the simulation and real-world scenarios throughout the training process. Due to such designs, the proposed method substantially reduces the domain shift and thus leads to consistent performance improvements. Extensive experiments are conducted on the GraspNet-Planar benchmark and physical environment, and superior results are achieved which demonstrate the effectiveness of our method.

Related papers

RUN: Reversible Unfolding Network for Concealed Object Segmentation [61.13528324971598]
We propose the Reversible Unfolding Network (RUN), which applies reversible strategies across both mask and RGB domains.
arXiv Detail & Related papers (2025-01-30T22:19:15Z)
VELoRA: A Low-Rank Adaptation Approach for Efficient RGB-Event based Recognition [54.27379947727035]
This paper proposes a novel PEFT strategy to adapt the pre-trained foundation vision models for the RGB-Event-based classification. The frame difference of the dual modalities is also considered to capture the motion cues via the frame difference backbone network. The source code and pre-trained models will be released on https://github.com/Event-AHU/VELoRA.
arXiv Detail & Related papers (2024-12-28T07:38:23Z)
RaSim: A Range-aware High-fidelity RGB-D Data Simulation Pipeline for Real-world Applications [55.24463002889]
We focus on depth data synthesis and develop a range-aware RGB-D data simulation pipeline (RaSim). In particular, high-fidelity depth data is generated by imitating the imaging principle of real-world sensors. RaSim can be directly applied to real-world scenarios without any finetuning and excels at downstream RGB-D perception tasks.
arXiv Detail & Related papers (2024-04-05T08:52:32Z)
Benchmarking Implicit Neural Representation and Geometric Rendering in Real-Time RGB-D SLAM [6.242958695705305]
Implicit neural representation (INR) in combination with geometric rendering has been employed in real-time dense RGB-D SLAM. We establish the first open-source benchmark framework to evaluate the performance of a wide spectrum of commonly used INRs and rendering functions. We propose explicit hybrid encoding for high-fidelity dense grid mapping to comply with the RGB-D SLAM system.
arXiv Detail & Related papers (2024-03-28T14:59:56Z)
Segment Any Events via Weighted Adaptation of Pivotal Tokens [85.39087004253163]
This paper focuses on the nuanced challenge of tailoring the Segment Anything Models (SAMs) for integration with event data. We introduce a multi-scale feature distillation methodology to optimize the alignment of token embeddings originating from event data with their RGB image counterparts.
arXiv Detail & Related papers (2023-12-24T12:47:08Z)
One-Shot Domain Adaptive and Generalizable Semantic Segmentation with Class-Aware Cross-Domain Transformers [96.51828911883456]
Unsupervised sim-to-real domain adaptation (UDA) for semantic segmentation aims to improve the real-world test performance of a model trained on simulated data. Traditional UDA often assumes that there are abundant unlabeled real-world data samples available during training for the adaptation. We explore the one-shot unsupervised sim-to-real domain adaptation (OSUDA) and generalization problem, where only one real-world data sample is available.
arXiv Detail & Related papers (2022-12-14T15:54:15Z)
Unseen Object Instance Segmentation with Fully Test-time RGB-D Embeddings Adaptation [14.258456366985444]
Recently, a popular solution is leveraging RGB-D features of large-scale synthetic data and applying the model to unseen real-world scenarios. We re-emphasize the adaptation process across Sim2Real domains in this paper. We propose a framework to conduct the Fully Test-time RGB-D Embeddings Adaptation (FTEA) based on parameters of the BatchNorm layer.
arXiv Detail & Related papers (2022-04-21T02:35:20Z)
Dual-Flow Transformation Network for Deformable Image Registration with Region Consistency Constraint [95.30864269428808]
Current deep learning (DL)-based image registration approaches learn the spatial transformation from one image to another by leveraging a convolutional neural network. We present a novel dual-flow transformation network with region consistency constraint which maximizes the similarity of ROIs within a pair of images. Experiments on four public 3D MRI datasets show that the proposed method achieves the best registration performance in accuracy and generalization.
arXiv Detail & Related papers (2021-12-04T05:30:44Z)
G$^2$DA: Geometry-Guided Dual-Alignment Learning for RGB-Infrared Person Re-Identification [3.909938091041451]
RGB-IR person re-identification aims to retrieve person-of-interest between heterogeneous modalities. This paper presents a Geometry-Guided Dual-Alignment learning framework (G$^2$DA) to tackle sample-level modality difference.
arXiv Detail & Related papers (2021-06-15T03:14:31Z)
Neural BRDF Representation and Importance Sampling [79.84316447473873]
We present a compact neural network-based representation of reflectance BRDF data. We encode BRDFs as lightweight networks, and propose a training scheme with adaptive angular sampling. We evaluate encoding results on isotropic and anisotropic BRDFs from multiple real-world datasets.
arXiv Detail & Related papers (2021-02-11T12:00:24Z)
DASGIL: Domain Adaptation for Semantic and Geometric-aware Image-based Localization [27.294822556484345]
Long-term visual localization under changing environments is a challenging problem in autonomous driving and mobile robotics. We propose a novel multi-task architecture to fuse the geometric and semantic information into the multi-scale latent embedding representation for visual place recognition.
arXiv Detail & Related papers (2020-10-01T17:44:25Z)
Domain-invariant Similarity Activation Map Contrastive Learning for Retrieval-based Long-term Visual Localization [30.203072945001136]
In this work, a general architecture is first formulated probabilistically to extract domain-invariant features through multi-domain image translation. Then, a novel gradient-weighted similarity activation mapping loss (Grad-SAM) is incorporated for finer localization with high accuracy. Extensive experiments have been conducted to validate the effectiveness of the proposed approach on the CMUSeasons dataset. Our performance is on par with or even outperforms the state-of-the-art image-based localization baselines in medium or high precision.
arXiv Detail & Related papers (2020-09-16T14:43:22Z)
Bi-directional Cross-Modality Feature Propagation with Separation-and-Aggregation Gate for RGB-D Semantic Segmentation [59.94819184452694]
Depth information has proven to be a useful cue in the semantic segmentation of RGB-D images for providing a geometric counterpart to the RGB representation. Most existing works simply assume that depth measurements are accurate and well-aligned with the RGB pixels and model the problem as a cross-modal feature fusion. In this paper, we propose a unified and efficient Cross-modality Guided Encoder to not only effectively recalibrate RGB feature responses, but also to distill accurate depth information via multiple stages and aggregate the two recalibrated representations alternatively.
arXiv Detail & Related papers (2020-07-17T18:35:24Z)

This list is automatically generated from the titles and abstracts of the papers on this site.