Unseen Object Instance Segmentation with Fully Test-time RGB-D Embeddings Adaptation
- URL: http://arxiv.org/abs/2204.09847v1
- Date: Thu, 21 Apr 2022 02:35:20 GMT
- Title: Unseen Object Instance Segmentation with Fully Test-time RGB-D Embeddings Adaptation
- Authors: Lu Zhang, Siqi Zhang, Xu Yang, Zhiyong Liu
- Abstract summary: Recently, a popular solution is leveraging RGB-D features of large-scale synthetic data and applying the model to unseen real-world scenarios.
We re-emphasize the adaptation process across Sim2Real domains in this paper.
We propose a framework to conduct the Fully Test-time RGB-D Embeddings Adaptation (FTEA) based on parameters of the BatchNorm layer.
- Score: 14.258456366985444
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Segmenting unseen objects is a crucial ability for robots, since they may
encounter new environments during operation. Recently, a popular solution is to
leverage RGB-D features learned from large-scale synthetic data and directly apply
the model to unseen real-world scenarios. However, even though depth data have fair
generalization ability, the domain shift due to the Sim2Real gap is inevitable,
which presents a key challenge to the unseen object instance segmentation (UOIS)
model. To tackle this problem, we re-emphasize the adaptation process across
Sim2Real domains in this paper. Specifically, we propose a framework to conduct
Fully Test-time RGB-D Embeddings Adaptation (FTEA) based on the parameters of the
BatchNorm layers. To construct the learning objective for test-time
back-propagation, we propose a novel non-parametric entropy objective that can be
implemented without explicit classification layers. Moreover, we design a
cross-modality knowledge distillation module to encourage information transfer
during test time. The proposed method can be conducted efficiently with test-time
images alone, without requiring annotations or revisiting the large-scale synthetic
training data. Besides significant time savings, the proposed method consistently
improves segmentation results on both overlap and boundary metrics, achieving
state-of-the-art performance on two real-world RGB-D image datasets. We hope our
work draws attention to test-time adaptation and reveals a promising direction for
robot perception in unseen environments.
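The abstract names three ingredients: test-time updates restricted to BatchNorm parameters, a non-parametric entropy objective defined directly on embeddings (since there is no classification head), and cross-modality knowledge distillation between the RGB and depth streams. The sketch below shows how such a loop could be wired up in PyTorch; the two-stream model interface, the cluster-center construction, the temperature `tau`, and the MSE-based alignment term are all illustrative assumptions, not the paper's actual implementation.

```python
# A minimal, hypothetical sketch of fully test-time adaptation via BatchNorm
# parameters. Everything beyond the general recipe (the model interface, the
# cluster centers, the temperature, the MSE loss) is assumed for exposition.
import torch
import torch.nn as nn
import torch.nn.functional as F


def setup_bn_adaptation(model: nn.Module):
    """Freeze everything except BatchNorm affine parameters (gamma, beta)
    and switch BN layers to train mode so test-batch statistics are used."""
    for p in model.parameters():
        p.requires_grad_(False)
    bn_params = []
    for m in model.modules():
        if isinstance(m, nn.BatchNorm2d):
            m.train()
            if m.affine:
                m.weight.requires_grad_(True)
                m.bias.requires_grad_(True)
                bn_params += [m.weight, m.bias]
    return bn_params


def nonparametric_entropy(emb: torch.Tensor, k: int = 16, tau: float = 0.1):
    """Entropy of soft assignments of pixel embeddings to k centers sampled
    from the batch itself -- a stand-in for class logits, since the UOIS
    model outputs embeddings rather than per-class scores."""
    b, c, h, w = emb.shape
    feats = F.normalize(emb.permute(0, 2, 3, 1).reshape(-1, c), dim=1)
    centers = feats[torch.randperm(feats.size(0))[:k]].detach()
    probs = (feats @ centers.t() / tau).softmax(dim=1)
    return -(probs * probs.clamp_min(1e-8).log()).sum(dim=1).mean()


def adapt(model: nn.Module, test_loader, lr: float = 1e-3):
    """One pass of test-time adaptation over unlabeled test batches:
    no annotations and no revisiting of the synthetic training data."""
    optimizer = torch.optim.SGD(setup_bn_adaptation(model), lr=lr)
    for rgb, depth in test_loader:
        emb_rgb, emb_depth = model(rgb, depth)  # assumed two-stream embeddings
        loss = nonparametric_entropy(emb_rgb) + nonparametric_entropy(emb_depth)
        # Stand-in for cross-modality knowledge distillation: pull the RGB
        # embedding toward a frozen copy of the depth embedding.
        loss = loss + F.mse_loss(emb_rgb, emb_depth.detach())
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```

The actual FTEA objective and distillation module will differ in detail; the sketch only captures the overall shape of the loop: unlabeled test images in, gradients flowing through a small and stable subset of parameters, and the two modalities regularizing each other.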
Related papers
- Sim-to-Real Grasp Detection with Global-to-Local RGB-D Adaptation [19.384129689848294]
This paper focuses on the sim-to-real issue of RGB-D grasp detection and formulates it as a domain adaptation problem.
We present a global-to-local method to address hybrid domain gaps in RGB and depth data and insufficient multi-modal feature alignment.
arXiv Detail & Related papers (2024-03-18T06:42:38Z)
- RISeg: Robot Interactive Object Segmentation via Body Frame-Invariant Features [6.358423536732677]
We introduce a novel approach to correct inaccurate segmentation by using robot interaction and a designed body frame-invariant feature.
We demonstrate the effectiveness of our proposed interactive perception pipeline in accurately segmenting cluttered scenes by achieving an average object segmentation accuracy rate of 80.7%.
arXiv Detail & Related papers (2024-03-04T05:03:24Z)
- One-Shot Domain Adaptive and Generalizable Semantic Segmentation with Class-Aware Cross-Domain Transformers [96.51828911883456]
Unsupervised sim-to-real domain adaptation (UDA) for semantic segmentation aims to improve the real-world test performance of a model trained on simulated data.
Traditional UDA often assumes that there are abundant unlabeled real-world data samples available during training for the adaptation.
We explore the one-shot unsupervised sim-to-real domain adaptation (OSUDA) and generalization problem, where only one real-world data sample is available.
arXiv Detail & Related papers (2022-12-14T15:54:15Z)
- Learning from Temporal Spatial Cubism for Cross-Dataset Skeleton-based Action Recognition [88.34182299496074]
Action labels are only available on a source dataset, but unavailable on a target dataset in the training stage.
We utilize a self-supervision scheme to reduce the domain shift between two skeleton-based action datasets.
By segmenting and permuting temporal segments or human body parts, we design two self-supervised learning classification tasks.
arXiv Detail & Related papers (2022-07-17T07:05:39Z)
- Category-Level 6D Object Pose and Size Estimation using Self-Supervised Deep Prior Deformation Networks [39.6823489555449]
It is difficult to precisely annotate object instances and their semantics in 3D space, and as such, synthetic data are extensively used for these tasks.
In this work, we aim to address this issue in the task setting of Sim2Real, unsupervised domain adaptation for category-level 6D object pose and size estimation.
We propose a method built upon a novel Deep Prior Deformation Network, shortened as DPDN.
arXiv Detail & Related papers (2022-07-12T10:24:52Z)
- Cycle and Semantic Consistent Adversarial Domain Adaptation for Reducing Simulation-to-Real Domain Shift in LiDAR Bird's Eye View [110.83289076967895]
We present a BEV domain adaptation method based on CycleGAN that uses prior semantic classification in order to preserve the information of small objects of interest during the domain adaptation process.
The quality of the generated BEVs has been evaluated using a state-of-the-art 3D object detection framework on the KITTI 3D Object Detection Benchmark.
arXiv Detail & Related papers (2021-04-22T12:47:37Z)
- Self-Supervised Representation Learning for RGB-D Salient Object Detection [93.17479956795862]
We use Self-Supervised Representation Learning to design two pretext tasks: the cross-modal auto-encoder and the depth-contour estimation.
Our pretext tasks require only a few unlabeled RGB-D datasets to perform pre-training, which makes the network capture rich semantic contexts.
For the inherent problem of cross-modal fusion in RGB-D SOD, we propose a multi-path fusion module.
arXiv Detail & Related papers (2021-01-29T09:16:06Z)
- Efficient RGB-D Semantic Segmentation for Indoor Scene Analysis [16.5390740005143]
We propose an efficient and robust RGB-D segmentation approach that can be optimized to a high degree using NVIDIA TensorRT.
We show that RGB-D segmentation is superior to processing RGB images solely and that it can still be performed in real time if the network architecture is carefully designed.
arXiv Detail & Related papers (2020-11-13T15:17:31Z)
- Bi-directional Cross-Modality Feature Propagation with Separation-and-Aggregation Gate for RGB-D Semantic Segmentation [59.94819184452694]
Depth information has proven to be a useful cue in the semantic segmentation of RGB-D images, providing a geometric counterpart to the RGB representation.
Most existing works simply assume that depth measurements are accurate and well-aligned with the RGB pixels, and model the problem as cross-modal feature fusion.
In this paper, we propose a unified and efficient Cross-modality Guided Encoder that not only effectively recalibrates RGB feature responses, but also distills accurate depth information via multiple stages and aggregates the two recalibrated representations alternately.
arXiv Detail & Related papers (2020-07-17T18:35:24Z)
- Unseen Object Instance Segmentation for Robotic Environments [67.88276573341734]
We propose a method to segment unseen object instances in tabletop environments.
UOIS-Net is composed of two stages: first, it operates only on depth to produce object instance center votes in 2D or 3D.
Surprisingly, our framework is able to learn from synthetic RGB-D data where the RGB is non-photorealistic.
arXiv Detail & Related papers (2020-07-16T01:59:13Z)