Unsupervised Domain Adaptation through Inter-modal Rotation for RGB-D
Object Recognition
- URL: http://arxiv.org/abs/2004.10016v1
- Date: Tue, 21 Apr 2020 13:53:55 GMT
- Title: Unsupervised Domain Adaptation through Inter-modal Rotation for RGB-D
Object Recognition
- Authors: Mohammad Reza Loghmani, Luca Robbiano, Mirco Planamente, Kiru Park,
Barbara Caputo and Markus Vincze
- Abstract summary: We propose a novel RGB-D DA method that reduces the synthetic-to-real domain shift by exploiting the inter-modal relation between the RGB and depth image.
Our method consists of training a convolutional neural network to solve, in addition to the main recognition task, the pretext task of predicting the relative rotation between the RGB and depth image.
- Score: 31.24587317555857
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Unsupervised Domain Adaptation (DA) exploits the supervision of a label-rich
source dataset to make predictions on an unlabeled target dataset by aligning
the two data distributions. In robotics, DA is used to take advantage of
automatically generated synthetic data, which come with "free" annotations, to
make effective predictions on real data. However, existing DA methods are not
designed to cope with the multi-modal nature of RGB-D data, which are widely
used in robotic vision. We propose a novel RGB-D DA method that reduces the
synthetic-to-real domain shift by exploiting the inter-modal relation between
the RGB and depth image. Our method consists of training a convolutional neural
network to solve, in addition to the main recognition task, the pretext task of
predicting the relative rotation between the RGB and depth image. To evaluate
our method and encourage further research in this area, we define two benchmark
datasets for object categorization and instance recognition. With extensive
experiments, we show the benefits of leveraging the inter-modal relations for
RGB-D DA.
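
To make the pretext task concrete, below is a minimal PyTorch-style sketch of inter-modal relative rotation prediction as described in the abstract. The ResNet-18 backbones, feature sizes, 51-class output, and 3-channel depth encoding are illustrative assumptions, not the authors' exact implementation.

```python
# Hedged sketch of the inter-modal rotation pretext task (illustrative only).
import random
import torch
import torch.nn as nn
import torchvision.models as models

def rotated_pair(rgb, depth):
    """Rotate RGB and depth (B, C, H, W) by independent multiples of 90 degrees.
    The pretext label is the relative rotation between them (4 classes)."""
    k_rgb, k_depth = random.randrange(4), random.randrange(4)
    label = (k_depth - k_rgb) % 4
    return (torch.rot90(rgb, k_rgb, dims=(2, 3)),
            torch.rot90(depth, k_depth, dims=(2, 3)),
            torch.full((rgb.size(0),), label, dtype=torch.long))

class InterModalNet(nn.Module):
    def __init__(self, num_classes):
        super().__init__()
        # One encoder per modality; features are concatenated before each head.
        self.rgb_enc = nn.Sequential(*list(models.resnet18().children())[:-1])
        self.dep_enc = nn.Sequential(*list(models.resnet18().children())[:-1])
        self.cls_head = nn.Linear(1024, num_classes)  # main recognition task
        self.rot_head = nn.Linear(1024, 4)            # rotation pretext task

    def forward(self, rgb, depth):
        feat = torch.cat([self.rgb_enc(rgb).flatten(1),
                          self.dep_enc(depth).flatten(1)], dim=1)
        return self.cls_head(feat), self.rot_head(feat)

# The rotation label is free: it needs no human annotation, so the pretext
# loss can be computed on unlabeled real (target) data as well as on labeled
# synthetic (source) data, pulling the two feature distributions together.
model = InterModalNet(num_classes=51)  # 51 categories is an assumed value
rgb, depth = torch.randn(2, 3, 224, 224), torch.randn(2, 3, 224, 224)
rgb_r, dep_r, rot_y = rotated_pair(rgb, depth)
cls_logits, rot_logits = model(rgb_r, dep_r)
pretext_loss = nn.functional.cross_entropy(rot_logits, rot_y)
```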
Related papers
- DCANet: Differential Convolution Attention Network for RGB-D Semantic
Segmentation [2.2032272277334375]
We propose a pixel differential convolution attention (DCA) module to consider geometric information and local-range correlations for depth data.
We extend DCA to ensemble differential convolution attention (EDCA) which propagates long-range contextual dependencies.
A two-branch network built with DCA and EDCA, called Differential Convolutional Network (DCANet), is proposed to fuse local and global information of two-modal data.
arXiv Detail & Related papers (2022-10-13T05:17:34Z)
- CIR-Net: Cross-modality Interaction and Refinement for RGB-D Salient
Object Detection [144.66411561224507]
We present a convolutional neural network (CNN) model, named CIR-Net, based on the novel cross-modality interaction and refinement.
Our network outperforms the state-of-the-art saliency detectors both qualitatively and quantitatively.
arXiv Detail & Related papers (2022-10-06T11:59:19Z)
- Visible-Infrared Person Re-Identification Using Privileged Intermediate
Information [10.816003787786766]
Cross-modal person re-identification (ReID) is challenging due to the large domain shift in data distributions between RGB and IR modalities.
This paper introduces a novel approach for creating an intermediate virtual domain that acts as a bridge between the two main domains.
We devise a new method to generate images between the visible and infrared domains that provide additional information for training a deep ReID model.
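
As a loose illustration of the intermediate-domain idea (not the paper's actual generator), one simple way to bridge two modalities is to blend paired images at a random ratio:

```python
# Hedged sketch: synthesize bridge-domain samples by blending modalities.
# A generic bridging heuristic, not necessarily the paper's method.
import torch

def intermediate_images(rgb, ir, alpha=None):
    """Blend paired visible/IR tensors (B, C, H, W) at a random per-sample
    ratio to synthesize intermediate-domain training samples."""
    if alpha is None:
        alpha = torch.rand(rgb.size(0), 1, 1, 1)  # one mixing ratio per sample
    return alpha * rgb + (1 - alpha) * ir
```

Training on such blended samples, in whatever form they are actually generated, exposes the ReID model to appearance statistics that lie between the two modalities.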
arXiv Detail & Related papers (2022-09-19T21:08:14Z)
- Dual Swin-Transformer based Mutual Interactive Network for RGB-D Salient
Object Detection [67.33924278729903]
In this work, we propose Dual Swin-Transformer based Mutual Interactive Network.
We adopt Swin-Transformer as the feature extractor for both RGB and depth modality to model the long-range dependencies in visual inputs.
Comprehensive experiments on five standard RGB-D SOD benchmark datasets demonstrate the superiority of the proposed DTMINet.
arXiv Detail & Related papers (2022-06-07T08:35:41Z)
- Pyramidal Attention for Saliency Detection [30.554118525502115]
This paper exploits only RGB images, estimates depth from RGB, and leverages the intermediate depth features.
We employ a pyramidal attention structure to extract multi-level convolutional-transformer features to process initial stage representations.
We report significantly improved performance against 21 and 40 state-of-the-art SOD methods on eight RGB and RGB-D datasets, respectively.
arXiv Detail & Related papers (2022-04-14T06:57:46Z)
- Self-Supervised Representation Learning for RGB-D Salient
Detection [93.17479956795862]
We use Self-Supervised Representation Learning to design two pretext tasks: the cross-modal auto-encoder and the depth-contour estimation.
Our pretext tasks require only a few unlabeled RGB-D datasets for pre-training, which makes the network capture rich semantic contexts.
For the inherent problem of cross-modal fusion in RGB-D SOD, we propose a multi-path fusion module.
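
For intuition, a cross-modal auto-encoder pretext can be boiled down to reconstructing one modality from the other, which forces the encoder to capture structure shared by RGB and depth. The sketch below is a generic, hedged illustration; the layer sizes and plain encoder-decoder design are assumptions, not the paper's architecture.

```python
# Hedged sketch of a cross-modal auto-encoder pretext: encode RGB, decode the
# paired depth map. Architecture and sizes are illustrative only.
import torch
import torch.nn as nn

class CrossModalAE(nn.Module):
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
        )
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(32, 1, 4, stride=2, padding=1),  # 1-channel depth
        )

    def forward(self, rgb):
        return self.decoder(self.encoder(rgb))

# Pre-training needs no human labels: the target is simply the paired depth map.
model = CrossModalAE()
rgb, depth = torch.randn(2, 3, 128, 128), torch.randn(2, 1, 128, 128)
loss = nn.functional.l1_loss(model(rgb), depth)
```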
arXiv Detail & Related papers (2021-01-29T09:16:06Z)
- Bi-directional Cross-Modality Feature Propagation with
Separation-and-Aggregation Gate for RGB-D Semantic Segmentation [59.94819184452694]
Depth information has proven to be a useful cue in the semantic segmentation of RGBD images for providing a geometric counterpart to the RGB representation.
Most existing works simply assume that depth measurements are accurate and well-aligned with the RGB pixels and model the problem as cross-modal feature fusion.
In this paper, we propose a unified and efficient Cross-modality Guided Encoder to not only effectively recalibrate RGB feature responses, but also to distill accurate depth information via multiple stages and aggregate the two recalibrated representations alternatively.
arXiv Detail & Related papers (2020-07-17T18:35:24Z)
- Synergistic saliency and depth prediction for RGB-D saliency detection [76.27406945671379]
Existing RGB-D saliency datasets are small, which may lead to overfitting and limited generalization for diverse scenarios.
We propose a semi-supervised system for RGB-D saliency detection that can be trained on smaller RGB-D saliency datasets without saliency ground truth.
arXiv Detail & Related papers (2020-07-03T14:24:41Z)
- UC-Net: Uncertainty Inspired RGB-D Saliency Detection via Conditional
Variational Autoencoders [81.5490760424213]
We propose the first framework (UCNet) to employ uncertainty for RGB-D saliency detection by learning from the data labeling process.
Inspired by the saliency data labeling process, we propose a probabilistic RGB-D saliency detection network.
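
The idea of treating saliency annotation as a distribution rather than a single deterministic map can be illustrated with a small conditional VAE. The sketch below is a generic CVAE over saliency maps; all dimensions and the encoder layout are assumptions and do not reproduce UC-Net itself.

```python
# Generic conditional-VAE sketch of "saliency as a distribution" (illustrative;
# not the UC-Net architecture). At test time, sampling z yields diverse maps.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SaliencyCVAE(nn.Module):
    def __init__(self, zdim=8):
        super().__init__()
        # Posterior encoder sees image + ground-truth map (4 channels total).
        self.post = nn.Sequential(
            nn.Conv2d(4, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(32, 2 * zdim))
        # Decoder predicts a saliency map from the image, conditioned on z.
        self.dec = nn.Sequential(
            nn.Conv2d(3 + zdim, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 1, 3, padding=1))

    def forward(self, img, gt):
        mu, logvar = self.post(torch.cat([img, gt], 1)).chunk(2, dim=1)
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()  # reparameterize
        zmap = z[:, :, None, None].expand(-1, -1, img.size(2), img.size(3))
        pred = self.dec(torch.cat([img, zmap], 1))
        kl = -0.5 * (1 + logvar - mu.pow(2) - logvar.exp()).sum(1).mean()
        return pred, kl

model = SaliencyCVAE()
img, gt = torch.randn(2, 3, 64, 64), torch.rand(2, 1, 64, 64)
pred, kl = model(img, gt)
loss = F.binary_cross_entropy_with_logits(pred, gt) + 1e-3 * kl
```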
arXiv Detail & Related papers (2020-04-13T04:12:59Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this list (including all information) and is not responsible for any consequences of its use.