Semantic RGB-D Image Synthesis
- URL: http://arxiv.org/abs/2308.11356v2
- Date: Tue, 19 Sep 2023 02:52:19 GMT
- Title: Semantic RGB-D Image Synthesis
- Authors: Shijie Li, Rong Li, Juergen Gall
- Abstract summary: We introduce semantic RGB-D image synthesis to address this problem.
Current approaches, however, are uni-modal and cannot cope with multi-modal data.
We propose a generator for multi-modal data that separates modal-independent information of the semantic layout from the modal-dependent information.
- Score: 22.137419841504908
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Collecting diverse sets of training images for RGB-D semantic image
segmentation is not always possible. In particular, when robots need to operate
in privacy-sensitive areas like homes, the collection is often limited to a
small set of locations. As a consequence, the annotated images lack diversity
in appearance and approaches for RGB-D semantic image segmentation tend to
overfit the training data. In this paper, we thus introduce semantic RGB-D
image synthesis to address this problem. It requires synthesising a
realistic-looking RGB-D image for a given semantic label map. Current
approaches, however, are uni-modal and cannot cope with multi-modal data.
Indeed, we show that extending uni-modal approaches to multi-modal data does
not perform well. In this paper, we therefore propose a generator for
multi-modal data that separates modal-independent information of the semantic
layout from the modal-dependent information that is needed to generate an RGB
and a depth image, respectively. Furthermore, we propose a discriminator that
ensures semantic consistency between the label maps and the generated images
and perceptual similarity between the real and generated images. Our
comprehensive experiments demonstrate that the proposed method outperforms
previous uni-modal methods by a large margin and that the accuracy of an
approach for RGB-D semantic segmentation can be significantly improved by
mixing real and generated images during training.
Related papers
- Diffusion-based RGB-D Semantic Segmentation with Deformable Attention Transformer [10.982521876026281]
We introduce a diffusion-based framework to address the RGB-D semantic segmentation problem.
We demonstrate that utilizing a Deformable Attention Transformer as the encoder to extract features from depth images effectively captures the characteristics of invalid regions in depth measurements.
arXiv Detail & Related papers (2024-09-23T15:23:01Z) - Beyond Learned Metadata-based Raw Image Reconstruction [86.1667769209103]
Raw images have distinct advantages over sRGB images, e.g., linearity and fine-grained quantization levels.
They are not widely adopted by general users due to their substantial storage requirements.
We propose a novel framework that learns a compact representation in the latent space, serving as metadata.
arXiv Detail & Related papers (2023-06-21T06:59:07Z) - A Multi-modal Approach to Single-modal Visual Place Classification [2.580765958706854]
Multi-sensor fusion approaches combining RGB and depth (D) have gained popularity in recent years.
We reformulate the single-modal RGB image classification task as a pseudo multi-modal RGB-D classification problem.
A practical, fully self-supervised framework for training, appropriately processing, fusing, and classifying these two modalities is described.
arXiv Detail & Related papers (2023-05-10T14:04:21Z) - Clothes Grasping and Unfolding Based on RGB-D Semantic Segmentation [21.950751953721817]
We propose a novel Bi-directional Fractal Cross Fusion Network (BiFCNet) for semantic segmentation.
We use RGB images with rich color features as input to our network in which the Fractal Cross Fusion module fuses RGB and depth data.
To reduce the cost of real data collection, we propose a data augmentation method based on an adversarial strategy.
arXiv Detail & Related papers (2023-05-05T03:21:55Z) - Complementary Random Masking for RGB-Thermal Semantic Segmentation [63.93784265195356]
RGB-thermal semantic segmentation is a potential solution to achieve reliable semantic scene understanding in adverse weather and lighting conditions.
This paper proposes 1) a complementary random masking strategy of RGB-T images and 2) self-distillation loss between clean and masked input modalities.
We achieve state-of-the-art performance over three RGB-T semantic segmentation benchmarks.
arXiv Detail & Related papers (2023-03-30T13:57:21Z) - RGB-Multispectral Matching: Dataset, Learning Methodology, Evaluation [49.28588927121722]
We address the problem of registering synchronized color (RGB) and multi-spectral (MS) images featuring very different resolution by solving stereo matching correspondences.
We introduce a novel RGB-MS dataset framing 13 different scenes in indoor environments and providing a total of 34 image pairs annotated with semi-dense, high-resolution ground-truth labels.
To tackle the task, we propose a deep learning architecture trained in a self-supervised manner by exploiting a further RGB camera.
arXiv Detail & Related papers (2022-06-14T17:59:59Z) - Self-Supervised Modality-Aware Multiple Granularity Pre-Training for
RGB-Infrared Person Re-Identification [9.624510941236837]
Modality-Aware Multiple Granularity Learning (MMGL) is a self-supervised pre-training alternative to ImageNet pre-training.
MMGL learns better representations (+6.47% Rank-1) with faster training speed (converge in few hours) and solider data efficiency (5% data size) than ImageNet pre-training.
Results suggest it generalizes well to various existing models, losses and has promising transferability across datasets.
arXiv Detail & Related papers (2021-12-12T04:40:33Z) - RGB-D Saliency Detection via Cascaded Mutual Information Minimization [122.8879596830581]
Existing RGB-D saliency detection models do not explicitly encourage RGB and depth to achieve effective multi-modal learning.
We introduce a novel multi-stage cascaded learning framework via mutual information minimization to "explicitly" model the multi-modal information between RGB image and depth data.
arXiv Detail & Related papers (2021-09-15T12:31:27Z) - Self-Supervised Representation Learning for RGB-D Salient Object
Detection [93.17479956795862]
We use Self-Supervised Representation Learning to design two pretext tasks: the cross-modal auto-encoder and the depth-contour estimation.
Our pretext tasks require only a few and un RGB-D datasets to perform pre-training, which make the network capture rich semantic contexts.
For the inherent problem of cross-modal fusion in RGB-D SOD, we propose a multi-path fusion module.
arXiv Detail & Related papers (2021-01-29T09:16:06Z) - Bi-directional Cross-Modality Feature Propagation with
Separation-and-Aggregation Gate for RGB-D Semantic Segmentation [59.94819184452694]
Depth information has proven to be a useful cue in the semantic segmentation of RGBD images for providing a geometric counterpart to the RGB representation.
Most existing works simply assume that depth measurements are accurate and well-aligned with the RGB pixels and models the problem as a cross-modal feature fusion.
In this paper, we propose a unified and efficient Crossmodality Guided to not only effectively recalibrate RGB feature responses, but also to distill accurate depth information via multiple stages and aggregate the two recalibrated representations alternatively.
arXiv Detail & Related papers (2020-07-17T18:35:24Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.