RGB-Multispectral Matching: Dataset, Learning Methodology, Evaluation
- URL: http://arxiv.org/abs/2206.07047v1
- Date: Tue, 14 Jun 2022 17:59:59 GMT
- Title: RGB-Multispectral Matching: Dataset, Learning Methodology, Evaluation
- Authors: Fabio Tosi, Pierluigi Zama Ramirez, Matteo Poggi, Samuele Salti,
Stefano Mattoccia, Luigi Di Stefano
- Abstract summary: We address the problem of registering synchronized color (RGB) and multi-spectral (MS) images featuring very different resolutions by solving for stereo matching correspondences.
We introduce a novel RGB-MS dataset framing 13 different scenes in indoor environments and providing a total of 34 image pairs annotated with semi-dense, high-resolution ground-truth labels.
To tackle the task, we propose a deep learning architecture trained in a self-supervised manner by exploiting a further RGB camera.
- Score: 49.28588927121722
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We address the problem of registering synchronized color (RGB) and
multi-spectral (MS) images featuring very different resolutions by solving for
stereo matching correspondences. To this end, we introduce a novel RGB-MS dataset
framing 13 different scenes in indoor environments and providing a total of 34
image pairs annotated with semi-dense, high-resolution ground-truth labels in
the form of disparity maps. To tackle the task, we propose a deep learning
architecture trained in a self-supervised manner by exploiting a further RGB
camera, required only during training data acquisition. In this setup, we can
conveniently learn cross-modal matching in the absence of ground-truth labels
by distilling knowledge from an easier RGB-RGB matching task based on a
collection of about 11K unlabeled image triplets. Experiments show that the
proposed pipeline sets a good performance bar (1.16 pixels average registration
error) for future research on this novel, challenging task.
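
As a rough illustration of the training scheme, the sketch below shows how proxy disparities from an RGB-RGB stereo teacher could supervise the RGB-MS student network; all module and function names are hypothetical, and the paper's actual architecture and losses differ in detail.

```python
# Hypothetical sketch of RGB-RGB -> RGB-MS knowledge distillation.
import torch
import torch.nn.functional as F

def distillation_step(rgb_left, rgb_right, ms_image, teacher, student, optimizer):
    """One self-supervised training step.

    teacher: frozen RGB-RGB stereo network producing proxy disparities
             (the auxiliary RGB camera is needed only at training time).
    student: the RGB-MS matching network being trained.
    """
    with torch.no_grad():
        proxy_disp = teacher(rgb_left, rgb_right)  # pseudo ground truth

    pred_disp = student(rgb_left, ms_image)        # cross-modal prediction

    # Plain L1 distillation against the proxy labels (illustrative choice).
    loss = F.l1_loss(pred_disp, proxy_disp)

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

Distilling from the easier RGB-RGB task sidesteps the lack of cross-modal ground truth: the roughly 11K unlabeled triplets supply supervision for free, while the third camera can be discarded at test time.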
Related papers
- Semantic RGB-D Image Synthesis [22.137419841504908]
Current approaches to semantic image synthesis are uni-modal and cannot cope with multi-modal data.
We introduce semantic RGB-D image synthesis to address this problem.
We propose a generator for multi-modal data that separates the modal-independent information of the semantic layout from the modal-dependent information.
arXiv Detail & Related papers (2023-08-22T11:16:24Z)
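
A minimal sketch of the separation idea summarized above, assuming a shared trunk for the modal-independent layout features and per-modality heads for the modal-dependent detail; names and layer choices are illustrative, not the paper's actual generator.

```python
import torch.nn as nn

class MultiModalGenerator(nn.Module):
    """Toy two-head generator: a shared trunk encodes modality-independent
    layout information; separate heads add modality-dependent detail."""

    def __init__(self, num_classes):
        super().__init__()
        self.shared = nn.Sequential(
            nn.Conv2d(num_classes, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, 64, 3, padding=1), nn.ReLU(),
        )
        self.rgb_head = nn.Conv2d(64, 3, 3, padding=1)    # color image
        self.depth_head = nn.Conv2d(64, 1, 3, padding=1)  # depth map

    def forward(self, layout_onehot):
        h = self.shared(layout_onehot)  # (N, 64, H, W) shared layout features
        return self.rgb_head(h), self.depth_head(h)
```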
- Edge-guided Multi-domain RGB-to-TIR image Translation for Training Vision Tasks with Challenging Labels [12.701191873813583]
The insufficient number of annotated thermal infrared (TIR) image datasets hinders TIR image-based deep learning networks from reaching performance comparable to their RGB counterparts.
We propose a modified multi-domain RGB-to-TIR image translation model focused on edge preservation, so that annotated RGB images with challenging labels can be employed.
This enables supervised learning of deep TIR image-based optical flow estimation and object detection, improving end-point error by 56.5% on average and reaching a best object detection mAP of 23.9%, respectively.
arXiv Detail & Related papers (2023-01-30T06:44:38Z)
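
The edge-preservation objective above could look roughly like the following Sobel-based loss; this is an assumed formulation for illustration, not the paper's exact training loss.

```python
import torch
import torch.nn.functional as F

# Sobel kernels for horizontal and vertical gradients.
_KX = torch.tensor([[-1., 0., 1.], [-2., 0., 2.], [-1., 0., 1.]]).view(1, 1, 3, 3)
_KY = _KX.transpose(2, 3)

def sobel_edges(img):
    """Gradient magnitude of a single-channel image batch (N, 1, H, W)."""
    gx = F.conv2d(img, _KX.to(img), padding=1)
    gy = F.conv2d(img, _KY.to(img), padding=1)
    return torch.sqrt(gx ** 2 + gy ** 2 + 1e-6)

def edge_preservation_loss(rgb, fake_tir):
    """Penalize translated TIR edges that drift from the source RGB edges."""
    gray = rgb.mean(dim=1, keepdim=True)  # crude luminance proxy
    return F.l1_loss(sobel_edges(gray), sobel_edges(fake_tir))
```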
- Self-Supervised Modality-Aware Multiple Granularity Pre-Training for RGB-Infrared Person Re-Identification [9.624510941236837]
Modality-Aware Multiple Granularity Learning (MMGL) is a self-supervised pre-training alternative to ImageNet pre-training.
MMGL learns better representations (+6.47% Rank-1) with faster training (converging in a few hours) and stronger data efficiency (5% of the data) than ImageNet pre-training.
Results suggest it generalizes well to various existing models and losses, and shows promising transferability across datasets.
arXiv Detail & Related papers (2021-12-12T04:40:33Z)
- RGB-D Saliency Detection via Cascaded Mutual Information Minimization [122.8879596830581]
Existing RGB-D saliency detection models do not explicitly encourage RGB and depth to achieve effective multi-modal learning.
We introduce a novel multi-stage cascaded learning framework via mutual information minimization to explicitly model the multi-modal information between the RGB image and depth data.
arXiv Detail & Related papers (2021-09-15T12:31:27Z)
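
A toy stand-in for the mutual-information minimization above: standardize pooled RGB and depth features and penalize their cross-correlation. The paper uses a proper multi-stage MI estimator; this decorrelation surrogate only conveys the direction of the objective.

```python
import torch

def cross_correlation_penalty(feat_rgb, feat_depth, eps=1e-5):
    """Decorrelation surrogate for MI minimization (illustrative only).

    feat_rgb, feat_depth: (N, D) pooled per-modality features.
    """
    def standardize(x):
        return (x - x.mean(0)) / (x.std(0) + eps)

    z_rgb, z_depth = standardize(feat_rgb), standardize(feat_depth)
    corr = z_rgb.t() @ z_depth / feat_rgb.shape[0]  # (D, D) cross-correlation
    return (corr ** 2).mean()  # push cross-modal correlations toward zero
```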
- Learning RAW-to-sRGB Mappings with Inaccurately Aligned Supervision [76.41657124981549]
This paper presents a joint learning model for image alignment and RAW-to-sRGB mapping.
Experiments show that our method performs favorably against state-of-the-art methods on the ZRR and SR-RAW datasets.
arXiv Detail & Related papers (2021-08-18T12:41:36Z)
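
One plausible way to learn under inaccurately aligned supervision, in the spirit of the paper above, is to warp the target by an estimated optical flow before computing the reconstruction loss; the sketch below assumes a flow in pixel units and is not the paper's actual alignment module.

```python
import torch
import torch.nn.functional as F

def aligned_reconstruction_loss(pred_srgb, target_srgb, flow):
    """Warp the target with an estimated flow, then compare, so that
    misaligned supervision does not penalize a correct mapping.

    pred_srgb, target_srgb: (N, 3, H, W); flow: (N, 2, H, W) in pixels.
    """
    n, _, h, w = pred_srgb.shape
    # Base sampling grid in normalized [-1, 1] coordinates.
    ys, xs = torch.meshgrid(
        torch.linspace(-1, 1, h, device=flow.device),
        torch.linspace(-1, 1, w, device=flow.device),
        indexing="ij",
    )
    grid = torch.stack((xs, ys), dim=-1).expand(n, h, w, 2)
    # Convert the pixel flow to normalized offsets and displace the grid.
    offset = torch.stack(
        (flow[:, 0] * 2 / max(w - 1, 1), flow[:, 1] * 2 / max(h - 1, 1)),
        dim=-1,
    )
    warped = F.grid_sample(target_srgb, grid + offset, align_corners=True)
    return F.l1_loss(pred_srgb, warped)
```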
- Semantic-embedded Unsupervised Spectral Reconstruction from Single RGB Images in the Wild [48.44194221801609]
We propose a new lightweight and end-to-end learning-based framework to tackle this challenge.
We progressively propagate the differences between the input RGB images and the RGB images re-projected from recovered HS images, enabled by effective estimation of the camera spectral response function.
Our method significantly outperforms state-of-the-art unsupervised methods and even exceeds the latest supervised method under some settings.
arXiv Detail & Related papers (2021-08-15T05:19:44Z)
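
The re-projection constraint above could be implemented roughly as below, assuming the camera spectral response function (CSF) is available as a 3xB mixing matrix; the actual method estimates the CSF jointly rather than taking it as given.

```python
import torch
import torch.nn.functional as F

def reprojection_loss(hs_pred, rgb, csf):
    """Project recovered hyperspectral bands back to RGB via the CSF and
    compare with the input RGB image (unsupervised consistency signal).

    hs_pred: (N, B, H, W) hyperspectral bands; rgb: (N, 3, H, W);
    csf: (3, B) spectral response matrix.
    """
    # Mix the B spectral bands into 3 camera channels at every pixel.
    rgb_reproj = torch.einsum("cb,nbhw->nchw", csf, hs_pred)
    return F.l1_loss(rgb_reproj, rgb)
```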
- Self-Supervised Representation Learning for RGB-D Salient Object Detection [93.17479956795862]
We use Self-Supervised Representation Learning to design two pretext tasks: the cross-modal auto-encoder and the depth-contour estimation.
Our pretext tasks require only a small number of unlabeled RGB-D datasets for pre-training, which makes the network capture rich semantic contexts.
For the inherent problem of cross-modal fusion in RGB-D SOD, we propose a multi-path fusion module.
arXiv Detail & Related papers (2021-01-29T09:16:06Z)
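
A toy version of the cross-modal auto-encoder pretext task above: encode RGB, decode depth, so the encoder must capture geometry-aware semantics without any saliency labels. The architecture is illustrative only.

```python
import torch.nn as nn

class CrossModalAutoEncoder(nn.Module):
    """Pretext task sketch: reconstruct depth from RGB (no labels needed)."""

    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
        )
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(32, 1, 4, stride=2, padding=1),  # depth map
        )

    def forward(self, rgb):
        return self.decoder(self.encoder(rgb))  # train with L1/L2 vs. raw depth
```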
- Bi-directional Cross-Modality Feature Propagation with Separation-and-Aggregation Gate for RGB-D Semantic Segmentation [59.94819184452694]
Depth information has proven to be a useful cue in the semantic segmentation of RGB-D images, providing a geometric counterpart to the RGB representation.
Most existing works simply assume that depth measurements are accurate and well-aligned with the RGB pixels, and model the problem as cross-modal feature fusion.
In this paper, we propose a unified and efficient Cross-modality Guided Encoder that not only effectively recalibrates RGB feature responses, but also distills accurate depth information via multiple stages and aggregates the two recalibrated representations alternately.
arXiv Detail & Related papers (2020-07-17T18:35:24Z)
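
In the spirit of the recalibration described above, the sketch below uses pooled depth features to produce channel-wise gates for the RGB features; this is a simplified stand-in, not the paper's actual Separation-and-Aggregation Gate.

```python
import torch.nn as nn

class DepthGuidedGate(nn.Module):
    """Channel-attention sketch: depth features rescale RGB responses."""

    def __init__(self, channels):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // 4), nn.ReLU(),
            nn.Linear(channels // 4, channels), nn.Sigmoid(),
        )

    def forward(self, feat_rgb, feat_depth):
        # Global-average-pool depth features into a channel descriptor.
        w = self.fc(feat_depth.mean(dim=(2, 3)))         # (N, C) gates
        return feat_rgb * w.unsqueeze(-1).unsqueeze(-1)  # recalibrated RGB
```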
- Cross-Modality Paired-Images Generation for RGB-Infrared Person Re-Identification [29.92261627385826]
We propose to generate cross-modality paired-images and perform both global set-level and fine-grained instance-level alignments.
Our method can explicitly remove modality-specific features, so that modality variation is better reduced.
Our model achieves gains of 9.2% and 7.7% in Rank-1 accuracy and mAP, respectively.
arXiv Detail & Related papers (2020-02-10T22:15:19Z)
This list is automatically generated from the titles and abstracts of the papers on this site.