RGB-Multispectral Matching: Dataset, Learning Methodology, Evaluation
- URL: http://arxiv.org/abs/2206.07047v1
- Date: Tue, 14 Jun 2022 17:59:59 GMT
- Title: RGB-Multispectral Matching: Dataset, Learning Methodology, Evaluation
- Authors: Fabio Tosi, Pierluigi Zama Ramirez, Matteo Poggi, Samuele Salti,
Stefano Mattoccia, Luigi Di Stefano
- Abstract summary: We address the problem of registering synchronized color (RGB) and multi-spectral (MS) images featuring very different resolutions by establishing stereo matching correspondences.
We introduce a novel RGB-MS dataset framing 13 different scenes in indoor environments and providing a total of 34 image pairs annotated with semi-dense, high-resolution ground-truth labels.
To tackle the task, we propose a deep learning architecture trained in a self-supervised manner by exploiting a further RGB camera.
- Score: 49.28588927121722
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We address the problem of registering synchronized color (RGB) and
multi-spectral (MS) images featuring very different resolutions by establishing
stereo matching correspondences. To this end, we introduce a novel RGB-MS dataset
framing 13 different scenes in indoor environments and providing a total of 34
image pairs annotated with semi-dense, high-resolution ground-truth labels in
the form of disparity maps. To tackle the task, we propose a deep learning
architecture trained in a self-supervised manner by exploiting a further RGB
camera, required only during training data acquisition. In this setup, we can
conveniently learn cross-modal matching in the absence of ground-truth labels
by distilling knowledge from an easier RGB-RGB matching task based on a
collection of about 11K unlabeled image triplets. Experiments show that the
proposed pipeline sets a good performance bar (1.16 pixels average registration
error) for future research on this novel, challenging task.
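To make the training recipe concrete, below is a minimal PyTorch sketch of the distillation idea described in the abstract: a frozen RGB-RGB stereo "proxy" network produces pseudo disparity labels on unlabeled triplets, and the cross-modal RGB-MS student regresses toward them. The tiny networks, channel counts, and plain L1 loss are illustrative assumptions, not the authors' architecture.

```python
# Minimal sketch (not the authors' code) of self-supervised cross-modal
# distillation: a frozen RGB-RGB stereo "proxy" labels unlabeled triplets,
# and the RGB-MS "student" regresses toward those pseudo disparities.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyStereoNet(nn.Module):
    """Stand-in for a real stereo network: two images in, a disparity map out."""
    def __init__(self, left_ch, right_ch):
        super().__init__()
        self.conv = nn.Conv2d(left_ch + right_ch, 1, kernel_size=3, padding=1)

    def forward(self, left, right):
        return self.conv(torch.cat([left, right], dim=1))

proxy_net = TinyStereoNet(3, 3)      # easier RGB-RGB matching task (teacher)
student_net = TinyStereoNet(3, 10)   # harder RGB-MS task (10 bands assumed)

# One unlabeled training triplet: two RGB views plus an MS view.
rgb_left = torch.rand(1, 3, 64, 64)
rgb_right = torch.rand(1, 3, 64, 64)
ms_right = torch.rand(1, 10, 64, 64)  # MS image resampled to a common resolution

with torch.no_grad():
    pseudo_disp = proxy_net(rgb_left, rgb_right)  # pseudo ground-truth labels

pred_disp = student_net(rgb_left, ms_right)
loss = F.l1_loss(pred_disp, pseudo_disp)          # knowledge-distillation loss
loss.backward()
```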
Related papers
- Discovering an Image-Adaptive Coordinate System for Photography Processing [51.164345878060956]
We propose a novel algorithm, IAC, to learn an image-adaptive coordinate system in the RGB color space before performing curve operations.
This end-to-end trainable approach enables us to efficiently adjust images with a jointly learned image-adaptive coordinate system and curves.
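A rough sketch of the idea follows, assuming a per-image 3x3 basis as the "coordinate system" and simple per-axis gamma curves as the "curve operations"; the real IAC method likely uses richer curves and a different parameterization.

```python
# Hypothetical sketch of an image-adaptive coordinate system: predict a
# per-image 3x3 basis, apply per-axis curves (gamma here) in that learned
# space, then map back to RGB.
import torch
import torch.nn as nn
import torch.nn.functional as F

class IACSketch(nn.Module):
    def __init__(self):
        super().__init__()
        # Global image statistics -> 9 basis entries + 3 curve parameters.
        self.head = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(3, 12))

    def forward(self, img):                        # img: (B, 3, H, W) in [0, 1]
        b = img.shape[0]
        params = self.head(img)
        basis = params[:, :9].view(b, 3, 3) + torch.eye(3)  # stay near identity
        gammas = F.softplus(params[:, 9:]) + 0.5            # positive exponents

        flat = img.flatten(2)                      # (B, 3, H*W) pixel colors
        coords = torch.bmm(basis, flat)            # into the learned system
        coords = coords.clamp(min=1e-6) ** gammas.unsqueeze(-1)  # curve per axis
        out = torch.bmm(torch.inverse(basis), coords)            # back to RGB
        return out.view_as(img)

enhanced = IACSketch()(torch.rand(2, 3, 32, 32))
```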
arXiv Detail & Related papers (2025-01-11T06:20:07Z)
- Alignment-Free RGB-T Salient Object Detection: A Large-scale Dataset and Progressive Correlation Network [17.777510689748173]
We construct a large-scale and high-diversity unaligned RGB-T SOD dataset named UVT20K, comprising 20,000 image pairs, 407 scenes, and 1256 object categories.
To support further research, each sample in UVT20K is annotated with a comprehensive set of ground truths, including saliency masks, scribbles, boundaries, and challenge attributes.
In addition, we propose a Progressive Correlation Network (PCNet), which models inter- and intra-modal correlations on the basis of explicit alignment to achieve accurate predictions in unaligned image pairs.
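As a generic illustration of correlating two imperfectly aligned modalities (not PCNet's actual design), the sketch below uses cross-attention so RGB feature tokens can match thermal tokens regardless of pixel position, followed by self-attention for intra-modal refinement.

```python
# Generic illustration (not PCNet's actual design) of correlating two
# imperfectly aligned modalities: cross-attention lets RGB tokens match
# thermal tokens regardless of pixel position, then self-attention refines.
import torch
import torch.nn as nn

class CrossModalCorrelation(nn.Module):
    def __init__(self, dim=64, heads=4):
        super().__init__()
        self.inter = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.intra = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, rgb_feat, thermal_feat):      # (B, C, H, W) each
        q = rgb_feat.flatten(2).transpose(1, 2)     # (B, H*W, C) token sets
        k = thermal_feat.flatten(2).transpose(1, 2)
        fused, _ = self.inter(q, k, k)              # inter-modal correlation
        fused, _ = self.intra(fused, fused, fused)  # intra-modal correlation
        return fused.transpose(1, 2).reshape(rgb_feat.shape)

out = CrossModalCorrelation()(torch.rand(1, 64, 16, 16), torch.rand(1, 64, 16, 16))
```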
arXiv Detail & Related papers (2024-12-19T06:52:12Z)
- Semantic RGB-D Image Synthesis [22.137419841504908]
Current approaches to semantic image synthesis, however, are uni-modal and cannot cope with multi-modal data.
We introduce semantic RGB-D image synthesis to address this problem.
We propose a generator for multi-modal data that separates the modal-independent information of the semantic layout from the modal-dependent information.
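A minimal sketch of the separation principle follows, assuming a shared encoder for the modality-independent layout and two lightweight heads for the modality-dependent outputs; the actual generator is certainly deeper and adversarially trained.

```python
# Minimal sketch of the separation principle: a shared encoder carries the
# modality-independent layout; per-modality heads add modality-dependent
# detail. Layer sizes are placeholders.
import torch
import torch.nn as nn

class MultiModalGenerator(nn.Module):
    def __init__(self, num_classes=20):
        super().__init__()
        self.shared = nn.Sequential(                      # modality-independent
            nn.Conv2d(num_classes, 64, 3, padding=1), nn.ReLU())
        self.rgb_head = nn.Conv2d(64, 3, 3, padding=1)    # modality-dependent
        self.depth_head = nn.Conv2d(64, 1, 3, padding=1)  # modality-dependent

    def forward(self, semantic_onehot):                   # (B, classes, H, W)
        layout = self.shared(semantic_onehot)
        return self.rgb_head(layout), self.depth_head(layout)

rgb, depth = MultiModalGenerator()(torch.rand(1, 20, 32, 32))
```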
arXiv Detail & Related papers (2023-08-22T11:16:24Z)
- Self-Supervised Modality-Aware Multiple Granularity Pre-Training for RGB-Infrared Person Re-Identification [9.624510941236837]
Modality-Aware Multiple Granularity Learning (MMGL) is a self-supervised pre-training alternative to ImageNet pre-training.
MMGL learns better representations (+6.47% Rank-1) with faster training (converging in a few hours) and greater data efficiency (5% of the data size) than ImageNet pre-training.
Results suggest that MMGL generalizes well to various existing models and losses, and shows promising transferability across datasets.
arXiv Detail & Related papers (2021-12-12T04:40:33Z)
- RGB-D Saliency Detection via Cascaded Mutual Information Minimization [122.8879596830581]
Existing RGB-D saliency detection models do not explicitly encourage effective multi-modal learning between RGB and depth.
We introduce a novel multi-stage cascaded learning framework via mutual information minimization to "explicitly" model the multi-modal information between RGB image and depth data.
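The sketch below illustrates one concrete way to minimize mutual information between modality features, using a sampled CLUB-style upper bound (Cheng et al., 2020) as a stand-in for the paper's cascaded multi-stage scheme; the feature dimension and unit-variance Gaussian are assumptions.

```python
# Stand-in sketch for MI minimization between RGB and depth features,
# using a sampled CLUB-style upper bound (Cheng et al., 2020) with a
# unit-variance Gaussian q(depth_feat | rgb_feat). The paper's cascaded,
# multi-stage formulation is more involved than this.
import torch
import torch.nn as nn

class MIUpperBound(nn.Module):
    def __init__(self, dim=128):
        super().__init__()
        self.mu = nn.Linear(dim, dim)   # predicts the mean of q(y | x)

    def forward(self, rgb_feat, depth_feat):   # (B, dim) each
        mu = self.mu(rgb_feat)
        pos = -((depth_feat - mu) ** 2).sum(dim=1)  # log-density, true pairs
        neg = -((depth_feat.roll(1, dims=0) - mu) ** 2).sum(dim=1)  # shuffled
        return (pos - neg).mean()   # minimize this to reduce shared information

mi_estimate = MIUpperBound()(torch.rand(8, 128), torch.rand(8, 128))
mi_estimate.backward()
```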
arXiv Detail & Related papers (2021-09-15T12:31:27Z)
- Learning RAW-to-sRGB Mappings with Inaccurately Aligned Supervision [76.41657124981549]
This paper presents a joint learning model for image alignment and RAW-to-sRGB mapping.
Experiments show that our method performs favorably against state-of-the-art methods on the ZRR and SR-RAW datasets.
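One plausible reading of joint alignment and mapping is sketched below (an assumption, not the paper's exact formulation): a jointly estimated flow warps the inaccurately aligned sRGB target before the reconstruction loss is computed, so supervision tolerates misalignment.

```python
# Hypothetical sketch: warp the inaccurately aligned sRGB target with a
# jointly estimated flow before computing the reconstruction loss.
import torch
import torch.nn.functional as F

def aligned_reconstruction_loss(pred_srgb, target_srgb, flow):
    b, _, h, w = target_srgb.shape
    # Base sampling grid in [-1, 1], the convention used by grid_sample.
    ys, xs = torch.meshgrid(
        torch.linspace(-1, 1, h), torch.linspace(-1, 1, w), indexing="ij")
    grid = torch.stack([xs, ys], dim=-1).unsqueeze(0).expand(b, -1, -1, -1)
    # Offset the grid by the predicted flow (in normalized units) and warp.
    warped = F.grid_sample(target_srgb, grid + flow.permute(0, 2, 3, 1),
                           align_corners=True)
    return F.l1_loss(pred_srgb, warped)

loss = aligned_reconstruction_loss(
    torch.rand(1, 3, 32, 32), torch.rand(1, 3, 32, 32),
    torch.zeros(1, 2, 32, 32, requires_grad=True))  # flow from an alignment net
loss.backward()
```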
arXiv Detail & Related papers (2021-08-18T12:41:36Z)
- Semantic-embedded Unsupervised Spectral Reconstruction from Single RGB Images in the Wild [48.44194221801609]
We propose a new lightweight and end-to-end learning-based framework to tackle this challenge.
We progressively reduce the differences between the input RGB images and the RGB images re-projected from the recovered HS images, enabled by effective camera spectral response function estimation.
Our method significantly outperforms state-of-the-art unsupervised methods and even exceeds the latest supervised method under some settings.
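The re-projection constraint lends itself to a short sketch: project the recovered hyperspectral (HS) image back to RGB through an estimated camera spectral response function (CSRF) and penalize the difference from the input RGB. The 31-band count and softmax-normalized CSRF below are illustrative assumptions.

```python
# Short sketch of the re-projection constraint: HS -> RGB through an
# estimated CSRF, compared against the input RGB image.
import torch
import torch.nn.functional as F

def reprojection_loss(hs_pred, rgb_input, csrf):
    # hs_pred: (B, 31, H, W), rgb_input: (B, 3, H, W), csrf: (3, 31)
    rgb_reproj = torch.einsum("cs,bshw->bchw", csrf, hs_pred)
    return F.l1_loss(rgb_reproj, rgb_input)

csrf = torch.softmax(torch.rand(3, 31), dim=1)  # each response row sums to 1
loss = reprojection_loss(torch.rand(2, 31, 8, 8), torch.rand(2, 3, 8, 8), csrf)
```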
arXiv Detail & Related papers (2021-08-15T05:19:44Z)
- Self-Supervised Representation Learning for RGB-D Salient Object Detection [93.17479956795862]
We use self-supervised representation learning to design two pretext tasks: cross-modal auto-encoding and depth-contour estimation.
Our pretext tasks require only a small number of unlabeled RGB-D datasets for pre-training, which helps the network capture rich semantic contexts.
For the inherent problem of cross-modal fusion in RGB-D SOD, we propose a multi-path fusion module.
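A minimal sketch of the cross-modal auto-encoder pretext task follows, under the assumption that each modality is encoded and decoded into the other so the learned features must carry modality-shared semantics; layer sizes are placeholders.

```python
# Minimal sketch of the cross-modal auto-encoder pretext task: each modality
# is encoded and decoded into the *other* modality. Layer sizes are
# placeholders for the real backbone.
import torch
import torch.nn as nn
import torch.nn.functional as F

class CrossModalAutoEncoder(nn.Module):
    def __init__(self):
        super().__init__()
        self.rgb_enc = nn.Conv2d(3, 16, 3, padding=1)
        self.depth_enc = nn.Conv2d(1, 16, 3, padding=1)
        self.to_depth = nn.Conv2d(16, 1, 3, padding=1)
        self.to_rgb = nn.Conv2d(16, 3, 3, padding=1)

    def forward(self, rgb, depth):
        depth_from_rgb = self.to_depth(torch.relu(self.rgb_enc(rgb)))
        rgb_from_depth = self.to_rgb(torch.relu(self.depth_enc(depth)))
        return F.l1_loss(depth_from_rgb, depth) + F.l1_loss(rgb_from_depth, rgb)

loss = CrossModalAutoEncoder()(torch.rand(1, 3, 32, 32), torch.rand(1, 1, 32, 32))
loss.backward()
```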
arXiv Detail & Related papers (2021-01-29T09:16:06Z)
- Cross-Modality Paired-Images Generation for RGB-Infrared Person Re-Identification [29.92261627385826]
We propose to generate cross-modality paired-images and perform both global set-level and fine-grained instance-level alignments.
Our method can explicitly remove modality-specific features, so that modality variation is better reduced.
Our model achieves gains of 9.2% in Rank-1 and 7.7% in mAP.
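A small sketch of the two alignment levels on identity features from generated cross-modality pairs; the moment-matching form of the set-level term is an assumption, not the paper's exact loss.

```python
# Sketch of the two alignment levels on identity features from generated
# cross-modality pairs (same person, different modality). The moment-matching
# set-level term is an assumption, not the paper's loss.
import torch
import torch.nn.functional as F

def alignment_losses(rgb_feat, gen_ir_feat):   # (N, D), paired row-by-row
    instance = F.mse_loss(rgb_feat, gen_ir_feat)   # fine-grained, per pair
    set_level = (F.mse_loss(rgb_feat.mean(0), gen_ir_feat.mean(0)) +
                 F.mse_loss(rgb_feat.std(0), gen_ir_feat.std(0)))  # global sets
    return instance, set_level

inst, glob = alignment_losses(torch.rand(16, 256), torch.rand(16, 256))
```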
arXiv Detail & Related papers (2020-02-10T22:15:19Z)