Alignment-Free RGB-T Salient Object Detection: A Large-scale Dataset and Progressive Correlation Network
- URL: http://arxiv.org/abs/2412.14576v1
- Date: Thu, 19 Dec 2024 06:52:12 GMT
- Title: Alignment-Free RGB-T Salient Object Detection: A Large-scale Dataset and Progressive Correlation Network
- Authors: Kunpeng Wang, Keke Chen, Chenglong Li, Zhengzheng Tu, Bin Luo
- Abstract summary: We construct a large-scale and high-diversity unaligned RGB-T SOD dataset named UVT20K, comprising 20,000 image pairs, 407 scenes, and 1,256 object categories.
To support further research, each sample in UVT20K is annotated with a comprehensive set of ground truths, including saliency masks, scribbles, boundaries, and challenge attributes.
In addition, we propose a Progressive Correlation Network (PCNet), which models inter- and intra-modal correlations on the basis of explicit alignment to achieve accurate predictions in unaligned image pairs.
- Score: 17.777510689748173
- Abstract: Alignment-free RGB-Thermal (RGB-T) salient object detection (SOD) aims to achieve robust performance in complex scenes by directly leveraging the complementary information from unaligned visible-thermal image pairs, without requiring manual alignment. However, the labor-intensive process of collecting and annotating image pairs limits the scale of existing benchmarks, hindering the advancement of alignment-free RGB-T SOD. In this paper, we construct a large-scale and high-diversity unaligned RGB-T SOD dataset named UVT20K, comprising 20,000 image pairs, 407 scenes, and 1,256 object categories. All samples are collected from real-world scenarios with various challenges, such as low illumination, image clutter, and complex salient objects. To support further research, each sample in UVT20K is annotated with a comprehensive set of ground truths, including saliency masks, scribbles, boundaries, and challenge attributes. In addition, we propose a Progressive Correlation Network (PCNet), which models inter- and intra-modal correlations on the basis of explicit alignment to achieve accurate predictions in unaligned image pairs. Extensive experiments conducted on unaligned and aligned datasets demonstrate the effectiveness of our method. Code and dataset are available at https://github.com/Angknpng/PCNet.
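The abstract does not give implementation details, but the core idea of correlating unaligned modalities can be illustrated with a minimal cross-attention sketch. The module below is a hypothetical illustration, not the authors' PCNet: the class name, tensor shapes, and the use of multi-head attention are assumptions, chosen because global attention lets every RGB location attend to every thermal location and thus tolerates spatial misalignment between the two views.

```python
# Hypothetical sketch of inter-modal correlation via cross-attention.
# Names and shapes are illustrative assumptions, NOT the authors' PCNet code.
import torch
import torch.nn as nn

class CrossModalCorrelation(nn.Module):
    """Correlate unaligned RGB and thermal features with cross-attention,
    so each RGB location can attend to every thermal location rather than
    only to the (possibly misaligned) pixel at the same coordinates."""
    def __init__(self, dim: int, heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, rgb: torch.Tensor, thermal: torch.Tensor) -> torch.Tensor:
        # rgb, thermal: (B, C, H, W) feature maps from modality-specific encoders
        b, c, h, w = rgb.shape
        q = rgb.flatten(2).transpose(1, 2)       # (B, H*W, C) queries from RGB
        kv = thermal.flatten(2).transpose(1, 2)  # (B, H*W, C) keys/values from thermal
        fused, _ = self.attn(q, kv, kv)          # global attention spans the misalignment
        fused = self.norm(fused + q)             # residual keeps the RGB signal
        return fused.transpose(1, 2).reshape(b, c, h, w)

if __name__ == "__main__":
    m = CrossModalCorrelation(dim=64)
    out = m(torch.randn(2, 64, 20, 20), torch.randn(2, 64, 20, 20))
    print(out.shape)  # torch.Size([2, 64, 20, 20])
```

PCNet itself models these correlations progressively and on the basis of explicit alignment; the repository linked above contains the actual implementation.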
Related papers
- RGB-Sonar Tracking Benchmark and Spatial Cross-Attention Transformer Tracker [4.235252053339947]
This paper introduces a new challenging RGB-Sonar (RGB-S) tracking task.
It investigates how to achieve efficient tracking of an underwater target through the interaction of RGB and sonar modalities.
arXiv Detail & Related papers (2024-06-11T12:01:11Z)
- Alignment-Free RGBT Salient Object Detection: Semantics-guided Asymmetric Correlation Network and A Unified Benchmark [15.435695491233982]
RGB and Thermal (RGBT) Salient Object Detection (SOD) aims to achieve high-quality saliency prediction.
Existing methods are tailored for manually aligned image pairs, but such alignment is labor-intensive.
We make the first attempt to address RGBT SOD for initially captured RGB and thermal image pairs without manual alignment.
arXiv Detail & Related papers (2024-06-03T01:01:58Z)
- Segment Any Events via Weighted Adaptation of Pivotal Tokens [85.39087004253163]
This paper focuses on the nuanced challenge of tailoring the Segment Anything Models (SAMs) for integration with event data.
We introduce a multi-scale feature distillation methodology to optimize the alignment of token embeddings originating from event data with their RGB image counterparts.
arXiv Detail & Related papers (2023-12-24T12:47:08Z)
- Mirror Complementary Transformer Network for RGB-thermal Salient Object Detection [16.64781797503128]
RGB-thermal salient object detection (RGB-T SOD) aims to locate the common prominent objects of an aligned visible and thermal infrared image pair.
In this paper, we propose a novel mirror complementary Transformer network (MCNet) for RGB-T SOD.
Experiments on benchmark datasets and the VT723 dataset show that the proposed method outperforms state-of-the-art approaches.
arXiv Detail & Related papers (2022-07-07T20:26:09Z)
- RGB-Multispectral Matching: Dataset, Learning Methodology, Evaluation [49.28588927121722]
We address the problem of registering synchronized color (RGB) and multi-spectral (MS) images featuring very different resolutions by solving stereo matching correspondences.
We introduce a novel RGB-MS dataset framing 13 different scenes in indoor environments and providing a total of 34 image pairs annotated with semi-dense, high-resolution ground-truth labels.
To tackle the task, we propose a deep learning architecture trained in a self-supervised manner by exploiting a further RGB camera.
arXiv Detail & Related papers (2022-06-14T17:59:59Z)
- Visible-Thermal UAV Tracking: A Large-Scale Benchmark and New Baseline [80.13652104204691]
In this paper, we construct a large-scale benchmark with high diversity for visible-thermal UAV tracking (VTUAV).
We provide a coarse-to-fine attribute annotation, where frame-level attributes are provided to exploit the potential of challenge-specific trackers.
In addition, we design a new RGB-T baseline, named Hierarchical Multi-modal Fusion Tracker (HMFT), which fuses RGB-T data in various levels.
arXiv Detail & Related papers (2022-04-08T15:22:33Z)
- Self-Supervised Representation Learning for RGB-D Salient Object Detection [93.17479956795862]
We use Self-Supervised Representation Learning to design two pretext tasks: the cross-modal auto-encoder and the depth-contour estimation.
Our pretext tasks require only a few unlabeled RGB-D datasets for pre-training, which enables the network to capture rich semantic contexts.
For the inherent problem of cross-modal fusion in RGB-D SOD, we propose a multi-path fusion module.
arXiv Detail & Related papers (2021-01-29T09:16:06Z)
- Bi-directional Cross-Modality Feature Propagation with Separation-and-Aggregation Gate for RGB-D Semantic Segmentation [59.94819184452694]
Depth information has proven to be a useful cue in the semantic segmentation of RGBD images for providing a geometric counterpart to the RGB representation.
Most existing works simply assume that depth measurements are accurate and well-aligned with the RGB pixels and model the problem as cross-modal feature fusion.
In this paper, we propose a unified and efficient Crossmodality Guided Encoder that not only effectively recalibrates RGB feature responses, but also distills accurate depth information via multiple stages and aggregates the two recalibrated representations alternately.
arXiv Detail & Related papers (2020-07-17T18:35:24Z)
- RGBT Salient Object Detection: A Large-scale Dataset and Benchmark [12.14043884641457]
Taking advantage of RGB and thermal infrared images has become a new research direction for detecting salient objects in complex scenes.
This work contributes such an RGBT image dataset named VT5000, including 5000 spatially aligned RGBT image pairs with ground truth annotations.
We propose a powerful baseline approach, which extracts multi-level features within each modality and aggregates the features of all modalities with an attention mechanism (a hypothetical fusion sketch follows this list).
arXiv Detail & Related papers (2020-07-07T07:58:14Z)
- Synergistic saliency and depth prediction for RGB-D saliency detection [76.27406945671379]
Existing RGB-D saliency datasets are small, which may lead to overfitting and limited generalization for diverse scenarios.
We propose a semi-supervised system for RGB-D saliency detection that can be trained on smaller RGB-D saliency datasets without saliency ground truth.
arXiv Detail & Related papers (2020-07-03T14:24:41Z)
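As referenced in the VT5000 entry above, attention-weighted aggregation of per-modality features is a recurring design across these papers. The sketch below is a minimal, hypothetical illustration of channel-attention fusion; the class name, gating scheme, and shapes are assumptions, not the code of any paper listed here.

```python
# Hypothetical sketch of attention-weighted fusion of RGB and thermal
# features; names and the gating design are illustrative assumptions.
import torch
import torch.nn as nn

class AttentiveModalityFusion(nn.Module):
    """Fuse same-scale RGB and thermal features with learned per-channel
    attention weights, so the more reliable modality dominates per channel."""
    def __init__(self, channels: int):
        super().__init__()
        self.gap = nn.AdaptiveAvgPool2d(1)  # global descriptor per channel
        self.fc = nn.Sequential(
            nn.Linear(2 * channels, channels),
            nn.ReLU(inplace=True),
            nn.Linear(channels, 2 * channels),
        )

    def forward(self, rgb: torch.Tensor, thermal: torch.Tensor) -> torch.Tensor:
        b, c, _, _ = rgb.shape
        # Descriptors of both modalities jointly drive the attention weights.
        desc = torch.cat([self.gap(rgb), self.gap(thermal)], dim=1).flatten(1)
        w = self.fc(desc).view(b, 2, c, 1, 1).softmax(dim=1)  # weights sum to 1
        return w[:, 0] * rgb + w[:, 1] * thermal              # weighted sum

if __name__ == "__main__":
    f = AttentiveModalityFusion(64)
    out = f(torch.randn(2, 64, 40, 40), torch.randn(2, 64, 40, 40))
    print(out.shape)  # torch.Size([2, 64, 40, 40])
```

In practice, such a module would be applied at each feature level of the two encoders before decoding, matching the multi-level aggregation described in the VT5000 baseline.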