Cross-Modality Proposal-guided Feature Mining for Unregistered
RGB-Thermal Pedestrian Detection
- URL: http://arxiv.org/abs/2308.12111v1
- Date: Wed, 23 Aug 2023 12:58:51 GMT
- Title: Cross-Modality Proposal-guided Feature Mining for Unregistered
RGB-Thermal Pedestrian Detection
- Authors: Chao Tian, Zikun Zhou, Yuqing Huang, Gaojun Li, and Zhenyu He
- Abstract summary: We propose a new paradigm for unregistered RGB-T pedestrian detection, which predicts two separate pedestrian locations in the RGB and thermal images, respectively.
Specifically, we propose a cross-modality proposal-guided feature mining (CPFM) mechanism to extract the two precise fusion features for representing the pedestrian in the two modalities, even if the RGB-T image pair is unaligned.
With the CPFM mechanism, we build a two-stream dense detector; it predicts the two pedestrian locations in the two modalities based on the corresponding fusion feature mined by the CPFM mechanism.
- Score: 8.403885039441263
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: RGB-Thermal (RGB-T) pedestrian detection aims to locate the pedestrians in
RGB-T image pairs to exploit the complementation between the two modalities for
improving detection robustness in extreme conditions. Most existing algorithms
assume that the RGB-T image pairs are well registered, while in the real world
they are not ideally aligned due to parallax or the differing fields of view of
the cameras. The pedestrians in a misaligned image pair may appear at different
positions in the two images, which results in two challenges: 1) how to achieve
inter-modality complementation using spatially misaligned RGB-T pedestrian
patches, and 2) how to recognize the unpaired pedestrians at the boundary. To
deal with these issues, we propose a new paradigm for unregistered RGB-T
pedestrian detection, which predicts two separate pedestrian locations in the
RGB and thermal images, respectively. Specifically, we propose a cross-modality
proposal-guided feature mining (CPFM) mechanism to extract the two precise
fusion features for representing the pedestrian in the two modalities, even if
the RGB-T image pair is unaligned. It enables us to effectively exploit the
complementation between the two modalities. With the CPFM mechanism, we build a
two-stream dense detector; it predicts the two pedestrian locations in the two
modalities based on the corresponding fusion feature mined by the CPFM
mechanism. Besides, we design a data augmentation method, named Homography, to
simulate the discrepancy in scales and views between images. We also
investigate two non-maximum suppression (NMS) methods for post-processing.
Favorable experimental results demonstrate the effectiveness and robustness of
our method in dealing with unregistered pedestrians with different shifts.
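As a rough illustration of the abstract's "Homography" augmentation, the sketch below warps only the thermal image of an RGB-T pair with a random homography and maps its boxes accordingly, so the pair becomes deliberately unregistered and each modality keeps its own ground-truth boxes, matching the two-location prediction setup. The sampling ranges and helper names are assumptions for illustration, not the authors' implementation.

```python
# Minimal sketch of a homography-style augmentation for RGB-T pairs.
# The sampling ranges and function names below are illustrative assumptions,
# not the authors' exact implementation.
import numpy as np
import cv2


def random_homography(h, w, max_shift=0.05, max_scale=0.1, rng=None):
    """Sample a mild homography by rescaling and jittering the image corners."""
    rng = rng if rng is not None else np.random.default_rng()
    src = np.float32([[0, 0], [w, 0], [w, h], [0, h]])
    jitter = rng.uniform(-max_shift, max_shift, size=(4, 2)) * [w, h]
    scale = 1.0 + rng.uniform(-max_scale, max_scale)
    dst = (src - [w / 2, h / 2]) * scale + [w / 2, h / 2] + jitter
    return cv2.getPerspectiveTransform(src, dst.astype(np.float32))


def warp_modality(image, boxes, H):
    """Warp one modality and its (x1, y1, x2, y2) boxes with homography H.
    The other modality stays untouched, so the resulting pair is unregistered
    and each modality keeps its own set of ground-truth boxes."""
    h, w = image.shape[:2]
    warped = cv2.warpPerspective(image, H, (w, h))
    corners = np.float32([[[x1, y1], [x2, y1], [x2, y2], [x1, y2]]
                          for x1, y1, x2, y2 in boxes])          # (N, 4, 2)
    warped_corners = cv2.perspectiveTransform(corners, H)
    new_boxes = np.concatenate([warped_corners.min(axis=1),
                                warped_corners.max(axis=1)], axis=1)
    return warped, new_boxes


# Example: keep the RGB image fixed and warp only the thermal image, yielding
# two separate box sets, one per modality, as in the two-stream setting.
# thermal_aug, thermal_boxes = warp_modality(thermal, boxes,
#                                            random_homography(*thermal.shape[:2]))
```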
Related papers
- HODINet: High-Order Discrepant Interaction Network for RGB-D Salient
Object Detection [4.007827908611563]
RGB-D salient object detection (SOD) aims to detect the prominent regions by jointly modeling RGB and depth information.
Most RGB-D SOD methods apply the same type of backbones and fusion modules to identically learn the multimodality and multistage features.
In this paper, we propose a high-order discrepant interaction network (HODINet) for RGB-D SOD.
arXiv Detail & Related papers (2023-07-03T11:56:21Z)
- Breaking Modality Disparity: Harmonized Representation for Infrared and
Visible Image Registration [66.33746403815283]
We propose a scene-adaptive infrared and visible image registration method.
We employ homography to simulate the deformation between different planes.
We present the first misaligned infrared and visible image dataset with available ground truth.
arXiv Detail & Related papers (2023-04-12T06:49:56Z)
- Learning Dual-Fused Modality-Aware Representations for RGBD Tracking [67.14537242378988]
Compared with traditional RGB object tracking, adding the depth modality can effectively resolve interference between the target and the background.
Some existing RGBD trackers use the two modalities separately and thus some particularly useful shared information between them is ignored.
We propose a novel Dual-fused Modality-aware Tracker (termed DMTracker) which aims to learn informative and discriminative representations of the target objects for robust RGBD tracking.
arXiv Detail & Related papers (2022-11-06T07:59:07Z)
- Translation, Scale and Rotation: Cross-Modal Alignment Meets
RGB-Infrared Vehicle Detection [10.460296317901662]
We find that detection in aerial RGB-IR images suffers from weak cross-modal misalignment.
We propose a Translation-Scale-Rotation Alignment (TSRA) module to address this problem; a geometric sketch of such an alignment appears after this list.
A two-stream feature alignment detector (TSFADet) based on the TSRA module is constructed for RGB-IR object detection in aerial images.
arXiv Detail & Related papers (2022-09-28T03:06:18Z)
- Dual Swin-Transformer based Mutual Interactive Network for RGB-D Salient
Object Detection [67.33924278729903]
In this work, we propose a Dual Swin-Transformer based Mutual Interactive Network.
We adopt Swin-Transformer as the feature extractor for both RGB and depth modality to model the long-range dependencies in visual inputs.
Comprehensive experiments on five standard RGB-D SOD benchmark datasets demonstrate the superiority of the proposed DTMINet method.
arXiv Detail & Related papers (2022-06-07T08:35:41Z)
- Fast Road Segmentation via Uncertainty-aware Symmetric Network [15.05244258071472]
Prior methods cannot achieve both high inference speed and high accuracy.
The different properties of RGB and depth data are not well exploited, limiting the reliability of the predicted road.
We propose an uncertainty-aware symmetric network (USNet) to achieve a trade-off between speed and accuracy by fully fusing RGB and depth data.
arXiv Detail & Related papers (2022-03-09T06:11:29Z)
- Cross-modality Discrepant Interaction Network for RGB-D Salient Object
Detection [78.47767202232298]
We propose a novel Cross-modality Discrepant Interaction Network (CDINet) for RGB-D SOD.
Two components are designed to implement the effective cross-modality interaction.
Our network outperforms 15 state-of-the-art methods both quantitatively and qualitatively.
arXiv Detail & Related papers (2021-08-04T11:24:42Z)
- Bi-directional Cross-Modality Feature Propagation with
Separation-and-Aggregation Gate for RGB-D Semantic Segmentation [59.94819184452694]
Depth information has proven to be a useful cue in the semantic segmentation of RGBD images by providing a geometric counterpart to the RGB representation.
Most existing works simply assume that depth measurements are accurate and well-aligned with the RGB pixels, and model the problem as cross-modal feature fusion.
In this paper, we propose a unified and efficient Cross-modality Guided Encoder that not only effectively recalibrates RGB feature responses, but also distills accurate depth information via multiple stages and aggregates the two recalibrated representations alternately.
arXiv Detail & Related papers (2020-07-17T18:35:24Z)
- Jointly Modeling Motion and Appearance Cues for Robust RGB-T Tracking [85.333260415532]
We develop a novel late fusion method to infer the fusion weight maps of both RGB and thermal (T) modalities.
When the appearance cue is unreliable, we take motion cues into account to make the tracker robust.
Numerous results on three recent RGB-T tracking datasets show that the proposed tracker performs significantly better than other state-of-the-art algorithms.
arXiv Detail & Related papers (2020-07-04T08:11:33Z)
- Cross-Modality Paired-Images Generation for RGB-Infrared Person
Re-Identification [29.92261627385826]
We propose to generate cross-modality paired images and perform both global set-level and fine-grained instance-level alignments.
Our method can explicitly remove modality-specific features, so the modality variation can be better reduced.
Our model achieves gains of 9.2% and 7.7% in Rank-1 and mAP, respectively.
arXiv Detail & Related papers (2020-02-10T22:15:19Z)
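The TSRA entry above describes handling cross-modal misalignment as a translation, scale, and rotation between paired boxes. Below is a purely geometric sketch of applying such an offset to a reference box; the (dx, dy, ds, dtheta) parameterization and function name are assumptions for illustration, not the TSFADet implementation.

```python
# Geometric sketch of translation-scale-rotation box alignment.
# The (dx, dy, ds, dtheta) offset parameterization is an assumption.
import numpy as np


def align_box(box, dx, dy, ds, dtheta):
    """Map an axis-aligned (x1, y1, x2, y2) box from one modality to the other
    by shifting its center, rescaling it, and rotating its corners."""
    x1, y1, x2, y2 = box
    cx, cy = (x1 + x2) / 2 + dx, (y1 + y2) / 2 + dy   # translated center
    w, h = (x2 - x1) * ds, (y2 - y1) * ds             # rescaled size
    offsets = np.array([[-w, -h], [w, -h], [w, h], [-w, h]]) / 2
    rot = np.array([[np.cos(dtheta), -np.sin(dtheta)],
                    [np.sin(dtheta),  np.cos(dtheta)]])
    return offsets @ rot.T + [cx, cy]                 # (4, 2) rotated corners
```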