Cross-Modality Paired-Images Generation for RGB-Infrared Person
Re-Identification
- URL: http://arxiv.org/abs/2002.04114v2
- Date: Tue, 18 Feb 2020 00:03:01 GMT
- Title: Cross-Modality Paired-Images Generation for RGB-Infrared Person
Re-Identification
- Authors: Guan-An Wang, Tianzhu Zhang, Yang Yang, Jian Cheng, Jianlong Chang, Xu
Liang, Zengguang Hou
- Abstract summary: We propose to generate cross-modality paired-images and perform both global set-level and fine-grained instance-level alignments.
Our method can explicitly remove modality-specific features and the modality variation can be better reduced.
Our model can achieve a gain of 9.2% and 7.7% in terms of Rank-1 and mAP.
- Score: 29.92261627385826
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: RGB-Infrared (IR) person re-identification is very challenging due to the
large cross-modality variations between RGB and IR images. The key solution is
to learn aligned features to bridge the RGB and IR modalities. However, due to
the lack of correspondence labels between every pair of RGB and IR images, most
methods try to alleviate the variations with set-level alignment by reducing
the distance between the entire RGB and IR sets. However, this set-level
alignment may lead to misalignment of some instances, which limits the
performance for RGB-IR Re-ID. Different from existing methods, in this paper,
we propose to generate cross-modality paired-images and perform both global
set-level and fine-grained instance-level alignments. Our proposed method
enjoys several merits. First, our method can perform set-level alignment by
disentangling modality-specific and modality-invariant features. Compared with
conventional methods, ours can explicitly remove the modality-specific features,
so the modality variation can be better reduced. Second, given cross-modality
unpaired-images of a person, our method can generate cross-modality paired
images from exchanged images. With them, we can directly perform instance-level
alignment by minimizing distances of every pair of images. Extensive
experimental results on two standard benchmarks demonstrate that the proposed
model performs favourably against state-of-the-art methods. In particular, on the SYSU-MM01
dataset, our model can achieve a gain of 9.2% and 7.7% in terms of Rank-1 and
mAP. Code is available at https://github.com/wangguanan/JSIA-ReID.
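The abstract describes two alignment steps: set-level alignment by disentangling modality-invariant from modality-specific features, and instance-level alignment over generated cross-modality paired images obtained by exchanging the modality-specific codes. The PyTorch sketch below only illustrates that idea; it is not the authors' implementation (see the linked repository), and the encoder, generator, dimensions, and loss choices are hypothetical placeholders.

```python
# Minimal sketch (not the authors' code) of disentanglement-based paired-image
# generation and instance-level alignment; all architectures are placeholders.
import torch
import torch.nn as nn
import torch.nn.functional as F

class DisentangledEncoder(nn.Module):
    """Splits an image into a modality-invariant (identity) code and a
    modality-specific (style) code -- the basis of set-level alignment."""
    def __init__(self, dim=64):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(3, dim, 4, 2, 1), nn.ReLU(),
            nn.Conv2d(dim, dim, 4, 2, 1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten())
        self.to_id = nn.Linear(dim, dim)     # modality-invariant branch
        self.to_style = nn.Linear(dim, dim)  # modality-specific branch

    def forward(self, x):
        h = self.backbone(x)
        return self.to_id(h), self.to_style(h)

class Generator(nn.Module):
    """Decodes an (identity, style) pair into an image, so exchanging the style
    codes of unpaired RGB/IR images yields cross-modality paired images."""
    def __init__(self, dim=64, size=128):
        super().__init__()
        self.size = size
        self.decode = nn.Sequential(nn.Linear(2 * dim, 3 * size * size), nn.Tanh())

    def forward(self, id_code, style_code):
        out = self.decode(torch.cat([id_code, style_code], dim=1))
        return out.view(-1, 3, self.size, self.size)

def instance_alignment_loss(encoder, real, generated):
    """Instance-level alignment: minimize the feature distance between every
    real image and its generated cross-modality counterpart."""
    id_real, _ = encoder(real)
    id_fake, _ = encoder(generated)
    return F.mse_loss(id_real, id_fake)

# Usage sketch: exchange modality-specific codes to obtain paired images,
# then align each real/generated pair at the instance level.
enc, gen = DisentangledEncoder(), Generator()
rgb = torch.randn(4, 3, 128, 128)  # unpaired RGB images of a person
ir = torch.randn(4, 3, 128, 128)   # unpaired IR images of the same person
id_rgb, style_rgb = enc(rgb)
id_ir, style_ir = enc(ir)
fake_ir = gen(id_rgb, style_ir)    # RGB identity rendered with IR-specific code
fake_rgb = gen(id_ir, style_rgb)   # IR identity rendered with RGB-specific code
loss = instance_alignment_loss(enc, rgb, fake_rgb) + instance_alignment_loss(enc, ir, fake_ir)
loss.backward()
```

In the actual method this loss would be trained jointly with the generation and re-identification objectives; the sketch keeps only the alignment term to show where the generated paired images enter.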
Related papers
- Towards Human-Level 3D Relative Pose Estimation: Generalizable, Training-Free, with Single Reference [62.99706119370521]
Humans can easily deduce the relative pose of an unseen object, without label/training, given only a single query-reference image pair.
We propose a novel 3D generalizable relative pose estimation method by elaborating (i) with a 2.5D shape from an RGB-D reference, (ii) with an off-the-shelf differentiable renderer, and (iii) with semantic cues from a pretrained model like DINOv2.
arXiv Detail & Related papers (2024-06-26T16:01:10Z) - Cross-Modality Proposal-guided Feature Mining for Unregistered
RGB-Thermal Pedestrian Detection [8.403885039441263]
We propose a new paradigm for unregistered RGB-T pedestrian detection, which predicts two separate pedestrian locations in the RGB and thermal images, respectively.
Specifically, we propose a cross-modality proposal-guided feature mining (CPFM) mechanism to extract the two precise fusion features for representing the pedestrian in the two modalities, even if the RGB-T image pair is unaligned.
With the CPFM mechanism, we build a two-stream dense detector; it predicts the two pedestrian locations in the two modalities based on the corresponding fusion feature mined by the CPFM mechanism.
arXiv Detail & Related papers (2023-08-23T12:58:51Z) - Semantic RGB-D Image Synthesis [22.137419841504908]
We introduce the task of semantic RGB-D image synthesis.
Current synthesis approaches, however, are uni-modal and cannot cope with multi-modal data.
We propose a generator for multi-modal data that separates modal-independent information of the semantic layout from the modal-dependent information.
arXiv Detail & Related papers (2023-08-22T11:16:24Z) - RGB-Multispectral Matching: Dataset, Learning Methodology, Evaluation [49.28588927121722]
We address the problem of registering synchronized color (RGB) and multi-spectral (MS) images featuring very different resolution by solving stereo matching correspondences.
We introduce a novel RGB-MS dataset framing 13 different scenes in indoor environments and providing a total of 34 image pairs annotated with semi-dense, high-resolution ground-truth labels.
To tackle the task, we propose a deep learning architecture trained in a self-supervised manner by exploiting a further RGB camera.
arXiv Detail & Related papers (2022-06-14T17:59:59Z) - Modality-Adaptive Mixup and Invariant Decomposition for RGB-Infrared
Person Re-Identification [84.32086702849338]
We propose a novel modality-adaptive mixup and invariant decomposition (MID) approach for RGB-infrared person re-identification.
MID designs a modality-adaptive mixup scheme to generate suitable mixed modality images between RGB and infrared images.
Experiments on two challenging benchmarks demonstrate superior performance of MID over state-of-the-art methods.
arXiv Detail & Related papers (2022-03-03T14:26:49Z) - Self-Supervised Modality-Aware Multiple Granularity Pre-Training for
RGB-Infrared Person Re-Identification [9.624510941236837]
Modality-Aware Multiple Granularity Learning (MMGL) is a self-supervised pre-training alternative to ImageNet pre-training.
MMGL learns better representations (+6.47% Rank-1) with faster training (converging in a few hours) and stronger data efficiency (only 5% of the data size) than ImageNet pre-training.
Results suggest that it generalizes well to various existing models and losses, and that it has promising transferability across datasets.
arXiv Detail & Related papers (2021-12-12T04:40:33Z) - RGB-D Saliency Detection via Cascaded Mutual Information Minimization [122.8879596830581]
Existing RGB-D saliency detection models do not explicitly encourage RGB and depth to achieve effective multi-modal learning.
We introduce a novel multi-stage cascaded learning framework via mutual information minimization to "explicitly" model the multi-modal information between RGB image and depth data.
arXiv Detail & Related papers (2021-09-15T12:31:27Z) - Self-Supervised Representation Learning for RGB-D Salient Object
Detection [93.17479956795862]
We use Self-Supervised Representation Learning to design two pretext tasks: the cross-modal auto-encoder and the depth-contour estimation.
Our pretext tasks require only a few unlabeled RGB-D datasets for pre-training, which makes the network capture rich semantic contexts.
For the inherent problem of cross-modal fusion in RGB-D SOD, we propose a multi-path fusion module.
arXiv Detail & Related papers (2021-01-29T09:16:06Z) - Multi-Scale Cascading Network with Compact Feature Learning for
RGB-Infrared Person Re-Identification [35.55895776505113]
Multi-Scale Part-Aware Cascading framework (MSPAC) is formulated by aggregating multi-scale fine-grained features from part to global.
Cross-modality correlations can thus be efficiently explored on salient features for distinctive modality-invariant feature learning.
arXiv Detail & Related papers (2020-12-12T15:39:11Z) - Bi-directional Cross-Modality Feature Propagation with
Separation-and-Aggregation Gate for RGB-D Semantic Segmentation [59.94819184452694]
Depth information has proven to be a useful cue in the semantic segmentation of RGBD images for providing a geometric counterpart to the RGB representation.
Most existing works simply assume that depth measurements are accurate and well-aligned with the RGB pixels and model the problem as cross-modal feature fusion.
In this paper, we propose a unified and efficient cross-modality guided encoder to not only effectively recalibrate RGB feature responses, but also to distill accurate depth information via multiple stages and aggregate the two recalibrated representations alternately.
arXiv Detail & Related papers (2020-07-17T18:35:24Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the content (including all information) and is not responsible for any consequences of its use.