Learning Modal-Invariant and Temporal-Memory for Video-based
Visible-Infrared Person Re-Identification
- URL: http://arxiv.org/abs/2208.02450v1
- Date: Thu, 4 Aug 2022 04:43:52 GMT
- Title: Learning Modal-Invariant and Temporal-Memory for Video-based
Visible-Infrared Person Re-Identification
- Authors: Xinyu Lin, Jinxing Li, Zeyu Ma, Huafeng Li, Shuang Li, Kaixiong Xu,
Guangming Lu, David Zhang
- Abstract summary: We primarily study the video-based cross-modal person Re-ID method.
We show that performance improves as the number of frames in a tracklet increases.
A novel method is proposed, which projects the two modalities into a modal-invariant subspace.
- Score: 46.49866514866999
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Thanks to cross-modal retrieval techniques, visible-infrared (RGB-IR)
person re-identification (Re-ID) is achieved by projecting the two modalities
into a common space, enabling person Re-ID in 24-hour surveillance systems.
However, with respect to probe-to-gallery matching, almost all existing RGB-IR
cross-modal person Re-ID methods focus on image-to-image matching, while
video-to-video matching, which contains much richer spatial and temporal
information, remains under-explored. In this paper, we primarily study
video-based cross-modal person Re-ID. To support this task, a video-based
RGB-IR dataset is constructed, in which 927 valid identities with 463,259
frames and 21,863 tracklets captured by 12 RGB/IR cameras are collected. Based
on this dataset, we show that performance improves as the number of frames in a
tracklet increases, demonstrating the significance of video-to-video matching
in RGB-IR person Re-ID. Additionally, a novel method is proposed that not only
projects the two modalities into a modal-invariant subspace, but also extracts
temporal memory for motion-invariant representations. Thanks to these two
strategies, much better results are achieved on our video-based cross-modal
person Re-ID dataset. The code and dataset are released at:
https://github.com/VCMproject233/MITML.
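The abstract only names the two strategies; the minimal sketch below illustrates one way a shared (modal-invariant) projection could be combined with a recurrent temporal memory over a tracklet. The ResNet-50 backbone, the GRU, and all module names are illustrative assumptions, not the released MITML implementation.

```python
# Hedged sketch: a modality-shared projection plus a recurrent "temporal memory"
# that pools the frames of a tracklet into one motion-aware embedding.
# Backbone choice, GRU memory, and all names are illustrative assumptions.
import torch
import torch.nn as nn
import torchvision.models as models


class VideoCrossModalReID(nn.Module):
    def __init__(self, embed_dim=512, num_ids=927):
        super().__init__()
        backbone = models.resnet50(weights=None)
        backbone.fc = nn.Identity()              # frame features stay 2048-d
        self.backbone = backbone
        # Shared projection: RGB and IR frames pass through the same layer,
        # so both modalities land in one common (modal-invariant) subspace.
        self.modal_invariant = nn.Linear(2048, embed_dim)
        # Temporal memory: a GRU summarizes the frame sequence of a tracklet.
        self.temporal_memory = nn.GRU(embed_dim, embed_dim, batch_first=True)
        self.classifier = nn.Linear(embed_dim, num_ids)

    def forward(self, tracklet):                 # tracklet: (B, T, 3, H, W)
        b, t = tracklet.shape[:2]
        feats = self.backbone(tracklet.flatten(0, 1))       # (B*T, 2048)
        feats = self.modal_invariant(feats).view(b, t, -1)  # (B, T, D)
        _, memory = self.temporal_memory(feats)              # hidden: (1, B, D)
        embedding = memory.squeeze(0)                        # tracklet embedding
        return embedding, self.classifier(embedding)


# Both RGB and IR tracklets use the same branch, so their embeddings are
# directly comparable for video-to-video matching.
emb, logits = VideoCrossModalReID()(torch.randn(2, 6, 3, 256, 128))
```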
Related papers
- ViDSOD-100: A New Dataset and a Baseline Model for RGB-D Video Salient Object Detection [51.16181295385818]
We first collect an annotated RGB-D video salient object detection dataset (ViDSOD-100), which contains 100 videos with a total of 9,362 frames.
All frames in each video are manually annotated with high-quality saliency annotations.
We propose a new baseline model, named attentive triple-fusion network (ATF-Net) for RGB-D salient object detection.
arXiv Detail & Related papers (2024-06-18T12:09:43Z)
- Video-based Visible-Infrared Person Re-Identification with Auxiliary Samples [21.781628451676205]
Visible-infrared person re-identification (VI-ReID) aims to match persons captured by visible and infrared cameras.
Previous methods focus on learning from cross-modality person images in different cameras.
We first contribute a large-scale VI-ReID dataset named BUPTCampus.
arXiv Detail & Related papers (2023-11-27T06:45:22Z)
- Cross-Modal Object Tracking: Modality-Aware Representations and A Unified Benchmark [8.932487291107812]
In many visual systems, visual tracking is often based on RGB image sequences, in which some targets are effectively invisible in low-light conditions.
We propose a new algorithm, which learns the modality-aware target representation to mitigate the appearance gap between RGB and NIR modalities in the tracking process.
The dataset will be released for free academic usage; the download link and code will be released soon.
arXiv Detail & Related papers (2021-11-08T03:58:55Z)
- RGB-D Saliency Detection via Cascaded Mutual Information Minimization [122.8879596830581]
Existing RGB-D saliency detection models do not explicitly encourage RGB and depth to achieve effective multi-modal learning.
We introduce a novel multi-stage cascaded learning framework via mutual information minimization to "explicitly" model the multi-modal information between the RGB image and depth data (see the sketch below).
arXiv Detail & Related papers (2021-09-15T12:31:27Z)
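The summary names mutual information minimization between RGB and depth features but not the estimator; the sketch below uses a simple cross-correlation penalty between standardized features as a stand-in proxy. The `mi_proxy` name and the correlation form are assumptions, not the paper's cascaded framework.

```python
# Hedged sketch: penalizing redundancy between RGB and depth features as a
# crude stand-in for mutual-information minimization. The correlation proxy
# below is an assumption, not the authors' estimator.
import torch


def mi_proxy(rgb_feat: torch.Tensor, depth_feat: torch.Tensor) -> torch.Tensor:
    """Mean squared cross-correlation between standardized RGB/depth features."""
    rgb = (rgb_feat - rgb_feat.mean(0)) / (rgb_feat.std(0) + 1e-6)
    dep = (depth_feat - depth_feat.mean(0)) / (depth_feat.std(0) + 1e-6)
    cross_corr = rgb.T @ dep / rgb.shape[0]   # (D, D) cross-correlation matrix
    return cross_corr.pow(2).mean()           # small when modalities decorrelate


# Typical use: add the penalty to the saliency loss so each modality is pushed
# to carry complementary rather than redundant information.
rgb_feat, depth_feat = torch.randn(8, 256), torch.randn(8, 256)
loss = mi_proxy(rgb_feat, depth_feat)
```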
- End-to-end Multi-modal Video Temporal Grounding [105.36814858748285]
We propose a multi-modal framework to extract complementary information from videos.
We adopt RGB images for appearance, optical flow for motion, and depth maps for image structure.
We conduct experiments on the Charades-STA and ActivityNet Captions datasets, and show that the proposed method performs favorably against state-of-the-art approaches.
arXiv Detail & Related papers (2021-07-12T17:58:10Z)
- Self-Supervised Representation Learning for RGB-D Salient Object Detection [93.17479956795862]
We use Self-Supervised Representation Learning to design two pretext tasks: the cross-modal auto-encoder and the depth-contour estimation.
Our pretext tasks require only a few unlabeled RGB-D datasets for pre-training, which enables the network to capture rich semantic contexts.
For the inherent problem of cross-modal fusion in RGB-D SOD, we propose a multi-path fusion module.
arXiv Detail & Related papers (2021-01-29T09:16:06Z)
- Not 3D Re-ID: a Simple Single Stream 2D Convolution for Robust Video Re-identification [14.785070524184649]
Video-based Re-ID is an extension of earlier image-based re-identification methods.
We show superior performance from a simple single stream 2D convolution network leveraging the ResNet50-IBN architecture.
Our approach uses best practices from video Re-ID and transfer learning between datasets to outperform existing state-of-the-art approaches.
arXiv Detail & Related papers (2020-08-14T12:19:32Z)
- Bi-directional Cross-Modality Feature Propagation with Separation-and-Aggregation Gate for RGB-D Semantic Segmentation [59.94819184452694]
Depth information has proven to be a useful cue in the semantic segmentation of RGBD images for providing a geometric counterpart to the RGB representation.
Most existing works simply assume that depth measurements are accurate and well-aligned with the RGB pixels, and model the problem as cross-modal feature fusion.
In this paper, we propose a unified and efficient Cross-modality Guided Encoder to not only effectively recalibrate RGB feature responses, but also to distill accurate depth information via multiple stages and aggregate the two recalibrated representations alternately (see the sketch below).
arXiv Detail & Related papers (2020-07-17T18:35:24Z)
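The summary describes depth-guided recalibration of RGB responses and aggregation of the two streams; the single-stage gating module below is a minimal sketch of that idea. The module name, the sigmoid gates, and the one-stage form are assumptions, not the paper's Separation-and-Aggregation Gate.

```python
# Hedged sketch: depth-guided recalibration of RGB features followed by a
# gated aggregation of the two streams. A single-stage toy module; the
# paper's actual multi-stage design is richer.
import torch
import torch.nn as nn


class CrossModalGate(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        # Each modality predicts a channel-wise gate for the other one.
        self.rgb_gate = nn.Sequential(nn.Conv2d(channels, channels, 1), nn.Sigmoid())
        self.depth_gate = nn.Sequential(nn.Conv2d(channels, channels, 1), nn.Sigmoid())
        self.fuse = nn.Conv2d(2 * channels, channels, 1)

    def forward(self, rgb, depth):               # both: (B, C, H, W)
        rgb_recal = rgb * self.depth_gate(depth)     # depth recalibrates RGB
        depth_recal = depth * self.rgb_gate(rgb)     # RGB recalibrates depth
        return self.fuse(torch.cat([rgb_recal, depth_recal], dim=1))


fused = CrossModalGate(64)(torch.randn(2, 64, 32, 32), torch.randn(2, 64, 32, 32))
```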
- Cross-Spectrum Dual-Subspace Pairing for RGB-infrared Cross-Modality Person Re-Identification [15.475897856494583]
Conventional person re-identification can only handle RGB color images and will fail in dark conditions.
RGB-infrared ReID (also known as Infrared-Visible ReID or Visible-Thermal ReID) is proposed.
In this paper, a novel multi-spectrum image generation method is proposed, and the generated samples are utilized to help the network find discriminative information (see the sketch below).
arXiv Detail & Related papers (2020-02-29T09:01:39Z)
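The summary mentions generating multi-spectrum samples to expose discriminative cues; the sketch below derives simple per-channel variants from an RGB image as a stand-in. The channel-replication scheme is an assumption, not necessarily the paper's generator.

```python
# Hedged sketch: turning one RGB image into per-channel "spectrum" variants,
# a crude stand-in for multi-spectrum sample generation used as training
# augmentation. The channel-replication scheme is an assumption.
import torch


def multi_spectrum_variants(rgb: torch.Tensor) -> list[torch.Tensor]:
    """rgb: (3, H, W) in [0, 1]; returns three single-spectrum look-alikes."""
    variants = []
    for c in range(3):
        channel = rgb[c:c + 1]                    # keep one spectral band
        variants.append(channel.repeat(3, 1, 1))  # replicate so shapes match RGB
    return variants


# Each variant can be fed to the Re-ID network alongside the original image,
# encouraging features that do not depend on any single color channel.
samples = multi_spectrum_variants(torch.rand(3, 256, 128))
```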
This list is automatically generated from the titles and abstracts of the papers in this site.