Cross-modal Local Shortest Path and Global Enhancement for
Visible-Thermal Person Re-Identification
- URL: http://arxiv.org/abs/2206.04401v1
- Date: Thu, 9 Jun 2022 10:27:22 GMT
- Title: Cross-modal Local Shortest Path and Global Enhancement for
Visible-Thermal Person Re-Identification
- Authors: Xiaohong Wang and Chaoqi Li and Xiangcai Ma
- Abstract summary: We propose the Cross-modal Local Shortest Path and Global Enhancement (CM-LSP-GE) modules, a two-stream network based on joint learning of local and global features.
Experimental results on two typical datasets show that our model is clearly superior to state-of-the-art methods.
- Score: 2.294635424666456
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In addition to the recognition difficulty caused by human posture
and occlusion, the Visible-Thermal cross-modal person re-identification
(VT-ReID) task must also cope with the modal differences caused by different
imaging systems. In this paper, we propose the Cross-modal Local Shortest
Path and Global Enhancement (CM-LSP-GE) modules, a two-stream network based
on joint learning of local and global features. The core idea of our paper is
to use local feature alignment to solve the occlusion problem, and to solve
the modal difference by strengthening the global features. First, an
attention-based two-stream ResNet network is designed to extract
dual-modality features and map them to a unified feature space. Then, to
solve the cross-modal person pose and occlusion problems, each image is cut
horizontally into several equal parts to obtain local features, and the
shortest path between the local feature sequences of two images is used to
achieve fine-grained local feature alignment. Third, a batch normalization
enhancement module applies an enhancement strategy to the global features,
increasing the differences between classes. A multi-granularity loss fusion
strategy further improves the performance of the algorithm. Finally, a joint
learning mechanism of local and global features is used to improve
cross-modal person re-identification accuracy. Experimental results on two
typical datasets show that our model clearly outperforms state-of-the-art
methods. In particular, on the SYSU-MM01 dataset, our model achieves gains of
2.89% and 7.96% in Rank-1 and mAP, respectively, under the all-search
setting. The source code will be released soon.
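Since the authors' code is not yet released, the shortest-path alignment can only be illustrated by analogy: it reads like the AlignedReID-style dynamic program over horizontal stripe features. Below is a minimal, hypothetical PyTorch sketch under that assumption; the stripe count, the exp-based distance normalization, and the pooling step are illustrative choices, not the paper's confirmed settings.

```python
import torch

def local_shortest_path_distance(x, y):
    """Aligned distance between two sequences of horizontal-stripe features.

    x, y: (H, C) tensors holding H stripe features per image. The distance
    is the length of the shortest monotonic (right/down only) path through
    the H x H stripe-distance matrix, computed by dynamic programming.
    """
    d = torch.cdist(x, y)                             # (H, H) pairwise L2
    d = (torch.exp(d) - 1.0) / (torch.exp(d) + 1.0)   # squash to [0, 1)

    H = d.size(0)
    dist = torch.zeros_like(d)
    for i in range(H):
        for j in range(H):
            if i == 0 and j == 0:
                dist[i, j] = d[i, j]
            elif i == 0:
                dist[i, j] = dist[i, j - 1] + d[i, j]
            elif j == 0:
                dist[i, j] = dist[i - 1, j] + d[i, j]
            else:
                dist[i, j] = torch.min(dist[i - 1, j], dist[i, j - 1]) + d[i, j]
    return dist[-1, -1]

# Hypothetical usage: pool a backbone feature map (C=2048, h=24, w=8)
# into H=6 horizontal stripes per image, then align the two images.
feat_a, feat_b = torch.randn(2048, 24, 8), torch.randn(2048, 24, 8)
stripes_a = feat_a.reshape(2048, 6, 4, 8).mean(dim=(2, 3)).t()  # (6, 2048)
stripes_b = feat_b.reshape(2048, 6, 4, 8).mean(dim=(2, 3)).t()
print(local_shortest_path_distance(stripes_a, stripes_b))
```

The dynamic program only allows the path to move right or down, so earlier stripes of one image can match equal-or-earlier stripes of the other; this monotonic matching is what tolerates the vertical misalignment caused by pose changes and occlusion.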
Related papers
- Siamese Transformer Networks for Few-shot Image Classification [9.55588609556447]
Humans exhibit remarkable proficiency in visual classification tasks, accurately recognizing and classifying new images with minimal examples.
Existing few-shot image classification methods often emphasize either global features or local features, with few studies considering the integration of both.
We propose a novel approach based on the Siamese Transformer Network (STN).
Our strategy effectively harnesses the potential of global and local features in few-shot image classification, circumventing the need for complex feature adaptation modules.
arXiv Detail & Related papers (2024-07-16T14:27:23Z)
- DETR Doesn't Need Multi-Scale or Locality Design [69.56292005230185]
This paper presents an improved DETR detector that maintains a "plain" nature.
It uses a single-scale feature map and global cross-attention calculations without specific locality constraints.
We show that two simple techniques are surprisingly effective within a plain design to compensate for the lack of multi-scale feature maps and locality constraints.
arXiv Detail & Related papers (2023-08-03T17:59:04Z)
- PointCMC: Cross-Modal Multi-Scale Correspondences Learning for Point Cloud Understanding [0.875967561330372]
PointCMC is a cross-modal method that models multi-scale correspondences across modalities for self-supervised point cloud representation learning.
PointCMC is composed of: (1) a local-to-local (L2L) module that learns local correspondences through optimized cross-modal local geometric features, (2) a local-to-global (L2G) module that aims to learn the correspondences between local and global features across modalities via local-global discrimination, and (3) a global-to-global (G2G) module, which leverages auxiliary global contrastive loss between the point cloud and image to learn high-level semantic correspondences.
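As a rough illustration of what an auxiliary global contrastive loss between point-cloud and image embeddings can look like, here is a hedged InfoNCE-style sketch; the symmetric form, temperature value, and batch construction are assumptions rather than PointCMC's actual implementation.

```python
import torch
import torch.nn.functional as F

def global_contrastive_loss(pc_emb, img_emb, temperature=0.07):
    """Symmetric InfoNCE between matched point-cloud and image embeddings.

    pc_emb, img_emb: (B, D) global embeddings; row i of each tensor forms
    a positive pair, while all other rows in the batch act as negatives.
    """
    pc = F.normalize(pc_emb, dim=1)
    im = F.normalize(img_emb, dim=1)
    logits = pc @ im.t() / temperature            # (B, B) similarity matrix
    targets = torch.arange(pc.size(0), device=pc.device)
    # Average the two retrieval directions (point cloud -> image and back).
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.t(), targets))
```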
arXiv Detail & Related papers (2022-11-22T06:08:43Z)
- Memory-augmented Deep Unfolding Network for Guided Image Super-resolution [67.83489239124557]
Guided image super-resolution (GISR) aims to obtain a high-resolution (HR) target image by enhancing the spatial resolution of a low-resolution (LR) target image under the guidance of an HR image.
Previous model-based methods mainly take the entire image as a whole and assume a prior distribution between the HR target image and the HR guidance image.
We propose a maximum a posteriori (MAP) estimation model for GISR with two types of prior on the HR target image.
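In generic form, such a MAP model combines a data-fidelity term with prior terms; since this summary does not spell out the paper's two priors, the regularizers below are placeholders:

```latex
\hat{X} \;=\; \arg\min_{X}\;
  \underbrace{\lVert \mathcal{D}(X) - Y \rVert_2^2}_{\text{data fidelity}}
  \;+\; \lambda_1\, \Omega_1(X)
  \;+\; \lambda_2\, \Omega_2(X, G)
```

where Y is the LR target, G the HR guidance image, \mathcal{D} a degradation (downsampling) operator, and \Omega_1, \Omega_2 stand in for the two priors.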
arXiv Detail & Related papers (2022-02-12T15:37:13Z)
- MSO: Multi-Feature Space Joint Optimization Network for RGB-Infrared Person Re-Identification [35.97494894205023]
The RGB-infrared cross-modality person re-identification (ReID) task aims to recognize images of the same identity across the visible and infrared modalities.
Existing methods mainly use a two-stream architecture to eliminate the discrepancy between the two modalities in the final common feature space.
We present a novel multi-feature space joint optimization (MSO) network, which can learn modality-sharable features in both the single-modality space and the common space.
arXiv Detail & Related papers (2021-10-21T16:45:23Z)
- Video Salient Object Detection via Adaptive Local-Global Refinement [7.723369608197167]
Video salient object detection (VSOD) is an important task in many vision applications.
We propose an adaptive local-global refinement framework for VSOD.
We show that our weighting methodology can further exploit the feature correlations, thus driving the network to learn more discriminative feature representation.
arXiv Detail & Related papers (2021-04-29T14:14:11Z)
- DF^2AM: Dual-level Feature Fusion and Affinity Modeling for RGB-Infrared Cross-modality Person Re-identification [18.152310122348393]
RGB-infrared person re-identification is a challenging task due to the intra-class variations and cross-modality discrepancy.
We propose a Dual-level (i.e., local and global) Feature Fusion (DF2) module that learns attention for discriminative features in a local-to-global manner.
To further mine the relationships between global features of person images, we propose an Affinities Modeling (AM) module.
arXiv Detail & Related papers (2021-04-01T03:12:56Z)
- Watching You: Global-guided Reciprocal Learning for Video-based Person Re-identification [82.6971648465279]
We propose a novel Global-guided Reciprocal Learning framework for video-based person Re-ID.
Our approach can achieve better performance than other state-of-the-art approaches.
arXiv Detail & Related papers (2021-03-07T12:27:42Z)
- Adaptive Context-Aware Multi-Modal Network for Depth Completion [107.15344488719322]
We propose to adopt graph propagation to capture the observed spatial contexts.
We then apply an attention mechanism to the propagation, which encourages the network to model the contextual information adaptively.
Finally, we introduce the symmetric gated fusion strategy to exploit the extracted multi-modal features effectively.
Our model, named Adaptive Context-Aware Multi-Modal Network (ACMNet), achieves the state-of-the-art performance on two benchmarks.
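As a loose sketch of attention-weighted graph propagation (not ACMNet's actual layers; the neighbor selection and the unparameterized dot-product scoring below are assumptions, where a real model would use learned projections), one propagation step could look like:

```python
import torch
import torch.nn.functional as F

def attentive_propagation(feats, neighbor_idx):
    """One step of attention-weighted feature propagation over a graph.

    feats: (N, D) node features (e.g., features at observed depth points).
    neighbor_idx: (N, K) indices of each node's K graph neighbors.
    Each node aggregates its neighbors, weighted by softmax-normalized
    scaled dot-product attention scores against its own feature.
    """
    nbr = feats[neighbor_idx]                        # (N, K, D) neighbor feats
    scores = (nbr * feats.unsqueeze(1)).sum(-1)      # (N, K) dot products
    attn = F.softmax(scores / feats.size(1) ** 0.5, dim=1)
    return (attn.unsqueeze(-1) * nbr).sum(dim=1)     # (N, D) aggregated

# Hypothetical usage: 100 nodes with 64-dim features, 8 neighbors each.
feats = torch.randn(100, 64)
neighbor_idx = torch.randint(0, 100, (100, 8))
out = attentive_propagation(feats, neighbor_idx)     # (100, 64)
```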
arXiv Detail & Related papers (2020-08-25T06:00:06Z)
- Cross-Domain Facial Expression Recognition: A Unified Evaluation Benchmark and Adversarial Graph Learning [85.6386289476598]
We develop a novel adversarial graph representation adaptation (AGRA) framework for cross-domain holistic-local feature co-adaptation.
We conduct extensive and fair evaluations on several popular benchmarks and show that the proposed AGRA framework outperforms previous state-of-the-art methods.
arXiv Detail & Related papers (2020-08-03T15:00:31Z)
- Dense Residual Network: Enhancing Global Dense Feature Flow for Character Recognition [75.4027660840568]
This paper explores how to enhance the local and global dense feature flow by exploiting hierarchical features fully from all the convolution layers.
Technically, we propose an efficient and effective CNN framework, i.e., Fast Dense Residual Network (FDRN) for text recognition.
arXiv Detail & Related papers (2020-01-23T06:55:08Z)