Unified Batch All Triplet Loss for Visible-Infrared Person Re-identification
- URL: http://arxiv.org/abs/2103.04607v1
- Date: Mon, 8 Mar 2021 08:58:52 GMT
- Title: Unified Batch All Triplet Loss for Visible-Infrared Person Re-identification
- Authors: Wenkang Li, Ke Qi, Wenbin Chen, Yicong Zhou
- Abstract summary: Batch Hard Triplet loss is widely used in person re-identification tasks, but it does not perform well in the Visible-Infrared person re-identification task.
We propose a batch all triplet selection strategy, which optimizes all possible triplets among the samples instead of only the hardest triplet.
We also introduce the Unified Batch All Triplet loss and the Cosine Softmax loss to collaboratively optimize the cosine distance between image vectors.
- Score: 33.23261883419256
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Visible-Infrared cross-modality person re-identification (VI-ReID), which aims to match person images between the visible and infrared modalities, is a challenging cross-modality image retrieval task. Batch Hard Triplet loss is widely used in person re-identification tasks, but it performs poorly in the Visible-Infrared person re-identification task: because it optimizes only the hardest triplet for each anchor image within the mini-batch, the samples in that hardest triplet may all belong to the same modality, which leads to an imbalanced optimization of the two modalities. To address this problem, we adopt the batch all triplet selection strategy, which optimizes all possible triplets among the samples instead of only the hardest one. Furthermore, we introduce the Unified Batch All Triplet loss and the Cosine Softmax loss to collaboratively optimize the cosine distance between image vectors. Similarly, we rewrite the Hetero Center Triplet loss, which was proposed for the VI-ReID task, into a batch all form to improve model performance. Extensive experiments demonstrate the effectiveness of the proposed methods, which outperform state-of-the-art methods by a wide margin.
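The abstract describes the batch all selection strategy and the cosine-distance objective concretely enough to sketch. The PyTorch-style code below assumes a mini-batch of embeddings with identity labels; the function names, the margin of 0.3, the softmax scale of 16.0, and the averaging over non-zero triplets are illustrative assumptions, not the authors' released implementation or their exact Unified Batch All Triplet loss.

```python
import torch
import torch.nn.functional as F

def batch_all_cosine_triplet_loss(embeddings, labels, margin=0.3):
    """Batch-all triplet loss over cosine distances (hypothetical sketch).

    Every valid (anchor, positive, negative) triplet in the mini-batch
    contributes, so cross-modality triplets are never discarded the way
    batch-hard mining can discard them.
    """
    emb = F.normalize(embeddings, dim=1)              # unit-length vectors
    dist = 1.0 - emb @ emb.t()                        # cosine distance, (B, B)

    same = labels.unsqueeze(0) == labels.unsqueeze(1)
    eye = torch.eye(len(labels), dtype=torch.bool, device=labels.device)
    pos = (same & ~eye).unsqueeze(2)                  # [a, p, n]: p is a positive of a
    neg = (~same).unsqueeze(1)                        # [a, p, n]: n is a negative of a

    # Hinge over every triplet: d(a, p) - d(a, n) + margin
    triplet = F.relu(dist.unsqueeze(2) - dist.unsqueeze(1) + margin) * (pos & neg)

    # Average over triplets with non-zero loss (one common convention).
    return triplet.sum() / (triplet > 0).sum().clamp(min=1)

def cosine_softmax_loss(embeddings, labels, class_weights, scale=16.0):
    """Cross-entropy on scaled cosine similarities to class weight vectors."""
    logits = scale * F.normalize(embeddings, dim=1) @ F.normalize(class_weights, dim=0)
    return F.cross_entropy(logits, labels)
```

Because both terms operate on the same normalized embeddings, they can simply be summed during training, which is one plausible reading of the collaborative optimization the abstract mentions.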
Related papers
- Mutual Information Guided Optimal Transport for Unsupervised Visible-Infrared Person Re-identification [39.70083261306122]
Unsupervised visible-infrared person re-identification (USVI-ReID) is a challenging retrieval task that aims to retrieve cross-modality pedestrian images without using any label information.
In this paper, we first deduce an optimization objective for unsupervised VI-ReID based on the mutual information between the model's cross-modality input and output.
Guided by this objective, we design a loop iterative training strategy alternating between model training and cross-modality matching.
arXiv Detail & Related papers (2024-07-17T17:32:07Z)
- Multi-threshold Deep Metric Learning for Facial Expression Recognition [60.26967776920412]
We present the multi-threshold deep metric learning technique, which avoids the need for difficult threshold validation.
We find that each threshold of the triplet loss intrinsically determines a distinctive distribution of inter-class variations.
This makes the embedding layer, which is composed of a set of slices, a more informative and discriminative feature representation.
arXiv Detail & Related papers (2024-06-24T08:27:31Z)
- Unified-Width Adaptive Dynamic Network for All-In-One Image Restoration [50.81374327480445]
We introduce a novel concept positing that intricate image degradation can be represented in terms of elementary degradations.
We propose the Unified-Width Adaptive Dynamic Network (U-WADN), consisting of two pivotal components: a Width Adaptive Backbone (WAB) and a Width Selector (WS).
The proposed U-WADN achieves better performance while simultaneously reducing up to 32.3% of FLOPs and providing approximately 15.7% real-time acceleration.
arXiv Detail & Related papers (2024-01-24T04:25:12Z)
- Bridging the Gap: Multi-Level Cross-Modality Joint Alignment for Visible-Infrared Person Re-Identification [41.600294816284865]
Visible-Infrared person Re-IDentification (VI-ReID) aims to match pedestrians' images across visible and infrared cameras.
To bridge the modality gap, existing mainstream methods adopt a learning paradigm that converts the image retrieval task into an image classification task.
We propose a simple and effective method, the Multi-level Cross-modality Joint Alignment (MCJA), which bridges both the modality-level and objective-level gaps.
arXiv Detail & Related papers (2023-07-17T08:24:05Z)
- A Recipe for Efficient SBIR Models: Combining Relative Triplet Loss with Batch Normalization and Knowledge Distillation [3.364554138758565]
Sketch-Based Image Retrieval (SBIR) is a crucial task in multimedia retrieval, where the goal is to retrieve a set of images that match a given sketch query.
We introduce the Relative Triplet Loss (RTL), an adapted triplet loss that overcomes the limitations of the conventional triplet loss through loss weighting based on anchor similarity.
We also propose a straightforward approach to efficiently train small models through knowledge distillation with only a marginal loss of accuracy.
arXiv Detail & Related papers (2023-05-30T12:41:04Z)
- Learning Feature Recovery Transformer for Occluded Person Re-identification [71.18476220969647]
We propose a new approach called the Feature Recovery Transformer (FRT) to address two challenges simultaneously: noise interference during feature matching and the recovery of occluded features.
To reduce the interference of noise during feature matching, we focus on the visible regions that appear in both images and develop a visibility graph to calculate the similarity.
For the second challenge, based on the developed graph similarity, we propose a recovery transformer that, for each query image, exploits the feature sets of its $k$-nearest neighbors in the gallery to recover the complete features.
arXiv Detail & Related papers (2023-01-05T02:36:16Z)
- Unifying Flow, Stereo and Depth Estimation [121.54066319299261]
We present a unified formulation and model for three motion and 3D perception tasks.
We formulate all three tasks as a unified dense correspondence matching problem.
Our model naturally enables cross-task transfer since the model architecture and parameters are shared across tasks.
arXiv Detail & Related papers (2022-11-10T18:59:54Z)
- Warp Consistency for Unsupervised Learning of Dense Correspondences [116.56251250853488]
A key challenge in learning dense correspondences is the lack of ground-truth matches for real image pairs.
We propose Warp Consistency, an unsupervised learning objective for dense correspondence regression.
Our approach sets a new state-of-the-art on several challenging benchmarks, including MegaDepth, RobotCar and TSS.
arXiv Detail & Related papers (2021-04-07T17:58:22Z)
- Strong but Simple Baseline with Dual-Granularity Triplet Loss for Visible-Thermal Person Re-Identification [9.964287254346976]
We propose a conceptually simple and effective dual-granularity triplet loss for visible-thermal person re-identification (VT-ReID).
Our proposed dual-granularity triplet loss organizes the sample-based triplet loss and the center-based triplet loss in a hierarchical fine-to-coarse manner (see the sketch after this entry).
arXiv Detail & Related papers (2020-12-09T12:43:34Z)
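The entry above only names the two granularities, so the following is a hedged sketch of one plausible combination of a sample-level and a center-level triplet term; the helper function, both margins, the Euclidean distance, and the per-modality centers are assumptions rather than the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def batch_hard_triplet_loss(features, labels, margin):
    """Standard batch-hard triplet loss on Euclidean distance (helper)."""
    dist = torch.cdist(features, features)                    # (N, N)
    same = labels.unsqueeze(0) == labels.unsqueeze(1)
    eye = torch.eye(len(labels), dtype=torch.bool, device=labels.device)
    hardest_pos = (dist * (same & ~eye)).max(dim=1).values    # farthest positive
    hardest_neg = dist.masked_fill(same, float("inf")).min(dim=1).values
    return F.relu(hardest_pos - hardest_neg + margin).mean()

def dual_granularity_triplet_loss(features, labels, modalities,
                                  fine_margin=0.3, coarse_margin=0.7):
    """Fine (sample-level) plus coarse (center-level) triplet terms."""
    # Fine granularity: triplets over individual samples.
    fine = batch_hard_triplet_loss(features, labels, fine_margin)

    # Coarse granularity: triplets over per-identity, per-modality centers,
    # so each identity contributes a visible center and a thermal center.
    centers, center_ids = [], []
    for pid in labels.unique():
        for m in modalities.unique():
            mask = (labels == pid) & (modalities == m)
            if mask.any():
                centers.append(features[mask].mean(dim=0))
                center_ids.append(pid)
    coarse = batch_hard_triplet_loss(torch.stack(centers),
                                     torch.stack(center_ids), coarse_margin)
    return fine + coarse
```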
- Recurrent Multi-view Alignment Network for Unsupervised Surface Registration [79.72086524370819]
Learning non-rigid registration in an end-to-end manner is challenging due to the inherent high degrees of freedom and the lack of labeled training data.
We propose to represent the non-rigid transformation with a point-wise combination of several rigid transformations.
We also introduce a differentiable loss function that measures the 3D shape similarity on the projected multi-view 2D depth images.
arXiv Detail & Related papers (2020-11-24T14:22:42Z)
- Parameter Sharing Exploration and Hetero-Center based Triplet Loss for Visible-Thermal Person Re-Identification [17.402673438396345]
This paper focuses on the visible-thermal cross-modality person re-identification (VT Re-ID) task.
Our proposed method distinctly outperforms the state-of-the-art methods by large margins; a sketch of its hetero-center triplet loss rewritten into batch all form follows this list.
arXiv Detail & Related papers (2020-08-14T07:40:35Z)
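The main abstract above says this Hetero Center Triplet loss was rewritten into a batch all form. Neither abstract spells out the resulting formula, so the sketch below combines the usual hetero-center construction (one center per identity per modality) with the batch all selection shown earlier; the cosine distance, the margin of 0.3, and the averaging convention are assumptions.

```python
import torch
import torch.nn.functional as F

def batch_all_hetero_center_triplet_loss(embeddings, labels, modalities,
                                         margin=0.3):
    """Batch-all triplet loss over per-identity, per-modality centers.

    Each identity in the mini-batch yields one center per modality; all
    valid (anchor, positive, negative) center triplets are optimized,
    not just the hardest one.
    """
    centers, center_labels = [], []
    for pid in labels.unique():
        for m in modalities.unique():
            mask = (labels == pid) & (modalities == m)
            if mask.any():
                centers.append(embeddings[mask].mean(dim=0))
                center_labels.append(pid)
    centers = F.normalize(torch.stack(centers), dim=1)
    center_labels = torch.stack(center_labels)

    dist = 1.0 - centers @ centers.t()                # cosine distance
    same = center_labels.unsqueeze(0) == center_labels.unsqueeze(1)
    eye = torch.eye(len(center_labels), dtype=torch.bool,
                    device=center_labels.device)
    valid = (same & ~eye).unsqueeze(2) & (~same).unsqueeze(1)

    triplet = F.relu(dist.unsqueeze(2) - dist.unsqueeze(1) + margin) * valid
    return triplet.sum() / (triplet > 0).sum().clamp(min=1)
```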