Unity in Diversity: Multi-expert Knowledge Confrontation and Collaboration for Generalizable Vehicle Re-identification
- URL: http://arxiv.org/abs/2407.07351v1
- Date: Wed, 10 Jul 2024 04:06:39 GMT
- Title: Unity in Diversity: Multi-expert Knowledge Confrontation and Collaboration for Generalizable Vehicle Re-identification
- Authors: Zhenyu Kuang, Hongyang Zhang, Lidong Cheng, Yinhao Liu, Yue Huang, Xinghao Ding
- Abstract summary: Generalizable vehicle re-identification (ReID) aims to enable a model well trained on diverse source domains to adapt broadly to unknown target domains.
It still faces the challenge of domain shift and has difficulty generalizing accurately to unknown target domains.
This paper proposes the two-stage Multi-expert Knowledge Confrontation and Collaboration (MiKeCoCo) method.
- Score: 32.80872775195836
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Generalizable vehicle re-identification (ReID) aims to enable a model well trained on diverse source domains to adapt broadly to unknown target domains without additional fine-tuning or retraining. However, it still faces the challenge of domain shift and has difficulty generalizing accurately to unknown target domains. This limitation occurs because the model relies heavily on primary domain-invariant features in the training data and pays less attention to potentially valuable secondary features. To address this common and complex problem, this paper proposes the two-stage Multi-expert Knowledge Confrontation and Collaboration (MiKeCoCo) method, which incorporates multiple experts with unique perspectives into Contrastive Language-Image Pretraining (CLIP) and fully leverages high-level semantic knowledge for comprehensive feature representation. Specifically, in the first stage of training we construct the learnable prompt set of all specific-perspective experts by adversarial learning in the latent space of visual features. The learned prompt set with high-level semantics is then used to guide representation learning of the multi-level features for final knowledge fusion in the next stage. In this knowledge fusion process, although the experts examine the same vehicle in different ways, their common goal is to confirm the vehicle's true identity, and their collective decision ensures the accuracy and consistency of the evaluation results. Furthermore, we design different image inputs for the two training stages: image component separation to extract the ID-related prompt representation, and diversity enhancement to obtain the feature representation highlighted by all experts. Extensive experimental results demonstrate that our method achieves state-of-the-art recognition performance.
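To make the two-stage idea concrete, below is a minimal PyTorch sketch of multi-expert prompt learning with an adversarial diversification signal (stage 1) and score-level expert fusion (stage 2). This is an illustrative, assumption-based sketch and not the authors' released implementation: the module names (MultiExpertPrompts, ExpertDiscriminator), the number of experts, the embedding size, and the loss choices are all hypothetical, and the CLIP image encoder is abstracted away as precomputed image features.

```python
# Minimal sketch of multi-expert prompt learning with score-level fusion.
# All module names, dimensions, and loss choices here are illustrative
# assumptions, not the paper's released implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F


class MultiExpertPrompts(nn.Module):
    """Learnable prompt set: one prompt vector per 'expert' perspective."""

    def __init__(self, num_experts: int = 4, embed_dim: int = 512):
        super().__init__()
        # Each expert holds its own learnable prompt embedding (hypothetical shape).
        self.prompts = nn.Parameter(torch.randn(num_experts, embed_dim) * 0.02)

    def forward(self) -> torch.Tensor:
        # Normalize so prompts live on the same hypersphere as image features.
        return F.normalize(self.prompts, dim=-1)


class ExpertDiscriminator(nn.Module):
    """Stage-1 adversary: tries to tell which expert produced a prompt,
    encouraging the prompt set to cover genuinely different perspectives."""

    def __init__(self, embed_dim: int = 512, num_experts: int = 4):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(embed_dim, 256), nn.ReLU(), nn.Linear(256, num_experts)
        )

    def forward(self, prompt_feats: torch.Tensor) -> torch.Tensor:
        return self.net(prompt_feats)


def fuse_expert_scores(image_feats: torch.Tensor, prompts: torch.Tensor) -> torch.Tensor:
    """Stage-2 collaboration: each expert scores the image independently;
    the scores are averaged so the experts reach a joint decision."""
    image_feats = F.normalize(image_feats, dim=-1)   # (B, D)
    per_expert = image_feats @ prompts.t()           # (B, num_experts)
    return per_expert.mean(dim=-1)                   # (B,) fused confidence


if __name__ == "__main__":
    experts = MultiExpertPrompts(num_experts=4, embed_dim=512)
    disc = ExpertDiscriminator(embed_dim=512, num_experts=4)

    # Stage 1 (toy step): adversarial signal over the prompt set itself.
    logits = disc(experts())          # (4, 4)
    labels = torch.arange(4)
    disc_loss = F.cross_entropy(logits, labels)

    # Stage 2 (toy step): fuse expert scores for a batch of image features,
    # e.g. produced by a frozen CLIP image encoder (not included here).
    image_feats = torch.randn(8, 512)
    fused = fuse_expert_scores(image_feats, experts())
    print(disc_loss.item(), fused.shape)
```

In the paper's full pipeline the prompt set would interact with CLIP's encoders and the fused decision would feed ReID metric learning; the sketch only shows the confrontation-and-collaboration skeleton under the stated assumptions.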
Related papers
- CILP-FGDI: Exploiting Vision-Language Model for Generalizable Person Re-Identification [42.429118831928214]
We explore the use of CLIP (Contrastive Language-Image Pretraining), a vision-language model pretrained on large-scale image-text pairs to align visual and textual features.
The adaptation of CLIP to the task presents two primary challenges: learning more fine-grained features to enhance discriminative ability, and learning more domain-invariant features to improve the model's generalization capabilities.
arXiv Detail & Related papers (2025-01-27T14:08:25Z)
- Fusion from Decomposition: A Self-Supervised Approach for Image Fusion and Beyond [74.96466744512992]
The essence of image fusion is to integrate complementary information from source images.
DeFusion++ produces versatile fused representations that can enhance the quality of image fusion and the effectiveness of downstream high-level vision tasks.
arXiv Detail & Related papers (2024-10-16T06:28:49Z)
- CLIP-Driven Fine-grained Text-Image Person Re-identification [50.94827165464813]
TIReID aims to retrieve the image corresponding to the given text query from a pool of candidate images.
We propose a CLIP-driven Fine-grained information excavation framework (CFine) to fully utilize the powerful knowledge of CLIP for TIReID.
arXiv Detail & Related papers (2022-10-19T03:43:12Z)
- Learning Enriched Features for Fast Image Restoration and Enhancement [166.17296369600774]
This paper pursues the holistic goal of maintaining spatially precise, high-resolution representations throughout the network.
We learn an enriched set of features that combines contextual information from multiple scales, while simultaneously preserving the high-resolution spatial details.
Our approach achieves state-of-the-art results for a variety of image processing tasks, including defocus deblurring, image denoising, super-resolution, and image enhancement.
arXiv Detail & Related papers (2022-04-19T17:59:45Z)
- TransFuse: A Unified Transformer-based Image Fusion Framework using Self-supervised Learning [5.849513679510834]
Image fusion is a technique for integrating complementary information from multiple source images to improve the richness of a single image.
Two-stage methods avoid the need for large amounts of task-specific training data by training an encoder-decoder network on large natural-image datasets.
We propose a destruction-reconstruction based self-supervised training scheme to encourage the network to learn task-specific features.
arXiv Detail & Related papers (2022-01-19T07:30:44Z)
- Unleashing the Potential of Unsupervised Pre-Training with Intra-Identity Regularization for Person Re-Identification [10.045028405219641]
We design an Unsupervised Pre-training framework for ReID based on the contrastive learning (CL) pipeline, dubbed UP-ReID.
We introduce an intra-identity (I²) regularization in UP-ReID, instantiated as two constraints from the global image aspect and the local patch aspect.
Our UP-ReID pre-trained model can significantly benefit downstream ReID fine-tuning and achieves state-of-the-art performance.
arXiv Detail & Related papers (2021-12-01T07:16:37Z)
- Calibrated Feature Decomposition for Generalizable Person Re-Identification [82.64133819313186]
The Calibrated Feature Decomposition (CFD) module focuses on improving the generalization capacity for person re-identification.
A calibrated-and-standardized batch normalization (CSBN) is designed to learn a calibrated person representation.
arXiv Detail & Related papers (2021-11-27T17:12:43Z)
- Multi-Scale Cascading Network with Compact Feature Learning for RGB-Infrared Person Re-Identification [35.55895776505113]
The Multi-Scale Part-Aware Cascading framework (MSPAC) is formulated by aggregating multi-scale fine-grained features from the part level to the global level.
Cross-modality correlations can thus be efficiently explored on salient features for distinctive modality-invariant feature learning.
arXiv Detail & Related papers (2020-12-12T15:39:11Z)
- Learning Enriched Features for Real Image Restoration and Enhancement [166.17296369600774]
Convolutional neural networks (CNNs) have achieved dramatic improvements over conventional approaches for the image restoration task.
We present a novel architecture with the collective goals of maintaining spatially-precise high-resolution representations through the entire network.
Our approach learns an enriched set of features that combines contextual information from multiple scales, while simultaneously preserving the high-resolution spatial details.
arXiv Detail & Related papers (2020-03-15T11:04:30Z)