Related papers: Unity in Diversity: Multi-expert Knowledge Confrontation and Collaboration for Generalizable Vehicle Re-identification

Unity in Diversity: Multi-expert Knowledge Confrontation and Collaboration for Generalizable Vehicle Re-identification

URL: http://arxiv.org/abs/2407.07351v2
Date: Tue, 04 Feb 2025 13:30:12 GMT
Title: Unity in Diversity: Multi-expert Knowledge Confrontation and Collaboration for Generalizable Vehicle Re-identification
Authors: Zhenyu Kuang, Hongyang Zhang, Mang Ye, Bin Yang, Yinhao Liu, Yue Huang, Xinghao Ding, Huafeng Li,
Abstract summary: Generalizable vehicle re-identification (ReID) seeks to develop models that can adapt to unknown target domains without the need for fine-tuning or retraining.<n>Previous works have mainly focused on extracting domain-invariant features by aligning data distributions between source domains.<n>We propose a two-stage Multi-expert Knowledge Confrontation and Collaboration (MiKeCoCo) method to solve this unique problem.
Score: 60.20318058777603
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Generalizable vehicle re-identification (ReID) seeks to develop models that can adapt to unknown target domains without the need for additional fine-tuning or retraining. Previous works have mainly focused on extracting domain-invariant features by aligning data distributions between source domains. However, interfered by the inherent domain-related redundancy in the source images, solely relying on common features is insufficient for accurately capturing the complementary features with lower occurrence probability and smaller energy. To solve this unique problem, we propose a two-stage Multi-expert Knowledge Confrontation and Collaboration (MiKeCoCo) method, which fully leverages the high-level semantics of Contrastive Language-Image Pretraining (CLIP) to obtain a diversified prompt set and achieve complementary feature representations. Specifically, this paper first designs a Spectrum-based Transformation for Redundancy Elimination and Augmentation Module (STREAM) through simple image preprocessing to obtain two types of image inputs for the training process. Since STREAM eliminates domain-related redundancy in source images, it enables the model to pay closer attention to the detailed prompt set that is crucial for distinguishing fine-grained vehicles. This learned prompt set related to the vehicle identity is then utilized to guide the comprehensive representation learning of complementary features for final knowledge fusion and identity recognition. Inspired by the unity principle, MiKeCoCo integrates the diverse evaluation ways of experts to ensure the accuracy and consistency of ReID. Extensive experimental results demonstrate that our method achieves state-of-the-art performance.

Related papers

FusionSegReID: Advancing Person Re-Identification with Multimodal Retrieval and Precise Segmentation [42.980289787679084]
Person re-identification (ReID) plays a critical role in applications like security surveillance and criminal investigations by matching individuals across large image galleries captured by non-overlapping cameras. Traditional ReID methods rely on unimodal inputs, typically images, but face limitations due to challenges like occlusions, lighting changes, and pose variations. This paper presents FusionSegReID, a multimodal model that combines both image and text inputs for enhanced ReID performance.
arXiv Detail & Related papers (2025-03-27T15:14:03Z)
CILP-FGDI: Exploiting Vision-Language Model for Generalizable Person Re-Identification [42.429118831928214]
We explore the use of CLIP (Contrastive Language-Image Pretraining), a vision-language model pretrained on large-scale image-text pairs to align visual and textual features. The adaptation of CLIP to the task presents two primary challenges: learning more fine-grained features to enhance discriminative ability, and learning more domain-invariant features to improve the model's generalization capabilities.
arXiv Detail & Related papers (2025-01-27T14:08:25Z)
Fusion from Decomposition: A Self-Supervised Approach for Image Fusion and Beyond [74.96466744512992]
The essence of image fusion is to integrate complementary information from source images. DeFusion++ produces versatile fused representations that can enhance the quality of image fusion and the effectiveness of downstream high-level vision tasks.
arXiv Detail & Related papers (2024-10-16T06:28:49Z)
Leveraging Multi-AI Agents for Cross-Domain Knowledge Discovery [0.0]
This study introduces a novel approach to cross-domain knowledge discovery through the deployment of multi-AI agents. Our findings demonstrate the superior capability of domain specific multi-AI agent system in identifying and bridging knowledge gaps.
arXiv Detail & Related papers (2024-04-12T14:50:41Z)
Robust Representation Learning for Unified Online Top-K Recommendation [39.12191494863331]
We propose a robust representation learning for the unified online top-k recommendation. Our approach constructs unified modeling in entity space to ensure data fairness. The proposed method has been successfully deployed online to serve real business scenarios.
arXiv Detail & Related papers (2023-10-24T03:42:20Z)
CLIP-Driven Fine-grained Text-Image Person Re-identification [50.94827165464813]
TIReID aims to retrieve the image corresponding to the given text query from a pool of candidate images. We propose a CLIP-driven Fine-grained information excavation framework (CFine) to fully utilize the powerful knowledge of CLIP for TIReID.
arXiv Detail & Related papers (2022-10-19T03:43:12Z)
Modeling Multiple Views via Implicitly Preserving Global Consistency and Local Complementarity [61.05259660910437]
We propose a global consistency and complementarity network (CoCoNet) to learn representations from multiple views. On the global stage, we reckon that the crucial knowledge is implicitly shared among views, and enhancing the encoder to capture such knowledge can improve the discriminability of the learned representations. Lastly on the local stage, we propose a complementarity-factor, which joints cross-view discriminative knowledge, and it guides the encoders to learn not only view-wise discriminability but also cross-view complementary information.
arXiv Detail & Related papers (2022-09-16T09:24:00Z)
Variational Distillation for Multi-View Learning [104.17551354374821]
We design several variational information bottlenecks to exploit two key characteristics for multi-view representation learning. Under rigorously theoretical guarantee, our approach enables IB to grasp the intrinsic correlation between observations and semantic labels.
arXiv Detail & Related papers (2022-06-20T03:09:46Z)
Learning Enriched Features for Fast Image Restoration and Enhancement [166.17296369600774]
This paper presents a holistic goal of maintaining spatially-precise high-resolution representations through the entire network. We learn an enriched set of features that combines contextual information from multiple scales, while simultaneously preserving the high-resolution spatial details. Our approach achieves state-of-the-art results for a variety of image processing tasks, including defocus deblurring, image denoising, super-resolution, and image enhancement.
arXiv Detail & Related papers (2022-04-19T17:59:45Z)
Nested Collaborative Learning for Long-Tailed Visual Recognition [71.6074806468641]
NCL consists of two core components, namely Nested Individual Learning (NIL) and Nested Balanced Online Distillation (NBOD) To learn representations more thoroughly, both NIL and NBOD are formulated in a nested way, in which the learning is conducted on not just all categories from a full perspective but some hard categories from a partial perspective. In the NCL, the learning from two perspectives is nested, highly related and complementary, and helps the network to capture not only global and robust features but also meticulous distinguishing ability.
arXiv Detail & Related papers (2022-03-29T08:55:39Z)
TransFuse: A Unified Transformer-based Image Fusion Framework using Self-supervised Learning [5.849513679510834]
Image fusion is a technique to integrate information from multiple source images with complementary information to improve the richness of a single image. Two-stage methods avoid the need of large amount of task-specific training data by training encoder-decoder network on large natural image datasets. We propose a destruction-reconstruction based self-supervised training scheme to encourage the network to learn task-specific features.
arXiv Detail & Related papers (2022-01-19T07:30:44Z)
Unleashing the Potential of Unsupervised Pre-Training with Intra-Identity Regularization for Person Re-Identification [10.045028405219641]
We design an Unsupervised Pre-training framework for ReID based on the contrastive learning (CL) pipeline, dubbed UP-ReID. We introduce an intra-identity (I$2$-)regularization in the UP-ReID, which is instantiated as two constraints coming from global image aspect and local patch aspect. Our UP-ReID pre-trained model can significantly benefit the downstream ReID fine-tuning and achieve state-of-the-art performance.
arXiv Detail & Related papers (2021-12-01T07:16:37Z)
Calibrated Feature Decomposition for Generalizable Person Re-Identification [82.64133819313186]
Calibrated Feature Decomposition (CFD) module focuses on improving the generalization capacity for person re-identification. A calibrated-and-standardized Batch normalization (CSBN) is designed to learn calibrated person representation.
arXiv Detail & Related papers (2021-11-27T17:12:43Z)
Multiple Domain Experts Collaborative Learning: Multi-Source Domain Generalization For Person Re-Identification [41.923753462539736]
We propose a novel training framework, named Multiple Domain Experts Collaborative Learning (MD-ExCo) The MD-ExCo consists of a universal expert and several domain experts. Experiments on DG-ReID benchmarks show that our MD-ExCo outperforms the state-of-the-art methods by a large margin.
arXiv Detail & Related papers (2021-05-26T06:38:23Z)
Deep Partial Multi-View Learning [94.39367390062831]
We propose a novel framework termed Cross Partial Multi-View Networks (CPM-Nets) We fifirst provide a formal defifinition of completeness and versatility for multi-view representation. We then theoretically prove the versatility of the learned latent representations.
arXiv Detail & Related papers (2020-11-12T02:29:29Z)
Gait Recognition using Multi-Scale Partial Representation Transformation with Capsules [22.99694601595627]
We propose a novel deep network, learning to transfer multi-scale partial gait representations using capsules. Our network first obtains multi-scale partial representations using a state-of-the-art deep partial feature extractor. It then recurrently learns the correlations and co-occurrences of the patterns among the partial features in forward and backward directions.
arXiv Detail & Related papers (2020-10-18T19:47:38Z)
Self-Supervised Learning Across Domains [33.86614301708017]
We propose to apply a similar approach to the problem of object recognition across domains. Our model learns the semantic labels in a supervised fashion, and broadens its understanding of the data by learning from self-supervised signals on the same images. This secondary task helps the network to focus on object shapes, learning concepts like spatial orientation and part correlation, while acting as a regularizer for the classification task.
arXiv Detail & Related papers (2020-07-24T06:19:53Z)
Learning Enriched Features for Real Image Restoration and Enhancement [166.17296369600774]
convolutional neural networks (CNNs) have achieved dramatic improvements over conventional approaches for image restoration task. We present a novel architecture with the collective goals of maintaining spatially-precise high-resolution representations through the entire network. Our approach learns an enriched set of features that combines contextual information from multiple scales, while simultaneously preserving the high-resolution spatial details.
arXiv Detail & Related papers (2020-03-15T11:04:30Z)
Cross-modality Person re-identification with Shared-Specific Feature Transfer [112.60513494602337]
Cross-modality person re-identification (cm-ReID) is a challenging but key technology for intelligent video analysis. We propose a novel cross-modality shared-specific feature transfer algorithm (termed cm-SSFT) to explore the potential of both the modality-shared information and the modality-specific characteristics.
arXiv Detail & Related papers (2020-02-28T00:18:45Z)
Cross-Resolution Adversarial Dual Network for Person Re-Identification and Beyond [59.149653740463435]
Person re-identification (re-ID) aims at matching images of the same person across camera views. Due to varying distances between cameras and persons of interest, resolution mismatch can be expected. We propose a novel generative adversarial network to address cross-resolution person re-ID.
arXiv Detail & Related papers (2020-02-19T07:21:38Z)

This list is automatically generated from the titles and abstracts of the papers in this site.