Image-Text-Image Knowledge Transferring for Lifelong Person Re-Identification with Hybrid Clothing States
- URL: http://arxiv.org/abs/2405.16600v1
- Date: Sun, 26 May 2024 15:25:26 GMT
- Title: Image-Text-Image Knowledge Transferring for Lifelong Person Re-Identification with Hybrid Clothing States
- Authors: Qizao Wang, Xuelin Qian, Bin Li, Yanwei Fu, Xiangyang Xue
- Abstract summary: We propose a more practical task, namely lifelong person re-identification with hybrid clothing states.
We take a series of cloth-changing and cloth-consistent domains into account during lifelong learning.
We propose a novel framework, dubbed $Teata$, to effectively align, transfer and accumulate knowledge in an "image-text-image" closed loop.
- Score: 78.52704557647438
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: With the continuous expansion of intelligent surveillance networks, lifelong person re-identification (LReID) has received widespread attention, addressing the need for self-evolution across different domains. However, existing LReID studies accumulate knowledge under the assumption that people do not change their clothes. In this paper, we propose a more practical task, namely lifelong person re-identification with hybrid clothing states (LReID-Hybrid), which takes a series of cloth-changing and cloth-consistent domains into account during lifelong learning. To tackle the challenges of knowledge granularity mismatch and knowledge presentation mismatch that occur in LReID-Hybrid, we take advantage of the consistency and generalization of the text space, and propose a novel framework, dubbed $Teata$, to effectively align, transfer and accumulate knowledge in an "image-text-image" closed loop. Concretely, to achieve effective knowledge transfer, we design Structured Semantic Prompt (SSP) learning, which decomposes the text prompt into several structured pairs to distill knowledge from the image space with a unified granularity of text description. Then, we introduce a Knowledge Adaptation and Projection (KAP) strategy, which tunes text knowledge via a slow-paced learner to adapt to different tasks without catastrophic forgetting. Extensive experiments demonstrate the superiority of our proposed $Teata$ over advanced methods on LReID-Hybrid as well as on conventional LReID benchmarks.
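The abstract does not say how the "slow-paced learner" in KAP is implemented. As a generic illustration only, one common way to make a set of parameters change slowly is an exponential moving average (EMA) toward a faster-moving copy. The sketch below shows that swapped-in technique, not the $Teata$ method itself; `ema_update`, `slow_params`, `fast_params` and the momentum value are hypothetical names and settings chosen for the example.

```python
def ema_update(slow, fast, momentum=0.99):
    """Move each slow parameter a small step toward its fast counterpart."""
    return [momentum * s + (1.0 - momentum) * f for s, f in zip(slow, fast)]


# Hypothetical toy parameters (assumption, not from the paper):
# the slow copy stands in for accumulated knowledge, the fast copy
# for weights fitted to the newest domain.
slow_params = [0.0, 0.0]
fast_params = [1.0, -2.0]

# After n updates the slow copy has covered a fraction 1 - momentum**n
# of the gap to the fast copy, so it adapts without jumping.
for _ in range(10):
    slow_params = ema_update(slow_params, fast_params, momentum=0.9)
```

A momentum closer to 1 makes the slow copy more conservative, which is the usual trade-off between retaining old knowledge and adapting to a new domain.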
Related papers
- Dynamic Textual Prompt For Rehearsal-free Lifelong Person Re-identification [30.782126710974165]
Lifelong person re-identification attempts to recognize people across cameras and integrate new knowledge from continuous data streams.
Key challenges involve addressing catastrophic forgetting caused by parameter updating and domain shift.
We propose using textual descriptions as guidance to encourage the ReID model to learn cross-domain invariant features without retaining samples.
arXiv Detail & Related papers (2024-11-09T00:57:19Z)
- Towards Unified Multimodal Editing with Enhanced Knowledge Collaboration [107.31481207855835]
Current methods, including intrinsic knowledge editing and external knowledge resorting, each possess strengths and weaknesses.
We propose UniKE, a novel multimodal editing method that establishes a unified perspective for intrinsic knowledge editing and external knowledge resorting.
arXiv Detail & Related papers (2024-09-30T02:13:53Z)
- Revolutionizing Text-to-Image Retrieval as Autoregressive Token-to-Voken Generation [90.71613903956451]
Text-to-image retrieval is a fundamental task in multimedia processing.
We propose an autoregressive voken generation method, named AVG.
We show that AVG achieves superior results in both effectiveness and efficiency.
arXiv Detail & Related papers (2024-07-24T13:39:51Z)
- Auto-selected Knowledge Adapters for Lifelong Person Re-identification [54.42307214981537]
Lifelong Person Re-Identification requires systems to continually learn from non-overlapping datasets across different times and locations.
Existing approaches, either rehearsal-free or rehearsal-based, still suffer from the problem of catastrophic forgetting.
We introduce a novel framework, AdalReID, that adopts knowledge adapters and a parameter-free auto-selection mechanism for lifelong learning.
arXiv Detail & Related papers (2024-05-29T11:42:02Z)
- Adaptive Prompt Learning with Distilled Connective Knowledge for Implicit Discourse Relation Recognition [18.42715011594281]
Implicit discourse relation recognition (IDRR) aims at recognizing the discourse relation between two text segments without an explicit connective.
We propose a continuous version of prompt learning together with connective knowledge distillation, called AdaptPrompt, to reduce manual design efforts via continuous prompting.
We also design an answer-relation mapping rule to generate a few virtual answers as the answer space.
arXiv Detail & Related papers (2023-09-14T09:44:46Z)
- Combo of Thinking and Observing for Outside-Knowledge VQA [13.838435454270014]
Outside-knowledge visual question answering is a challenging task that requires both the acquisition and the use of open-ended real-world knowledge.
In this paper, we are inspired to constrain the cross-modality space to the natural-language space.
We propose a novel framework consisting of a multimodal encoder, a textual encoder and an answer decoder.
arXiv Detail & Related papers (2023-05-10T18:32:32Z)
- Learning Knowledge Representation with Meta Knowledge Distillation for Single Image Super-Resolution [82.89021683451432]
We propose a model-agnostic meta knowledge distillation method under the teacher-student architecture for the single image super-resolution task.
Experiments conducted on various single image super-resolution datasets demonstrate that our proposed method outperforms existing distillation methods that rely on predefined knowledge representations.
arXiv Detail & Related papers (2022-07-18T02:41:04Z)
- Uncertainty-Aware Multi-Shot Knowledge Distillation for Image-Based Object Re-Identification [93.39253443415392]
We propose exploiting multiple shots of the same identity to guide the feature learning of each individual image.
It consists of a teacher network (T-net) that learns the comprehensive features from multiple images of the same object, and a student network (S-net) that takes a single image as input.
We validate the effectiveness of our approach on the popular vehicle re-id and person re-id datasets.
arXiv Detail & Related papers (2020-01-15T09:39:05Z)
This list is automatically generated from the titles and abstracts of the papers on this site.