A Versatile Framework for Multi-scene Person Re-identification
- URL: http://arxiv.org/abs/2403.11121v1
- Date: Sun, 17 Mar 2024 07:04:09 GMT
- Title: A Versatile Framework for Multi-scene Person Re-identification
- Authors: Wei-Shi Zheng, Junkai Yan, Yi-Xing Peng,
- Abstract summary: Person Re-identification (ReID) has been extensively developed for a decade in order to learn the association of images of the same person across non-overlapping camera views.
Despite the impressive performance of many ReID variants, these variants typically function distinctly and cannot be applied to other challenges.
This work contributes to the first attempt at learning a versatile ReID model to solve such a problem.
- Score: 30.74494316484783
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Person Re-identification (ReID) has been extensively developed for a decade in order to learn the association of images of the same person across non-overlapping camera views. To overcome significant variations between images across camera views, mountains of variants of ReID models were developed for solving a number of challenges, such as resolution change, clothing change, occlusion, modality change, and so on. Despite the impressive performance of many ReID variants, these variants typically function distinctly and cannot be applied to other challenges. To our best knowledge, there is no versatile ReID model that can handle various ReID challenges at the same time. This work contributes to the first attempt at learning a versatile ReID model to solve such a problem. Our main idea is to form a two-stage prompt-based twin modeling framework called VersReID. Our VersReID firstly leverages the scene label to train a ReID Bank that contains abundant knowledge for handling various scenes, where several groups of scene-specific prompts are used to encode different scene-specific knowledge. In the second stage, we distill a V-Branch model with versatile prompts from the ReID Bank for adaptively solving the ReID of different scenes, eliminating the demand for scene labels during the inference stage. To facilitate training VersReID, we further introduce the multi-scene properties into self-supervised learning of ReID via a multi-scene prioris data augmentation (MPDA) strategy. Through extensive experiments, we demonstrate the success of learning an effective and versatile ReID model for handling ReID tasks under multi-scene conditions without manual assignment of scene labels in the inference stage, including general, low-resolution, clothing change, occlusion, and cross-modality scenes. Codes and models are available at https://github.com/iSEE-Laboratory/VersReID.
Related papers
- One Diffusion to Generate Them All [54.82732533013014]
OneDiffusion is a versatile, large-scale diffusion model that supports bidirectional image synthesis and understanding.
It enables conditional generation from inputs such as text, depth, pose, layout, and semantic maps.
OneDiffusion allows for multi-view generation, camera pose estimation, and instant personalization using sequential image inputs.
arXiv Detail & Related papers (2024-11-25T12:11:05Z) - All in One Framework for Multimodal Re-identification in the Wild [58.380708329455466]
multimodal learning paradigm for ReID introduced, referred to as All-in-One (AIO)
AIO harnesses a frozen pre-trained big model as an encoder, enabling effective multimodal retrieval without additional fine-tuning.
Experiments on cross-modal and multimodal ReID reveal that AIO not only adeptly handles various modal data but also excels in challenging contexts.
arXiv Detail & Related papers (2024-05-08T01:04:36Z) - Magic Tokens: Select Diverse Tokens for Multi-modal Object Re-Identification [64.36210786350568]
We propose a novel learning framework named textbfEDITOR to select diverse tokens from vision Transformers for multi-modal object ReID.
Our framework can generate more discriminative features for multi-modal object ReID.
arXiv Detail & Related papers (2024-03-15T12:44:35Z) - Multi-task Image Restoration Guided By Robust DINO Features [88.74005987908443]
We propose mboxtextbfDINO-IR, a multi-task image restoration approach leveraging robust features extracted from DINOv2.
We first propose a pixel-semantic fusion (PSF) module to dynamically fuse DINOV2's shallow features.
By formulating these modules into a unified deep model, we propose a DINO perception contrastive loss to constrain the model training.
arXiv Detail & Related papers (2023-12-04T06:59:55Z) - Unleashing the Potential of Unsupervised Pre-Training with
Intra-Identity Regularization for Person Re-Identification [10.045028405219641]
We design an Unsupervised Pre-training framework for ReID based on the contrastive learning (CL) pipeline, dubbed UP-ReID.
We introduce an intra-identity (I$2$-)regularization in the UP-ReID, which is instantiated as two constraints coming from global image aspect and local patch aspect.
Our UP-ReID pre-trained model can significantly benefit the downstream ReID fine-tuning and achieve state-of-the-art performance.
arXiv Detail & Related papers (2021-12-01T07:16:37Z) - Learning to Disentangle Scenes for Person Re-identification [15.378033331385312]
This paper proposes to divide-and-conquer the person re-identification (ReID) task.
We employ several self-supervision operations to simulate different challenging problems and handle each challenging problem using different networks.
A general multi-branch network, including one master branch and two servant branches, is introduced to handle different scenes.
arXiv Detail & Related papers (2021-11-10T01:17:10Z) - Apparel-invariant Feature Learning for Apparel-changed Person
Re-identification [70.16040194572406]
Most public ReID datasets are collected in a short time window in which persons' appearance rarely changes.
In real-world applications such as in a shopping mall, the same person's clothing may change, and different persons may wearing similar clothes.
It is critical to learn an apparel-invariant person representation under cases like cloth changing or several persons wearing similar clothes.
arXiv Detail & Related papers (2020-08-14T03:49:14Z) - Cross-Resolution Adversarial Dual Network for Person Re-Identification
and Beyond [59.149653740463435]
Person re-identification (re-ID) aims at matching images of the same person across camera views.
Due to varying distances between cameras and persons of interest, resolution mismatch can be expected.
We propose a novel generative adversarial network to address cross-resolution person re-ID.
arXiv Detail & Related papers (2020-02-19T07:21:38Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.