Uncertainty-Aware Multi-Shot Knowledge Distillation for Image-Based
Object Re-Identification
- URL: http://arxiv.org/abs/2001.05197v2
- Date: Tue, 21 Jan 2020 17:21:07 GMT
- Title: Uncertainty-Aware Multi-Shot Knowledge Distillation for Image-Based
Object Re-Identification
- Authors: Xin Jin, Cuiling Lan, Wenjun Zeng, Zhibo Chen
- Abstract summary: We propose exploiting the multi-shots of the same identity to guide the feature learning of each individual image.
It consists of a teacher network (T-net) that learns the comprehensive features from multiple images of the same object, and a student network (S-net) that takes a single image as input.
We validate the effectiveness of our approach on the popular vehicle re-id and person re-id datasets.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Object re-identification (re-id) aims to identify a specific object across
times or camera views, with the person re-id and vehicle re-id as the most
widely studied applications. Re-id is challenging because of the variations in
viewpoints, (human) poses, and occlusions. Multi-shots of the same object can
cover diverse viewpoints/poses and thus provide more comprehensive information.
In this paper, we propose exploiting the multi-shots of the same identity to
guide the feature learning of each individual image. Specifically, we design an
Uncertainty-aware Multi-shot Teacher-Student (UMTS) Network. It consists of a
teacher network (T-net) that learns the comprehensive features from multiple
images of the same object, and a student network (S-net) that takes a single
image as input. In particular, we take into account the data dependent
heteroscedastic uncertainty for effectively transferring the knowledge from the
T-net to S-net. To the best of our knowledge, we are the first to make use of
multi-shots of an object in a teacher-student learning manner for effectively
boosting the single image based re-id. We validate the effectiveness of our
approach on the popular vehicle re-id and person re-id datasets. At inference,
the S-net alone significantly outperforms the baselines and achieves
state-of-the-art performance.
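The abstract does not spell out how the data-dependent heteroscedastic uncertainty enters the transfer loss, but a common formulation of uncertainty-weighted feature distillation down-weights the teacher-student feature distance by a predicted per-sample log-variance and penalizes large predicted variance. The following is a minimal sketch of that idea; the function name, the NumPy implementation, and the assumption that the network predicts a per-sample log-variance are illustrative, not the authors' actual code:

```python
import numpy as np

def uncertainty_distillation_loss(f_teacher, f_student, log_var):
    """Heteroscedastic-uncertainty-weighted distillation loss (sketch).

    f_teacher, f_student: (N, D) feature matrices from T-net and S-net.
    log_var: (N,) predicted per-sample log-variance.

    The exp(-log_var) factor shrinks the penalty on samples the model
    deems uncertain; the +log_var term keeps it from declaring every
    sample maximally uncertain to drive the first term to zero.
    """
    sq_dist = np.sum((f_teacher - f_student) ** 2, axis=1)  # per-sample squared distance
    return float(np.mean(np.exp(-log_var) * sq_dist + log_var))
```

With `log_var = 0` this reduces to the plain mean squared feature distance, so the uncertainty weighting can be seen as a learned, per-sample relaxation of ordinary feature distillation.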
Related papers
- Synthesizing Efficient Data with Diffusion Models for Person Re-Identification Pre-Training [51.87027943520492]
We present a novel paradigm Diffusion-ReID to efficiently augment and generate diverse images based on known identities.
Benefiting from our proposed paradigm, we first create a new large-scale person Re-ID dataset Diff-Person, which consists of over 777K images from 5,183 identities.
arXiv Detail & Related papers (2024-06-10T06:26:03Z)
- Learning Transferable Pedestrian Representation from Multimodal Information Supervision [174.5150760804929]
VAL-PAT is a novel framework that learns transferable representations to enhance various pedestrian analysis tasks with multimodal information.
We first perform pre-training on LUPerson-TA dataset, where each image contains text and attribute annotations.
We then transfer the learned representations to various downstream tasks, including person reID, person attribute recognition and text-based person search.
arXiv Detail & Related papers (2023-04-12T01:20:58Z)
- Learning Invariance from Generated Variance for Unsupervised Person Re-identification [15.096776375794356]
We propose to replace traditional data augmentation with a generative adversarial network (GAN).
A 3D mesh guided person image generator is proposed to disentangle a person image into id-related and id-unrelated features.
By jointly training the generative and the contrastive modules, our method achieves new state-of-the-art unsupervised person ReID performance on mainstream large-scale benchmarks.
arXiv Detail & Related papers (2023-01-02T15:40:14Z)
- Feature Disentanglement Learning with Switching and Aggregation for Video-based Person Re-Identification [9.068045610800667]
In video person re-identification (Re-ID), the network must consistently extract features of the target person from successive frames.
Existing methods tend to focus only on how to use temporal information, which often leads to networks being fooled by similar appearances and identical backgrounds.
We propose a Disentanglement and Switching and Aggregation Network (DSANet), which segregates the features representing identity and features based on camera characteristics, and pays more attention to ID information.
arXiv Detail & Related papers (2022-12-16T04:27:56Z)
- Semantic-Aware Generation for Self-Supervised Visual Representation Learning [116.5814634936371]
We advocate for Semantic-aware Generation (SaGe) to facilitate richer semantics rather than details to be preserved in the generated image.
SaGe complements the target network with view-specific features and thus alleviates the semantic degradation brought by intensive data augmentations.
We execute SaGe on ImageNet-1K and evaluate the pre-trained models on five downstream tasks, including nearest neighbor test, linear classification, and fine-grained image recognition.
arXiv Detail & Related papers (2021-11-25T16:46:13Z)
- Pose-driven Attention-guided Image Generation for Person Re-Identification [39.605062525247135]
We propose an end-to-end pose-driven generative adversarial network to generate multiple poses of a person.
A semantic-consistency loss is proposed to preserve the semantic information of the person during pose transfer.
We show that by incorporating the proposed approach in a person re-identification framework, realistic pose transferred images and state-of-the-art re-identification results can be achieved.
arXiv Detail & Related papers (2021-04-28T14:02:24Z)
- Person image generation with semantic attention network for person re-identification [9.30413920076019]
We propose a novel person pose-guided image generation method, which is called the semantic attention network.
The network consists of several semantic attention blocks, where each block attends to preserve and update the pose code and the clothing textures.
Compared with other methods, our network better characterizes body shape while simultaneously preserving clothing attributes.
arXiv Detail & Related papers (2020-08-18T12:18:51Z)
- Exploit Clues from Views: Self-Supervised and Regularized Learning for Multiview Object Recognition [66.87417785210772]
This work investigates the problem of multiview self-supervised learning (MV-SSL).
A novel surrogate task for self-supervised learning is proposed by pursuing "object invariant" representation.
Experiments show that the recognition and retrieval results using view-invariant prototype embedding (VISPE) outperform other self-supervised learning methods.
arXiv Detail & Related papers (2020-03-28T07:06:06Z)
- Intra-Camera Supervised Person Re-Identification [87.88852321309433]
We propose a novel person re-identification paradigm based on an idea of independent per-camera identity annotation.
This eliminates the most time-consuming and tedious inter-camera identity labelling process.
We formulate a Multi-tAsk mulTi-labEl (MATE) deep learning method for Intra-Camera Supervised (ICS) person re-id.
arXiv Detail & Related papers (2020-02-12T15:26:33Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this list (including all information) and is not responsible for any consequences of its use.