Transformer Based Multi-Grained Features for Unsupervised Person
Re-Identification
- URL: http://arxiv.org/abs/2211.12280v1
- Date: Tue, 22 Nov 2022 13:51:17 GMT
- Title: Transformer Based Multi-Grained Features for Unsupervised Person
Re-Identification
- Authors: Jiachen Li, Menglin Wang, Xiaojin Gong
- Abstract summary: We build a dual-branch network architecture based upon a modified Vision Transformer (ViT).
Local tokens output in each branch are reshaped and then uniformly partitioned into multiple stripes to generate part-level features.
Global tokens of two branches are averaged to produce a global feature.
- Score: 9.874360118638918
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Multi-grained features extracted from convolutional neural networks (CNNs)
have demonstrated their strong discrimination ability in supervised person
re-identification (Re-ID) tasks. Inspired by them, this work investigates how
to extract multi-grained features from a pure transformer network to
address the unsupervised Re-ID problem, which is label-free but much more
challenging. To this end, we build a dual-branch network architecture based
upon a modified Vision Transformer (ViT). The local tokens output in each
branch are reshaped and then uniformly partitioned into multiple stripes to
generate part-level features, while the global tokens of two branches are
averaged to produce a global feature. Further, building on offline-online
associated camera-aware proxies (O2CAP), a top-performing unsupervised
Re-ID method, we define offline and online contrastive learning losses with
respect to both global and part-level features to conduct unsupervised
learning. Extensive experiments on three person Re-ID datasets show that the
proposed method outperforms state-of-the-art unsupervised methods by a
considerable margin, greatly mitigating the gap to supervised counterparts.
Code will be available soon at https://github.com/RikoLi/WACV23-workshop-TMGF.
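The feature construction described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the row-major token order, the horizontal orientation of the stripes, and the use of average pooling within each stripe are all assumptions made for illustration, and plain Python lists stand in for tensors.

```python
from typing import List

Vector = List[float]


def mean_vectors(vectors: List[Vector]) -> Vector:
    """Element-wise mean of equally sized vectors."""
    count = len(vectors)
    return [sum(column) / count for column in zip(*vectors)]


def part_features(local_tokens: List[Vector], grid_h: int, grid_w: int,
                  num_stripes: int) -> List[Vector]:
    """Treat a flat token sequence as a grid_h x grid_w grid (row-major),
    split its rows uniformly into horizontal stripes, and pool each stripe.

    Assumption: average pooling per stripe; the abstract does not say how
    the tokens inside a stripe are aggregated.
    """
    if len(local_tokens) != grid_h * grid_w:
        raise ValueError("token count must equal grid_h * grid_w")
    if grid_h % num_stripes != 0:
        raise ValueError("grid_h must be divisible by num_stripes")
    tokens_per_stripe = (grid_h // num_stripes) * grid_w
    # Row-major order means each horizontal stripe is a contiguous token run.
    return [
        mean_vectors(local_tokens[start:start + tokens_per_stripe])
        for start in range(0, len(local_tokens), tokens_per_stripe)
    ]


def global_feature(global_a: Vector, global_b: Vector) -> Vector:
    """Average the global tokens of the two branches."""
    return mean_vectors([global_a, global_b])


# Toy example: a 4 x 2 token grid with 2-dimensional tokens, split into 2 stripes.
tokens = [[float(i), 10.0 * i] for i in range(8)]
parts = part_features(tokens, grid_h=4, grid_w=2, num_stripes=2)
print(parts)  # [[1.5, 15.0], [5.5, 55.0]]
print(global_feature([1.0, 3.0], [3.0, 5.0]))  # [2.0, 4.0]
```

In this toy setup each branch would yield its own list of part-level features, while the two global tokens collapse into a single global feature.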
Related papers
- D$^3$: Scaling Up Deepfake Detection by Learning from Discrepancy [11.239248133240126]
We seek a step toward a universal deepfake detection system with better generalization and robustness.
We propose our Discrepancy Deepfake Detector framework, whose core idea is to learn the universal artifacts from multiple generators.
Our framework achieves a 5.3% accuracy improvement in the OOD testing compared to the current SOTA methods while maintaining the ID performance.
arXiv Detail & Related papers (2024-04-06T10:45:02Z)
- CMFDFormer: Transformer-based Copy-Move Forgery Detection with Continual Learning [52.72888626663642]
Copy-move forgery detection aims at detecting duplicated regions in a suspected forged image.
Deep learning based copy-move forgery detection methods are in the ascendant.
We propose a Transformer-style copy-move forgery network named CMFDFormer.
We also provide a novel PCSD continual learning framework to help CMFDFormer handle new tasks.
arXiv Detail & Related papers (2023-11-22T09:27:46Z)
- Multi-scale and Cross-scale Contrastive Learning for Semantic Segmentation [5.281694565226513]
We apply contrastive learning to enhance the discriminative power of the multi-scale features extracted by semantic segmentation networks.
By first mapping the encoder's multi-scale representations to a common feature space, we instantiate a novel form of supervised local-global constraint.
arXiv Detail & Related papers (2022-03-25T01:24:24Z)
- Offline-Online Associated Camera-Aware Proxies for Unsupervised Person Re-identification [31.065557919305892]
Unsupervised person re-identification (Re-ID) has received increasing research attention.
Most clustering-based methods take each cluster as a pseudo identity class.
We propose to split each single cluster into multiple proxies according to camera views.
arXiv Detail & Related papers (2022-01-15T10:12:03Z)
- HAT: Hierarchical Aggregation Transformers for Person Re-identification [87.02828084991062]
We take advantage of both CNNs and Transformers for image-based person Re-ID with high performance.
This work is the first to take advantage of both CNNs and Transformers for image-based person Re-ID.
arXiv Detail & Related papers (2021-07-13T09:34:54Z)
- Transformers Solve the Limited Receptive Field for Monocular Depth Prediction [82.90445525977904]
We propose TransDepth, an architecture which benefits from both convolutional neural networks and transformers.
This is the first paper that applies transformers to pixel-wise prediction problems involving continuous labels.
arXiv Detail & Related papers (2021-03-22T18:00:13Z)
- Unsupervised Pretraining for Object Detection by Patch Reidentification [72.75287435882798]
Unsupervised representation learning achieves promising performances in pre-training representations for object detectors.
This work proposes a simple yet effective representation learning method for object detection, named patch re-identification (Re-ID).
Our method significantly outperforms its counterparts on COCO in all settings, such as different training iterations and data percentages.
arXiv Detail & Related papers (2021-03-08T15:13:59Z)
- Dual-Refinement: Joint Label and Feature Refinement for Unsupervised Domain Adaptive Person Re-Identification [51.98150752331922]
Unsupervised domain adaptive (UDA) person re-identification (re-ID) is a challenging task due to the lack of labels for the target domain data.
We propose a novel approach, called Dual-Refinement, that jointly refines pseudo labels at the off-line clustering phase and features at the on-line training phase.
Our method outperforms the state-of-the-art methods by a large margin.
arXiv Detail & Related papers (2020-12-26T07:35:35Z)
- Recurrent Multi-view Alignment Network for Unsupervised Surface Registration [79.72086524370819]
Learning non-rigid registration in an end-to-end manner is challenging due to the inherent high degrees of freedom and the lack of labeled training data.
We propose to represent the non-rigid transformation with a point-wise combination of several rigid transformations.
We also introduce a differentiable loss function that measures the 3D shape similarity on the projected multi-view 2D depth images.
arXiv Detail & Related papers (2020-11-24T14:22:42Z)
This list is automatically generated from the titles and abstracts of the papers in this site.