Other Tokens Matter: Exploring Global and Local Features of Vision Transformers for Object Re-Identification
- URL: http://arxiv.org/abs/2404.14985v1
- Date: Tue, 23 Apr 2024 12:42:07 GMT
- Title: Other Tokens Matter: Exploring Global and Local Features of Vision Transformers for Object Re-Identification
- Authors: Yingquan Wang, Pingping Zhang, Dong Wang, Huchuan Lu
- Abstract summary: We first explore the influence of global and local features of ViT and then propose a novel Global-Local Transformer (GLTrans) for high-performance object Re-ID.
Our proposed method achieves superior performance on four object Re-ID benchmarks.
- Score: 63.147482497821166
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Object Re-Identification (Re-ID) aims to identify and retrieve specific objects from images captured at different places and times. Recently, object Re-ID has achieved great success with the advances of Vision Transformers (ViT). However, the effects of the global-local relation have not been fully explored in Transformers for object Re-ID. In this work, we first explore the influence of global and local features of ViT and then propose a novel Global-Local Transformer (GLTrans) for high-performance object Re-ID. We find that the features from the last few layers of ViT already have strong representational ability, and that the global and local information can mutually enhance each other. Based on this observation, we propose a Global Aggregation Encoder (GAE) that utilizes the class tokens of the last few Transformer layers to learn comprehensive global features effectively. Meanwhile, we propose Local Multi-layer Fusion (LMF), which leverages both the global cues from GAE and multi-layer patch tokens to explore discriminative local representations. Extensive experiments demonstrate that our proposed method achieves superior performance on four object Re-ID benchmarks.
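The abstract's two components can be illustrated with a minimal sketch: GAE aggregates the class tokens of the last few layers into a global feature, and LMF fuses multi-layer patch tokens guided by that global cue. The specific aggregation and fusion operators below (mean pooling and similarity-weighted pooling) are simplifying assumptions for illustration, not the paper's exact design:

```python
import numpy as np

def global_aggregation(class_tokens, k=4):
    # Sketch of the Global Aggregation Encoder (GAE): combine the
    # [CLS] tokens of the last k layers. Mean pooling is an assumed
    # stand-in for the paper's learned aggregation.
    # class_tokens: (num_layers, dim)
    return class_tokens[-k:].mean(axis=0)

def local_multilayer_fusion(patch_tokens, global_feat, k=4):
    # Sketch of Local Multi-layer Fusion (LMF): fuse multi-layer patch
    # tokens, then reweight each patch by its similarity to the global
    # cue from GAE (a hypothetical softmax-attention fusion).
    # patch_tokens: (num_layers, num_patches, dim)
    fused = patch_tokens[-k:].mean(axis=0)         # (num_patches, dim)
    sim = fused @ global_feat                      # (num_patches,)
    weights = np.exp(sim - sim.max())
    weights /= weights.sum()                       # softmax over patches
    return (weights[:, None] * fused).sum(axis=0)  # (dim,)

# Example with ViT-Base-like shapes: 12 layers, 196 patches, dim 768.
rng = np.random.default_rng(0)
cls = rng.standard_normal((12, 768))
patches = rng.standard_normal((12, 196, 768))
g = global_aggregation(cls)
l = local_multilayer_fusion(patches, g)
print(g.shape, l.shape)  # (768,) (768,)
```

Both outputs are single feature vectors of the embedding dimension, which could then be concatenated or jointly supervised as the Re-ID descriptor.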
Related papers
- Towards Global Localization using Multi-Modal Object-Instance Re-Identification [23.764646800085977]
We propose a novel re-identification transformer architecture that integrates multimodal RGB and depth information.
We demonstrate improvements in ReID across scenes that are cluttered or have varying illumination conditions.
We also develop a ReID-based localization framework that enables accurate camera localization and pose identification across different viewpoints.
arXiv Detail & Related papers (2024-09-18T14:15:10Z) - Leveraging Swin Transformer for Local-to-Global Weakly Supervised Semantic Segmentation [12.103012959947055]
This work explores the use of Swin Transformer by proposing "SWTformer" to enhance the accuracy of the initial seed CAMs.
SWTformer-V1 achieves a 0.98% mAP higher localization accuracy, outperforming state-of-the-art models.
SWTformer-V2 incorporates a multi-scale feature fusion mechanism to extract additional information.
arXiv Detail & Related papers (2024-01-31T13:41:17Z) - Transformer for Object Re-Identification: A Survey [69.61542572894263]
Vision Transformers have spurred a growing number of studies delving deeper into Transformer-based Re-ID.
This paper provides a comprehensive review and in-depth analysis of the Transformer-based Re-ID.
Considering the trending unsupervised Re-ID, we propose a new Transformer baseline, UntransReID, achieving state-of-the-art performance.
arXiv Detail & Related papers (2024-01-13T03:17:57Z) - Part-Aware Transformer for Generalizable Person Re-identification [138.99827526048205]
Domain generalization person re-identification (DG-ReID) aims to train a model on source domains and generalize well on unseen domains.
We propose a pure Transformer model (termed Part-aware Transformer) for DG-ReID by designing a proxy task, named Cross-ID Similarity Learning (CSL)
This proxy task allows the model to learn generic features because it only cares about the visual similarity of the parts regardless of the ID labels.
arXiv Detail & Related papers (2023-08-07T06:15:51Z) - MOST: Multiple Object localization with Self-supervised Transformers for object discovery [97.47075050779085]
We present Multiple Object localization with Self-supervised Transformers (MOST)
MOST uses features of transformers trained using self-supervised learning to localize multiple objects in real world images.
We show MOST can be used for self-supervised pre-training of object detectors, and yields consistent improvements on fully, semi-supervised object detection and unsupervised region proposal generation.
arXiv Detail & Related papers (2023-04-11T17:57:27Z) - GPViT: A High Resolution Non-Hierarchical Vision Transformer with Group Propagation [25.689520892609213]
We present a novel non-hierarchical (i.e. non-pyramidal) transformer model for general visual recognition with high-resolution features.
We evaluate GPViT on a variety of visual recognition tasks including image classification, semantic segmentation, object detection, and instance segmentation.
arXiv Detail & Related papers (2022-12-13T18:26:00Z) - TransVPR: Transformer-based place recognition with multi-level attention
aggregation [9.087163485833058]
We introduce a novel holistic place recognition model, TransVPR, based on vision Transformers.
TransVPR achieves state-of-the-art performance on several real-world benchmarks.
arXiv Detail & Related papers (2022-01-06T10:20:24Z) - Unifying Global-Local Representations in Salient Object Detection with Transformer [55.23033277636774]
We introduce a new attention-based encoder, vision transformer, into salient object detection.
With the global view in very shallow layers, the transformer encoder preserves more local representations.
Our method significantly outperforms other FCN-based and transformer-based methods in five benchmarks.
arXiv Detail & Related papers (2021-08-05T17:51:32Z) - HAT: Hierarchical Aggregation Transformers for Person Re-identification [87.02828084991062]
We take advantage of both CNNs and Transformers for image-based person Re-ID with high performance.
This work is the first to exploit both CNNs and Transformers for image-based person Re-ID.
arXiv Detail & Related papers (2021-07-13T09:34:54Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.