Related papers: Transformer for Object Re-Identification: A Survey

Transformer for Object Re-Identification: A Survey

URL: http://arxiv.org/abs/2401.06960v1
Date: Sat, 13 Jan 2024 03:17:57 GMT
Title: Transformer for Object Re-Identification: A Survey
Authors: Mang Ye, Shuoyi Chen, Chenyue Li, Wei-Shi Zheng, David Crandall, Bo Du
Abstract summary: This paper provides a comprehensive review and in-depth analysis of the Transformer-based Re-ID. Considering the trending unsupervised Re-ID, we propose a new Transformer baseline, UntransReID. This survey also covers a wide range of Re-ID research objects, including progress in animal Re-ID.
Score: 73.10634142016542
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Object Re-Identification (Re-ID) aims to identify and retrieve specific objects from varying viewpoints. For a prolonged period, this field has been predominantly driven by deep convolutional neural networks. In recent years, the Transformer has witnessed remarkable advancements in computer vision, prompting an increasing body of research to delve into the application of Transformer in Re-ID. This paper provides a comprehensive review and in-depth analysis of the Transformer-based Re-ID. In categorizing existing works into Image/Video-Based Re-ID, Re-ID with limited data/annotations, Cross-Modal Re-ID, and Special Re-ID Scenarios, we thoroughly elucidate the advantages demonstrated by the Transformer in addressing a multitude of challenges across these domains. Considering the trending unsupervised Re-ID, we propose a new Transformer baseline, UntransReID, achieving state-of-the-art performance on both single-/cross modal tasks. Besides, this survey also covers a wide range of Re-ID research objects, including progress in animal Re-ID. Given the diversity of species in animal Re-ID, we devise a standardized experimental benchmark and conduct extensive experiments to explore the applicability of Transformer for this task to facilitate future research. Finally, we discuss some important yet under-investigated open issues in the big foundation model era, we believe it will serve as a new handbook for researchers in this field.

Related papers

PAT++: a cautionary tale about generative visual augmentation for Object Re-identification [0.0]
We assess the effectiveness of identity-preserving image generation for object re-identification.<n>Our results show consistent performance degradation, driven by domain shifts and failure to retain identity-defining features.<n>These findings challenge assumptions about the transferability of generative models to fine-grained recognition tasks.
arXiv Detail & Related papers (2025-07-19T15:01:05Z)
Towards Global Localization using Multi-Modal Object-Instance Re-Identification [23.764646800085977]
We propose a novel re-identification transformer architecture that integrates multimodal RGB and depth information. We demonstrate improvements in ReID across scenes that are cluttered or have varying illumination conditions. We also develop a ReID-based localization framework that enables accurate camera localization and pose identification across different viewpoints.
arXiv Detail & Related papers (2024-09-18T14:15:10Z)
Other Tokens Matter: Exploring Global and Local Features of Vision Transformers for Object Re-Identification [63.147482497821166]
We first explore the influence of global and local features of ViT and then propose a novel Global-Local Transformer (GLTrans) for high-performance object Re-ID. Our proposed method achieves superior performance on four object Re-ID benchmarks.
arXiv Detail & Related papers (2024-04-23T12:42:07Z)
A Comprehensive Survey on Applications of Transformers for Deep Learning Tasks [60.38369406877899]
Transformer is a deep neural network that employs a self-attention mechanism to comprehend the contextual relationships within sequential data. transformer models excel in handling long dependencies between input sequence elements and enable parallel processing. Our survey encompasses the identification of the top five application domains for transformer-based models.
arXiv Detail & Related papers (2023-06-11T23:13:51Z)
Object Detection with Transformers: A Review [11.255962936937744]
This paper provides a comprehensive review of 21 recently proposed advancements in the original DETR model. We conduct a comparative analysis across various detection transformers, evaluating their performance and network architectures. We hope that this study will ignite further interest among researchers in addressing the existing challenges and exploring the application of transformers in the object detection domain.
arXiv Detail & Related papers (2023-06-07T16:13:38Z)
Transformers in Medical Imaging: A Survey [88.03790310594533]
Transformers have been successfully applied to several computer vision problems, achieving state-of-the-art results. Medical imaging has also witnessed growing interest for Transformers that can capture global context compared to CNNs with local receptive fields. We provide a review of the applications of Transformers in medical imaging covering various aspects, ranging from recently proposed architectural designs to unsolved issues.
arXiv Detail & Related papers (2022-01-24T18:50:18Z)
HAT: Hierarchical Aggregation Transformers for Person Re-identification [87.02828084991062]
We take advantages of both CNNs and Transformers for image-based person Re-ID with high performance. Work is the first to take advantages of both CNNs and Transformers for image-based person Re-ID.
arXiv Detail & Related papers (2021-07-13T09:34:54Z)
Spatiotemporal Transformer for Video-based Person Re-identification [102.58619642363958]
We show that, despite the strong learning ability, the vanilla Transformer suffers from an increased risk of over-fitting. We propose a novel pipeline where the model is pre-trained on a set of synthesized video data and then transferred to the downstream domains. The derived algorithm achieves significant accuracy gain on three popular video-based person re-identification benchmarks.
arXiv Detail & Related papers (2021-03-30T16:19:27Z)
TransReID: Transformer-based Object Re-Identification [20.02035310635418]
Vision Transformer (ViT) is a pure transformer-based model for the object re-identification (ReID) task. With several adaptations, a strong baseline ViT-BoT is constructed with ViT as backbone. We propose a pure-transformer framework dubbed as TransReID, which is the first work to use a pure Transformer for ReID research.
arXiv Detail & Related papers (2021-02-08T17:33:59Z)

This list is automatically generated from the titles and abstracts of the papers in this site.

This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.