TOP-ReID: Multi-spectral Object Re-Identification with Token Permutation
- URL: http://arxiv.org/abs/2312.09612v1
- Date: Fri, 15 Dec 2023 08:54:15 GMT
- Title: TOP-ReID: Multi-spectral Object Re-Identification with Token Permutation
- Authors: Yuhao Wang and Xuehu Liu and Pingping Zhang and Hu Lu and Zhengzheng
Tu and Huchuan Lu
- Abstract summary: We propose a cyclic token permutation framework for multi-spectral object ReID, dubbed TOP-ReID.
We also propose a Token Permutation Module (TPM) for cyclic multi-spectral feature aggregation.
Our proposed framework can generate more discriminative multi-spectral features for robust object ReID.
- Score: 64.65950381870742
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Multi-spectral object Re-identification (ReID) aims to retrieve specific
objects by leveraging complementary information from different image spectra.
It delivers great advantages over traditional single-spectral ReID in complex
visual environments. However, the significant distribution gap among different
image spectra poses great challenges for effective multi-spectral feature
representations. In addition, most current Transformer-based ReID methods
only utilize the global feature of class tokens to achieve the holistic
retrieval, ignoring the local discriminative ones. To address the above issues,
we step further to utilize all the tokens of Transformers and propose a cyclic
token permutation framework for multi-spectral object ReID, dubbed TOP-ReID.
More specifically, we first deploy a multi-stream deep network based on vision
Transformers to preserve distinct information from different image spectra.
Then, we propose a Token Permutation Module (TPM) for cyclic multi-spectral
feature aggregation. It not only facilitates the spatial feature alignment
across different image spectra, but also allows the class token of each
spectrum to perceive the local details of other spectra. Meanwhile, we propose
a Complementary Reconstruction Module (CRM), which introduces dense token-level
reconstruction constraints to reduce the distribution gap across different
image spectra. With the above modules, our proposed framework can generate more
discriminative multi-spectral features for robust object ReID. Extensive
experiments on three ReID benchmarks (i.e., RGBNT201, RGBNT100 and MSVR310)
verify the effectiveness of our methods. The code is available at
https://github.com/924973292/TOP-ReID.
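
The cyclic aggregation idea described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation (see the repository above for that). It assumes three spectra named RGB, NIR and TIR, a single-head dot-product attention without learned projections, and hypothetical helper names (`attend`, `cyclic_permute`, `aggregate`). It only shows how the class token of each spectrum could be paired, in a cycle, with the patch tokens of another spectrum.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query, tokens):
    """Single-head dot-product attention of one query over a list of tokens.

    Assumption: no learned projections; keys and values are the tokens themselves.
    """
    scale = math.sqrt(len(query))
    scores = [sum(q * t for q, t in zip(query, tok)) / scale for tok in tokens]
    weights = softmax(scores)
    return [sum(w * tok[d] for w, tok in zip(weights, tokens))
            for d in range(len(query))]


def cyclic_permute(spectra, shift=1):
    """Pair each spectrum's class token with the patch tokens of the next spectrum."""
    names = list(spectra)
    pairs = {}
    for i, name in enumerate(names):
        partner = names[(i + shift) % len(names)]
        pairs[name] = (spectra[name]["cls"], spectra[partner]["patches"], partner)
    return pairs


def aggregate(spectra):
    """Concatenate the attended outputs of every (class token, partner patches) pair."""
    fused = []
    for cls, patches, _partner in cyclic_permute(spectra).values():
        fused.extend(attend(cls, patches))
    return fused


# Toy 2-dimensional tokens; the values are made up for illustration only.
spectra = {
    "RGB": {"cls": [1.0, 0.0], "patches": [[1.0, 0.0], [0.0, 1.0]]},
    "NIR": {"cls": [0.0, 1.0], "patches": [[2.0, 0.0], [0.0, 2.0]]},
    "TIR": {"cls": [1.0, 1.0], "patches": [[3.0, 0.0], [0.0, 3.0]]},
}
pairing = {name: partner
           for name, (_, _, partner) in cyclic_permute(spectra).items()}
feature = aggregate(spectra)
```

With `shift=1` the pairing is RGB to NIR, NIR to TIR, and TIR to RGB, so every class token attends to local tokens from a different spectrum, and the concatenated result stands in for a fused multi-spectral descriptor.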
Related papers
- Magic Tokens: Select Diverse Tokens for Multi-modal Object Re-Identification [64.36210786350568]
We propose a novel learning framework named EDITOR to select diverse tokens from vision Transformers for multi-modal object ReID.
Our framework can generate more discriminative features for multi-modal object ReID.
arXiv Detail & Related papers (2024-03-15T12:44:35Z)
- SpectralGPT: Spectral Remote Sensing Foundation Model [60.023956954916414]
A universal RS foundation model, named SpectralGPT, is purpose-built to handle spectral RS images using a novel 3D generative pretrained transformer (GPT).
Compared to existing foundation models, SpectralGPT accommodates input images with varying sizes, resolutions, time series, and regions in a progressive training fashion, enabling full utilization of extensive RS big data.
Our evaluation highlights significant performance improvements with pretrained SpectralGPT models, signifying substantial potential in advancing spectral RS big data applications within the field of geoscience.
arXiv Detail & Related papers (2023-11-13T07:09:30Z)
- MultiScale Spectral-Spatial Convolutional Transformer for Hyperspectral Image Classification [9.051982753583232]
Transformer has become an alternative architecture of CNNs for hyperspectral image classification.
We propose a multiscale spectral-spatial convolutional Transformer (MultiscaleFormer) for hyperspectral image classification.
arXiv Detail & Related papers (2023-10-28T00:41:35Z)
- FMRT: Learning Accurate Feature Matching with Reconciliatory Transformer [29.95553680263075]
We propose Feature Matching with Reconciliatory Transformer (FMRT), a detector-free method that reconciles different features with multiple receptive fields adaptively.
FMRT yields extraordinary performance on multiple benchmarks, including pose estimation, visual localization, homography estimation, and image matching.
arXiv Detail & Related papers (2023-10-20T15:54:18Z)
- Multiview Transformer: Rethinking Spatial Information in Hyperspectral Image Classification [43.17196501332728]
Identifying the land cover category for each pixel in a hyperspectral image relies on spectral and spatial information.
In this article, we investigate how scene-specific but non-essential correlations may be recorded in an HSI cuboid.
We propose a multiview transformer for HSI classification, which consists of multiview principal component analysis (MPCA), spectral encoder-decoder (SED), and spatial-pooling tokenization transformer (SPTT).
arXiv Detail & Related papers (2023-10-11T04:25:24Z)
- Deep Diversity-Enhanced Feature Representation of Hyperspectral Images [87.47202258194719]
We rectify 3D convolution by modifying its topology to enhance the rank upper-bound.
We also propose a novel diversity-aware regularization (DA-Reg) term that acts on the feature maps to maximize independence among elements.
To demonstrate the superiority of the proposed Re$^3$-ConvSet and DA-Reg, we apply them to various HS image processing and analysis tasks.
arXiv Detail & Related papers (2023-01-15T16:19:18Z)
- AF$_2$: Adaptive Focus Framework for Aerial Imagery Segmentation [86.44683367028914]
Aerial imagery segmentation has some unique challenges, the most critical of which is foreground-background imbalance.
We propose the Adaptive Focus Framework (AF$_2$), which adopts a hierarchical segmentation procedure and focuses on adaptively utilizing multi-scale representations.
AF$_2$ significantly improves accuracy on three widely used aerial benchmarks while running as fast as the mainstream method.
arXiv Detail & Related papers (2022-02-18T10:14:45Z)
- There and Back Again: Self-supervised Multispectral Correspondence Estimation [13.56924750612194]
We introduce a novel cycle-consistency metric that allows us to self-supervise. This, combined with our spectra-agnostic loss functions, allows us to train the same network across multiple spectra.
We demonstrate our approach on the challenging task of dense RGB-FIR correspondence estimation.
arXiv Detail & Related papers (2021-03-19T12:33:56Z)
- Learning Enriched Features for Real Image Restoration and Enhancement [166.17296369600774]
Convolutional neural networks (CNNs) have achieved dramatic improvements over conventional approaches for image restoration tasks.
We present a novel architecture with the collective goals of maintaining spatially-precise high-resolution representations through the entire network.
Our approach learns an enriched set of features that combines contextual information from multiple scales, while simultaneously preserving the high-resolution spatial details.
arXiv Detail & Related papers (2020-03-15T11:04:30Z)
This list is automatically generated from the titles and abstracts of the papers on this site.