Hyperspectral Adapter for Object Tracking based on Hyperspectral Video
- URL: http://arxiv.org/abs/2503.22199v1
- Date: Fri, 28 Mar 2025 07:31:48 GMT
- Title: Hyperspectral Adapter for Object Tracking based on Hyperspectral Video
- Authors: Long Gao, Yunhe Zhang, Langkun Chen, Yan Jiang, Weiying Xie, Yunsong Li
- Abstract summary: A new hyperspectral object tracking method, hyperspectral adapter for tracking (HyA-T), is proposed in this work. The proposed methods extract spectral information directly from the hyperspectral images, which prevents the loss of spectral information. HyA-T achieves state-of-the-art performance on all the datasets.
- Score: 18.77789707539318
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Object tracking based on hyperspectral video attracts increasing attention owing to the rich material and motion information in hyperspectral videos. The prevailing hyperspectral methods adapt pretrained RGB-based object tracking networks for hyperspectral tasks by fine-tuning the entire network on hyperspectral datasets, which achieves impressive results in challenging scenarios. However, the performance of hyperspectral trackers is limited by the loss of spectral information during the transformation, and fine-tuning the entire pretrained network is inefficient for practical applications. To address these issues, a new hyperspectral object tracking method, hyperspectral adapter for tracking (HyA-T), is proposed in this work. The hyperspectral adapter for the self-attention (HAS) and the hyperspectral adapter for the multilayer perceptron (HAM) are proposed to generate the adaptation information and to transfer the multi-head self-attention (MSA) module and the multilayer perceptron (MLP) in the pretrained network to the hyperspectral object tracking task by incorporating the adaptation information into the computation of the MSA and MLP. Additionally, the hyperspectral enhancement of input (HEI) is proposed to augment the input of the tracking network with the original spectral information. The proposed methods extract spectral information directly from the hyperspectral images, which prevents the loss of spectral information. Moreover, only the parameters in the proposed methods are fine-tuned, which is more efficient than the existing methods. Extensive experiments were conducted on four datasets with various spectral bands, verifying the effectiveness of the proposed methods. HyA-T achieves state-of-the-art performance on all the datasets.
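The abstract does not give the internals of HAS, HAM, or HEI, so the following is only a minimal sketch of the general adapter idea it describes: a small trainable branch computes adaptation information from spectral features and adds it to the output of a frozen pretrained MLP. The class names, the bottleneck design, the zero-initialised up-projection, and all dimensions are assumptions chosen for illustration, not the authors' architecture.

```python
import numpy as np

rng = np.random.default_rng(0)


def gelu(x):
    # tanh approximation of GELU
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x**3)))


class FrozenMLP:
    """Stand-in for a pretrained transformer MLP block; its weights are never updated."""

    def __init__(self, dim, hidden):
        self.w1 = rng.normal(0.0, 0.02, (dim, hidden))
        self.w2 = rng.normal(0.0, 0.02, (hidden, dim))

    def __call__(self, x):
        return gelu(x @ self.w1) @ self.w2

    def num_params(self):
        return self.w1.size + self.w2.size


class BottleneckAdapter:
    """Hypothetical adapter: down-project spectral tokens, apply GELU, up-project."""

    def __init__(self, spec_dim, dim, bottleneck):
        self.down = rng.normal(0.0, 0.02, (spec_dim, bottleneck))
        # Zero-initialised up-projection (an assumption): before any training,
        # the adapted block reproduces the pretrained output exactly.
        self.up = np.zeros((bottleneck, dim))

    def __call__(self, spectral_tokens):
        return gelu(spectral_tokens @ self.down) @ self.up

    def num_params(self):
        return self.down.size + self.up.size


def adapted_mlp(x, spectral_tokens, mlp, adapter):
    # Frozen pretrained path plus trainable adaptation information.
    return mlp(x) + adapter(spectral_tokens)


# Illustrative sizes only: 10 tokens, 32-dim embeddings, 16 spectral bands.
tokens = rng.normal(size=(10, 32))
spectral = rng.normal(size=(10, 16))
mlp = FrozenMLP(dim=32, hidden=128)
adapter = BottleneckAdapter(spec_dim=16, dim=32, bottleneck=8)
out = adapted_mlp(tokens, spectral, mlp, adapter)
trainable_fraction = adapter.num_params() / (adapter.num_params() + mlp.num_params())
```

In this toy setting only the adapter's parameters would be updated during fine-tuning, which is the source of the efficiency gain the abstract claims over fine-tuning the entire network.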
Related papers
- HSOD-BIT-V2: A New Challenging Benchmark for Hyperspectral Salient Object Detection [12.1018751772293]
We introduce HSOD-BIT-V2, the largest and most challenging HSOD benchmark dataset to date. We propose Hyper-HRNet, a high-resolution HSOD network. It effectively extracts, integrates, and preserves effective spectral information while reducing dimensionality by capturing the self-similar spectral features. It conveys fine details and precisely locates object contours by incorporating comprehensive global information and detailed object saliency representations.
arXiv Detail & Related papers (2025-03-18T05:09:42Z) - Spectral-Enhanced Transformers: Leveraging Large-Scale Pretrained Models for Hyperspectral Object Tracking [35.34526230299484]
This paper proposes an effective methodology that adapts transformer-based foundation models for hyperspectral object tracking. We propose an adaptive, learnable spatial-spectral token fusion module that can be extended to any transformer-based backbone.
arXiv Detail & Related papers (2025-02-26T01:46:21Z) - Spectrum-oriented Point-supervised Saliency Detector for Hyperspectral Images [13.79887292039637]
We introduce point supervision into hyperspectral salient object detection (HSOD). We incorporate Spectral Saliency, derived from conventional HSOD methods, as a pivotal spectral representation within the framework. We propose a novel pipeline, specifically designed for HSIs, to generate pseudo-labels, effectively mitigating the performance decline associated with the point supervision strategy.
arXiv Detail & Related papers (2024-12-24T02:52:43Z) - BihoT: A Large-Scale Dataset and Benchmark for Hyperspectral Camouflaged Object Tracking [22.533682363532403]
We provide a new task called hyperspectral camouflaged object tracking (HCOT).
We meticulously construct a large-scale HCOT dataset, termed BihoT, which consists of 41,912 hyperspectral images covering 49 video sequences.
A simple but effective baseline model, named spectral prompt-based distractor-aware network (SPDAN), is proposed.
arXiv Detail & Related papers (2024-08-22T09:07:51Z) - HyperSIGMA: Hyperspectral Intelligence Comprehension Foundation Model [88.13261547704444]
HyperSIGMA is a vision transformer-based foundation model for HSI interpretation.
It integrates spatial and spectral features using a specially designed spectral enhancement module.
It shows significant advantages in scalability, robustness, cross-modal transferring capability, and real-world applicability.
arXiv Detail & Related papers (2024-06-17T13:22:58Z) - DMSSN: Distilled Mixed Spectral-Spatial Network for Hyperspectral Salient Object Detection [12.823338405434244]
Hyperspectral salient object detection (HSOD) has exhibited remarkable promise across various applications.
Previous methods insufficiently harness the inherent distinctive attributes of hyperspectral images (HSIs) during the feature extraction process.
We propose the Distilled Mixed Spectral-Spatial Network (DMSSN), comprising a Mixed Spectral-Spatial Transformer (MSST).
We have created a large-scale HSOD dataset, HSOD-BIT, to tackle the issue of data scarcity in this field.
arXiv Detail & Related papers (2024-03-31T14:04:57Z) - DAMSDet: Dynamic Adaptive Multispectral Detection Transformer with Competitive Query Selection and Adaptive Feature Fusion [82.2425759608975]
Infrared-visible object detection aims to achieve robust, even full-day, object detection by fusing the complementary information of infrared and visible images.
We propose a Dynamic Adaptive Multispectral Detection Transformer (DAMSDet) to address these two challenges.
Experiments on four public datasets demonstrate significant improvements compared to other state-of-the-art methods.
arXiv Detail & Related papers (2024-03-01T07:03:27Z) - TrackDiffusion: Tracklet-Conditioned Video Generation via Diffusion Models [75.20168902300166]
We propose TrackDiffusion, a novel video generation framework affording fine-grained trajectory-conditioned motion control.
A pivotal component of TrackDiffusion is the instance enhancer, which explicitly ensures inter-frame consistency of multiple objects.
Video sequences generated by TrackDiffusion can be used as training data for visual perception models.
arXiv Detail & Related papers (2023-12-01T15:24:38Z) - HHTrack: Hyperspectral Object Tracking Using Hybrid Attention [0.0]
We propose a hyperspectral object tracker based on hybrid attention (HHTrack).
The core of HHTrack is a hyperspectral hybrid attention (HHA) module that unifies feature extraction and fusion within one component through token interactions.
A hyperspectral bands fusion (HBF) module is also introduced to selectively aggregate spatial and spectral signatures from the full hyperspectral input.
arXiv Detail & Related papers (2023-08-14T09:04:06Z) - ESSAformer: Efficient Transformer for Hyperspectral Image Super-resolution [76.7408734079706]
Single hyperspectral image super-resolution (single-HSI-SR) aims to restore a high-resolution hyperspectral image from a low-resolution observation.
We propose ESSAformer, an ESSA attention-embedded Transformer network for single-HSI-SR with an iterative refining structure.
arXiv Detail & Related papers (2023-07-26T07:45:14Z) - DS-Net: Dynamic Spatiotemporal Network for Video Salient Object Detection [78.04869214450963]
We propose a novel dynamic spatiotemporal network (DS-Net) for more effective fusion of temporal and spatial information.
We show that the proposed method achieves performance superior to state-of-the-art algorithms.
arXiv Detail & Related papers (2020-12-09T06:42:30Z) - Anchor-free Small-scale Multispectral Pedestrian Detection [88.7497134369344]
We propose a method for effective and efficient multispectral fusion of the two modalities in an adapted single-stage anchor-free base architecture.
We aim at learning pedestrian representations based on object center and scale rather than direct bounding box predictions.
Results show our method's effectiveness in detecting small-scaled pedestrians.
arXiv Detail & Related papers (2020-08-19T13:13:01Z) - EHSOD: CAM-Guided End-to-end Hybrid-Supervised Object Detection with Cascade Refinement [53.69674636044927]
We present EHSOD, an end-to-end hybrid-supervised object detection system.
It can be trained in one shot on both fully and weakly-annotated data.
It achieves comparable results on multiple object detection benchmarks with only 30% fully-annotated data.
arXiv Detail & Related papers (2020-02-18T08:04:58Z)