FGSGT: Saliency-Guided Siamese Network Tracker Based on Key Fine-Grained Feature Information for Thermal Infrared Target Tracking
- URL: http://arxiv.org/abs/2504.14309v1
- Date: Sat, 19 Apr 2025 14:13:15 GMT
- Title: FGSGT: Saliency-Guided Siamese Network Tracker Based on Key Fine-Grained Feature Information for Thermal Infrared Target Tracking
- Authors: Ruoyan Xiong, Huanbin Zhang, Shentao Wang, Hui He, Yuke Hou, Yue Zhang, Yujie Cui, Huipan Guan, Shang Zhang
- Abstract summary: We propose a novel saliency-guided Siamese network tracker based on key fine-grained feature information. This design captures essential global features from shallow layers, enhances feature diversity, and minimizes the loss of fine-grained information typically encountered in residual connections. Experiment results demonstrate that the proposed tracker achieves the highest precision and success rates.
- Score: 11.599952876425736
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Thermal infrared (TIR) images typically lack detailed features and have low contrast, making it challenging for conventional feature extraction models to capture discriminative target characteristics. As a result, trackers are often affected by interference from visually similar objects and are susceptible to tracking drift. To address these challenges, we propose a novel saliency-guided Siamese network tracker based on key fine-grained feature information. First, we introduce a fine-grained feature parallel learning convolutional block with a dual-stream architecture and convolutional kernels of varying sizes. This design captures essential global features from shallow layers, enhances feature diversity, and minimizes the loss of fine-grained information typically encountered in residual connections. In addition, we propose a multi-layer fine-grained feature fusion module that uses bilinear matrix multiplication to effectively integrate features across both deep and shallow layers. Next, we introduce a Siamese residual refinement block that corrects saliency map prediction errors using residual learning. Combined with deep supervision, this mechanism progressively refines predictions, applying supervision at each recursive step to ensure consistent improvements in accuracy. Finally, we present a saliency loss function to constrain the saliency predictions, directing the network to focus on highly discriminative fine-grained features. Extensive experimental results demonstrate that the proposed tracker achieves the highest precision and success rates on the PTB-TIR and LSOTB-TIR benchmarks. It also achieves a top accuracy of 0.78 on the VOT-TIR 2015 benchmark and 0.75 on the VOT-TIR 2017 benchmark.
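To make the first two components of the abstract more concrete, the sketch below mocks up a dual-stream convolutional block with kernels of different sizes and a bilinear-matrix-multiplication fusion of shallow and deep feature maps. It is an illustrative sketch only: the class names, kernel sizes, and channel widths (FineGrainedParallelBlock, BilinearFusion, 3x3/5x5 kernels) are assumptions, not the authors' implementation.

```python
# Illustrative sketch of the abstract's dual-stream block and bilinear fusion.
# All module names, kernel sizes, and channel counts are assumed for the example.
import torch
import torch.nn as nn
import torch.nn.functional as F


class FineGrainedParallelBlock(nn.Module):
    """Dual-stream block: parallel convolutions with different kernel sizes keep
    shallow, fine-grained detail alongside broader context, instead of routing
    everything through a single residual path."""

    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.small = nn.Sequential(                      # fine-grained stream
            nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1),
            nn.BatchNorm2d(out_ch), nn.ReLU(inplace=True))
        self.large = nn.Sequential(                      # wider-context stream
            nn.Conv2d(in_ch, out_ch, kernel_size=5, padding=2),
            nn.BatchNorm2d(out_ch), nn.ReLU(inplace=True))
        self.merge = nn.Conv2d(2 * out_ch, out_ch, kernel_size=1)

    def forward(self, x):
        return self.merge(torch.cat([self.small(x), self.large(x)], dim=1))


class BilinearFusion(nn.Module):
    """Fuses a shallow and a deep feature map with bilinear matrix
    multiplication, producing a channel-interaction descriptor."""

    def forward(self, shallow, deep):
        b, c1, h, w = shallow.shape
        deep = F.interpolate(deep, size=(h, w), mode="bilinear",
                             align_corners=False)
        c2 = deep.shape[1]
        s = shallow.reshape(b, c1, h * w)
        d = deep.reshape(b, c2, h * w)
        return torch.bmm(s, d.transpose(1, 2)) / (h * w)   # (b, c1, c2)


if __name__ == "__main__":
    x = torch.randn(1, 64, 64, 64)              # shallow TIR feature map
    shallow = FineGrainedParallelBlock(64, 128)(x)
    deep = torch.randn(1, 256, 16, 16)          # deeper backbone feature
    print(BilinearFusion()(shallow, deep).shape)   # torch.Size([1, 128, 256])
```

In the full tracker these features would feed the Siamese matching stage and the residual refinement of the saliency map; the shapes printed here are only meant to show how the bilinear product couples shallow and deep channels.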
Related papers
- DCFG: Diverse Cross-Channel Fine-Grained Feature Learning and Progressive Fusion Siamese Tracker for Thermal Infrared Target Tracking [11.3097285242147]
Cross-channel fine-grained feature learning network to suppress dominant target features. Channel rearrangement mechanism to enhance efficient information flow. A specialized cross-channel fine-grained loss function to guide feature groups toward distinct discriminative regions of the target.
arXiv Detail & Related papers (2025-04-19T14:24:37Z) - 10K is Enough: An Ultra-Lightweight Binarized Network for Infrared Small-Target Detection [48.074211420276605]
Binarized neural networks (BNNs) are distinguished by their exceptional efficiency in model compression. We propose the Binarized Infrared Small-Target Detection Network (BiisNet). BiisNet preserves the core operations of binarized convolutions while integrating full-precision features into the network's information flow.
arXiv Detail & Related papers (2025-03-04T14:25:51Z) - Improved Dense Nested Attention Network Based on Transformer for Infrared Small Target Detection [8.388564430699155]
Infrared small target detection based on deep learning offers unique advantages in separating small targets from complex and dynamic backgrounds.
The features of infrared small targets gradually weaken as the depth of the convolutional neural network (CNN) increases.
We propose an improved dense nested attention network (IDNANet), which is based on the transformer architecture.
arXiv Detail & Related papers (2023-11-15T07:29:24Z) - EFLNet: Enhancing Feature Learning for Infrared Small Target Detection [20.546186772828555]
Single-frame infrared small target detection is considered to be a challenging task.
Due to the extreme imbalance between target and background, bounding box regression is extremely sensitive to infrared small targets.
We propose an enhancing feature learning network (EFLNet) to address these problems.
arXiv Detail & Related papers (2023-07-27T09:23:22Z) - One-Stage Cascade Refinement Networks for Infrared Small Target Detection [21.28595135499812]
Single-frame InfraRed Small Target (SIRST) detection has been a challenging task due to a lack of inherent characteristics.
We present a new research benchmark for infrared small target detection consisting of the SIRST-V2 dataset of real-world, high-resolution single-frame targets.
arXiv Detail & Related papers (2022-12-16T13:37:23Z) - DepthFormer: Exploiting Long-Range Correlation and Local Information for Accurate Monocular Depth Estimation [50.08080424613603]
Long-range correlation is essential for accurate monocular depth estimation.
We propose to leverage the Transformer to model this global context with an effective attention mechanism.
Our proposed model, termed DepthFormer, surpasses state-of-the-art monocular depth estimation methods with prominent margins.
arXiv Detail & Related papers (2022-03-27T05:03:56Z) - RPT++: Customized Feature Representation for Siamese Visual Tracking [16.305972000224358]
We argue that the performance gain of visual tracking is limited since features extracted from the salient area provide more recognizable visual patterns for classification.
We propose two customized feature extractors, named polar pooling and extreme pooling to capture task-specific visual patterns.
We demonstrate the effectiveness of the task-specific feature representation by integrating it into the recent and advanced tracker RPT.
arXiv Detail & Related papers (2021-10-23T10:58:57Z) - MFGNet: Dynamic Modality-Aware Filter Generation for RGB-T Tracking [72.65494220685525]
We propose a new dynamic modality-aware filter generation module (named MFGNet) to boost the message communication between visible and thermal data.
We generate dynamic modality-aware filters with two independent networks. The visible and thermal filters are then used to conduct a dynamic convolutional operation on their corresponding input feature maps (a minimal sketch of this dynamic-filter idea appears after this list).
To address issues caused by heavy occlusion, fast motion, and out-of-view, we propose to conduct a joint local and global search by exploiting a new direction-aware target-driven attention mechanism.
arXiv Detail & Related papers (2021-07-22T03:10:51Z) - Unsupervised Scale-consistent Depth Learning from Video [131.3074342883371]
We propose a monocular depth estimator SC-Depth, which requires only unlabelled videos for training.
Thanks to the capability of scale-consistent prediction, we show that our monocular-trained deep networks are readily integrated into the ORB-SLAM2 system.
The proposed hybrid Pseudo-RGBD SLAM shows compelling results in KITTI, and it generalizes well to the KAIST dataset without additional training.
arXiv Detail & Related papers (2021-05-25T02:17:56Z) - Deep Probabilistic Feature-metric Tracking [27.137827823264942]
We propose a new framework to learn a pixel-wise deep feature map and a deep feature-metric uncertainty map.
A CNN predicts a deep initial pose for faster and more reliable convergence.
Experimental results demonstrate state-of-the-art performances on the TUM RGB-D dataset and the 3D rigid object tracking dataset.
arXiv Detail & Related papers (2020-08-31T11:47:59Z) - Progressively Guided Alternate Refinement Network for RGB-D Salient Object Detection [63.18846475183332]
We aim to develop an efficient and compact deep network for RGB-D salient object detection.
We propose a progressively guided alternate refinement network to refine it.
Our model outperforms existing state-of-the-art approaches by a large margin.
arXiv Detail & Related papers (2020-08-17T02:55:06Z) - Object Tracking through Residual and Dense LSTMs [67.98948222599849]
Deep learning trackers based on LSTM (Long Short-Term Memory) recurrent neural networks have emerged as a powerful alternative.
DenseLSTMs outperform residual and regular LSTMs, and offer higher resilience to nuisances.
Our case study supports the adoption of residual-based RNNs for enhancing the robustness of other trackers.
arXiv Detail & Related papers (2020-06-22T08:20:17Z)
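Picking up the forward reference in the MFGNet entry above, the following is a minimal sketch of the dynamic modality-aware filter idea: two independent branches each predict filters from their own modality's feature map and then convolve that map with the predicted filters. The depthwise formulation, the additive fusion, and all names here are assumptions for illustration, not the MFGNet authors' released code.

```python
# Hedged sketch of dynamic modality-aware filtering; names and the depthwise
# per-channel formulation are assumptions, not the MFGNet implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F


class DynamicFilterBranch(nn.Module):
    """Predicts a per-channel k x k depthwise filter from a feature map, then
    convolves that same map with the predicted filters."""

    def __init__(self, channels, k=3):
        super().__init__()
        self.k = k
        self.gen = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),                            # global context
            nn.Conv2d(channels, channels * k * k, kernel_size=1))

    def forward(self, feat):
        b, c, h, w = feat.shape
        filt = self.gen(feat).reshape(b * c, 1, self.k, self.k)
        # grouped conv applies each sample's and channel's own predicted filter
        out = F.conv2d(feat.reshape(1, b * c, h, w), filt,
                       padding=self.k // 2, groups=b * c)
        return out.reshape(b, c, h, w)


# Two independent branches: one conditioned on visible features, one on thermal.
vis_branch, tir_branch = DynamicFilterBranch(64), DynamicFilterBranch(64)
vis, tir = torch.randn(2, 64, 32, 32), torch.randn(2, 64, 32, 32)
fused = vis_branch(vis) + tir_branch(tir)   # additive fusion is an assumption
print(fused.shape)                          # torch.Size([2, 64, 32, 32])
```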