Siamese Transformer Pyramid Networks for Real-Time UAV Tracking
- URL: http://arxiv.org/abs/2110.08822v1
- Date: Sun, 17 Oct 2021 13:48:31 GMT
- Title: Siamese Transformer Pyramid Networks for Real-Time UAV Tracking
- Authors: Daitao Xing, Nikolaos Evangeliou, Athanasios Tsoukalas and Anthony
Tzes
- Abstract summary: We introduce the Siamese Transformer Pyramid Network (SiamTPN), which inherits the advantages of both CNN and Transformer architectures.
Experiments on both aerial and widely used tracking benchmarks show competitive results while operating at high speed.
Our fastest variant runs at over 30 Hz on a single CPU core while obtaining an AUC score of 58.1% on the LaSOT dataset.
- Score: 3.0969191504482243
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent object tracking methods depend upon deep networks or convoluted
architectures. Most of those trackers can hardly meet real-time processing
requirements on mobile platforms with limited computing resources. In this
work, we introduce the Siamese Transformer Pyramid Network (SiamTPN), which
inherits the advantages of both CNN and Transformer architectures.
Specifically, we exploit the inherent feature pyramid of a lightweight network
(ShuffleNetV2) and reinforce it with a Transformer to construct a robust
target-specific appearance model. A centralized architecture with lateral cross
attention is developed for building augmented high-level feature maps. To avoid
the heavy computation and memory cost of fusing pyramid representations with
the Transformer, we further introduce a pooling attention module, which
significantly reduces memory and time complexity while improving the
robustness. Comprehensive experiments on both aerial and widely used tracking
benchmarks show that SiamTPN achieves competitive results while operating at
high speed, demonstrating its effectiveness. Moreover, our fastest variant
operates at over 30 Hz on a single CPU core and obtains an AUC score of 58.1%
on the LaSOT dataset. Source codes are available at
https://github.com/RISCNYUAD/SiamTPNTracker
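The pooling attention module is the abstract's central efficiency claim, so a minimal PyTorch sketch of the idea may help: spatially pooling keys and values before attention shrinks the token count, cutting the cost of fusing pyramid features while queries keep full resolution. The class name, the use of average pooling, the pooling ratio, and the tensor shapes below are illustrative assumptions, not details taken from the paper or its repository.

```python
# Minimal sketch of a pooling-attention idea: keys/values are spatially
# downsampled before attention, so attending N query tokens to M key tokens
# costs O(N * M / r^2) instead of O(N * M) for pooling ratio r.
import torch
import torch.nn as nn


class PoolingAttention(nn.Module):
    def __init__(self, dim: int, num_heads: int = 8, pool_ratio: int = 4):
        super().__init__()
        # Average pooling over the key/value feature map (assumed choice).
        self.pool = nn.AvgPool2d(kernel_size=pool_ratio, stride=pool_ratio)
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, query: torch.Tensor, kv: torch.Tensor) -> torch.Tensor:
        # query: (B, C, Hq, Wq) feature map attending to kv: (B, C, Hk, Wk).
        b, c, hq, wq = query.shape
        q = query.flatten(2).transpose(1, 2)             # (B, Hq*Wq, C)
        kv_small = self.pool(kv)                         # (B, C, Hk/r, Wk/r)
        kv_tokens = kv_small.flatten(2).transpose(1, 2)  # (B, Hk*Wk/r^2, C)
        out, _ = self.attn(q, kv_tokens, kv_tokens)      # attention over fewer tokens
        return out.transpose(1, 2).reshape(b, c, hq, wq)


if __name__ == "__main__":
    # Hypothetical template/search shapes for a Siamese tracker.
    attn = PoolingAttention(dim=256, num_heads=8, pool_ratio=4)
    search = torch.randn(1, 256, 16, 16)   # search-region features
    template = torch.randn(1, 256, 8, 8)   # template features
    fused = attn(search, template)
    print(fused.shape)  # torch.Size([1, 256, 16, 16])
```

Because the query and key/value inputs are independent, the same module could also play the role of a lateral cross attention between pyramid levels, which is how the abstract describes the centralized architecture.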
Related papers
- Correlation-Embedded Transformer Tracking: A Single-Branch Framework [69.0798277313574]
We propose a novel single-branch tracking framework inspired by the transformer.
Unlike Siamese-style feature extraction, our tracker deeply embeds cross-image feature correlation in multiple layers of the feature network.
The output features can be directly used for predicting target locations without additional correlation steps.
arXiv Detail & Related papers (2024-01-23T13:20:57Z)
- Separable Self and Mixed Attention Transformers for Efficient Object Tracking [3.9160947065896803]
This paper proposes an efficient self and mixed attention transformer-based architecture for lightweight tracking.
The proposed lightweight tracker is the first to deploy a transformer-based backbone and head module concurrently.
Simulations show that our Separable Self and Mixed Attention-based Tracker, SMAT, surpasses the performance of related lightweight trackers on GOT10k, TrackingNet, LaSOT, NfS30, UAV123, and AVisT datasets.
arXiv Detail & Related papers (2023-09-07T19:23:02Z)
- Exploring Lightweight Hierarchical Vision Transformers for Efficient Visual Tracking [69.89887818921825]
HiT is a new family of efficient tracking models that can run at high speed on different devices.
HiT achieves 64.6% AUC on the LaSOT benchmark, surpassing all previous efficient trackers.
arXiv Detail & Related papers (2023-08-14T02:51:34Z)
- Efficient Joint Detection and Multiple Object Tracking with Spatially Aware Transformer [0.8808021343665321]
We propose a lightweight and highly efficient Joint Detection and Tracking pipeline for the task of Multi-Object Tracking.
It is driven by a transformer-based backbone instead of a CNN, which makes it highly scalable with the input resolution.
As a result of our modifications, we reduce the overall model size of TransTrack by 58.73% and the complexity by 78.72%.
arXiv Detail & Related papers (2022-11-09T07:19:33Z)
- Joint Spatial-Temporal and Appearance Modeling with Transformer for Multiple Object Tracking [59.79252390626194]
We propose a novel solution named TransSTAM, which leverages Transformer to model both the appearance features of each object and the spatial-temporal relationships among objects.
The proposed method is evaluated on multiple public benchmarks including MOT16, MOT17, and MOT20, and it achieves a clear performance improvement in both IDF1 and HOTA.
arXiv Detail & Related papers (2022-05-31T01:19:18Z)
- Efficient Visual Tracking via Hierarchical Cross-Attention Transformer [82.92565582642847]
We present an efficient tracking method via a hierarchical cross-attention transformer named HCAT.
Our model runs at about 195 fps on a GPU, 45 fps on a CPU, and 55 fps on the NVIDIA Jetson AGX Xavier edge AI platform.
arXiv Detail & Related papers (2022-03-25T09:45:27Z)
- LiteTransformerSearch: Training-free On-device Search for Efficient Autoregressive Language Models [34.673688610935876]
We show that the latency-perplexity Pareto frontier can be found without the need for any model training; a toy sketch of this frontier computation appears after this list.
We evaluate our method, dubbed Lightweight Transformer Search (LTS), on diverse devices.
We show that the perplexity of Transformer-XL can be achieved with up to 2x lower latency.
arXiv Detail & Related papers (2022-03-04T02:10:43Z)
- Efficient Visual Tracking with Exemplar Transformers [98.62550635320514]
We introduce the Exemplar Transformer, an efficient transformer for real-time visual object tracking.
E.T.Track, our visual tracker that incorporates Exemplar Transformer layers, runs at 47 fps on a CPU.
This is up to 8 times faster than other transformer-based models.
arXiv Detail & Related papers (2021-12-17T18:57:54Z)
- Trident Pyramid Networks: The importance of processing at the feature pyramid level for better object detection [50.008529403150206]
We present a new core architecture called the Trident Pyramid Network (TPN).
TPN allows for a deeper design and for a better balance between communication-based processing and self-processing.
We show consistent improvements when using our TPN core on the object detection benchmark, outperforming the popular BiFPN baseline by 1.5 AP.
arXiv Detail & Related papers (2021-10-08T09:59:59Z)
- Searching for Efficient Multi-Stage Vision Transformers [42.0565109812926]
Vision Transformer (ViT) demonstrates that Transformers developed for natural language processing can be applied to computer vision tasks.
ViT-ResNAS is an efficient multi-stage ViT architecture designed with neural architecture search (NAS).
arXiv Detail & Related papers (2021-09-01T22:37:56Z)
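As referenced in the LiteTransformerSearch entry above, the heart of a training-free search is keeping only the candidate architectures that are Pareto-optimal in (latency, perplexity), where both are lower-is-better. The Python sketch below shows that frontier computation on made-up candidates; the function and numbers are illustrative, not code from the LTS paper.

```python
# Toy Pareto-frontier selection over (latency, perplexity) pairs.
from typing import List, Tuple


def pareto_frontier(candidates: List[Tuple[float, float]]) -> List[Tuple[float, float]]:
    """Return the (latency, perplexity) points not dominated by any other."""
    frontier = []
    best_ppl = float("inf")
    # Sweep by increasing latency; a point survives only if it strictly
    # improves on the best perplexity seen at lower latency.
    for lat, ppl in sorted(candidates):
        if ppl < best_ppl:
            frontier.append((lat, ppl))
            best_ppl = ppl
    return frontier


if __name__ == "__main__":
    # (latency in ms, proxy perplexity) for hypothetical Transformer configs.
    configs = [(12.0, 31.5), (8.0, 35.2), (20.0, 29.8), (8.5, 34.0), (15.0, 33.1)]
    print(pareto_frontier(configs))
    # [(8.0, 35.2), (8.5, 34.0), (12.0, 31.5), (20.0, 29.8)]
```

In a training-free setup, the perplexity column would come from a cheap proxy (for example, a few forward passes) rather than from fully trained models, which is what makes sweeping many candidates practical.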
This list is automatically generated from the titles and abstracts of the papers on this site.