Siamese Object Tracking for Vision-Based UAM Approaching with Pairwise
Scale-Channel Attention
- URL: http://arxiv.org/abs/2211.14564v1
- Date: Sat, 26 Nov 2022 13:33:49 GMT
- Title: Siamese Object Tracking for Vision-Based UAM Approaching with Pairwise
Scale-Channel Attention
- Authors: Guangze Zheng, Changhong Fu, Junjie Ye, Bowen Li, Geng Lu, Jia Pan
- Abstract summary: This work proposes a novel Siamese network with pairwise scale-channel attention (SiamSA) for vision-based UAM approaching.
SiamSA consists of a pairwise scale-channel attention network (PSAN) and a scale-aware anchor proposal network (SA-APN)
- Score: 27.114231832842034
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Although manipulation with the unmanned aerial manipulator (UAM)
has been widely studied, vision-based UAM approaching, which is crucial to the
subsequent manipulation, generally lacks effective design. The key to visual
UAM approaching lies in object tracking, yet current UAM tracking typically
relies on costly model-based methods. Moreover, UAM approaching often
confronts severe object scale variation, which makes it inappropriate to
directly employ state-of-the-art model-free Siamese-based methods from the
object tracking field. To address these problems, this
work proposes a novel Siamese network with pairwise scale-channel attention
(SiamSA) for vision-based UAM approaching. Specifically, SiamSA consists of a
pairwise scale-channel attention network (PSAN) and a scale-aware anchor
proposal network (SA-APN). PSAN acquires valuable scale information for feature
processing, while SA-APN mainly attaches scale awareness to anchor proposing.
Moreover, a new tracking benchmark for UAM approaching, namely UAMT100, is
recorded with 35K frames on a flying UAM platform for evaluation. Extensive
experiments on the benchmarks and real-world tests validate the efficiency and
practicality of SiamSA at a promising speed. Both the code and the UAMT100
benchmark are now available at https://github.com/vision4robotics/SiamSA.
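The abstract gives no implementation details for PSAN; as a rough illustration of the channel-attention building block that such networks typically rely on, here is a minimal squeeze-and-excitation-style sketch in pure Python (the function name, weight layout, and gating scheme are assumptions for illustration, not the paper's design):

```python
import math

def channel_attention(feat, w1, w2):
    """SE-style channel attention: squeeze each channel to a scalar via
    global average pooling, excite with a two-layer MLP (ReLU, then
    sigmoid), and rescale every channel by its learned gate.
    feat: list of C channels, each an HxW list of rows.
    w1: hidden x C weight matrix; w2: C x hidden weight matrix."""
    # squeeze: one scalar descriptor per channel
    desc = [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in feat]
    # excite: C -> hidden (ReLU) -> C (sigmoid gates in (0, 1))
    hidden = [max(0.0, sum(w * d for w, d in zip(row, desc))) for row in w1]
    gates = [1.0 / (1.0 + math.exp(-sum(w * h for w, h in zip(row, hidden))))
             for row in w2]
    # rescale: weight every spatial location of channel c by gate c
    return [[[v * g for v in row] for row in ch] for ch, g in zip(feat, gates)]
```

In SiamSA the attention is "pairwise" across the template and search branches and scale-aware, which this single-feature sketch does not capture.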
Related papers
- Benchmarking Vision Language Model Unlearning via Fictitious Facial Identity Dataset [94.13848736705575]
We introduce Facial Identity Unlearning Benchmark (FIUBench), a novel VLM unlearning benchmark designed to robustly evaluate the effectiveness of unlearning algorithms.
We apply a two-stage evaluation pipeline that is designed to precisely control the sources of information and their exposure levels.
Through the evaluation of four baseline VLM unlearning algorithms within FIUBench, we find that all methods remain limited in their unlearning performance.
arXiv Detail & Related papers (2024-11-05T23:26:10Z)
- Adapting Segment Anything Model for Unseen Object Instance Segmentation [70.60171342436092]
Unseen Object Instance Segmentation (UOIS) is crucial for autonomous robots operating in unstructured environments.
We propose UOIS-SAM, a data-efficient solution for the UOIS task.
UOIS-SAM integrates two key components: (i) a Heatmap-based Prompt Generator (HPG) to generate class-agnostic point prompts with precise foreground prediction, and (ii) a Hierarchical Discrimination Network (HDNet) that adapts SAM's mask decoder.
arXiv Detail & Related papers (2024-09-23T19:05:50Z)
- SOOD++: Leveraging Unlabeled Data to Boost Oriented Object Detection [59.868772767818975]
We propose a simple yet effective Semi-supervised Oriented Object Detection method termed SOOD++.
Specifically, we observe that objects in aerial images usually have arbitrary orientations, small scales, and dense aggregation.
Extensive experiments conducted on various multi-oriented object datasets under various labeled settings demonstrate the effectiveness of our method.
arXiv Detail & Related papers (2024-07-01T07:03:51Z)
- UNIAA: A Unified Multi-modal Image Aesthetic Assessment Baseline and Benchmark [22.487379136024018]
We propose a Unified Multi-modal Image Aesthetic Assessment (UNIAA) framework, including a Multi-modal Large Language Model (MLLM) named UNIAA-LLaVA.
We choose MLLMs with both visual perception and language ability for IAA and establish a low-cost paradigm for transforming the existing datasets into unified and high-quality visual instruction tuning data.
Our model performs better than GPT-4V in aesthetic perception and even approaches junior-level human performance.
arXiv Detail & Related papers (2024-04-15T09:47:48Z)
- Boosting Segment Anything Model Towards Open-Vocabulary Learning [69.42565443181017]
Segment Anything Model (SAM) has emerged as a new paradigmatic vision foundation model.
Despite SAM finding applications and adaptations in various domains, its primary limitation lies in the inability to grasp object semantics.
We present Sambor to seamlessly integrate SAM with the open-vocabulary object detector in an end-to-end framework.
arXiv Detail & Related papers (2023-12-06T17:19:00Z)
- Siamese Object Tracking for Unmanned Aerial Vehicle: A Review and Comprehensive Analysis [15.10348491862546]
Unmanned aerial vehicle (UAV)-based visual object tracking has enabled a wide range of applications.
Siamese networks shine in visual object tracking with their promising balance of accuracy, robustness, and speed.
arXiv Detail & Related papers (2022-05-09T13:53:34Z)
- Vision-Based UAV Self-Positioning in Low-Altitude Urban Environments [20.69412701553767]
Unmanned Aerial Vehicles (UAVs) rely on satellite systems for stable positioning, but satellite signals can be blocked or degraded in low-altitude urban environments.
In such situations, vision-based techniques can serve as an alternative, ensuring the self-positioning capability of UAVs.
This paper presents a new dataset, DenseUAV, which is the first publicly available dataset designed for the UAV self-positioning task.
arXiv Detail & Related papers (2022-01-23T07:18:55Z)
- TSA-Net: Tube Self-Attention Network for Action Quality Assessment [4.220843694492582]
We propose a Tube Self-Attention Network (TSA-Net) for action quality assessment (AQA).
TSA-Net has the following merits: 1) high computational efficiency, 2) high flexibility, and 3) state-of-the-art performance.
arXiv Detail & Related papers (2022-01-11T02:25:27Z)
- MAML is a Noisy Contrastive Learner [72.04430033118426]
Model-agnostic meta-learning (MAML) is one of the most popular and widely adopted meta-learning algorithms.
We provide a new perspective on the working mechanism of MAML and show that MAML is analogous to a meta-learner using a supervised contrastive objective function.
We propose a simple but effective technique, zeroing trick, to alleviate such interference.
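The inner/outer loop that this contrastive analysis reinterprets can be illustrated on a one-parameter linear model; the sketch below is a first-order MAML step on a toy setup (the function name and toy model are illustrative assumptions, not the paper's experiments):

```python
def maml_step(theta, tasks, inner_lr=0.1, outer_lr=0.05):
    """One first-order MAML meta-update on f(x) = theta * x with
    squared loss. Each task is a (support, query) pair of lists of
    (x, y) examples."""
    def loss_grad(th, data):
        # d/dtheta of mean squared error (th*x - y)^2 over the data
        return sum(2.0 * (th * x - y) * x for x, y in data) / len(data)
    meta_grad = 0.0
    for support, query in tasks:
        # inner loop: adapt to the task on its support set
        adapted = theta - inner_lr * loss_grad(theta, support)
        # outer loop: evaluate the adapted parameter on the query set
        # (first-order: adapted is treated as a constant w.r.t. theta)
        meta_grad += loss_grad(adapted, query)
    return theta - outer_lr * meta_grad / len(tasks)
```

The paper's "zeroing trick" targets interference in the final classifier layer during the inner loop, which this scalar toy does not model.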
arXiv Detail & Related papers (2021-06-29T12:52:26Z)
- SiamAPN++: Siamese Attentional Aggregation Network for Real-Time UAV Tracking [16.78336740951222]
A novel attentional Siamese tracker (SiamAPN++) is proposed for real-time UAV tracking.
SiamAPN++ achieves promising tracking results with real-time speed.
arXiv Detail & Related papers (2021-06-16T14:28:57Z)
- Pseudo-IoU: Improving Label Assignment in Anchor-Free Object Detection [60.522877583407904]
Current anchor-free object detectors are quite simple and effective yet lack accurate label assignment methods.
We present Pseudo-Intersection-over-Union (Pseudo-IoU): a simple metric that brings a more standardized and accurate assignment rule into anchor-free object detection frameworks.
Our method achieves comparable performance to other recent state-of-the-art anchor-free methods without bells and whistles.
arXiv Detail & Related papers (2021-04-29T02:48:47Z)
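Pseudo-IoU's core idea, treating an anchor-free sampling point as if it carried a box so that standard IoU-based assignment applies, can be sketched as follows (the pseudo-box construction and thresholds here are illustrative assumptions, not the paper's exact formulation):

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    union = area(a) + area(b) - inter
    return inter / union if union > 0 else 0.0

def pseudo_iou_assign(points, gt_boxes, stride=8, pos_thresh=0.3):
    """Assign each sampling point a ground-truth index (or -1 for
    background) by placing a stride-sized pseudo-box at the point and
    scoring it against every ground-truth box with IoU."""
    labels = []
    for px, py in points:
        half = stride / 2.0
        pbox = (px - half, py - half, px + half, py + half)
        best = max(range(len(gt_boxes)), key=lambda i: iou(pbox, gt_boxes[i]))
        labels.append(best if iou(pbox, gt_boxes[best]) >= pos_thresh else -1)
    return labels
```

This recovers an anchor-based-style assignment rule without introducing actual anchor boxes into the detector head.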
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences of its use.