RPT: Learning Point Set Representation for Siamese Visual Tracking
- URL: http://arxiv.org/abs/2008.03467v2
- Date: Wed, 2 Sep 2020 01:27:02 GMT
- Title: RPT: Learning Point Set Representation for Siamese Visual Tracking
- Authors: Ziang Ma, Linyuan Wang, Haitao Zhang, Wei Lu and Jun Yin
- Abstract summary: We propose an efficient visual tracking framework to accurately estimate the target state with a finer representation: a set of representative points.
Our method achieves new state-of-the-art performance while running at over 20 FPS.
- Score: 15.04182251944942
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: While remarkable progress has been made in robust visual tracking, accurate target state estimation remains a highly challenging problem. In this paper, we argue that this issue is closely related to the prevalent bounding box representation, which provides only a coarse spatial extent of the object. We therefore propose an efficient visual tracking framework that accurately estimates the target state with a finer representation: a set of representative points. The point set is trained to indicate the semantically and geometrically significant positions of the target region, enabling more fine-grained localization and modeling of object appearance. We further propose a multi-level aggregation strategy that obtains detailed structural information by fusing hierarchical convolution layers. Extensive experiments on several challenging benchmarks, including OTB2015, VOT2018, VOT2019 and GOT-10k, demonstrate that our method achieves new state-of-the-art performance while running at over 20 FPS.
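As a loose illustration of the point-set idea (a sketch under assumptions, not the authors' released head): once a tracker predicts a set of representative points on the target, an axis-aligned pseudo-box can be recovered with a simple min-max conversion, as popularized by RepPoints-style detectors. The function name and the 9-point example below are illustrative choices, and the min-max rule is only one possible conversion function.

```python
import torch

def points_to_pseudo_box(points: torch.Tensor) -> torch.Tensor:
    """Convert a set of representative points to an axis-aligned box.

    points: (N, K, 2) tensor of K (x, y) points per target.
    Returns (N, 4) boxes as (x1, y1, x2, y2), using the min-max
    conversion from RepPoints; RPT's exact rule may differ.
    """
    x_min, _ = points[..., 0].min(dim=1)
    y_min, _ = points[..., 1].min(dim=1)
    x_max, _ = points[..., 0].max(dim=1)
    y_max, _ = points[..., 1].max(dim=1)
    return torch.stack([x_min, y_min, x_max, y_max], dim=1)

# Example: 9 hypothetical learned points for one target.
pts = torch.tensor([[[10., 20.], [55., 18.], [30., 60.],
                     [12., 45.], [50., 52.], [33., 25.],
                     [20., 30.], [44., 40.], [28., 50.]]])
print(points_to_pseudo_box(pts))  # tensor([[10., 18., 55., 60.]])
```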
Related papers
- RTrack: Accelerating Convergence for Visual Object Tracking via Pseudo-Boxes Exploration [3.29854706649876]
Single object tracking (SOT) heavily relies on the representation of the target object as a bounding box.
This paper proposes RTrack, a baseline tracker built on a novel object representation.
RTrack automatically arranges points to define the spatial extents and highlight local areas.
arXiv Detail & Related papers (2023-09-23T04:41:59Z)
- Contrastive Lift: 3D Object Instance Segmentation by Slow-Fast Contrastive Fusion [110.84357383258818]
We propose a novel approach to lift 2D segments to 3D and fuse them by means of a neural field representation.
The core of our approach is a slow-fast clustering objective function, which is scalable and well-suited for scenes with a large number of objects.
Our approach outperforms the state-of-the-art on challenging scenes from the ScanNet, Hypersim, and Replica datasets.
arXiv Detail & Related papers (2023-06-07T17:57:45Z)
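As a hedged sketch of the clustering idea in the entry above (generic, with assumed shapes and names; the paper's actual slow-fast objective differs in detail): a slowly-updated branch provides stable per-instance targets that the fast branch is pulled toward.

```python
import torch
import torch.nn.functional as F

def slow_fast_clustering_loss(fast_emb, slow_emb, instance_ids):
    """Toy slow-fast objective (an assumption-laden sketch, not the
    paper's exact formulation): pull each 'fast' embedding toward the
    mean 'slow' embedding of its object instance.

    fast_emb, slow_emb: (P, D) embeddings for P sampled points.
    instance_ids: (P,) ground-truth segment id for each point.
    """
    loss = 0.0
    for inst in instance_ids.unique():
        mask = instance_ids == inst
        # Centroid from the slowly-updated branch, detached as a target.
        center = slow_emb[mask].mean(dim=0).detach()
        loss = loss + (1 - F.cosine_similarity(
            fast_emb[mask], center.unsqueeze(0), dim=1)).mean()
    return loss / instance_ids.unique().numel()

# Toy usage: 100 points, 16-D embeddings, 3 instances.
fast = torch.randn(100, 16, requires_grad=True)
slow = torch.randn(100, 16)
ids = torch.randint(0, 3, (100,))
print(slow_fast_clustering_loss(fast, slow, ids))
```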
"Many-to-one" mapping, semantic incoherence, and shape deformation are possible impediments against effective learning from range view projections.
We present RangeFormer, a full-cycle framework comprising novel designs across network architecture, data augmentation, and post-processing.
We show that, for the first time, a range-view method can surpass point-, voxel-, and multi-view-fusion counterparts on competitive LiDAR semantic and panoptic segmentation benchmarks.
arXiv Detail & Related papers (2023-03-09T16:13:27Z)
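For context on the range-view representation discussed above, here is a generic spherical projection of a LiDAR sweep onto a range image (standard practice in this literature; RangeFormer's exact resolution, field of view, and post-processing are not reproduced). The overwrite on shared pixels is exactly the "many-to-one" mapping the entry mentions.

```python
import numpy as np

def range_projection(xyz, H=64, W=2048, fov_up=3.0, fov_down=-25.0):
    """Project a LiDAR point cloud onto a range image.

    xyz: (N, 3) points. Returns the (H, W) range image and the
    (N, 2) pixel coordinates (row, col) of each point.
    """
    r = np.linalg.norm(xyz, axis=1)                # range per point
    yaw = np.arctan2(xyz[:, 1], xyz[:, 0])         # azimuth angle
    pitch = np.arcsin(np.clip(xyz[:, 2] / np.maximum(r, 1e-8), -1, 1))
    fov_up_r, fov_down_r = np.radians(fov_up), np.radians(fov_down)
    u = 0.5 * (1.0 - yaw / np.pi) * W                      # column from azimuth
    v = (fov_up_r - pitch) / (fov_up_r - fov_down_r) * H   # row from elevation
    u = np.clip(np.floor(u), 0, W - 1).astype(np.int32)
    v = np.clip(np.floor(v), 0, H - 1).astype(np.int32)
    img = np.full((H, W), -1.0, dtype=np.float32)
    img[v, u] = r   # "many-to-one": points sharing a pixel overwrite each other
    return img, np.stack([v, u], axis=1)
```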
- Tiny Object Tracking: A Large-scale Dataset and A Baseline [40.93697515531104]
We create a large-scale video dataset, which contains 434 sequences with a total of more than 217K frames.
In data creation, we take 12 challenge attributes into account to cover a broad range of viewpoints and scene complexities.
We propose a novel Multilevel Knowledge Distillation Network (MKDNet), which pursues three-level knowledge distillations in a unified framework.
arXiv Detail & Related papers (2022-02-11T15:00:32Z)
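As a minimal sketch of the distillation ingredient in the entry above (response-level distillation only; MKDNet's unified three-level scheme, which also distills intermediate features, is not reproduced here):

```python
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, T=4.0):
    """Generic response-level knowledge distillation: KL divergence
    between temperature-softened teacher and student distributions.
    Logits have shape (B, num_classes); T is the softening temperature.
    """
    p_teacher = F.softmax(teacher_logits / T, dim=1)
    log_p_student = F.log_softmax(student_logits / T, dim=1)
    # The T^2 factor keeps gradient magnitudes comparable across temperatures.
    return F.kl_div(log_p_student, p_teacher, reduction="batchmean") * T * T
```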
- RPT++: Customized Feature Representation for Siamese Visual Tracking [16.305972000224358]
We argue that the performance gain of visual tracking is limited when classification and regression share the same features, since features extracted from the salient area provide recognizable visual patterns for classification but are less suited to precise state estimation.
We propose two customized feature extractors, named polar pooling and extreme pooling, to capture task-specific visual patterns.
We demonstrate the effectiveness of the task-specific feature representation by integrating it into the recent and advanced tracker RPT.
arXiv Detail & Related papers (2021-10-23T10:58:57Z)
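The pooling operators above are paper-specific; the following is only a plausible, hypothetical reading of "extreme pooling" (axis-wise max profiles that emphasize boundary extremes), not the verified definition from RPT++.

```python
import torch

def extreme_pooling(feat: torch.Tensor):
    """Hypothetical sketch: collapse each spatial axis with a max so
    responses at boundary extremes (top/bottom/left/right-most parts
    of the target) dominate the pooled profiles.

    feat: (B, C, H, W)
    returns: (B, C, H) vertical profile, (B, C, W) horizontal profile
    """
    vertical = feat.max(dim=3).values    # max over width  -> per-row peaks
    horizontal = feat.max(dim=2).values  # max over height -> per-column peaks
    return vertical, horizontal
```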
- Salient Objects in Clutter [130.63976772770368]
This paper identifies and addresses a serious design bias of existing salient object detection (SOD) datasets.
This design bias has led to a saturation in performance for state-of-the-art SOD models when evaluated on existing datasets.
We propose a new high-quality dataset and update the previous saliency benchmark.
arXiv Detail & Related papers (2021-05-07T03:49:26Z)
- Structure-Consistent Weakly Supervised Salient Object Detection with Local Saliency Coherence [14.79639149658596]
We propose a one-round end-to-end training approach for weakly supervised salient object detection via scribble annotations.
Our method achieves a new state-of-the-art performance on six benchmarks.
arXiv Detail & Related papers (2020-12-08T12:49:40Z)
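As a hedged sketch of the scribble-supervision setting above: the partial cross-entropy below is a common building block in this line of work, not the paper's full loss, which additionally enforces local saliency coherence.

```python
import torch.nn.functional as F

def partial_bce(pred, scribble, ignore_value=255):
    """Supervise only the scribbled pixels; leave the rest unlabeled.

    pred: (B, 1, H, W) saliency logits.
    scribble: (B, H, W) with 1 = foreground, 0 = background scribbles,
              and ignore_value marking unlabeled pixels.
    """
    labeled = scribble != ignore_value
    target = scribble.clamp(max=1).float()
    loss = F.binary_cross_entropy_with_logits(
        pred.squeeze(1), target, reduction="none")
    return loss[labeled].mean()  # average over annotated pixels only
```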
- Graph Attention Tracking [76.19829750144564]
We propose a simple target-aware Siamese graph attention network for general object tracking.
Experiments on challenging benchmarks including GOT-10k, UAV123, OTB-100 and LaSOT demonstrate that the proposed SiamGAT outperforms many state-of-the-art trackers.
arXiv Detail & Related papers (2020-11-23T04:26:45Z)
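In the spirit of the graph attention matching above (a minimal sketch with assumed shapes, not SiamGAT's exact module): each search-region location aggregates messages from all template locations via attention.

```python
import torch

def graph_attention_fusion(template, search):
    """Template-to-search information passing via attention.

    template: (B, C, Ht, Wt), search: (B, C, Hs, Ws)
    returns: (B, C, Hs, Ws) search features enriched with template info.
    """
    B, C, Hs, Ws = search.shape
    t = template.flatten(2).transpose(1, 2)   # (B, Nt, C) template nodes
    s = search.flatten(2).transpose(1, 2)     # (B, Ns, C) search nodes
    # Scaled dot-product attention over template nodes.
    attn = torch.softmax(s @ t.transpose(1, 2) / C ** 0.5, dim=-1)  # (B, Ns, Nt)
    fused = attn @ t                          # aggregate template messages
    return fused.transpose(1, 2).reshape(B, C, Hs, Ws)
```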
- Visual Tracking by TridentAlign and Context Embedding [71.60159881028432]
We propose novel TridentAlign and context embedding modules for Siamese network-based visual tracking methods.
The proposed tracker performs comparably to state-of-the-art trackers while running at real-time speed.
arXiv Detail & Related papers (2020-07-14T08:00:26Z)
- Benchmarking Unsupervised Object Representations for Video Sequences [111.81492107649889]
We compare the perceptual abilities of four object-centric approaches: ViMON, OP3, TBA and SCALOR.
Our results suggest that the architectures with unconstrained latent representations learn more powerful representations in terms of object detection, segmentation and tracking.
Our benchmark may provide fruitful guidance towards learning more robust object-centric video representations.
arXiv Detail & Related papers (2020-06-12T09:37:24Z)