A Discriminative Single-Shot Segmentation Network for Visual Object Tracking
- URL: http://arxiv.org/abs/2112.11846v1
- Date: Wed, 22 Dec 2021 12:48:51 GMT
- Title: A Discriminative Single-Shot Segmentation Network for Visual Object Tracking
- Authors: Alan Lukežič, Jiří Matas, Matej Kristan
- Abstract summary: We propose a discriminative single-shot segmentation tracker -- D3S2.
A single-shot network applies two target models with complementary geometric properties.
D3S2 outperforms the leading segmentation tracker SiamMask on video object segmentation benchmarks.
- Score: 13.375369415113534
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Template-based discriminative trackers are currently the dominant tracking
paradigm due to their robustness, but are restricted to bounding box tracking
and a limited range of transformation models, which reduces their localization
accuracy. We propose a discriminative single-shot segmentation tracker -- D3S2,
which narrows the gap between visual object tracking and video object
segmentation. A single-shot network applies two target models with
complementary geometric properties: one invariant to a broad range of
transformations, including non-rigid deformations, and the other assuming a
rigid object. Together they achieve robust online target segmentation. The overall
tracking reliability is further increased by decoupling the object and feature
scale estimation. Without per-dataset finetuning, and trained only for
segmentation as the primary output, D3S2 outperforms all published trackers on
the recent short-term tracking benchmark VOT2020 and performs very close to the
state-of-the-art trackers on GOT-10k, TrackingNet, OTB100, and LaSOT. D3S2
outperforms the leading segmentation tracker SiamMask on video object
segmentation benchmarks and performs on par with top video object segmentation
algorithms.
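The two-model idea can be illustrated with a minimal sketch, assuming a PyTorch-style segmentation head; the module and parameter names below (TwoModelSegHead, fg_proto, template) are hypothetical, and this is not the authors' implementation. One branch scores pixels against deformation-invariant foreground/background prototypes, the other correlates a rigid template, and a small fusion head combines both cues into segmentation logits.

```python
# Hypothetical sketch of a two-branch target model, loosely following the
# D3S2 idea of fusing a transformation-invariant cue with a rigid-template cue.
# Not the authors' implementation; all names and shapes are illustrative.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TwoModelSegHead(nn.Module):
    def __init__(self, feat_dim=64):
        super().__init__()
        # Invariant branch: per-pixel similarity to stored foreground and
        # background prototypes (robust to deformation, weak on shape detail).
        self.fg_proto = nn.Parameter(torch.randn(feat_dim))
        self.bg_proto = nn.Parameter(torch.randn(feat_dim))
        # Rigid branch: a fixed template correlated with the search features
        # (assumes a rigid object, gives precise localization).
        self.template = nn.Parameter(torch.randn(1, feat_dim, 7, 7))
        # Fusion head turns the stacked cues into segmentation logits.
        self.fuse = nn.Conv2d(3, 1, kernel_size=3, padding=1)

    def forward(self, feats):  # feats: (B, C, H, W) search-region features
        f = F.normalize(feats, dim=1)
        fg = (f * F.normalize(self.fg_proto, dim=0)[None, :, None, None]).sum(1, keepdim=True)
        bg = (f * F.normalize(self.bg_proto, dim=0)[None, :, None, None]).sum(1, keepdim=True)
        corr = F.conv2d(feats, self.template, padding=3)  # rigid correlation map
        cues = torch.cat([fg, bg, corr], dim=1)           # (B, 3, H, W)
        return self.fuse(cues)                            # segmentation logits

mask_logits = TwoModelSegHead()(torch.randn(1, 64, 32, 32))
print(mask_logits.shape)  # torch.Size([1, 1, 32, 32])
```

A real tracker would refine these coarse logits to a full-resolution mask; the sketch stops at the fusion step to keep the two complementary cues visible.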
Related papers
- 3D-Aware Instance Segmentation and Tracking in Egocentric Videos [107.10661490652822]
Egocentric videos present unique challenges for 3D scene understanding.
This paper introduces a novel approach to instance segmentation and tracking in first-person video.
By incorporating spatial and temporal cues, we achieve superior performance compared to state-of-the-art 2D approaches.
arXiv Detail & Related papers (2024-08-19T10:08:25Z) - BiTrack: Bidirectional Offline 3D Multi-Object Tracking Using Camera-LiDAR Data [11.17376076195671]
"BiTrack" is a 3D OMOT framework that includes modules of 2D-3D detection fusion, initial trajectory generation, and bidirectional trajectory re-optimization.
The experiment results on the KITTI dataset demonstrate that BiTrack achieves the state-of-the-art performance for 3D OMOT tasks in terms of accuracy and efficiency.
arXiv Detail & Related papers (2024-06-26T15:09:54Z) - CXTrack: Improving 3D Point Cloud Tracking with Contextual Information [59.55870742072618]
3D single object tracking plays an essential role in many applications, such as autonomous driving.
We propose CXTrack, a novel transformer-based network for 3D object tracking.
We show that CXTrack achieves state-of-the-art tracking performance while running at 29 FPS.
arXiv Detail & Related papers (2022-11-12T11:29:01Z) - InterTrack: Interaction Transformer for 3D Multi-Object Tracking [9.283656931246645]
3D multi-object tracking (MOT) is a key problem for autonomous vehicles.
Our proposed solution, InterTrack, generates discriminative object representations for data association.
We validate our approach on the nuScenes 3D MOT benchmark, where we observe significant improvements.
arXiv Detail & Related papers (2022-08-17T03:24:36Z) - Learning Dynamic Compact Memory Embedding for Deformable Visual Object Tracking [82.34356879078955]
We propose a compact memory embedding to enhance the discrimination of the segmentation-based deformable visual tracking method.
Our method outperforms strong segmentation-based trackers, namely D3S and SiamMask, on the DAVIS 2017 benchmark.
arXiv Detail & Related papers (2021-11-23T03:07:12Z) - Track to Detect and Segment: An Online Multi-Object Tracker [81.15608245513208]
TraDeS is an online joint detection and tracking model, exploiting tracking clues to assist detection end-to-end.
TraDeS infers object tracking offsets from a cost volume and uses them to propagate previous object features.
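As a rough illustration of the cost-volume idea, the sketch below computes dense similarities between consecutive frames and reads out per-pixel offsets with a soft-argmax. This is a hypothetical reading of the mechanism, not the TraDeS implementation; the function name tracking_offsets and the shapes are assumptions.

```python
# Hypothetical sketch of cost-volume-based tracking offsets, in the spirit of
# TraDeS (not the authors' code; names and shapes are illustrative).
import torch
import torch.nn.functional as F

def tracking_offsets(feat_prev, feat_curr):
    """feat_*: (C, H, W) L2-normalized feature maps of consecutive frames."""
    C, H, W = feat_curr.shape
    f_curr = feat_curr.flatten(1)                 # (C, H*W)
    f_prev = feat_prev.flatten(1)                 # (C, H*W)
    # Cost volume: similarity of every current location to every previous one.
    cost = f_curr.t() @ f_prev                    # (H*W, H*W)
    attn = cost.softmax(dim=1)                    # match distribution per pixel
    # Grid of previous-frame coordinates, flattened to (H*W, 2).
    ys, xs = torch.meshgrid(torch.arange(H), torch.arange(W), indexing="ij")
    coords = torch.stack([xs, ys], dim=-1).float().reshape(-1, 2)
    # Soft-argmax: expected previous position of each current pixel, minus the
    # pixel's own position, gives a backward tracking offset.
    expected_prev = attn @ coords                 # (H*W, 2)
    return (expected_prev - coords).reshape(H, W, 2)  # per-pixel (dx, dy)

off = tracking_offsets(F.normalize(torch.randn(64, 32, 32), dim=0),
                       F.normalize(torch.randn(64, 32, 32), dim=0))
print(off.shape)  # torch.Size([32, 32, 2])
```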
arXiv Detail & Related papers (2021-03-16T02:34:06Z) - Learning Spatio-Appearance Memory Network for High-Performance Visual Tracking [79.80401607146987]
Existing object trackers usually learn a bounding-box-based template to match visual targets across frames, which cannot accurately learn a pixel-wise representation.
This paper presents a novel segmentation-based tracking architecture equipped with a spatio-appearance memory network to learn accurate spatio-temporal correspondence.
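A minimal sketch of the pixel-wise memory readout underlying such architectures, written in the spirit of generic space-time memory matching rather than this paper's exact spatio-appearance design; all names are illustrative.

```python
# Hypothetical sketch of a key-value memory read for pixel-wise tracking
# (generic space-time memory matching; not this paper's exact architecture).
import torch

def memory_read(mem_key, mem_val, query_key):
    """mem_key: (Ck, N) keys of stored frames, mem_val: (Cv, N) their values,
    query_key: (Ck, H, W) keys of the current frame."""
    Ck, H, W = query_key.shape
    q = query_key.flatten(1)                 # (Ck, H*W)
    attn = (q.t() @ mem_key).softmax(dim=1)  # (H*W, N) match weights
    read = (attn @ mem_val.t()).t()          # (Cv, H*W) retrieved values
    return read.reshape(-1, H, W)            # per-pixel memory readout

readout = memory_read(torch.randn(32, 2048), torch.randn(128, 2048),
                      torch.randn(32, 24, 24))
print(readout.shape)  # torch.Size([128, 24, 24])
```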
arXiv Detail & Related papers (2020-09-21T08:12:02Z) - Fast Video Object Segmentation With Temporal Aggregation Network and Dynamic Template Matching [67.02962970820505]
We introduce "tracking-by-detection" into video object segmentation (VOS).
We propose a new temporal aggregation network and a novel dynamic time-evolving template matching mechanism to achieve significantly improved performance.
Without complicated bells and whistles, we achieve new state-of-the-art performance on the DAVIS benchmark in both speed and accuracy, running at 0.14 seconds per frame with a J&F measure of 75.9%.
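A minimal sketch of a dynamic, time-evolving template, assuming an exponential-moving-average update and plain correlation matching; the paper's actual mechanism is more elaborate, so treat this as illustrative only.

```python
# Hypothetical sketch of time-evolving template matching (illustrative only;
# not the paper's mechanism in detail).
import torch
import torch.nn.functional as F

def update_template(template, curr_feat, alpha=0.1):
    """Exponential moving average keeps the template adapted over time."""
    return F.normalize((1 - alpha) * template + alpha * curr_feat, dim=0)

def match(template, search_feat):
    """template: (C, h, w); search_feat: (C, H, W) -> (H, W) similarity map."""
    return F.conv2d(search_feat[None], template[None],
                    padding=(template.shape[1] // 2, template.shape[2] // 2))[0, 0]

tmpl = F.normalize(torch.randn(64, 7, 7), dim=0)
score = match(tmpl, torch.randn(64, 32, 32))      # locate the target
tmpl = update_template(tmpl, torch.randn(64, 7, 7))  # evolve the template
print(score.shape)  # torch.Size([32, 32])
```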
arXiv Detail & Related papers (2020-07-11T05:44:16Z)