Robust Visual Object Tracking with Two-Stream Residual Convolutional Networks
- URL: http://arxiv.org/abs/2005.06536v1
- Date: Wed, 13 May 2020 19:05:42 GMT
- Title: Robust Visual Object Tracking with Two-Stream Residual Convolutional Networks
- Authors: Ning Zhang, Jingen Liu, Ke Wang, Dan Zeng, Tao Mei
- Abstract summary: We propose a Two-Stream Residual Convolutional Network (TS-RCN) for visual tracking.
Our TS-RCN can be integrated with existing deep-learning-based visual trackers.
To further improve tracking performance, we adopt a "wider" residual network, ResNeXt, as the feature-extraction backbone.
- Score: 62.836429958476735
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Current deep-learning-based visual tracking approaches have been very
successful at learning target classification and/or estimation models from
large amounts of supervised training data in offline mode. However, most of them
can still fail to track objects under more challenging conditions such as
dense distractor objects, cluttered backgrounds, and motion blur.
Inspired by the human visual tracking capability, which leverages motion cues
to distinguish the target from the background, we propose a Two-Stream Residual
Convolutional Network (TS-RCN) for visual tracking that exploits both
appearance and motion features for the model update. Our TS-RCN can be
integrated with existing deep-learning-based visual trackers. To further
improve tracking performance, we adopt a "wider" residual network, ResNeXt,
as the feature-extraction backbone. To the best of our knowledge, TS-RCN is the
first end-to-end trainable two-stream visual tracking system that makes full
use of both the appearance and motion features of the target. We have
extensively evaluated TS-RCN on the widely used benchmark datasets VOT2018,
VOT2019, and GOT-10K. The experimental results demonstrate that our two-stream
model greatly outperforms its appearance-only counterpart and achieves
state-of-the-art performance. The tracking system runs at up to 38.1 FPS.
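To make the two-stream idea concrete, here is a minimal PyTorch sketch of an appearance-plus-motion backbone in the spirit of the abstract. The module layout, the 2-channel optical-flow input, and the concatenation-based fusion are illustrative assumptions, not the authors' exact architecture.

```python
# Minimal two-stream backbone sketch (not the authors' code). The
# appearance stream takes an RGB crop; the motion stream takes a
# 2-channel optical-flow field; both use a "wider" ResNeXt trunk.
import torch
import torch.nn as nn
from torchvision.models import resnext50_32x4d

class TwoStreamBackbone(nn.Module):
    def __init__(self, feat_dim=256):
        super().__init__()
        appearance = resnext50_32x4d(weights=None)
        self.appearance = nn.Sequential(*list(appearance.children())[:-2])
        motion = resnext50_32x4d(weights=None)
        # Adapt the first conv to the 2-channel flow input.
        motion.conv1 = nn.Conv2d(2, 64, kernel_size=7, stride=2,
                                 padding=3, bias=False)
        self.motion = nn.Sequential(*list(motion.children())[:-2])
        # Fuse the two 2048-channel feature maps with a 1x1 conv.
        self.fuse = nn.Conv2d(2048 * 2, feat_dim, kernel_size=1)

    def forward(self, rgb, flow):
        f_app = self.appearance(rgb)   # (B, 2048, H/32, W/32)
        f_mot = self.motion(flow)      # (B, 2048, H/32, W/32)
        return self.fuse(torch.cat([f_app, f_mot], dim=1))

feats = TwoStreamBackbone()(torch.randn(1, 3, 224, 224),
                            torch.randn(1, 2, 224, 224))
```

A downstream classification or bounding-box estimation head (for example, the one from the base tracker TS-RCN is integrated with) would then consume the fused feature map.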
Related papers
- OneTracker: Unifying Visual Object Tracking with Foundation Models and Efficient Tuning [33.521077115333696]
We present a general framework, termed OneTracker, that unifies various tracking tasks.
OneTracker first performs large-scale pre-training of an RGB tracker called Foundation Tracker.
We then treat other modality information as prompts and build a Prompt Tracker on top of Foundation Tracker.
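A hedged sketch of the prompt-tuning pattern this summary describes: the pretrained foundation model is frozen, and only a small projector that injects the extra modality is trained. The class names and the additive fusion are hypothetical.

```python
# Hypothetical prompt-tuning sketch: freeze a pretrained foundation
# tracker's encoder and train only a lightweight projector that maps an
# auxiliary modality (depth, thermal, ...) into the feature space.
import torch
import torch.nn as nn

class PromptTracker(nn.Module):
    def __init__(self, foundation: nn.Module, prompt_channels: int, feat_dim: int):
        super().__init__()
        self.foundation = foundation
        for p in self.foundation.parameters():
            p.requires_grad = False           # foundation stays frozen
        self.prompt_proj = nn.Conv2d(prompt_channels, feat_dim, 1)

    def forward(self, rgb, prompt):
        with torch.no_grad():
            feats = self.foundation(rgb)      # (B, feat_dim, H, W)
        # Inject the prompt modality as an additive residual; assumes the
        # prompt map is resized to the feature resolution beforehand.
        return feats + self.prompt_proj(prompt)

# Stand-in encoder just to show the call pattern.
encoder = nn.Conv2d(3, 64, 3, padding=1)
out = PromptTracker(encoder, prompt_channels=1, feat_dim=64)(
    torch.randn(1, 3, 64, 64), torch.randn(1, 1, 64, 64))
```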
arXiv Detail & Related papers (2024-03-14T17:59:13Z)
- Tracking with Human-Intent Reasoning [64.69229729784008]
This work proposes a new tracking task, Instruction Tracking, in which trackers receive implicit tracking instructions and must perform tracking in video frames automatically.
TrackGPT is capable of performing complex reasoning-based tracking.
arXiv Detail & Related papers (2023-12-29T03:22:18Z)
- BEVTrack: A Simple and Strong Baseline for 3D Single Object Tracking in Bird's-Eye View [56.77287041917277]
3D Single Object Tracking (SOT) is a fundamental computer vision task and is essential for applications such as autonomous driving.
In this paper, we propose BEVTrack, a simple yet effective baseline method.
By estimating the target motion in Bird's-Eye View (BEV) to perform tracking, BEVTrack keeps its network design, training objectives, and tracking pipeline surprisingly simple while achieving superior performance.
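As an illustration of motion estimation in BEV, here is a small PyTorch sketch that concatenates previous- and current-frame BEV feature maps and regresses a planar translation. The head design and dimensions are assumptions, not the BEVTrack architecture.

```python
# Illustrative BEV motion head (not the BEVTrack code): given BEV feature
# maps from two frames, regress the target's (dx, dy) translation.
import torch
import torch.nn as nn

class BEVMotionHead(nn.Module):
    def __init__(self, in_ch=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_ch * 2, 128, 3, padding=1), nn.ReLU(),
            nn.Conv2d(128, 128, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(128, 2),                 # (dx, dy) in BEV coordinates
        )

    def forward(self, bev_prev, bev_curr):
        return self.net(torch.cat([bev_prev, bev_curr], dim=1))

offset = BEVMotionHead()(torch.randn(1, 64, 128, 128),
                         torch.randn(1, 64, 128, 128))
```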
arXiv Detail & Related papers (2023-09-05T12:42:26Z)
- Simple Cues Lead to a Strong Multi-Object Tracker [3.7189423451031356]
We propose a new tracking-by-detection (TbD) approach for multi-object tracking.
We show that a combination of our appearance features with a simple motion model leads to strong tracking results.
Our tracker generalizes to four public datasets, namely MOT17, MOT20, BDD100k, and DanceTrack, achieving state-of-the-art performance.
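The cue combination this summary describes can be sketched as a cost matrix over appearance similarity and box overlap, solved with the Hungarian algorithm; the 0.7/0.3 weighting below is an illustrative assumption.

```python
# Sketch of two-cue tracking-by-detection association: cosine appearance
# similarity plus IoU overlap, solved with the Hungarian algorithm.
import numpy as np
from scipy.optimize import linear_sum_assignment

def iou(a, b):
    """IoU of two [x1, y1, x2, y2] boxes."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    return inter / (area(a) + area(b) - inter + 1e-9)

def associate(track_boxes, track_embs, det_boxes, det_embs, w_app=0.7):
    # Embeddings are assumed L2-normalized, so the dot product is cosine
    # similarity; IoU acts as the simple motion cue.
    app = track_embs @ det_embs.T
    mot = np.array([[iou(t, d) for d in det_boxes] for t in track_boxes])
    cost = -(w_app * app + (1.0 - w_app) * mot)
    rows, cols = linear_sum_assignment(cost)
    return list(zip(rows, cols))           # (track_idx, detection_idx)
```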
arXiv Detail & Related papers (2022-06-09T17:55:51Z)
- Learnable Online Graph Representations for 3D Multi-Object Tracking [156.58876381318402]
We propose a unified, learning-based approach to the 3D MOT problem.
We employ a Neural Message Passing network for data association that is fully trainable.
We show the merit of the proposed approach on the publicly available nuScenes dataset by achieving state-of-the-art performance of 65.6% AMOTA and 58% fewer ID-switches.
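A toy sketch of one neural message-passing step over a track/detection association graph, in the spirit of this summary: edge features are updated from their endpoint nodes and then scored as association logits. The dimensions and MLPs are assumptions.

```python
# One message-passing step on an association graph: each edge connects a
# track node to a detection node and is scored for data association.
import torch
import torch.nn as nn

class EdgeUpdate(nn.Module):
    def __init__(self, node_dim=32, edge_dim=16):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(2 * node_dim + edge_dim, 64), nn.ReLU(),
            nn.Linear(64, edge_dim),
        )
        self.score = nn.Linear(edge_dim, 1)   # association logit per edge

    def forward(self, nodes, edges, edge_index):
        # edge_index: (2, E) long tensor of (track_node, det_node) pairs.
        src, dst = edge_index
        msg = torch.cat([nodes[src], nodes[dst], edges], dim=-1)
        edges = edges + self.mlp(msg)         # residual edge update
        return edges, self.score(edges).squeeze(-1)

nodes = torch.randn(6, 32)                    # 3 tracks + 3 detections
edges = torch.randn(9, 16)                    # fully connected bipartite
edge_index = torch.tensor([[0, 0, 0, 1, 1, 1, 2, 2, 2],
                           [3, 4, 5, 3, 4, 5, 3, 4, 5]])
edges, logits = EdgeUpdate()(nodes, edges, edge_index)
```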
arXiv Detail & Related papers (2021-04-23T17:59:28Z)
- Track to Detect and Segment: An Online Multi-Object Tracker [81.15608245513208]
TraDeS is an online joint detection and tracking model that exploits tracking cues to assist detection end-to-end.
TraDeS infers the object tracking offset from a cost volume, which is used to propagate previous object features.
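The cost-volume idea can be sketched as a local correlation between current and previous feature maps, followed by a soft-argmax that turns matching scores into a per-location offset; the window size and normalization are assumptions, not the TraDeS implementation.

```python
# Local cost volume + soft-argmax offset (a sketch, not TraDeS itself).
import torch
import torch.nn.functional as F

def cost_volume_offset(feat_curr, feat_prev, radius=4):
    B, C, H, W = feat_curr.shape
    k = 2 * radius + 1
    # Correlate each current-frame location against a (2r+1)^2 window of
    # the previous frame, gathered with unfold.
    prev = F.unfold(feat_prev, k, padding=radius)        # (B, C*k*k, H*W)
    prev = prev.view(B, C, k * k, H * W)
    curr = feat_curr.view(B, C, 1, H * W)
    cost = (curr * prev).sum(1)                          # (B, k*k, H*W)
    prob = cost.softmax(dim=1)
    # Soft-argmax over the window yields a sub-pixel (dy, dx) per location.
    dy, dx = torch.meshgrid(torch.arange(-radius, radius + 1),
                            torch.arange(-radius, radius + 1), indexing="ij")
    grid = torch.stack([dy.flatten(), dx.flatten()]).float()   # (2, k*k)
    offset = torch.einsum("bkn,ck->bcn", prob, grid)
    return offset.view(B, 2, H, W)

offsets = cost_volume_offset(torch.randn(1, 32, 16, 16),
                             torch.randn(1, 32, 16, 16))
```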
arXiv Detail & Related papers (2021-03-16T02:34:06Z)
- Coarse-to-Fine Object Tracking Using Deep Features and Correlation Filters [2.3526458707956643]
This paper presents a novel deep learning tracking algorithm.
We exploit the generalization ability of deep features to coarsely estimate target translation.
Then, we capitalize on the discriminative power of correlation filters to precisely localize the tracked object.
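For the correlation-filter stage, a minimal MOSSE-style filter in NumPy illustrates the precise-localization step: learn a filter in the Fourier domain against a Gaussian target response, then take the response peak. This is a generic correlation-filter sketch, not the paper's exact formulation.

```python
# Minimal MOSSE-style correlation filter (single template, grayscale).
import numpy as np

def train_filter(template, sigma=2.0, lam=1e-2):
    h, w = template.shape
    yy, xx = np.mgrid[0:h, 0:w]
    # Desired response: a Gaussian peaked at the template center.
    g = np.exp(-((yy - h // 2) ** 2 + (xx - w // 2) ** 2) / (2 * sigma ** 2))
    G, F_ = np.fft.fft2(g), np.fft.fft2(template)
    # Regularized closed-form solution in the frequency domain.
    return G * np.conj(F_) / (F_ * np.conj(F_) + lam)

def localize(H_filt, patch):
    response = np.real(np.fft.ifft2(H_filt * np.fft.fft2(patch)))
    return np.unravel_index(response.argmax(), response.shape)  # (row, col)

tmpl = np.random.rand(64, 64)
H_filt = train_filter(tmpl)
peak = localize(H_filt, tmpl)   # peak near the center for the template
```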
arXiv Detail & Related papers (2020-12-23T16:43:21Z)
- TRAT: Tracking by Attention Using Spatio-Temporal Features [14.520067060603209]
We propose a two-stream deep neural network tracker that uses both spatial and temporal features.
Our architecture is built on the ATOM tracker and contains two backbones: (i) a 2D-CNN to capture appearance features and (ii) a 3D-CNN to capture motion features.
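The 2D + 3D backbone pairing can be sketched as follows; the specific torchvision models and the late concatenation are illustrative stand-ins for the summary's two-backbone design.

```python
# Sketch of a 2D (appearance) + 3D (motion) two-backbone encoder.
import torch
import torch.nn as nn
from torchvision.models import resnet18
from torchvision.models.video import r3d_18

class AppearanceMotionNet(nn.Module):
    def __init__(self):
        super().__init__()
        # Drop the classification heads; keep globally pooled features.
        self.cnn2d = nn.Sequential(*list(resnet18(weights=None).children())[:-1])
        self.cnn3d = nn.Sequential(*list(r3d_18(weights=None).children())[:-1])
        self.head = nn.Linear(512 + 512, 256)

    def forward(self, frame, clip):
        a = self.cnn2d(frame).flatten(1)   # frame: (B, 3, H, W) -> (B, 512)
        m = self.cnn3d(clip).flatten(1)    # clip: (B, 3, T, H, W) -> (B, 512)
        return self.head(torch.cat([a, m], dim=1))

out = AppearanceMotionNet()(torch.randn(1, 3, 112, 112),
                            torch.randn(1, 3, 8, 112, 112))
```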
arXiv Detail & Related papers (2020-11-18T20:11:12Z)
- Unsupervised Deep Representation Learning for Real-Time Tracking [137.69689503237893]
We propose an unsupervised learning method for visual tracking.
Our unsupervised learning is motivated by the observation that a robust tracker should be effective in bidirectional tracking.
We build our framework on a Siamese correlation filter network, and propose a multi-frame validation scheme and a cost-sensitive loss to facilitate unsupervised learning.
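The bidirectional-tracking motivation lends itself to a forward-backward (cycle) consistency loss: track forward through a clip, then backward, and penalize drift from the starting box. The sketch below assumes a differentiable single-step tracker; the paper's cost-sensitive weighting is omitted.

```python
# Forward-backward cycle-consistency loss for unsupervised tracking.
# `tracker_step(frame_a, frame_b, box)` is a placeholder for any
# differentiable one-step tracker that moves a box from frame_a to frame_b.
import torch
import torch.nn.functional as F

def cycle_consistency_loss(tracker_step, frames, init_box):
    box = init_box
    for t in range(1, len(frames)):              # track forward
        box = tracker_step(frames[t - 1], frames[t], box)
    for t in range(len(frames) - 1, 0, -1):      # track backward
        box = tracker_step(frames[t], frames[t - 1], box)
    # A robust tracker should return to where it started.
    return F.smooth_l1_loss(box, init_box)
```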
arXiv Detail & Related papers (2020-07-22T08:23:12Z)
- Rethinking Convolutional Features in Correlation Filter Based Tracking [0.0]
We revisit a hierarchical deep feature-based visual tracker and find that both the performance and efficiency of the deep tracker are limited by poor feature quality.
After removing redundant features, our proposed tracker achieves significant improvements in both performance and efficiency.
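One simple way to realize "removing redundant features" is to rank feature channels by a saliency statistic on the target region and keep only the top-k; the variance criterion below is an illustrative choice, not necessarily the paper's.

```python
# Illustrative channel pruning: keep the k most responsive channels of a
# deep feature map before handing them to a correlation filter.
import torch

def select_channels(feats, k=64):
    # feats: (C, H, W) deep features extracted from the target region.
    saliency = feats.var(dim=(1, 2))       # per-channel response variance
    keep = saliency.topk(k).indices
    return feats[keep], keep               # pruned features + kept indices

pruned, idx = select_channels(torch.randn(512, 31, 31))
```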
arXiv Detail & Related papers (2019-12-30T04:39:38Z)
This list is automatically generated from the titles and abstracts of the papers in this site.