Object-Adaptive LSTM Network for Real-time Visual Tracking with
Adversarial Data Augmentation
- URL: http://arxiv.org/abs/2002.02598v1
- Date: Fri, 7 Feb 2020 03:06:07 GMT
- Title: Object-Adaptive LSTM Network for Real-time Visual Tracking with
Adversarial Data Augmentation
- Authors: Yihan Du, Yan Yan, Si Chen, Yang Hua
- Abstract summary: We propose a novel real-time visual tracking method, which adopts an object-adaptive LSTM network to effectively capture sequential dependencies in the video and adaptively learn object appearance variations.
Experiments on four visual tracking benchmarks demonstrate the state-of-the-art performance of our method in terms of both tracking accuracy and speed.
- Score: 31.842910084312265
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In recent years, deep learning based visual tracking methods have obtained
great success owing to the powerful feature representation ability of
Convolutional Neural Networks (CNNs). Among these methods, classification-based
tracking methods exhibit excellent performance, but their speed is heavily
limited by the expensive computation required for massive proposal feature extraction.
In contrast, matching-based tracking methods (such as Siamese networks) offer
a remarkable speed advantage. However, the absence of online updating leaves
these methods unable to adapt to significant object appearance variations. In this
paper, we propose a novel real-time visual tracking method, which adopts an
object-adaptive LSTM network to effectively capture sequential dependencies
in the video and adaptively learn object appearance variations. For high
computational efficiency, we also present a fast proposal selection strategy,
which utilizes the matching-based tracking method to pre-estimate dense
proposals and selects high-quality ones to feed to the LSTM network for
classification. This strategy efficiently filters out irrelevant proposals
and avoids redundant feature-extraction computation, which enables our
method to operate faster than conventional classification-based tracking
methods. In addition, to handle the problems of sample inadequacy and class
imbalance during online tracking, we adopt a data augmentation technique based
on the Generative Adversarial Network (GAN) to facilitate the training of the
LSTM network. Extensive experiments on four visual tracking benchmarks
demonstrate the state-of-the-art performance of our method in terms of both
tracking accuracy and speed, demonstrating the great potential of recurrent
structures for visual tracking.
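To make the proposal-selection idea above concrete, here is a minimal sketch, assuming a SiamFC-style matching branch that has already produced one score per dense proposal; the names (select_proposals, ProposalLSTM), feature dimension, and two-class head are hypothetical illustrations, not the authors' implementation.

```python
import torch
import torch.nn as nn

def select_proposals(match_scores, boxes, k=16):
    """Keep the k proposals with the highest matching (Siamese) scores,
    so expensive per-proposal feature extraction runs on few candidates.

    match_scores : (N,) pre-estimated matching scores for N dense proposals
    boxes        : (N, 4) proposal boxes as (x, y, w, h)
    """
    scores, idx = torch.topk(match_scores, k)
    return boxes[idx], scores

class ProposalLSTM(nn.Module):
    """Object-adaptive LSTM that classifies selected proposals while
    carrying a hidden state across frames to absorb appearance changes."""
    def __init__(self, feat_dim=512, hidden_dim=256):
        super().__init__()
        self.lstm = nn.LSTM(feat_dim, hidden_dim, batch_first=True)
        self.head = nn.Linear(hidden_dim, 2)  # target vs. background

    def forward(self, feats, state=None):
        # feats: (num_proposals, seq_len, feat_dim) proposal feature sequences
        out, state = self.lstm(feats, state)
        return self.head(out[:, -1]), state   # score the most recent frame
```

Because top-k selection reuses scores the matching branch has already computed, the filtering step is essentially free; the classifier then processes only k proposals instead of the full dense set, which is where the speedup over conventional classification-based trackers comes from.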
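The GAN-based augmentation targets the other bottleneck named in the abstract: online tracking yields few positive samples against many negatives. Below is a hedged sketch of one plausible realization; the generator architecture, dimensions, and rebalancing rule are assumptions, and the discriminator and its adversarial training loop are omitted for brevity.

```python
import torch
import torch.nn as nn

class PositiveFeatureGenerator(nn.Module):
    """Toy GAN generator mapping noise vectors to synthetic positive-sample
    features; an adversarially trained discriminator (omitted) supervises it."""
    def __init__(self, noise_dim=64, feat_dim=512):
        super().__init__()
        self.noise_dim = noise_dim
        self.net = nn.Sequential(
            nn.Linear(noise_dim, 256),
            nn.ReLU(),
            nn.Linear(256, feat_dim),
        )

    def forward(self, n):
        # Sample n noise vectors and map each to a synthetic positive feature.
        return self.net(torch.randn(n, self.noise_dim))

def rebalance(pos_feats, neg_feats, generator):
    """Pad the scarce positive class with generated features so the online
    classifier update sees a roughly balanced positive/negative mix."""
    deficit = max(len(neg_feats) - len(pos_feats), 0)
    with torch.no_grad():
        fake_pos = generator(deficit)
    return torch.cat([pos_feats, fake_pos], dim=0), neg_feats
```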
Related papers
- Task-Oriented Real-time Visual Inference for IoVT Systems: A Co-design Framework of Neural Networks and Edge Deployment [61.20689382879937]
Task-oriented edge computing addresses this challenge by shifting data analysis to the edge.
Existing methods struggle to balance high model performance with low resource consumption.
We propose a novel co-design framework to optimize neural network architecture.
arXiv Detail & Related papers (2024-10-29T19:02:54Z) - Online Network Source Optimization with Graph-Kernel MAB [62.6067511147939]
We propose Grab-UCB, a graph-kernel multi-armed bandit algorithm to learn the optimal source placement in large-scale networks online.
We describe the network processes with an adaptive graph dictionary model, which typically leads to sparse spectral representations.
We derive the performance guarantees that depend on network parameters, which further influence the learning curve of the sequential decision strategy.
arXiv Detail & Related papers (2023-07-07T15:03:42Z) - Adaptive Siamese Tracking with a Compact Latent Network [219.38172719948048]
We present an intuitive view that simplifies Siamese-based trackers by converting the tracking task into a classification problem.
Under this view, we perform an in-depth analysis of these trackers through visual simulations and real tracking examples.
We apply it to adjust three classical Siamese-based trackers, namely SiamRPN++, SiamFC, and SiamBAN.
arXiv Detail & Related papers (2023-02-02T08:06:02Z) - Towards Sequence-Level Training for Visual Tracking [60.95799261482857]
This work introduces a sequence-level training strategy for visual tracking based on reinforcement learning.
Four representative tracking models, SiamRPN++, SiamAttn, TransT, and TrDiMP, consistently improve when the proposed methods are incorporated into training.
arXiv Detail & Related papers (2022-08-11T13:15:36Z) - Video Annotation for Visual Tracking via Selection and Refinement [74.08109740917122]
We present a new framework to facilitate bounding box annotations for video sequences.
A temporal assessment network is proposed that captures the temporal coherence of target locations.
A visual-geometry refinement network is also designed to further enhance the selected tracking results.
arXiv Detail & Related papers (2021-08-09T05:56:47Z) - Online Descriptor Enhancement via Self-Labelling Triplets for Visual
Data Association [28.03285334702022]
We propose a self-supervised method for incrementally refining visual descriptors to improve performance in the task of object-level visual data association.
Our method optimizes deep descriptor generators online by continuously training a widely available image classification network pre-trained with domain-independent data.
We show that our approach surpasses other visual data-association methods applied to a tracking-by-detection task, and that it provides larger performance gains than other methods that attempt to adapt to observed information.
arXiv Detail & Related papers (2020-11-06T17:42:04Z) - Object Tracking through Residual and Dense LSTMs [67.98948222599849]
Deep trackers built on LSTM (Long Short-Term Memory) recurrent neural networks have emerged as a powerful alternative.
Dense LSTMs outperform residual and regular LSTMs, and offer higher resilience to nuisances.
Our case study supports the adoption of residual-based RNNs for enhancing the robustness of other trackers.
arXiv Detail & Related papers (2020-06-22T08:20:17Z) - Efficient Scale Estimation Methods using Lightweight Deep Convolutional
Neural Networks for Visual Tracking [16.439797365064003]
This paper exploits pre-trained lightweight CNN models to propose two efficient scale estimation methods.
The proposed methods are formulated based on either holistic or region representation of convolutional feature maps.
They exploit a proposed one-pass feature extraction process that significantly improves computational efficiency.
arXiv Detail & Related papers (2020-04-06T18:49:37Z) - Effective Fusion of Deep Multitasking Representations for Robust Visual
Tracking [34.09763324745818]
This paper aims to evaluate the performance of twelve state-of-the-art ResNet-based feature extraction networks (FENs) in a DCF (discriminative correlation filter) based framework.
It ranks their best feature maps and explores the generalized adoption of the best ResNet-based FEN into another DCF-based method.
The proposed method extracts deep semantic information from a fully convolutional FEN and fuses it with the best ResNet-based feature maps to strengthen the target representation.
arXiv Detail & Related papers (2020-04-03T05:33:59Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.