Effective Fusion of Deep Multitasking Representations for Robust Visual
Tracking
- URL: http://arxiv.org/abs/2004.01382v2
- Date: Mon, 20 Sep 2021 09:24:50 GMT
- Title: Effective Fusion of Deep Multitasking Representations for Robust Visual
Tracking
- Authors: Seyed Mojtaba Marvasti-Zadeh, Hossein Ghanei-Yakhdan, Shohreh Kasaei,
Kamal Nasrollahi, Thomas B. Moeslund
- Abstract summary: This paper aims to evaluate the performance of twelve state-of-the-art ResNet-based FENs in a DCF-based framework.
It ranks their best feature maps and explores the generalized adoption of the best ResNet-based FEN into another DCF-based method.
The proposed method extracts deep semantic information from a fully convolutional FEN and fuses it with the best ResNet-based feature maps to strengthen the target representation.
- Score: 34.09763324745818
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Visual object tracking remains an active research field in computer vision
due to persistent challenges with various problem-specific factors in
real-world scenes. Many existing tracking methods based on discriminative
correlation filters (DCFs) employ feature extraction networks (FENs) to model
the target appearance during the learning process. However, using deep feature
maps extracted from FENs based on different residual neural networks (ResNets)
has not previously been investigated. This paper aims to evaluate the
performance of twelve state-of-the-art ResNet-based FENs in a DCF-based
framework to determine the best for visual tracking purposes. First, it ranks
their best feature maps and explores the generalized adoption of the best
ResNet-based FEN into another DCF-based method. Then, the proposed method
extracts deep semantic information from a fully convolutional FEN and fuses it
with the best ResNet-based feature maps to strengthen the target representation
in the learning process of continuous convolution filters. Finally, it
introduces a new and efficient semantic weighting method (using semantic
segmentation feature maps on each video frame) to reduce the drift problem.
Extensive experimental results on the well-known OTB-2013, OTB-2015, TC-128, and
VOT-2018 visual tracking datasets demonstrate that the proposed method
outperforms state-of-the-art methods in terms of tracking precision and
robustness.
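
As a rough illustration of the fusion and semantic weighting described above, the sketch below weights deep feature maps by a per-pixel foreground probability taken from semantic segmentation logits and concatenates the result. The shapes, the helper name fuse_features, and the channel choices are illustrative assumptions, not the paper's exact pipeline.

```python
# Minimal sketch of the fusion idea: weight deep feature maps by a semantic
# foreground map before learning the correlation filter. Shapes and channel
# choices are illustrative assumptions, not the paper's exact setup.
import torch
import torch.nn.functional as F

def fuse_features(resnet_feats: torch.Tensor,
                  fcn_semantic: torch.Tensor,
                  target_class: int) -> torch.Tensor:
    """resnet_feats: (1, C, H, W) feature maps from the selected ResNet stage.
    fcn_semantic: (1, K, H', W') per-class logits from a fully convolutional FEN.
    Returns semantically weighted, channel-concatenated features."""
    # Resize semantic logits to the spatial size of the ResNet features.
    sem = F.interpolate(fcn_semantic, size=resnet_feats.shape[-2:],
                        mode="bilinear", align_corners=False)
    # Foreground probability for the tracked object's class acts as a
    # per-pixel weight that suppresses background responses (drift reduction).
    fg = sem.softmax(dim=1)[:, target_class:target_class + 1]  # (1, 1, H, W)
    weighted = resnet_feats * fg
    # Fuse the weighted deep features with the raw semantic evidence.
    return torch.cat([weighted, sem], dim=1)

# Toy usage with random tensors standing in for real network outputs.
feats = torch.randn(1, 256, 31, 31)
logits = torch.randn(1, 21, 63, 63)   # e.g., 21 PASCAL-VOC-style classes
fused = fuse_features(feats, logits, target_class=15)
print(fused.shape)  # torch.Size([1, 277, 31, 31])
```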
Related papers
- LEAP-VO: Long-term Effective Any Point Tracking for Visual Odometry [52.131996528655094]
We present the Long-term Effective Any Point Tracking (LEAP) module.
LEAP combines visual, inter-track, and temporal cues with carefully selected anchors for dynamic track estimation.
Based on these traits, we develop LEAP-VO, a robust visual odometry system adept at handling occlusions and dynamic scenes.
arXiv Detail & Related papers (2024-01-03T18:57:27Z)
- Harnessing Diffusion Models for Visual Perception with Meta Prompts [68.78938846041767]
We propose a simple yet effective scheme to harness a diffusion model for visual perception tasks.
We introduce learnable embeddings (meta prompts) to the pre-trained diffusion models to extract proper features for perception.
Our approach achieves new performance records in depth estimation on NYU Depth V2 and KITTI, and in semantic segmentation on Cityscapes.
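
A minimal sketch of the meta-prompt idea as summarized here: a small set of learnable embeddings queries frozen backbone features to produce task-specific tokens. The attention layout, sizes, and the class name MetaPrompts are assumptions, not the paper's architecture.

```python
# Hedged sketch: learnable prompt embeddings attend to frozen backbone
# features; only the prompts (and this attention block) are trained.
import torch
import torch.nn as nn

class MetaPrompts(nn.Module):
    def __init__(self, num_prompts: int = 16, dim: int = 256):
        super().__init__()
        self.prompts = nn.Parameter(torch.randn(num_prompts, dim) * 0.02)
        self.attn = nn.MultiheadAttention(dim, num_heads=8, batch_first=True)

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        """feats: (B, C, H, W) frozen diffusion/backbone features."""
        b, c, h, w = feats.shape
        tokens = feats.flatten(2).transpose(1, 2)          # (B, H*W, C)
        q = self.prompts.unsqueeze(0).expand(b, -1, -1)    # (B, P, C)
        # Prompts query the frozen features to extract task-specific tokens.
        out, _ = self.attn(q, tokens, tokens)              # (B, P, C)
        return out

module = MetaPrompts()
perception_tokens = module(torch.randn(2, 256, 16, 16))
print(perception_tokens.shape)  # torch.Size([2, 16, 256])
```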
arXiv Detail & Related papers (2023-12-22T14:40:55Z)
- Weakly-supervised Contrastive Learning for Unsupervised Object Discovery [52.696041556640516]
Unsupervised object discovery is promising due to its ability to find objects in a generic manner.
We design a semantic-guided self-supervised learning model to extract high-level semantic features from images.
We introduce Principal Component Analysis (PCA) to localize object regions.
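
The PCA step can be illustrated compactly: project per-pixel features onto the first principal component and threshold. The sign-flip heuristic and shapes below are assumptions.

```python
# Sketch of PCA-based localization: project feature vectors onto the first
# principal component and threshold to get a coarse foreground mask.
import numpy as np

def pca_object_mask(feats: np.ndarray) -> np.ndarray:
    """feats: (H, W, C) dense features from a self-supervised backbone."""
    h, w, c = feats.shape
    x = feats.reshape(-1, c).astype(np.float64)
    x -= x.mean(axis=0, keepdims=True)
    # First principal component via SVD of the centered feature matrix.
    _, _, vt = np.linalg.svd(x, full_matrices=False)
    proj = x @ vt[0]
    # Pixels on the positive side of the component are taken as foreground;
    # the sign convention is arbitrary, so flip if background dominates.
    mask = (proj > 0).reshape(h, w)
    if mask.mean() > 0.5:
        mask = ~mask
    return mask

mask = pca_object_mask(np.random.randn(14, 14, 384))
print(mask.shape, mask.dtype)  # (14, 14) bool
```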
arXiv Detail & Related papers (2023-07-07T04:03:48Z)
- Deep Feature Tracker: A Novel Application for Deep Convolutional Neural Networks [0.0]
We propose a novel and unified deep learning-based approach that can learn how to track features reliably.
The proposed network, dubbed Deep-PT, consists of a tracker network built on convolutional cross-correlation.
The network is trained on multiple datasets due to the lack of a specialized dataset for feature tracking.
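
A minimal sketch of the cross-correlation such a tracker relies on, assuming template and search features of hypothetical shapes; note that PyTorch's conv2d already computes cross-correlation (no kernel flip).

```python
# Sketch: correlate a template feature patch against search-region features;
# the response peak gives the displacement. Shapes are illustrative.
import torch
import torch.nn.functional as F

def correlate(template: torch.Tensor, search: torch.Tensor) -> torch.Tensor:
    """template: (1, C, h, w); search: (1, C, H, W) with H >= h, W >= w.
    Returns a (1, 1, H-h+1, W-w+1) response map."""
    # Treating the template as a convolution kernel implements
    # cross-correlation, since conv2d does not flip the kernel.
    return F.conv2d(search, template)

template = torch.randn(1, 64, 8, 8)
search = torch.randn(1, 64, 24, 24)
response = correlate(template, search)
peak = response.flatten().argmax()
dy, dx = divmod(peak.item(), response.shape[-1])
print(response.shape, (dy, dx))  # torch.Size([1, 1, 17, 17]) (row, col)
```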
arXiv Detail & Related papers (2021-07-30T23:24:29Z)
- Adversarial Feature Augmentation and Normalization for Visual Recognition [109.6834687220478]
Recent advances in computer vision take advantage of adversarial data augmentation to improve the generalization ability of classification models.
Here, we present an effective and efficient alternative that advocates adversarial augmentation on intermediate feature embeddings.
We validate the proposed approach across diverse visual recognition tasks with representative backbone networks.
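
A hedged sketch of feature-space adversarial augmentation in the spirit of this summary: one FGSM-style ascent step on an intermediate embedding, with epsilon and the two-term training loss as assumptions.

```python
# Sketch: perturb an intermediate embedding along the loss gradient and
# train on both clean and perturbed embeddings.
import torch
import torch.nn as nn

def adversarial_feature(head: nn.Module, feats: torch.Tensor,
                        labels: torch.Tensor, eps: float = 0.1) -> torch.Tensor:
    feats = feats.detach().requires_grad_(True)
    loss = nn.functional.cross_entropy(head(feats), labels)
    grad, = torch.autograd.grad(loss, feats)
    # One ascent step on the embedding: a worst-case feature augmentation.
    return (feats + eps * grad.sign()).detach()

head = nn.Linear(128, 10)
feats = torch.randn(32, 128)
labels = torch.randint(0, 10, (32,))
adv = adversarial_feature(head, feats, labels)
total_loss = (nn.functional.cross_entropy(head(feats), labels)
              + nn.functional.cross_entropy(head(adv), labels))
print(total_loss.item())
```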
arXiv Detail & Related papers (2021-03-22T20:36:34Z)
- Progressive Self-Guided Loss for Salient Object Detection [102.35488902433896]
We present a progressive self-guided loss function to facilitate deep learning-based salient object detection in images.
Our framework takes advantage of adaptively aggregated multi-scale features to locate and detect salient objects effectively.
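
The loss itself is specific to that paper, but the adaptive multi-scale aggregation it builds on can be sketched: resize stage features to a common grid and blend them with learned softmax weights (this weighting scheme is an assumption).

```python
# Sketch: combine multi-scale backbone features with learned,
# softmax-normalized weights on a common spatial grid.
import torch
import torch.nn as nn
import torch.nn.functional as F

class AdaptiveAggregation(nn.Module):
    def __init__(self, num_scales: int):
        super().__init__()
        self.logits = nn.Parameter(torch.zeros(num_scales))

    def forward(self, feats: list[torch.Tensor]) -> torch.Tensor:
        size = feats[0].shape[-2:]
        resized = [F.interpolate(f, size=size, mode="bilinear",
                                 align_corners=False) for f in feats]
        w = self.logits.softmax(dim=0)
        return sum(wi * fi for wi, fi in zip(w, resized))

agg = AdaptiveAggregation(num_scales=3)
pyramid = [torch.randn(1, 32, s, s) for s in (64, 32, 16)]
fused = agg(pyramid)
print(fused.shape)  # torch.Size([1, 32, 64, 64])
```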
arXiv Detail & Related papers (2021-01-07T07:33:38Z)
- Adaptive Exploitation of Pre-trained Deep Convolutional Neural Networks for Robust Visual Tracking [14.627458410954628]
This paper provides a comprehensive analysis of four commonly used CNN models to determine the best feature maps of each model.
With the aid of analysis results as attribute dictionaries, adaptive exploitation of deep features is proposed to improve the accuracy and robustness of visual trackers.
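
One plausible reading of "best feature maps" is scoring channels by how much of their activation energy falls inside the target box; the criterion below is an illustrative assumption, not the paper's attribute dictionaries.

```python
# Sketch: rank feature channels by target-vs-background energy and keep top-k.
import torch

def rank_channels(feats: torch.Tensor, box: tuple[int, int, int, int],
                  k: int = 64) -> torch.Tensor:
    """feats: (C, H, W); box: (y0, x0, y1, x1) in feature-map coordinates."""
    y0, x0, y1, x1 = box
    energy = feats.abs().sum(dim=(1, 2)) + 1e-8            # (C,)
    inside = feats[:, y0:y1, x0:x1].abs().sum(dim=(1, 2))  # (C,)
    score = inside / energy   # high = channel fires mostly on the target
    return score.topk(k).indices

feats = torch.randn(512, 28, 28)
best = rank_channels(feats, box=(10, 10, 18, 18), k=64)
print(best.shape)  # torch.Size([64])
```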
arXiv Detail & Related papers (2020-08-29T17:09:43Z)
- Efficient Scale Estimation Methods using Lightweight Deep Convolutional Neural Networks for Visual Tracking [16.439797365064003]
This paper exploits pre-trained lightweight CNN models to propose two efficient scale estimation methods.
The proposed methods are formulated based on either holistic or region representation of convolutional feature maps.
They exploit a proposed one-pass feature extraction process that significantly improves computational efficiency.
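
The one-pass idea can be sketched as follows: compute backbone features once per frame, then crop the feature map (not the image) at several scale factors; the scale set and sizes are assumptions.

```python
# Sketch: one forward pass per frame, then multi-scale crops of the
# resulting feature map resized to a canonical size.
import torch
import torch.nn.functional as F

def scale_crops(feats: torch.Tensor, center: tuple[int, int], base: int,
                scales=(0.8, 0.9, 1.0, 1.1, 1.2), out: int = 16) -> torch.Tensor:
    """feats: (1, C, H, W) computed once per frame."""
    cy, cx = center
    crops = []
    for s in scales:
        r = max(1, int(round(base * s / 2)))
        patch = feats[:, :, max(0, cy - r):cy + r, max(0, cx - r):cx + r]
        crops.append(F.interpolate(patch, size=(out, out), mode="bilinear",
                                   align_corners=False))
    return torch.cat(crops)  # (num_scales, C, out, out)

feats = torch.randn(1, 96, 64, 64)
pyramid = scale_crops(feats, center=(32, 32), base=20)
print(pyramid.shape)  # torch.Size([5, 96, 16, 16])
```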
arXiv Detail & Related papers (2020-04-06T18:49:37Z)
- Beyond Background-Aware Correlation Filters: Adaptive Context Modeling by Hand-Crafted and Deep RGB Features for Visual Tracking [16.439797365064003]
An adaptive background-aware correlation filter-based tracker is proposed in this paper.
It effectively models the target appearance by using either the histogram of oriented gradients (HOG) or convolutional neural network (CNN) feature maps.
The proposed method exploits the fast 2D non-maximum suppression (NMS) algorithm and the semantic information comparison to detect challenging situations.
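
A fast 2D NMS over a correlation response map can be written as one max-pool comparison; window size and peak threshold below are assumptions. Multiple surviving peaks signal the challenging (distractor) situations the summary mentions.

```python
# Sketch: a point survives NMS if it equals the max over its neighborhood,
# implemented with a single max-pool over the response map.
import torch
import torch.nn.functional as F

def nms_2d(response: torch.Tensor, window: int = 5, thresh: float = 0.5):
    """response: (H, W). Returns a (num_peaks, 2) tensor of (row, col) peaks."""
    r = response.unsqueeze(0).unsqueeze(0)  # (1, 1, H, W)
    local_max = F.max_pool2d(r, window, stride=1, padding=window // 2)
    peaks = (r == local_max) & (r > thresh)
    return peaks.squeeze().nonzero()

resp = torch.zeros(17, 17)
resp[8, 8], resp[3, 12] = 1.0, 0.7   # two well-separated modes
print(nms_2d(resp))  # tensor([[ 3, 12], [ 8,  8]])
```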
arXiv Detail & Related papers (2020-04-06T18:48:39Z)
- Deep Semantic Matching with Foreground Detection and Cycle-Consistency [103.22976097225457]
We address weakly supervised semantic matching based on a deep network.
We explicitly estimate the foreground regions to suppress the effect of background clutter.
We develop cycle-consistent losses to enforce the predicted transformations across multiple images to be geometrically plausible and consistent.
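
Cycle consistency of predicted transformations can be sketched directly: composing the A->B, B->C, and C->A estimates should approximate the identity. Representing transforms as 3x3 homogeneous matrices is an illustrative choice.

```python
# Sketch: penalize deviation of a composed transformation cycle from identity.
import torch

def cycle_loss(t_ab: torch.Tensor, t_bc: torch.Tensor,
               t_ca: torch.Tensor) -> torch.Tensor:
    """Each argument: (B, 3, 3) predicted transformation matrices."""
    cycle = t_ca @ t_bc @ t_ab          # maps A back onto itself
    eye = torch.eye(3, device=cycle.device).expand_as(cycle)
    # Geometrically plausible predictions make this residual small.
    return (cycle - eye).pow(2).mean()

b = 4
t = torch.eye(3).repeat(b, 1, 1)
noisy = t + 0.01 * torch.randn(b, 3, 3)
print(cycle_loss(noisy, noisy, noisy).item())  # small but nonzero
```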
arXiv Detail & Related papers (2020-03-31T22:38:09Z)
- Object-Adaptive LSTM Network for Real-time Visual Tracking with Adversarial Data Augmentation [31.842910084312265]
We propose a novel real-time visual tracking method, which adopts an object-adaptive LSTM network to effectively capture the video sequential dependencies and adaptively learn the object appearance variations.
Experiments on four visual tracking benchmarks demonstrate the state-of-the-art performance of our method in terms of both tracking accuracy and speed.
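
A hedged sketch of the recurrent-appearance idea: an LSTM consumes per-frame object descriptors and its final hidden state acts as the online-updated appearance model; the feature source and dimensions are assumptions.

```python
# Sketch: an LSTM summarizes per-frame object features into an adaptively
# updated appearance state.
import torch
import torch.nn as nn

class AppearanceLSTM(nn.Module):
    def __init__(self, feat_dim: int = 256, hidden: int = 128):
        super().__init__()
        self.lstm = nn.LSTM(feat_dim, hidden, batch_first=True)

    def forward(self, frame_feats: torch.Tensor) -> torch.Tensor:
        """frame_feats: (B, T, feat_dim) per-frame object descriptors.
        Returns (B, hidden): the appearance state after the last frame."""
        _, (h, _) = self.lstm(frame_feats)
        return h[-1]  # summarizes appearance variation observed so far

model = AppearanceLSTM()
state = model(torch.randn(2, 10, 256))
print(state.shape)  # torch.Size([2, 128])
```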
arXiv Detail & Related papers (2020-02-07T03:06:07Z)