Single Object Tracking through a Fast and Effective Single-Multiple
Model Convolutional Neural Network
- URL: http://arxiv.org/abs/2103.15105v1
- Date: Sun, 28 Mar 2021 11:02:14 GMT
- Title: Single Object Tracking through a Fast and Effective Single-Multiple
Model Convolutional Neural Network
- Authors: Faraz Lotfi, Hamid D. Taghirad
- Abstract summary: Recent state-of-the-art (SOTA) approaches rely on a heavy matching network to distinguish the target from other objects in the area.
This article proposes an architecture that, in contrast to previous approaches, identifies the object location in a single shot.
The presented tracker performs comparably to the SOTA in challenging situations while running much faster (up to 120 FPS on a 1080 Ti GPU).
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Object tracking becomes critical especially when similar objects are present
in the same area. Recent state-of-the-art (SOTA) approaches rely on a heavy
matching network to distinguish the target from other objects in the area,
which drastically reduces tracking speed. Moreover, several candidates are
considered and processed in each frame to localize the intended object within
a region of interest, which is time-consuming. This article proposes an
architecture that, in contrast to previous approaches, identifies the object
location in a single shot while taking the object's template into account to
distinguish it from similar objects in the same area. In brief, a window
containing the object and twice the target size is first selected. For each
frame, this window is fed into a fully convolutional neural network (CNN) that
extracts a region of interest (RoI) in the form of a matrix. At the beginning
of tracking, a template of the target is also given to the CNN as input. From
this RoI matrix, the next movement of the tracker is determined by a simple
and fast method. The matrix also helps to estimate the object size, which is
crucial when the size changes over time. Despite the absence of a matching
network, the presented tracker performs comparably to the SOTA in challenging
situations while running much faster (up to 120 FPS on a 1080 Ti GPU). To
investigate this claim, a comparison study is carried out on the GOT-10k
dataset. The results reveal the outstanding performance of the proposed method
on this task.
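The tracking loop described in the abstract can be illustrated with a minimal sketch. The abstract does not say how the RoI matrix is turned into a movement or a size estimate, so everything below is an assumption made for illustration: the function names `search_window` and `decode_roi`, the `threshold` parameter, and the rule itself (a score-weighted centroid of the above-threshold cells for the new centre, and their bounding extent for the size). The CNN is not modelled; the RoI matrix is simply taken as given.

```python
def search_window(cx, cy, w, h, scale=2.0):
    """Window centred on the target and `scale` times its size.

    Returns (x0, y0, x1, y1) in image coordinates. The factor 2.0
    follows the abstract; the function itself is hypothetical.
    """
    half_w = scale * w / 2.0
    half_h = scale * h / 2.0
    return (cx - half_w, cy - half_h, cx + half_w, cy + half_h)


def decode_roi(roi, window, threshold=0.5):
    """Estimate (cx, cy, w, h) from an RoI matrix covering `window`.

    Assumed rule, not taken from the paper: cells scoring at least
    `threshold` belong to the target. The centre is their
    score-weighted centroid and the size is their bounding extent.
    Returns None when no cell passes the threshold.
    """
    rows, cols = len(roi), len(roi[0])
    x0, y0, x1, y1 = window
    cell_w = (x1 - x0) / cols
    cell_h = (y1 - y0) / rows

    total = sum_x = sum_y = 0.0
    min_r, max_r, min_c, max_c = rows, -1, cols, -1
    for r, row in enumerate(roi):
        for c, score in enumerate(row):
            if score < threshold:
                continue
            total += score
            sum_x += score * (c + 0.5)  # cell centre, in cell units
            sum_y += score * (r + 0.5)
            min_r, max_r = min(min_r, r), max(max_r, r)
            min_c, max_c = min(min_c, c), max(max_c, c)

    if total == 0.0:
        return None  # target not found in this window

    cx = x0 + cell_w * sum_x / total
    cy = y0 + cell_h * sum_y / total
    w = cell_w * (max_c - min_c + 1)
    h = cell_h * (max_r - min_r + 1)
    return (cx, cy, w, h)


# A 20x20 target at (100, 100) that has moved 10 pixels to the right.
window = search_window(100, 100, 20, 20)
roi = [
    [0, 0, 0, 0],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 0, 0],
]
print(decode_roi(roi, window))  # (110.0, 100.0, 20.0, 20.0)
```

In a full tracker, the decoded centre and size would define the next frame's search window, so one network pass per frame yields both the movement and the updated size.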
Related papers
- UnsMOT: Unified Framework for Unsupervised Multi-Object Tracking with
Geometric Topology Guidance [6.577227592760559]
UnsMOT is a novel framework that combines appearance and motion features of objects with geometric information to provide more accurate tracking.
Experimental results show remarkable performance in terms of HOTA, IDF1, and MOTA metrics in comparison with state-of-the-art methods.
arXiv Detail & Related papers (2023-09-03T04:58:12Z)
- Spatial-Temporal Graph Enhanced DETR Towards Multi-Frame 3D Object Detection [54.041049052843604]
We present STEMD, a novel end-to-end framework that enhances the DETR-like paradigm for multi-frame 3D object detection.
First, to model the inter-object spatial interaction and complex temporal dependencies, we introduce the spatial-temporal graph attention network.
Finally, it poses a challenge for the network to distinguish between the positive query and other highly similar queries that are not the best match.
arXiv Detail & Related papers (2023-07-01T13:53:14Z)
- IoU-Enhanced Attention for End-to-End Task Specific Object Detection [17.617133414432836]
R-CNN achieves promising results without densely tiled anchor boxes or grid points in the image.
Due to the sparse nature and the one-to-one relation between the query and its attending region, it heavily depends on the self attention.
This paper proposes to use IoU between different boxes as a prior for the value routing in self attention.
arXiv Detail & Related papers (2022-09-21T14:36:18Z)
- Joint Spatial-Temporal and Appearance Modeling with Transformer for
Multiple Object Tracking [59.79252390626194]
We propose a novel solution named TransSTAM, which leverages Transformer to model both the appearance features of each object and the spatial-temporal relationships among objects.
The proposed method is evaluated on multiple public benchmarks including MOT16, MOT17, and MOT20, and it achieves a clear performance improvement in both IDF1 and HOTA.
arXiv Detail & Related papers (2022-05-31T01:19:18Z)
- Spatiotemporal Graph Neural Network based Mask Reconstruction for Video
Object Segmentation [70.97625552643493]
This paper addresses the task of segmenting class-agnostic objects in a semi-supervised setting.
We propose a novel graph neural network (TG-Net) which captures the local contexts by utilizing all proposals.
arXiv Detail & Related papers (2020-12-10T07:57:44Z)
- Graph Attention Tracking [76.19829750144564]
We propose a simple target-aware Siamese graph attention network for general object tracking.
Experiments on challenging benchmarks including GOT-10k, UAV123, OTB-100 and LaSOT demonstrate that the proposed SiamGAT outperforms many state-of-the-art trackers.
arXiv Detail & Related papers (2020-11-23T04:26:45Z)
- Learning Spatio-Appearance Memory Network for High-Performance Visual
Tracking [79.80401607146987]
Existing object tracking usually learns a bounding-box based template to match visual targets across frames, which cannot accurately learn a pixel-wise representation.
This paper presents a novel segmentation-based tracking architecture, which is equipped with a local-temporal memory network to learn accurate-temporal correspondence.
arXiv Detail & Related papers (2020-09-21T08:12:02Z)
- Novel Perception Algorithmic Framework For Object Identification and
Tracking In Autonomous Navigation [1.370633147306388]
This paper introduces a novel perception framework that can identify and track objects in an autonomous vehicle's field of view.
The framework makes use of the ego-vehicle's pose estimation and a KD-Tree-based goal segmentation algorithm.
The effectiveness of the methodology is tested on the KITTI dataset.
arXiv Detail & Related papers (2020-06-08T18:21:40Z)
- Applying r-spatiogram in object tracking for occlusion handling [16.36552899280708]
The aim of video tracking is to accurately locate a moving target in a video sequence and discriminate the target from non-targets in the feature space of the sequence.
In this paper, we use the basic idea shared by many trackers, which consists of three main components of the reference model: object modeling, object detection and localization, and model updating.
arXiv Detail & Related papers (2020-03-18T02:42:51Z)
- Depthwise Non-local Module for Fast Salient Object Detection Using a
Single Thread [136.2224792151324]
We propose a new deep learning algorithm for fast salient object detection.
The proposed algorithm achieves competitive accuracy and high inference efficiency simultaneously with a single CPU thread.
arXiv Detail & Related papers (2020-01-22T15:23:48Z)
This list is automatically generated from the titles and abstracts of the papers in this site.