AttTrack: Online Deep Attention Transfer for Multi-object Tracking
- URL: http://arxiv.org/abs/2210.08648v1
- Date: Sun, 16 Oct 2022 22:15:31 GMT
- Title: AttTrack: Online Deep Attention Transfer for Multi-object Tracking
- Authors: Keivan Nalaie, Rong Zheng
- Abstract summary: Multi-object tracking (MOT) is a vital component of intelligent video analytics applications such as surveillance and autonomous driving.
In this paper, we aim to accelerate MOT by transferring the knowledge from high-level features of a complex network (teacher) to a lightweight network (student) at both training and inference times.
The proposed AttTrack framework has three key components: 1) cross-model feature learning to align intermediate representations from the teacher and student models, 2) interleaving the execution of the two models at inference time, and 3) incorporating the updated predictions from the teacher model as prior knowledge to assist the student model.
- Score: 4.5116674432168615
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Multi-object tracking (MOT) is a vital component of intelligent video
analytics applications such as surveillance and autonomous driving. The time
and storage complexity required to execute deep learning models for visual
object tracking hinder their adoption on embedded devices with limited
computing power. In this paper, we aim to accelerate MOT by transferring the
knowledge from high-level features of a complex network (teacher) to a
lightweight network (student) at both training and inference times. The
proposed AttTrack framework has three key components: 1) cross-model feature
learning to align intermediate representations from the teacher and student
models, 2) interleaving the execution of the two models at inference time, and
3) incorporating the updated predictions from the teacher model as prior
knowledge to assist the student model. Experiments on pedestrian tracking tasks,
conducted on the MOT17 and MOT15 datasets using two different object
detection backbones, YOLOv5 and DLA34, show that AttTrack can significantly
improve student model tracking performance at the cost of only a minor
degradation in tracking speed.
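The interleaved execution described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `interleaved_inference`, the `teacher_every` parameter, the box format, and the two toy models are all hypothetical, and the sketch assumes the simplest possible schedule (teacher on every k-th frame, student on the rest) with the latest teacher output handed to the student unchanged as its prior.

```python
from typing import Callable, Iterable, List, Optional, Tuple

# Hypothetical detection format: (x1, y1, x2, y2)
Box = Tuple[float, float, float, float]
# Hypothetical model interface: (frame, prior boxes or None) -> boxes
Model = Callable[[object, Optional[List[Box]]], List[Box]]


def interleaved_inference(
    frames: Iterable[object],
    teacher: Model,
    student: Model,
    teacher_every: int = 4,
) -> List[Tuple[str, List[Box]]]:
    """Run the teacher on every `teacher_every`-th frame, the student otherwise.

    The most recent teacher output is passed to the student as a prior.
    Returns one (model_name, boxes) pair per frame.
    """
    if teacher_every < 1:
        raise ValueError("teacher_every must be >= 1")
    results: List[Tuple[str, List[Box]]] = []
    prior: Optional[List[Box]] = None
    for index, frame in enumerate(frames):
        if index % teacher_every == 0:
            # Expensive model: refresh the prior
            boxes = teacher(frame, None)
            prior = boxes
            results.append(("teacher", boxes))
        else:
            # Cheap model: guided by the last teacher prediction
            boxes = student(frame, prior)
            results.append(("student", boxes))
    return results


# Toy stand-ins for real detectors (assumptions for illustration only)
def toy_teacher(frame: object, prior: Optional[List[Box]]) -> List[Box]:
    return [(0.0, 0.0, 10.0, 10.0)]


def toy_student(frame: object, prior: Optional[List[Box]]) -> List[Box]:
    # Reuse the prior when one exists, otherwise report nothing
    return list(prior) if prior else []


if __name__ == "__main__":
    schedule = interleaved_inference(range(6), toy_teacher, toy_student, teacher_every=3)
    print([name for name, _ in schedule])
```

A larger `teacher_every` moves the average per-frame cost toward that of the student alone, at the price of a staler prior between teacher runs.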
Related papers
- Progressive Representation Learning for Real-Time UAV Tracking [20.76053366492599]
This work proposes a novel progressive representation learning framework for UAV tracking, i.e., PRL-Track.
For coarse representation learning, two innovative regulators, which rely on appearance and semantic information, are designed to mitigate appearance interference and capture semantic information.
For fine representation learning, a new hierarchical modeling generator is developed to intertwine coarse object representations.
arXiv Detail & Related papers (2024-09-25T06:16:32Z)
- Single-Shot and Multi-Shot Feature Learning for Multi-Object Tracking [55.13878429987136]
We propose a simple yet effective two-stage feature learning paradigm to jointly learn single-shot and multi-shot features for different targets.
Our method has achieved significant improvements on MOT17 and MOT20 datasets while reaching state-of-the-art performance on DanceTrack dataset.
arXiv Detail & Related papers (2023-11-17T08:17:49Z)
- LiDAR-BEVMTN: Real-Time LiDAR Bird's-Eye View Multi-Task Perception Network for Autonomous Driving [7.137567622606353]
We present a real-time multi-task convolutional neural network for LiDAR-based object detection, semantics, and motion segmentation.
We propose a novel Semantic Weighting and Guidance (SWAG) module to selectively transfer semantic features for improved object detection.
We achieve state-of-the-art results for two tasks, semantic and motion segmentation, and close to state-of-the-art performance for 3D object detection.
arXiv Detail & Related papers (2023-07-17T21:22:17Z)
- Unifying Tracking and Image-Video Object Detection [54.91658924277527]
TrIVD (Tracking and Image-Video Detection) is the first framework that unifies image OD, video OD, and MOT within one end-to-end model.
To handle the discrepancies and semantic overlaps of category labels, TrIVD formulates detection/tracking as grounding and reasons about object categories.
arXiv Detail & Related papers (2022-11-20T20:30:28Z)
- InterTrack: Interaction Transformer for 3D Multi-Object Tracking [9.283656931246645]
3D multi-object tracking (MOT) is a key problem for autonomous vehicles.
Our proposed solution, InterTrack, generates discriminative object representations for data association.
We validate our approach on the nuScenes 3D MOT benchmark, where we observe significant improvements.
arXiv Detail & Related papers (2022-08-17T03:24:36Z)
- Unified Transformer Tracker for Object Tracking [58.65901124158068]
We present the Unified Transformer Tracker (UTT) to address tracking problems in different scenarios with one paradigm.
A track transformer is developed in our UTT to track the target in both Single Object Tracking (SOT) and Multiple Object Tracking (MOT).
arXiv Detail & Related papers (2022-03-29T01:38:49Z)
- Distractor-Aware Fast Tracking via Dynamic Convolutions and MOT Philosophy [63.91005999481061]
A practical long-term tracker typically contains three key properties, i.e. an efficient model design, an effective global re-detection strategy and a robust distractor awareness mechanism.
We propose a two-task tracking framework (named DMTrack) to achieve distractor-aware fast tracking via Dynamic convolutions (d-convs) and Multiple object tracking (MOT) philosophy.
Our tracker achieves state-of-the-art performance on the LaSOT, OxUvA, TLP, VOT2018LT and VOT2019LT benchmarks and runs in real-time (3x faster
arXiv Detail & Related papers (2021-04-25T00:59:53Z)
- Learnable Online Graph Representations for 3D Multi-Object Tracking [156.58876381318402]
We propose a unified and learning-based approach to the 3D MOT problem.
We employ a Neural Message Passing network for data association that is fully trainable.
We show the merit of the proposed approach on the publicly available nuScenes dataset by achieving state-of-the-art performance of 65.6% AMOTA and 58% fewer ID-switches.
arXiv Detail & Related papers (2021-04-23T17:59:28Z)
- Probabilistic 3D Multi-Modal, Multi-Object Tracking for Autonomous Driving [22.693895321632507]
We propose a probabilistic, multi-modal, multi-object tracking system consisting of different trainable modules.
We show that our method outperforms the current state of the art on the NuScenes Tracking dataset.
arXiv Detail & Related papers (2020-12-26T15:00:54Z)
- TRAT: Tracking by Attention Using Spatio-Temporal Features [14.520067060603209]
We propose a two-stream deep neural network tracker that uses both spatial and temporal features.
Our architecture is built on the ATOM tracker and contains two backbones: (i) a 2D-CNN to capture appearance features and (ii) a 3D-CNN to capture motion features.
arXiv Detail & Related papers (2020-11-18T20:11:12Z)
- A Unified Object Motion and Affinity Model for Online Multi-Object Tracking [127.5229859255719]
We propose a novel MOT framework that unifies object motion and affinity model into a single network, named UMA.
UMA integrates single object tracking and metric learning into a unified triplet network by means of multi-task learning.
We equip our model with a task-specific attention module, which is used to boost task-aware feature learning.
arXiv Detail & Related papers (2020-03-25T09:36:43Z)
This list is automatically generated from the titles and abstracts of the papers in this site.