MeMOT: Multi-Object Tracking with Memory
- URL: http://arxiv.org/abs/2203.16761v1
- Date: Thu, 31 Mar 2022 02:33:20 GMT
- Title: MeMOT: Multi-Object Tracking with Memory
- Authors: Jiarui Cai, Mingze Xu, Wei Li, Yuanjun Xiong, Wei Xia, Zhuowen Tu,
Stefano Soatto
- Abstract summary: Our model, called MeMOT, consists of three main modules that are all Transformer-based.
MeMOT achieves very competitive performance on widely adopted MOT datasets.
- Score: 97.48960039220823
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We propose an online tracking algorithm that performs the object detection
and data association under a common framework, capable of linking objects after
a long time span. This is realized by preserving a large spatio-temporal memory
to store the identity embeddings of the tracked objects, and by adaptively
referencing and aggregating useful information from the memory as needed. Our
model, called MeMOT, consists of three main modules that are all
Transformer-based: 1) Hypothesis Generation, which produces object proposals in
the current video frame; 2) Memory Encoding that extracts the core information
from the memory for each tracked object; and 3) Memory Decoding that solves the
object detection and data association tasks simultaneously for multi-object
tracking. When evaluated on widely adopted MOT benchmark datasets, MeMOT
achieves very competitive performance.
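As a rough illustration of the memory idea only (the paper's actual Hypothesis Generation, Memory Encoding, and Memory Decoding modules are Transformer-based; the class and method names below are made up), a per-track identity-embedding buffer with a single attention-style read might be sketched as:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

class SpatioTemporalMemory:
    """Toy per-track embedding memory: store recent identity embeddings,
    then aggregate a track's history with one attention step."""

    def __init__(self, capacity, dim):
        self.capacity = capacity
        self.dim = dim
        self.buffers = {}  # track_id -> list of past embeddings

    def write(self, track_id, embedding):
        buf = self.buffers.setdefault(track_id, [])
        buf.append(np.asarray(embedding, dtype=float))
        if len(buf) > self.capacity:  # keep only the most recent states
            buf.pop(0)

    def encode(self, track_id, query):
        """Attention-weighted summary of one track's stored embeddings."""
        keys = np.stack(self.buffers[track_id])             # (T, dim)
        scores = keys @ np.asarray(query, dtype=float)      # (T,)
        weights = softmax(scores / np.sqrt(self.dim))
        return weights @ keys                               # (dim,)
```

The capacity cap stands in for "adaptively referencing" a bounded memory; a real implementation would use learned projections for queries, keys, and values rather than raw embeddings.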
Related papers
- Temporal-Enhanced Multimodal Transformer for Referring Multi-Object Tracking and Segmentation [28.16053631036079]
Referring multi-object tracking (RMOT) is an emerging cross-modal task that aims to locate an arbitrary number of target objects in a video.
We introduce a compact Transformer-based method, termed TenRMOT, to exploit the advantages of Transformer architecture.
TenRMOT demonstrates superior performance on both the referring multi-object tracking and the segmentation tasks.
arXiv Detail & Related papers (2024-10-17T11:07:05Z)
- TF-SASM: Training-free Spatial-aware Sparse Memory for Multi-object Tracking [6.91631684487121]
Multi-object tracking (MOT) in computer vision remains a significant challenge, requiring precise localization and continuous tracking of multiple objects in video sequences.
We propose a novel memory-based approach that selectively stores critical features based on object motion and overlapping awareness.
Our approach significantly improves over MOTRv2 in the DanceTrack test set, demonstrating a gain of 2.0% AssA score and 2.1% in IDF1 score.
arXiv Detail & Related papers (2024-07-05T07:55:19Z)
- Transformer Network for Multi-Person Tracking and Re-Identification in Unconstrained Environment [0.6798775532273751]
Multi-object tracking (MOT) has profound applications in a variety of fields, including surveillance, sports analytics, self-driving, and cooperative robotics.
We put forward an integrated MOT method that marries object detection and identity linkage within a singular, end-to-end trainable framework.
Our system leverages a robust temporal memory module that retains extensive historical observations and effectively encodes them using an attention-based aggregator.
arXiv Detail & Related papers (2023-12-19T08:15:22Z)
- PTT: Point-Trajectory Transformer for Efficient Temporal 3D Object Detection [66.94819989912823]
We propose a point-trajectory transformer with long short-term memory for efficient temporal 3D object detection.
We use point clouds of current-frame objects and their historical trajectories as input to minimize the memory bank storage requirement.
We conduct extensive experiments on the large-scale dataset to demonstrate that our approach performs well against state-of-the-art methods.
arXiv Detail & Related papers (2023-12-13T18:59:13Z)
- 3DMODT: Attention-Guided Affinities for Joint Detection & Tracking in 3D Point Clouds [95.54285993019843]
We propose a method for joint detection and tracking of multiple objects in 3D point clouds.
Our model exploits temporal information employing multiple frames to detect objects and track them in a single network.
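A bare-bones version of a cross-frame affinity step (here plain cosine similarity plus greedy matching, standing in for the paper's attention-guided affinities; all names are illustrative) could look like:

```python
import numpy as np

def affinity_matrix(feats_t, feats_t1):
    """Cosine-similarity affinities between detections of two frames."""
    a = feats_t / np.linalg.norm(feats_t, axis=1, keepdims=True)
    b = feats_t1 / np.linalg.norm(feats_t1, axis=1, keepdims=True)
    return a @ b.T  # shape (N_t, N_t1)

def greedy_match(aff, min_score=0.5):
    """Link detections across frames, highest affinity first."""
    matches, used_i, used_j = [], set(), set()
    order = np.argsort(-aff, axis=None)  # flat indices, best score first
    for flat in order:
        i, j = np.unravel_index(flat, aff.shape)
        if i in used_i or j in used_j or aff[i, j] < min_score:
            continue
        matches.append((int(i), int(j)))
        used_i.add(i)
        used_j.add(j)
    return matches
```

Greedy matching is the simplest possible assignment; a Hungarian solver (or, as in the paper, learned attention over the affinities) would replace it in practice.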
arXiv Detail & Related papers (2022-11-01T20:59:38Z)
- End-to-end Tracking with a Multi-query Transformer [96.13468602635082]
Multiple-object tracking (MOT) is a challenging task that requires simultaneous reasoning about location, appearance, and identity of the objects in the scene over time.
Our aim in this paper is to move beyond tracking-by-detection approaches, to class-agnostic tracking that performs well also for unknown object classes.
arXiv Detail & Related papers (2022-10-26T10:19:37Z)
- Target-Aware Object Discovery and Association for Unsupervised Video Multi-Object Segmentation [79.6596425920849]
This paper addresses the task of unsupervised video multi-object segmentation.
We introduce a novel approach for more accurate and efficient spatio-temporal segmentation.
We evaluate the proposed approach on DAVIS$_17$ and YouTube-VIS, and the results demonstrate that it outperforms state-of-the-art methods both in segmentation accuracy and inference speed.
arXiv Detail & Related papers (2021-04-10T14:39:44Z)
- Looking Beyond Two Frames: End-to-End Multi-Object Tracking Using Spatial and Temporal Transformers [20.806348407522083]
MO3TR is an end-to-end online multi-object tracking framework.
It encodes object interactions into long-term temporal embeddings.
It handles track initiation and termination without the need for an explicit data association module.
arXiv Detail & Related papers (2021-03-27T07:23:38Z)
- Memorizing Comprehensively to Learn Adaptively: Unsupervised Cross-Domain Person Re-ID with Multi-level Memory [89.43986007948772]
Unlike the simple memory in previous works, we propose a novel multi-level memory network (MMN) to discover multi-level complementary information in the target domain.
arXiv Detail & Related papers (2020-01-13T09:48:03Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.