Multi-Object Tracking and Segmentation with a Space-Time Memory Network
- URL: http://arxiv.org/abs/2110.11284v2
- Date: Tue, 16 May 2023 01:16:56 GMT
- Title: Multi-Object Tracking and Segmentation with a Space-Time Memory Network
- Authors: Mehdi Miah, Guillaume-Alexandre Bilodeau and Nicolas Saunier
- Abstract summary: We propose a method for multi-object tracking and segmentation based on a novel memory-based mechanism to associate tracklets.
The proposed tracker, MeNToS, particularly addresses the long-term data association problem.
- Score: 12.043574473965318
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We propose a method for multi-object tracking and segmentation based on a
novel memory-based mechanism to associate tracklets. The proposed tracker,
MeNToS, particularly addresses the long-term data association problem, when
objects are not observable for long time intervals. Indeed, the recently
introduced HOTA metric (Higher Order Tracking Accuracy), which has a better
alignment than the formerly established MOTA (Multiple Object Tracking
Accuracy) with the human visual assessment of tracking, has shown that
improvements are still needed for data association, despite the recent
improvement in object detection. In MeNToS, after creating tracklets using
instance segmentation and optical flow, the proposed method relies on a
space-time memory network originally developed for one-shot video object
segmentation to improve the association of sequences of detections (tracklets)
with temporal gaps. We evaluate our tracker on KITTIMOTS and MOTSChallenge and
we show the benefit of our data association strategy with the HOTA metric.
Additional ablation studies demonstrate that our approach using a space-time
memory network gives better and more robust long-term association than those
based on a re-identification network. Our project page is at
www.mehdimiah.com/mentos+.
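As a rough illustration of the memory-based tracklet association idea described in the abstract (not the authors' implementation: cosine similarity and greedy matching stand in here for the space-time memory read and the actual association step, and all function names are hypothetical), each tracklet can be summarized by a stored feature vector and re-associated across a temporal gap by feature similarity:

```python
import math

def cosine_sim(a, b):
    # Cosine similarity between two feature vectors.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb + 1e-8)

def associate_tracklets(memories, queries, threshold=0.5):
    """Greedily match new tracklet descriptors (queries) to stored
    tracklet memories by maximum similarity, subject to a threshold.
    Returns {query_index: memory_index}."""
    matches, used = {}, set()
    # Score every (query, memory) pair and process the best pairs first.
    scored = sorted(
        ((cosine_sim(q, m), qi, mi)
         for qi, q in enumerate(queries)
         for mi, m in enumerate(memories)),
        reverse=True,
    )
    for score, qi, mi in scored:
        if score < threshold:
            break  # remaining pairs are too dissimilar to associate
        if qi in matches or mi in used:
            continue  # one-to-one assignment only
        matches[qi] = mi
        used.add(mi)
    return matches

# Toy example: two stored tracklet descriptors, two new tracklets
# appearing after a temporal gap.
memories = [(1.0, 0.0), (0.0, 1.0)]
queries = [(0.1, 0.9), (0.9, 0.1)]
print(associate_tracklets(memories, queries))
# query 0 matches memory 1, query 1 matches memory 0
```

In the actual method, the descriptors would come from a space-time memory network rather than fixed vectors, which is what lets the association survive long occlusions and appearance changes.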
Related papers
- STCMOT: Spatio-Temporal Cohesion Learning for UAV-Based Multiple Object Tracking [13.269416985959404]
Multiple object tracking (MOT) in Unmanned Aerial Vehicle (UAV) videos is important for diverse applications in computer vision.
We propose a novel Spatio-Temporal Cohesion Multiple Object Tracking framework (STCMOT)
We use historical embedding features to model the representation of ReID and detection features in a sequential order.
Our framework sets a new state-of-the-art performance in MOTA and IDF1 metrics.
arXiv Detail & Related papers (2024-09-17T14:34:18Z) - Tracking Objects and Activities with Attention for Temporal Sentence Grounding [51.416914256782505]
Temporal sentence grounding (TSG) aims to localize the temporal segment that is semantically aligned with a natural language query in an untrimmed video.
We propose a novel Temporal Sentence Tracking Network (TSTNet), which contains (A) a Cross-modal Targets Generator to generate multi-modal targets and a search space, and (B) a Temporal Sentence Tracker to track the multi-modal targets' behavior and predict the query-related segment.
arXiv Detail & Related papers (2023-02-21T16:42:52Z) - STURE: Spatial-Temporal Mutual Representation Learning for Robust Data Association in Online Multi-Object Tracking [7.562844934117318]
The proposed approach is capable of extracting more distinguishing detection and sequence representations.
It is applied to the public MOT challenge benchmarks and performs well compared with various state-of-the-art online MOT trackers.
arXiv Detail & Related papers (2022-01-18T08:52:40Z) - Adaptive Affinity for Associations in Multi-Target Multi-Camera Tracking [53.668757725179056]
We propose a simple yet effective approach to adapt affinity estimations to corresponding matching scopes in MTMCT.
Instead of trying to deal with all appearance changes, we tailor the affinity metric to specialize in ones that might emerge during data associations.
Minimizing the mismatch, the adaptive affinity module brings significant improvements over global re-ID distance.
arXiv Detail & Related papers (2021-12-14T18:59:11Z) - Learning Dynamic Compact Memory Embedding for Deformable Visual Object Tracking [82.34356879078955]
We propose a compact memory embedding to enhance the discrimination of the segmentation-based deformable visual tracking method.
Our method outperforms excellent segmentation-based trackers, i.e., D3S and SiamMask, on the DAVIS 2017 benchmark.
arXiv Detail & Related papers (2021-11-23T03:07:12Z) - MeNToS: Tracklets Association with a Space-Time Memory Network [12.416351779111864]
The proposed method addresses particularly the data association problem.
MeNToS is the first to use the STM network to track object masks for MOTS.
We took the 4th place in the RobMOTS challenge.
arXiv Detail & Related papers (2021-07-15T01:33:21Z) - Learning to Track with Object Permanence [61.36492084090744]
We introduce an end-to-end trainable approach for joint object detection and tracking.
Our model, trained jointly on synthetic and real data, outperforms the state of the art on KITTI, and MOT17 datasets.
arXiv Detail & Related papers (2021-03-26T04:43:04Z) - TDIOT: Target-driven Inference for Deep Video Object Tracking [0.2457872341625575]
In this work, we adopt the pre-trained Mask R-CNN deep object detector as the baseline.
We introduce a novel inference architecture placed on top of FPN-ResNet101 backbone of Mask R-CNN to jointly perform detection and tracking.
The proposed single object tracker, TDIOT, applies an appearance similarity-based temporal matching for data association.
arXiv Detail & Related papers (2021-03-19T20:45:06Z) - Learning Spatio-Appearance Memory Network for High-Performance Visual Tracking [79.80401607146987]
Existing object trackers usually learn a bounding-box-based template to match visual targets across frames, which cannot accurately learn a pixel-wise representation.
This paper presents a novel segmentation-based tracking architecture, which is equipped with a local-temporal memory network to learn accurate spatio-temporal correspondence.
arXiv Detail & Related papers (2020-09-21T08:12:02Z) - Fast Video Object Segmentation With Temporal Aggregation Network and Dynamic Template Matching [67.02962970820505]
We introduce "tracking-by-detection" into Video Object Segmentation (VOS).
We propose a new temporal aggregation network and a novel dynamic time-evolving template matching mechanism to achieve significantly improved performance.
We achieve new state-of-the-art performance on the DAVIS benchmark in both speed and accuracy, without complicated bells and whistles, running at 0.14 seconds per frame with a J&F measure of 75.9%.
arXiv Detail & Related papers (2020-07-11T05:44:16Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this information and is not responsible for any consequences of its use.