TubeTK: Adopting Tubes to Track Multi-Object in a One-Step Training
Model
- URL: http://arxiv.org/abs/2006.05683v1
- Date: Wed, 10 Jun 2020 06:45:05 GMT
- Title: TubeTK: Adopting Tubes to Track Multi-Object in a One-Step Training
Model
- Authors: Bo Pang, Yizhuo Li, Yifan Zhang, Muchen Li, Cewu Lu
- Abstract summary: Multi-object tracking is a fundamental vision problem that has been studied for a long time.
Despite the success of Tracking by Detection (TBD), this two-step method is too complicated to train in an end-to-end manner.
- We propose TubeTK, a concise end-to-end model that needs only one-step training, by introducing the "bounding-tube" to indicate temporal-spatial locations of objects in a short video clip.
- Score: 51.14840210957289
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Multi-object tracking is a fundamental vision problem that has been studied
for a long time. As deep learning brings excellent performances to object
detection algorithms, Tracking by Detection (TBD) has become the mainstream
tracking framework. Despite the success of TBD, this two-step method is too
complicated to train in an end-to-end manner and induces many challenges as
well, such as insufficient exploration of video spatial-temporal information,
vulnerability when facing object occlusion, and excessive reliance on detection
results. To address these challenges, we propose a concise end-to-end model
TubeTK which needs only one-step training by introducing the "bounding-tube"
to indicate temporal-spatial locations of objects in a short video clip. TubeTK
provides a novel direction of multi-object tracking, and we demonstrate its
potential to solve the above challenges without bells and whistles. We analyze
the performance of TubeTK on several MOT benchmarks and provide empirical
evidence to show that TubeTK has the ability to overcome occlusions to some
extent without any ancillary technologies like Re-ID. Compared with other
methods that adopt private detection results, our one-stage end-to-end model
achieves state-of-the-art performances even if it adopts no ready-made
detection results. We hope that the proposed TubeTK model can serve as a simple
but strong alternative for video-based MOT task. The code and models are
available at https://github.com/BoPang1996/TubeTK.
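As a rough illustration of the bounding-tube idea described in the abstract, the sketch below models a tube as a pair of boxes at the endpoints of a short clip, with intermediate boxes obtained by interpolation. The class name, fields, and linear interpolation are illustrative assumptions for exposition, not TubeTK's actual parameterization.

```python
# Hypothetical sketch of a "bounding-tube": a spatio-temporal extension of a
# bounding box spanning a short video clip. Fields and interpolation scheme
# are assumptions, not the paper's implementation.
from dataclasses import dataclass


@dataclass
class BoundingTube:
    t_start: int      # first frame index of the clip
    t_end: int        # last frame index (inclusive)
    start_box: tuple  # (x1, y1, x2, y2) at t_start
    end_box: tuple    # (x1, y1, x2, y2) at t_end

    def box_at(self, t: int) -> tuple:
        """Linearly interpolate the box at frame t between the endpoints."""
        if not self.t_start <= t <= self.t_end:
            raise ValueError("frame outside tube")
        span = max(self.t_end - self.t_start, 1)
        a = (t - self.t_start) / span
        return tuple((1 - a) * s + a * e
                     for s, e in zip(self.start_box, self.end_box))


tube = BoundingTube(0, 4, (0, 0, 10, 10), (20, 0, 30, 10))
print(tube.box_at(2))  # (10.0, 0.0, 20.0, 10.0)
```

Because one tube covers an object across several frames, per-frame detection and cross-frame association collapse into a single prediction target, which is what makes one-step training possible.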
Related papers
- VOVTrack: Exploring the Potentiality in Videos for Open-Vocabulary Object Tracking [61.56592503861093]
Open-vocabulary multi-object tracking (OVMOT) amalgamates the complexities of open-vocabulary object detection (OVD) and multi-object tracking (MOT).
Existing approaches to OVMOT often merge OVD and MOT methodologies as separate modules, predominantly focusing on the problem through an image-centric lens.
We propose VOVTrack, a novel method that integrates object states relevant to MOT and video-centric training to address this challenge from a video object tracking standpoint.
arXiv Detail & Related papers (2024-10-11T05:01:49Z) - MAML MOT: Multiple Object Tracking based on Meta-Learning [7.892321926673001]
We introduce MAML MOT, a meta-learning-based training approach for multi-object tracking.
arXiv Detail & Related papers (2024-05-12T12:38:40Z) - Bridging Images and Videos: A Simple Learning Framework for Large
Vocabulary Video Object Detection [110.08925274049409]
We present a simple but effective learning framework that takes full advantage of all available training data to learn detection and tracking.
We show consistent improvements across various large-vocabulary trackers, setting strong baseline results on the challenging TAO benchmarks.
arXiv Detail & Related papers (2022-12-20T10:33:03Z) - Unifying Tracking and Image-Video Object Detection [54.91658924277527]
TrIVD (Tracking and Image-Video Detection) is the first framework that unifies image OD, video OD, and MOT within one end-to-end model.
To handle the discrepancies and semantic overlaps of category labels, TrIVD formulates detection/tracking as grounding and reasons about object categories.
arXiv Detail & Related papers (2022-11-20T20:30:28Z) - QDTrack: Quasi-Dense Similarity Learning for Appearance-Only Multiple
Object Tracking [73.52284039530261]
We present Quasi-Dense Similarity Learning, which densely samples hundreds of object regions on a pair of images for contrastive learning.
We find that the resulting distinctive feature space admits a simple nearest neighbor search at inference time for object association.
We show that our similarity learning scheme is not limited to video data, but can learn effective instance similarity even from static input.
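The nearest-neighbor association at inference time that this entry describes can be sketched as below. The toy embeddings and the `associate` helper are illustrative assumptions; QDTrack produces the embeddings with a learned quasi-dense network.

```python
# Minimal sketch of appearance-only association via nearest-neighbor search
# in an embedding space. Embeddings here are toy vectors; names are
# illustrative assumptions.
import numpy as np


def associate(track_embs: np.ndarray, det_embs: np.ndarray) -> list:
    """Match each detection to its most similar track by cosine similarity."""
    t = track_embs / np.linalg.norm(track_embs, axis=1, keepdims=True)
    d = det_embs / np.linalg.norm(det_embs, axis=1, keepdims=True)
    sim = d @ t.T                       # (num_dets, num_tracks)
    return sim.argmax(axis=1).tolist()  # best-matching track per detection


tracks = np.array([[1.0, 0.0], [0.0, 1.0]])  # embeddings of existing tracks
dets = np.array([[0.9, 0.1], [0.2, 0.8]])    # embeddings of new detections
print(associate(tracks, dets))  # [0, 1]
```

The point of the contrastive training is to make this simple argmax reliable: if the feature space is distinctive enough, no motion model or Re-ID module is needed for association.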
arXiv Detail & Related papers (2022-10-12T15:47:36Z) - Multimodal Channel-Mixing: Channel and Spatial Masked AutoEncoder on
Facial Action Unit Detection [12.509298933267225]
This paper presents a novel multi-modal reconstruction network, named Multimodal Channel-Mixing (MCM) as a pre-trained model to learn robust representation for facilitating multi-modal fusion.
The approach follows an early fusion setup, integrating a Channel-Mixing module, where two out of five channels are randomly dropped.
This module not only reduces channel redundancy, but also facilitates multi-modal learning and reconstruction capabilities, resulting in robust feature learning.
arXiv Detail & Related papers (2022-09-25T15:18:56Z) - Semi-TCL: Semi-Supervised Track Contrastive Representation Learning [40.31083437957288]
We design a new instance-to-track matching objective to learn appearance embedding.
It compares a candidate detection to the embedding of the tracks persisted in the tracker.
We implement this learning objective in a unified form following the spirit of contrastive loss.
arXiv Detail & Related papers (2021-07-06T05:23:30Z) - DEFT: Detection Embeddings for Tracking [3.326320568999945]
We propose an efficient joint detection and tracking model named DEFT.
Our approach relies on an appearance-based object matching network jointly learned with an underlying object detection network.
DEFT has comparable accuracy and speed to the top methods on 2D online tracking leaderboards.
arXiv Detail & Related papers (2021-02-03T20:00:44Z) - Probabilistic Tracklet Scoring and Inpainting for Multiple Object
Tracking [83.75789829291475]
We introduce a probabilistic autoregressive motion model to score tracklet proposals.
This is achieved by training our model to learn the underlying distribution of natural tracklets.
Our experiments demonstrate the superiority of our approach at tracking objects in challenging sequences.
arXiv Detail & Related papers (2020-12-03T23:59:27Z)
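The tracklet-scoring idea in the last entry can be sketched with a toy motion model: predict each step from the previous ones and score a tracklet by the likelihood of its observed motion. The constant-velocity Gaussian model below is an illustrative assumption; the paper learns the distribution of natural tracklets instead.

```python
# Toy sketch of scoring a tracklet with an autoregressive motion model.
# The Gaussian constant-velocity model is an assumption for illustration,
# not the paper's learned distribution.
import math


def tracklet_log_score(positions, sigma=1.0):
    """Log-likelihood of a 1-D tracklet under constant-velocity Gaussian steps."""
    score = 0.0
    for i in range(2, len(positions)):
        # Predict the next position by extrapolating the last step.
        pred = 2 * positions[i - 1] - positions[i - 2]
        err = positions[i] - pred
        score += -0.5 * (err / sigma) ** 2 \
                 - math.log(sigma * math.sqrt(2 * math.pi))
    return score


smooth = [0.0, 1.0, 2.0, 3.0, 4.0]  # constant velocity
jumpy = [0.0, 1.0, 5.0, 2.0, 9.0]   # erratic motion
print(tracklet_log_score(smooth) > tracklet_log_score(jumpy))  # True
```

Scoring proposals this way lets a tracker prefer physically plausible tracklets and inpaint gaps where detections are missing, which is the role the probabilistic model plays in that work.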
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.