Unifying Short and Long-Term Tracking with Graph Hierarchies
- URL: http://arxiv.org/abs/2212.03038v2
- Date: Thu, 30 Mar 2023 13:47:25 GMT
- Title: Unifying Short and Long-Term Tracking with Graph Hierarchies
- Authors: Orcun Cetintas, Guillem Brasó, Laura Leal-Taixé
- Abstract summary: We introduce SUSHI, a unified and scalable multi-object tracker.
Our approach processes long clips by splitting them into a hierarchy of subclips, which enables high scalability.
We leverage graph neural networks to process all levels of the hierarchy, which makes our model unified across temporal scales and highly general.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Tracking objects over long videos effectively means solving a spectrum of
problems, from short-term association for un-occluded objects to long-term
association for objects that are occluded and then reappear in the scene.
Methods tackling these two tasks are often disjoint and crafted for specific
scenarios, and top-performing approaches are often a mix of techniques, which
yields engineering-heavy solutions that lack generality. In this work, we
question the need for hybrid approaches and introduce SUSHI, a unified and
scalable multi-object tracker. Our approach processes long clips by splitting
them into a hierarchy of subclips, which enables high scalability. We leverage
graph neural networks to process all levels of the hierarchy, which makes our
model unified across temporal scales and highly general. As a result, we obtain
significant improvements over state-of-the-art on four diverse datasets. Our
code and models are available at bit.ly/sushi-mot.
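The hierarchy of subclips described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the function name `build_subclip_hierarchy` and the parameters `base_len` and `merge_factor` are hypothetical, chosen only for illustration, and the graph neural network that SUSHI applies at each level is left out entirely. The sketch only shows how a long clip might be partitioned into short subclips that are then merged level by level until a single span covers the whole clip.

```python
from typing import List, Tuple

Span = Tuple[int, int]  # half-open frame range [start, end)


def build_subclip_hierarchy(
    num_frames: int, base_len: int = 2, merge_factor: int = 2
) -> List[List[Span]]:
    """Return one list of frame spans per hierarchy level.

    Level 0 holds short, non-overlapping subclips of at most `base_len`
    frames. Each higher level merges up to `merge_factor` consecutive
    spans from the level below, until one span covers the whole clip.
    All names and defaults here are illustrative assumptions.
    """
    if num_frames <= 0 or base_len <= 0 or merge_factor < 2:
        raise ValueError("need num_frames > 0, base_len > 0, merge_factor >= 2")

    # Level 0: cut the clip into short subclips (the last one may be shorter).
    level = [
        (start, min(start + base_len, num_frames))
        for start in range(0, num_frames, base_len)
    ]
    levels = [level]

    # Higher levels: merge groups of consecutive spans into longer spans.
    while len(level) > 1:
        merged = []
        for i in range(0, len(level), merge_factor):
            last = min(i + merge_factor, len(level)) - 1
            merged.append((level[i][0], level[last][1]))
        level = merged
        levels.append(level)
    return levels


if __name__ == "__main__":
    for depth, spans in enumerate(build_subclip_hierarchy(8)):
        print(f"level {depth}: {spans}")
```

In a tracker organized this way, short-term association would happen inside the lowest-level spans and longer-term association when spans are merged; because every level has the same form, the same kind of model can in principle be reused at each one, which is the unification the abstract refers to.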
Related papers
- Let-It-Flow: Simultaneous Optimization of 3D Flow and Object Clustering [2.763111962660262]
We study the problem of self-supervised 3D scene flow estimation from real large-scale raw point cloud sequences.
We propose a novel clustering approach that allows for combination of overlapping soft clusters as well as non-overlapping rigid clusters.
Our method especially excels in resolving flow in complicated dynamic scenes with multiple independently moving objects close to each other.
arXiv Detail & Related papers (2024-04-12T10:04:03Z)
- Multi-Scene Generalized Trajectory Global Graph Solver with Composite Nodes for Multiple Object Tracking [61.69892497726235]
Composite Node Message Passing Network (CoNo-Link) is a framework for modeling information across ultra-long frame sequences for association.
In addition to the previous method of treating objects as nodes, the network innovatively treats object trajectories as nodes for information interaction.
Our model can learn better predictions on longer-time scales by adding composite nodes.
arXiv Detail & Related papers (2023-12-14T14:00:30Z)
- Hybrid-SORT: Weak Cues Matter for Online Multi-Object Tracking [51.16677396148247]
Multi-Object Tracking (MOT) aims to detect and associate all desired objects across frames.
In this paper, we demonstrate this long-standing challenge in MOT can be efficiently and effectively resolved by incorporating weak cues.
Our method Hybrid-SORT achieves superior performance on diverse benchmarks, including MOT17, MOT20, and especially DanceTrack.
arXiv Detail & Related papers (2023-08-01T18:53:24Z)
- Contrastive Lift: 3D Object Instance Segmentation by Slow-Fast Contrastive Fusion [110.84357383258818]
We propose a novel approach to lift 2D segments to 3D and fuse them by means of a neural field representation.
The core of our approach is a slow-fast clustering objective function, which is scalable and well-suited for scenes with a large number of objects.
Our approach outperforms the state-of-the-art on challenging scenes from the ScanNet, Hypersim, and Replica datasets.
arXiv Detail & Related papers (2023-06-07T17:57:45Z)
- A Novel Long-term Iterative Mining Scheme for Video Salient Object Detection [54.53335983750033]
Short-term methodology conflicts with the real mechanism of our visual system.
This paper proposes a novel VSOD approach, which performs VSOD in a complete long-term way.
The proposed approach outperforms almost all SOTA models on five widely used benchmark datasets.
arXiv Detail & Related papers (2022-06-20T04:27:47Z)
- Decoupled Multi-task Learning with Cyclical Self-Regulation for Face Parsing [71.19528222206088]
We propose a novel Decoupled Multi-task Learning with Cyclical Self-Regulation (DML-CSR) for face parsing.
Specifically, DML-CSR designs a multi-task model which comprises face parsing, binary edge, and category edge detection.
Our method achieves the new state-of-the-art performance on the Helen, CelebAMask-HQ, and LaPa datasets.
arXiv Detail & Related papers (2022-03-28T02:12:30Z)
- Learning Long-term Visual Dynamics with Region Proposal Interaction Networks [75.06423516419862]
We build object representations that can capture inter-object and object-environment interactions over a long range.
Thanks to the simple yet effective object representation, our approach outperforms prior methods by a significant margin.
arXiv Detail & Related papers (2020-08-05T17:48:00Z)
- Unsupervised Spatio-temporal Latent Feature Clustering for Multiple-object Tracking and Segmentation [0.5591659577198183]
We propose a strategy that treats the temporal identification task as a heterogeneous-temporal clustering problem.
We use a convolutional and fully connected autoencoder to learn discriminative features from segmentation masks and detection bounding boxes.
Our results show that our technique outperforms several state-of-the-art methods.
arXiv Detail & Related papers (2020-07-14T16:47:56Z)