Two is a crowd: tracking relations in videos
- URL: http://arxiv.org/abs/2108.05331v1
- Date: Wed, 11 Aug 2021 17:19:34 GMT
- Title: Two is a crowd: tracking relations in videos
- Authors: Artem Moskalev, Ivan Sosnovik, Arnold Smeulders
- Abstract summary: We propose a plug-in Relation Encoding Module (REM) to extend current state-of-the-art trackers.
REM encodes relations between tracked objects by running message passing over a corresponding spatio-temporal graph, computing relation embeddings for the tracked objects.
REM allows tracking severely or even fully occluded objects by utilizing relational cues.
- Score: 2.1485350418225244
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Tracking multiple objects individually differs from tracking groups of
related objects. When an object is part of a group, its trajectory depends
on the trajectories of the other group members. Most current
state-of-the-art trackers follow the approach of tracking each object
independently, with a mechanism to handle overlapping trajectories where
necessary. Such an approach does not take inter-object relations into account,
which may cause unreliable tracking for the members of the groups, especially
in crowded scenarios, where individual cues become unreliable due to
occlusions. To overcome these limitations and to extend such trackers to
crowded scenes, we propose a plug-in Relation Encoding Module (REM). REM
encodes relations between tracked objects by running message passing over a
corresponding spatio-temporal graph, computing relation embeddings for the
tracked objects. Our experiments on MOT17 and MOT20 demonstrate that the
baseline tracker improves its results after a simple extension with REM. The
proposed module allows for tracking severely or even fully occluded objects by
utilizing relational cues.
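The message passing over a spatio-temporal graph described above can be sketched as follows. This is a hypothetical simplification, not the authors' implementation: it uses plain neighbourhood averaging (GCN-style propagation) in place of whatever learned update REM actually employs, and the function and variable names are illustrative.

```python
import numpy as np

def relation_message_passing(track_embeddings, adjacency, steps=2):
    """Hedged sketch of REM-style message passing.

    track_embeddings: (N, D) array, one embedding per tracked object.
    adjacency: (N, N) binary matrix linking objects that relate to each
    other (e.g. spatially close across consecutive frames).
    Returns relation embeddings of the same shape as the input.
    """
    h = track_embeddings.astype(float).copy()
    # Add self-loops and row-normalize so each node averages over its
    # neighbours plus itself.
    a = adjacency + np.eye(len(adjacency))
    a = a / a.sum(axis=1, keepdims=True)
    for _ in range(steps):
        # Each object aggregates messages from its related objects.
        h = a @ h
    return h

# Toy usage: 3 tracked objects with 4-dim embeddings; objects 0 and 2
# each relate to object 1 (e.g. a small group in a crowd).
emb = np.arange(12, dtype=float).reshape(3, 4)
adj = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)
rel = relation_message_passing(emb, adj)
```

After propagation, each object's relation embedding mixes in information from its group members, which is what lets a tracker fall back on relational cues when an occluded object's individual cues vanish.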
Related papers
- CAMELTrack: Context-Aware Multi-cue ExpLoitation for Online Multi-Object Tracking [68.24998698508344]
We introduce CAMEL, a novel association module for Context-Aware Multi-Cue ExpLoitation. Unlike end-to-end detection-by-tracking approaches, our method remains lightweight and fast to train while being able to leverage external off-the-shelf models. Our proposed online tracking pipeline, CAMELTrack, achieves state-of-the-art performance on multiple tracking benchmarks.
arXiv Detail & Related papers (2025-05-02T13:26:23Z)
- SynCL: A Synergistic Training Strategy with Instance-Aware Contrastive Learning for End-to-End Multi-Camera 3D Tracking [34.90147791481045]
SynCL is a novel plug-and-play synergistic training strategy designed to co-facilitate multi-task learning for detection and tracking. We propose a Task-specific Hybrid Matching module for a weight-shared cross-attention-based decoder. We also introduce Instance-aware Contrastive Learning to break through the barrier of self-centric attention for track queries.
arXiv Detail & Related papers (2024-11-11T08:18:49Z)
- OmniTracker: Unifying Object Tracking by Tracking-with-Detection [119.51012668709502]
OmniTracker is presented to resolve all the tracking tasks with a fully shared network architecture, model weights, and inference pipeline.
Experiments on 7 tracking datasets, including LaSOT, TrackingNet, DAVIS16-17, MOT17, MOTS20, and YTVIS19, demonstrate that OmniTracker achieves on-par or even better results than both task-specific and unified tracking models.
arXiv Detail & Related papers (2023-03-21T17:59:57Z)
- MotionTrack: Learning Robust Short-term and Long-term Motions for Multi-Object Tracking [56.92165669843006]
We propose MotionTrack, which learns robust short-term and long-term motions in a unified framework to associate trajectories from a short to long range.
For dense crowds, we design a novel Interaction Module to learn interaction-aware motions from short-term trajectories, which can estimate the complex movement of each target.
For extreme occlusions, we build a novel Refind Module to learn reliable long-term motions from the target's history trajectory, which can link the interrupted trajectory with its corresponding detection.
arXiv Detail & Related papers (2023-03-18T12:38:33Z)
- Tracking by Associating Clips [110.08925274049409]
In this paper, we investigate an alternative by treating object association as clip-wise matching.
Our new perspective views a single long video sequence as multiple short clips, and then the tracking is performed both within and between the clips.
The benefits of this new approach are twofold. First, our method is robust to tracking-error accumulation and propagation, as the video chunking allows bypassing interrupted frames.
Second, multiple-frame information is aggregated during clip-wise matching, resulting in more accurate long-range track association than current frame-wise matching.
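The clip-wise matching idea above can be sketched as follows. This is a hypothetical simplification under assumed helpers (`split_into_clips`, `match_clips`): tracklet features are averaged per clip and associated greedily by cosine similarity, whereas the paper's actual association method may differ.

```python
import numpy as np

def split_into_clips(num_frames, clip_len):
    """Split a long sequence's frame indices into consecutive short clips."""
    return [list(range(s, min(s + clip_len, num_frames)))
            for s in range(0, num_frames, clip_len)]

def match_clips(feats_a, feats_b):
    """Greedily associate tracklets of clip A with tracklets of clip B
    by cosine similarity of clip-averaged features (illustrative only)."""
    sim = feats_a @ feats_b.T
    sim = sim / (np.linalg.norm(feats_a, axis=1, keepdims=True)
                 * np.linalg.norm(feats_b, axis=1, keepdims=True).T + 1e-8)
    matches, used = [], set()
    # Process rows with the strongest best-match first.
    for i in np.argsort(-sim.max(axis=1)):
        for j in np.argsort(-sim[i]):
            if int(j) not in used:
                matches.append((int(i), int(j)))
                used.add(int(j))
                break
    return matches

# Toy usage: a 10-frame video chunked into clips of 4 frames, and two
# adjacent clips with three tracklets each.
clips = split_into_clips(10, 4)
feats = np.eye(3)  # distinctive per-tracklet appearance features
pairs = match_clips(feats, feats)
```

Because association happens between clips rather than between consecutive frames, a tracklet interrupted inside one clip can still be linked to its continuation in the next clip, which is the robustness the paper claims.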
arXiv Detail & Related papers (2022-12-20T10:33:17Z)
- Unified Transformer Tracker for Object Tracking [58.65901124158068]
We present the Unified Transformer Tracker (UTT) to address tracking problems in different scenarios with one paradigm.
A track transformer is developed in our UTT to track the target in both Single Object Tracking (SOT) and Multiple Object Tracking (MOT).
arXiv Detail & Related papers (2022-03-29T01:38:49Z)
- Learning to Track with Object Permanence [61.36492084090744]
We introduce an end-to-end trainable approach for joint object detection and tracking.
Our model, trained jointly on synthetic and real data, outperforms the state of the art on the KITTI and MOT17 datasets.
arXiv Detail & Related papers (2021-03-26T04:43:04Z)
- Discriminative Appearance Modeling with Multi-track Pooling for Real-time Multi-object Tracking [20.66906781151]
In multi-object tracking, the tracker maintains in its memory the appearance and motion information for each object in the scene.
Many approaches model each target in isolation and lack the ability to use all the targets in the scene to jointly update the memory.
We propose a training strategy adapted to multi-track pooling which generates hard tracking episodes online.
arXiv Detail & Related papers (2021-01-28T18:12:39Z)
- TrackFormer: Multi-Object Tracking with Transformers [92.25832593088421]
TrackFormer is an end-to-end multi-object tracking and segmentation model based on an encoder-decoder Transformer architecture.
New track queries are spawned by the DETR object detector and embed the position of their corresponding object over time.
TrackFormer achieves a seamless data association between frames in a new tracking-by-attention paradigm.
arXiv Detail & Related papers (2021-01-07T18:59:29Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.