Towards Discriminative Representation: Multi-view Trajectory Contrastive
Learning for Online Multi-object Tracking
- URL: http://arxiv.org/abs/2203.14208v1
- Date: Sun, 27 Mar 2022 04:53:31 GMT
- Authors: En Yu, Zhuoling Li, Shoudong Han
- Abstract summary: We propose a strategy, namely multi-view trajectory contrastive learning, in which each trajectory is represented as a center vector.
In the inference stage, a similarity-guided feature fusion strategy is developed for further boosting the quality of the trajectory representation.
Our method has surpassed preceding trackers and established new state-of-the-art performance.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Discriminative representation is crucial for the association step in
multi-object tracking. Recent work mainly uses features from single or
neighboring frames to construct a metric loss and train networks to
extract representations of targets. Although this strategy is effective, it
fails to fully exploit the information contained in a whole trajectory. To this
end, we propose a strategy, namely multi-view trajectory contrastive learning,
in which each trajectory is represented as a center vector. By maintaining all
the vectors in a dynamically updated memory bank, a trajectory-level
contrastive loss is devised to explore the inter-frame information in the whole
trajectories. Besides, in this strategy, each target is represented as multiple
adaptively selected keypoints rather than a pre-defined anchor or center. This
design allows the network to generate a richer representation from multiple views
of the same target, which can better characterize occluded objects.
Additionally, in the inference stage, a similarity-guided feature fusion
strategy is developed for further boosting the quality of the trajectory
representation. Extensive experiments have been conducted on MOTChallenge to
verify the effectiveness of the proposed techniques. The experimental results
indicate that our method has surpassed preceding trackers and established new
state-of-the-art performance.
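
The abstract gives no formulas, so the following is only a minimal sketch of the two ideas it names: a trajectory-level contrastive loss against center vectors held in a memory bank, and a similarity-guided feature fusion at inference. The InfoNCE-style loss, the momentum update, the temperature value, and the fusion weight are all assumptions chosen for illustration, and every function name here is hypothetical rather than taken from the authors' code.

```python
import numpy as np


def l2_normalize(x, axis=-1, eps=1e-12):
    """Scale vectors to unit length along `axis`."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)


def trajectory_contrastive_loss(features, labels, bank, temperature=0.05):
    """InfoNCE-style loss (assumed form): each feature should be closer to
    the center vector of its own trajectory than to any other center.

    features: (N, D) appearance features of detections in a batch
    labels:   (N,)   index of the trajectory each feature belongs to
    bank:     (K, D) one center vector per trajectory (the memory bank)
    """
    feats = l2_normalize(np.asarray(features, dtype=float))
    centers = l2_normalize(np.asarray(bank, dtype=float))
    logits = feats @ centers.T / temperature          # (N, K) scaled cosine similarities
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -log_prob[np.arange(len(labels)), labels].mean()


def update_bank(bank, features, labels, momentum=0.9):
    """Momentum update of trajectory centers (assumed update rule)."""
    bank = np.array(bank, dtype=float)  # copy, so the caller's bank is untouched
    for f, k in zip(l2_normalize(np.asarray(features, dtype=float)), labels):
        bank[k] = l2_normalize(momentum * bank[k] + (1.0 - momentum) * f)
    return bank


def fuse_by_similarity(track_feat, det_feat, max_weight=0.5):
    """Similarity-guided fusion (assumed form): the new detection feature
    contributes in proportion to its cosine similarity with the trajectory
    feature, so a dissimilar (e.g. occluded) observation changes it little."""
    t = l2_normalize(np.asarray(track_feat, dtype=float))
    d = l2_normalize(np.asarray(det_feat, dtype=float))
    w = max_weight * max(float(t @ d), 0.0)
    return l2_normalize((1.0 - w) * t + w * d)


if __name__ == "__main__":
    bank = np.eye(3)                       # three toy trajectory centers
    feats = np.eye(3)                      # features that match their centers
    print(trajectory_contrastive_loss(feats, [0, 1, 2], bank))
    bank = update_bank(bank, np.array([[0.0, 1.0, 0.0]]), [0])
    print(fuse_by_similarity([1.0, 0.0], [0.0, 1.0]))
```

In this sketch a low-similarity detection leaves the trajectory feature almost unchanged, which is one plausible reading of "similarity-guided"; the paper itself should be consulted for the actual weighting.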
Related papers
- Context-Enhanced Multi-View Trajectory Representation Learning: Bridging the Gap through Self-Supervised Models [27.316692263196277]
MVTraj is a novel multi-view modeling method for trajectory representation learning.
It integrates diverse contextual knowledge, from GPS to road networks and points of interest, to provide a more comprehensive understanding of trajectory data.
Extensive experiments on real-world datasets demonstrate that MVTraj significantly outperforms existing baselines in tasks associated with various spatial views.
arXiv Detail & Related papers (2024-10-17T03:56:12Z)
- Zero-Shot Object-Centric Representation Learning [72.43369950684057]
We study current object-centric methods through the lens of zero-shot generalization.
We introduce a benchmark comprising eight different synthetic and real-world datasets.
We find that training on diverse real-world images improves transferability to unseen scenarios.
arXiv Detail & Related papers (2024-08-17T10:37:07Z)
- Single-Shot and Multi-Shot Feature Learning for Multi-Object Tracking [55.13878429987136]
We propose a simple yet effective two-stage feature learning paradigm to jointly learn single-shot and multi-shot features for different targets.
Our method achieves significant improvements on the MOT17 and MOT20 datasets while reaching state-of-the-art performance on the DanceTrack dataset.
arXiv Detail & Related papers (2023-11-17T08:17:49Z)
- Masked Momentum Contrastive Learning for Zero-shot Semantic Understanding [39.424931953675994]
Self-supervised pretraining (SSP) has emerged as a popular technique in machine learning, enabling the extraction of meaningful feature representations without labelled data.
This study endeavours to evaluate the effectiveness of pure self-supervised learning (SSL) techniques in computer vision tasks.
arXiv Detail & Related papers (2023-08-22T13:55:57Z)
- Weakly-supervised Contrastive Learning for Unsupervised Object Discovery [52.696041556640516]
Unsupervised object discovery is promising due to its ability to discover objects in a generic manner.
We design a semantic-guided self-supervised learning model to extract high-level semantic features from images.
We introduce Principal Component Analysis (PCA) to localize object regions.
arXiv Detail & Related papers (2023-07-07T04:03:48Z)
- Dynamic Attention guided Multi-Trajectory Analysis for Single Object Tracking [62.13213518417047]
We propose to introduce more dynamics by devising a dynamic attention-guided multi-trajectory tracking strategy.
In particular, we construct a dynamic appearance model that contains multiple target templates, each of which provides its own attention for locating the target in the new frame.
After spanning the whole sequence, we introduce a multi-trajectory selection network to find the best trajectory that delivers improved tracking performance.
arXiv Detail & Related papers (2021-03-30T05:36:31Z)
- Multi-object Tracking with a Hierarchical Single-branch Network [31.680667324595557]
We propose an online multi-object tracking framework based on a hierarchical single-branch network.
Our novel iHOIM loss function unifies the objectives of the two sub-tasks and encourages better detection performance.
Experimental results on the MOT16 and MOT20 datasets show that we can achieve state-of-the-art tracking performance.
arXiv Detail & Related papers (2021-01-06T12:14:58Z)
- Centralized Information Interaction for Salient Object Detection [68.8587064889475]
The U-shape structure has shown its advantage in salient object detection for efficiently combining multi-scale features.
This paper shows that by centralizing these connections, we can achieve the cross-scale information interaction among them.
Our approach can cooperate with various existing U-shape-based salient object detection methods by substituting the connections between the bottom-up and top-down pathways.
arXiv Detail & Related papers (2020-12-21T12:42:06Z)
- Visual Object Tracking by Segmentation with Graph Convolutional Network [7.729569666460712]
We propose to utilize a graph convolutional network (GCN) model for superpixel-based object tracking.
The proposed model provides a general end-to-end framework which integrates i) label linear prediction, and ii) structure-aware feature information of each superpixel together.
arXiv Detail & Related papers (2020-09-05T12:43:21Z)
- Dynamic Feature Integration for Simultaneous Detection of Salient Object, Edge and Skeleton [108.01007935498104]
In this paper, we solve three low-level pixel-wise vision problems, including salient object segmentation, edge detection, and skeleton extraction.
We first show some similarities shared by these tasks and then demonstrate how they can be leveraged for developing a unified framework.
arXiv Detail & Related papers (2020-04-18T11:10:11Z)