Unveiling the Power of Self-supervision for Multi-view Multi-human
Association and Tracking
- URL: http://arxiv.org/abs/2401.17617v1
- Date: Wed, 31 Jan 2024 06:12:28 GMT
- Title: Unveiling the Power of Self-supervision for Multi-view Multi-human
Association and Tracking
- Authors: Wei Feng, Feifan Wang, Ruize Han, Zekun Qian and Song Wang
- Abstract summary: Multi-view multi-human association and tracking (MvMHAT) is a new but important problem for multi-person scene video surveillance.
We tackle this problem with a self-supervised learning aware end-to-end network.
We build two new large-scale benchmarks for the network training and testing of different algorithms.
- Score: 22.243799150495487
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Multi-view multi-human association and tracking (MvMHAT) is a new but
important problem for multi-person scene video surveillance. It aims to track a
group of people over time in each view and, at the same time, to identify the
same person across different views. This differs from previous MOT and
multi-camera MOT tasks, which consider only tracking humans over time. This
way, the videos for MvMHAT require more complex annotations while containing
more information for self learning. In this work, we tackle this problem with a
self-supervised learning aware end-to-end network. Specifically, we propose to
take advantage of the spatial-temporal self-consistency rationale by
considering three properties of reflexivity, symmetry and transitivity. Besides
the reflexivity property that naturally holds, we design the self-supervised
learning losses based on the properties of symmetry and transitivity, for both
appearance feature learning and assignment matrix optimization, to associate
the multiple humans over time and across views. Furthermore, to promote the
research on MvMHAT, we build two new large-scale benchmarks for the network
training and testing of different algorithms. Extensive experiments on the
proposed benchmarks verify the effectiveness of our method. We have released
the benchmark and code to the public.
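
The symmetry and transitivity properties mentioned in the abstract can be illustrated with a small sketch. This is not the authors' released implementation: the function names, the softmax temperature, and the squared-error penalty are all assumptions chosen for illustration. The idea shown is that a soft assignment from view A to view B, composed with the assignment back from B to A (symmetry) or chained through a third view C and back to A (transitivity), should be close to the identity matrix when the associations are self-consistent.

```python
import math

# Hypothetical helper names; a minimal pure-Python sketch, not the paper's code.

def similarity(feats_a, feats_b):
    """Dot-product similarity between every feature in A and every feature in B."""
    return [[sum(x * y for x, y in zip(fa, fb)) for fb in feats_b] for fa in feats_a]

def softmax_rows(matrix, temperature):
    """Row-wise softmax; a low temperature makes each row close to one-hot."""
    out = []
    for row in matrix:
        peak = max(row)
        exps = [math.exp((v - peak) / temperature) for v in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out

def soft_assignment(feats_a, feats_b, temperature=0.1):
    """Soft assignment matrix from subjects in A to subjects in B (assumed form)."""
    return softmax_rows(similarity(feats_a, feats_b), temperature)

def matmul(a, b):
    """Plain matrix product of two nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def identity_gap(cycle):
    """Mean squared deviation of a square matrix from the identity."""
    n = len(cycle)
    return sum((cycle[i][j] - (1.0 if i == j else 0.0)) ** 2
               for i in range(n) for j in range(n)) / (n * n)

def symmetry_loss(feats_a, feats_b):
    """A -> B -> A should map every subject back to itself."""
    cycle = matmul(soft_assignment(feats_a, feats_b),
                   soft_assignment(feats_b, feats_a))
    return identity_gap(cycle)

def transitivity_loss(feats_a, feats_b, feats_c):
    """A -> B -> C -> A should map every subject back to itself."""
    cycle = matmul(matmul(soft_assignment(feats_a, feats_b),
                          soft_assignment(feats_b, feats_c)),
                   soft_assignment(feats_c, feats_a))
    return identity_gap(cycle)

# Toy features: two people, seen in a different order in view B.
view_a = [[1.0, 0.0], [0.0, 1.0]]
view_b = [[0.0, 1.0], [1.0, 0.0]]
view_c = [[1.0, 0.0], [0.0, 1.0]]
# Uninformative features: both people look identical in this view.
view_b_ambiguous = [[1.0, 1.0], [1.0, 1.0]]

consistent_sym = symmetry_loss(view_a, view_b)
consistent_trans = transitivity_loss(view_a, view_b, view_c)
ambiguous_sym = symmetry_loss(view_a, view_b_ambiguous)
```

With discriminative features both losses are close to zero, while the ambiguous view yields a clearly larger symmetry loss, so minimizing such terms needs no identity labels. Reflexivity holds trivially here in the sense that each unit-norm feature is most similar to itself.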
Related papers
- Multi-Task Multi-Modal Self-Supervised Learning for Facial Expression Recognition [6.995226697189459]
We employ a multi-modal self-supervised learning method for facial expression recognition from in-the-wild video data.
Our results generally show that multi-modal self-supervision tasks offer large performance gains for challenging tasks.
We release our pre-trained models as well as source code publicly.
arXiv Detail & Related papers (2024-04-16T20:51:36Z)
- Self-Supervised Multi-Object Tracking For Autonomous Driving From Consistency Across Timescales [53.55369862746357]
Self-supervised multi-object trackers have tremendous potential as they enable learning from raw domain-specific data.
However, their re-identification accuracy still falls short compared to their supervised counterparts.
We propose a training objective that enables self-supervised learning of re-identification features from multiple sequential frames.
arXiv Detail & Related papers (2023-04-25T20:47:29Z)
- Visual Exemplar Driven Task-Prompting for Unified Perception in Autonomous Driving [100.3848723827869]
We present an effective multi-task framework, VE-Prompt, which introduces visual exemplars via task-specific prompting.
Specifically, we generate visual exemplars based on bounding boxes and color-based markers, which provide accurate visual appearances of target categories.
We bridge transformer-based encoders and convolutional layers for efficient and accurate unified perception in autonomous driving.
arXiv Detail & Related papers (2023-03-03T08:54:06Z)
- Multi-view Tracking Using Weakly Supervised Human Motion Prediction [60.972708589814125]
We argue that an even more effective approach is to predict people's motion over time and infer their presence in individual frames from these predictions.
This makes it possible to enforce consistency both over time and across views of a single temporal frame.
We validate our approach on the PETS2009 and WILDTRACK datasets and demonstrate that it outperforms state-of-the-art methods.
arXiv Detail & Related papers (2022-10-19T17:58:23Z)
- SSMTL++: Revisiting Self-Supervised Multi-Task Learning for Video Anomaly Detection [108.57862846523858]
We revisit the self-supervised multi-task learning framework, proposing several updates to the original method.
We modernize the 3D convolutional backbone by introducing multi-head self-attention modules.
In our attempt to further improve the model, we study additional self-supervised learning tasks, such as predicting segmentation maps.
arXiv Detail & Related papers (2022-07-16T19:25:41Z)
- Multi-View representation learning in Multi-Task Scene [4.509968166110557]
We propose a novel semi-supervised algorithm, termed Multi-Task Multi-View learning based on Common and Special Features (MTMVCSF).
An anti-noise multi-task multi-view algorithm called AN-MTMVCSF is also proposed, which adapts well to noisy labels.
The effectiveness of these algorithms is proved by a series of well-designed experiments on both real world and synthetic data.
arXiv Detail & Related papers (2022-01-15T11:26:28Z)
- Multi-target tracking for video surveillance using deep affinity network: a brief review [0.0]
Multi-target tracking (MTT) for video surveillance is one of the important and challenging tasks.
Deep learning models are known to function like the human brain.
arXiv Detail & Related papers (2021-10-29T10:44:26Z)
- Multi-object tracking with self-supervised associating network [5.947279761429668]
We propose a novel self-supervised learning method that uses a large number of short videos with no human labeling.
Although the re-identification network is trained in a self-supervised manner, it achieves state-of-the-art performance of MOTA 62.0% and IDF1 62.6% on the MOT17 test benchmark.
arXiv Detail & Related papers (2020-10-26T08:48:23Z)
- Exploit Clues from Views: Self-Supervised and Regularized Learning for Multiview Object Recognition [66.87417785210772]
This work investigates the problem of multiview self-supervised learning (MV-SSL).
A novel surrogate task for self-supervised learning is proposed by pursuing "object invariant" representation.
Experiments show that the recognition and retrieval results using view invariant prototype embedding (VISPE) outperform other self-supervised learning methods.
arXiv Detail & Related papers (2020-03-28T07:06:06Z)
- A Unified Object Motion and Affinity Model for Online Multi-Object Tracking [127.5229859255719]
We propose a novel MOT framework that unifies object motion and affinity model into a single network, named UMA.
UMA integrates single object tracking and metric learning into a unified triplet network by means of multi-task learning.
We equip our model with a task-specific attention module, which is used to boost task-aware feature learning.
arXiv Detail & Related papers (2020-03-25T09:36:43Z)
This list is automatically generated from the titles and abstracts of the papers in this site.