Multi-view Tracking Using Weakly Supervised Human Motion Prediction
- URL: http://arxiv.org/abs/2210.10771v1
- Date: Wed, 19 Oct 2022 17:58:23 GMT
- Title: Multi-view Tracking Using Weakly Supervised Human Motion Prediction
- Authors: Martin Engilberge, Weizhe Liu, Pascal Fua
- Abstract summary: We argue that an even more effective approach is to predict people's motion over time and infer people's presence in individual frames from these predictions.
This makes it possible to enforce consistency both over time and across views of a single temporal frame.
We validate our approach on the PETS2009 and WILDTRACK datasets and demonstrate that it outperforms state-of-the-art methods.
- Score: 60.972708589814125
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Multi-view approaches to people-tracking have the potential to better handle
occlusions than single-view ones in crowded scenes. They often rely on the
tracking-by-detection paradigm, which involves detecting people first and then
connecting the detections. In this paper, we argue that an even more effective
approach is to predict people's motion over time and infer people's presence in
individual frames from these predictions. This makes it possible to enforce
consistency both over time and across views of a single temporal frame. We validate our approach on
the PETS2009 and WILDTRACK datasets and demonstrate that it outperforms
state-of-the-art methods.
Related papers
- No Identity, no problem: Motion through detection for people tracking [48.708733485434394]
We propose exploiting motion clues while providing supervision only for the detections.
Our algorithm predicts detection heatmaps at two different times, along with a 2D motion estimate between the two images.
We show that our approach delivers state-of-the-art results for single- and multi-view multi-target tracking on the MOT17 and WILDTRACK datasets.
arXiv Detail & Related papers (2024-11-25T15:13:17Z)
- Tracking Virtual Meetings in the Wild: Re-identification in Multi-Participant Virtual Meetings [0.0]
We introduce a novel approach to track and re-identify participants in remote video meetings.
Our approach reduces the error rate by 95% on average compared to YOLO-based tracking methods as a baseline.
arXiv Detail & Related papers (2024-09-15T19:37:37Z)
- EarlyBird: Early-Fusion for Multi-View Tracking in the Bird's Eye View [6.093524345727119]
We show that early-fusion in the Bird's Eye View achieves high accuracy for both detection and tracking.
EarlyBird outperforms the state-of-the-art methods and improves the current state-of-the-art on Wildtrack by +4.6 MOTA and +5.6 IDF1.
arXiv Detail & Related papers (2023-10-20T08:27:21Z)
- Tracking by Associating Clips [110.08925274049409]
In this paper, we investigate an alternative by treating object association as clip-wise matching.
Our new perspective views a single long video sequence as multiple short clips, and then the tracking is performed both within and between the clips.
The benefits of this new approach are twofold. First, our method is robust to tracking error accumulation or propagation, as the video chunking allows bypassing the interrupted frames.
Second, the multiple frame information is aggregated during the clip-wise matching, resulting in a more accurate long-range track association than the current frame-wise matching.
arXiv Detail & Related papers (2022-12-20T10:33:17Z)
- Rank-based verification for long-term face tracking in crowded scenes [0.0]
We present a long-term, multi-face tracking architecture conceived for working in crowded contexts.
Our system benefits from advances in the fields of face detection and face recognition to achieve long-term tracking.
arXiv Detail & Related papers (2021-07-28T11:15:04Z)
- Self-supervised Human Detection and Segmentation via Multi-view Consensus [116.92405645348185]
We propose a multi-camera framework in which geometric constraints are embedded in the form of multi-view consistency during training.
We show that our approach outperforms state-of-the-art self-supervised person detection and segmentation techniques on images that visually depart from those of standard benchmarks.
arXiv Detail & Related papers (2020-12-09T15:47:21Z)
- Unsupervised Learning on Monocular Videos for 3D Human Pose Estimation [121.5383855764944]
We use contrastive self-supervised learning to extract rich latent vectors from single-view videos.
We show that applying CSS only to the time-variant features, while also reconstructing the input and encouraging a gradual transition between nearby and away features, yields a rich latent space.
Our approach outperforms other unsupervised single-view methods and matches the performance of multi-view techniques.
arXiv Detail & Related papers (2020-12-02T20:27:35Z)
- Tracking-by-Counting: Using Network Flows on Crowd Density Maps for Tracking Multiple Targets [96.98888948518815]
State-of-the-art multi-object tracking (MOT) methods follow the tracking-by-detection paradigm.
We propose a new MOT paradigm, tracking-by-counting, tailored for crowded scenes.
arXiv Detail & Related papers (2020-07-18T19:51:53Z)
- Unsupervised Multiple Person Tracking using AutoEncoder-Based Lifted Multicuts [11.72025865314187]
We present an unsupervised multiple object tracking approach based on minimum visual features and lifted multicuts.
We show that, despite being trained without using the provided annotations, our model provides competitive results on the challenging MOT Benchmark for pedestrian tracking.
arXiv Detail & Related papers (2020-02-04T09:42:34Z)
This list is automatically generated from the titles and abstracts of the papers on this site.