A Simple Baseline for Pose Tracking in Videos of Crowded Scenes
- URL: http://arxiv.org/abs/2010.10007v2
- Date: Wed, 21 Oct 2020 03:37:18 GMT
- Title: A Simple Baseline for Pose Tracking in Videos of Crowded Scenes
- Authors: Li Yuan, Shuning Chang, Ziyuan Huang, Yichen Zhou, Yunpeng Chen,
Xuecheng Nie, Francis E.H. Tay, Jiashi Feng, Shuicheng Yan
- Abstract summary: How to track the human pose in crowded and complex environments has not been well addressed.
We use a multi-object tracking method to assign a human ID to each bounding box generated by the detection model.
Finally, optical flow is used to exploit the temporal information in the videos and generate the final pose tracking result.
- Score: 130.84731947842664
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper presents our solution to ACM MM challenge: Large-scale
Human-centric Video Analysis in Complex Events\cite{lin2020human};
specifically, here we focus on Track3: Crowd Pose Tracking in Complex Events.
Remarkable progress has been made in multi-person pose estimation in recent years.
However, how to track the human pose in crowded and complex environments has
not been well addressed. We formulate the problem as several subproblems to be
solved. First, we use a multi-object tracking method to assign a human ID to each
bounding box generated by the detection model. Then, a pose is estimated for
each identified bounding box. Finally, optical flow is used to exploit the
temporal information in the videos and produce the final pose tracking result.
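The three-stage pipeline described in the abstract (detect people, associate a persistent ID to each box via tracking, estimate a pose per identified box, then refine with temporal information) can be sketched as follows. This is a minimal illustration only: the detector, tracker, and pose model are hypothetical stand-ins rather than the authors' actual components, and the optical-flow refinement is approximated by a simple blend of each track's previous and current keypoints.

```python
# Hypothetical sketch of the pipeline: detection -> ID assignment -> pose -> temporal smoothing.
# All components below are stubs, not the models used in the paper.

def detect_people(frame):
    """Stub detector: returns bounding boxes as (x, y, w, h)."""
    return [(10, 20, 50, 100), (200, 40, 45, 95)]

class SimpleTracker:
    """Assigns a persistent human ID to each box via nearest-centroid matching
    (a stand-in for the multi-object tracking method in the paper)."""
    def __init__(self, max_dist=50.0):
        self.next_id = 0
        self.tracks = {}  # track ID -> last known centroid

    def update(self, boxes, max_dist=50.0):
        ids = []
        for (x, y, w, h) in boxes:
            c = (x + w / 2.0, y + h / 2.0)
            best, best_d = None, max_dist
            for tid, prev in self.tracks.items():
                d = ((c[0] - prev[0]) ** 2 + (c[1] - prev[1]) ** 2) ** 0.5
                if d < best_d:
                    best, best_d = tid, d
            if best is None:  # no existing track nearby: start a new ID
                best = self.next_id
                self.next_id += 1
            self.tracks[best] = c
            ids.append(best)
        return ids

def estimate_pose(frame, box):
    """Stub pose model: returns three keypoints spaced inside the box."""
    x, y, w, h = box
    return [(x + w / 2.0, y + h * r) for r in (0.1, 0.5, 0.9)]

def smooth_with_flow(prev_pose, cur_pose, alpha=0.5):
    """Stand-in for optical-flow refinement: blend keypoints propagated from
    the previous frame with the current estimate (real flow is not modeled)."""
    if prev_pose is None:
        return cur_pose
    return [((1 - alpha) * px + alpha * cx, (1 - alpha) * py + alpha * cy)
            for (px, py), (cx, cy) in zip(prev_pose, cur_pose)]

def track_poses(frames):
    tracker = SimpleTracker()
    history = {}   # track ID -> last pose, for temporal smoothing
    results = []
    for frame in frames:
        boxes = detect_people(frame)
        ids = tracker.update(boxes)
        frame_out = {}
        for box, tid in zip(boxes, ids):
            pose = smooth_with_flow(history.get(tid), estimate_pose(frame, box))
            history[tid] = pose
            frame_out[tid] = pose
        results.append(frame_out)
    return results

poses = track_poses(frames=[None, None, None])  # dummy frames for the stubs
print(sorted(poses[0]))  # prints [0, 1]: IDs assigned in the first frame
```

Because IDs are assigned by the tracker before pose estimation, each keypoint sequence is tied to one person across frames, which is what lets the temporal refinement step operate per identity.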
Related papers
- Reconstructing Close Human Interactions from Multiple Views [38.924950289788804]
This paper addresses the challenging task of reconstructing the poses of multiple individuals engaged in close interactions, captured by multiple calibrated cameras.
We introduce a novel system to address these challenges.
Our system integrates a learning-based pose estimation component and its corresponding training and inference strategies.
arXiv Detail & Related papers (2024-01-29T14:08:02Z)
- Neural Rendering of Humans in Novel View and Pose from Monocular Video [68.37767099240236]
We introduce a new method that generates photo-realistic humans under novel views and poses given a monocular video as input.
Our method significantly outperforms existing approaches under unseen poses and novel views given monocular videos as input.
arXiv Detail & Related papers (2022-04-04T03:09:20Z)
- Learning to Track with Object Permanence [61.36492084090744]
We introduce an end-to-end trainable approach for joint object detection and tracking.
Our model, trained jointly on synthetic and real data, outperforms the state of the art on the KITTI and MOT17 datasets.
arXiv Detail & Related papers (2021-03-26T04:43:04Z)
- Iterative Greedy Matching for 3D Human Pose Tracking from Multiple Views [22.86745487695168]
We propose an approach for estimating 3D human poses of multiple people from a set of calibrated cameras.
Our approach builds upon a real-time 2D multi-person pose estimation system and greedily solves the association problem between multiple views.
arXiv Detail & Related papers (2021-01-24T16:28:10Z)
- PoseTrackReID: Dataset Description [97.7241689753353]
Pose information is helpful to disentangle useful feature information from background or occlusion noise.
With PoseTrackReID, we want to bridge the gap between person re-ID and multi-person pose tracking.
This dataset provides a good benchmark for current state-of-the-art methods on multi-frame person re-ID.
arXiv Detail & Related papers (2020-11-12T07:44:25Z)
- Toward Accurate Person-level Action Recognition in Videos of Crowded Scenes [131.9067467127761]
We focus on improving the action recognition by fully-utilizing the information of scenes and collecting new data.
Specifically, we adopt a strong human detector to detect the spatial location of each person in every frame.
We then apply action recognition models to learn the temporal information from video frames, on both the HIE dataset and new data with diverse scenes collected from the internet.
arXiv Detail & Related papers (2020-10-16T13:08:50Z)
- Human in Events: A Large-Scale Benchmark for Human-centric Video Analysis in Complex Events [106.19047816743988]
We present a new large-scale dataset with comprehensive annotations, named Human-in-Events or HiEve.
It contains a record number of poses (>1M), the largest number of action instances (>56k) under complex events, as well as one of the largest collections of long-lasting trajectories.
Based on its diverse annotation, we present two simple baselines for action recognition and pose estimation.
arXiv Detail & Related papers (2020-05-09T18:24:52Z)
This list is automatically generated from the titles and abstracts of the papers in this site.