Towards Accurate Human Pose Estimation in Videos of Crowded Scenes
- URL: http://arxiv.org/abs/2010.10008v2
- Date: Wed, 21 Oct 2020 03:37:40 GMT
- Title: Towards Accurate Human Pose Estimation in Videos of Crowded Scenes
- Authors: Li Yuan, Shuning Chang, Xuecheng Nie, Ziyuan Huang, Yichen Zhou,
Yunpeng Chen, Jiashi Feng, Shuicheng Yan
- Abstract summary: We focus on improving human pose estimation in videos of crowded scenes from the perspectives of exploiting temporal context and collecting new data.
For each frame, we propagate historical poses forward from the previous frames and future poses backward from the subsequent frames to the current frame, leading to stable and accurate human pose estimation in videos.
In this way, our model achieves the best performance on 7 out of 13 videos and an average w_AP of 56.33 on the test set of the HIE challenge.
- Score: 134.60638597115872
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Video-based human pose estimation in crowded scenes is a challenging problem
due to occlusion, motion blur, scale variation, and viewpoint change. Prior
approaches typically fail on this problem because they (1) make little use of
temporal information and (2) lack training data for crowded scenes.
In this paper, we focus on improving human pose estimation in videos of crowded
scenes from the perspectives of exploiting temporal context and collecting new
data. In particular, we first follow the top-down strategy to detect persons
and perform single-person pose estimation for each frame. Then, we refine the
frame-based pose estimates with temporal context derived from optical flow.
Specifically, for each frame, we propagate historical poses forward from the
previous frames and future poses backward from the subsequent frames to the
current frame, leading to stable and accurate human pose estimation in videos.
In addition, we mine new data of scenes similar to the HIE dataset from the
Internet to improve the diversity of the training set. In this way, our model
achieves the best performance on 7 out of 13 videos and an average w_AP of
56.33 on the test set of the HIE challenge.
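The temporal refinement step lends itself to a short sketch. Below is a minimal illustration of bidirectional pose propagation, assuming dense optical flow (OpenCV's Farneback method here, as a stand-in for whatever flow model the paper uses) and per-joint confidence scores; all function and variable names are illustrative, not from the authors' code.

```python
# Hedged sketch: warp poses from neighbouring frames to the current frame
# via optical flow, then fuse by per-joint confidence. Illustrative only.
import cv2
import numpy as np

def propagate_pose(pose, conf, frame_src, frame_dst, decay=0.9):
    """Warp keypoints (J, 2) from frame_src to frame_dst along optical flow."""
    g0 = cv2.cvtColor(frame_src, cv2.COLOR_BGR2GRAY)
    g1 = cv2.cvtColor(frame_dst, cv2.COLOR_BGR2GRAY)
    flow = cv2.calcOpticalFlowFarneback(g0, g1, None,
                                        0.5, 3, 15, 3, 5, 1.2, 0)
    h, w = flow.shape[:2]
    warped = pose.copy()
    for j, (x, y) in enumerate(pose):
        xi = int(np.clip(x, 0, w - 1))
        yi = int(np.clip(y, 0, h - 1))
        warped[j] += flow[yi, xi]           # move the joint along the flow
    return warped, conf * decay             # propagated poses are trusted less

def fuse_poses(poses, confs):
    """Confidence-weighted average of current, forward and backward poses."""
    poses = np.stack(poses)                 # (K, J, 2)
    confs = np.stack(confs)[..., None]      # (K, J, 1)
    return (poses * confs).sum(0) / np.clip(confs.sum(0), 1e-6, None)

# Usage with one neighbour on each side of the current frame:
# fwd, cf = propagate_pose(pose_prev, conf_prev, frame_prev, frame_cur)
# bwd, cb = propagate_pose(pose_next, conf_next, frame_next, frame_cur)
# refined = fuse_poses([pose_cur, fwd, bwd], [conf_cur, cf, cb])
```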
Related papers
- Learning from One Continuous Video Stream [70.30084026960819]
We introduce a framework for online learning from a single continuous video stream.
This poses great challenges given the high correlation between consecutive video frames.
We employ pixel-to-pixel modelling as a practical and flexible way to switch between pre-training and single-stream evaluation.
arXiv Detail & Related papers (2023-12-01T14:03:30Z)
- Live Stream Temporally Embedded 3D Human Body Pose and Shape Estimation [13.40702053084305]
We present a temporally embedded 3D human body pose and shape estimation (TePose) method to improve the accuracy and temporal consistency of pose estimation in live stream videos.
A multi-scale convolutional network is presented as the motion discriminator for adversarial training using datasets without any 3D labeling.
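As a rough illustration of the discriminator idea, here is a hedged PyTorch sketch of a multi-scale temporal-convolution discriminator over pose sequences; the layer sizes, kernel widths, and input dimension are assumptions for illustration, not TePose's actual architecture.

```python
# Hedged sketch: multi-scale 1-D conv discriminator over pose sequences.
import torch
import torch.nn as nn

class MotionDiscriminator(nn.Module):
    def __init__(self, joint_dim=72, hidden=128, scales=(3, 5, 7)):
        super().__init__()
        # one temporal-conv branch per kernel size (the "multi-scale" part)
        self.branches = nn.ModuleList(
            nn.Sequential(
                nn.Conv1d(joint_dim, hidden, k, padding=k // 2),
                nn.LeakyReLU(0.2),
                nn.Conv1d(hidden, hidden, k, padding=k // 2),
                nn.LeakyReLU(0.2),
            )
            for k in scales
        )
        self.head = nn.Linear(hidden * len(scales), 1)  # real/fake logit

    def forward(self, seq):                  # seq: (B, T, joint_dim)
        x = seq.transpose(1, 2)              # -> (B, joint_dim, T) for Conv1d
        feats = [b(x).mean(dim=2) for b in self.branches]  # pool over time
        return self.head(torch.cat(feats, dim=1))

# d = MotionDiscriminator()
# logit = d(torch.randn(4, 16, 72))   # batch of 16-frame pose sequences
```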
arXiv Detail & Related papers (2022-07-25T21:21:59Z)
- Neural Rendering of Humans in Novel View and Pose from Monocular Video [68.37767099240236]
We introduce a new method that generates photo-realistic humans under novel views and poses given a monocular video as input.
Our method significantly outperforms existing approaches under unseen poses and novel views given monocular videos as input.
arXiv Detail & Related papers (2022-04-04T03:09:20Z)
- Learning Dynamics via Graph Neural Networks for Human Pose Estimation and Tracking [98.91894395941766]
We propose a novel online approach to learning the pose dynamics, which are independent of the pose detections in the current frame.
Specifically, we derive this prediction of dynamics through a graph neural network (GNN) that explicitly accounts for both spatial-temporal and visual information.
Experiments on PoseTrack 2017 and PoseTrack 2018 datasets demonstrate that the proposed method achieves results superior to the state of the art on both human pose estimation and tracking tasks.
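To make the idea concrete, below is a hedged, hand-rolled sketch of one message-passing step over the human skeleton graph; the adjacency handling, feature sizes, and update rule are illustrative assumptions rather than the paper's exact GNN.

```python
# Hedged sketch: one message-passing layer over joint features.
import torch
import torch.nn as nn

class SkeletonGNNLayer(nn.Module):
    def __init__(self, dim=64):
        super().__init__()
        self.msg = nn.Linear(dim, dim)       # transform neighbour features
        self.upd = nn.Linear(2 * dim, dim)   # combine self + aggregated msgs

    def forward(self, x, adj):
        # x: (B, J, dim) joint features, adj: (J, J) 0/1 skeleton adjacency
        deg = adj.sum(dim=1, keepdim=True).clamp(min=1)
        agg = (adj / deg) @ self.msg(x)      # mean over connected joints
        return torch.relu(self.upd(torch.cat([x, agg], dim=-1)))

# Stacking a few layers and regressing per-joint displacements for the next
# frame would give a pose-dynamics predictor in this spirit:
# x = embed(past_keypoints)                 # hypothetical embedding
# for layer in layers: x = layer(x, adj)
# delta = nn.Linear(64, 2)(x)               # predicted per-joint motion
```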
arXiv Detail & Related papers (2021-06-07T16:36:50Z)
- Deep Dual Consecutive Network for Human Pose Estimation [44.41818683253614]
We propose a novel multi-frame human pose estimation framework, leveraging abundant temporal cues between video frames to facilitate keypoint detection.
Our method ranks No.1 in the Multi-frame Person Pose Estimation Challenge on the large-scale benchmark datasets PoseTrack 2017 and PoseTrack 2018.
arXiv Detail & Related papers (2021-03-12T13:11:27Z)
- SMPLy Benchmarking 3D Human Pose Estimation in the Wild [14.323219585166573]
The Mannequin Challenge dataset contains in-the-wild videos of people frozen in action, like statues.
A total of 24,428 frames with registered body models are then selected from 567 scenes at almost no cost.
We benchmark state-of-the-art SMPL-based human pose estimation methods on this dataset.
arXiv Detail & Related papers (2020-12-04T17:48:32Z)
- Toward Accurate Person-level Action Recognition in Videos of Crowded Scenes [131.9067467127761]
We focus on improving action recognition by fully utilizing scene information and collecting new data.
Specifically, we adopt a strong human detector to detect the spatial location of each person in every frame.
We then apply action recognition models to learn the temporal information from video frames, on both the HIE dataset and new data with diverse scenes from the Internet.
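A minimal sketch of such a detect-then-recognize pipeline might look as follows; `detector` and `action_model` are hypothetical stand-ins for an off-the-shelf person detector and a clip-level action classifier, not the paper's code.

```python
# Hedged sketch: crop each detected person through time, classify the tubelet.
def person_level_actions(frames, detector, action_model, clip_len=8):
    """Return an action label per person id, given a list of video frames."""
    tracks = {}                                  # person id -> list of crops
    for frame in frames:
        for pid, box in detector(frame):         # yields (person id, (x0, y0, x1, y1))
            x0, y0, x1, y1 = map(int, box)
            tracks.setdefault(pid, []).append(frame[y0:y1, x0:x1])
    labels = {}
    for pid, crops in tracks.items():
        if len(crops) >= clip_len:               # need enough temporal context
            labels[pid] = action_model(crops[-clip_len:])
    return labels
```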
arXiv Detail & Related papers (2020-10-16T13:08:50Z)
- Self-supervised Keypoint Correspondences for Multi-Person Pose Estimation and Tracking in Videos [32.43899916477434]
We propose an approach that relies on keypoint correspondences for associating persons in videos.
Instead of training the network to estimate keypoint correspondences on video data, it is trained on large-scale image datasets for human pose estimation.
Our approach achieves state-of-the-art results for multi-frame pose estimation and multi-person pose tracking on the PoseTrack 2017 and PoseTrack 2018 datasets.
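One simple way to turn keypoint correspondences into person associations is an assignment over pose-similarity scores; the sketch below uses SciPy's Hungarian solver with an assumed pixel threshold, purely for illustration.

```python
# Hedged sketch: associate persons across frames by how many joints stay close.
import numpy as np
from scipy.optimize import linear_sum_assignment

def associate(poses_prev, poses_cur, thresh=20.0):
    """poses_*: (N, J, 2) keypoint arrays; returns list of (prev_idx, cur_idx)."""
    n, m = len(poses_prev), len(poses_cur)
    cost = np.zeros((n, m))
    for i in range(n):
        for j in range(m):
            d = np.linalg.norm(poses_prev[i] - poses_cur[j], axis=1)
            cost[i, j] = -(d < thresh).mean()   # negated fraction of matched joints
    rows, cols = linear_sum_assignment(cost)    # optimal one-to-one assignment
    return [(r, c) for r, c in zip(rows, cols) if cost[r, c] < 0]
```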
arXiv Detail & Related papers (2020-04-27T09:02:24Z)
- Human Motion Transfer from Poses in the Wild [61.6016458288803]
We tackle the problem of human motion transfer, where we synthesize novel motion video for a target person that imitates the movement from a reference video.
It is a video-to-video translation task in which the estimated poses are used to bridge two domains.
We introduce a novel pose-to-video translation framework for generating high-quality videos that are temporally coherent even for in-the-wild pose sequences unseen during training.
arXiv Detail & Related papers (2020-04-07T05:59:53Z)