OmniPose: A Multi-Scale Framework for Multi-Person Pose Estimation
- URL: http://arxiv.org/abs/2103.10180v1
- Date: Thu, 18 Mar 2021 11:30:31 GMT
- Title: OmniPose: A Multi-Scale Framework for Multi-Person Pose Estimation
- Authors: Bruno Artacho and Andreas Savakis
- Abstract summary: We propose a single-pass, end-to-end trainable framework that achieves state-of-the-art results for multi-person pose estimation.
Our results on multiple datasets demonstrate that OmniPose is a robust and efficient architecture for multi-person pose estimation.
- Score: 3.8073142980733
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We propose OmniPose, a single-pass, end-to-end trainable framework that
achieves state-of-the-art results for multi-person pose estimation. Using a
novel waterfall module, the OmniPose architecture leverages multi-scale feature
representations that increase the effectiveness of backbone feature extractors,
without the need for post-processing. OmniPose incorporates contextual
information across scales, together with joint localization via Gaussian heatmap
modulation at the multi-scale feature extractor, to estimate human pose with
state-of-the-art accuracy. The multi-scale representations, obtained by the
improved waterfall module in OmniPose, leverage the efficiency of progressive
filtering in the cascade architecture, while maintaining multi-scale
fields-of-view comparable to spatial pyramid configurations. Our results on
multiple datasets demonstrate that OmniPose, with an improved HRNet backbone
and waterfall module, is a robust and efficient architecture for multi-person
pose estimation.
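The Gaussian heatmap representation mentioned in the abstract can be illustrated with a minimal sketch: each joint is encoded as a 2D Gaussian rendered on a heatmap, and its location is recovered as the heatmap's argmax. The resolution and sigma below are illustrative assumptions, not OmniPose's actual settings.

```python
# Minimal sketch of Gaussian heatmap encoding/decoding for joint localization,
# as used by heatmap-based pose estimators. Values here are assumptions.
import numpy as np

def make_heatmap(h, w, cx, cy, sigma=2.0):
    """Render a 2D Gaussian centered on the joint location (cx, cy)."""
    ys, xs = np.mgrid[0:h, 0:w]
    return np.exp(-((xs - cx) ** 2 + (ys - cy) ** 2) / (2 * sigma ** 2))

def decode_joint(heatmap):
    """Recover the joint location as the argmax of the heatmap."""
    cy, cx = np.unravel_index(np.argmax(heatmap), heatmap.shape)
    return int(cx), int(cy)

hm = make_heatmap(64, 48, cx=20, cy=30)
print(decode_joint(hm))  # -> (20, 30)
```

In practice the network regresses such heatmaps per joint, and sub-pixel refinement is often applied around the argmax rather than taking it directly.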
Related papers
- SkelFormer: Markerless 3D Pose and Shape Estimation using Skeletal Transformers [57.46911575980854]
We introduce SkelFormer, a novel markerless motion capture pipeline for multi-view human pose and shape estimation.
Our method first uses off-the-shelf 2D keypoint estimators, pre-trained on large-scale in-the-wild data, to obtain 3D joint positions.
Next, we design a regression-based inverse-kinematic skeletal transformer that maps the joint positions to pose and shape representations from heavily noisy observations.
arXiv Detail & Related papers (2024-04-19T04:51:18Z)
- FoundationPose: Unified 6D Pose Estimation and Tracking of Novel Objects [55.77542145604758]
FoundationPose is a unified foundation model for 6D object pose estimation and tracking.
Our approach can be instantly applied at test-time to a novel object without fine-tuning.
arXiv Detail & Related papers (2023-12-13T18:28:09Z)
- AdaptivePose++: A Powerful Single-Stage Network for Multi-Person Pose Regression [66.39539141222524]
We propose to represent the human parts as adaptive points and introduce a fine-grained body representation method.
With the proposed body representation, we deliver a compact single-stage multi-person pose regression network, termed AdaptivePose.
We employ AdaptivePose for both 2D/3D multi-person pose estimation tasks to verify the effectiveness of AdaptivePose.
arXiv Detail & Related papers (2022-10-08T12:54:20Z)
- BAPose: Bottom-Up Pose Estimation with Disentangled Waterfall Representations [3.8073142980733]
BAPose is a novel framework that achieves state-of-the-art results for multi-person pose estimation.
Our results on the challenging COCO and CrowdPose datasets demonstrate that BAPose is an efficient and robust framework.
arXiv Detail & Related papers (2021-12-20T18:07:09Z)
- Direct Multi-view Multi-person 3D Pose Estimation [138.48139701871213]
We present Multi-view Pose transformer (MvP) for estimating multi-person 3D poses from multi-view images.
MvP directly regresses the multi-person 3D poses in a clean and efficient way, without relying on intermediate tasks.
We show experimentally that our MvP model outperforms the state-of-the-art methods on several benchmarks while being much more efficient.
arXiv Detail & Related papers (2021-11-07T13:09:20Z)
- PnP-DETR: Towards Efficient Visual Analysis with Transformers [146.55679348493587]
Recently, DETR pioneered the solution of vision tasks with transformers; it directly translates the image feature map into the object detection result.
Recent transformer-based image recognition models such as ViT show consistent efficiency gains.
arXiv Detail & Related papers (2021-09-15T01:10:30Z)
- DeepMultiCap: Performance Capture of Multiple Characters Using Sparse Multiview Cameras [63.186486240525554]
DeepMultiCap is a novel method for multi-person performance capture using sparse multi-view cameras.
Our method can capture time-varying surface details without the need for pre-scanned template models.
arXiv Detail & Related papers (2021-05-01T14:32:13Z)
- Multi-Person Pose Estimation with Enhanced Feature Aggregation and Selection [33.15192824888279]
We propose a novel Enhanced Feature Aggregation and Selection network (EFASNet) for multi-person 2D human pose estimation.
Our method can well handle crowded, cluttered and occluded scenes.
Comprehensive experiments demonstrate that the proposed approach outperforms the state-of-the-art methods.
arXiv Detail & Related papers (2020-03-20T08:33:25Z)
- UniPose: Unified Human Pose Estimation in Single Images and Videos [3.04585143845864]
We propose a unified framework for human pose estimation, based on our "Waterfall" Atrous Spatial Pooling architecture.
UniPose incorporates contextual segmentation and joint localization to estimate the human pose in a single stage.
Our results on multiple datasets demonstrate that UniPose, with a ResNet backbone and Waterfall module, is a robust and efficient architecture for pose estimation.
arXiv Detail & Related papers (2020-01-22T15:59:42Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.