BAPose: Bottom-Up Pose Estimation with Disentangled Waterfall
Representations
- URL: http://arxiv.org/abs/2112.10716v1
- Date: Mon, 20 Dec 2021 18:07:09 GMT
- Title: BAPose: Bottom-Up Pose Estimation with Disentangled Waterfall
Representations
- Authors: Bruno Artacho, Andreas Savakis
- Abstract summary: BAPose is a novel framework that achieves state-of-the-art results for multi-person pose estimation.
Our results on the challenging COCO and CrowdPose datasets demonstrate that BAPose is an efficient and robust framework.
- Score: 3.8073142980733
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: We propose BAPose, a novel bottom-up approach that achieves state-of-the-art
results for multi-person pose estimation. Our end-to-end trainable framework
leverages a disentangled multi-scale waterfall architecture and incorporates
adaptive convolutions to infer keypoints more precisely in crowded scenes with
occlusions. The multi-scale representations, obtained by the disentangled
waterfall module in BAPose, leverage the efficiency of progressive filtering in
the cascade architecture, while maintaining multi-scale fields-of-view
comparable to spatial pyramid configurations. Our results on the challenging
COCO and CrowdPose datasets demonstrate that BAPose is an efficient and robust
framework for multi-person pose estimation, achieving significant improvements
over the state of the art in accuracy.
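The abstract describes a cascade of progressively filtered multi-scale (atrous) branches whose intermediate outputs are fused, in the spirit of the authors' earlier "Waterfall" Atrous Spatial Pooling module. The sketch below is a minimal PyTorch approximation of such a waterfall module; the branch widths, dilation rates, and 1x1 fusion layer are illustrative assumptions, not the exact BAPose design.
```python
# Minimal sketch of a waterfall-style multi-scale module (assumed design,
# not the authors' exact implementation): cascaded atrous convolutions whose
# intermediate outputs are all kept and fused, so fields-of-view grow
# progressively while remaining comparable to a spatial pyramid.
import torch
import torch.nn as nn


class WaterfallModule(nn.Module):
    def __init__(self, in_channels: int, branch_channels: int = 256,
                 dilations=(1, 6, 12, 18)):
        super().__init__()
        self.branches = nn.ModuleList()
        channels = in_channels
        for rate in dilations:
            # Each branch filters the output of the previous one (cascade).
            self.branches.append(nn.Sequential(
                nn.Conv2d(channels, branch_channels, kernel_size=3,
                          padding=rate, dilation=rate, bias=False),
                nn.BatchNorm2d(branch_channels),
                nn.ReLU(inplace=True),
            ))
            channels = branch_channels
        # Fuse the input and all cascade outputs back to a single feature map.
        self.fuse = nn.Conv2d(in_channels + branch_channels * len(dilations),
                              branch_channels, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        outputs = [x]
        feat = x
        for branch in self.branches:
            feat = branch(feat)   # progressive (cascade) filtering
            outputs.append(feat)  # keep every scale for the waterfall fusion
        return self.fuse(torch.cat(outputs, dim=1))


# Example: a hypothetical 2048-channel backbone feature map at 1/32 resolution.
features = torch.randn(1, 2048, 16, 16)
multi_scale = WaterfallModule(in_channels=2048)(features)
print(multi_scale.shape)  # torch.Size([1, 256, 16, 16])
```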
Related papers
- BaseBoostDepth: Exploiting Larger Baselines For Self-supervised Monocular Depth Estimation [2.1028463367241033]
We introduce incremental pose estimation to enhance the accuracy of pose estimations, resulting in significant improvements across all depth metrics.
Our final depth network achieves state-of-the-art performance on KITTI and SYNS-patches datasets without increasing computational complexity at test time.
arXiv Detail & Related papers (2024-07-29T22:05:13Z) - 360 Layout Estimation via Orthogonal Planes Disentanglement and Multi-view Geometric Consistency Perception [56.84921040837699]
Existing panoramic layout estimation solutions tend to recover room boundaries from a vertically compressed sequence, yielding imprecise results.
We propose an orthogonal plane disentanglement network (termed DOPNet) to distinguish ambiguous semantics.
We also present an unsupervised adaptation technique tailored for horizon-depth and ratio representations.
Our solution outperforms other SoTA models on both monocular layout estimation and multi-view layout estimation tasks.
arXiv Detail & Related papers (2023-12-26T12:16:03Z) - FoundationPose: Unified 6D Pose Estimation and Tracking of Novel Objects [55.77542145604758]
FoundationPose is a unified foundation model for 6D object pose estimation and tracking.
Our approach can be instantly applied at test-time to a novel object without fine-tuning.
arXiv Detail & Related papers (2023-12-13T18:28:09Z) - Comparison of Model-Free and Model-Based Learning-Informed Planning for
PointGoal Navigation [10.797100163772482]
We compare state-of-the-art Deep Reinforcement Learning based approaches with Partially Observable Markov Decision Process (POMDP) formulation of the point goal navigation problem.
We show comparable, though slightly worse, performance than the SOTA DD-PPO approach, yet with far less data.
arXiv Detail & Related papers (2022-12-17T05:23:54Z) - AdaptivePose++: A Powerful Single-Stage Network for Multi-Person Pose
Regression [66.39539141222524]
We propose to represent the human parts as adaptive points and introduce a fine-grained body representation method.
With the proposed body representation, we deliver a compact single-stage multi-person pose regression network, termed AdaptivePose.
We employ AdaptivePose on both 2D and 3D multi-person pose estimation tasks to verify its effectiveness.
arXiv Detail & Related papers (2022-10-08T12:54:20Z) - MonoIndoor++:Towards Better Practice of Self-Supervised Monocular Depth
Estimation for Indoor Environments [45.89629401768049]
Self-supervised monocular depth estimation has seen significant progress in recent years, especially in outdoor environments.
However, depth prediction results are not satisfactory in indoor scenes, where most of the existing data are captured with hand-held devices.
We propose a novel framework, MonoIndoor++, to improve the performance of self-supervised monocular depth estimation in indoor environments.
arXiv Detail & Related papers (2022-07-18T21:34:43Z) - Multi-View Multi-Person 3D Pose Estimation with Plane Sweep Stereo [71.59494156155309]
Existing approaches for multi-view 3D pose estimation explicitly establish cross-view correspondences to group 2D pose detections from multiple camera views.
We present our multi-view 3D pose estimation approach based on plane sweep stereo to jointly address the cross-view fusion and 3D pose reconstruction in a single shot.
arXiv Detail & Related papers (2021-04-06T03:49:35Z) - OmniPose: A Multi-Scale Framework for Multi-Person Pose Estimation [3.8073142980733]
We propose a single-pass, end-to-end trainable framework that achieves state-of-the-art results for multi-person pose estimation.
Our results on multiple datasets demonstrate that OmniPose is a robust and efficient architecture for multi-person pose estimation.
arXiv Detail & Related papers (2021-03-18T11:30:31Z) - Augmented Parallel-Pyramid Net for Attention Guided Pose-Estimation [90.28365183660438]
This paper proposes an augmented parallel-pyramid net with attention partial module and differentiable auto-data augmentation.
We define a new pose search space where the sequences of data augmentations are formulated as a trainable and operational CNN component.
Notably, our method achieves top-1 accuracy on the challenging COCO keypoint benchmark and state-of-the-art results on the MPII dataset.
arXiv Detail & Related papers (2020-03-17T03:52:17Z) - Learnable Bernoulli Dropout for Bayesian Deep Learning [53.79615543862426]
Learnable Bernoulli dropout (LBD) is a new model-agnostic dropout scheme that considers the dropout rates as parameters jointly optimized with other model parameters.
LBD leads to improved accuracy and uncertainty estimates in image classification and semantic segmentation.
arXiv Detail & Related papers (2020-02-12T18:57:14Z) - UniPose: Unified Human Pose Estimation in Single Images and Videos [3.04585143845864]
We propose a unified framework for human pose estimation, based on our "Waterfall" Atrous Spatial Pooling architecture.
UniPose incorporates contextual segmentation and joint localization to estimate the human pose in a single stage.
Our results on multiple datasets demonstrate that UniPose, with a ResNet backbone and Waterfall module, is a robust and efficient architecture for pose estimation.
arXiv Detail & Related papers (2020-01-22T15:59:42Z)