BAPose: Bottom-Up Pose Estimation with Disentangled Waterfall
Representations
- URL: http://arxiv.org/abs/2112.10716v1
- Date: Mon, 20 Dec 2021 18:07:09 GMT
- Title: BAPose: Bottom-Up Pose Estimation with Disentangled Waterfall
Representations
- Authors: Bruno Artacho, Andreas Savakis
- Abstract summary: BAPose is a novel framework that achieves state-of-the-art results for multi-person pose estimation.
Our results on the challenging COCO and CrowdPose datasets demonstrate that BAPose is an efficient and robust framework.
- Score: 3.8073142980733
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: We propose BAPose, a novel bottom-up approach that achieves state-of-the-art
results for multi-person pose estimation. Our end-to-end trainable framework
leverages a disentangled multi-scale waterfall architecture and incorporates
adaptive convolutions to infer keypoints more precisely in crowded scenes with
occlusions. The multi-scale representations, obtained by the disentangled
waterfall module in BAPose, leverage the efficiency of progressive filtering in
the cascade architecture, while maintaining multi-scale fields-of-view
comparable to spatial pyramid configurations. Our results on the challenging
COCO and CrowdPose datasets demonstrate that BAPose is an efficient and robust
framework for multi-person pose estimation, achieving significant improvements
over state-of-the-art accuracy.
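The waterfall module referenced above follows the "Waterfall" Atrous Spatial Pooling (WASP) idea from the authors' earlier UniPose and OmniPose work (both listed under related papers): atrous convolutions are cascaded so each branch filters the previous branch's output, and all intermediate outputs are concatenated and fused, giving spatial-pyramid-like fields-of-view from a cheaper progressive-filtering cascade. Below is a minimal illustrative sketch of such a module in PyTorch; the dilation rates, channel widths, and global-pooling branch are assumptions for illustration, not the published BAPose (disentangled waterfall) configuration, and the adaptive convolutions and disentangled keypoint representation mentioned in the abstract are not sketched here.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class WaterfallModule(nn.Module):
    """Illustrative waterfall atrous module: cascaded dilated convolutions
    whose intermediate outputs are all kept and fused (hypothetical sketch)."""

    def __init__(self, in_channels, branch_channels=256, dilations=(1, 6, 12, 18)):
        super().__init__()
        self.branches = nn.ModuleList()
        prev = in_channels
        for d in dilations:
            # Each branch filters the PREVIOUS branch's output (cascade),
            # with an increasing dilation rate (larger field of view).
            self.branches.append(nn.Sequential(
                nn.Conv2d(prev, branch_channels, 3, padding=d, dilation=d, bias=False),
                nn.BatchNorm2d(branch_channels),
                nn.ReLU(inplace=True),
            ))
            prev = branch_channels
        # Global-context branch, as in spatial-pyramid-pooling designs (assumed here).
        self.global_branch = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(in_channels, branch_channels, 1, bias=False),
            nn.ReLU(inplace=True),
        )
        self.project = nn.Conv2d(branch_channels * (len(dilations) + 1),
                                 branch_channels, 1)

    def forward(self, x):
        feats, out = [], x
        for branch in self.branches:
            out = branch(out)      # cascade of progressively filtered features
            feats.append(out)      # "waterfall": every intermediate output is kept
        g = F.interpolate(self.global_branch(x), size=x.shape[-2:],
                          mode="bilinear", align_corners=False)
        feats.append(g)
        return self.project(torch.cat(feats, dim=1))
```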
Related papers
- UNOPose: Unseen Object Pose Estimation with an Unposed RGB-D Reference Image [86.7128543480229]
We present a novel approach and benchmark, termed UNOPose, for unseen one-reference-based object pose estimation.
Building upon a coarse-to-fine paradigm, UNOPose constructs an SE(3)-invariant reference frame to standardize object representation.
We recalibrate the weight of each correspondence based on its predicted likelihood of being within the overlapping region (see the weighted-fit sketch after this list).
arXiv Detail & Related papers (2024-11-25T05:36:00Z)
- DepthSplat: Connecting Gaussian Splatting and Depth [90.06180236292866]
We present DepthSplat to connect Gaussian splatting and depth estimation.
We first contribute a robust multi-view depth model by leveraging pre-trained monocular depth features.
We also show that Gaussian splatting can serve as an unsupervised pre-training objective.
arXiv Detail & Related papers (2024-10-17T17:59:58Z)
- BaseBoostDepth: Exploiting Larger Baselines For Self-supervised Monocular Depth Estimation [2.1028463367241033]
We introduce incremental pose estimation to enhance the accuracy of pose estimates, resulting in significant improvements across all depth metrics.
Our final depth network achieves state-of-the-art performance on KITTI and SYNS-patches datasets without increasing computational complexity at test time.
arXiv Detail & Related papers (2024-07-29T22:05:13Z)
- 360 Layout Estimation via Orthogonal Planes Disentanglement and Multi-view Geometric Consistency Perception [56.84921040837699]
Existing panoramic layout estimation solutions tend to recover room boundaries from a vertically compressed sequence, yielding imprecise results.
We propose an orthogonal plane disentanglement network (termed DOPNet) to distinguish ambiguous semantics.
We also present an unsupervised adaptation technique tailored for horizon-depth and ratio representations.
Our solution outperforms other SoTA models on both monocular layout estimation and multi-view layout estimation tasks.
arXiv Detail & Related papers (2023-12-26T12:16:03Z)
- FoundationPose: Unified 6D Pose Estimation and Tracking of Novel Objects [55.77542145604758]
FoundationPose is a unified foundation model for 6D object pose estimation and tracking.
Our approach can be instantly applied at test-time to a novel object without fine-tuning.
arXiv Detail & Related papers (2023-12-13T18:28:09Z)
- Comparison of Model-Free and Model-Based Learning-Informed Planning for PointGoal Navigation [10.797100163772482]
We compare state-of-the-art Deep Reinforcement Learning based approaches with a Partially Observable Markov Decision Process (POMDP) formulation of the PointGoal navigation problem.
We show comparable, though slightly worse, performance than the SOTA DD-PPO approach, yet with far less data.
arXiv Detail & Related papers (2022-12-17T05:23:54Z)
- AdaptivePose++: A Powerful Single-Stage Network for Multi-Person Pose Regression [66.39539141222524]
We propose to represent the human parts as adaptive points and introduce a fine-grained body representation method.
With the proposed body representation, we deliver a compact single-stage multi-person pose regression network, termed AdaptivePose.
We employ AdaptivePose for both 2D and 3D multi-person pose estimation tasks to verify its effectiveness.
arXiv Detail & Related papers (2022-10-08T12:54:20Z)
- MonoIndoor++: Towards Better Practice of Self-Supervised Monocular Depth Estimation for Indoor Environments [45.89629401768049]
Self-supervised monocular depth estimation has seen significant progress in recent years, especially in outdoor environments.
However, depth prediction results are not satisfactory in indoor scenes, where most of the existing data are captured with hand-held devices.
We propose a novel framework, MonoIndoor++, to improve the performance of self-supervised monocular depth estimation for indoor environments.
arXiv Detail & Related papers (2022-07-18T21:34:43Z)
- Multi-View Multi-Person 3D Pose Estimation with Plane Sweep Stereo [71.59494156155309]
Existing approaches for multi-view 3D pose estimation explicitly establish cross-view correspondences to group 2D pose detections from multiple camera views.
We present a multi-view 3D pose estimation approach based on plane sweep stereo to jointly address cross-view fusion and 3D pose reconstruction in a single shot (see the plane-sweep sketch after this list).
arXiv Detail & Related papers (2021-04-06T03:49:35Z)
- OmniPose: A Multi-Scale Framework for Multi-Person Pose Estimation [3.8073142980733]
We propose a single-pass, end-to-end trainable framework that achieves state-of-the-art results for multi-person pose estimation.
Our results on multiple datasets demonstrate that OmniPose is a robust and efficient architecture for multi-person pose estimation.
arXiv Detail & Related papers (2021-03-18T11:30:31Z)
- UniPose: Unified Human Pose Estimation in Single Images and Videos [3.04585143845864]
We propose a unified framework for human pose estimation, based on our "Waterfall" Atrous Spatial Pooling architecture.
UniPose incorporates contextual segmentation and joint localization to estimate the human pose in a single stage.
Our results on multiple datasets demonstrate that UniPose, with a ResNet backbone and Waterfall module, is a robust and efficient architecture for pose estimation.
arXiv Detail & Related papers (2020-01-22T15:59:42Z)
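For the UNOPose entry above, the correspondence reweighting can be pictured as a weighted rigid fit: correspondences predicted to lie outside the overlap region receive low weight before the SE(3) pose is solved. The sketch below is a generic weighted Kabsch/Umeyama solve with a hypothetical overlap_prob input, intended only to illustrate the weighting idea, not UNOPose's actual pipeline.

```python
import numpy as np

def weighted_rigid_fit(ref_pts, qry_pts, overlap_prob):
    """Estimate R, t with qry ≈ R @ ref + t, weighting each 3D-3D correspondence
    by its predicted probability of lying inside the overlapping region.
    (Illustrative weighted Kabsch solve; overlap_prob is a hypothetical input.)"""
    w = overlap_prob / (overlap_prob.sum() + 1e-8)               # normalized weights
    ref_c = (w[:, None] * ref_pts).sum(axis=0)                   # weighted centroids
    qry_c = (w[:, None] * qry_pts).sum(axis=0)
    H = (w[:, None] * (ref_pts - ref_c)).T @ (qry_pts - qry_c)   # weighted cross-covariance
    U, _, Vt = np.linalg.svd(H)
    D = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])  # guard against reflections
    R = Vt.T @ D @ U.T
    t = qry_c - R @ ref_c
    return R, t
```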
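For the plane-sweep-stereo entry above, the core mechanism can be illustrated per joint: back-project a 2D detection from a reference view at a sweep of candidate depths, reproject each 3D hypothesis into another calibrated view, and keep the depth whose reprojection best agrees with that view's detections of the same joint. The function below is a simplified, hypothetical single-joint version with nearest-detection scoring, not the paper's heatmap-based implementation.

```python
import numpy as np

def sweep_joint_depth(joint_uv, K_ref, K_other, T_ref_to_other, other_joints_uv, depths):
    """Pick the depth hypothesis whose reprojection into the other view lands closest
    to any 2D detection of the same joint type. (Illustrative sketch only.)"""
    u, v = joint_uv
    ray = np.linalg.inv(K_ref) @ np.array([u, v, 1.0])     # back-projection ray (z = 1)
    best_depth, best_err = None, np.inf
    for d in depths:
        X_ref = ray * d                                    # 3D hypothesis in the reference camera frame
        X_oth = T_ref_to_other[:3, :3] @ X_ref + T_ref_to_other[:3, 3]
        if X_oth[2] <= 0:                                  # hypothesis behind the other camera
            continue
        uv = (K_other @ X_oth)[:2] / X_oth[2]              # reproject into the other view
        err = np.min(np.linalg.norm(other_joints_uv - uv, axis=1))
        if err < best_err:
            best_depth, best_err = d, err
    return best_depth, best_err
```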
This list is automatically generated from the titles and abstracts of the papers in this site.