BAPose: Bottom-Up Pose Estimation with Disentangled Waterfall
Representations
- URL: http://arxiv.org/abs/2112.10716v1
- Date: Mon, 20 Dec 2021 18:07:09 GMT
- Title: BAPose: Bottom-Up Pose Estimation with Disentangled Waterfall
Representations
- Authors: Bruno Artacho, Andreas Savakis
- Abstract summary: BAPose is a novel framework that achieves state-of-the-art results for multi-person pose estimation.
Our results on the challenging COCO and CrowdPose datasets demonstrate that BAPose is an efficient and robust framework.
- Score: 3.8073142980733
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: We propose BAPose, a novel bottom-up approach that achieves state-of-the-art
results for multi-person pose estimation. Our end-to-end trainable framework
leverages a disentangled multi-scale waterfall architecture and incorporates
adaptive convolutions to infer keypoints more precisely in crowded scenes with
occlusions. The multi-scale representations, obtained by the disentangled
waterfall module in BAPose, leverage the efficiency of progressive filtering in
the cascade architecture, while maintaining multi-scale fields-of-view
comparable to spatial pyramid configurations. Our results on the challenging
COCO and CrowdPose datasets demonstrate that BAPose is an efficient and robust
framework for multi-person pose estimation, achieving significant improvements
over state-of-the-art accuracy.
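The waterfall module referenced above follows the "Waterfall" Atrous Spatial Pooling (WASP) idea from the authors' earlier UniPose and OmniPose work (both listed under related papers): atrous convolutions are cascaded so each branch filters the previous branch's output, and all intermediate outputs are concatenated and fused, giving spatial-pyramid-like fields-of-view from a cheaper progressive-filtering cascade. Below is a minimal illustrative sketch of such a module in PyTorch; the dilation rates, channel widths, and global-pooling branch are assumptions for illustration, not the published BAPose (disentangled waterfall) configuration, and the adaptive convolutions and disentangled keypoint representation mentioned in the abstract are not sketched here.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class WaterfallModule(nn.Module):
    """Illustrative waterfall atrous module: cascaded dilated convolutions
    whose intermediate outputs are all kept and fused (hypothetical sketch)."""

    def __init__(self, in_channels, branch_channels=256, dilations=(1, 6, 12, 18)):
        super().__init__()
        self.branches = nn.ModuleList()
        prev = in_channels
        for d in dilations:
            # Each branch filters the PREVIOUS branch's output (cascade),
            # with an increasing dilation rate (larger field of view).
            self.branches.append(nn.Sequential(
                nn.Conv2d(prev, branch_channels, 3, padding=d, dilation=d, bias=False),
                nn.BatchNorm2d(branch_channels),
                nn.ReLU(inplace=True),
            ))
            prev = branch_channels
        # Global-context branch, as in spatial-pyramid-pooling designs (assumed here).
        self.global_branch = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(in_channels, branch_channels, 1, bias=False),
            nn.ReLU(inplace=True),
        )
        self.project = nn.Conv2d(branch_channels * (len(dilations) + 1),
                                 branch_channels, 1)

    def forward(self, x):
        feats, out = [], x
        for branch in self.branches:
            out = branch(out)      # cascade of progressively filtered features
            feats.append(out)      # "waterfall": every intermediate output is kept
        g = F.interpolate(self.global_branch(x), size=x.shape[-2:],
                          mode="bilinear", align_corners=False)
        feats.append(g)
        return self.project(torch.cat(feats, dim=1))
```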
Related papers
- UNOPose: Unseen Object Pose Estimation with an Unposed RGB-D Reference Image [86.7128543480229]
We present a novel approach and benchmark, termed UNOPose, for unseen one-reference-based object pose estimation.
Building upon a coarse-to-fine paradigm, UNOPose constructs an SE(3)-invariant reference frame to standardize object representation.
We recalibrate the weight of each correspondence based on its predicted likelihood of being within the overlapping region (see the weighted-fit sketch after this list).
arXiv Detail & Related papers (2024-11-25T05:36:00Z)
- DepthSplat: Connecting Gaussian Splatting and Depth [90.06180236292866]
We present DepthSplat to connect Gaussian splatting and depth estimation.
We first contribute a robust multi-view depth model by leveraging pre-trained monocular depth features.
We also show that Gaussian splatting can serve as an unsupervised pre-training objective.
arXiv Detail & Related papers (2024-10-17T17:59:58Z)
- BaseBoostDepth: Exploiting Larger Baselines For Self-supervised Monocular Depth Estimation [2.1028463367241033]
We introduce incremental pose estimation to enhance the accuracy of pose estimates, resulting in significant improvements across all depth metrics.
Our final depth network achieves state-of-the-art performance on KITTI and SYNS-patches datasets without increasing computational complexity at test time.
arXiv Detail & Related papers (2024-07-29T22:05:13Z)
- 360 Layout Estimation via Orthogonal Planes Disentanglement and Multi-view Geometric Consistency Perception [56.84921040837699]
Existing panoramic layout estimation solutions tend to recover room boundaries from a vertically compressed sequence, yielding imprecise results.
We propose an orthogonal plane disentanglement network (termed DOPNet) to distinguish ambiguous semantics.
We also present an unsupervised adaptation technique tailored for horizon-depth and ratio representations.
Our solution outperforms other SoTA models on both monocular layout estimation and multi-view layout estimation tasks.
arXiv Detail & Related papers (2023-12-26T12:16:03Z)
- FoundationPose: Unified 6D Pose Estimation and Tracking of Novel Objects [55.77542145604758]
FoundationPose is a unified foundation model for 6D object pose estimation and tracking.
Our approach can be instantly applied at test-time to a novel object without fine-tuning.
arXiv Detail & Related papers (2023-12-13T18:28:09Z)
- Comparison of Model-Free and Model-Based Learning-Informed Planning for PointGoal Navigation [10.797100163772482]
We compare state-of-the-art Deep Reinforcement Learning based approaches with a Partially Observable Markov Decision Process (POMDP) formulation of the PointGoal navigation problem.
We show comparable, though slightly worse, performance than the SOTA DD-PPO approach, yet with far less data.
arXiv Detail & Related papers (2022-12-17T05:23:54Z)
- AdaptivePose++: A Powerful Single-Stage Network for Multi-Person Pose Regression [66.39539141222524]
We propose to represent the human parts as adaptive points and introduce a fine-grained body representation method.
With the proposed body representation, we deliver a compact single-stage multi-person pose regression network, termed AdaptivePose.
We employ AdaptivePose for both 2D and 3D multi-person pose estimation tasks to verify its effectiveness.
arXiv Detail & Related papers (2022-10-08T12:54:20Z)
- MonoIndoor++: Towards Better Practice of Self-Supervised Monocular Depth Estimation for Indoor Environments [45.89629401768049]
Self-supervised monocular depth estimation has seen significant progress in recent years, especially in outdoor environments.
However, depth prediction results are not satisfactory in indoor scenes, where most of the existing data are captured with hand-held devices.
We propose a novel framework, MonoIndoor++, to improve the performance of self-supervised monocular depth estimation for indoor environments.
arXiv Detail & Related papers (2022-07-18T21:34:43Z)
- Multi-View Multi-Person 3D Pose Estimation with Plane Sweep Stereo [71.59494156155309]
Existing approaches for multi-view 3D pose estimation explicitly establish cross-view correspondences to group 2D pose detections from multiple camera views.
We present a multi-view 3D pose estimation approach based on plane sweep stereo to jointly address cross-view fusion and 3D pose reconstruction in a single shot (see the plane-sweep sketch after this list).
arXiv Detail & Related papers (2021-04-06T03:49:35Z)
- OmniPose: A Multi-Scale Framework for Multi-Person Pose Estimation [3.8073142980733]
We propose a single-pass, end-to-end trainable framework that achieves state-of-the-art results for multi-person pose estimation.
Our results on multiple datasets demonstrate that OmniPose is a robust and efficient architecture for multi-person pose estimation.
arXiv Detail & Related papers (2021-03-18T11:30:31Z)
- UniPose: Unified Human Pose Estimation in Single Images and Videos [3.04585143845864]
We propose a unified framework for human pose estimation, based on our "Waterfall" Atrous Spatial Pooling architecture.
UniPose incorporates contextual segmentation and joint localization to estimate the human pose in a single stage.
Our results on multiple datasets demonstrate that UniPose, with a ResNet backbone and Waterfall module, is a robust and efficient architecture for pose estimation.
arXiv Detail & Related papers (2020-01-22T15:59:42Z)
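For the UNOPose entry above, the correspondence reweighting can be pictured as a weighted rigid fit: correspondences predicted to lie outside the overlap region receive low weight before the SE(3) pose is solved. The sketch below is a generic weighted Kabsch/Umeyama solve with a hypothetical overlap_prob input, intended only to illustrate the weighting idea, not UNOPose's actual pipeline.

```python
import numpy as np

def weighted_rigid_fit(ref_pts, qry_pts, overlap_prob):
    """Estimate R, t with qry ≈ R @ ref + t, weighting each 3D-3D correspondence
    by its predicted probability of lying inside the overlapping region.
    (Illustrative weighted Kabsch solve; overlap_prob is a hypothetical input.)"""
    w = overlap_prob / (overlap_prob.sum() + 1e-8)               # normalized weights
    ref_c = (w[:, None] * ref_pts).sum(axis=0)                   # weighted centroids
    qry_c = (w[:, None] * qry_pts).sum(axis=0)
    H = (w[:, None] * (ref_pts - ref_c)).T @ (qry_pts - qry_c)   # weighted cross-covariance
    U, _, Vt = np.linalg.svd(H)
    D = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])  # guard against reflections
    R = Vt.T @ D @ U.T
    t = qry_c - R @ ref_c
    return R, t
```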
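For the plane-sweep-stereo entry above, the core mechanism can be illustrated per joint: back-project a 2D detection from a reference view at a sweep of candidate depths, reproject each 3D hypothesis into another calibrated view, and keep the depth whose reprojection best agrees with that view's detections of the same joint. The function below is a simplified, hypothetical single-joint version with nearest-detection scoring, not the paper's heatmap-based implementation.

```python
import numpy as np

def sweep_joint_depth(joint_uv, K_ref, K_other, T_ref_to_other, other_joints_uv, depths):
    """Pick the depth hypothesis whose reprojection into the other view lands closest
    to any 2D detection of the same joint type. (Illustrative sketch only.)"""
    u, v = joint_uv
    ray = np.linalg.inv(K_ref) @ np.array([u, v, 1.0])     # back-projection ray (z = 1)
    best_depth, best_err = None, np.inf
    for d in depths:
        X_ref = ray * d                                    # 3D hypothesis in the reference camera frame
        X_oth = T_ref_to_other[:3, :3] @ X_ref + T_ref_to_other[:3, 3]
        if X_oth[2] <= 0:                                  # hypothesis behind the other camera
            continue
        uv = (K_other @ X_oth)[:2] / X_oth[2]              # reproject into the other view
        err = np.min(np.linalg.norm(other_joints_uv - uv, axis=1))
        if err < best_err:
            best_depth, best_err = d, err
    return best_depth, best_err
```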
This list is automatically generated from the titles and abstracts of the papers in this site.