Distribution and Depth-Aware Transformers for 3D Human Mesh Recovery
- URL: http://arxiv.org/abs/2403.09063v1
- Date: Thu, 14 Mar 2024 03:07:58 GMT
- Title: Distribution and Depth-Aware Transformers for 3D Human Mesh Recovery
- Authors: Jerrin Bright, Bavesh Balaji, Harish Prakash, Yuhao Chen, David A Clausi, John Zelek
- Abstract summary: We introduce Distribution and depth-aware human mesh recovery (D2A-HMR), an end-to-end transformer architecture.
Our approach demonstrates superior performance in handling OOD data in certain scenarios.
- Score: 7.339380415551658
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Precise Human Mesh Recovery (HMR) with in-the-wild data is a formidable challenge and is often hindered by depth ambiguities and reduced precision. Existing works resort to either pose priors or multi-modal data such as multi-view or point cloud information, though these methods often overlook the valuable scene-depth information inherently present in a single image. Moreover, achieving robust HMR for out-of-distribution (OOD) data is exceedingly challenging due to inherent variations in pose, shape and depth. Consequently, understanding the underlying distribution becomes a vital subproblem in modeling human forms. Motivated by the need for unambiguous and robust human modeling, we introduce Distribution and depth-aware human mesh recovery (D2A-HMR), an end-to-end transformer architecture meticulously designed to minimize the disparity between distributions and to incorporate scene depth by leveraging prior depth information. Our approach demonstrates superior performance in handling OOD data in certain scenarios while consistently achieving competitive results against state-of-the-art HMR methods on controlled datasets.
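The abstract names two mechanisms: fusing monocular scene-depth cues into a transformer, and regularizing the gap between predicted and prior parameter distributions. The PyTorch sketch below illustrates one generic way to realize both (modality-tagged token fusion plus an RBF-kernel MMD regularizer); all module and function names here are illustrative assumptions, not the authors' D2A-HMR implementation.
```python
import torch
import torch.nn as nn

class DepthAwareEncoder(nn.Module):
    """Hypothetical fusion of image and depth tokens in one transformer."""
    def __init__(self, dim=256, heads=8, layers=4):
        super().__init__()
        layer = nn.TransformerEncoderLayer(dim, heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, layers)
        self.modality_embed = nn.Embedding(2, dim)  # 0 = image, 1 = depth

    def forward(self, img_tokens, depth_tokens):
        # Tag each token with its modality, then let self-attention mix streams.
        img_tokens = img_tokens + self.modality_embed.weight[0]
        depth_tokens = depth_tokens + self.modality_embed.weight[1]
        return self.encoder(torch.cat([img_tokens, depth_tokens], dim=1))

def distribution_loss(pred_params, prior_params, sigma=1.0):
    # RBF-kernel maximum mean discrepancy: one generic way to "minimize the
    # disparity between distributions" of predicted vs. prior pose/shape
    # parameters (an assumption, not the paper's exact objective).
    def k(a, b):
        return torch.exp(-torch.cdist(a, b).pow(2) / (2 * sigma ** 2))
    return (k(pred_params, pred_params).mean()
            + k(prior_params, prior_params).mean()
            - 2 * k(pred_params, prior_params).mean())
```
In practice the depth tokens would plausibly come from a frozen monocular depth estimator, consistent with the abstract's use of prior depth information.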
Related papers
- FiffDepth: Feed-forward Transformation of Diffusion-Based Generators for Detailed Depth Estimation [31.06080108012735]
FiffDepth is a framework that transforms diffusion-based image generators into a feedforward architecture for detailed depth estimation.
It achieves enhanced accuracy, stability, and fine-grained detail, offering a significant improvement in monocular depth estimation (MDE) performance across diverse real-world scenarios.
arXiv Detail & Related papers (2024-12-01T04:59:34Z)
- DiHuR: Diffusion-Guided Generalizable Human Reconstruction [51.31232435994026]
We introduce DiHuR, a Diffusion-guided model for generalizable Human 3D Reconstruction and view synthesis from sparse, minimally overlapping images.
Our method integrates two key priors in a coherent manner: the prior from generalizable feed-forward models and the 2D diffusion prior, and it requires only multi-view image training, without 3D supervision.
arXiv Detail & Related papers (2024-11-16T03:52:23Z)
- DPMesh: Exploiting Diffusion Prior for Occluded Human Mesh Recovery [71.6345505427213]
DPMesh is an innovative framework for occluded human mesh recovery.
It capitalizes on the profound diffusion prior about object structure and spatial relationships embedded in a pre-trained text-to-image diffusion model.
arXiv Detail & Related papers (2024-04-01T18:59:13Z)
- UnitedHuman: Harnessing Multi-Source Data for High-Resolution Human Generation [59.77275587857252]
Holistic human datasets inevitably provide insufficient, low-resolution information about local body parts.
We propose to use multi-source datasets with various resolution images to jointly learn a high-resolution human generative model.
arXiv Detail & Related papers (2023-09-25T17:58:46Z)
- Distribution-Aligned Diffusion for Human Mesh Recovery [16.64567393672489]
We propose a diffusion-based approach for human mesh recovery.
We propose a Human Mesh Diffusion (HMDiff) framework which frames mesh recovery as a reverse diffusion process.
Our method achieves state-of-the-art performance on three widely used datasets.
arXiv Detail & Related papers (2023-08-25T13:29:31Z)
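Framing mesh recovery as a reverse diffusion process means starting from Gaussian noise over the mesh (or SMPL-parameter) representation and iteratively denoising it, conditioned on image features. Below is a generic DDPM-style sampling loop for intuition; the noise schedule, the 85-D parameter vector, and the `denoiser(x, t, cond)` interface are assumptions, not HMDiff's exact formulation (which additionally infuses a prior distribution to guide the process).
```python
import torch

@torch.no_grad()
def reverse_diffusion(denoiser, cond, steps=1000, dim=85):
    """Generic DDPM reverse process over SMPL-like parameters
    (e.g., an 85-D pose+shape+camera vector; an illustrative choice)."""
    betas = torch.linspace(1e-4, 2e-2, steps)      # linear noise schedule
    alphas = 1.0 - betas
    alpha_bar = torch.cumprod(alphas, dim=0)
    x = torch.randn(1, dim)                        # start from pure noise
    for t in reversed(range(steps)):
        eps = denoiser(x, torch.tensor([t]), cond) # predict the added noise
        coef = betas[t] / torch.sqrt(1.0 - alpha_bar[t])
        mean = (x - coef * eps) / torch.sqrt(alphas[t])
        noise = torch.randn_like(x) if t > 0 else torch.zeros_like(x)
        x = mean + torch.sqrt(betas[t]) * noise    # posterior sample
    return x  # denoised mesh parameters
```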
- Progressive Multi-view Human Mesh Recovery with Self-Supervision [68.60019434498703]
Existing solutions typically suffer from poor generalization performance to new settings.
We propose a novel simulation-based training pipeline for multi-view human mesh recovery.
arXiv Detail & Related papers (2022-12-10T06:28:29Z)
- Towards Multimodal Multitask Scene Understanding Models for Indoor Mobile Agents [49.904531485843464]
In this paper, we discuss the main challenge: insufficient, or even no, labeled data for real-world indoor environments.
We describe MMISM (Multi-modality input Multi-task output Indoor Scene understanding Model) to tackle the above challenges.
MMISM considers RGB images as well as sparse Lidar points as inputs and 3D object detection, depth completion, human pose estimation, and semantic segmentation as output tasks.
We show that MMISM performs on par or even better than single-task models.
arXiv Detail & Related papers (2022-09-27T04:49:19Z)
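The described layout, shared features from RGB and sparse Lidar with one head per task, is the standard multi-task pattern. This sketch shows only the wiring, with hypothetical channel counts and a simple additive fusion; it is not the MMISM architecture.
```python
import torch
import torch.nn as nn

class MultiTaskSceneModel(nn.Module):
    """Illustrative shared-backbone, multi-head layout (not MMISM itself)."""
    def __init__(self, feat_dim=256):
        super().__init__()
        self.rgb_backbone = nn.Sequential(nn.Conv2d(3, feat_dim, 3, padding=1), nn.ReLU())
        self.lidar_backbone = nn.Sequential(nn.Conv2d(1, feat_dim, 3, padding=1), nn.ReLU())
        self.heads = nn.ModuleDict({
            "detection": nn.Conv2d(feat_dim, 7, 1),      # box params + class logits
            "depth": nn.Conv2d(feat_dim, 1, 1),          # dense depth completion
            "pose": nn.Conv2d(feat_dim, 17, 1),          # joint heatmaps
            "segmentation": nn.Conv2d(feat_dim, 21, 1),  # per-class masks
        })

    def forward(self, rgb, sparse_depth):
        # Assumes both inputs share the same spatial resolution.
        feat = self.rgb_backbone(rgb) + self.lidar_backbone(sparse_depth)
        return {name: head(feat) for name, head in self.heads.items()}
```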
- Self-supervised Human Mesh Recovery with Cross-Representation Alignment [20.69546341109787]
Fully supervised human mesh recovery methods have poor generalizability due to the limited availability and diversity of 3D-annotated benchmark datasets.
We propose cross-representation alignment, utilizing the complementary information from the robust but sparse representation (2D keypoints).
This adaptive cross-representation alignment explicitly learns from the deviations and captures complementary information: robustness from the sparse representation and richness from the dense representation.
arXiv Detail & Related papers (2022-09-10T04:47:20Z)
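A generic instance of this idea is a reprojection-consistency loss: project the predicted 3D joints through a weak-perspective camera and penalize deviation from detected 2D keypoints. The sketch below uses detector confidences as weights; the paper's adaptive alignment scheme is more involved, so treat the names and weighting here as assumptions.
```python
import torch

def keypoint_alignment_loss(joints_3d, cam, keypoints_2d, conf):
    """Confidence-weighted 2D reprojection error.
    joints_3d: (B, J, 3) predicted joints, cam: (B, 3) weak-perspective
    [s, tx, ty], keypoints_2d: (B, J, 2) detections, conf: (B, J)."""
    s = cam[:, :1].unsqueeze(-1)              # (B, 1, 1) scale
    t = cam[:, 1:].unsqueeze(1)               # (B, 1, 2) translation
    proj = s * joints_3d[..., :2] + t         # weak-perspective projection
    err = (proj - keypoints_2d).norm(dim=-1)  # per-joint pixel deviation
    return (conf * err).sum() / conf.sum().clamp(min=1e-6)
```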
- Uncertainty-Aware Adaptation for Self-Supervised 3D Human Pose Estimation [70.32536356351706]
We introduce MRP-Net, which constitutes a common deep network backbone with two output heads subscribing to two diverse configurations.
We derive suitable measures to quantify prediction uncertainty at both the pose and joint levels.
We present a comprehensive evaluation of the proposed approach and demonstrate state-of-the-art performance on benchmark datasets.
arXiv Detail & Related papers (2022-03-29T07:14:58Z)
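One plausible reading of "two output heads subscribing to two diverse configurations" is an ensemble-style design in which head disagreement yields joint- and pose-level uncertainty. The sketch below implements that reading; it is an assumption about the mechanism, not MRP-Net itself.
```python
import torch
import torch.nn as nn

class TwoHeadPoseNet(nn.Module):
    """Shared backbone, two differently configured heads, disagreement
    as a simple uncertainty proxy (illustrative, not MRP-Net)."""
    def __init__(self, feat_dim=512, joints=17):
        super().__init__()
        self.joints = joints
        self.backbone = nn.Sequential(nn.Linear(2 * joints, feat_dim), nn.ReLU())
        self.head_a = nn.Linear(feat_dim, joints * 3)
        self.head_b = nn.Sequential(nn.Linear(feat_dim, feat_dim), nn.ReLU(),
                                    nn.Linear(feat_dim, joints * 3))

    def forward(self, kp2d):  # kp2d: (B, joints, 2) detected 2D keypoints
        f = self.backbone(kp2d.flatten(1))
        pa = self.head_a(f).view(-1, self.joints, 3)
        pb = self.head_b(f).view(-1, self.joints, 3)
        joint_unc = (pa - pb).norm(dim=-1)  # joint-level disagreement
        pose_unc = joint_unc.mean(dim=-1)   # pose-level uncertainty
        return (pa + pb) / 2, joint_unc, pose_unc
```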
- Unsupervised Domain Adaptation in the Dissimilarity Space for Person Re-identification [11.045405206338486]
We propose a novel Dissimilarity-based Maximum Mean Discrepancy (D-MMD) loss for aligning pair-wise distances.
Empirical results with three challenging benchmark datasets show that the proposed D-MMD loss decreases as source and target distributions become more similar.
arXiv Detail & Related papers (2020-07-27T22:10:46Z)
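The distinguishing move in D-MMD is computing MMD over dissimilarity (pairwise-distance) representations rather than raw features. Below is a simplified sketch under the assumption of equal source and target batch sizes; the paper's decomposition into within-class and between-class distance distributions is omitted.
```python
import torch

def d_mmd_loss(src_feat, tgt_feat, sigma=1.0):
    """Dissimilarity-space MMD sketch: represent each sample by its vector
    of pairwise distances within the batch, then compare source and target
    distance distributions with an RBF-kernel MMD. Assumes src_feat and
    tgt_feat have the same batch size so the distance vectors align."""
    d_src = torch.cdist(src_feat, src_feat)  # (N, N) source dissimilarities
    d_tgt = torch.cdist(tgt_feat, tgt_feat)  # (N, N) target dissimilarities

    def rbf(a, b):
        return torch.exp(-torch.cdist(a, b).pow(2) / (2 * sigma ** 2))

    return (rbf(d_src, d_src).mean() + rbf(d_tgt, d_tgt).mean()
            - 2 * rbf(d_src, d_tgt).mean())
```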
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences arising from its use.