W-HMR: Monocular Human Mesh Recovery in World Space with Weak-Supervised Calibration
- URL: http://arxiv.org/abs/2311.17460v6
- Date: Mon, 9 Sep 2024 07:19:08 GMT
- Title: W-HMR: Monocular Human Mesh Recovery in World Space with Weak-Supervised Calibration
- Authors: Wei Yao, Hongwen Zhang, Yunlian Sun, Yebin Liu, Jinhui Tang,
- Abstract summary: Previous methods for 3D motion recovery from monocular images often fall short due to reliance on camera coordinates.
We introduce W-HMR, a weak-supervised calibration method that predicts "reasonable" focal lengths based on body distortion information.
We also present the OrientCorrect module, which corrects body orientation for plausible reconstructions in world space.
- Score: 57.37135310143126
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Previous methods for 3D human motion recovery from monocular images often fall short due to reliance on camera coordinates, leading to inaccuracies in real-world applications. The limited availability and diversity of focal length labels further exacerbate misalignment issues in reconstructed 3D human bodies. To address these challenges, we introduce W-HMR, a weak-supervised calibration method that predicts "reasonable" focal lengths based on body distortion information, eliminating the need for precise focal length labels. Our approach enhances 2D supervision precision and recovery accuracy. Additionally, we present the OrientCorrect module, which corrects body orientation for plausible reconstructions in world space, avoiding the error accumulation associated with inaccurate camera rotation predictions. Our contributions include a novel weak-supervised camera calibration technique, an effective orientation correction module, and a decoupling strategy that significantly improves the generalizability and accuracy of human motion recovery in both camera and world coordinates. The robustness of W-HMR is validated through extensive experiments on various datasets, showcasing its superiority over existing methods. Codes and demos have been made available on the project page https://yw0208.github.io/w-hmr/.
Related papers
- DeforHMR: Vision Transformer with Deformable Cross-Attention for 3D Human Mesh Recovery [2.1653492349540784]
DeforHMR is a novel regression-based monocular HMR framework designed to enhance the prediction of human pose parameters.
DeforHMR leverages a novel query-agnostic deformable cross-attention mechanism within the transformer decoder.
It achieves state-of-the-art performance for single-frame regression-based methods on the widely used 3D HMR benchmarks 3DPW and RICH.
arXiv Detail & Related papers (2024-11-18T00:46:59Z) - World-Grounded Human Motion Recovery via Gravity-View Coordinates [60.618543026949226]
We propose estimating human poses in a novel Gravity-View coordinate system.
The proposed GV system is naturally gravity-aligned and uniquely defined for each video frame.
Our method recovers more realistic motion in both the camera space and world-grounded settings, outperforming state-of-the-art methods in both accuracy and speed.
arXiv Detail & Related papers (2024-09-10T17:25:47Z) - Occlusion-Aware 3D Motion Interpretation for Abnormal Behavior Detection [10.782354892545651]
We present OAD2D, which discriminates against motion abnormalities based on reconstructing 3D coordinates of mesh vertices and human joints from monocular videos.
We reformulate the abnormal posture estimation by coupling it with Motion to Text (M2T) model in which, the VQVAE is employed to quantize motion features.
Our approach demonstrates the robustness of abnormal behavior detection against severe and self-occlusions, as it reconstructs human motion trajectories in global coordinates.
arXiv Detail & Related papers (2024-07-23T18:41:16Z) - OfCaM: Global Human Mesh Recovery via Optimization-free Camera Motion Scale Calibration [32.69343215997592]
This paper presents a novel framework that utilizes prior knowledge from human mesh recovery (HMR) models to directly calibrate the unknown scale factor.
Our method sets a new standard for global human mesh estimation tasks, reducing global human motion error by 60% over the prior SOTA.
arXiv Detail & Related papers (2024-06-30T03:31:21Z) - P2O-Calib: Camera-LiDAR Calibration Using Point-Pair Spatial Occlusion
Relationship [1.6921147361216515]
We propose a novel target-less calibration approach based on the 2D-3D edge point extraction using the occlusion relationship in 3D space.
Our method achieves low error and high robustness that can contribute to the practical applications relying on high-quality Camera-LiDAR calibration.
arXiv Detail & Related papers (2023-11-04T14:32:55Z) - Zolly: Zoom Focal Length Correctly for Perspective-Distorted Human Mesh
Reconstruction [66.10717041384625]
Zolly is the first 3DHMR method focusing on perspective-distorted images.
We propose a new camera model and a novel 2D representation, termed distortion image, which describes the 2D dense distortion scale of the human body.
We extend two real-world datasets tailored for this task, all containing perspective-distorted human images.
arXiv Detail & Related papers (2023-03-24T04:22:41Z) - Towards Scale-Aware, Robust, and Generalizable Unsupervised Monocular
Depth Estimation by Integrating IMU Motion Dynamics [74.1720528573331]
Unsupervised monocular depth and ego-motion estimation has drawn extensive research attention in recent years.
We propose DynaDepth, a novel scale-aware framework that integrates information from vision and IMU motion dynamics.
We validate the effectiveness of DynaDepth by conducting extensive experiments and simulations on the KITTI and Make3D datasets.
arXiv Detail & Related papers (2022-07-11T07:50:22Z) - Towards Model Generalization for Monocular 3D Object Detection [57.25828870799331]
We present an effective unified camera-generalized paradigm (CGP) for Mono3D object detection.
We also propose the 2D-3D geometry-consistent object scaling strategy (GCOS) to bridge the gap via an instance-level augment.
Our method called DGMono3D achieves remarkable performance on all evaluated datasets and surpasses the SoTA unsupervised domain adaptation scheme.
arXiv Detail & Related papers (2022-05-23T23:05:07Z) - Estimating Egocentric 3D Human Pose in Global Space [70.7272154474722]
We present a new method for egocentric global 3D body pose estimation using a single-mounted fisheye camera.
Our approach outperforms state-of-the-art methods both quantitatively and qualitatively.
arXiv Detail & Related papers (2021-04-27T20:01:57Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.